With this very simple plugin, every post and page you publish from now on is automatically submitted to the Internet Archive’s Wayback Machine for crawling. The request is sent as soon as a post or page is published, including scheduled posts at the moment they go live.
Update 1
Here’s an optimized version of the Wayback Machine archiving code that reduces server load and includes error handling:
function archive_post_to_wayback($post_id) {
    // Check if we've recently tried to archive this post
    $last_archive_attempt = get_post_meta($post_id, '_wayback_last_attempt', true);
    $cooldown_period = 3600; // 1 hour cooldown
    if ($last_archive_attempt && (time() - (int) $last_archive_attempt) < $cooldown_period) {
        return;
    }

    // Get the post
    $post = get_post($post_id);

    // Only proceed if the post exists and is published
    if (!$post || $post->post_status !== 'publish') {
        return;
    }

    // Update last attempt timestamp
    update_post_meta($post_id, '_wayback_last_attempt', time());

    // Schedule the archiving task to run about 5 minutes from now
    wp_schedule_single_event(time() + 300, 'do_wayback_archive', array($post_id));
}

function do_wayback_archive($post_id) {
    // Get the post URL; get_permalink() returns false if the post no longer exists
    $post_url = get_permalink($post_id);
    if (!$post_url) {
        return;
    }

    // Construct the Wayback Machine save URL.
    // esc_url_raw() is used instead of esc_url(), which encodes "&" for HTML display.
    $wayback_url = 'https://web.archive.org/save/' . esc_url_raw($post_url);

    // Initialize wp_remote_get arguments
    $args = array(
        'timeout'    => 30,
        'user-agent' => 'WordPress/Wayback-Archive-Bot',
        'blocking'   => false
    );

    // Send non-blocking request to Wayback Machine
    $response = wp_remote_get($wayback_url, $args);

    // Log errors if debugging is enabled. With 'blocking' => false, WordPress does not
    // wait for the reply, so only failures to send the request show up as a WP_Error.
    if (is_wp_error($response) && WP_DEBUG) {
        error_log('Wayback Machine archiving failed for post ' . $post_id . ': ' . $response->get_error_message());
    }
}

// Register the custom action
add_action('do_wayback_archive', 'do_wayback_archive');

// Hook the function only to initial publication
add_action('transition_post_status', function($new_status, $old_status, $post) {
    if ($new_status === 'publish' && $old_status !== 'publish') {
        archive_post_to_wayback($post->ID);
    }
}, 10, 3);
This improved version includes several key optimizations:
- Cooldown Period: Implements a one-hour cooldown between archive attempts for the same post.
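- Delayed Processing: Uses WordPress cron to schedule the archiving task 5 minutes after publication, reducing immediate server load. WordPress cron is triggered by site visits, so on a low-traffic site the task may run somewhat later than that.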
- Non-blocking Requests: Uses non-blocking HTTP requests to prevent server hanging.
- Reduced Hook Usage: Only triggers on initial publication rather than every update.
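- Error Handling: Includes basic error logging when WP_DEBUG is enabled. Because the request is non-blocking, this only catches failures to send the request, not error responses from the Wayback Machine.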
- Resource Management: Uses metadata to track archiving attempts and prevent redundant requests.
- Custom User Agent: Implements a more identifiable user agent string.
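The cooldown check in the list above comes down to one timestamp comparison. Here is a minimal standalone sketch of that arithmetic. The helper name wayback_cooldown_active() is hypothetical and not part of the snippet; it only mirrors the condition used there, with a fixed example timestamp instead of time().

```php
<?php
/**
 * Hypothetical helper (not part of the snippet above): returns true while a
 * previous archive attempt is still inside the cooldown window.
 *
 * @param int|string $last_attempt Unix timestamp of the last attempt, or '' if none.
 * @param int        $now          Current Unix timestamp.
 * @param int        $cooldown     Cooldown length in seconds.
 */
function wayback_cooldown_active($last_attempt, $now, $cooldown = 3600) {
    // get_post_meta() returns '' when the key is missing: treat that as "never tried".
    if ($last_attempt === '' || $last_attempt === null || $last_attempt === false) {
        return false;
    }
    // Post meta is stored as a string, so cast before subtracting.
    return ($now - (int) $last_attempt) < $cooldown;
}

$now = 1700000000; // fixed example timestamp so the arithmetic is easy to follow

var_dump(wayback_cooldown_active('', $now));          // bool(false): no earlier attempt
var_dump(wayback_cooldown_active($now - 600, $now));  // bool(true): tried 10 minutes ago
var_dump(wayback_cooldown_active($now - 3600, $now)); // bool(false): exactly 1 hour ago
```

With the one-hour default, a second attempt is skipped until a full 3600 seconds have passed since the stored timestamp.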
These changes should reduce server load while keeping the archiving functionality intact. Failures to send the request are logged rather than silently ignored, and the cooldown and delayed scheduling help avoid flooding either your server or the Wayback Machine with requests.
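Since the save request is sent without waiting for a reply, the snippet cannot tell whether a capture actually happened. One way to check afterwards is the Wayback Machine availability API at https://archive.org/wayback/available. The following is only a sketch under assumptions: both helper names are hypothetical, and the two JSON bodies are illustrative samples shaped like the API's documented response, not real captures.

```php
<?php
/**
 * Hypothetical helper: build the availability API URL for a page.
 */
function wayback_availability_url($page_url) {
    return 'https://archive.org/wayback/available?url=' . rawurlencode($page_url);
}

/**
 * Hypothetical helper: extract the closest snapshot URL from the API's JSON
 * body, or return null if no snapshot is reported.
 */
function wayback_snapshot_from_json($json) {
    $data = json_decode($json, true);
    if (!is_array($data) || empty($data['archived_snapshots']['closest']['available'])) {
        return null;
    }
    return $data['archived_snapshots']['closest']['url'] ?? null;
}

// Inside WordPress, the body could be fetched with a blocking request, e.g.:
//   $response = wp_remote_get(wayback_availability_url(get_permalink($post_id)), array('timeout' => 15));
//   $snapshot = is_wp_error($response) ? null : wayback_snapshot_from_json(wp_remote_retrieve_body($response));

// Illustrative sample bodies (assumed shape, not real captures):
$archived = '{"archived_snapshots":{"closest":{"available":true,"url":"http://web.archive.org/web/20240101000000/https://example.com/","timestamp":"20240101000000","status":"200"}}}';
$missing  = '{"archived_snapshots":{}}';

var_dump(wayback_snapshot_from_json($archived)); // the snapshot URL as a string
var_dump(wayback_snapshot_from_json($missing));  // NULL
```

A check like this could run from a second scheduled event some time after the save request, since captures are not necessarily available immediately.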
Implementation Notes
To use this code, add it to your theme’s functions.php file or a custom plugin. Archiving then happens automatically when posts are published, with built-in safeguards to prevent excessive server load.
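If you go the custom plugin route, the file could look roughly like the sketch below. The plugin name and the cleanup function are hypothetical, and the cleanup step is an addition that the code above does not include: it removes pending archive events when the plugin is deactivated, using wp_unschedule_hook(), which exists since WordPress 4.9.

```php
<?php
/**
 * Plugin Name: Wayback Auto Archive (hypothetical name)
 * Description: Submits newly published posts and pages to the Wayback Machine.
 */

// Paste the functions and add_action() calls from the update above here.

/**
 * Hypothetical cleanup callback: removes every pending 'do_wayback_archive'
 * cron event, whatever post ID it was scheduled with.
 */
function wayback_archive_clear_events() {
    if (function_exists('wp_unschedule_hook')) { // available since WordPress 4.9
        wp_unschedule_hook('do_wayback_archive');
    }
}

// Only register the hook when WordPress is loaded, so the file is inert elsewhere.
if (function_exists('register_deactivation_hook')) {
    register_deactivation_hook(__FILE__, 'wayback_archive_clear_events');
}
```

Saved as a single PHP file in wp-content/plugins, this would show up in the Plugins screen and survive theme changes, which code placed in functions.php does not.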
Original Post:
Add this snippet to a code snippets plugin or your theme’s functions.php file, then save and activate it.
function archive_post_to_wayback($post_id) {
    // Get the post
    $post = get_post($post_id);

    // Only proceed if the post exists and is published
    if (!$post || $post->post_status !== 'publish') {
        return;
    }

    // Get the post URL
    $post_url = get_permalink($post_id);

    // Construct the Wayback Machine save URL.
    // esc_url_raw() is used instead of esc_url(), which encodes "&" for HTML display.
    $wayback_url = 'https://web.archive.org/save/' . esc_url_raw($post_url);

    // Initialize wp_remote_get arguments
    $args = array(
        'timeout'    => 30,
        'user-agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
    );

    // Send request to Wayback Machine
    wp_remote_get($wayback_url, $args);
}

// Hook the function to post publication and update
add_action('publish_post', 'archive_post_to_wayback');
add_action('publish_page', 'archive_post_to_wayback');
add_action('post_updated', 'archive_post_to_wayback');