Import Tumblr to WordPress, including Featured Images

Posted on Jun 14, 2016


I don’t post much, I see. I use this area mostly as notes for myself anyway, so… here are some notes for myself.

Recently I had to transfer a Tumblr blog to WordPress. There is a plugin for this, but it’s a little light on features. The primary problem I ran into is that it does not transfer images or rewrite the links to them; it leaves each img tag’s src attribute pointing at the Tumblr blog. Furthermore, none of the images are recognized as attachments, and none of them wind up as a post’s ‘Featured Image’.

No es bueno.

Downloading the images

I tried several applications for downloading image content from Tumblr; most didn’t work, or worked poorly. So far I haven’t found one that will download every image from a post that contains more than one. That means I still have to go back to the Tumblr blog and do a lot of right-clicking and downloading. GRRRR.

The workstation I was on runs Windows 7. Here are the apps I tried:

  1. DownloadTumbr (only downloaded a single image from each post, but at least it worked)
  2. Tumblr Image Downloader (also only downloaded a single image from each post, but at least it worked)
  3. TumbleOne, TumbleTwo, TumbleThree (none worked)
  4. Litchi-tumblr (didn’t work)
  5. TumblRipper (didn’t work)
  6. Some Chrome plugin (didn’t work)

I used the Add from Server plugin to add the images into the Media Library, and then as I downloaded missed images, I just put them into a ‘missed’ folder, and then added them incrementally.
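
Since none of the downloaders grabbed every image in a multi-image post, one way to at least see what was missed is to list every Tumblr-hosted image URL in a post's HTML. Here is a rough sketch in plain PHP, assuming the post content is available as a string. The function name, the sample markup, and the assumption that file names start with tumblr_ are all made up for illustration:

```php
<?php
// Hypothetical helper: list every Tumblr-hosted image URL found in a chunk of post HTML.
function list_tumblr_image_urls(string $html): array {
    // Matches src="..." values whose file name starts with tumblr_ (assumed naming pattern).
    $pattern = '/src=[\'"](https?:\/\/[^\'"]+\/tumblr_[^\'"\/]+)[\'"]/i';
    preg_match_all($pattern, $html, $matches);
    // Drop duplicates and reindex the result.
    return array_values(array_unique($matches[1]));
}

// Made-up sample markup with two images in one post.
$html = '<p><img src="http://40.media.tumblr.com/abc123/tumblr_aaa_1280.jpg"/>'
      . '<img src="http://41.media.tumblr.com/def456/tumblr_bbb_500.png"/></p>';

foreach (list_tumblr_image_urls($html) as $url) {
    echo $url, PHP_EOL;
}
```

The resulting list can be compared against what the downloader actually saved, which should cut down on the right-clicking.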

Update the references to the images to match my local media library

Next I had to edit the database entries so that the image references no longer pointed at Tumblr and instead pointed at my local server. I used the Search Regex plugin because it supports regular expressions, offers a test run, and displays the results clearly, so you can make fixes before you totally mess everything up.

<rant>

BACK UP YOUR DATABASE. I can’t stress this enough. Use phpMyAdmin or one of the many WordPress database backup plugins (I like UpdraftPlus), but for God’s sake, save yourself a thousand headaches and learn how to back up and restore your database!!

</rant>

I am not a RegEx master, but I’m getting there thanks to Regular Expressions 101. This site lets you paste in a block of test text and then monkey around writing a regex that singles out what you’re looking for, in real time, with a great user interface. It has a little panel on the right that explains what your pattern is doing, as well as a reference. Thanks to this site, I have started using regex for a lot of stuff, like complex search/replace in IntelliJ.

Since I knew I was searching for any image hosted on Tumblr’s server, the following regex worked for me:

/http:\/\/[0-9a-z.\/]+(\/tumblr_[a-zA-Z0-9_]+.[jpngtifbmp]{3})/

That essentially matches the whole URL, then copies just the image file name part of it into a capture group (the text surrounded by parentheses). In the replace field of the Search Regex plugin, I then entered this:

/wordpress/wp-content/uploads$1

The $1 portion inserts whatever was stored in the capture group. I checked the Regex box below and hit Replace to do a test. I looked through a bunch of results to make sure I got what I wanted, then hit Search and Replace. Voilà. Images are now pointing to my server.
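
To see what this kind of search and replace does outside the plugin, here is a small standalone PHP sketch using preg_replace. It assumes the plugin's regex behaves like PHP's PCRE functions. The sample URL is made up, and the pattern is a slightly tightened variant of the regex above (escaped dot, explicit extension list, optional https), not the exact pattern shown there:

```php
<?php
// Tightened variant of the pattern (an assumption, not the exact regex shown above):
// the dot before the extension is escaped and the extensions are spelled out.
$pattern = '/https?:\/\/[0-9a-z.\/]+(\/tumblr_[a-zA-Z0-9_]+\.(?:jpe?g|png|gif|bmp|tiff?))/';

// $1 is replaced with the captured "/tumblr_....ext" part of the URL.
$replacement = '/wordpress/wp-content/uploads$1';

// Made-up sample content with one Tumblr-hosted image.
$content = '<img src="http://40.media.tumblr.com/abc123/tumblr_xyz_1280.jpg"/>';

$rewritten = preg_replace($pattern, $replacement, $content);
echo $rewritten, PHP_EOL;
// prints: <img src="/wordpress/wp-content/uploads/tumblr_xyz_1280.jpg"/>
```

Everything before the last /tumblr_ segment is thrown away, and only the file name survives into the new path.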

Force the first image to become an attachment, and then a featured image

My requirement was to only have the first image become an attachment, not every image in the post. I wrote the following functions by copying from a lot of Stack Overflow answers, WordPress articles, and some hours of trial and error. I opened my wp-admin area, clicked on the Posts section, and set the view options to be able to see every post in the system (I had 291).

I then placed the following functions into my theme functions.php file and refreshed the Posts listing ONCE ONLY.

I then deleted the functions from my functions.php file, as I did notice there were some strange things going on afterwards.

But first, the code:

function pr($str) {
   echo '<pre>';
   var_dump($str);
   echo '</pre>';
}

// Get the URL of the first image in a post (or all image URLs when $first_only is false)
function catch_that_image($first_only=true) {
   global $post;

   // The character class is tuned to this import's markup (an optional alt="image"
   // before src); loosen it if your img tags carry other attributes before src.
   preg_match_all('/<img[altimage=\"\W]+src=[\'"]([^\'"]+)/i', $post->post_content, $matches);

   if( empty($matches[1]) ) {
      return false;
   } else if ($first_only === true) {
      return $matches[1][0];
   } else {
      return $matches[1];
   }
}

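// Get an attachment ID based on its src (matched against the guid column)
function get_attachment_id_from_src($image_src) {

   global $wpdb;
   // prepare() escapes the value instead of interpolating it straight into the SQL
   $query = $wpdb->prepare("SELECT ID FROM {$wpdb->posts} WHERE guid = %s", $image_src);
   $id = $wpdb->get_var($query); // null when nothing matches
   return $id;

}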

function add_image_as_attachment_then_as_featured() {
   global $post;

   $featured_image_exists = has_post_thumbnail($post->ID);

   if (!$featured_image_exists) {
      // First get first image if it exists
      $matches = catch_that_image();

      //echo "<br>------<br>matches:<br>";
      //pr( $matches );

      if( !$matches ) {
         return false;
      } else {

         // There's a first image. Check if it's attached.
         $attached_images = get_attached_media( 'image', $post->ID );
         $attach_exists = false;
         $attach_id = null;

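         // The image paths are root-relative after the search/replace, so prefix the site's
         // host (a local dev server here; change it to your own) to match the stored URLs.
         $matches = 'http://localhost:8888' . $matches;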

         if( !empty( $attached_images ) ) {

            //echo "attached image array<br>";
            //pr( $attached_images );

            foreach( $attached_images as $attached_image ) {

               //echo "attachment image src:<br>";
               //pr( wp_get_attachment_image_src( $attached_image->ID, 'full' ) );

               // check if our first image is already an attachment
               if( strrpos( $matches, wp_get_attachment_image_src( $attached_image->ID, 'full' )[ 0 ] ) !== false ) {
                  //echo "<br>match exists in attachment. Move on to feature attach";
                  $attach_exists = true;
                  $attach_id = $attached_image->ID;
                  break;
               }
            }
         }

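         // If attachment doesn't exist, attach it.
         if( $attach_exists === false ) {

            //echo "<br>matches exists, and we have no attachment<br>";

            $attach_id = get_attachment_id_from_src( $matches );

            //echo "existing image ID:<br>";
            //pr( $attach_id );

            // Bail out when no Media Library entry matches. Calling wp_update_post()
            // with an empty ID can fall back to the current post and insert a copy of it.
            if( empty( $attach_id ) ) {
               return false;
            }

            $attach_a = array();
            $attach_a[ 'ID' ] = $attach_id;
            $attach_a[ 'post_parent' ] = $post->ID;
            wp_update_post( $attach_a );
         }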

         // at this point we should have an attach id
         //echo "<br>attach_id:<br>";
         //pr( $attach_id );

         // Finally, set first attachment to be post thumbnail
         set_post_thumbnail( $post->ID, $attach_id );
      }
   } else {
      return false;
   }
}
add_action('the_post', 'add_image_as_attachment_then_as_featured');

That basically forced the first image to become an attachment, using an image found on the server, and then assigned it as the featured image.
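
For reference, here is what the first-image lookup does on its own, outside WordPress. This is a standalone sketch, not the function above: the function name and sample markup are hypothetical, and the pattern is a more tolerant one that assumes any attributes may appear between <img and src:

```php
<?php
// Hypothetical standalone version of the first-image lookup (not the function above).
// Assumption: any attributes may appear between <img and src.
function first_image_src(string $html) {
    $pattern = '/<img\b[^>]*?\ssrc=[\'"]([^\'"]+)[\'"]/i';
    if (preg_match($pattern, $html, $match)) {
        return $match[1]; // URL of the first image
    }
    return false; // no image in this content
}

// Made-up post content with two images.
$content = '<p>Hello</p><img alt="image" src="/wordpress/wp-content/uploads/tumblr_one.jpg"/>'
         . '<img src="/wordpress/wp-content/uploads/tumblr_two.jpg"/>';

var_dump(first_image_src($content));           // the first image's path
var_dump(first_image_src('<p>No images</p>')); // false
```

Only the first match is returned, which is all that is needed to pick a featured image.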

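One odd thing I noticed: several posts suddenly duplicated themselves, seemingly at random. I started with 291 posts and suddenly had 301 or so. I went through and deleted the duplicates, and I did NOT spend a lot of time trying to debug. A likely culprit, though not one I verified: when no Media Library entry matches the first image’s URL, the attachment ID comes back empty, and wp_update_post() called with an empty ID can fall back to the current post and save it as a new one, so it is worth bailing out whenever the ID is empty. Finding and removing the duplicates was easy enough, and I wasn’t tasking myself with writing a plugin or anything else that would continue to live in this functions.php file. This was a fire-once-and-forget type of deal.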

However, if you’re stuck trying to figure out how to get your Tumblr blog into WordPress, images included, with featured images to boot, this should be a good starting point.
