I messed up, big time. However, I think I can fix some of it – and hopefully this helps you not make the same mistake.
For a long time, I hosted my blog on WordPress.com. When I created blog posts with images or screenshots, it just magically worked. I didn’t (and still don’t) really understand where those images go or how they are managed. I thought, incorrectly, that WordPress “handled it”!
Eventually, I moved to Azure, where I now host my own WordPress site. It has been up and running for a little over a year, so recently I shut down (deleted) the old WordPress.com site. Before I did, I carefully “exported” – and let me quote them here – the “Entire contents of your blog” to an XML file.
I didn’t think much of it. I assumed the images were included as Base64-style attachments. It was a big file, and again, I didn’t give it much thought. I simply assumed it was handled, especially since the export promised the “entire contents” of my blog.
Then today, while referencing an old post, I noticed the picture links were broken. When I hover over them, they point to a URL like:
Uh oh. I did some research, and when you delete your blog, WordPress also deletes all of its images and attachments. Crap.
What are my options?
Well, the first thing that came to mind is that I still have that XML file, so I can at least make the job easier. I hacked up a little program to parse through the XML file and show me which blog posts are affected.
That is Part I: find out how many posts, and which ones, I need to fix. Part II is actually going back and fixing those old posts.
The problem, of course, is that some of those posts had little graphics I made in Photoshop, or photos I no longer have (for example, from an unboxing or a product review). So, this is not going to be perfect. However, I will start going through the old posts and clean up what I reasonably can. Some of them were point-in-time posts about things that are irrelevant now anyway, so maybe I can just delete those posts, or simply remove the old image links? I don’t know; we’ll have to see.
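To decide which posts are “affected”, the program has to tell an old, now-deleted image link apart from one that still works. The exact broken URL isn’t shown in full above, so the host suffix below is an assumption: old WordPress.com uploads commonly lived under a `*.files.wordpress.com` host, but check the links in your own export and adjust the hypothetical `OLD_HOST_SUFFIXES` tuple to match. This is a minimal Python sketch of the idea, not the author’s actual program.

```python
from urllib.parse import urlparse

# HYPOTHETICAL host list: adjust to whatever host your broken image links use.
OLD_HOST_SUFFIXES = ("files.wordpress.com",)


def is_old_image(url, suffixes=OLD_HOST_SUFFIXES):
    """True if the URL's host is one of the suffixes or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == s or host.endswith("." + s) for s in suffixes)


def affected_posts(posts, suffixes=OLD_HOST_SUFFIXES):
    """posts: iterable of (title, date, image_urls).

    Returns only the posts that reference at least one old-host image,
    keeping just those image URLs.
    """
    result = []
    for title, date, urls in posts:
        broken = [u for u in urls if is_old_image(u, suffixes)]
        if broken:
            result.append((title, date, broken))
    return result
```

Matching on the parsed hostname (rather than a substring search over the whole URL) avoids false hits such as a path that merely contains “wordpress.com”.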
So what does this program do? Here is a look at some of the output, and at the miserable work I have in front of me: 283 broken image links!
I’m simply dumping out the post name, date, and the image references that were in each blog post. The intent is that I can work through this list as time permits, working my way back in time and cleaning up what I can.
Here’s how I find just the posts (as opposed to comments, etc.) in the XML file:
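A report like that can be sketched as below, sorted newest first so the list can be worked “back in time”. It assumes each post arrives as a hypothetical `(title, date, image_urls)` tuple and that dates use the `YYYY-MM-DD HH:MM:SS` form seen in WordPress exports; both are assumptions for illustration, and the layout is not the author’s actual output format.

```python
def format_report(posts):
    """posts: iterable of (title, date, image_urls), date as 'YYYY-MM-DD HH:MM:SS'.

    Returns report lines, newest post first, with each image URL indented
    under its post and a summary line at the end.
    """
    posts = list(posts)  # iterated twice below, so materialize generators
    lines = []
    # Zero-padded ISO-style date strings sort chronologically as plain strings.
    for title, date, urls in sorted(posts, key=lambda p: p[1], reverse=True):
        lines.append(f"{date}  {title}")
        lines.extend(f"    {u}" for u in urls)
    total = sum(len(urls) for _, _, urls in posts)
    lines.append(f"{total} image reference(s) in {len(posts)} post(s)")
    return lines
```

Printing `"\n".join(format_report(...))` gives a plain-text checklist.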
Note that when elements have an XML namespace, you must use an XmlNamespaceManager and add the prefixes and URIs that are used. If you don’t, the XML parser won’t “see” those elements.
Then, here is how I’m parsing through a single post and getting the <img /> tags from the post content:
Again, not pretty, but it gave me what I needed. So now, on to the harder part: actually going back into some of these older posts and seeing if I can replace or clean out the old image references.
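The same namespace rule applies outside .NET. Here is a rough Python equivalent using the standard library’s ElementTree, where the prefix map is passed via the `namespaces=` argument instead of an XmlNamespaceManager. It assumes the usual WordPress export layout (`rss/channel/item`, with `wp:post_type`, `wp:post_date`, and `content:encoded`); the `wp` namespace URI version differs between exports, so the `1.2` below is an assumption to check against the header of your own file. Comments in this format are nested inside their post’s `<item>`, so filtering on `post_type` also skips pages and attachments.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URIs as typically declared in a WordPress export.
# The "wp" version (1.0/1.1/1.2) varies; copy it from your file's <rss> tag.
NS = {
    "wp": "http://wordpress.org/export/1.2/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}


def find_posts(xml_text, ns=NS):
    """Return (title, date, html) for every <item> whose wp:post_type is 'post'."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iterfind("./channel/item"):
        # Without the prefix map, "wp:post_type" would not match the namespaced element.
        if item.findtext("wp:post_type", default="", namespaces=ns) != "post":
            continue  # skip pages, attachments, menu items, etc.
        title = item.findtext("title", default="")
        date = item.findtext("wp:post_date", default="", namespaces=ns)
        html = item.findtext("content:encoded", default="", namespaces=ns)
        posts.append((title, date, html))
    return posts
```

For a large export file, `ET.parse(path).getroot()` avoids reading the whole file into a string first.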
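As a sketch of that step, here is one way to pull the `src` of every `<img>` out of a post’s HTML with Python’s built-in `html.parser`, which copes with attribute order, upper-case tags, and self-closing `<img />` better than a hand-written regex. The `ImgCollector` and `img_sources` names are hypothetical, not taken from the original program.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; a self-closing <img />
        # goes through handle_startendtag, which calls this method by default.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # ignore <img> tags with no (or empty) src
                self.srcs.append(src)


def img_sources(html):
    """Return the image URLs referenced in an HTML fragment, in document order."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs
```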
Meanwhile, if you see an old post with broken images – sorry about that, I’m working on it!