To know me is to know how much I really dislike WordPress. Yes, it could very well be from my own ignorance of the product – but it just seems like it’s the most “thrown-together” product, and it uses an endless supply of horribly-written WordPress Plug-ins. This may be a self-feeding cycle, because that makes me not want to learn about the product – which just ends up frustrating me when it breaks.
Wait, so why do I use WordPress for this blog site?
Well, it does have a ton of plug-ins and it already has Search Engine Optimization (SEO) built in. I would feel very compelled to write my own blogging platform – but then I wouldn’t have SEO done right, and it is somewhat tricky to get it to work with products like LiveWriter, which I use for writing blog posts.
OK so today, I’m minding my own business, I haven’t touched my web server or installed anything – nothing has changed.
Meanwhile, “good guy Google” sends me an e-mail saying my site is down:
I check and sure enough, this blog site just spins and spins – no error message, it just times out after a couple of minutes. Where do you start?
Since this doesn’t happen every day, and since this site is hosted in Azure, I thought I would write down some of the major things I did – both for Future Robert, and potentially others who might run into such a predicament.
Idea: Is the virtual machine up?
I host several sites on there, so I navigated to www.division42.com and that was up – so the virtual machine hasn’t crashed.
Idea: Is this just http or https?
I have http://blog.robseder.com and https://blog.robseder.com – and they both show the same symptoms, no error but the browser just spins and spins. Those are kind of like two different sites, so this likely isn’t an IIS problem – but let’s continue.
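For next time, the "does it answer, error out, or just spin?" question can be scripted instead of watched in a browser. What follows is only a minimal sketch, not something I ran during the outage: the function name probe, the 10-second default timeout and the four labels are all made up for illustration.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Classify a URL as 'ok', 'http-error', 'timeout' or 'unreachable'.

    The names and labels here are illustrative assumptions, not a standard.
    """
    try:
        with opener(url, timeout=timeout):
            # urlopen raises HTTPError for 4xx/5xx, so reaching here means success.
            return "ok"
    except urllib.error.HTTPError:
        # The web server answered, but with an error status.
        return "http-error"
    except (socket.timeout, TimeoutError):
        # Nothing came back in time: the "spins and spins" symptom.
        return "timeout"
    except urllib.error.URLError as exc:
        # urllib wraps connection-phase failures, including timeouts, in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "unreachable"
    except OSError:
        # Any other network-level failure.
        return "unreachable"
```

Calling probe("http://blog.robseder.com") and probe("https://blog.robseder.com") and getting "timeout" from both would match what I saw in the browser: the server accepts the request but never finishes answering it.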
Idea: What is in the event viewer?
I log into the server. The event viewer shows no errors at all in either the System or the Application log. Nothing IIS or MySQL related either – just nothing out of the ordinary.
Idea: Maybe restarting the website will help?
I tried restarting the website, recycling the app pool, and also stopping and restarting IIS – when the site was back online, it would still just hang.
Idea: Maybe restarting the database server will also help?
Under “Services”, MySQL runs as a service called MySQL56, so I restarted that service.
After it came back up – the site still acted the same.
Idea: Maybe rebooting will help?
After a reboot, still had the same symptoms.
Idea: After a fresh reboot, look back into the event viewer for new errors
Still nothing. No errors at all – and no messages related to IIS or MySQL.
Idea: Shutdown MySQL and hit webpage, see if we get a database connection error.
Good! I got a message saying error establishing connection to the database. Since that error only showed up once MySQL was shut down, it means that with MySQL running, WordPress was definitely talking to the database. That means that MySQL was up and listening and there was no firewall rule, for example, which was stopping the connection. This really starts pointing the finger at MySQL being the problem.
Idea: My last blog post was Wednesday, why not restore from Thursday morning’s backup?
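The "is MySQL listening, and can I reach it?" half of that test can also be checked without taking the database down, by simply trying to open a TCP connection. Here is a small sketch under a couple of assumptions: port_is_open is a hypothetical helper name, and 3306 is MySQL's default port, which may not be what a given server actually uses.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable host, blocked by a firewall, etc.
        return False
```

Run on the web server itself, port_is_open("127.0.0.1", 3306) coming back True only says that something is accepting connections on that port. It says nothing about whether queries can actually run, which turned out to be the real problem here.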
First, for my virtual machine – I have my main disk which is the C: drive for the virtual machine. But then I have an additional drive connected via my storage account:
This second drive is used to do nightly backups. So, I do full bare-metal backups every day onto this drive. So, I can go back and restore from within Computer Management:
It’s pretty easy to use, choose “Recover” from the Actions pane on the right:
From the wizard, you just pick the restore and pick what you want restored:
I shut down IIS and MySQL during the restore and then rebooted – I was surprised to still see the problem exist. Not sure what’s up with that.
Idea: Try logging into MySQL as my blog user account
Let’s look into the WordPress config file and see what account/password we use to connect. Let’s try to connect using Workbench. We can!
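The connection settings live in wp-config.php as define(...) lines for DB_NAME, DB_USER, DB_PASSWORD and DB_HOST. If you would rather pull them out with a script than by eye, something like the sketch below should do for a typical config file. The helper name read_db_settings is made up, and the pattern is deliberately naive: it does not handle escaped quotes inside a value, and it will happily match a define that has been commented out.

```python
import re

# Matches lines such as: define( 'DB_USER', 'some_user' );
_DEFINE = re.compile(
    r"""define\(\s*(['"])(DB_[A-Z_]+)\1\s*,\s*(['"])(.*?)\3\s*\)"""
)


def read_db_settings(config_text):
    """Return a dict of the DB_* constants found in wp-config.php text."""
    return {m.group(2): m.group(4) for m in _DEFINE.finditer(config_text)}
```

Feeding it the contents of the file gives back a dictionary keyed by constant name, so settings["DB_USER"] and settings["DB_PASSWORD"] are the values to type into Workbench.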
Idea: Look at the MySQL “Client Connections”
At this point, I’m connecting the same way WordPress should be – so it’s not a credential problem. Let’s look at the Client Connections window. There, I see pages of connections showing the message “Waiting for table metadata lock”. I know from other RDBMSs that when there is a table lock, that usually means that other operations are blocked. Since the lock was not going away, I killed that process and all of the connections after it, and the problem resolved immediately.
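The Client Connections window shows roughly what the SQL statement SHOW FULL PROCESSLIST returns, and a connection is killed with KILL followed by its Id. As a rough sketch of how the same triage could be done from a script, the two hypothetical helpers below take process-list rows that have already been fetched as dictionaries (how you fetch them depends on your MySQL client library, so that part is left out), separate the waiting sessions from everything else, and build the KILL statements. The column names Id, Time and State are the ones SHOW FULL PROCESSLIST uses.

```python
WAITING_STATE = "Waiting for table metadata lock"


def split_sessions(rows):
    """Split process-list rows into (waiters, others).

    'others' is sorted by Time, longest first, because a long-idle session
    holding an open transaction is a common (but not the only) culprit.
    """
    waiters = [r for r in rows if (r.get("State") or "") == WAITING_STATE]
    others = [r for r in rows if (r.get("State") or "") != WAITING_STATE]
    others.sort(key=lambda r: r["Time"], reverse=True)
    return waiters, others


def kill_statements(rows):
    """Build one KILL statement per row, for review before running by hand."""
    return [f"KILL {int(r['Id'])};" for r in rows]
```

The output is only a list of strings on purpose. Killing connections is a blunt instrument, so it seems safer to look at the list first and then paste the statements into Workbench than to have a script fire them off automatically.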
I go back to my browser and refresh and voila, my site works again. Now, this is what the client connection tab looks like:
Just a couple of connections, like normal – and no more locks.
Solution and Recovery:
In the end it was that “waiting for table metadata lock” that was stopping other queries from running. Knowing this, that restore from earlier was unnecessary. So, I re-restored the server back to this morning’s backup. Then, I went to see if those locks came back. They didn’t, so everything is fine now.
As a final step, I rebooted one more time and updated WordPress (plugins, themes, etc.). After verifying everything is fine, I did a manual backup to take a snapshot of a known-good time as of right now.
My strategy here was divide-and-conquer. See if I can rule something in or rule something out. Each one of these truths affected the next thing I investigated:
- Is the server up?
- Is IIS/the website up?
- Can the site connect to the database?
- Once in the database, any problems which would prevent queries from running?
This approach helped narrow the problem down relatively quickly – and then I of course just stumbled across the answer when I saw that Client Connections screen in MySQL Workbench.
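The checklist above is really just "run the checks from the outside in, and stop at the first one that fails". As a tiny illustration of that idea (the function name and the way the checks are passed in are my own invention, not any tool I used), it could look like this:

```python
def first_failure(checks):
    """Run (name, check) pairs in order; return the first failing name, or None.

    Each check is a zero-argument callable returning True when that layer
    looks healthy. Later checks are skipped once one fails.
    """
    for name, check in checks:
        if not check():
            return name
    return None
```

Each entry would wrap one of the questions in the list, for example a check that a known-good site on the same server responds, then one that the blog responds, then one that the database accepts a connection, then one that no sessions are stuck waiting on a lock. Whatever name comes back is where to start digging.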
These sorts of failures don’t come around very often, so I thought it would be useful to show some of the troubleshooting steps. I still have no idea what caused that initial locking problem, but if I see my website in a hung state like that again, I at least know where to look now.