Ninja migrations from VMware to KVM using vmdksync

Published May 15th, 2012 by Barney Desmond

We recently made the decision to pay off some of our technical debt by eliminating the VMware servers we built when we first started our Virtual Private Server (VPS) offering. VMware is a commercial vendor platform so it’s not exactly trivial to jump ship, but it is possible with some time and effort. Forcing a few hours of downtime on our customers for business reasons is not cool, so we had to find a better way.

Background and rationale

When we first started offering virtual servers the software landscape was very different. After comparing what was available at the time we settled on VMware ESX for our virtual private server product – the right features, suitable for a VPS product, secure and manageable enough, sufficiently mature and reliable, and a nominal level of support.

Things changed over time and VMware stopped doing what we needed, so we’ve since switched to KVM for our new VPS deployments. The VMware servers are still ticking over but they’re not without issues, particularly with the dedicated Windows machine we’re forced to use to access the console and management tools.

Thus, the (rather easy) decision was made to migrate all the VMs to our KVM-based infrastructure. As a bonus we get to move them to shinier Dell hardware. The only question was: how?

Challenges

We’ve done plenty of migrations between VPS servers before; it’s really just swapping one set of virtual hardware for another. Linux doesn’t mind, but Windows will frequently have a fit, and that’s where a bunch of the problems lie.

In addition, downtime needs to be kept to a minimum. You can mount up a VMware disk image and pump data across the network, but that’s slow. At least if we were dealing with a Xen or KVM system as the source then we could use lvmsync to dodge the problem, but no such luck here.

Enter vmdksync

This last point is what made our Matt Palmer itchy – why can’t we apply a binary diff of disk images? VMware has snapshots just like LVM after all.

VMware looks mostly like an RHEL3 system when you log in, and VM data is stored in a VMFS filesystem with some special access semantics. The VM disks are just raw files, so far so good, but the filesystem doesn’t show up when you run df or mount, and VMware seems to hold exclusive locks on the .vmdk files. You can’t even use file or hexdump on an in-use image.

With some experimentation, Matt found that locks only apply to files opened for writing – taking a snapshot releases the lock on the original (“flat”) file, and locks the snapshot (“delta”) instead. Hallelujah! We can make a baseline copy of the disk from the unlocked flat file while the VM is running, then apply the changes in the delta file, which should be very quick.

That’s exactly what vmdksync does. After nutting out the perverse description of sparse extents in the Virtual Disk Format 5.0 Technote, Matt put together a little Ruby script to merge a VMware snapshot over a target device. That’s an LVM logical volume in our case, but it could be any sort of disk image that you like.
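To make the idea concrete, here’s a minimal sketch in Python of what "merging a snapshot over a target" boils down to. This is not vmdksync itself and it does not parse the real VMware sparse extent format. The delta is modelled as a hypothetical dict mapping grain numbers to grain data, and both the apply_delta name and the 64 KiB grain size are assumptions made for illustration.

```python
GRAIN_SIZE = 64 * 1024  # assumed grain size for this sketch, not read from a real header


def apply_delta(target, delta, grain_size=GRAIN_SIZE):
    """Overwrite the changed grains of `target` with the data held in `delta`.

    `target` is any seekable binary file object opened for writing
    (a logical volume, a raw disk image, an in-memory buffer).
    `delta` is a hypothetical stand-in for a parsed snapshot file:
    a dict of grain number -> bytes, each exactly `grain_size` long.
    Returns the number of bytes written.
    """
    written = 0
    for grain_no in sorted(delta):
        data = delta[grain_no]
        if len(data) != grain_size:
            raise ValueError(f"grain {grain_no} has unexpected length {len(data)}")
        # Grains untouched since the snapshot are simply skipped,
        # which is why applying the delta is so much faster than a full copy.
        target.seek(grain_no * grain_size)
        target.write(data)
        written += len(data)
    return written
```

Only the grains that changed after the snapshot get written, so the work is proportional to how busy the VM was, not to the size of its disk.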

Step by step

End-to-end, the procedure looks something like the following. Depending on how you prefer to manhandle VMware, Matt has some convenient command-line examples in his README over at the repo.

  1. Tell Puppet to set up a blank VPS on your new KVM server
  2. Take a snapshot of the victim on the VMware host
  3. Use dd and netcat to copy the flat file across the network and onto the empty logical volume (LV) on the new server
  4. Shut down the victim on the VMware host, releasing the locks on the snapshot delta file
  5. Push the delta across the network to the KVM server
  6. Use vmdksync to apply the delta to the LV
  7. Fire up the VPS on the KVM server and you’re done!

With judicious use of scripting and quick hands, you can do all this with as little as about 90 seconds of observed downtime. That’s not much longer than it takes for a reboot.
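Steps 3 and 5 are both plain bulk copies. We used dd and netcat, but the same thing can be sketched in a few lines of Python if you’d rather script it. The stream_copy name is made up for this example, and the SHA-256 checksum is an addition of ours for peace of mind rather than part of the procedure above.

```python
import hashlib


def stream_copy(src, dst, block_size=1024 * 1024):
    """Copy `src` to `dst` in fixed-size blocks, dd-style.

    `src` and `dst` are binary file objects: a flat .vmdk file, a socket
    wrapped with makefile(), or a block device opened for writing.
    Returns (bytes_copied, sha256_hexdigest) so that both ends of the
    transfer can compare checksums afterwards.
    """
    digest = hashlib.sha256()
    copied = 0
    while True:
        block = src.read(block_size)
        if not block:  # an empty read means end of stream
            break
        dst.write(block)
        digest.update(block)
        copied += len(block)
    return copied, digest.hexdigest()
```

Run it on both the sending and receiving side and compare the digests before you boot anything from the copy.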

Dealing with Windows

It’s not all kittens and rainbows when it comes to Windows, as mentioned earlier. The massive changes in (virtual) hardware often cause problems when booting after migrating the VPS, so we use a commercial tool called ShadowProtect to inject the necessary drivers into the installation before bringing it to life again.

Sometimes you have a problem with the bootloader... Yep, that's broken!

ShadowProtect is also a very fast way to fix bootloader issues that sometimes crop up during migrations. Once successfully booted, the network interfaces will need to be reconfigured and the system reactivated, thanks to the hardware changes.

Start-to-finish, a Windows system takes about 20-30 minutes to get up and running again, which is quite respectable when you consider that regular Windows Updates can take as long to apply. We also remove the remnants of VMware tools and drivers to keep things tidy.

Wrap up

This was an overwhelmingly successful process that saw us sweep several dozen VMs onto new servers over the course of a couple of weekends. Planning the work and contacting all the affected customers probably took more time than actually doing the hands-on work.

If you’ve run into similar sorts of fun when dealing with VMware we’d love to hear about it. Likewise, if you have any questions just leave a comment.

Getting a lead on Phil

Published May 15th, 2012 by Barney Desmond

You’ve read about some of our fresh sales blood at Anchor, now meet one of our salty dogs, Phil Pace.

Been here since the dawn of time eh?

I started as the first person in a sales role here. I’m now an Account Manager. When I first started it was Keiran (now CEO) and Andrew (co-founder) doing the sales, and pretty much everything else.

Most of the people at Anchor seem to come from a technical background, no matter what they do here. Is that true of you?

Anchor is my first sales role; prior to this I was working as a lead developer for a digital agency in London. Not many companies will give a developer a chance in sales, but a developer background comes in handy in all sorts of ways – when I started working at Anchor, one of the first things I did was write my own application to track my sales leads.

How did you come to be in the UK?

My dad was in IT sales at Microsoft and we moved there when I was 17, after I finished high school.

And then you studied computer science?

No, I’m a self-taught developer. I spent some time with ISPs at first, learning basic server admin and early CMS products. Now I like to work with the LAMP stack when I can.

So why sales?

I wanted to get out of that developer mindset, didn’t want to become introverted with my head full of code all the time. I wanted to challenge myself a bit and learn to do new things.

Your skin is a strange colour, is that pigmentation due to… exposure to the sun?

I like the outdoors. I like to snorkel and fish, at Clovelly or Narrawallee down the coast. I get in a few gym sessions a week, and do a bit of golf, tennis, squash and indoor climbing.

Last movie seen: Puss In Boots 2

Best album ever: AC/DC’s Back In Black

Playing: Metallica, on my black Ibanez R90 3 pickup electric guitar

Posted in FTW

AR Drone flies away!

Published May 11th, 2012 by Barney Desmond

A home has been found for our AR Drone, drawn from entries taken at The Internet Show in Melbourne – congratulations to Ben Crowe from XCOM Media, you’ll be hearing from us shortly.

By which we mean you might hear the buzzing of the drone as it comes to find you… :)

Sachi King: Our woman in America

Published May 10th, 2012 by Barney Desmond

We took a big step forward recently with the appointment of Sachi King, our first full-time support person based in the US. This follows the launch of our Los Angeles point of presence earlier this year and the growth of our US client base, which includes Testflight and Bang the Table.

Our US beachhead is growing fast, so Sachi will be on the ground providing Level 2 support during US business hours. Based in Lawrence, Kansas, she’s an experienced networking and support professional and is an important addition to our team.

What’s your key role at Anchor?

I’m there to keep an eye on the servers overnight Australian time, so we can give better support to our growing base of United States customers. It’s really exciting to be a part of this expansion by Anchor – I’m helping them go global!

How has the training been?

It’s been great. They flew me over here [Sydney] for a month while I go through the full Anchor training, and there’s certainly a lot to take in. I’m hoping to start taking phone calls soon.

What’s your background?

I’ve been a Linux person since I was like, 14, and so networking and technology has always been a passion of mine. Before Anchor I was working for a company that was a turnkey provider for clusters, and I did everything from assembly of parts, to software set-up and support, to actually putting the servers in the rack and wiring them.

I think it’s been a good grounding for working at Anchor.

What do you do in your spare time?

I’m a bit of a motoring enthusiast. I bought a Honda Civic a few months ago and I’m looking forward to playing with it and doing it up.

I’ve also got a motorcycle, a Kawasaki Ninja 500R, which I really enjoy riding on weekends. It’s a sweet bike!

Posted in FTW

Backups for Postgres that don’t suck

Published May 9th, 2012 by Barney Desmond

PostgreSQL is awesome in many, many ways. One of these ways is in its use of a write-ahead log, or WAL, that enables:

  • Atomicity
  • Durability
  • Replication (that isn’t insane, like MySQL’s)
  • Backups
  • And much much more! </televisionSalesman>

We’re interested in backups today, because textual dumps for backups (the lowest common denominator for mysql and pgsql) put a lot of load on the server and tend to be very slow.

We started looking into something like mylvmbackup for postgres, but then discovered that there was no need – it’s already built in!

We’ve made an easy guide to trying it out for yourself. It’s fairly detailed and hands-on, so we’ve split things out to a separate page:

Better PostgreSQL backups with WAL archiving
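As a taste of what’s involved: PostgreSQL hands each completed WAL segment to a command of your choosing (the archive_command setting), and expects a zero exit status only if the segment was safely stored. Here’s a rough sketch of such a helper in Python. The archive_wal name is hypothetical and this isn’t lifted from our guide, so treat it as an illustration of the contract and not as a production script.

```python
import os
import shutil


def archive_wal(wal_path, wal_name, archive_dir):
    """Copy one WAL segment into `archive_dir`.

    `wal_path` and `wal_name` correspond to the %p and %f placeholders
    that PostgreSQL substitutes into archive_command.
    Returns 0 on success and 1 on any failure, mirroring the exit status
    PostgreSQL expects from the command.
    """
    dest = os.path.join(archive_dir, wal_name)
    if os.path.exists(dest):
        return 1  # never overwrite a segment that has already been archived
    tmp = dest + ".tmp"
    try:
        with open(wal_path, "rb") as src, open(tmp, "wb") as out:
            shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())  # make sure the data is on disk before reporting success
        os.rename(tmp, dest)  # the segment only appears under its final name once complete
    except OSError:
        if os.path.exists(tmp):
            os.remove(tmp)
        return 1
    return 0
```

Refusing to overwrite and copying via a temporary name are the two details that matter most: a half-written or clobbered segment is worse than a failed archive attempt, which PostgreSQL will simply retry.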

If you’re a Postgres user, we’d love to know what you think. If you’re still using MySQL, why not make the switch to greener pastures? :D

Posted in FTW

Back from The Internet Show in Melbourne

Published May 8th, 2012 by Barney Desmond

A quick shout to everyone who dropped by to see us last week, thanks for coming along and saying Hi. The Anchor Squad is now home and returns to work in full force.

Keiran presenting his talk on scaling for e-commerce sites

We hope you all had fun. We certainly did! Look out for a post in the near future announcing the lucky winner of our AR Drone quadrotor.

Minh demonstrating one of the quadrotors

Announcement of PHP security vulnerability (CVE-2012-1823)

Published May 7th, 2012 by Barney Desmond

One of our sysadmins picked up the disclosure of this PHP vulnerability last week. It’s kind of important, so we thought we’d share it with you.

Eindbazen PHP-CGI advisory (CVE-2012-1823)

It’s interesting because a default mod_php installation isn’t vulnerable, but deployments using php-cgi are. That technique is fairly common precisely because it’s sane and not a gaping security hole the size of a hallway.

Details are still up in the air. There are a couple of official updates for PHP, but they’ve been shown not to fix the problem. As a result, people have come up with a few mitigation techniques that will dodge this bullet until it’s properly fixed.
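The root of the problem is that a query string with no unencoded "=" in it can end up being passed to php-cgi as command-line arguments, so a request ending in ?-s may dump a script’s source instead of running it. One of the mitigations doing the rounds is a web server rewrite rule that rejects any query string that has no "=" but does contain a dash. Here’s the logic of that check sketched in Python. The function name is ours, and this is an illustration of the idea, not a drop-in fix.

```python
from urllib.parse import unquote


def is_suspicious_query(query_string):
    """Return True if php-cgi might treat this query string as switches.

    Mirrors the commonly circulated rewrite-rule check: the raw query
    string contains no '=' and, once percent-decoded, contains a dash.
    """
    if "=" in query_string:
        return False  # ordinary key=value parameters are not passed as arguments
    return "-" in unquote(query_string)
```

The check is deliberately blunt. It will also reject harmless requests such as ?some-keyword, which is usually an acceptable trade-off until a proper fix lands.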

To clarify our position, Anchor uses php-cgi and immediately took steps to mitigate the threat. We’ve written about our use of php-cgi in the past; have a read if you’re interested in the how-and-why.

If you have any questions about your hosting and what this means for you, feel free to drop us a mail at support@anchor.net.au, or call us on 1300-883-979 (+61-2-8296-5111 from outside Australia).

We have an iPad winner!

Published May 3rd, 2012 by Barney Desmond

Congratulations to James, the winner of our recent customer support feedback survey.

James dropped into our booth at The Internet Show in Melbourne to pick up his prize, it's always great to meet face-to-face.

Many thanks to everyone who responded. We got some great answers, which we’ll be using to decide how to improve our offerings and create fantastic new ones.

We don’t usually post puns…

Published April 30th, 2012 by Barney Desmond

…but when we do, it’s like getting smacked in the face by a tsunami.

No one’s owned up to posting these on the beer fridge. It’s probably for the best, lest they receive a terrible punishment.

Posted in WTF

DHL? Shoulda flown itself here instead

Published April 27th, 2012 by Barney Desmond

Ooh, a package from Amazon! That must be the 100 copies of Unix Network Programming that we ordered to give away to underprivileged children.

It's huge! We also ordered a few CR2032 batteries, but this box is too small to be them.


Aww man, it’s just an AR Drone. :( This must be the one we’re giving away at The Internet Show next week.

Minh tells me this thing can turn on a dime, Macross Zero-style!


That’s pretty cool I guess. As a bonus, Amazon also include nine metres of packing paper. Our Windows sysadmins are going to paint it up into a giant nyancat and string it around the office.

Too short to make a good Longcat :(
