A virtual hit for MLB Advanced Media
- 03 January, 2008 07:54
December is a relatively slow time of year at MLB Advanced Media, the company that brings you the official Major League Baseball Web sites. From pitch-by-pitch accounts of games to streaming audio and video -- plus news, schedules, statistics and more -- it has baseball covered. Doing so requires serious horsepower, so much so that the company's Manhattan data center is pretty much tapped out in terms of space and power, according to Ryan Nelson, director of operations for the firm. Strategic use of virtualization technology nevertheless enabled him to forge ahead with implementing new products during the 2007 season, and it promises to smooth a shift to a new data center in Chicago in time for the 2008 season.
How long have you been using virtualization technology?
It's all pretty new. We are a homogeneous Sun shop, so we're not really touching a lot of the VMwares of the world. One of the big features of Solaris 10 is Solaris Containers and Zones. We started using Solaris Zones in the last year to actually split off server environments, development environments and [quality assurance] environments.
During the 2007 season we got hit with a big new challenge we didn't find out about until the All-Star break, which was to add a chat product. There was pressure to get it lit up before September so fans could chat about the playoff races and use it during the playoffs. But it was a big, ambitious project and I didn't have any rack space or spare power and [there was] no time to order new machines. So, we worked with a company called Joyent in California that provides hosting using virtual zones and virtual storage.
We said to Joyent, 'We need 30 machines; 10 in a development cluster and two more gangs of 10 as big chat clusters.' And so the MLB chat client was basically turned up in a couple of days vs. a month or two that it would have taken us to get somebody to ship and install all these machines. And then we developed like crazy for about a month, tested for another three weeks, then launched it.
At launch time we asked for another 16G bytes of RAM in each server. It scaled very well. When the playoffs and World Series came around, we ordered up 15 more machines and got twice as much memory and processors installed on them, as well as on the ones we already had. Joyent dials all this up and down. As soon as the World Series is over, we call and say, 'Thanks, that was great. Let's scale down to a skeleton crew of these machines.' So, when I have a need for it, we pay for the utilization. When we don't, we don't. We can turn it up and down as we need to.
We can respond to new projects really quickly, and it also lets us try out new products. If our chat product had been a huge failure, we could've turned the whole thing off and it wouldn't have been a big deal. It makes it easy to try new things. We don't have to sign a contract, get approvals and all that.
We can also respond to the seasonal load changes. And we can also respond to differences in the season that we know are coming. In April, we're focusing on registering new users and selling new products. On draft day, I might need to really beef up my stat resources because people are querying our minor-league stats engine to see who this guy is they just drafted. In the middle of July I may need an additional 10 machines to be generating the CAPTCHA images and processing All-Star balloting. All-Star balloting is about four days of crazy database load, and then it goes back to nothing.
Give us a sense of the MLB.com infrastructure.
In terms of Web servers, we have roughly 100 at our New York data center, and we have a second data center in Chicago that is just about to go online that has 130 servers. So, by the time we get cooking on the 2008 season, we'll have in production about 180 of those.
So you're just wrapping up the new center?
We've had it for about a year, but it's been in build-out phase. Part of the reason we're interested in virtualization is because of the power, space and data-center-capacity pain -- we've certainly felt that. We were actually in a facility in Chicago and outgrew it before we got in production, and so moved to another facility from the same company. We knew we would need more floor space and more power. We're finishing it this off-season. Once Chicago comes online, we're going to take much of the New York data center offline and rebuild it.
I can't resist -- so this is a rebuilding year?
Right. We'll upgrade servers to Solaris 10, upgrade our [storage-area network] infrastructure and replace some older hardware with newer, thinner models that use less power and generate less heat. That data center is in Manhattan, where the cost per square foot is just ridiculous. So, driving up utilization and squeezing everything you can out of every last square inch of rack space is important to us.
We'll move all the services we have running in New York to our data center in Chicago. Migration is one of the features of virtualization in general, and of Solaris Zones specifically. You can do things like clone a zone or migrate a zone. We can move a virtual machine from rack to rack around a single data center, and actually move these services to a virtual machine in a different city.
Also, in addition to seasonal traffic shifts, our load characteristics change drastically during the day. If I have 10 games starting at 7 p.m., there's a huge influx of traffic right at 7 p.m. If we have a bunch of day games, people use their high-speed Internet connections at work, reloading the scoreboard page a lot or watching our Flash Gameday product, which has [pitch-by-pitch updates], or watching the streaming video online. So the ability to slide computing resources around is pretty handy for us.
How else are you using virtualization?
All the services in our new data center will be put into containers, to get the manageability and security benefits -- if there's a security issue, all they've broken into is one virtual machine. Even if a machine has just one service running on it, say one Web server, that's running in a virtualized container. Should the day come when I need to move that service to another piece of hardware, I can just move the virtualized container. My pain point is really low.
It also lets us accommodate developers who are in a pinch because our season starts this year on March 25 -- the [2007 World Series Champion Boston] Red Sox are opening in Japan against Oakland. That day is hard and fast. Previously, as a security guy, it was my job to say no to developers who wanted to log into a production machine and look at something because they were trying to debug a problem. Especially in the age of [Payment Card Industry] compliance and all that, we need to secure operational access to production machines. But now I can snap off an exact copy of the production machine and hand that to the developer, or I can give him access to a different Solaris Zone running on the same machine. So it lets us draw interesting security lines.
What were the biggest challenges when you were implementing virtualization technology initially?
For every application we run, you end up with some assumptions, such as it will always use this IP address or this much memory. We need to make sure these assumptions are kept to a minimum or at least abstracted out into a different layer or into config files that can then be transformed as part of the virtualized-host boot scripts.
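The interview doesn't show MLB Advanced Media's actual boot scripts. As a minimal sketch of the idea, assuming a hypothetical template and a hypothetical per-zone settings table, the host-specific values (IP address, memory limit) live outside the application and are filled into its config file when a virtual host starts:

```python
from string import Template

# Hypothetical application config: host-specific values are placeholders
# instead of being hard-coded into the application itself.
APP_CONFIG_TEMPLATE = Template(
    "listen_address = $ip_address\n"
    "listen_port = $port\n"
    "max_heap_mb = $max_heap_mb\n"
)

# Hypothetical per-zone settings a boot script might look up at start-up.
# Names and addresses are illustrative only.
ZONE_SETTINGS = {
    "chat-web-01": {"ip_address": "10.0.1.21", "port": 8080, "max_heap_mb": 2048},
    "chat-web-02": {"ip_address": "10.0.1.22", "port": 8080, "max_heap_mb": 2048},
}


def render_config(zone_name):
    """Fill the template for one zone; raises KeyError for an unknown zone
    or a missing placeholder value."""
    return APP_CONFIG_TEMPLATE.substitute(ZONE_SETTINGS[zone_name])


if __name__ == "__main__":
    print(render_config("chat-web-01"))
```

Because the zone name is the only input, moving or cloning a virtual host to new hardware means editing one settings entry rather than touching the application.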
Wrapping our heads around this extra layer of abstraction from an administration perspective is a challenge. If I've got 100 hosts, that's an administration challenge already. If each of those hosts has one or two or three virtual hosts running inside of them, I need to keep track of those as well. And they move around a lot, so you need to be very careful. It seems like we've had to buy three times the number of whiteboards we use just to keep track of all this stuff.
Right now we're doing most of the management by hand with scripts that we've written ourselves because we've only got, not a toe but maybe most of a foot into the virtualization pool. But we need to get a handle on it before it gets out of control. We're quickly going to outgrow the point where we can manage an army of virtual machines like we can manage a smaller army of hardware because we're doubling our data center capacity on real physical hardware in a couple of months.
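The in-house scripts aren't described in the interview. The following is a minimal sketch, using a hypothetical class and host names, of the bookkeeping those whiteboards stand in for: recording which physical server each virtual host runs on, logging moves, and listing virtual hosts by server.

```python
from collections import defaultdict


class VirtualHostInventory:
    """Hypothetical record of which physical server each virtual host runs on."""

    def __init__(self):
        self._placement = {}  # virtual host name -> physical server name

    def place(self, vhost, server):
        """Record a new virtual host; refuse duplicates so nothing is double-booked."""
        if vhost in self._placement:
            raise ValueError(f"{vhost} is already on {self._placement[vhost]}")
        self._placement[vhost] = server

    def move(self, vhost, new_server):
        """Record a migration and return the server the virtual host left."""
        old_server = self._placement[vhost]  # KeyError if vhost is unknown
        self._placement[vhost] = new_server
        return old_server

    def server_of(self, vhost):
        return self._placement[vhost]

    def report(self):
        """Group virtual hosts by physical server, sorted for stable output."""
        by_server = defaultdict(list)
        for vhost, server in self._placement.items():
            by_server[server].append(vhost)
        return {s: sorted(v) for s, v in sorted(by_server.items())}


if __name__ == "__main__":
    inv = VirtualHostInventory()
    inv.place("web-zone-01", "nyc-rack3-host07")
    inv.place("qa-zone-01", "nyc-rack3-host07")
    inv.move("web-zone-01", "chi-rack1-host02")
    print(inv.report())
```

A real deployment would keep this state in a shared store rather than in memory, but even this much replaces "who moved what" on a whiteboard with a single source of truth.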
In the off-season we also have regular employee turnover, and it's interesting trying to hire people who have virtualization experience, especially big-enterprise virtualization experience. You can't really go out and say, 'I need to hire three guys who have been using iSCSI and Solaris Zones for large-scale Web infrastructure' because they're just not out there. So, we're learning on our own, basically, and we're working with Sun Professional Services quite a bit. I can imagine if this had happened five years ago, the Zones feature in Solaris would have been an extra license. Now it's all free, and it's really cool, but where they really want to make their money is helping us on the services side.
What other applications do you see for the technology?
We're tasked with transcoding a huge library of archived ball games. I can see where we would take a rack of machines that are used during the season to serve up files and reconfigure them to run a virtual instance of Windows to become a Windows Media encoder. We can take those servers and say, 'Today you're going to be 20 Windows machines,' and throw batch jobs at them and have them transcode stuff as fast as possible. The Sun server has an Intel chip inside and can be a Windows machine when it needs to be. And if you have a good management console, you can just say, 'Install Windows on these 30 machines or boot Windows on these 30 machines.' That's pretty interesting. Virtualization lets us slosh resources around seasonally.
Sun also just announced xVM based on Xen. So Sun's got Solaris Zones, which is kind of a virtualized user environment -- one kernel with a bunch of virtual computing environments underneath it -- and then there's the Xen piece, which is actually booting multiple kernels on big-enterprise hardware. That's in partnership with Microsoft, so it supports things like Windows. I would imagine that that's the technology we would end up using to do projects like I just described.
Have you found any sorts of applications that do not lend themselves well to virtualization?
We haven't even considered running our database stuff on a virtualized host. For all of our databases, we really need high-performance storage and lots of dedicated hardware. That database includes our Major League Baseball stats, fantasy-team data, all the newsletters customers subscribe to, and what subscription audio products they've purchased, and so on. With virtualization, you do add a lot of extra abstraction. The big challenge for people who are inventing these new virtualization technologies is to make the overhead as low as possible, but it's still there. For really high-performance computing, if you need one big monolithic machine, virtualization doesn't help.
Have you been able to determine your ROI on these virtualization efforts?
Not really, but I know it's very good. It's nice when someone comes up with an ambitious new project and my default answer isn't 'no.' It used to be, 'You'd like to give a free taco to everybody in the country? That's going to take X number of servers. And you need them up by Friday? I just can't do it.' Now I can say, 'Yes, you can do that and here's what it will cost. And if you have a big surge in traffic, I can double the number of your servers and it's going to cost this much.' And if they're going to make three times that much on the product, they'll say, 'That's fine.' So it lets us get to yes very easily. And the time from a decision to delivery is very fast.
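The interview gives no prices, so the rate below is purely hypothetical. As a minimal sketch of the pay-per-use arithmetic Nelson describes, cost scales with servers times hours, a traffic surge multiplies the server count, and the go/no-go check compares expected revenue against that cost:

```python
def hosting_cost(servers, hours, rate_per_server_hour):
    """Pay-per-use: pay only for the servers turned up, for as long as they run."""
    return servers * hours * rate_per_server_hour


def surge_cost(servers, hours, rate_per_server_hour, factor=2):
    """Cost if a traffic surge forces the server count to be multiplied by `factor`."""
    return hosting_cost(servers * factor, hours, rate_per_server_hour)


def worth_it(expected_revenue, cost):
    """Go/no-go: proceed if the product is expected to earn at least what it costs."""
    return expected_revenue >= cost


if __name__ == "__main__":
    RATE = 0.50  # hypothetical dollars per server-hour, not a real hosting price
    hours = 30 * 24  # a 30-day month
    base = hosting_cost(30, hours, RATE)   # 30 servers, like the chat launch
    surge = surge_cost(30, hours, RATE)    # doubled to 60 servers
    print(base, surge, worth_it(3 * surge, surge))  # 10800.0 21600.0 True
```

The point is less the numbers than the shape of the answer: instead of "no," operations can quote a cost for the base case and for the surge case up front.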
What have been the most pleasant surprises about virtualization?
I'd say it's not as hard as we once thought. If you think back to the days of mainframes, you actually had to write [code] for a compute grid or to spread your application around. When the developers use their instances of applications or of servers, they don't necessarily know that they're even running on virtual machines. They just ask for access to a machine to test something and we give them logon information and ask if they need root access on the box, which blows their minds sometimes. But once you're in a virtualized environment, it's very familiar to people. It's more administration work on the outside, but we don't have to train people much to use the resources that are presented to them in a virtual way.
Any big disappointments with the technology?
Not yet. But we're just getting into it.