Before we begin, this has taken a long time to plan and I’m excited to share this story, provide all the tech specs and details we geeks love, and blast out pictures and video.
Why the new cage?
Thanks to all our clients over the past 15 years, we have continued to grow and to demand more space, and particularly more power, from our data centers. In Atlanta, we had two older cages from yesteryear that began on 110v power and ran out of power capacity long before they ran out of space. In addition, servers have changed a lot since those cages were first set up. We started with towers back then, but for many years now every new server has been a rackmount system that draws less power while being much more powerful.
The big objective was to get all our clients updated to our latest colo build: 208v power and the specific PDUs and switches that our staff and systems love and have grown accustomed to, but which also require more capacity. So over the years we found ourselves trying to get rid of old machines, towers, and older CPUs in exchange for newer ones.
We currently use two data centers in Atlanta: AtlantaNap, where we first opened our Atlanta presence, and ColoAT. We found ourselves in a unique situation with AtlantaNap. The newest half of the facility received massive injections of upgrades and updates to become a premier health care facility. If you know anything about health care, you know they require all the perks for data. Our data center also needed space in the older area to improve upon its lacking power infrastructure.
So we gave up our older cages in exchange for new ones in the new space and took this chance to get all clients over to the 208v PDUs, HP/Cisco routing and switching, and the promised backend that we began some time ago but that has taken a while to build out at all our locations for all clients. This is our project: set up a new cage with all new gear, fix any existing patchwork along the way, physically move all rackmount systems from our older cages to the new one, and move all clients on tower-based systems to newer rackmount ones in the new cage.
What kind of preparation goes into this?
An insane amount of planning has gone into this, from June 2012 up to and including last-minute changes in Jan 2013. We have to coordinate with all our vendors to have the cage set up on time, power provided on time and distributed as we want, and all our equipment and servers arriving on time and ready to be racked. Then we have to get into a smooth “grind” for moving nearly a thousand servers at this one location from spot A to spot B.
First, what are we moving? We audit all our servers to prepare an accurate accounting of where everything was and where it is now, so that throughout the process techs, clients, and NOCC admins can all find what they need. We build a master list and sort it by VLAN, comparing power draw at 110v in the old cage to power draw at 208v in the new cage, which is actually fed from private PDUs on 3-phase 480v. Anyway, that’s nerd talk for lots of power. Multiple 100kVA PDUs are needed to fill this cage.
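To make the 110v versus 208v comparison concrete, here is a minimal sketch of the arithmetic, not our actual audit tooling. It assumes a simple single-phase load with a power factor of 1, so current is just watts divided by volts. The function name and the sample wattage are made up for illustration.

```python
def amps_at(watts: float, volts: float) -> float:
    """Current drawn by a load, assuming power factor 1 (an idealization)."""
    if volts <= 0:
        raise ValueError("volts must be positive")
    return watts / volts


# Hypothetical server drawing 220 W: same wattage, roughly half the current at 208v.
old_cage_amps = amps_at(220, 110)
new_cage_amps = amps_at(220, 208)
print(f"110v: {old_cage_amps:.2f} A, 208v: {new_cage_amps:.2f} A")
```

The point of the comparison is that the same server pulls roughly half the amps on a 208v circuit, which is why the old 110v cages hit their power limit long before they ran out of rack space.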
Ok, so once we know how many servers we are moving, we determine where they will go and how much power they will draw. Then we build our power plan for reaching peak efficiency. This depends on the mix of servers we are moving: is it a small 1 drive 1u, a mid-size 8 drive 2u, or one of these 16 disk systems? We load each cabinet with the heaviest and most power hungry systems at the bottom, thinning toward the top, until we reach our peak servers per cabinet and power per cabinet. The two are designed to be one and the same, i.e. if you fill our cabinets by copying the way each one looks, purely on the aesthetics of 2u’s and 1u’s, you’ll find you can fill the cabinet fully and reach our ideal range of power consumption.
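The bottom-heavy loading rule can be sketched roughly as below. This is an illustration under stated assumptions, not the planning process we actually used: the `Server` fields, the 42u cabinet height, and the 5000 W budget are all hypothetical, and watts stand in for "heaviest and most power hungry".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    height_u: int   # rack units the chassis occupies
    watts: float    # measured or estimated draw


def plan_cabinet(servers, cabinet_u=42, budget_watts=5000.0):
    """Place the hungriest servers lowest; return (placed, leftover).

    placed is a list of (bottom_u, server) pairs ordered bottom to top.
    cabinet_u and budget_watts are illustrative defaults, not real limits.
    """
    placed, leftover = [], []
    next_u = 1          # lowest free rack unit
    used_watts = 0.0
    # Sort by draw (then size, then name) so the heaviest load sits lowest.
    for s in sorted(servers, key=lambda s: (-s.watts, -s.height_u, s.name)):
        fits_space = next_u + s.height_u - 1 <= cabinet_u
        fits_power = used_watts + s.watts <= budget_watts
        if fits_space and fits_power:
            placed.append((next_u, s))
            next_u += s.height_u
            used_watts += s.watts
        else:
            leftover.append(s)  # goes into the next cabinet's plan
    return placed, leftover


if __name__ == "__main__":
    fleet = [Server("web-1u", 1, 100), Server("store-4u", 4, 600), Server("db-2u", 2, 300)]
    placed, leftover = plan_cabinet(fleet, cabinet_u=6, budget_watts=1000)
    for bottom_u, s in placed:
        print(f"U{bottom_u}: {s.name} ({s.height_u}u, {s.watts} W)")
    print("next cabinet:", [s.name for s in leftover])
```

A real plan would also weigh physical weight, airflow, and VLAN grouping, which this greedy sketch ignores.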
Well, we had better get rid of some of these really old machines in the process, since they will likely be the ones to give us problems anyhow. So we tally up how many potential problems we might have on our hands and order our servers, switches, PDUs, cabinets, colo, and power. The project’s tentative start date is set a few months out. This plan will be revised and scrutinized many times over before the start date.
We’re going to have to safely and cleanly power down each system one at a time, take it out of the rack, cart it over to the new cage, and re-rack and re-wire it there. First, we plan to come in and start prewiring everything during the first week. We know some systems won’t come back up on boot, they just won’t. Some systems haven’t had a reboot in 500, 800, or 1100 days, and we don’t know for sure what will happen with those. We also know, from years of upgrading systems for clients and ourselves and repairing bad hardware, that we may find other anomalies.
Ok, so we’re in Atlanta, we have only just arrived and we are informed we will be behind already. They aren’t done preparing the cage which was due to be handed over Jan 1st. We had just a physical cage.
Day – Zero
They have completed the power runs, the cabling trays, and most of the essentials, minus the network. We begin to bring in our Dell cabinets, Cisco routers, and HP switches, and get to cabling. After a good few days our cage is almost ready, almost.
What other ways does this help the clients?
The expanded power, expanded networking, and expanded backend bring more network capacity. As a result, we were able to complete the phase that migrates failover systems to the Cloud, and semi-dedicateds are being moved now, which will be done in a few weeks. We have managed to get many of our oldest clients over to all new gear and this new setup without downtime.
This also takes us to the last phase of a network project aimed at increasing capacity and reducing risk and downtime: adding more diverse carriers. As part of this, there is increased backend capacity for offloading all backup processes, which, combined with the recent completion of Iderav5, has provided remarkably improved backups.
Which takes us full circle to the last bit of projects, which includes removing the tower systems from our inventory. That has been ongoing for some time and will be ongoing for months to come.