No Food, No Sleep, Yes Network
Jason Davis and I arrived very early on Wednesday, June 6. We caught the 6:10 am flight out of Raleigh/Durham through Dallas and into San Diego. Once at the convention center, we wandered around searching for the network crew. Apparently, the show offices are like the train to Hogwarts. You have to know where to look to find them. We finally located the right floor 2 (yeah, there are two of them) and met up with the crew that had been onsite since Sunday.
Like all good NOC offices, there was a sea of laptops, monitors, and networking equipment strewn around Show Management Office E. Jason and I quickly found a workspace amid the chaos and began to pick up where we left off. While Jason had done some amazing work getting vCenter cleaned up with proper ownership notes, logical cluster and VM assignments, and even a message of the day, he still had to bring up some of the other blades and start to balance out the VMs. Last year we had an incident on Wednesday where a provider flap on the east coast cost us about half of the internet. When that started happening, Jason was the only one in the NOC, and he didn't have access to all of the equipment. This year, he wanted to make sure that all of the key players knew how to access the devices as well as the applications. To facilitate this, he set up a cool open source password management application, TeamPass. This way, we could store all of the infrastructure credentials in a secure location that we could all access. Password management was made even easier because I had set up a pair of Cisco Secure ACS 5.2 servers (VMs, really) to authenticate and authorize access to all network devices. As a nice side effect of this, we could also authenticate all users into applications like LMS and CUCMS.
While Jason got to work on vCenter, TeamPass, and DNS/DHCP (more on that later), I began to augment some of my Netconfig templates as well as introduce a few small baseline compliance templates in LMS. Why? Well, even though I shared my sample access switch config with the team, a few things were missed. First, we needed to enable QoS DSCP trust on certain access ports to which we would connect IP Phones, digital media players, and other video endpoints. I had also staged the devices with a domain name of "sandiego.ciscolive.com" when the final subdomain was "show.ciscolive.com". No problem. Two simple baseline templates did the job, and I scheduled them to check compliance and deploy periodically.
Submode: interface [#GigabitEthernet.*#]
+ switchport access vlan (201|202|203|351|353|353)
+ auto qos dscp trust
+ ip domain-name show.ciscolive.com
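For anyone who wants to try the same idea outside of LMS, here is a rough Python sketch of what a check like this could look like. It is not how LMS evaluates baseline templates. It assumes the intent of the templates above is "any GigabitEthernet access port in one of the listed VLANs should carry the QoS trust command, and the global config should carry the new domain name", and it assumes an IOS-style running config with one-space indentation under each interface. The function names are made up for the example.

```python
import re

# Values copied from the baseline templates above.
ACCESS_VLANS = {"201", "202", "203", "351", "353"}
REQUIRED_PORT_CMD = "auto qos dscp trust"
REQUIRED_GLOBAL_CMD = "ip domain-name show.ciscolive.com"

VLAN_RE = re.compile(r"switchport access vlan (\d+)")


def split_interfaces(config):
    """Split a config into global lines and per-interface sub-mode lines."""
    global_lines, interfaces, current = [], {}, None
    for raw in config.splitlines():
        line = raw.rstrip()
        if not line.strip() or line.strip() == "!":
            continue  # skip blank lines and IOS "!" separators
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
            interfaces[current] = []
        elif line[0].isspace() and current is not None:
            interfaces[current].append(line.strip())
        else:
            current = None
            global_lines.append(line.strip())
    return global_lines, interfaces


def find_violations(config):
    """Return (scope, missing_command) pairs for everything out of compliance."""
    global_lines, interfaces = split_interfaces(config)
    violations = []
    if REQUIRED_GLOBAL_CMD not in global_lines:
        violations.append(("global", REQUIRED_GLOBAL_CMD))
    for name, lines in interfaces.items():
        if not name.startswith("GigabitEthernet"):
            continue
        vlans = [m.group(1) for m in map(VLAN_RE.fullmatch, lines) if m]
        in_scope = any(v in ACCESS_VLANS for v in vlans)
        if in_scope and REQUIRED_PORT_CMD not in lines:
            violations.append((name, REQUIRED_PORT_CMD))
    return violations
```

Feeding it the text of a running config returns a list of the ports (and the global scope, if needed) that are missing a required command, which is roughly the report a compliance job hands back before deploying the fix.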
No sooner had we gotten into the swing of things than it was time for the team kick-off meeting. We all met outside on the convention center terrace. It was the last time some of us would see the afternoon sun. We introduced ourselves, and we laid out the plans for getting the network up and running. Some good news was that most of the IDF switches were in place, we had the core set up with working internet connectivity, and we had a number of helpers to run around to connect cables and fiber. Some local Network Academy students were once again going to help with the network. These guys would prove themselves invaluable. Some bad news was that apparently the Marriott had sold some of our rooms to another event. We wouldn't have access to all of the Marriott floors and rooms until Saturday. Ugh. But our primary focus was going to be the convention center.
Our go-live date was Saturday. We had to have enough of the network working by then in order to allow the first round of the onsite CCIE labs to happen. So why are we still meeting?! Let's get back to work.
As the Network Academy students and other runners worked their way through the convention center, they would contact Jason and me back in the NOC office. The big thing we needed to do was configure switch ports with the right profile for whatever was being connected to them. To facilitate this, I used the cool new Template Center introduced in LMS 4.0. This feature was made much more useful in LMS 4.2 with the Template Editor. Using this editor, I created a number of templates that would deploy the correct port configuration based on a few parameters supplied by the user executing the template. See the screenshots below. I've also attached my access port profile template here for those who want to play a bit more with LMS's Template Center.
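To give a feel for the "few parameters in, port config out" idea without LMS in front of you, here is a small Python sketch. It is not the attached template, and it is not Template Center syntax. The profile names and the profile-to-VLAN mapping are invented for illustration; only the VLAN numbers and the QoS trust line are borrowed from the baseline template earlier in this post.

```python
from string import Template

# Skeleton of an access port config; $placeholders are filled in per request.
PORT_TEMPLATE = Template(
    "interface $interface\n"
    " description $description\n"
    " switchport mode access\n"
    " switchport access vlan $vlan\n"
    " spanning-tree portfast\n"
)

# Hypothetical profiles: which VLAN a device type lands in and whether the
# port should trust DSCP markings from the attached endpoint.
PROFILES = {
    "ip_phone": {"vlan": 201, "trust_dscp": True},
    "media_player": {"vlan": 351, "trust_dscp": True},
    "kiosk": {"vlan": 203, "trust_dscp": False},
}


def render_port(interface, profile, description):
    """Render the config for one port from the parameters a runner calls in."""
    settings = PROFILES[profile]
    text = PORT_TEMPLATE.substitute(
        interface=interface, description=description, vlan=settings["vlan"]
    )
    if settings["trust_dscp"]:
        text += " auto qos dscp trust\n"
    return text


if __name__ == "__main__":
    print(render_port("GigabitEthernet1/0/5", "ip_phone", "NOC help desk phone"))
```

The point of keeping the profile table separate from the template text is that the person running it only has to answer "which port, what kind of device", which is about all a runner on a radio can tell you.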
The first day was really less stressful than the first day last year. All the pre-staging really paid off. In fact, I was able to eat both lunch and dinner on Wednesday. Plus, I was able to leave at a reasonable time so I could get some rest.
Day 1 Summary:
Day 2 was some more of the same. I arrived around 7:45 am and began to sort through the list of newly available switches. That's a big problem with a network like this. While everything was pre-staged, and all of the available devices were already "known" to LMS, the network in San Diego was not 100% live yet. This meant I had to do a lot of discoveries to continually see what was reachable in the network. I will say one of the great things with LMS 4.2 that made this a breeze was the enhanced discovery capabilities. I was using a credential set to be sure. All of the devices in the show used the same SSH and SNMPv3 credentials. Since many of the devices had some pretty wild hostnames (MM_IDF_for_3rd_floor_Mtg_room, anyone?), the ability to automatically adjust the LMS display name based on sysName was tremendously helpful. My global discovery settings looked like:
There was still one more piece of the puzzle to take care of, though: DNS. With these exotic hostnames, it would be nice if we could access the devices via direct SSH without needing to know IPs, as well as see hostnames in LMS Fault Management. This meant adding the hostnames into Cisco Prime Network Registrar. Sadly, there isn't a GUI way to import a large number of hosts in bulk. However, CPNR comes with a really nice command-line utility called nrcmd (the two CPNR guys there called it "nerk-med", but n-r-c-m-d was fine for me). I exported the DCR device list to a CSV file, then used a simple Perl script to pull out the display name and management IP to create an nrcmd command file to add all of the host entries in one shot. This worked perfectly! To get Fault Management going in LMS, I simply disabled Fault Management to purge the existing devices, and then re-enabled it to bring them all back in by hostname.
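The original Perl script isn't reproduced here, but the same transformation can be sketched in a few lines of Python. Treat the details as assumptions: the CSV column names (display_name, management_ip) are placeholders for whatever your DCR export actually uses, the zone name in the example is just one of the show subdomains picked for illustration, and the `zone <name> addHost <host> <address>` form is nrcmd syntax as I remember it, so check it against the CPNR CLI reference for your version before feeding the file to nrcmd.

```python
import csv
import io
import ipaddress


def nrcmd_add_hosts(csv_text, zone, name_col="display_name", ip_col="management_ip"):
    """Turn a device CSV export into one nrcmd addHost command per device."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get(name_col) or "").strip()
        ip = (row.get(ip_col) or "").strip()
        if not name or not ip:
            continue  # skip devices with no display name or management IP
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            continue  # skip rows whose address field is not a valid IP
        commands.append(f"zone {zone} addHost {name} {ip}")
    return commands


if __name__ == "__main__":
    # Made-up sample data in the assumed export layout.
    sample = (
        "display_name,management_ip\n"
        "MM_IDF_for_3rd_floor_Mtg_room,10.0.0.11\n"
        "core-sw1,10.0.0.1\n"
    )
    for command in nrcmd_add_hosts(sample, "noc.ciscolive.com."):
        print(command)
```

Writing the output to a file gives you a single command file to run in one shot, which is the whole appeal over clicking through a GUI a few hundred times.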
And speaking of DNS, Jason and the CPNR guys were working like crazy to get all eight(!) instances of CPNR up and running. This was a full leading practice deployment of CPNR to do a number of tasks. We had a redundant pair of authoritative servers to serve out our subdomains show.ciscolive.com, noc.ciscolive.com, den.ciscolive.com, and sun.ciscolive.com to the internet. Two servers were caching servers that pointed inward to the show network to serve our attendees. Two were a redundant pair of DHCP servers. One was a regional MoM server, and one would be used to handle the IPv6-only SSID that was doing NAT64. That's right. This year we were 100% IPv6 enabled. Not only were we doing SLAAC on all of the wired and wireless user VLANs, but we had a dedicated SSID for IPv6-only users that did NAT64 to the internet. The good news is that CPNR 8.1 offers a VMware OVA appliance deployment method, so spinning up these instances was a piece of cake. Adding all of the host records, scopes, and policies was a different story. Jason, Neal, and Peter (the CPNR guys) did a great job there (despite Neal and Peter being Celtics fans).
With LMS humming along, I turned my attention to Unified Operations Manager and CUCMS. That was fairly easy to get going. The show had one CUCM cluster with two CUBE routers. I added those to CUCMS, and I was able to look at some basic health and call statistics. We didn't have a lot of phones in the show, but all of them we did have were 9971 video phones. It was kind of freaky to have people call the NOC help desk and then have their face pop up. Who knows, maybe all TAC calls will be that way at some point...
The next thing I knew, I looked up and it was 11:00 pm. Ugh. Well, at least I had some lunch.
Day 2 Summary:
Day 3 was the day right before the CCIE exams. We were in scramble mode trying to get all of the switches and APs in place to support the lab needs. We had our heads down most of the day troubleshooting fiber patches, testing switch connections, and moving ports around. Because customers were coming onsite, we had to make sure some of the registration infrastructure was ready, too. This was another late night. I remember right before I was going to leave, one of the CCIE lab guys came into the NOC office with a switch they needed provisioned for the morning. I could have left it and done it in the AM, but I figured it would be best to sort things out at night in case something went wrong. The bad news was that after checking everything to make sure it worked, it was around 11:00 pm again (and no lunch this time). What's more, I had to move hotels, and I hadn't checked in to my new hotel. I walked over to the Hilton Bayfront only to find that room service stopped at 10:00. At least we had some junk food in the NOC.
Day 3 Summary:
Sleep: A Little
Day 4 was Saturday, the first day of the CCIE labs. We tried to keep things in that area of the SDCC rather static (as you can imagine). So Jason and I worked on more switch provisioning as well as deploying some strategic IPSLA shadow routers. As I mentioned, we lost half of the internet last year (not our fault), and this year we wanted more heads-up proactive monitoring so we would know if something like that happened again. We brought three 2811 routers with us to use as IPSLA shadow routers. We had them deployed near the core in the SDCC as well as one in the Hilton MDF and one in the Marriott MDF. We set up collectors on each router to look at HTTP latency between the show and sites such as www.cisco.com, www.apple.com, www.adobe.com, ciscolive365.com, and discovery.com (the MythBusters were in town after all!). We also added another VPN UDP jitter collector for monitoring the CCIE lab area. Using EEM, Jason and I would be alerted if there were any problems with these collectors.
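If you want to stamp out a batch of HTTP collectors like these, a short Python generator saves a lot of typing. This is a sketch, not the config we actually ran: the operation IDs and the 60-second frequency are arbitrary choices, the http:// scheme on each URL is assumed, and the commands use the newer `ip sla` form (older IOS releases use `ip sla monitor` with a different sub-command layout). The EEM alerting piece is not shown.

```python
def http_collectors(urls, first_id=10, frequency=60):
    """Build IOS config lines for one IP SLA HTTP GET operation per URL."""
    lines = []
    for offset, url in enumerate(urls):
        op_id = first_id + offset
        lines += [
            f"ip sla {op_id}",
            f" http get {url}",
            f" frequency {frequency}",
            # Start probing immediately and keep the operation running.
            f"ip sla schedule {op_id} life forever start-time now",
        ]
    return lines


if __name__ == "__main__":
    sites = [
        "http://www.cisco.com",
        "http://www.apple.com",
        "http://www.adobe.com",
    ]
    print("\n".join(http_collectors(sites)))
```

Keeping the operation IDs sequential from a known starting number makes it easier to match a misbehaving collector back to its target site when an alert fires.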
The good news is that Day 4 ended on a high note. The CCIE labs went off without a hitch. The bad news is that this was another late night roaming the halls making sure we had all of the digital media players on the right VLAN, and making sure that all of the internet kiosks and registration would be ready for the big opening on Sunday. It all paid off. The network was up. We had our full 2 Gbps in place from CenturyLink, and all four firewall contexts were protecting us from the internet and our data center from the show attendees. It seemed we were ready for show time...