Each year, one of the things I enjoy most is being able to get back together with an unbelievable team to help build and run the network for CiscoLive. Lately, I have been fortunate enough to get this opportunity in both Europe and the US. While in Europe I mainly help on the network management front, in the US I get to flex my CCIE and run the L2 switch network.
Last year in Orlando, I went in with a strategy to use Cisco Prime LMS to provision switches (mostly Catalyst 3560CG and 3750-X switches) after they had [manually] received a small manageability configlet. Had we done all of the switches before the show in staging, this would have worked out fine. However -- and I really should have been prepared for this after so many years -- nothing works out as planned, especially at CiscoLive. We ended up needing to provision a lot of switches onsite. The time it took to take a switch from "zero to hero" was on the order of 30 minutes. Needless to say, our project manager was not happy. I remember the thousand-yard stare she gave me the Monday the show started. I knew I had to come up with something better...
I had heard that our operating system technology group was working on a new zero-touch provisioning solution called Plug and Play (PnP). There was a lot of promise in this as it would be very extensible. During our end-of-show NOC session in Orlando, I told the audience I would look into this for next year (if they invited me back). Unfortunately, when it came time to start prepping for CLUS San Francisco, PnP wasn't quite ready for our switches. On top of that, even if it was ready, we needed a solution that would work on our current switch inventory that could be running code anywhere from 12.2(55)EX to 15.0(2)SE5. So what to do?
A few years ago, Jason Pfeifer and I presented a Cisco Developer Network talk about using Embedded Event Manager (EEM) to build a customized zero-touch provisioning solution. In fact, this solution is built on top of the same technology that PnP will use. You could view it as a precursor to PnP. The advantages of this approach are that it will work with very old Cisco code (anything that supports the Cisco auto-install feature) and that it is extremely flexible.
In a nutshell, this EEM solution uses DHCP to allocate an IP address to a new switch (i.e., one that has been write erased). The DHCP offer includes DHCP options 150 and 67 to specify a TFTP server and initial config file name, respectively. Okay, that's stock auto-install. Where EEM comes into play is in that initial config file. This file simply includes an EEM applet. The purpose of that applet is to download and register an EEM Tcl policy that takes care of the majority of the bootstrapping work. This work might involve, for example, communicating with a web server to pass information about the device to a backend database in order to determine the proper configuration and software to load. The overall ladder diagram is as follows.
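To make the DHCP part of that exchange concrete, here is a minimal Python sketch of how the two options look on the wire. Each DHCP option is a code byte, a length byte, and a payload; option 67 carries the bootfile name as text, and option 150 carries the TFTP server address. The function names, server address, and file name are hypothetical and chosen only for illustration. This is not how Cisco Prime Network Registrar was configured for the show.

```python
import ipaddress


def encode_option(code: int, payload: bytes) -> bytes:
    """Encode one DHCP option as: code byte, length byte, payload."""
    if not 0 < code < 255:
        raise ValueError("option code must be in 1..254")
    if len(payload) > 255:
        raise ValueError("payload too long for a single option")
    return bytes([code, len(payload)]) + payload


def autoinstall_options(tftp_server: str, bootfile: str) -> bytes:
    """Build option 67 (bootfile name) followed by option 150 (TFTP server)."""
    opt67 = encode_option(67, bootfile.encode("ascii"))
    opt150 = encode_option(150, ipaddress.IPv4Address(tftp_server).packed)
    return opt67 + opt150


# Hypothetical values, for illustration only.
blob = autoinstall_options("10.0.0.5", "ztp-init.cfg")
print(blob.hex())
```

A switch doing auto-install reads these two values, fetches the named file from the TFTP server, and applies it as its initial config, which is where the EEM applet takes over.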
Knowing that this kind of flexible, programmatic way of doing a zero-touch deployment existed, I had the start of a plan on how to tackle Cisco Live San Francisco. I worked it out with the events networking team to come out to San Jose for a week ahead of the show to build out the solution and, of course, test it. The goal was to get all of the switches fully bootstrapped and ready to go...but we all know how goals go, especially when it comes to CiscoLive.
Upon arriving in San Jose, I immediately got to work building a FreeBSD server that would host the initial config, EEM Tcl policy, and the web service part of the solution. We configured Cisco Prime Network Registrar so that the device management DHCP scope included the proper values for options 67 and 150. The initial config was very simple. It just handled the copy of the tm_sw_autoconf.tcl EEM Tcl policy. This Tcl policy kicked off one second after being registered and did the following things:
1. Collect Product ID (PID)
2. Collect Serial Number
3. Collect IOS version
4. Send those three values to the webservice running on the FreeBSD server
5. Process the output
6. If a new image was specified, load the new image
7. If a config file was specified, load the config into startup
8. Adjust the SDM Profile if needed
9. Adjust the boot variable if needed
10. Perform a config mem
11. Reload the switch
The two most interesting steps here are 7 and 10. The config file is loaded from the FreeBSD TFTP server into startup instead of running because we don't know what version of code the switch is currently running (well, we do, but it may not be the code we want). Some of the configuration parameters may not work. By loading into startup, we avoid the merge. In step 10, the script performs a config mem. I found out via testing that this was required in order to load the VLAN database properly.
(Just an FYI, assuming a microcode upgrade was not needed, this process took around 9.5 minutes to take a 3750-X from "zero to hero." If you know anything about how long it takes these switches to boot, that's not a bad amount of time.)
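The decision logic in the bootstrap steps listed above can be sketched in a few lines. The real policy is an EEM Tcl script; the Python below is only an illustration of the ordering and the "if needed" checks. The `Provisioning` class, its field names, and the step strings are hypothetical stand-ins for whatever the webservice actually returned.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Provisioning:
    """Hypothetical shape of the webservice reply for one switch."""
    image: Optional[str] = None        # image to run, if an upgrade is wanted
    config: Optional[str] = None       # config file to copy into startup
    sdm_profile: Optional[str] = None  # desired SDM profile, if any


def plan_bootstrap(reply: Provisioning, current_image: str,
                   current_sdm: str) -> List[str]:
    """Return the ordered actions for one switch, skipping no-op steps."""
    steps: List[str] = []
    needs_image = reply.image is not None and reply.image != current_image
    if needs_image:
        steps.append(f"load image {reply.image}")
    if reply.config is not None:
        # Loaded into startup, not running, so nothing is merged.
        steps.append(f"copy {reply.config} to startup-config")
    if reply.sdm_profile is not None and reply.sdm_profile != current_sdm:
        steps.append(f"set SDM profile {reply.sdm_profile}")
    if needs_image:
        steps.append(f"set boot variable to {reply.image}")
    steps.append("config mem")
    steps.append("reload")
    return steps


# Hypothetical reply and device state, for illustration only.
reply = Provisioning(image="new.bin", config="sw1.cfg")
for step in plan_bootstrap(reply, current_image="old.bin", current_sdm="default"):
    print(step)
```

Returning a plan instead of executing each step directly makes the ordering easy to check before anything touches a switch.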
For the webservice backend, I created a MySQL database that mapped physical switches (based on serial number and PID) to logical switch "roles." A role in this case is simply a hostname and IP address. The idea was that a switch was named based on where it was going to be deployed within the show. When a switch phoned home for its config, the serial number and PID would be looked up, and if a mapping existed, the switch's config would be built and sent to the EEM Tcl policy.
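The physical-to-logical mapping described above might look something like the following sketch. It uses Python's built-in `sqlite3` in place of MySQL so that it runs on its own. The table names, column names, and sample rows are assumptions for illustration, not the actual schema.

```python
import sqlite3
from typing import Optional, Tuple


def open_inventory() -> sqlite3.Connection:
    """Create an in-memory inventory with hypothetical tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE roles (
            hostname TEXT PRIMARY KEY,
            mgmt_ip  TEXT NOT NULL
        );
        CREATE TABLE switches (
            serial   TEXT NOT NULL,
            pid      TEXT NOT NULL,
            hostname TEXT REFERENCES roles(hostname),
            PRIMARY KEY (serial, pid)
        );
    """)
    return db


def lookup_role(db: sqlite3.Connection, serial: str,
                pid: str) -> Optional[Tuple[str, str]]:
    """Return (hostname, mgmt_ip) for a mapped switch, or None if unmapped."""
    return db.execute(
        "SELECT r.hostname, r.mgmt_ip FROM switches s "
        "JOIN roles r ON r.hostname = s.hostname "
        "WHERE s.serial = ? AND s.pid = ?",
        (serial, pid),
    ).fetchone()


# Hypothetical inventory rows, for illustration only.
db = open_inventory()
db.execute("INSERT INTO roles VALUES ('sw-hall-a-01', '10.10.0.11')")
db.execute("INSERT INTO switches VALUES ('FOC0000X001', 'WS-C3750X-48P', 'sw-hall-a-01')")
db.execute("INSERT INTO switches VALUES ('FOC0000X002', 'WS-C3560CG-8PC-S', NULL)")
print(lookup_role(db, "FOC0000X001", "WS-C3750X-48P"))
print(lookup_role(db, "FOC0000X002", "WS-C3560CG-8PC-S"))
```

Keeping roles separate from physical switches means a spare can take over a role by changing one row, without touching the role's hostname or IP address.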
The backend code itself was written in PHP. It took the inputs from the EEM Tcl policy and a template config file, then applied substitutions to render a final config for each device. Things like hostname, management IP, management VLAN, VLANs, port profiles, etc. were dynamic based on the device and where it would be located within the show.
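The template-and-substitute idea can be illustrated with Python's standard `string.Template`. The actual backend was PHP, and the template below is a made-up fragment, not one of the show's templates. The placeholder names and values are assumptions.

```python
from string import Template
from typing import Mapping

# Hypothetical template fragment; the real templates covered far more
# (VLANs, port profiles, and so on).
CONFIG_TEMPLATE = Template("""\
hostname $hostname
!
vlan $mgmt_vlan
 name MGMT
!
interface Vlan$mgmt_vlan
 ip address $mgmt_ip $mgmt_mask
 no shutdown
!
end
""")


def render_config(values: Mapping[str, str]) -> str:
    """Fill in the template; a missing placeholder raises KeyError."""
    return CONFIG_TEMPLATE.substitute(values)


# Hypothetical per-device values, for illustration only.
print(render_config({
    "hostname": "sw-hall-a-01",
    "mgmt_vlan": "100",
    "mgmt_ip": "10.10.0.11",
    "mgmt_mask": "255.255.255.0",
}))
```

Using `substitute` rather than `safe_substitute` is deliberate here: a config with an unfilled placeholder should fail loudly rather than be sent to a switch.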
As I've already said, things tend to change rather quickly at CiscoLive. We will get new [spare] switches, move switches between logical roles, etc. Having to maintain the inventory and mapping using SQL queries would be a pain. Therefore, I built a quick (and very dirty) web frontend to the database. This frontend provided interfaces to add new physical and logical switches, search for switches, map switches to roles, and export to a CSV file that could then be imported into Cisco Prime Infrastructure 2.1 (which we were using to manage the show's operations).
While that took care of the bootstrap solution, there was still another big pain point to address. Since the show is so dynamic, we end up doing a lot of changes to ports depending on what will get plugged into them. Even with the best preparation, it's often not enough to handle what actually gets deployed. In years past, we would use something like a template in Cisco Prime LAN Management Solution to do these changes. This worked well, but only a few people had access, and thus there was a bottleneck as changes began to pile up. In other words, more wrath came down on us.
This year, we used an Auto Smartports-like feature to reconfigure ports based on the device that plugged into them. Only we didn't use Auto Smartports. Instead, I used two EEM applets that were specifically suited to the devices that we would be using in the show. One applet would listen for a new CDP event, and based on whether an access point, camera, phone, session capture codec, or TelePresence unit plugged into the switch, it would apply the necessary VLAN and configuration to the port. The second applet detected the port transitioning to a down state and reset that port to the default configuration. This meant, among other things, that an access point could be plugged into any switch port, and it would just work. Given we had over 900 APs, this made our wireless team very happy.
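The two-applet behavior above boils down to a pair of event handlers: one that classifies a new CDP neighbor and configures the port, and one that resets the port when the link drops. The Python below is only a sketch of that logic, not the EEM applets themselves. The platform substrings and VLAN IDs are assumptions, as are the function names.

```python
from typing import List, Optional, Tuple

# Hypothetical device classes: the CDP platform substrings and VLAN IDs
# are illustrative assumptions, not the values used at the show.
DEVICE_PROFILES = [
    ("access_point", "AIR-", 100),
    ("phone", "IP Phone", 200),
    ("camera", "Camera", 300),
]


def classify(cdp_platform: str) -> Optional[Tuple[str, int]]:
    """Return (device_class, vlan) for the first matching substring."""
    lowered = cdp_platform.lower()
    for name, needle, vlan in DEVICE_PROFILES:
        if needle.lower() in lowered:
            return name, vlan
    return None


def on_cdp_neighbor(interface: str, cdp_platform: str) -> List[str]:
    """Commands to apply when a new CDP neighbor appears on a port."""
    match = classify(cdp_platform)
    if match is None:
        return []  # unknown device: leave the port alone
    name, vlan = match
    return [
        f"interface {interface}",
        f"description auto: {name}",
        "switchport mode access",
        f"switchport access vlan {vlan}",
    ]


def on_link_down(interface: str) -> List[str]:
    """Commands to apply when the port goes down: back to defaults."""
    return [f"default interface {interface}"]


# Hypothetical events, for illustration only.
print(on_cdp_neighbor("Gi1/0/5", "cisco AIR-CAP3702I-A-K9"))
print(on_link_down("Gi1/0/5"))
```

Resetting on link down is what lets the same port serve a different device class later: the next CDP event always starts from a known default state.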
While I've been linking to some of my code within this blog, it might be helpful to start at the root of the switch-ztp module in my SVN repository. Here you will find all of the necessary bits, including configuration templates, that you could use to replicate the solution if you're so inclined.
But what about things that didn't send CDP advertisements? We had users, kiosks, and things plugging into ports. How did we handle the necessary changes for them? This year, Jason Davis from Advanced Services put together a port change tool using Cisco Prime Service Catalog and Cisco Process Orchestrator that would pull the switch inventory from Prime Infrastructure and present the user with a friendly form interface that allowed them to select the required port profile for the device in question. They didn't have to know the VLAN, just the intent of that port, and the tool would determine the correct values. On top of that, he gave a lot of people access to it so that there was no bottleneck.
Remember my earlier trip to San Jose? During that trip, I got a preview of a project that Patrick Warichet (last year's lead for routing on the CiscoLive network) was running with the help of the events networking team. The idea was to build Arduino-based PoE devices that could grab a DHCP address and display CDP packet details on a small LED screen. The recipe for building these devices was to be given to each of the Networking Academy students who would be helping us with the show. When they were troubleshooting a port issue, they could plug in, verify link and PoE were working, verify DHCP, and provide the CDP details of the upstream switch.
This was very cool, and extremely helpful! A student would call the NOC after grabbing the CDP details. They would relay the switch and port info, and one of us would make the port change. The vendor would be up and running within minutes. This is definitely something we'll repeat at future shows. Patrick has written a blog entry specifically for this, as well as posted his code to GitHub.
That's about it for my view of automation of the Cisco Live San Francisco network. You should note, however, that while automation can make things a bit easier (and certainly cuts down on wrath), it is the people who made up the unbelievable team that delivered the rock-solid network that our attendees, vendors, and staff enjoyed. It is this team, assembled from all across Cisco's organizations and theaters, that makes helping to build and run the CiscoLive network so much fun.
Check out our slides for more details on what it took to pull off the entire show network, including the core, wireless, and security facets.