Has anyone experienced issues with a dual Nexus 5500 chassis setup during NX-OS upgrades? (vPC, dual NICs for servers, and all the fancy stuff)...
When upgrading to 7.0(1), we had issues with the first switch losing its config (FEX interfaces, etc.), which points to a bug. But as the first switch was coming up, all the servers with dual NICs connected to the 2nd switch also lost IP communication. vPCs were up to the aggregation layer, but some kind of spanning-tree disputes cropped up...
N5K ------- N5K
N7K aggregation layer
N7K Core layer
After the upgrade, when the 1st switch came up with no FEX configs, we saw these errors on Aggregation 2, and we lost communication to all servers on the 2nd switch too:
%STP-2-DISPUTE_DETECTED: Dispute detected on port port-channel8 on VLAN0107.
%STP-2-DISPUTE_DETECTED: Dispute detected on port port-channel8 on VLAN0158.
...and so on for all VLANs.
Has anyone else encountered this? Does running two different NX-OS versions on vPC-peered N5K switches cause any issues?
This is an old post now, but it may prove helpful:
Yeah, I went through that before. It wasn't that useful; we don't have unidirectional links or anything of that sort.
One big change in 7.0, which isn't in 5.0, is that the vPC uplinks have bridge assurance disabled by default (on 7.0).
The 2nd switch (which was still on 5.0) had it enabled. Might this cause a Type 1 inconsistency, and would it bring down the vPCs on the 2nd switch? We actually didn't lose the vPCs (according to the logs), but we lost connectivity due to spanning-tree disputes.
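For anyone checking the same mismatch, a few standard NX-OS show commands should reveal whether bridge assurance and the port type differ between the peers (port-channel 8 here is just the uplink from the logs above; substitute your own interfaces):

```
! Run on both N5Ks and compare the output:
show spanning-tree summary                   ! reports whether Bridge Assurance is enabled
show vpc consistency-parameters global       ! Type 1 mismatches show up here
show spanning-tree interface port-channel 8 detail
```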
Any thoughts ?
From what exact version to what exact version did you do the upgrade? I will test it in the lab today and let you know. If possible, please attach the running configs.
We upgraded from 5.0(3)N2(1) to 7.0(1)N1(1).
I'm not sure if I can upload the running config, but it's a standard config with VLANs, Fibre Channel, VSANs, etc.
We are hitting CSCul22703 for FEX HIFs losing their config.
We solved this issue. It's kind of complicated, but we have a workaround. Anyone upgrading to 7.0 should keep this in mind: during ISSU, if one switch is on 7.0 and the other is on anything else (5.0, 5.2, 6.0, etc.), you will hit this issue, and the whole zone might be isolated.
First, the missing config is caused by bug CSCul22703.
The workaround is to upgrade from 5.0 to 5.2(1)N1(6) first, and then to 7.0(0)N1(1).
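As a sketch, the two-hop upgrade on each N5K would look like this with `install all` (the image filenames are illustrative; use the actual kickstart/system images for your platform):

```
! Step 1: 5.0 -> 5.2(1)N1(6)
install all kickstart bootflash:n5000-uk9-kickstart.5.2.1.N1.6.bin system bootflash:n5000-uk9.5.2.1.N1.6.bin
! Step 2: 5.2(1)N1(6) -> 7.0(0)N1(1)
install all kickstart bootflash:n5000-uk9-kickstart.7.0.0.N1.1.bin system bootflash:n5000-uk9.7.0.0.N1.1.bin
```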
Second, the STP disputes and loss of connectivity to everything connected to the 2nd switch:
This was due to bug CSCuo74024. Because of this bug, STP BPDUs weren't forwarded from the 2nd switch to the 1st switch, since there was an issue with BPDUs over LACP-hashed interfaces. The STP instance blocked the Layer 2 connection for all VLANs on the 2nd switch, since the N5K was trying to become the root for those VLANs (which is how it should work). During this time only IP connectivity was disrupted.
The workaround for this problem was to shut down one of the links in the LACP bundle from the aggregation layer to the N5K, to make sure the BPDUs don't go over the hashed LACP link.
By doing this, we didn't lose packets going to the 2nd N5K after the first was upgraded to 7.0.
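The workaround boils down to something like the following on the aggregation switch (Ethernet1/2 is a placeholder for one member of the port channel toward the N5K):

```
! Shut one member link so BPDUs are no longer hashed onto the broken path:
interface Ethernet1/2
  shutdown
! Once both N5Ks are on 7.0, restore the bundle:
interface Ethernet1/2
  no shutdown
```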
We encountered a few other bugs during this upgrade:
Strangely, we had a kernel panic on the first switch when downgrading from 7.0 to 5.0. It happened just once; we ran downgrades many times but couldn't recreate the issue. With this kernel panic, the SFP microcode was lost (and defaulted to V0.0.0.0). Because of this, all the SFPs in module 1 went down (vPC peer link, uplinks, FEX links, etc.). It was due to bug CSCuo46284.
That's a lot of bugs... it doesn't really give me much confidence, since we're planning a similar upgrade, from 6.0(2)N1(2) to 7.0(5)N1(1).
I'm thinking of actually staying on version 6, since we're only planning to upgrade because of the MTU statistics problem on the interfaces...
I am less than impressed with the whole Nexus upgrade process and having to do multiple upgrade hops to get to the final destination. We saw the exact same type of problems on 5000s when we tried to go from 5 to 7. Don't let them tell you it's non-disruptive; plan for disruption. You never had these worries when upgrading IOS boxes. This is the latest and greatest?????
STP disputes are caused when the switches at the two ends of a segment do not agree on their STP state.
If the Port Receive state machine receives an inferior RSTP/MST BPDU from a port that believes itself to be a Designated Port and is Learning or Forwarding, it sets the disputed flag, causing the state machine to transition the Designated Port to Discarding.
I would suggest you troubleshoot STP on that port channel.
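Assuming standard NX-OS commands (availability of individual commands can vary by release), I'd start with something like:

```
show spanning-tree interface port-channel 8 detail   ! per-VLAN role/state and BPDU counters
show spanning-tree blockedports                      ! ports blocked by dispute/BA/guard features
show interface port-channel 8                        ! confirm the bundle and its members are up
```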
We were doing NX-OS upgrades on the 1st switch, from 5.0 to 7.0, and when the switch came up we lost all FEX and FC configs. With NX-OS 7.0, bridge assurance is disabled by default, while the other switch in the vPC (on 5.0) still had bridge assurance enabled...
Then, due to the downtime for servers, we had to quickly reboot and upgrade the 2nd switch to 7.0, which also lost its config, and overall this caused a good amount of downtime.
Now we are doing a post-mortem to see what could have gone wrong. Even though the 1st switch lost its config, we didn't think the servers connected to the 2nd switch would also lose connectivity (thinking we had all the redundancy and other fancy stuff like vPC, dual fabric, dual SAN, etc.)...
There are two big flaws here, which is kind of unacceptable: 1) configuration lost because of an OS upgrade (the Fibre Channel ports were even converted back to Ethernet!)
2) The 2nd switch's IP connectivity also went down, due to the complicated way vPC, spanning tree, bridge assurance, etc. interact...
Sometimes I feel we were better off with 6500s :)