Just trying to get to the bottom of what this requires to fix. I understand what it's telling me and I was just going to reset the CIMC, but on investigation I am a little confused.
It states in the early paragraphs that once the memory is degraded it will no longer get re-evaluated until changed, even if you perform a CIMC reset - but then later states that you can indeed force re-evaluation by resetting the CIMC?
So are they saying that once you see this error the threshold has been reached and you need new RAM, as the current RAM has performed below expectations - or that you should reset the CIMC and see if it breaches the threshold again (as it has been reset)?
My worry is that I reset the CIMC but the ECC threshold is no longer being evaluated and the DIMM fails fully.
Just to clarify: reset memory errors does not equal reset CIMC.
Actually, resetting CIMC should never be done to clear DIMM errors - doing so is equivalent to sweeping a potential problem under the rug, and it has the side effect of deleting files in CIMC that may be helpful in investigating the cause of the error.
In 1.3 and earlier firmware, resetting CIMC was the easiest way to get UCSM to re-evaluate the DIMM status based on what it was seeing from CIMC (another, more disruptive method would be to decommission and re-acknowledge the blade). For errors that do not occur frequently, this could result in the DIMM status being reset to operable in UCSM without much impact on the operation of the system, but if the error returned, what have you accomplished?
This behavior changed in 1.4 firmware and later. In 1.4 and later, resetting CIMC has no effect on the DIMM status in UCSM. Once a DIMM goes degraded or inoperable, the only ways to clear that state in UCSM are to change the FRU information on the DIMM (i.e. replace it), decommission and re-acknowledge the server (i.e. the server starts over from scratch), or use the reset memory errors functionality.
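If it helps to see the difference side by side, here is a rough sketch of the rules above as a lookup. The action names are labels I made up for this illustration, not UCSM commands, and the table only encodes what is stated in this post (it says nothing about actions not mentioned here).

```python
# Hypothetical action labels for this sketch only; these are not UCSM CLI commands.
CLEARING_ACTIONS_1_3_AND_EARLIER = {"reset_cimc", "decommission_reack"}
CLEARING_ACTIONS_1_4_AND_LATER = {
    "replace_dimm",
    "decommission_reack",
    "reset_memory_errors",
}


def clears_dimm_status(firmware, action):
    """Return True if, per the description above, the action makes UCSM
    re-evaluate or clear a degraded/inoperable DIMM status.

    firmware is a tuple such as (1, 3) or (1, 4); only major.minor is used.
    """
    major_minor = tuple(firmware[:2])
    if major_minor >= (1, 4):
        return action in CLEARING_ACTIONS_1_4_AND_LATER
    return action in CLEARING_ACTIONS_1_3_AND_EARLIER


print(clears_dimm_status((1, 3), "reset_cimc"))  # CIMC reset re-evaluates in 1.3
print(clears_dimm_status((1, 4), "reset_cimc"))  # no effect in 1.4 and later
```

Whether an action clears the status is a separate question from whether it is a good idea: as noted above, clearing the state does not fix a DIMM that is actually failing.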
Reset memory errors was added to 1.4 and later firmware because in 1.3 firmware, UCSM essentially ignored correctable errors. During testing of upgrades from 1.3 to 1.4 it was found that if a system had many correctable errors that occurred long ago, once UCSM was upgraded it would suddenly see all those historical correctable errors as new ones and set the DIMM status to degraded. Reset errors was added to clear that specific condition as well as clear any other false positive DIMM degraded or inoperable status. Use of reset errors outside of this context is similar to resetting CIMC - sweeping a potential problem under the rug.
Regarding your specific problem - if the number of correctable errors continues to increase then yes, the recommended course of action would be as you suggest - replace the DIMM.
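As a minimal sketch of what "continues to increase" means in practice: assuming you note down the correctable error count for the DIMM at a few points in time (the readings below are invented for illustration, and how you collect them is up to you), the check is simply whether any later reading is higher than the one before it.

```python
def errors_still_increasing(readings):
    """readings: correctable error counts for one DIMM, oldest first.

    Returns True if the count went up between any two consecutive readings.
    """
    if len(readings) < 2:
        return False  # not enough data to see a trend
    return any(later > earlier for earlier, later in zip(readings, readings[1:]))


# Made-up sample readings, for illustration only.
print(errors_still_increasing([0, 0, 3, 7]))  # count keeps climbing
print(errors_still_increasing([0, 0, 0]))     # count is stable
```

If the first case describes your DIMM, replacement is the recommended course of action, as said above.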