Zach Seils
Level 7

Communication between the various nodes and processes in the ACI Fabric uses Inter-Fabric Messaging (IFM).  IFM uses SSL-encrypted TCP communication.  Each APIC and fabric node has 1024-bit SSL keys embedded in secure storage.  The SSL certificates are signed by Cisco Manufacturing Certificate Authority (CMCA).

Problems with IFM SSL communication can prevent fabric nodes from successfully joining the fabric.  Some of the symptoms include:

  • Fabric Node in Inactive State - As shown in the following output, the fabric node spine-1-B is seen by the APIC, but is in an inactive state.
admin@APIC-B-1:~> acidiag fnvread
      ID             Name    Serial Number         IP Address    Role        State   LastUpdMsgId
-------------------------------------------------------------------------------------------------
     101         leaf-1-B      SAL00000001    10.0.117.160/32    leaf       active   0
     102         leaf-2-B      SAL00000002    10.0.117.191/32    leaf       active   0
     161        spine-1-B      FGE00000003    10.0.117.190/32   spine     inactive   0x100

Total 3 nodes

admin@APIC-B-1:~>

 

  • Fabric Node Stuck in Discovery - When accessing the inactive node through the CLI, the following message appears:
User Access Verification
spine-1-B login: admin
********************************************************************************
     Fabric discovery in progress, show commands are not fully functional
     Logout and Login after discovery to continue to use show commands.
********************************************************************************
spine-1-B# 

 

  • Lack of IFM Connections Established - Normally there are several established IFM connections (on TCP ports in the 12000-13000 range) between the various nodes.  In this case, the ports are listening but no connections are established:
spine-1-B# netstat -ant | grep :12
tcp        0      0 10.0.117.190:12439      0.0.0.0:*               LISTEN     
tcp        0      0 10.0.117.190:12119      0.0.0.0:*               LISTEN     
tcp        0      0 10.0.117.190:12887      0.0.0.0:*               LISTEN     
tcp        0      0 10.0.117.190:12151      0.0.0.0:*               LISTEN     
tcp        0      0 10.0.117.190:12407      0.0.0.0:*               LISTEN     
tcp        0      0 10.0.117.190:12183      0.0.0.0:*               LISTEN     
tcp        0      0 10.0.117.190:12440      0.0.0.0:*               LISTEN     
tcp        0      0 10.0.117.190:12120      0.0.0.0:*               LISTEN     
tcp        0      0 10.0.117.190:12888      0.0.0.0:*               LISTEN     
tcp        0      0 10.0.117.190:12152      0.0.0.0:*               LISTEN     
tcp        0      0 10.0.117.190:12408      0.0.0.0:*               LISTEN     
tcp        0      0 10.0.117.190:12184      0.0.0.0:*               LISTEN     
spine-1-B# 

 

  • Logs Directory Empty - The /var/log/dme/log directory on the fabric node is empty.  While the fabric node is in this state, you can access the logs in /tmp/logs.
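
The first and third symptoms above lend themselves to a quick scripted check. The following is a minimal Python sketch, not a Cisco tool. It assumes the output of acidiag fnvread and netstat -ant has already been captured as text, and that the column layout matches the samples in this article. The function names inactive_nodes and ifm_socket_states are hypothetical.

```python
# Sketch only: the column layout is assumed from the sample output in this article.
FNVREAD_SAMPLE = """\
      ID             Name    Serial Number         IP Address    Role        State   LastUpdMsgId
-------------------------------------------------------------------------------------------------
     101         leaf-1-B      SAL00000001    10.0.117.160/32    leaf       active   0
     102         leaf-2-B      SAL00000002    10.0.117.191/32    leaf       active   0
     161        spine-1-B      FGE00000003    10.0.117.190/32   spine     inactive   0x100

Total 3 nodes
"""

NETSTAT_SAMPLE = """\
tcp        0      0 10.0.117.190:12439      0.0.0.0:*               LISTEN
tcp        0      0 10.0.117.190:12119      0.0.0.0:*               LISTEN
"""


def inactive_nodes(fnvread_output):
    """Return (id, name, state) for every node row whose State is not 'active'."""
    rows = []
    for line in fnvread_output.splitlines():
        fields = line.split()
        # Node rows start with a numeric ID and have seven whitespace-separated columns.
        if len(fields) == 7 and fields[0].isdigit():
            node_id, name, _serial, _ip, _role, state, _msg_id = fields
            if state != "active":
                rows.append((int(node_id), name, state))
    return rows


def ifm_socket_states(netstat_output, low=12000, high=13000):
    """Count TCP socket states for local ports in the range [low, high]."""
    counts = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        port = fields[3].rsplit(":", 1)[-1]  # local address is the 4th column
        if port.isdigit() and low <= int(port) <= high:
            counts[fields[5]] = counts.get(fields[5], 0) + 1
    return counts


print(inactive_nodes(FNVREAD_SAMPLE))     # [(161, 'spine-1-B', 'inactive')]
print(ifm_socket_states(NETSTAT_SAMPLE))  # {'LISTEN': 2}
```

A result that contains LISTEN entries but no ESTABLISHED entries matches the third symptom described above.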

Problems with IFM SSL communication are recorded in the following log files:

  • Fabric Node - /var/log/dme/log/svc_ifc_policyelem.log
  • APIC - /var/log/dme/log/svc_ifc_appliancedirector.bin.log

For example, looking at the logs on the fabric node shows:

spine-1-B# tail -f svc_ifc_policyelem.log | grep SSL
3910||14-06-19 21:40:11.748+00:00||ifm||DBG4||co=ifm||openssl error during SSL_accept()||../dme/common/src/ifm/./IFMSSL.cc||173
3910||14-06-19 21:40:11.748+00:00||ifm||DBG4||co=ifm||openssl: error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca||../dme/common/src/ifm/./IFMSSL.cc||178

 

Likewise, the logs on the APIC show:

admin@APIC-B-1:log> tail -f svc_ifc_appliancedirector.bin.log | grep SSL
32509||14-06-19 06:42:21.707+00:00||ifm||DBG4||co=ifm||openssl error during SSL_connect()||../common/src/ifm/./IFMSSL.cc||173
32509||14-06-19 06:42:21.707+00:00||ifm||DBG4||co=ifm||openssl: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed||../common/src/ifm/./IFMSSL.cc||178
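
The log entries above are delimited by "||", and the OpenSSL error strings follow the usual error:code:library:function:reason layout. As a rough sketch, the following Python splits such a line and extracts the OpenSSL reason. The field names (id, timestamp, component, and so on) are assumptions inferred from the two samples in this article, not a documented log schema, and the function names are hypothetical.

```python
import re

# OpenSSL error strings look like: error:<hex code>:<library>:<function>:<reason>
OPENSSL_ERROR = re.compile(r"error:([0-9A-Fa-f]+):([^:]*):([^:]*):(.*)")

# Assumed field names; the log format is inferred from the samples, not documented here.
FIELDS = ("id", "timestamp", "component", "level", "context",
          "message", "source", "source_line")


def parse_ifm_log_line(line):
    """Split one '||'-delimited log line into a dict, or return None if it does not fit."""
    parts = line.strip().split("||")
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def openssl_error(message):
    """Extract code, library, function and reason from an OpenSSL error string."""
    match = OPENSSL_ERROR.search(message)
    if match is None:
        return None
    code, library, function, reason = match.groups()
    return {"code": code, "library": library, "function": function, "reason": reason}


SAMPLE = ("3910||14-06-19 21:40:11.748+00:00||ifm||DBG4||co=ifm||"
          "openssl: error:14094418:SSL routines:SSL3_READ_BYTES:"
          "tlsv1 alert unknown ca||../dme/common/src/ifm/./IFMSSL.cc||178")

entry = parse_ifm_log_line(SAMPLE)
print(openssl_error(entry["message"])["reason"])  # tlsv1 alert unknown ca
```

The reasons seen in this example, "tlsv1 alert unknown ca" on the fabric node and "certificate verify failed" on the APIC, both point at certificate verification rather than at basic connectivity.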

 

In this example there is a problem with the SSL certificate verification.  You can view the details of the SSL communication by using the openssl command from the APIC CLI.  First we'll check the APIC itself by using openssl to connect internally:

admin@APIC-B-1:~> openssl s_client -state -connect 10.0.0.1:12151
CONNECTED(00000003)

< output removed for brevity>

---
Certificate chain
 0 s:/C=US/ST=CA/L=SanJose/O=Insieme Networks/CN=Insieme
   i:/C=XX/L=Default City/O=Default Company Ltd
 1 s:/C=XX/L=Default City/O=Default Company Ltd
   i:/C=XX/L=Default City/O=Default Company Ltd
---
Server certificate
-----BEGIN CERTIFICATE-----
MIICkzCCAXsCCQCGN3jYiLGTgzANBgkqhkiG9w0BAQUFADBCMQswCQYDVQQGEwJY
WDEVMBMGA1UEBwwMRGVmYXVsdCBDaXR5MRwwGgYDVQQKDBNEZWZhdWx0IENvbXBh
bnkgTHRkMB4XDTEzMDcxOTIwNDAzMloXDTE0MDcxOTIwNDAzMlowWTELMAkGA1UE
BhMCVVMxCzAJBgNVBAgMAkNBMRAwDgYDVQQHDAdTYW5Kb3NlMRkwFwYDVQQKDBBJ
bnNpZW1lIE5ldHdvcmtzMRAwDgYDVQQDDAdJbnNpZW1lMIGfMA0GCSqGSIb3DQEB
AQUAA4GNADCBiQKBgQC8y/ZR+pPVKQs69rc9t3/BRq+rLsYjAqhy6lz4rIxBO1sW
2eRXXX8BgvzDeeHqEjQiyjHvTUpybUDZ7ES1tF6sv3lsMgJAJiAuG3ecfWarh7qf
6vKx8U6H/shZDjQP4kCv70oMm1gdC3yonuLwJhwBGK0McHEifmYcSucSUUSavQID
AQABMA0GCSqGSIb3DQEBBQUAA4IBAQA82c2iAKwbpdgxk7PxqVRnREh5C2E5xEO2
ermIPsNjSXtQwEnjmvts003KPzkx7ND1SLatw39fGL5ZksSOjmKc7a303rDqh2qq
nAvjyIpEshBzJxXJdhZyPI6L4j6a+0yMuXYpGMlezfFrRFImrfFs26+W4bcdL+x2
Yo5Ez7KEauvim2yj7TSERJB3s+r3nRUCCGfquPIBfPWrXzbYzj2rXv/QPoR8W/xk
A667jKPHtC7SAzIKWZjyiRwi1yVs4vDkyMyFxCurMiiF3A/gTjx/097L0JYX/MJK
7lkuIod1WG3sKtQbbc+Uh6GdZkHQghAIHukDhzyckPhvqLxmWj48
-----END CERTIFICATE-----
subject=/C=US/ST=CA/L=SanJose/O=Insieme Networks/CN=Insieme
issuer=/C=XX/L=Default City/O=Default Company Ltd

< output removed for brevity>

In the output above, openssl connects to a TCP port on which IFM is listening (12151), locally on the same device where the command is run.  The specific TCP port isn't important, so long as it belongs to a DME server using IFM SSL (use 'netstat -an | grep :12' to find one).

The output shows that the certificates used by the APIC are not valid CMCA certificates: the issuer is "Default Company Ltd" rather than the Cisco Manufacturing CA.  Checking the fabric node that is in an inactive state shows:

admin@APIC-B-1:~> openssl s_client -state -connect spine-1-B:12440
CONNECTED(00000003)

<output removed for brevity>

---
Certificate chain
 0 s:/serialNumber=PID:N9K-C9508 SN:FGE00000003/CN=FGE00000003
   i:/O=Cisco Systems/CN=Cisco Manufacturing CA
 1 s:/O=Cisco Systems/CN=Cisco Manufacturing CA
   i:/O=Cisco Systems/CN=Cisco Root CA 2048
 2 s:/O=Cisco Systems/CN=Cisco Root CA 2048
   i:/O=Cisco Systems/CN=Cisco Root CA 2048
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIDuzCCAqOgAwIBAgIKGRJjSAAAAAbyczANBgkqhkiG9w0BAQUFADA5MRYwFAYD
VQQKEw1DaXNjbyBTeXN0ZW1zMR8wHQYDVQQDExZDaXNjbyBNYW51ZmFjdHVyaW5n
IENBMB4XDTE0MDUyNjE0MTg0OVoXDTI0MDUyNjE0Mjg0OVowPTElMCMGA1UEBRMc
UElEOk45Sy1DOTUwOCBTTjpGR0UxODE4MEFDWTEUMBIGA1UEAxMLRkdFMTgxODBB
Q1kwgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAM8HvlkZVsNPMZ9mwBuiM5yD
YNIlf4Qwm/J01Z2zzPaSFFy4nZweqq0OVWPOgEeadKPShLLsJOLnRPxVYFZLyOU2
gmCmrch3HcGJks0YrLW+s7XbeAzx6JDggumoO03C64eKEy9HdSdzc2VM3j5wVBiF
8+de253WptdRzd6Nj+KPAgMBAAGjggFDMIIBPzAOBgNVHQ8BAf8EBAMCBPAwHQYD
VR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMCMB0GA1UdDgQWBBTWbUbPa7vySop5
y9capeR1+OgvFjAfBgNVHSMEGDAWgBTQxSImq09GYOyuBZHH3FrRsEf3bDA/BgNV
HR8EODA2MDSgMqAwhi5odHRwOi8vd3d3LmNpc2NvLmNvbS9zZWN1cml0eS9wa2kv
Y3JsL2NtY2EuY3JsMEwGCCsGAQUFBwEBBEAwPjA8BggrBgEFBQcwAoYwaHR0cDov
L3d3dy5jaXNjby5jb20vc2VjdXJpdHkvcGtpL2NlcnRzL2NtY2EuY2VyMD8GCSsG
AQQBgjcUAgQyHjAASQBQAFMARQBDAEkAbgB0AGUAcgBtAGUAZABpAGEAdABlAE8A
ZgBmAGwAaQBuAGUwDQYJKoZIhvcNAQEFBQADggEBAGrKs3J8j8/cNXk9INw2nD/J
74Am55POxP4+ZSDS7v+6U8pRvykQIw8YuPoTWOXzd4JGfli838poVBjhNoMg10Vx
FKZ+UUSH6mYQr6+COAKo7KRVgsRUncla5m+ibf+vJNGKyO3tIcpYPLUPdCUo5u95
QJ0Vs8wYHcXbLkVrDKDm0J6S6oKJg4QE6DLqlXHNArHIcMUwlEtB0TdTaJKgSD7Y
4gOTsSuuFExbtfhbFRooJCOPCR+kwyHR0zQjOvAcDtWNtiKomW5Nij2sWQO+nAJJ
8OarjXW6459k25CYumaF8LXCEdrFGWnmN5iICF4ngDl37wT0mxS8v/gjksAdrjk=
-----END CERTIFICATE-----
subject=/serialNumber=PID:N9K-C9508 SN:FGE00000003/CN=FGE00000003
issuer=/O=Cisco Systems/CN=Cisco Manufacturing CA

<output removed for brevity>

In this output, the inactive fabric node is using certificates signed by the Cisco Manufacturing Certificate Authority, which is expected.  Certificates must meet the following criteria:

  • Valid start-date and end-date
  • System time should be between the certificate start-date and end-date
  • Should be signed by CMCA
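
The three criteria above can be expressed as a small check. This is a minimal Python sketch under stated assumptions: the validity dates have already been read from the certificate (for example with openssl x509 -noout -dates, which prints dates such as "May 26 14:18:49 2014 GMT"), the issuer string is written in the form shown in the s_client output above, and the dates used in the example are illustrative only. The function names are hypothetical.

```python
from datetime import datetime, timezone

# Issuer string as it appears in the s_client output in this article.
CMCA_ISSUER = "/O=Cisco Systems/CN=Cisco Manufacturing CA"


def parse_openssl_date(text):
    """Parse a date in the form printed by 'openssl x509 -noout -dates'."""
    parsed = datetime.strptime(text, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def certificate_problems(not_before, not_after, issuer, now=None):
    """Return a list of violated criteria; an empty list means all three are met."""
    now = now or datetime.now(timezone.utc)
    problems = []
    if not_before >= not_after:
        problems.append("start-date is not earlier than end-date")
    if not (not_before <= now <= not_after):
        problems.append("system time is outside the validity period")
    if issuer != CMCA_ISSUER:
        problems.append("issuer is not CMCA")
    return problems


# Illustrative values only; these dates are not taken from a real device.
start = parse_openssl_date("May 26 14:18:49 2014 GMT")
end = parse_openssl_date("May 26 14:28:49 2024 GMT")
system_time = datetime(2014, 6, 19, tzinfo=timezone.utc)

print(certificate_problems(start, end, CMCA_ISSUER, system_time))
print(certificate_problems(start, end,
                           "/C=XX/L=Default City/O=Default Company Ltd",
                           system_time))
```

Passing the system time explicitly makes it easy to test the effect of a wrong clock on a node, which fails the second criterion even when the certificate itself is fine.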

In this case the certificates are mismatched: the APIC is using a non-CMCA certificate, while the inactive fabric node is using a valid CMCA certificate.  Incidentally, since the other fabric nodes in this example are properly registered with the fabric, those nodes must also be using invalid certificates.  The invalid certificates need to be replaced with valid CMCA certificates.
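
A mismatch like this one can be spotted by comparing the "Certificate chain" sections of the s_client output from each device. The Python below is a sketch, assuming the output format shown in this article (numbered "s:" subject lines followed by "i:" issuer lines, with the section closed by "---"). Other OpenSSL versions may print subjects differently, and the function names are hypothetical.

```python
import re

CMCA_ISSUER = "/O=Cisco Systems/CN=Cisco Manufacturing CA"

APIC_SAMPLE = """\
---
Certificate chain
 0 s:/C=US/ST=CA/L=SanJose/O=Insieme Networks/CN=Insieme
   i:/C=XX/L=Default City/O=Default Company Ltd
 1 s:/C=XX/L=Default City/O=Default Company Ltd
   i:/C=XX/L=Default City/O=Default Company Ltd
---
"""

NODE_SAMPLE = """\
---
Certificate chain
 0 s:/serialNumber=PID:N9K-C9508 SN:FGE00000003/CN=FGE00000003
   i:/O=Cisco Systems/CN=Cisco Manufacturing CA
 1 s:/O=Cisco Systems/CN=Cisco Manufacturing CA
   i:/O=Cisco Systems/CN=Cisco Root CA 2048
 2 s:/O=Cisco Systems/CN=Cisco Root CA 2048
   i:/O=Cisco Systems/CN=Cisco Root CA 2048
---
"""


def parse_chain(s_client_output):
    """Return (subject, issuer) pairs from the 'Certificate chain' section."""
    chain, subject, in_chain = [], None, False
    for raw in s_client_output.splitlines():
        line = raw.strip()
        if not in_chain:
            in_chain = (line == "Certificate chain")
            continue
        if line == "---":  # end of the chain section
            break
        match = re.match(r"^\d+ s:(.*)$", line)
        if match:
            subject = match.group(1)
        elif line.startswith("i:") and subject is not None:
            chain.append((subject, line[2:]))
            subject = None
    return chain


def leaf_signed_by_cmca(chain):
    """True if the first (device) certificate in the chain is issued by CMCA."""
    return bool(chain) and chain[0][1] == CMCA_ISSUER


print(leaf_signed_by_cmca(parse_chain(APIC_SAMPLE)))  # False
print(leaf_signed_by_cmca(parse_chain(NODE_SAMPLE)))  # True
```

Devices that return different answers from this check will fail certificate verification against each other, which is the condition seen in this example.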

Comments
liguoriariel
Level 1

Hi, I'm facing the same issue on some leafs.

Working ones:
Certificate chain
0 s:/serialNumber=PID:N9K-C9396PX SN:SAL18474R7Q/CN=SAL18474R7Q
i:/O=Cisco Systems/CN=Cisco Manufacturing CA
1 s:/O=Cisco Systems/CN=Cisco Manufacturing CA
i:/O=Cisco Systems/CN=Cisco Root CA 2048
2 s:/O=Cisco Systems/CN=Cisco Root CA 2048
i:/O=Cisco Systems/CN=Cisco Root CA 2048


and failed ones:
Certificate chain
0 s:/C=US/ST=CA/L=SanJose/O=Insieme Networks/CN=Insieme
i:/C=XX/L=Default City/O=Default Company Ltd

How can we solve this? Do we need to contact TAC, or is there another way to solve it? We are running 1.2.1m.

Tomas de Leon
Cisco Employee

You need to open a Cisco TAC Case so the ACI TAC engineer can use this information to generate the new CERT files for your switches in this condition.

Thank you for using the Cisco Support Forum for ACI.

T.
