05-13-2015 01:26 AM
Hello all.
I'm running an ASR 9010 with XR 5.1.2. I have many iBGP sessions running, and one existing eBGP session running. I am trying to add a second eBGP session. It never comes up, and when I "debug bgp progress" on the session, I get this output:
RP/0/RSP0/CPU0:May 13 04:08:30 EDT: bgp[1054]: [default-ioct]: 10.15.30.50 went from Idle to Active
RP/0/RSP0/CPU0:May 13 04:08:30 EDT: bgp[1054]: [default-ioct]: 10.15.30.50 went from Active to OpenSent
RP/0/RSP0/CPU0:May 13 04:08:30 EDT: bgp[1054]: [default-iowt]: 10.15.30.50 send message type 1, length (incl. header) 63
RP/0/RSP0/CPU0:May 13 04:08:30 EDT: bgp[1054]: [default-iowt]: Send message dump for 10.15.30.50:
RP/0/RSP0/CPU0:May 13 04:08:30 EDT: bgp[1054]: [default-iowt]: ffff ffff ffff ffff ffff ffff ffff ffff
RP/0/RSP0/CPU0:May 13 04:08:30 EDT: bgp[1054]: [default-iowt]: 003f 0104 166a 00b4 4011 7962 2202 0601
RP/0/RSP0/CPU0:May 13 04:08:30 EDT: bgp[1054]: [default-iowt]: 0400 0100 0102 0280 0002 0202 0002 0641
RP/0/RSP0/CPU0:May 13 04:08:30 EDT: bgp[1054]: [default-iowt]: 0400 0016 6a02 0840 0600 7800 0101 00
RP/0/RSP0/CPU0:May 13 04:08:30 EDT: bgp[1054]: [default-ioct]: Sending OPEN to 10.15.30.50, version 4, my as: 1779826688, holdtime 180 seconds
RP/0/RSP0/CPU0:May 13 04:08:30 EDT: bgp[1054]: [default-ioct]: 10.15.30.50 went from OpenSent to Closing
RP/0/RSP0/CPU0:May 13 04:08:30 EDT: bgp[1054]: [default-event]: 10.15.30.50 went from Closing to Idle
RP/0/RSP0/CPU0:May 13 04:08:30 EDT: bgp[1054]: [default-event]: 10.15.30.50 reset due to Peer closing down the session
I haven't decoded the message dump, but my AS number is not 1779826688. My AS is 5738, so this is nothing close.
Does anyone know what is happening?
ERM
05-13-2015 08:13 AM
hi evan, this is indeed a bit funky, can you share the config for this please and a debug bgp event?
it should tell us why it goes from open to closed.
possibly also from the peer if possible.
cheers!
xander
05-13-2015 12:52 PM
It will take a bit of coordinating to get the debug gathered. I'll work on it. There's nothing to get from the peer; he looked.
Configuration bits:
router bgp 5738
 neighbor 10.15.30.50
  remote-as 26682
  description Customer -- gmc
  address-family ipv4 unicast
   send-community-ebgp
   route-policy bgp-customer-in(as26682-routes, empty-set) in
   maximum-prefix 10 75
   route-policy bgp-customer-user-routes out
   soft-reconfiguration inbound always
  !
 !
!
route-policy bgp-customer-in($nlriset, $deaggset)
  if destination in decent-routes and destination in $nlriset then
    delete community in communities-forbidden-customers
    if community matches-any (5738:666) then
      set next-hop 192.0.2.1
    endif
    if not destination in $deaggset then
      delete community in (5738:35)
    endif
    if community matches-any (5738:120) then
      set local-preference 120
    elseif community matches-any (5738:110) then
      set local-preference 110
    elseif community matches-any (5738:100) then
      set local-preference 100
    elseif community matches-any (5738:90) then
      set local-preference 90
    elseif community matches-any (5738:80) then
      set local-preference 80
    elseif community matches-any (5738:70) then
      set local-preference 70
    else
      pass
    endif
  endif
end-policy
!
route-policy bgp-customer-user-routes
  if community matches-any bgp-customer or community matches-any static then
    pass
  endif
  apply bgp-customer-core-routes
end-policy
!
prefix-set as26682-routes
  10.252.222.0/24,
  10.114.150.128/26
end-set
ERM
05-13-2015 02:04 PM
hey evan, ah thanks for that. yeah the config on your end looks sane.
the OPEN message looks good; I think the "my as" value in the debug line is just being displayed incorrectly.
your AS in hex is 166a which is indeed set properly in the BGP message:
003f 0104 166a
You're signaling BGPv4 (duh :) and your AS is correct also.
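As a side note, the odd "my as" number in the debug line is consistent with a byte-order display glitch: 5738 is 0x166a, and the 4-byte value 0x0000166a read with its bytes reversed is 1779826688. This is only an inference from the numbers, not something confirmed against the XR code. A quick Python check (the helper name `swap32` is just for illustration):

```python
import struct

def swap32(value: int) -> int:
    """Pack a 32-bit value big-endian, then read the same bytes little-endian."""
    return struct.unpack("<I", struct.pack(">I", value))[0]

my_as = 5738
print(hex(my_as))     # 0x166a
print(swap32(my_as))  # 1779826688
```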
the peer is closing us down, this could be because of mismatched capabilities, hold time, wrong AS number (peer's remote-AS configured incorrectly), but other than that I don't see anything wrong as yet in this open msg.
cheers!
xander
05-14-2015 07:18 AM
I ran the "debug bgp event" and gathered no insight. I don't have a packet capture off the wire, but it really looks like the ASR emits the OPEN message, and then the peer closes the TCP connection.
I decoded the OPEN message, and two items caught my eye:
(1) I've got "soft-reconfig in always" defined, but I'm still announcing route-refresh capabilities. I thought that was suppressed in this case. (see CSCtz06668.)
(2) I'm announcing both old and new route-refresh capabilities. I suspect the old style refresh option may be causing the peer to drop the session. Is there a way to configure the session (or entire router) to use only standard route-refresh capability options?
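For anyone who wants to reproduce the decode, here is a minimal Python sketch. It assumes the standard OPEN layout from RFC 4271 with capabilities carried in optional parameter type 2 (RFC 5492). The function name `decode_open` and the `CAP_NAMES` table are illustrative, not taken from any library; the hex is copied from the debug output earlier in the thread.

```python
import struct

# OPEN message dump copied from the "debug bgp" output earlier in the thread.
DUMP = """
ffff ffff ffff ffff ffff ffff ffff ffff
003f 0104 166a 00b4 4011 7962 2202 0601
0400 0100 0102 0280 0002 0202 0002 0641
0400 0016 6a02 0840 0600 7800 0101 00
"""

# Capability codes that appear in this dump (128 is the pre-standard
# route-refresh code, 2 is the standard one).
CAP_NAMES = {
    1: "multiprotocol",
    2: "route-refresh",
    64: "graceful-restart",
    65: "4-octet-as",
    128: "route-refresh (old)",
}

def decode_open(raw: bytes) -> dict:
    """Split a BGP OPEN message into fixed fields and a capability list."""
    # Bytes 0-15 are the marker; length and type follow.
    length, msg_type = struct.unpack("!HB", raw[16:19])
    if msg_type != 1 or length != len(raw):
        raise ValueError("not a complete OPEN message")
    version, my_as, hold, bgp_id, opt_len = struct.unpack("!BHH4sB", raw[19:29])
    caps = []
    pos, end = 29, 29 + opt_len
    while pos < end:
        p_type, p_len = raw[pos], raw[pos + 1]
        body = raw[pos + 2:pos + 2 + p_len]
        pos += 2 + p_len
        if p_type != 2:  # only optional parameter type 2 carries capabilities
            continue
        i = 0
        while i < len(body):
            code, c_len = body[i], body[i + 1]
            caps.append((code, body[i + 2:i + 2 + c_len]))
            i += 2 + c_len
    return {
        "version": version,
        "my_as": my_as,
        "hold_time": hold,
        "bgp_id": ".".join(str(b) for b in bgp_id),
        "capabilities": caps,
    }

info = decode_open(bytes.fromhex("".join(DUMP.split())))
print(info["version"], info["my_as"], info["hold_time"], info["bgp_id"])
for code, value in info["capabilities"]:
    print(code, CAP_NAMES.get(code, "unknown"), value.hex())
```

Run against this dump, it reports version 4, AS 5738, hold time 180, and five capabilities in this order: multiprotocol (IPv4 unicast), old-style route-refresh (128), standard route-refresh (2), 4-octet AS, and graceful restart.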
Thanks.
ERM
05-14-2015 07:34 AM
yeah, that is XR behavior: if soft-reconfiguration is set, we still open with the route-refresh capability. if you configure soft-reconfiguration inbound always, then you will not send route-refresh:
https://supportforums.cisco.com/document/96161/route-refresh-soft-reconfiguration-inbound-ios-xr
see if that "always" helps for you.
yeah, since the peer is closing the session we would need to see what he doesn't like. route refresh is not a mandatory option, and we just advertise the old and new capability so that the peer can decide whether he likes that or not.
what could be an issue is if the peer has, for instance, ipv4 and vpnv4 AFs configured and we only have ipv4; if that mismatches it won't open, so you may want to check that also.
a config of the peer or a debug there will likely help us on this.
cheers!
xander
05-14-2015 07:49 AM
wanted to add: the DDTS you referenced, CSCtz06668, fixes an issue with the acceptance and negotiation of old vs. new route refresh. some peers indeed don't like the old style and may close a session (which is not correct). but either way, if you have 4.2.3 or 4.3.0 onwards, you have this fix in place.
the soft reconfig always will suppress the route-refresh option altogether by not sending it.
xander
05-14-2015 10:36 AM
Notice that I do have "soft-reconfig in always" configured, yet I'm still sending both (old and new) route-refresh options. This is on 5.1.2. Did something get unfixed?
ERM
05-14-2015 12:21 PM
hi evan, I overlooked that, ok yeah that won't help. I checked further with one of our most senior BGP folks, and it seems there are some cases where route refresh is still advertised even though "always" is configured. in your case it is possibly the max-prefix config.
to identify the issue in this case I think we would need the BGP debugs and config from the peer. on our side we just get the close from this peer, and that doesn't necessarily come with a reason.
you could try to see if show bgp neigh <addr> detail gives a further clue, but since we didn't receive anything from this peer except for a close, I don't think we have tracked or know any of the peer's capabilities yet.
cheers
xander
05-14-2015 01:22 PM
I got some more detail from Jakob, the BGP master I was discussing this with earlier and he had some other suggestions that I wanted to pass on:
Looking more, it said peer closed the session after we sent open.
Therefore, the peer accepted the TCP connection.
Depending on the vendor of the other side, he may or may not have configured this neighbor.
He does not have update source in his config, so we don’t know what his source IP is.
XR won’t even accept the TCP connection if the IP address is wrong, but Redback will accept it and close it later.
Normally if it doesn’t like the OPEN, it would send a NOTIFICATION, but it didn’t. It just closed.
Maybe they have a password set on the other side.
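To illustrate the NOTIFICATION point: had the peer objected to the OPEN in the usual way, the wire would have carried a message with error code 2 (OPEN Message Error) and a subcode naming the complaint. The Python sketch below assumes the RFC 4271 NOTIFICATION layout with subcode 7 from RFC 5492. The helper names are hypothetical, and the sample message is constructed purely for illustration; nothing like it was seen in this thread.

```python
import struct

# OPEN Message Error subcodes (RFC 4271, plus 7 from RFC 5492).
OPEN_ERROR_SUBCODES = {
    1: "Unsupported Version Number",
    2: "Bad Peer AS",
    3: "Bad BGP Identifier",
    4: "Unsupported Optional Parameter",
    6: "Unacceptable Hold Time",
    7: "Unsupported Capability",
}

def build_notification(code: int, subcode: int, data: bytes = b"") -> bytes:
    """Build a BGP NOTIFICATION: 16-byte marker, length, type 3, code, subcode, data."""
    length = 19 + 2 + len(data)
    return b"\xff" * 16 + struct.pack("!HBBB", length, 3, code, subcode) + data

def describe_notification(raw: bytes) -> str:
    """Return a short description of a NOTIFICATION message."""
    length, msg_type, code, subcode = struct.unpack("!HBBB", raw[16:21])
    if msg_type != 3 or length != len(raw):
        raise ValueError("not a complete NOTIFICATION message")
    if code == 2:
        reason = OPEN_ERROR_SUBCODES.get(subcode, f"subcode {subcode}")
        return f"OPEN Message Error: {reason}"
    return f"error code {code}, subcode {subcode}"

# Hypothetical example: what a "Bad Peer AS" rejection would look like.
sample = build_notification(2, 2)
print(describe_notification(sample))
```

A bare TCP close, as seen here, carries none of this, which is why the reason has to come from the peer's own logs or config.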
See if that holds any solution?!
cheers!
xander
05-18-2015 07:35 AM
The other side of this connection is a Barracuda Networks NG firewall. This is a connection that has been up and running fine (with BGP) on a 7606 (running IOS 12.2(33)SRE7a) for months.
A debug from the 7600 decodes the OPEN message being sent by the Barracuda, and the difference between the two is actually the graceful-restart capability. There's a "graceful-restart disable" option in neighbor configuration on XR, but it doesn't seem to work. Do you have any further information about it? Do you know any other ways to keep XR from offering graceful-restart to a particular neighbor?
ERM
05-19-2015 11:02 AM
hi evan,
I investigated this some more and while I am not from the RFC police, it needs to be noted that the Barracuda is not really following the correct implementation when it comes to capability negotiation. Assuming you are correct in the assessment that it is the GR causing the peering establishment to fail, I could make you an engineering SMU to test out that theory, but I would need to know the current version of XR you're running so we can test that out. If that is indeed the culprit then we would need to go with a TAC case and a DDTS to productize that change.
IOS and XR send their capabilities differently, while still in spec, hence you see this difference in behavior. I would also recommend connecting with the Barracuda folks and having them implement the right fix on their side for the longer term.
Another thing to try is to enable the NSR functionality on the XR side to negate the GR advertisement to the peer.
cheers
xander
05-19-2015 01:07 PM
NSR?? That's an idea.
I have understood GR to be the preferred architecture when it is supported. Am I wrong in that belief?
I would appreciate a SMU to verify the behavior. What would the SMU do? I'm running XR 5.1.2 with SP1.
ERM
05-20-2015 06:03 AM
hi evan, attached to this reply is a 5.1.3 SMU for BGP that will skip sending the GR capability in the open message.
it is built on top of the SMU lineup, which means that it inherits all previous BGP SMUs as well (it is basically a BGP component revision).
I had to zip it, otherwise the forums won't take the file, so once downloaded you need to unzip it and admin install add blabla activate.
I couldn't build it on 5.1.2 because I don't have the precise lineup for that available at this minute, and since you have SP1, a SMU built for 5.1.2 would basically negate all the fixes in SP1.
Hopefully you have a test device that you can bring up to 5.1.3 to test this SMU?
cheers
xander
05-20-2015 06:21 AM
wanted to add one more thing. what I see in the code base for the BGP io/open message is that the graceful restart option is unset when BGP GR is globally enabled, but disabled for the neighbor. So this config sequence may also omit it from the open message:
router bgp <AS>
 bgp graceful-restart
 neighbor <addr>
  graceful-restart disable
commit
xander