... buggy IOS NTP implementation ?

Dec 5th, 2009

... or dumb newbie (the most probable) ?


s.s.s.s are public NTP servers


e.e.e.1 is edge-router#1 connected to ISP#1
synched with three external public NTP servers (i.e. group 1)
also peering with e.e.e.2, in case it loses the connection to ISP#1
calendar-valid master 3, in case it loses both the connection to ISP#1 and the connection to edge-router#2


e.e.e.2 is edge-router#2 connected to ISP#2
synched with three external public NTP servers (i.e. group 2)
also peering with e.e.e.1, in case it loses the connection to ISP#2
calendar-valid master 3, in case it loses both the connection to ISP#2 and the connection to edge-router#1


f.f.f.f is a firewall between edge and core
synched with both edge routers


c.c.c.c is core-router connected to both edge routers
synched with both edge routers
calendar-valid master 4, in case it loses both connections to the edge routers


... so, as can be seen here NTP is configured in hierarchical mode
... and all routers SHOULD stay synched no matter whether any (or ALL) of the connections go down, right ?
... well, it never happens this way:


• if I power-cycle all the gear, everything synchs as expected (and stays in synch); however,


• if (then) I shut down ISP#1 on edge-router#1 it re-synchs with edge-router#2 as expected (same for the other one)


• if (then) I choose to shut down the remaining ISP#2 (or if I shut down both edge routers) and, after say 10 minutes, bring them up again,
the edge routers with both ISPs down sometimes fall back to their own master 3 calendars, but more often lose synch completely
the core router unsynchs from them (expected if they are unsynched), but in the meantime it will NOT fall back to master 4 using its own valid calendar
and, more importantly, it will NOT resynch with the edge routers afterwards, and I mean after being given plenty of time:


core#show ntp associations (eg from last Wednesday) ... not synched for 21 hours
      address         ref clock     st  when  poll reach  delay  offset    disp
 ~e.e.e.2          s.s.s.s           2   12h    64    0     2.6  -18.11  16000.
 ~127.127.7.1      127.127.7.1       3     -    64    0     0.0    0.00  16000.
 ~e.e.e.1          s.s.s.s           2   21h    64    0     2.6  -18.93  16000.
* master (synced), # master (unsynced), + selected, - candidate, ~ configured


core#show ntp associations (eg from last Friday) ... not synched for 1 day
      address         ref clock     st  when  poll reach  delay  offset    disp
 ~e.e.e.2          s.s.s.s           3   17h    64    0     2.5   -4.30  16000.
 ~127.127.7.1      127.127.7.1       3     -    64    0     0.0    0.00  16000.
 ~e.e.e.1          s.s.s.s           2    1d    64    0     2.6    0.59  16000.
* master (synced), # master (unsynced), + selected, - candidate, ~ configured


... but if I reload core it synchs at once with the edge:


core#show ntp associations
      address         ref clock     st  when  poll reach  delay  offset    disp
+~e.e.e.2          s.s.s.s           2    22    64    1     3.5   31.82  15875.
 ~127.127.7.1      127.127.7.1       3     -    64    0     0.0    0.00  16000.
*~e.e.e.1          s.s.s.s           2    34    64    1     2.6   -1.60  15875.
* master (synced), # master (unsynced), + selected, - candidate, ~ configured


... in all situations if I enable debug ntp packets (which I have done a lot) I can see the ntp requests and answers as expected
... even when core refuses to resynch with edge (for whatever unknown reason), the RCV packets from both edge routers
... have the appropriate reference clocks to the external servers
... there are no problems with network communications, no transmits without receives and the like, etc
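
For anyone wanting to repeat these checks, they correspond roughly to the exec commands below. 'debug ntp packets' is the command named above; the two show commands are standard IOS commands added here for completeness, and the post does not say they were run in exactly this form:

```
! on the core router, from privileged exec
show ntp status
show ntp associations detail
debug ntp packets
! ... watch a few poll intervals (64 s each, per the poll column), then:
undebug all
```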


... so, as I see it, and (a big AND) if I am doing the things right:


• NTP implementation on IOS is buggy (at least in some cases like the one described)
• NTP implementation on IOS master command does not work as expected
• NTP implementation on IOS has problems resynching in some situations
• NTP implementation on PIX works as expected; the firewall NEVER loses synch with edge


... but YOU are the experts ... am I right or am I missing something ?




... all routers are C1841s running c1841-advsecurityk9-mz.124-15.T11
... firewall is PIX515 running pix804


... following are detailed related commands for edge-router#1:


clock calendar-valid
ntp master 3
ntp update-calendar
ntp peer e.e.e.2
ntp server s.s.s.11
ntp server s.s.s.12
ntp server s.s.s.13
ntp access-group serve-only 3
ntp access-group peer 2


access-list 2 remark standard access list: the following hosts will be allowed to update the local NTP server service:
access-list 2 permit e.e.e.2
access-list 2 permit s.s.s.11
access-list 2 permit s.s.s.12
access-list 2 permit s.s.s.13


access-list 3 remark standard access list: the following hosts will be allowed to sync to the local NTP server service:
access-list 3 permit c.c.c.c


... following are detailed related commands for edge-router#2:


clock calendar-valid
ntp master 3
ntp update-calendar
ntp peer e.e.e.1
ntp server s.s.s.21
ntp server s.s.s.22
ntp server s.s.s.23
ntp access-group serve-only 3
ntp access-group peer 2


access-list 2 remark standard access list: the following hosts will be allowed to update the local NTP server service:
access-list 2 permit e.e.e.1
access-list 2 permit s.s.s.21
access-list 2 permit s.s.s.22
access-list 2 permit s.s.s.23


access-list 3 remark standard access list: the following hosts will be allowed to sync to the local NTP server service:
access-list 3 permit c.c.c.c


... following are detailed related commands for core-router:


clock calendar-valid
ntp master 4
ntp update-calendar
ntp server e.e.e.1
ntp server e.e.e.2
ntp access-group serve-only 3
ntp access-group peer 2


access-list 2 remark standard access list: the following hosts will be allowed to update the local NTP server service:
access-list 2 permit e.e.e.1
access-list 2 permit e.e.e.2


access-list 3 remark standard access list: the following hosts will be allowed to sync to the local NTP server service:
access-list 3 permit ... not relevant: clients are Windows AD domain servers and the like


... following are detailed related commands for the firewall:


ntp server e.e.e.1
ntp server e.e.e.2

lpeet Fri, 01/08/2010 - 12:55

The way I'm reading this, (and if I've got this right) basically your configuration is good, but get rid of:



!--- Where X is the stratum

ntp master X



Basically, the 'ntp master X' line is only to be used for private networks where no external NTP server or other hardware (atomic) clock is available.


And keep:



!--- If no other time-source is available, use calendar as authoritative

clock calendar-valid



Using only the 'clock calendar-valid' command, if a router loses connectivity to its time-source, it will accept its own calendar as authoritative and advertise that across the network to other NTP clients.  When it re-gains connectivity to the external time-source, it will re-sync its clock, update its calendar (which again makes it authoritative), and continue advertising the time to NTP clients.
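
As a rough, untested sketch of what that would look like on edge-router#1 (addresses and ACL numbers are the placeholders from the original post; the only change is that 'ntp master' is removed):

```
! edge-router#1: NTP section with 'ntp master' removed
no ntp master
clock calendar-valid
ntp update-calendar
ntp peer e.e.e.2
ntp server s.s.s.11
ntp server s.s.s.12
ntp server s.s.s.13
ntp access-group serve-only 3
ntp access-group peer 2
```

The same change would apply to edge-router#2 and to the core router, each with its own peer and server lines.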


Also of note: the 'ntp master' command has no effect until the router is rebooted.  So when you turned off your edge routers and they came back on, they were set as 'ntp master' and ignored the time-source you had configured them to sync to.  The key to this is that the actual stratum of the router is one less than the stratum configured with the 'ntp master' command.  In your configuration, you used 'ntp master 3' for your edge routers, which effectively set their master clock to stratum 2.  Since the NTP source you were trying to sync with was probably also a stratum 2 server, the router kept its sync with its local clock and did not sync with the external time-source.


The default stratum for the 'ntp master' command is stratum 8, which sets the router's clock to stratum 7.  This is recommended (or possibly an even less trusted value such as 10), since the lower the stratum number, the more accurate the time is assumed to be, and if your router isn't able to reach an external server, it's relying on its own clock to be accurate, which it may or may not be.  This should have prevented the scenario you saw, which came about because you were using such low strata (2, 3).  You would also be able to tell just by looking at the stratum of the routers that the time was not in sync with the external time-sources, because of the much higher stratum levels.
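
If you do want to keep a local fallback clock, a minimal sketch of that variant on an edge router might look like the following. The value 10 is just the example figure mentioned above, not a tested recommendation, and the remaining lines are copied from the original post:

```
! edge-router#1: local fallback kept, but at a numerically high stratum
ntp master 10
clock calendar-valid
ntp update-calendar
ntp peer e.e.e.2
ntp server s.s.s.11
ntp server s.s.s.12
ntp server s.s.s.13
```

Afterwards, 'show ntp status' reports the stratum the router is currently running at, so a jump to a high stratum is a quick sign that it has fallen back to its own clock.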


Hope this helps.


-Lucas
