I feel slightly embarrassed to be asking this but I have to know.
I have a project where the project team has asked me to build a flat network. There will be approximately 10,000 hosts.
Half of those 10,000 hosts will be very low traffic (former 9600 baud serial connections migrated to ethernet). I have NO ability to run packet captures since this is a "proposed" solution.
I am trying to convince the team that even low traffic ethernet interfaces are going to create much more arp/dns/whatever traffic than they think they will. The chatter will be enormous.
Can anyone help me with the argument? I have already sent a "CYA" email to my boss saying that I fully expect this network to crash.
The topology will be such that I have about 13 sites spread around a single mode fiber optic ring.
My problem is in convincing this team that the arp traffic alone on a 10,000 node network is going to create a ton of latency for the entire network.
A single subnet with 10,000 devices. I don't even know how to answer that; it goes completely against all known network design principles. Most people won't go above 512 users in a single subnet, to say nothing of 10,000.
If it's flat, all it would take is a single person connecting two switchports together and it could take down the whole thing, whereas if it were segregated into, say, /24s, the problem would be confined to those 254 users and not 10,000.
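To put numbers on that, here is a minimal sketch using Python's standard ipaddress module. The 10.0.0.0/18 block is a hypothetical choice, not the poster's actual addressing; a /18 is simply the smallest prefix that can hold 10,000 hosts in one subnet.

```python
import ipaddress

# Hypothetical addressing: 10.0.0.0/18 is the smallest prefix that holds 10,000 hosts.
flat = ipaddress.ip_network("10.0.0.0/18")
flat_hosts = flat.num_addresses - 2  # minus the network and broadcast addresses

# The same address space carved into /24 segments.
segments = list(flat.subnets(new_prefix=24))
hosts_per_segment = segments[0].num_addresses - 2

print(f"flat /18: {flat_hosts} usable addresses in one broadcast domain")
print(f"segmented: {len(segments)} x /24, {hosts_per_segment} hosts each")
```

With the flat design a loop or broadcast storm reaches every host in the /18; with the /24 split it stays inside one segment of 254 hosts, provided the segments are separated at L3.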
Don't know if there's anything "concrete" because so much depends on actual network usage which is why one size doesn't fit all. Yet, the issue (which you seem to realize) is going to be Ethernet broadcast traffic (something different from the prior 9600 bps serial connection setup).
A 10,000 node network is more likely to collapse than a smaller network due to handling broadcasts. For instance, when one station ARPs for another station, every other node will see and process the broadcast (although all but perhaps one station will discard the ARP request once they see it's not for their address).
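As a rough back-of-envelope sketch of that effect, the snippet below estimates the steady-state ARP request rate every host would have to receive. All inputs are assumptions rather than measurements: each host talks to 5 peers, ARP cache entries expire after 60 seconds, and each request occupies a minimum-size 64-byte Ethernet frame. Real timeouts and peer counts vary a lot by operating system and application.

```python
def arp_broadcast_load(hosts, peers_per_host, cache_timeout_s, frame_bytes=64):
    """Rough steady-state ARP request load seen by every host in one broadcast domain."""
    # Assumption: each host re-ARPs for each of its peers once per cache timeout.
    requests_per_second = hosts * peers_per_host / cache_timeout_s
    bits_per_second = requests_per_second * frame_bytes * 8
    return requests_per_second, bits_per_second


# Assumed figures, not measurements: 10,000 hosts, 5 peers each, 60 s ARP timeout.
flat_pps, flat_bps = arp_broadcast_load(10_000, 5, 60)
# Same assumptions applied to one /24-sized segment of 254 hosts.
seg_pps, seg_bps = arp_broadcast_load(254, 5, 60)

print(f"flat:      {flat_pps:.0f} broadcasts/s, {flat_bps / 1000:.0f} kbit/s per port")
print(f"segmented: {seg_pps:.0f} broadcasts/s, {seg_bps / 1000:.0f} kbit/s per port")
```

Under these assumptions the load scales linearly with host count, so the flat domain carries roughly 39 times the ARP requests of a 254-host segment. The bandwidth is small for the switches, but every converted serial device has to take and discard each of those frames, and the assumed broadcast rate alone is far above what a 9600 bps link ever delivered to it.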
"have a project where the project team has asked me to build a flat network. There will be approximately 10,000 hosts"
There's the first problem :-). Project teams should be supplying you with a list of requirements and not designing the network for you (or do they have a network designer within the project?).
You make no mention of why they want this. That should be the first thing to ask because it may well be that their network understanding is limited and they believe the only way to achieve what they want is with a flat network. Quite often it turns out that projects request certain things because they do not know of other ways of achieving the same thing.
For example, I have seen requests for a flat network because DHCP uses broadcasts and therefore it must be L2. Once ip helper-addresses had been explained to them they were fine with it. Not saying it will be that easy but it's worth investigating.
Thanks for the reply.
First, they want this because this is how they have always done it. However, they have never had an implementation this big before.
Second, they want this because, after I explained how we should do this, they said they didn't have any money to buy the two routers I recommended for the installation.
Third, I am a new hire to this company. They have never had a network engineer leading them on projects before. They have always just bought a bunch of switches, plugged stuff in, and away they went.
In this instance, there is no DHCP going on. All the IPs will be static. Most communications will happen between locations that are next to each other, i.e., there won't be traffic going all the way around the fiber loop to chat. It will stay within one hop of each control station. This is partly what has convinced them that they can do this.
"Most communications will happen between locations that are next to each other, ie, there wont be traffic going all the way around the fiber loop to chat. It will stay within one hop of each control station. This is partly what has convinced them that they can do this."
As both you and the others have said, though, things such as ARP will go all the way round.
I appreciate it is difficult when you are new at a company to try and tell them that the way they are doing things and have been doing things is just not very sensible. A flat network of that size is just an accident waiting to happen.
How important is the data to them?
What is the cost involved (if any) if the network is unavailable?
Are there any safety aspects involved if data is lost?
Budget is indeed one of the main factors in design decisions, but the cost of some routers or layer 3 switches may suddenly become trivial when set against the potential cost of data loss or network unavailability.
Only you can really decide where you draw the line. Perhaps you could ask them why they hired a network engineer if they weren't going to take his advice but maybe not.
If you point all of this out and they still refuse to listen then, although this is a crappy thing to say, make sure your objections are recorded somewhere, because when the network breaks, and there is a good chance it will, the first person in the firing line will be the network engineer.
Your requirement is not clear. Are you looking for a solid argument to put before your project team so that they will be convinced to purchase two routers or create L3 VLANs in this network?
That is exactly it.
The question is how can I prove to them the risks without just saying, "I have been building networks for 10 years and there is no way anyone in their right mind would design something like this. It will fail."
I don't have the ability to sniff a "pilot" of this project. They are simply convinced that because all the traffic is coming from old serial connections that have been converted to IP, the traffic generated will be negligible and therefore can be ignored.
One workaround may be to increase the TTL (cache timeout) of the ARP entries on all the switches and workstations... I feel dirty even saying that!
"The question is how can I prove to them the risks without just saying, "I have been building networks for 10 years and there is no way anyone in their right mind would design something like this. It will fail."
You can't. And it may well work after a fashion but it is not good design which means it is more susceptible to failure/poor performance. As has been pointed out it also depends on how chatty the apps running on these machines are.
I don't know of any definitive documents because it's very hard to generalise in these circumstances. But below are some of the generic problems you can face with flat L2 networks.
I can't emphasize enough, though, that with this sort of thing you need to weigh the costs of data loss/network unavailability against the cost of deploying routers/L3 switches, because in the end that will be how you justify purchasing the extra equipment.
Problems with a flat L2 network
1) normal broadcasts as already discussed, e.g. ARP
2) fault isolation (i) - e.g. if a NIC on one of the machines is faulty, the interruption this could cause to the rest of the network is greatly magnified by having everything at L2
3) fault isolation (ii) - an L2 network can be a lot harder to troubleshoot than an L3 segregated network. Imagine you have an issue with one host out of 10,000 and you have to track that down.
4) Security aspects - a virus infecting one device will propagate far more easily on a flat L2 network.
"They are simply convinced that because all the traffic is coming from old serial connection that have been converted to IP, the traffic generated will be negligable and therefore can be ignored. "
Again, it's just not a question of volume of traffic, but a question of kind of traffic (and the underlying support infrastructure).
What they have would be somewhat like 10,000 people on a phone system, each with little to say. Each conversation is isolated from others unless all 10,000 share a party line. You're moving from "individual" switched lines to a single "shared" (party) line. (Or, consider a walkie-talkie radio where 10,000 people share a single channel.)
Network switching technology will isolate most unicast traffic (after the initial unknown unicast address flood!), but broadcast traffic will not be isolated.
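A toy model can make that difference visible. This is only a sketch of the general learning-bridge behavior on a single VLAN, not any vendor's implementation; the class name, port count, and MAC addresses are made up for illustration.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy model of transparent bridging on a single VLAN (illustrative only)."""

    def __init__(self, port_count):
        self.ports = range(port_count)
        self.mac_table = {}  # source MAC -> port it was last seen on

    def forward(self, in_port, src, dst):
        """Return the list of ports a frame is sent out of."""
        self.mac_table[src] = in_port  # learn where the sender lives
        if dst != BROADCAST and dst in self.mac_table:
            out_port = self.mac_table[dst]
            # Known unicast: one port only, or filtered if it is the ingress port.
            return [] if out_port == in_port else [out_port]
        # Broadcast or unknown unicast: flood every port except the ingress one.
        return [p for p in self.ports if p != in_port]


switch = LearningSwitch(port_count=8)
first = switch.forward(0, "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02")  # unknown: flooded
reply = switch.forward(1, "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01")  # learned: one port
arp = switch.forward(0, "aa:aa:aa:aa:aa:01", BROADCAST)              # always flooded
print(len(first), len(reply), len(arp))
```

Once both MACs are learned, unicast frames leave on a single port, but the broadcast still goes out of every other port no matter how much the switch has learned. In a flat design that means every port on every switch around the ring.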
Another point, network devices are designed to be able to contain thousands of MACs, but are your hosts? In principle, each host might want to record the IP to MAC information for every other host on the network segment. Can they?
BTW, I believe changing the TTL won't help since we're considering just L2, and Ethernet frames carry no TTL. If they did, L2 loops wouldn't be such a problem.
A flat network for 10K hosts will work. It will start falling apart if switchports are left in their default configurations and there are no provisions for network loop prevention, for example.
I started fresh and managed a 5K flat network. Every time some ding-a-ling dictated to us how to connect his server, or an id10t connected their home-brand switch/router to the network, a network loop would occur. Finally the CIO asked the fatal question (he's ignorant about VLANs and such, so he was very much against a /24 subnet for each floor or building). And the response was exactly what he didn't want to hear: subnet each floor so broadcasts will be contained.
Project people and/or designers who have no knowledge whatsoever of networks are a pain. On several occasions the only way for me to win an argument or break an impasse was to do what they designed and, when things broke (normally within a few days), fix the issue using my recommendation. This way I get what I want without embarrassing anyone, PLUS the designers believe that their design is still intact (not!).
Hope this helps.