By David Trossell, CEO and CTO of Bridgeworks
Whenever a network seems to be running too slowly, the conversation soon turns to how much bandwidth the network connectivity and infrastructure offer, and then to how much faster the network could be if more money were made available to ‘resolve’ the problem by buying higher-bandwidth connectivity. The trouble is that increasing your organisation’s bandwidth won’t necessarily equate to higher network performance.
WAN bandwidth is a little like the petrol mileage that motor manufacturers claim for their cars. It sounds good, but you never seem to get close to the figure you are expecting. In fact, you are more likely to get close to the claimed petrol mileage than you are to your headline WAN bandwidth.
Over the past few years organisations have seen WAN traffic shift from small, “transactional” data transfers to the bulk transfers associated with offsite data backup and cloud use. This can lead to conflict between the network team and the data team, where one blames the other for poor data throughput. I have been involved in so many conversations in which the data guys are moaning about the lack of throughput, while the network guys respond with: “It’s not our problem; you’re not even using all the bandwidth allocated to your WAN – it must be your program”, and so it goes on.
In the end the carrier gets pulled in and offers the familiar advice: “If you want to go faster, add more bandwidth.” The contract is signed, more bandwidth is added, the salesman gets his commission and… nothing changes! You achieve only the same throughput. Cue more head scratching, and embarrassing questions from accounts and the CFO about why we signed up for a more expensive connection with no improvement. Why? The clue is in the poor utilisation figures that the network team is reporting.
Long distances
When organisations transport data over long distances that are typical for WANs, the TCP/IP latency effect rears its ugly head and kills the throughput while it waits for those all-important acknowledgements (ACKs) from the other end. So, throwing more bandwidth at the problem is not going to fix it.
Let me explain with an example: suppose we have a 1Gb/s WAN with 100ms of latency and we are transferring data in 4MB blocks. We send a block of data and then wait 100ms for the ACK to come back from the receiving end before we send the next 4MB block. So, in one second we can send 10 blocks of 4MB = 40MB/s. Not bad, but a 1Gb/s WAN should be capable of transferring more than 100MB/s, so that is only 40% utilisation.
So, what happens if we upgrade to 10Gb/s? Does it offer 10 times the performance? That is the received wisdom, but don’t forget we still have that 100ms of latency and 4MB blocks. So, we are still only going to transfer 10 x 4MB = 40MB/s – exactly the same as on the 1Gb/s connection. However, the capability of the 10Gb/s connection is around 1GB/s, so now we have a utilisation of only 4%!
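To make that arithmetic concrete, here is a minimal Python sketch of the same latency-bound model. The figures (4MB blocks, a 100ms round trip, roughly 100MB/s of usable capacity per 1Gb/s of bandwidth) come from the example above; the function names and the simplifying assumption that nothing else is sent while waiting for the ACK are mine.

```python
# Latency-bound throughput: one block in flight, then wait a full round trip for the ACK.

def latency_bound_throughput(block_mb: float, rtt_ms: float) -> float:
    """MB/s achieved when each block must be acknowledged before the next is sent."""
    blocks_per_second = 1000.0 / rtt_ms          # 100 ms RTT -> 10 blocks per second
    return block_mb * blocks_per_second

def utilisation(throughput_mb_s: float, bandwidth_gbps: float) -> float:
    """Utilisation against a rough usable capacity of ~100 MB/s per 1 Gb/s of bandwidth."""
    usable_mb_s = bandwidth_gbps * 100.0
    return throughput_mb_s / usable_mb_s

rate = latency_bound_throughput(block_mb=4, rtt_ms=100)       # 40 MB/s
print(f"Throughput: {rate:.0f} MB/s")
print(f"1 Gb/s utilisation:  {utilisation(rate, 1):.0%}")      # ~40%
print(f"10 Gb/s utilisation: {utilisation(rate, 10):.0%}")     # ~4%
```

The point the numbers make is that the bandwidth term never appears in the throughput calculation at all: with one block in flight per round trip, only the block size and the latency matter.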
I wish I could say that’s the only problem, but there is yet another performance thief – Packet Loss. What if we lose a few packets along the way…that’s not a great problem, or is it? TCP/IP will resend those that were lost. We may lose a little time, but all the data will get there.
Life is never simple
Unfortunately, life is never simple in the world of data comms and TCP/IP. When TCP/IP sees packet loss, it loses confidence in the connection and shrinks the amount of data it places on the network, only increasing the block size again once that confidence returns. Now let’s apply some packet loss to our example and assume the data block shrinks by 75%. With the 1Gb/s WAN our performance drops to 10MB/s (10% utilisation), and with the 10Gb/s WAN we drop to the same throughput, but now the utilisation is only 1%. That’s going to take some explaining!
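Plugging that shrink into the same simple model shows how quickly the numbers collapse; the 75% reduction is the assumed figure from the example above, not a fixed property of TCP.

```python
# Same latency-bound model, with the in-flight block shrunk by 75% after packet loss.
shrunk_block_mb = 4 * (1 - 0.75)                  # 1 MB effectively in flight
throughput = shrunk_block_mb * (1000 / 100)       # 10 blocks/s at 100 ms RTT -> 10 MB/s
print(f"{throughput:.0f} MB/s")
print(f"1 Gb/s:  {throughput / 100:.0%} utilisation")    # ~10%
print(f"10 Gb/s: {throughput / 1000:.0%} utilisation")   # ~1%
```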
So, what is the solution? Latency is governed by the speed of light, and until someone finds another method of communication (perhaps quantum entanglement), we are stuck with it. You can buy low-latency connections that take the shortest route, but at the end of the day the two end points are still the same distance apart. As for packet loss, you can order dedicated links which should suffer much lower packet loss, but both of these options add considerably to the costs.
Data optimisation products
SD-WANs are gaining popularity in many organisations and have many advantages in flexibility and cost over traditional WANs, but they still suffer from the same latency and packet loss. The traditional workaround is to deploy WAN optimisation products, although in reality these are data optimisation products, as they do not optimise the WAN itself.
These are very effective in improving the user experience with office-based products and other data applications where the data can be compressed or deduplicated, but they add no benefit if the data is already compressed or encrypted. Furthermore, the heavy workload involved in compressing or deduplicating the data restricts the overall throughput to below many of the WAN bandwidths currently available.
Mitigate latency
To regain control of the WAN and restore performance to its full capability, we need first to mitigate the effects of latency and secondly to minimise the effects of packet loss. But how? To address latency, we take the incoming stream of data, split it into multiple parts and send these simultaneously over the WAN as separate TCP/IP streams.
By filling the “pipe” it is possible to drive up both the throughput and the utilisation ratio. To mitigate the effects of packet loss we can manipulate the number of connections and the size of the data on the WAN. No network engineer can constantly tune these factors by hand, along with the myriad other parameters involved, which is why the PORTrockIT WAN Data Accelerator uses AI to manage the whole process, constantly adjusting those parameters. Typical customers can realise up to 95% of the possible capability of the WAN bandwidth. The beauty of agentless WAN Data Acceleration such as PORTrockIT is that it can be used in combination with SD-WAN products, giving users the ability to exploit both technologies.
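As an illustration of the general principle (not of how PORTrockIT itself is implemented), the sketch below splits a buffer into chunks and pushes them over several TCP connections in parallel. The host, port, chunk size and stream count are hypothetical, and it assumes a receiver at the far end that reassembles the chunks by index; the point is simply that with a 100ms round trip and 4MB in flight per stream, many concurrent streams are needed to fill a 10Gb/s pipe, whose bandwidth-delay product is roughly 125MB.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

HOST, PORT = "receiver.example.com", 9000   # hypothetical receiving endpoint
CHUNK = 4 * 1024 * 1024                     # 4 MB per stream, as in the example above

def send_chunk(index: int, chunk: bytes) -> int:
    """Send one chunk over its own TCP connection, prefixed with its position."""
    with socket.create_connection((HOST, PORT)) as sock:
        sock.sendall(index.to_bytes(8, "big") + chunk)   # receiver reorders by index
    return len(chunk)

def send_parallel(data: bytes, streams: int = 32) -> int:
    """Split data into chunks and send them over several TCP streams at once."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        sent = pool.map(send_chunk, range(len(chunks)), chunks)
    return sum(sent)

# A 10 Gb/s link with a 100 ms round trip holds ~125 MB in flight,
# so around 32 streams of 4 MB each are needed to keep the pipe full:
# send_parallel(my_payload, streams=32)
```

In practice the right number of streams and the right chunk size change as latency and packet loss change, which is exactly the tuning problem described above.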
How does this work in the real world?
Bridgeworks was asked to see if we could help with NetApp SnapMirror replication of 85TB over approximately 2,000 miles across a 10Gb/s WAN connection. After all other options had failed, my team ran the replication back to back in the data centre and then ran the same replication over the WAN with exactly the same encrypted data set. As you can imagine, the data centre replication was fast. However, over the WAN we were only 7MB/s slower.
5 best practice tips for managing bandwidth and network performance
So, here are my 5 top best practice tips for achieving WAN data acceleration, improved use of existing bandwidth and network performance:
- Before blaming the WAN, run the transfer within the data centre, then check the performance and the utilisation of the WAN when transferring data. If utilisation is not in the high 80s (per cent), consider WAN Data Acceleration products; if the performance across the WAN is lower than in the data centre and the utilisation is high, consider upgrading the WAN (see the sketch after this list).
- Make sure the solution you use can handle multiple differing protocols and not just file transfer protocols, especially with the increasing use of the cloud and remote data centres as part of the backup and disaster recovery strategy.
- Check with different data types to ensure you have the performance you need, not only for backup but, more importantly, when you need to restore data. Many of the cloud transfer products rely on deduplication, which adds no benefit if the data is already compressed or encrypted.
- Consider deploying a WAN Data Acceleration solution such as PORTrockIT to mitigate the effects of latency and packet loss. Add this as a layer onto SD-WANs, too, to achieve greater WAN performance.
- Remember that you may not need to replace your existing infrastructure. Your existing network infrastructure may need a boost, but this doesn’t mean that it should be replaced. However, you should plan for data growth, disaster recovery, etc.
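As a rough illustration of the first tip, the snippet below compares a measured WAN transfer rate against a data-centre baseline and the link capacity. The 85% threshold stands in for the “high 80s” utilisation mentioned above, and the function name, figures and capacity approximation are placeholders of mine, not part of any product.

```python
def wan_health_check(wan_mb_s: float, lan_mb_s: float, link_gbps: float) -> str:
    """Apply the decision rule from tip 1: compare WAN throughput and utilisation."""
    usable_mb_s = link_gbps * 100.0               # rough usable capacity per Gb/s
    utilisation = wan_mb_s / usable_mb_s
    if wan_mb_s < lan_mb_s and utilisation >= 0.85:
        return "Link is nearly full: consider upgrading the WAN bandwidth"
    if utilisation < 0.85:
        return "Link is under-utilised: consider a WAN data acceleration product"
    return "WAN throughput already matches the data-centre baseline"

print(wan_health_check(wan_mb_s=40, lan_mb_s=400, link_gbps=1))   # under-utilised
```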
The question of ‘How much bandwidth do I need?’ can often be the wrong question when more utilisation could be gained from the existing network infrastructure. However, data volumes are ever increasing, and the need for disaster recovery as well as service continuity always requires constant attention and planning. One truism is that the big vendors are often happy to sell solutions that may not adequately mitigate latency, so organisations should be wary and look to smaller vendors that are often more innovative – providing solutions that actually do the job.