Like most things, my upgrade to 10GbE for my home lab wasn’t a pure win. Sure, when it worked, it was blazing fast, but it also shone a spotlight on the other potential bottlenecks around my network and home lab setup. I did worry that saturating the link might be difficult with my HDD storage array, but some of the issues I didn’t think about beforehand soured parts of the upgrade. Still, I’m glad I did it, and really, the biggest bottleneck was my expectations, which needed to be more grounded in reality.
My hard drives were no longer fast enough
And I hit some issues with SATA controllers
With the gigabit network links I replaced, I never had to worry about hard drive speed being the bottleneck, because even one of my HDDs could max out the link. Most of my NAS drives are Seagate IronWolf Pro, which can sustain transfer rates of up to 285 MB/s, more than double the practical transfer speeds of a 1GbE connection.
With 10GbE in the equation, the balance of power has shifted the other way, and now I’d need the sustained throughput of three or four drives to saturate the link. That’s simply not going to happen with a six-drive RAID array, at least not while reserving at least one drive’s worth of parity (I run two, because I’m my own worst enemy in the home lab).
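If you want to put rough numbers on that, here’s a quick back-of-the-envelope sketch in Python. The protocol efficiency figure is an assumption rather than a measurement, and the drive speed is the best-case sequential rate from the spec sheet:

```python
# Back-of-the-envelope check: how many IronWolf Pro-class drives does it take
# to saturate a 10GbE link? These figures are rough assumptions, not benchmarks.

LINK_GBPS = 10                     # raw line rate of the link
PROTOCOL_EFFICIENCY = 0.94         # loose allowance for TCP/IP and SMB overhead
DRIVE_MBPS = 285                   # best-case sustained rate of a single HDD

usable_mbps = LINK_GBPS * 1000 / 8 * PROTOCOL_EFFICIENCY
drives_needed = usable_mbps / DRIVE_MBPS

print(f"Usable 10GbE throughput: ~{usable_mbps:.0f} MB/s")
print(f"Drives needed at full sustained speed: ~{drives_needed:.1f}")
# Works out to roughly four drives, and that assumes every drive holds its
# best-case sequential rate, which a parity array rarely manages.
```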
My RAID array was slowing things down
Parity calculations were ruining my transfer speeds
While there are many RAID levels you can use to increase data write and access speeds, all of them trade away some measure of data safety. If I had infinite cash for hard drive space, I’d have mirrored vdevs, each with redundancy, so that I could recover from several drive failures at once. But alas, I do not, so I’m using two-drive parity on a six-drive array.
Whenever data is written, I lose some theoretical throughput to overhead, as the system slows down while doing parity calculations. That’s good for my data’s safety, but not so good for transfer speeds, especially when a software controller is handling the parity math. Plus, the SATA controller in my NAS enclosure isn’t great, and it slows down after sustained periods of heavy use. I’m fixing this by moving the bulk of my data storage to a server with much better-quality parts, but it’ll still be an issue for many NAS users.
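To make the trade-off concrete, here’s a rough sketch of what dual parity costs on a six-drive array. The parity penalty factor is a loose assumption for illustration, not a benchmark from my NAS:

```python
# Rough sketch of what two-drive parity costs on a six-drive array.
# The parity penalty is an assumed figure for illustration, not a measurement.

TOTAL_DRIVES = 6
PARITY_DRIVES = 2                  # double parity, RAID 6 / RAID-Z2 style
DRIVE_MBPS = 285                   # best-case sequential rate per drive
PARITY_PENALTY = 0.75              # assumed hit from parity math and controller overhead

data_drives = TOTAL_DRIVES - PARITY_DRIVES
ideal_write_mbps = data_drives * DRIVE_MBPS
realistic_write_mbps = ideal_write_mbps * PARITY_PENALTY

print(f"Data drives: {data_drives} of {TOTAL_DRIVES}")
print(f"Ideal streaming write: ~{ideal_write_mbps} MB/s")
print(f"With parity and controller overhead: ~{realistic_write_mbps:.0f} MB/s")
# Even the ideal figure sits just under a 10GbE link's usable rate, and the
# realistic one misses by a wide margin before the SATA controller starts sagging.
```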
Not all of my SSDs had DRAM cache
HMB-only SSDs can throttle speeds substantially with sustained usage
One of the M.2 NVMe SSDs in the images above is not like the rest, and it’s a difference you won’t notice without diving into the specification sheets. If you knew that all but the Samsung 980 Pro were DRAM-less SSDs, give yourself a cookie. While DRAM-less drives inherently have higher latency and lower read speeds, that’s not really the issue. The problem is that once you start doing sustained writes on them, performance tanks.
Rather than caching the drive’s mapping tables in onboard DRAM, DRAM-less SSDs borrow a chunk of system memory as a Host Memory Buffer (HMB). That’s fine, really, as long as your NAS or server has plenty of RAM, though it carries a risk of data loss if power fails. But there’s another, arguably worse effect of not having DRAM: higher write amplification, which wears out the NAND cells faster than on a more expensive, DRAM-equipped SSD.
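To see why write amplification matters, here’s a small illustration of how it eats into an SSD’s endurance rating. The TBW figure and daily write volume are made-up numbers for the sake of the arithmetic:

```python
# Illustration of how write amplification eats into an SSD's endurance rating.
# The TBW rating and daily write volume are made-up figures for the arithmetic.

TBW_RATING_TB = 600                # endurance rating in terabytes written
DAILY_HOST_WRITES_GB = 100         # what the host actually writes per day

def years_to_tbw(write_amplification: float) -> float:
    """Years until the TBW rating is used up at a given write amplification."""
    nand_writes_per_day_tb = DAILY_HOST_WRITES_GB / 1000 * write_amplification
    return TBW_RATING_TB / nand_writes_per_day_tb / 365

for waf in (1.5, 3.0, 5.0):
    print(f"Write amplification {waf:.1f}x: ~{years_to_tbw(waf):.1f} years to TBW")
# The same host workload chews through the NAND several times faster when the
# drive can't keep its mapping tables in local DRAM.
```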
The CPUs in my older devices struggle
They can cope until I’m ready to upgrade them too
One place I noticed instant slowdowns (and it was one of the things I specifically upgraded to 10GbE for) was between my NAS and the rest of the network. I tested the links between my desktops and the managed switch with iperf3, and they were within expectations, but the last hop to the NAS always came in below what I’d expected. I swapped cables to make sure that wasn’t the issue, but it turned out to be something I had no control over.
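For anyone curious, here’s roughly the kind of per-hop check I mean, sketched in Python around iperf3’s JSON output. The hostnames are placeholders, and each target needs iperf3 running in server mode:

```python
# A minimal sketch of a per-hop throughput check using iperf3's JSON output.
# Hostnames are placeholders, and each target needs `iperf3 -s` running on it.
import json
import subprocess

TARGETS = ["desktop2.lan", "nas.lan"]   # hypothetical hosts on each side of the switch

for host in TARGETS:
    result = subprocess.run(
        ["iperf3", "-c", host, "-t", "10", "-J"],   # -J requests JSON output
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    gbps = report["end"]["sum_received"]["bits_per_second"] / 1e9
    print(f"{host}: {gbps:.2f} Gbit/s received")
# In my case the desktop legs through the switch landed near line rate while
# the NAS came in lower, pointing at the NAS itself rather than the cabling.
```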
The CPU in my NAS enclosure simply wasn’t up to the task of sustained 10GbE performance. Even worse, the NAS has only a single 10GbE port, so I couldn’t use link aggregation to increase throughput and give every device a better shot at full-speed transfers. If I had a second 10GbE port, I probably would have set up SMB multichannel instead, to give my main desktop more bandwidth, but I couldn’t do either. And with the locked-down NAS platform, adding more 10GbE-capable ports would have cost a small fortune, putting that upgrade out of scope (for now).
I hadn’t realized how much bandwidth background processes were taking up
Time to trim the fat and host services away from my storage
Part of my upgrade to 10GbE was a new hardware firewall, chosen specifically for its high throughput with packet inspection switched on. That part went as expected and worked without issues once the switchover was done.
What I hadn’t bargained for was how many other things on my network were consuming bandwidth: self-hosted services on my NAS devices, remnants of programs I’d tested on my server, a litany of cruft on my desktops from too many years of experimenting, and other odds and ends.
That’s before I noticed how much broadcast traffic my smart TVs and a few other IoT devices were sending out. A few DNS sinkholes later, that issue was solved, but it had been eating a significant chunk of bandwidth until then. I’d simply assumed my 1GbE network was slow, or that my Wi-Fi was congested, when it was really (mostly) my fault for layering on services and not switching them off when they were no longer needed.
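If you want to find your own chatty devices, something like this scapy sketch will tally broadcast and multicast frames by source MAC. The interface name is a placeholder, and you’ll need capture privileges to run it:

```python
# Tally broadcast and multicast frames by source MAC for a minute to see which
# devices are the chattiest. Requires scapy and capture privileges; the
# interface name is a placeholder for your own NIC.
from collections import Counter
from scapy.all import sniff

talkers = Counter()

def tally(pkt):
    talkers[pkt.src] += 1          # count frames per source MAC address

sniff(
    iface="eth0",                  # placeholder interface name
    filter="ether broadcast or ether multicast",   # BPF filter applied in the kernel
    prn=tally,
    store=False,
    timeout=60,
)

for mac, count in talkers.most_common(10):
    print(f"{mac}: {count} broadcast/multicast frames in 60 seconds")
# Chatty smart TVs and IoT gadgets tend to float straight to the top of this list.
```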
One switch had overloading buffers
Thankfully, this was an easy fix
Bufferbloat can quietly wreck the performance of any network, but I hadn’t experienced it on my home network until I jumped up to 10GbE. I wasn’t able to send enough data through the slower network connections to overload the buffers on one of my network switches, which has a mix of 1GbE, 2.5GbE, and 10GbE ports. But the issue reared its ugly head once I upgraded and started sustained transfers to shift file storage to NAS locations.
Granted, it didn’t reduce the throughput of the bulk transfers by much, but it wrecked real-time communications and other applications sharing the network at the same time. The fix was to set up Quality of Service properly, so that bulk data gets deprioritized, and to add some scheduling so that those transfers and device backups all happen outside office hours, when nobody is in video meetings.
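A quick way to spot bufferbloat in the first place is to watch latency to your gateway while a bulk transfer is running; idle pings that balloon under load are the classic signature. This sketch assumes a placeholder gateway address and Linux or macOS ping output:

```python
# Rough bufferbloat check: sample round-trip times to the gateway, once while
# the network is idle and once during a bulk transfer, then compare the two.
# The gateway address is a placeholder, and the regex assumes Linux/macOS ping output.
import re
import statistics
import subprocess

GATEWAY = "192.168.1.1"            # placeholder gateway address

def ping_rtts_ms(samples: int = 20) -> list[float]:
    """Return individual round-trip times in milliseconds."""
    out = subprocess.run(
        ["ping", "-c", str(samples), GATEWAY],
        capture_output=True, text=True, check=True,
    ).stdout
    return [float(m) for m in re.findall(r"time=([\d.]+)", out)]

rtts = ping_rtts_ms()
print(f"median {statistics.median(rtts):.1f} ms, worst {max(rtts):.1f} ms")
# Idle pings of a millisecond or two that balloon into the tens or hundreds of
# milliseconds during a transfer are the classic bufferbloat signature.
```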
It just goes to show that upgrading one specification often introduces issues elsewhere
The jump to 10GbE has been a net positive for my home lab and home network. But I can’t say there were no teething problems in the switchover, and it exposed a few issues I hadn’t caught on my slower network. Would I have caught them beforehand if I were a trained professional? Perhaps, but there will always be edge cases that even the best-prepared plan can’t account for, and those will need rectifying once the new network is in place.