Stabilizing the Samsung 990 PRO on Linux: Disabling ASPM and APST to Stop NVMe Disconnects

Or: all that reseating, cleaning, and slot swapping for a kernel parameter fix.

The problem

I have a Samsung 990 PRO 4TB as the boot/rpool NVMe in my homelab server (bastion — ZFS everywhere, MicroVMs, 24/7 uptime). It started randomly disappearing. No warning, no graceful degradation — just gone. dmesg would light up with the NVMe controller giving up, followed by I/O errors on every operation that was in flight:

[204263.471182] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[204283.495338] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[204283.569360] I/O error, dev nvme0n1, sector 2623470496 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[204283.569365] I/O error, dev nvme0n1, sector 3026384936 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[204283.569369] I/O error, dev nvme0n1, sector 3025352880 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[204283.569369] I/O error, dev nvme0n1, sector 2623481976 op 0x1:(WRITE) flags 0x0 phys_seg 11 prio class 2
[204496.790887] systemd[1]: systemd-timesyncd.service: Watchdog timeout (limit 3min)!

Then swap would start failing because the device backing it was gone:

[248903.580981] Read-error on swap-device (254:1:78162624)
[248903.590087] Read-error on swap-device (254:1:78162632)
[248903.599150] Read-error on swap-device (254:1:78162640)
...

ZFS would notice the drive had vanished and mark the vdev as FAULTED. Game over until reboot.
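As a rough sketch of what the title's "kernel parameter fix" could look like in practice, the helper below checks a kernel command line for the two parameters commonly used to turn off PCIe ASPM and NVMe APST. The parameter names (`pcie_aspm=off`, `nvme_core.default_ps_max_latency_us=0`) and the function name `check_cmdline` are assumptions; this excerpt does not spell out the exact fix.

```shell
#!/bin/sh
# Report whether the kernel parameters commonly used to disable
# PCIe ASPM and NVMe APST appear in a kernel command line string.
# Assumption: these two parameters are the ones meant by the title;
# the excerpt itself does not name them.
check_cmdline() {
  cmdline=$1
  for param in pcie_aspm=off nvme_core.default_ps_max_latency_us=0; do
    # Pad with spaces so only whole, space-separated words match.
    case " $cmdline " in
      *" $param "*) echo "$param: set" ;;
      *)            echo "$param: missing" ;;
    esac
  done
}

# Usage on a live system:
#   check_cmdline "$(cat /proc/cmdline)"
```

Passing the command line in as an argument keeps the check usable against a saved boot configuration as well as the running kernel.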


Temp-Based HDD Fan Control on ASRock Rack X470D4U via IPMI and NixOS

Or: how I spent an hour turning "my drives feel warm" into a NixOS module that handles fan control over IPMI, and every dead end I hit along the way.

The problem

I have a homelab server with twelve 16TB HDDs in a ZFS array, plus the usual CPU/NVMe/case suspects. The whole thing lives in a Jonsbo N5 — a NAS case with a two-chamber layout. The bottom compartment holds the PSU, a 12-drive hot-swap backplane, and the fans that cool it. The top compartment holds the motherboard, CPU, and GPU. This means the HDD fans and CPU fans are in completely separate airflow zones, which is great for thermals but means the motherboard's fan curves (tuned for CPU temps) have no business controlling the drives below.

The bottom fans originally were those industrial "24/7 no PWM go brrr" fans that sound like a small jet engine. I replaced them with Noctuas. Much quieter. Possibly too quiet.

So I started wondering: are my HDDs cooking in there?

First check:

for d in /dev/sd?; do
  temp=$(sudo smartctl -A "$d" 2>/dev/null | awk '/Temperature_Celsius|Airflow_Temperature/ {print $10; exit}')
  echo "$d: ${temp:-N/A}°C"
done

Output:

/dev/sda: 33°C
/dev/sdb: 46°C
/dev/sdc: 47°C
/dev/sdd: 46°C
...

33 to 47°C is a wide spread, and 47 under idle-ish load means scrubs could push into uncomfortable territory. The ones in the middle of the drive cage (sdb-sdd, sdg) were clearly getting less airflow than the ones on the edges.
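To put a number on that spread, a small filter can summarize the loop's output. This is only a sketch: `temp_spread` is a hypothetical helper, and it assumes input lines in the `/dev/sdX: NN°C` format printed above, skipping any drive reported as `N/A`.

```shell
#!/bin/sh
# Summarize "/dev/sdX: NN°C" lines read from stdin:
# coolest drive, hottest drive, and the difference between them.
temp_spread() {
  awk -F': ' '
    $2 ~ /^[0-9]/ {                 # skip drives reported as N/A
      t = $2 + 0                    # "47°C" converts to the number 47
      if (n == 0 || t < min) { min = t; cold = $1 }
      if (n == 0 || t > max) { max = t; hot = $1 }
      n++
    }
    END {
      if (n > 0)
        printf "coolest %s %d, hottest %s %d, spread %d\n",
               cold, min, hot, max, max - min
    }'
}

# Usage: pipe the output of the temperature loop into the filter, e.g.
#   for d in /dev/sd?; do ...; done | temp_spread
```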

Time to do something about it.


I refused to give up Apple fonts when I switched to NixOS

I had a really slick KDE setup on Arch — custom theme, everything looking just right — and a big part of that was Apple's fonts. San Francisco for the UI, SF Mono in the terminal. On Arch this was easy: install apple-fonts from the AUR, done, move on with your life.

Then I switched to NixOS. No AUR. No prepackaged Apple fonts in nixpkgs (for obvious licensing reasons). But I was not about to redo my whole theme with different fonts. I needed these.

Apple hosts all of their fonts as DMG downloads on their developer site — no Apple ID required. I found a gist by robbins that already had a working Nix derivation for this. Grabbed it, plugged it in. Easy.


Easily set up Mullvad as an exit node for Tailscale using Docker

I'm a big fan of Tailscale. It's a great way to quickly and easily set up a VPN between all of my devices.

My home server is configured to only allow SSH connections over Tailscale, and I wanted a way to have that SSH connection running while also having external traffic be private with the help of Mullvad's VPN service.

Docker makes this easy by configuring the tailscale container to share the network stack of the mullvad container.
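The network-sharing idea can be sketched with Docker's `--network container:<name>` option, which makes a new container join another container's network namespace. The container name `mullvad` and the helper `tailscale_run_cmd` are assumptions for illustration, and the sketch leaves out everything else a real Tailscale container needs (authentication, persistent state, exit-node advertisement).

```shell
#!/bin/sh
# Assumption: a container named "mullvad" is already running the VPN client.
# The name is illustrative and can be overridden via the environment.
MULLVAD_CONTAINER=${MULLVAD_CONTAINER:-mullvad}

# Build (but do not run) the docker command for the tailscale container.
tailscale_run_cmd() {
  # --network container:<name> joins the named container's network
  # namespace, so this container's traffic leaves through the VPN.
  echo docker run -d --name tailscale \
    --network "container:${MULLVAD_CONTAINER}" \
    tailscale/tailscale
}

# Print the command for review; run it with: eval "$(tailscale_run_cmd)"
tailscale_run_cmd
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without Docker.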
