Stabilizing the Samsung 990 PRO on Linux: Disabling ASPM and APST to Stop NVMe Disconnects

Or: all that reseating, cleaning, and slot swapping for a kernel parameter fix.

The problem

I have a Samsung 990 PRO 4TB as the boot/rpool NVMe in my homelab server (bastion — ZFS everywhere, MicroVMs, 24/7 uptime). It started randomly disappearing. No warning, no graceful degradation — just gone. dmesg would light up with the NVMe controller giving up, followed by I/O errors on every operation that was in flight:

[204263.471182] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[204283.495338] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
[204283.569360] I/O error, dev nvme0n1, sector 2623470496 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[204283.569365] I/O error, dev nvme0n1, sector 3026384936 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[204283.569369] I/O error, dev nvme0n1, sector 3025352880 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[204283.569369] I/O error, dev nvme0n1, sector 2623481976 op 0x1:(WRITE) flags 0x0 phys_seg 11 prio class 2
[204496.790887] systemd[1]: systemd-timesyncd.service: Watchdog timeout (limit 3min)!

Then swap would start failing because the device backing it was gone:

[248903.580981] Read-error on swap-device (254:1:78162624)
[248903.590087] Read-error on swap-device (254:1:78162632)
[248903.599150] Read-error on swap-device (254:1:78162640)
...

ZFS would notice the drive had vanished and mark the vdev as FAULTED. Game over until reboot.
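Before blaming the hardware, it helps to see whether the power-saving features named in the title are already turned off. The sketch below assumes the usual kernel parameters for this (`nvme_core.default_ps_max_latency_us=0` to disable APST and `pcie_aspm=off` to disable ASPM); the excerpt above does not show the exact values used, and `check_cmdline` is a hypothetical helper name.

```shell
#!/bin/sh
# Report whether the (assumed) APST/ASPM kernel parameters are on the command line.
# With no argument it inspects the running kernel via /proc/cmdline.
check_cmdline() {
  cmdline=${1:-$(cat /proc/cmdline)}
  for p in nvme_core.default_ps_max_latency_us=0 pcie_aspm=off; do
    case " $cmdline " in
      *" $p "*) echo "$p: set" ;;
      *)        echo "$p: missing" ;;
    esac
  done
}

# The live values can also be read from sysfs on kernels that expose them:
#   cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
#   cat /sys/module/pcie_aspm/parameters/policy
```

Running `check_cmdline` with no arguments checks the current boot; passing a string lets you check a bootloader entry before rebooting into it.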


Temp-Based HDD Fan Control on ASRock Rack X470D4U via IPMI and NixOS

Or: how I spent an hour turning "my drives feel warm" into a NixOS module that handles fan control over IPMI, and every dead end I hit along the way.

The problem

I have a homelab server with twelve 16TB HDDs in a ZFS array, plus the usual CPU/NVMe/case suspects. The whole thing lives in a Jonsbo N5 — a NAS case with a two-chamber layout. The bottom compartment holds the PSU, a 12-drive hot-swap backplane, and the fans that cool it. The top compartment holds the motherboard, CPU, and GPU. This means the HDD fans and CPU fans are in completely separate airflow zones, which is great for thermals but means the motherboard's fan curves (tuned for CPU temps) have no business controlling the drives below.

The bottom fans were originally those industrial "24/7 no PWM go brrr" fans that sound like a small jet engine. I replaced them with Noctuas. Much quieter. Possibly too quiet.

So I started wondering: are my HDDs cooking in there?

First check:

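# Print the SMART temperature of every /dev/sdX disk.
# Column 10 of `smartctl -A` is the raw attribute value, here degrees Celsius.
for d in /dev/sd?; do
  temp=$(sudo smartctl -A "$d" 2>/dev/null | awk '/Temperature_Celsius|Airflow_Temperature/ {print $10; exit}')
  echo "$d: ${temp:-N/A}°C"   # N/A when the drive reports no temperature attribute
done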

Output:

/dev/sda: 33°C
/dev/sdb: 46°C
/dev/sdc: 47°C
/dev/sdd: 46°C
...

33 to 47°C is a wide spread, and 47°C under idle-ish load means a scrub could push the warmest drives into uncomfortable territory. The drives in the middle of the cage (sdb-sdd, sdg) were clearly getting less airflow than the ones on the edges.

Time to do something about it.
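The core of temperature-based control is small: take the hottest drive and map its temperature to a fan duty cycle. Here is a minimal sketch of that mapping. The curve endpoints (30% at 35°C, 100% at 50°C) and the helper names `max_temp` and `duty_for_temp` are made up for illustration, and the board-specific IPMI command that would apply the duty cycle is deliberately left out.

```shell
#!/bin/sh
# Read one integer temperature per line on stdin, print the highest.
max_temp() {
  sort -n | tail -n 1
}

# Map a temperature in °C to a fan duty cycle in percent.
# Hypothetical curve: 30% at or below 35°C, 100% at or above 50°C,
# linear in between (integer arithmetic, rounds down).
duty_for_temp() {
  t=$1
  if [ "$t" -le 35 ]; then
    echo 30
  elif [ "$t" -ge 50 ]; then
    echo 100
  else
    echo $(( 30 + (t - 35) * 70 / 15 ))
  fi
}
```

With the readings above, `printf '33\n46\n47\n46\n' | max_temp` gives 47, and `duty_for_temp 47` gives 86. Driving the fans from the hottest drive rather than the average keeps the middle of the cage from being ignored.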


I refused to give up Apple fonts when I switched to NixOS

I had a really slick KDE setup on Arch — custom theme, everything looking just right — and a big part of that was Apple's fonts. San Francisco for the UI, SF Mono in the terminal. On Arch this was easy: install apple-fonts from the AUR, done, move on with your life.

Then I switched to NixOS. No AUR. No prepackaged Apple fonts in nixpkgs (for obvious licensing reasons). But I was not about to redo my whole theme with different fonts. I needed these.

Apple hosts all of their fonts as DMG downloads on their developer site — no Apple ID required. I found a gist by robbins that already had a working Nix derivation for this. Grabbed it, plugged it in. Easy.


Easily set up Mullvad as an exit node for Tailscale using Docker

I'm a big fan of Tailscale. It's a great way to quickly and easily set up a VPN between all of my devices.

My home server is configured to only allow SSH connections over Tailscale, and I wanted a way to have that SSH connection running while also having external traffic be private with the help of Mullvad's VPN service.

Docker makes this easy by configuring the tailscale container to share the network stack of the mullvad container.
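In plain `docker run` terms, sharing a network stack is done with `--network container:<name>`. The sketch below only illustrates that one idea and is not the setup from the full post: the Mullvad image name is a placeholder, the container names are arbitrary, and the `run` wrapper prints commands instead of executing them unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
VPN_IMAGE=${VPN_IMAGE:-example/mullvad-wireguard}  # hypothetical image name
TS_IMAGE=${TS_IMAGE:-tailscale/tailscale}

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

start_stack() {
  # VPN container owns the network namespace and the tunnel.
  run docker run -d --name mullvad \
    --cap-add NET_ADMIN --device /dev/net/tun "$VPN_IMAGE"
  # Tailscale joins that namespace, so its outbound traffic leaves via the VPN.
  run docker run -d --name tailscale \
    --network container:mullvad \
    -e TS_EXTRA_ARGS=--advertise-exit-node "$TS_IMAGE"
}
```

Calling `start_stack` prints the two commands for review; `DRY_RUN=0 start_stack` runs them. Authentication (for example an auth key passed to the Tailscale container) is omitted here.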


Patching ACPI DSDT to enable S3 sleep on the Eluktronics MECH-15 G3

The Eluktronics MECH-15 G3 (Ryzen 9 5900HX / RTX 3070) ships with Windows-oriented S0ix "modern standby" as the only sleep mode in its ACPI tables. This is a problem on Linux, because modern standby is not really sleep — it's more like your laptop pretending to be asleep while it slowly drains your battery in its bag. I closed the lid, threw it in my well-insulated laptop bag, and pulled out a scorching hot potato with 40% less battery an hour later. Not ideal.

A quick primer on the sleep states, from worst to best:

S0ix/s2idle ("modern standby") — system stays in S0 but enters low-power idle substates where the SoC powers down components while keeping RAM alive. In theory this gives fast wake and background network activity. In practice, on Linux, it often falls back to plain s2idle (freeze userspace, idle CPU, pray) and drains like the laptop is awake.

S3 ("suspend-to-RAM") — everything powers off except RAM. The classic. 3-5% battery drain overnight. This is what we want.

S4 ("hibernate") — state saved to disk, full power off. Survives pulling the battery. Zero drain.

The 5900HX supports S3 just fine, but the stock DSDT (the ACPI table that tells the OS what the hardware can do) simply doesn't advertise it. The firmware vendor only bothered to declare S0ix because that's all Windows uses on modern laptops. Linux looks at the DSDT, sees no S3, and gives you s2idle. Thanks.
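You can see what the kernel concluded from the DSDT by reading `/sys/power/mem_sleep`, where `deep` corresponds to S3 and the bracketed entry is the active mode. A small sketch that interprets that file follows; `s3_status` is a hypothetical helper name, and the optional argument exists only so the logic can be tried without touching sysfs.

```shell
#!/bin/sh
# Interpret the contents of /sys/power/mem_sleep.
# Typical values: "[s2idle]", "[s2idle] deep", "s2idle [deep]".
s3_status() {
  modes=${1:-$(cat /sys/power/mem_sleep 2>/dev/null)}
  case " $modes " in
    *"[deep]"*) echo "S3 available and selected" ;;
    *" deep "*) echo "S3 available, not selected" ;;
    *)          echo "S3 not advertised" ;;
  esac
}
```

On a machine whose firmware only declares S0ix, `s3_status` should report "S3 not advertised", which matches the behavior described above.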

The fix: lie to the kernel. Patch the DSDT to add the S3 declaration that should have been there all along. Rasmus Moorats documented the same technique for the RedmiBook 16 — his writeup covers the full background and an alternative loading method via cpio/initrd for non-GRUB bootloaders like systemd-boot.

The steps below are for GRUB2, which is what I was using at the time.
