Thursday, November 24, 2016

"A start job is running for dev-disk-by" and other horrors

My desktop system at home is currently running openSUSE Tumbleweed.  It used to run Debian until an upgrade left me stuck in a version-lock disaster.  Then it ran OpenSolaris until Oracle made that an unlikely proposition.

I've been reasonably happy with openSUSE until I recently made the mistake of running "zypper dup", and the next reboot showed me this:

[***   ] A start job is running for dev-disk-by\x2duuid-1a0dc1c5\x2d26cc\x2d45ff\x2da7b1\x2d1f827c971ff9.device (15s / no limit)

No matter how long I sat there and watched, it never completed whatever task it was trying to perform.

I was able to boot up with a rescue CD (thank goodness I downloaded that first), and was able to mount the disks with no trouble.  But no amount of fooling around would make it boot.  It was not a happy evening.

After quite a bit of digging, I discovered two serious problems that I had to fix manually.  I'm writing this up for anyone who runs into something similar:

1. dracut is missing bits

Crucial kernel drivers go missing when dracut builds a new initrd image, and other bits get included whether you like it or not.  I have a mix of file systems in use, and here are the new configuration bits I had to add to a file in /etc/dracut.conf.d:

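# dracut concatenates += values verbatim, so keep the leading and
# trailing spaces or adjacent names run together (raid6_pqmd-mod).
add_dracutmodules+=" btrfs "
add_drivers+=" btrfs zlib_deflate xor raid6_pq "
add_drivers+=" md-mod raid1 raid456 "
omit_drivers+=" nouveau "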

That it would exclude the RAID and btrfs drivers by default was very surprising.
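
The configuration only takes effect once the initrd is rebuilt, and it is worth checking that the drivers really landed in the image before rebooting.  Here is a rough sketch of such a check, not something from my original fix: missing_drivers is a hypothetical helper name I made up, and I'm assuming the stock dracut tools "dracut" and "lsinitrd" are installed.  It reads an initrd file listing on stdin and prints every requested driver whose .ko file is absent.

```shell
# Hypothetical helper (not part of dracut): print each driver named on
# the command line whose module file is missing from the listing on stdin.
missing_drivers() {
    listing=$(cat)
    for drv in "$@"; do
        # The kernel treats '-' and '_' in module names as equivalent,
        # so accept either spelling in the file name.
        pat=$(printf '%s' "$drv" | sed 's/[-_]/[-_]/g')
        # Module files may be compressed (.ko.xz, .ko.gz, .ko.zst).
        if ! printf '%s\n' "$listing" | grep -Eq "/${pat}\.ko(\.[a-z0-9]+)?\$"; then
            printf '%s\n' "$drv"
        fi
    done
}

# Assumed usage, as root, after editing /etc/dracut.conf.d:
#   dracut --force
#   lsinitrd | missing_drivers btrfs md-mod raid1 raid456 raid6_pq xor zlib_deflate
```

If the last command prints nothing, every listed driver was found in the image.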

2. udevd is broken by default

The default configuration of udevd simply doesn't work right.  It limits itself to an absurdly tiny number of tasks, and ends up failing to run trivial scripts needed by the Linux "MD" disk subsystem.  That was a big part of my boot problem.  The solution is to create a file named /etc/systemd/system/systemd-udevd.service with this inside:

[Unit]
Description=udev Kernel Device Manager
Documentation=man:systemd-udevd.service(8) man:udev(7)
DefaultDependencies=no
Wants=systemd-udevd-control.socket systemd-udevd-kernel.socket
After=systemd-udevd-control.socket systemd-udevd-kernel.socket systemd-sysusers.service
Before=sysinit.target
ConditionPathIsReadWrite=/sys

[Service]
Type=notify
OOMScoreAdjust=-1000
Sockets=systemd-udevd-control.socket systemd-udevd-kernel.socket
Restart=always
RestartSec=0
ExecStart=/usr/lib/systemd/systemd-udevd
MountFlags=slave
KillMode=mixed
WatchdogSec=3min
TasksMax=infinity

The important part is that "TasksMax=infinity" line.  That's what fixes the system so that it will actually boot again.
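
A lighter-weight alternative, which is not what I did, is a systemd drop-in: systemd merges files under a unit's ".service.d" directory into the vendor unit, so only the changed setting needs to be written down.  A minimal sketch follows; the function name write_udevd_dropin and the file name tasksmax.conf are my own inventions, and the optional argument is a root prefix such as /mnt for when the disks are mounted from a rescue CD.

```shell
# Hypothetical helper: write a drop-in that lifts the task limit on udevd.
# $1 is an optional root prefix (e.g. /mnt from a rescue CD); empty means /.
write_udevd_dropin() {
    root=${1:-}
    dir="$root/etc/systemd/system/systemd-udevd.service.d"
    mkdir -p "$dir" || return 1
    # File name is arbitrary; systemd reads every *.conf in this directory.
    cat > "$dir/tasksmax.conf" <<'EOF'
[Service]
TasksMax=infinity
EOF
}

# Assumed usage on the running system, as root:
#   write_udevd_dropin
#   systemctl daemon-reload
#   systemctl show systemd-udevd.service -p TasksMax
```

The advantage over a full copy is that later changes to the vendor unit are picked up automatically, while the TasksMax override stays in place.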
