zunzuncito

For whatever reason I’ve been uncovering software bugs at an unprecedented rate in the past 10 days. This is by no means a bad thing, as I enjoy hunting down and fixing bugs, but it does mean that the additional overhead of drafting a post about each one becomes a bit too much. So instead here’s a quick overview - the linked patches and merge requests have more information, if you are interested.

Trash size calculation in KIO

I noticed this one pretty much right after starting to use Dolphin but did not end up looking into it until quite a bit later: when displaying the size of the items in the trash, the application would always show 0 bytes. This would also cause the automated cleanup of items to fail - Dolphin simply believed that the trash was empty.

KDE uses the KIO framework to provide management of the trash. A recent commit had changed the construction of a QDirIterator in a way that would make it ignore all items when iterating over the trash directory. Thankfully the fix was straightforward and it was merged quickly.

git-shortlog(1) segfaults outside of a git repository

This one I uncovered as I was writing a small script to give me an overview of commit authors in all the git repositories I had cloned locally. I was happily scanning through my source directory using the --author flag for git-shortlog(1) to generate this, fully expecting git to complain about the few non-git directories I had. Instead of complaints, however, I got a segfault.

Turns out that a change back in May stopped setting SHA1 as the default object hash. This was done to progress the slow-moving transition to stronger hash functions but inadvertently broke git-shortlog(1) whose argument parsing machinery expected a default hash algorithm to be set. I sent a patch upstream.
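A rough sketch of the kind of scan described above, with a guard that skips anything that is not a git work tree. The helper names and the example path are made up for illustration; this is not the original script.

```bash
#!/usr/bin/env bash
# Sketch only: is_git_repo and authors_overview are hypothetical helper names.

# Succeeds only if the given directory is inside a git work tree.
is_git_repo() {
    git -C "$1" rev-parse --is-inside-work-tree >/dev/null 2>&1
}

# Prints a commit count per author for every repository directly below ROOT.
authors_overview() {
    local root=$1 dir
    for dir in "$root"/*/; do
        is_git_repo "$dir" || continue
        printf '== %s\n' "$dir"
        # Naming HEAD explicitly keeps shortlog from waiting on stdin when
        # the script is not attached to a terminal.
        git -C "$dir" shortlog --summary --numbered HEAD 2>/dev/null || true
    done
}

# Example call (the path is an assumption):
#   authors_overview "$HOME/src"
```

Checking for a work tree first means non-git directories are skipped quietly instead of being handed to git-shortlog(1) at all.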

An infinite loop in plasmashell

I regularly use the Activities functionality in Plasma 6 and switch through my activities using Plasma’s built-in activity manager. A couple of days ago I managed to make plasmashell, the provider for Plasma’s desktop and task bar, freeze - I had hit the “up arrow” key in the activity filter text box when there were no results visible. This was perfectly reproducible, so I went to investigate.

The cause of the issue was a do-while construct not handling a specific sentinel value, making it loop infinitely. For this one I also opened a merge request upstream.
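To illustrate the class of bug, here is a hypothetical sketch, not the plasmashell code. It searches backwards through a wrapping list for the previous visible item. Bash has no do-while, so the loop is emulated with the exit checks at the end of the body. The function name, the 0/1 visibility flags, and the -1 sentinel are all assumptions of mine.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: prev_visible START FLAG...
# FLAG is 1 for a visible item and 0 for a hidden one. START must be a
# valid index. Prints the index of the previous visible item, wrapping
# around, or -1 if no item is visible.
prev_visible() {
    local start=$1; shift
    local -a visible=("$@")
    local n=${#visible[@]}
    if (( n == 0 )); then printf '%s\n' -1; return; fi

    local i=$start
    while true; do
        i=$(( (i - 1 + n) % n ))            # step back, wrapping at the top
        if (( visible[i] )); then
            printf '%s\n' "$i"
            return
        fi
        # Without this guard the loop never ends when nothing is visible.
        if (( i == start )); then
            printf '%s\n' -1
            return
        fi
    done
}

prev_visible 0 0 0 0   # nothing visible: prints -1 instead of hanging
```

The important part is the second check: a search that wraps around needs some way to notice that it has come full circle.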

About a week ago I noticed that fd(1), a Rust-based alternative to find(1), would suddenly segfault on my musl-based server system. Usually a segfault is nothing particularly special to my eyes, but this one was different. Even just having fd(1) attempt to print its help text was enough to trigger it, and when I attempted to debug it with gdb(1), I saw the following:

(gdb) run
Starting program: /usr/bin/fd

Program received signal SIGSEGV, Segmentation fault.
memcpy () at ../src_musl/src/string/x86_64/memcpy.s:18
warning: 18	../src_musl/src/string/x86_64/memcpy.s: No such file or directory
(gdb) bt
#0  memcpy () at ../src_musl/src/string/x86_64/memcpy.s:18
#1  0x00007ffff7ab7177 in __copy_tls () at ../src_musl/src/env/__init_tls.c:66
#2  0x00007ffff7ab730d in static_init_tls () at ../src_musl/src/env/__init_tls.c:149
#3  0x00007ffff7aae89d in __init_libc () at ../src_musl/src/env/__libc_start_main.c:39
#4  0x00007ffff7aae9c0 in __libc_start_main () at ../src_musl/src/env/__libc_start_main.c:80
#5  0x00007ffff74107f6 in _start ()

So… the segfault is in musl, not in fd!?

I immediately checked whether other basic programs on the system worked. They did. I checked when I last updated musl. A couple of months ago, so that can’t be it. I checked specifically whether another Rust-based program worked. It did.

fd(1) had been updated pretty recently, and I remembered it working correctly about a month ago, so maybe something specific to fd(1)’s usage of Rust triggered this segfault in musl? I wanted to make sure I could reproduce this in a development environment, so I cloned the fd(1) repository, made a debug build, and ran it…

It worked. Huh!?

I decided it was likely that portage, Gentoo’s package manager, was building the program differently, so I took care to apply the same build flags to the development build. And what can I say:

error: failed to run custom build command for `crossbeam-utils v0.8.20`

Caused by:
  process didn't exit successfully: `fd/target/[...]/build-script-build`
      (signal: 11, SIGSEGV: invalid memory reference)

… it didn’t even get to build the fd binary proper. A segfault again, too. What on earth was going on? Why didn’t this also happen in the portage build?

Thankfully I now had a reproducer, so I did the only sensible thing and started removing random build flags until I got fd to build again. This was our culprit:

-Wl,-z,pack-relative-relocs
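Removing flags by hand works, but the same idea can be sketched as a small loop: run the build with every flag but one and report the flags whose removal makes it succeed. The function name and the check command are hypothetical; a real check would wrap whatever build invocation is failing.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: find_culprits CHECK FLAG...
# CHECK is a command or function that receives the remaining flags as
# arguments and returns 0 when the build succeeds.
find_culprits() {
    local check=$1; shift
    local -a flags=("$@")
    local i
    for i in "${!flags[@]}"; do
        # Every flag except the one at index i.
        local -a rest=("${flags[@]:0:i}" "${flags[@]:i+1}")
        if "$check" "${rest[@]}" >/dev/null 2>&1; then
            printf '%s\n' "${flags[i]}"
        fi
    done
}

# Example call, assuming a function try_build that runs the real build
# with the given flags:
#   find_culprits try_build -O2 -pipe -Wl,-z,pack-relative-relocs
```

This only finds single flags that break the build on their own; combinations of flags would need a proper bisection.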

Already pretty out of my depth considering the fact that I couldn’t fathom how fd(1) got musl to segfault on memcpy, I now also found that a piece of the puzzle required me to understand specific linker flags. Oof.

Unsure what to do next I decided on a whim to compare the working and the broken binary with readelf(1). The most obvious difference was that the working binary had its .rela.dyn relocation section populated with entries whilst the broken one was missing .rela.dyn but had .relr.dyn instead. At a loss, I stopped and went to do something else.

The story would probably have ended here had I not mentioned this conundrum to my partner later in the day. We decided to have another look at the binaries. After some discussion we determined that the working binary was dynamically linked whilst the broken one wasn’t. The other working Rust-based program, rg(1), was also dynamically linked and had been built a while ago, so at some point portage must have stopped producing Rust executables that were dynamically linked. Finally some progress!
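For reference, the two checks we did by eye can be sketched with readelf(1) as follows. The helper names are mine, and the grep patterns assume the usual binutils output format.

```bash
#!/usr/bin/env bash
# Sketch only: reloc_sections and has_interpreter are hypothetical names.

# Prints which of .rela.dyn / .relr.dyn appear in the section headers.
reloc_sections() {
    readelf --section-headers --wide "$1" | grep -oE '\.rel[ar]\.dyn' | sort -u
}

# Succeeds if the binary requests a program interpreter (the INTERP
# program header), which is what a dynamically linked executable does.
has_interpreter() {
    readelf --program-headers --wide "$1" | grep -q 'INTERP'
}

# Example calls (paths are assumptions):
#   reloc_sections /usr/bin/fd
#   has_interpreter /usr/bin/rg && echo "dynamically linked"
```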

At this point we need some background. Early on, Rust decided to use the x86_64-unknown-linux-musl target to provide statically-linked binaries that would run on a wide range of systems. Whilst support for dynamically linked executables on musl systems was added back in 2017, the default behaviour was never changed, so Gentoo has to make sure to disable static linking by passing the target-feature=-crt-static flag.

It does this in a system-wide fashion by setting an environment variable in /etc/env.d:

$ cat /etc/env.d/50rust-bin-1.80.1
LDPATH="/usr/lib/rust/lib"
MANPATH="/usr/lib/rust/man"
CARGO_TARGET_X86_64_UNKNOWN_LINUX_MUSL_RUSTFLAGS="-C target-feature=-crt-static"

This setting should therefore be picked up by portage as well, but when I examined its build environment it was simply not there. So finally we come to the last piece of the puzzle: a recent change in how RUSTFLAGS are set within portage. Here’s the important part:

local -x CARGO_TARGET_"${TRIPLE}"_RUSTFLAGS="-C strip=none -C linker=${LD_A[0]}"
[[ ${#LD_A[@]} -gt 1 ]] && local CARGO_TARGET_"${TRIPLE}"_RUSTFLAGS+="$(printf -- ' -C link-arg=%s' "${LD_A[@]:1}")"
local CARGO_TARGET_"${TRIPLE}"_RUSTFLAGS+=" ${RUSTFLAGS}"

Quoth the bash(1) manual:

Local variables “shadow” variables with the same name declared at previous scopes. For instance, a local variable declared in a function hides a global variable of the same name: references and assignments refer to the local variable, leaving the global variable unmodified.

Whereas previously the target-specific RUSTFLAGS environment variable was only touched when cross-compiling, it was now always overridden, so the value from /etc/env.d never reached the build. To confirm, I edited the file in question to include the previous value, and both fd(1) and rg(1) worked again. Success!
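The shadowing is easy to reproduce in isolation. A minimal sketch, using a made-up variable name (DEMO_RUSTFLAGS) in place of the real target-specific one, and made-up function names:

```bash
#!/usr/bin/env bash
# Stand-in for the value that /etc/env.d exports system-wide.
export DEMO_RUSTFLAGS="-C target-feature=-crt-static"

# Mimics the problematic pattern: a fresh local of the same name hides
# the exported value for this function and every child it starts.
build_shadowed() {
    local -x DEMO_RUSTFLAGS="-C strip=none"
    bash -c 'printf "%s\n" "$DEMO_RUSTFLAGS"'
}

# One possible remedy: capture the outer value first, then append it.
build_preserving() {
    local outer=${DEMO_RUSTFLAGS}
    local -x DEMO_RUSTFLAGS="-C strip=none ${outer}"
    bash -c 'printf "%s\n" "$DEMO_RUSTFLAGS"'
}

build_shadowed     # prints: -C strip=none
build_preserving   # prints: -C strip=none -C target-feature=-crt-static
```

The child process started inside build_shadowed never sees the system-wide flag, which matches what I found in portage’s build environment.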

This whole saga was also reported to the Gentoo bug tracker and promptly fixed. A project for another day is figuring out exactly how a change from static linking to dynamic linking causes segfaults like this, because I sure would love to know the details.

For the last couple of months I have been running sway on my main desktop system after having been forced away from hikari because of its practically halted development and incompatibility with newer wlroots versions.

I never felt completely satisfied with it and the whole experience was rather joyless, so about a week ago I decided to give KDE Plasma 6 a try after a surprisingly decent experience on the KDE Neon live image.

Whilst undoubtedly greater in its complexity and code size than sway, to me Plasma 6 seems like one of the last decent desktop environments still remaining. It’s incredibly customisable (but still comes with good defaults), looks nice out of the box, and most importantly seems to care about providing a nicely integrated and featureful experience. This even includes a companion app on Android, KDE Connect. It remains to be seen whether it will fully convince me in the long run, but for now I am very satisfied with it.

[Image: the KDE Plasma 6 desktop environment, with a browser window, a terminal, and an instance of Dolphin, a file manager. Caption: KDE Plasma 6 with a few windows open]

This last week was mostly spent learning about the desktop environment and setting everything up exactly how I want it to be, but there were two notable bugs to squash as well.

The first one reared its ugly head once I enabled backwards-compatibility with Qt5-based apps. I have a couple of such apps still, most prominently Mumble and Quassel IRC. Once the latter was built against the KFramework libraries, no more notifications were shown…

Fixing this ended up taking about two days, most of which were spent discovering exactly how KNotifications work. KDE provides apps with a tighter integration with the notification service, allowing users to specify which types of notifications to show, and how. Applications specify their notifications by shipping an <app>.notifyrc file. KDE ties this file to the application by matching its base name to the name given to the application (usually through a call to QCoreApplication::applicationName or when creating KAboutData).

It turns out that Quassel had recently been patched to fix an issue where desktop environments did not show its icon correctly. This required a call to setDesktopFileName in KAboutData to make environments aware of the connection. However, Quassel’s application name was changed in the same commit, severing its link with the name given through its quassel.notifyrc file. This seems to have been done in addition to the setDesktopFileName call and was not necessary to solve the issue the commit was trying to address.

I prepared a pull request fixing this issue by reverting part of the offending commit.

[Image: a notification from Quassel IRC saying 'yay for notifications'. Caption: Glad to have these back]

The second bug I randomly came across whilst perusing journalctl and seeing the following error from Dolphin, KDE’s file manager:

QString(View)::contains(): called on an invalid QRegularExpression object
(pattern is '\A(?:file:///home/wolf/[Z-A]/?)\z')

Seeing this immediately made me wonder whether Dolphin plugs a URL straight into a regular expression without escaping it, and the answer, of course, is yes. I spent most of this afternoon hunting this issue down and preparing a merge request that fixes it in an elegant way.
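The same class of bug is easy to demonstrate in bash, whose =~ operator treats an unquoted expansion as regex syntax and a quoted one as a literal string. This is only an illustration of the escaping problem, not Dolphin’s code or the fix in the merge request, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Unsafe: the directory part is interpreted as a regular expression, so a
# folder called [Z-A] turns into a reversed (invalid) character range.
matches_unescaped() {
    local url=$1 dir=$2
    [[ $url =~ ^${dir}/?$ ]]
}

# Safer: quoting the expansion makes bash match it character by character.
matches_literal() {
    local url=$1 dir=$2
    [[ $url =~ ^"${dir}"/?$ ]]
}

url='file:///home/wolf/[Z-A]'
matches_unescaped "$url" "$url" || echo "unescaped: no match"
matches_literal   "$url" "$url" && echo "literal: match"
```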

I have a pretty extensive music library that I manage with MPD, the Music Player Daemon. For the longest time now I have also been aware of beets, another management system for music libraries. I played around with it a few times but never took the plunge to have it organize my entire collection.

A few days ago, whilst looking up a particularly obscure recording, I ended up finding it on MusicBrainz and decided to give beets, which integrates very tightly with that service, another serious try.

Yesterday I finally completed a first rough import of my entire library (which encompasses about 20,000 songs in 1,400 albums). Given the integration with MusicBrainz, I now try to map every album to a release in their database. If I can’t find it there, I instead fall back to an old favourite of mine, Discogs. beets will automatically update and correct any tags once I select the right release.

Whilst importing I decided that I should make more use of the “Grouping” tag as a way to organize albums into an arbitrary group. This is useful if a series of media features music that was composed by multiple artists. By matching on the Haibane Renmei grouping, for example, I can find all music that was made for that show, without having to keep artist names in mind.

“Grouping” seemed well-supported in MPD, but whilst updating some albums that I (sadly) only have in MP3 format, I found that MPD would not add the grouping information to its database.

As per the ID3v2.4 standard, the TIT1 frame is used for this kind of information in MP3 files. Sure enough, that tag was set correctly by beets, and both mutagen-inspect and ffprobe found it. MPD, however, refused to pick it up, even though this PR had been merged almost 3 years ago.

After having the #mpd IRC channel sanity-check my configuration, I investigated some more. Perhaps my version of libid3tag was outdated. It wasn’t. Perhaps there were some encoding issues, but then why would other tags from the same file work fine? Couldn’t be that either. I hooked up GDB and found that this line from the PR was never actually reached at all!

I decided to look a bit closer at how exactly MPD reads tags. The specific scan_id3_tag function that the PR modified is only called in two places, plugins/DsdLib.cxx and (indirectly) in plugins/MadDecoderPlugin.cxx. I had neither of these decoders installed, so… MPD just never got to read anything.

Yet how was I getting any tags, then?

After some spelunking in the decoder plugin folders, and bearing in mind that the only decoder I had actually compiled in was FFmpeg, something dawned on me. Perhaps it was FFmpeg that was reading the tags.

Indeed it was. Turns out that FFmpeg does all of the heavy lifting here, and MPD really just asks it for any metadata and parses the ones it understands.

MPD uses “grouping” as a cross-format identifier for grouping information. It expects that particular string to be a key in the AVDictionary returned by FFmpeg. Crucially, FFmpeg does not expose TIT1 as “grouping” in its metadata conversion table, which makes MPD drop TIT1 on the floor like a hot potato.
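The mechanism can be sketched with a tiny lookup table. The table below is a made-up three-entry subset, not FFmpeg’s actual conversion table, and the helper names are hypothetical. It assumes that frames without a mapping keep their raw frame ID as key, and that the consumer only recognises generic key names.

```bash
#!/usr/bin/env bash
# Illustrative subset of an ID3v2 frame to generic key mapping (bash 4+).
declare -A CONV=( [TIT2]=title [TPE1]=artist [TALB]=album )

# Prints the generic key for a frame, or the raw frame ID if unmapped.
convert_key() {
    local frame=$1
    printf '%s\n' "${CONV[$frame]:-$frame}"
}

# Stands in for a player that only understands generic key names.
consumer_accepts() {
    case $1 in
        title|artist|album|grouping) return 0 ;;
        *) return 1 ;;
    esac
}

key=$(convert_key TIT1)
consumer_accepts "$key" || echo "dropped: $key"

# Adding the missing mapping is all it takes for the tag to get through.
CONV[TIT1]=grouping
key=$(convert_key TIT1)
consumer_accepts "$key" && echo "kept: $key"
```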

It is debatable where this particular bug should be fixed. I decided to send a patch upstream to FFmpeg, given that more than just MPD can benefit from a fix there. For the next poor soul I also prepared a PR that clarifies how exactly MPD reads metadata.