Posts by ongardie

This post is about my experience running a Linux desktop virtual machine on a Linux host. I used the VM interactively, often in fullscreen mode, to develop software and run web apps (mostly Slack and Google Docs). My experience wasn't great, as I ran into many challenges.

My previous post on running VMs discussed running bare-bones Linux VMs and getting SSH access to them. I usually do that for testing, where the VMs are light-weight, shortlived, and disposable. It turns out that long-lived desktop VMs have more challenging requirements and come with a whole new set of issues.

VM setup

Following a virt-install, I did many of the tasks I normally do for setting up a Linux host (see my configs repo), including setting the hostname, setting the timezone, configuring APT, installing packages, and configuring other software.

I renamed the username and group from debian to diego, while logged in as root:

usermod -d /home/diego -m debian
usermod -c 'Diego' -l diego debian
groupmod -n diego debian

I could have created a new user instead, but I guess I like having uid 1000.

CPU and RAM

For CPU, I typically assigned 4 vCPUs to the VM. Changing this value requires restarting the VM. Just because the VM has a vCPU doesn't mean it's using it, so it might be reasonable to set this to half or more of the host CPUs. I think 4 vCPUs is the bare minimum needed for software development with VS Code and Rust, and I probably should have allocated more.

For RAM, virt-manager has a field for the maximum allocation and another for the current allocation. Changing the maximum allocation requires restarting the VM, but changing the current allocation up to the maximum can happen at runtime.

I created an additional swap disk to relieve memory pressure. Using a separate disk image was convenient because I didn't have to copy it every time when moving the VM to a different host. I still copied the swap image the first time because the VM's /etc/fstab referred to the swap partition's PARTUUID.

I also set browser.tabs.unloadOnLowMemory to true in Firefox, which may help with memory pressure.

Display

For video settings, I tried a few different options. My main problem was that I encountered serious rendering glitches with Google Docs in Firefox. I ultimately used Virtio with 3D Acceleration on and a display type of Spice Server with OpenGL on. However, I set gfx.canvas.accelerated to false in Firefox to work around the Google Docs rendering glitches.

I wanted the VM to be able to run in both windowed and fullscreen modes, which for my widescreen monitor meant 1792x1344 and 5120x1440 resolutions.

The default display resolutions were missing 5120x1440. I created /etc/X11/xorg.conf.d/10-monitor.conf:

Section "Monitor"
	Identifier "Virtual-1"
	Modeline "5120x1440_60.00"  624.50  5120 5496 6048 6976  1440 1443 1453 1493 -hsync +vsync
	Option "PreferredMode" "1792x1344"
EndSection

I got that Modeline by running:

$ sudo apt install xcvt
[...]
$ cvt 5120 1440
# 5120x1440 59.96 Hz (CVT) hsync: 89.52 kHz; pclk: 624.50 MHz
Modeline "5120x1440_60.00"  624.50  5120 5496 6048 6976  1440 1443 1453 1493 -hsync +vsync

I found that the Xfce Display Settings app was causing the VM to enter 5120x1440 mode on login, even though I wanted it to default to 1792x1344. I think I removed .config/xfce4/xfconf/xfce-perchannel-xml/displays.xml and then never opened the Xfce Display Settings again to work around that.

I created two scripts and two launcher buttons on the Xfce panel to switch between windowed and fullscreen modes. The video-windowed script:

#!/bin/sh
exec xrandr --output Virtual-1 --mode 1792x1344

And the video-fullscreen script:

#!/bin/sh
exec xrandr --output Virtual-1 --mode 5120x1440_60.00

Finally, I ran into an issue with the Notion window manager when running the VM in windowed mode. I normally hold the Meta key and drag the right mouse button to resize windows. This doesn't work. I could resize a window by dragging the left mouse button on the window border, but that border so thin that it's hard to hit. As a workaround, I enabled Shift+Meta while dragging windows in the VM, in .notion/cfg_bindings.lua:

     bdoc("Resize the frame."),
     mdrag("Button1@border", "WFrame.p_resize(_)"),
     mdrag(META.."Button3", "WFrame.p_resize(_)"),
+    mdrag("Shift+"..META.."Button3", "WFrame.p_resize(_)"),

Suspend

I wanted to be able to keep this VM "turned on" even while the host computer was suspended or hibernated.

Early on, I found that running suspend or hibernate within the VM did not work, and I disabled it:

sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target

The Xfce logout dialog stops showing the buttons to take these disabled actions, which is nice.

I also found that I couldn't pause and resume the VM in virt-manager successfully.

I could generally suspend the host, and the VM would tolerate that.

However, in March 2025, the VM kept crashing when the host suspended. I upgraded the qemu-* and seabios packages to the Debian 12 backports versions (and had to install qemu-system-modules-spice), which resolved the suspend issue.

Shared filesystem

I set up a Virtio filesystem to share files between the host and the VM. In virt-manager, I used virtio-9p because virtiofs wasn't supported when running libvirt under in user (session) mode:

Unable to add device: unsupported configuration: virtiofs is not yet supported in session mode

The target path in virt-manager ends up being the name of the 9p share in the VM. I added this to /etc/fstab in the VM:

/home/diego/host-share /home/diego/host-share 9p trans=virtio,version=9p2000.L,posixacl,msize=104857600 0 2

I don't remember where I found these options; it may have been the QEMU wiki.

Disk image size and VM migration

I occasionally moved this VM from a desktop to a laptop and vice versa. I didn't set up any fancy online migration; I just shut it down and copied the disk image.

Ideally, I'd free up some space and shrink the disk image prior to copying it. I have discard enabled in the VM's /etc/fstab. I found I had to change the virt-manager disk's Discard mode setting from Hypervisor default to unmap. Then I ran:

sudo fstrim -v /

within the VM, which reported that it trimmed many gigabytes. The qcow2 image shrank in size accordingly on the host. The manual fstrim may not be necessary if you configure virt-manager from the start.

I used rsync to copy the disk image, with the --compress and --sparse flags. I probably should have used the --inplace flag also (which used to conflict with --sparse but doesn't anymore).

To copy the VM's configuration, I ran this on the original host:

virsh dumpxml $NAME > $NAME.xml

And then imported it on the new host:

virsh define --file $NAME.xml

I also needed to change the display Splice device to auto, but that might be because I had messed with it on the original host.

Keyboard

I set Caps Lock to be Control on some keyboards. I found I had to do this again within the VM by setting:

XKBOPTIONS="ctrl:nocaps"

in /etc/default/keyboard.

Webcam

I tried passing through a USB webcam for video conferencing, but this was too slow to work reliably. I don't have a good solution to this. My workaround was to run the video conferencing on the host.

Conclusion

That's it. I used this VM for months, so I don't think I'd find new issues beyond these. The good news is that most of these are either easy to work around or acceptable limitations, with some patience. My top remaining issues are:

No webcam support,
No ability to suspend/hibernate/pause the VM, and
No dynamic CPU allocation.

Webcam support is important, especially because video conferencing often needs to be authenticated, and that account may "belong" inside the VM. I don't know how to solve it today, other than perhaps passing a PCIe USB card through to the VM. The last two are "software issues" and might well be solvable with some more configuration effort.

Trying Nushell

ongardie

2025-04-28 23:03

I've been trying out Nushell again lately. I blogged in 2021 about how command-not-found is slow to build an index of which commands are provided by which Debian packages. I created an alternative Posix shell script that performs well without an index. Now, I ported this script to Nushell. I found it interesting to compare the differences.

Comparison

The two versions are here. Both rely heavily on regular expressions. The Posix shell version is about 10 lines and 250 characters longer. Part of this is additional logic to use ripgrep with LZ4 when available, yet fall back to some default that works otherwise:

PATTERN="^(usr/)?s?bin/$1\s"
if command -v rg > /dev/null && command -v lz4 > /dev/null; then
    LINES=$(files | xargs -0 rg --no-filename --search-zip "$PATTERN")
else
    echo "Run 'sudo apt install ripgrep lz4' to speed this up"
    LINES=$(files | xargs -0 /usr/lib/apt/apt-helper cat-file | grep -P "$PATTERN")
fi

The equivalent Nushell code uses par-each for parallelism, which is built into Nushell and seems quite handy:

let packages = $files
    | par-each {
        /usr/lib/apt/apt-helper cat-file $in
            | parse -r ('^(?:usr/)?s?bin/' + $command + '[ \t]+(?<packages>.*)$')
            | get packages
            # ...
    }

(This website uses Pygments to do syntax highlighting. I found that it doesn't support Nushell syntax yet, so the above is rendered as Perl for now. Perl is my go-to when I need syntax highlighting on an unsupported language, since it accepts almost any syntax.)

I also appreciate how I was able to replace this sequence of sed commands. It was opaque enough to require a comment, and the sed command had two bugs I found in writing this post (missing -E, which is needed for the +, and missing m flag, which is needed for multiline processing):

# This sed expression drops the filename, splits the package list by the comma
# delimiter, and drops the section names.
PACKAGES=$(echo "$LINES" | sed -E 's/^.* +//; s/,/\n/g; s/^.*\///m')

With this Nushell code (following the longer pattern above), which makes its intent more clear:

# ...
| split row ','
| str replace -r '^.*/' ''

Similarly, another buggy call to sed in Posix shell (this won't work with newlines):

$(echo "$PACKAGES" | sed 's/ /|/g')

becomes a more obvious operation in Nushell:

($packages | str join '|')

Performance

I ran a quick benchmark to compare the versions of apt-binary.sh (with and without ripgrep) and apt-binary.nu. I benchmarked with the files in cache, by repeating each command at least once and discarding the first result, although it didn't seem to matter much. I used Nushell v0.103.0 from the Github release binary for x86_64-unknown-linux-gnu.

I ran this in a container where there wasn't much parallelism available, as it only had Debian Bookworm package lists for one architecture:

$ ls /var/lib/apt/lists/*Contents* | get size | sort
╭───┬──────────╮
│ 0 │  13.6 kB │
│ 1 │ 105.9 kB │
│ 2 │ 132.2 kB │
│ 3 │ 166.8 kB │
│ 4 │   1.5 MB │
│ 5 │  19.8 MB │
│ 6 │  57.0 MB │
╰───┴──────────╯

Here are the results for the two versions:

$ sudo apt-get remove -y ripgrep | ignore
$ timeit { apt-binary.sh cvs }
Run 'sudo apt install ripgrep lz4' to speed this up
Sorting... Done
Full Text Search... Done
cvs/stable 2:1.12.13+real-28+deb12u1 amd64
  Concurrent Versions System

908ms 116µs 365ns
$ sudo apt-get install -y ripgrep | ignore
$ timeit { apt-binary.sh cvs }
...
804ms 462µs 550ns
$ timeit { apt-binary.nu cvs }
...
1sec 510ms 564µs 515ns

The Nushell version takes almost twice as long as the ripgrep version. It seems to be much slower at doing pipelines with built-in commands, while performing similarly to dash for external pipelines.

Here's an example with counting bytes, where piping into Nushell's length adds a lot of time:

$ let f = "/var/lib/apt/lists/deb.debian.org_debian_dists_bookworm_main_Contents-all.lz4"
$ timeit { open --raw $f | wc -c }
57084589
4ms 101µs 100ns
$ timeit { open --raw $f | length | print }
57084589
756ms 112µs 752ns
$ timeit { sh -c $'lz4 -d ($f) -c | wc -c' }
516439434
252ms 204µs 142ns
$ timeit { lz4 -d $f -c | wc -c }
516439434
259ms 633µs 621ns
$ timeit { lz4 -d $f -c | length | print }
516439434
6sec 361ms 656µs 908ns

Here's another example for a basic grep or equivalent, where Nushell's find takes much longer:

$ timeit { sh -c $'lz4 -d ($f) -c | grep ripgrep' }
usr/src/rustc-1.78.0/src/tools/rust-analyzer/crates/project-model/test_data/ripgrep-metadata.json devel/rust-web-src
283ms 70µs 233ns
$ timeit { lz4 -d $f -c | grep ripgrep }
usr/src/rustc-1.78.0/src/tools/rust-analyzer/crates/project-model/test_data/ripgrep-metadata.json devel/rust-web-src
297ms 517µs 864ns
$ timeit { lz4 -d $f -c | find ripgrep | print -r }
usr/src/rustc-1.78.0/src/tools/rust-analyzer/crates/project-model/test_data/ripgrep-metadata.json devel/rust-web-src
805ms 466µs 406ns

These may not be quite apples-to-apples comparisons for reasons like Unicode handling, but it seems like Nushell's internal pipelines could use more optimization.

Refactoring issues

Nushell has some support for testing and assertions, which I hoped to use for testing the parsing code. I ran into some problems, however. It seems like parse is special in being able to take a byte stream and parse lines of it, but this somehow doesn't work with a function call (a "command" in Nushell terminology):

$ let f = "/var/lib/apt/lists/deb.debian.org_debian_dists_bookworm_main_Contents-all.lz4"
$ def parsefn [pattern] { $in | parse -r $pattern }
$ timeit { lz4 -d $f -c | parse -r "^usr/bin/(.*) " | length | print }
8791
1sec 142ms 15µs 922ns
$ timeit { lz4 -d $f -c | parsefn "^usr/bin/(.*) " | length | print }
0
1sec 344ms 869µs 81ns
$ timeit { lz4 -d $f -c | lines | parse -r "^usr/bin/(.*) " | length | print }
8791
1sec 269ms 821µs 783ns
$ timeit { lz4 -d $f -c | lines | parsefn "^usr/bin/(.*) " | length | print }
8791
3sec 169ms 105µs 892ns

It seems like byte streams can't cross into function calls:

$ /bin/echo hi | describe
byte stream
$ def describefn [] { $in | describe }
$ /bin/echo hi | describefn
string

So by trying to refactor my code to test it, I unintentionally changed its behavior. Getting the old behavior back would require harming performance drastically or finding a different abstraction boundary.

Closing thoughts

I'm torn about Nushell. Even with over two decades of using Bash/Dash/Zsh, I still struggle with it. In writing this post, I found 3 bugs in my Posix shell script. Nushell feels like a massive improvement. It has nice syntax, type checking, data structures, convenient argument parsing, and a cohesive library of built-in commands. Nushell remains succinct enough to feel like a shell rather than a programming language, with easy escapes into "raw" Unix programs.

Beyond retraining my brain, I struggle with two things. First, it didn't take me long to run into the issues above, so Nushell may still need more time/work to mature. Second, whenever I need interoperability with others, I can count on them having a Posix shell available, but I can't count on them using Nushell. I shouldn't let that hold me back from using a better tool on my own computers, but I can't escape having to write Posix shell scripts going forward, which means having to remember all the gotchas and tricks. Maybe it'd help if there was a subset of Nushell that could be compiled to Posix shell for interoperability.

Running Virtual Machines on Linux

ongardie

2024-12-02 01:33

Virtual machines are useful for running other operating systems within your computer and for testing and sandboxing system-level software. This post is about running VMs locally on Linux: how to get a usable disk image and how to connect to it over SSH. It's not as trivial as it sounds.

The VM ecosystem has evolved over the last decades. QEMU/KVM is still the easiest way to get started on Linux, but many of the ecosystem's projects are designed for running cloud services rather than for desktop or casual server use. This post aims to provide simple instructions for running VMs without installing large software stacks. It covers running VMs with and without cloud-init and libvirt.

This post deals with a few challenges:

Operating systems and distributions traditionally offered installers, which you'd insert into a computer on removable media, like a CD or USB stick. Installers can also be used for VMs, but they take a lot of steps to run. Now, distros typically offer multiple types of disk images, letting users skip the install process. (These images handle issues that normally come with cloning disks, like generating new machine IDs and SSH keys.) Where to find these images or which one to choose can be non-obvious. For VMs, I usually want barebone images that offer a quick path to SSH access.
The virtual disk images are often too small to be practical. For example, Debian's are only 2 GB in size, so you can't download or install much software in them before running out of space. Worse yet, lots of software breaks with confusing errors when the filesystem is out of space. This post includes instructions for growing the disk images, their root partition, and their root filesystem. Since QEMU's qcow disk image format is sparse, this won't require much more space on the host's disk.
This post uses QEMU's default user-mode networking stack, which creates a local network with NAT for the VM. This allows the VM to make connections to the Internet and to the host (at 10.0.2.2). Note that the VM might not be able to ping the host, but TCP and UDP traffic should still work. However, the host won't be allowed to make connections to the VM. This post uses QEMU's host port forwarding to allow the host to connect over SSH to the guest VM.

Update 2025-05-08: The VM will be able to access services listening on localhost on the host, as well as services on the local network, which could be security concerns.
Logging into the distribution-provided disk images can also be a challenge. This post gets to logging in over SSH and includes both manually installing an SSH key and using cloud-init to automate this process.

These instructions are Linux, Debian, and amd64-centric because that's what I'm most familiar with. There are many alternatives and options at every level of the stack. This post aims to provide a reasonable starting point, not the most optimal configuration.

Update 2025-05-08: There's a follow-up post on running desktop VMs.

`nocloud` images

This section describes how to start a basic interactive VM. To achieve this, we'll use QEMU/KVM directly and use Debian's nocloud image. This image allows root to log in with no password and does not have cloud-init installed. (The next section deals with cloud-init images.)

First, ensure the host CPU has virtualization extensions enabled. Almost all modern amd64/x86-64 CPUs support virtualization, but some systems have this disabled in the UEFI settings. For AMD, the extensions are called SVM or AMD-V, and should cause an svm flag to show up in /proc/cpuinfo when enabled. For Intel, the extensions are called VT-x and should cause a vmx flag to show up in /proc/cpuinfo when enabled.

grep -m 1 -P '^flags\b.*\b(svm|vmx)\b' /proc/cpuinfo

If you get no output, reboot into your UEFI settings, enable the setting, and check again.

Then, install the packages on the host for KVM/QEMU:

sudo apt install ovmf qemu-system-gui qemu-system-x86 qemu-utils

Download the Debian 12 nocloud VM image, resize the virtual disk, and run the VM:

curl -LO https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-nocloud-amd64.qcow2
cp -i debian-12-nocloud-amd64.qcow2 test.qcow2
qemu-img resize test.qcow2 32G
kvm -m 4G -nic user,hostfwd=tcp::2200-:22 test.qcow2

The options for running KVM/QEMU are vast. Refer to the man page and docs as needed.

In the graphical window that pops up, log into the VM as root with no password.

Note: If the VM grabs (steals) your mouse and keyboard, there's a magical key combination to escape, which is Ctrl-Alt-G for me. Check the window titlebar for a hint if that doesn't work.

Install an SSH server in the VM:

apt update
apt instal --yes openssh-server

Now, you have some authentication options:

Ideally, authorize an SSH key, for example by downloading it from GitHub:

mkdir -m 700 -p .ssh
curl https://github.com/⟪USER⟫.keys | tee -a .ssh/authorized_keys

Or if that's inconvenient, set a root password:

passwd
echo 'PermitRootLogin yes' > /etc/ssh/sshd_config.d/10rootpassword.conf
systemctl restart ssh

Or if you don't care for this VM at all and are confident in the security of your host's network, allow SSH as root with no password:
```
cat > /etc/ssh/sshd_config.d/10insecure.conf <<END
PermitRootLogin yes
PermitEmptyPasswords yes
END
systemctl restart ssh
```

Now, you can SSH from the host:

ssh root@localhost -p 2200

From inside the VM, resize the root partition and filesystem:

apt install --yes cloud-guest-utils
growpart /dev/sda 1
resize2fs /dev/sda1

And upgrade stale packages:

apt upgrade --yes

Now your VM is ready to use.

You can shut it down gracefully or kill the kvm process to stop the VM. Then you can remove its disk image.

`cloud-init` images

This section describes how to run cloud-init images, which are provided by many distributions and operating systems. cloud-init is software inside the VM that runs at boot to discover configuration provided to it from the outside world. This requires a little more setup before starting the VM, but then the VM will be better configured on startup.

There are various ways to inject the cloud-init configuration. This post uses the simplest NoCloud data source with an extra attached drive. The extra drive (like a virtual CD-ROM or thumb drive) is given a volume label of CIDATA so cloud-init inside the VM can discover it during boot and look for configuration files inside.

One added benefit of using cloud-init is that these images typically have cloud-initramfs-growroot, which means they will automatically grow their root partition and filesystem if the virtual disk has extra space (at least on the first boot). This saves us some steps.

In this section, we'll use Debian's generic image, which contains cloud-init. There is no way to log into this image without using cloud-init.

The nocloud terminology is confusing: Debian nocloud images do not contain cloud-init. Debian generic images contain cloud-init and can use its NoCloud data source. No clouds were harmed in the writing of this blog post.

Debian also offers a genericcloud image, but I don't recommend it. It's smaller than the generic image by omitting a bunch of drivers (about 330 MB vs 420 MB). However, these drivers can be useful, even with KVM/QEMU:

QEMU emulates an e1000 driver for the NIC by default, and the genericcloud image doesn't have that driver. (You can use a virtio NIC to work around this by passing model=virtio-net-pci as an option to -nic.)
The way that cloud-install presents its cloud-init configuration drive (as used in the next section) also requires drivers, so the genericcloud images will fail to find the drive.
I don't know what other QEMU devices might have missing drivers and cause headaches in the future.

Download the Debian 12 generic VM image and resize the virtual disk:

curl -LO https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-generic-amd64.qcow2
cp -i debian-12-generic-amd64.qcow2 test.qcow2
qemu-img resize test.qcow2 32G

At this point, if you run kvm like before, you'll observe that you can't log in. This can be frustrating if cloud-init somehow isn't working.

Next, set up a cloud-init configuration to set a password and authorize an SSH key. This happens in a file called user-data. The user-data file format is YAML with an extra #cloud-config header. This post uses a more JSON-like format to prevent whitespace errors; as YAML is a superset of JSON, this is allowed. These options apply to the default user, which varies per distro. The default username is debian for Debian images.

mkdir -p cidata
tee cidata/user-data <<END
#cloud-config
{
    "password": "p4ssw0rd",
    "chpasswd": {
        "expire": false,
    },
    "ssh_authorized_keys": [
        "$(cat ~/.ssh/id_ed25519.pub)",
    ],
}
END

Also, create a blank meta-data file, since that's required:

echo > cidata/meta-data

The cloud-init documentation describes the possible fields.

To get these files to the VM, they'll need to be packaged into an ISO or VFAT filesystem with the volume label of CIDATA. Rather than do that manually, QEMU can create a VFAT drive from a directory. The QEMU invocation is a mouthful, but it's still fairly convenient.

kvm -m 4G \
    -nic user,hostfwd=tcp::2200-:22 \
    test.qcow2 \
    -drive file=fat:./cidata,format=vvfat,if=virtio,label=CIDATA

If all goes well, you can log in interactively as user debian with password p4ssw0rd (as set in user-data), and that user can sudo without a password. You can also use SSH with your SSH key:

ssh debian@localhost -p 2200

Thanks to cloud-init, the root partition and filesystem have already been expanded to the available disk image size.

Upgrade stale packages:

sudo apt update
sudo apt upgrade --yes

Now your VM is ready to use.

Next time you run the VM, you don't need to provide the CIDATA image, since its job is done.

libvirt

This section runs VMs using libvirt, which is a way of managing VMs without lengthy KVM invocations. libvirt isn't necessary, but it's probably a better approach if you want to manage long-lived VMs with a variety of operating systems. It also configures KVM with more modern defaults.

Libvirt is a collection of software, named after the underlying library:

virsh is its main command-line interface.
virt-install is a command-line tool to create the VMs (as this is difficult to do with virsh directly).
virt-viewer provides a virtual keyboard, video, and mouse.
virt-manager is a GUI for managing VMs. While you can use the GUI to do most of this, this post focuses on the command-line tools.

Libvirt can be used in a system-wide mode (qemu:///system) or for your local user (qemu:///session). This post uses the user mode. (The system mode may be useful to bridge your VMs to the host's network. To use the system mode, you'd need to install libvirt-daemon-system and add your user to the libvirt group.)

Unfortunately, different libvirt tools have different default modes:

virsh defaults to qemu:///session.
virt-install defaults to qemu:///system.
virt-manager defaults to showing both and connecting to neither.
virt-viewer defaults to qemu:///session.

You can pass --connect qemu:///session to any of these tools. You may want to set up shell aliases for convenience.

Install the host packages:

sudo apt install \
    gir1.2-spiceclientgtk-3.0 \
    libvirt-clients \
    libvirt-daemon \
    virtinst \
    virt-manager \
    virt-viewer

Create a VM using virt-install based on Debian's generic image. This uses the image downloaded and the cloud-init configuration files created in the previous section.

cp -i debian-12-generic-amd64.qcow2 test.qcow2
qemu-img resize test.qcow2 32G
virt-install --connect qemu:///session \
    --cloud-init 'disable=on,meta-data=./cidata/meta-data,user-data=./cidata/user-data' \
    --disk test.qcow2 \
    --import \
    --memory 4096 \
    --name test \
    --os-variant debian11

This uses the debian11 OS variant since the osinfo-db package in Debian 12 does not currently know about Debian 12. The OS variant doesn't appear to matter much when importing a disk image.

Use the key combination Ctrl-] to exit the virsh console. You can pass --autoconsole none to virt-install if you don't want to be dropped into the console.

Since virt-install supports cloud-init, we didn't need to have QEMU present a CIDATA drive. Actually, we don't even need the YAML files for simple settings. The following is usually sufficient:

--cloud-init "disable=on,clouduser-ssh-key=$HOME/.ssh/id_ed25519.pub"

Forward a host port to SSH to the VM:

virsh qemu-monitor-command --hmp test 'hostfwd_add tcp::2200-:22'

Since libvirt doesn't currently support forwarding host ports, you'll need to run that hostfwd_add command every time you run the VM. Note that test in the command is the name of the VM. This workaround is thanks to Adam Spiers' blog post from 2012.

Then run:

ssh debian@localhost -p 2200

Upgrade stale packages:

sudo apt update
sudo apt upgrade --yes

Now your VM is ready to use.

These are some useful libvirt commands:

List VM statuses:
```
virsh list --all
```
Boot an existing VM:
```
virsh start ⟪NAME⟫
```
View a VM's text console:
```
virsh console ⟪NAME⟫
```
View a VM's graphical console:
```
virt-viewer ⟪NAME⟫
```
Shut down a VM gracefully:
```
virsh shutdown ⟪NAME⟫
```
Shut down a VM forcefully:
```
virsh destroy ⟪NAME⟫
```

Delete a VM and its disks:

virsh undefine ⟪NAME⟫ --remove-all-storage

And recall that virt-manager is a useful GUI that also provides all this virsh and virt-viewer functionality and more.

Someday/maybe

virt-manager issue #143 would allow using cloud-init when creating new VMs from the GUI.
libvirt issue #285 would support forwarding host ports.
virt-customize appears to directly modify VM disk image files and may be an alternative to cloud-init. I haven't tried it.
The libvirt SSH proxy should allow SSH to VMs over VSOCK instead of TCP. This would remove the need for port forwarding for SSH. However, it's not packaged for Debian or widely supported in VM images yet.

Update 2025-05-08: There's a follow-up post on running desktop VMs.

Other distributions and operating systems

Many operating systems offer cloud images and cloud-init support. This section documents a few that I've tried.

Debian

Debian images for Debian 12 (Bookworm) were covered above. For other releases, see https://cloud.debian.org/images/cloud/.

The default user is debian.

Use --os-variant debian11 as the closest version known to Debian 12.

Fedora

These images work with and require cloud-init. For other releases, see https://download-ib01.fedoraproject.org/pub/fedora/linux/releases/. For more info, see https://fedoraproject.org/cloud/download.

Fedora 41:

curl -LO https://download.fedoraproject.org/pub/fedora/linux/releases/41/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-41-1.4.x86_64.qcow2

The default user is fedora.

Use --os-variant fedora37 as the closest version known to Debian 12.

FreeBSD

These images allow logging in interactively as root with no password, and they grow their root filesystem automatically. Despite its name, even the BASIC-CLOUDINIT images do not appear to run cloud-init yet. For other releases, see https://download.freebsd.org/releases/VM-IMAGES/.

FreeBSD 14.1 BASIC-CLOUDINIT with ZFS variant:

curl -L https://download.freebsd.org/releases/VM-IMAGES/14.1-RELEASE/amd64/Latest/FreeBSD-14.1-RELEASE-amd64-BASIC-CLOUDINIT-zfs.qcow2.xz | \
    xz -d > FreeBSD-14.1-RELEASE-amd64-BASIC-CLOUDINIT-zfs.qcow2

FreeBSD 14.1 standard with ZFS variant:

curl -L https://download.freebsd.org/releases/VM-IMAGES/14.1-RELEASE/amd64/Latest/FreeBSD-14.1-RELEASE-amd64-zfs.qcow2.xz | \
    xz -d > FreeBSD-14.1-RELEASE-amd64-zfs.qcow2

Use --os-variant freebsd13.1 as the closest version known to Debian 12.

Ubuntu

These images work with and require cloud-init (which is a Canonical project). For other releases, see https://cloud-images.ubuntu.com/.

Ubuntu 24.04 LTS (Noble):

curl -L -o noble-server-cloudimg-amd64.qcow2 \
    https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img

Ubuntu 24.10 (Oracular):

curl -L -o oracular-server-cloudimg-amd64.qcow2 \
    https://cloud-images.ubuntu.com/oracular/current/oracular-server-cloudimg-amd64.img

The default user is ubuntu.

Use --os-variant ubuntu22.10 as the closest version known to Debian 12.

Website Updates (2024)

ongardie

2024-11-24 03:03

I did a bunch of more updates to this website, including adding a dark mode. Most of the other changes are either invisible or barely noticeable. That's OK. My several visitors will appreciate them. Or at least I'll appreciate them.

Dark mode and syntax highlighting

I added a dark mode using CSS, with prefers-color-scheme (widely availabe as of 2020). I switched syntax highlighting (using Pygments) from inline styles to CSS classes, which use different theme colors for light and dark modes. This change will cause the RSS feed to not have syntax highlighting (since RSS readers shouldn't interpret CSS classes), whereas before some RSS readers may have allowed the style elements. I think RSS feeds aren't supposed to be styled, so that's probably OK.

RSS feed

I reduced the size of the RSS feed by only including summaries for old posts, rather than the full content. Also, if I write too many more posts, the RSS feed will stop including the oldest posts. I really prefer when RSS feeds include the full content with each post, but I think that's less useful for very old posts.

Markdown renderer

Many of these pages are generated from Markdown. I added support for it in 2014 using Python-Markdown, which was probably the obvious choice. Since then, some folks standardized Markdown as CommonMark, which is used widely in VS Code and GitHub (GitHub Flavored Markdown extends CommonMark), among others. The differences between Python-Markdown and CommonMark are small but annoying, so I've switched to markdown-it-py. Since this website is generated as static pages, it was relatively easy to diff the HTML and RSS across this change for manual review.

Mako parsing conflicts

This website contains both HTML and Markdown pages and templates. The Markdown pages are processed through the Mako templating engine and then the Markdown renderer. The HTML pages are just processed through Mako. Most of what I use Mako for is basic variable substitution, loops, and if statements. It also allows running full Python code inside templates, which I like. (I write the templates, so I can live with myself abusing them. I wouldn't want this in a larger team project.)

Mako's syntax is usually not in conflict with Markdown or HTML, but I've found three exceptions where there are ambiguities:

`${variable}`

Mako interprets the occasional ${variable} in a shell script (or this paragraph) as a Mako variable substitution. Sometimes I want one behavior (Mako) and sometimes I want the other (literal). Fortunately, when this happens, it's unlikely that the variable exists in Mako's scope, so it usually causes a build error.

I haven't found an easy way to escape ${variable} locally. The best approach is to use <%text> to opt out of Mako for an entire section. Otherwise, using &dollar; is an option in HTML, but not within a `code block` in Markdown (see CommonMark example). Another approach is to define a Mako variable called dollar with a value of $, then write ${dollar}{variable}.

Trailing backslash

Mako consumes a trailing backslash at the end of a line and merges the line below it. Many programming languages use this syntax too. It's easy to forget about this and get poor rendering of code blocks. Similar to the dollar issue, <%text> is a good way to disable it, or you can use \ in HTML but not Markdown, or you can define a backslash variable.

I found that I was only using the trailing backslash feature of Mako in one place, so I wanted to turn it off and have a default that won't keep biting me. Unfortunately, substituting \\ for \ at the end of the line doesn't help, as consuming the second backslash is baked into Mako's lexer. Instead, I added a workaround that replaces a trailing \ with a unique string before Mako runs and then replaces that string back to a \ after Mako runs. I think that should in most contexts, including code and non-code and Mako-enabled and Mako-disabled regions. This does completely prevent using a trailing backslash in Mako and Mako Python blocks (<% ... %>), but those are usually unnecessary. If I forget and try to use a trailing backslash in Mako, it's likely to cause a build error, which is the behavior I want.

Leading hashes

Mako interprets ## at the start of a line as a comment, which Markdown uses for <h2>. I found out about this one almost immediately after starting to use Markdown on this site. I don't need that style of Mako comments, so I wanted to disable them.

I've had a workaround in place for a while that pre-processed the input and injected <h2> tags before Mako ran. That worked well for me, but in theory it would break ## if used in code blocks or in Mako Python blocks (<% ... %>).

I've updated my workaround to replace a leading ## with # followed by a unique string before Mako runs, and then replace it back after Mako runs. I think that'll work in all contexts.

Social sharing metadata

I added some meta tags for social media. The Open Graph Protocol metadata is used by Meta, LinkedIn, and others, but that site is archived on GitHub. Facebook's page is another resource. Twitter has its own metadata but will largely use Open Graph metadata if available. fediverse:creator is used on Mastodon.

Note: there's a technical distinction between:

<meta name="..." content="...">

and

<meta property="..." content="...">

where some fields want one or the other.

This metadata required some code generator changes. Similar to the page title, the metadata is set above where the main page content is rendered. That information has to be "pushed up" to be available outside the main content. Also, the pages had to be made aware of their URLs for og:url.

The OpenGraph Protocol requires og:image to be set, yet most of my pages don't have an image. I created a larger version of the favicon as a default.

Here's an example of a link to this blog post on Mastodon:

Favicon

Since I redrew the favicon as an SVG for the social sharing image, I also added the SVG for browsers that support it. I created a dark mode variant, but I don't think mainstream browsers support that yet (see the links from this issue). Firefox seems to ignore the prefers-color-scheme and take the last SVG. Other browsers may default to the first SVG, so I have sandwiched the dark favicon declaration in between two light ones:

<link rel="icon" type="image/svg+xml" sizes="any"
    media="(prefers-color-scheme: light)" href="favicon-light.svg" />
<link rel="icon" type="image/svg+xml" sizes="any"
    media="(prefers-color-scheme: dark)" href="favicon-dark.svg" />
<link rel="icon" type="image/svg+xml" sizes="any"
    media="(prefers-color-scheme: light)" href="favicon-light.svg" />
<link rel="icon" type="image/png" sizes="16x16"
    href="favicon.png" />

Hopefully dark mode favicons will start working in more browsers over the next few years.

Heading links

I added links to many headings to make sharing URLs that jump to a place on the page easier. On mobile, you can see an example # link above this paragraph. On desktop, hover over the heading to see it.

I chose not to make the entire heading a link because sometimes headings do legitimately link to things, and that could get confusing. I like that the # is hidden until hovering on desktop, but hovering is not really an available gesture on touch-screens.

I didn't want the RSS feed to include these links because they might not work, so the <a> tags have no content in the HTML, and CSS fills in the #. Since RSS feeds don't use the CSS, they will just get an invisible empty link.

I added these links manually to a bunch of headings. I didn't see a good way to automate that, since the value of the id attribute is worth customizing and the placement would not be obvious to an algorithm. I ended up adding a bunch of <section> elements (2015) for this.

HTML and CSS updates

I updated and modernized some HTML and CSS, including use of these newer features:

var() and custom properties (2017) are useful for defining theme colors for light/dark modes.
:has() (2023) was useful in styling just the <pre> elements that contain <code>.
:is() (2021) helped save some duplication for styles related to h1-h6.
:not() (2021) helped keep some styles self-contained.
box-sizing: border-box (2015) is old news but I started this website in 2007.

For testing dark mode and the styling updates, it was convenient to have the blog index page render the entire contents of the blog on one page that I could scroll through quickly.

I tried to keep the website usable with older phones. For example, I assume iOS users have Safari 15 but not necessarily Safari 16 or newer yet. These are some CSS features that look nice but that I've avoided for now until they're more widely available:

@scope (not available yet in Firefox) would be helpful to create a namespace for Pygments' classes.
nesting (2023) and & nesting selector (2023) would improve readability significantly.
Range comparisons in @media queries (2023) would be a little easier to read.
light-dark (2024) might avoid some variable definitions for dark mode.

Command Not Found

ongardie

2021-01-16 01:12

On Debian/Ubuntu, command-not-found tells you what package to install when you try to run a program you don't have. I find this helpful, but it takes a long time to maintain its index for lookups. This post tells the meandering story of how I first optimized command-not-found, then replaced it with a script that doesn't use an index at all.

update-command-not-found is slow

I noticed recently that apt update stalls on my computer after fetching new data:

Get:139 http://ftp.us.debian.org/debian unstable/non-free amd64 Contents (deb) T-2021-01-15-0800.20-F-2021-01-14-0800.15.pdiff [2,707 B]
Fetched 1,504 kB in 39s (38.5 kB/s)

...stall for about 15 seconds...

Reading package lists... Done
Building dependency tree
Reading state information... Done
10 packages can be upgraded. Run 'apt list --upgradable' to see them.

I found the culprit to be update-command-not-found from the command-not-found package. That package provides a useful error message when you attempt to run a command you don't have installed. It searches through the APT cache for Debian (or Ubuntu or whatever) packages that would install an executable with the same name. Here's an example:

$ python2

Command 'python2' not found, but can be installed with:

sudo apt install python2-minimal

To make this search efficient, update-command-not-found builds a lookup table when apt update runs. This table is stored in in /var/lib/command-not-found/. Unfortunately, building this lookup table and maintaining it is slow, at least for me.

The code is written in Python. It's maintained upstream by Ubuntu but is modified by Debian (in a reversal from their typical roles). The relevant code for me is all from Debian's changes that read through package Contents files. The changes are kept in a series of patches inside the debian/patches/ directory (using quilt), and the relevant patch is 0003-cnf-update-db-Add-support-for-Contents-files.patch. My computer seems to spend all its time in the method _parse_single_contents_file.

Upon profiling update-command-not-found with pyinstrument and plenty of trial-and- error, I was able to speed it up by about 40% (again, on my computer) with a minor change. The patch itself is quite small:

     def _parse_single_contents_file(self, con, f, fp):
         # read header
         suite=None      # FIXME
+        pattern = re.compile(b'usr/sbin|usr/bin|sbin|bin')

         for l in fp:
-            l = l.decode("utf-8")
-            if not (l.startswith('usr/sbin') or l.startswith('usr/bin') or
-                    l.startswith('bin') or l.startswith('sbin')):
+            if not pattern.match(l):
                 continue
+            l = l.decode("utf-8")
             try:
                 command, pkgnames = l.split(None, 1)
             except ValueError:

Each line l in the file stream fp comes from a Contents file, which looks like this:

usr/bin/cvs                       vcs/cvs
usr/bin/parallel                  utils/moreutils,utils/parallel
usr/share/cvs/contrib/README      vcs/cvs

The contents files are kept in /var/lib/apt/lists/ and have names like ftp.us.debian.org_debian_dists_unstable_non-free_Contents-amd64.lz4. They can be decompressed with:

lz4 -d < FILE

or:

/usr/lib/apt/apt-helper cat-file FILE

Update (2021-03-29): You might not have the contents files on your system. If you don't, install the apt-file package and then run apt update.

Before, the code would check if each line started with four separate strings. I optimized that to use a pre-compiled regular expression, which is more efficient. I also deferred decoding the line from an array of bytes into a Unicode string. Many lines don't have the ASCII prefixes we're looking for, so we don't need to spend time decoding those.

I submitted this change to Debian in bug #980076.

That improvement got the updates down from about 16 seconds to about 10 seconds for me. I looked for more opportunities to speed up update-command-not-found, but nothing major jumped out at me. One approach would be to parallelize the work so multiple files can be searched at once. Another would be to rewrite the tool in a faster language; there's lots of discussion about this in Debian bug #881692.

Skip the index

Then I started wondering whether the index that update-command-not-found builds is even worth building. If I'm typing a command in my terminal and it doesn't exist, I'm OK with waiting a second or so to be told what packages to install. Can we search the Contents files quickly enough to make this interactive without an index?

There's a package called apt-file that command-not-found depends on and does just this. It's written in Perl. This is equivalent to what update-command-not-found does:

apt-file search --regex "^(usr/)?s?bin/$PACKAGE$"

It's slower than I'd like (times shown are with warm caches):

$ /bin/time -p apt-file search --regex "^(usr/)?s?bin/cvs$"
cvs: /usr/bin/cvs
real 3.35
user 4.44
sys 1.08

The man page warns users that the --regexp option is slow. That turns out to be true. This next invocation is equivalent but significantly faster:

$ /bin/time -p apt-file search "bin/cvs" | grep -P ": /(usr/)?s?bin/cvs$"
cvs: /usr/bin/cvs
real 1.22
user 1.57
sys 0.52

Honestly, that's fast enough, and I probably should have stopped there. Spoiler: I didn't.

Parallel search

Searching through multiple Contents files is trivially parallelizable, or at least it should be. I don't know Perl, so I didn't want to make major changes to apt-file.

What I wanted to do was a parallel execution of apt-helper cat-file piped into grep. In shell scripts, you can do this using tools like GNU parallel or the conflicting program from moreutils. I came across this 2012 rant on GNU parallel by Joey Hess, the developer of moreutils, and that's enough reason for me to avoid it. The whole mess of not knowing which parallel might be installed at /usr/bin/parallel, if any, was also off-putting.

Instead, I switched to using ripgrep. ripgrep (rg) is parallel by default. It can decode compressed files with the -z flag if lz4 and similar programs are available.

This initial test was promising:

$ /bin/time -p rg -z --no-filename "^(usr/)?s?bin/cvs\s" /var/lib/apt/lists/*Contents* | cat
usr/bin/cvs                   vcs/cvs
usr/bin/cvs                   vcs/cvs
usr/bin/cvs                   vcs/cvs
usr/bin/cvs                   vcs/cvs
usr/bin/cvs                   vcs/cvs
usr/bin/cvs                   vcs/cvs
real 0.39
user 1.42
sys 2.12

The pipe into cat is there to disable ripgrep's "nice" output for interactive terminals.

So, this is definitely fast enough for interactive use. Even after I drop my caches, the same command only takes about 0.46 seconds on this machine.

My actual script, which you can find at the end of this post, has some more bells and whistles:

It falls back to a regular grep when rg or lz4 are unavailable.
It doesn't scan through the small files in /var/lib/apt/lists/ with extension .diff_Index.

Extracting the package names

The next challenge was formatting the data. If you remember from the earlier example, some of the lines in the Contents files have a comma-separated list of packages. I don't know of a great way to deal with that in shell script. I ended up with this:

sed -E 's/^.* +//; s/,/\n/g; s/^.*\///m'

Update (2025-04-27): Fixed two bugs in the above (missing -E, which is needed for the +, and missing m flag, which is needed for multiline processing).

The input looks like:

usr/bin/parallel             utils/moreutils,utils/parallel

The first regular expression strips out the path, leaving:

utils/moreutils,utils/parallel

The second regular expression breaks it into lines by comma, leaving:

utils/moreutils
utils/parallel

The third regular expression strips off the section names, leaving:

moreutils
parallel

Finally, that gets piped through sort -u to remove the duplicates from multiple architectures or distributions.

Printing relevant information

I could just print the package names and call it a day, but it's helpful to print more information. The best way I know to do this is with apt search:

$ apt search --names-only '^(moreutils|parallel)$'
Sorting... Done
Full Text Search... Done
moreutils/stable 0.62-1 amd64
  additional Unix utilities

parallel/stable,stable,testing,testing,unstable,unstable,now 20161222-1.1 all
  build and execute command lines from standard input in parallel

Sadly, this takes about 1 second, but I think it's valuable enough to be worth the delay.

You're not really supposed to use apt in a script like this because its output may change. I'm not really bothered by that, though, since my script is intended for human consumption. The apt man page says to use apt-cache instead. However, I don't see a way to get apt-cache to format the results in a similar way.

Building that regular expression from the list of packages is also not obvious in a shell script. I ended up with this ugly thing:

PACKAGES_DISJUNCTION=$(echo $PACKAGES | sed 's/ /|/g')
apt search --names-only "^($PACKAGES_DISJUNCTION)$"

Putting it all together

I've assembled all this into a script called apt-binary, along with how to enable it for bash and zsh.

If you're removing command-not-found from your system, use apt purge command-not-found to get rid of its index, too.

Update (2021-03-29): Note that you'll want to keep the apt-file package installed, since that package registers hooks with apt to download the package contents files.

I hope this was useful or that you learned a trick or two. I found it to be a frustrating exercise. I'm a professional software engineer that's comfortable with several programming languages and different models for parallel programming, yet for "simple" scripts like this, I'm sort of forced into ancient UNIX tooling. In this environment, simple parallelism problems and simple string manipulation can actually be pretty difficult. I'd normally use Python for a language that's universally available with no setup, but it's not well-suited to parallel or high-performance code. Go or Rust would have been better choices, but they might not be set up everywhere. I think I settled on an OK compromise here with ripgrep falling back to grep and a bunch of regular expressions; I just feel like this should have been easier.

Update (2025-04-28): New post describing a Nushell port of this script.

Website Updates (2020)

ongardie

2020-08-07 23:30

The code for this website has been running without any major updates since 2009. Back then, I wrote it as a Python 2 program using FastCGI, served originally by lighttpd and more recently by Caddy.

I recently overhauled the code to run as a static site generator. This makes it easier to run locally, feels better from a security perspective, and actually simplifies the code in a few ways:

When serving individual requests, you need to figure out what page the request is asking for. With generating an entire site, you just loop through all the possible pages.
When serving individual requests, you need to load in only the relevant data for those requests. When generating an entire site, you can just load the input data once at startup.
When serving individual requests, you need to recover gracefully from errors. When generating an entire site, you can just let any exceptions propogate to crashing the program and have the user fix the problem and rerun.

One thing I gave up in switching to a static site generator was the page trail. Before, the history of where you've been on this site was tracked with a session cookie and displayed just under the title bar. I somehow felt that was an important feature in the early 2000s, but that sort of navigation is better suited to browser history today.

On a related note, I used to host the code for this website with a local cgit instance. I've turned that off and moved the code to a GitHub repo instead. If you're curious, you can look through the history of that repo to see what's changed.

LogCabin 1.1

ongardie

2015-07-26 18:27

Announcing the release of LogCabin 1.1! This is the second stable release of the LogCabin coordination service, which includes a C++ implementation of the Raft consensus algorithm.

Continue reading full article

LogCabin 1.0

ongardie

2015-05-01 23:57

LogCabin 1.0 is out! This is the first stable release of the LogCabin coordination service, which includes a C++ implementation of the Raft consensus algorithm. If you're new to these, I recently spoke about Raft and a little about LogCabin at the Sourcegraph Hacker Meetup in San Francisco; watch the video for a visual walk-through of how Raft works.

Continue reading full article

LogCabin.appendEntry(6, "Preparing for 1.0")

ongardie

2015-04-03 22:06

This is the sixth in a series of blog posts detailing the ongoing development of LogCabin. This entry describes progress towards the upcoming 1.0.0 release of LogCabin, a useful new command-line client to access LogCabin, and several other improvements.

Continue reading full article

LevelDB with Transactions on Node.js

ongardie

2015-03-11 00:17

This post surveys the libraries available in Node.js for LevelDB, shows that support for transactions is missing, and lays out a path to get there. In short, we need the Node bindings for LevelDB to expose snapshots, then we can build transactions with snapshot isolation on top without much trouble.

Continue reading full article

VM setup

CPU and RAM

Display

Suspend

Shared filesystem

Disk image size and VM migration

Keyboard

Webcam

Conclusion

Comparison

Performance

Refactoring issues

Closing thoughts

nocloud images

cloud-init images

libvirt

Someday/maybe

Other distributions and operating systems

Debian

Fedora

FreeBSD

Ubuntu

Dark mode and syntax highlighting

RSS feed

Markdown renderer

Mako parsing conflicts

${variable}

Trailing backslash

Leading hashes

Social sharing metadata

Favicon

Heading links

HTML and CSS updates

update-command-not-found is slow

Skip the index

Parallel search

Extracting the package names

Printing relevant information

Putting it all together

`nocloud` images

`cloud-init` images

`${variable}`