QEMU, KVM, and GPU passthrough on Debian testing posted Wed, 15 Jul 2015 14:55:03 UTC

I decided to take the plunge and try to run everything on one machine. I gutted both of my existing machines and bought a few extra parts. The final configuration ended up using an AMD FX-8350 on an ASRock 970 Extreme4 motherboard with 32GB of RAM in a Fractal Design R5 case. I've got a GeForce 8400 acting as the display under Linux and a GeForce GTX 670 being passed through to Windows.

I am using the following extra arguments on my kernel command line:


pci-stub.ids=10de:1189,10de:0e0a rd.driver.pre=pci-stub isolcpus=4-7 nohz=off

The identifiers I'm specifying are the vendor:device pairs for the GPU and HDMI audio function on my GeForce GTX 670 (lspci -nn will show you the pairs for your own card), so the nouveau driver doesn't latch onto the card. To further help prevent that situation, the rd.driver.pre statement loads the pci-stub driver as early as possible during the boot process; it's worth noting that rd.driver.pre is a dracut option, as that's what I'm using for my initramfs. Finally, isolcpus blocks off cores 4 through 7 so Linux won't schedule ordinary processes on them, leaving them free for the guest's vCPUs. Along that same line of thinking, I tried to add the following to /etc/default/irqbalance to keep interrupts off those cores as well:


IRQBALANCE_BANNED_CPUS=000000f0

but realized the current init.d script that systemd uses to start irqbalance won't ever pass that environment variable along correctly, so for now I'm starting irqbalance by hand after boot.
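
Starting it by hand looks something like this (a minimal sketch; irqbalance picks the banned mask up from its environment, and 000000f0 corresponds to cores 4-7, so adjust it for your own layout):

# run once after boot; irqbalance daemonizes itself
IRQBALANCE_BANNED_CPUS=000000f0 irqbalance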

I added these modules to /etc/modules:


vfio
vfio_pci

I added these options to /etc/modprobe.d/local.conf (you might need to remove the continuation characters and make that all one line):


install vfio_pci /sbin/modprobe --first-time --ignore-install vfio_pci ; /bin/echo 0000:02:00.0 > /sys/bus/pci/devices/0000:02:00.0/driver/unbind ; \
        /bin/echo 10de 1189 > /sys/bus/pci/drivers/vfio-pci/new_id ; /bin/echo 0000:02:00.1 > /sys/bus/pci/devices/0000:02:00.1/driver/unbind ; \
        /bin/echo 10de 0e0a > /sys/bus/pci/drivers/vfio-pci/new_id ; /bin/echo 0000:05:00.0 > /sys/bus/pci/devices/0000:05:00.0/driver/unbind ; \
        /bin/echo 1b21 1042 > /sys/bus/pci/drivers/vfio-pci/new_id ; /bin/echo 0000:00:13.0 > /sys/bus/pci/devices/0000:00:13.0/driver/unbind ; \
        /bin/echo 0000:00:13.2 > /sys/bus/pci/devices/0000:00:13.2/driver/unbind ; /bin/echo 1002 4397 > /sys/bus/pci/drivers/vfio-pci/new_id ; \
        /bin/echo 1002 4396 > /sys/bus/pci/drivers/vfio-pci/new_id
options kvm-amd npt=0

So that looks like a mess, but it's fairly straightforward really. Since I can't easily pass my device identifiers for USB through the kernel command line, I'm unbinding those controllers individually and rebinding them to the vfio-pci driver. I'm passing through the USB2 and USB3 controllers that run the ports on the front of my case. I'm also binding the GPU and its audio function to vfio-pci here, and specifying the npt=0 option to kvm-amd, which is supposed to help performance on AMD machines. I set up some hugepage reservations and enabled IPv4 forwarding in /etc/sysctl.d/local.conf:


# Set hugetables / hugepages for KVM single guest needing 8GB RAM
vm.nr_hugepages = 4126

# forward traffic
net.ipv4.ip_forward = 1
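
After a reboot, a couple of quick sanity checks will tell you whether the vfio-pci bindings and the hugepage reservation actually took (substitute your own PCI addresses):

# the GPU should list vfio-pci as the kernel driver in use
lspci -nnk -s 02:00.0

# should show roughly the number of pages reserved above
grep HugePages_Total /proc/meminfo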

Since bridging is my network usage model of choice, I needed to change /etc/network/interfaces:


auto lo br0
iface lo inet loopback

iface eth0 inet manual

iface br0 inet dhcp
        bridge_ports eth0
        bridge_stp off
        bridge_waitport 0
        bridge_fd 0
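
One related note: the launch script below hands networking off to qemu-bridge-helper, and as far as I know the helper also needs an ACL file telling it which bridge it may attach to (on Debian this should be /etc/qemu/bridge.conf, but check where your build expects it):

allow br0

After restarting networking, brctl show should list eth0 as a port on br0.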

Putting everything I've discovered together, I've created a shell script. It handles all of the different things that need to happen, like setting the cpufreq governor and pinning the individual virtual CPU thread PIDs to their respective physical CPUs. I'm using zsh as that is my go-to shell for all things, but most anything should suffice. The script also depends on the presence of the qmp-shell script available here; you will want both the qmp-shell script itself and the Python library it depends on, qmp.py. Once all of that is in place, here is the final script to start everything:

#!/bin/zsh

for i in {4..7}; do
        echo performance > /sys/devices/system/cpu/cpu${i}/cpufreq/scaling_governor
        #cat /sys/devices/system/cpu/cpu${i}/cpufreq/scaling_governor
done

taskset -ac 4-7 qemu-system-x86_64 -qmp unix:/run/qmp-sock,server,nowait -display none -enable-kvm -M q35,accel=kvm -m 8192 -cpu host,kvm=off \
        -smp 4,sockets=1,cores=4,threads=1 -mem-path /dev/hugepages -rtc base=localtime -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root \
        -device vfio-pci,host=02:00.0,bus=root,addr=00.0,multifunction=on,x-vga=on -vga none -device vfio-pci,host=02:00.1,bus=root,addr=00.1 \
        -device vfio-pci,host=05:00.0 -device vfio-pci,host=00:13.0 -device vfio-pci,host=00:13.2 -device virtio-scsi-pci,id=scsi \
        -drive if=none,file=/dev/win/cdrive,format=raw,cache=none,id=win-c -device scsi-hd,drive=win-c -drive if=none,file=/dev/win/ddrive,format=raw,cache=none,id=win-d \
        -device scsi-hd,drive=win-d -drive if=none,format=raw,file=/dev/sr0,id=blu-ray -device scsi-block,drive=blu-ray -device virtio-net-pci,netdev=net0 \
        -netdev bridge,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper &

sleep 5

cpuid=4
for threadpid in $(echo 'query-cpus' | qmp-shell /run/qmp-sock | grep '^(QEMU) {"return":' | sed -e 's/^(QEMU) //' | jq -r '.return[].thread_id'); do
        taskset -p -c ${cpuid} ${threadpid}
        ((cpuid+=1))
done

wait

for i in {4..7}; do
        echo ondemand > /sys/devices/system/cpu/cpu${i}/cpufreq/scaling_governor
        #cat /sys/devices/system/cpu/cpu${i}/cpufreq/scaling_governor
done

I force the CPU cores assigned to the VM to run at their maximum frequency for the duration of the guest, after which they scale back down to their normal on-demand mode. I found this smooths things out a little more and helps provide something approaching a physical-machine experience, even though I'm using more power to get there. I'm also using qmp-shell to look up the PIDs of the vCPU threads and pin each of them to an individual pCPU.
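
If you want to confirm the pinning actually took, ps can show which processor each QEMU thread last ran on (this assumes a single qemu-system-x86_64 instance is running):

# PSR is the core each thread last executed on; the vCPU threads should sit on 4-7
ps -Lo tid,psr,comm -p $(pgrep -f qemu-system-x86_64)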

I ended up using the q35 machine type instead of the default. I'm not positive this matters, but I did end up adding the ioh3420 PCIe root port device later in my testing and it really did seem to improve performance a little more. Whether that requires q35, I'm not certain. In any case, once the devices were detected and running under Windows after I first moved from physical to virtual, it wasn't worth it to me to switch back to the default machine type. I'm also using the legacy SeaBIOS instead of OVMF, since I was migrating from physical to virtual and it was too much trouble trying to make UEFI boot work after the fact.

Initially I wasn't using virtio-based hardware, so you'll possibly need to change that to get up and running, and then add in the virtio devices and load the proper drivers afterwards. I ran into some weirdness here for a long time where Windows 7 kept crashing while trying to install the drivers for either virtio-blk-pci or virtio-scsi-pci. I was using the current testing kernel (linux-image-3.16.0-4-amd64) and never really found a solution. I did install a clean copy of Windows and was able to install the virtio drivers there, but that didn't really help me with the existing guest. I finally installed the latest unstable kernel, linux-image-4.0.0-2-amd64, and was then able to install the virtio drivers without the guest OS crashing. I have no idea if that was the actual fix, but it seemed to be the relevant change.
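
If you're making the same physical-to-virtual move, the general approach (sketched below with a made-up scratch image path, not my exact commands) is to leave the boot disk on non-virtio hardware, attach a small throwaway disk on the virtio controller so Windows sees the new device and installs its driver, and only then switch the real drives over to the scsi-hd setup in the script above:

# create a small scratch disk to install the virtio drivers against
qemu-img create -f raw /var/tmp/virtio-stage.img 1G

# and add these devices to the (still non-virtio) qemu invocation:
#   -device virtio-scsi-pci,id=scsi
#   -drive if=none,file=/var/tmp/virtio-stage.img,format=raw,id=stage
#   -device scsi-hd,drive=stage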

Another thing that took a while to figure out was how to properly pass through my Blu-ray drive to Windows so that things like AnyDVD HD worked correctly. I finally stumbled across this PDF, which actually included QEMU commands for doing the passthrough. It ended up being a simple change from scsi-cd to scsi-block.
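
In other words, the drive stanza went from the first form below to the second, which is what's in the launch script above; as I understand it, scsi-block hands the guest the whole SCSI device instead of emulating a CD-ROM on top of it:

# emulated optical drive -- not enough for AnyDVD HD
-drive if=none,format=raw,file=/dev/sr0,id=blu-ray -device scsi-cd,drive=blu-ray

# full SCSI passthrough of the drive -- this is what ended up working
-drive if=none,format=raw,file=/dev/sr0,id=blu-ray -device scsi-block,drive=blu-ray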

I also had to forcibly set the GPU and audio drivers under Windows to use message-signaled interrupts (MSI) by following these directions. Before doing this, the audio was atrocious and the video was pretty awful too.
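
For reference, the gist of those directions is a single per-device registry value in the guest; the device instance path below is a placeholder, so find your GPU's and audio device's actual keys under Enum\PCI first, and reboot the guest afterwards:

HKLM\SYSTEM\CurrentControlSet\Enum\PCI\<device instance path>\Device Parameters\Interrupt Management\MessageSignaledInterruptProperties
    MSISupported (DWORD) = 1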

That's most of it, I think. When I originally posted this, I still wasn't quite happy with the performance of everything. However, in the current incarnation, aside from the possibly excessive power consumption from keeping the CPUs running at full tilt, I'm actually really happy with the performance. Hopefully other people will find this useful too!