Incus Fleeting Plugin
Over the last few days I’ve built a new fleeting plugin for Incus, the fork of Canonical’s LXD. With the plugin it’s now possible to use Incus hypervisors with the Gitlab autoscaler runner. The autoscaler runner creates and destroys VMs based on configured limits and is mostly cloud- and hypervisor-agnostic. You can find the project here:
github.com/pkramme/fleeting-plugin-incus
Gitlab’s autoscaler runner
Gitlab introduced a new way to create autoscaling CI/CD infrastructure with the “fleeting” library and their new autoscaler runner. Officially, Gitlab supports the big three cloud providers: AWS, Google Cloud and Azure. The fleeting system is supposed to replace the long-abandoned docker-machine runner, which was previously the only way to let a gitlab-runner create and destroy VMs based on the number of jobs, within certain limits.
Unfortunately, until now the problem with private CI/CD infrastructure wasn’t really solved - the old docker-machine driver was technically able to create VMs on private infrastructure with docker-machine-specific plugins. The new fleeting system unofficially supports other systems as well, such as Kubernetes, but I wasn’t able to find plugins for any private on-prem hypervisor backends.
A plugin is a standalone binary that is loaded by the gitlab-runner process. The implementation follows a client-server model: the plugin imports a specific library for communicating with the fleeting system, in this case the runner. The runner, through the fleeting library, can then request new instances, ask for connection details or destroy existing VMs.
Why autoscaling outside of clouds?
So why build something new? Scaling for load isn’t the only thing the autoscaler does. Without the new autoscaler or the old docker-machine runner, some workloads cannot be safely executed. Building Docker images with DinD inside Gitlab CI in particular requires the --privileged flag, which gives the CI job effectively complete control over the host system. If you run these kinds of jobs on a Gitlab instance with multiple privilege levels without using different runners per level and project, you are asking for trouble. Even then, jobs running after a runner was compromised will obviously be unsafe, too.
Previously you could either
- set up many runners
- use docker-machine with its drawbacks
- use kaniko
- roll the dice
Now, you can use the autoscaler with its plugins to create a new VM for every job and be reasonably safe from supply chain attacks and bad actors. And with this plugin you can do this on private infrastructure.
Other benefits
- Compared to one Docker-based runner
  - Better resource isolation through the use of VMs; no priority inversions or memory stalls affecting other jobs
  - Better security through the use of VMs instead of Linux namespace features
- Compared to public clouds
  - Depending on the hardware used, much better performance in all regards
  - Better security through dedicated and/or on-prem infrastructure: no data leaks, no side-channel attacks, etc.
  - Much better cost effectiveness, including not being affected by “run-away” usage costs
  - Possibly better compliance with local regulations, such as GDPR
- Compared to k8s-based runners
  - All default runner features work, and migration is possible without any code changes to any pipelines
  - DinD-style workloads are possible
Installation and Configuration
The installation is simple, because fleeting plugins are just binaries that are executed by gitlab-runner.
git clone https://github.com/pkramme/fleeting-plugin-incus
cd fleeting-plugin-incus/cmd/fleeting-plugin-incus
go build
install fleeting-plugin-incus /usr/local/bin/fleeting-plugin-incus
The plugin also needs a VM image it can start. Create a base VM with Incus.
# generate SSH keys
ssh-keygen -t ed25519
# create a base VM
incus launch images:ubuntu/22.04 runner-base --vm
incus exec runner-base bash
Then, prepare the VM.
# inside the VM
apt-get update && apt-get install openssh-server docker.io nano -y
# deploy your generated ssh public key inside VM
mkdir ~/.ssh/
nano ~/.ssh/authorized_keys
# shutdown
poweroff
Now, publish the VM image.
incus publish runner-base --alias runner-base --reuse
After that, configure the plugin according to the Gitlab documentation. The plugin supports a few configuration options; at minimum, you need to set the base image correctly.
[runners.autoscaler.plugin_config]
incus_image = "runner-base"
incus_instance_key_path = "/root/.ssh/id_ed25519"
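For context, the fragment above sits inside a standard docker-autoscaler runner section of config.toml. The sketch below shows roughly where it fits; everything outside plugin_config is generic gitlab-runner autoscaler configuration, and the URL, token and limit values are illustrative assumptions, not recommendations.

```toml
# Illustrative sketch of a runner entry; adapt names and limits to your
# environment. Only the plugin_config table is specific to this plugin.
[[runners]]
  name = "incus-autoscaler"
  url = "https://gitlab.example.com"   # placeholder
  token = "TOKEN"                      # placeholder
  executor = "docker-autoscaler"

  [runners.docker]
    image = "busybox:latest"

  [runners.autoscaler]
    plugin = "fleeting-plugin-incus"
    capacity_per_instance = 1   # one job per VM
    max_use_count = 1           # destroy the VM after one job
    max_instances = 10

    [runners.autoscaler.plugin_config]
      incus_image = "runner-base"
      incus_instance_key_path = "/root/.ssh/id_ed25519"

    [[runners.autoscaler.policy]]
      idle_count = 2
      idle_time = "20m0s"
```

Setting capacity_per_instance and max_use_count to 1 is what gives the one-VM-per-job isolation described above.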
After a systemctl restart gitlab-runner you should see something like this:
root@gitlab-runner-2:~# incus ls runner -f compact
NAME STATE IPV4 IPV6 TYPE SNAPSHOTS
runner-kelflllsrb RUNNING 172.17.0.1 (docker0) fd42:7bc9:1310:295c:216:3eff:fe02:9cd7 (enp5s0) VIRTUAL-MACHINE 0
10.199.247.164 (enp5s0)
runner-mhuwwvpfrd RUNNING 172.17.0.1 (docker0) fd42:7bc9:1310:295c:216:3eff:fe5c:b388 (enp5s0) VIRTUAL-MACHINE 0
10.199.247.45 (enp5s0)
Additionally, I suggest configuring a Docker registry pull-through cache and S3-based caching.