Single-node K8s: KinD, Istio, & MetalLB
I have a cheap dedicated server with the German hosting company Hetzner. It's just for my own personal use, with lots of disk space to satisfy my data hoarding and enough RAM to do the things I need to do. I have it mostly because my home internet is slow. Over time, the services piled up -- Plex, a Bitcoin node, an IPFS gateway, my e-mail, and recently, in my attempts to de-Google my life, a search engine, a YouTube replacement, a Twitter replacement, and a pastebin replacement, among other things. (My rants about big data and the erosion of privacy are for another time!) Those services have been running with docker-compose, but since I've been working with Kubernetes for my master's thesis project (a distributed stock exchange order matching engine running on K8s), I wanted to move them over to that. Aside from practicing my Kubernetes-fu, I wanted a setup that could easily be transferred from my single server to a more robust multi-node, potentially cloud-hosted platform when I eventually upgrade.
The problem until recently has been that the single-server Kubernetes options were not great. Minikube worked fine but had the added overhead of a VM, and attaching host networking so I could expose my public IP was difficult. Other Docker-based options I tried worked well but had networking issues or other problems that made them not quite a full K8s experience. However, recently I tried Kubernetes in Docker (KinD), and found that none of the issues I had previously were present. It worked without hassle or extra setup, and it was easy to mount my host folders into the K8s nodes for services such as Plex and my backup jobs. And while logically there are many extra layers, those translate physically into nothing more than namespaces and other context wrappers -- which means they don't suffer the performance drawbacks of something like a VM. (I was initially concerned that the KinD nodes were VMs wrapped in containers a la Rancher VM, but it turns out they're just containerd wrapped in containers.) Obviously, there is still overhead, but it was little enough that the benefits finally outweighed the costs.
Although the configuration works well, there were a handful of small issues I ran into, so I thought I'd describe how I did it -- my single-node full Kubernetes instance with Kubernetes-in-Docker, MetalLB, and Istio. There's one hacky workaround to tunnel inbound traffic from the host into Kubernetes, but other than that, it's all mostly straightforward.
The first step is to have Docker and the KinD binary installed on your system. From there, you do a normal KinD installation using whatever configuration you want. For me, that's a basic installation except I've added some host directories to use for app storage rather than a third party PV provider. (I considered setting up a storage backend like Minio S3 to gain some experience with volumes and storage classes, but at some point practicality needs to win, and especially for things like Plex, the unnecessary overhead was more than a philosophical issue.)
Wanting to use the most recent version of Kubernetes, which at the time of this writing was 1.18.4, I manually specified the node image, and ran the setup:
$ kind create cluster --image docker.io/kindest/node:v1.18.4 \
    --name kind --config /path/to/kind-config.yaml
The config file merely specified one control plane node and three workers, with my host storage mounted on every worker node. (This way, I could specify the node path in the application's pod spec and know that it would be there on whatever node the pod was created.)
$ cat kind-config.yaml
---
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  extraMounts:
  - hostPath: /path/to/host/storage
    containerPath: /mnt/host_storage/
- role: worker
  extraMounts:
  - hostPath: /path/to/host/storage
    containerPath: /mnt/host_storage/
- role: worker
  extraMounts:
  - hostPath: /path/to/host/storage
    containerPath: /mnt/host_storage/
---
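Because every worker sees the host directory at the same container path, a pod can reach it with an ordinary hostPath volume. The function below is a minimal sketch that prints such a manifest. The pod name, the busybox image, and the /data mount point are made up for illustration; only /mnt/host_storage/ comes from the KinD config above.

```shell
# Print a pod manifest that mounts the shared host directory.
# Assumption: "storage-demo", the busybox image, and /data are illustrative
# placeholders; only /mnt/host_storage/ is taken from the KinD config.
print_storage_pod() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: storage-demo
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: host-storage
      mountPath: /data
  volumes:
  - name: host-storage
    hostPath:
      path: /mnt/host_storage/
      type: Directory
EOF
}

# Example: print_storage_pod | kubectl apply -f -
```

Since the control-plane node has no such mount, this relies on ordinary pods being scheduled onto the workers, which is the default behaviour while the control plane keeps its taint.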
At this point, once KinD has bootstrapped the cluster, you need to either get the raw kubeconfig data by running kind get kubeconfig, or let KinD set your kubectl context automatically by running kind export kubeconfig. Ideally, everything shows up as working correctly:
$ kubectl get all --all-namespaces
NAMESPACE            NAME                                             READY   STATUS    RESTARTS   AGE
kube-system          pod/coredns-66bff467f8-blj4s                     1/1     Running   0          69s
kube-system          pod/coredns-66bff467f8-jvtn5                     1/1     Running   0          69s
kube-system          pod/etcd-kind-control-plane                      1/1     Running   0          78s
kube-system          pod/kindnet-fgmqb                                1/1     Running   0          52s
kube-system          pod/kindnet-k8xpw                                1/1     Running   0          69s
kube-system          pod/kindnet-q47hs                                1/1     Running   0          52s
kube-system          pod/kindnet-rflnb                                1/1     Running   2          52s
kube-system          pod/kube-apiserver-kind-control-plane            1/1     Running   0          78s
kube-system          pod/kube-controller-manager-kind-control-plane   1/1     Running   0          78s
kube-system          pod/kube-proxy-4w24t                             1/1     Running   0          52s
kube-system          pod/kube-proxy-6k5zx                             1/1     Running   0          69s
kube-system          pod/kube-proxy-tz2vh                             1/1     Running   0          52s
kube-system          pod/kube-proxy-xgf7j                             1/1     Running   0          52s
kube-system          pod/kube-scheduler-kind-control-plane            1/1     Running   0          78s
local-path-storage   pod/local-path-provisioner-67795f75bd-jx4dg      1/1     Running   0          69s

NAMESPACE     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                  86s
kube-system   service/kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   85s

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   daemonset.apps/kindnet      4         4         4       4            4           <none>                   84s
kube-system   daemonset.apps/kube-proxy   4         4         4       4            4           kubernetes.io/os=linux   85s

NAMESPACE            NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
kube-system          deployment.apps/coredns                  2/2     2            2           85s
local-path-storage   deployment.apps/local-path-provisioner   1/1     1            1           83s

NAMESPACE            NAME                                                DESIRED   CURRENT   READY   AGE
kube-system          replicaset.apps/coredns-66bff467f8                  2         2         2       69s
local-path-storage   replicaset.apps/local-path-provisioner-67795f75bd   1         1         1       69s
and the host storage mounts are accessible everywhere an application might be deployed:
$ docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED         STATUS         PORTS                       NAMES
0c981780e342   kindest/node:v1.18.4   "/usr/local/bin/entr…"   2 minutes ago   Up 2 minutes   127.0.0.1:41459->6443/tcp   kind-control-plane
ecc5a3120914   kindest/node:v1.18.4   "/usr/local/bin/entr…"   2 minutes ago   Up 2 minutes                               kind-worker
c126cf79ed65   kindest/node:v1.18.4   "/usr/local/bin/entr…"   2 minutes ago   Up 2 minutes                               kind-worker3
5c05cf09e5d6   kindest/node:v1.18.4   "/usr/local/bin/entr…"   2 minutes ago   Up 2 minutes                               kind-worker2

$ docker exec -it ecc5a3120914 ls -lAh /mnt/host_storage/
total 4.0K
-rw-r--r-- 1 1000 1000 11 Jul 8 04:17 file_on_host.txt

$ docker exec -it c126cf79ed65 ls -lAh /mnt/host_storage/
total 4.0K
-rw-r--r-- 1 1000 1000 11 Jul 8 04:17 file_on_host.txt

$ docker exec -it 5c05cf09e5d6 ls -lAh /mnt/host_storage/
total 4.0K
-rw-r--r-- 1 1000 1000 11 Jul 8 04:17 file_on_host.txt
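The same check can be scripted rather than typed once per container ID. This is only a sketch: the function name is invented here, and it assumes the KinD node names (as printed by kind get nodes) are usable as Docker container names, which is how KinD names its node containers.

```shell
# Report whether a directory exists inside each named KinD node container.
# Usage: check_mount <path> <node>...
check_mount() {
  path="$1"
  shift
  for node in "$@"; do
    # "test -d" runs inside the node container and sets the exit status.
    if docker exec "$node" test -d "$path"; then
      echo "$node: ok"
    else
      echo "$node: MISSING"
    fi
  done
}

# Example: check_mount /mnt/host_storage/ $(kind get nodes --name kind | grep worker)
```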
I had in my head that Istio could also take over the function of the base overlay CNI network for Kubernetes, but I did not manage to get that working, so I let KinD install its default simple CNI network "kindnetd", and then I installed Istio on top of that. Even though I probably should have used the Kubernetes-native way of using a Helm chart, I'm lazy and just used the default profile that comes with Istio's istioctl tool:
$ istioctl install
This will install the default Istio profile into the cluster. Proceed? (y/N) Y
Detected that your cluster does not support third party JWT authentication. Falling back to less secure first party JWT. See https://istio.io/docs/ops/best-practices/security/#configure-third-party-service-account-tokens for details.
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Addons installed
✔ Installation complete
And now we see Istio up and running, with its ingress gateway waiting patiently for an external IP that we will give it shortly.
$ kubectl get all -n istio-system
NAME                                        READY   STATUS    RESTARTS   AGE
pod/istio-ingressgateway-66c7db878f-gp9vz   1/1     Running   0          3m31s
pod/istiod-7cdc645bb4-2h75j                 1/1     Running   0          4m15s
pod/prometheus-5c84c494dd-879h4             2/2     Running   0          3m31s

NAME                           TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                      AGE
service/istio-ingressgateway   LoadBalancer   10.100.197.167   <pending>     15021:31353/TCP,80:32377/TCP,443:31939/TCP,15443:30195/TCP   3m31s
service/istiod                 ClusterIP      10.105.18.223    <none>        15010/TCP,15012/TCP,443/TCP,15014/TCP,53/UDP,853/TCP         4m15s
service/prometheus             ClusterIP      10.107.187.215   <none>        9090/TCP                                                     3m31s

NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/istio-ingressgateway   1/1     1            1           3m31s
deployment.apps/istiod                 1/1     1            1           4m15s
deployment.apps/prometheus             1/1     1            1           3m31s

NAME                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/istio-ingressgateway-66c7db878f   1         1         1       3m31s
replicaset.apps/istiod-7cdc645bb4                 1         1         1       4m15s
replicaset.apps/prometheus-5c84c494dd             1         1         1       3m31s

NAME                                                       REFERENCE                         TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/istio-ingressgateway   Deployment/istio-ingressgateway   <unknown>/80%   1         5         1          3m31s
horizontalpodautoscaler.autoscaling/istiod                 Deployment/istiod                 <unknown>/80%   1         5         1          4m15s
At this point, everything works, but I still need to somehow route traffic from the host into the Kubernetes network. Since I wanted the manifests portable, I also wanted to be able to specify LoadBalancer as a service type, instead of just always using NodePorts for my configurations. After some research I ended up turning to MetalLB, the "bare metal load balancer" -- i.e. a set of configurations to set up Kubernetes with manual IP addresses.
The first issue I ran into with MetalLB was that it didn't create its namespace metallb-system and it would fail to install if that wasn't present. So I manually created the namespace by applying the following simple yaml:
---
apiVersion: v1
kind: Namespace
metadata:
  name: metallb-system
  labels:
    app: metallb
    istio-injection: disabled
---
Next was to create a basic config for it, describing which IP addresses it could give out as LoadBalancer addresses. I don't know if this was the right thing to do or not, but I ended up deciding to use the node IP address space for the KinD nodes. I was worried that if I told it to always use my host public IP that that might cause issues. I wasn't sure if the node IP address space would work or be visible from the nodes or services, but I tried it out and it worked... for the most part. The cluster nodes and the pods inside the cluster could access the IP, but the host machine could not. (Don't worry, we'll hack a fix for that shortly.)
The node cluster IP space can be found a number of ways. You can run
$ docker network inspect kind
or you can just query a node directly with something like
$ docker exec -it 0c981780e342 ip addr show dev eth0
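If only the subnet is of interest, Docker's Go-template output can pull it out of the network description directly. The helper below is a small sketch with an invented function name; it defaults to the network called kind, which is the one the KinD node containers are attached to here.

```shell
# Print the subnet(s) Docker assigned to a network, separated by spaces.
# Usage: network_subnets [network-name]   (defaults to "kind")
network_subnets() {
  docker network inspect \
    --format '{{range .IPAM.Config}}{{.Subnet}} {{end}}' \
    "${1:-kind}"
}

# Example: network_subnets kind
```

A network with both IPv4 and IPv6 configured prints one subnet per address family.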
My cluster's node IP space was 172.18.0.0/16 with nodes being assigned inside 172.18.0.0/24. I decided to use 172.18.5.0/24 and 172.18.6.0/24, figuring ~500 IPs would be plenty for whatever I was going to do. So the config map yaml which I applied next was the following:
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: custom-ip-space
      protocol: layer2
      addresses:
      - 172.18.5.2-172.18.5.254
      - 172.18.6.2-172.18.6.254
---
I didn't think I needed to leave room for a gateway and broadcast, since this wasn't actually a new network, but I also wasn't sure how MetalLB worked, so I figured better safe than sorry. (Editor's note: It turns out MetalLB works at a higher level than that, using ARP spoofing or alternatively BGP routing to direct traffic to the correct cluster node.)
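For reference, the size of the pool in that config map is easy to check with shell arithmetic. This sketch converts dotted-quad addresses to integers and counts an inclusive range; the function names are purely illustrative.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Count the addresses in an inclusive range: range_count <first> <last>
range_count() {
  echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

# The two ranges handed to MetalLB hold 253 addresses each, 506 in total.
first=$(range_count 172.18.5.2 172.18.5.254)
second=$(range_count 172.18.6.2 172.18.6.254)
echo $(( first + second ))   # prints 506
```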
Lastly, before MetalLB will work, we need to generate a random secret for it to use. The secret is nothing more than 128 random bytes, base64-encoded, with the line breaks intact. Either of the following commands would generate this secret string:
$ dd if=/dev/urandom bs=1 count=128 2>/dev/null | base64
$ openssl rand -base64 128
The secret has to have the name memberlist, be in the metallb-system namespace, and be a single key/value, with the key name secretkey and the value being the random bytes generated earlier.
An example command to create this new secret might be:
$ kubectl create secret generic -n metallb-system \
    memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
Once this new secret is created, you can just install the latest version of MetalLB directly from the source. In my instance, the most recent release on GitHub was 0.9.3, so I applied the following:
$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
which created several new resources:
$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
podsecuritypolicy.policy/controller created
podsecuritypolicy.policy/speaker created
serviceaccount/controller created
serviceaccount/speaker created
clusterrole.rbac.authorization.k8s.io/metallb-system:controller created
clusterrole.rbac.authorization.k8s.io/metallb-system:speaker created
role.rbac.authorization.k8s.io/config-watcher created
role.rbac.authorization.k8s.io/pod-lister created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:controller created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:speaker created
rolebinding.rbac.authorization.k8s.io/config-watcher created
rolebinding.rbac.authorization.k8s.io/pod-lister created
daemonset.apps/speaker created
deployment.apps/controller created
With that, the basic installation is complete. Istio is configured and any services that we assign LoadBalancers to will get an IP from the 172.18.5.0/24 or 172.18.6.0/24 ranges and be accessible from all pods and nodes. Services can also explicitly request/assign their own IP from within those ranges, which will come in handy shortly.
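Requesting a specific address is done with the standard spec.loadBalancerIP field on the Service, which MetalLB honours as long as the address lies inside a configured pool. The function below prints a minimal example manifest. The service name, the app selector label, the ports, and the address in the usage line are all hypothetical.

```shell
# Print a LoadBalancer Service that asks for a specific address.
# Usage: print_pinned_service <name> <ip>
# Assumption: the "app: <name>" selector and ports 80 -> 8080 are placeholders.
print_pinned_service() {
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${1}
spec:
  type: LoadBalancer
  loadBalancerIP: ${2}
  selector:
    app: ${1}
  ports:
  - port: 80
    targetPort: 8080
EOF
}

# Example: print_pinned_service demo-app 172.18.6.10 | kubectl apply -f -
```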
But first, let's run get all and make sure that everything is correct, including Istio's ingress gateway getting an IP address:
$ kubectl get all --all-namespaces
NAMESPACE            NAME                                             READY   STATUS    RESTARTS   AGE
istio-system         pod/istio-ingressgateway-66c7db878f-gp9vz        1/1     Running   0          107m
istio-system         pod/istiod-7cdc645bb4-2h75j                      1/1     Running   0          108m
istio-system         pod/prometheus-5c84c494dd-879h4                  2/2     Running   0          107m
kube-system          pod/coredns-66bff467f8-blj4s                     1/1     Running   0          119m
kube-system          pod/coredns-66bff467f8-jvtn5                     1/1     Running   0          119m
kube-system          pod/etcd-kind-control-plane                      1/1     Running   0          119m
kube-system          pod/kindnet-fgmqb                                1/1     Running   0          119m
kube-system          pod/kindnet-k8xpw                                1/1     Running   0          119m
kube-system          pod/kindnet-q47hs                                1/1     Running   0          119m
kube-system          pod/kindnet-rflnb                                1/1     Running   2          119m
kube-system          pod/kube-apiserver-kind-control-plane            1/1     Running   0          119m
kube-system          pod/kube-controller-manager-kind-control-plane   1/1     Running   0          119m
kube-system          pod/kube-proxy-4w24t                             1/1     Running   0          119m
kube-system          pod/kube-proxy-6k5zx                             1/1     Running   0          119m
kube-system          pod/kube-proxy-tz2vh                             1/1     Running   0          119m
kube-system          pod/kube-proxy-xgf7j                             1/1     Running   0          119m
kube-system          pod/kube-scheduler-kind-control-plane            1/1     Running   0          119m
local-path-storage   pod/local-path-provisioner-67795f75bd-jx4dg      1/1     Running   0          119m
metallb-system       pod/controller-57f648cb96-7q9cx                  1/1     Running   0          72s
metallb-system       pod/speaker-4pf7b                                1/1     Running   0          72s
metallb-system       pod/speaker-c22hp                                1/1     Running   0          72s
metallb-system       pod/speaker-c2f2l                                1/1     Running   0          72s
metallb-system       pod/speaker-crx56                                1/1     Running   0          72s

NAMESPACE      NAME                           TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                      AGE
default        service/kubernetes             ClusterIP      10.96.0.1        <none>        443/TCP                                                      119m
istio-system   service/istio-ingressgateway   LoadBalancer   10.100.197.167   172.18.5.2    15021:31353/TCP,80:32377/TCP,443:31939/TCP,15443:30195/TCP   107m
istio-system   service/istiod                 ClusterIP      10.105.18.223    <none>        15010/TCP,15012/TCP,443/TCP,15014/TCP,53/UDP,853/TCP         108m
istio-system   service/prometheus             ClusterIP      10.107.187.215   <none>        9090/TCP                                                     107m
kube-system    service/kube-dns               ClusterIP      10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP                                       119m

NAMESPACE        NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
kube-system      daemonset.apps/kindnet      4         4         4       4            4           <none>                        119m
kube-system      daemonset.apps/kube-proxy   4         4         4       4            4           kubernetes.io/os=linux        119m
metallb-system   daemonset.apps/speaker      4         4         4       4            4           beta.kubernetes.io/os=linux   72s

NAMESPACE            NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
istio-system         deployment.apps/istio-ingressgateway     1/1     1            1           107m
istio-system         deployment.apps/istiod                   1/1     1            1           108m
istio-system         deployment.apps/prometheus               1/1     1            1           107m
kube-system          deployment.apps/coredns                  2/2     2            2           119m
local-path-storage   deployment.apps/local-path-provisioner   1/1     1            1           119m
metallb-system       deployment.apps/controller               1/1     1            1           72s

NAMESPACE            NAME                                                DESIRED   CURRENT   READY   AGE
istio-system         replicaset.apps/istio-ingressgateway-66c7db878f     1         1         1       107m
istio-system         replicaset.apps/istiod-7cdc645bb4                   1         1         1       108m
istio-system         replicaset.apps/prometheus-5c84c494dd               1         1         1       107m
kube-system          replicaset.apps/coredns-66bff467f8                  2         2         2       119m
local-path-storage   replicaset.apps/local-path-provisioner-67795f75bd   1         1         1       119m
metallb-system       replicaset.apps/controller-57f648cb96               1         1         1       72s

NAMESPACE      NAME                                                       REFERENCE                         TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
istio-system   horizontalpodautoscaler.autoscaling/istio-ingressgateway   Deployment/istio-ingressgateway   <unknown>/80%   1         5         1          107m
istio-system   horizontalpodautoscaler.autoscaling/istiod                 Deployment/istiod                 <unknown>/80%   1         5         1          108m
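Rather than scanning the full listing, the assigned address can also be read straight from the service status with a jsonpath query. The helper name below is invented for this sketch; the field it reads, status.loadBalancer.ingress, is where a load balancer implementation such as MetalLB records the IP it handed out.

```shell
# Print the external IP assigned to a LoadBalancer service.
# Usage: lb_ip <namespace> <service>
lb_ip() {
  kubectl get service "$2" -n "$1" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Example: lb_ip istio-system istio-ingressgateway
```

An empty result would mean the service is still waiting for an address.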
Looks good. The only problem is that the host (and outside traffic) cannot connect inbound to that load balancer IP address. To fix this, we finally get to the hacky band-aid patch that I'm not thrilled about but which has so far worked extremely well: we run socat in the host's Docker as a reverse proxy, connecting a port on the host to the KinD node network space.
As an example, let us forward the host's TCP port 80 to the Istio ingress gateway's load balancer on TCP port 80. To do this, on the host machine's Docker, we run a socat instance like so:
$ docker run -d --network kind -p 80:80 docker.io/alpine/socat:latest \
    -dd tcp-listen:80,fork,reuseaddr tcp-connect:172.18.5.2:80
You can see at the end where the load balancer IP address goes, and you can also see with the "-p 80:80" how the container port is mapped from the host. This creates a straight line from a host port bind over to socat, over to the load balancer service inside the cluster. If we imagine UDP port 9000 was also listening on the Istio ingress gateway, we could do a UDP port forward from the host in a similar manner:
$ docker run -d --network kind -p "9000:9000/udp" \
    docker.io/alpine/socat:latest -dd \
    UDP4-RECVFROM:9000,fork UDP4-SENDTO:172.18.5.2:9000
(The -dd flag is just the verbosity of logging/debugging output and can be omitted.)
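When several ports need forwarding, the TCP variant of the command wraps naturally into a small function. This is only a sketch with an invented name; it assumes the host port and the load balancer port are the same number and leaves out the optional -dd flag.

```shell
# Start a socat container that forwards a host TCP port to the same port
# on a target IP inside the "kind" Docker network.
# Usage: forward_tcp <port> <target-ip>
forward_tcp() {
  docker run -d --network kind -p "$1:$1" \
    docker.io/alpine/socat:latest \
    "tcp-listen:$1,fork,reuseaddr" "tcp-connect:$2:$1"
}

# Example: forward_tcp 80 172.18.5.2
```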
For my setup, I have a single docker-compose file with each load-balanced service port getting its own socat container. An example might look like:
---
version: "3"
services:
  socat_istio_ingress:
    image: docker.io/alpine/socat:latest
    restart: unless-stopped
    container_name: socat_istio_ingress
    networks:
      - kind
    ports:
      - "0.0.0.0:80:80/tcp"
    command: "-dd tcp-listen:80,fork,reuseaddr tcp-connect:172.18.5.2:80"
  socat_istio_ingress_udp9000:
    image: docker.io/alpine/socat:latest
    restart: unless-stopped
    container_name: socat_istio_ingress_udp9000
    networks:
      - kind
    ports:
      - "0.0.0.0:9000:9000/udp"
    command: "-dd UDP4-RECVFROM:9000,fork UDP4-SENDTO:172.18.5.2:9000"
networks:
  kind:
    external:
      name: kind
---
This feels too hacky to me, but it works phenomenally well, and I have had no issues with speed or bandwidth. It also allows me to cut off external traffic without touching the Kubernetes cluster. There are only two major downsides. The first is that I no longer get source IP addresses, since as far as K8s is concerned, all traffic originates from the socat container IPs. That's not an issue for me (yet) on my single-node personal setup, but it would be problematic for organizations looking to block DDoS attacks, gather user geolocation metrics, or do other things that rely on source IP addresses. The other (and lesser) downside is that I can't connect from the host to NodePort addresses, so I have to set those as load balancers and map the host port as a localhost bind rather than a 0.0.0.0 bind. I know I could use iptables for the port forwarding, and that's probably the best solution for everything, but I want to keep things containerized and predictable without messing around with host networking. This setup is mostly so I can get some K8s experience, utilizing its full feature set, so I don't mind some hacking around the edges to make my setup work. (As an aside, kubectl port-forward also works reasonably well and you can set the timeout to infinity, so I might just end up using that.)
Editor's note: After posting this article, I found out about a recent feature in KinD which is an additional configuration option extraPortMappings that lets you bind host ports to KinD container ports, similar to Docker's port forwarding option. This solves the issue and makes my socat hack unnecessary, and it is what I have switched to for forwarding incoming traffic.
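The note doesn't say which ports ended up being mapped, so the following is only a guess at what such a config could look like. It prints a KinD cluster config that binds host port 80 to container port 32377 on the control-plane node, 32377 being the NodePort behind port 80 of the Istio ingress gateway in the listings above. The function name is invented, and the worker nodes and their mounts are left out for brevity.

```shell
# Print a KinD config fragment that maps a host port onto a node port.
# Assumption: 32377 is the NodePort serving port 80 of the ingress gateway;
# NodePorts are assigned per installation, so the number would differ elsewhere.
print_port_mapping_config() {
  cat <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 32377
    hostPort: 80
    listenAddress: "0.0.0.0"
    protocol: TCP
EOF
}

# Example: print_port_mapping_config > kind-config.yaml
```

Port mappings are applied when the node containers are created, so using them means recreating the cluster with the new config.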
Even with the few small issues, this setup does indeed provide a full feature-complete Kubernetes on my single bare-metal server, with the ability to assign load balancers to services. This allows my configuration to be more portable (even though I'll have to remove the manually specified LB IP addresses). And it allows me to work with and practice using Kubernetes in managing my own services. And of course, while there is certainly overhead to all of this, it's not too terrible. Plex still works fine, and I get all the added benefits of Istio, such as easy monitoring and tracing. (Although instead of Istio's ingress gateway, I use Traefik 2 as my ingress.)
There wasn't too much that was difficult about this, but it was still a kind of interesting setup, and I had to google a few times to find all of the information to glue all the bits and pieces together. So I thought it was still worth putting out there.
Plus, I needed content for my new homepage...