Single-node K8s: KinD, Istio, & MetalLB
I have a cheap dedicated server with the German hosting company Hetzner. It's just for my own personal use, with lots of disk space to satisfy my data hoarding and enough RAM to do the things I need to do. I have it mostly because my home internet is slow. Over time it has accumulated a lot of services -- Plex, a Bitcoin node, an IPFS gateway, my e-mail, and recently, in my attempts to de-Google my life, a search engine, a YouTube replacement, a Twitter replacement, and a pastebin replacement, among other things. (My rants about big data and the erosion of privacy are for another time!) Those services have been running with docker-compose, but since I've been working with Kubernetes for my master's thesis project (a distributed stock exchange order matching engine running on K8s), I wanted to move them over to that. Aside from practicing my Kubernetes-fu, I wanted a setup that could easily be transferred from my single server to a more robust multi-node, potentially cloud-hosted platform when I eventually upgrade.
The problem until recently has been that the single-server Kubernetes options were not great. Minikube worked fine but had the added overhead of a VM, and attaching host networking so I could expose my public IP was difficult. Other Docker-based options I tried worked well enough but had networking issues or other problems that made them not quite a full K8s experience. Recently, however, I tried Kubernetes in Docker (KinD) and found that none of the issues I had hit before were present. It worked without hassle or extra setup, and it was easy to mount my host folders into the K8s nodes for services such as Plex and my backup jobs. And while logically there are many extra layers, those translate physically into nothing more than namespaces and other context wrappers -- which means they don't suffer the performance drawbacks of something like a VM. (I was initially concerned that the KinD nodes were VMs wrapped in containers, à la RancherVM, but it turns out they're just containerd wrapped in containers.) Obviously, there is still overhead, but it was little enough that the benefits finally outweighed the costs.
Although the configuration works well, there were a handful of small issues I ran into, so I thought I'd describe how I did it -- my single-node full Kubernetes instance with Kubernetes-in-Docker, MetalLB, and Istio. There's one hacky workaround to tunnel inbound traffic from the host into Kubernetes, but other than that, it's all mostly straightforward.
The first step is to have Docker and the KinD binary installed on your system. From there, you do a normal KinD installation using whatever configuration you want. For me, that's a basic installation, except I've added some host directories to use for app storage rather than a third-party PV provisioner. (I considered setting up an S3-compatible storage backend like MinIO to gain some experience with volumes and storage classes, but at some point practicality needs to win, and especially for things like Plex, the unnecessary overhead was more than a philosophical issue.)
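If you don't already have KinD, the project publishes prebuilt binaries; here's a quick sketch of grabbing one -- the version and install path are just examples, use whatever is current for you:
$ curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.8.1/kind-linux-amd64
$ chmod +x ./kind
$ sudo mv ./kind /usr/local/bin/kind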
Wanting to use the most recent version of Kubernetes, which at the time of this writing was 1.18.4, I manually specified the node image, and ran the setup:
$ kind create cluster --image docker.io/kindest/node:v1.18.4 \
--name kind --config /path/to/kind-config.yaml
The config file merely specified one control plane node and three workers, with my host storage mounted on every worker node. (This way, I could specify the node path in the application's pod spec -- see the hostPath sketch after the config below -- and know that it would be there on whichever node the pod landed.)
$ cat kind-config.yaml
---
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  extraMounts:
  - hostPath: /path/to/host/storage
    containerPath: /mnt/host_storage/
- role: worker
  extraMounts:
  - hostPath: /path/to/host/storage
    containerPath: /mnt/host_storage/
- role: worker
  extraMounts:
  - hostPath: /path/to/host/storage
    containerPath: /mnt/host_storage/
---
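For reference, here's a minimal sketch of how a workload can then consume that mount as a hostPath volume -- the pod name, image, and mount path are illustrative, not taken from my actual manifests:
---
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
  - name: example-app
    image: docker.io/library/alpine:latest
    command: ["sleep", "3600"]
    volumeMounts:
    - name: host-storage
      mountPath: /data
  volumes:
  - name: host-storage
    hostPath:
      path: /mnt/host_storage/
      type: Directory
---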
At this point, once KinD has bootstrapped the cluster, you need to either get the raw kubeconfig data by running kind get kubeconfig, or let KinD set your kubectl context automatically by running kind export kubeconfig.
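For example (kind-kind is the context name KinD generates for a cluster named kind):
$ kind export kubeconfig --name kind
$ kubectl cluster-info --context kind-kind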
Ideally, everything shows up as working correctly:
$ kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-66bff467f8-blj4s 1/1 Running 0 69s
kube-system pod/coredns-66bff467f8-jvtn5 1/1 Running 0 69s
kube-system pod/etcd-kind-control-plane 1/1 Running 0 78s
kube-system pod/kindnet-fgmqb 1/1 Running 0 52s
kube-system pod/kindnet-k8xpw 1/1 Running 0 69s
kube-system pod/kindnet-q47hs 1/1 Running 0 52s
kube-system pod/kindnet-rflnb 1/1 Running 2 52s
kube-system pod/kube-apiserver-kind-control-plane 1/1 Running 0 78s
kube-system pod/kube-controller-manager-kind-control-plane 1/1 Running 0 78s
kube-system pod/kube-proxy-4w24t 1/1 Running 0 52s
kube-system pod/kube-proxy-6k5zx 1/1 Running 0 69s
kube-system pod/kube-proxy-tz2vh 1/1 Running 0 52s
kube-system pod/kube-proxy-xgf7j 1/1 Running 0 52s
kube-system pod/kube-scheduler-kind-control-plane 1/1 Running 0 78s
local-path-storage pod/local-path-provisioner-67795f75bd-jx4dg 1/1 Running 0 69s
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 86s
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 85s
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kindnet 4 4 4 4 4 <none> 84s
kube-system daemonset.apps/kube-proxy 4 4 4 4 4 kubernetes.io/os=linux 85s
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 2/2 2 2 85s
local-path-storage deployment.apps/local-path-provisioner 1/1 1 1 83s
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-66bff467f8 2 2 2 69s
local-path-storage replicaset.apps/local-path-provisioner-67795f75bd 1 1 1 69s
and the host storage mounts are accessible everywhere an application might be deployed:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0c981780e342 kindest/node:v1.18.4 "/usr/local/bin/entr…" 2 minutes ago Up 2 minutes 127.0.0.1:41459->6443/tcp kind-control-plane
ecc5a3120914 kindest/node:v1.18.4 "/usr/local/bin/entr…" 2 minutes ago Up 2 minutes kind-worker
c126cf79ed65 kindest/node:v1.18.4 "/usr/local/bin/entr…" 2 minutes ago Up 2 minutes kind-worker3
5c05cf09e5d6 kindest/node:v1.18.4 "/usr/local/bin/entr…" 2 minutes ago Up 2 minutes kind-worker2
$ docker exec -it ecc5a3120914 ls -lAh /mnt/host_storage/
total 4.0K
-rw-r--r-- 1 1000 1000 11 Jul 8 04:17 file_on_host.txt
$ docker exec -it c126cf79ed65 ls -lAh /mnt/host_storage/
total 4.0K
-rw-r--r-- 1 1000 1000 11 Jul 8 04:17 file_on_host.txt
$ docker exec -it 5c05cf09e5d6 ls -lAh /mnt/host_storage/
total 4.0K
-rw-r--r-- 1 1000 1000 11 Jul 8 04:17 file_on_host.txt
I had in my head that Istio could also take over the function of the base overlay CNI network for Kubernetes, but I did not manage to get that working, so I let KinD install its default simple CNI, "kindnetd", and then I installed Istio on top of that. Even though I probably should have used the Kubernetes-native way of using a Helm chart, I'm lazy and just used the default profile that comes with Istio's istioctl program:
$ istioctl install
This will install the default Istio profile into the cluster. Proceed? (y/N) Y
Detected that your cluster does not support third party JWT authentication. Falling back to less secure first party JWT. See https://istio.io/docs/ops/best-practices/security/#configure-third-party-service-account-tokens for details.
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Addons installed
✔ Installation complete
And now we see Istio up and running, with its ingress gateway waiting patiently for an external IP that we will give it shortly.
$ kubectl get all -n istio-system
NAME READY STATUS RESTARTS AGE
pod/istio-ingressgateway-66c7db878f-gp9vz 1/1 Running 0 3m31s
pod/istiod-7cdc645bb4-2h75j 1/1 Running 0 4m15s
pod/prometheus-5c84c494dd-879h4 2/2 Running 0 3m31s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/istio-ingressgateway LoadBalancer 10.100.197.167 <pending> 15021:31353/TCP,80:32377/TCP,443:31939/TCP,15443:30195/TCP 3m31s
service/istiod ClusterIP 10.105.18.223 <none> 15010/TCP,15012/TCP,443/TCP,15014/TCP,53/UDP,853/TCP 4m15s
service/prometheus ClusterIP 10.107.187.215 <none> 9090/TCP 3m31s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/istio-ingressgateway 1/1 1 1 3m31s
deployment.apps/istiod 1/1 1 1 4m15s
deployment.apps/prometheus 1/1 1 1 3m31s
NAME DESIRED CURRENT READY AGE
replicaset.apps/istio-ingressgateway-66c7db878f 1 1 1 3m31s
replicaset.apps/istiod-7cdc645bb4 1 1 1 4m15s
replicaset.apps/prometheus-5c84c494dd 1 1 1 3m31s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/istio-ingressgateway Deployment/istio-ingressgateway <unknown>/80% 1 5 1 3m31s
horizontalpodautoscaler.autoscaling/istiod Deployment/istiod <unknown>/80% 1 5 1 4m15s
At this point, everything works, but I still need to somehow route traffic from the host into the Kubernetes network. Since I wanted the manifests to be portable, I also wanted to be able to specify LoadBalancer as a service type, instead of always using NodePorts in my configurations. After some research I ended up turning to MetalLB, the "bare metal load balancer" -- i.e. a load-balancer implementation for bare-metal clusters that hands out external IPs from address pools you define.
The first issue I ran into with MetalLB was that it didn't create its metallb-system namespace, and it would fail to install if that wasn't present. So I manually created the namespace by applying the following simple YAML:
---
apiVersion: v1
kind: Namespace
metadata:
  name: metallb-system
  labels:
    app: metallb
    istio-injection: disabled
---
Next was to create a basic config for it, describing which IP addresses it could give out as LoadBalancer addresses. I don't know if this was the right thing to do or not, but I ended up deciding to use the KinD nodes' Docker network address space. I was worried that always handing out my host's public IP might cause issues. I wasn't sure whether addresses from the node network would be reachable from the nodes and services, but I tried it out and it worked... for the most part. The cluster nodes and the pods inside the cluster could access the IPs, but the host machine could not. (Don't worry, we'll hack a fix for that shortly.)
The node cluster IP space can be found a number of ways. You can inspect the Docker network that KinD creates:
$ docker network inspect kind
or you can just query a node directly with something like
$ docker exec -it 0c981780e342 ip addr show dev eth0
My cluster's node IP space was 172.18.0.0/16, with nodes being assigned inside 172.18.0.0/24. I decided to use 172.18.5.0/24 and 172.18.6.0/24, figuring ~500 IPs would be plenty for whatever I was going to do. So the ConfigMap YAML which I applied next was the following:
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: custom-ip-space
      protocol: layer2
      addresses:
      - 172.18.5.2-172.18.5.254
      - 172.18.6.2-172.18.6.254
---
I didn't think I needed to leave room for a gateway and broadcast address, since this wasn't actually a new network, but I also wasn't sure how MetalLB worked, so I figured better safe than sorry. (Editor's note: It turns out MetalLB works at a higher level than that; in layer 2 mode it answers ARP requests for the service IPs, or alternatively it can advertise routes over BGP, to direct traffic to the correct cluster node.)
Lastly, before MetalLB will work, we need to generate a random secret for it to use. The secret is nothing more than 128 random bytes, base64-encoded, with the line breaks intact. An example method to generate this secret string would be:
$ dd if=/dev/urandom bs=1 count=128 2>/dev/null | base64
or
$ openssl rand -base64 128
The secret has to have the name memberlist, be in the metallb-system namespace, and be a single key/value pair, with the key name secretkey and the value being the random bytes generated earlier. An example command to create this new secret might be:
$ kubectl create secret generic -n metallb-system \
memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
Once this new secret is created, you can just install the latest version of MetalLB directly from the upstream manifests. In my instance, the most recent release on GitHub was 0.9.3, so I applied the following:
$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
which created several new resources:
podsecuritypolicy.policy/controller created
podsecuritypolicy.policy/speaker created
serviceaccount/controller created
serviceaccount/speaker created
clusterrole.rbac.authorization.k8s.io/metallb-system:controller created
clusterrole.rbac.authorization.k8s.io/metallb-system:speaker created
role.rbac.authorization.k8s.io/config-watcher created
role.rbac.authorization.k8s.io/pod-lister created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:controller created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:speaker created
rolebinding.rbac.authorization.k8s.io/config-watcher created
rolebinding.rbac.authorization.k8s.io/pod-lister created
daemonset.apps/speaker created
deployment.apps/controller created
With that, the basic installation is complete. Istio is configured, and any services that we assign LoadBalancers to will get an IP from the 172.18.5.0/24 or 172.18.6.0/24 range and be accessible from all pods and nodes. Services can also explicitly request/assign their own IP from within that range, which will come in handy shortly.
But first, let's run kubectl get all and make sure that everything is correct, including Istio's ingress gateway getting an IP address:
$ kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
istio-system pod/istio-ingressgateway-66c7db878f-gp9vz 1/1 Running 0 107m
istio-system pod/istiod-7cdc645bb4-2h75j 1/1 Running 0 108m
istio-system pod/prometheus-5c84c494dd-879h4 2/2 Running 0 107m
kube-system pod/coredns-66bff467f8-blj4s 1/1 Running 0 119m
kube-system pod/coredns-66bff467f8-jvtn5 1/1 Running 0 119m
kube-system pod/etcd-kind-control-plane 1/1 Running 0 119m
kube-system pod/kindnet-fgmqb 1/1 Running 0 119m
kube-system pod/kindnet-k8xpw 1/1 Running 0 119m
kube-system pod/kindnet-q47hs 1/1 Running 0 119m
kube-system pod/kindnet-rflnb 1/1 Running 2 119m
kube-system pod/kube-apiserver-kind-control-plane 1/1 Running 0 119m
kube-system pod/kube-controller-manager-kind-control-plane 1/1 Running 0 119m
kube-system pod/kube-proxy-4w24t 1/1 Running 0 119m
kube-system pod/kube-proxy-6k5zx 1/1 Running 0 119m
kube-system pod/kube-proxy-tz2vh 1/1 Running 0 119m
kube-system pod/kube-proxy-xgf7j 1/1 Running 0 119m
kube-system pod/kube-scheduler-kind-control-plane 1/1 Running 0 119m
local-path-storage pod/local-path-provisioner-67795f75bd-jx4dg 1/1 Running 0 119m
metallb-system pod/controller-57f648cb96-7q9cx 1/1 Running 0 72s
metallb-system pod/speaker-4pf7b 1/1 Running 0 72s
metallb-system pod/speaker-c22hp 1/1 Running 0 72s
metallb-system pod/speaker-c2f2l 1/1 Running 0 72s
metallb-system pod/speaker-crx56 1/1 Running 0 72s
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 119m
istio-system service/istio-ingressgateway LoadBalancer 10.100.197.167 172.18.5.2 15021:31353/TCP,80:32377/TCP,443:31939/TCP,15443:30195/TCP 107m
istio-system service/istiod ClusterIP 10.105.18.223 <none> 15010/TCP,15012/TCP,443/TCP,15014/TCP,53/UDP,853/TCP 108m
istio-system service/prometheus ClusterIP 10.107.187.215 <none> 9090/TCP 107m
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 119m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kindnet 4 4 4 4 4 <none> 119m
kube-system daemonset.apps/kube-proxy 4 4 4 4 4 kubernetes.io/os=linux 119m
metallb-system daemonset.apps/speaker 4 4 4 4 4 beta.kubernetes.io/os=linux 72s
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
istio-system deployment.apps/istio-ingressgateway 1/1 1 1 107m
istio-system deployment.apps/istiod 1/1 1 1 108m
istio-system deployment.apps/prometheus 1/1 1 1 107m
kube-system deployment.apps/coredns 2/2 2 2 119m
local-path-storage deployment.apps/local-path-provisioner 1/1 1 1 119m
metallb-system deployment.apps/controller 1/1 1 1 72s
NAMESPACE NAME DESIRED CURRENT READY AGE
istio-system replicaset.apps/istio-ingressgateway-66c7db878f 1 1 1 107m
istio-system replicaset.apps/istiod-7cdc645bb4 1 1 1 108m
istio-system replicaset.apps/prometheus-5c84c494dd 1 1 1 107m
kube-system replicaset.apps/coredns-66bff467f8 2 2 2 119m
local-path-storage replicaset.apps/local-path-provisioner-67795f75bd 1 1 1 119m
metallb-system replicaset.apps/controller-57f648cb96 1 1 1 72s
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
istio-system horizontalpodautoscaler.autoscaling/istio-ingressgateway Deployment/istio-ingressgateway <unknown>/80% 1 5 1 107m
istio-system horizontalpodautoscaler.autoscaling/istiod Deployment/istiod <unknown>/80% 1 5 1 108m
Looks good. The only problem is that the host (and outside traffic) cannot connect inbound to that load balancer IP address. To fix that, we finally get to the hacky band-aid patch that I'm not thrilled about but which has so far worked extremely well: we run socat in the host's Docker as a reverse proxy, connecting a port on the host to the KinD node network space.
As an example, let us forward the host's TCP port 80 to the Istio ingress gateway's load balancer on TCP port 80. To do this, on the host machine's Docker, we run a socat instance like so:
$ docker run -d --network kind -p 80:80 docker.io/alpine/socat:latest \
-dd tcp-listen:80,fork,reuseaddr tcp-connect:172.18.5.2:80
You can see at the end where the load balancer IP address goes, and the "-p 80:80" shows how the container port is mapped from the host. This creates a straight line from a host port bind to socat, and from socat to the load balancer service inside the cluster. If we imagine UDP port 9000 were also listening on the Istio ingress gateway, we could do a UDP port forward from the host in a similar manner:
$ docker run -d --network kind -p "9000:9000/udp" \
docker.io/alpine/socat:latest -dd \
UDP4-RECVFROM:9000,fork UDP4-SENDTO:172.18.5.2:9000
(The -dd flag just sets the verbosity of socat's logging/debugging output and can be omitted.)
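A quick way to sanity-check the TCP path from the host is to curl the forwarded port; with no Istio routes configured yet you'll likely just get a 404 back from the gateway, but that at least proves the tunnel works:
$ curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:80/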
For my setup, I have a single docker-compose file with each load-balanced service port getting its own socat container. An example might look like:
---
version: "3"
services:
  socat_istio_ingress:
    image: docker.io/alpine/socat:latest
    restart: unless-stopped
    container_name: socat_istio_ingress
    networks:
      - kind
    ports:
      - "0.0.0.0:80:80/tcp"
    command: "-dd tcp-listen:80,fork,reuseaddr tcp-connect:172.18.5.2:80"
  socat_istio_ingress_udp9000:
    image: docker.io/alpine/socat:latest
    restart: unless-stopped
    container_name: socat_istio_ingress_udp9000
    networks:
      - kind
    ports:
      - "0.0.0.0:9000:9000/udp"
    command: "-dd UDP4-RECVFROM:9000,fork UDP4-SENDTO:172.18.5.2:9000"
networks:
  kind:
    external:
      name: kind
---
This feels too hacky to me, but it works phenomenally well, and I have had no issues with speed or bandwidth. It also allows me to cut off external traffic without touching the Kubernetes cluster. There are only two major downsides. The first is that I no longer get source IP addresses, since as far as K8s is concerned, all traffic originates from the socat container IPs. That's not an issue for me (yet) on my single-node personal setup, but it would be problematic for organizations looking to block DDoS attacks, gather user geolocation metrics, or do other things that rely on source IP addresses. The other (and lesser) downside is that I can't connect from the host to NodePort addresses, so I have to set those up as load balancers and map the host port as a localhost bind rather than a 0.0.0.0 bind. I know I could use iptables for the port forwarding, and that's probably the best solution for everything, but I want to keep things containerized and predictable without messing around with host networking. This setup is mostly so I can get some K8s experience, utilizing its full feature set, so I don't mind hacking around the edges to make my setup work. (As an aside, kubectl port-forward also works reasonably well and you can set the timeout to infinity, so I might just end up using that.)
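For completeness, a sketch of that approach, forwarding the ingress gateway service to an arbitrary host port (the port numbers are just examples):
$ kubectl port-forward -n istio-system --address 0.0.0.0 \
    svc/istio-ingressgateway 8080:80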
Editor's note: After posting this article, I found out about a recent KinD feature, an additional configuration option called extraPortMappings, which lets you bind host ports to KinD node container ports, similar to Docker's port forwarding. This solves the issue and makes my socat hack unnecessary, and it is what I have switched to for forwarding incoming traffic.
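A minimal sketch of what that configuration looks like -- the port numbers are illustrative, and the containerPort has to line up with whatever is actually listening on that node (e.g. a NodePort or a hostPort):
---
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30080
    hostPort: 80
    protocol: TCP
---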
Even with the few small issues, this setup does indeed provide a full, feature-complete Kubernetes on my single bare-metal server, with the ability to assign load balancers to services. This makes my configuration more portable (even though I'll have to remove the manually specified LB IP addresses), and it lets me work with and practice using Kubernetes while managing my own services. And of course, while there is certainly overhead to all of this, it's not too terrible. Plex still works fine, and I get all the added benefits of Istio, such as easy monitoring and tracing. (Although instead of Istio's ingress gateway, I use Traefik 2 as my ingress.)
There wasn't too much that was difficult about this, but it was still an interesting setup, and I had to Google a few times to find all the information needed to glue the bits and pieces together. So I thought it was worth putting out there.
Plus, I needed content for my new homepage...