Alzalia's Kubernetes
[!ERROR] My control-plane is currently pretty much... dead? The repository is currently running in a minimal state to try and make it work again, at least a bit!
Warning
It may be clear to a pro: I do not know 100% (or even 60%) of what I'm doing. I test things out and do things, probably not in the best way. At the time of writing, I intend to do a full review of my infrastructure during the summer to patch security issues and such.
Hello! Welcome to my Kubernetes infrastructure! This file contains a quick summary of what you'll find here, but you will find more details in the different folders!
Note
You can follow my journey here, or at least the part of it I publish (because I often forget to write down what I do).
Systems
Let's start with a short list of the systems and applications that make my whole infrastructure work. Here, I'm talking about the main apps, not all the ones I host.
FluxCD
A main component of my whole infrastructure is FluxCD. It gives me a "GitOps" workflow, which roughly translates to: whatever is on my git repo is the current state of my infrastructure.
In more detail, I keep a local copy of this repository on my computer. If I want to make a change, I make it locally, then commit and push it to this repository, and FluxCD manages the transition itself.
This is very useful for multiple reasons:
- I get to keep organized files of my current infrastructure's state
- I know that at any point in time, my Kubernetes infrastructure is what's on the repository
- I don't have to manage the transitions myself
Using FluxCD doesn't mean I have no control left over my infra, though. I still use manual commands, throwaway pods, etc. for various tests here and there. But when it comes to making the changes permanent, I know there is a trustworthy way of doing it.
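As a rough illustration of the GitOps workflow described above, here is a minimal sketch of the two Flux objects that typically tie a cluster to a repository. Everything specific in it is an assumption: the object names, the repository URL (a placeholder), the branch, and the intervals are hypothetical, and the `./apps/default` path is only inferred from this repository's folder layout. The actual manifests live in the different folders.

```yaml
# Sketch only: names, URL, branch and intervals are hypothetical.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m                                    # how often Flux checks the repo for new commits
  url: https://example.com/alzalia/kubernetes.git # placeholder URL, not the real one
  ref:
    branch: main                                  # assumed branch name
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m          # how often Flux re-applies the manifests
  sourceRef:
    kind: GitRepository
    name: flux-system    # refers to the GitRepository above
  path: ./apps/default   # assumed path, based on the folder layout
  prune: true            # delete cluster objects that were removed from git
```

With `prune: true`, deleting a manifest from the repository also removes the matching object from the cluster, which is what keeps the repository and the cluster in the same state.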
Traefik
Warning
I am currently thinking about switching to Istio. I need IP whitelisting for some things, and it seems Traefik sees the service's IP address rather than the client's, so it needs a complicated setup to work. Istio... might have the same problem though. So I'm looking into it.
Traefik is the backend I use for Kubernetes's Gateway API. There isn't much to say about it: it's widely used, so I knew it existed and I just... went with it.
Cert-manager
Cert-manager automates the management of certificates, so that I can access my services through HTTPS. For multiple reasons I won't explain here, I ended up configuring DNS-01 as the challenge type, instead of HTTP-01.
Note
Because of how cert-manager works (or at least how it worked at the time of writing, as far as I know), for automatic certificate management to work, I need to create one Gateway per HTTPRoute. A bit verbose, but it works!
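To make the "one Gateway per HTTPRoute" pattern concrete, here is a minimal sketch of what such a pair might look like. It relies on cert-manager's Gateway API support, which reads the `cert-manager.io/cluster-issuer` annotation on a Gateway and issues a certificate into the Secret named in `certificateRefs`. All names here (`myapp`, `myapp-gateway`, `myapp-tls`, `letsencrypt-dns01`, the hostname, the `traefik` GatewayClass and the ports) are hypothetical and not taken from this repository.

```yaml
# Sketch only: every name, hostname and port below is hypothetical.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: myapp-gateway
  namespace: default
  annotations:
    # Asks cert-manager to issue a certificate for this Gateway's listeners.
    cert-manager.io/cluster-issuer: letsencrypt-dns01  # assumed ClusterIssuer name
spec:
  gatewayClassName: traefik        # assumed GatewayClass name
  listeners:
    - name: https
      hostname: myapp.example.com  # cert-manager requests a certificate for this host
      port: 443                    # assumed; must match the port Traefik listens on
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: myapp-tls        # Secret that cert-manager creates and renews
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: myapp
  namespace: default
spec:
  parentRefs:
    - name: myapp-gateway          # the dedicated Gateway above
      sectionName: https
  hostnames:
    - myapp.example.com
  rules:
    - backendRefs:
        - name: myapp              # assumed Service name
          port: 80
```

The verbosity comes from repeating the Gateway block for every application, since the hostname and the certificate Secret are declared on the Gateway listener rather than on the route.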
Longhorn
Longhorn provides the storage behind PersistentVolumeClaims (in less weird words, it manages access to storage throughout my Kubernetes cluster).
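For illustration, a claim that asks Longhorn for storage might look like the sketch below. The claim name, namespace and size are hypothetical, and `longhorn` is assumed to be the StorageClass name (it is the name Longhorn creates by default, but it may differ in this cluster).

```yaml
# Sketch only: claim name, namespace and size are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce             # mounted read-write by a single node at a time
  storageClassName: longhorn    # assumed StorageClass name
  resources:
    requests:
      storage: 5Gi
```

A pod then mounts the claim by name, and Longhorn takes care of placing the volume on the nodes that provide storage.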
Authentik
Leaving the systems that make up the infrastructure itself, I also use Authentik to have Single Sign-On (SSO) across all my applications (well, those that need it at least).
Although I started my journey in the world of Identity Providers with Keycloak, which really snapped for me, I ended up going with Authentik for its simplicity. Its features, like the integrated LDAP server or reverse proxy auth, are perfect for my small use case: I don't need to host yet another app to get them.
Machines
As of writing this, my Kubernetes cluster relies on these machines:
| Machine Name | Role | RAM | CPU | Disk | Longhorn |
|---|---|---|---|---|---|
| Server3 | Control-plane | 4GB | 2c / 2t | 500GB HDD | Yes |
| Server4 | Worker | 32GB | 14c / 20t | 1TB SSD | Yes |
| VPS1 | Worker | 12GB | 6c / 6t | 100GB | No |
There is a fourth machine, a Raspberry Pi, that I have yet to include in the Kubernetes cluster. It hosts Nextcloud and Forgejo, and is connected to some external SSDs that serve as file servers.
I plan to buy a NAS one day, to be able to properly host Nextcloud's data and also back up the cluster's data.
Because it is a VPS, VPS1 does not provide storage for Longhorn. There is limited room, and I don't want my data to be lost if, for whatever reason, something happens and I lose access to the VPS.
Being hosted at OVH, my VPS has integrated DDoS protection, which is why I use it as the access point for all my services. Plus, that way, I don't need to worry about dynamic DNS... and nobody knows my IP address.
Todo
There are a lot of things to be done for me to be satisfied with what I've done.
Main Tasks
- A thorough security "audit". By that, I mean doing everything I can to make the cluster secure.
- Replace Traefik with Istio. This should allow IP whitelisting for some apps, and maybe also mTLS between pods.
- Deploy the Prometheus x Grafana stack, to get a better view of what's going on in there.
- Investigate compatibility with IPv6 (to allow people to access my apps through IPv6, which I don't think is the case right now)
Smaller Tasks
- Deploy a runner for Forgejo
- Deploy a Collabora instance to allow online editing for Nextcloud