Intro - Why I do it
There are a lot of cloud provider options today, and many of them are great. Hell, I even work at Amazon, one of the largest of them. So why do I homelab? For me, there are many reasons, and most of them are the usual suspects: privacy, control over my own data, the sweet satisfaction of building something myself. The thought of someone sneakily reading my personal items, my writing, my code, and my photos and training an AI on them is a major turn-off. However, the biggest reason is learning and expanding my skills. My homelab gives me a playground to try out new ideas and new technologies, and to really level up my abilities. I’m the kind of person who winds down from a long day of coding in big tech with more coding. I’m always chasing problems to solve, and with homelabbing there is no shortage.
This hobby isn’t for everyone: it’s nothing but delicious sidequest after sidequest. You can’t beat the ease of a traditional cloud provider like AWS or DigitalOcean when you just want to spin up simple apps or services. But since you’re here, I assume you’ve let your curiosity get the better of you and you’re ready to venture down the rabbit hole yourself. So for you fellow techno-masochists, I am writing this series on how I homelab. I’ll cover my journey, my setup, and some interesting projects like setting up data backups using dynamic credentials, exposing services to the wild internet, and orchestrating anything and everything!
The Hardware
Compute
The core of my homelab is a 2-node Proxmox cluster running on two AMD Ryzen 7 mini PCs, each with integrated graphics and 32GB of RAM. This setup lets me run VMs to my heart’s content. The mini PCs were cheap when I got them, so I bought two to expand capacity and to play around with high availability setups.
Storage
I run two network-attached storage (NAS) units. I started with a 2-drive, 4TB NAS but quickly outgrew it, so I got a second NAS with four 8TB drives. The big NAS is my main storage; the smaller one is now exclusively for backups.
Networking
I have an older Asus router running Merlin firmware in Access Point (AP) mode. I let the Asus handle just WiFi, and moved the rest of the routing functionality to a VM running OPNsense on one of the nodes (more on this later). I want to move this from a VM to a dedicated piece of hardware, but I haven’t been able to justify the purchase at current inflated prices. I also have a couple of unmanaged switches connecting everything together, including some of my IoT appliances and my desktop.
Network Overview
While I really want to go crazy with more advanced topologies, my network is currently flat: a single segment (192.168.1.0/24) that all my clients run on. This works perfectly fine for now, but I intend to fix it this summer, as soon as I figure out how to get the AP to respect VLAN tagging. Then I can add a VLAN for my robot vacuum and other IoT devices, one for guests, and so on.
Both of my Proxmox nodes have dual NICs. The node running OPNsense uses both: one for WAN (the incoming internet connection) and one for LAN (the rest of my internal network). I have another VM which runs my other network resources such as Technitium (my DNS), Caddy (reverse proxy), and Tailscale (VPN). I group these together because they’re all concerned with how traffic enters and exits the system.
Here’s how it all fits together (I’ve noted some gaps I’m actively planning to fix):
[Diagram: Homelab Network. Current topology on 192.168.1.0/24 with improvements noted. Labels: Router · Firewall · NAT; Wi-Fi clients; App data volumes; NAS replication target; planned change that removes the Node 1 SPOF.]
Hypervisor
I chose Proxmox as my hypervisor. It has a great community, it’s free, and it lets me quickly spin up VMs to experiment with new OSes and new programs. Running Proxmox as a cluster makes my setup more resilient. While my current setup can’t run true HA (you need 3+ nodes for quorum, unless you run a Corosync QDevice on a Raspberry Pi or something), I can still share management across both nodes and live-migrate VMs between them, which helps keep most of my services running.
Using Proxmox also lets me heavily leverage infrastructure as code. Each VM is defined in its own directory, with a Terraform + Butane/Ignition configuration, Ansible playbooks, and whatever other files or configuration it needs. This makes everything not only repeatable but also self-documenting! Never again will I spin something up, revisit it months or years later, and realize I have no idea what I did or how it works.
My setup is architected as a bunch of VMs, each running a set of logically related containerized applications. For example, my “Ingress” VM runs Technitium (my DNS), Caddy (reverse proxy), Tailscale (VPN), and Newt (tunnel client for external access); my “media” VM runs Jellyfin, my *arr stack, and Calibre.
Why Multiple VMs?
Why did I choose to architect my system around multiple VMs instead of having one or two big VMs and running everything on them? Or better yet, running everything with K8s or K3s? After all, the VMs do add some overhead: each one idles at around 300MB of memory when it’s doing nothing, so what’s the point? The short answer: like it or not, I break stuff pretty often. I’ll try to do something fancy or new, and next thing I know I’ve lost logging or networking, or triggered a kernel panic. Running multiple VMs lets me take advantage of the most powerful feature of VMs: isolation. I can have some VMs exposed to the internet while keeping others internal, and I can experiment on one VM without worrying that it’ll take out the entire system. Having hard boundaries between my various sandboxes minimizes the blast radius of the decisions I make, which is a major relief when I’m tinkering with a new OS flag or something. It also makes it easier to manage resources: there’s less risk of starving one service because an adjacent one goes a little rogue.
I may one day experiment more with K3s, but for now it adds too much complexity and is overkill for my use cases. I do write some of my Quadlets as .kube files specifically so that migration would be mostly copy-paste when the time comes. I believe my current approach is a decent middle ground, giving me isolation where it matters without adding too much operational weight. But of course, I didn’t start here. Here’s the less embarrassing version of how I actually got to this setup.
How did I get here?
Like most homelabbers, I didn’t start with this perfect architecture. In fact, I barely even started with a plan. I started with what was convenient and on-hand at the time and kept migrating forward whenever something broke down or I found a better way. Here’s my journey.
The NAS
My first piece of equipment was the little NAS. I am a big fan of cartoons, and I had a bunch of hard-to-find comfort shows I wanted to make available on my network. Accessing the videos directly from the NAS through VLC or the NAS web interface got old fast. I realized the NAS was more than just a hard drive on the network: it has 8GB of RAM and a quad-core processor. Playing around, I found out it could run Docker, so the next step seemed obvious: this thing had storage, it had processing power, and it was already on the network and always on, so why not run some containers on it? For a while, this worked great. Until it didn’t. Running Docker containers on my NAS quickly became an absolute masterclass in resource contention. Some mystery task would run at the most inopportune time, I/O would saturate, the CPU would spike, and suddenly the entire thing would become unresponsive. While the NAS’s marketing touted its Docker abilities, it didn’t seem happy to be running my containers at all. In retrospect the lesson was obvious: storage and compute are two totally different problems, and conflating them means doing both pretty badly.
My first MiniPC
I decided to purge the Docker containers from the NAS and the world was right again. Suddenly my NFS shares were more reliable and videos played without a hitch. I picked up a cheap mini PC, dumped the basic version of Windows it came with, and loaded it up with Debian. This was a much more familiar situation for me, and I quickly got to work writing Docker Compose files to bring my services back up. This worked, and for a normal person this would have continued working. For me, however, the FOMO hit: what else could I do with this? For example, I was taking some online cybersecurity classes which used VMs for the assignments, and it sure would be nice to run those VMs natively instead of fighting with Rosetta on my MacBook. Well, if I was going to run multiple VMs for my classes, there must be a better way to manage them, right?
Proxmox and LXC
I went back and forth on a few hypervisors, including VMware vSphere. I was familiar with VMware, but it was in the process of being acquired by Broadcom, which made it kind of unappealing to me. I decided to try Proxmox: it was free, it offered some pretty advanced features, and there was a huge community to learn from. Like most people, I started with the Proxmox Scripts. This site was awesome (albeit a little unnerving from a security perspective). It let me get various things up and running in no time, at a crucial point when it would have been easy to get bogged down and discouraged enough to abandon the whole thing and return to the safe bosom of Debian. I immediately homed in on LXCs: they seemed like the perfect complement to VMs. LXCs are small and lightweight, and looked like the ideal place to run services like Paperless or Jellyfin. Turns out they’re not. Running LXCs unprivileged and wrestling with cgroups and namespaces is a nightmare, way more frustrating than on a regular machine. The easy fix is to just run everything privileged, which feels like the wrong direction from a security standpoint. Later, when Proxmox released the feature to run LXCs directly from Docker images, the problem got even worse. I encountered so many weird silent failures where things would report all green in the logs but then nothing would work. I spent more time wrestling with LXCs than actually running my services.
Proxmox and VMs
Moving everything to VMs solved all my problems instantly. I ran a VM per service and it was bliss. Everything just worked, exactly as intended and as expected. But now I had a new problem: every VM was a snowflake and I was stirring up a blizzard. I’d SSH into machines to set them up, tweak a config or two, update packages manually, and once everything was happily running I’d forget about it. Then six months later I’d have to log in again with no idea what was on the box, where things were stored, or how anything worked. Worse, if a box went down I had no easy way of recreating it, so I’d have to rebuild it from scratch. There was so much that required documentation, and who has time for that? My system was becoming something I had to tend rather than a reliable tool.
Proxmox, VMs and Podman
One thing that is great about homelabbing is that there’s always something to learn, and I’m voracious about finding new things to learn and new tricks to try. I was getting a little bored and frustrated with Docker. I wanted to move away from running things as root, but by default the Docker daemon runs as root, and it had a habit of getting in the way during upgrades. Then one day I was perusing Hacker News and saw an article about Podman Quadlets. I almost scrolled right past it.
The daemonless architecture caught my eye. No persistent root process, no dumb daemon to babysit? And it offers rootless containers as a first-class feature, not some annoying bolt-on configuration? This appealed a lot to my security paranoia. After my LXC experience, where “oh, just run it privileged” was always the answer, this felt like a much better direction. Its CLI is also Docker-compatible, so I didn’t have to invest anything beyond an “apt-get” to try it. It was uncanny how many boxes it checked for me, so I gave it a shot.
Then I learned about Quadlets, and that’s when I was fully sold. Quadlets are a Podman feature which lets you define your containers as systemd-style unit files: little declarative text files that describe what you want to run, its dependencies, and how it should behave. Podman’s systemd generator picks these files up and turns them into regular services that systemd manages just like any other. Sure, it sounds like a minor technical convenience, but it’s way bigger than that: containers now get the full power of systemd’s dependency graph, restart policies, and centralized logging, all for free, with no extra tooling beyond what’s built into most Linux distros. So what, right? What does this even mean? It means I can declare that a container (Jellyfin, for example) must not start until an NFS share is mounted, and systemd will make it so. It means every container logs to the journal alongside every other service on the system. One place, one tool: journalctl. It means that deploying and managing these services with Ansible becomes trivially simple, because I can just drop in these unit files and reload systemd rather than wrestle with Docker contexts or the Compose lifecycle. Quadlets support .container, .pod, and .kube files, among others. The last one is interesting: a .kube Quadlet points at a Kubernetes manifest, which means if I ever want to migrate to K3s, most of the work is already done.
And here’s an example of a quadlet with my actual caddy.container file:
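As a rough sketch of the shape such a file takes (this is an illustration rather than a verbatim copy: the image tag, published ports, host path, and volume name are all assumptions):

```ini
# caddy.container
# Illustrative sketch only; image tag, ports, and host paths are assumptions.
# System-wide Quadlets live in /etc/containers/systemd/,
# rootless ones in ~/.config/containers/systemd/.

[Unit]
Description=Caddy reverse proxy

[Container]
Image=docker.io/library/caddy:2
ContainerName=caddy
# Binding ports below 1024 from a rootless container requires lowering
# net.ipv4.ip_unprivileged_port_start on the host.
PublishPort=80:80
PublishPort=443:443
# Hypothetical host path for the Caddyfile; :Z relabels it for SELinux.
Volume=/var/srv/caddy/Caddyfile:/etc/caddy/Caddyfile:ro,Z
# Named volume so certificates survive container restarts.
Volume=caddy-data:/data
# Let podman-auto-update pull newer images for this tag.
AutoUpdate=registry

[Service]
Restart=always

[Install]
WantedBy=default.target
```

After a `systemctl daemon-reload`, the generator turns this into a `caddy.service` unit that can be started, stopped, and inspected like any other service (add `--user` for a rootless setup).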
And here’s an example of a systemd mount file:
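A minimal sketch of one, assuming a hypothetical NAS address and export path (neither is taken from my real configuration):

```ini
# var-mnt-media.mount
# Illustrative sketch only; the NAS address and export path are placeholders.
# The file name must match the Where= path:
# /var/mnt/media -> var-mnt-media.mount

[Unit]
Description=NFS share for the media library
Wants=network-online.target
After=network-online.target

[Mount]
What=192.168.1.50:/volume1/media
Where=/var/mnt/media
Type=nfs
Options=nfsvers=4.1

[Install]
WantedBy=multi-user.target
```

A container unit managed by the same systemd instance can then add `RequiresMountsFor=/var/mnt/media` to its `[Unit]` section, so systemd refuses to start the container until the share is mounted.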
CoreOS
With Quadlets solving the container management problem, I started wondering about the OS underneath. If I was going all-in on containerization, why was I still running a general purpose distro with all the baggage that comes with it? That’s when I decided to try Fedora CoreOS: the OG of container-optimized operating systems - and it turned out to be the final piece of the puzzle. CoreOS is immutable. You can’t make ad-hoc changes to the filesystem the way you can on a normal distro. Instead the OS is configured on provision through a tool called Ignition, using a human readable format called Butane. My Terraform provisioning consumes my Butane file and transpiles it to an Ignition file and hands it over to the VM on first boot. The machine materializes already knowing all the users, their SSH keys, NFS mount points, and the automatic update schedule. It’s born fully configured, not configured after the fact. This sounds like a constraint - it is - but it forces good habits. There’s no SSH-ing to tweak something “real quick”, and then forgetting about it a few months later, no mystery configurations. If it’s not in the Butane file or the Ansible playbook, then it’s not on the machine. And if something goes wrong? I don’t fix it, I terraform destroy and reprovision. The whole machine comes back up in minutes, fresh and identical to before. That shift - from fixing broken VMs to provisioning - that’s what made my homelab feel like something I built rather than something I inherited that constantly needed rescue. Less like herding cats, more like running a herd of cattle. It closes the loop on the snowflake problem: every VM is now a first-class, fully-documented, reproducible artifact. The Ansible playbook handles the application layer on top, so the full picture of any given VM lives in a handful of files which read like an instruction manual.
Every phase taught me something. The NAS phase taught me to separate storage from compute. The LXC phase taught me that lightweight isn’t worth it if it means fighting your tooling. The snowflake VM phase taught me that undocumented state is technical debt, even at home. The current setup is a direct response to every one of those lessons - and I’m sure the next phase will teach me something the current one doesn’t.
The Services
At this point you might be wondering what I actually run on all this infrastructure. Here’s the full roster, organized by VM — because the groupings themselves tell you something about how I think about the architecture.
Ingress
The Ingress VM is the only one with a foot in both worlds: my internal network and the public internet. These services live together because they’re all concerned with the same question: how does traffic get to where it’s going, and who decides that?
| Service | What it does |
|---|---|
| Caddy | Reverse proxy with automatic TLS via Let’s Encrypt. Provides subdomain access to all services over HTTPS. |
| Technitium | DNS for the whole network. Local hostnames, network-level adblocking. |
| Tailscale | VPN exit node and route advertiser. Full remote access without opening any firewall ports. |
| Newt | Outbound tunnel client for Pangolin. More on this in the External Access section. |
Auth
Arguably the most critical VM in the whole setup — if this goes down, authentication stops working everywhere. It gets its own post later in this series.
| Service | What it does |
|---|---|
| Authentik | OIDC identity provider for every service that supports it. Centralized MFA, session management, and access control in one place. |
| Step CA | Internal certificate authority. Issues real TLS and SSH certificates across the whole network. |
| PostgreSQL | Authentik’s database. |
| Redis | Authentik’s cache. |
OpenBao
OpenBao gets its own VM, deliberately isolated from everything else. It’s a fork of HashiCorp Vault that stayed open source after the licensing drama. Nothing sensitive is hardcoded or sitting in plaintext anywhere in my setup — API keys, database credentials, TLS certificates, all of it lives here. I do a lot with this tool and it gets its own posts later in the series.
Monitoring
This VM exists because “Hmm, I wonder if that’s still running” is not a fun question to answer by SSH-ing into boxes. Knowing what’s happening across the whole system before something breaks is worth the overhead.
| Service | What it does |
|---|---|
| Grafana | Metrics dashboards. |
| Loki | Log aggregation. |
| Uptime Kuma | Uptime monitoring and status page. |
| Beszel | System resource monitoring across every host. |
Media
The VM that gets the most day-to-day use, and the one most likely to spike CPU or fill local disk with transcode cache. Isolated for exactly that reason.
| Service | What it does |
|---|---|
| Jellyfin | Media server. Streaming services kept removing my comfort shows (RIP Swat Kats) so I run my own library now. |
| Jellyseerr | Media request UI — clean interface for adding content without touching the arr stack directly. |
| Radarr / Sonarr | Automated movie and TV show management. |
| Prowlarr | Indexer manager and search proxy for the arr stack. |
| SABnzbd | Usenet download client. |
Paperless
I live in a tiny apartment, so there’s no room for filing cabinets. Paperless-ngx OCRs, indexes, and makes every document in my life fully text-searchable. The moment I could find any document by typing a few words instead of digging through nested folders was the moment I understood why people get unreasonably enthusiastic about this tool. Also notable as my first foray into using .kube files as Quadlets.
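For reference, a .kube Quadlet is just a small unit file that points at a Kubernetes manifest. A minimal sketch, where the file names are hypothetical and the manifest itself (a regular Pod or Deployment YAML) is not shown:

```ini
# paperless.kube
# Illustrative sketch only; file names are hypothetical.

[Unit]
Description=Paperless-ngx pod defined by a Kubernetes manifest

[Kube]
# A relative path is resolved against the directory holding this unit file.
Yaml=paperless.yaml

[Service]
Restart=always

[Install]
WantedBy=default.target
```

The manifest stays plain Kubernetes YAML, which is what keeps a later move to K3s cheap.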
Gitea
My self-hosted Git server. All my Terraform configs, Ansible playbooks, Butane files, and personal projects live here rather than solely on GitHub. The Gitea Actions Runner handles CI: building images, compiling binaries for random tools. Keeping infrastructure code in a repo I control just hits right, given everything I’ve said about data ownership.
| Service | What it does |
|---|---|
| Gitea | Self-hosted Git server and web UI. |
| Gitea Actions Runner | CI runner for builds and automation. |
n8n
Workflow automation — like Zapier but self-hosted and significantly more powerful. Started as a thinly-veiled excuse to play with a new tool. Quietly became load-bearing infrastructure.
| Service | What it does |
|---|---|
| n8n | Workflow automation server, worker, and runners. |
| PostgreSQL | n8n’s database. |
| Redis | n8n’s task queue. |
Netboot
Netboot.xyz is a PXE network boot server. Invaluable when provisioning a new VM or adding a node to the cluster. Sits quietly doing nothing most of the time, but when you need it, it’s really nice to have.
Utils
My VM for miscellaneous workloads. Currently runs Borg backup jobs for photos and Paperless documents: one containerized job per target, plus supporting containers. Easily the most interesting box in my setup right now, enough so that it’s getting its own dedicated post.
Not Yet Migrated
OPNsense, Proxmox Backup Server, Home Assistant, and Immich are still on legacy infrastructure. They work, they’re stable, and rushing migrations for the sake of consistency is how you introduce incidents. They’ll join the family eventually: OPNsense is moving to dedicated hardware first, which is a natural point to rethink its configuration anyway.
External Access - Zero Open Firewall Ports!
All the services I’ve described thus far live safely inside my home network. Sometimes though, I want to reach them from anywhere: Jellyfin from the TV at a friend’s house; Immich when I want to share a photo or album with my family. The naive solution is to forward some ports on the router, clap your hands and call it a day. I’ve never liked that answer. Port forwarding means your home IP is a public target, services are directly reachable, and one poor configuration can mean a very bad day. There is a better way.
The VPS
I rent a cheap Debian Linode for a few dollars a month, which lives on the public internet and does nothing but run Pangolin. The VPS itself is hardened and locked down: Fail2ban, SSH key-only authentication, root login disabled, automatic security updates, the works. It has a minimal attack surface by design: the less the machine does, the less there is to compromise.
Pangolin and Newt
Pangolin is a self-hosted, tunneling reverse proxy. The key word here is tunneling. Rather than my home network accepting inbound connections, my ingress VM runs Newt as a Quadlet, which establishes an outbound tunnel to the Pangolin VPS. Pangolin then proxies all incoming traffic back through the tunnel to the appropriate internal service. My home firewall has no inbound rules for any of this. No ports forwarded, no public IP exposed, no direct path from the internet to my home network. The connection is always initiated from inside the house; the VPS is just at the other end of the rope.
This means that if the VPS is (God forbid) compromised, an attacker gets access only to whatever I’ve exposed through Pangolin. They don’t get a foothold into the rest of my network.
What I Expose
Not everything gets exposed externally, and the things that are exposed are not all exposed the same way. I run two auth models depending on the service:
Pangolin auth via Authentik forward auth - Jellyfin sits behind Pangolin’s built-in auth layer which validates sessions by forwarding auth checks to Authentik. A user trying to hit Jellyfin will be intercepted by Pangolin which requires Authentik authentication before allowing the connection to be established. The service will never see an unauthenticated request.
Native Auth Only - Immich handles its own authentication, no need for Pangolin to get involved. I mean it still proxies the traffic but it doesn’t gate anything. Immich’s own login screen is what users will see.
Authentik itself - Authentik is technically behind Pangolin but it is also exempt from Pangolin’s auth gating, for the obvious reason that Authentik is the Auth layer. Gating Authentik behind Pangolin would be a great way to lock everyone out permanently. It handles its own login, MFA, and brute force detection.
Why not just Tailscale?
Tailscale is also configured on my network, and I use it all the time for personal access. But Tailscale requires a client on every device participating in the network, which is totally fine for me but not when I want to share a service with friends or family. Pangolin handles the cases where I need something genuinely public-facing without requiring anything from the client. So both tools have their place, solving different problems.
Security Posture
To summarize what this setup actually buys me: my home firewall has no inbound rules related to these services. My home IP is not in any DNS record. The only thing publicly reachable is the Linode VPS, which runs a single service, is hardened, and only knows how to forward traffic to specific destinations through an encrypted tunnel. Authentik sits in front of anything worth protecting. The blast radius of a compromised VPS is limited to whatever Pangolin proxies, not my entire system. Is it more complex than just port forwarding? Yes. Is it worth it? Definitely yes.
What I’d do differently
No homelab is perfect or ever complete, and mine is certainly no exception. There are things I’d change if I were starting over today, and being honest about them feels more useful than pretending the current setup is the obviously correct answer.
OPNsense should have been on dedicated hardware from day one
I’m coming out of the gate with the point that bothers me the most. Running OPNsense as a VM on my Proxmox cluster means router and hypervisor share a fate. If that node goes down for any reason, my entire network loses internet at once. Every service running on the other node is still happily running, but unreachable. At the time it was cool and easy to rationalize, but since then it’s only given me anxiety. It isn’t just the homelab that goes down: without OPNsense my whole apartment loses internet access. I should have bought a small appliance at the start instead of taking what I thought was the clever path. Don’t worry though, this is getting fixed; it’s just a matter of finding the right hardware at the right price.
VLANs should have been planned from the beginning
I have a robot vacuum, a smart TV, smart plugs, even a smart humidifier, all on the same flat subnet as my Proxmox nodes, my NAS, and my OpenBao. That’s not an ideal situation. IoT devices are notoriously poorly maintained from a security standpoint: they often run old firmware and phone home, and they’re exactly the type of thing that makes a great jump-off point for a network compromise. The correct answer is isolation: IoT on its own VLAN, WiFi guests on their own VLAN, and services on their own VLAN, with OPNsense handling inter-VLAN routing and enforcing strict boundaries. I knew this when I was setting up, but I went flat anyway because setting up an advanced network topology felt like a project in itself and I was more interested in getting services up and running. It is indeed a project, and one I intend to tackle this summer once I figure out how to get my AP to do it.
Single DNS is a quiet risk
Technitium running on a single VM is a gap I think about more than I’d like to. DNS is one of those services that feels invisible until it’s gone, at which point everything breaks simultaneously and in confusing ways. Every service that uses a hostname, which is all of them, stops resolving. Clustering Technitium across both nodes, or maybe onto a Raspberry Pi, is straightforward in theory and somehow keeps not making it to the top of the priority list. It will. In the meantime I’m aware that a bad CoreOS update on the wrong VM makes for a bad evening.
RAID 5 made sense at the time
My primary NAS runs RAID 5, which gives me single-drive redundancy across the array. It felt like the right balance between capacity and protection. What I failed to appreciate was rebuild behavior: RAID 5 rebuilds on large drives can take days, and during a rebuild the array is under sustained stress, which can trigger another drive failure right when the NAS is most vulnerable. There’s also the risk of an unrecoverable read error (URE) during the rebuild, which becomes a real concern at these drive sizes. RAID 6 would give me 2-drive redundancy and survive a second failure during a rebuild. A ZFS-based setup would give me checksumming, snapshots, and better resilience. I’m not in immediate danger, but if I were building a NAS today I might rethink this approach.
The migration debt is real
OPNsense, Home Assistant, Immich, and Proxmox Backup Server are still running on legacy infrastructure outside the CoreOS + Quadlets setup. They work, which is both the reason they haven’t been migrated and a mild source of cognitive dissonance given everything I’ve said about reproducibility and IaC. The honest answer is that migrating stable, working services introduces risk and requires time, and other things have been higher priority. They’ll get there. But every day they don’t is a day those services are managed differently from everything else, with less documentation and less reproducibility than I’d like.
Wrapping Up
If there’s a theme to everything I’ve described here, it’s this: every good decision in my current setup is a scar from a bad one. The IaC exists because I got burned by snowflakes. The VM isolation exists because I break things constantly. The zero-port forwarding is because I’m too scared of the deep blue internet. Good architecture, at least for me, isn’t something you design from first principles, it’s something you arrive at by solving real world problems as they come, sometimes badly at first then less badly next time.
The homelab is never done. Right now I’m working on VLAN segmentation, moving OPNsense, and building a mobile app backend for some new ideas. By the time you read this those might be done, which probably means something else will be broken or on fire.
If you’re just starting your journey, don’t be scared or overwhelmed by my setup. Running everything on a single NAS or mini PC or in a steaming pile of Docker Compose files, that’s not wrong, that’s just the first step on a long road. You’ll hit walls and when you do, hopefully something here will give you a shorter path to the other side.
Next up: Step CA, what you can do with it, and how I give every service on my network real TLS without a single warning!
