Context
I want to host public-facing applications on a server in my home, without compromising security. I realize containers might be one way to do this, and want to explore that route further.
Requirements
I want to run applications within containers such that they
- Must not be able to interfere with applications running on host
- Must not be able to interfere with other containers or applications inside them
- Must have no access to or influence on other devices in the local network, and must not otherwise compromise the security of the network, while the host remains accessible to devices via SSH.
Note: all of this within reason. I understand that there may be occasional vulnerabilities, in the kernel for example, that eventually get fixed. Risks like this I am willing to accept, within reason.
What I found so far
- Running containers in rootless mode: in other words, running the container daemon with an unprivileged host user
- Running applications in containers under unprivileged users: the user under which the container is run should be unprivileged
- Networking: The container’s networking must be restricted. I am still not sure how to do this and shall explore it more, but would appreciate any resources.
Alternative solution
I have seen bubblewrap presented as an alternative, but it seems like it is not intended to be used directly in this manner, and information about using it for this is scarce.
I think the container piece is probably the least of your concerns here, honestly. The biggest thing you’ll want to focus on is the ingress networking layer, but that won’t really be any different than if you were running the app normally. Generally, exposing ports from your home network to the internet is not a great idea; you should try to use something like Cloudflare, or get a cheap cloud VPS with a reverse proxy connected to the container host via VPN.
But for general container security practice, what you mentioned is good. You could also look at the Docker CIS Benchmark for more good security practices, and at container scanning tools like Trivy or Anchore’s Syft/Grype to identify vulnerabilities in your containers. But again, this is secondary to the networking layer in my opinion.
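For the scanning piece, a quick hedged example (the image name is a placeholder):

```sh
# Scan a local image for known CVEs; non-zero exit on HIGH/CRITICAL findings
# makes this easy to wire into a build script or CI job.
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:latest
```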
Secure your network. Worry less about escalations in your containers. You’re thinking too deeply about what is essentially a rabbit hole with a dead end for the most part, and if you don’t understand why in the first place, you should read more to understand exactly what you’re afraid of.
If you’re thinking that on your personal home network (which should be reasonably secured anyway) that someone will get physical access, then get on your network and start scanning everything, then find the ports you have open on every host, then identify the specific versions of the http servers hosting your software, then run exploits to get past any authentication which should be there, THEN have superhax ready to escalate privileges on the container runtimes so they can run remote executions…that’s all they’ll be able to do unless you have volume mounts allowing access to your stuff everywhere in said containers.
If you live in fear of everything, you’ll get nothing done.
You already mentioned the most important things.
I will add, at the cost of being pedantic:
- build the image properly, or use good images. This means limiting dependencies as much as possible and keeping images as minimal as possible (fewer updates due to CVEs, less tooling).
- do not mount host volumes; if you really have to, use a dedicated subpath owned by the container’s user. Do not use home dirs etc.
- do not run in host namespaces (host network etc.). Use port mapping to send traffic to the container (see the sketch below).
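A minimal sketch of those basics with plain `docker run`; the image name and port are assumptions:

```sh
# Hypothetical example: unprivileged user, no host namespaces, traffic
# reaches the app via port mapping only, read-only root filesystem,
# all capabilities dropped, no privilege escalation via setuid binaries.
docker run -d --name myapp \
  --user 1000:1000 \
  --read-only \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  -p 127.0.0.1:8080:8080 \
  myapp:latest
# The 127.0.0.1 prefix publishes the port on localhost only (useful behind
# a reverse proxy); drop it to expose the port on all host interfaces.
```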
If you want to go hardcore:
- analyze your application and, if feasible, build and use a more restrictive seccomp profile than the default. This can block additional syscalls that might be used during an exploitation but that your app doesn’t need (see the sketch after this list).
- run Falco on the node. Even with the default set of rules (nothing custom), many exploitation or post-exploitation steps would be caught, such as “shell spawned” etc.
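For the seccomp item, a hedged sketch; `profile.json` here is a hypothetical file you would derive from Docker’s default profile (default.json in the moby repo) by removing syscalls your app never makes:

```sh
# Apply a custom, more restrictive seccomp profile to the container.
docker run -d --name myapp \
  --security-opt seccomp=/etc/docker/seccomp/profile.json \
  myapp:latest
```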
Never heard of Falco, why would you recommend it over other similar solutions out there?
It’s the de-facto standard for runtime container security (Sysdig is based on it). The only competitor afaik is Aqua Security’s Tracee, which is way less mature. It is very well supported, there are tons of rules maintained by the community, and it is a CNCF project used by enterprise solutions (i.e., it shouldn’t disappear overnight).
After you’ve gone through all the container hardening guides, cap off the exercise with OWASP’s docker recommendations.
My solution, which took a while to figure out, is fantastic IMO: unprivileged Docker containers with nobody permissions, with their own IPs on macvlan, with a matching VLAN and good firewall rules. Plus a Docker network proxy container, and Traefik, Authelia, CrowdSec, and CrowdSec Traefik Bouncer containers.
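For the macvlan piece, a rough sketch; the subnet, gateway, and parent interface are assumptions for your network:

```sh
# Give containers their own LAN IPs on a dedicated macvlan network,
# attached to a VLAN subinterface so firewall rules can target it.
docker network create -d macvlan \
  --subnet=192.168.50.0/24 \
  --gateway=192.168.50.1 \
  -o parent=eth0.50 \
  lan50
docker run -d --network lan50 --ip 192.168.50.10 myapp:latest
```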
Both Docker and Podman pretty much handle all of those, so I think you’re good. The last aspect, networking, can easily be handled with a few iptables/nftables/firewalld rules. One final addition could be NGINX in front of web services, or something dedicated to handling web requests on the open Internet, to reduce the impact of potential exploits in the embedded web servers in your apps. But other than that, you’ve got it all covered yourself.
There are also all the options needed in docker/podman-compose files to limit CPU usage, memory usage, or generally prevent an app from using up all the system’s resources.
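The same limits exist as plain run flags; the values here are arbitrary examples:

```sh
# Cap memory, CPU, and process count so one app can't starve the host.
docker run -d --name myapp \
  --memory 512m \
  --cpus 1.0 \
  --pids-limit 200 \
  myapp:latest
```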
If you want an additional layer of security, you could also run it all in a VM, so a container escape leads to a VM that does nothing else but run containers. So another major layer to break.
Quick check list for outward facing servers:
- Isolate them from your main network. If possible, have them on a different public IP, either using a VLAN or, better yet, an entire physical network just for that - this avoids VLAN hopping attacks, and DDoS attacks on the server won’t also take your internet down;
- If you’re using VLANs, then configure your switch properly. Decent switches allow you to restrict the WebUI to a certain VLAN / physical port - this makes sure that if your server is hacked, the attacker won’t be able to access the switch’s UI and reconfigure their own port to access the entire network. Note that cheap TP-Link switches usually don’t have a way to specify this;
- Only expose required services (nginx, game server, program X) to the Internet. Everything else, such as SSH, configuration interfaces and whatnot, can be moved to another private network and/or a WireGuard VPN you connect to when you want to manage the server;
- Use custom ports with 5 digits for everything - something like 23901 (up to 65535) - to make your service(s) harder to find;
- Disable IPv6? Might be easier than dealing with a dual stack firewall and/or other complexities;
- Use nftables / iptables / another firewall and set it to drop everything but those ports you need for services and management VPN access to work (see the sketch after this list) - 10 minute guide;
- Use your firewall to restrict which countries are allowed to access your server. If you’re just doing it for a few friends, only allow incoming connections from your country (https://wiki.nftables.org/wiki-nftables/index.php/GeoIP_matching)
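That drop-everything baseline might look roughly like this in nftables; the port numbers are examples picking up the 5-digit suggestion above:

```sh
# Default-deny inbound; allow established traffic, loopback, one service
# port, and WireGuard for management.
nft add table inet filter
nft add chain inet filter input '{ type filter hook input priority 0; policy drop; }'
nft add rule inet filter input ct state established,related accept
nft add rule inet filter input iif lo accept
nft add rule inet filter input tcp dport 23901 accept  # example 5-digit service port
nft add rule inet filter input udp dport 51820 accept  # WireGuard management VPN
```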
Realistically speaking, if you’re doing this just for a few friends, why not require them to access the server through a WireGuard VPN? This will reduce the risk a LOT and probably won’t impact performance. This is a decent setup guide https://www.digitalocean.com/community/tutorials/how-to-set-up-wireguard-on-debian-11 and you might use this GUI to add/remove clients easily https://github.com/ngoduykhanh/wireguard-ui
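The server side of that boils down to a few commands, roughly following the guide linked above; the tunnel subnet is an assumption:

```sh
# Generate server keys and a minimal wg0 config, then bring the tunnel up.
umask 077
wg genkey | tee /etc/wireguard/server.key | wg pubkey > /etc/wireguard/server.pub
cat > /etc/wireguard/wg0.conf <<EOF
[Interface]
PrivateKey = $(cat /etc/wireguard/server.key)
Address = 10.8.0.1/24
ListenPort = 51820
EOF
# Add one [Peer] section per friend (e.g. via wireguard-ui), then:
systemctl enable --now wg-quick@wg0
```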
Not to replace the great advice here, but if you can use a distroless image (you likely need to build it yourself), an attacker would have a hell of a time exploiting your system. When attackers find a weakness, their goal is usually to gain access to a shell; distroless images don’t have one. By the time they figure this out (or hopefully before), you should’ve detected their presence.
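If you want to try that, a hedged sketch of a multi-stage build ending in a distroless base, assuming a static Go binary (the app itself is hypothetical):

```sh
# Build stage compiles the app; final stage is gcr.io/distroless/static,
# which ships no shell and no package manager, and runs as a nonroot user.
cat > Dockerfile <<'EOF'
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .
FROM gcr.io/distroless/static:nonroot
COPY --from=build /app /app
ENTRYPOINT ["/app"]
EOF
docker build -t myapp:distroless .
```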
Also, check your logs regularly. Prevention is good but it doesn’t replace monitoring.
Running a container as an unprivileged user with podman is already quite good. Even if they break out of the container, the attacker will only be an unprivileged user. You’ll have to look up how to secure users in Linux (I don’t know how).
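One hedged approach to that last bit, with made-up names: a dedicated account with no login shell, so the container host user can’t be logged into directly:

```sh
# Dedicated unprivileged account for rootless containers: no login shell,
# its own home; lingering lets its containers keep running after logout.
sudo useradd --system --create-home --home-dir /var/lib/appuser \
  --shell /usr/sbin/nologin appuser
sudo loginctl enable-linger appuser
```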
As for networking, that’s where the firewall comes in.
`iptables` is supposedly superseded by `nftables`. The easiest way to configure that is either with a GUI or with `firewalld`. If I’m not mistaken, basically what you want to do is limit the unprivileged user to creating a network namespace with a certain IP range (not sure if a virtual network device is created? probably). Then you can use the firewall to say:
- allow all incoming and outgoing connections from the gateway (whatever device is exposed to the public internet through which your computer connects and receives traffic)
- block all connections out of the network namespace to IPs in your home network unless the connection was established from that IP. In other words, your container won’t be able to connect to devices in your home network unless those devices initiated the connection themselves

You can find more information about `iptables` on Wikibooks. I cannot remember which table to use, but I think it’s the filter table.
- rule1: INPUT chain ALLOW all from gateway
- rule2: OUTPUT chain ALLOW all to gateway
- rule3: INPUT chain ALLOW all from home network
- rule4: OUTPUT chain ALLOW all ESTABLISHED connections to the home network
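A rough, hedged translation of those four rules into `iptables` commands in the filter table; the gateway IP and home subnet are assumptions:

```sh
GW_IP=192.168.1.1        # gateway address (assumed)
HOME_NET=192.168.1.0/24  # home LAN subnet (assumed)
iptables -A INPUT  -s "$GW_IP"    -j ACCEPT   # rule1: allow all from gateway
iptables -A OUTPUT -d "$GW_IP"    -j ACCEPT   # rule2: allow all to gateway
iptables -A INPUT  -s "$HOME_NET" -j ACCEPT   # rule3: allow all from home network
iptables -A OUTPUT -d "$HOME_NET" \
  -m conntrack --ctstate ESTABLISHED -j ACCEPT  # rule4: only established traffic back
iptables -A OUTPUT -d "$HOME_NET" -j DROP       # everything else to the LAN is dropped
```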
Can’t think of anything else. But it might help to draw a diagram with the network traffic flows.
Containers are meant to simplify operational aspects of development and deployment. For proper isolation you should use virtual machines.
By default a container runs with network, storage and resources isolated from the host. What about this isolation is not “proper”?
Because OP is looking for security isolation, which isn’t what containers are for. Much like an umbrella stops rain, but not bullets. You fool.
I still don’t understand why you think containers aren’t adequate.
Say you break into a container, how would you break out?
Kernel exploits. Containers logically isolate resources, but they’re still effectively processes running on the same kernel and sharing the same hardware. There was one of those just last year: https://blog.aquasec.com/cve-2022-0185-linux-kernel-container-escape-in-kubernetes
Virtual machines are a whole other beast because the isolation is enforced at the hardware level, so you have to exploit hardware vulnerabilities like Spectre, or a virtual device: a couple of years ago some people found a breakout bug in the old floppy emulation driver that QEMU still assigns to VMs by default.
You don’t design security solutions on the premise that they’re not working.
Security comes in layers, so if you’re serious about security you do in fact plan for things like that. You always want to limit the blast radius if your security measures fail. And most of the big cloud providers do that for their container/kubernetes offerings.
If you run Portainer, for example, and it gets breached, that’s essentially a free container escape, because you can trick Docker into mounting and exposing whatever you need from the host. It’s not uncommon for people to give a container more permissions than it really needs.
It’s not like making a VM dedicated to running your containers costs anything. It’s basically free. I don’t do it all the time, but if something is exposed to the Internet and there’s other stuff on the box I want to keep hard to get into, like when it runs on my home server or desktop, then it definitely gets a VM.
Otherwise, why even bother putting your apps in containers? You could also just make the apps themselves fully secure and unbreachable. Why do we need a container for isolation? One should assume the app’s security measures are working, right?
If they can find a kernel exploit they might find a hardware exploit too. There’s no rational reason to assume containers are more likely to fail than VMs, just bias.
Oh and you can fix a kernel exploit with an update, good luck fixing a hardware exploit.
Now you’re probably going to tell me how a hardware exploit is so unlikely, but since we’re playing make-believe, I can make it as likely as suits my argument, right?
Old thread, but case in point: https://snyk.io/blog/leaky-vessels-docker-runc-container-breakout-vulnerabilities/
The potential attack surface of a container will always be much larger than a VM’s, because a VM has its own kernel and its own memory space; there’s no implicit sharing with the host, only explicit message passing.
Disclaimer: I don’t know much about securing the container itself. The considerations I discuss here are mostly networking.
What I’ve personally been doing is using k3s with Cloudflare Tunnel (routed using DNS like in this documentation) as an ingress.
With Cloudflare Tunnel, if you create an application in front of it, you can require authentication and add a list of allowed emails.
I could replace k3s with a different Kubernetes distribution, and/or replace Cloudflare Tunnel with a different ingress (e.g., Tailscale Funnel or more common ingresses like nginx).
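If you go the Cloudflare Tunnel route, the CLI flow looks roughly like this; the tunnel name, hostname, and local port are placeholders:

```sh
# Authenticate, create a named tunnel, point a DNS record at it, run it.
cloudflared tunnel login
cloudflared tunnel create home-apps
cloudflared tunnel route dns home-apps app.example.com
cloudflared tunnel run --url http://localhost:8080 home-apps
```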
Why does it need to be public-facing? There may be solutions that don’t require exposing it to billions of people.
Security is always about layers. The more independent layers there are, the smaller the chance someone will break through all of them. There is no single technology that will make your hosting reasonably secure; it’s the combination of multiple.
You’ve already mentioned software run inside an unprivileged sandbox.
There’s also:
- Sandbox run unprivileged inside a VM
- VM run inside an unprivileged sandbox
- Firewall only allowing applications to open certain ports
- Server running all of that hosted by someone else on their network, with their own abstractions
Easy solution: cloudflare tunnels