In 2026, running your own AI stack is no longer a weekend experiment for power users. An $8/month VPS, a handful of open-source tools, and one Docker Compose file are all it takes to replace $100+/month in SaaS subscriptions, with full control over your data, no rate limits, and no per-token costs.
This guide walks you through deploying Ollama (local LLM inference), Open WebUI (ChatGPT-like interface), n8n (AI workflow automation), and PostgreSQL on your own server — production-ready, SSL-secured, and running in under 30 minutes.
What You'll Build
A fully functional AI stack you own:
- Ollama: local LLM inference (Llama 3, Mistral, Qwen)
- Open WebUI: ChatGPT-like interface connected to your models
- n8n: AI workflow automation with 1,400+ integrations
- PostgreSQL: persistent storage for n8n workflows and execution data
- Traefik: reverse proxy with automatic SSL
Total cost: $6–20/month on Hetzner, DigitalOcean, or any Linux VPS.
Why Self-Host in 2026?
Data sovereignty. Your prompts, documents, and workflow logic never leave your server.
Cost at scale. ChatGPT Plus runs $20/user/month. Zapier hits $100+/month at moderate usage. One VPS replaces both for your entire team.
No rate limits. Your inference, your rules.
Prerequisites
- VPS with Ubuntu 22.04+, minimum 8GB RAM, 4 vCPUs (Hetzner CX32 = ~$8/month)
- Docker and Docker Compose v2 installed
- A domain pointing to your server
The Stack: docker-compose.yml
version: "3.9"
services:
  traefik:
    image: traefik:v3.0
    command:
      - "--providers.docker=true"
      # Only route containers that set traefik.enable=true, so Ollama and Postgres stay internal
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.le.acme.email=${ACME_EMAIL}"
      - "--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.le.acme.tlschallenge=true"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt
    restart: unless-stopped
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    labels:
      - "traefik.enable=true"
      # Router name must match the other webui labels; replace the domain with your own
      - "traefik.http.routers.webui.rule=Host(`ai.yourdomain.com`)"
      - "traefik.http.routers.webui.tls.certresolver=le"
      - "traefik.http.services.webui.loadbalancer.server.port=8080"
    volumes:
      # Persist Open WebUI users and chat history across container recreation
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama
    restart: unless-stopped
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: n8n
      POSTGRES_USER: n8n
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped
  n8n:
    image: n8nio/n8n:latest
    environment:
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=n8n
      - DB_POSTGRESDB_PASSWORD=${POSTGRES_PASSWORD}
      - N8N_HOST=${N8N_DOMAIN}
      - N8N_PROTOCOL=https
      - WEBHOOK_URL=https://${N8N_DOMAIN}/
      # Worth verifying for your n8n version: releases from 1.0 onward dropped basic auth,
      # so these three variables may be ignored in favor of the built-in owner account
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=${N8N_USER}
      - N8N_BASIC_AUTH_PASSWORD=${N8N_PASSWORD}
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.n8n.rule=Host(`${N8N_DOMAIN}`)"
      - "traefik.http.routers.n8n.tls.certresolver=le"
    depends_on:
      - postgres
    volumes:
      - n8n_data:/home/node/.n8n
    restart: unless-stopped
volumes:
  ollama_data:
  open_webui_data:
  postgres_data:
  n8n_data:
  letsencrypt:
Deploy in 4 Commands
# 1. Create your env file (the quoted 'EOF' stops the shell from expanding $ in passwords)
cat > .env <<'EOF'
POSTGRES_PASSWORD=your_secure_password
N8N_USER=admin
N8N_PASSWORD=a_different_secure_password
N8N_DOMAIN=n8n.yourdomain.com
ACME_EMAIL=you@example.com
EOF
# 2. Start the stack
docker compose up -d
# 3. Pull your first model
docker compose exec ollama ollama pull llama3.2
# 4. Check everything is running
docker compose ps
Your AI interface is live at https://ai.yourdomain.com and your automation engine at https://n8n.yourdomain.com.
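The placeholder passwords in the env file above are meant to be replaced. As a minimal sketch, assuming Python 3 is available on the machine where you prepare the file, the following generates random values with the standard-library `secrets` module. The function name `render_env` is hypothetical, and the variable names simply mirror the env file from step 1.

```python
import secrets


def render_env(n8n_domain, acme_email, n8n_user="admin"):
    """Return the contents of a .env file with freshly generated passwords.

    token_urlsafe() only emits letters, digits, '-' and '_', so the values
    need no quoting and contain no '$' that could be expanded.
    """
    values = {
        "POSTGRES_PASSWORD": secrets.token_urlsafe(24),
        "N8N_USER": n8n_user,
        "N8N_PASSWORD": secrets.token_urlsafe(24),
        "N8N_DOMAIN": n8n_domain,
        "ACME_EMAIL": acme_email,
    }
    return "".join(f"{key}={value}\n" for key, value in values.items())


if __name__ == "__main__":
    # Example domain and email are placeholders; substitute your own.
    print(render_env("n8n.yourdomain.com", "you@example.com"), end="")
```

Redirecting the output to `.env` gives the same file as step 1, with a different password for each service.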
First Workflow: AI-Powered Document Summarizer
Once n8n is running, create this workflow in 5 minutes:
- Trigger: webhook or email
- HTTP Request node: call Ollama's API at http://ollama:11434/api/generate
- Set: format the response
- Send Email / Slack: deliver the summary
n8n and Ollama run on the same Docker network, so the workflow makes no external API calls: requests never leave the server, and there is no per-token cost.
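To make the HTTP Request step concrete, here is a rough Python equivalent of what that node would send. It is a sketch, not the workflow itself: the prompt wording and the helper names `build_summary_request` and `summarize` are illustrative assumptions, while the `model`, `prompt`, and `stream` request fields and the `response` field in the reply follow Ollama's `/api/generate` endpoint. The hostname `ollama` only resolves from inside the stack's Docker network.

```python
import json
import urllib.request

# Service name and port as used in the compose file; reachable only inside the Docker network.
OLLAMA_URL = "http://ollama:11434/api/generate"


def build_summary_request(document_text, model="llama3.2"):
    """Build the JSON body for a single, non-streaming generate call."""
    prompt = (
        "Summarize the following document in three bullet points:\n\n"
        + document_text
    )
    # stream=False asks Ollama for one JSON object instead of a stream of chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def summarize(document_text, url=OLLAMA_URL, timeout=120):
    """POST the request and return the generated text from the 'response' field."""
    body = json.dumps(build_summary_request(document_text)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]
```

In n8n, the same dictionary goes into the HTTP Request node's JSON body, and the Set node would read the `response` field from the result.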
Model Recommendations for This Hardware
Model          RAM Required   Best For
Llama 3.2 3B   4 GB           Fast responses, basic tasks
Mistral 7B     6 GB           General purpose, good quality
Qwen 2.5 14B   12 GB          High quality, needs 16GB+ VPS
Start with llama3.2 on any 8GB server. Upgrade when you hit quality limits.
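The sizing rule can be written down as a small lookup. This sketch uses the RAM figures from the table above and assumes roughly 4 GB of headroom for the OS, PostgreSQL, n8n, and Open WebUI. That headroom figure is an assumption, chosen so the result matches the guidance here (llama3.2 on 8 GB, Qwen 2.5 14B on 16 GB), and the function name is hypothetical.

```python
# (model name, RAM required in GB), taken from the table in this section.
MODELS = [
    ("Llama 3.2 3B", 4),
    ("Mistral 7B", 6),
    ("Qwen 2.5 14B", 12),
]


def largest_model_that_fits(server_ram_gb, headroom_gb=4):
    """Return the largest listed model that fits after reserving headroom, or None."""
    usable = server_ram_gb - headroom_gb
    fitting = [(name, ram) for name, ram in MODELS if ram <= usable]
    if not fitting:
        return None
    # Pick the candidate with the highest RAM requirement.
    return max(fitting, key=lambda item: item[1])[0]


if __name__ == "__main__":
    for ram in (8, 16):
        print(ram, "GB ->", largest_model_that_fits(ram))
```

Lowering `headroom_gb` makes the choice more aggressive, at the risk of swapping when the other services are busy.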
Security Checklist
- Never expose Ollama port 11434 publicly; it has no auth by default
- Use .env files, never hardcode passwords in docker-compose.yml
- Enable Open WebUI's user management for team access
- Set up automatic backups for the postgres_data and n8n_data volumes
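For the backup item, one common pattern is to mount each named volume read-only into a throwaway container and tar its contents. The sketch below assumes the Compose project is called `aistack` (Compose prefixes volume names with the project name, so check `docker volume ls` for the real names) and that `/srv/backups` exists on the host. Both names are placeholders. Archiving a live PostgreSQL data directory can capture an inconsistent state, so stopping the stack first or using `pg_dump` is the safer route for the database.

```python
import subprocess
from datetime import datetime, timezone


def volume_backup_command(volume, backup_dir, stamp):
    """Return the docker command that archives one named volume into backup_dir."""
    archive = f"/backup/{volume}-{stamp}.tar.gz"
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data:ro",      # source volume, mounted read-only
        "-v", f"{backup_dir}:/backup",   # host directory (absolute path) for archives
        "alpine", "tar", "czf", archive, "-C", "/data", ".",
    ]


def backup_volumes(project, backup_dir, names=("postgres_data", "n8n_data")):
    """Archive each volume; raises CalledProcessError if docker fails."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    for name in names:
        command = volume_backup_command(f"{project}_{name}", backup_dir, stamp)
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical project name and backup path; adjust to your setup.
    backup_volumes("aistack", "/srv/backups")
```

Running the script from cron or a systemd timer makes the backups automatic. Copying the archives off the server is a separate step.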
Frequently Asked Questions
Can I run this without a GPU?
Yes. CPU inference works for models up to 7B parameters. Responses are slower (5–15 seconds) but fully functional. A GPU (even a used RTX 3090) drops response time to under 1 second.
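Response times vary a lot with CPU, model, and prompt length, so it is worth measuring on your own server. A non-streaming reply from Ollama's `/api/generate` includes `eval_count` (tokens generated) and `eval_duration` (nanoseconds). The sketch below turns those into tokens per second. The sample numbers are made up for illustration, not a benchmark.

```python
def tokens_per_second(ollama_response):
    """Compute generation speed from the timing fields of an Ollama reply."""
    token_count = ollama_response["eval_count"]
    duration_ns = ollama_response["eval_duration"]
    if duration_ns <= 0:
        return 0.0
    # eval_duration is reported in nanoseconds.
    return token_count / (duration_ns / 1e9)


if __name__ == "__main__":
    # Illustrative values only: 100 tokens generated in 5 seconds.
    sample = {"eval_count": 100, "eval_duration": 5_000_000_000}
    print(f"{tokens_per_second(sample):.1f} tokens/s")
```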
What's the difference from using the OpenAI API?
Your data stays on your server. No per-token costs. Models run offline. Tradeoff: frontier models like GPT-4o are still ahead in reasoning quality — use the self-hosted stack for 90% of tasks, keep a cloud API key for edge cases.
Can I connect n8n to external AI APIs too?
Yes. n8n has native OpenAI, Anthropic, and Google AI nodes. Mix local and cloud inference in the same workflow.