Kubernetes Assistant: Fine-Tuned LLM
- Product: k8s-model-fine-tuning
- Document Type: Python / Machine Learning
- Last Publish Date: February 2026
- Tools Used: Python, PyTorch, QLoRA, Unsloth, Claude API, Hugging Face, Ollama, WSL2
Overview
This project fine-tunes Llama 3.1 8B on a dataset of 2,000 Kubernetes examples across three task categories: YAML manifest generation, kubectl command explanation, and error troubleshooting. The goal was to produce a small, locally-runnable model that outperforms the base model on Kubernetes-specific tasks.
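Each training example pairs an instruction with a target output. A hypothetical record for the YAML-generation category might look like this (field names are illustrative, not necessarily the repo's actual schema):

```json
{
  "task_type": "yaml_generation",
  "instruction": "Create a Deployment running nginx:1.27 with 3 replicas.",
  "output": "apiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: nginx\nspec:\n  replicas: 3"
}
```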
The fine-tuned model is available via Ollama as k8s-assistant.
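Once the model has been created locally (see the setup docs), a quick smoke test from the terminal:

```shell
# Prompt is illustrative; any Kubernetes question works.
ollama run k8s-assistant "Explain: kubectl get pods -o wide"
```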
The source code is on GitHub.
Results
Evaluated on a 30-example smoke test across all three categories:
| Metric | Base Model | Fine-Tuned |
|---|---|---|
| YAML Validity (K8s fields) | 0% (0/6) | 83% (5/6) |
| kubectl Accuracy | 80% (8/10) | 100% (10/10) |
| Troubleshooting | 100% (14/14) | 100% (14/14) |
| Overall | 73% (22/30) | 97% (29/30) |
| Training Time | – | ~10 hrs (RTX 2080) |
| Dataset Cost | – | ~$15 (Claude API) |
How It Works
Dataset generation runs on a Mac using the Claude API. A generation script produces 2,000 labelled examples covering the three task types, with a validation pipeline to filter out bad outputs. Total cost was around $15.
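A sketch of the generation-and-validation loop. Helper names, the prompt, and the record schema are hypothetical; only `client.messages.create` is the real Anthropic SDK call, and the actual pipeline in the repo will differ.

```python
"""Sketch: generate one example via the Claude API, then filter bad outputs."""
import json

REQUIRED_KEYS = {"task_type", "instruction", "output"}
TASK_TYPES = {"yaml_generation", "kubectl_explanation", "troubleshooting"}

def validate_example(record: dict) -> bool:
    """Reject malformed generations before they enter the dataset."""
    if not REQUIRED_KEYS <= record.keys():
        return False
    if record["task_type"] not in TASK_TYPES:
        return False
    # Reject empty instructions and suspiciously short (truncated) outputs.
    return bool(record["instruction"].strip()) and len(record["output"].strip()) > 20

def generate_example(client, task_type: str) -> dict:
    """One API call per example; client = anthropic.Anthropic()."""
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: any current model works
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Generate one Kubernetes {task_type} training example "
                       f"as JSON with keys task_type, instruction, output.",
        }],
    )
    return json.loads(msg.content[0].text)
```

In practice the real pipeline also deduplicates examples and (for the YAML category) parses the manifest to confirm it is valid Kubernetes, which this sketch omits.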
Training runs on a Windows PC, accessed over SSH into WSL2 Ubuntu, using QLoRA to fit within the 8 GB VRAM budget of an RTX 2080. Training took roughly 10 hours.
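A minimal sketch of the QLoRA setup with Unsloth. Hyperparameters and the base-model name are illustrative, not the repo's actual values; the heavy imports sit inside `train()` so the config is readable on a machine without a GPU.

```python
"""Sketch: 4-bit QLoRA fine-tuning of Llama 3.1 8B on an 8 GB GPU."""

QLORA = {
    "base_model": "unsloth/Meta-Llama-3.1-8B-Instruct",  # assumption
    "max_seq_length": 2048,
    "load_in_4bit": True,  # 4-bit base weights are what make 8 GB VRAM enough
    "lora_r": 16,
    "lora_alpha": 16,
}

def train(dataset):
    """dataset: a Hugging Face Dataset with a pre-formatted 'text' column (assumption)."""
    from unsloth import FastLanguageModel
    from transformers import TrainingArguments
    from trl import SFTTrainer

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=QLORA["base_model"],
        max_seq_length=QLORA["max_seq_length"],
        load_in_4bit=QLORA["load_in_4bit"],
    )
    # Attach LoRA adapters; only these small matrices are trained.
    model = FastLanguageModel.get_peft_model(
        model,
        r=QLORA["lora_r"],
        lora_alpha=QLORA["lora_alpha"],
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=QLORA["max_seq_length"],
        args=TrainingArguments(
            per_device_train_batch_size=2,  # small batch to stay inside VRAM
            gradient_accumulation_steps=4,  # effective batch size of 8
            num_train_epochs=1,
            fp16=True,
            output_dir="outputs",
        ),
    )
    trainer.train()
    return model, tokenizer
```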
Inference uses Ollama. The fine-tuned model is converted to GGUF format and loaded as a local Ollama model.
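The GGUF-to-Ollama step can be sketched with a minimal Modelfile (filenames are hypothetical; Unsloth can export GGUF directly via `save_pretrained_gguf`, or llama.cpp's converter can be run on the merged weights):

```shell
# Write a minimal Modelfile pointing Ollama at the exported GGUF.
cat > Modelfile <<'EOF'
FROM ./k8s-assistant.Q4_K_M.gguf
EOF

# Register the local model with Ollama.
ollama create k8s-assistant -f Modelfile
ollama list   # the new model should appear here
```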
This two-machine setup (Mac for scripting and API calls, PC for GPU training via SSH) is documented in detail in docs/SETUP.md.
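The handoff between the two machines amounts to copying the dataset over and kicking off training remotely. A sketch, with hypothetical hostnames and paths (and assuming sshd is reachable inside the WSL2 Ubuntu distro); the real workflow is in docs/SETUP.md:

```shell
# Copy the generated dataset from the Mac to the PC.
scp data/k8s_dataset.jsonl user@gaming-pc:~/k8s-model-fine-tuning/data/

# Launch training in the background so the ~10 hr run survives disconnects.
ssh user@gaming-pc 'cd ~/k8s-model-fine-tuning && nohup python train.py > train.log 2>&1 &'
```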
Documentation
The repo includes full methodology docs covering environment setup, dataset generation, training configuration, evaluation methodology, and a comparison of fine-tuning vs. retrieval-augmented generation for this use case.