Kubernetes Assistant: Fine-Tuned LLM
- Product: k8s-model-fine-tuning
- Document Type: Python / Machine Learning
- Last Publish Date: February 2026
- Tools Used: Python, PyTorch, QLoRA, Unsloth, Claude API, Hugging Face, Ollama, WSL2
Overview
This project fine-tunes Llama 3.1 8B on a dataset of 2,000 Kubernetes examples across three task categories: YAML manifest generation, kubectl command explanation, and error troubleshooting. The goal was to produce a small, locally-runnable model that outperforms the base model on Kubernetes-specific tasks.
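Each training example pairs an instruction with a target output. A hypothetical record for the YAML-generation category might look like this (field names are illustrative, not necessarily the repo's actual schema):

```json
{
  "task_type": "yaml_generation",
  "instruction": "Create a Deployment running nginx:1.27 with 3 replicas.",
  "output": "apiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: nginx\nspec:\n  replicas: 3"
}
```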
The fine-tuned model is available via Ollama as k8s-assistant.
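Once the model has been created locally (see the setup docs), a quick smoke test from the terminal:

```shell
# Prompt is illustrative; any Kubernetes question works.
ollama run k8s-assistant "Explain: kubectl get pods -o wide"
```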
The source code is on GitHub.
Results
Evaluated on a 30-example smoke test across all three categories:
| Metric | Base Model | Fine-Tuned |
|---|---|---|
| YAML Validity (K8s fields) | 0% (0/6) | 83% (5/6) |
| kubectl Accuracy | 80% (8/10) | 100% (10/10) |
| Troubleshooting | 100% (14/14) | 100% (14/14) |
| Overall | 73% (22/30) | 97% (29/30) |
| Training Time | – | ~10 hrs (RTX 2080) |
| Dataset Cost | – | ~$15 (Claude API) |
How It Works
Dataset generation runs on a Mac using the Claude API. A generation script produces 2,000 labelled examples covering the three task types, with a validation pipeline to filter out bad outputs. Total cost was around $15.
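A sketch of the generation-and-validation loop. Helper names, the prompt, and the record schema are hypothetical; only `client.messages.create` is the real Anthropic SDK call, and the actual pipeline in the repo will differ.

```python
"""Sketch: generate one example via the Claude API, then filter bad outputs."""
import json

REQUIRED_KEYS = {"task_type", "instruction", "output"}
TASK_TYPES = {"yaml_generation", "kubectl_explanation", "troubleshooting"}

def validate_example(record: dict) -> bool:
    """Reject malformed generations before they enter the dataset."""
    if not REQUIRED_KEYS <= record.keys():
        return False
    if record["task_type"] not in TASK_TYPES:
        return False
    # Reject empty instructions and suspiciously short (truncated) outputs.
    return bool(record["instruction"].strip()) and len(record["output"].strip()) > 20

def generate_example(client, task_type: str) -> dict:
    """One API call per example; client = anthropic.Anthropic()."""
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: any current model works
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Generate one Kubernetes {task_type} training example "
                       f"as JSON with keys task_type, instruction, output.",
        }],
    )
    return json.loads(msg.content[0].text)
```

In practice the real pipeline also deduplicates examples and (for the YAML category) parses the manifest to confirm it is valid Kubernetes, which this sketch omits.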
Training runs on a Windows PC, accessed over SSH into WSL2 Ubuntu, using QLoRA to fit within the 8 GB VRAM budget of an RTX 2080. Training took roughly 10 hours.
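A minimal sketch of the QLoRA setup with Unsloth. Hyperparameters and the base-model name are illustrative, not the repo's actual values; the heavy imports sit inside `train()` so the config is readable on a machine without a GPU.

```python
"""Sketch: 4-bit QLoRA fine-tuning of Llama 3.1 8B on an 8 GB GPU."""

QLORA = {
    "base_model": "unsloth/Meta-Llama-3.1-8B-Instruct",  # assumption
    "max_seq_length": 2048,
    "load_in_4bit": True,  # 4-bit base weights are what make 8 GB VRAM enough
    "lora_r": 16,
    "lora_alpha": 16,
}

def train(dataset):
    """dataset: a Hugging Face Dataset with a pre-formatted 'text' column (assumption)."""
    from unsloth import FastLanguageModel
    from transformers import TrainingArguments
    from trl import SFTTrainer

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=QLORA["base_model"],
        max_seq_length=QLORA["max_seq_length"],
        load_in_4bit=QLORA["load_in_4bit"],
    )
    # Attach LoRA adapters; only these small matrices are trained.
    model = FastLanguageModel.get_peft_model(
        model,
        r=QLORA["lora_r"],
        lora_alpha=QLORA["lora_alpha"],
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=QLORA["max_seq_length"],
        args=TrainingArguments(
            per_device_train_batch_size=2,  # small batch to stay inside VRAM
            gradient_accumulation_steps=4,  # effective batch size of 8
            num_train_epochs=1,
            fp16=True,
            output_dir="outputs",
        ),
    )
    trainer.train()
    return model, tokenizer
```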
Inference uses Ollama. The fine-tuned model is converted to GGUF format and loaded as a local Ollama model.
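The GGUF-to-Ollama step can be sketched with a minimal Modelfile (filenames are hypothetical; Unsloth can export GGUF directly via `save_pretrained_gguf`, or llama.cpp's converter can be run on the merged weights):

```shell
# Write a minimal Modelfile pointing Ollama at the exported GGUF.
cat > Modelfile <<'EOF'
FROM ./k8s-assistant.Q4_K_M.gguf
EOF

# Register the local model with Ollama.
ollama create k8s-assistant -f Modelfile
ollama list   # the new model should appear here
```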
This two-machine setup (Mac for scripting and API calls, PC for GPU training via SSH) is documented in detail in docs/SETUP.md.
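The handoff between the two machines amounts to copying the dataset over and kicking off training remotely. A sketch, with hypothetical hostnames and paths (and assuming sshd is reachable inside the WSL2 Ubuntu distro); the real workflow is in docs/SETUP.md:

```shell
# Copy the generated dataset from the Mac to the PC.
scp data/k8s_dataset.jsonl user@gaming-pc:~/k8s-model-fine-tuning/data/

# Launch training in the background so the ~10 hr run survives disconnects.
ssh user@gaming-pc 'cd ~/k8s-model-fine-tuning && nohup python train.py > train.log 2>&1 &'
```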
Documentation
The repo includes full methodology docs covering environment setup, dataset generation, training configuration, evaluation methodology, and a comparison of fine-tuning vs. retrieval-augmented generation for this use case.