Oracle Cloud Native Environment Assistant: Fine-Tuned LLM
- Product: ocne-model-fine-tuning
- Document Type: Python / Machine Learning
- Last Publish Date: March 2026
- Tools Used: Python, PyTorch, QLoRA, Hugging Face, Ollama, vLLM
Overview
This project fine-tunes Llama 3.1 8B Instruct on Oracle Cloud Native Environment Release 2 documentation. The result is a locally runnable model that can answer questions about Oracle Cloud Native Environment CLI usage, cluster administration, architectural concepts, and quick start procedures.
The dataset was created without external APIs. Two automation scripts handle the work: one scrapes all nine sections of the Oracle Cloud Native Environment Release 2 documentation, and another generates Q&A pairs from the scraped content using a local Ollama model.
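The exact output format of the Q&A generation script isn't reproduced here. As a hypothetical sketch, assuming the local model emits plain-text `Q:`/`A:` blocks, a small helper could split that raw output into structured pairs ready for the dataset:

```python
import json
import re

def parse_qa_pairs(raw: str) -> list[dict]:
    """Split generator output of the form 'Q: ... A: ...' into
    question/answer dicts. The 'Q:'/'A:' layout is an assumed
    format, not necessarily the repository's actual one."""
    pairs = []
    # Each block starts at a line beginning with 'Q:'; the answer
    # runs from 'A:' until the next question.
    for block in re.split(r"(?m)^Q:", raw)[1:]:
        question, _, answer = block.partition("A:")
        if answer.strip():
            pairs.append({"question": question.strip(),
                          "answer": answer.strip()})
    return pairs

# Illustrative sample of what a generation pass might return.
sample = """Q: Which command-line tool manages clusters?
A: The ocne CLI is used for cluster operations.
Q: What release does this dataset cover?
A: Oracle Cloud Native Environment Release 2."""

pairs = parse_qa_pairs(sample)
print(len(pairs))  # -> 2
print(json.dumps(pairs[0]))
```

Each parsed dict can then be serialized one-per-line to build the JSONL dataset described below.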
The source code is on GitHub.
Dataset
285 hand-curated Q&A pairs covering:
- CLI commands and flags
- Cluster administration procedures
- Architectural concepts
- Quick start and installation
The pairs are stored as JSONL and are included in the repository.
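In JSONL, each line is one standalone JSON object. The field names below (`question`/`answer`) and the record contents are assumptions for illustration; the repository's actual schema may differ. A minimal loader that also validates the expected keys:

```python
import json

# Two illustrative lines in the assumed question/answer schema.
raw_jsonl = """\
{"question": "What is an illustrative question?", "answer": "Illustrative answer text."}
{"question": "And a second one?", "answer": "More illustrative answer text."}
"""

def load_pairs(text: str) -> list[dict]:
    """Parse JSONL and verify every record carries the expected keys."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    for record in records:
        assert {"question", "answer"} <= record.keys(), f"bad record: {record}"
    return records

print(len(load_pairs(raw_jsonl)))  # -> 2
```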
Training
Training uses 4-bit QLoRA, which keeps peak VRAM usage around 9 GB and completes in about 8 minutes on a 16 GB GPU (tested on an RTX 5080). Configuration: 10 epochs, learning rate 2e-4, LoRA rank 16, effective batch size 8. Final loss is typically 0.35–0.45.
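A sketch of how the quoted hyperparameters might map onto a Hugging Face QLoRA setup. The epochs, learning rate, LoRA rank, and effective batch size of 8 come from the text above; the LoRA alpha and dropout, the target modules, and the per-device/gradient-accumulation split are assumptions, not values taken from the repository.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization is what keeps peak VRAM near 9 GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Rank 16 is stated in the text; alpha, dropout, and target modules
# are assumed values for illustration.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# 2 per device x 4 accumulation steps = the stated effective batch of 8.
training_args = TrainingArguments(
    output_dir="ocne-qlora",          # hypothetical output path
    num_train_epochs=10,
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    bf16=True,
)
```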
The training script is self-contained with no cloud dependencies or proprietary software beyond the base model weights.
Inference
After training, the model can be served in several ways:
- Python inference script with CUDA, MPS, and CPU support
- Direct LoRA adapter loading
- Ollama deployment after GGUF conversion
- vLLM for higher-throughput serving
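The CUDA/MPS/CPU support mentioned above comes down to a device-selection fallback, which can be sketched as a small helper (the function name is illustrative, not taken from the repository):

```python
def pick_device() -> str:
    """Return the best available PyTorch device string.

    Prefers CUDA, then Apple MPS, and falls back to CPU. Degrades
    gracefully when torch itself is not installed.
    """
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"

print(pick_device())  # "cuda", "mps", or "cpu" depending on the machine
```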
A shell script handles the GGUF conversion and Ollama model registration.
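The conversion script itself isn't reproduced here. As a hedged sketch of the deployment configuration it would produce, the steps below use llama.cpp's converter and quantizer plus Ollama's Modelfile format; the paths and model names are illustrative, and the LoRA adapters are assumed to have been merged into the base weights first.

```shell
# In a llama.cpp checkout (script and binary names as of recent
# llama.cpp releases; older checkouts may use different names).
python convert_hf_to_gguf.py ./merged-model \
    --outfile ocne-assistant-f16.gguf --outtype f16

# Optional: quantize to 4-bit for a smaller local footprint.
./llama-quantize ocne-assistant-f16.gguf ocne-assistant-q4.gguf Q4_K_M

# Minimal Modelfile pointing Ollama at the converted weights.
cat > Modelfile <<'EOF'
FROM ./ocne-assistant-q4.gguf
EOF

ollama create ocne-assistant -f Modelfile
ollama run ocne-assistant "How do I check the status of a cluster?"
```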