Why This Matters
When an LLM does not do what you want, you have two options: change the input (prompt engineering) or change the model (fine-tuning). Prompt engineering is fast, cheap, and requires no training data. Fine-tuning is slower and more expensive, but it can teach the model behaviors that no prompt can replicate.
Knowing when to prompt and when to fine-tune is a critical engineering decision. Get it wrong, and you either waste months fine-tuning when a good prompt would have worked, or you burn tokens on elaborate prompts when a quick fine-tune would have been more reliable and cheaper at scale.
Visual Model
The full process at a glance.
Two paths: prompting (fast, no training) or fine-tuning (custom behavior, requires data).
Code Example
// Decision framework: prompting vs fine-tuning
function shouldFineTune(scenario) {
  const scores = {
    promptQuality: scenario.promptAccuracy >= 0.9 ? -2 : 2,
    dataAvailable: scenario.labeledExamples >= 100 ? 1 : -2,
    costSensitive: scenario.queriesPerDay >= 10000 ? 2 : -1,
    formatCritical: scenario.needsExactFormat ? 2 : 0,
    domainSpecific: scenario.specializedDomain ? 1 : 0,
  };
  const total = Object.values(scores).reduce((a, b) => a + b, 0);
  return {
    recommendation: total > 0 ? "Fine-tune" : "Keep prompting",
    score: total,
    breakdown: scores
  };
}

// Example scenarios
console.log(shouldFineTune({
  promptAccuracy: 0.95,
  labeledExamples: 50,
  queriesPerDay: 100,
  needsExactFormat: false,
  specializedDomain: false
}));
// -> Keep prompting (prompts already work well)

console.log(shouldFineTune({
  promptAccuracy: 0.7,
  labeledExamples: 5000,
  queriesPerDay: 50000,
  needsExactFormat: true,
  specializedDomain: true
}));
// -> Fine-tune (prompts insufficient, data available)

// LoRA parameter savings
function loraParameters(modelParams, rank) {
  // LoRA freezes the original d x d weight matrix and instead trains
  // two small matrices per layer: A (d x r) and B (r x d)
  const d = Math.sqrt(modelParams); // simplified: assumes a square weight matrix
  const fullParams = d * d;
  const loraParams = 2 * d * rank;
  console.log(`Full fine-tuning: ${fullParams.toLocaleString()} params`);
  console.log(`LoRA (rank ${rank}): ${loraParams.toLocaleString()} params`);
  console.log(`Reduction: ${((1 - loraParams / fullParams) * 100).toFixed(2)}%`);
}

loraParameters(1000000, 8); // 1M-param layer, rank 8

Interactive Experiment
Try these exercises:
- Pick a task you have used an LLM for. Score it on the decision framework above. Does it suggest prompting or fine-tuning?
- Calculate: if your prompt template is 1,500 tokens and you make 10,000 queries/day at $0.03/1K tokens, what is your monthly cost? How much would you save if fine-tuning eliminated the template (reducing to 200 tokens per query)?
- A common LoRA rank is 8. If a layer has 4,096 input and output dimensions, how many parameters does full fine-tuning change vs. LoRA?
- What happens if you fine-tune on bad training data? How would the model behave?
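If you want to check your arithmetic on the cost and LoRA exercises, here is a quick sketch. The 30-day month and the square 4,096 x 4,096 weight matrix are assumptions for illustration:

```javascript
// Monthly token cost, assuming a 30-day month
const tokenCost = (tokensPerQuery, queriesPerDay, costPer1kTokens) =>
  (tokensPerQuery * queriesPerDay * 30 / 1000) * costPer1kTokens;

const promptCost = tokenCost(1500, 10000, 0.03); // ~$13,500/month
const tunedCost = tokenCost(200, 10000, 0.03);   // ~$1,800/month
console.log(`Monthly savings: $${Math.round(promptCost - tunedCost)}`); // $11700

// LoRA vs full fine-tuning for one square 4096 x 4096 layer at rank 8
const d = 4096, rank = 8;
console.log(`Full: ${d * d} params, LoRA: ${2 * d * rank} params`);
// Full: 16777216 params, LoRA: 65536 params
```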
Coding Challenge
Write a function called `compareCosts` that takes: promptTokensPerQuery, fineTunedTokensPerQuery, queriesPerMonth, costPerToken, and trainingCost. Return an object with promptMonthlyCost, fineTunedMonthlyCost, and breakEvenMonths (how many months until fine-tuning becomes cheaper, or -1 if it never does).
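One possible solution sketch, if you want to check your approach (the break-even logic here — ceiling of training cost over monthly savings — is one reasonable interpretation of the spec):

```javascript
// Compare ongoing prompt costs against fine-tuned costs plus training
function compareCosts(promptTokensPerQuery, fineTunedTokensPerQuery,
                      queriesPerMonth, costPerToken, trainingCost) {
  const promptMonthlyCost = promptTokensPerQuery * queriesPerMonth * costPerToken;
  const fineTunedMonthlyCost = fineTunedTokensPerQuery * queriesPerMonth * costPerToken;
  const monthlySavings = promptMonthlyCost - fineTunedMonthlyCost;
  return {
    promptMonthlyCost,
    fineTunedMonthlyCost,
    // Months until cumulative savings cover the training cost,
    // or -1 if fine-tuning never saves money
    breakEvenMonths: monthlySavings > 0
      ? Math.ceil(trainingCost / monthlySavings)
      : -1,
  };
}

// 1,500-token prompt vs 200 tokens fine-tuned, 300k queries/month,
// $0.00003/token ($0.03/1K), $5,000 training cost
console.log(compareCosts(1500, 200, 300000, 0.00003, 5000));
```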
Real-World Usage
The prompting-vs-fine-tuning decision shapes every production LLM system:
- OpenAI fine-tuning API: Companies fine-tune GPT models to match their brand voice, classification schemas, or domain terminology without including examples in every prompt.
- LoRA in open source: Hugging Face hosts thousands of LoRA adapters for Llama and Mistral, each specializing the base model for a different task (SQL generation, medical Q&A, code review).
- RLHF alignment: ChatGPT and Claude are fine-tuned with Reinforcement Learning from Human Feedback to be helpful, harmless, and honest — behaviors that cannot be achieved by prompting alone.
- Domain adaptation: Legal, medical, and financial companies fine-tune models on proprietary data to handle specialized terminology and reasoning patterns.
- Cost optimization: Companies that start with long prompt templates often switch to fine-tuned models as query volume grows, reducing per-query cost by 5-10x.