e sequence: Ingest → Segment → Grade → Report. Each stage is designed to run entirely on-premises, using open-weight models and deterministic parsing.
Architecture Decisions & Rationale
- LaTeX-Native Ingestion: Engineering and physics coursework relies heavily on LaTeX for mathematical notation, diagrams, and structured problem statements. Parsing rendered PDFs introduces ambiguity, because layout must be reverse-engineered back into structure. By accepting `.tex` source files, the pipeline keeps the semantic structure intact, which enables precise section extraction and preserves formulas exactly as written.
- YAML-Driven Rubrics: Binary per-item scoring reduces subjective grading drift. YAML provides a human-readable, version-controllable format that instructors can modify without touching code. Each rubric item maps to a specific learning objective, ensuring traceability.
- Local Chain-of-Thought Inference: Hosting `gpt-oss:120b` locally allows extended reasoning traces without per-token API costs. The model compares each student submission against an instructor-authored reference solution, generating step-by-step validation before collapsing to a binary pass/fail per rubric item.
- Deterministic Reporting: Grading outputs are structured as JSON artifacts that feed directly into learning management systems (LMS) or custom dashboards. This enables automated feedback delivery, regrade requests, and audit logging.
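The ingestion stage in the implementation that follows only checks for `\begin{document}` and `\end{document}`. As a minimal sketch of the section extraction that `.tex` input makes possible, the hypothetical helpers below split a document on `\section{...}` headings and report required sections that are missing, which doubles as a pre-flight check. They assume flat section titles without nested braces; this is a regex sketch, not a LaTeX parser.

```typescript
// Hypothetical helpers (not part of the pipeline code): regex-based section splitting for .tex sources.
interface TexSection {
  title: string;
  body: string;
}

/** Returns the text between \begin{document} and \end{document}, or null if either marker is missing. */
function extractDocumentBody(tex: string): string | null {
  const startMarker = '\\begin{document}';
  const start = tex.indexOf(startMarker);
  const end = tex.lastIndexOf('\\end{document}');
  if (start === -1 || end === -1 || end < start) return null;
  return tex.slice(start + startMarker.length, end);
}

/** Splits the body on \section{...} or \section*{...}; text before the first heading is ignored. */
function extractSections(tex: string): TexSection[] {
  const body = extractDocumentBody(tex);
  if (body === null) return [];
  const matches = Array.from(body.matchAll(/\\section\*?\{([^{}]*)\}/g));
  return matches.map((m, i) => {
    const bodyStart = (m.index ?? 0) + m[0].length;
    const bodyEnd = i + 1 < matches.length ? (matches[i + 1].index ?? body.length) : body.length;
    return { title: m[1].trim(), body: body.slice(bodyStart, bodyEnd).trim() };
  });
}

/** Pre-flight check: returns required section titles absent from the submission (case-insensitive). */
function missingSections(tex: string, required: string[]): string[] {
  const present = new Set(extractSections(tex).map(s => s.title.toLowerCase()));
  return required.filter(title => !present.has(title.toLowerCase()));
}
```

A submission whose `missingSections` result is non-empty can be rejected before any GPU time is spent, with the missing titles logged for TA review.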
Implementation (TypeScript)
The following implementation shows the pipeline structure: a modular class, explicit interfaces, and a single method that routes requests to the local inference server. It is written in TypeScript rather than the Python common in academic grading scripts, to emphasize typed, production-oriented patterns.
```typescript
import { readFile } from 'fs/promises';
import { mkdirSync, readFileSync, writeFileSync } from 'fs';
import { parse as parseYaml } from 'yaml';

// Domain interfaces
interface RubricItem {
  id: string;
  description: string;
  weight: number;
  binaryThreshold: number; // 0 or 1, read from binary_threshold in the YAML
}

interface Submission {
  studentId: string;
  texPath: string;
  referencePath: string;
}

interface ModelOutput {
  reasoning: string;
  scores: Record<string, number>;
}

interface GradingResult {
  studentId: string;
  rubricScores: Record<string, number>;
  totalScore: number;
  reasoningTrace: string;
  status: 'passed' | 'failed' | 'partial';
}

// Pipeline orchestrator
class LocalGradingPipeline {
  private modelEndpoint: string;
  private rubricConfig: RubricItem[];

  constructor(modelUrl: string, rubricYamlPath: string) {
    this.modelEndpoint = modelUrl;
    const rawYaml = readFileSync(rubricYamlPath, 'utf-8');
    // Map the YAML's snake_case keys onto the camelCase interface
    this.rubricConfig = parseYaml(rawYaml).rubric_items.map((item: any): RubricItem => ({
      id: String(item.id),
      description: String(item.description),
      weight: Number(item.weight),
      binaryThreshold: Number(item.binary_threshold ?? 1),
    }));
  }

  // Stage 1: Ingest & Validate
  private async ingestSubmission(submission: Submission): Promise<string> {
    const [texContent, refContent] = await Promise.all([
      readFile(submission.texPath, 'utf-8'),
      readFile(submission.referencePath, 'utf-8'),
    ]);
    // Basic LaTeX structure check (not a full parse)
    if (!texContent.includes('\\begin{document}') || !texContent.includes('\\end{document}')) {
      throw new Error(`Invalid LaTeX structure for student ${submission.studentId}`);
    }
    return `${texContent}\n\n---REFERENCE_SOLUTION---\n${refContent}`;
  }

  // Stage 2: Segment & Prepare Prompt
  private buildGradingPrompt(combinedContent: string): string {
    const rubricBlock = this.rubricConfig
      .map(r => `- [${r.id}] ${r.description} (Score: 0 or 1)`)
      .join('\n');
    // Template lines are left unindented so the prompt text carries no stray whitespace
    return `You are an academic grader. Compare the student submission against the reference solution.
Evaluate each rubric item independently. Provide a chain-of-thought analysis, then output a JSON object with scores.
RUBRIC:
${rubricBlock}
CONTENT:
${combinedContent}
OUTPUT FORMAT:
{
"reasoning": "<step-by-step validation>",
"scores": { "<rubric_id>": 0|1, ... }
}`;
  }

  // Stage 3: Grade via Local Inference (Ollama's /api/generate; needs Node 18+ for global fetch)
  private async invokeLocalModel(prompt: string): Promise<ModelOutput> {
    // fetch + JSON.stringify avoids the shell-quoting bugs of interpolating the prompt into a curl command
    const response = await fetch(`${this.modelEndpoint}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'gpt-oss:120b',
        prompt,
        stream: false,
        // Ollama reads sampling parameters from "options"; num_predict caps generated tokens
        options: { temperature: 0.1, num_predict: 2048 },
      }),
    });
    if (!response.ok) throw new Error(`Inference server returned HTTP ${response.status}`);
    const parsed = (await response.json()) as { response: string };
    // Extract the outermost {...} span from the model's text
    const jsonMatch = parsed.response.match(/\{[\s\S]*\}/);
    if (!jsonMatch) throw new Error('Model failed to return structured JSON');
    return JSON.parse(jsonMatch[0]) as ModelOutput;
  }

  // Stage 4: Report & Aggregate
  private generateReport(submission: Submission, modelOutput: ModelOutput): GradingResult {
    const rubricScores: Record<string, number> = {};
    let weightedSum = 0;
    let totalWeight = 0;
    for (const item of this.rubricConfig) {
      // Anything other than an explicit 1 counts as 0, keeping scores strictly binary
      const score = modelOutput.scores?.[item.id] === 1 ? 1 : 0;
      rubricScores[item.id] = score;
      weightedSum += score * item.weight;
      totalWeight += item.weight;
    }
    // Normalize so the 0.8 / 0.5 cut-offs still apply if the weights do not sum to 1
    const totalScore = totalWeight > 0 ? weightedSum / totalWeight : 0;
    const status: GradingResult['status'] =
      totalScore >= 0.8 ? 'passed' : totalScore >= 0.5 ? 'partial' : 'failed';
    return {
      studentId: submission.studentId,
      rubricScores,
      totalScore,
      reasoningTrace: modelOutput.reasoning ?? '',
      status,
    };
  }

  // Public execution method
  async evaluate(submission: Submission): Promise<GradingResult> {
    const combined = await this.ingestSubmission(submission);
    const prompt = this.buildGradingPrompt(combined);
    const modelOutput = await this.invokeLocalModel(prompt);
    return this.generateReport(submission, modelOutput);
  }
}

// Usage example
const pipeline = new LocalGradingPipeline('http://localhost:11434', './rubric_config.yaml');
const submission: Submission = {
  studentId: 'ENG-2026-0841',
  texPath: './submissions/0841_solution.tex',
  referencePath: './references/me373_hw04_ref.tex',
};
pipeline
  .evaluate(submission)
  .then(result => {
    mkdirSync('./reports', { recursive: true }); // writeFileSync fails if the directory is missing
    writeFileSync(`./reports/${result.studentId}_grade.json`, JSON.stringify(result, null, 2));
    console.log(`Grading complete for ${result.studentId}: ${result.status} (${result.totalScore.toFixed(2)})`);
  })
  .catch(err => console.error('Pipeline failure:', err.message));
```
Why this structure works:
- Separation of concerns: Ingestion, prompt construction, inference, and reporting are isolated. This enables independent testing, mock inference during development, and safe rubric updates.
- Deterministic scoring: The model makes only binary per-item decisions, and the weighted total is computed in code, which limits grade inflation. The `weight` field lets instructors prioritize critical learning objectives without complicating the model's decision boundary.
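- Local inference routing: The HTTP call to the inference server is isolated in `invokeLocalModel`. The request targets Ollama's `/api/generate` API; moving to vLLM or a llama.cpp server, which expose OpenAI-compatible endpoints, means rewriting only that method, and switching model versions only requires a new model tag.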
- Audit-ready output: JSON reports contain full reasoning traces, enabling instructors to verify grading logic and students to request targeted regrades.
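Because inference sits behind one method, a development build can substitute canned responses for the real model. The sketch below assumes a hypothetical `InferenceClient` interface, which the implementation above does not define (it calls the endpoint directly), and shows how a `MockInferenceClient` could stand in for `gpt-oss:120b` while testing parsing and reporting logic.

```typescript
// Hypothetical contract: grading code depends on this, not on a specific HTTP runtime.
interface InferenceClient {
  complete(prompt: string): Promise<string>;
}

// Canned client for development and tests: never touches the GPU or the network.
class MockInferenceClient implements InferenceClient {
  readonly prompts: string[] = [];
  private readonly cannedResponse: string;

  constructor(cannedResponse: string) {
    this.cannedResponse = cannedResponse;
  }

  async complete(prompt: string): Promise<string> {
    this.prompts.push(prompt); // record what was sent so tests can inspect prompt construction
    return this.cannedResponse;
  }
}

// Grading step written against the interface; pass a real client in production.
async function gradeWith(
  client: InferenceClient,
  prompt: string,
): Promise<{ reasoning: string; scores: Record<string, number> }> {
  const raw = await client.complete(prompt);
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) throw new Error('No JSON object in model response');
  return JSON.parse(match[0]);
}
```

A production client would wrap the same request `invokeLocalModel` sends; only the constructor argument changes between development and deployment.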
Pitfall Guide
Local LLM grading has failure modes that differ from those of cloud-based deployments. The following pitfalls are drawn from production deployments in compliance-heavy environments.
| Pitfall | Explanation | Fix |
|---|---|---|
| Unstructured Model Output | LLMs occasionally return markdown, extra text, or malformed JSON, breaking the parser. | Enforce strict JSON schema validation post-inference. Use regex extraction with fallback retry logic. Never trust raw model output. |
| Rubric Ambiguity | Vague rubric descriptions cause inconsistent binary scoring across submissions. | Write rubric items as observable, verifiable statements. Example: "Includes boundary condition derivation" instead of "Shows good understanding". |
| VRAM Exhaustion | 120B-parameter models require substantial VRAM. Batch processing without memory management causes out-of-memory (OOM) crashes. | Implement queue-based submission processing. Use quantized weights (Q4_K_M) if VRAM is constrained. Monitor memory with nvidia-smi on NVIDIA GPUs or Metal memory usage on Apple silicon. |
| LaTeX Rendering Drift | Students use custom packages or undefined macros, causing compilation or parsing failures. | Strip non-essential packages during ingestion. Validate against a known-good preamble. Fail fast with clear error messages instead of silent degradation. |
| Chain-of-Thought Leakage | Extended reasoning traces may contain grading heuristics that students could reverse-engineer. | Separate the reasoning trace (internal) from the student-facing feedback. Only expose rubric scores and actionable comments. |
| Ignoring Partial Submissions | Incomplete LaTeX files or missing reference solutions cause pipeline hangs or false negatives. | Add pre-flight validation checks. Reject submissions missing required sections before inference. Log rejection reasons for TA review. |
| Over-Reliance on Single Model | Assuming gpt-oss:120b is infallible leads to unvalidated grading drift over time. | Implement periodic human-in-the-loop audits. Sample 5% of graded submissions weekly. Retrain or adjust prompts if error rate exceeds 0.05%. |
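For the unstructured-output pitfall, a minimal sketch of post-inference validation with retry might look like the following. `parseGradingOutput` and `gradeWithRetry` are hypothetical names; the sketch assumes the model is asked for the `{ "reasoning", "scores" }` shape used in the grading prompt and treats any non-binary or missing score as a failure rather than silently coercing it.

```typescript
type BinaryScores = Record<string, 0 | 1>;

interface ValidatedOutput {
  reasoning: string;
  scores: BinaryScores;
}

/** Extracts the outermost {...} span and checks it against the expected shape; throws on any deviation. */
function parseGradingOutput(raw: string, rubricIds: string[]): ValidatedOutput {
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) throw new Error('No JSON object found');
  const data: unknown = JSON.parse(raw.slice(start, end + 1)); // throws SyntaxError on malformed JSON
  if (typeof data !== 'object' || data === null) throw new Error('Output is not an object');
  const { reasoning, scores } = data as { reasoning?: unknown; scores?: unknown };
  if (typeof reasoning !== 'string') throw new Error('Missing "reasoning" string');
  if (typeof scores !== 'object' || scores === null) throw new Error('Missing "scores" object');
  const validated: BinaryScores = {};
  for (const id of rubricIds) {
    const value = (scores as Record<string, unknown>)[id];
    if (value !== 0 && value !== 1) {
      throw new Error(`Score for ${id} must be 0 or 1, got ${JSON.stringify(value)}`);
    }
    validated[id] = value === 1 ? 1 : 0;
  }
  return { reasoning, scores: validated };
}

/** Re-invokes the model until the output validates or the attempt budget is exhausted. */
async function gradeWithRetry(
  callModel: () => Promise<string>,
  rubricIds: string[],
  maxAttempts: number,
): Promise<ValidatedOutput> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return parseGradingOutput(await callModel(), rubricIds);
    } catch (err) {
      lastError = err; // malformed output or a failed call: try again
    }
  }
  throw new Error(`Invalid model output after ${maxAttempts} attempts: ${String(lastError)}`);
}
```

Note that brace-scanning can be confused by reasoning text that itself contains braces (common with LaTeX), so the validation step matters more than the extraction step. Ollama's `format` request field can additionally constrain output to JSON.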
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small seminar (<50 students) | Local pipeline with Q4 quantization | Low concurrency, minimal VRAM needed, full compliance | $0 marginal, one-time hardware |
| Large lecture (200+ students) | Local pipeline with GPU cluster or Mac Studio array | Batch processing requires parallel inference queues | $0 marginal, hardware amortization |
| High-stakes final exam | Local pipeline + mandatory human audit | Regulatory scrutiny requires verifiable grading trails | +15% TA time for audit sampling |
| Iterative draft feedback | Local pipeline with relaxed temperature (0.3) | Encourages exploratory reasoning, faster feedback loops | $0 marginal, same hardware |
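The high-stakes row calls for mandatory human audit, and the pitfall guide suggests sampling 5% of graded submissions weekly. A sketch of reproducible sampling follows; `selectAuditSample` is a hypothetical helper that sorts submissions by a seeded SHA-256 hash (Node's built-in `crypto` module) and takes the first `ceil(rate * n)`, so the same seed always yields the same sample.

```typescript
import { createHash } from 'node:crypto';

/** Stable pseudo-random sort key for one submission within one audit round. */
function auditKey(seed: string, studentId: string): string {
  return createHash('sha256').update(`${seed}:${studentId}`).digest('hex');
}

/**
 * Selects ceil(rate * n) submissions for human review.
 * Same seed, same picks; keep the seed private so students cannot predict who is audited.
 */
function selectAuditSample(studentIds: string[], rate: number, seed: string): string[] {
  if (studentIds.length === 0 || rate <= 0) return [];
  const k = Math.min(studentIds.length, Math.ceil(rate * studentIds.length));
  const keyed = studentIds.map(id => ({ id, key: auditKey(seed, id) }));
  keyed.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return keyed.slice(0, k).map(entry => entry.id);
}
```

Using a new seed each week (for example, the week label) rotates the sample while keeping any given week's selection reproducible for later review.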
Configuration Template
Rubric Definition (rubric_config.yaml)
```yaml
course_id: ME373_W26
assignment: hw04_dynamics
rubric_items:
  - id: "DERIVATION"
    description: "Includes complete free-body diagram and equation setup"
    weight: 0.3
    binary_threshold: 1
  - id: "SOLUTION"
    description: "Final numerical answer matches reference within 2% tolerance"
    weight: 0.4
    binary_threshold: 1
  - id: "UNITS"
    description: "All intermediate and final values include correct SI units"
    weight: 0.15
    binary_threshold: 1
  - id: "SIGNIFICANCE"
    description: "Reports results with appropriate significant figures"
    weight: 0.15
    binary_threshold: 1
```
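The weights above sum to 1.0, so the weighted total lands directly on the 0 to 1 scale used by the status cut-offs. A submission that earns DERIVATION, SOLUTION, and UNITS but misses SIGNIFICANCE scores 0.3 + 0.4 + 0.15 = 0.85, which clears the 0.8 `passed` threshold; missing only SOLUTION gives 0.3 + 0.15 + 0.15 = 0.6, which falls in `partial`. A minimal sketch of checking a parsed rubric before grading, and of reproducing that arithmetic, follows; `validateRubric` and `scoreSubmission` are hypothetical helpers, and the sketch only checks that `binary_threshold` is 0 or 1 without assigning it further meaning.

```typescript
// Field names follow the snake_case keys of rubric_config.yaml after YAML parsing.
interface RubricItemConfig {
  id: string;
  description: string;
  weight: number;
  binary_threshold: number;
}

type Status = 'passed' | 'partial' | 'failed';

/** Returns human-readable problems; an empty array means every check passed. */
function validateRubric(items: RubricItemConfig[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const item of items) {
    if (seen.has(item.id)) problems.push(`duplicate id "${item.id}"`);
    seen.add(item.id);
    if (!(item.weight > 0)) problems.push(`"${item.id}" needs a positive weight`);
    if (item.binary_threshold !== 0 && item.binary_threshold !== 1) {
      problems.push(`"${item.id}" binary_threshold must be 0 or 1`);
    }
  }
  // Tolerance absorbs floating-point error in sums such as 0.3 + 0.4 + 0.15 + 0.15
  const totalWeight = items.reduce((sum, item) => sum + item.weight, 0);
  if (Math.abs(totalWeight - 1) > 1e-9) problems.push(`weights sum to ${totalWeight}, expected 1`);
  return problems;
}

/** Weighted total of binary scores, using the same 0.8 / 0.5 cut-offs as generateReport. */
function scoreSubmission(
  items: RubricItemConfig[],
  scores: Record<string, 0 | 1>,
): { total: number; status: Status } {
  const total = items.reduce((sum, item) => sum + (scores[item.id] ?? 0) * item.weight, 0);
  const status: Status = total >= 0.8 ? 'passed' : total >= 0.5 ? 'partial' : 'failed';
  return { total, status };
}
```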
Pipeline Configuration (pipeline_config.ts)
```typescript
export const PipelineConfig = {
  model: {
    name: 'gpt-oss:120b',
    endpoint: 'http://localhost:11434',
    temperature: 0.1,
    maxTokens: 2048
  },
  processing: {
    maxConcurrency: 3,
    retryAttempts: 2,
    timeoutMs: 180000 // 3 minutes per submission
  },
  output: {
    directory: './grading_reports',
    format: 'json',
    includeReasoning: true // Set false for student-facing exports
  }
};
```
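`output.includeReasoning` is defined here but not consumed by the implementation above. One way to honor it, and to address the chain-of-thought leakage pitfall, is a hypothetical `toExport` helper that drops `reasoningTrace` from student-facing copies while leaving the full record intact for instructors.

```typescript
// Mirrors the GradingResult shape from the implementation above.
interface GradingResult {
  studentId: string;
  rubricScores: Record<string, number>;
  totalScore: number;
  reasoningTrace: string;
  status: 'passed' | 'failed' | 'partial';
}

// Student-facing shape: rubric scores and status only, no internal reasoning.
type StudentReport = Omit<GradingResult, 'reasoningTrace'>;

/** Returns the full record for instructors, or a copy without the reasoning trace for students. */
function toExport(result: GradingResult, includeReasoning: boolean): GradingResult | StudentReport {
  if (includeReasoning) return result;
  const { reasoningTrace: _internalOnly, ...studentFacing } = result; // copy; original is not mutated
  return studentFacing;
}
```

Writing both versions, the full one to an instructor-only directory and the redacted one to the student-facing export, keeps the audit trail while limiting what students can reverse-engineer.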
Quick Start Guide
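- Install Local Inference Runtime: Install Ollama or vLLM. With Ollama, run `ollama pull gpt-oss:120b` and verify the service is listening on http://localhost:11434. The implementation calls Ollama's `/api/generate` endpoint; vLLM serves an OpenAI-compatible API, so using it requires adapting the request in `invokeLocalModel`.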
- Prepare Rubric & Reference: Create a `rubric_config.yaml` file using the template above. Place instructor reference solutions in a dedicated directory.
- Initialize Pipeline: Clone the TypeScript implementation, install dependencies (`npm install yaml`, plus `typescript` and `@types/node` as dev dependencies), and update pipeline_config.ts with your paths.
- Run Test Submission: Execute the pipeline against a single `.tex` file. Verify that the JSON output matches the expected rubric scores. Adjust prompt templates if parsing fails.
- Scale to Cohort: Add queue-based processing capped at `maxConcurrency` submissions in flight. Monitor VRAM usage and processing times. Begin batch grading weekly assignments. Audit 5% of outputs weekly for the first month.
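The queue itself is not part of the implementation above. A minimal sketch, assuming each job is a thunk such as `() => pipeline.evaluate(submission)` and using the `maxConcurrency` and `timeoutMs` values from `pipeline_config.ts`, is a small worker pool; `runQueue` and `withTimeout` are hypothetical names.

```typescript
type Outcome<T> = { ok: true; value: T } | { ok: false; error: string };

/** Rejects if `work` does not settle within `timeoutMs`. The underlying request is not cancelled. */
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

/**
 * Runs jobs with at most `maxConcurrency` in flight, so only a few prompts occupy the GPU at once.
 * Results keep input order; a failing job is recorded instead of aborting the batch.
 */
async function runQueue<T>(
  jobs: Array<() => Promise<T>>,
  maxConcurrency: number,
  timeoutMs: number,
): Promise<Array<Outcome<T>>> {
  const results: Array<Outcome<T>> = new Array(jobs.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++; // claimed synchronously, so no two workers take the same job
      try {
        results[index] = { ok: true, value: await withTimeout(jobs[index](), timeoutMs) };
      } catch (err) {
        results[index] = { ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }
  }
  const workerCount = Math.max(1, Math.min(maxConcurrency, jobs.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Failed entries can be fed back through the queue up to `retryAttempts` times, and the failures that remain go to a TA for manual grading.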
Local LLM grading is no longer only an experimental concept. With validated outputs, well-specified rubrics, and periodic human audits, it is a production-ready architecture that keeps student data on-premises for regulatory compliance, removes per-token API costs, and shortens feedback loops. By isolating data residency, enforcing deterministic rubrics, and using local inference efficiently, institutions can scale automated grading without compromising compliance or academic integrity.