#0843
`Forge` Pushes Local 8B Agent Reliability Near Frontier APIs
70radar
ForgeLLM guardrail runtime — improves local tool-call reliability
Guardrails, not bigger weights, drive the jump. The useful takeaway is architectural: retries, recovery, and serving backend choice can matter more than model size.
Ministral 8BwithForgehit 99.3%, versusClaude Sonnetwith guardrails at 100% across the reported eval setup.- Without retry nudges, scores dropped 24-49 points. Reliability work belongs in the agent runtime, not only in model selection.
- Serving backend changed the same
Mistral-Nemo 12Bweights from 7% onllama-servernative function calling to 83% onLlamafileprompt mode. - Error recovery scored 0% for every tested model without retry logic. Tool agents need explicit recovery paths before production use.
Source: github.com/antoinezambelli/forgeRead original →