Iwalton3 Builds SycoFact to Catch Biased AI Replies

SycoFact is a lightweight four-billion-parameter model built to identify biased agreement and unsafe responses in large language model outputs. It scores replies across multiple safety categories and also returns a single composite rating for alignment quality.
Izzie Walton developed the tool using contrastive training data and activation geometry, removing the need for human labels. Operators running AI models locally can integrate this evaluator to catch problematic behavior before it affects downstream workflows.
Model size: from 4.13 GB. GPU VRAM requirements vary.
Detecting Harmful Agreement and Improving Output Quality
- Scans responses across six distinct safety and alignment dimensions.
- Detects escalating confirmation bias in multi-turn conversations.
- Generates optional, detailed feedback explaining each numerical score.
- Operates without human-labeled training data by using activation paths.
Teams managing automated content pipelines can use this evaluator to flag unsafe patterns before they reach end users. Because the fast scoring mode runs locally, operators can filter out low-quality generations quickly while keeping their data on their own hardware.
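As a rough illustration of that filtering step, the sketch below gates replies on a composite score plus a per-dimension floor. Everything in it is an assumption made for illustration: the `Score` structure, the 0-to-1 scale where higher is better, the threshold values, and the `stub_scorer` stand-in are all hypothetical, and none of them reflects SycoFact's actual output format or API. In practice the scorer argument would be replaced by a call into the locally hosted evaluator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Score:
    """Hypothetical score container; not SycoFact's real output schema."""
    composite: float              # assumed scale: 0..1, higher is better
    dimensions: Dict[str, float]  # assumed per-dimension scores, placeholder names


Pair = Tuple[str, str]                 # (prompt, reply)
Scorer = Callable[[str, str], Score]   # any function that scores one reply


def filter_replies(
    pairs: Iterable[Pair],
    scorer: Scorer,
    min_composite: float = 0.5,   # illustrative threshold, not a recommendation
    min_dimension: float = 0.3,   # illustrative floor for the weakest dimension
) -> Tuple[List[Pair], List[Pair]]:
    """Split (prompt, reply) pairs into kept and flagged lists."""
    kept: List[Pair] = []
    flagged: List[Pair] = []
    for prompt, reply in pairs:
        score = scorer(prompt, reply)
        # Fall back to the composite if no per-dimension scores are present.
        worst = min(score.dimensions.values(), default=score.composite)
        if score.composite >= min_composite and worst >= min_dimension:
            kept.append((prompt, reply))
        else:
            flagged.append((prompt, reply))
    return kept, flagged


def stub_scorer(prompt: str, reply: str) -> Score:
    """Toy stand-in for the evaluator: penalizes one flattering phrase."""
    value = 0.2 if "absolutely right" in reply.lower() else 0.9
    return Score(composite=value, dimensions={"placeholder_dimension": value})


if __name__ == "__main__":
    sample = [
        ("Is my plan safe?", "You are absolutely right, go ahead!"),
        ("Is my plan safe?", "One step looks risky; here is why."),
    ]
    kept, flagged = filter_replies(sample, stub_scorer)
    print(len(kept), "kept,", len(flagged), "flagged")
```

Checking the weakest dimension in addition to the composite is a design choice for this sketch: it keeps a single badly failing category from being averaged away by strong scores elsewhere.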
Building an Open Evaluator from Scratch
The training process relied on steering a twenty-seven-billion-parameter teacher model to create contrasting response pairs. All grading signals came directly from activation pathways rather than human review, which significantly reduced labeling overhead. The system currently focuses solely on safety classification rather than general quality ranking, and its performance drops on specialized code verification and dense mathematical tasks.
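The exact geometric method has not been published, so the following is only a generic illustration of how contrasting pairs can yield a grading signal without human labels. It uses a difference-of-means direction over activation vectors, a common technique that is swapped in here for illustration and is not confirmed to be what SycoFact does. The function names and the toy two-dimensional vectors are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    size = len(vectors[0])
    return [sum(v[i] for v in vectors) / count for i in range(size)]


def contrast_direction(positive: Sequence[Vector], negative: Sequence[Vector]) -> List[float]:
    """Unit vector pointing from the negative-group mean to the positive-group mean."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    diff = [p - n for p, n in zip(pos_mean, neg_mean)]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0:
        raise ValueError("group means are identical; no direction to extract")
    return [d / norm for d in diff]


def projection_score(activation: Vector, direction: Vector) -> float:
    """Dot product of an activation with the direction; larger means closer to 'positive'."""
    return sum(a * d for a, d in zip(activation, direction))


if __name__ == "__main__":
    # Toy activations standing in for hidden states of contrasting replies.
    good = [[1.0, 0.0], [3.0, 0.0]]
    bad = [[-1.0, 0.0], [-3.0, 0.0]]
    direction = contrast_direction(good, bad)
    print(projection_score([0.5, 2.0], direction))
```

In a setup like this, the projection of a new reply's activation onto the direction acts as the label, which is how a pipeline of this kind could avoid manual annotation.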
"You can filter junk out of your training pipeline before it damages your model," noted the creator in a community update. A detailed technical writeup covering the complete geometric methodology remains scheduled for future publication.
Get sycofact on Hugging Face.