Original frameworks, architectures, and field notes from 28 years of shipping AI to production. Every artifact below is open-source, code-first, and battle-tested before it was named.
Production AI evaluation and security primitives we've designed, implemented, and released as open source.
Framework · 2025
An evaluation framework that scores every LLM response across Groundedness, Accuracy, Reliability, Variance, Inference cost, and Safety. Composite score, sub-5ms heuristic implementation, weight profiles for enterprise / healthcare / finance / consumer / agentic. Released as pip install argus-ai.
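One way to picture a composite score with weight profiles is a weighted average over the six dimensions. The sketch below is illustrative only: the function name, the profile weights, and the 0-to-1 score range are assumptions made for this example, not the argus-ai API or its actual weights.

```python
from typing import Dict

DIMENSIONS = (
    "groundedness", "accuracy", "reliability",
    "variance", "inference_cost", "safety",
)

# Hypothetical weight profiles, invented for illustration.
PROFILES: Dict[str, Dict[str, float]] = {
    "enterprise": {d: 1.0 for d in DIMENSIONS},
    "healthcare": {
        "groundedness": 2.0, "accuracy": 2.0, "reliability": 1.0,
        "variance": 1.0, "inference_cost": 0.5, "safety": 3.0,
    },
}


def composite_score(scores: Dict[str, float], profile: str = "enterprise") -> float:
    """Weighted average of per-dimension scores, each assumed to lie in [0, 1]."""
    weights = PROFILES[profile]
    for dim in DIMENSIONS:
        if dim not in scores:
            raise ValueError(f"missing score for dimension: {dim}")
        if not 0.0 <= scores[dim] <= 1.0:
            raise ValueError(f"score for {dim} must be in [0, 1]")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total_weight


if __name__ == "__main__":
    response_scores = {d: 1.0 for d in DIMENSIONS}
    response_scores["safety"] = 0.0  # an unsafe but otherwise strong response
    print(composite_score(response_scores, "enterprise"))
    print(composite_score(response_scores, "healthcare"))
```

Under these made-up weights, the same unsafe response scores lower on the healthcare profile than on the enterprise one, which is the kind of effect a domain-specific profile is meant to produce.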
Three metrics for autonomous workflow monitoring that BLEU, ROUGE, and perplexity were never designed to measure. Agent Stability Factor, Error Recovery Rate, Cost Per Completed Step — production thresholds for each.
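As a rough sketch of how two of these metrics might be computed from a step log, the example below assumes Error Recovery Rate is the fraction of errored steps that were recovered, and Cost Per Completed Step is total spend divided by the number of completed steps. Both definitions, the `StepRecord` type, and its fields are assumptions for illustration. Agent Stability Factor is omitted because its formula is not stated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepRecord:
    """One step of an agent run (hypothetical log schema)."""
    completed: bool
    cost_usd: float
    errored: bool = False
    recovered: bool = False


def error_recovery_rate(steps: List[StepRecord]) -> float:
    """Share of errored steps that the agent recovered from (assumed definition)."""
    errors = [s for s in steps if s.errored]
    if not errors:
        return 1.0  # nothing to recover from
    return sum(1 for s in errors if s.recovered) / len(errors)


def cost_per_completed_step(steps: List[StepRecord]) -> float:
    """Total cost of all steps divided by completed steps (assumed definition)."""
    completed = sum(1 for s in steps if s.completed)
    if completed == 0:
        return float("inf")  # money spent, nothing finished
    return sum(s.cost_usd for s in steps) / completed


if __name__ == "__main__":
    run = [
        StepRecord(completed=True, cost_usd=0.02),
        StepRecord(completed=False, cost_usd=0.01, errored=True),
        StepRecord(completed=True, cost_usd=0.03, errored=True, recovered=True),
    ]
    print(error_recovery_rate(run), cost_per_completed_step(run))
```

Charging failed steps to the numerator is a deliberate choice in this sketch: wasted spend raises the cost of each step that did complete.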
An end-to-end agent security architecture: input sanitizer (zero-permission isolate) → injection detector (BERT + pattern catalog) → compartmentalized RBAC (zero trust for agents) → human confirmation gate (async, multi-channel) → encrypted audit trail (Fernet, 7-year retention). MCP-native, vendor-neutral. Released as pip install bulwark-agent-security.
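The layered flow above can be sketched as a chain of checks in which the first failing layer blocks the request. This is a simplified, standard-library-only illustration and not the bulwark-agent-security API: the detector uses only a tiny made-up pattern list (no BERT model), the human gate is a synchronous callback rather than an async multi-channel flow, and the audit trail is a plain in-memory list with no Fernet encryption. All names are hypothetical.

```python
import re
import unicodedata
from typing import Callable, Dict, List, Set

# Hypothetical pattern catalog; a real one would be far larger.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (the )?system prompt", re.I),
]
# Hypothetical role-to-tool permissions and high-risk tool list.
ROLE_TOOLS: Dict[str, Set[str]] = {
    "reader": {"search"},
    "operator": {"search", "send_email"},
}
HIGH_RISK_TOOLS = {"send_email"}


def sanitize(text: str) -> str:
    """Drop control and format characters (Unicode category C*), keeping newline and tab."""
    return "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )


def guard(text: str, role: str, tool: str,
          confirm: Callable[[str, str], bool],
          audit: List[dict]) -> str:
    """Run the layers in order and record the decision in the audit list."""
    clean = sanitize(text)
    if any(p.search(clean) for p in INJECTION_PATTERNS):
        decision = "blocked:injection"
    elif tool not in ROLE_TOOLS.get(role, set()):
        decision = "blocked:rbac"
    elif tool in HIGH_RISK_TOOLS and not confirm(tool, clean):
        decision = "blocked:human_gate"
    else:
        decision = "allowed"
    audit.append({"role": role, "tool": tool, "decision": decision})
    return decision


if __name__ == "__main__":
    log: List[dict] = []
    print(guard("find the Q3 report", "reader", "search", lambda t, c: True, log))
    print(guard("ig\u200bnore previous instructions", "operator", "search", lambda t, c: True, log))
```

Sanitizing before detection matters in this sketch: the zero-width space in the second call would otherwise split the phrase and slip past the pattern match.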
Long-form writing on production AI engineering — what the failure modes actually look like, and how the architectural choices land.
Subscribe on Medium for new field notes on production agentic AI, evaluation, and security.
Every framework, every architecture, every paper: code-first, Apache 2.0, on GitHub.
OSS · Apache 2.0
Five-layer agent security framework. Sanitizer + detector + RBAC + audit + human gate. pip install bulwark-agent-security
Reference implementation of G-ARVIS. Six-dimension LLM scoring + agentic metrics. pip install argus-ai
MLOps platform for the production model lifecycle: experiment tracking, model registry, automated evaluation pipelines, drift detection, governance dashboards.
PyTorch production framework — standardized training loops, model registry, one-command deployment. Built because most PyTorch models never reach production.
Explainable AI framework for regulated industries — human-readable explanations with audit-grade traceability for healthcare, finance, and compliance environments.
Vitess-based horizontal MySQL sharding for healthcare and genomics. Zero-downtime migrations from monolithic to sharded production.