ai-ta

Autonomous Infrastructure Triage Agent

AI investigates. Humans decide. Infrastructure learns.

The problem

Infrastructure runs 24/7. Triage rarely does.

Investigation capacity doesn't scale with alert volume. Recurring patterns get re-investigated from scratch, hard-won lessons walk out with the people who learned them, and outside business hours either someone on call is woken up or the alert waits. NIS2 Article 20 makes management bodies accountable, and potentially liable, for cybersecurity risk management, which includes an incident-response capability that holds up at every hour, not just the staffed ones.

How it works

Alert → Investigate → Resolve → Learn → Repeat

Alert: Grafana / any webhook

→

Investigate: SSH, metrics, logs, Docker

→

Resolve or Escalate: destructive = human approval

→

Learn: knowledge accumulates

→

Repeat: 24/7
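
The flow above can be sketched as a small loop. This is an illustrative Python sketch, not ai-ta's implementation: the `Alert` and `Action` types, the `investigate` stub, and the `approve` callback are hypothetical names introduced here to show where the human approval gate sits.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Alert:
    source: str       # e.g. "grafana" (hypothetical field)
    fingerprint: str  # stable key for a recurring pattern (hypothetical field)


@dataclass(frozen=True)
class Action:
    command: str
    destructive: bool


def investigate(alert: Alert) -> Action:
    """Stand-in for the real investigation (SSH, metrics, logs, Docker)."""
    if "disk" in alert.fingerprint:
        return Action("docker restart app", destructive=True)
    return Action("no action needed", destructive=False)


def triage(alert: Alert,
           knowledge: Dict[str, Action],
           approve: Callable[[Action], bool]) -> str:
    # Learn/Repeat: reuse a stored action for a known fingerprint.
    action = knowledge.get(alert.fingerprint) or investigate(alert)
    # Resolve or Escalate: destructive actions always wait for a human.
    if action.destructive and not approve(action):
        return "escalated"
    knowledge[alert.fingerprint] = action
    return "resolved"
```

Note that in this sketch a destructive action still goes through `approve` even when it comes from stored knowledge; only the investigation step is skipped.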

	Without ai-ta	With ai-ta
Off-hours alert	Wake an engineer or wait until morning	Agent investigates, requests approval if needed, leaves a morning summary
Recurring issue	Reinvestigated from scratch every time	Auto-resolved in milliseconds from learned knowledge — no LLM cost
NIS2 audit	Scramble to reconstruct timeline	Timestamped trail: detection, investigation, action, approval
New team member	Weeks of shadowing for tribal knowledge	Queries accumulated knowledge from any AI tool

Dashboard — situation briefing, knowledge auto-resolve, triage history

Settings — integrations, governance, single YAML config

By the numbers

Real data from production

251 triages in the last 30 days (fully autonomous)

84 auto-resolved (zero human effort)

~55 s average investigation (vs 30-45 min manual)

~$52 monthly LLM cost (all-in, 251 triages)

99.91% uptime, YTD (production SLA)
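
A back-of-the-envelope reading of these figures, as a sketch. It assumes the 84 auto-resolved triages are a subset of the 251 and that the ~$52 covers all of them; the page does not break this down further.

```python
triages = 251
auto_resolved = 84
monthly_llm_cost_usd = 52.0
avg_investigation_s = 55
manual_range_min = (30, 45)

cost_per_triage = monthly_llm_cost_usd / triages
auto_share = auto_resolved / triages
# Convert the manual range to seconds and compare with the average run.
speedup = tuple(m * 60 / avg_investigation_s for m in manual_range_min)

print(f"cost per triage: ${cost_per_triage:.2f}")            # $0.21
print(f"auto-resolved share: {auto_share:.1%}")              # 33.5%
print(f"speedup vs manual: {speedup[0]:.0f}x to {speedup[1]:.0f}x")  # 33x to 49x
```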

Compliance

NIS2 & GDPR ready

Incident handling (Art. 21.2b)	Full lifecycle: detect, investigate, respond, document
Continuous monitoring (Art. 21.2a)	24/7 autonomous triage with trend detection
Incident reporting (Art. 23)	24h early warning, 72h detail, monthly summary
Supply chain oversight (Art. 21.2d)	Independent monitoring of MSP-managed infra
Board accountability (Art. 20)	AI policy version-controlled, commit-stamped
GDPR data minimization	Configurable retention + automated purge per data type
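
The 24h and 72h reporting windows in the table reduce to timestamp arithmetic. A minimal sketch, assuming the clock starts when the organization becomes aware of a significant incident; the function name and dictionary keys are hypothetical, not part of ai-ta.

```python
from datetime import datetime, timedelta, timezone


def reporting_deadlines(aware_at: datetime) -> dict:
    """Return the two notification deadlines counted from awareness."""
    return {
        "early_warning": aware_at + timedelta(hours=24),
        "detailed_notification": aware_at + timedelta(hours=72),
    }


aware = datetime(2025, 3, 1, 22, 30, tzinfo=timezone.utc)
deadlines = reporting_deadlines(aware)
```

Using timezone-aware timestamps keeps the deadlines unambiguous when the audit trail is read in a different time zone.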

Integration

Works with what you have

Monitor: Prometheus, Grafana, Loki, Healthchecks.io

Infra: Docker, Kubernetes, OpenShift

Notify: Slack, Teams, ServiceNow, ntfy, Email

AI: MCP protocol, Code mode, MCP elicitation

Auth: OIDC, Bearer tokens, Approval gates

Where operations is going

Software stops being the constraint

The operator's job becomes steering processes, owning accountability, and managing risk — not navigating dashboards. Three shifts are converging:

Machine-to-machine by default

Monitoring talks to triage, triage talks to remediation, remediation talks to approval gates. Humans intervene at decision points, not execution points. Contracts and schemas enforce determinism between autonomous systems.
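
One way to read "contracts and schemas" is that a message between two autonomous systems is rejected unless it matches a fixed shape. The sketch below assumes a hypothetical `TriageRequest` with three string fields and a closed set of severities; none of these names come from ai-ta.

```python
from dataclasses import dataclass

ALLOWED_SEVERITIES = {"info", "warning", "critical"}  # hypothetical set
EXPECTED_FIELDS = {"alert_id", "severity", "host"}    # hypothetical contract


@dataclass(frozen=True)
class TriageRequest:
    alert_id: str
    severity: str
    host: str


def parse_request(payload: dict) -> TriageRequest:
    """Accept a payload only if it matches the contract exactly."""
    if set(payload) != EXPECTED_FIELDS:
        diff = sorted(set(payload) ^ EXPECTED_FIELDS)
        raise ValueError(f"missing or unexpected fields: {diff}")
    if not all(isinstance(payload[k], str) for k in EXPECTED_FIELDS):
        raise ValueError("all fields must be strings")
    if payload["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {payload['severity']!r}")
    return TriageRequest(**payload)
```

Rejecting unknown fields and values, rather than ignoring them, is what keeps the receiving side's behavior predictable.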

Frontends become contextual projections

What matters isn't how the UI looks — it's that the right information reaches the right person at the right moment. A morning email, a mobile approval, an MCP query from another agent — all valid interfaces.

Governance becomes the product

When AI acts autonomously, the value is the audit trail, the approval gates, the policy enforcement, the explainability. The strictest compliance environments aren't obstacles — they're the reason it must be built this way.

ai-ta is built for this future

Governance-first autonomous operations

Contract-driven

MCP protocol: any AI agent queries and acts on infrastructure knowledge through structured contracts. Not screen scrapes. Not API wrappers.

Process-native

The same triage result renders as an HTML email, an approval webhook, an MCP response, or a terminal session — whatever the process demands.
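
As an illustration of one result feeding several interfaces, the sketch below renders a single hypothetical `TriageResult` for a terminal, an HTML email, and a JSON payload. The type, field names, and channel names are assumptions made for the example.

```python
import html
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TriageResult:
    alert: str
    summary: str
    needs_approval: bool


def render_terminal(r: TriageResult) -> str:
    flag = "APPROVAL NEEDED" if r.needs_approval else "resolved"
    return f"[{flag}] {r.alert}: {r.summary}"


def render_html_email(r: TriageResult) -> str:
    # Escape user-controlled text before embedding it in HTML.
    return f"<h1>{html.escape(r.alert)}</h1><p>{html.escape(r.summary)}</p>"


def render_json(r: TriageResult) -> str:
    return json.dumps(asdict(r), sort_keys=True)


RENDERERS = {
    "terminal": render_terminal,
    "email": render_html_email,
    "json": render_json,
}


def render(result: TriageResult, channel: str) -> str:
    return RENDERERS[channel](result)
```

Adding a new interface then means adding one renderer, with no change to how the result is produced.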

Self-learning

Every triage cycle feeds the next. Known patterns resolved faster. Cross-run trends flagged before alerts fire. Knowledge survives team turnover.
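
A minimal sketch of how learned knowledge could short-circuit an investigation, assuming alerts are matched by a normalized message fingerprint. The normalization rule, class, and threshold are hypothetical; ai-ta's actual matching is not described on this page.

```python
import re
from collections import Counter
from typing import Optional


def fingerprint(message: str) -> str:
    """Normalize a message so recurring patterns share one key."""
    # Digit runs (percentages, PIDs, host numbers) become a placeholder,
    # so "web01" and "web02" collapse to the same pattern.
    return re.sub(r"\d+", "<n>", message.lower()).strip()


class KnowledgeBase:
    def __init__(self, trend_threshold: int = 3):
        self.resolutions = {}
        self.seen = Counter()
        self.trend_threshold = trend_threshold

    def learn(self, message: str, resolution: str) -> None:
        self.resolutions[fingerprint(message)] = resolution

    def lookup(self, message: str) -> Optional[str]:
        """Count the occurrence and return a known resolution, if any."""
        key = fingerprint(message)
        self.seen[key] += 1
        return self.resolutions.get(key)

    def is_trending(self, message: str) -> bool:
        return self.seen[fingerprint(message)] >= self.trend_threshold
```

A dictionary hit needs no model call, which is the intuition behind resolving known patterns quickly and cheaply.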

Compliance-first

Every action timestamped. Every decision traceable. Every destructive command gated. Every AI execution governed by version-controlled organizational policy.

Cross-industry

Declarative, pluggable governance. A maritime operator, a hospital, and a fintech run the same agent with different policy files.
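
To make "same agent, different policy files" concrete, here is a sketch in which one decision function is driven by two policy documents. The policies are shown as Python dicts to keep the example self-contained; the keys `allow` and `forbidden`, the action names, and the fail-closed default are all assumptions, not ai-ta's policy format.

```python
# Hypothetical policies; in practice these would be loaded from files.
MARITIME_POLICY = {
    "allow": ["collect_logs"],
    "forbidden": ["restart_service"],
}
HOSPITAL_POLICY = {
    "allow": ["collect_logs", "read_metrics"],
    "forbidden": [],
}


def decide(policy: dict, action: str) -> str:
    """Return 'deny', 'allow', or 'ask_human' for a proposed action."""
    if action in policy.get("forbidden", []):
        return "deny"
    if action in policy.get("allow", []):
        return "allow"
    # Fail closed: anything the policy does not mention goes to a human.
    return "ask_human"
```

The agent code stays identical across organizations; only the policy document passed to `decide` changes.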

Scope & boundaries

ai-ta contains incidents. It never makes irreversible decisions.

Observes across every layer. Contains with human approval. The boundary is reversibility and blast radius.

What ai-ta does
Observe	L7 edge (Cloudflare WAF), L3/L4 perimeter (firewall), L2 network, host & container — all layers, no limits
Contain	Block at edge, isolate at network, restart services — every action human-approved, auto-expiring
Learn	Accumulate knowledge, detect patterns, skip known-good investigations, flag trends
Document	Full audit trail from detection to resolution — timestamped, traceable, exportable

What ai-ta never touches
Data	No database operations, backup restores, or storage modifications
Identity	No credential rotation, access policy, or permission changes
DNS	Slow to reverse, high blast radius — stays with the human
Hypervisor	Bare metal and VM lifecycle are last-resort human decisions
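
The two tables above describe a boundary that can be expressed as a guard: out-of-scope domains are refused outright, and every containment action carries an approver and an expiry. This is a sketch under assumed names (`Containment`, `contain`, the domain labels, the one-hour default), not ai-ta's code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Domains the page lists as never touched.
OUT_OF_SCOPE = {"data", "identity", "dns", "hypervisor"}


@dataclass(frozen=True)
class Containment:
    domain: str          # e.g. "edge", "network", "service" (hypothetical)
    description: str
    approved_by: str
    expires_at: datetime


def contain(domain: str, description: str, approved_by: str,
            now: datetime, ttl: timedelta = timedelta(hours=1)) -> Containment:
    if domain in OUT_OF_SCOPE:
        raise PermissionError(f"{domain} actions stay with the human")
    if not approved_by:
        raise PermissionError("containment requires human approval")
    return Containment(domain, description, approved_by, now + ttl)


def is_active(c: Containment, now: datetime) -> bool:
    # After expiry the action no longer counts as in force.
    return now < c.expires_at
```

Passing `now` explicitly keeps the expiry logic deterministic and easy to audit.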

ai-ta component topology — single binary, single config, zero orchestration dependencies

Single container

One Go binary, one YAML config, one SQLite database
Deploys in minutes — Docker, Kubernetes, or OpenShift
Your data stays in your network
AI governance from a git repo your CISO controls
Helm chart ready for enterprise orchestrators

Get started

Deploy in minutes. No sales call.

ai-ta is source-available and free to evaluate. Pull the image, point it at your infrastructure, get your first triage report. Subscribe when you're ready for production support.

Evaluate

Free, no agreement

docker compose up. Connect Grafana. First triage report in 15 minutes. Full product, no feature gates.

Production support

Priority updates, security patches, guaranteed LLM compatibility. Per-host or flat-rate pricing.

Enterprise

Custom SLA

Dedicated onboarding, governance co-development, NIS2 compliance review, DPA.

AITA for letting an AI handle my on-call?

NTA. Your infra, your rules.

Live demo on running infrastructure. Not slides — the real system investigating real alerts.

Download pitch deck (PDF)

Let's talk

Ready to see autonomous infrastructure triage on your infrastructure?