glitch — automated CI failure remediation

╱╲ the pipeline

Three phases, one goal: reduce mean time to recovery (MTTR). Discovery scores test flakiness locally. Collection captures telemetry when tests fail in CI. Analysis classifies root causes and generates patches, all with human-in-the-loop approval.

Phase 1 🔍 DISCOVERY (local · heuristic)

→ Phase 2 📦 COLLECTION (CI-side · telemetry)

→ Phase 3 🧠 ANALYSIS (classify + auto-fix)

📊

Phase 1

Discovery

Local CLI that scores and ranks flaky tests using pass/fail volatility, retry rates, timing variance, change-independence, and recency-weighting. Zero telemetry artifacts required.

uv run glitch discover --workflow "CI" --output table
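
To show how signals like these could fold into one ranking number, here is a minimal Python sketch. It covers only pass/fail volatility, retry rate, and recency weighting. The `RunRecord` shape, the 0.7/0.3 weights, the 14-day half-life, and the `flakiness_score` function are all hypothetical and are not taken from glitch's implementation.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One historical execution of a test (hypothetical record shape)."""
    passed: bool
    retried: bool
    age_days: float  # how long ago the run happened


def flakiness_score(runs: list[RunRecord], half_life_days: float = 14.0) -> float:
    """Return a score in [0, 1]; higher means more flaky.

    Assumed formula, for illustration only:
      0.7 * recency-weighted pass/fail flip rate
    + 0.3 * recency-weighted retry rate
    """
    if len(runs) < 2:
        return 0.0  # not enough history to observe volatility

    # Order oldest to newest so consecutive pairs are adjacent in time.
    ordered = sorted(runs, key=lambda r: r.age_days, reverse=True)

    def weight(run: RunRecord) -> float:
        # Exponential decay: a run half_life_days old counts half as much.
        return 0.5 ** (run.age_days / half_life_days)

    flip_weight = 0.0
    pair_weight = 0.0
    for prev, curr in zip(ordered, ordered[1:]):
        w = weight(curr)
        pair_weight += w
        if prev.passed != curr.passed:
            flip_weight += w  # outcome changed between consecutive runs

    total_weight = sum(weight(r) for r in ordered)
    retry_weight = sum(weight(r) for r in ordered if r.retried)

    volatility = flip_weight / pair_weight
    retry_rate = retry_weight / total_weight
    return 0.7 * volatility + 0.3 * retry_rate
```

In this sketch a test that always passes without retries scores 0.0, while one whose outcome flips on every run approaches the volatility weight. Timing variance and change-independence would need extra inputs (durations, commit diffs) and are left out.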

📦

Phase 2

Collection

Captures comprehensive telemetry when tests fail: runner logs, charm logs, Kubernetes events, Ceph status, LXD state, and test artifacts — bundled as a single artifact for analysis.

uv run glitch collect
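
To make the single-artifact idea concrete, the sketch below packs a directory of already collected files into one gzip-compressed tarball using Python's standard `tarfile` module. The `bundle_telemetry` helper and the directory layout are assumptions for illustration; glitch's actual collectors and bundle format may differ.

```python
import tarfile
from pathlib import Path


def bundle_telemetry(source_dir: Path, output: Path) -> list[str]:
    """Pack every file under source_dir into a .tar.gz; return the member names.

    `output` should live outside `source_dir` so the archive does not
    try to include itself.
    """
    members: list[str] = []
    with tarfile.open(output, "w:gz") as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir so the bundle is relocatable.
                name = path.relative_to(source_dir).as_posix()
                archive.add(path, arcname=name)
                members.append(name)
    return members
```

Keeping member paths relative means the bundle unpacks the same way on a laptop as on the CI runner that produced it.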

🧠

Phase 3

Analysis

Ingests the flakiness scores and the telemetry bundle, classifies each failure by root cause (flaky, charm-bug, test-bug, infrastructure, environment, or unknown) with a confidence score, then generates patches where a fix applies.

uv run glitch analyze --artifact bundle.tar.gz
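
One way to picture classification with a confidence score: pick the category with the most evidence and fall back to unknown when no category clearly wins. The sketch below is an assumption-laden illustration, not glitch's classifier. The per-category evidence scores, the 0.6 threshold, and the `classify` function are hypothetical.

```python
from dataclasses import dataclass

CATEGORIES = ("flaky", "charm-bug", "test-bug", "infrastructure", "environment")


@dataclass(frozen=True)
class Classification:
    category: str
    confidence: float


def classify(scores: dict[str, float], min_confidence: float = 0.6) -> Classification:
    """Pick the highest-scoring category, or "unknown" when the signal is weak.

    `scores` maps a category name to a non-negative evidence score
    (hypothetical input). Confidence is the winner's share of all evidence.
    """
    # Ignore unrecognized keys and clamp negative scores to zero.
    known = {c: max(scores.get(c, 0.0), 0.0) for c in CATEGORIES}
    total = sum(known.values())
    if total == 0:
        return Classification("unknown", 0.0)

    best = max(known, key=known.get)
    confidence = known[best] / total
    if confidence < min_confidence:
        return Classification("unknown", confidence)
    return Classification(best, confidence)
```

With these assumptions, `classify({"flaky": 8, "test-bug": 2})` yields flaky at confidence 0.8, while an even split between two categories is reported as unknown.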

╱╲ see it in action

📺

[ asciinema demo coming soon ]

A terminal recording of glitch discovering flaky tests, collecting failure telemetry, and generating an automated patch — all in under 60 seconds.

╱╲ getting started

Prerequisites

Python 3.11+, uv, and a GitHub token with repo and actions:read scopes.

Installation

git clone https://github.com/MichaelThamm/glitch.git && cd glitch

uv sync

Quick Start

Set your GitHub token

export GITHUB_TOKEN=ghp_your_token_here

Discover flaky tests

Score and rank tests by flakiness using CI run history:

uv run glitch discover --repo owner/repo --workflow "CI"
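
CI run history of this kind is exposed by the GitHub REST API's "list workflow runs for a repository" endpoint. The sketch below only builds the authenticated request, reading the token from `GITHUB_TOKEN`. Whether glitch calls this exact endpoint is an assumption, and `build_runs_request` is a hypothetical helper.

```python
import os
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def build_runs_request(repo: str, per_page: int = 100) -> urllib.request.Request:
    """Build a GET request for recent workflow runs of `repo` ("owner/name")."""
    token = os.environ.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("GITHUB_TOKEN is not set")

    query = urllib.parse.urlencode({"per_page": per_page})
    return urllib.request.Request(
        f"{API_ROOT}/repos/{repo}/actions/runs?{query}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Passing the result to `urllib.request.urlopen` and decoding the JSON body gives a `workflow_runs` list, where each entry carries a `conclusion` field such as `success` or `failure`.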

Collect telemetry on failure

When tests fail in CI, capture everything needed for diagnosis:

uv run glitch collect

Analyze and get patches

Feed scores and artifacts into the analysis engine:

uv run glitch analyze --artifact bundle.tar.gz --scores discover.json

Configuration

glitch works out of the box, but you can customize collectors, scoring weights, and analysis thresholds. See the README for details.

╱╲ classification taxonomy

🎲 flaky

Non-deterministic; likely timing or environment sensitivity. Proposes retry policies and ordering guards.

🐛 charm-bug

Defect in the charm's logic or configuration. Generates a concrete patch — human approval required before landing.

🧪 test-bug

Defect in the test itself: bad assertion, wrong assumption. Patch generated with re-run verification.

🏗️ infrastructure

CI runner, Kubernetes, Ceph, or LXD-level issue. Files an annotated issue with reproduction steps.

🌐 environment

Transient external dependency: network, registry, upstream package. Logged for trend analysis.

❓ unknown

Insufficient signal to classify with confidence. Queued for human review with all available context.
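
The taxonomy above maps naturally onto an enum plus a lookup table of follow-up actions. This is a minimal sketch under that assumption. The `Category` enum, the `ACTIONS` table, and `next_action` are hypothetical names, and the action strings paraphrase the descriptions above.

```python
from enum import Enum


class Category(Enum):
    FLAKY = "flaky"
    CHARM_BUG = "charm-bug"
    TEST_BUG = "test-bug"
    INFRASTRUCTURE = "infrastructure"
    ENVIRONMENT = "environment"
    UNKNOWN = "unknown"


# Follow-up per category, paraphrased from the taxonomy descriptions.
ACTIONS: dict[Category, str] = {
    Category.FLAKY: "propose retry policies and ordering guards",
    Category.CHARM_BUG: "generate a patch and wait for human approval",
    Category.TEST_BUG: "generate a patch and verify with a re-run",
    Category.INFRASTRUCTURE: "file an annotated issue with reproduction steps",
    Category.ENVIRONMENT: "log for trend analysis",
    Category.UNKNOWN: "queue for human review with all available context",
}


def next_action(label: str) -> str:
    """Return the follow-up for a label; unrecognized labels count as unknown."""
    try:
        category = Category(label)
    except ValueError:
        category = Category.UNKNOWN
    return ACTIONS[category]
```

Routing unrecognized labels to unknown keeps a human in the loop whenever the label is not one of the six listed categories.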

░▒▓█ g l i t c h █▓▒░