╱╲ the pipeline
Three phases, one goal: reduce MTTR. Discovery scores flakiness locally. Collection captures telemetry on failure. Analysis classifies root causes and generates patches — all with human-in-the-loop approval.
Discovery
Local CLI that scores and ranks flaky tests using pass/fail volatility, retry rates, timing variance, change-independence, and recency-weighting. Zero telemetry artifacts required.
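As a rough illustration of how the five signals could combine into one ranking score, here is a minimal sketch. The `Run` record, the 14-day half-life, and the weights are all assumptions made for this example; they are not glitch's actual data model or scoring formula.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Run:
    """One CI execution of a test (hypothetical record shape)."""
    passed: bool
    retried: bool        # the test needed at least one retry in this run
    duration_s: float
    code_changed: bool   # relevant code changed since the previous run
    age_days: float      # how long ago the run happened


def flakiness_score(runs: list[Run], half_life_days: float = 14.0) -> float:
    """Score in [0, 1] for runs ordered oldest to newest. Weights are illustrative."""
    if len(runs) < 2:
        return 0.0

    # Recency weighting: a run loses half its weight every half_life_days.
    weights = [0.5 ** (r.age_days / half_life_days) for r in runs]

    flip_w = stable_w = unexplained_w = 0.0
    for prev, cur, w in zip(runs, runs[1:], weights[1:]):
        if prev.passed != cur.passed:
            flip_w += w
            if not cur.code_changed:
                unexplained_w += w  # outcome flipped although the code did not change
        else:
            stable_w += w

    # Pass/fail volatility: weighted share of consecutive runs whose outcome flipped.
    volatility = flip_w / (flip_w + stable_w)
    # Change-independence: weighted share of flips with no accompanying code change.
    independence = unexplained_w / flip_w if flip_w else 0.0
    # Retry rate: weighted share of runs that needed a retry.
    retry_rate = sum(w for r, w in zip(runs, weights) if r.retried) / sum(weights)
    # Timing variance: coefficient of variation of the duration, capped at 1.
    durations = [r.duration_s for r in runs]
    avg = mean(durations)
    timing = min(1.0, pstdev(durations) / avg) if avg > 0 else 0.0

    return 0.35 * volatility + 0.30 * independence + 0.20 * retry_rate + 0.15 * timing
```

Under these assumed weights, a test that always passes with a steady duration scores 0, while a test whose outcome keeps flipping without any code change scores high even if it never retries.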
uv run glitch discover --workflow "CI" --output table
Collection
Captures comprehensive telemetry when tests fail: runner logs, charm logs, Kubernetes events, Ceph status, LXD state, and test artifacts — bundled as a single artifact for analysis.
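The "single artifact" idea can be sketched with Python's standard `tarfile` module. The function name, the label-to-path mapping, and the archive layout below are hypothetical and only show the bundling step; how glitch actually gathers and packs telemetry may differ.

```python
import tarfile
from pathlib import Path


def bundle_telemetry(sources: dict[str, Path], output: Path) -> list[str]:
    """Pack each existing source under its label; return the labels that were included."""
    included = []
    with tarfile.open(output, "w:gz") as tar:
        for label, path in sources.items():
            if not path.exists():
                continue  # a collector may produce nothing for a given failure
            tar.add(path, arcname=label)  # directories are added recursively
            included.append(label)
    return included
```

Skipping missing sources rather than failing keeps the bundle usable when, for example, a run has no Ceph or LXD state to capture.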
uv run glitch collect
Analysis
Ingests flakiness scores + telemetry bundle to classify failures by root cause (flaky, charm-bug, test-bug, infrastructure, environment) with confidence scores, then generates patches.
uv run glitch analyze --artifact bundle.tar.gz
╱╲ see it in action
[ asciinema demo coming soon ]
A terminal recording of glitch discovering flaky tests, collecting failure telemetry, and generating an automated patch — all in under 60 seconds.
╱╲ getting started
Prerequisites
Python 3.11+, uv, and a GitHub token with repo and actions:read scopes.
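If you want to verify these prerequisites before installing, a small check along the following lines may help. It is a sketch, not part of glitch: the function name is made up for this example, and it only confirms that a token is present in `GITHUB_TOKEN`, not that the token carries the required scopes.

```python
import os
import shutil
import sys


def missing_prerequisites(env=None, version=None, which=shutil.which) -> list[str]:
    """Return a human-readable list of unmet prerequisites (empty if all are met)."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version

    problems = []
    if version < (3, 11):
        problems.append(f"Python 3.11+ required, found {version[0]}.{version[1]}")
    if which("uv") is None:
        problems.append("uv not found on PATH")
    if not env.get("GITHUB_TOKEN"):
        problems.append("GITHUB_TOKEN is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(f"missing: {problem}")
```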
Installation
git clone https://github.com/MichaelThamm/glitch.git && cd glitch
uv sync
Quick Start
export GITHUB_TOKEN=ghp_your_token_here
Score and rank tests by flakiness using CI run history:
uv run glitch discover --repo owner/repo --workflow "CI"
When tests fail in CI, capture everything needed for diagnosis:
uv run glitch collect
Feed scores and artifacts into the analysis engine:
uv run glitch analyze --artifact bundle.tar.gz --scores discover.json
Configuration
glitch works out of the box, but you can customize collectors, scoring weights, and analysis thresholds. See the README for details.
╱╲ classification taxonomy
🎲 flaky
Non-deterministic; likely timing or environment sensitivity. Proposes retry policies and ordering guards.
🐛 charm-bug
Defect in the charm's logic or configuration. Generates a concrete patch — human approval required before landing.
🧪 test-bug
Defect in the test itself: bad assertion, wrong assumption. Patch generated with re-run verification.
🏗️ infrastructure
CI runner, Kubernetes, Ceph, or LXD-level issue. Files an annotated issue with reproduction steps.
🌐 environment
Transient external dependency: network, registry, upstream package. Logged for trend analysis.
❓ unknown
Insufficient signal to classify with confidence. Queued for human review with all available context.
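The taxonomy above can be read as a mapping from category to follow-up action, with "unknown" as the fallback when no category is confident enough. The sketch below encodes that reading. The `Category` enum, the `classify` function, and the 0.6 threshold are assumptions for illustration, not glitch's real interface or defaults.

```python
from enum import Enum


class Category(Enum):
    FLAKY = "flaky"
    CHARM_BUG = "charm-bug"
    TEST_BUG = "test-bug"
    INFRASTRUCTURE = "infrastructure"
    ENVIRONMENT = "environment"
    UNKNOWN = "unknown"


# Follow-up per category, paraphrased from the taxonomy above.
FOLLOW_UP = {
    Category.FLAKY: "propose retry policy or ordering guard",
    Category.CHARM_BUG: "generate patch, await human approval",
    Category.TEST_BUG: "generate patch, verify by re-run",
    Category.INFRASTRUCTURE: "file annotated issue with reproduction steps",
    Category.ENVIRONMENT: "log for trend analysis",
    Category.UNKNOWN: "queue for human review",
}


def classify(confidences: dict[Category, float], threshold: float = 0.6) -> tuple[Category, str]:
    """Pick the most confident category; fall back to UNKNOWN below the threshold."""
    best = max(confidences, key=confidences.get, default=Category.UNKNOWN)
    if best is Category.UNKNOWN or confidences[best] < threshold:
        best = Category.UNKNOWN
    return best, FOLLOW_UP[best]
```

For example, `classify({Category.FLAKY: 0.9, Category.TEST_BUG: 0.1})` selects the flaky follow-up, while `classify({Category.CHARM_BUG: 0.4})` falls below the assumed threshold and is queued for human review.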