AztecProtocol / debug-e2e
Install for your project team
Run this command in your project directory to install the skill for your entire team:
mkdir -p .claude/skills/debug-e2e && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/4310" && unzip -o skill.zip -d .claude/skills/debug-e2e && rm skill.zip
Project Skills
This skill will be saved in .claude/skills/debug-e2e/ and checked into git. All team members will have access to it automatically.
Important: Please verify the skill by reviewing its instructions before using it.
Interactive debugging for failed e2e tests. Orchestrates the debugging session but delegates log reading to subagents to keep the main conversation clean. Use for ping-pong debugging sessions where you want to form and test hypotheses together with the user.
Skill Content
---
name: debug-e2e
description: Interactive debugging for failed e2e tests. Orchestrates the debugging session but delegates log reading to subagents to keep the main conversation clean. Use for ping-pong debugging sessions where you want to form and test hypotheses together with the user.
argument-hint: <hash, PR, URL, or test name>
---
# E2E Test Debugging
Interactive debugging for failed e2e tests. This skill orchestrates the debugging session but **never reads logs directly** - it delegates to subagents to keep the conversation context clean.
## Invocation
The user can invoke this skill with:
- **CI log hash**: `/debug-e2e 343c52b17688d2cd`
- **PR number**: `/debug-e2e #19783` or `/debug-e2e 19783`
- **CI URL**: `/debug-e2e http://ci.aztec-labs.com/...`
- **Test name**: `/debug-e2e epochs_l1_reorgs` (for general investigation)
- **No argument**: `/debug-e2e`, in which case ask the user what they want to debug
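
The argument forms above can be told apart mechanically. The following is a minimal sketch, not part of the skill itself: `classifyDebugArg` and the `DebugArg` type are hypothetical names, and the rule that a CI log hash is exactly 16 lowercase hex characters is an assumption inferred from the example hashes in this document.

```typescript
// Hypothetical helper: the skill does not define this function.
type DebugArg =
  | { kind: "none" }
  | { kind: "hash"; value: string }
  | { kind: "pr"; value: string }
  | { kind: "url"; value: string }
  | { kind: "test-name"; value: string };

function classifyDebugArg(raw: string): DebugArg {
  const arg = raw.trim();
  if (arg === "") {
    return { kind: "none" }; // no argument: ask the user what to debug
  }
  if (/^https?:\/\//i.test(arg)) {
    return { kind: "url", value: arg };
  }
  // Assumption: CI log hashes are exactly 16 lowercase hex characters.
  // Checked before the PR rule so an all-digit hash is not mistaken for a PR.
  if (/^[0-9a-f]{16}$/.test(arg)) {
    return { kind: "hash", value: arg };
  }
  // PR numbers are accepted with or without a leading '#'.
  if (/^#?\d+$/.test(arg)) {
    return { kind: "pr", value: arg.replace(/^#/, "") };
  }
  // Anything else is treated as a test name for general investigation.
  return { kind: "test-name", value: arg };
}
```

For example, `classifyDebugArg("#19783")` and `classifyDebugArg("19783")` both yield a `pr` result with value `"19783"`.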
## When to Use
- Debugging flaky or failing e2e tests
- Investigating CI failures that need deep analysis
- When you want to collaborate with the user on forming hypotheses
- When comparing failed and successful runs
## When NOT to Use
- **Obvious assertion failures**: If the test output clearly shows `expected 5, got 3`, just investigate the code directly
- **Build/compilation errors**: Use standard debugging, not log analysis
- **Simple configuration issues**: Missing env vars, wrong paths, etc.
- **When the user just wants a quick answer**: This skill is for interactive ping-pong debugging sessions
## Key Principle
**Never read logs directly in this conversation.** Logs can be 50k+ lines and would pollute the context. Instead:
1. Use `identify-ci-failures` subagent to find failures and download logs
2. Use `analyze-logs` subagent to deep-dive specific logs
3. Work with the summaries they return
## Workflow
### Step 1: Identify Failures
Spawn the `identify-ci-failures` subagent:
```
Use Task tool with subagent_type: "identify-ci-failures"
Prompt: "Identify CI failures for [PR number / CI URL / hash]"
```
This returns:
- List of failures with types
- Local file paths for downloaded logs (e.g., `/tmp/<hash>.log`)
- History URL for finding successful runs
### Step 2: Discuss with User
Present findings to the user:
- What tests failed?
- What type of failure (timeout, assertion, error)?
- Form initial hypotheses together
### Step 3: Deep Dive with analyze-logs
Spawn the `analyze-logs` subagent with the **local file path**:
```
Use Task tool with subagent_type: "analyze-logs"
Prompt: "Analyze /tmp/<hash>.log focusing on test '<test_name>'. Look for [specific thing based on hypothesis]"
```
For comparison:
```
Prompt: "Compare /tmp/<failed>.log with /tmp/<success>.log for test '<test_name>'. Find divergence points."
```
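
If you build these prompts programmatically, a pair of small string builders keeps the wording consistent between rounds. This is only a sketch: `analyzePrompt` and `comparePrompt` are hypothetical names, and the output simply mirrors the two prompt templates above.

```typescript
// Hypothetical helpers that reproduce the two prompt templates shown above.

// Prompt for a focused analysis of a single downloaded log.
function analyzePrompt(logPath: string, testName: string, focus: string): string {
  return `Analyze ${logPath} focusing on test '${testName}'. Look for ${focus}`;
}

// Prompt for comparing a failed run against a successful one.
function comparePrompt(failedLog: string, passedLog: string, testName: string): string {
  return `Compare ${failedLog} with ${passedLog} for test '${testName}'. Find divergence points.`;
}
```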
### Step 4: Refine Hypothesis
Based on the summary:
- Does the evidence support the hypothesis?
- What contradicts it?
- What new questions arise?
Discuss with the user, then spawn another `analyze-logs` subagent if needed.
### Step 5: Investigate Codebase
Once you have a theory, search the codebase:
- Use Grep to find where specific log messages are generated
- Read the code context around log emission points
- Trace execution paths
### Step 6: Suggest Fix or Local Test
Either:
- Propose a code fix based on findings
- Suggest running the test locally to verify:
```bash
yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'
```
## Hypothesis Formation
Take time to think deeply before proposing theories.
For each hypothesis:
1. **Clearly state the theory**: "The test fails because X happens when Y"
2. **Identify expected evidence**: "If this is correct, we should see log entries for Z"
3. **Ask analyze-logs to verify**: Spawn subagent to look for specific evidence
4. **Look for contradictions**: What would disprove this theory?
5. **Assign confidence**: high / medium / low based on evidence
Formulate multiple competing hypotheses when the cause is unclear.
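
One way to keep competing hypotheses honest is to record each one in the shape of the five steps above. The sketch below is illustrative only: the `Hypothesis` interface, its field names, and `rankHypotheses` are assumptions, not something the skill prescribes.

```typescript
type Confidence = "high" | "medium" | "low";

// Hypothetical record mirroring the five steps above.
interface Hypothesis {
  theory: string;             // "The test fails because X happens when Y"
  expectedEvidence: string[]; // log entries that should exist if the theory holds
  supporting: string[];       // quotes from analyze-logs summaries that support it
  contradicting: string[];    // quotes that argue against it
  confidence: Confidence;
}

const CONFIDENCE_RANK: Record<Confidence, number> = { high: 0, medium: 1, low: 2 };

// Orders hypotheses from most to least credible:
// higher confidence first, then fewer contradictions.
function rankHypotheses(candidates: Hypothesis[]): Hypothesis[] {
  return candidates.slice().sort((a, b) => {
    const byConfidence = CONFIDENCE_RANK[a.confidence] - CONFIDENCE_RANK[b.confidence];
    if (byConfidence !== 0) {
      return byConfidence;
    }
    return a.contradicting.length - b.contradicting.length;
  });
}
```

The input array is copied before sorting, so the original list of candidates stays in the order in which the hypotheses were formed.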
## Investigation Principles
- **Be systematic**: Follow the workflow, don't jump to conclusions
- **Be evidence-based**: Every theory must be backed by log entries or code
- **Be critical**: Actively seek to disprove your own hypotheses
- **Be thorough**: Check timing, sequence, missing events, code context
- **Be clear**: Use specific timestamps and quotes from summaries
- **Be practical**: Suggest fixes that address root causes
## History Investigation
To understand when a test started failing:
1. Look for the `history:` marker at the **beginning** of the log file (first few lines)
2. The history shows recent runs of this exact test with PASSED/FAILED/FLAKED status:
```
01-23 17:10:11: PASSED (2614d91ec48f4047): ... (Author: commit message (#PR))
01-23 17:08:30: FLAKED (10d5f47f04025f1c): ... (code: 1) group:e2e-p2p-epoch-flakes (Author: commit message (#PR))
01-23 16:51:21: FLAKED (512e978edff9e471): ... (code: 1) group:e2e-p2p-epoch-flakes (Author: commit message (#PR))
```
3. Identify the transition point where the test started failing or flaking
4. Check the PR mentioned in the commit message to understand what changed
5. Download logs from both passing and failing runs to compare:
- Use the hashes from the history (e.g., `2614d91ec48f4047` for the passing run, `10d5f47f04025f1c` for a flaking run)
- `yarn ci dlog <hash> > /tmp/<hash>.log 2>&1` downloads the log to a local file under `/tmp`
**Important**: Do NOT use `gh run list` - the history in the log file is more accurate for this specific test.
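
The transition point can also be located programmatically once the history lines have been extracted. The sketch below rests on assumptions: the line layout is inferred from the three sample lines above, entries are assumed to be listed newest first as in that sample, and `parseHistory` and `findTransition` are hypothetical names.

```typescript
type RunStatus = "PASSED" | "FAILED" | "FLAKED";

interface HistoryEntry {
  when: string; // "MM-DD HH:MM:SS" as printed; the year is not part of the line
  status: RunStatus;
  hash: string;
}

// Assumed layout: "<MM-DD HH:MM:SS>: <STATUS> (<hash>): ..."
function parseHistoryLine(line: string): HistoryEntry | null {
  const m = /^\s*(\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (PASSED|FAILED|FLAKED) \(([0-9a-f]+)\):/.exec(line);
  if (m === null) {
    return null;
  }
  return { when: m[1], status: m[2] as RunStatus, hash: m[3] };
}

// Parses all recognizable history lines, preserving their order.
function parseHistory(lines: string[]): HistoryEntry[] {
  const entries: HistoryEntry[] = [];
  for (const line of lines) {
    const entry = parseHistoryLine(line);
    if (entry !== null) {
      entries.push(entry);
    }
  }
  return entries;
}

// Returns the most recent PASSED run that is immediately followed (in time)
// by a FAILED or FLAKED run, or null if the history contains no such pair.
function findTransition(
  newestFirst: HistoryEntry[],
): { lastPass: HistoryEntry; firstBad: HistoryEntry } | null {
  const chrono = newestFirst.slice().reverse(); // oldest first
  for (let i = chrono.length - 1; i >= 1; i--) {
    if (chrono[i].status !== "PASSED" && chrono[i - 1].status === "PASSED") {
      return { lastPass: chrono[i - 1], firstBad: chrono[i] };
    }
  }
  return null;
}
```

The two hashes in the result are the natural pair to download with `yarn ci dlog` and hand to `analyze-logs` for comparison. A `null` result means the visible history holds no pass-to-failure step, so the regression may predate the entries shown.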
## Local Test Running
To run tests locally for verification:
```bash
# Run specific test
yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'
# With verbose logging
LOG_LEVEL=verbose yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'
# With debug logging (very detailed)
LOG_LEVEL=debug yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'
# With specific module logging
LOG_LEVEL='info; debug:sequencer,p2p' yarn workspace @aztec/end-to-end test:e2e <file>.test.ts -t '<test name>'
```
## Log Structure
### Timestamp Format
Logs use ISO 8601 timestamps in UTC (e.g., `2024-01-23T17:08:30.123Z`), which are useful for correlating events across nodes.
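
Because the timestamps are machine-parseable, the delay between two log lines can be computed directly. This is a minimal sketch: `timestampMs` and `elapsedMs` are hypothetical helpers, and the sketch assumes each line carries at most one timestamp of interest, in the format shown above.

```typescript
// Matches timestamps such as 2024-01-23T17:08:30.123Z (milliseconds optional).
const ISO_TS = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{3})?Z/;

// Returns the first timestamp on the line as epoch milliseconds, or null.
function timestampMs(line: string): number | null {
  const m = ISO_TS.exec(line);
  if (m === null) {
    return null;
  }
  const ms = Date.parse(m[0]);
  return isNaN(ms) ? null : ms;
}

// Milliseconds from the first line to the second; null if either lacks a timestamp.
function elapsedMs(earlier: string, later: string): number | null {
  const a = timestampMs(earlier);
  const b = timestampMs(later);
  return a === null || b === null ? null : b - a;
}
```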
### Log Levels
- `ERROR` - Failures, exceptions
- `WARN` - Potential issues, recoverable problems
- `INFO` - Key events, state transitions
- `VERBOSE` - Detailed operational info
- `DEBUG` - Fine-grained debugging (very noisy)
### Component Prefixes
Log lines are prefixed with the component name (e.g., `aztec:sequencer`, `aztec:p2p`, `aztec:archiver`). These map to the **Key Packages** section in CLAUDE.md - use that as a reference for understanding what each component does.
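
Level and component can be pulled out of a line for quick filtering. The exact line layout is not specified in this document, so the sketch below assumes only that the level appears as an uppercase word and the component as an `aztec:<name>` token somewhere on the line. `levelOf`, `componentOf`, and `atOrAbove` are hypothetical names.

```typescript
// Levels ordered from most to least severe, as listed above.
const LOG_LEVELS = ["ERROR", "WARN", "INFO", "VERBOSE", "DEBUG"];

// First level keyword on the line, or null. Caveat: a message body that
// happens to contain an uppercase level word before the real level would
// be misread, so treat the result as a heuristic.
function levelOf(line: string): string | null {
  const m = /\b(ERROR|WARN|INFO|VERBOSE|DEBUG)\b/.exec(line);
  return m === null ? null : m[1];
}

// First component token such as "aztec:sequencer", or null.
function componentOf(line: string): string | null {
  const m = /\baztec:[A-Za-z0-9_-]+/.exec(line);
  return m === null ? null : m[0];
}

// Keeps lines whose level is at least as severe as the threshold.
// Lines without a recognizable level are dropped.
function atOrAbove(lines: string[], threshold: string): string[] {
  const limit = LOG_LEVELS.indexOf(threshold);
  if (limit === -1) {
    return [];
  }
  return lines.filter((line) => {
    const level = levelOf(line);
    return level !== null && LOG_LEVELS.indexOf(level) <= limit;
  });
}
```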
## Multi-Node Debugging
E2E tests often spawn multiple nodes. Key tips:
### Identifying Nodes
- Look for node identifiers in log prefixes: `node-0`, `node-1`, `validator-0`, etc.
- Each node has its own log stream but they're interleaved in the combined output
- Ask `analyze-logs` to filter by node when needed
### Cross-Node Correlation
- Use timestamps to correlate events across nodes
- Look for message propagation: "Node A sends X" → "Node B receives X"
- Check for missing events: if Node A sent but Node B never received, that's a clue
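
The "sent but never received" check can be sketched as a set difference over node identifiers. The sketch assumes that each line's first `node-N` or `validator-N` token identifies the emitting node, which is a guess based on the prefixes listed above. `nodeOf` and `nodesMissingEvent` are hypothetical names.

```typescript
const NODE_ID = /\b(?:node|validator)-\d+\b/;

// First node identifier on the line, assumed to be the emitting node.
function nodeOf(line: string): string | null {
  const m = NODE_ID.exec(line);
  return m === null ? null : m[0];
}

// Returns every node that appears in the log but never logged `marker`.
function nodesMissingEvent(lines: string[], marker: string): string[] {
  const seen: string[] = [];
  const withEvent: string[] = [];
  for (const line of lines) {
    const node = nodeOf(line);
    if (node === null) {
      continue;
    }
    if (seen.indexOf(node) === -1) {
      seen.push(node);
    }
    if (line.indexOf(marker) !== -1 && withEvent.indexOf(node) === -1) {
      withEvent.push(node);
    }
  }
  return seen.filter((n) => withEvent.indexOf(n) === -1).sort();
}
```

In this skill the main conversation never reads logs itself, so a check like this belongs inside the `analyze-logs` subagent or a script it runs, with only the resulting node list reported back.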
### Common Multi-Node Issues
- **Network partition**: Nodes can't reach each other
- **Clock skew**: Timestamps don't align, causing validation failures
- **Split brain**: Nodes have divergent views of state
- **Message ordering**: Events arrive in unexpected order
## Common Failure Patterns
### Timeout Failures
- Action executed, expected reaction didn't occur
- Ask analyze-logs to find: what was the last action? What was expected?
- Check for blocked operations, missing events, stuck processes
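
A stall often shows up as an unusually long silence between two consecutive timestamped lines. The following sketch finds the largest such gap. It assumes the ISO timestamp format described under Log Structure, and `largestGap` is a hypothetical helper rather than an existing tool.

```typescript
interface LogGap {
  gapMs: number;  // silence between the two lines, in milliseconds
  before: string; // last line before the silence (the "last action")
  after: string;  // first line after the silence
}

// Scans timestamped lines in order and returns the longest silence.
// Lines without a parseable timestamp are skipped.
function largestGap(lines: string[]): LogGap | null {
  const tsPattern = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{3})?Z/;
  let prevLine: string | null = null;
  let prevMs = 0;
  let best: LogGap | null = null;
  for (const line of lines) {
    const m = tsPattern.exec(line);
    if (m === null) {
      continue;
    }
    const ms = Date.parse(m[0]);
    if (isNaN(ms)) {
      continue;
    }
    if (prevLine !== null) {
      const gapMs = ms - prevMs;
      if (best === null || gapMs > best.gapMs) {
        best = { gapMs: gapMs, before: prevLine, after: line };
      }
    }
    prevLine = line;
    prevMs = ms;
  }
  return best;
}
```

The `before` line is a reasonable starting point for the question "what was the last action?". Interleaved multi-node output can hide a stall on one node behind chatter from the others, so filtering to a single node first may give a clearer picture.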
### Assertion Failures
- Test expectation not met
- Compare actual vs expected values
- Trace back to find where the wrong value originated
### State Pollution
- Previous test left system in bad state
- Compare logs from the failing test with logs from the same test run in isolation
- Check `beforeEach`/`afterEach` hooks
### Race Conditions
- Timing-dependent failures
- Look for operations completing in a different order between passing and failing runs
- Check for missing synchronization
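
Ordering differences are easiest to spot when each run is reduced to a sequence of event labels (timestamps and other per-run noise stripped) and the two sequences are compared position by position. A minimal sketch, with `firstDivergence` as a hypothetical name and the reduction to labels left to the caller:

```typescript
interface Divergence {
  index: number;          // position of the first difference
  passing: string | null; // event at that position in the passing run, if any
  failing: string | null; // event at that position in the failing run, if any
}

// Returns the first position where the two event sequences differ,
// or null if they are identical.
function firstDivergence(passing: string[], failing: string[]): Divergence | null {
  const n = Math.max(passing.length, failing.length);
  for (let i = 0; i < n; i++) {
    const p = i < passing.length ? passing[i] : null;
    const f = i < failing.length ? failing[i] : null;
    if (p !== f) {
      return { index: i, passing: p, failing: f };
    }
  }
  return null;
}
```

A `null` on one side of the result means that run ended early, which points toward a timeout or crash rather than a reordering.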
## Example Session
```
User: Debug this flaky test: 343c52b17688d2cd
Claude: [Spawns identify-ci-failures subagent]
Returns: epochs_l1_reorgs.test.ts failed with timeout after 383s
Log downloaded to /tmp/343c52b17688d2cd.log
History shows test started flaking recently
Claude: The test "handles reorgs spanning epoch boundaries" timed out. Looking at
the history, this test has been flaking since yesterday. The failures correlate
with PR #19783 "adding MAXIMUM_GOSSIP_CLOCK_DISPARITY for p2p validation".
Let me check a passing run to compare. [Spawns analyze-logs for both]
User: What do you see as differences?
Claude: [Spawns analyze-logs with comparison prompt]
Summary shows: In failed run, node-2 never received epoch proof notification.
In passing run, all nodes received it within 2s of each other.
Hypothesis: The new clock disparity check may be rejecting valid messages
during epoch transitions when clocks drift slightly.
User: Can you check if there are any logs about clock validation?
Claude: [Spawns analyze-logs with specific prompt]
Found: "Rejecting gossip message: clock disparity 3.2s exceeds maximum 2s"
This confirms the hypothesis.
Claude: [Searches codebase for MAXIMUM_GOSSIP_CLOCK_DISPARITY]
Found in p2p/config.ts, default is 2 seconds. The PR added this validation
but epoch proofs can take longer to propagate. Recommend increasing the
disparity threshold for epoch-related messages.
```