dotnet / ci-test-failures
Install for your project team
Run this command in your project directory to install the skill for your entire team:
mkdir -p .claude/skills/ci-test-failures && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/3299" && unzip -o skill.zip -d .claude/skills/ci-test-failures && rm skill.zip
Project Skills
This skill will be saved in .claude/skills/ci-test-failures/ and checked into git. All team members will have access to it automatically.
Important: Please verify the skill by reviewing its instructions before using it.
Guide for diagnosing and fixing CI test failures using the DownloadFailingJobLogs tool. Use this when asked to investigate GitHub Actions test failures, download failure logs, or debug CI issues.
0 views
0 installs
Skill Content
---
name: ci-test-failures
description: Guide for diagnosing and fixing CI test failures using the DownloadFailingJobLogs tool. Use this when asked to investigate GitHub Actions test failures, download failure logs, or debug CI issues.
---
# CI Test Failure Diagnosis
This skill provides patterns and practices for diagnosing GitHub Actions test failures in the Aspire repository using the `DownloadFailingJobLogs` tool.
## Overview
When CI tests fail, use the `DownloadFailingJobLogs.cs` tool to automatically download logs and artifacts for failed jobs. This tool eliminates manual log hunting and provides structured access to test failures.
**Location**: `tools/scripts/DownloadFailingJobLogs.cs`
## Quick Start
### Step 1: Find the Run ID
Get the run ID from the GitHub Actions URL or use the `gh` CLI:
```bash
# From URL: https://github.com/microsoft/aspire/actions/runs/19846215629
# ^^^^^^^^^^
# run ID
# Or find the latest run on a branch
gh run list --repo microsoft/aspire --branch <branch-name> --limit 1 --json databaseId --jq '.[0].databaseId'
# Or for a PR
gh pr checks <pr-number> --repo microsoft/aspire
```
### Step 2: Run the Tool
```bash
cd tools/scripts
dotnet run DownloadFailingJobLogs.cs -- <run-id>
```
**Example:**
```bash
dotnet run DownloadFailingJobLogs.cs -- 19846215629
```
### Step 3: Analyze Output
The tool creates files in your current directory:
| File Pattern | Contents |
|--------------|----------|
| `failed_job_<n>_<job-name>.log` | Raw job logs from GitHub Actions |
| `artifact_<n>_<testname>_<os>.zip` | Downloaded artifact zip files |
| `artifact_<n>_<testname>_<os>/` | Extracted directory with .trx files, logs, binlogs |
## What the Tool Does
1. **Finds all failed jobs** in a GitHub Actions workflow run
2. **Downloads job logs** for each failed job
3. **Extracts test failures and errors** from logs using regex patterns
4. **Determines artifact names** from job names (pattern: `logs-{testShortName}-{os}`)
5. **Downloads test artifacts** containing .trx files and test logs
6. **Extracts artifacts** to local directories for inspection
## Example Workflow
```bash
# 1. Check failed jobs on a PR
gh pr checks 14105 --repo microsoft/aspire 2>&1 | Where-Object { $_ -match "fail" }
# 2. Get the run ID
$runId = gh run list --repo microsoft/aspire --branch davidfowl/my-branch --limit 1 --json databaseId --jq '.[0].databaseId'
# 3. Download failure logs
cd tools/scripts
dotnet run DownloadFailingJobLogs.cs -- $runId
# 4. Search for errors in downloaded logs
Get-Content "failed_job_0_*.log" | Select-String -Pattern "error|Error:" -Context 2,3 | Select-Object -First 20
# 5. Check .trx files for test failures
Get-ChildItem -Recurse -Filter "*.trx" | ForEach-Object {
[xml]$xml = Get-Content $_.FullName
$xml.TestRun.Results.UnitTestResult | Where-Object { $_.outcome -eq "Failed" }
}
```
## Understanding Job Log Output
The tool prints a summary for each failed job:
```
=== Failed Job 1/1 ===
Name: Tests / Integrations macos (Hosting.Azure) / Hosting.Azure (macos-latest)
ID: 56864254427
URL: https://github.com/microsoft/aspire/actions/runs/19846215629/job/56864254427
Downloading job logs...
Saved job logs to: failed_job_0_Tests___Integrations_macos__Hosting_Azure____Hosting_Azure__macos-latest_.log
Errors found (2):
- System.InvalidOperationException: Step 'provision-api-service' failed...
```
## Searching Downloaded Logs
### Find Errors in Job Logs
```powershell
# PowerShell
Get-Content "failed_job_*.log" | Select-String -Pattern "error|Error:" -Context 2,3
# Bash
grep -i "error" failed_job_*.log | head -50
```
### Find Build Failures
```powershell
Get-Content "failed_job_*.log" | Select-String -Pattern "Build FAILED|error MSB|error CS"
```
### Find Test Failures
```powershell
Get-Content "failed_job_*.log" | Select-String -Pattern "Failed!" -Context 5,0
```
### Check for Disk Space Issues
```powershell
Get-Content "failed_job_*.log" | Select-String -Pattern "No space left|disk space"
```
### Check for Timeout Issues
```powershell
Get-Content "failed_job_*.log" | Select-String -Pattern "timeout|timed out|Timeout"
```
## Using GitHub API for Annotations
Sometimes job logs aren't available (404). Use annotations instead:
```bash
gh api repos/microsoft/aspire/check-runs/<job-id>/annotations
```
This returns structured error information even when full logs aren't downloadable.
## Common Failure Patterns
### Disk Space Exhaustion
**Symptom:** `No space left on device` in annotations or logs
**Diagnosis:**
```powershell
gh api repos/microsoft/aspire/check-runs/<job-id>/annotations 2>&1
```
**Common fixes:**
- Add disk cleanup step before build
- Use larger runner (e.g., `8-core-ubuntu-latest`)
- Skip unnecessary build steps (e.g., `/p:BuildTests=false`)
### Command Not Found
**Symptom:** `exit code 127` or `command not found`
**Diagnosis:**
```powershell
Get-Content "failed_job_*.log" | Select-String -Pattern "command not found|exit code 127" -Context 3,1
```
**Common fixes:**
- Ensure PATH includes required tools
- Use full path to executables
- Install missing dependencies
### Test Timeout
**Symptom:** Test hangs, then fails with timeout
**Diagnosis:**
```powershell
Get-Content "failed_job_*.log" | Select-String -Pattern "Test host process exited|Timeout|timed out"
```
**Common fixes:**
- Increase test timeout
- Check for deadlocks in test code
- Review Heartbeat.cs output for resource exhaustion
### Build Failure
**Symptom:** `Build FAILED` or MSBuild errors
**Diagnosis:**
```powershell
Get-Content "failed_job_*.log" | Select-String -Pattern "error CS|error MSB|Build FAILED" -Context 0,3
```
**Common fixes:**
- Check for missing project references
- Verify package versions
- Download and analyze .binlog from artifacts
## Artifact Contents
Downloaded artifacts typically contain:
```
artifact_0_TestName_os/
├── testresults/
│ ├── TestName_net10.0_timestamp.trx # Test results XML
│ ├── Aspire.*.Tests_*.log # Console output
│ └── recordings/ # Asciinema recordings (CLI E2E tests)
├── *.crash.dmp # Crash dump (if test crashed)
└── test.binlog # MSBuild binary log
```
## Parsing .trx Files
```powershell
# Find all failed tests in .trx files
Get-ChildItem -Path "artifact_*" -Recurse -Filter "*.trx" | ForEach-Object {
Write-Host "=== $($_.Name) ==="
[xml]$xml = Get-Content $_.FullName
$xml.TestRun.Results.UnitTestResult | Where-Object { $_.outcome -eq "Failed" } | ForEach-Object {
Write-Host "FAILED: $($_.testName)"
Write-Host $_.Output.ErrorInfo.Message
Write-Host "---"
}
}
```
## Tips
### Clean Up Before Running
```powershell
Remove-Item *.log -Force -ErrorAction SilentlyContinue
Remove-Item *.zip -Force -ErrorAction SilentlyContinue
Remove-Item -Recurse artifact_* -Force -ErrorAction SilentlyContinue
```
### Run From tools/scripts Directory
The tool creates files in the current directory, so run it from `tools/scripts` to keep things organized:
```bash
cd tools/scripts
dotnet run DownloadFailingJobLogs.cs -- <run-id>
```
### Don't Commit Log Files
The downloaded log files can be large. Don't commit them to the repository:
```bash
# Before committing
rm tools/scripts/*.log
rm tools/scripts/*.zip
rm -rf tools/scripts/artifact_*
```
## Prerequisites
- .NET 10 SDK or later
- GitHub CLI (`gh`) installed and authenticated
- Access to the microsoft/aspire repository
## See Also
- `tools/scripts/README.md` - Full documentation
- `tools/scripts/Heartbeat.cs` - System monitoring tool for diagnosing hangs
- `.github/skills/cli-e2e-testing/SKILL.md` - CLI E2E test troubleshooting