Starlitnightly / single-trajectory-analysis

Guide to reproducing OmicVerse trajectory workflows spanning PAGA, Palantir, VIA, velocity coupling, and fate scoring notebooks.

0 views
0 installs

Skill Content

---
name: single-trajectory-analysis
title: Single-trajectory analysis
description: "Trajectory & RNA velocity: PAGA, Palantir, VIA, dynamo, scVelo, latentvelo, graphvelo backends via ov.single.Velo. Pseudotime, stream plots."
---

# Single-trajectory analysis skill

## Overview
This skill describes how to reproduce and extend the single-trajectory analysis workflow in `omicverse`, combining graph-based trajectory inference, RNA velocity coupling, and downstream fate scoring notebooks.

## Trajectory setup
- **PAGA (Partition-based graph abstraction)**
  - Build a neighborhood graph (`pp.neighbors`) on the preprocessed AnnData object.
  - Use `tl.paga` to compute cluster connectivity and `tl.draw_graph` or `tl.umap` with `init_pos='paga'` for embedding.
  - Interpret edge weights to prioritize branch resolution and seed paths.
- **Palantir**
  - Run `Palantir` on diffusion components, seeding with manually selected start cells (e.g., naïve T cells).
  - Extract pseudotime, branch probabilities, and differentiation potential for subsequent overlays.
- **VIA**
  - Execute `via.VIA` on the kNN graph to identify lineage progression with automatic root selection or user-defined roots.
  - Export terminal states and pseudotime for cross-validation against PAGA and Palantir results.

## Velocity coupling (VIA + scVelo)
- Use `scv.pp.filter_and_normalize`, `scv.pp.moments`, and `scv.tl.velocity` to generate velocity layers.
- Provide VIA with `adata.layers['velocity']` to refine lineage directionality (`via.VIA(..., velocity_weight=...)`).
- Compare VIA pseudotime with scVelo latent time (`scv.tl.latent_time`) to validate directionality and root selection.

## Advanced RNA Velocity Backends (ov.single.Velo)

OmicVerse provides a unified `Velo` class wrapping 4 velocity backends. Use this when you need more than basic scVelo:

### Backend selection guide

| Backend | Best for | GPU? | Prerequisites |
|---------|----------|------|---------------|
| **scvelo** | Standard velocity analysis | No | spliced/unspliced layers |
| **dynamo** | Kinetics modeling, vector fields | No | spliced/unspliced layers |
| **latentvelo** | VAE-based, batch correction, complex dynamics | Yes (torchdiffeq) | celltype_key, batch_key optional |
| **graphvelo** | Refinement layer on top of any backend | No | base velocity + connectivities |

### Unified Velo pipeline

```python
import omicverse as ov

velo = ov.single.Velo(adata)

# 1. Filter (scvelo backend) or preprocess (dynamo backend)
velo.filter_genes(min_shared_counts=20)     # For scvelo
# velo.preprocess(recipe='monocle', n_neighbors=30, n_pcs=30)  # For dynamo

# 2. Compute moments
velo.moments(backend='scvelo', n_pcs=30, n_neighbors=30)
# backend: 'scvelo' or 'dynamo'

# 3. Fit kinetic parameters
velo.dynamics(backend='scvelo')

# 4. Calculate velocity
velo.cal_velocity(method='scvelo')
# method: 'scvelo', 'dynamo', 'latentvelo', 'graphvelo'

# 5. Build velocity graph and project to embedding
velo.velocity_graph(basis='umap')
velo.velocity_embedding(basis='umap')
```

### latentvelo specifics (deep learning velocity)

latentvelo uses a VAE + neural ODE to learn latent dynamics. It handles batch effects and complex trajectories better than classical scVelo:

```python
velo.cal_velocity(
    method='latentvelo',
    celltype_key='cell_type',    # Optional: AnnotVAE uses cell type info
    batch_key='batch',           # Optional: batch correction
    velocity_key='velocity_S',
    n_top_genes=2000,
    latentvelo_VAE_kwargs={},    # Pass custom VAE hyperparameters
)
# Requires: pip install torchdiffeq
# Uses GPU if available, falls back to CPU
```

### graphvelo specifics (refinement layer)

GraphVelo refines velocity estimates from any base method by leveraging the cell graph structure. Run it after scvelo or dynamo:

```python
# First: compute base velocity with scvelo or dynamo
velo.cal_velocity(method='scvelo')

# Then: refine with graphvelo
velo.graphvelo(
    xkey='Ms',                          # Spliced moments key
    vkey='velocity_S',                  # Base velocity key to refine
    basis_keys=['X_umap', 'X_pca'],    # Project to multiple embeddings
    gene_subset=None,                   # Optional: restrict to gene subset
)
```

## Downstream fate scoring notebooks
- **CellFateGenie**: For pseudotime-associated gene discovery, use `search_skills('CellFateGenie fate genes')` to load the dedicated CellFateGenie skill.
- **`t_metacells.ipynb`**: Aggregate metacell trajectories for robustness checks and meta-state differential expression.
- **`t_cytotrace.ipynb`**: Integrate CytoTRACE differentiation potential with velocity-informed lineages for maturation scoring.

## Required preprocessing
1. Quality control: remove low-quality cells/genes, apply doublet filtering.
2. Normalization & log transformation (`sc.pp.normalize_total`, `sc.pp.log1p`).
3. Highly variable gene selection tailored to immune datasets (`sc.pp.highly_variable_genes`).
4. Batch correction if necessary (e.g., `scvi-tools`, `bbknn`).
5. Compute PCA, neighbor graph, and embedding (UMAP/FA) used by all trajectory methods.
6. For velocity: compute moments on the same neighbor graph before running VIA coupling.

## Parameter tuning
- Neighbor graph `n_neighbors` and `n_pcs` should be harmonized across PAGA, VIA, and Palantir to maintain consistency.
- In VIA, adjust `knn`, `too_big_factor`, and `root_user` for datasets with uneven sampling.
- Palantir requires careful start cell selection; use marker genes and velocity arrows to confirm.
- For PAGA, tweak `threshold` to control edge sparsity; ensure connected components reflect biological branches.
- Velocity estimation: compare `mode='stochastic'` vs `mode='dynamical'` in scVelo; recalibrate if terminal states disagree with VIA.

## Visualization and export
1. Overlay PAGA edges on UMAP (`scv.pl.paga`) and annotate branch labels.
2. Plot Palantir pseudotime and branch probabilities on embeddings.
3. Visualize VIA trajectories using `via.plot_fates` and `via.plot_scatter`.
4. Export pseudotime tables and fate probabilities to CSV for downstream notebooks.
5. Save high-resolution figures (PNG/SVG) and notebook artifacts for reproducibility.
6. Update notebooks with consistent color schemes and metadata columns before sharing.

## Defensive Validation Patterns

```python
# Before PAGA: verify neighbor graph exists
assert 'neighbors' in adata.uns, "Neighbor graph required. Run sc.pp.neighbors(adata) first."

# Before VIA velocity coupling: verify velocity layers exist
if 'velocity' not in adata.layers:
    print("WARNING: velocity layer missing. Run scv.tl.velocity(adata) first for VIA coupling.")
assert 'spliced' in adata.layers and 'unspliced' in adata.layers, \
    "Missing spliced/unspliced layers. Check loom/H5AD import preserved velocity layers."

# Before Palantir: verify PCA/diffusion components
assert 'X_pca' in adata.obsm, "PCA required. Run ov.pp.pca(adata) first."
```

## Troubleshooting tips
- **Missing velocity layers**: re-run `scv.pp.moments` and `scv.tl.velocity` ensuring `adata.layers['spliced']`/`['unspliced']` exist; verify loom/H5AD import preserved layers.
- **Disconnected PAGA graph**: inspect neighbor graph or adjust `n_neighbors`; confirm batch correction didn’t fragment the manifold.
- **Palantir convergence issues**: reduce diffusion components or reinitialize start cells; ensure no NaN values in data matrix.
- **VIA terminal states unstable**: increase iterations (`cluster_graph_pruning_iter`), or provide manual terminal state hints based on marker expression.
- **Notebook kernel memory errors**: downsample cells or precompute summaries (metacells) before rerunning.
- **latentvelo `ImportError: torchdiffeq`**: Install with `pip install torchdiffeq`. Required for neural ODE backend.
- **graphvelo returns NaN velocities**: Ensure base velocity (scvelo/dynamo) was computed first. graphvelo refines — it doesn't compute from scratch.
- **dynamo `preprocess` fails**: dynamo expects spliced/unspliced layers. Verify with `'spliced' in adata.layers`.