Overview
ReconEval measures how well a latent representation reconstructs the gene-expression matrix it summarises. The benchmark covers three tasks (Fig 1c) on three datasets, scored with the same metric set.
Tasks
- End-to-end reconstruction. A single model (PCA, AE, scVI, nlscVI, or mlscVI) encodes expression to a latent space and decodes back. The latent grid is {10, 32, 128, 512, 2048}. Drivers: experiments/01_end_to_end/.
- Foundation-model reconstruction. A frozen FM (SE, scGPT, scConcept, SCimilarity) produces per-cell embeddings; a downstream MLP, Transformer, or KNN decoder maps them back to expression. Drivers: experiments/02_foundation_model/.
- Latent-shift reconstruction. Given a control cell's latent state and a perturbation covariate, predict the post-perturbation latent state and decode it. Two methods: CellFlow (JAX flow matching) and STATE (PyTorch transformer over cell sets). Drivers: experiments/03_latent_shift/.
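To make the decoder step of the foundation-model task concrete, the following is a minimal sketch of a KNN decoder: each held-out cell's expression is predicted as the mean expression of its nearest training cells in embedding space. This is an illustration only, not the code under experiments/02_foundation_model/; the function name `knn_decode`, the Euclidean distance, and the toy data are assumptions made for the example.

```python
import math

def knn_decode(train_emb, train_expr, query_emb, k=2):
    """Predict expression for each query embedding as the mean expression
    of its k nearest training embeddings (Euclidean distance).
    Hypothetical helper for illustration, not part of ReconEval."""
    n_genes = len(train_expr[0])
    decoded = []
    for q in query_emb:
        # Rank training cells by distance to the query embedding.
        order = sorted(range(len(train_emb)),
                       key=lambda i: math.dist(q, train_emb[i]))
        neighbours = order[:k]
        # Average the neighbours' expression gene by gene.
        decoded.append([
            sum(train_expr[i][g] for i in neighbours) / k
            for g in range(n_genes)
        ])
    return decoded

# Toy data: 4 training cells, 2-d embeddings, 3 genes.
train_emb = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_expr = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0],
              [0.0, 4.0, 1.0], [0.0, 6.0, 1.0]]
query_emb = [[0.0, 0.4], [5.0, 5.6]]

decoded = knn_decode(train_emb, train_expr, query_emb, k=2)
print(decoded)  # [[2.0, 0.0, 2.0], [0.0, 5.0, 1.0]]
```

Plain Python lists keep the sketch dependency-free; on real embeddings an array library and an approximate-neighbour index would be the practical choice.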
Datasets
| Dataset | Scope | Source |
|---|---|---|
| Tahoe-100M | 1,137 drugs × 50 cell lines | Arc Institute / Vevo |
| PBMC-10M | 90 cytokines × 12 donors | Parse Bio |
| LuCA | 6 tissues, 4 diseases | Human Lung Cancer Atlas |
Out-of-distribution splits
Three OOD splits per dataset hold out cell type or cell line, perturbation,
or condition. The split assignments live under
data/reconstruction/<dataset>/split0X/.
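The idea behind a holdout split can be sketched in a few lines: every cell whose covariate value is held out goes to the test set, so the model never sees that value during training. The benchmark ships precomputed split assignments, so this is only an illustration; the record layout, the field names `cell_line` and `perturbation`, and the helper `holdout_split` are hypothetical.

```python
def holdout_split(cells, key, held_out):
    """Split cell records so that every record whose `key` value is in
    `held_out` lands in the test set and nowhere else.
    Hypothetical helper for illustration, not part of ReconEval."""
    held_out = set(held_out)
    train = [c for c in cells if c[key] not in held_out]
    test = [c for c in cells if c[key] in held_out]
    return train, test

# Toy metadata: one record per cell (field names are assumptions).
cells = [
    {"cell_line": "A", "perturbation": "drug1"},
    {"cell_line": "A", "perturbation": "drug2"},
    {"cell_line": "B", "perturbation": "drug1"},
    {"cell_line": "C", "perturbation": "drug2"},
]

# Hold out cell line "B"; the same call with key="perturbation"
# would hold out a perturbation instead.
train, test = holdout_split(cells, key="cell_line", held_out=["B"])

train_lines = {c["cell_line"] for c in train}
test_lines = {c["cell_line"] for c in test}
print(sorted(train_lines), sorted(test_lines))  # ['A', 'C'] ['B']
```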
Metric families
The sc_reconstruction.metrics API groups metrics into three
families (Fig 2 / Fig 3):
- Statistical: metric_r2(), metric_mse(), metric_energy_distance().
- Biological: metric_cellcycle(), metric_pathway(), metric_coexpression(), metric_deg(), metric_cytokine().
- Perturbational: metric_knn_purity().
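As a rough guide to what the statistical family measures, the sketch below computes MSE, R², and energy distance from their textbook definitions on a toy matrix. It does not call sc_reconstruction.metrics, and the library functions may differ in signature and in details such as per-gene versus pooled R²; the function names and the pooled-over-all-entries convention here are assumptions.

```python
import math

def mse(truth, recon):
    """Mean squared error over all cells and genes."""
    sq = [(t - r) ** 2
          for row_t, row_r in zip(truth, recon)
          for t, r in zip(row_t, row_r)]
    return sum(sq) / len(sq)

def r2(truth, recon):
    """Coefficient of determination, pooled over all matrix entries
    (an assumption; a per-gene R² is an equally common convention)."""
    flat = [t for row in truth for t in row]
    mean = sum(flat) / len(flat)
    ss_tot = sum((t - mean) ** 2 for t in flat)
    ss_res = sum((t - r) ** 2
                 for row_t, row_r in zip(truth, recon)
                 for t, r in zip(row_t, row_r))
    return 1.0 - ss_res / ss_tot

def energy_distance(x, y):
    """Energy distance 2 E|X-Y| - E|X-X'| - E|Y-Y'| between two sets of
    cells, estimated with all pairwise Euclidean distances."""
    def mean_dist(a, b):
        return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))
    return 2 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)

# Toy example: 2 cells x 2 genes, one entry reconstructed badly.
truth = [[1.0, 2.0], [3.0, 4.0]]
recon = [[1.0, 2.0], [3.0, 2.0]]

print(mse(truth, recon))              # 1.0
print(r2(truth, recon))               # close to 0.2
print(energy_distance(truth, recon))  # positive; 0.0 for identical sets
```

MSE and R² compare matched cells entry by entry, whereas energy distance compares the two sets of cells as distributions and needs no pairing.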
compute_all_metrics() runs all of them.
aggregate_rank_percentile() produces
the rank-percentile table, and
funky_heatmap() renders it as the
Fig 3 summary plot.
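One plausible reading of a rank-percentile table is sketched below: within each metric, methods are ranked and the rank is rescaled to [0, 1] with the best method at 1.0, then the percentiles are averaged across metrics. This is an assumption about the general technique, not a description of how aggregate_rank_percentile() is implemented; the helper names, the tie handling, and the scores are made up for the example.

```python
def rank_percentile(scores, higher_is_better=True):
    """Map each method's score to a percentile in [0, 1]; the best method
    gets 1.0 and tied methods share the average of their ranks.
    Hypothetical helper for illustration, not part of ReconEval."""
    methods = list(scores)
    n = len(methods)
    if n == 1:
        return {methods[0]: 1.0}
    sign = 1 if higher_is_better else -1
    out = {}
    for m in methods:
        # Count methods that are strictly worse, and ties with other methods.
        worse = sum(1 for o in methods if sign * scores[o] < sign * scores[m])
        tied = sum(1 for o in methods if o != m and scores[o] == scores[m])
        out[m] = (worse + 0.5 * tied) / (n - 1)
    return out

def aggregate(tables):
    """Average per-metric percentiles; `tables` is a list of
    (scores_by_method, higher_is_better) pairs."""
    per_metric = [rank_percentile(s, hib) for s, hib in tables]
    methods = per_metric[0].keys()
    return {m: sum(p[m] for p in per_metric) / len(per_metric) for m in methods}

# Made-up scores for three methods on two metrics.
r2_scores = {"PCA": 0.60, "AE": 0.70, "scVI": 0.80}   # higher is better
mse_scores = {"PCA": 0.30, "AE": 0.10, "scVI": 0.20}  # lower is better

summary = aggregate([(r2_scores, True), (mse_scores, False)])
print(summary)  # {'PCA': 0.0, 'AE': 0.75, 'scVI': 0.75}
```

Working on ranks rather than raw values puts metrics with different units and directions on one scale, at the cost of discarding how large the gaps between methods are.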