Allen Brain Atlas API in Python — Querying Brain Cell Types Programmatically (BICCN, ABC Atlas, 2026)
Step-by-step Python tutorial: query Allen Brain Cell Atlas via REST and h5ad downloads, find cell types by region, pull marker genes, and load BICCN whole-mouse-brain data into Scanpy. Includes the API endpoints, authentication-free access patterns, and how to programmatically replicate Allen's online cell type browser.
Why Programmatic Access Matters
The Allen Brain Cell Atlas (ABC Atlas, 2023+) hosts the largest single-cell RNA-seq dataset of mouse and human brain assembled to date: over 4 million mouse cells and 3 million human cells, with curated cell type taxonomies. The web browser at https://knowledge.brain-map.org/abcatlas is excellent for exploration (https://celltypes.brain-map.org is the older Cell Types Database), but if you need to:
- Compare 10+ cortical regions in one script
- Cross-reference Allen taxonomy with your own scRNA-seq clusters
- Download specific cell-type expression for a custom analysis
- Replicate or extend an Allen-based publication
...then clicking through the UI is no longer practical. You need programmatic access.
This guide walks through:
- Where the ABC Atlas data lives (it's not all behind one endpoint)
- Authentication-free Python access via the cell-types REST API and direct h5ad downloads
- Loading ABC Atlas data into Scanpy/AnnData for analysis
- Cross-referencing with BICCN (the older Mouse Motor Cortex consortium) and CELLxGENE
- Common pitfalls (taxonomy version mismatches, region naming, file size)
For the broader scRNA-seq brain pipeline this fits into, see Single-Cell RNA-seq for Brain Tissue: 2026 Getting Started.
The 4 Resources You'll Use
The Allen Institute hosts several overlapping resources. The ones a typical analyst needs:
| Resource | What's there | Access |
|---|---|---|
| ABC Atlas (Allen Brain Cell Atlas) | The 2023+ mouse + human single-cell atlases | AWS S3 + REST hub |
| Brain Map / Cell Types (legacy) | Older mouse + Patch-seq data | REST API |
| BICCN (BRAIN Initiative Cell Census Network) | Mouse motor cortex multi-modal reference | AWS, public |
| CELLxGENE (Chan Zuckerberg + Allen-deposited) | Browser + download for many Allen datasets | Web + Python cellxgene-census |
You'll often combine them. ABC Atlas for the latest comprehensive data; CELLxGENE for an easy read_h5ad() of a published dataset; legacy Cell Types REST API for older meta-analysis.
ABC Atlas — The Modern Way to Get Data
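The ABC Atlas hosts data on AWS S3 as CSV metadata tables and h5ad expression matrices. The simplest path is the official Python helper, which at the time of writing is installed from GitHub rather than PyPI (check the repository README for the current command):
pip install "abc_atlas_access @ git+https://github.com/AllenInstitute/abc_atlas_access.git"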
from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache
# Download the manifest (small) and choose what to pull
cache_dir = './abc_atlas_data'
cache = AbcProjectCache.from_s3_cache(cache_dir)
# List directories available
print(cache.list_directories)
# e.g. ['WMB-10X', 'WMB-10Xv2', 'WMB-10Xv3', 'WMB-taxonomy', 'WMB-MERFISH', ...]
WMB = Whole Mouse Brain. There are also WHB (Whole Human Brain) datasets in newer releases.
Pull a specific cell metadata table
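# Cell × annotation table (cluster, class, subclass, supertype, neurotransmitter, region).
# Assumption: in the releases checked, the plain 'cell_metadata' table carries only a
# cluster alias; the taxonomy columns used below come with the annotated variant.
# Confirm the available names with cache.list_metadata_files('WMB-10X').
cell_metadata = cache.get_metadata_dataframe(
    directory='WMB-10X',
    file_name='cell_metadata_with_cluster_annotation',
)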
print(cell_metadata.head())
print(f"Total cells: {len(cell_metadata):,}")
print(f"Unique cell types (cluster level): {cell_metadata['cluster'].nunique():,}")
Expected output (numbers from ~2024-2026 release):
Total cells: 4,042,976
Unique cell types (cluster level): 5,322
You now have one row per cell with its taxonomy assignment, region, etc., loaded as a pandas DataFrame. From here you can filter to your region of interest and pull just those cells' expression matrices.
Pull expression data for a specific region
# Filter to motor cortex (MOp = primary motor cortex)
mop_cells = cell_metadata[cell_metadata['region_of_interest_acronym'] == 'MOp']
print(f"Motor cortex cells: {len(mop_cells):,}")
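# Get the expression matrix file path.
# file_name must match an entry from cache.list_data_files('WMB-10Xv3'). In the
# releases checked, matrices are split by broad dissection region and named like
# 'WMB-10Xv3-Isocortex-1/log2', so the MOp-specific name below is a placeholder.
expression_file = cache.get_data_path(
    directory='WMB-10Xv3',
    file_name='WMB-10Xv3-MOp-log2.h5ad',
)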
# Load with anndata
import anndata as ad
adata = ad.read_h5ad(expression_file)
print(adata)
# AnnData object with n_obs × n_vars = ~280,000 × 32,000
Subset to your cells of interest and analyze with Scanpy as usual.
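Before subsetting by cell type, the taxonomy labels usually have to be attached to the expression object, because they live in the metadata table rather than in the h5ad file. The following is a minimal sketch with toy stand-ins, assuming both tables are indexed by a shared cell identifier (called `cell_label` here); every label and value in it is made up for illustration.

```python
import pandas as pd

# Toy stand-in for adata.obs: indexed by cell label, few columns.
obs = pd.DataFrame(
    {"library_label": ["L1", "L1", "L2"]},
    index=pd.Index(["cell_a", "cell_b", "cell_c"], name="cell_label"),
)

# Toy stand-in for the annotated metadata table (covers more cells than obs).
cell_metadata = pd.DataFrame(
    {
        "cell_label": ["cell_c", "cell_a", "cell_b", "cell_d"],
        "subclass": ["L5 ET", "L2/3 IT", "L2/3 IT", "L6 CT"],
        "region_of_interest_acronym": ["MOp", "MOp", "MOp", "STR"],
    }
).set_index("cell_label")

# A left join keeps the order and row count of obs; unmatched cells would get NaN.
annotated = obs.join(
    cell_metadata[["subclass", "region_of_interest_acronym"]], how="left"
)

# Fail early if any cell has no annotation rather than plotting NaNs later.
missing = int(annotated["subclass"].isna().sum())
assert missing == 0, f"{missing} cells lack taxonomy labels"

print(annotated)
```

With a real AnnData object the same join would be assigned back, for example `adata.obs = adata.obs.join(...)`, after checking that the index of `adata.obs` really holds the same identifiers as the metadata table.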
Scanpy Workflow from ABC Atlas Data
import scanpy as sc
# Already log2-normalized in the ABC Atlas, so don't normalize again.
# flavor='seurat_v3' expects raw counts; 'seurat' works on log-transformed data.
sc.pp.highly_variable_genes(adata, n_top_genes=3000, flavor='seurat')
sc.tl.pca(adata, n_comps=50)
sc.pp.neighbors(adata, n_neighbors=15, use_rep='X_pca')
sc.tl.umap(adata)
# Color by Allen taxonomy
sc.pl.umap(adata, color=['subclass', 'region_of_interest_acronym'])
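Since double normalization is easy to do by accident, a quick sanity check on the matrix values can help before running the steps above. This is a rough heuristic sketch, not part of Scanpy or the Allen tooling: the function name and the threshold of 30 are assumptions, and the toy "log2" data is built as log2(CPM + 1) purely for illustration.

```python
import numpy as np

def looks_log_transformed(x, max_expected=30.0):
    """Heuristic: raw counts are non-negative integers, often with large maxima;
    log-transformed values are mostly non-integer and small."""
    x = np.asarray(x, dtype=float)
    nonzero = x[x > 0]
    if nonzero.size == 0:
        return False  # nothing to judge from
    has_fractions = not np.allclose(nonzero, np.round(nonzero))
    return bool(has_fractions and nonzero.max() <= max_expected)

# Toy raw counts and a log2(CPM + 1) version of the same matrix.
counts = np.array([[0, 3, 120], [7, 0, 2500]])
log2_cpm = np.log2(counts / counts.sum(axis=1, keepdims=True) * 1e6 + 1)

print(looks_log_transformed(counts))    # False
print(looks_log_transformed(log2_cpm))  # True
```

For a sparse `adata.X`, passing a sample of the stored values (for example `adata.X[:1000].data`) avoids densifying the whole matrix.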
The Allen taxonomy levels (coarse → fine):
- division (~5-10 large groups: glutamatergic, GABAergic, glia, immune, vascular)
- class (34 in the whole-mouse-brain taxonomy)
- subclass (338)
- supertype (1,201)
- cluster (5,322, finest level)
For most cross-dataset comparisons, subclass is the right resolution: it is roughly the granularity at which most published brain datasets annotate their cell types.
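Comparing at subclass level means collapsing finer labels upward first. A minimal sketch of that roll-up, assuming you have a cluster-to-subclass mapping (the mapping and cluster names below are hypothetical; real ones come from the taxonomy tables of the release you use):

```python
from collections import Counter

# Hypothetical fine-to-coarse mapping, for illustration only.
cluster_to_subclass = {
    "Pvalb_1": "Pvalb",
    "Pvalb_2": "Pvalb",
    "Sst_1": "Sst",
    "L2/3 IT_1": "L2/3 IT",
}

def roll_up(cluster_labels, mapping):
    """Count cells per coarse label; unknown clusters are reported, not dropped silently."""
    counts = Counter()
    unmapped = set()
    for label in cluster_labels:
        coarse = mapping.get(label)
        if coarse is None:
            unmapped.add(label)
        else:
            counts[coarse] += 1
    return counts, unmapped

my_clusters = ["Pvalb_1", "Pvalb_2", "Pvalb_2", "Sst_1", "L2/3 IT_1", "Mystery_9"]
counts, unmapped = roll_up(my_clusters, cluster_to_subclass)
print(dict(counts))  # {'Pvalb': 3, 'Sst': 1, 'L2/3 IT': 1}
print(unmapped)      # {'Mystery_9'}
```

Returning the unmapped labels separately makes taxonomy version mismatches visible instead of quietly shrinking the cell counts.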
ABC Atlas REST API — When You Don't Want the Bulk Download
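If you just need metadata or a quick lookup, the REST endpoint at https://abc-atlas.brainmap.org/ (subject to change; check current docs) lets you query without downloading terabytes of data. Treat the path, parameters, and response fields below as illustrative and confirm them against the current documentation before relying on them.
import requests
# Get list of cell types in a region
r = requests.get(
    'https://abc-atlas.brainmap.org/api/v1/cell-types',
    params={'region': 'MOp', 'taxonomy_level': 'subclass'},
    timeout=30,
)
r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
cell_types = r.json()
for ct in cell_types[:10]:
    print(f"{ct['name']:30} cells={ct['cell_count']:,}")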
(Endpoint paths shift between releases — check the current docs at https://portal.brain-map.org/atlases-and-data/bkp/abc-atlas)
Legacy Cell Types REST API (Still Useful)
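The older Brain Map REST API (https://api.brain-map.org/api/v2) is well-documented and stable:
import requests
# Mouse cells from the IVSCC (in vitro single-cell characterization) pipeline,
# which covers electrophysiology and morphology.
# Model and field names in the criteria are as given here; verify them against
# the API's data-model documentation.
r = requests.get(
    'https://api.brain-map.org/api/v2/data/query.json',
    params={
        'criteria': (
            'model::Specimen,'
            'rma::criteria,donor[species$il"Mus musculus"],'
            'rma::include,donor(species),ephys_features,cell_morphologies'
        ),
        'num_rows': 100,
    },
    timeout=60,
)
r.raise_for_status()
payload = r.json()
if not payload.get('success', False):
    raise RuntimeError(f"RMA query failed: {payload.get('msg')}")
data = payload['msg']
print(f"Cells returned: {len(data)}")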
The query syntax is RMA (RESTful Model Access), Allen's older XML-derived query language. Verbose but precise.
For most modern work, ABC Atlas + CELLxGENE supersede this. The legacy API remains useful for Patch-seq (electrophysiology + transcriptomics) data and for older citations.
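Because RMA strings are easy to mistype, it can help to assemble them from parts. The helper below is a hypothetical convenience, not part of any Allen library; it only reproduces the comma-separated layout used in the query above (model, then criteria, then include).

```python
def build_rma(model, criteria=(), include=()):
    """Assemble an RMA criteria string: model, then criteria, then include."""
    parts = [f"model::{model}"]
    if criteria:
        parts.append("rma::criteria")
        parts.extend(criteria)
    if include:
        parts.append("rma::include")
        parts.extend(include)
    return ",".join(parts)

# Same pieces as the legacy query shown earlier in this section.
query = build_rma(
    "Specimen",
    criteria=['donor[species$il"Mus musculus"]'],
    include=["donor(species)", "ephys_features", "cell_morphologies"],
)
print(query)
```

The resulting string can be passed as the `criteria` parameter of the request; the helper does no validation of model or field names, so those still need checking against the API documentation.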
CELLxGENE Census — The Universal Backdoor
CZI's CELLxGENE Census aggregates Allen + many other published datasets. Available via Python:
pip install cellxgene-census
import cellxgene_census
# Open the latest census (read-only)
with cellxgene_census.open_soma(census_version="latest") as census:
    # Get Allen Mouse Brain MOp cells
    adata = cellxgene_census.get_anndata(
        census=census,
        organism="Mus musculus",
        obs_value_filter='dataset_id == "..." ',  # specific dataset
    )
print(adata)
The Census is convenient when you want to combine Allen data with other public datasets in one frame — e.g., comparing your Alzheimer's snRNA-seq to both Allen reference and a separate AD cohort.
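When combining several conditions, the `obs_value_filter` string gets fiddly to write by hand. The small builder below is a hypothetical helper, not part of `cellxgene_census`; it assumes the filter accepts equality clauses joined with `and`, and the column names in the example (`tissue_general`, `is_primary_data`) should be checked against the Census schema for the version you open.

```python
def build_obs_filter(**conditions):
    """Join equality conditions with 'and'. Strings are quoted; booleans and numbers are not."""
    clauses = []
    for column, value in conditions.items():
        if isinstance(value, str):
            if '"' in value:
                raise ValueError(f"double quote not supported in value: {value!r}")
            clauses.append(f'{column} == "{value}"')
        else:
            clauses.append(f"{column} == {value}")
    return " and ".join(clauses)

print(build_obs_filter(tissue_general="brain", is_primary_data=True))
# tissue_general == "brain" and is_primary_data == True
```

Filtering on primary data is a common way to avoid counting the same cells twice when a dataset has been deposited more than once.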
BICCN — Mouse Motor Cortex Multi-Modal Reference
The BRAIN Initiative Cell Census Network's 2021 mouse motor cortex paper (Yao et al., 2021, Nature) integrated 7 modalities (scRNA, snRNA, scATAC, spatial transcriptomics, Patch-seq, etc.). The data is on AWS:
# Browse what's there
aws s3 ls --no-sign-request s3://nemo-public/biccn/grant/u19_zeng/zeng/
Or via the data portal at https://biccn.org/data — typically download h5ad files directly:
import anndata as ad
adata = ad.read_h5ad('biccn_mouse_motor_cortex_10x.h5ad')
print(adata)
For specific BICCN multimodal integration analyses (e.g., joint scRNA + scATAC), the published Seurat objects and code are on the consortium GitHub: https://github.com/AllenInstitute
Common Pitfalls
1. Taxonomy version mismatch
Allen's cell type taxonomy updates roughly annually. A cluster labeled "Pvalb_1" in 2022 may not be the same as "Pvalb_1" in 2025. Always record the taxonomy version alongside your analysis (taxonomy_id or release_date in the metadata).
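One lightweight way to record the version is a small provenance record saved next to your results. This is a sketch only: the field names are this example's own choice, and the manifest and package version strings are placeholders rather than real release identifiers.

```python
import json
from datetime import date

def provenance_record(manifest, taxonomy_level, package_version):
    """Bundle the identifiers needed to reproduce a label lookup later."""
    return {
        "atlas_manifest": manifest,
        "taxonomy_level": taxonomy_level,
        "abc_atlas_access_version": package_version,
        "recorded_on": date.today().isoformat(),
    }

record = provenance_record(
    manifest="releases/EXAMPLE/manifest.json",  # placeholder, not a real release
    taxonomy_level="subclass",
    package_version="0.0.0",                    # placeholder
)
print(json.dumps(record, indent=2))
```

Writing this JSON alongside every figure or table makes it possible to tell later which taxonomy a label such as "Pvalb_1" referred to.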
2. Region name versus region ID
Allen uses three coexisting conventions:
- Acronym: MOp (primary motor cortex)
- Full name: Primary motor area
- Numeric ID: 985
ABC Atlas metadata usually has all three columns. Cross-referencing with external datasets, you may need to translate.
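Translating between the three conventions is a small lookup problem. The sketch below accepts any of the three forms and returns the full record; only the MOp row comes from the text above, and the table would need to be filled from the atlas ontology for real use.

```python
from typing import NamedTuple

class Region(NamedTuple):
    acronym: str
    name: str
    structure_id: int

# Only the MOp row is taken from the text; extend from the atlas ontology table.
REGIONS = [Region("MOp", "Primary motor area", 985)]

BY_ACRONYM = {r.acronym: r for r in REGIONS}
BY_NAME = {r.name.lower(): r for r in REGIONS}
BY_ID = {r.structure_id: r for r in REGIONS}

def resolve(key):
    """Accept an acronym, full name, or numeric ID and return the Region."""
    if isinstance(key, int):
        return BY_ID[key]
    if key in BY_ACRONYM:
        return BY_ACRONYM[key]
    return BY_NAME[key.lower()]  # raises KeyError for unknown regions

print(resolve(985).acronym)                        # MOp
print(resolve("primary motor area").structure_id)  # 985
```

Acronyms are matched case-sensitively on purpose, since Allen acronyms mix upper and lower case, while full names are compared case-insensitively.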
3. Coordinates — CCFv3 vs other spaces
Spatial data uses the Common Coordinate Framework v3 (CCFv3) for mouse. Older datasets may use CCFv2 or study-specific coordinate spaces. When integrating multi-modal data, verify the CCF version (ccf_version in metadata).
4. File sizes
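Don't blindly call cache.get_data_path() for everything. The full ABC Atlas is >1 TB. Use cache.list_data_files() to see what a directory contains, check sizes before downloading, and filter by region first via the metadata table.
# Assumption: list_data_files returns plain file names (strings), not dicts with sizes.
# For sizes, the official notebooks use a per-directory helper
# (cache.get_directory_data_size); verify it exists in your installed version.
files = cache.list_data_files('WMB-10Xv3')
for name in files:
    print(f"  {name}")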
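Once you know the sizes, a simple guard can stop an oversized download before it starts. The function below is a hypothetical helper, and the file names and sizes are invented for illustration; in practice the sizes would come from the atlas listing or from the S3 bucket.

```python
def planned_download_gb(wanted, sizes_gb, budget_gb):
    """Sum the sizes of the wanted files and refuse if the total exceeds the budget."""
    unknown = [name for name in wanted if name not in sizes_gb]
    if unknown:
        raise KeyError(f"not in listing: {unknown}")
    total = sum(sizes_gb[name] for name in wanted)
    if total > budget_gb:
        raise RuntimeError(f"{total:.1f} GB requested, budget is {budget_gb:.1f} GB")
    return total

# Hypothetical names and sizes, for illustration only.
sizes_gb = {"region_A/log2": 12.0, "region_B/log2": 4.5, "region_C/log2": 30.0}
total = planned_download_gb(["region_A/log2", "region_B/log2"], sizes_gb, budget_gb=20.0)
print(f"OK to download: {total:.1f} GB")  # OK to download: 16.5 GB
```

Raising an error rather than trimming the list keeps the decision about what to drop with the analyst.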
5. Read-only S3 access
The ABC Atlas S3 bucket is read-only and public — no AWS credentials needed. If you see authentication errors, you're probably using aws s3 cp without --no-sign-request. The abc_atlas_access package handles this automatically.
6. Memory
A full region's snRNA-seq h5ad can be 5-15 GB in memory. For analyses on a laptop, subset by cell type or use sparse representations and chunked reading:
# Backed mode loads obs/var but leaves the expression matrix on disk
adata = ad.read_h5ad(expression_file, backed='r')
print(adata.obs['subclass'].value_counts())
# Then subset before loading into memory
subset = adata[adata.obs['subclass'] == 'L2/3 IT'].to_memory()
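A back-of-the-envelope estimate shows why sparse storage matters here. The arithmetic below uses the approximate matrix shape quoted earlier (280,000 × 32,000) and assumes float32 values, int32 column indices, and 10% non-zero entries; the density is an assumption, so substitute the value measured on your own data.

```python
def dense_gb(n_cells, n_genes, bytes_per_value=4):
    """Memory for a dense matrix, in GB (1e9 bytes)."""
    return n_cells * n_genes * bytes_per_value / 1e9

def csr_gb(n_cells, n_genes, density, bytes_per_value=4, bytes_per_index=4):
    """Memory for a CSR matrix: value + column index per non-zero, plus row pointers."""
    nnz = int(n_cells * n_genes * density)
    row_pointers = (n_cells + 1) * 8  # 64-bit row pointers
    return (nnz * (bytes_per_value + bytes_per_index) + row_pointers) / 1e9

n_cells, n_genes = 280_000, 32_000
print(f"dense : {dense_gb(n_cells, n_genes):.1f} GB")              # dense : 35.8 GB
print(f"sparse: {csr_gb(n_cells, n_genes, density=0.10):.1f} GB")  # sparse: 7.2 GB
```

Under these assumptions the sparse figure falls inside the 5-15 GB range mentioned above, while a dense copy would not fit on most laptops.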
Worked Example — Compare Cortex vs Striatum Cell Type Composition
A common analysis: what cell types differ in proportion between two regions?
import scanpy as sc
import matplotlib.pyplot as plt
# Filter metadata to MOp (motor cortex) and STR (striatum).
# Confirm the exact labels first with cell_metadata['region_of_interest_acronym'].unique();
# dissection regions do not always match atlas acronyms one-to-one.
sel = cell_metadata[cell_metadata['region_of_interest_acronym'].isin(['MOp', 'STR'])]
# Cell type proportions per region (at subclass level)
ct_props = (
    sel.groupby(['region_of_interest_acronym', 'subclass'])
    .size()
    .reset_index(name='n')
)
ct_props['prop'] = ct_props.groupby('region_of_interest_acronym')['n'].transform(
    lambda x: x / x.sum()
)
# Wide format for easy comparison
wide = ct_props.pivot(index='subclass', columns='region_of_interest_acronym', values='prop').fillna(0)
wide['diff'] = wide.get('MOp', 0) - wide.get('STR', 0)
print(wide.sort_values('diff').head(20)) # Striatum-enriched
print(wide.sort_values('diff', ascending=False).head(20)) # Cortex-enriched
Result: as expected, striatum is dominated by GABAergic medium spiny neurons (MSN-D1, MSN-D2 subclasses), while motor cortex has the L2/3 IT, L5 ET, L6 CT excitatory hierarchy.
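A difference of proportions favors abundant cell types, so rare but strongly region-specific types can be missed. One alternative, which is not the method used in the example above, is a log2 ratio of proportions with a pseudocount. The sketch below uses made-up counts and a pseudocount of 0.5, both chosen only for illustration.

```python
import math

def log2_enrichment(counts_a, counts_b, pseudocount=0.5):
    """log2 ratio of per-region proportions; positive means enriched in region A."""
    labels = sorted(set(counts_a) | set(counts_b))
    k = len(labels)
    # Add the pseudocount to every label so the proportions still sum to 1.
    total_a = sum(counts_a.values()) + pseudocount * k
    total_b = sum(counts_b.values()) + pseudocount * k
    result = {}
    for label in labels:
        p_a = (counts_a.get(label, 0) + pseudocount) / total_a
        p_b = (counts_b.get(label, 0) + pseudocount) / total_b
        result[label] = math.log2(p_a / p_b)
    return result

# Made-up counts, for illustration only.
mop = {"L2/3 IT": 900, "Pvalb": 80, "MSN-D1": 0}
stri = {"L2/3 IT": 0, "Pvalb": 40, "MSN-D1": 700}
ranked = sorted(log2_enrichment(mop, stri).items(), key=lambda kv: kv[1])
for label, value in ranked:
    print(f"{label:10} {value:+.2f}")
```

The pseudocount keeps the ratio finite for types that are absent from one region; its size affects the ranking of very rare types, so it is worth reporting alongside the results.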
When to Use What
| Goal | Best resource |
|---|---|
| Newest comprehensive mouse brain atlas | ABC Atlas (WMB-10Xv3) |
| Specific human brain region from Allen | ABC Atlas (WHB-10X) or CELLxGENE |
| Combine Allen with other public datasets in one analysis | CELLxGENE Census |
| Patch-seq (electrophysiology + transcriptomics) | Legacy Cell Types REST API |
| BICCN multi-modal mouse motor cortex | BICCN portal direct download |
| Just browsing cell types interactively | Allen Cell Types web UI |
FAQ
Q: Do I need an Allen account or AWS credentials?
For data download, no: the S3 bucket is public, so use --no-sign-request. An account is optional for some older AllenSDK-based tools.
Q: How current is the data?
ABC Atlas releases roughly twice a year. Check cache.current_manifest for the release you're using, and pin it in your scripts for reproducibility (the package can load a specific manifest by name; see its documentation).
Q: Can I use this for human brain data?
Yes — WHB-10X (Whole Human Brain) datasets are growing. Note that human datasets are smaller (10x cost, ethical sampling constraints) and more recent — verify what's in the current release.
Q: Allen vs HCA Brain, what's the difference?
HCA (Human Cell Atlas) Brain is a separate consortium overlapping with Allen contributions. Many datasets are in both via CELLxGENE. Allen's strength is standardized taxonomy at scale; HCA's strength is multi-tissue integration across the body.
Q: My internet is slow — can I just use the cell counts/proportions metadata without downloading expression?
Yes. The cell_metadata.csv is small (~500 MB compressed) and contains taxonomy assignments, region, neurotransmitter type, etc. You can do composition analyses, marker enrichment lookups, and many comparisons without ever loading the expression matrices.
Q: Is there an R interface?
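Allen's own tooling (AllenSDK, abc_atlas_access) is Python. For R, use the cellxgene.census R package, or download h5ad files and load them with zellkonverter::readH5AD (SingleCellExperiment) or convert them for Seurat with SeuratDisk; Seurat::ReadH5AD is no longer available in current Seurat releases.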
Q: Can I publish using ABC Atlas data without obtaining it from Allen directly?
Yes. The data is CC-BY licensed. Cite the appropriate Allen Institute publications (the data has a DOI per release) and the ABC Atlas portal in your Methods. Allen explicitly encourages reuse.
Q: My favorite cell type is missing from the atlas. What should I do?
Check the latest release first (taxonomy expansions are frequent). If it is genuinely missing, this may be a real gap: consider depositing your data to CELLxGENE if you have it, so it can contribute to the next iteration of the atlas.
Closing — The Workflow
For most analysts using Allen data programmatically in 2026:
- Install abc_atlas_access and pin the version
- Pull cell_metadata first, and work out which region/cell types you care about before downloading expression
- Download targeted h5ad files for those regions
- Load with Scanpy and work normally
- Use CELLxGENE Census when integrating Allen with non-Allen datasets
The web UI is for browsing; the API is for everything else. Once you've done this loop once, scaling to "compare 10 cortical regions across 4 mouse ages" becomes a 30-line script instead of a manual click-fest.
References:
- Yao, Z. et al. (2023). A high-resolution transcriptomic and spatial atlas of cell types in the whole mouse brain. Nature, 624, 317-332.
- Yao, Z. et al. (2021). A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation. Cell, 184, 3222-3241.
- Allen Brain Cell Atlas portal: https://portal.brain-map.org/atlases-and-data/bkp/abc-atlas
- BICCN data portal: https://biccn.org/data
- CELLxGENE Census: https://chanzuckerberg.github.io/cellxgene-census/
- abc_atlas_access GitHub: https://github.com/AllenInstitute/abc_atlas_access