Brain Bioinformatics

Allen Brain Atlas API in Python — Querying Brain Cell Types Programmatically (BICCN, ABC Atlas, 2026)

Step-by-step Python tutorial: query Allen Brain Cell Atlas via REST and h5ad downloads, find cell types by region, pull marker genes, and load BICCN whole-mouse-brain data into Scanpy. Includes the API endpoints, authentication-free access patterns, and how to programmatically replicate Allen's online cell type browser.



Why Programmatic Access Matters

The Allen Brain Cell Atlas (ABC Atlas, 2023+) hosts some of the largest single-cell RNA-seq datasets of mouse and human brain assembled to date: over 4 million mouse cells and roughly 3 million human cells, with curated cell type taxonomies. The ABC Atlas portal (https://portal.brain-map.org/atlases-and-data/bkp/abc-atlas) is excellent for exploration, but if you need to:

  • Compare 10+ cortical regions in one script
  • Cross-reference Allen taxonomy with your own scRNA-seq clusters
  • Download specific cell-type expression for a custom analysis
  • Replicate or extend an Allen-based publication

...then clicking through the UI is no longer practical. You need programmatic access.

This guide walks through:

  1. Where the ABC Atlas data lives (it's not all behind one endpoint)
  2. Authentication-free Python access via the cell-types REST API and direct h5ad downloads
  3. Loading ABC Atlas data into Scanpy/AnnData for analysis
  4. Cross-referencing with BICCN (the BRAIN Initiative Cell Census Network, whose 2021 mouse motor cortex reference predates the ABC Atlas) and CELLxGENE
  5. Common pitfalls (taxonomy version mismatches, region naming, file size)

For the broader scRNA-seq brain pipeline this fits into, see Single-Cell RNA-seq for Brain Tissue: 2026 Getting Started.

The 4 Allen Data Sources You'll Use

The Allen Institute hosts several overlapping resources. The ones a typical analyst needs:

Resource | What's there | Access
ABC Atlas (Allen Brain Cell Atlas) | The 2023+ mouse and human single-cell atlases | AWS S3 (abc_atlas_access Python package)
Brain Map / Cell Types (legacy) | Older mouse data and Patch-seq | REST API
BICCN (BRAIN Initiative Cell Census Network) | Mouse motor cortex multimodal reference | AWS, public
CELLxGENE (Chan Zuckerberg Initiative, with Allen-deposited datasets) | Browser and downloads for many Allen datasets | Web + Python cellxgene-census

You'll often combine them: the ABC Atlas for the latest comprehensive data, CELLxGENE for an easy read_h5ad() of a published dataset, and the legacy Cell Types REST API for older datasets such as Patch-seq.

ABC Atlas — The Modern Way to Get Data

The ABC Atlas hosts its data in a public AWS S3 bucket: metadata as CSV tables and expression matrices as h5ad files. The simplest path is the official Python helper, abc_atlas_access:

pip install abc_atlas_access
from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache

# Download the manifest (small) and choose what to pull
cache_dir = './abc_atlas_data'
cache = AbcProjectCache.from_s3_cache(cache_dir)

# List directories available
print(cache.list_directories)
# e.g. ['WMB-10X', 'WMB-10Xv2', 'WMB-10Xv3', 'WMB-taxonomy', 'MERFISH-C57BL6J-638850', ...]; names vary by release

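WMB = Whole Mouse Brain; newer releases also include WHB (Whole Human Brain) datasets. In the WMB release, WMB-10X holds the per-cell metadata, the chemistry-specific WMB-10Xv2 and WMB-10Xv3 directories hold the expression matrices, and WMB-taxonomy holds the cluster annotations.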

Pull a specific cell metadata table

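# Per-cell table (cell_label, region_of_interest_acronym, cluster_alias, ...)
cell_metadata = cache.get_metadata_dataframe(
    directory='WMB-10X',
    file_name='cell_metadata',
    dtype={'cell_label': str},
)

# Taxonomy labels (class, subclass, supertype, cluster) sit in a separate table keyed
# by cluster_alias; confirm file names with cache.list_metadata_files('WMB-taxonomy')
taxonomy = cache.get_metadata_dataframe(
    directory='WMB-taxonomy',
    file_name='cluster_to_cluster_annotation_membership_pivoted',
    keep_default_na=False,
).set_index('cluster_alias')
cell_metadata = cell_metadata.join(taxonomy, on='cluster_alias')

print(cell_metadata.head())
print(f"Total cells: {len(cell_metadata):,}")
print(f"Unique cell types (cluster level): {cell_metadata['cluster'].nunique():,}")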

Expected output (numbers from ~2024-2026 release):

Total cells: 4,042,976
Unique cell types (cluster level): 5,322

You now have one row per cell with its taxonomy assignment, region, etc., loaded as a pandas DataFrame. From here you can filter to your region of interest and pull just those cells' expression matrices.

Pull expression data for a specific region

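# Filter to primary motor cortex (MOp) using the joined metadata
mop_cells = cell_metadata[cell_metadata['region_of_interest_acronym'] == 'MOp']
print(f"Motor cortex cells: {len(mop_cells):,}")

# Expression files are split by chemistry and broad region (MOp cells should sit in
# the isocortex files, possibly in WMB-10Xv2 as well); list them before choosing
print(cache.list_data_files('WMB-10Xv3'))
expression_file = cache.get_data_path(
    directory='WMB-10Xv3',
    file_name='WMB-10Xv3-Isocortex-1/log2',  # example name; pick one from the listing
)

# Load with anndata and keep only the MOp cells (obs_names are cell labels)
import anndata as ad
adata = ad.read_h5ad(expression_file)
adata = adata[adata.obs_names.isin(mop_cells['cell_label'])].copy()
print(adata)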

Subset to your cells of interest and analyze with Scanpy as usual.

Scanpy Workflow from ABC Atlas Data

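import scanpy as sc

# Copy Allen taxonomy labels into adata.obs (assumes obs_names are ABC cell labels)
meta = cell_metadata.set_index('cell_label')
for col in ['subclass', 'region_of_interest_acronym']:
    adata.obs[col] = meta.loc[adata.obs_names, col].values

# X is already log2-transformed: skip normalize_total/log1p, and use the 'seurat'
# HVG flavor, which expects log data ('seurat_v3' expects raw counts)
sc.pp.highly_variable_genes(adata, n_top_genes=3000, flavor='seurat')
sc.tl.pca(adata, n_comps=50)
sc.pp.neighbors(adata, n_neighbors=15, use_rep='X_pca')
sc.tl.umap(adata)

# Color by Allen taxonomy
sc.pl.umap(adata, color=['subclass', 'region_of_interest_acronym'])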

The Allen taxonomy levels (coarse → fine):

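  • neurotransmitter type or division: a handful of broad groups such as glutamatergic, GABAergic and non-neuronal (the exact top-level grouping depends on the taxonomy release)
  • class (34 in the 2023 whole-mouse-brain taxonomy)
  • subclass (338)
  • supertype (1,201)
  • cluster (5,322, the finest level)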

For most cross-dataset comparisons, subclass is the right resolution: fine enough to separate, say, L2/3 IT from L5 ET neurons, yet coarse enough to map onto other published brain taxonomies.
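Because each level nests inside the one above, a quick sanity check before rolling counts up is to confirm that every fine label maps to exactly one coarser label. A minimal pandas sketch, assuming a metadata table that already carries class, subclass, supertype and cluster columns (for example, after joining the cell metadata to the taxonomy table); check_nesting and counts_at_level are hypothetical helpers, not part of abc_atlas_access.

```python
import pandas as pd

# Taxonomy columns ordered coarse -> fine (assumed column names)
LEVELS = ["class", "subclass", "supertype", "cluster"]


def check_nesting(meta: pd.DataFrame, levels=LEVELS) -> dict:
    """For each adjacent pair of levels, list fine labels that map to more than one coarse label."""
    problems = {}
    for coarse, fine in zip(levels[:-1], levels[1:]):
        parents_per_child = meta.groupby(fine, observed=True)[coarse].nunique()
        bad = parents_per_child[parents_per_child > 1].index.tolist()
        if bad:
            problems[(coarse, fine)] = bad
    return problems


def counts_at_level(meta: pd.DataFrame, level: str) -> pd.Series:
    """Number of cells per label at the chosen taxonomy level, largest first."""
    return meta[level].value_counts()
```

An empty dict from check_nesting means the hierarchy is a clean tree, so subclass-level counts can be summed safely into class-level counts.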

ABC Atlas REST API — When You Don't Want the Bulk Download

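If you just need metadata or a quick lookup, a REST-style query can spare you downloading terabytes of data. The host, path, parameters and response fields below (https://abc-atlas.brainmap.org/api/v1/...) are illustrative and not confirmed against current Allen documentation; verify them before relying on this pattern.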

import requests

# Illustrative endpoint and response schema; confirm against the current docs
r = requests.get('https://abc-atlas.brainmap.org/api/v1/cell-types', params={
    'region': 'MOp',
    'taxonomy_level': 'subclass',
}, timeout=30)
r.raise_for_status()
cell_types = r.json()
for ct in cell_types[:10]:
    print(f"{ct['name']:30} cells={ct['cell_count']:,}")

(Endpoint paths shift between releases — check the current docs at https://portal.brain-map.org/atlases-and-data/bkp/abc-atlas)

Legacy Cell Types REST API (Still Useful)

The older Brain Map REST API (https://api.brain-map.org/api/v2/) is well documented and stable:

import requests

# Mouse cells from the Cell Types database (IVSCC: In Vitro Single Cell Characterization).
# Association names follow the AllenSDK cell-types queries; verify against the API docs.
r = requests.get(
    'https://api.brain-map.org/api/v2/data/query.json',
    params={
        'criteria': (
            'model::Specimen,'
            "rma::criteria,[is_cell_specimen$eq'true'],products[name$eq'Mouse Cell Types'],"
            'rma::include,donor(organism),ephys_features,'
            'rma::options[num_rows$eq100]'
        ),
    },
    timeout=60,
)
r.raise_for_status()
data = r.json()['msg']
print(f"Cells returned: {len(data)}")

The query syntax is RMA (RESTful Model Access), Allen's own query language: verbose but precise.
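Because an RMA query is just a comma-joined string, it can help to assemble it from parts so filters and includes are easy to swap. A small sketch; build_rma_query is a hypothetical helper that reproduces only the model/criteria/include pattern of the query above plus RMA's paging options, not the full RMA grammar.

```python
def build_rma_query(model, criteria=(), include=(), num_rows=None, start_row=None):
    """Assemble 'model::X,rma::criteria,...,rma::include,...,rma::options[...]'."""
    parts = [f"model::{model}"]
    if criteria:
        parts.append("rma::criteria," + ",".join(criteria))
    if include:
        parts.append("rma::include," + ",".join(include))
    # Paging options are appended as bracketed clauses on a single rma::options stage
    options = ""
    if num_rows is not None:
        options += f"[num_rows$eq{num_rows}]"
    if start_row is not None:
        options += f"[start_row$eq{start_row}]"
    if options:
        parts.append("rma::options" + options)
    return ",".join(parts)
```

The result goes into the criteria parameter as before, e.g. requests.get('https://api.brain-map.org/api/v2/data/query.json', params={'criteria': build_rma_query('Specimen', ...)}).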

For most modern work, the ABC Atlas and CELLxGENE supersede this. The legacy API remains useful for Patch-seq (electrophysiology + transcriptomics) data and for reproducing older publications built on it.

CELLxGENE Census — The Universal Backdoor

CZI's CELLxGENE Census aggregates Allen + many other published datasets. Available via Python:

pip install cellxgene-census
import cellxgene_census

# Open the latest census (read-only)
with cellxgene_census.open_soma(census_version="latest") as census:
    # Get Allen Mouse Brain MOp cells
    adata = cellxgene_census.get_anndata(
        census=census,
        organism="Mus musculus",
        obs_value_filter='dataset_id == "..."',  # specific dataset
    )
print(adata)

The Census is convenient when you want to combine Allen data with other public datasets in one frame — e.g., comparing your Alzheimer's snRNA-seq to both Allen reference and a separate AD cohort.
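Census filters are plain strings in a Python-like value-filter syntax (comparisons such as ==, in [...], joined with and). A sketch of building one from keyword arguments; census_filter is a hypothetical helper, while tissue_general, disease, cell_type and is_primary_data are standard Census obs columns. Filtering on is_primary_data == True avoids counting the same cell twice when it appears in more than one dataset; the values in the usage note are examples, so check the Census schema for exact terms.

```python
def census_filter(**conditions) -> str:
    """Join column conditions with 'and'; list/tuple values become 'in [...]' tests."""
    clauses = []
    for column, value in conditions.items():
        if isinstance(value, (list, tuple)):
            # repr() yields a valid literal list, e.g. ['neuron', 'astrocyte']
            clauses.append(f"{column} in {list(value)!r}")
        else:
            # repr() quotes strings and leaves True/False and numbers bare
            clauses.append(f"{column} == {value!r}")
    return " and ".join(clauses)
```

Usage: cellxgene_census.get_anndata(census, organism="Mus musculus", obs_value_filter=census_filter(tissue_general="brain", is_primary_data=True)).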

BICCN — Mouse Motor Cortex Multi-Modal Reference

The BRAIN Initiative Cell Census Network's 2021 mouse primary motor cortex studies (including Yao et al., 2021, Nature) integrated many modalities (scRNA-seq, snRNA-seq, snATAC-seq, DNA methylation, spatial transcriptomics, Patch-seq, and more). The data are in a public S3 bucket:

# Browse what's there
aws s3 ls --no-sign-request s3://nemo-public/biccn/grant/u19_zeng/zeng/

Or via the data portal at https://biccn.org/data — typically download h5ad files directly:

import anndata as ad

# Placeholder file name: use the path of the h5ad you downloaded from the portal
adata = ad.read_h5ad('biccn_mouse_motor_cortex_10x.h5ad')
print(adata)

For specific BICCN multimodal integration analyses (e.g., joint scRNA + scATAC), look for the published Seurat objects and analysis code linked from each paper; many Allen-led repositories live under https://github.com/AllenInstitute.

Common Pitfalls

1. Taxonomy version mismatch

Allen's cell type taxonomy updates roughly annually. A cluster labeled "Pvalb_1" in 2022 may not be the same as "Pvalb_1" in 2025. Always record the taxonomy version alongside your analysis (taxonomy_id or release_date in the metadata).
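A minimal sketch of recording that provenance next to your results, using only the standard library. The record_provenance helper and its field names are hypothetical; pass in whatever release identifier you actually use (for example, the manifest name your cache object reports) and a label for the taxonomy version.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path


def package_version(name: str) -> str:
    """Installed version of a package, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def record_provenance(out_dir, manifest, taxonomy_label,
                      packages=("abc_atlas_access", "anndata", "scanpy")):
    """Write provenance.json into out_dir and return the recorded dict."""
    info = {
        "written_utc": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "abc_manifest": manifest,          # release/manifest identifier you pinned
        "taxonomy": taxonomy_label,        # taxonomy version label you analysed against
        "packages": {p: package_version(p) for p in packages},
    }
    path = Path(out_dir) / "provenance.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(info, indent=2))
    return info
```

Committing provenance.json alongside figures makes it obvious later which "Pvalb_1" a result refers to.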

2. Region name versus region ID

Allen uses three coexisting conventions:

  • Acronym: MOp (primary motor cortex)
  • Full name: Primary motor area
  • Numeric ID: 985

Which of these a given ABC Atlas table carries varies (the per-cell 10x metadata, for example, uses region_of_interest_acronym), so when cross-referencing with external datasets you may need to translate between them.
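One way to translate is to build a lookup from a structure table with id, acronym and name columns, for example one exported from Allen's structure ontology. A sketch under that assumption; region_lookup and to_acronym are hypothetical helpers. Only the MOp entry (985, "Primary motor area") comes from the text above, so build the full table from the ontology rather than by hand.

```python
import pandas as pd


def region_lookup(structures: pd.DataFrame) -> dict:
    """Map every numeric id, acronym and lower-cased full name to the acronym."""
    lookup = {}
    for row in structures.itertuples(index=False):
        lookup[int(row.id)] = row.acronym
        lookup[row.acronym] = row.acronym
        lookup[row.name.lower()] = row.acronym
    return lookup


def to_acronym(value, lookup: dict) -> str:
    """Normalize an id (int or numeric string), acronym, or full name; KeyError if unknown."""
    if isinstance(value, str):
        text = value.strip()
        if text.isdigit():
            return lookup[int(text)]
        if text in lookup:          # exact acronym match keeps case (MOp vs MOP)
            return lookup[text]
        return lookup[text.lower()]  # full names are matched case-insensitively
    return lookup[int(value)]
```

Usage: lookup = region_lookup(pd.DataFrame({"id": [985], "acronym": ["MOp"], "name": ["Primary motor area"]})), then to_acronym("Primary motor area", lookup) and to_acronym(985, lookup) both give "MOp".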

3. Coordinates — CCFv3 vs other spaces

Spatial data use the Common Coordinate Framework v3 (CCFv3) for mouse. Older datasets may use CCFv2 or dataset-specific (e.g., section-based) coordinates. When integrating multimodal data, verify which space each dataset uses (check its metadata or documentation for a CCF version field such as ccf_version).

4. File sizes

Don't call cache.get_data_path() on everything: it downloads each file it resolves, and the full ABC Atlas is >1 TB. List a directory's files and check its total size before downloading, and filter by region via the metadata table first.

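# Total size of a directory's data files (nothing is downloaded)
print(cache.get_directory_data_size('WMB-10Xv3'))
# File names only; pick the ones covering your region
for name in cache.list_data_files('WMB-10Xv3'):
    print(f"  {name}")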

5. Read-only S3 access

The ABC Atlas S3 bucket is read-only and public — no AWS credentials needed. If you see authentication errors, you're probably using aws s3 cp without --no-sign-request. The abc_atlas_access package handles this automatically.

6. Memory

A full region's snRNA-seq h5ad can be 5-15 GB in memory. For analyses on a laptop, subset by cell type or use sparse representations and chunked reading:

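# Open in backed mode: obs/var load into memory, X stays on disk
adata = ad.read_h5ad(expression_file, backed='r')
# Select cells via the joined metadata (the h5ad obs may lack taxonomy columns);
# WMB subclass names may carry numeric prefixes/suffixes, so match a substring
wanted = cell_metadata.loc[cell_metadata['subclass'].str.contains('L2/3 IT', regex=False), 'cell_label']
# Load only that subset into memory
subset = adata[adata.obs_names.isin(wanted)].to_memory()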
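Memory cost can be estimated before loading: a dense float32 matrix costs n_obs × n_vars × 4 bytes, while a CSR sparse matrix costs roughly nnz × (value bytes + column-index bytes) plus one row pointer per cell. A sketch for that arithmetic; the 4-byte values and indices and 8-byte row pointers are assumptions (matrices stored with int64 indices or float64 values cost more), and the 5% density in the usage note is a hypothetical figure, not a measured one.

```python
def dense_gb(n_obs: int, n_vars: int, value_bytes: int = 4) -> float:
    """Size in GB of a dense n_obs x n_vars matrix."""
    return n_obs * n_vars * value_bytes / 1e9


def csr_gb(n_obs: int, n_vars: int, density: float,
           value_bytes: int = 4, index_bytes: int = 4, indptr_bytes: int = 8) -> float:
    """Approximate size in GB of a CSR matrix with the given fraction of nonzeros."""
    nnz = int(round(n_obs * n_vars * density))
    # data + column indices per nonzero, plus n_obs + 1 row pointers
    total = nnz * (value_bytes + index_bytes) + (n_obs + 1) * indptr_bytes
    return total / 1e9
```

For a hypothetical 300,000 × 32,000 matrix with 5% nonzero entries, dense_gb gives 38.4 GB while csr_gb gives about 3.84 GB, which is why keeping X sparse and subsetting early matters on a laptop.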

Worked Example — Compare Cortex vs Striatum Cell Type Composition

A common analysis: what cell types differ in proportion between two regions?

# Striatum may be split into dorsal/ventral ROIs; list the ROI names and counts first
print(cell_metadata['region_of_interest_acronym'].value_counts())

# Map ROI acronyms onto the two regions compared (adjust to the names printed above)
roi_to_region = {'MOp': 'MOp', 'STRd': 'STR', 'STRv': 'STR'}
sel = cell_metadata[cell_metadata['region_of_interest_acronym'].isin(list(roi_to_region))].copy()
sel['region'] = sel['region_of_interest_acronym'].map(roi_to_region)

# Cell type proportions per region (at subclass level)
ct_props = sel.groupby(['region', 'subclass'], observed=True).size().reset_index(name='n')
ct_props['prop'] = ct_props['n'] / ct_props.groupby('region')['n'].transform('sum')

# Wide format for easy comparison
wide = ct_props.pivot(index='subclass', columns='region', values='prop').fillna(0)
wide['diff'] = wide.get('MOp', 0) - wide.get('STR', 0)
print(wide.sort_values('diff').head(20))                   # striatum-enriched
print(wide.sort_values('diff', ascending=False).head(20))  # cortex-enriched

Expected result: striatum is dominated by GABAergic medium spiny neurons (the D1- and D2-type MSN subclasses, labeled along the lines of STR D1 Gaba and STR D2 Gaba in the WMB taxonomy), while motor cortex shows the excitatory L2/3 IT, L5 ET and L6 CT hierarchy.
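The difference in proportions favours abundant types: a rare subclass that is ten times more common in one region barely moves diff. A log2 ratio with a small pseudocount ranks relative enrichment instead. A sketch on the wide table; log2_enrichment is a hypothetical helper and the default pseudocount is an arbitrary choice that sets how much weight near-zero proportions get.

```python
import numpy as np
import pandas as pd


def log2_enrichment(wide: pd.DataFrame, a: str, b: str, pseudo: float = 1e-4) -> pd.Series:
    """log2((p_a + pseudo) / (p_b + pseudo)) per cell type, most a-enriched first."""
    ratio = (wide[a] + pseudo) / (wide[b] + pseudo)
    return np.log2(ratio).sort_values(ascending=False)
```

Positive values mean enriched in region a; running log2_enrichment(wide, 'MOp', 'STR') next to the diff column shows both the abundant and the rare-but-specific subclasses.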

When to Use What

Goal | Best resource
Newest comprehensive mouse brain atlas | ABC Atlas (WMB-10Xv3)
Specific human brain region from Allen | ABC Atlas (WHB-10X) or CELLxGENE
Combine Allen with other public datasets in one analysis | CELLxGENE Census
Patch-seq (electrophysiology + transcriptomics) | Legacy Cell Types REST API
BICCN multimodal mouse motor cortex | BICCN portal direct download
Just browsing cell types interactively | Allen Cell Types web UI

FAQ

Q: Do I need an Allen account or AWS credentials? Not for downloading data: the S3 bucket is public, so use --no-sign-request with the AWS CLI. An account is at most optional for some older AllenSDK tools.

Q: How current is the data? ABC Atlas releases roughly twice a year. Check cache.current_manifest for the release you're using and pin it in your scripts for reproducibility (cache.load_manifest() accepts a specific manifest name).

Q: Can I use this for human brain data? Yes: WHB-10X (Whole Human Brain) datasets are growing. Note that human datasets are generally smaller, because tissue access and ethical sampling constraints limit coverage, and more recent, so verify what's in the current release.

Q: Allen vs HCA Brain — what's the difference? HCA (Human Cell Atlas) Brain is a separate consortium overlapping with Allen contributions. Many datasets are in both via CELLxGENE. Allen's strength is standardized taxonomy at scale; HCA's strength is multi-tissue integration across the body.

Q: My internet is slow. Can I just use the cell counts/proportions metadata without downloading expression? Yes. The cell_metadata.csv file is small (~500 MB compressed) and, joined with the small WMB-taxonomy tables, gives taxonomy assignments, region, neurotransmitter type, etc. You can do composition analyses, marker enrichment lookups, and many comparisons without ever loading the expression matrices.

Q: Is there an R interface? AllenSDK and abc_atlas_access are Python packages. In R, use the cellxgene.census package (native R bindings), or download h5ad files and load them with zellkonverter::readH5AD() into a SingleCellExperiment, which Seurat can convert.

Q: Can I publish results based on ABC Atlas data? Yes. The data are openly licensed (CC BY at the time of writing; confirm the license for the release you use). Cite the appropriate Allen Institute publications and the ABC Atlas portal in your Methods, including the release you used (the data has a DOI per release). Allen explicitly encourages reuse.

Q: My favorite cell type is missing from the atlas. What should I do? Check the latest release first (taxonomy expansions are frequent). If it is genuinely missing, this may be a real gap; if you have data for it, consider depositing them in a public repository such as CELLxGENE so future reference atlases can incorporate them.

Closing — The Workflow

For most analysts using Allen data programmatically in 2026:

  1. Install abc_atlas_access and pin the version
  2. Pull cell_metadata first — work out which region/cell types you care about before downloading expression
  3. Download targeted h5ad files for those regions
  4. Load with Scanpy and work normally
  5. Use CELLxGENE Census when integrating Allen with non-Allen datasets

The web UI is for browsing; the API is for everything else. Once you've done this loop once, scaling to "compare 10 cortical regions across 4 mouse ages" becomes a 30-line script instead of a manual click-fest.

