Introduction

This database website hosts a pan-cancer single-cell atlas comprising 369 primary and 302 metastatic tumors across 18 cancer types and diverse metastatic anatomical sites.

Data collection and processing

our database...

Usage

Browse with a cancer type

Users interested in a particular cancer type can click the corresponding card on either side of the homepage to open the page that displays the related data.

Home picture

On the Gene Expression page, users can select a lineage and an annotation method to view the corresponding UMAP diagram.

Gene Expression

The top differentially expressed genes between primary and metastatic tumors, across cell subtypes, are displayed below the UMAP diagram.

gene_exp_table

Users can also view the GISTIC results for each of the 18 cancer types on the CNV & GISTIC page.

gistic

Methods

Single cell RNA-seq data collection

Publicly available single-cell RNA-seq datasets relevant to tumor metastasis were systematically collected. Datasets were identified through keyword-based searches using terms such as “scRNA” and “metastasis”, and were further curated manually by reviewing relevant publications to ensure inclusion of high-quality and biologically relevant data. Data inclusion and exclusion criteria were applied at both the dataset and sample levels. Specifically, datasets derived from sorted samples or fluid samples (e.g., ascites) were excluded. Raw count files for the remaining eligible samples were downloaded, and individual samples were split and processed into AnnData (h5ad) format for downstream analysis. The detailed metadata were retrieved from the
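The sample-level exclusion step described above can be sketched roughly as follows. This is only an illustration, not the pipeline actually used: the `Sample` record, its field names, and the example identifiers are all hypothetical, and the only rule taken from the text is that sorted samples and fluid samples (e.g., ascites) are excluded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical per-sample metadata record (field names are assumptions)."""
    sample_id: str
    dataset_id: str
    tumor_site: str   # "primary" or "metastatic"
    is_sorted: bool   # cells were enriched or sorted before sequencing
    is_fluid: bool    # fluid sample such as ascites


def is_eligible(sample: Sample) -> bool:
    """Keep a sample only if it is neither sorted nor a fluid sample."""
    return not (sample.is_sorted or sample.is_fluid)


def filter_samples(samples: list[Sample]) -> list[Sample]:
    """Apply the exclusion rule to every sample, preserving input order."""
    return [s for s in samples if is_eligible(s)]


# Made-up example records, for illustration only.
samples = [
    Sample("S1", "D1", "primary", is_sorted=False, is_fluid=False),
    Sample("S2", "D1", "metastatic", is_sorted=True, is_fluid=False),
    Sample("S3", "D2", "metastatic", is_sorted=False, is_fluid=True),
    Sample("S4", "D2", "metastatic", is_sorted=False, is_fluid=False),
]

eligible = filter_samples(samples)
print([s.sample_id for s in eligible])
```

In this sketch only the unsorted, non-fluid samples remain; the eligible samples would then be the ones converted to AnnData (h5ad) files.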

Data processing

data_procession

GISTIC

To identify recurrent somatic copy number alterations (SCNAs), GISTIC [39] analysis was performed separately for each cancer type. Malignant cells were first selected as described above. Within each sample, malignant cells were aggregated into a pseudobulk profile. GISTIC was run with the following parameters: -genegistic 1 (gene-level analysis), -ta 0.01, -td 0.01, -js 5, -broad 1 (include arm-level events), -conf 0.9 (confidence level), and -savegene 1 (save gene-level results), enabling detection of both arm-l

gistic-process