Introduction
This website hosts a pan-cancer single-cell atlas comprising 369 primary and 302 metastatic tumors across 18 cancer types and diverse metastatic anatomical sites.
Data collection and processing
our database...
Usage
Browse by cancer type
To explore a particular cancer type, click its card on either side of the homepage to open the page that displays the related data.
On the Gene Expression page, users can select a lineage and an annotation method to view the corresponding UMAP diagram.
The top differentially expressed genes between primary and metastatic tumors, across cell subtypes, are displayed below the UMAP diagram.
Users can also view the GISTIC results for each of the 18 cancer types on the CNV & GISTIC page.
Methods
Single-cell RNA-seq data collection
Publicly available single-cell RNA-seq datasets relevant to tumor metastasis were systematically collected. Datasets were identified through keyword-based searches using terms such as “scRNA” and “metastasis”, and were further curated manually by reviewing relevant publications to ensure inclusion of high-quality and biologically relevant data. Data inclusion and exclusion criteria were applied at both the dataset and sample levels. Specifically, datasets derived from sorted samples or fluid samples (e.g., ascites) were excluded. Raw count files for the remaining eligible samples were downloaded, and individual samples were split and processed into AnnData (h5ad) format for downstream analysis. The detailed metadata were retrieved from the
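The sample-level exclusion rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record fields (`sample_id`, `is_sorted`, `specimen_type`) and the set of fluid specimen labels are hypothetical and are not taken from the database's actual metadata schema. Only "ascites" is named in the text as a fluid sample type.

```python
from dataclasses import dataclass

# Hypothetical labels for fluid specimens; the text names only ascites.
FLUID_SPECIMENS = {"ascites"}


@dataclass(frozen=True)
class SampleRecord:
    sample_id: str      # hypothetical sample identifier
    is_sorted: bool     # True if cells were sorted before sequencing
    specimen_type: str  # hypothetical label, e.g. "tissue" or "ascites"


def is_eligible(sample: SampleRecord) -> bool:
    """Return True if the sample is neither sorted nor a fluid specimen."""
    if sample.is_sorted:
        return False
    return sample.specimen_type.lower() not in FLUID_SPECIMENS


def filter_samples(samples):
    """Keep only the samples that pass the exclusion criteria."""
    return [s for s in samples if is_eligible(s)]


# Made-up records, used only to exercise the filter.
example_samples = [
    SampleRecord("S1", False, "tissue"),
    SampleRecord("S2", True, "tissue"),
    SampleRecord("S3", False, "Ascites"),
]
eligible_ids = [s.sample_id for s in filter_samples(example_samples)]
```

In this sketch the comparison is case-insensitive so that inconsistently capitalized specimen labels are treated alike; dataset-level criteria would need a separate check.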
Data processing
GISTIC
To identify recurrent somatic copy number alterations (SCNAs), GISTIC [39] analysis was performed separately for each cancer type. Malignant cells were first selected as described above. Within each sample, malignant cells were aggregated into pseudobulk profiles. GISTIC was run with the following parameters: -genegistic 1 (gene-level analysis), -ta 0.01, -td 0.01, -js 5, -broad 1 (include arm-level events), -conf 0.9 (confidence level), and -savegene 1 (save gene-level results), enabling detection of both arm-l