Identifying novel cell subtypes is crucial for revealing the pathogenesis and heterogeneity of disease, and can inform the development of therapeutic strategies. Although numerous cell-type identification methods exist, they rely heavily on references with fixed cell labels and therefore fail to uncover new cell subtypes marked by phenotypic molecules within a specific disease context. To fill this gap, we propose Subtypist, a reference-free annotation method that uses an ensemble-based strategy to identify disease-associated cell subtypes expressing phenotypic features.
- Install dependent packages: devtools
> install.packages(pkgs = 'devtools')
- Then install Subtypist
> devtools::install_github('ZJUFanLab/Subtypist')
# or download the repository as ZIP
> devtools::install_local("/path/to/Subtypist-main.zip")
- Loading or subsetting a specific cell type
# Load a Seurat object from a full dataset and subset specific cell types
> load("FullObject.RData")  # base R load(); assumes the file stores an object named FullObject
> Seu <- subset(FullObject, idents = c("T cells"))
# Alternatively, load a pre-processed Seurat object with annotated cell types, derived by subsetting the full dataset
> # Seu <- readRDS("Seu.rds")
- Cell type identification using an ensemble strategy without reference
# object: a Seurat object, loaded or subset to a specific cell type
# min.resolution: the minimum value of resolution
# max.resolution: the maximum value of resolution
# use.assay: Name of assay to use
# cluster.assay: Name of the assay in the Seurat object to use for clustering
> result <- Subtypist_merge(object=Seu,min.resolution=0.3,max.resolution=1.5,by=0.1,use.assay="RNA",cluster_assay = "RNA")
# Show results
> print(result)
$Object
An object of class Seurat
1000 features across 1000 samples within 1 assay
Active assay: RNA (1000 features, 985 variable features)
 3 dimensional reductions calculated: pca, umap, tsne

$result.table
   resolution merge_cluster initial_cluster       molecular_phenotype     Score
1         0.4             0               0   Gene7, Gene679, Gene990 2.0322883
2         0.4             1               1 Gene570, Gene807, Gene470 0.6984284
3         0.4             2               2 Gene871, Gene559, Gene247 3.3095874
4         0.4             3               3 Gene474, Gene746, Gene790 3.5678922
5         0.4             4               4 Gene776, Gene507, Gene323 2.9614237
6         1.1             0               0   Gene7, Gene679, Gene990 2.0341814
7         1.1             1               1 Gene871, Gene559, Gene247 3.3095874
8         1.1             2             2,5 Gene470, Gene807, Gene243 0.0000000
9         1.1             3               3 Gene474, Gene746, Gene577 3.4186799
10        1.1             4             4,7 Gene807, Gene801, Gene566 0.0000000
11        1.1             5               6 Gene776, Gene507, Gene323 2.9614237
- Evaluating clustering resolutions and annotating subtypes with specific phenotypic markers
# Evaluate and rank clustering resolutions based on their corresponding subtype identification results
> sortScore(result$result.table)
## resolution = 0.4: highest
# A tibble: 5 × 2
  resolution value
       <dbl> <dbl>
1        0.1  1.62
2        0.2  1.85
3        0.4  2.51
4        1.1  1.95
5        1.2  1.81
> # Add the result to the object
> Seu <- AddSubtypist(result$Object,result.table=result$result.table,prefix='Subtypist')
> # To assign more specific phenotypic molecules to each subtype,
> # the `select_index` parameter can be used to specify which gene to select
> Seu <- Subtypist::AddSubtypist(result$Object,resolution=c(0.4),result.table=result$result.table,prefix = 'Subtypist',meta.prefix = 'phenotypic molecules_',value.suffix='+ B',select_index=c('0'=1,'1'=1,'2'=1,'3'=2,'4'=3))
> print(unique(Seu@meta.data[['phenotypic molecules_0.4']]))
[1] "Gene570+ B" "Gene7+ B" "Gene746+ B" "Gene871+ B" "Gene323+ B"
- Visualize subtype-level distributions in a dimensionality-reduction space (e.g., UMAP) using Subtypist_Dimplot(). This function overlays the specified phenotypic molecule annotations, derived at the selected clustering resolutions, onto the Seurat object. For example:
> p <- Subtypist::Subtypist_Dimplot(Seu,result.table = result$result.table,resolution = c(0.4,1.1), show = "molecular_phenotype_",prefix = 'Subtypist')

Subtypist was developed by Yue Yao. If you have any questions, please contact Yue Yao at yuey@zju.edu.cn
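
As a cross-check on the example outputs shown earlier, the sketch below uses base R only, on values copied by hand from the printed `result.table`. It rests on two assumptions inferred from the example and not from the Subtypist source: that the `value` reported by `sortScore` is consistent with the mean `Score` per resolution, and that `select_index` picks the n-th gene of a cluster's `molecular_phenotype` list. The names `tbl`, `mean_score`, `picked`, and `labels` are illustrative only.

```r
# Rebuild the example result.table by hand (values copied from the printed output).
tbl <- data.frame(
  resolution = c(rep(0.4, 5), rep(1.1, 6)),
  merge_cluster = c(0:4, 0:5),
  molecular_phenotype = c(
    "Gene7, Gene679, Gene990", "Gene570, Gene807, Gene470",
    "Gene871, Gene559, Gene247", "Gene474, Gene746, Gene790",
    "Gene776, Gene507, Gene323", "Gene7, Gene679, Gene990",
    "Gene871, Gene559, Gene247", "Gene470, Gene807, Gene243",
    "Gene474, Gene746, Gene577", "Gene807, Gene801, Gene566",
    "Gene776, Gene507, Gene323"
  ),
  Score = c(2.0322883, 0.6984284, 3.3095874, 3.5678922, 2.9614237,
            2.0341814, 3.3095874, 0, 3.4186799, 0, 2.9614237),
  stringsAsFactors = FALSE
)

# 1) Mean Score per resolution (an assumed summary, not taken from sortScore itself).
mean_score <- aggregate(Score ~ resolution, data = tbl, FUN = mean)
mean_score$Score <- round(mean_score$Score, 2)
print(mean_score)

# 2) Pick one gene per cluster at resolution 0.4, mirroring the select_index
#    mapping from the AddSubtypist example (cluster id -> position in the gene list).
select_index <- c("0" = 1, "1" = 1, "2" = 1, "3" = 2, "4" = 3)
res04 <- tbl[tbl$resolution == 0.4, ]
genes <- strsplit(res04$molecular_phenotype, ", ", fixed = TRUE)
picked <- mapply(function(g, i) g[i], genes,
                 select_index[as.character(res04$merge_cluster)])
labels <- paste0(picked, "+ B")
print(labels)
```

On these values the per-resolution means round to 2.51 (resolution 0.4) and 1.95 (resolution 1.1), matching the corresponding rows of the `sortScore` output, and the picked labels are the same five values printed from the metadata column, listed here in cluster order.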
