About scATAnno

scATAnno is a Python-based workflow for annotating cell types in single-cell ATAC-seq (scATAC-seq) data using scATAC-seq reference atlases. It addresses three challenges of scATAC-seq data: sparsity, the high dimensionality of epigenomic features, and the lack of marker enhancers for annotation. Unlike methods that first convert accessibility into gene activity scores, scATAnno uses peaks, or cis-regulatory element (CRE) genomic regions, directly as input features. It handles the high dimensionality by applying spectral embedding to project the data into a low-dimensional space. To create catalogs of CREs at the single-cell level, scATAnno uses the chromatin state profiles of large-scale reference atlases to generate peak signals and reference peaks.
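
To make the spectral embedding step more concrete, here is a minimal NumPy sketch of the general technique on a tiny binary cell-by-peak matrix. It is an illustration only, not scATAnno's implementation: the function name `spectral_embed`, the choice of Jaccard similarity, and the toy matrix are all assumptions made for this example. A real atlas would need sparse matrices and a scalable eigensolver.

```python
import numpy as np

def spectral_embed(X, n_components=2):
    """Toy spectral embedding of cells from a binary cell-by-peak matrix."""
    X = np.asarray(X, dtype=float)
    # Jaccard similarity between every pair of cells:
    # shared open peaks divided by peaks open in either cell.
    inter = X @ X.T
    n_open = X.sum(axis=1)
    union = n_open[:, None] + n_open[None, :] - inter
    S = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    np.fill_diagonal(S, 0.0)  # ignore self-similarity
    # Symmetrically normalized similarity matrix D^{-1/2} S D^{-1/2}.
    deg = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    A = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the largest ones
    # and skip the first, trivial eigenvector.
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    return vecs[:, order[1:n_components + 1]]

# Hypothetical toy data: rows are cells, columns are peaks (1 = accessible).
# Cells 0-2 share peaks 0-2, cells 3-5 share peaks 3-5, and peak 6 is open everywhere.
X = np.array([
    [1, 1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 1, 1],
])
embedding = spectral_embed(X, n_components=2)
print(embedding.shape)  # (6, 2)
```

In this toy case the first embedding coordinate has one sign for cells 0-2 and the opposite sign for cells 3-5, so the two groups separate in the low-dimensional space without any gene activity conversion.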

By integrating query data with reference atlases in an unsupervised manner, scATAnno enables accurate cell type annotation. It provides uncertainty scores for assessing confidence in each cell-type assignment, including a K-nearest neighbor (KNN) uncertainty score and a weighted distance-based uncertainty score in the low-dimensional space. The workflow has been demonstrated in multiple case studies, including peripheral blood mononuclear cells (PBMC), triple-negative breast cancer (TNBC), and basal cell carcinoma (BCC), where it annotates cell types accurately and identifies unknown cell populations not represented in the reference atlas. scATAnno also supports cell type annotation across tumor conditions: for example, it accurately annotates tumor-infiltrating lymphocytes (TILs) in TNBC using a BCC TIL atlas.
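
The idea behind a KNN uncertainty score can be sketched as follows. This is a hedged illustration, not scATAnno's actual formula: it assumes that uncertainty is defined as one minus the fraction of the k nearest reference cells that agree with the majority label, and the function name `knn_annotate`, the coordinates, and the labels are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_annotate(ref_emb, ref_labels, query_emb, k=3):
    """Label query cells by majority vote of their k nearest reference cells.

    Returns the predicted labels and an uncertainty score per query cell,
    assumed here to be 1 - (votes for the winning label) / k.
    """
    ref_emb = np.asarray(ref_emb, dtype=float)
    query_emb = np.asarray(query_emb, dtype=float)
    ref_labels = np.asarray(ref_labels)
    # Euclidean distances in the embedding, shape (n_query, n_reference).
    dists = np.linalg.norm(query_emb[:, None, :] - ref_emb[None, :, :], axis=2)
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    labels, scores = [], []
    for idx in nn_idx:
        label, votes = Counter(ref_labels[idx].tolist()).most_common(1)[0]
        labels.append(label)
        scores.append(1.0 - votes / k)
    return labels, np.array(scores)

# Hypothetical 2-D embedding coordinates for reference and query cells.
ref_emb = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
ref_labels = ["T cell"] * 4 + ["B cell"] * 4
query_emb = [[0.5, 0.5], [5.5, 5.5], [3.1, 3.1]]

labels, uncertainty = knn_annotate(ref_emb, ref_labels, query_emb, k=3)
print(labels)
print(uncertainty)
```

The first two query cells sit inside a reference cluster and receive an uncertainty of 0, while the third lies between clusters and receives a higher score. Under this assumed definition, cells whose score exceeds a chosen threshold could be flagged as candidates for an unknown population rather than forced into a reference cell type.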