Pipeline Overview =========================== The conceptual idea and schematic of scATAnno is illustrated here. .. image:: _static/img/2.workflow_details-MainFigure1.png :align: center :width: 600 :alt: Workflow of scATAC Annotation Original Input ------------------ The following files are needed to run *Celltype Annotation* on your own experiment: - *fragments.tsv.gz* fragment file for each scATAC data - *barcodes.tsv* cell barcodes for each scATAC data - *reference.bed* reference peaks with chromosome regions from the selected reference atlas - Optionally: *UMAP* or *tSNE* projection coordinates and *Cluster* cluster numbers of cells can be provided by users Currently, this package only supports hg38 reference mapping Intermediate Output -------------------- The following files are intermediate outputs of *scATAnno* in order to generate a peak-by-cell matrix for query data: - *matrix.mtx* Sparse matrix files with fragment reads - *features.tsv* Reference peaks/cis-Regulatory Elements - *barcodes.tsv* Cell barcodes of high quality cells Final Output -------------------- The following files are final outputs of *scATAnno* using the annotation tool: - *1.Merged_query_reference.h5ad* Anndata of integrated query and reference cells - *X_spectral_harmony.csv* Harmozied spectral embeddings of integrated data - *query.h5ad* Anndata of query cells which stores annotation results. This AnnData should include essential prediction results in AnnData.obs - column *cluster_annotation* stores cell type assignment at cluster-level - column *uncertainty_score* stores final uncertainty score, which takes the maximum of KNN-based uncertainty and weighted distance-based uncertainty of query cells