scAVENGERS cluster: demultiplexing scATAC-seq data
Given reference and alternate allele count matrices generated by vartrix in mtx format, scAVENGERS cluster assigns each cell barcode to a donor in an unsupervised manner.
Usage
scAVENGERS cluster writes cluster assignments in a tab-separated format.
Unless the -o option is specified, the results are written to stdout. To save the demultiplexing results to a file, use either of the following:
# Strategy 1: specifying -o option
scAVENGERS cluster -r ref.mtx -a alt.mtx -b barcodes.txt -o clusters_tmp.tsv
# Strategy 2: redirecting the output
scAVENGERS cluster -r ref.mtx -a alt.mtx -b barcodes.txt > clusters_tmp.tsv
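Before running either command, it can help to confirm that the three inputs agree with each other. The following is a minimal sketch, not part of scAVENGERS: the function names are hypothetical, it reads only the size line of each Matrix Market file, and it assumes that matrix columns correspond to cell barcodes (the usual vartrix layout, but worth verifying on your own data).

```python
def read_mtx_shape(path):
    """Return (n_rows, n_cols, n_entries) from a Matrix Market coordinate file."""
    with open(path) as handle:
        for line in handle:
            # Lines starting with '%' are the banner or comments; skip them.
            if line.startswith("%") or not line.strip():
                continue
            rows, cols, entries = (int(field) for field in line.split()[:3])
            return rows, cols, entries
    raise ValueError(f"no size line found in {path}")


def count_barcodes(path):
    """Count non-empty lines in a line-separated barcode file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def check_inputs(ref_path, alt_path, barcodes_path):
    """Raise ValueError if the matrices and the barcode list are inconsistent."""
    ref_rows, ref_cols, _ = read_mtx_shape(ref_path)
    alt_rows, alt_cols, _ = read_mtx_shape(alt_path)
    if (ref_rows, ref_cols) != (alt_rows, alt_cols):
        raise ValueError("ref and alt matrices have different shapes")
    n_barcodes = count_barcodes(barcodes_path)
    # Assumption: one matrix column per cell barcode.
    if n_barcodes != ref_cols:
        raise ValueError(
            f"{n_barcodes} barcodes but {ref_cols} matrix columns"
        )
    return ref_rows, ref_cols


# Example call (file names taken from the commands above):
# n_variants, n_cells = check_inputs("ref.mtx", "alt.mtx", "barcodes.txt")
```

If the check fails, the most likely causes are matrices produced from different vartrix runs or a barcode file that was filtered after the matrices were generated.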
Because scAVENGERS cluster does not perform doublet detection, doublet barcodes must be excluded either before or after running it. Note that the output of scAVENGERS cluster is compatible with troublet from the souporcell pipeline, so you can use troublet to detect doublets after demultiplexing.
# Strategy 1: Filtering out doublet barcodes after running doublet detection tools
LC_ALL=C grep -F -f "$SINGLET_BARCODES" clusters_tmp.tsv > clusters.tsv
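Note that grep -F matches each barcode as a fixed string anywhere in a line, and it drops any line that contains no barcode, including a header line if one is present. Where an exact match on the barcode field is preferred, the filter can be sketched in Python as below. This is an illustration only: filter_singlets is a hypothetical helper, and it assumes the barcode is the first tab-separated column of the output; whether the file has a header is left to the caller via has_header.

```python
def filter_singlets(cluster_lines, singlet_barcodes, has_header=False):
    """Keep only lines whose first tab-separated field is a singlet barcode.

    cluster_lines: iterable of lines from the cluster assignment file.
    singlet_barcodes: iterable of barcodes to keep.
    has_header: if True, the first line is kept unconditionally.
    """
    keep = set(singlet_barcodes)
    kept = []
    for index, line in enumerate(cluster_lines):
        if has_header and index == 0:
            kept.append(line)
            continue
        # Assumption: the barcode is the first tab-separated column.
        barcode = line.split("\t", 1)[0].strip()
        if barcode in keep:
            kept.append(line)
    return kept


# Example usage with the file names from the commands above:
# with open("singlets.txt") as handle:
#     singlets = [line.strip() for line in handle if line.strip()]
# with open("clusters_tmp.tsv") as handle:
#     kept = filter_singlets(handle, singlets)
# with open("clusters.tsv", "w") as handle:
#     handle.writelines(kept)
```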
# Strategy 2: Using troublet as doublet detection tool
$TROUBLET_DIR/troublet -r ref.mtx -a alt.mtx --clusters clusters_tmp.tsv > clusters.tsv
Parameters
scAVENGERS/scAVENGERS cluster --help
usage: cluster.py [-h] -r REF -a ALT [-v VCF] -b BARCODES -o OUTPUT -k CLUSTERS [--priors PRIORS [PRIORS ...]] [--ploidy PLOIDY] [--err_rate ERR_RATE]
                  [--stop_criterion STOP_CRITERION] [--max_iter MAX_ITER] [-t THREADS]

optional arguments:
  -h, --help            show this help message and exit
  -r REF, --ref REF     Reference allele count matrix in mtx format
  -a ALT, --alt ALT     Alternate allele count matrix in mtx format
  -v VCF, --vcf VCF     Vcf file
  -b BARCODES, --barcodes BARCODES
                        Line-seperated text file of barcode sequences
  -o OUTPUT, --output OUTPUT
                        Output directory.
  -k CLUSTERS, --clusters CLUSTERS
                        Number of donors.
  --priors PRIORS [PRIORS ...]
                        Number or proportion of cells in each genotype.
  --ploidy PLOIDY       Ploidy. Defaults to 2.
  --err_rate ERR_RATE   Baseline probability. DO NOT set this parameter zero, because it leads to log-zeros. Defaults to 0.001.
  --stop_criterion STOP_CRITERION
                        log likelihood change to define convergence for EM algorithm
  --max_iter MAX_ITER   number of maximum iterations for a temperature step
  -t THREADS, --threads THREADS
                        number of threads
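The warning on --err_rate can be illustrated with a small sketch. This is not the model implemented in scAVENGERS; it is a generic binomial allele-count likelihood, with hypothetical function names, in which the expected alternate-allele fraction for a genotype is assumed to be clipped to the interval [err_rate, 1 - err_rate]. It shows why a zero baseline leads to a logarithm of zero as soon as a cell carries a read that the genotype does not predict.

```python
import math


def alt_allele_prob(alt_copies, ploidy=2, err_rate=0.001):
    """Expected alternate-allele read fraction, bounded away from 0 and 1."""
    fraction = alt_copies / ploidy
    # Assumption: the baseline probability acts as a floor and ceiling.
    return min(max(fraction, err_rate), 1.0 - err_rate)


def log_likelihood(ref_count, alt_count, alt_copies, ploidy=2, err_rate=0.001):
    """Binomial log-likelihood of the counts, without the constant coefficient."""
    p = alt_allele_prob(alt_copies, ploidy, err_rate)
    return alt_count * math.log(p) + ref_count * math.log(1.0 - p)


# A homozygous-reference genotype (0 alternate copies) with one stray alt read:
finite_value = log_likelihood(ref_count=9, alt_count=1, alt_copies=0)

# With err_rate=0.0 the same call evaluates math.log(0.0) and raises ValueError:
# log_likelihood(ref_count=9, alt_count=1, alt_copies=0, err_rate=0.0)
```

With the default baseline, a single unexpected read only lowers the likelihood of that genotype; with a zero baseline it makes the likelihood undefined, which is the failure the help text warns about.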