The primary entrypoint for parsing and QC of the cell hashing count matrix.

ProcessCountMatrix(
  rawCountData = NA,
  minCountPerCell = 5,
  barcodeWhitelist = NULL,
  barcodeBlacklist = c("no_match", "total_reads", "unmapped"),
  cellbarcodeWhitelist = NULL,
  doPlot = TRUE,
  simplifyBarcodeNames = TRUE,
  saveOriginalCellBarcodeFile = NULL,
  metricsFile = NULL,
  minCellsToContinue = 25,
  datatypeName = NULL
)

Arguments

rawCountData

The input barcode count file or umi_count folder.

minCountPerCell

Cells (columns) will be dropped if their total count is less than this value.
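As an illustration of this rule, the sketch below drops every column whose column sum is below the threshold and keeps cells whose total is exactly equal to it. The helper filterCellsByCount is hypothetical and not part of the package; it assumes a plain numeric matrix with barcodes as rows and cells as columns.

```r
# Hypothetical helper (not cellhashR's implementation):
# drop cells (columns) whose total count is below minCountPerCell.
filterCellsByCount <- function(mat, minCountPerCell = 5) {
  totals <- colSums(mat)  # total count per cell
  mat[, totals >= minCountPerCell, drop = FALSE]
}

# Matrices fill column-wise: cell1 = (3, 1), cell2 = (2, 4), cell3 = (0, 5)
counts <- matrix(c(3, 1, 2, 4, 0, 5), nrow = 2,
                 dimnames = list(c("HTO-1", "HTO-2"), c("cell1", "cell2", "cell3")))
filtered <- filterCellsByCount(counts, minCountPerCell = 5)
print(colnames(filtered))  # cell1 (total 4) is dropped
```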

barcodeWhitelist

A vector of barcode names to retain.
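A minimal sketch of whitelist filtering, assuming barcodes are the row names of the count matrix and that NULL means no filtering (matching the default). The helper applyBarcodeWhitelist is hypothetical.

```r
# Hypothetical helper: keep only the barcode rows named in barcodeWhitelist.
applyBarcodeWhitelist <- function(mat, barcodeWhitelist = NULL) {
  if (is.null(barcodeWhitelist)) return(mat)  # NULL: keep all barcodes
  mat[rownames(mat) %in% barcodeWhitelist, , drop = FALSE]
}

counts <- matrix(1:6, nrow = 3,
                 dimnames = list(c("HTO-1", "HTO-2", "HTO-3"), c("cellA", "cellB")))
kept <- applyBarcodeWhitelist(counts, c("HTO-1", "HTO-3"))
print(rownames(kept))
```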

barcodeBlacklist

A vector of barcode names to discard. For example, if the input library was generated with both CITE-seq and cell hashing, it may make sense to discard the CITE-seq markers.
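The CITE-seq case above can be sketched by extending the default blacklist with the marker names to discard. The helper applyBarcodeBlacklist and the marker name "CD4-ADT" are hypothetical and are used here only for illustration.

```r
# Hypothetical helper: remove any barcode row whose name is in barcodeBlacklist.
applyBarcodeBlacklist <- function(mat,
                                  barcodeBlacklist = c("no_match", "total_reads", "unmapped")) {
  mat[!(rownames(mat) %in% barcodeBlacklist), , drop = FALSE]
}

counts <- matrix(c(10, 2, 7, 50, 3, 8, 1, 40), nrow = 4,
                 dimnames = list(c("HTO-1", "HTO-2", "CD4-ADT", "unmapped"),
                                 c("cellA", "cellB")))
# Keep the default entries and also drop a (hypothetical) CITE-seq marker
blacklist <- c("no_match", "total_reads", "unmapped", "CD4-ADT")
hashingOnly <- applyBarcodeBlacklist(counts, blacklist)
print(rownames(hashingOnly))
```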

cellbarcodeWhitelist

If provided, the raw count matrix will be subset to include only these cells. This allows one to use the cellranger unfiltered matrix as input while filtering to target cells, such as those with GEX data. This can be either a character vector of barcodes or a file with one cell barcode per line.
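A sketch of accepting either form, assuming that a single string naming an existing file is read as one barcode per line and anything else is treated as the barcodes themselves. The helpers readCellWhitelist and subsetToCells are hypothetical; the package may distinguish the two forms differently.

```r
# Hypothetical helpers: resolve the whitelist, then subset cells (columns).
readCellWhitelist <- function(cellbarcodeWhitelist) {
  # A single string that names an existing file is read as one barcode per line
  if (length(cellbarcodeWhitelist) == 1 && file.exists(cellbarcodeWhitelist)) {
    return(readLines(cellbarcodeWhitelist))
  }
  cellbarcodeWhitelist  # otherwise assume a character vector of barcodes
}

subsetToCells <- function(mat, cellbarcodeWhitelist = NULL) {
  if (is.null(cellbarcodeWhitelist)) return(mat)
  cells <- readCellWhitelist(cellbarcodeWhitelist)
  mat[, colnames(mat) %in% cells, drop = FALSE]
}

counts <- matrix(1:8, nrow = 2,
                 dimnames = list(c("HTO-1", "HTO-2"), c("AAAC", "AAAG", "AACT", "AAGT")))
wl <- tempfile(fileext = ".txt")
writeLines(c("AAAG", "AAGT"), wl)
fromFile <- subsetToCells(counts, wl)
fromVector <- subsetToCells(counts, c("AAAG", "AAGT"))
print(colnames(fromFile))
```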

doPlot

If TRUE, QC plots will be generated.

simplifyBarcodeNames

If TRUE, the sequence tag portion will be removed from the barcode names (e.g. HTO-1-ATGTGTGA becomes HTO-1).
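One way to express this renaming is a regular expression that strips a trailing hyphen followed by a nucleotide sequence. The helper simplifyNames and the rule "at least 6 bases of A/C/G/T/N" are assumptions, chosen so that short names such as HTO-A are not truncated; the package's actual rule may differ.

```r
# Hypothetical helper: strip a trailing "-<sequence>" tag of 6+ nucleotides.
simplifyNames <- function(barcodeNames) {
  sub("-[ACGTN]{6,}$", "", barcodeNames)
}

simplified <- simplifyNames(c("HTO-1-ATGTGTGA", "HTO-2-CCTTGGAA", "HTO-3", "HTO-A"))
print(simplified)  # names without a sequence tag are left unchanged
```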

saveOriginalCellBarcodeFile

An optional file path to which the set of original cell barcodes, prior to filtering, will be written. The primary use case is when the count matrix was generated using a cell whitelist (such as cells with passing gene expression). Preserving this list allows downstream reporting.
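The point of this argument is ordering: the barcodes are recorded before any filtering removes cells. The sketch below assumes a plain one-barcode-per-line output format; the helper saveOriginalBarcodes is hypothetical and the package's actual file format may differ.

```r
# Hypothetical helper: write all cell barcodes before any filtering happens.
saveOriginalBarcodes <- function(mat, saveOriginalCellBarcodeFile = NULL) {
  if (!is.null(saveOriginalCellBarcodeFile)) {
    writeLines(colnames(mat), saveOriginalCellBarcodeFile)  # one barcode per line
  }
  invisible(colnames(mat))
}

counts <- matrix(0, nrow = 2, ncol = 3,
                 dimnames = list(c("HTO-1", "HTO-2"), c("AAAC", "AAAG", "AACT")))
out <- tempfile(fileext = ".txt")
saveOriginalBarcodes(counts, out)
# Filtering would run after this point; the file still lists every input cell
print(readLines(out))
```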

metricsFile

If provided, summary metrics will be written to this file.

minCellsToContinue

Demultiplexing generally requires a minimum number of cells. If the matrix contains fewer than this many cells, processing will abort.
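A minimal sketch of such a guard, which stops with an error when the matrix has fewer columns (cells) than the threshold. The helper checkEnoughCells and its error message are hypothetical. Whether the package applies the check before or after the other filters is not stated here.

```r
# Hypothetical helper: abort when there are too few cells to demultiplex.
checkEnoughCells <- function(mat, minCellsToContinue = 25) {
  if (ncol(mat) < minCellsToContinue) {
    stop(sprintf("Too few cells: %d (minimum %d)", ncol(mat), minCellsToContinue))
  }
  invisible(TRUE)
}

small <- matrix(1, nrow = 2, ncol = 10)
result <- tryCatch(checkEnoughCells(small, 25), error = function(e) conditionMessage(e))
print(result)
```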

datatypeName

For output from CellRanger >= 3.0 with multiple data types, the result of Seurat::Read10X is a list. You need to supply the name of the Antibody Capture data type.
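The selection step can be sketched as follows: a non-list input is used as-is, while a list requires a name that matches one of its elements. The helper selectDatatype is hypothetical. The example list only mimics the shape of a multi-modal Read10X result, using plain matrices instead of sparse ones.

```r
# Hypothetical helper: pick one data type from a multi-modal Read10X-style list.
selectDatatype <- function(countData, datatypeName = NULL) {
  if (!is.list(countData)) return(countData)  # single data type: use as-is
  if (is.null(datatypeName)) {
    stop("Input has multiple data types; supply datatypeName, one of: ",
         paste(names(countData), collapse = ", "))
  }
  if (!datatypeName %in% names(countData)) {
    stop("Unknown datatypeName: ", datatypeName)
  }
  countData[[datatypeName]]
}

read10xLike <- list(
  `Gene Expression`  = matrix(1:4, nrow = 2),
  `Antibody Capture` = matrix(5:8, nrow = 2)
)
adt <- selectDatatype(read10xLike, "Antibody Capture")
print(adt)
```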

Value

The updated count matrix