Call Cell Hashing And Generate Report — CallAndGenerateReport • cellhashR

Runs the default processing pipeline

CallAndGenerateReport(
  rawCountData,
  reportFile,
  callFile,
  rawFeatureMatrixH5 = NULL,
  barcodeWhitelist = NULL,
  barcodeBlacklist = c("no_match", "total_reads", "unmapped"),
  cellbarcodeWhitelist = "inputMatrix",
  methods = c("bff_cluster", "gmm_demux", "dropletutils"),
  methodsForConsensus = NULL,
  minCountPerCell = 5,
  title = NULL,
  metricsFile = NULL,
  rawCountsExport = NULL,
  skipNormalizationQc = FALSE,
  keepMarkdown = FALSE,
  molInfoFile = NULL,
  majorityConsensusThreshold = NULL,
  callerDisagreementThreshold = NULL,
  doTSNE = FALSE,
  datatypeName = NULL,
  maxAllowableDoubletRate = "auto",
  minAllowableDoubletRateFilter = 0.3,
  minAllowableSingletRate = 0.05
)
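
A minimal sketch of a call, using only arguments from the signature above. All file paths here are hypothetical placeholders, and the choice of methods and threshold is illustrative rather than a recommendation. The call is guarded so the snippet does nothing if cellhashR is not installed or the input path does not exist.

```r
# Hypothetical paths: replace these with your own input and output locations.
args <- list(
  rawCountData = "umi_count",               # barcode file or umi_count folder
  reportFile = "hashing_report.html",       # HTML report destination
  callFile = "hashing_calls.txt",           # table of calls destination
  methods = c("bff_cluster", "dropletutils"),
  majorityConsensusThreshold = 0.6,
  metricsFile = "hashing_metrics.txt"
)

# Only run the pipeline when the package and the input are actually available.
if (requireNamespace("cellhashR", quietly = TRUE) && file.exists(args$rawCountData)) {
  do.call(cellhashR::CallAndGenerateReport, args)
}
```

Any argument left out of the list keeps the default shown in the signature.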

Arguments

rawCountData: The input barcode file or umi_count folder
reportFile: The file to which the HTML report will be written
callFile: The file to which the table of calls will be written
rawFeatureMatrixH5: The 10x h5 gene expression count file. This is required only when demuxEM or demuxmix is used, since both methods need it.
barcodeWhitelist: A vector of barcode names to retain.
barcodeBlacklist: A vector of barcode names to discard. For example, if the input library was generated with both CITE-seq and cell hashing, it may make sense to discard the CITE-seq markers.
cellbarcodeWhitelist: Either a vector of expected barcodes (such as all cells with passing gene expression data), a file with one cellbarcode per line, or the string 'inputMatrix'. If the latter is provided, the set of cellbarcodes present in the original unfiltered count matrix will be stored and used for reporting. This allows the report to count cells that were filtered due to low counts separately from negative/non-callable cells.
methods: The set of methods to use for calling. See GenerateCellHashingCalls for options.
methodsForConsensus: By default, a consensus call will be generated using all methods; however, if this parameter is provided, all algorithms specified by methods will be run, but only the list here will be used for the final consensus call. This allows one to see the results of a given caller without using it for the final calls.
minCountPerCell: Cells (columns) will be dropped if their total count is less than this value.
title: A title for the HTML report
metricsFile: If provided, summary metrics will be written to this file.
rawCountsExport: If provided, the raw count matrix, after processing, will be written as an RDS object to this file. This can be useful for debugging.
skipNormalizationQc: If true, the normalization/QC plots will be skipped. These can be time consuming on large input data.
keepMarkdown: If true, the markdown file will be saved, in addition to the HTML file
molInfoFile: An optional path to the 10x molecule_info.h5.
majorityConsensusThreshold: This applies to calculating a consensus call when multiple algorithms are used. If NULL, then all non-negative calls must agree or that cell is marked discordant. If non-NULL, then the number of algorithms returning the top call is divided by the total number of non-negative calls. If this ratio is above the majorityConsensusThreshold, the top call is selected. For example, when majorityConsensusThreshold=0.6 and the calls are HTO-1, HTO-1, Negative, HTO-2, then 2 of the 3 non-negative calls are for HTO-1, giving approximately 0.67. This is greater than the majorityConsensusThreshold of 0.6, so HTO-1 is returned. This can be useful for situations where most algorithms agree, but a single caller fails.
callerDisagreementThreshold: If provided, the disagreement rate will be calculated between each caller and the simple majority call, ignoring discordant and no-call cells. If any caller has a disagreement rate above this threshold, it will be dropped and the consensus call re-calculated. The general idea is to drop a caller that is systematically discordant.
doTSNE: If true, tSNE will be run on results as part of QC. This can be memory intensive and is not strictly needed, so it can be skipped if desired.
datatypeName: For output from CellRanger >= 3.0 with multiple data types, the result of Seurat::Read10X is a list. You need to supply the name of the list element holding the Antibody Capture data.
maxAllowableDoubletRate: Per caller, the doublet rate will be computed as the total doublets / total droplets (including negatives). Any individual caller with a doublet rate above this value will be converted to NoCall. Note: if 'auto' is chosen, the value will be selected as 3x the theoretical doublet rate.
minAllowableDoubletRateFilter: This is the lower bound allowed for maxAllowableDoubletRate. This is primarily used to avoid excessively low values when selecting 'auto' for maxAllowableDoubletRate.
minAllowableSingletRate: If any algorithm scored fewer than this fraction of cells as singlets, it will be discarded from the consensus call. This is primarily designed as a means to automatically discard poorly performing algorithms.
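
The majorityConsensusThreshold rule described above can be sketched for a single cell as follows. This is an illustrative reimplementation in base R, not the package's internal code: the function name `consensus_call`, the `negativeLabel` argument, and the "Negative" and "Discordant" label strings are assumptions made for this example, and ties between top calls are simply resolved in favor of the first label.

```r
# Sketch of the consensus rule for one cell (hypothetical helper, not part of cellhashR).
# calls: character vector with one call per algorithm.
consensus_call <- function(calls, majorityConsensusThreshold = NULL,
                           negativeLabel = "Negative") {
  # Negative calls are excluded before comparing algorithms.
  nonNegative <- calls[calls != negativeLabel]
  if (length(nonNegative) == 0) {
    return(negativeLabel)
  }

  counts <- table(nonNegative)
  topCall <- names(counts)[which.max(counts)]

  if (is.null(majorityConsensusThreshold)) {
    # Without a threshold, every non-negative call must agree.
    if (length(counts) == 1) topCall else "Discordant"
  } else {
    # Fraction of non-negative calls that support the top call.
    ratio <- max(counts) / length(nonNegative)
    if (ratio > majorityConsensusThreshold) topCall else "Discordant"
  }
}

calls <- c("HTO-1", "HTO-1", "Negative", "HTO-2")
print(consensus_call(calls, majorityConsensusThreshold = 0.6))  # "HTO-1"
print(consensus_call(calls))                                    # "Discordant"
```

With the threshold set to 0.6, two of the three non-negative calls support HTO-1, so it is returned. With the default of NULL, the same cell is marked discordant because HTO-2 disagrees.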