Wrapper function for TrainModel to train a suite of binary classifiers for each cell type present in the data

Usage

TrainModelsFromSeurat(
  seuratObj,
  celltype_column,
  assay = "RNA",
  slot = "data",
  output_dir = "./classifiers",
  hyperparameter_tuning = F,
  learner = "classif.ranger",
  inner_resampling = "cv",
  outer_resampling = "cv",
  inner_folds = 4,
  inner_ratio = 0.8,
  outer_folds = 3,
  outer_ratio = 0.8,
  n_models = 20,
  n_cores = NULL,
  gene_list = NULL,
  gene_exclusion_list = NULL,
  verbose = TRUE,
  min_cells_per_class = 20
)
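As a minimal sketch of a typical call, the helper below forwards a Seurat object to TrainModelsFromSeurat with tuning disabled. It assumes the package that exports TrainModelsFromSeurat is attached; the helper name train_celltype_classifiers and the metadata column name "celltype" are hypothetical and only serve the illustration.

```r
# Hypothetical helper: trains one binary classifier per cell type.
# Assumes the package exporting TrainModelsFromSeurat() is attached and that
# `seuratObj` has a metadata column named "celltype" (an assumption).
train_celltype_classifiers <- function(seuratObj, output_dir = "./classifiers") {
  TrainModelsFromSeurat(
    seuratObj = seuratObj,
    celltype_column = "celltype",
    assay = "RNA",
    slot = "data",
    output_dir = output_dir,
    hyperparameter_tuning = FALSE,  # learner stays "classif.ranger" in this mode
    min_cells_per_class = 20
  )
}
```

Defining the call inside a function keeps the sketch loadable without a Seurat object; nothing is trained until the helper is invoked.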

Arguments

seuratObj

The Seurat object containing the cells used for training.

celltype_column

The metadata column containing the cell types. One classifier will be created for each cell type present in this column.

assay

SeuratObj assay containing the desired count matrix/metadata

slot

Slot containing the count data. Must be one of counts, data, or scale.data.

output_dir

The directory in which models, metrics, and training data will be saved.

hyperparameter_tuning

Logical that determines whether or not hyperparameter tuning should be performed.

learner

The mlr3 learner that should be used. Currently fixed to "classif.ranger" if hyperparameter tuning is FALSE. Otherwise, "classif.xgboost" and "classif.ranger" are supported.

inner_resampling

The resampling strategy that is used for hyperparameter optimization. Holdout ("hout" or "holdout") and cross validation ("cv" or "cross-validation") are supported.

outer_resampling

The resampling strategy that is used to determine overfitting. Holdout ("hout" or "holdout") and cross validation ("cv" or "cross-validation") are supported.

inner_folds

The number of folds to be used for inner_resampling if cross-validation is performed.

inner_ratio

The proportion of the data assigned to the training set (the remainder is used for testing) for inner_resampling if holdout resampling is performed.

outer_folds

The number of folds to be used for outer_resampling if cross-validation is performed.

outer_ratio

The proportion of the data assigned to the training set (the remainder is used for testing) for outer_resampling if holdout resampling is performed.
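To make the fold and ratio arguments concrete, here is a small base R sketch of the two resampling strategies. It is an illustration only, not the function's internal code, and it assumes that a ratio of 0.8 means 80% of the cells go to the training set, as in mlr3's holdout resampling. The variable names are hypothetical.

```r
set.seed(42)
n_cells <- 100
cell_ids <- seq_len(n_cells)

# Holdout with ratio = 0.8: 80% of cells train, the remaining 20% test.
outer_ratio <- 0.8
train_ids <- sample(cell_ids, size = round(outer_ratio * n_cells))
test_ids <- setdiff(cell_ids, train_ids)

# Cross-validation with 4 folds: each cell is assigned to one fold and
# serves as test data exactly once across the 4 train/test splits.
inner_folds <- 4
fold_of_cell <- sample(rep(seq_len(inner_folds), length.out = n_cells))
cv_splits <- lapply(seq_len(inner_folds), function(k) {
  list(train = cell_ids[fold_of_cell != k], test = cell_ids[fold_of_cell == k])
})
```

Holdout fits once per split and is cheaper; cross-validation fits once per fold and uses every cell for testing.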

n_models

The number of models to be trained during hyperparameter tuning. The model with the highest accuracy will be selected and returned.

n_cores

If non-null, this number of workers will be used with future::plan.

gene_list

If non-null, the input count matrix will be subset to these features.

gene_exclusion_list

If non-null, these features will be dropped from the input count matrix.

verbose

Whether or not to print the metrics data for each model after training.

min_cells_per_class

If provided, any classes (and corresponding cells) with fewer than this many cells will be dropped from the training data.
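The filtering described by gene_list, gene_exclusion_list, and min_cells_per_class can be sketched in base R on a toy matrix. This is an illustration under assumptions, not the function's own code: the order in which the filters are applied internally is not stated here, and the matrix, gene names, and cell type labels are made up.

```r
# Toy expression matrix: features in rows, cells in columns (as in Seurat assays).
set.seed(1)
mat <- matrix(rpois(5 * 8, lambda = 3), nrow = 5,
              dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:8)))
celltypes <- c("T", "T", "T", "B", "B", "B", "NK", "T")
names(celltypes) <- colnames(mat)

gene_list <- c("Gene1", "Gene2", "Gene3", "Gene4")  # hypothetical features to keep
gene_exclusion_list <- c("Gene4")                   # hypothetical features to drop
min_cells_per_class <- 3                            # small value for the toy data

# Keep only the requested features, then drop the excluded ones.
if (!is.null(gene_list)) {
  mat <- mat[rownames(mat) %in% gene_list, , drop = FALSE]
}
if (!is.null(gene_exclusion_list)) {
  mat <- mat[!(rownames(mat) %in% gene_exclusion_list), , drop = FALSE]
}

# Drop classes with fewer than min_cells_per_class cells, along with their cells.
class_sizes <- table(celltypes)
kept_classes <- names(class_sizes)[as.vector(class_sizes) >= min_cells_per_class]
keep_cells <- celltypes %in% kept_classes
mat <- mat[, keep_cells, drop = FALSE]
celltypes <- celltypes[keep_cells]
```

In this toy data the single NK cell falls below the threshold, so that class and its cell are removed before any classifier would be trained.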