Command Line Interface

BayesTME provides a suite of command line utilities that let users script the whole pipeline end to end.

These commands are available on the PATH of the Python environment in which the bayestme package is installed.
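As a minimal sketch, assuming the commands are on PATH as described above, a Python driver can chain the steps with subprocess. The helper to_cli_args, the PIPELINE list and every file name are hypothetical; only the command names and flags come from this page. The --n-components and --lam2 values are placeholders for whatever phenotype_selection selects, and phenotype_selection itself is left out because it is usually fanned out across jobs.

```python
from __future__ import annotations

import subprocess


def to_cli_args(command: str, **options) -> list[str]:
    """Turn keyword arguments into an argv list (hypothetical helper).

    Underscores become hyphens (n_top -> --n-top), True adds a bare flag,
    and False/None leave the option out entirely.
    """
    argv = [command]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is False or value is None:
            continue
        else:
            argv.extend([flag, str(value)])
    return argv


# Hypothetical file names; n_components and lam2 are placeholder values.
PIPELINE = [
    to_cli_args("load_spaceranger", input="spaceranger_out", output="raw.h5"),
    to_cli_args("bleeding_correction", adata="raw.h5",
                bleed_out="bleed.h5", adata_output="corrected.h5"),
    to_cli_args("deconvolve", adata="corrected.h5", adata_output="deconvolved.h5",
                output="deconvolution.h5", n_components=5, lam2=1000),
    to_cli_args("select_marker_genes", adata="deconvolved.h5",
                deconvolution_result="deconvolution.h5", inplace=True),
]


def run_pipeline(steps: list[list[str]]) -> None:
    for argv in steps:
        # check=True stops at the first step that exits with an error.
        subprocess.run(argv, check=True)
```

Calling run_pipeline(PIPELINE) runs the steps in order. Building argv lists rather than shell strings avoids quoting problems with paths that contain spaces.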

load_spaceranger

Convert data from spaceranger to a SpatialExpressionDataset in h5 format

usage: load_spaceranger [-h] [--output OUTPUT] [--input INPUT] [-v]

Named Arguments

--output

Output file, a SpatialExpressionDataset in h5 format

--input

Input spaceranger dir

-v, --verbose

Enable verbose logging

Default: False
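When several spaceranger runs need converting, a loop over sample directories is enough. This sketch assumes a hypothetical layout in which each sample's spaceranger output sits in its own subdirectory of a root folder, and it writes one h5 file per sample, named after that subdirectory.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def load_spaceranger_commands(run_dirs: list[Path], out_dir: Path,
                              verbose: bool = False) -> list[list[str]]:
    """Build one load_spaceranger invocation per spaceranger output dir."""
    commands = []
    for run_dir in sorted(run_dirs):
        # Name each dataset after its sample directory: runs/s1 -> out/s1.h5
        output = out_dir / f"{run_dir.name}.h5"
        argv = ["load_spaceranger", "--input", str(run_dir),
                "--output", str(output)]
        if verbose:
            argv.append("-v")
        commands.append(argv)
    return commands


def convert_all(root: Path, out_dir: Path) -> None:
    """Convert every subdirectory of root (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    run_dirs = [p for p in root.iterdir() if p.is_dir()]
    for argv in load_spaceranger_commands(run_dirs, out_dir):
        subprocess.run(argv, check=True)
```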

filter_genes

This command creates a new SpatialExpressionDataset with genes filtered according to adjustable criteria. One or more criteria can be specified.

Filter genes from dataset based on one or more criteria

usage: filter_genes [-h] [--adata ADATA] [--output OUTPUT]
                    [--filter-ribosomal-genes]
                    [--n-top-by-standard-deviation N_TOP_BY_STANDARD_DEVIATION]
                    [--spot-threshold SPOT_THRESHOLD]
                    [--expression-truth EXPRESSION_TRUTH] [-v]

Named Arguments

--adata

Input AnnData in h5 format

--output

Output file, AnnData in h5 format containing only the genes that pass the filters

--filter-ribosomal-genes

Filter ribosomal genes (based on gene name regex)

Default: False

--n-top-by-standard-deviation

Use the top N genes with the highest spatial variance.

--spot-threshold

Filter out genes that appear in more than the provided threshold of tissue spots.

--expression-truth

Filter out genes not found in all expression truth datasets.

-v, --verbose

Enable verbose logging

Default: False
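To make the --n-top-by-standard-deviation and --filter-ribosomal-genes criteria concrete, here is an illustrative NumPy sketch. It is not BayesTME's implementation: it assumes counts is a spots × genes array, and the ribosomal pattern ^RP[SL] (the usual RPL/RPS prefix convention for ribosomal protein genes) is an assumption, since the exact regex used by filter_genes is not stated here.

```python
from __future__ import annotations

import re

import numpy as np

# ASSUMED pattern for ribosomal protein genes (RPL13, Rps6, ...);
# the regex used by filter_genes may differ.
RIBOSOMAL = re.compile(r"^RP[SL]", re.IGNORECASE)


def select_genes(counts: np.ndarray, gene_names: list[str],
                 n_top: int | None = None,
                 filter_ribosomal: bool = False) -> list[str]:
    """Return the gene names that survive the chosen criteria, in input order."""
    keep = np.ones(len(gene_names), dtype=bool)
    if filter_ribosomal:
        keep &= np.array([not RIBOSOMAL.match(g) for g in gene_names], dtype=bool)
    if n_top is not None:
        # Standard deviation of each gene's counts across spots.
        std = counts.std(axis=0)
        # Genes already removed can never be among the top N.
        std = np.where(keep, std, -np.inf)
        top = np.argsort(std)[::-1][:n_top]
        by_std = np.zeros_like(keep)
        by_std[top] = True
        keep &= by_std
    return [g for g, k in zip(gene_names, keep) if k]
```

Combining criteria with a boolean mask mirrors the "one or more criteria" behaviour described above: each criterion can only remove genes, never add them back.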

bleeding_correction

Perform bleeding correction

usage: bleeding_correction [-h] [--adata ADATA] [--bleed-out BLEED_OUT]
                           [--adata-output ADATA_OUTPUT] [-i] [--n-top N_TOP]
                           [--max-steps MAX_STEPS]
                           [--local-weight LOCAL_WEIGHT] [-v]

Named Arguments

--adata

Input file, AnnData in h5 format

--bleed-out

Output file, BleedCorrectionResult in h5 format

--adata-output

A new AnnData in h5 format created using the bleed corrected counts

-i, --inplace

If provided, overwrite the input file --adata

Default: False

--n-top

Use N top genes by standard deviation to calculate the bleeding functions. Genes will not be filtered from output dataset.

Default: 50

--max-steps

Number of EM steps

Default: 5

--local-weight

Initial value for the local weight, a tuning parameter for bleed correction (rho_0g in equation 1 of the paper). By default it is set to sqrt(N), where N is the number of tissue spots.

-v, --verbose

Enable verbose logging

Default: False
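The documented default for --local-weight is sqrt(N) with N the number of tissue spots, so a dataset with 2,500 tissue spots starts from 50.0. A minimal sketch that computes this default and builds a bleeding_correction call follows; the function names are hypothetical, and treating --adata-output and --inplace as alternatives is this sketch's own choice.

```python
from __future__ import annotations

import math


def default_local_weight(n_tissue_spots: int) -> float:
    """Documented default for --local-weight: sqrt(number of tissue spots)."""
    return math.sqrt(n_tissue_spots)


def bleeding_correction_command(adata: str, bleed_out: str,
                                adata_output: str | None = None,
                                local_weight: float | None = None,
                                n_top: int = 50, max_steps: int = 5) -> list[str]:
    """Build a bleeding_correction argv list (hypothetical helper)."""
    argv = ["bleeding_correction", "--adata", adata, "--bleed-out", bleed_out,
            "--n-top", str(n_top), "--max-steps", str(max_steps)]
    if adata_output is None:
        # No separate output file requested: overwrite --adata in place.
        argv.append("--inplace")
    else:
        argv.extend(["--adata-output", adata_output])
    if local_weight is not None:
        # Only pass --local-weight when overriding the sqrt(N) default.
        argv.extend(["--local-weight", str(local_weight)])
    return argv
```

For example, bleeding_correction_command("raw.h5", "bleed.h5", local_weight=2 * default_local_weight(2500)) starts EM from twice the default weight.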

phenotype_selection

Select the number of cell types and the lambda smoothing parameter via k-fold cross-validation.

usage: phenotype_selection [-h] [--adata ADATA] [--job-index JOB_INDEX]
                           [--n-fold N_FOLD] [--n-splits N_SPLITS]
                           [--n-samples N_SAMPLES] [--n-burn N_BURN]
                           [--n-thin N_THIN] [--n-gene N_GENE]
                           [--n-components-min N_COMPONENTS_MIN]
                           [--n-components-max N_COMPONENTS_MAX]
                           [--lambda-values LAMBDA_VALUES]
                           [--max-ncell MAX_NCELL] [--background-noise]
                           [--lda-initialization] [--output-dir OUTPUT_DIR]
                           [-v]

Named Arguments

--adata

Input file, AnnData in h5 format

--job-index

Run only this job index, suitable for running the sampling in parallel across many machines

--n-fold

Number of times to run k-fold cross-validation.

Default: 5

--n-splits

Split dataset into k consecutive folds for each instance of k-fold cross-validation

Default: 15

--n-samples

Number of samples from the posterior distribution.

Default: 100

--n-burn

Number of burn-in samples

Default: 2000

--n-thin

Thinning factor for sampling

Default: 5

--n-gene

Use the top N genes by standard deviation to model deconvolution. If this number is less than the total number of genes, the top N by spatial variance will be selected.

Default: 1000

--n-components-min

Minimum number of cell types to try.

Default: 2

--n-components-max

Maximum number of cell types to try.

Default: 12

--lambda-values

Potential values of the lambda smoothing parameter to try. Defaults to (1, 1e1, 1e2, 1e3, 1e4, 1e5)

--max-ncell

Maximum cell count within a spot to model.

Default: 120

--background-noise

Turn background noise on

Default: False

--lda-initialization

Turn LDA initialization on

Default: False

--output-dir

Output directory. N new files will be saved in this directory, where N is the number of cross-validation jobs.

-v, --verbose

Enable verbose logging

Default: False
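--job-index makes it possible to spread cross-validation over many machines, one job per index. The sketch below generates one command per index for a cluster array job. It ASSUMES one job per combination of fold, number of cell types and lambda value, with 0-based indices; under that assumption the defaults give 5 × 11 × 6 = 330 jobs. Check the real count against a full run (the number of files written to --output-dir) before relying on it. Function names and paths are hypothetical.

```python
from __future__ import annotations


def count_jobs(n_fold: int = 5, n_components_min: int = 2,
               n_components_max: int = 12, n_lambda: int = 6) -> int:
    """ASSUMPTION: one job per (fold, number of cell types, lambda) combination."""
    n_components = n_components_max - n_components_min + 1
    return n_fold * n_components * n_lambda


def job_commands(adata: str, output_dir: str, n_jobs: int) -> list[list[str]]:
    """One phenotype_selection call per --job-index (assumed 0-based)."""
    return [
        ["phenotype_selection", "--adata", adata,
         "--output-dir", output_dir, "--job-index", str(i)]
        for i in range(n_jobs)
    ]
```

On a scheduler, each array task would run the command at its own task index, so every job writes its single result file into the shared output directory.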

deconvolve

Deconvolve data

usage: deconvolve [-h] [--adata ADATA] [--adata-output ADATA_OUTPUT] [-i]
                  [--output OUTPUT] [--n-gene N_GENE]
                  [--n-components N_COMPONENTS] [--lam2 LAM2]
                  [--n-samples N_SAMPLES] [--n-burn N_BURN] [--n-thin N_THIN]
                  [--random-seed RANDOM_SEED] [--background-noise]
                  [--lda-initialization] [--expression-truth EXPRESSION_TRUTH]
                  [-v]

Named Arguments

--adata

Input AnnData in h5 format, expected to be already bleed corrected

--adata-output

A new AnnData in h5 format created with the deconvolution summary results appended.

-i, --inplace

If provided, append deconvolution summary results to the --adata archive in place

Default: False

--output

Path where the DeconvolutionResult will be written, in h5 format

--n-gene

Number of genes to use for deconvolution

--n-components

Number of cell types, expected to be determined from cross-validation.

--lam2

Smoothness parameter; this tuning parameter is expected to be determined from cross-validation.

--n-samples

Number of samples from the posterior distribution.

Default: 100

--n-burn

Number of burn-in samples

Default: 1000

--n-thin

Thinning factor for sampling

Default: 10

--random-seed

Random seed

Default: 0

--background-noise

Turn background noise on

Default: False

--lda-initialization

Turn LDA initialization on

Default: False

--expression-truth

Use expression ground truth from one or more matched samples that have been processed with the Seurat companion scRNA fine mapping workflow. This flag can be provided multiple times for multiple matched samples.

-v, --verbose

Enable verbose logging

Default: False
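--n-samples, --n-burn and --n-thin follow the usual MCMC convention: burn-in iterations are discarded, and thinning keeps only every n_thin-th draw afterwards to reduce autocorrelation. The sketch below ASSUMES exactly that scheme (it is the textbook definition, not a description of BayesTME's sampler loop) and shows which iterations would be retained and how many iterations a run costs.

```python
def kept_iterations(n_samples: int, n_burn: int, n_thin: int) -> list[int]:
    """0-based iterations retained under a standard burn-in/thinning scheme.

    ASSUMPTION: discard n_burn iterations, then keep every n_thin-th
    iteration until n_samples draws have been collected.
    """
    return [n_burn + (i + 1) * n_thin - 1 for i in range(n_samples)]


def total_iterations(n_samples: int, n_burn: int, n_thin: int) -> int:
    """Iterations the sampler must run to yield n_samples kept draws."""
    return n_burn + n_samples * n_thin
```

With the deconvolve defaults (100 samples, 1000 burn-in, thinning 10) this gives 2000 iterations, keeping iterations 1009, 1019, ..., 1999. Under the same assumption, the phenotype_selection defaults imply 2500 iterations per job and the spatial_expression defaults imply 1200.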

select_marker_genes

Perform marker gene selection

usage: select_marker_genes [-h] [--adata ADATA] [--adata-output ADATA_OUTPUT]
                           [-i] [--deconvolution-result DECONVOLUTION_RESULT]
                           [--n-marker-genes N_MARKER_GENES] [--alpha ALPHA]
                           [--marker-gene-method {TIGHT,FALSE_DISCOVERY_RATE}]
                           [-v]

Named Arguments

--adata

Input file, AnnData in h5 format

--adata-output

A new AnnData in h5 format created with the marker gene selection results appended.

-i, --inplace

If provided, append marker gene selection results to the --adata archive in place

Default: False

--deconvolution-result

Input file, DeconvolutionResult in h5 format

--n-marker-genes

Maximum number of marker genes per cell type.

Default: 5

--alpha

Alpha cutoff for choosing marker genes.

Default: 0.05

--marker-gene-method

Possible choices: TIGHT, FALSE_DISCOVERY_RATE

Method for choosing marker genes.

Default: TIGHT

-v, --verbose

Enable verbose logging

Default: False
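Since --marker-gene-method accepts only the two documented choices, a wrapper script can reject typos before launching a job. A minimal sketch with a hypothetical helper name; the check that alpha lies in (0, 1) is this sketch's own sanity check, not a documented constraint.

```python
from __future__ import annotations

MARKER_GENE_METHODS = ("TIGHT", "FALSE_DISCOVERY_RATE")  # documented choices


def select_marker_genes_command(adata: str, deconvolution_result: str,
                                adata_output: str | None = None,
                                method: str = "TIGHT", n_marker_genes: int = 5,
                                alpha: float = 0.05) -> list[str]:
    """Build a select_marker_genes argv list (hypothetical helper)."""
    if method not in MARKER_GENE_METHODS:
        raise ValueError(f"unknown --marker-gene-method {method!r}")
    if not 0 < alpha < 1:
        raise ValueError("alpha should be a cutoff strictly between 0 and 1")
    argv = ["select_marker_genes", "--adata", adata,
            "--deconvolution-result", deconvolution_result,
            "--marker-gene-method", method,
            "--n-marker-genes", str(n_marker_genes), "--alpha", str(alpha)]
    # Append to the input archive unless a separate output file is requested.
    argv += ["--inplace"] if adata_output is None else ["--adata-output", adata_output]
    return argv
```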

spatial_expression

Detect spatial differential expression patterns

usage: spatial_expression [-h] [--deconvolve-results DECONVOLVE_RESULTS]
                          [--adata ADATA] [--output OUTPUT]
                          [--n-cell-min N_CELL_MIN]
                          [--n-spatial-patterns N_SPATIAL_PATTERNS]
                          [--n-samples N_SAMPLES] [--n-burn N_BURN]
                          [--n-thin N_THIN] [--simple] [--alpha0 ALPHA0]
                          [--prior-var PRIOR_VAR] [--lam2 LAM2]
                          [--n-gene N_GENE] [-v]

Named Arguments

--deconvolve-results

DeconvolutionResult in h5 format

--adata

AnnData in h5 format

--output

Path to store SpatialDifferentialExpressionResult in h5 format

--n-cell-min

Only consider spots where there are at least <n_cell_min> cells of a given type, as determined by the deconvolution results.

Default: 5

--n-spatial-patterns

Number of spatial patterns.

--n-samples

Number of samples from the posterior distribution.

Default: 100

--n-burn

Number of burn-in samples

Default: 1000

--n-thin

Thinning factor for sampling

Default: 2

--simple

Simpler model for sampling spatial differential expression posterior

Default: False

--alpha0

Alpha0 tuning parameter.

Default: 10

--prior-var

Prior variance tuning parameter.

Default: 100.0

--lam2

Smoothness parameter; this tuning parameter is expected to be determined from cross-validation.

Default: 1

--n-gene

Number of genes to consider for detecting spatial programs. If this number is less than the total number of genes, the top N by spatial variance will be selected.

-v, --verbose

Enable verbose logging

Default: False
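The --n-cell-min rule amounts to a per-cell-type mask over spots. This NumPy sketch ASSUMES cell_counts is a spots × cell types array of cell counts taken from the deconvolution results (for example a posterior mean); the function names are hypothetical. Counting the eligible spots per type before a long run shows which cell types have too few spots to analyse at a given threshold.

```python
import numpy as np


def eligible_spots(cell_counts: np.ndarray, n_cell_min: int = 5) -> np.ndarray:
    """Boolean mask (spots x cell types): True where the spot holds at
    least n_cell_min cells of that type."""
    return cell_counts >= n_cell_min


def spots_per_cell_type(cell_counts: np.ndarray, n_cell_min: int = 5) -> np.ndarray:
    """Number of spots that would be considered for each cell type."""
    return eligible_spots(cell_counts, n_cell_min).sum(axis=0)
```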

Plotting

Plotting is split into separate commands:

plot_bleeding_correction

Plot bleeding correction results

usage: plot_bleeding_correction [-h] [--raw-adata RAW_ADATA]
                                [--corrected-adata CORRECTED_ADATA]
                                [--bleed-correction-results BLEED_CORRECTION_RESULTS]
                                [--output-dir OUTPUT_DIR] [--n-top N_TOP] [-v]

Named Arguments

--raw-adata

Input file, AnnData in h5 format with the raw (uncorrected) counts

--corrected-adata

Input file, AnnData in h5 format with the bleed corrected counts

--bleed-correction-results

Input file, BleedCorrectionResult in h5 format

--output-dir

Output directory

--n-top

Plot the top N genes by standard deviation

Default: 10

-v, --verbose

Enable verbose logging

Default: False

plot_deconvolution

Plot deconvolution results

usage: plot_deconvolution [-h] [--adata ADATA] [--output-dir OUTPUT_DIR]
                          [--cell-type-names CELL_TYPE_NAMES] [-v]

Named Arguments

--adata

Input file, AnnData in h5 format. Expected to be annotated with deconvolution results.

--output-dir

Output directory.

--cell-type-names

A comma separated list of cell type names to use for plots. For example: --cell-type-names "type 1, type 2, type 3"

-v, --verbose

Enable verbose logging

Default: False
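Because --cell-type-names is a single comma-separated value and names often contain spaces, the argument must reach the command as one shell word. A minimal sketch using shlex.join (Python 3.8+) to produce a copy-pasteable command line; the helper names are hypothetical, the names are presumably given in cell-type order, and rejecting names that contain a comma follows from the comma separator.

```python
from __future__ import annotations

import shlex


def cell_type_names_arg(names: list[str]) -> str:
    """Join names into the single comma-separated value --cell-type-names takes."""
    for name in names:
        if "," in name:
            raise ValueError(f"cell type name {name!r} cannot contain a comma")
    return ",".join(names)


def plot_deconvolution_command(adata: str, output_dir: str,
                               names: list[str]) -> str:
    """Shell-ready command line; shlex quotes arguments containing spaces."""
    argv = ["plot_deconvolution", "--adata", adata, "--output-dir", output_dir,
            "--cell-type-names", cell_type_names_arg(names)]
    return shlex.join(argv)
```

For example, plot_deconvolution_command("deconvolved.h5", "plots", ["type 1", "type 2"]) yields plot_deconvolution --adata deconvolved.h5 --output-dir plots --cell-type-names 'type 1,type 2'.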

plot_spatial_expression

Plot spatial differential expression results

usage: plot_spatial_expression [-h] [--adata ADATA]
                               [--deconvolution-result DECONVOLUTION_RESULT]
                               [--sde-result SDE_RESULT]
                               [--output-dir OUTPUT_DIR]
                               [--cell-type-names CELL_TYPE_NAMES] [-v]

Named Arguments

--adata

Input file, AnnData in h5 format

--deconvolution-result

Input file, DeconvolutionResult in h5 format

--sde-result

Input file, SpatialDifferentialExpressionResult in h5 format

--output-dir

Output directory

--cell-type-names

A comma separated list of cell type names to use for plots. For example: --cell-type-names "type 1, type 2, type 3"

-v, --verbose

Enable verbose logging

Default: False
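The three plotting commands can be run together once the pipeline has finished. This sketch writes each command's plots into its own subdirectory; all file names, including which AnnData file is passed to plot_spatial_expression, are hypothetical, and creating the output directories up front is a precaution in case the tools do not create them.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def plot_commands(raw: str, corrected: str, bleed: str, deconvolved: str,
                  deconvolution_result: str, sde_result: str,
                  out_root: Path) -> list[list[str]]:
    """One invocation per plotting command, each with its own output dir."""
    return [
        ["plot_bleeding_correction", "--raw-adata", raw,
         "--corrected-adata", corrected, "--bleed-correction-results", bleed,
         "--output-dir", str(out_root / "bleeding")],
        ["plot_deconvolution", "--adata", deconvolved,
         "--output-dir", str(out_root / "deconvolution")],
        ["plot_spatial_expression", "--adata", deconvolved,
         "--deconvolution-result", deconvolution_result,
         "--sde-result", sde_result,
         "--output-dir", str(out_root / "spatial_expression")],
    ]


def make_all_plots(commands: list[list[str]]) -> None:
    for argv in commands:
        # Precaution: create each --output-dir before running the command.
        out_dir = Path(argv[argv.index("--output-dir") + 1])
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(argv, check=True)
```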