bayestme package¶

Submodules¶

bayestme.bleeding_correction module¶

bayestme.cv_likelihoods module¶

bayestme.cv_likelihoods.get_best_lambda_value(likelihoods, best_n_components, lambda_array, k_vals)¶

bayestme.cv_likelihoods.get_max_likelihood_n_components(likelihoods, k_vals)¶

bayestme.cv_likelihoods.load_likelihoods(output_dir=None, output_files=None)¶

bayestme.cv_likelihoods.plot_cv_running(results_dir, results_files, out_path, output_format='pdf')¶

bayestme.cv_likelihoods.plot_likelihoods(likelihood_path, out_file, normalize=False)¶

bayestme.data module¶

class bayestme.data.BleedCorrectionResult(corrected_reads: numpy.ndarray, global_rates: numpy.ndarray, basis_functions: numpy.ndarray, weights: numpy.ndarray)¶

Bases: object

Data model for the results of bleeding correction.

classmethod read_h5(path)¶: Read this class from an h5 archive :param path: Path to h5 file. :return: SpatialExpressionDataset

save(path)¶

class bayestme.data.DeconvolutionResult(cell_prob_trace: numpy.ndarray, expression_trace: numpy.ndarray, beta_trace: numpy.ndarray, cell_num_total_trace: numpy.ndarray, lam2: float, n_components: int, losses: numpy.ndarray | None = None)¶

Bases: object

Data model for the results of sampling from the deconvolution posterior distribution.

align_celltype(sc_expression, n=50)¶: reorder the deconvolution results, aligned the detected cell type with the given scrna rerference :param sc_expression: K by G matrix, K cell types and G genes :returen: the ordering of the deconvolution result that matches the given scref

property cell_num_trace¶

property nb_probs¶

property omega¶

Return a matrix of ω_kg from equation 6 of the preprint

Returns:: An <N cell types> x <N markers> floating point matrix.

property omega_difference¶

Return a matrix of average ratio of expression/ maximum expression for each marker in each component

This statistic represents the “overexpression” of a gene in a cell type, and is used for scaling the dot size in our marker gene plot.

Returns:: An <N cell types> x <N markers> floating point matrix.

classmethod read_h5(path)¶: Read this class from an h5 archive :param path: Path to h5 file. :return: SpatialExpressionDataset

property reads_trace¶

Returns:: <N Samples> x <N Spots> x <N Genes> x <N Cell Types>

property relative_expression¶

Return a matrix of average expression in this cell type, minus the max expression in all other cell types, divided by the maximum expression in all cell types. A higher number for this statistic represents a better candidate marker gene.

This statistic is used as a tiebreaker criteria for marker gene selection when omega_kg values are equal.

Returns:: An <N cell types> x <N markers> floating point matrix.

property relative_mean_expression¶

Return a matrix of average expression in this cell type, divided by the average expression in all cell types.

Returns:: An <N cell types> x <N markers> floating point matrix.

save(path)¶

class bayestme.data.PhenotypeSelectionResult(mask: numpy.ndarray, cell_prob_trace: numpy.ndarray, expression_trace: numpy.ndarray, beta_trace: numpy.ndarray, cell_num_trace: numpy.ndarray, log_lh_train_trace: numpy.ndarray, log_lh_test_trace: numpy.ndarray, n_components: int, lam: float, fold_number: int)¶

Bases: object

Data model for the results of one job in phenotype selection k-fold cross validation

classmethod read_h5(path)¶: Read this class from an h5 archive :param path: Path to h5 file. :return: SpatialExpressionDataset

save(path)¶

class bayestme.data.SpatialDifferentialExpressionResult(w_hat: numpy.array, v_hat: numpy.array, losses: numpy.array)¶

Bases: object

Data model for results from sampling from the spatial differential expression posterior distribution.

property n_components¶

property n_spatial_patterns¶

classmethod read_h5(path)¶: Read this class from an h5 archive :param path: Path to h5 file. :return: SpatialExpressionDataset

save(path)¶

property spatial_hat¶: Return the per spot gene expression learned by the factor model

class bayestme.data.SpatialExpressionDataset(adata: anndata.AnnData)¶

Bases: object

Data model for holding read counts, their associated position information, and whether they come from tissue or non tissue spots. Also holds the names of the gene markers in the dataset.

property cell_type_counts: numpy.array | None¶

property cell_type_probabilities: numpy.array | None¶

copy() → SpatialExpressionDataset¶: Return a copy of this object :return: A copy of this object

property counts: numpy.array¶

property edges: numpy.array¶

classmethod from_arrays(raw_counts: numpy.ndarray, positions: numpy.ndarray | None, tissue_mask: numpy.ndarray | None, gene_names: numpy.ndarray, layout: Layout, edges: numpy.ndarray, barcodes: numpy.array | None = None)¶

Construct SpatialExpressionDataset directly from numpy arrays.

Parameters:

raw_counts – An <N spots> x <N markers> matrix.
positions – An <N spots> x 2 matrix of spot coordinates.
tissue_mask – An <N spot> length array of booleans. True if spot is in tissue, False if not.
gene_names – An <M markers> length array of gene names.
layout – Layout.SQUARE of the spots are in a square grid layout, Layout.HEX if the spots are
edges – An <N spots> x 2 matrix of edges between spots.
barcodes – List of UMI barcodes

in a hex grid layout.

property gene_names: numpy.array¶

property layout: Layout¶

property marker_gene_indices: List[numpy.array] | None¶

property marker_gene_names: List[numpy.array] | None¶

property n_cell_types: int | None¶

property n_gene: int¶

property n_spot: int¶

property n_spot_in: int¶

property omega_difference: numpy.array | None¶

property positions: numpy.array¶

property positions_tissue: numpy.array¶

property raw_counts: numpy.array¶

classmethod read_count_mat(data_path, layout=Layout.SQUARE)¶

Load data from tsv count matrix containing only in-tissue spots where the count matrix is a tsv file of shape G by N The column names and row names are position and gene names respectively

Parameters:

data_path – /path/to/count_matrix
layout – Layout.SQUARE of the spots are in a square grid layout, Layout.HEX if the spots are

in a hex grid layout. :return: SpatialExpressionDataset

classmethod read_h5(path)¶: Read this class from an h5 archive :param path: Path to h5 file. :return: SpatialExpressionDataset

classmethod read_spaceranger(data_path)¶

Load data from spaceranger /outputs folder

Parameters:: data_path – Directory containing at least 1) /raw_feature_bc_matrix for raw count matrix 2) /filtered_feature_bc_matrix for filtered count matrix 3) /spatial for position list
Returns:: SpatialExpressionDataset

property relative_mean_expression: numpy.array | None¶

save(path)¶

property tissue_mask: numpy.array¶

bayestme.data.add_deconvolution_results_to_dataset(stdata: SpatialExpressionDataset, result: DeconvolutionResult)¶

Modify stdata in-place to annotate it with selected marker genes

Parameters:

stdata – data.SpatialExpressionDataset to modify
result – data.DeconvolutionResult to use

bayestme.data.create_anndata_object(counts: numpy.ndarray, coordinates: numpy.ndarray | None, tissue_mask: numpy.ndarray | None, gene_names: numpy.ndarray, layout: Layout, edges: numpy.ndarray, barcodes: numpy.array | None = None)¶

Create an AnnData object from spatial expression data.

Parameters:

counts – N x G read count matrix
coordinates – N x 2 coordinate matrix
tissue_mask – N length boolean array indicating in-tissue or out of tissue
gene_names – N length string array of gene names
layout – Layout enum
edges – N x 2 array indicating adjacency between spots
barcodes – List of UMI barcodes

Returns:

AnnData object containing all information provided.

bayestme.data.is_csv(fn: str)¶

bayestme.data.is_csv_tsv(fn: str)¶

bayestme.data.is_tsv(fn: str)¶

bayestme.data.read_with_maybe_header(path, header)¶

bayestme.deconvolution module¶

bayestme.deconvolution.sample_from_posterior(data: SpatialExpressionDataset, n_components: int | None = None, spatial_smoothing_parameter=None, n_samples=100, n_svi_steps=10000, expression_truth=None, use_spatial_guide=True, rng: numpy.random.Generator | None = None) → DeconvolutionResult¶

bayestme.svi.deconvolution module¶

class bayestme.svi.deconvolution.BayesTME_VI(stdata: SpatialExpressionDataset, rho=0.5, lr=0.001, beta_1=0.9, beta_2=0.999, expression_truth: numpy.array | None = None, expression_truth_weight=10.0, expression_truth_n_dummy_cell_types=2)¶

Bases: object

deconvolution(K, n_iter=10000, n_traces=1000, use_spatial_guide=True)¶

guide(data, n_class, n_genes)¶: guide without spatial regularizer

model(data, n_class, n_genes)¶

plot_loss(output_file)¶

Plot the loss curve

Parameters:: output_file – Where to save the plot

spatial_guide(data, n_class, n_genes)¶: guide with spatial regularizer

spatial_regularizer(x)¶

bayestme.svi.deconvolution.deconvolve(stdata: SpatialExpressionDataset, n_components=None, rho=None, n_svi_steps=10000, n_samples=100, use_spatial_guide=True, expression_truth=None, rng: numpy.random.Generator | None = None) → DeconvolutionResult¶

bayestme.svi.deconvolution.rv_should_be_sampled(site)¶: Determine if a random variable in the model should be saved or if its redundant.

bayestme.expression_truth module¶

bayestme.expression_truth.calculate_celltype_profile_prior_from_adata(fn, gene_names, celltype_column: str, sample_column: str | None = None)¶

bayestme.expression_truth.combine_multiple_expression_truth(expression_truth_arrays: List[numpy.array], num_warmup=200, num_samples=200)¶

bayestme.expression_truth.dirichlet_alpha_model(expression_truth=None, N=None, J=None, K=None)¶

bayestme.expression_truth.fit_alpha_for_multiple_samples(data, num_warmup=200, num_samples=200)¶

bayestme.expression_truth.load_expression_truth(stdata: SpatialExpressionDataset, seurat_output: str)¶

Load outputs from seurat fine mapping to be used in deconvolution.

Parameters:

stdata – SpatialExpressionDataset object
seurat_output – CSV output from seurat fine mapping workflow

Returns:

Tuple of n_components x n_genes size array, representing relative

expression of each gene in each cell type

bayestme.gene_filtering module¶

class bayestme.gene_filtering.FilterType(value)¶

Bases: Enum

An enumeration.

RIBOSOME = 2¶

SPOTS = 1¶

bayestme.gene_filtering.filter_genes_by_spot_threshold(dataset: SpatialExpressionDataset, spot_threshold: float)¶

bayestme.gene_filtering.filter_list_of_genes(dataset: SpatialExpressionDataset, genes_to_remove)¶

bayestme.gene_filtering.filter_ribosome_genes(dataset: SpatialExpressionDataset)¶

bayestme.gene_filtering.filter_stdata_to_match_expression_truth(stdata: SpatialExpressionDataset, adata_fn: str)¶

Filter the stdata down to the intersection of genes between it and the expression truth file.

Parameters:

stdata – SpatialExpressionDataset object
adata_fn – h5ad file with matched scRNA

Returns:

Filtered stdata object

bayestme.gene_filtering.select_top_genes_by_standard_deviation(dataset: SpatialExpressionDataset, n_gene: int) → SpatialExpressionDataset¶

bayestme.log_config module¶

bayestme.log_config.add_logging_args(parser)¶

bayestme.log_config.configure_logging(args)¶

bayestme.phenotype_selection module¶

bayestme.plot.common module¶

bayestme.plot.deconvolution module¶

bayestme.semi_synthetic_spatial module¶

bayestme.semi_synthetic_spatial.generate_semi_synthetic(adata: anndata.AnnData, cluster_id_column: str, tissue_positions: numpy.ndarray, n_genes: int, cell_num=None, canvas_size=(36, 36), sq_size=4, layout=None, random_seed=None, n_spatial_gene=50, alpha=1, w=None, spatial_gene=None, spatial_cell_type=None, spatial_programs=None, verbose=True)¶

Generate a synthetic ST dataset using a pre-clustered scRNA dataset.

Parameters:

adata – scRNA dataset
cluster_id_column – Column in adata.obs that contains cluster/cell type classification
tissue_positions – coordinates of tissue points in slide
n_genes – Number of genes to use
cell_num
canvas_size
sq_size
layout
random_seed
n_spatial_gene
alpha
w
spatial_gene
spatial_cell_type
spatial_programs
verbose

Returns:

data.SpatialExpressionDataset containing simulated ST data

bayestme.semi_synthetic_spatial.get_rank(array)¶

bayestme.spatial_transcriptional_programs module¶

bayestme.synthetic_data module¶

bayestme.synthetic_data.create_deconvolve_dataset(n_nodes: int = 12, n_components: int = 5, n_samples: int = 100, n_genes: int = 100, n_marker_gene: int = 5, layout=Layout.SQUARE)¶

bayestme.synthetic_data.create_toy_deconvolve_result(n_nodes: int, n_components: int, n_samples: int, n_gene: int) → DeconvolutionResult¶

bayestme.synthetic_data.generate_demo_dataset()¶: Generate a fake dataset for use in testing or demonstration. This dataset should produce two cell types, with 2 markers genes for each cell type.

bayestme.synthetic_data.generate_demo_dataset_with_bleeding()¶: Generate a fake dataset for demonstration of bleeding correction.

bayestme.synthetic_data.generate_demo_stp_dataset()¶: Generate a fake dataset for use in testing or demonstration. This dataset should produce two cell types, with a clear spatial expression pattern within the first cell type.

bayestme.synthetic_data.generate_fake_stdataset(n_rows: int = 30, n_cols: int = 30, n_genes: int = 20, layout: Layout = Layout.SQUARE) → SpatialExpressionDataset¶

Create a fake dataset for use in testing or demonstration.

Parameters:

n_rows – width of the fake slide
n_cols – height of the fake slide
n_genes – number of marker genes
layout – layout of spots on the fake slide

Returns:

SpatialExpressionDataset object containing simulated data

bayestme.synthetic_data.generate_simulated_bleeding_reads_data(n_rows=30, n_cols=30, n_genes=20, spot_bleed_prob=0.5, length_scale=0.2, gene_bandwidth=1, bleeding='anisotropic')¶

Generate simulated read data with modeled bleeding.

Parameters:

n_rows – Number of spot rows
n_cols – Number of spot columns
n_genes – Number of genes in output reads dataset
spot_bleed_prob – Tuning param
length_scale – Tuning param
gene_bandwidth – Tuning param
bleeding – Type of bleeding

Returns:

(locations, tissue_mask, true_rates, true_counts, bleed_counts)

bayestme.utils module¶

bayestme.utils.construct_composite_trendfilter(adjacency_matrix, k, anchor=0, sparse=False)¶

Build the k^1 through k^n trendfilter matrices stacked on top of each other in order to penalize multiple k values

Parameters:

adjacency_matrix – An adjacency matrix in first order discrete difference operator form.
k – Maximum order of trend filtering.
anchor – Node index to set to 1 in the first row of the resulting matrix
sparse – If true return a sparse matrix.

Returns:

Composite trendfilter matrix.

bayestme.utils.construct_edge_adjacency(neighbors)¶

Build the oriented edge-adjacency matrix in “discrete difference operator” form, an iterable of (v1, v2) tuples representing edges.

Parameters:: neighbors – E x 2 array, where E is the number of edges
Returns:: An E x V sparse matrix, where E is the number of edges and V the number of vertices

bayestme.utils.construct_trendfilter(adjacency_matrix, k, sparse=False)¶

Builds the k’th-order trend filtering matrix from an adjacency matrix. k=0 is the fused lasso / total variation matrix k=1 is the linear trend filtering / graph laplacian matrix k=2 is the quadratic trend filtering matrix etc.

Parameters:

adjacency_matrix – An adjacency matrix in first order discrete difference operator form.
k – Order of trend filtering
sparse – If true return a sparse matrix, otherwise return a dense np.ndarray

Returns:

Graph trend filtering matrix

bayestme.utils.filter_reads_to_top_n_genes(reads, n_gene)¶

bayestme.utils.get_edges(positions: numpy.array, layout: Layout) → numpy.array¶

Given a set of positions and plate layout, return adjacency edges.

Parameters:

positions – An N x 2 array of coordinates
layout – Plate layout enum

Returns:

An <N edges> x 2 array

bayestme.utils.get_knn_edges(positions: numpy.array, k: int) → numpy.array¶

bayestme.utils.get_kth_order_discrete_difference_operator(first_order_discrete_difference_operator, k)¶

Calculate the k-th order trend filtering matrix given a first order discrete difference operator of shape M x N.

Parameters:

first_order_discrete_difference_operator – Input first order discrete difference operator
k – Order of the output discrete difference operator

Returns:

If k is even, this returns an M x N size matrix, if k is odd, this returns an N x N size matrix

bayestme.utils.get_posmap(pos)¶

Return a matrix where the (i, j) entry stores the idx of the reads at spatial coordinates (i, j) on the tissue sample.

Parameters:: pos – A 2 x N matrix of coordinates.
Returns:: An X x Y matrix of coordinates, where X and Y are

the max coordinate values on their respective axes

bayestme.utils.get_regular_grid_edges(positions: numpy.array, degree: int, epsilon: float = 0.1) → numpy.array¶

Given a set of positions which are laid out in a regular grid based on a degree, return adjacency edges.

For example degree=4 would be a regular square grid, degree=6 would be a regular hex grid, etc.

Our algorithm is as follows: For each point, find the nearest degree points. If the distance to the nearest point is less than epsilon * min_dist, add an edge between the two points.

This should control for any points on the border of the grid getting spurious connections.

Parameters:

positions – An N x 2 array of coordinates
degree – The degree of the grid
epsilon – The epsilon parameter. Set higher if grid is more irregular.

Returns:

An <N edges> x 2 array of indices into positions

bayestme.utils.get_stddev_ordering(reads: numpy.ndarray)¶

bayestme.utils.get_top_gene_names_by_stddev(reads: numpy.ndarray, gene_names: numpy.array, n_genes=<class 'int'>)¶

bayestme.utils.ilogit(x)¶

bayestme.utils.is_first_order_discrete_difference_operator(a)¶

Test if a matrix is a discrete difference operator, meaning each row has one pair of (-1, 1) values representing adjacency in a graph structure.

Returns:: True if matrix is a first order discrete difference operator, False otherwise.

bayestme.utils.log_likelihood(attrributes, n_cells, Obs)¶

bayestme.utils.logp(beta, components, attr, obs)¶

bayestme.utils.multinomial_rvs(count, p, rng: numpy.random.Generator | None = None)¶

Sample from the multinomial distribution with multiple p vectors.

count must be an (n-1)-dimensional numpy array.

p must an n-dimensional numpy array, n >= 1. The last axis of p holds the sequence of probabilities for a multinomial distribution.

The return value has the same shape as p. Taken from: https://stackoverflow.com/questions/55818845/fast-vectorized-multinomial-in-python

Parameters:

count
p
rng

Returns:

bayestme.utils.order_reads_by_stddev(reads: numpy.ndarray)¶

bayestme.utils.sample_horseshoe_plus(size=1, rng: numpy.random.Generator | None = None)¶

bayestme.utils.stable_softmax(x, axis=-1)¶

bayestme package¶

Submodules¶

bayestme.bleeding_correction module¶

bayestme.cv_likelihoods module¶

bayestme.data module¶

bayestme.deconvolution module¶

bayestme.svi.deconvolution module¶

bayestme.expression_truth module¶

bayestme.gene_filtering module¶

bayestme.log_config module¶

bayestme.phenotype_selection module¶

bayestme.plot.common module¶

bayestme.plot.deconvolution module¶

bayestme.semi_synthetic_spatial module¶

bayestme.spatial_transcriptional_programs module¶

bayestme.synthetic_data module¶

bayestme.utils module¶

bayestme

Navigation

Related Topics