bayestme package¶

Submodules¶

bayestme.bleeding_correction module¶

Same as bleed_correction8.py except now we use separate basis functions for in-tissue and out-out-tissue. The hope is that this enables us to account for tissue friction which seems to be an issue.

bayestme.bleeding_correction.build_basis_indices(locations, tissue_mask)¶

Creates 8 sets of basis functions: north, south, east, west, for in- and out-tissue. Each basis is how far the 2nd element is from the first element.

Output is two matrices:

(basis index matrix, basis mask matrix)

The basis index matrix has the following pseudo-code definition

for location_i in locations:

for location_j in locations:

[number non_tissue spots north of location_i up until location_j,: number non_tissue spots south of location_i up until location_j, number non_tissue spots east of location_i up until location_j, number non_tissue spots west of location_i up until location_j, number tissue spots north of location_i up until location_j, number tissue spots south of location_i up until location_j, number tissue spots east of location_i up until location_j, number tissue spots west of location_i up until location_j]

The mask index has the following pseudo-code definition:

for location_i in locations:

for location_j in locations:

if location_i != location_j:

[true if location_i is >= location_j on first dimension else false,: true if location_i is < location_j on first dimension else false, true if location_i is >= location_j on second dimension else false, true if location_i is < location_j on second dimension else false, true if location_i is >= location_j on first dimension else false, true if location_i is < location_j on first dimension else false, true if location_i is >= location_j on second dimension else false, true if location_i is < location_j on second dimension else false]

else if location_i == location_j:

[False]*8

Parameters:

locations –
tissue_mask –

Returns:

(basis_idxs, basis_mask)

bayestme.bleeding_correction.calculate_pairwise_coordinate_differences(locations)¶

Calculate pairwise coordinate differences between all locations

For example:

[[0,0], [1,1], [2,2]] ->

[[[0,0], [1, 1], [2, 2]],: [[-1,-1], [0, 0], [1, 1]], [[-2,-2], [-1, -1], [0, 0]]]

Parameters:: locations – np.ndarray of shape (N, 2), where N is the

number of coordinate points :return: np.ndarray of shape (N, N, 2)

bayestme.bleeding_correction.clean_bleed(dataset: ~bayestme.data.SpatialExpressionDataset, n_top: int, local_weight: int | None = None, max_steps: int = 5) -> (<class 'bayestme.data.SpatialExpressionDataset'>, <class 'bayestme.data.BleedCorrectionResult'>)¶

Parameters:

dataset – SpatialExpressionDataset
n_top – Number of genes to use for bleed correction. Will use the top n genes by standard deviation for building basis functions.
local_weight – Tuning parameter (optional, a reasonable value will be chosen if not provided)
max_steps – Number of Expectation Maximization iterations to use.

Returns:

Tuple of (SpatialExpressionDataset, BleedCorrectionResult), the SpatialExpressionDataset returned will contain the bleed corrected read counts.

bayestme.bleeding_correction.create_top_n_gene_bleeding_plots(dataset: SpatialExpressionDataset, corrected_dataset: SpatialExpressionDataset, bleed_result: BleedCorrectionResult, output_dir: str, output_format: str = 'pdf', n_genes: int = 10)¶

bayestme.bleeding_correction.decontaminate_spots(Reads, tissue_mask, basis_idxs, basis_mask, n_top=10, rel_tol=0.0001, max_steps=5, local_weight=15, basis_init=None, Rates_init=None)¶

bayestme.bleeding_correction.fit_basis_functions(Reads, tissue_mask, Rates, global_rates, basis_idxs, basis_mask, lam=0, local_weight=100, x_init=None)¶

bayestme.bleeding_correction.fit_spot_rates(Reads, tissue_mask, Weights, x_init=None)¶

bayestme.bleeding_correction.get_suggested_initial_local_weight(dataset: SpatialExpressionDataset) → float¶

bayestme.bleeding_correction.has_non_tissue_spots(dataset: SpatialExpressionDataset) → bool¶

bayestme.bleeding_correction.imshow_matrix(reads, locations, fill=False)¶

bayestme.bleeding_correction.plot_basis_functions(basis_functions, output_dir)¶

bayestme.bleeding_correction.plot_before_after_cleanup(before_correction: SpatialExpressionDataset, after_correction: SpatialExpressionDataset, gene: str, output_dir: str, output_format: str = 'pdf', cmap=matplotlib.cm.jet)¶

bayestme.bleeding_correction.plot_bleed_vectors(stdata: SpatialExpressionDataset, bleed_result: BleedCorrectionResult, gene_name: str, output_path: str, colormap=matplotlib.cm.Set2_r)¶

bayestme.bleeding_correction.plot_bleeding(before_correction: SpatialExpressionDataset, after_correction: SpatialExpressionDataset, gene: str, output_path: str)¶: Plot the raw reads, effective reads, and bleeding of a given gene.

bayestme.bleeding_correction.rates_from_raw(x, tissue_mask, Reads_shape)¶

bayestme.bleeding_correction.select_local_weight(reads, tissue_mask, basis_idxs, basis_mask, min_weight=1, max_weight=100, n_weights=11, test_pct=0.2, n_top=10)¶

bayestme.bleeding_correction.softplus(x)¶

bayestme.bleeding_correction.tissue_mask_to_grid(tissue_mask, locations)¶

bayestme.bleeding_correction.weights_from_basis(basis_functions, basis_idxs, basis_mask, tissue_mask, local_weight)¶

bayestme.communities module¶

bayestme.communities.adjacency_matrix_from_edges(edges)¶

bayestme.communities.align_clusters(assignments_ref, assignments)¶

bayestme.communities.communities_from_posteriors(posteriors, edges, n_clusters=None, min_clusters=1, max_clusters=20, cluster_score=<function gaussian_aicc_bic_mixture>)¶

bayestme.communities.communities_from_posteriors_separate(posteriors, edges, n_clusters=None, min_clusters=1, max_clusters=7, cluster_score=<function gaussian_aicc_bic_mixture>)¶

bayestme.communities.exhaustive_best_permutation(source, target, n_clusters)¶

bayestme.communities.gaussian_aicc(posteriors, assignments, clusters)¶

bayestme.communities.gaussian_aicc_bic_mixture(posteriors, assignments, clusters, bic_proportion=0.5)¶

bayestme.communities.gaussian_bic(posteriors, assignments, clusters)¶

bayestme.communities.greedy_best_permutation(source, target, n_clusters)¶

bayestme.communities.swap_labels(labels, perm)¶

bayestme.communities.test_communities()¶

bayestme.cv_likelihoods module¶

bayestme.cv_likelihoods.get_best_lambda_value(likelihoods, best_n_components, lambda_array, k_vals)¶

bayestme.cv_likelihoods.get_max_likelihood_n_components(likelihoods, k_vals)¶

bayestme.cv_likelihoods.load_likelihoods(output_dir=None, output_files=None)¶

bayestme.cv_likelihoods.plot_cv_running(results_dir, results_files, out_path, output_format='pdf')¶

bayestme.cv_likelihoods.plot_likelihoods(likelihood_path, out_file, normalize=False)¶

bayestme.data module¶

class bayestme.data.BleedCorrectionResult(corrected_reads: numpy.ndarray, global_rates: numpy.ndarray, basis_functions: numpy.ndarray, weights: numpy.ndarray)¶

Bases: object

Data model for the results of bleeding correction.

classmethod read_h5(path)¶: Read this class from an h5 archive :param path: Path to h5 file. :return: SpatialExpressionDataset

save(path)¶

class bayestme.data.DeconvolutionResult(cell_prob_trace: numpy.ndarray, expression_trace: numpy.ndarray, beta_trace: numpy.ndarray, cell_num_trace: numpy.ndarray, reads_trace: numpy.ndarray, lam2: float, n_components: int)¶

Bases: object

Data model for the results of sampling from the deconvolution posterior distribution.

align_celltype(sc_expression, n=50)¶: reorder the deconvolution results, aligned the detected cell type with the given scrna rerference :param sc_expression: K by G matrix, K cell types and G genes :returen: the ordering of the deconvolution result that matches the given scref

property omega¶

Return a matrix of ω_kg from equation 6 of the preprint

Returns:: An <N cell types> x <N markers> floating point matrix.

property omega_difference¶

Return a matrix of average ratio of expression/ maximum expression for each marker in each component

This statistic represents the “overexpression” of a gene in a cell type, and is used for scaling the dot size in our marker gene plot.

Returns:: An <N cell types> x <N markers> floating point matrix.

classmethod read_h5(path)¶: Read this class from an h5 archive :param path: Path to h5 file. :return: SpatialExpressionDataset

property relative_expression¶

Return a matrix of average expression in this cell type, minus the max expression in all other cell types, divided by the maximum expression in all cell types. A higher number for this statistic represents a better candidate marker gene.

This statistic is used as a tiebreaker criteria for marker gene selection when omega_kg values are equal.

Returns:: An <N cell types> x <N markers> floating point matrix.

save(path)¶

class bayestme.data.Layout(value)¶

Bases: Enum

An enumeration.

HEX = 1¶

SQUARE = 2¶

class bayestme.data.PhenotypeSelectionResult(mask: numpy.ndarray, cell_prob_trace: numpy.ndarray, expression_trace: numpy.ndarray, beta_trace: numpy.ndarray, cell_num_trace: numpy.ndarray, log_lh_train_trace: numpy.ndarray, log_lh_test_trace: numpy.ndarray, n_components: int, lam: float, fold_number: int)¶

Bases: object

Data model for the results of one job in phenotype selection k-fold cross validation

classmethod read_h5(path)¶: Read this class from an h5 archive :param path: Path to h5 file. :return: SpatialExpressionDataset

save(path)¶

class bayestme.data.SpatialDifferentialExpressionResult(w_samples: numpy.ndarray, c_samples: numpy.ndarray, gamma_samples: numpy.ndarray, h_samples: numpy.ndarray, v_samples: numpy.ndarray, theta_samples: numpy.ndarray)¶

Bases: object

Data model for results from sampling from the spatial differential expression posterior distribution.

property n_components¶

property n_spatial_patterns¶

classmethod read_h5(path)¶: Read this class from an h5 archive :param path: Path to h5 file. :return: SpatialExpressionDataset

save(path)¶

class bayestme.data.SpatialDifferentialExpressionSamplerState(pickled_bit_generator: bytes, n_cell_types: int, n_nodes: int, n_signals: int, n_spatial_patterns: int, lam2: float, edges: numpy.ndarray, alpha: numpy.ndarray, W: numpy.ndarray, Gamma: numpy.ndarray, H: numpy.ndarray, C: numpy.ndarray, V: numpy.ndarray, Theta: numpy.ndarray, Omegas: numpy.ndarray, prior_vars: numpy.ndarray, Delta: scipy.sparse.csc.csc_matrix, DeltaT: scipy.sparse.csc.csc_matrix, Tau2: numpy.ndarray, Tau2_a: numpy.ndarray, Tau2_b: numpy.ndarray, Tau2_c: numpy.ndarray, Sigma0_inv: scipy.sparse.csc.csc_matrix, Cov_mats: numpy.ndarray)¶

Bases: object

Data model for internal SDE gibbs sampler state

classmethod read_h5(path)¶: Read this class from an h5 archive :param path: Path to h5 file. :return: SpatialExpressionDataset

save(path)¶

class bayestme.data.SpatialExpressionDataset(adata: anndata.AnnData)¶

Bases: object

Data model for holding read counts, their associated position information, and whether they come from tissue or non tissue spots. Also holds the names of the gene markers in the dataset.

property cell_type_counts: numpy.ndarray | None¶

property cell_type_probabilities: numpy.ndarray | None¶

property edges: numpy.ndarray¶

classmethod from_arrays(raw_counts: numpy.ndarray, positions: numpy.ndarray | None, tissue_mask: numpy.ndarray | None, gene_names: numpy.ndarray, layout: Layout, barcodes: numpy.array | None = None)¶

Construct SpatialExpressionDataset directly from numpy arrays.

Parameters:

raw_counts – An <N spots> x <N markers> matrix.
positions – An <N spots> x 2 matrix of spot coordinates.
tissue_mask – An <N spot> length array of booleans. True if spot is in tissue, False if not.
gene_names – An <M markers> length array of gene names.
layout – Layout.SQUARE of the spots are in a square grid layout, Layout.HEX if the spots are
barcodes – List of UMI barcodes

in a hex grid layout.

property gene_names: numpy.array¶

property layout: Layout¶

property marker_gene_indices: List[numpy.ndarray] | None¶

property marker_gene_names: List[numpy.ndarray] | None¶

property n_cell_types: int | None¶

property n_gene: int¶

property n_spot: int¶

property n_spot_in: int¶

property omega_difference: numpy.ndarray | None¶

property positions: numpy.ndarray¶

property positions_tissue: numpy.ndarray¶

property raw_counts: numpy.ndarray¶

classmethod read_count_mat(data_path, layout=Layout.SQUARE)¶

Load data from tsv count matrix containing only in-tissue spots where the count matrix is a tsv file of shape G by N The column names and row names are position and gene names respectively

Parameters:

data_path – /path/to/count_matrix
layout – Layout.SQUARE of the spots are in a square grid layout, Layout.HEX if the spots are

in a hex grid layout. :return: SpatialExpressionDataset

classmethod read_h5(path)¶: Read this class from an h5 archive :param path: Path to h5 file. :return: SpatialExpressionDataset

classmethod read_legacy_h5(path)¶

classmethod read_spaceranger(data_path, layout=Layout.HEX)¶

Load data from spaceranger /outputs folder

Parameters:

data_path – Directory containing at least 1) /raw_feature_bc_matrix for raw count matrix 2) /filtered_feature_bc_matrix for filtered count matrix 3) /spatial for position list
layout – Layout.SQUARE of the spots are in a square grid layout, Layout.HEX if the spots are

in a hex grid layout. :return: SpatialExpressionDataset

property reads: numpy.ndarray¶

save(path)¶

property tissue_mask: numpy.array¶

bayestme.data.create_anndata_object(counts: numpy.ndarray, coordinates: numpy.ndarray | None, tissue_mask: numpy.ndarray | None, gene_names: numpy.ndarray, layout: Layout, barcodes: numpy.array | None = None)¶

Create an AnnData object from spatial expression data.

Parameters:

counts – N x G read count matrix
coordinates – N x 2 coordinate matrix
tissue_mask – N length boolean array indicating in-tissue or out of tissue
gene_names – N length string array of gene names
layout – Layout enum
barcodes – List of UMI barcodes

Returns:

AnnData object containing all information provided.

bayestme.data.is_csv(fn: str)¶

bayestme.data.is_csv_tsv(fn: str)¶

bayestme.data.is_tsv(fn: str)¶

bayestme.deconvolution module¶

class bayestme.deconvolution.MarkerGeneMethod(value)¶

Bases: Enum

An enumeration.

FALSE_DISCOVERY_RATE = 'FALSE_DISCOVERY_RATE'¶

TIGHT = 'TIGHT'¶

bayestme.deconvolution.add_deconvolution_results_to_dataset(stdata: SpatialExpressionDataset, result: DeconvolutionResult)¶

Modify stdata in-place to annotate it with selected marker genes

Parameters:

stdata – data.SpatialExpressionDataset to modify
result – data.DeconvolutionResult to use

bayestme.deconvolution.add_marker_gene_results_to_dataset(stdata: SpatialExpressionDataset, marker_genes: List[numpy.ndarray])¶

Modify stdata in place to to annotate it with marker gene selection results.

Parameters:

stdata – data.SpatialExpressionDataset to modify
marker_genes – Selected marker genes to add to dataset

bayestme.deconvolution.create_top_gene_lists(stdata: SpatialExpressionDataset, deconvolution_result: DeconvolutionResult, output_path: str, n_marker_genes: int = 5, alpha: float = 0.05, marker_gene_method: MarkerGeneMethod = MarkerGeneMethod.TIGHT, cell_type_names=None)¶

bayestme.deconvolution.deconvolve(reads, edges, n_gene=None, n_components=None, lam2=None, n_samples=100, n_burnin=1000, n_thin=10, bkg=False, lda=False, n_max=120, D=30, expression_truth=None, rng: numpy.random.Generator | None = None) → DeconvolutionResult¶

Run deconvolution

Parameters:

reads – Read count matrix
edges – Spot adjacency matrix
n_gene – Number of gene markers
n_components – Number of components or cell types
lam2 – Lambda smoothing parameter
n_samples – Number of total samples from the posterior distribution
n_burnin – Number of burn in samples before samples are saved
n_thin – Proportion of samples to save
bkg –
lda – If true use LDA initialization
expression_truth – If provided, use ground truth per cell type relative expression values,

output from companion scRNA fine mapping. :param rng: Numpy random generator to use :return: data.DeconvolutionResult

bayestme.deconvolution.plot_cell_num(stdata: SpatialExpressionDataset, output_dir: str, output_format: str = 'pdf', cmap=matplotlib.cm.jet, seperate_pdf: bool = False, cell_type_names: List[str] | None = None)¶

bayestme.deconvolution.plot_cell_num_scatterpie(stdata: SpatialExpressionDataset, output_path: str, cell_type_names: List[str] | None = None)¶

Create a “scatter pie” plot of the deconvolution cell counts.

Parameters:

stdata – SpatialExpressionDataset to plot
output_path – Where to save plot
cell_type_names – Cell type names to use in plot, an array of length n_components

bayestme.deconvolution.plot_cell_prob(stdata: SpatialExpressionDataset, output_dir: str, output_format: str = 'pdf', cmap=matplotlib.cm.jet, seperate_pdf: bool = False, cell_type_names: List[str] | None = None)¶

bayestme.deconvolution.plot_deconvolution(stdata: SpatialExpressionDataset, output_dir: str, output_format: str = 'pdf', cell_type_names: List[str] | None = None)¶

Create a suite of plots for deconvolution results.

Parameters:

stdata – SpatialExpressionDataset to plot
output_dir – Output directory where plots will be saved
output_format – File format of plots
cell_type_names – Cell type names to use in plot, an array of length n_components

bayestme.deconvolution.plot_marker_genes(stdata: SpatialExpressionDataset, output_file: str, cell_type_labels: List[str] | None = None, colormap: matplotlib.cm.ScalarMappable = matplotlib.cm.BuPu)¶

bayestme.deconvolution.select_marker_genes(deconvolution_result: DeconvolutionResult, n_marker: int = 5, alpha: float = 0.05, method: MarkerGeneMethod = MarkerGeneMethod.TIGHT) → List[numpy.ndarray]¶

Returns a list of length <N components>, where the values are indices of the marker genes in the gene name array for that component.

Parameters:

deconvolution_result – DeconvolutionResult object
n_marker – Number of markers per cell type to select
alpha – Marker gene threshold parameter, defaults to 0.05
method – Enum representing which marker gene selection method to use.

Returns:

List of arrays of gene indices

bayestme.expression_truth module¶

bayestme.expression_truth.combine_multiple_expression_truth(expression_truth_arrays: List[numpy.array], num_warmup=200, num_samples=200)¶

bayestme.expression_truth.dirichlet_alpha_model(expression_truth, N=None, J=None)¶

bayestme.expression_truth.fit_alpha_for_multiple_samples(data, num_warmup=200, num_samples=200)¶

bayestme.expression_truth.load_expression_truth(stdata: SpatialExpressionDataset, seurat_output: str)¶

Load outputs from seurat fine mapping to be used in deconvolution.

Parameters:

stdata – SpatialExpressionDataset object
seurat_output – CSV output from seurat fine mapping workflow

Returns:

Tuple of n_components x n_genes size array, representing relative

expression of each gene in each cell type

bayestme.fast_multivariate_normal module¶

bayestme.fast_multivariate_normal.sample_multivariate_normal(Q, mu=None, mu_part=None, sparse=True, precision=False, chol_factor=False, Q_shape=None, **kwargs)¶

Fast sampling from a multivariate normal with covariance or precision parameterization. Supports sparse arrays.

Parameters:

Q – covariance or precision matrix
mu – If provided, assumes the model is N(mu, Q)
mu_part – If provided, assumes the model is N(Q mu_part, Q)
sparse – If true, assumes we are working with a sparse Q
precision – If true, assumes Q is a precision matrix (inverse covariance)
chol_factor – If true, assumes Q is a (lower triangular) Cholesky decomposition of the covariance matrix (or of the precision matrix if precision=True).
Q_shape –
kwargs –

Returns:

One sample from the multivariate normal

bayestme.fast_multivariate_normal.sample_multivariate_normal_from_covariance(Q, mu=None, mu_part=None, sparse=True, chol_factor=False, force_psd=False, force_psd_eps=1e-06, force_psd_attempts=4, rng: numpy.random.Generator | None = None)¶

Fast sampling from a multivariate normal with covariance parameterization. Supports sparse arrays.

Parameters:

Q – Covariance matrix
mu – If provided, assumes the model is N(mu, Q)
mu_part – If provided, assumes the model is N(Q mu_part, Q)
sparse – If true, assumes we are working with a sparse Q
chol_factor – If true, assumes Q is a (lower triangular) Cholesky decomposition of the covariance matrix
force_psd – If true, attempts to force the covariance-matrix to be positive definite adding a diagonal term
force_psd_eps – If force_psd is true, force_psd_eps is the frist value added to the diagonal to force the covariance matrix to be positive definite.l
force_psd_attempts – If force_psd is true, this is the number of attempts to force the covariance matrix to be positive definite. Each attempt a diagonal term that is 10 times larger than the previous one is added.
rng – numpy.random.Generator to use

Returns:

One sample from the multivariate normal distribution

bayestme.fast_multivariate_normal.sample_multivariate_normal_from_precision(Q, mu=None, mu_part=None, sparse=True, chol_factor=False, Q_shape=None, force_psd=False, force_psd_eps=1e-06, force_psd_attempts=4, rng: numpy.random.Generator | None = None)¶

Fast sampling from a multivariate normal with precision parameterization. Supports sparse arrays.

Parameters:

Q – Precision matrix
mu – If provided, assumes the model is N(mu, Q^-1)
mu_part – If provided, assumes the model is N(Q^-1 mu_part, Q^-1)
sparse – If true, assumes we are working with a sparse Q
chol_factor – If true, assumes Q is a (lower triangular) Cholesky decomposition of the precision matrix
Q_shape –
force_psd – If true, attempts to force the precision matrix to be positive definite adding a diagonal term.
force_psd_eps – If force_psd is true, force_psd_eps is the frist value added to the diagonal to force the precision matrix to be positive definite.
force_psd_attempts – If force_psd is true, this is the number of attempts to force the precision matrix to be positive definite. Each attempt a diagonal term that is 10 times larger than the previous one is added.
rng – numpy.random.Generator to use

Returns:

One sample from the multivariate normal distribution

bayestme.gene_filtering module¶

class bayestme.gene_filtering.FilterType(value)¶

Bases: Enum

An enumeration.

RIBOSOME = 2¶

SPOTS = 1¶

bayestme.gene_filtering.filter_genes_by_spot_threshold(dataset: SpatialExpressionDataset, spot_threshold: float)¶

bayestme.gene_filtering.filter_list_of_genes(dataset: SpatialExpressionDataset, genes_to_remove)¶

bayestme.gene_filtering.filter_ribosome_genes(dataset: SpatialExpressionDataset)¶

bayestme.gene_filtering.filter_stdata_to_match_expression_truth(stdata: SpatialExpressionDataset, seurat_output: str)¶

Filter the stdata down to the intersection of genes between it and the expression truth file.

Parameters:

stdata – SpatialExpressionDataset object
seurat_output – CSV output from seurat fine mapping workflow

Returns:

Filtered stdata object

bayestme.gene_filtering.select_top_genes_by_standard_deviation(dataset: SpatialExpressionDataset, n_gene: int) → SpatialExpressionDataset¶

bayestme.gfbt_multinomial module¶

class bayestme.gfbt_multinomial.GraphFusedBinomialTree(n_classes, edges, trend_order=0, lam2=1e-05, pg_seed=42, stability=1e-06, rng: numpy.random.Generator | None = None)¶

Bases: object

calculate_probs()¶: Converts from a binomial tree representation to a multinomial representation.

counts_from_outcomes(outcomes)¶

resample(outcomes)¶

bayestme.hmm_fast module¶

class bayestme.hmm_fast.HMM(n_components, n_max=120, rng: numpy.random.Generator | None = None)¶

Bases: object

backward_sample()¶

ffbs(Obs, start_prob, LogTransition, expression)¶

forward_filter(Obs)¶

ship_up(mat)¶

bayestme.hmm_fast.emission_fast(reads, log_cell, expression, cell_grid)¶

bayestme.hmm_fast.transition_mat_vec(phi, n_max, ifsigma=False)¶

bayestme.log_config module¶

bayestme.log_config.add_logging_args(parser)¶

bayestme.log_config.configure_logging(args)¶

bayestme.model_bkg module¶

class bayestme.model_bkg.GraphFusedMultinomial(n_components, edges, observations, n_gene=300, n_max=120, background_noise=False, mask=None, c=4, D=30, lam_psi=0.01, lda_initialization=False, truth_expression=None, truth_prob=None, truth_cellnum=None, truth_reads=None, truth_beta=None, rng: numpy.random.Generator | None = None)¶

Bases: object

load_model(load_dir='')¶

sample(Obs)¶

sample_beta()¶: sample expected cell-type-wise total cellular gene expression

sample_cell_num()¶: sample the cell-type-wise cell count cell_num[i] = [D_i, d_i1, d_i2, …, d_iK], where d_i1 + d_i2 + … + d_iK = D_i D_i total cell number in spot i d_ik cell number of cell-type k in spot i

sample_phi()¶: sample cell-type-wise gene expression profile, phi_kg

sample_probs()¶: sample cell-type probability psi_ik with spatial smoothing

sample_reads(Observations)¶: sample cell-type-wise reads of each gene at each spot, R_igk reads: N*G*K R_igk Observation: N*G R_ig observed data, gene reads at each spot assignment_probs: N*G*K xi_igk multinational prob cell_num: N*K d_ik cell-type-wise cell count in each spot, d_ik, N*K betas: K beta_k expected cell-type-wise total gene expression of individual cells

save_model(save_dir='')¶

bayestme.model_bkg.transition_mat(phi, n_max, coeff, ifsigma=False)¶

bayestme.phenotype_selection module¶

bayestme.phenotype_selection.create_folds(stdata: SpatialExpressionDataset, n_fold=5, n_splits=15, n_neighbours=None)¶

bayestme.phenotype_selection.get_n_neighbors(stdata: SpatialExpressionDataset)¶: Return a numpy array of size stdata.n_spot_in representing the number of neighbors for each spot.

bayestme.phenotype_selection.get_phenotype_selection_parameters_for_folds(stdata: SpatialExpressionDataset, n_fold: int, n_splits: int, lams: Iterable[int], n_components_min: int, n_components_max: int)¶

bayestme.phenotype_selection.plot_folds(stdata, folds, output_dir: str)¶

bayestme.phenotype_selection.run_phenotype_selection_single_job(lam: float, n_components: int, mask: numpy.ndarray, fold_number: int, stdata: SpatialExpressionDataset, n_samples: int, n_burn: int, n_thin: int, max_ncell: int, n_gene: int, background_noise: bool, lda_initialization: bool) → PhenotypeSelectionResult¶

bayestme.phenotype_selection.sample_graph_fused_multinomial(train: numpy.ndarray, test: numpy.ndarray, n_components, edges: numpy.ndarray, n_gene: int, lam_psi: float, background_noise: bool, lda_initialization: bool, mask: numpy.ndarray, n_max: int, n_samples: int, n_thin: int, n_burn: int, rng: numpy.random.Generator | None = None)¶

bayestme.plotting module¶

bayestme.plotting.get_wedge_dimensions_from_value_array(value_array: numpy.array)¶: Get a series of N (start, stop) pairs in degrees of the pie chart defined by the data in the length N value_array :param value_array: np.array of length N :return: list of 2-tuples of length N.

bayestme.plotting.get_x_y_arrays_for_layout(coords: numpy.ndarray, layout: Layout) → Tuple[numpy.array, numpy.array]¶

bayestme.plotting.plot_colored_spatial_polygon(fig: matplotlib.figure.Figure, ax: matplotlib.axes.Axes, coords: numpy.ndarray, values: numpy.ndarray, layout: Layout, colormap: matplotlib.cm.ScalarMappable = matplotlib.cm.BuPu, norm=None, plotting_coordinates=None, normalize=True)¶

Basic plot of spatial gene expression

Parameters:

fig – matplotlib figure artist object to which plot will be written
ax – matplotlib axes artist object to which plot will be written
coords – np.ndarray of int, shape of (N, 2)
values – np.ndarray of int, shape of (N,)
layout – Layout enum
colormap – Colormap for converting values to colors, defaults to BuPu
norm – Function for normalizing scalar values, defaults to Normalizer over values domain
plotting_coordinates – Expand the plotting window to include these coordinates, default is to just plot over coords.
normalize – Whether to normalize values before coloring them or not. Set false for boolean data.

Returns:

matplotlib Figure object

bayestme.plotting.plot_gene_in_tissue_counts(stdata: SpatialExpressionDataset, gene: str)¶

bayestme.plotting.plot_gene_raw_counts(stdata: SpatialExpressionDataset, gene: str, output_file: str)¶

bayestme.plotting.plot_spatial_pie_charts(fig: matplotlib.figure.Figure, ax: matplotlib.axes.Axes, coords: numpy.ndarray, values: numpy.ndarray, layout: Layout, colormap: matplotlib.cm.ScalarMappable = matplotlib.colors.ListedColormap, plotting_coordinates=None, cell_type_names=None)¶

Plot pie charts to show relative proportions of multiple scalar values at each spot

Parameters:

fig – matplotlib figure artist object to which plot will be written
ax – matplotlib axes artist object to which plot will be written
coords – np.ndarray of int, shape of (N, 2)
values – np.ndarray of int, shape of (N, M)
layout – Layout enum
colormap – Colormap for the pie chart wedges, defaults to Glasbey30
plotting_coordinates – Expand the plotting window to include these coordinates, default is to just plot over coords.
cell_type_names – A array of length n_components, which provides cell type names.

Returns:

matplotlib Figure object

bayestme.semi_synthetic_spatial module¶

bayestme.semi_synthetic_spatial.generate_semi_synthetic(adata: anndata.AnnData, cluster_id_column: str, tissue_positions: numpy.ndarray, n_genes: int, cell_num=None, canvas_size=(36, 36), sq_size=4, layout=None, random_seed=None, n_spatial_gene=50, alpha=1, w=None, spatial_gene=None, spatial_cell_type=None, spatial_programs=None, verbose=True)¶

Generate a synthetic ST dataset using a pre-clustered scRNA dataset.

Parameters:

adata – scRNA dataset
cluster_id_column – Column in adata.obs that contains cluster/cell type classification
tissue_positions – coordinates of tissue points in slide
n_genes – Number of genes to use
cell_num –
canvas_size –
sq_size –
layout –
random_seed –
n_spatial_gene –
alpha –
w –
spatial_gene –
spatial_cell_type –
spatial_programs –
verbose –

Returns:

data.SpatialExpressionDataset containing simulated ST data

bayestme.semi_synthetic_spatial.get_rank(array)¶

bayestme.spatial_expression module¶

class bayestme.spatial_expression.SpatialDifferentialExpression(n_cell_types: int, n_spatial_patterns: int, n_nodes: int, n_signals: int, edges: numpy.ndarray, alpha: numpy.ndarray, prior_vars: numpy.ndarray, lam2: float = 1, rng: numpy.random.Generator | None = None)¶

Bases: object

checkpoint_variable_state()¶

clear_checkpoint()¶

constant_state = ['n_cell_types', 'n_nodes', 'n_signals', 'n_spatial_patterns', 'lam2', 'edges', 'prior_vars', 'alpha']¶

get_state() → SpatialDifferentialExpressionSamplerState¶

initialize()¶

classmethod load_from_state(sampler_state: SpatialDifferentialExpressionSamplerState)¶

reset_to_checkpoint()¶

sample(n_obs, Y_igk, cell_type_filter)¶

sample_pattern_probs()¶

sample_pg(rates, Y_igk)¶

sample_sigmainv(stability=1e-06, lam2=1)¶

sample_spatial_assignments(n_obs, Y_igk)¶

sample_spatial_patterns(n_obs, Y_igk, cell_type_filter)¶

sample_spatial_weights(n_obs, Y_igk)¶

spatial_detection(cell_num_trace, beta_trace, expression_trace, reads_trace, n_samples=100, n_burn=100, n_thin=5, ncell_min=5, simple=False)¶

variable_state = ['W', 'Gamma', 'H', 'C', 'V', 'Theta', 'Omegas', 'Delta', 'DeltaT', 'Tau2', 'Tau2_a', 'Tau2_b', 'Tau2_c', 'Sigma0_inv', 'Cov_mats']¶

bayestme.spatial_expression.calculate_spatial_genes(sde_result: SpatialDifferentialExpressionResult, h_threshold=0.95, magnitude_filter=None) → numpy.array¶

bayestme.spatial_expression.filter_disconnected_points(edges, data)¶

bayestme.spatial_expression.filter_pseudogenes_from_selection(gene_id_selection: numpy.array, gene_names: numpy.array)¶

Given an array indexing genes from gene_names, remove any selection of a pseudogene.

Parameters:

gene_id_selection – Array indexing into gene_names
gene_names – Array of string, gene names

Returns:

bayestme.spatial_expression.get_n_cell_correlation(n_cell_filter: numpy.array, w_pattern: numpy.array)¶

Return perason’s R between two arrays

Parameters:

n_cell_filter – np.array
w_pattern – np.array

Returns:

float

bayestme.spatial_expression.get_proportion_of_spots_in_k_with_pattern_h_per_gene(h_samples: numpy.ndarray, k: int, h: int) → numpy.array¶

Parameters:

h_samples – h_samples from SpatialDifferentialExpressionResult
k – cell type index
h – spatial pattern index

Returns:

float

bayestme.spatial_expression.moran_i(edges: numpy.ndarray, data: numpy.array, two_tailed=False) → float¶

Calculate Moran’s I (spatial auto-correlation statistic)

Parameters:

edges – N x 2 np.ndarray, where N is the number of edges. Values in the edges matrix represent indices into data.
data – np.array of scalar values
two_tailed – If true, return 2-tailed p-value. Default is False.

Returns:

float between 0 and 1

bayestme.spatial_expression.plot_significant_spatial_patterns(stdata: SpatialExpressionDataset, decon_result: DeconvolutionResult, sde_result: SpatialDifferentialExpressionResult, output_dir, output_format: str = 'pdf', cell_type_names: List[str] | None = None)¶

bayestme.spatial_expression.plot_spatial_pattern(fig: matplotlib.figure.Figure, ax: matplotlib.axes.Axes, stdata: SpatialExpressionDataset, decon_result: DeconvolutionResult, sde_result: SpatialDifferentialExpressionResult, gene_ids: numpy.array, k: int, h: int, colormap, plot_threshold: int = 2)¶

bayestme.spatial_expression.plot_spatial_pattern_and_all_constituent_genes(stdata: SpatialExpressionDataset, decon_result: DeconvolutionResult, sde_result: SpatialDifferentialExpressionResult, gene_ids: numpy.array, k: int, h: int, program_id: int, output_dir: str, output_format: str, colormap=matplotlib.cm.coolwarm, plot_threshold: int = 2, cell_type_name: str | None = None)¶

bayestme.spatial_expression.plot_spatial_pattern_legend(fig: matplotlib.figure.Figure, ax: matplotlib.axes.Axes, stdata: SpatialExpressionDataset, sde_result: SpatialDifferentialExpressionResult, gene_ids: numpy.array, k: int, colormap, max_genes_to_plot: int = 8)¶

bayestme.spatial_expression.plot_spatial_pattern_with_legend(stdata: SpatialExpressionDataset, decon_result: DeconvolutionResult, sde_result: SpatialDifferentialExpressionResult, gene_ids: numpy.array, k: int, h: int, output_file: str, colormap=matplotlib.cm.coolwarm, plot_threshold: int = 2)¶

bayestme.spatial_expression.run_spatial_expression(sde: SpatialDifferentialExpression, deconvolve_results: DeconvolutionResult, n_samples: int, n_burn: int, n_thin: int, n_cell_min: int, simple=True) → SpatialDifferentialExpressionResult¶

bayestme.spatial_expression.select_significant_spatial_programs(stdata: SpatialExpressionDataset, decon_result: DeconvolutionResult, sde_result: SpatialDifferentialExpressionResult, tissue_threshold: int = 5, cell_correlation_threshold: float = 0.5, moran_i_score_threshold: float = 0.9, gene_spatial_pattern_proportion_threshold: float = 0.95, filter_pseudogenes: bool = False)¶

Filter significant combinations of cell type and spatial expression patterns. This methodology aims to filter out programs that capture technical noise and over-dispersion rather than meaningful spatial signal.

Parameters:

stdata – spatial expression dataset
decon_result – deconvolution result
sde_result – spatial differential expression result
cell_correlation_threshold – Threshold for cell correlation metric
moran_i_score_threshold – Moran’s I score cutoff,

spatial programs with scores below this will be filtered out :param gene_spatial_pattern_proportion_threshold: Only return (k, h) pairs where in cell types k greater than this proportion of spots are labeled with spatial pattern h for at least one gene. :param tissue_threshold: Only consider spots with greater than this many cells of type k for Moran’s I calculation and cell correlation calculation :param filter_pseudogenes: Do not consider pseudogenes. :return: generator of (cell type index, spatial pattern index, np.array of gene indices)

bayestme.synthetic_data module¶

bayestme.synthetic_data.generate_demo_dataset()¶: Generate a fake dataset for use in testing or demonstration. This dataset should produce two cell types, with 2 markers genes for each cell type.

bayestme.synthetic_data.generate_fake_stdataset(n_rows: int = 30, n_cols: int = 30, n_genes: int = 20, layout: Layout = Layout.SQUARE) → SpatialExpressionDataset¶

Create a fake dataset for use in testing or demonstration.

Parameters:

n_rows – width of the fake slide
n_cols – height of the fake slide
n_genes – number of marker genes
layout – layout of spots on the fake slide

Returns:

SpatialExpressionDataset object containing simulated data

bayestme.synthetic_data.generate_simulated_bleeding_reads_data(n_rows=30, n_cols=30, n_genes=20, spot_bleed_prob=0.5, length_scale=0.2, gene_bandwidth=1, bleeding='anisotropic')¶

Generate simulated read data with modeled bleeding.

Parameters:

n_rows – Number of spot rows
n_cols – Number of spot columns
n_genes – Number of genes in output reads dataset
spot_bleed_prob – Tuning param
length_scale – Tuning param
gene_bandwidth – Tuning param
bleeding – Type of bleeding

Returns:

(locations, tissue_mask, true_rates, true_counts, bleed_counts)

bayestme.utils module¶

bayestme.utils.construct_composite_trendfilter(adjacency_matrix, k, anchor=0, sparse=False)¶

Build the k^1 through k^n trendfilter matrices stacked on top of each other in order to penalize multiple k values

Parameters:

adjacency_matrix – An adjacency matrix in first order discrete difference operator form.
k – Maximum order of trend filtering.
anchor – Node index to set to 1 in the first row of the resulting matrix
sparse – If true return a sparse matrix.

Returns:

Composite trendfilter matrix.

bayestme.utils.construct_edge_adjacency(neighbors)¶

Build the oriented edge-adjacency matrix in “discrete difference operator” form an interable of (v1, v2) tuples representing edges.

Parameters:: neighbors – E x 2 array, where E is the number of edges
Returns:: An E x V sparse matrix, where E is the number of edges and V the number of vertices

bayestme.utils.construct_trendfilter(adjacency_matrix, k, sparse=False)¶

Builds the k’th-order trend filtering matrix from an adjacency matrix. k=0 is the fused lasso / total variation matrix k=1 is the linear trend filtering / graph laplacian matrix k=2 is the quadratic trend filtering matrix etc.

Parameters:

adjacency_matrix – An adjacency matrix in first order discrete difference operator form.
k – Order of trend filtering
sparse – If true return a sparse matrix, otherwise return a dense np.ndarray

Returns:

Graph trend filtering matrix

bayestme.utils.filter_reads_to_top_n_genes(reads, n_gene)¶

bayestme.utils.get_edges(pos, layout=1) → numpy.ndarray¶

Given a set of positions and plate layout, return adjacency edges.

Parameters:

pos – An N x 2 array of coordinates
layout – Plate layout enum

Returns:

An <N edges> x 2 array

bayestme.utils.get_kth_order_discrete_difference_operator(first_order_discrete_difference_operator, k)¶

Calculate the k-th order trend filtering matrix given a first order discrete difference operator of shape M x N.

Parameters:

first_order_discrete_difference_operator – Input first order discrete difference operator
k – Order of the output discrete difference operator

Returns:

If k is even, this returns an M x N size matrix, if k is odd, this returns an N x N size matrix

bayestme.utils.get_posmap(pos)¶

Return a matrix where the (i, j) entry stores the idx of the reads at spatial coordinates (i, j) on the tissue sample.

Parameters:: pos – A 2 x N matrix of coordinates.
Returns:: An X x Y matrix of coordinates, where X and Y are

the max coordinate values on their respective axes

bayestme.utils.get_stddev_ordering(reads: numpy.ndarray)¶

bayestme.utils.get_top_gene_names_by_stddev(reads: numpy.ndarray, gene_names: numpy.array, n_genes=<class 'int'>)¶

bayestme.utils.ilogit(x)¶

bayestme.utils.is_first_order_discrete_difference_operator(a)¶

Test if a matrix is a discrete difference operator, meaning each row has one pair of (-1, 1) values representing adjacency in a graph structure.

Returns:: True if matrix is a first order discrete difference operator, False otherwise.

bayestme.utils.log_likelihood(attrributes, n_cells, Obs)¶

bayestme.utils.logp(beta, components, attr, obs)¶

bayestme.utils.multinomial_rvs(count, p, rng: numpy.random.Generator | None = None)¶

Sample from the multinomial distribution with multiple p vectors.

count must be an (n-1)-dimensional numpy array.

p must an n-dimensional numpy array, n >= 1. The last axis of p holds the sequence of probabilities for a multinomial distribution.

The return value has the same shape as p. Taken from: https://stackoverflow.com/questions/55818845/fast-vectorized-multinomial-in-python

Parameters:

count –
p –
rng –

Returns:

bayestme.utils.order_reads_by_stddev(reads: numpy.ndarray)¶

bayestme.utils.sample_horseshoe_plus(size=1, rng: numpy.random.Generator | None = None)¶

bayestme.utils.stable_softmax(x, axis=-1)¶

bayestme package¶

Submodules¶

bayestme.bleeding_correction module¶

bayestme.communities module¶

bayestme.cv_likelihoods module¶

bayestme.data module¶

bayestme.deconvolution module¶

bayestme.expression_truth module¶

bayestme.fast_multivariate_normal module¶

bayestme.gene_filtering module¶

bayestme.gfbt_multinomial module¶

bayestme.hmm_fast module¶

bayestme.log_config module¶

bayestme.model_bkg module¶

bayestme.phenotype_selection module¶

bayestme.plotting module¶

bayestme.semi_synthetic_spatial module¶

bayestme.spatial_expression module¶

bayestme.synthetic_data module¶

bayestme.utils module¶

bayestme

Navigation

Related Topics