patpy.tl.GroupedPseudobulk

class patpy.tl.GroupedPseudobulk(sample_key, cell_group_key, layer='X_pca', seed=67)#: Baseline method in which the distance between two samples is the average distance between their per-cell-group pseudobulks

Attributes table#

DISTANCES_UNS_KEY

Methods table#

`calculate_distance_matrix`([force, ...])	Calculate distances between samples as average distance between per cell-type pseudobulks
`embed`([method, n_jobs, verbose])	Convert distances to embedding of the samples
`evaluate_representation`(target[, method, ...])	Evaluate representation of `target` for the given distance matrix
`plot_clustermap`([metadata_cols, figsize])	Plot a clustered heatmap of distances
`plot_embedding`([method, metadata_cols, ...])	Plot embedding of samples colored by `metadata_cols`
`plot_metadata_distribution`(metadata_columns, ...)	Predict metadata columns, and plot embeddings colorised by metadata values
`predict_metadata`(target[, metadata, ...])	Predict classes from metadata column `target` for samples using K-Nearest Neighbors classifier
`prepare_anndata`(adata)	fit-like method: prepare adata for the analysis
`to_adata`([metadata])	Convert samples data to AnnData object

Attributes#

GroupedPseudobulk.DISTANCES_UNS_KEY = 'X_ct_pseudobulk_distances'#

Methods#

GroupedPseudobulk.calculate_distance_matrix(force=False, aggregate='mean', dist='euclidean')#: Calculate distances between samples as average distance between per cell-type pseudobulks
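As a rough illustration of what this distance means, the sketch below computes it with NumPy on a toy matrix. It assumes mean aggregation and Euclidean distance (the defaults in the signature) and assumes that a cell group missing from either sample is skipped for that pair. The function name `grouped_pseudobulk_distances` and the missing-group handling are illustrative choices, not patpy's implementation.

```python
import numpy as np


def grouped_pseudobulk_distances(X, samples, cell_groups):
    """Average Euclidean distance between per-cell-group pseudobulks.

    X: (n_cells, n_features) array; samples, cell_groups: length-n_cells labels.
    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    samples = np.asarray(samples)
    cell_groups = np.asarray(cell_groups)
    sample_names = np.unique(samples)
    n = len(sample_names)

    total = np.zeros((n, n))   # distances summed over cell groups
    counts = np.zeros((n, n))  # number of cell groups shared by each sample pair

    for group in np.unique(cell_groups):
        in_group = cell_groups == group
        # Pseudobulk: mean of this group's cells in each sample (NaN if absent)
        bulks = np.full((n, X.shape[1]), np.nan)
        for i, name in enumerate(sample_names):
            mask = in_group & (samples == name)
            if mask.any():
                bulks[i] = X[mask].mean(axis=0)
        diff = bulks[:, None, :] - bulks[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=-1))  # NaN where a sample lacks the group
        present = ~np.isnan(d)
        total[present] += d[present]
        counts[present] += 1

    with np.errstate(invalid="ignore", divide="ignore"):
        distances = total / counts  # NaN for pairs that share no cell group
    return sample_names, distances


# Toy data: two samples, two cell groups, two features
X = [[0, 0], [2, 0], [4, 4], [0, 0], [0, 1]]
samples = ["A", "A", "B", "A", "B"]
cell_groups = ["T", "T", "T", "B", "B"]
names, D = grouped_pseudobulk_distances(X, samples, cell_groups)
```

In the toy data the T pseudobulks of samples A and B are 5 apart and the B pseudobulks are 1 apart, so the sample-level distance `D[0, 1]` is their average, 3.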

GroupedPseudobulk.embed(method='UMAP', n_jobs=-1, verbose=False)#

Convert distances to embedding of the samples

Parameters:

method (str = "UMAP") – Method to use for embedding. "UMAP" is the default; "TSNE" and "MDS" are also supported
n_jobs (int = -1) – Number of threads to use for computation. Use -1 to run on all processors
verbose (bool = False) – If True, print logging information during the computation

Returns:

coordinates (array-like) – Coordinates of samples in the embedding space. 2D for TSNE and MDS
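To make the step from a distance matrix to 2D coordinates concrete, here is a minimal sketch of classical (Torgerson) MDS in NumPy. This is a stand-in chosen for brevity: it is not necessarily the MDS variant that `embed` runs, and `classical_mds` is a hypothetical helper name.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed samples from a square distance matrix with classical MDS."""
    D = np.asarray(distances, dtype=float)
    n = D.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Keep the eigenvectors with the largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


# Four points on a plane; their pairwise distances are recovered by the embedding
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(points[:, None] - points[None], axis=-1)
coordinates = classical_mds(D)
```

The returned array has one row per sample and `n_components` columns, matching the "coordinates" return value described above. The embedding is only determined up to rotation and reflection.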

GroupedPseudobulk.evaluate_representation(target, method='knn', metadata=None, num_donors_subset=None, proportion_donors_subset=None, **parameters)#

Evaluate representation of target for the given distance matrix

Parameters:

target (str) – A sample-level covariate to evaluate representation for
method (Literal["knn", "distances", "proportions", "silhouette"]) –
Method to use for evaluation:
- knn: predict values of target using K-nearest neighbors and evaluate the prediction
- distances: test if distances between samples are significantly different from the null distribution
- proportions: test if distribution of target differs between groups (e.g. clusters)
- silhouette: calculate silhouette score for the given distances
num_donors_subset (int, optional) – Absolute number of donors to include in the evaluation.
proportion_donors_subset (float, optional) – Proportion of donors to include in the evaluation.
parameters (dict) –
Parameters for the evaluation method. The following parameters are used:
- knn:
  - n_neighbors: number of neighbors to use for prediction
  - task: type of prediction task. One of “classification”, “regression”, “ranking”. See documentation of predict_knn for more information
- distances:
  - control_level: value of target that should be used as a control group
  - normalization_type: type of normalization to use. One of “total”, “shift”, “var”. See documentation of test_distances_significance for more information
  - n_bootstraps: number of bootstrap iterations to use
  - trimmed_fraction: fraction of the most extreme values to remove from the distribution
  - compare_by_difference: if True, normalization is defined as difference (as in the original paper). Otherwise, it is defined as a ratio
- proportions:
  - groups: groups (e.g. cluster numbers) of the observations

Returns:

result (dict) – Result of evaluation with the following keys:

- score: a number evaluating the representation. The higher the better
- metric: name of the metric used for evaluation
- n_unique: number of unique values in target
- n_observations: number of observations used for evaluation. Can be different for different targets, even within one dataset (because of NAs)
- method: name of the method used for evaluation

There are other optional keys depending on the method used for evaluation.
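The sketch below illustrates the "silhouette" option and the shape of the result dictionary. It computes a silhouette score directly from a precomputed distance matrix after dropping samples whose target is missing. The helper name `silhouette_result` and the value stored under `metric` are assumptions for illustration; only the key names come from the description above.

```python
import numpy as np


def silhouette_result(distances, target_values):
    """Silhouette score of `target_values` on a precomputed distance matrix."""
    # Drop samples with a missing target, as n_observations suggests
    keep = np.array([v is not None for v in target_values])
    D = np.asarray(distances, dtype=float)[np.ix_(keep, keep)]
    labels = np.asarray([v for v in target_values if v is not None])
    classes = np.unique(labels)
    if len(classes) < 2:
        raise ValueError("silhouette needs at least two distinct target values")

    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton group: score 0 by convention
        a = D[i, same].sum() / (n_same - 1)  # mean distance within own group
        b = min(D[i, labels == c].mean() for c in classes if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            scores[i] = (b - a) / denom

    return {
        "score": float(scores.mean()),
        "metric": "silhouette",  # placeholder name, not confirmed by the source
        "n_unique": int(len(classes)),
        "n_observations": int(len(labels)),
        "method": "silhouette",
    }


# Two tight groups of samples plus one sample with an unknown target
D = [
    [0, 1, 10, 10, 5],
    [1, 0, 10, 10, 5],
    [10, 10, 0, 1, 5],
    [10, 10, 1, 0, 5],
    [5, 5, 5, 5, 0],
]
result = silhouette_result(D, ["healthy", "healthy", "disease", "disease", None])
```

Here every labelled sample is 1 away from its own group and 10 away from the other, so each silhouette value is (10 - 1) / 10 = 0.9, and `n_observations` is 4 because the unlabelled sample is excluded.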

GroupedPseudobulk.plot_clustermap(metadata_cols=None, figsize=(10, 12), *args, **kwargs)#: Plot a clustered heatmap of distances

GroupedPseudobulk.plot_embedding(method='UMAP', metadata_cols=None, continuous_palette='viridis', categorical_palette='tab10', na_color='lightgray', axes=None)#: Plot embedding of samples colored by metadata_cols

GroupedPseudobulk.plot_metadata_distribution(metadata_columns, tasks, method='knn', embedding='UMAP', metadata=None, metric_threshold=0.4)#

Predict metadata columns, and plot embeddings colorised by metadata values

Parameters:

metadata_columns (list) – List of metadata columns to show
tasks (list) – Tasks for each metadata column (classification, ranking or regression). Can be one string for all columns.
method (Literal["knn", "distances", "proportions", "silhouette"]) – Method to use for evaluation. See documentation of evaluate_representation for more information
embedding (str = "UMAP") – Embedding to use for plotting
metric_threshold (float = 0.4) – Results whose metric value is below this threshold are not displayed

GroupedPseudobulk.predict_metadata(target, metadata=None, n_neighbors=3, task='classification')#

Predict classes from metadata column target for samples using K-Nearest Neighbors classifier

Parameters:

target (str) – Column name from adata.obs, which will be used for classification
metadata (Optional[pd.DataFrame] = None) – Table with metadata about samples. Index should contain samples. If None, adata.obs is used
n_neighbors (int = 3) – Number of neighbors to use for classification
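task (str = "classification") – Type of prediction task. See the task parameter of evaluate_representation for the available options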

Returns:

y_true (array-like) – True values of target from metadata for samples with known values
y_predicted (array-like) – Predicted values of target for samples with known values
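The idea of K-nearest-neighbour prediction on a sample distance matrix can be sketched as follows. The sketch assumes that each labelled sample is predicted by majority vote among its nearest other labelled samples (leave-one-out) and that samples with a missing target are ignored. Whether patpy evaluates in exactly this way is not stated here, and `knn_predict_from_distances` is a hypothetical helper.

```python
from collections import Counter

import numpy as np


def knn_predict_from_distances(distances, target_values, n_neighbors=3):
    """Leave-one-out KNN classification on a precomputed distance matrix."""
    D = np.asarray(distances, dtype=float)
    known = np.array([v is not None for v in target_values])
    idx = np.flatnonzero(known)  # indices of samples with a known target

    y_true = [target_values[i] for i in idx]
    y_predicted = []
    for i in idx:
        others = idx[idx != i]  # never let a sample vote for itself
        order = np.argsort(D[i, others], kind="stable")[:n_neighbors]
        votes = Counter(target_values[j] for j in others[order])
        y_predicted.append(votes.most_common(1)[0][0])
    return np.array(y_true), np.array(y_predicted)


# Six samples in two tight groups; the last one has no known target
D = np.full((6, 6), 10.0)
D[:3, :3] = 1.0
D[3:, 3:] = 1.0
np.fill_diagonal(D, 0.0)
targets = ["healthy", "healthy", "healthy", "disease", "disease", None]
y_true, y_predicted = knn_predict_from_distances(D, targets, n_neighbors=1)
```

Both returned arrays cover only the five samples with known values, mirroring the return description above. With so few labelled samples per group, a small `n_neighbors` is needed for the vote to stay within the group.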

GroupedPseudobulk.prepare_anndata(adata)#: fit-like method: prepare adata for the analysis

GroupedPseudobulk.to_adata(metadata=None, *args, **kwargs)#

Convert samples data to AnnData object

Parameters:

metadata (Optional[pd.DataFrame] = None) – Metadata about samples to be added to .obs of AnnData object. Should contain samples in index
*args – Additional arguments to pass to calculate_distance_matrix method
**kwargs – Additional arguments to pass to calculate_distance_matrix method

Returns:

samples_adata (AnnData) – AnnData object with samples data
