patpy.tl.GroupedPseudobulk
- class patpy.tl.GroupedPseudobulk(sample_key, cell_group_key, layer='X_pca', seed=67)
Baseline method in which the distance between two samples is the average distance between their cell group pseudobulks.
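Attributes table
Methods table
| calculate_distance_matrix | Calculate distances between samples as average distance between per cell-type pseudobulks |
| embed | Convert distances to embedding of the samples |
| evaluate_representation | Evaluate representation of target for the given distance matrix |
| plot_clustermap | Plot a clusterized heatmap of distances |
| plot_embedding | Plot embedding of samples colored by metadata_cols |
| plot_metadata_distribution | Predict metadata columns, and plot embeddings colorised by metadata values |
| predict_metadata | Predict classes from metadata column target for samples using K-Nearest Neighbors classifier |
| prepare_anndata | fit-like method: prepare adata for the analysis |
| to_adata | Convert samples data to AnnData object |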
Attributes
- GroupedPseudobulk.DISTANCES_UNS_KEY = 'X_ct_pseudobulk_distances'
Methods
- GroupedPseudobulk.calculate_distance_matrix(force=False, aggregate='mean', dist='euclidean')
Calculate distances between samples as average distance between per cell-type pseudobulks
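The idea behind this distance can be illustrated with a minimal, standard-library sketch. It is not patpy's implementation: the helper names `pseudobulks` and `sample_distance` are hypothetical, the input is a plain list of tuples rather than an AnnData object, and averaging only over cell groups present in both samples is an assumption made here for illustration.

```python
from collections import defaultdict
from math import dist
from statistics import fmean


def pseudobulks(cells):
    """Average cell vectors per (sample, cell group).

    `cells` is an iterable of (sample, cell_group, vector) tuples.
    Returns {sample: {cell_group: mean_vector}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for sample, group, vector in cells:
        grouped[sample][group].append(vector)
    return {
        sample: {
            group: [fmean(column) for column in zip(*vectors)]
            for group, vectors in groups.items()
        }
        for sample, groups in grouped.items()
    }


def sample_distance(bulks_a, bulks_b):
    """Mean Euclidean distance over cell groups shared by both samples.

    Assumption: groups missing from either sample are ignored.
    """
    shared = bulks_a.keys() & bulks_b.keys()
    if not shared:
        return float("nan")
    return fmean(dist(bulks_a[g], bulks_b[g]) for g in shared)


# Toy data: two samples, two cell groups, 2-dimensional features.
cells = [
    ("s1", "T", [0.0, 0.0]),
    ("s1", "T", [2.0, 0.0]),
    ("s1", "B", [0.0, 4.0]),
    ("s2", "T", [1.0, 3.0]),
    ("s2", "B", [0.0, 0.0]),
]
bulks = pseudobulks(cells)
d = sample_distance(bulks["s1"], bulks["s2"])
```

In this toy input the T pseudobulks are 3 apart and the B pseudobulks are 4 apart, so the sample distance is their mean, 3.5. The sketch corresponds to aggregate='mean' and dist='euclidean' in the signature above.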
- GroupedPseudobulk.embed(method='UMAP', n_jobs=-1, verbose=False)
Convert distances to embedding of the samples
- Parameters:
method (str = "UMAP") – Method to use for embedding. “TSNE” and “MDS” are listed as supported; “UMAP” is the default in the signature
n_jobs (int = -1) – Number of threads to use for computation. Use -1 to run on all processors
verbose (bool = False) – If True, print logging information during the computation
- Returns:
coordinates (array-like) – Coordinates of samples in the embedding space. 2D for TSNE and MDS
- GroupedPseudobulk.evaluate_representation(target, method='knn', metadata=None, num_donors_subset=None, proportion_donors_subset=None, **parameters)
Evaluate representation of target for the given distance matrix
- Parameters:
target (str) – A sample-level covariate to evaluate representation for
method (Literal["knn", "distances", "proportions", "silhouette"]) –
Method to use for evaluation:
- knn: predict values of target using K-nearest neighbors and evaluate the prediction
- distances: test if distances between samples are significantly different from the null distribution
- proportions: test if distribution of target differs between groups (e.g. clusters)
- silhouette: calculate silhouette score for the given distances
num_donors_subset (int, optional) – Absolute number of donors to include in the evaluation.
proportion_donors_subset (float, optional) – Proportion of donors to include in the evaluation.
parameters (dict) –
Parameters for the evaluation method. The following parameters are used:
- knn:
n_neighbors: number of neighbors to use for prediction
task: type of prediction task. One of “classification”, “regression”, “ranking”. See documentation of predict_knn for more information
- distances:
control_level: value of target that should be used as a control group
normalization_type: type of normalization to use. One of “total”, “shift”, “var”. See documentation of test_distances_significance for more information
n_bootstraps: number of bootstrap iterations to use
trimmed_fraction: fraction of the most extreme values to remove from the distribution
compare_by_difference: if True, normalization is defined as difference (as in the original paper). Otherwise, it is defined as a ratio
- proportions:
groups: groups (e.g. cluster numbers) of the observations
- Returns:
result (dict) – Result of evaluation with the following keys:
score: a number evaluating the representation. The higher the better
metric: name of the metric used for evaluation
n_unique: number of unique values in target
n_observations: number of observations used for evaluation. Can be different for different targets, even within one dataset (because of NAs)
method: name of the method used for evaluation
There are other optional keys depending on the method used for evaluation.
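To make the "silhouette" option more concrete, here is a rough standard-library sketch of a mean silhouette coefficient computed directly from a precomputed sample-by-sample distance matrix. It shows the general technique only and is not taken from patpy; the function name `silhouette_from_distances` is hypothetical, and scoring singleton groups as 0 is a convention assumed here.

```python
from statistics import fmean


def silhouette_from_distances(distances, labels):
    """Mean silhouette coefficient from a square distance matrix (list of lists)."""
    n = len(labels)
    scores = []
    for i in range(n):
        # Collect distances from sample i to every other sample, per label.
        by_label = {}
        for j in range(n):
            if j != i:
                by_label.setdefault(labels[j], []).append(distances[i][j])
        own = by_label.pop(labels[i], None)
        if not own or not by_label:
            # Assumed convention: singleton group or a single label scores 0.
            scores.append(0.0)
            continue
        a = fmean(own)  # mean distance within the sample's own group
        b = min(fmean(v) for v in by_label.values())  # nearest other group
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return fmean(scores)


# Toy example: four samples on a line, forming two well-separated pairs.
positions = [0.0, 1.0, 10.0, 11.0]
distances = [[abs(p - q) for q in positions] for p in positions]

good = silhouette_from_distances(distances, ["a", "a", "b", "b"])
bad = silhouette_from_distances(distances, ["a", "b", "a", "b"])
```

A labelling that matches the distance structure gives a score close to 1, while a labelling that cuts across it gives a negative score. This agrees with the "higher is better" reading of score in the result dictionary.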
- GroupedPseudobulk.plot_clustermap(metadata_cols=None, figsize=(10, 12), *args, **kwargs)
Plot a clusterized heatmap of distances
- GroupedPseudobulk.plot_embedding(method='UMAP', metadata_cols=None, continuous_palette='viridis', categorical_palette='tab10', na_color='lightgray', axes=None)
Plot embedding of samples colored by metadata_cols
- GroupedPseudobulk.plot_metadata_distribution(metadata_columns, tasks, method='knn', embedding='UMAP', metadata=None, metric_threshold=0.4)
Predict metadata columns, and plot embeddings colorised by metadata values
- Parameters:
metadata_columns (list) – List of metadata columns to show
tasks (list) – Tasks for each metadata column (classification, ranking or regression). Can be one string for all columns.
method (Literal["knn", "distances", "proportions", "silhouette"]) – Method to use for evaluation. See documentation of evaluate_representation for more information
embedding (str = "UMAP") – Embedding to use for plotting
metric_threshold (float = 0.4) – Results with a metric value below this threshold will not be displayed
- GroupedPseudobulk.predict_metadata(target, metadata=None, n_neighbors=3, task='classification')
Predict classes from metadata column target for samples using K-Nearest Neighbors classifier
- Parameters:
target (str) – Column name from adata.obs, which will be used for classification
metadata (Optional[pd.DataFrame] = None) – Table with metadata about samples. Index should contain samples. If None, adata.obs is used
n_neighbors (int = 3) – Number of neighbors to use for classification
task (str = "classification")
- Returns:
y_true (array-like) – True values of target from metadata for samples with known values
y_predicted (array-like) – Predicted values of target for samples with known values
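As an illustration of K-nearest-neighbors prediction on a precomputed distance matrix, the sketch below classifies each labelled sample from its nearest labelled neighbours and returns a (y_true, y_predicted) pair shaped like the return value described above. It is a simplified stand-in, not patpy's code: the function name `knn_predict_loo` is hypothetical, and both the leave-one-out scheme and the handling of missing labels (`None`) are assumptions.

```python
from collections import Counter


def knn_predict_loo(distances, labels, n_neighbors=3):
    """Leave-one-out k-NN classification on a square distance matrix.

    Samples with label None are skipped, both as queries and as neighbours.
    Returns (y_true, y_predicted) for the samples with known labels.
    """
    known = [i for i, label in enumerate(labels) if label is not None]
    y_true, y_predicted = [], []
    for i in known:
        # Rank the other labelled samples by their distance to sample i.
        neighbours = sorted(
            (j for j in known if j != i), key=lambda j: distances[i][j]
        )
        if not neighbours:
            continue  # nothing to vote with
        votes = Counter(labels[j] for j in neighbours[:n_neighbors])
        y_true.append(labels[i])
        y_predicted.append(votes.most_common(1)[0][0])
    return y_true, y_predicted


# Toy example: two groups of samples on a line plus one unlabelled sample.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 5.0]
labels = ["ctrl", "ctrl", "ctrl", "case", "case", "case", None]
distances = [[abs(p - q) for q in positions] for p in positions]

y_true, y_predicted = knn_predict_loo(distances, labels, n_neighbors=3)
accuracy = sum(t == p for t, p in zip(y_true, y_predicted)) / len(y_true)
```

The unlabelled sample is excluded from both outputs, in line with the "samples with known values" wording in the Returns description.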
- GroupedPseudobulk.prepare_anndata(adata)
fit-like method: prepare adata for the analysis
- GroupedPseudobulk.to_adata(metadata=None, *args, **kwargs)
Convert samples data to AnnData object
- Parameters:
metadata (Optional[pd.DataFrame] = None) – Metadata about samples to be added to .obs of AnnData object. Should contain samples in index
*args – Additional arguments to pass to calculate_distance_matrix method
**kwargs – Additional arguments to pass to calculate_distance_matrix method
- Returns:
samples_adata (AnnData) – AnnData object with samples data