patpy.tl.GloScope

class patpy.tl.GloScope(sample_key, cell_group_key=None, layer=None, seed=67, dist_mat='KL', dens='KNN', k=25, n_workers=1)

A class that loads a file to R using rpy2 and follows the same interface as other SampleRepresentation methods

Methods table

calculate_distance_matrix([force])

Calculate distances between samples represented as GloScope embeddings

embed([method, n_jobs, verbose])

Embed distances into 2-D coordinates.

evaluate_representation(target[, method, ...])

Evaluate representation of target for the given distance matrix

fit_linear_probe(target[, task, test_size, ...])

Fit a linear probe on top of sample embeddings.

plot_clustermap([metadata_cols, figsize])

Plot a hierarchically-clustered heat-map of the distance matrix.

plot_embedding([method, metadata_cols, ...])

Plot a 2-D embedding of distances, optionally coloured by metadata.

plot_metadata_distribution(metadata_columns, ...)

Predict metadata columns, and plot embeddings colorised by metadata values

predict_metadata(target[, metadata, ...])

Predict classes from metadata column target for samples using K-Nearest Neighbors classifier

prepare_anndata(adata)

Prepare anndata for GloScope calculation

to_adata([metadata])

Convert samples data to AnnData object

Methods

GloScope.calculate_distance_matrix(force=False)

Calculate distances between samples represented as GloScope embeddings

GloScope.embed(method='UMAP', n_jobs=-1, verbose=False)

Embed distances into 2-D coordinates.

Parameters:
  • distances – Square distance matrix of shape (n_samples, n_samples).

  • method (str (default: 'UMAP')) – One of "MDS", "TSNE", "UMAP".

  • n_jobs (int (default: -1)) – Number of parallel threads (-1 = all).

  • verbose (bool (default: False)) – Print progress information.

Return type:

ndarray

Returns:

coordinates (ndarray) – Array of shape (n_samples, 2).

GloScope.evaluate_representation(target, method='knn', metadata=None, num_donors_subset=None, proportion_donors_subset=None, **parameters)

Evaluate representation of target for the given distance matrix

Parameters:
  • target (str) – A sample-level covariate to evaluate the representation for

  • method (Literal['knn', 'distances', 'proportions', 'silhouette', 'persistence', 'permanova'] (default: 'knn')) –

    Method to use for evaluation:

    • knn: predict values of target using K-nearest neighbors and evaluate the prediction

    • distances: test if distances between samples are significantly different from the null distribution

    • proportions: test if distribution of target differs between groups (e.g. clusters)

    • silhouette: calculate silhouette score for the given distances

  • num_donors_subset (int, optional) – Absolute number of donors to include in the evaluation.

  • proportion_donors_subset (float, optional) – Proportion of donors to include in the evaluation.

  • parameters (dict) –

    Parameters for the evaluation method. The following parameters are used:

    • knn:
      • n_neighbors: number of neighbors to use for prediction

      • task: type of prediction task. One of “classification”, “regression”, “ranking”. See documentation of predict_knn for more information

    • distances:
      • control_level: value of target that should be used as a control group

      • normalization_type: type of normalization to use. One of “total”, “shift”, “var”. See documentation of test_distances_significance for more information

      • n_bootstraps: number of bootstrap iterations to use

      • trimmed_fraction: fraction of the most extreme values to remove from the distribution

      • compare_by_difference: if True, normalization is defined as difference (as in the original paper). Otherwise, it is defined as a ratio

    • proportions:
      • groups: groups (e.g. cluster numbers) of the observations

Returns:

result (dict) – Result of evaluation with the following keys:

  • score: a number evaluating the representation. The higher the better

  • metric: name of the metric used for evaluation

  • n_unique: number of unique values in target

  • n_observations: number of observations used for evaluation. Can be different for different targets, even within one dataset (because of NAs)

  • method: name of the method used for evaluation

There are other optional keys depending on the method used for evaluation.
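As a rough illustration of the silhouette option and of the result keys listed above, the sketch below computes a mean silhouette coefficient from a precomputed distance matrix and wraps it in a dict with the same key names. It is not patpy's implementation: the function names, the "silhouette" string stored under metric, and the handling of missing labels (samples labelled None are dropped) are assumptions made for the example.

```python
from collections import defaultdict


def silhouette_from_distances(distances, labels):
    """Mean silhouette coefficient from a square, precomputed distance matrix."""
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)
    if len(members) < 2:
        raise ValueError("silhouette needs at least two distinct labels")

    total = 0.0
    for i, label in enumerate(labels):
        own = members[label]
        if len(own) == 1:
            continue  # a singleton group contributes 0 by convention
        # a: mean distance to the other samples that share the label
        a = sum(distances[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the samples of any other label
        b = min(
            sum(distances[i][j] for j in idxs) / len(idxs)
            for other, idxs in members.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)


def evaluate_silhouette(distances, labels):
    """Hypothetical wrapper returning a dict shaped like the documented result."""
    keep = [i for i, label in enumerate(labels) if label is not None]
    sub = [[distances[i][j] for j in keep] for i in keep]
    kept_labels = [labels[i] for i in keep]
    return {
        "score": silhouette_from_distances(sub, kept_labels),
        "metric": "silhouette",  # placeholder name, not confirmed by patpy
        "n_unique": len(set(kept_labels)),
        "n_observations": len(keep),  # samples with a missing target are excluded
        "method": "silhouette",
    }


# Toy data: two tight groups of two samples plus one sample without a label.
toy_distances = [
    [0, 1, 4, 4, 2],
    [1, 0, 4, 4, 2],
    [4, 4, 0, 1, 2],
    [4, 4, 1, 0, 2],
    [2, 2, 2, 2, 0],
]
toy_labels = ["ctrl", "ctrl", "case", "case", None]
toy_result = evaluate_silhouette(toy_distances, toy_labels)
```

With this toy input, n_observations is 4 rather than 5 because the unlabelled sample is excluded, which mirrors the note above that the number of observations can differ between targets.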

GloScope.fit_linear_probe(target, task='classification', test_size=0.2, random_state=42, test_sample_labels=None)

Fit a linear probe on top of sample embeddings.

Parameters:
  • target (str) – Column in self.adata.obs to predict.

  • task (Literal['classification', 'regression'] (default: 'classification')) – "classification" or "regression".

  • test_size (float (default: 0.2)) – Fraction of donors held out for evaluation when test_sample_labels is not provided.

  • random_state (int (default: 42)) – Random seed for the train/test split (used only when test_sample_labels is not provided).

  • test_sample_labels (list | None (default: None)) – Explicit list of sample labels (index values of sample_representation) to use as the test set. When provided, test_size and random_state are ignored. When None, a random split is performed and the chosen test labels are returned under the "test_sample_labels" key for reproducibility.

Return type:

dict

Returns:

dict with keys "model", "test_sample_labels", "{target}_test", "{target}_pred".

For classification: additionally "accuracy" and "f1". For regression: additionally "r2" and "pearson".
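The precedence between test_sample_labels, test_size and random_state can be sketched as follows. This is a stand-alone illustration, not patpy's code: the helper name choose_test_labels is hypothetical, and Python's random module stands in for whatever splitting routine the library actually uses.

```python
import random


def choose_test_labels(sample_labels, test_size=0.2, random_state=42,
                       test_sample_labels=None):
    """Return (train_labels, test_labels) following the documented precedence."""
    if test_sample_labels is not None:
        # Explicit labels win: test_size and random_state are ignored.
        test = set(test_sample_labels)
        unknown = test - set(sample_labels)
        if unknown:
            raise ValueError(f"unknown sample labels: {sorted(unknown)}")
    else:
        # Otherwise draw a seeded random subset of the requested size.
        rng = random.Random(random_state)
        n_test = max(1, round(len(sample_labels) * test_size))
        test = set(rng.sample(list(sample_labels), n_test))
    train = [s for s in sample_labels if s not in test]
    held_out = [s for s in sample_labels if s in test]
    return train, held_out


samples = ["s1", "s2", "s3", "s4", "s5"]

# Random split: the held-out labels can be saved ...
_, first_test = choose_test_labels(samples, test_size=0.2, random_state=42)
# ... and passed back later to reproduce exactly the same test set.
_, second_test = choose_test_labels(samples, test_sample_labels=first_test)
```

Passing the saved labels back in makes the evaluation independent of the random seed, which is the reason the returned dict carries them.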

Examples

>>> result = model.fit_linear_probe(target="age", task="regression")
>>> print(f"Pearson r = {result['pearson']:.3f}")

GloScope.plot_clustermap(metadata_cols=None, figsize=(10, 12), *args, **kwargs)

Plot a hierarchically-clustered heat-map of the distance matrix.

Returns:

seaborn.matrix.ClusterGrid

GloScope.plot_embedding(method='UMAP', metadata_cols=None, continuous_palette='viridis', categorical_palette='tab10', na_color='lightgray', axes=None, use_uns_colors=True, color_key_suffix='_colors', show_legend=True)

Plot a 2-D embedding of distances, optionally coloured by metadata.

Parameters:
  • method (str (default: 'UMAP')) – Embedding method. One of "MDS", "TSNE", "UMAP".

  • metadata_cols (list[str] | None (default: None)) – Columns from .obs used for colouring.

  • continuous_palette (str (default: 'viridis')) – Seaborn palette name used for continuous metadata.

  • categorical_palette (str (default: 'tab10')) – Seaborn palette name used for categorical metadata.

  • na_color (str (default: 'lightgray')) – Colour used for samples with missing metadata values.

  • axes (default: None) – Existing matplotlib Axes (or array of Axes) to plot into.

  • use_uns_colors (bool (default: True)) – If True, look for colors in adata.uns[f'{col}{color_key_suffix}'] and use them if available (similar to scanpy).

  • color_key_suffix (str (default: '_colors')) – Suffix for the color key in adata.uns. Default is "_colors". For example, with suffix "_colors", will look for adata.uns['cell_type_colors'].

  • show_legend (bool (default: True)) – If True, display the legend. If False, hide it.

Returns:

matplotlib Axes or array of Axes
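The lookup described for use_uns_colors and color_key_suffix can be sketched with a plain dict standing in for adata.uns. The helper resolve_colors is hypothetical, and the sketch assumes the scanpy convention that stored colours are listed in the same order as the categories; how patpy falls back when the stored list is too short is not stated in this page, so the fallback rule here is only a guess.

```python
def resolve_colors(uns, column, categories, fallback,
                   use_uns_colors=True, color_key_suffix="_colors"):
    """Map each category to a colour, preferring colours stored in `uns`."""
    key = f"{column}{color_key_suffix}"  # e.g. "cell_type" + "_colors"
    stored = uns.get(key) if use_uns_colors else None
    # Assumption: use stored colours only when there is one per category.
    if stored is not None and len(stored) >= len(categories):
        colors = list(stored)
    else:
        colors = list(fallback)
    # Cycle through the palette if it has fewer colours than categories.
    return {cat: colors[i % len(colors)] for i, cat in enumerate(categories)}


uns = {"cell_type_colors": ["#1f77b4", "#ff7f0e"]}
palette = ["red", "green"]

from_uns = resolve_colors(uns, "cell_type", ["B", "T"], palette)
ignored = resolve_colors(uns, "cell_type", ["B", "T"], palette,
                         use_uns_colors=False)
missing = resolve_colors(uns, "disease", ["case", "ctrl"], palette)
```

Here from_uns picks up the stored hex codes, while ignored and missing both fall back to the supplied palette.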

GloScope.plot_metadata_distribution(metadata_columns, tasks, method='knn', embedding='UMAP', metadata=None, metric_threshold=0.4)

Predict metadata columns, and plot embeddings colorised by metadata values

Parameters:
  • metadata_columns (list[str]) – List of metadata columns to show

  • tasks (list[str]) – Tasks for each metadata column (classification, ranking or regression). Can be one string for all columns.

  • method (Literal['knn', 'distances', 'proportions', 'silhouette', 'persistence', 'permanova'] (default: 'knn')) – Method to use for evaluation. See documentation of evaluate_representation for more information

  • embedding (str (default: 'UMAP')) – Embedding to use for plotting

  • metric_threshold (float (default: 0.4)) – Results whose metric value is below this threshold will not be displayed
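Two of the behaviours described above, accepting a single task string for all columns and hiding results below metric_threshold, can be sketched in a few lines. Both helper names and the example scores are invented for illustration; whether a score exactly equal to the threshold is shown is an assumption.

```python
def broadcast_tasks(metadata_columns, tasks):
    """Expand a single task string to one task per metadata column."""
    if isinstance(tasks, str):
        return [tasks] * len(metadata_columns)
    if len(tasks) != len(metadata_columns):
        raise ValueError("need exactly one task per metadata column")
    return list(tasks)


def columns_to_display(scores, metric_threshold=0.4):
    """Keep columns whose score is not lower than the threshold."""
    return [col for col, score in scores.items() if score >= metric_threshold]


columns = ["age", "sex", "batch"]
per_column_tasks = broadcast_tasks(columns, "classification")

# Hypothetical evaluation scores, one per metadata column.
example_scores = {"age": 0.55, "sex": 0.40, "batch": 0.10}
shown = columns_to_display(example_scores, metric_threshold=0.4)
```

In this example only "batch" falls below the threshold and would be left out of the plot.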

GloScope.predict_metadata(target, metadata=None, n_neighbors=3, task='classification')

Predict classes from metadata column target for samples using K-Nearest Neighbors classifier

Parameters:
  • target (str) – Column name from adata.obs, which will be used for classification

  • metadata (Optional[pd.DataFrame] (default: None)) – Table with metadata about samples. Index should contain samples. If None, adata.obs is used

  • n_neighbors (int (default: 3)) – Number of neighbors to use for classification

  • task (str (default: 'classification')) – Type of prediction task

Returns:

y_true (array-like) – True values of target from metadata for samples with known values

y_predicted (array-like) – Predicted values of target for samples with known values
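To make the idea concrete, here is a minimal sketch of K-nearest-neighbours classification on a precomputed sample distance matrix. It is an illustration under stated assumptions, not the library's implementation: the function knn_predict is hypothetical, each sample is predicted from the other labelled samples (leave-one-out), and ties are broken in favour of the label seen first among the nearest neighbours.

```python
from collections import Counter


def knn_predict(distances, labels, n_neighbors=3):
    """Leave-one-out KNN classification on a precomputed distance matrix."""
    # Only samples with a known target value take part.
    known = [i for i, label in enumerate(labels) if label is not None]
    y_true, y_predicted = [], []
    for i in known:
        # Nearest labelled samples, excluding the sample itself.
        neighbours = sorted(
            (j for j in known if j != i),
            key=lambda j: distances[i][j],
        )[:n_neighbors]
        votes = Counter(labels[j] for j in neighbours)
        y_true.append(labels[i])
        y_predicted.append(votes.most_common(1)[0][0])
    return y_true, y_predicted


# Toy distances derived from 1-D positions; the last sample has no label.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 5.0]
toy_labels = ["healthy"] * 3 + ["disease"] * 3 + [None]
toy_distances = [[abs(p - q) for q in positions] for p in positions]

y_true, y_predicted = knn_predict(toy_distances, toy_labels, n_neighbors=3)
```

Both returned lists contain six entries because the unlabelled sample is skipped, matching the "samples with known values" wording above.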

GloScope.prepare_anndata(adata)

Prepare anndata for GloScope calculation

GloScope.to_adata(metadata=None, *args, **kwargs)

Convert samples data to AnnData object

Parameters:
  • metadata (DataFrame (default: None)) – Metadata about samples to be added to .obs of AnnData object. Should contain samples in index

  • *args – Additional arguments to pass to calculate_distance_matrix method

  • **kwargs – Additional arguments to pass to calculate_distance_matrix method

Returns:

samples_adata (AnnData) – AnnData object with samples data