patpy.tl.RandomVector

class patpy.tl.RandomVector(sample_key, cell_group_key, latent_dim=30, seed=67)

A dummy baseline, which represents samples as random embeddings

Attributes table

Methods table

calculate_distance_matrix([force])

Calculate distances between samples represented as random vectors

embed([method, n_jobs, verbose])

Convert distances to embedding of the samples

evaluate_representation(target[, method, ...])

Evaluate representation of target for the given distance matrix

plot_clustermap([metadata_cols, figsize])

Plot a clustered heatmap of distances

plot_embedding([method, metadata_cols, ...])

Plot embedding of samples colored by metadata_cols

plot_metadata_distribution(metadata_columns, ...)

Predict metadata columns, and plot embeddings colorised by metadata values

predict_metadata(target[, metadata, ...])

Predict classes from metadata column target for samples using K-Nearest Neighbors classifier

prepare_anndata(adata)

fit-like method: prepare adata for the analysis

to_adata([metadata])

Convert samples data to AnnData object

Attributes

RandomVector.DISTANCES_UNS_KEY = 'X_random_vector_distances'

Methods

RandomVector.calculate_distance_matrix(force=False)

Calculate distances between samples represented as random vectors
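The idea behind this baseline can be illustrated with a minimal standard-library sketch. It is not the patpy implementation: the Gaussian distribution, the Euclidean metric, and the helper names `random_sample_vectors` and `pairwise_distances` are assumptions made for illustration; only `latent_dim=30` and `seed=67` come from the class signature.

```python
import math
import random


def random_sample_vectors(samples, latent_dim=30, seed=67):
    """Assign each sample an independent Gaussian vector (assumed distribution)."""
    rng = random.Random(seed)
    return {s: [rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for s in samples}


def pairwise_distances(vectors):
    """Return sample names and a Euclidean distance matrix (assumed metric)."""
    names = list(vectors)
    matrix = [[math.dist(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


samples = ["donor_1", "donor_2", "donor_3", "donor_4"]  # hypothetical sample IDs
names, distances = pairwise_distances(random_sample_vectors(samples))
```

Because the vectors carry no biological signal, any evaluation score obtained from such distances gives a floor against which real sample representations can be compared.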

RandomVector.embed(method='UMAP', n_jobs=-1, verbose=False)

Convert distances to embedding of the samples

Parameters:
  • method (str = "UMAP") – Method to use for embedding. “TSNE” and “MDS” are documented as supported; the signature default is “UMAP”

  • n_jobs (int = -1) – Number of threads to use for computation. Use -1 to run on all processors

  • verbose (bool = False) – If True, print logging information during the computation

Returns:

coordinates (array-like) – Coordinates of samples in the embedding space. 2D for TSNE and MDS

RandomVector.evaluate_representation(target, method='knn', metadata=None, num_donors_subset=None, proportion_donors_subset=None, **parameters)

Evaluate representation of target for the given distance matrix

Parameters:
  • target (str) – A sample-level covariate to evaluate representation for

  • method (Literal["knn", "distances", "proportions", "silhouette"]) –

    Method to use for evaluation:

    • knn: predict values of target using K-nearest neighbors and evaluate the prediction

    • distances: test if distances between samples are significantly different from the null distribution

    • proportions: test if distribution of target differs between groups (e.g. clusters)

    • silhouette: calculate silhouette score for the given distances

  • num_donors_subset (int, optional) – Absolute number of donors to include in the evaluation.

  • proportion_donors_subset (float, optional) – Proportion of donors to include in the evaluation.

  • parameters (dict) –

    Parameters for the evaluation method. The following parameters are used:

    • knn:
      • n_neighbors: number of neighbors to use for prediction

      • task: type of prediction task. One of “classification”, “regression”, “ranking”. See documentation of predict_knn for more information

    • distances:
      • control_level: value of target that should be used as a control group

      • normalization_type: type of normalization to use. One of “total”, “shift”, “var”. See documentation of test_distances_significance for more information

      • n_bootstraps: number of bootstrap iterations to use

      • trimmed_fraction: fraction of the most extreme values to remove from the distribution

      • compare_by_difference: if True, normalization is defined as difference (as in the original paper). Otherwise, it is defined as a ratio

    • proportions:
      • groups: groups (e.g. cluster numbers) of the observations

Returns:

result (dict) – Result of evaluation with the following keys:

  • score: a number evaluating the representation. The higher the better

  • metric: name of the metric used for evaluation

  • n_unique: number of unique values in target

  • n_observations: number of observations used for evaluation. Can be different for different targets, even within one dataset (because of NAs)

  • method: name of the method used for evaluation

There are other optional keys depending on the method used for evaluation.
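As an illustration of the silhouette option and of the shape of the returned dictionary, here is a minimal sketch that computes a mean silhouette coefficient directly from a precomputed distance matrix. The helper `silhouette_from_distances`, the toy data, and the string stored under `metric` are assumptions for illustration and are not taken from patpy.

```python
def silhouette_from_distances(distances, labels):
    """Mean silhouette coefficient from a precomputed distance matrix."""
    n = len(labels)
    scores = []
    for i in range(n):
        # Collect distances from sample i to every other sample, per cluster.
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(distances[i][j])
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            # Singleton cluster or only one cluster: use 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Toy data: six samples on a line, forming two well-separated groups.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
distances = [[abs(p - q) for q in positions] for p in positions]

# A dictionary mirroring the documented keys (values are illustrative).
result = {
    "score": silhouette_from_distances(distances, labels),
    "metric": "silhouette",
    "n_unique": len(set(labels)),
    "n_observations": len(labels),
    "method": "silhouette",
}
```

For a random baseline the score would be expected to sit near zero, whereas the well-separated toy groups above score close to 1.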

RandomVector.plot_clustermap(metadata_cols=None, figsize=(10, 12), *args, **kwargs)

Plot a clustered heatmap of distances

RandomVector.plot_embedding(method='UMAP', metadata_cols=None, continuous_palette='viridis', categorical_palette='tab10', na_color='lightgray', axes=None)

Plot embedding of samples colored by metadata_cols

RandomVector.plot_metadata_distribution(metadata_columns, tasks, method='knn', embedding='UMAP', metadata=None, metric_threshold=0.4)

Predict metadata columns, and plot embeddings colorised by metadata values

Parameters:
  • metadata_columns (list) – List of metadata columns to show

  • tasks (list) – Tasks for each metadata column (classification, ranking or regression). Can be one string for all columns.

  • method (Literal["knn", "distances", "proportions", "silhouette"]) – Method to use for evaluation. See documentation of evaluate_representation for more information

  • embedding (str = "UMAP") – Embedding to use for plotting

  • metric_threshold (float = 0.4) – Results with a metric value below this threshold will not be displayed

RandomVector.predict_metadata(target, metadata=None, n_neighbors=3, task='classification')

Predict classes from metadata column target for samples using K-Nearest Neighbors classifier

Parameters:
  • target (str) – Column name from adata.obs, which will be used for classification

  • metadata (Optional[pd.DataFrame] = None) – Table with metadata about samples. Index should contain samples. If None, adata.obs is used

  • n_neighbors (int = 3) – Number of neighbors to use for classification

  • task (str = "classification") – Type of prediction task. See the task parameter of evaluate_representation

Returns:

y_true (array-like) – True values of target from metadata for samples with known values

y_predicted (array-like) – Predicted values of target for samples with known values
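A K-nearest-neighbors prediction on a precomputed distance matrix can be sketched as follows. This is an assumption-laden illustration rather than patpy's code: the leave-one-out scheme, the majority vote, the helper name `knn_predict_from_distances`, and the toy data are all hypothetical. Samples with an unknown target (here `None`) are excluded, matching the "samples with known values" wording above.

```python
from collections import Counter


def knn_predict_from_distances(distances, labels, n_neighbors=3):
    """Leave-one-out KNN classification on a precomputed distance matrix.

    Samples whose label is None are skipped, both as queries and as neighbours.
    """
    known = [i for i, label in enumerate(labels) if label is not None]
    y_true, y_predicted = [], []
    for i in known:
        # Rank the other labelled samples by their distance to sample i.
        neighbours = sorted((j for j in known if j != i), key=lambda j: distances[i][j])
        votes = Counter(labels[j] for j in neighbours[:n_neighbors])
        y_true.append(labels[i])
        y_predicted.append(votes.most_common(1)[0][0])
    return y_true, y_predicted


# Toy data: two groups on a line plus one sample with a missing label.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 5.0]
labels = ["a", "a", "a", "b", "b", "b", None]
distances = [[abs(p - q) for q in positions] for p in positions]

y_true, y_predicted = knn_predict_from_distances(distances, labels, n_neighbors=3)
```

The two returned lists have equal length and can be passed to any classification metric to score the representation.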

RandomVector.prepare_anndata(adata)

fit-like method: prepare adata for the analysis

RandomVector.to_adata(metadata=None, *args, **kwargs)

Convert samples data to AnnData object

Parameters:
  • metadata (Optional[pd.DataFrame] = None) – Metadata about samples to be added to .obs of AnnData object. Should contain samples in index

  • *args – Additional arguments to pass to calculate_distance_matrix method

  • **kwargs – Additional arguments to pass to calculate_distance_matrix method

Returns:

samples_adata (AnnData) – AnnData object with samples data
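The methods documented on this page could be chained roughly as in the sketch below. It relies only on the signatures listed above; the wrapper `run_random_baseline` is hypothetical, and the assumption that `prepare_anndata` should be called before `calculate_distance_matrix` follows from its "fit-like" description rather than from an explicit statement.

```python
def run_random_baseline(adata, sample_key, cell_group_key, target):
    """Evaluate how well random sample embeddings recover a covariate."""
    # Imported lazily so this sketch can be defined without patpy installed.
    from patpy.tl import RandomVector

    model = RandomVector(
        sample_key=sample_key,
        cell_group_key=cell_group_key,
        latent_dim=30,
        seed=67,
    )
    model.prepare_anndata(adata)  # fit-like step
    model.calculate_distance_matrix()  # distances between random sample vectors
    # n_neighbors and task are forwarded through **parameters to the knn evaluation.
    return model.evaluate_representation(
        target, method="knn", n_neighbors=3, task="classification"
    )
```

A call such as `run_random_baseline(adata, "donor", "cell_type", "disease")` uses hypothetical column names; replace them with the sample, cell-group, and covariate columns of your own AnnData object.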