patpy.tl.evaluate_representation


patpy.tl.evaluate_representation(distances, target, method='knn', num_donors_subset=None, proportion_donors_subset=None, **parameters)

Evaluate how well the given distance matrix represents target.

Parameters:

distances (square matrix) – Matrix of distances between samples
target (array-like) – Vector with the values of a feature for each sample
method (Literal['knn', 'distances', 'proportions', 'silhouette', 'persistence', 'permanova'] (default: 'knn')) – Method to use for evaluation:
- knn: predict values of target using k-nearest neighbors and evaluate the prediction
- distances: test whether distances between samples differ significantly from the null distribution
- proportions: test whether the distribution of target differs between groups (e.g. clusters)
- silhouette: calculate the silhouette score using sklearn.metrics.silhouette_score() with metric='precomputed'
- persistence: calculate the persistence of connected components in a filtration of a kNN graph based on the values of target
- permanova: PERMANOVA pseudo-F on the distance matrix for a categorical target (single factor, as in vegan::adonis2). A larger score means stronger separation
num_donors_subset (int, optional) – Absolute number of donors to include in the evaluation.
proportion_donors_subset (float, optional) – Proportion of donors to include in the evaluation.
parameters (dict) – Keyword arguments for the evaluation method. The following parameters are used:
- knn:
  - n_neighbors: number of neighbors to use for prediction
  - task: type of prediction task. One of “classification”, “regression”, “ranking”. See the documentation of predict_knn for more information
- distances:
  - control_level: value of target that should be used as a control group
  - normalization_type: type of normalization to use. One of “total”, “shift”, “var”. See the documentation of test_distances_significance for more information
  - n_bootstraps: number of bootstrap iterations to use
  - trimmed_fraction: fraction of the most extreme values to remove from the distribution
  - compare_by_difference: if True, normalization is defined as a difference (as in the original paper); otherwise, it is defined as a ratio
- proportions:
  - groups: groups (e.g. cluster numbers) of the observations
- persistence:
  - max_feature_difference: maximum difference in the feature values allowed between connected nodes
  - n_neighbors: number of neighbors to use for constructing the kNN graph
- permanova:
  - permutations: number of permutations used to compute the p-value (default 999). Use 0 to skip permutations (the p-value is then NaN)
  - random_state: seed or numpy.random.Generator for permutations
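The permanova method is described as the single-factor pseudo-F of vegan::adonis2. To illustrate what that statistic measures, here is a minimal NumPy sketch of the textbook one-way formulation (Anderson's PERMANOVA): SS_T = (1/N) Σ_{i<j} d_ij², SS_W = Σ_g (1/n_g) Σ_{i<j in g} d_ij², pseudo-F = ((SS_T − SS_W)/(a − 1)) / (SS_W/(N − a)), with the usual (count + 1)/(permutations + 1) permutation p-value. This is not patpy's code; the function names permanova_pseudo_f and permanova are hypothetical, and the sketch assumes at least two groups and more samples than groups.

```python
import numpy as np


def permanova_pseudo_f(distances, groups):
    """One-way PERMANOVA pseudo-F (illustrative sketch, not patpy's implementation)."""
    d2 = np.asarray(distances, dtype=float) ** 2
    groups = np.asarray(groups)
    n = len(groups)
    labels = np.unique(groups)
    a = len(labels)  # assumes 2 <= a < n

    # Total sum of squares: squared distances over all pairs i < j, divided by N
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n

    # Within-group sum of squares: the same quantity computed inside each group
    ss_within = 0.0
    for label in labels:
        idx = np.flatnonzero(groups == label)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)

    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(distances, groups, permutations=999, random_state=None):
    """Return (pseudo-F, permutation p-value); p-value is NaN when permutations == 0."""
    f_obs = permanova_pseudo_f(distances, groups)
    if permutations == 0:
        return f_obs, float("nan")

    # default_rng accepts an int seed, None, or an existing numpy.random.Generator
    rng = np.random.default_rng(random_state)
    groups = np.asarray(groups)
    f_perm = np.array(
        [permanova_pseudo_f(distances, rng.permutation(groups)) for _ in range(permutations)]
    )
    # Count permutations at least as extreme as the observed statistic
    p_value = (np.sum(f_perm >= f_obs) + 1) / (permutations + 1)
    return f_obs, float(p_value)
```

For four points at 0, 1, 10 and 11 (absolute differences as distances) split into groups {0, 1} and {10, 11}, the sketch gives a pseudo-F of 200, which equals the classical one-way ANOVA F because the distances are Euclidean.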

Returns:

result (dict) – Result of the evaluation with the following keys:
- score: a number evaluating the representation; higher is better
- metric: name of the metric used for evaluation
- n_unique: number of unique values in target
- n_observations: number of observations used for evaluation. This can differ between targets, even within one dataset, because of missing (NA) values
- method: name of the method used for evaluation

Depending on the method, the result may contain additional keys.
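To make the result keys concrete, the following hypothetical sketch mimics the knn method for a categorical target: it drops samples with a missing target (the reason n_observations can vary), predicts each remaining sample from its k nearest neighbors in the precomputed matrix by leave-one-out majority vote, and returns a dict with the documented keys. The leave-one-out scheme, accuracy as the score, the tie-breaking rule and the name knn_representation_score are all assumptions; patpy's predict_knn may behave differently.

```python
import numpy as np


def _is_missing(value):
    # None and float NaN both count as missing (NaN != NaN)
    return value is None or value != value


def knn_representation_score(distances, target, n_neighbors=5):
    """Hypothetical leave-one-out kNN evaluation on a precomputed distance matrix.

    Sketch only: accuracy as the score and the exact result layout are
    assumptions, not patpy's implementation.
    """
    distances = np.asarray(distances, dtype=float)
    target = np.asarray(target, dtype=object)

    # Drop samples whose target is missing; this is why n_observations can vary
    keep = np.array([not _is_missing(t) for t in target])
    d = distances[np.ix_(keep, keep)]  # fancy indexing returns a copy
    y = target[keep]
    n = len(y)

    # A sample must not vote for itself: push self-distances to the end
    np.fill_diagonal(d, np.inf)
    k = min(n_neighbors, n - 1)
    neighbors = np.argsort(d, axis=1)[:, :k]

    predictions = []
    for row in neighbors:
        # Majority vote; ties go to the label that sorts first
        labels, counts = np.unique(y[row], return_counts=True)
        predictions.append(labels[np.argmax(counts)])

    accuracy = sum(p == t for p, t in zip(predictions, y)) / n
    return {
        "score": float(accuracy),
        "metric": "accuracy",  # assumed metric name
        "n_unique": len(set(y.tolist())),
        "n_observations": int(n),
        "method": "knn",
    }
```

For six points at 0, 1, 2, 10, 11 and 12 labeled a, a, a, b, b, b, plus a seventh point whose label is None, n_neighbors=2 yields a score of 1.0 with n_observations equal to 6 and n_unique equal to 2.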
