patpy.tl.evaluate_representation

patpy.tl.evaluate_representation(distances, target, method='knn', num_donors_subset=None, proportion_donors_subset=None, **parameters)

Evaluate how well the given distance matrix represents the target.

Parameters:
  • distances (square matrix) – Matrix of distances between samples

  • target (array-like) – Vector with the values of a feature for each sample

  • method (Literal['knn', 'distances', 'proportions', 'silhouette', 'persistence', 'permanova'] (default: 'knn')) – Method to use for evaluation:
    • knn: predict values of target using K-nearest neighbors and evaluate the prediction

    • distances: test if distances between samples are significantly different from the null distribution

    • proportions: test if distribution of target differs between groups (e.g. clusters)

    • silhouette: calculate silhouette score using sklearn.metrics.silhouette_score() with metric='precomputed'

    • persistence: calculate the persistence of connected components in filtration of a kNN graph based on the values of target

    • permanova: PERMANOVA pseudo-F on the distance matrix for a categorical target (single factor, as in vegan::adonis2). A larger score means stronger separation

  • num_donors_subset (int, optional) – Absolute number of donors to include in the evaluation.

  • proportion_donors_subset (float, optional) – Proportion of donors to include in the evaluation.

  • parameters (dict) –

    Parameters for the evaluation method. The following parameters are used:

    • knn:
      • n_neighbors: number of neighbors to use for prediction

      • task: type of prediction task. One of “classification”, “regression”, “ranking”. See documentation of predict_knn for more information

    • distances:
      • control_level: value of target that should be used as a control group

      • normalization_type: type of normalization to use. One of “total”, “shift”, “var”. See documentation of test_distances_significance for more information

      • n_bootstraps: number of bootstrap iterations to use

      • trimmed_fraction: fraction of the most extreme values to remove from the distribution

      • compare_by_difference: if True, normalization is defined as difference (as in the original paper). Otherwise, it is defined as a ratio

    • proportions:
      • groups: groups (e.g. cluster numbers) of the observations

    • persistence:
      • max_feature_difference: maximum difference in the feature values allowed between connected nodes

      • n_neighbors: number of neighbors to use for constructing the kNN graph

    • permanova:
      • permutations: permutation count for the p-value (default 999). Use 0 to skip permutations (p-value nan)

      • random_state: seed or numpy.random.Generator for permutations
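To make the permanova parameters concrete, here is a minimal pure-Python sketch of a single-factor PERMANOVA: the pseudo-F statistic computed from a distance matrix, followed by a label-permutation p-value. It is an illustration under stated assumptions, not patpy's implementation. The helper names `pseudo_f` and `permanova` are hypothetical, the p-value convention `(hits + 1) / (permutations + 1)` is a common choice that patpy may or may not follow, and `random_state` is accepted only as an integer seed here (the documented function also accepts a numpy.random.Generator).

```python
import random


def pseudo_f(distances, labels):
    """Single-factor PERMANOVA pseudo-F from a square distance matrix."""
    n = len(labels)
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    a = len(groups)  # number of groups; this sketch assumes a >= 2

    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(
        distances[i][j] ** 2 for i in range(n) for j in range(i + 1, n)
    ) / n

    # Within-group sum of squares: same quantity per group, divided by group size.
    ss_within = 0.0
    for members in groups.values():
        ss_within += sum(
            distances[i][j] ** 2
            for k, i in enumerate(members)
            for j in members[k + 1:]
        ) / len(members)

    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(distances, labels, permutations=999, random_state=None):
    """Return (pseudo-F, p-value); the p-value is nan when permutations == 0."""
    observed = pseudo_f(distances, labels)
    if permutations == 0:
        return observed, float("nan")
    rng = random.Random(random_state)  # assumption: integer seed only
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(distances, shuffled) >= observed:
            hits += 1
    # Assumed convention: count the observed labelling as one permutation.
    return observed, (hits + 1) / (permutations + 1)


# Toy data: two well-separated groups of points on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = ["a", "a", "a", "b", "b", "b"]
distances = [[abs(p - q) for q in points] for p in points]

f_obs, p_value = permanova(distances, labels, permutations=199, random_state=0)
```

Fixing `random_state` makes the permutation p-value reproducible, and passing `permutations=0` returns only the statistic, mirroring the behaviour described for the two parameters above.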

Returns:

  • result (dict) – Result of the evaluation, with the following keys:
    • score: a number evaluating the representation. The higher the better

    • metric: name of the metric used for evaluation

    • n_unique: number of unique values in target

    • n_observations: number of observations used for evaluation. Can be different for different targets, even within one dataset (because of NAs)

    • method: name of the method used for evaluation

    There are other optional keys depending on the method used for evaluation.
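A short usage sketch may help tie the signature and the returned dictionary together. It assumes that NumPy is installed, that `import patpy` exposes the `patpy.tl` submodule, and that a NumPy array is accepted as the square distance matrix. The helpers `toy_inputs` and `evaluate` are hypothetical names introduced only for this illustration, and the data is synthetic.

```python
import numpy as np


def toy_inputs(seed=0):
    """Build a small distance matrix and a categorical target for illustration."""
    rng = np.random.default_rng(seed)
    # Synthetic data: 10 samples in two groups, embedded in 3 dimensions.
    embedding = np.vstack(
        [rng.normal(0.0, 1.0, (5, 3)), rng.normal(4.0, 1.0, (5, 3))]
    )
    target = np.array(["control"] * 5 + ["case"] * 5)
    # Pairwise Euclidean distances between samples (square, symmetric matrix).
    diff = embedding[:, None, :] - embedding[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))
    return distances, target


def evaluate(distances, target):
    """Call the documented function with the knn method and its parameters."""
    # Imported lazily so that toy_inputs() runs even without patpy installed.
    import patpy  # assumption: the top-level import exposes patpy.tl

    return patpy.tl.evaluate_representation(
        distances,
        target,
        method="knn",
        n_neighbors=3,          # knn parameter listed above
        task="classification",  # knn parameter listed above
    )


distances, target = toy_inputs()
```

With patpy installed, `result = evaluate(distances, target)` should return the dictionary described above, so `result["score"]`, `result["metric"]` and `result["n_observations"]` can be read directly. Any other keys depend on the chosen method.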