patpy.tl.evaluate_representation
- patpy.tl.evaluate_representation(distances, target, method='knn', num_donors_subset=None, proportion_donors_subset=None, **parameters)
Evaluate the representation of target for the given distance matrix.
- Parameters:
distances (square matrix) – Matrix of distances between samples
target (array-like) – Vector with the values of a feature for each sample
method (Literal['knn', 'distances', 'proportions', 'silhouette', 'persistence', 'permanova'] (default: 'knn')) – Method to use for evaluation:
- knn: predict values of target using K-nearest neighbors and evaluate the prediction
- distances: test if distances between samples are significantly different from the null distribution
- proportions: test if the distribution of target differs between groups (e.g. clusters)
- silhouette: calculate the silhouette score using sklearn.metrics.silhouette_score() with metric='precomputed'
- persistence: calculate the persistence of connected components in a filtration of a kNN graph based on the values of target
- permanova: PERMANOVA pseudo-F on the distance matrix for a categorical target (single factor, as in vegan::adonis2). Larger score means stronger separation
num_donors_subset (int, optional) – Absolute number of donors to include in the evaluation.
proportion_donors_subset (float, optional) – Proportion of donors to include in the evaluation.
parameters (dict) – Parameters for the evaluation method. The following parameters are used:
- knn:
n_neighbors: number of neighbors to use for prediction
task: type of prediction task. One of “classification”, “regression”, “ranking”. See documentation of predict_knn for more information
- distances:
control_level: value of target that should be used as a control group
normalization_type: type of normalization to use. One of “total”, “shift”, “var”. See documentation of test_distances_significance for more information
n_bootstraps: number of bootstrap iterations to use
trimmed_fraction: fraction of the most extreme values to remove from the distribution
compare_by_difference: if True, normalization is defined as difference (as in the original paper). Otherwise, it is defined as a ratio
- proportions:
groups: groups (e.g. cluster numbers) of the observations
- persistence:
max_feature_difference: maximum difference in the feature values allowed between connected nodes
n_neighbors: number of neighbors to use for constructing the kNN graph
- permanova:
permutations: permutation count for the p-value (default 999). Use 0 to skip permutations (the p-value is then nan)
random_state: seed or numpy.random.Generator for permutations
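To make the permanova option more concrete, here is a minimal pure-Python sketch of a single-factor PERMANOVA pseudo-F with a permutation p-value. It is not patpy's implementation. The function names pseudo_f and permanova_sketch, the seed argument, and the use of the standard-library random module (standing in for a numpy.random.Generator) are assumptions made for illustration. The p-value formula (hits + 1) / (permutations + 1) is a common convention and is also assumed here.

```python
import math
import random
from itertools import combinations


def pseudo_f(distances, labels):
    """Single-factor PERMANOVA pseudo-F from a square distance matrix (illustrative)."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared distances over all sample pairs, divided by N.
    ss_total = sum(distances[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for group in groups:
        members = [i for i, label in enumerate(labels) if label == group]
        pairs = combinations(members, 2)
        ss_within += sum(distances[i][j] ** 2 for i, j in pairs) / len(members)
    ss_among = ss_total - ss_within
    # Assumes ss_within > 0 and at least two groups; no guard in this sketch.
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova_sketch(distances, labels, permutations=999, seed=None):
    """Return (pseudo-F, p-value); the p-value is nan when permutations == 0."""
    observed = pseudo_f(distances, labels)
    if permutations == 0:
        return observed, math.nan
    rng = random.Random(seed)  # hypothetical stand-in for random_state
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # relabel samples at random
        if pseudo_f(distances, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data: two tight groups on a line, far apart from each other.
points = [0.0, 1.0, 10.0, 11.0]
distances = [[abs(p - q) for q in points] for p in points]
labels = ["a", "a", "b", "b"]
score, p_value = permanova_sketch(distances, labels, permutations=99, seed=0)
```

For this toy input the two groups are well separated, so the pseudo-F is large (200.0). With only four samples there are few distinct relabelings, so the p-value cannot become very small regardless of the permutation count.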
- Returns:
result (dict) – Result of evaluation with the following keys:
- score: a number evaluating the representation. The higher the better
- metric: name of the metric used for evaluation
- n_unique: number of unique values in target
- n_observations: number of observations used for evaluation. Can be different for different targets, even within one dataset (because of NAs)
- method: name of the method used for evaluation
There are other optional keys depending on the method used for evaluation.
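The shape of the result can be illustrated with a small, self-contained sketch of a kNN-style evaluation on a precomputed distance matrix. This is a hypothetical stand-in and not patpy's code: the function name knn_evaluation_sketch, the leave-one-out scheme, the "accuracy" metric name, and the use of None to mark missing target values are all assumptions. It only shows how the documented keys could be filled in, including why n_observations can be smaller than the number of samples.

```python
from collections import Counter


def knn_evaluation_sketch(distances, target, n_neighbors=3):
    """Leave-one-out kNN classification on a precomputed distance matrix (illustrative)."""
    # Samples with a missing target (None here) are excluded from the evaluation.
    keep = [i for i, value in enumerate(target) if value is not None]
    correct = 0
    for i in keep:
        # Rank the other retained samples by their distance to sample i.
        others = sorted((j for j in keep if j != i), key=lambda j: distances[i][j])
        # Majority vote among the closest n_neighbors samples.
        votes = Counter(target[j] for j in others[:n_neighbors])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == target[i]
    return {
        "score": correct / len(keep),  # higher is better
        "metric": "accuracy",  # assumed metric name
        "n_unique": len({target[i] for i in keep}),
        "n_observations": len(keep),
        "method": "knn",
    }


# Toy data: seven samples on a line, the last one has no target value.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 5.0]
distances = [[abs(p - q) for q in points] for p in points]
target = ["a", "a", "a", "b", "b", "b", None]
result = knn_evaluation_sketch(distances, target, n_neighbors=2)
```

Here n_observations is 6 rather than 7 because the sample without a target value is dropped before evaluation.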