patpy.tl.MOFA#
- class patpy.tl.MOFA(sample_key, cell_group_key, layer=None, seed=67, n_factors=10, aggregate_cell_types=True, aggregation_mode='mean', scale_views=False, scale_groups=False, center_groups=True, use_float32=False, ard_factors=False, ard_weights=True, spikeslab_weights=True, spikeslab_factors=False, iterations=1000, convergence_mode='fast', startELBO=1, freqELBO=1, gpu_mode=False, gpu_device=None, verbose=False, quiet=False, outfile=None, save_interrupted=False)#
Patient representation using MOFA2 model, treating patients as samples with optional cell type views.
- Parameters:
sample_key (str) – Column in .obs containing sample (patient) IDs.
cell_group_key (str) – Column in .obs containing cell type information.
layer (str | None (default: None)) – Layer in AnnData to use for gene expression data. If None, uses .X.
seed (int (default: 67)) – Random seed for reproducibility.
n_factors (int (default: 10)) – Number of latent factors to learn.
aggregate_cell_types (bool (default: True)) – If True, treat each cell type as a separate view. If False, aggregate gene expression across all cell types into a single view.
aggregation_mode (str (default: 'mean')) – Name of the aggregation function to use (e.g., 'mean', 'median', 'sum').
scale_views (bool (default: False)) – Scale each view to unit variance.
scale_groups (bool (default: False)) – Scale each group to unit variance.
center_groups (bool (default: True)) – Center each group.
use_float32 (bool (default: False)) – Use 32-bit floating point precision.
ard_factors (bool (default: False)) – Use Automatic Relevance Determination (ARD) prior on factors.
ard_weights (bool (default: True)) – Use ARD prior on weights.
spikeslab_weights (bool (default: True)) – Use spike-and-slab prior on weights.
spikeslab_factors (bool (default: False)) – Use spike-and-slab prior on factors.
iterations (int (default: 1000)) – Maximum number of training iterations.
convergence_mode (str (default: 'fast')) – Convergence speed mode.
startELBO (int (default: 1)) – Iteration number to start computing the Evidence Lower Bound (ELBO).
freqELBO (int (default: 1)) – Frequency of ELBO computation after startELBO.
gpu_mode (bool (default: False)) – Use GPU for training.
gpu_device (int | None (default: None)) – GPU device ID to use.
verbose (bool (default: False)) – Verbose output during training.
quiet (bool (default: False)) – Suppress training output.
outfile (str | None (default: None)) – Path to save the trained model.
save_interrupted (bool (default: False)) – Save the model if training is interrupted.
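The effect of aggregate_cell_types and aggregation_mode can be illustrated with a small standalone sketch. It is not patpy's implementation: the `pseudobulk` helper, the `AGGREGATORS` table, the `"all_cells"` view name, and the toy data are all hypothetical, and only a single gene is aggregated to keep the example short.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical lookup of aggregation functions by name; the set that patpy
# actually supports may differ.
AGGREGATORS = {"mean": mean, "median": median, "sum": sum}


def pseudobulk(cells, aggregate_cell_types=True, aggregation_mode="mean"):
    """Aggregate per-cell expression of one gene into per-sample values.

    cells: iterable of (sample_id, cell_type, expression) tuples.
    Returns {view_name: {sample_id: aggregated_value}}.
    """
    agg = AGGREGATORS[aggregation_mode]
    grouped = defaultdict(lambda: defaultdict(list))
    for sample_id, cell_type, value in cells:
        # One view per cell type, or one shared view for all cells.
        view = cell_type if aggregate_cell_types else "all_cells"
        grouped[view][sample_id].append(value)
    return {
        view: {sample: agg(values) for sample, values in samples.items()}
        for view, samples in grouped.items()
    }


cells = [
    ("P1", "T cell", 2.0), ("P1", "T cell", 4.0), ("P1", "B cell", 1.0),
    ("P2", "T cell", 6.0), ("P2", "B cell", 3.0), ("P2", "B cell", 5.0),
]
per_type_views = pseudobulk(cells, aggregate_cell_types=True)
single_view = pseudobulk(cells, aggregate_cell_types=False)
```

With aggregate_cell_types=True each patient gets one value per cell type view; with False all cells of a patient collapse into a single view.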
Methods table#
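| calculate_distance_matrix() | Calculate distances between patients using MOFA2 latent factors. |
| embed() | Embed distances into 2-D coordinates. |
| evaluate_representation() | Evaluate representation of target for the given distance matrix. |
| fit_linear_probe() | Fit a linear probe on top of sample embeddings. |
| plot_clustermap() | Plot a hierarchically-clustered heat-map of the distance matrix. |
| plot_embedding() | Plot a 2-D embedding of distances, optionally coloured by metadata. |
| plot_metadata_distribution() | Predict metadata columns and plot embeddings coloured by metadata values. |
| predict_metadata() | Predict classes from metadata column target for samples using K-Nearest Neighbors classifier. |
| prepare_anndata() | Prepare AnnData for MOFA2, optionally treating cell types as separate views. |
| to_adata() | Convert samples data to AnnData object. |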
Methods#
- MOFA.calculate_distance_matrix(force=False, store_weights=False, dist='euclidean')#
Calculate distances between patients using MOFA2 latent factors.
- Parameters:
force (bool = False) – If True, recalculate the distance matrix even if it exists.
store_weights (bool, default: False) – If True, store the weights (relation of factors to genes) in self.adata.uns.
- Returns:
distances (ndarray) – Matrix of distances between patients.
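As a rough illustration of what a Euclidean distance matrix over latent factors looks like, the sketch below computes pairwise distances between rows of a small factor matrix. It uses only the standard library; the `pairwise_euclidean` helper and the toy factor values are hypothetical and do not reproduce patpy's internals.

```python
import math


def pairwise_euclidean(factors):
    """Return a symmetric matrix of Euclidean distances between rows."""
    n = len(factors)
    distances = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(factors[i], factors[j])
            # Fill both triangles so the matrix stays symmetric.
            distances[i][j] = distances[j][i] = d
    return distances


# Hypothetical latent factors: 3 patients x 2 factors.
factors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
distances = pairwise_euclidean(factors)
```

The result has one row and one column per patient, zeros on the diagonal, and is symmetric.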
- MOFA.embed(method='UMAP', n_jobs=-1, verbose=False)#
Embed distances into 2-D coordinates.
- Returns:
coordinates (ndarray) – Array of shape (n_samples, 2).
- MOFA.evaluate_representation(target, method='knn', metadata=None, num_donors_subset=None, proportion_donors_subset=None, **parameters)#
Evaluate representation of target for the given distance matrix.
- Parameters:
target (str) – A sample-level covariate to evaluate representation for.
method (Literal['knn', 'distances', 'proportions', 'silhouette', 'persistence', 'permanova'] (default: 'knn')) – Method to use for evaluation:
  - knn: predict values of target using K-nearest neighbors and evaluate the prediction
  - distances: test if distances between samples are significantly different from the null distribution
  - proportions: test if distribution of target differs between groups (e.g. clusters)
  - silhouette: calculate silhouette score for the given distances
num_donors_subset (int, optional) – Absolute number of donors to include in the evaluation.
proportion_donors_subset (float, optional) – Proportion of donors to include in the evaluation.
parameters (dict) – Parameters for the evaluation method. The following parameters are used:
- knn:
  - n_neighbors: number of neighbors to use for prediction
  - task: type of prediction task. One of "classification", "regression", "ranking". See documentation of predict_knn for more information
- distances:
  - control_level: value of target that should be used as a control group
  - normalization_type: type of normalization to use. One of "total", "shift", "var". See documentation of test_distances_significance for more information
  - n_bootstraps: number of bootstrap iterations to use
  - trimmed_fraction: fraction of the most extreme values to remove from the distribution
  - compare_by_difference: if True, normalization is defined as difference (as in the original paper). Otherwise, it is defined as a ratio
- proportions:
  - groups: groups (e.g. cluster numbers) of the observations
- Returns:
result (dict) – Result of evaluation with the following keys:
  - score: a number evaluating the representation. The higher the better
  - metric: name of the metric used for evaluation
  - n_unique: number of unique values in target
  - n_observations: number of observations used for evaluation. Can be different for different targets, even within one dataset (because of NAs)
  - method: name of the method used for evaluation
There are other optional keys depending on the method used for evaluation.
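To make the silhouette option and the shape of the result dictionary more concrete, here is a minimal standalone sketch that computes a mean silhouette width from a precomputed distance matrix and packs it into a dictionary with the keys listed above. The helper names, the toy data, and the string stored under "metric" are assumptions; patpy's own computation may differ.

```python
def _mean(values):
    return sum(values) / len(values)


def silhouette_from_distances(distances, labels):
    """Mean silhouette width from a precomputed distance matrix.

    Requires at least two distinct labels.
    """
    n = len(labels)
    groups = set(labels)
    scores = []
    for i in range(n):
        same = [distances[i][j] for j in range(n)
                if j != i and labels[j] == labels[i]]
        if not same:
            # Singleton groups get a silhouette of 0 by convention.
            scores.append(0.0)
            continue
        a = _mean(same)  # mean distance to the sample's own group
        b = min(         # mean distance to the nearest other group
            _mean([distances[i][j] for j in range(n) if labels[j] == other])
            for other in groups if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return _mean(scores)


# Toy data: four samples on a line, two well-separated groups.
positions = [0.0, 1.0, 10.0, 11.0]
labels = ["control", "control", "case", "case"]
distances = [[abs(p - q) for q in positions] for p in positions]

result = {
    "score": silhouette_from_distances(distances, labels),
    "metric": "silhouette",  # assumed metric name
    "n_unique": len(set(labels)),
    "n_observations": len(labels),
    "method": "silhouette",
}
```

A score close to 1 means samples sit much closer to their own group than to any other group, which matches the "higher is better" reading of score.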
- MOFA.fit_linear_probe(target, task='classification', test_size=0.2, random_state=42, test_sample_labels=None)#
Fit a linear probe on top of sample embeddings.
- Parameters:
target (str) – Column in self.adata.obs to predict.
task (Literal['classification', 'regression'] (default: 'classification')) – "classification" or "regression".
test_size (float (default: 0.2)) – Fraction of donors held out for evaluation when test_sample_labels is not provided.
random_state (int (default: 42)) – Random seed for the train/test split (used only when test_sample_labels is not provided).
test_sample_labels (list | None (default: None)) – Explicit list of sample labels (index values of sample_representation) to use as the test set. When provided, test_size and random_state are ignored. When None, a random split is performed and the chosen test labels are stored in test_sample_labels for reproducibility.
- Return type:
dict
- Returns:
dict – Keys: "model", "test_sample_labels", "{target}_test", "{target}_pred". For classification: additionally "accuracy" and "f1". For regression: additionally "r2" and "pearson".
Examples
>>> result = model.fit_linear_probe(target="age", task="regression")
>>> print(f"Pearson r = {result['pearson']:.3f}")
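The interplay between test_size, random_state, and test_sample_labels can be sketched with a small standalone function. This only illustrates the documented behaviour (an explicit test set overrides the random split); the `split_samples` helper and its rounding rule are hypothetical and not taken from patpy.

```python
import random


def split_samples(sample_labels, test_size=0.2, random_state=42,
                  test_sample_labels=None):
    """Return (train_labels, test_labels) for a list of sample labels."""
    if test_sample_labels is None:
        # Random split: seeded so that the same test set is drawn each time.
        rng = random.Random(random_state)
        n_test = max(1, round(len(sample_labels) * test_size))
        test_sample_labels = rng.sample(sorted(sample_labels), n_test)
    # An explicit test set makes test_size and random_state irrelevant.
    test_set = set(test_sample_labels)
    train = [s for s in sample_labels if s not in test_set]
    test = [s for s in sample_labels if s in test_set]
    return train, test


donors = [f"donor_{i}" for i in range(10)]
train_a, test_a = split_samples(donors)                      # random split
train_b, test_b = split_samples(donors)                      # same seed again
train_c, test_c = split_samples(donors, test_sample_labels=test_a)
```

Passing the test labels returned by one run into the next reproduces the same split without relying on the seed.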
- MOFA.plot_clustermap(metadata_cols=None, figsize=(10, 12), *args, **kwargs)#
Plot a hierarchically-clustered heat-map of the distance matrix.
- Parameters:
metadata_cols (list[str] or None) – .obs columns to annotate the heat-map.
figsize (tuple)
*args – Passed to calculate_distance_matrix().
**kwargs – Passed to calculate_distance_matrix().
- Returns:
seaborn.matrix.ClusterGrid
- MOFA.plot_embedding(method='UMAP', metadata_cols=None, continuous_palette='viridis', categorical_palette='tab10', na_color='lightgray', axes=None, use_uns_colors=True, color_key_suffix='_colors', show_legend=True)#
Plot a 2-D embedding of distances, optionally coloured by metadata.
- Parameters:
method (str (default: 'UMAP')) – Embedding method. One of "MDS", "TSNE", "UMAP".
metadata_cols (list[str] | None (default: None)) – Columns from .obs used for colouring.
continuous_palette (str (default: 'viridis')) – Seaborn palette name for continuous metadata.
categorical_palette (str (default: 'tab10')) – Seaborn palette name for categorical metadata.
na_color (str (default: 'lightgray')) – Colour used for samples with missing metadata values.
axes (default: None) – Existing matplotlib Axes (or array of Axes) to plot into.
use_uns_colors (bool (default: True)) – If True, look for colors in adata.uns[f'{col}{color_key_suffix}'] and use them if available (similar to scanpy).
color_key_suffix (str (default: '_colors')) – Suffix for the color key in adata.uns. For example, with suffix "_colors", the method looks for adata.uns['cell_type_colors'].
show_legend (bool (default: True)) – If True, display the legend. If False, hide it.
- Returns:
matplotlib Axes or array of Axes
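The use_uns_colors and color_key_suffix lookup can be pictured with the sketch below, which uses a plain dictionary in place of adata.uns. The `resolve_colors` helper and the fallback rule are hypothetical; the only convention borrowed here is scanpy's habit of storing one colour per category under a "<column>_colors" key.

```python
from itertools import cycle

# First three colours of matplotlib's "tab10" palette, used as a fallback.
FALLBACK = ["#1f77b4", "#ff7f0e", "#2ca02c"]


def resolve_colors(uns, col, categories, use_uns_colors=True,
                   color_key_suffix="_colors"):
    """Map each category to a colour, preferring colours stored in `uns`."""
    key = f"{col}{color_key_suffix}"
    stored = uns.get(key) if use_uns_colors else None
    if stored is not None and len(stored) >= len(categories):
        # Stored colours are assumed to follow the category order.
        return dict(zip(categories, stored))
    # No usable stored colours: cycle through the fallback palette.
    return dict(zip(categories, cycle(FALLBACK)))


uns = {"cell_type_colors": ["#aa0000", "#00aa00"]}
from_uns = resolve_colors(uns, "cell_type", ["B cell", "T cell"])
from_fallback = resolve_colors(uns, "condition", ["case", "control"])
ignored = resolve_colors(uns, "cell_type", ["B cell", "T cell"],
                         use_uns_colors=False)
```

Here "cell_type" picks up its stored colours, while "condition" has no entry and falls back to the default palette.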
- MOFA.plot_metadata_distribution(metadata_columns, tasks, method='knn', embedding='UMAP', metadata=None, metric_threshold=0.4)#
Predict metadata columns and plot embeddings coloured by metadata values.
- Parameters:
metadata_columns (list[str]) – List of metadata columns to show.
tasks (list[str]) – Tasks for each metadata column (classification, ranking or regression). Can be one string for all columns.
method (Literal['knn', 'distances', 'proportions', 'silhouette', 'persistence', 'permanova'] (default: 'knn')) – Method to use for evaluation. See documentation of evaluate_representation for more information.
embedding (str (default: 'UMAP')) – Embedding to use for plotting.
metric_threshold (float (default: 0.4)) – Results whose metric value is lower than this threshold will not be displayed.
- MOFA.predict_metadata(target, metadata=None, n_neighbors=3, task='classification')#
Predict classes from metadata column target for samples using K-Nearest Neighbors classifier.
- Parameters:
target (str) – Column name from adata.obs, which will be used for classification.
metadata (Optional[pd.DataFrame] = None) – Table with metadata about samples. Index should contain samples. If None, adata.obs is used.
n_neighbors (int (default: 3)) – Number of neighbors to use for classification.
task (str = "classification")
- Returns:
y_true (array-like) – True values of target from metadata for samples with known values.
y_predicted (array-like) – Predicted values of target for samples with known values.
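K-nearest-neighbour prediction on a precomputed distance matrix can be sketched as follows. This is a simplified leave-one-out variant written for illustration; whether patpy evaluates predictions in exactly this way is an assumption, and the `knn_predict_loo` helper and toy data are hypothetical. Samples with a missing label (None) are skipped, mirroring the "samples with known values" wording above.

```python
from collections import Counter


def knn_predict_loo(distances, labels, n_neighbors=3):
    """Leave-one-out KNN classification on a precomputed distance matrix.

    Samples whose label is None are used neither as queries nor as neighbours.
    Returns (y_true, y_predicted) for samples with known labels.
    """
    known = [i for i, y in enumerate(labels) if y is not None]
    y_true, y_predicted = [], []
    for i in known:
        # Closest labelled samples, excluding the query sample itself.
        neighbours = sorted(
            (j for j in known if j != i), key=lambda j: distances[i][j]
        )[:n_neighbors]
        votes = Counter(labels[j] for j in neighbours)
        y_true.append(labels[i])
        y_predicted.append(votes.most_common(1)[0][0])
    return y_true, y_predicted


# Toy data: samples on a line; the last one has no known label.
positions = [0, 1, 2, 10, 11, 12, 5]
labels = ["ctrl", "ctrl", "ctrl", "case", "case", "case", None]
distances = [[abs(p - q) for q in positions] for p in positions]

y_true, y_predicted = knn_predict_loo(distances, labels, n_neighbors=3)
accuracy = sum(t == p for t, p in zip(y_true, y_predicted)) / len(y_true)
```

Comparing y_true with y_predicted gives a simple accuracy estimate of how well the distances separate the target classes.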
- MOFA.prepare_anndata(adata)#
Prepare AnnData for MOFA2, optionally treating cell types as separate views.
- Parameters:
adata (AnnData) – Annotated data matrix
- MOFA.to_adata(metadata=None, *args, **kwargs)#
Convert samples data to AnnData object
- Parameters:
metadata (DataFrame (default: None)) – Metadata about samples to be added to .obs of AnnData object. Should contain samples in index.
*args – Additional arguments to pass to calculate_distance_matrix method.
**kwargs – Additional arguments to pass to calculate_distance_matrix method.
- Returns:
samples_adata (AnnData) – AnnData object with samples data.