permutation_importance
- CrossValidationReport.inspection.permutation_importance(*, data_source='test', at_step=0, metric=None, n_repeats=5, max_samples=1.0, n_jobs=None, seed=None)
Display the permutation feature importance.
This computes the permutation importance using sklearn’s permutation_importance() function, which permutes the values of one feature and compares the value of metric with and without the permutation. The difference indicates how much the model relies on that feature.
By default, seed is set to None, which means the function returns a different result at every call. In that case, the results are not cached. To take advantage of skore’s caching capabilities, set the seed parameter.
- Parameters:
- data_source {“test”, “train”}, default=“test”
The data source to use.
“test”: use the test set provided when creating the report.
“train”: use the train set provided when creating the report.
- at_step int or str, default=0
If the estimator is a Pipeline, the step of the pipeline at which the importance is computed. If n, the features that are evaluated are the ones right before the n-th step of the pipeline. For instance:
If 0, compute the importance just before the start of the pipeline (i.e. the importance of the raw input features).
If -1, compute the importance just before the end of the pipeline (i.e. the importance of the fully engineered features, just before the actual prediction step).
If a string, it is looked up among the pipeline’s named_steps.
Has no effect if the estimator is not a Pipeline.
- metric str, callable, scorer, or list of such instances or dict of such instances, default=None
The metric to pass to permutation_importance(). The possible values (whether or not in a list) are:
If a string, either one of the built-in metrics or a scikit-learn scorer name. You can get the list of possible strings using report.metrics.help() or sklearn.metrics.get_scorer_names() for the built-in metrics or the scikit-learn scorers, respectively.
If a callable, it should take y_true and y_pred as its first two arguments. Additional arguments can be passed as keyword arguments and will be forwarded with metric_kwargs. No favorability indicator can be displayed in this case.
If the callable API is too restrictive (e.g. you need to pass the same parameter name with different values), you can use scikit-learn scorers as provided by sklearn.metrics.make_scorer(). In this case, the metric favorability will only be displayed if it is given explicitly via make_scorer’s greater_is_better parameter.
- n_repeats int, default=5
Number of times to permute a feature.
- max_samples int or float, default=1.0
The number of samples to draw from X to compute feature importance in each repeat (without replacement).
If int, then draw max_samples samples.
If float, then draw max_samples * X.shape[0] samples.
If max_samples is equal to 1.0 or X.shape[0], all samples will be used.
While using this option may provide less accurate importance estimates, it keeps the method tractable when evaluating feature importance on large datasets. In combination with n_repeats, this allows controlling the trade-off between computational speed and statistical accuracy.
- n_jobs int or None, default=None
Number of jobs to run in parallel. -1 means using all processors.
- seed int or None, default=None
The seed used to initialize the random number generator used for the permutations.
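To make the max_samples and n_repeats trade-off concrete, here is a rough sketch of how the two parameters drive the amount of scoring work. The helpers `resolve_max_samples` and `scored_rows` are hypothetical names introduced for illustration, not part of skore or scikit-learn, and truncating the float case with `int()` is an assumption.

```python
def resolve_max_samples(max_samples, n_samples):
    """Turn max_samples into a row count (hypothetical helper, not a skore API)."""
    if isinstance(max_samples, float):
        # A float is a fraction of the available rows; truncation is an assumption.
        return int(max_samples * n_samples)
    # An int is already an absolute number of rows.
    return max_samples


def scored_rows(n_features, n_repeats, max_samples, n_samples):
    """Rows passed through the model: one scoring pass per feature and repeat."""
    return n_features * n_repeats * resolve_max_samples(max_samples, n_samples)


full = scored_rows(n_features=3, n_repeats=5, max_samples=1.0, n_samples=10_000)
cheap = scored_rows(n_features=3, n_repeats=5, max_samples=0.25, n_samples=10_000)
print(full, cheap)  # 150000 37500
```

Lowering max_samples shrinks every scoring pass, while raising n_repeats adds passes that reduce the variance of the estimate, so the two can be tuned against each other.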
- Returns:
PermutationImportanceDisplay
The permutation importance display.
Notes
Even if pipeline components output sparse arrays, these will be made dense.
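The permutation procedure described at the top of this page can be illustrated with the standard library alone. The sketch below is not skore's or scikit-learn's implementation: `r2_score`, `predict` and `permutation_importance_sketch` are hypothetical names, and the "model" is a fixed linear function standing in for a fitted estimator. It shows the score drop per feature and why a fixed seed makes results repeatable.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination for two equal-length sequences."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def predict(rows):
    """Hypothetical fitted model: uses features 0 and 1, ignores feature 2."""
    return [3.0 * r[0] + 1.0 * r[1] for r in rows]


def permutation_importance_sketch(rows, y, n_repeats=5, seed=None):
    """Return, per feature, the score drops observed over n_repeats shuffles."""
    rng = random.Random(seed)  # seed=None gives different shuffles on every call
    baseline = r2_score(y, predict(rows))
    drops = []
    for j in range(len(rows[0])):
        feature_drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            feature_drops.append(baseline - r2_score(y, predict(permuted)))
        drops.append(feature_drops)
    return drops


data_rng = random.Random(0)
rows = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
y = predict(rows)  # noiseless target, so the baseline score is exactly 1

first = permutation_importance_sketch(rows, y, n_repeats=2, seed=0)
second = permutation_importance_sketch(rows, y, n_repeats=2, seed=0)
print(first == second)  # True: same seed, same permutations
print(first[2])  # [0.0, 0.0]: the model never reads feature 2
```

Because the same seed reproduces the same permutations, a seeded call is a deterministic function of its arguments, which is what makes caching its result meaningful.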
Examples
>>> from sklearn.datasets import make_regression
>>> from sklearn.linear_model import Ridge
>>> from skore import train_test_split
>>> from skore import CrossValidationReport
>>> X, y = make_regression(n_features=3, random_state=0)
>>> report = CrossValidationReport(estimator=Ridge(), X=X, y=y, splitter=2)
>>> report.inspection.permutation_importance(
...     n_repeats=2,
...     seed=0,
... ).frame(aggregate=None)
    data_source  metric  split     feature  repetition    value
0          test      r2      0  Feature #0           1  0.71...
1          test      r2      0  Feature #1           1  1.59...
2          test      r2      0  Feature #2           1  0.01...
3          test      r2      0  Feature #0           2  0.70...
4          test      r2      0  Feature #1           2  1.58...
5          test      r2      0  Feature #2           2  0.01...
6          test      r2      1  Feature #0           1  0.63...
7          test      r2      1  Feature #1           1  1.82...
8          test      r2      1  Feature #2           1  0.01...
9          test      r2      1  Feature #0           2  0.49...
10         test      r2      1  Feature #1           2  1.15...
11         test      r2      1  Feature #2           2  0.01...
>>> report.inspection.permutation_importance(
...     metric=["r2", "neg_mean_squared_error"],
...     n_repeats=2,
...     seed=0,
... ).frame(aggregate=None)
   data_source  metric  ...  repetition    value
0         test      r2  ...           1  0.71...
1         test      r2  ...           1  1.59...
2         test      r2  ...           1  0.01...
3         test      r2  ...           2  0.70...
4         test      r2  ...           2  1.58...
5         test      r2  ...           2  0.01...
...
>>> report.inspection.permutation_importance(
...     n_repeats=2,
...     seed=0,
... ).frame()
   data_source  metric     feature  value_mean  value_std
0         test      r2  Feature #0     0.63...    0.10...
1         test      r2  Feature #1     1.54...    0.07...
2         test      r2  Feature #2     0.01...    0.00...
>>> report.inspection.permutation_importance(
...     n_repeats=2,
...     seed=0,
... ).frame(level="repetitions")
   data_source  metric  split     feature  value_mean  value_std
0         test      r2      0  Feature #0     0.71...    0.00...
1         test      r2      0  Feature #1     1.58...    0.00...
2         test      r2      0  Feature #2     0.01...    0.00...
3         test      r2      1  Feature #0     0.56...    0.09...
4         test      r2      1  Feature #1     1.49...    0.47...
5         test      r2      1  Feature #2     0.01...    0.00...
>>> # Compute the importance at the end of the feature engineering pipeline
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> pipeline = make_pipeline(StandardScaler(), Ridge())
>>> pipeline_report = CrossValidationReport(
...     estimator=pipeline, X=X, y=y, splitter=2
... )
>>> pipeline_report.inspection.permutation_importance(
...     n_repeats=2,
...     seed=0,
...     at_step=-1,
... ).frame()
   data_source  metric  feature  value_mean  value_std
0         test      r2       x0     0.63...    0.10...
1         test      r2       x1     1.53...    0.06...
2         test      r2       x2     0.01...    0.00...
>>> pipeline_report.inspection.permutation_importance(
...     n_repeats=2,
...     seed=0,
...     at_step="ridge",
... ).frame()
   data_source  metric  feature  value_mean  value_std
0         test      r2       x0     0.63...    0.10...
1         test      r2       x1     1.53...    0.06...
2         test      r2       x2     0.01...    0.00...
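One way to read at_step=-1 is "transform the data with every step except the last, then permute those engineered features and score the final estimator". The sketch below expresses that reading with plain scikit-learn. It is an illustration under assumptions, not a description of how skore implements at_step: the pipeline is fit once on the full dataset rather than per cross-validation split, and the importance is computed on the training data.

```python
from sklearn.datasets import make_regression
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_features=3, random_state=0)
pipeline = make_pipeline(StandardScaler(), Ridge()).fit(X, y)

# Features "right before" the last step: apply every step except the final one.
X_engineered = pipeline[:-1].transform(X)

# Permute the engineered features and score the final estimator on its own.
result = permutation_importance(
    pipeline[-1], X_engineered, y, scoring="r2", n_repeats=2, random_state=0
)
print(result.importances.shape)  # (3, 2): one row per feature, one column per repeat
```

With at_step=0 the equivalent reading would permute the raw columns of X and score the whole pipeline instead.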