PermutationImportanceDisplay

class skore.PermutationImportanceDisplay(*, importances, report_type)

Display to inspect feature importance via feature permutation.

Parameters:
importances : pd.DataFrame

The importances computed after permuting the input features. The columns are:

  • estimator

  • data_source

  • metric

  • split

  • feature

  • label (classification) or output (regression)

  • repetition

  • value

report_type : {“estimator”, “cross-validation”, “comparison-estimator”, “comparison-cross-validation”}

Report type from which the display is created.

Attributes:
facet_ : seaborn FacetGrid

FacetGrid containing the permutation importance.

figure_ : matplotlib Figure

Figure containing the permutation importance.

ax_ : matplotlib Axes

Axes containing the permutation importance.

frame(*, metric=None, aggregate=('mean', 'std'), level='splits', select_k=None, sorting_order=None)

Get the feature importance in a dataframe format.

Parameters:
metric : str or list of str, default=None

Filter the importances by metric. If None, the importances for all metrics are returned.

aggregate : {“mean”, “std”}, (“mean”, “std”) or None, default=(“mean”, “std”)

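How to aggregate the importances. Aggregation is applied over repetitions. For cross-validation reports (and comparisons of them), it is applied either over repetitions only or over repetitions and then splits, depending on the value of level. If None, no aggregation is performed and the raw value of each repetition is returned.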

level : {“splits”, “repetitions”}, default=“splits”

Over which dimensions to aggregate when aggregate is not None. "repetitions" aggregates only over repetitions (the split column is kept for cross-validation). "splits" aggregates over repetitions and then over splits. Only relevant when aggregate is not None and the report is a CrossValidationReport or a ComparisonReport containing such reports.
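
As a rough illustration of the two values of level, the sketch below reproduces the described aggregation with plain pandas on made-up data. It is not skore's implementation: the toy values, the variable names per_split and across_splits, and the choice of taking the mean over repetitions before computing the mean and standard deviation over splits are all assumptions.

```python
import pandas as pd

# Toy long-format importances: 2 splits x 2 repetitions x 2 features.
# Column names follow the `importances` layout; the values are made up.
importances = pd.DataFrame(
    {
        "split": [0, 0, 0, 0, 1, 1, 1, 1],
        "feature": ["a", "a", "b", "b", "a", "a", "b", "b"],
        "repetition": [0, 1, 0, 1, 0, 1, 0, 1],
        "value": [0.4, 0.6, 0.1, 0.3, 0.2, 0.4, 0.0, 0.2],
    }
)

# Analogue of level="repetitions": collapse repetitions only,
# keeping one row per (split, feature).
per_split = importances.groupby(["split", "feature"], as_index=False)["value"].mean()

# Analogue of level="splits": collapse repetitions first (assumed: mean),
# then aggregate the per-split values over splits.
across_splits = (
    per_split.groupby("feature")["value"]
    .agg(["mean", "std"])
    .add_prefix("value_")
    .reset_index()
)

print(per_split)
print(across_splits)
```

With the first analogue the split column survives; with the second there is a single row per feature with value_mean and value_std columns.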

select_k : int, default=None

Select features by importance:

  • Positive values: the select_k features with largest importance

  • Negative values: the -select_k features with smallest importance

Selection is performed independently within each group (estimator, and per label/output if applicable). For cross-validation, features are ranked by mean importance across splits. When aggregate is None, ranking uses mean importance per feature over repetitions (and splits); all repetition/split rows are kept for the selected features.
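
The per-group selection described above can be sketched with pandas. The helper select_features is hypothetical (it is not part of skore), the data are made up, and ranking by the plain mean of value within each group is an assumption based on the description.

```python
import pandas as pd


def select_features(df: pd.DataFrame, select_k: int, group_cols: list) -> pd.DataFrame:
    """Hypothetical helper: keep all rows of the top/bottom k features per group."""
    keys = group_cols + ["feature"]
    # Rank features by their mean importance within each group.
    means = df.groupby(keys, as_index=False)["value"].mean()
    # Positive k: largest first; negative k: smallest first.
    ranked = means.sort_values("value", ascending=select_k < 0)
    kept = ranked.groupby(group_cols).head(abs(select_k))[keys]
    # The inner merge keeps every repetition row of the selected features.
    return df.merge(kept, on=keys)


# Toy data: 2 estimators x 3 features x 2 repetitions (values are made up).
importances = pd.DataFrame(
    {
        "estimator": ["m1"] * 6 + ["m2"] * 6,
        "feature": ["a", "a", "b", "b", "c", "c"] * 2,
        "repetition": [0, 1] * 6,
        "value": [0.5, 0.7, 0.1, 0.3, 0.3, 0.5, 0.0, 0.2, 0.6, 0.8, 0.2, 0.4],
    }
)

top_two = select_features(importances, 2, ["estimator"])
bottom_one = select_features(importances, -1, ["estimator"])
print(top_two)
print(bottom_one)
```

Because the ranking is done per estimator, the two estimators can end up with different selected features.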

sorting_order : {“descending”, “ascending”, None}, default=None

Sort features by importance: “descending” (most important first), “ascending” (least important first), or None to preserve original order. Can be used independently of select_k. When aggregate is None, ordering uses mean importance per feature over repetitions (and splits).
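
To make the ordering rule for aggregate=None concrete, here is a small pandas sketch, assuming a single estimator and made-up values. The temporary column _mean and the use of a stable sort are illustrative choices, not skore's implementation.

```python
import pandas as pd

# Toy raw importances: 3 features x 2 repetitions (values are made up).
importances = pd.DataFrame(
    {
        "feature": ["a", "b", "c", "a", "b", "c"],
        "repetition": [0, 0, 0, 1, 1, 1],
        "value": [0.2, 0.7, 0.4, 0.4, 0.9, 0.6],
    }
)

# Attach each feature's mean importance to all of its rows, then sort on it.
# The stable sort keeps repetitions of the same feature in their original order.
mean_importance = importances.groupby("feature")["value"].transform("mean")
ordered = (
    importances.assign(_mean=mean_importance)
    .sort_values("_mean", ascending=False, kind="stable")  # "descending"
    .drop(columns="_mean")
    .reset_index(drop=True)
)
print(ordered)
```

All repetition rows are kept; only their order changes, with the feature of highest mean importance first.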

Returns:
pd.DataFrame

The feature importances. The columns depend on the report type and parameters, and include:

  • data_source: Data source used to compute the importances ("train" or "test").

  • metric: Metric used to compute the importances.

  • feature: Feature name.

  • value_mean and value_std: Aggregated importance values (only when aggregate is not None).

  • value: Raw importance value per repetition (only when aggregate is None).

  • estimator: Name of the estimator (for comparison reports).

  • split: Cross-validation split index (for cross-validation reports, only when aggregate is None).

  • label: Class label (for multiclass classification).

  • output: Output index (for multi-output regression).

help()

Display help for this display using rich or HTML.

plot(*, metric=None, subplot_by='auto', select_k=None, sorting_order=None)

Plot the permutation importance.

Parameters:
metric : str or None, default=None

Metric to plot. Required when multiple metrics are computed.

subplot_by : str or None, default=“auto”

Column to use for subplotting. The possible values are:

  • if "auto", the subplots are chosen automatically, depending on the information available.

  • if a string, the corresponding column of the dataframe is used to create several subplots, arranged in a grid with a single row and several columns.

  • if None, all information is plotted on a single plot. An error is raised if there is too much information to plot on a single plot.

select_k : int, default=None

If set, only the top (positive) or bottom (negative) k features by importance are shown. See frame() for details.

sorting_order : {“descending”, “ascending”, None}, default=None

Sort features by importance before plotting. See frame() for details.

set_style(*, policy='update', boxplot_kwargs=None, stripplot_kwargs=None)

Set the style parameters for the display.

Parameters:
policy : {“override”, “update”}, default=“update”

Policy to use when setting the style parameters. If “override”, existing settings are set to the provided values. If “update”, existing settings are not changed; only settings that were previously unset are changed.

boxplot_kwargs : dict, default=None

Keyword arguments to be passed to seaborn.boxplot() for rendering the importances with an EstimatorReport.

stripplot_kwargs : dict, default=None

Keyword arguments to be passed to seaborn.stripplot() for rendering the importances with an EstimatorReport.

Returns:
None
Raises:
ValueError

If a style parameter is unknown.

static style_plot(plot_func)

Apply consistent style to skore displays.

This decorator:

  1. Applies default style settings.

  2. Executes plot_func.

  3. Calls plt.tight_layout() so that the axes do not overlap.

  4. Restores the original style settings.
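
The steps performed by the decorator can be sketched as follows. This is a minimal illustration of the pattern, not skore's implementation: the name style_plot_sketch, the DEFAULT_STYLE dictionary, and the use of matplotlib's rc_context to apply and restore settings are assumptions.

```python
import functools

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical default style; the settings skore actually applies are not shown here.
DEFAULT_STYLE = {"font.size": 12}


def style_plot_sketch(plot_func):
    """Hypothetical decorator mirroring the four documented steps."""

    @functools.wraps(plot_func)
    def wrapper(*args, **kwargs):
        # Step 1: apply default style settings. rc_context also takes care of
        # step 4 by restoring the previous rcParams when the block exits.
        with plt.rc_context(DEFAULT_STYLE):
            result = plot_func(*args, **kwargs)  # Step 2: run the plot function
            plt.tight_layout()  # Step 3: avoid overlapping axes
        return result

    return wrapper


@style_plot_sketch
def draw_line():
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
    # Report the font size that is active while plotting.
    return plt.rcParams["font.size"]
```

Using a context manager for the style keeps the restore step robust: the original settings come back even if plot_func raises.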

Parameters:
plot_func : callable

The plot function to be decorated.

Returns:
callable

The decorated plot function.