.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_evaluation/plot_estimator_report.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_evaluation_plot_estimator_report.py:

.. _example_estimator_report:

===============================================================
`EstimatorReport`: Get insights from any scikit-learn estimator
===============================================================

This example shows how the :class:`skore.EstimatorReport` class can be used to
quickly get insights from any scikit-learn estimator.

.. GENERATED FROM PYTHON SOURCE LINES 13-19

Loading our dataset and defining our estimator
==============================================

First, we load a dataset from skrub. Our goal is to predict whether a healthcare
manufacturing company paid a medical doctor or hospital, in order to detect
potential conflicts of interest.

.. GENERATED FROM PYTHON SOURCE LINES 21-27

.. code-block:: Python


    from skrub.datasets import fetch_open_payments

    dataset = fetch_open_payments()
    df = dataset.X
    y = dataset.y

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Downloading 'open_payments' from https://github.com/skrub-data/skrub-data-files/raw/refs/heads/main/open_payments.zip (attempt 1/3)

.. GENERATED FROM PYTHON SOURCE LINES 28-32

.. code-block:: Python


    from skrub import TableReport

    TableReport(df)

.. raw:: html




.. GENERATED FROM PYTHON SOURCE LINES 33-35

.. code-block:: Python


    TableReport(y.to_frame())

.. raw:: html




.. GENERATED FROM PYTHON SOURCE LINES 36-43

Looking at the distribution of the target, we observe that this classification
task is quite imbalanced. This means that we have to be careful when selecting a
set of statistical metrics to evaluate the classification performance of our
predictive model. In addition, we see that the class labels are not encoded as
the integers 0 and 1 but as the strings "allowed" and "disallowed".

For our application, the label of interest is "allowed".

.. GENERATED FROM PYTHON SOURCE LINES 43-45

.. code-block:: Python


    pos_label, neg_label = "allowed", "disallowed"

.. GENERATED FROM PYTHON SOURCE LINES 46-53

Now, we need to define a predictive model. Thankfully, `skrub` provides a
convenient function (:func:`skrub.tabular_pipeline`) for getting a strong
baseline predictive model with a single line of code. Because its feature
engineering is generic, it is not handcrafted or tailored to our dataset, but it
still provides a good starting point.

So let's create a classifier for our task.

.. GENERATED FROM PYTHON SOURCE LINES 53-58

.. code-block:: Python


    from skrub import tabular_pipeline

    estimator = tabular_pipeline("classifier")
    estimator

.. raw:: html
Pipeline(steps=[('tablevectorizer',
                     TableVectorizer(low_cardinality=ToCategorical())),
                    ('histgradientboostingclassifier',
                     HistGradientBoostingClassifier())])


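To make the earlier remark about class imbalance concrete, the sketch below computes what a trivial majority-class predictor would score. It uses the test-set class counts that appear in the confusion matrix at the end of this example; the variable names are illustrative and nothing here calls skrub or skore.

```python
# Test-set class counts, read from the confusion matrix shown at the end of
# this example: 451 + 550 "allowed" samples and 168 + 13543 "disallowed" ones.
class_counts = {"allowed": 451 + 550, "disallowed": 168 + 13543}

n_samples = sum(class_counts.values())
majority_label = max(class_counts, key=class_counts.get)

# A trivial model that always predicts the majority class is correct on every
# majority-class sample and wrong on all the others.
baseline_accuracy = class_counts[majority_label] / n_samples

# That same model never predicts "allowed", so it retrieves none of them.
baseline_recall_allowed = 0 / class_counts["allowed"]

print(f"majority class: {majority_label}")
print(f"baseline accuracy: {baseline_accuracy:.4f}")
print(f"baseline recall for 'allowed': {baseline_recall_allowed:.4f}")
```

A baseline accuracy this high is why accuracy alone says little here, and why metrics tied to the positive label, such as precision and recall, are worth inspecting.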
.. GENERATED FROM PYTHON SOURCE LINES 59-72

Getting insights from our estimator
===================================

Introducing the :class:`skore.EstimatorReport` class
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Now, we would like to get some insights from our predictive model. One way is
to use the :class:`skore.EstimatorReport` class, which we will construct using
the `evaluate` function. This function will detect that our estimator is
unfitted, fit it for us on the training data, and return an
:class:`~skore.EstimatorReport` object. Specifying a `splitter` of 0.2 will
perform an 80/20 train-test split.

.. GENERATED FROM PYTHON SOURCE LINES 72-78

.. code-block:: Python


    from skore import evaluate

    report = evaluate(estimator, X=df, y=y, pos_label=pos_label, splitter=0.2)
    report

.. raw:: html
Pipeline(steps=[('tablevectorizer',
                     TableVectorizer(low_cardinality=ToCategorical())),
                    ('histgradientboostingclassifier',
                     HistGradientBoostingClassifier())])

1 issue(s), 1 tip(s), 3 passed, 6 not applicable, 0 ignored.


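As a rough illustration of what an 80/20 train-test split means, here is a minimal standard-library sketch. The helper ``train_test_indices`` is hypothetical and is not how skore implements ``splitter``; the real splitter may shuffle or stratify differently.

```python
import random


def train_test_indices(n_samples, test_size=0.2, seed=0):
    """Shuffle sample indices and hold out a `test_size` fraction for testing.

    Hypothetical helper for illustration only; not part of skore.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_size)
    # The first `n_test` shuffled indices form the test set, the rest the
    # training set, so the two sets never overlap.
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_indices(n_samples=100, test_size=0.2)
print(len(train_idx), len(test_idx))
```

The model is fitted on the training indices only, and the held-out indices are reserved for computing metrics.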
.. GENERATED FROM PYTHON SOURCE LINES 79-82

Once the report is created, calling the :meth:`~skore.EstimatorReport.help`
method lists the tools available for getting insights from our specific model
on our specific task.

.. GENERATED FROM PYTHON SOURCE LINES 83-85

.. code-block:: Python


    report.help()

.. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 86-87

Be aware that we can access the help for each individual sub-accessor. For instance:

.. GENERATED FROM PYTHON SOURCE LINES 88-90

.. code-block:: Python


    report.metrics.help()

.. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 91-99

Metrics computation with aggressive caching
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At this point, we might be interested in taking a first look at the statistical
performance of our model on the held-out test set. We can access it by calling
any of the metrics displayed above. Since we are greedy, we want to get several
metrics at once, so we will use the
:meth:`~skore.EstimatorReport.metrics.summarize` method.

.. GENERATED FROM PYTHON SOURCE LINES 100-107

.. code-block:: Python


    import time

    start = time.time()
    metric_report = report.metrics.summarize().frame()
    end = time.time()
    metric_report

.. raw:: html
HistGradientBoostingClassifier
Metric
Score 0.951196
Accuracy 0.951196
Precision 0.728595
Recall 0.450549
ROC AUC 0.936815
Log loss 0.131858
Brier score 0.036981
Fit time (s) 5.248361
Predict time (s) 0.611445


.. GENERATED FROM PYTHON SOURCE LINES 108-110

.. code-block:: Python


    print(f"Time taken to compute the metrics: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Time taken to compute the metrics: 0.00 seconds

.. GENERATED FROM PYTHON SOURCE LINES 111-118

An interesting feature provided by the :class:`skore.EstimatorReport` is the
caching mechanism. Indeed, when we have a large enough dataset, computing the
predictions for a model is not cheap anymore. For instance, on our smallish
dataset, it took a couple of seconds to compute the metrics. The report caches
the predictions, so computing the same metric again, or an alternative metric
that requires the same predictions, will be faster. Let's check by requesting
the same metrics report again.

.. GENERATED FROM PYTHON SOURCE LINES 119-125

.. code-block:: Python


    start = time.time()
    metric_report = report.metrics.summarize().frame()
    end = time.time()
    metric_report

.. raw:: html
HistGradientBoostingClassifier
Metric
Score 0.951196
Accuracy 0.951196
Precision 0.728595
Recall 0.450549
ROC AUC 0.936815
Log loss 0.131858
Brier score 0.036981
Fit time (s) 5.248361
Predict time (s) 0.611445


.. GENERATED FROM PYTHON SOURCE LINES 126-128

.. code-block:: Python


    print(f"Time taken to compute the metrics: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Time taken to compute the metrics: 0.00 seconds

.. GENERATED FROM PYTHON SOURCE LINES 129-131

Note that when the model is fitted or the predictions are computed, we
additionally store the time the operation took:

.. GENERATED FROM PYTHON SOURCE LINES 132-134

.. code-block:: Python


    report.metrics.timings()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    {'fit_time': 5.248360689999998, 'predict_time_train': 2.436770182999993, 'predict_time_test': 0.6114447749999954}

.. GENERATED FROM PYTHON SOURCE LINES 135-137

Since ``metric_report`` is a pandas dataframe, we can also use the plotting
interface of pandas.

.. GENERATED FROM PYTHON SOURCE LINES 138-141

.. code-block:: Python


    ax = metric_report.plot.barh()
    _ = ax.set_title("Metrics report")

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_001.png
   :alt: Metrics report
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 142-144

Whenever we compute a metric, we check whether the predictions are available in
the cache and reuse them if so. For instance, let's compute the log loss.

.. GENERATED FROM PYTHON SOURCE LINES 145-151

.. code-block:: Python


    start = time.time()
    log_loss = report.metrics.log_loss()
    end = time.time()
    log_loss

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0.1318581181377569

.. GENERATED FROM PYTHON SOURCE LINES 152-154

.. code-block:: Python


    print(f"Time taken to compute the log loss: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Time taken to compute the log loss: 0.00 seconds

.. GENERATED FROM PYTHON SOURCE LINES 155-157

We can show that, without the cache, it would have taken more time to compute
the log loss.

.. GENERATED FROM PYTHON SOURCE LINES 158-165

.. code-block:: Python


    report.clear_cache()

    start = time.time()
    log_loss = report.metrics.log_loss()
    end = time.time()
    log_loss

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0.1318581181377569

.. GENERATED FROM PYTHON SOURCE LINES 166-168

.. code-block:: Python


    print(f"Time taken to compute the log loss: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Time taken to compute the log loss: 1.23 seconds

.. GENERATED FROM PYTHON SOURCE LINES 169-172

By default, the metrics are computed on the test set only. However, if a
training set is provided, we can also compute the metrics on it by specifying
the `data_source` parameter.

.. GENERATED FROM PYTHON SOURCE LINES 173-175

.. code-block:: Python


    report.metrics.log_loss(data_source="train")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0.09950479151151193

.. GENERATED FROM PYTHON SOURCE LINES 176-180

Be aware that we can also benefit from the caching mechanism with our own
custom metrics. Skore only expects our metric function to take `y_true` and
`y_pred` as its first two positional arguments. It can take any other
arguments. Let's see an example.

.. GENERATED FROM PYTHON SOURCE LINES 181-195

.. code-block:: Python


    def operational_decision_cost(y_true, y_pred, amount):
        # Boolean masks for the four cells of the confusion matrix.
        mask_true_positive = (y_true == pos_label) & (y_pred == pos_label)
        mask_true_negative = (y_true == neg_label) & (y_pred == neg_label)
        mask_false_positive = (y_true == neg_label) & (y_pred == pos_label)
        mask_false_negative = (y_true == pos_label) & (y_pred == neg_label)
        # Gain or loss attached to each cell.
        fraudulent_refuse = mask_true_positive.sum() * 50
        fraudulent_accept = -amount[mask_false_negative].sum()
        legitimate_refuse = mask_false_positive.sum() * -5
        legitimate_accept = (amount[mask_true_negative] * 0.02).sum()
        return fraudulent_refuse + fraudulent_accept + legitimate_refuse + legitimate_accept

.. GENERATED FROM PYTHON SOURCE LINES 196-200

In our use case, we have an operational decision to make that translates the
classification outcome into a cost. It translates the confusion matrix into a
cost matrix based on an amount linked to each sample in the dataset. Here, we
randomly generate some amounts as an illustration.

.. GENERATED FROM PYTHON SOURCE LINES 201-212

.. code-block:: Python


    import numpy as np
    from sklearn.metrics import make_scorer

    rng = np.random.default_rng(42)
    amount = rng.integers(low=100, high=1000, size=len(report.y_test))

    report.metrics.add(metric=make_scorer(operational_decision_cost, amount=amount))
    cost = report.metrics.summarize(metric="operational_decision_cost")
    cost.frame()

.. raw:: html
HistGradientBoostingClassifier
Metric
Operational Decision Cost -134634.96


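To see how each cell of the confusion matrix contributes to the cost, here is a plain-Python sketch of the same cost rules applied to four hand-written samples, one per cell. The sample labels and amounts are made up for illustration, and the function is a loop-based rewrite of the metric above rather than the code that skore runs.

```python
pos_label, neg_label = "allowed", "disallowed"


def operational_decision_cost(y_true, y_pred, amount):
    """Loop-based version of the cost rules used in this example."""
    total = 0.0
    for truth, pred, value in zip(y_true, y_pred, amount):
        if truth == pos_label and pred == pos_label:
            total += 50  # true positive: flat gain of 50
        elif truth == pos_label and pred == neg_label:
            total -= value  # false negative: the full amount is lost
        elif truth == neg_label and pred == pos_label:
            total -= 5  # false positive: flat penalty of 5
        else:
            total += 0.02 * value  # true negative: 2% of the amount is earned
    return total


# Made-up samples covering the four cells: TP, FN, FP, TN.
y_true = ["allowed", "allowed", "disallowed", "disallowed"]
y_pred = ["allowed", "disallowed", "allowed", "disallowed"]
amount = [100, 200, 300, 400]

cost = operational_decision_cost(y_true, y_pred, amount)
print(f"{cost:.2f}")  # 50 - 200 - 5 + 8 = -147.00
```

Because false negatives cost the full amount, a model with low recall on the positive label can end up with a negative total even when its accuracy is high.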
.. GENERATED FROM PYTHON SOURCE LINES 213-216

By the way, skore caches the model predictions. This is really handy because it
means that we can compute additional metrics without having to recompute the
predictions.

.. GENERATED FROM PYTHON SOURCE LINES 217-221

.. code-block:: Python


    report.metrics.summarize(
        metric=["precision", "recall", "operational_decision_cost"]
    ).frame()

.. raw:: html
HistGradientBoostingClassifier
Metric
Precision 0.728595
Recall 0.450549
Operational Decision Cost -134634.960000


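The idea of reusing cached predictions across metrics can be sketched in a few lines. The ``PredictionCache`` class below is hypothetical and much simpler than whatever skore does internally; it only illustrates that the expensive ``predict`` call runs once per data source until the cache is cleared.

```python
class PredictionCache:
    """Hypothetical prediction cache keyed by data source (not skore's code)."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self._cache = {}
        self.n_predict_calls = 0

    def predictions(self, data_source, X):
        # Only call the (potentially slow) model when nothing is cached yet.
        if data_source not in self._cache:
            self.n_predict_calls += 1
            self._cache[data_source] = self._predict_fn(X)
        return self._cache[data_source]

    def clear(self):
        self._cache.clear()


def precision(y_true, y_pred, pos_label):
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == pos_label]
    return sum(t == pos_label for t in predicted_pos) / len(predicted_pos)


def recall(y_true, y_pred, pos_label):
    actual_pos = [p for t, p in zip(y_true, y_pred) if t == pos_label]
    return sum(p == pos_label for p in actual_pos) / len(actual_pos)


# Toy "model": thresholds a single made-up score at 0.5.
cache = PredictionCache(
    lambda X: ["allowed" if x > 0.5 else "disallowed" for x in X]
)
X_test = [0.9, 0.2, 0.7, 0.1]
y_test = ["allowed", "disallowed", "disallowed", "allowed"]

# Two different metrics share one set of cached predictions.
p = precision(y_test, cache.predictions("test", X_test), "allowed")
r = recall(y_test, cache.predictions("test", X_test), "allowed")
print(p, r, cache.n_predict_calls)

# After clearing the cache, the next metric triggers a new prediction pass.
cache.clear()
cache.predictions("test", X_test)
print(cache.n_predict_calls)
```

A real cache would also need to key on anything else that changes the predictions, such as the response method or the positive label.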
.. GENERATED FROM PYTHON SOURCE LINES 222-229

Effortless one-liner plotting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :class:`skore.EstimatorReport` class also implements a number of the most
common data science plots. As for the metrics, we only provide the meaningful
set of plots for the provided estimator.

.. GENERATED FROM PYTHON SOURCE LINES 230-232

.. code-block:: Python


    report.metrics.help()

.. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 233-234

Let's start by plotting the ROC curve for our binary classification task.

.. GENERATED FROM PYTHON SOURCE LINES 235-238

.. code-block:: Python


    display = report.metrics.roc()
    display.plot()

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_002.png
   :alt: ROC Curve for HistGradientBoostingClassifier Positive label: allowed Data source: Test set
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none
.. GENERATED FROM PYTHON SOURCE LINES 239-243

The plot functionality is built upon the scikit-learn display objects. We return
those displays (slightly modified to improve the UI) in case we want to tweak
some of the plot properties. We can have a quick look at the available
attributes and methods by calling the ``help`` method or simply by printing the
display.

.. GENERATED FROM PYTHON SOURCE LINES 244-246

.. code-block:: Python


    display.help()

.. raw:: html


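For readers who want to see what a ROC curve is made of, the following standard-library sketch sweeps the decision threshold over a handful of made-up scores and collects the resulting (false positive rate, true positive rate) points. It is a textbook construction under the assumption that both classes are present, not the code behind ``report.metrics.roc()``.

```python
def roc_points(y_true, scores, pos_label):
    """Return (fpr, tpr) pairs obtained by lowering the decision threshold.

    Assumes `y_true` contains at least one positive and one negative sample.
    """
    n_pos = sum(1 for label in y_true if label == pos_label)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == pos_label:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels and scores for the positive class.
y_true = ["disallowed", "disallowed", "allowed", "allowed"]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores, pos_label="allowed")
auc = area_under_curve(points)
print(points)
print(auc)
```

Each point corresponds to one threshold, which is why a single set of predicted scores is enough to draw the whole curve.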
.. GENERATED FROM PYTHON SOURCE LINES 247-251

.. code-block:: Python


    fig = display.plot()
    fig.axes[0].set_title("Example of a ROC curve")
    fig

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_003.png
   :alt: ROC Curve for HistGradientBoostingClassifier Positive label: allowed Data source: Test set, Example of a ROC curve
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_003.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none
.. GENERATED FROM PYTHON SOURCE LINES 252-256

Similarly to the metrics, we aggressively use caching to avoid recomputing the
predictions of the model. We also cache the plot display object by detecting
whether the input parameters are the same as in the previous call. Let's
demonstrate the kind of performance gain we can get.

.. GENERATED FROM PYTHON SOURCE LINES 257-264

.. code-block:: Python


    start = time.time()
    # we already trigger the computation of the predictions in a previous call
    display = report.metrics.roc()
    fig = display.plot()
    end = time.time()
    fig

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_004.png
   :alt: ROC Curve for HistGradientBoostingClassifier Positive label: allowed Data source: Test set
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_004.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none
.. GENERATED FROM PYTHON SOURCE LINES 265-267

.. code-block:: Python


    print(f"Time taken to compute the ROC curve: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Time taken to compute the ROC curve: 0.04 seconds

.. GENERATED FROM PYTHON SOURCE LINES 268-269

Now, let's clear the cache and check whether we get a slowdown.

.. GENERATED FROM PYTHON SOURCE LINES 270-272

.. code-block:: Python


    report.clear_cache()

.. GENERATED FROM PYTHON SOURCE LINES 273-279

.. code-block:: Python


    start = time.time()
    display = report.metrics.roc()
    fig = display.plot()
    end = time.time()
    fig

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_005.png
   :alt: ROC Curve for HistGradientBoostingClassifier Positive label: allowed Data source: Test set
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_005.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none
.. GENERATED FROM PYTHON SOURCE LINES 280-282

.. code-block:: Python


    print(f"Time taken to compute the ROC curve: {end - start:.2f} seconds")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Time taken to compute the ROC curve: 1.28 seconds

.. GENERATED FROM PYTHON SOURCE LINES 283-284

As expected, since we need to recompute the predictions, it takes more time.

.. GENERATED FROM PYTHON SOURCE LINES 286-291

Visualizing the confusion matrix
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Another useful visualization for classification tasks is the confusion matrix,
which shows the counts of correct and incorrect predictions for each class.

.. GENERATED FROM PYTHON SOURCE LINES 293-294

Let's first start with a basic confusion matrix:

.. GENERATED FROM PYTHON SOURCE LINES 294-297

.. code-block:: Python


    cm_display = report.metrics.confusion_matrix()
    cm_display.plot()

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_006.png
   :alt: Confusion Matrix Data source: Test set
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_006.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none
.. GENERATED FROM PYTHON SOURCE LINES 298-302

In binary classification, a confusion matrix depends on the decision threshold
used to convert predicted probabilities into class labels. By default, skore
uses a threshold of 0.5, but confusion matrices are actually computed at every
threshold internally.

.. GENERATED FROM PYTHON SOURCE LINES 302-308

.. code-block:: Python


    # To visualize the confusion matrix at a different threshold, use the
    # ``threshold_value`` parameter. For example, a threshold of 0.3 will classify
    # more samples as positive:
    cm_display.plot(threshold_value=0.3)

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_007.png
   :alt: Confusion Matrix Decision threshold: 0.30 Positive label: allowed Data source: Test set
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_007.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none
.. GENERATED FROM PYTHON SOURCE LINES 309-311

We can normalize the confusion matrix to get percentages instead of raw counts.
Here we normalize by true labels (rows):

.. GENERATED FROM PYTHON SOURCE LINES 311-313

.. code-block:: Python


    cm_display.plot(normalize="true")

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_008.png
   :alt: Confusion Matrix Data source: Test set
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_008.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none
.. GENERATED FROM PYTHON SOURCE LINES 314-316

More plotting options are available via ``heatmap_kwargs``, which are passed to
seaborn's heatmap. For example, we can customize the colormap and number format:

.. GENERATED FROM PYTHON SOURCE LINES 316-319

.. code-block:: Python


    cm_display.set_style(heatmap_kwargs={"cmap": "Greens", "fmt": ".2e"})
    cm_display.plot()

.. image-sg:: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_009.png
   :alt: Confusion Matrix Data source: Test set
   :srcset: /auto_examples/model_evaluation/images/sphx_glr_plot_estimator_report_009.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none
.. GENERATED FROM PYTHON SOURCE LINES 320-322

Finally, the confusion matrix can also be exported as a pandas DataFrame for
further analysis:

.. GENERATED FROM PYTHON SOURCE LINES 322-325

.. code-block:: Python


    cm_display.frame()

.. raw:: html
true_label predicted_label value
0 allowed allowed 451
1 allowed disallowed 550
2 disallowed allowed 168
3 disallowed disallowed 13543

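The counts in the frame above are enough to recover several of the metrics reported earlier in this example. The short sketch below does the arithmetic by hand, taking "allowed" as the positive label; it uses only the four counts shown and no skore calls.

```python
# Counts copied from the confusion matrix frame above ("allowed" is positive).
tp = 451  # true "allowed", predicted "allowed"
fn = 550  # true "allowed", predicted "disallowed"
fp = 168  # true "disallowed", predicted "allowed"
tn = 13543  # true "disallowed", predicted "disallowed"

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + fn + fp + tn)

# Normalizing by true labels divides each row by its total, so the first
# row-normalized entry for "allowed" is exactly the recall.
row_allowed = [tp / (tp + fn), fn / (tp + fn)]
row_disallowed = [fp / (fp + tn), tn / (fp + tn)]

print(f"precision: {precision:.6f}")
print(f"recall:    {recall:.6f}")
print(f"accuracy:  {accuracy:.6f}")
```

These values agree with the precision, recall, and accuracy rows of the metrics summary, which shows that the default summary is computed at the same decision threshold as this confusion matrix.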

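The effect of the decision threshold on the confusion matrix, discussed earlier in this section, can also be reproduced on a toy example. The probabilities below are made up, and the convention that a sample is positive when its probability is greater than or equal to the threshold is an assumption of this sketch rather than a statement about skore.

```python
def confusion_counts(y_true, proba_pos, threshold, pos_label):
    """Count TP, FN, FP, TN when predicting positive if proba >= threshold."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for truth, proba in zip(y_true, proba_pos):
        predicted_positive = proba >= threshold
        if truth == pos_label:
            counts["tp" if predicted_positive else "fn"] += 1
        else:
            counts["fp" if predicted_positive else "tn"] += 1
    return counts


# Made-up labels and predicted probabilities of the "allowed" class.
y_true = ["allowed", "allowed", "disallowed", "disallowed", "allowed"]
proba_allowed = [0.9, 0.4, 0.35, 0.1, 0.2]

at_default = confusion_counts(y_true, proba_allowed, 0.5, "allowed")
at_lower = confusion_counts(y_true, proba_allowed, 0.3, "allowed")
print(at_default)
print(at_lower)
```

Lowering the threshold moves samples from the predicted-negative column to the predicted-positive one, which here trades one false negative for one false positive.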
.. GENERATED FROM PYTHON SOURCE LINES 326-330

.. seealso::

  To learn how to use the :class:`~skore.EstimatorReport` to inspect your
  models, see :ref:`example_feature_importance`.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 32.020 seconds)

.. _sphx_glr_download_auto_examples_model_evaluation_plot_estimator_report.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_estimator_report.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_estimator_report.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_estimator_report.zip `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_