Releases: ModelOriented/shapviz

CRAN release 0.10.1

23 Jun 18:13
7c2ad72

Maintenance

  • Bump ggplot2 and patchwork dependencies.

CRAN release 0.10.0

22 Jun 15:09
a5b5298

Major improvements

  • sv_dependence(): The new arguments ylim and share_y = FALSE allow controlling the y-axis limits.
    They help to assess the importance in multiple dependence plots (#172).
    Later, we might change the default to share_y = TRUE (as in Python's SHAP dependence plots) (#171).
  • sv_interaction() has received a new visualization: kind = "bar" now shows mean absolute SHAP interactions/main effects as a barplot.
    Its appearance can be modified by the arguments fill and bar_width (#169).
  • We are now (cautiously) collecting axes, axis titles, and color guides via {patchwork} (#171).
    Currently fails for sv_force().
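
A minimal sketch of the new y-axis arguments of sv_dependence(). The matrix S is made-up toy data standing in for real SHAP values, and the object names are only illustrative. It assumes {shapviz} 0.10.0 or later.

```r
library(shapviz)

# Toy data: S is a made-up matrix standing in for real SHAP values
set.seed(1)
X <- data.frame(a = runif(100), b = runif(100))
S <- cbind(a = 2 * X$a - 1, b = 0.2 * (X$b - 0.5))
sv <- shapviz(S, X = X, baseline = 0)

# Default (share_y = FALSE): each panel picks its own y-axis range
p_free <- sv_dependence(sv, v = c("a", "b"), color_var = NULL)

# Shared y-axis: the weak effect of "b" looks flat next to "a"
p_shared <- sv_dependence(sv, v = c("a", "b"), color_var = NULL, share_y = TRUE)

# Explicit y-axis limits for a single plot
p_fixed <- sv_dependence(sv, v = "a", color_var = NULL, ylim = c(-2, 2))
```

With a shared y-axis, the vertical spread of each panel can be compared directly, which is what makes the plots useful for judging relative importance.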

Minor user-visible changes

  • The color bars in sv_dependence2D() are less wide (#169).
  • Jittering now uses a seed (#174).

Minor API changes

  • In sv_dependence(), passing the same variable for v and color_var does not suppress the color axis anymore,
    except when interactions = TRUE (#171).
  • sv_dependence() and sv_dependence2D() have received a seed = 1 argument used for jittering.
    This does not modify the global seed (#174).
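
To illustrate what "does not modify the global seed" means, here is a base R sketch of one common way to draw seeded jitter while leaving the global random number stream untouched. The helper jitter_with_seed() is hypothetical and not how {shapviz} or {ggplot2} necessarily implement it.

```r
# Hypothetical helper (not part of {shapviz}): seeded jitter that restores
# the global RNG state afterwards
jitter_with_seed <- function(x, amount = 0.1, seed = 1) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (has_seed) {
    # Remember the current global RNG state and put it back on exit
    old <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
    on.exit(assign(".Random.seed", old, envir = globalenv()), add = TRUE)
  } else {
    # No RNG state existed before: remove the one created by set.seed()
    on.exit(rm(".Random.seed", envir = globalenv()), add = TRUE)
  }
  set.seed(seed)
  x + runif(length(x), min = -amount, max = amount)
}

# Same seed, same jitter
j1 <- jitter_with_seed(1:5)
j2 <- jitter_with_seed(1:5)

# The global stream continues as if no jittering had happened
set.seed(42)
a <- runif(1)
set.seed(42)
invisible(jitter_with_seed(1:5))
b <- runif(1)
```

Here j1 equals j2, and a equals b, so plots are reproducible without interfering with any seed set by the user.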

Maintenance

  • Update code coverage version (#168).
  • More unit tests (#173).
  • Bump minimal version of {ggplot2} (#174).

CRAN release 0.9.7

19 Jan 19:34
052c9f8

Documentation

  • H2O now supports passing background data for model-agnostic SHAP. This is now more visible in {shapviz}, see h2oai/h2o-3#16463.
  • H2O random forests (regression and binary classification) now support TreeSHAP as well #163.

Compatibility

  • Adapt for upcoming {xgboost} version.
  • Adapt for upcoming {shapr} version, thanks @martinju for the fix #162.

CRAN release 0.9.6

11 Oct 11:16
3a91c0b

Documentation

  • Fixed a wrong link in a vignette #158.

CRAN release 0.9.5

14 Sep 21:27
96da952

User-visible changes

  • sv_waterfall() and sv_force(): The x label has been changed from "SHAP value" to "Prediction".

Documentation

  • Add vignette for Tidymodels.
  • Update vignettes.
  • Update README.

CRAN release 0.9.4

22 Aug 07:15
a33a26e

API improvements

  • Support both XGBoost 1.x.x as well as XGBoost 2.x.x, implemented in #144.

Other improvements

  • New argument sort_features = TRUE in sv_importance() and sv_interaction(). Set to FALSE to show the features as they appear in your SHAP matrix. In that case, the plots will show the first max_display features, not the most important features. Implements #137.
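
A short sketch of sort_features, using a made-up matrix S in place of real SHAP values (the toy data and object names are assumptions for illustration only):

```r
library(shapviz)

# Toy data: S is a made-up matrix standing in for real SHAP values
set.seed(1)
X <- data.frame(a = runif(100), b = runif(100), c = runif(100))
S <- cbind(a = 0.1 * X$a, b = 2 * X$b, c = X$c)
sv <- shapviz(S, X = X, baseline = 0)

# Default: features sorted by mean absolute SHAP value (b, c, a)
p_sorted <- sv_importance(sv)

# Column order of the SHAP matrix; max_display = 2 keeps the first
# two columns (a, b), not the two most important features
p_as_is <- sv_importance(sv, sort_features = FALSE, max_display = 2)
```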

Bug fixes

  • shapviz.xgboost() would fail if a single row is passed. This has been fixed in #142. Thanks @sebsilas for reporting.

CRAN release 0.9.3

12 Jan 12:41
8e4c726

sv_dependence(): Control over automatic color feature selection

How is the color feature selected, anyway?

If no SHAP interaction values are available, by default, the color feature v' is selected by the heuristic potential_interaction(), which works as follows:

  1. If the feature v (the one on the x-axis) is numeric, it is binned into nbins bins.
  2. Per bin, the SHAP values of v are regressed onto v' and the R-squared is calculated. Rows with missing v' are discarded.
  3. The R-squared values are averaged over bins, weighted by the number of non-missing v' values.

This measures how much variability in the SHAP values of v is explained by v', after accounting for v.
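
The three steps can be sketched in base R as follows. This is a simplified illustration for numeric v and v' only, not the package's actual implementation. The function interaction_strength() and the simulated data are hypothetical.

```r
# Hypothetical helper, not part of {shapviz}:
# s = SHAP values of v, v = feature on the x-axis (no missing values),
# v_prime = numeric candidate color feature (may contain NA)
interaction_strength <- function(s, v, v_prime, nbins = NULL) {
  n <- length(v)
  if (is.null(nbins)) nbins <- ceiling(min(n / 20, sqrt(n)))

  # Step 1: bin v into quantile bins
  breaks <- unique(quantile(v, probs = seq(0, 1, length.out = nbins + 1)))
  bin <- cut(v, breaks = breaks, include.lowest = TRUE)

  # Step 2: per bin, R-squared of regressing s on v_prime.
  # For a simple linear regression this equals the squared correlation.
  r2_per_bin <- function(idx) {
    idx <- idx[!is.na(v_prime[idx])]  # discard rows with missing v_prime
    m <- length(idx)
    if (m < 3 || var(s[idx]) == 0 || var(v_prime[idx]) == 0) {
      return(c(r2 = NA_real_, n = m))
    }
    c(r2 = cor(s[idx], v_prime[idx])^2, n = m)
  }
  res <- vapply(split(seq_len(n), bin), r2_per_bin, FUN.VALUE = c(r2 = 0, n = 0))

  # Step 3: average over bins, weighted by the number of non-missing values
  ok <- !is.na(res["r2", ])
  weighted.mean(res["r2", ok], w = res["n", ok])
}

# Simulated example: the effect of v depends on z1 but not on z2
set.seed(1)
n <- 500
v <- runif(n)
z1 <- runif(n)
z2 <- runif(n)
s <- v + 2 * v * z1 + rnorm(n, sd = 0.05)

strength_z1 <- interaction_strength(s, v, z1)
strength_z2 <- interaction_strength(s, v, z2)
```

In this simulation, z1 should receive a clearly higher value than z2, so it would be the preferred color feature.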

We have introduced four parameters to control the heuristic. Their defaults are in line with the old behaviour.

  • nbins = NULL: Into how many quantile bins should a numeric v be binned? The default NULL equals the smaller of $n/20$ and $\sqrt n$ (rounded up), where $n$ is the sample size.
  • color_num = TRUE: Should color features be converted to numeric, even if they are factors/characters?
  • scale = FALSE: Should R-squared be multiplied by the sample variance of within-bin SHAP values? If TRUE, bins with stronger vertical scatter will get higher weight.
  • adjusted = FALSE: Should adjusted R-squared be calculated?

If SHAP interaction values are available, these parameters have no effect. In sv_dependence() they are called ih_nbins etc.

This partly implements the ideas in #119 of Roel Verbelen, thanks a lot for your patient explanations!

Further plans?

We will continue to experiment with the defaults, which might change in the future. A good alternative to the current (naive) defaults could be:

  • nbins = 7: Smaller than now to not overfit too strongly with factor/character color features.
  • color_num = FALSE: To not naively integer encode factors/characters.
  • scale = TRUE: To account for non-equal spread in bins.
  • adjusted = TRUE: To not put too much weight on factors with many categories.
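
The alternative settings can already be tried today. The sketch below uses a made-up matrix S in place of real SHAP values. The toy data and object names are assumptions for illustration.

```r
library(shapviz)

# Toy data: S is a made-up matrix standing in for real SHAP values.
# The "SHAP values" of a depend on b, but not on c.
set.seed(1)
X <- data.frame(a = runif(200), b = runif(200), c = runif(200))
S <- cbind(a = X$a + 2 * X$a * X$b, b = X$b, c = 0.1 * X$c)
sv <- shapviz(S, X = X, baseline = 0)

# Call the heuristic directly with the alternative settings
strength <- potential_interaction(
  sv, v = "a", nbins = 7, color_num = FALSE, scale = TRUE, adjusted = TRUE
)

# In sv_dependence(), the same settings carry the prefix "ih_"
p <- sv_dependence(
  sv, v = "a", color_var = "auto",
  ih_nbins = 7, ih_color_num = FALSE, ih_scale = TRUE, ih_adjusted = TRUE
)
```

The vector strength is named by candidate feature, so the strongest candidate (here expected to be b) can be read off directly.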

Other user-visible changes

  • sv_dependence(): If color_var = "auto" (default) and no color feature seems to be relevant (SHAP interaction is NULL, or heuristic returns no positive value), there won't be any color scale. Furthermore, in some edge cases, a different color feature might be selected.
  • mshapviz() objects can now be row-bound via rbind() or +. Implemented by @jmaspons in #110.
  • mshapviz() is more strict when combining multiple "shapviz" objects. These now need to have identical column names, see #114.
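
A sketch of row-binding two "mshapviz" objects, for instance when SHAP values were computed in batches. The matrices are made-up toy data, and the names m1, m2, batch1, and batch2 are only illustrative.

```r
library(shapviz)

# Toy data: two made-up matrices standing in for the SHAP values of two models
set.seed(1)
X <- data.frame(a = runif(100), b = runif(100))
sv_m1 <- shapviz(cbind(a = X$a, b = -X$b), X = X, baseline = 0)
sv_m2 <- shapviz(cbind(a = 2 * X$a, b = X$b), X = X, baseline = 0)

# Each batch holds both models on a subset of rows.
# All "shapviz" objects share identical column names, as required.
batch1 <- mshapviz(list(m1 = sv_m1[1:50, ], m2 = sv_m2[1:50, ]))
batch2 <- mshapviz(list(m1 = sv_m1[51:100, ], m2 = sv_m2[51:100, ]))

# Row-bind the batches per model
combined <- batch1 + batch2
combined2 <- rbind(batch1, batch2)
```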

Small changes

  • The README is shorter and easier.
  • Updated vignettes.
  • print.shapviz() now shows top two rows of SHAP matrix.
  • Re-activate all unit tests.
  • Setting nthread = 1 in all calls to xgb.DMatrix() as suggested by @jmaspons in #109.
  • Added "How to contribute" to README.
  • permshap() connector is now part of {kernelshap} #122.

Bug fixes

  • sv_dependence2D(): In case add_vars are passed, x and/or y are removed from it in order to not use any variable twice. #116.
  • split.shapviz() now drops empty levels. Previously, they raised an error because empty "shapviz" objects are currently not supported. #117, #118

CRAN release 0.9.2

14 Oct 17:29
8980218

User-visible changes

  • sv_importance() of a "mshapviz" object now returns a dodged barplot instead of separate barplots via {patchwork}. Use the new argument bar_type to switch to a stacked barplot (bar_type = "stack"), to "facets" (via {ggplot2}), or "separate" for the old behaviour.

New features

  • Added connector to permshap, a package calculating permutation SHAP values for regression and (probabilistic) classification.

Other changes

  • Revised vignette on "mshapviz".
  • Commenting out most unit tests as they would not pass timings measured on Debian.

CRAN release 0.9.1

18 Jul 19:25
fd1a01f

New features

  • dimnames.shapviz() has received a replacement method. You can thus change the column names of the SHAP matrix and the feature data (as well as of SHAP interactions) via colnames(x) <- ..., see #98
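
For example, with a toy "shapviz" object built from a made-up matrix (an assumption for illustration, not real SHAP values):

```r
library(shapviz)

# Toy data: S is a made-up matrix standing in for real SHAP values
set.seed(1)
X <- data.frame(a = runif(20), b = runif(20))
S <- cbind(a = X$a, b = -X$b)
sv <- shapviz(S, X = X, baseline = 0)

# Rename the columns of the SHAP matrix and the feature data in one step
colnames(sv) <- c("Age", "Balance")

new_shap_names <- colnames(get_shap_values(sv))
new_feature_names <- colnames(get_feature_values(sv))
```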

Maintenance

  • Fix for #100 (package_version() applied to numeric value will be deprecated in the future)

CRAN release 0.9.0

09 Jun 15:16
d88af20

New features

  • New plot function sv_dependence2D(): x and y coordinates are two features, while their summed SHAP values are shown on the color scale. If interactions = TRUE, SHAP interaction values are shown on the color scale instead. The function is vectorized in x and/or y. This visualization is especially useful for models with geographic components.
  • split(x, f) splits a "shapviz" object x into a "mshapviz" object.
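
A brief sketch of both features on toy data. The matrix S is made up and only stands in for real SHAP values. The column names lon and lat and the grouping vector region are assumptions for illustration.

```r
library(shapviz)

# Toy data: S is a made-up matrix standing in for real SHAP values
set.seed(1)
X <- data.frame(lon = runif(200), lat = runif(200))
S <- cbind(lon = X$lon - 0.5, lat = 0.5 * (X$lat - 0.5))
sv <- shapviz(S, X = X, baseline = 0)

# Two features on the axes, the sum of their SHAP values on the color scale
p <- sv_dependence2D(sv, x = "lon", y = "lat")

# Split into a "mshapviz" object with one "shapviz" object per group
region <- factor(ifelse(X$lat > 0.5, "north", "south"))
parts <- split(sv, f = region)
```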

Documentation

  • Slight improvements in help/documentation.
  • New vignette on models with geographic components.
  • Added a fantastic house price dataset with about 14,000 houses sold in Miami-Dade County, thanks Steven C. Bourassa.

API improvements

  • "mshapviz" object created from multioutput "kernelshap" object retains names.