Suggested enhancement: Allow for binning within stat_prop #98

Open
kieran-mace opened this issue May 22, 2025 · 1 comment
@kieran-mace

I'd love to be able to use stat_prop to normalize proportions, but within bins computed the way stat_bin computes them.

The example below shows what I'd like to achieve and why. I believe ggstats::stat_prop is very close to what's needed, but I wonder if it's possible to combine it with what ggplot2::stat_bin does, specifically reducing a continuous variable to bins of a given binwidth.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(ggplot2) 
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# set up data
laker_player_plays = lakers |> 
  tibble::as_tibble() |> 
  filter(team == 'LAL', stringr::str_length(player) > 0) |> 
  mutate(date = ymd(date))

# calculate breaks, for solutions that can't use stat_bin
breaks = seq(min(laker_player_plays$date), max(laker_player_plays$date), by = 31)

# Desired output, achievable only by pre-processing the data before plotting:
laker_player_plays |> 
  mutate(date_group = cut(date, breaks = breaks)) |>
  group_by(player, date_group) |> 
  count(name = 'plays') |> 
  group_by(date_group) |> 
  mutate(proportion_of_plays = plays/sum(plays)) |> 
  ggplot(aes(x = date_group, 
             y = proportion_of_plays,
             color = player,
             group = player)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(labels=scales::percent)

Desired Output

# closest you can get from pure ggplot2: abandoning the lines, use geom_histogram + position = 'fill'
# advantage: binwidth processed during stat
ggplot(laker_player_plays) +
  geom_histogram(aes(x = date, fill = player), position = 'fill', binwidth = 31)

Best possible with pure ggplot2

# Using ggstats::stat_prop to normalize the proportions, with the x axis binned beforehand
# advantage: counts normalized during stat
# disadvantage: binning must be done before plotting
laker_player_plays |> 
  mutate(date_group = cut(date, breaks = breaks)) |>
  ggplot() +
  ggstats::stat_prop(aes(x = date_group, 
                         by = date_group, 
                         group = player,
                         color = player, 
                         y = after_stat(prop)),
                     position = 'identity',
                     geom = 'line') +
  scale_y_continuous(labels=scales::percent)

Great ability using ggstats, but lacking the ability to do binning

# Desired capability:
# laker_player_plays |>
#   ggplot() +
#   ggstats::stat_prop_by_bin(aes(x = date,
#                                 group = player,
#                                 color = player,
#                                 y = after_stat(prop)),
#                             binwidth = 31,
#                             position = 'identity',
#                             geom = 'line') +
#   scale_y_continuous(labels=scales::percent)

Created on 2025-05-22 with reprex v2.1.1
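
For reference, a minimal sketch of one possible workaround in plain ggplot2, not a ggstats feature. It assumes ggplot2 >= 3.3.0 (for after_stat()) and a single panel, and it relies on stat_bin using the same bin centres for every group, so that dividing each count by the total count at the same x gives a within-bin proportion. The object names plays and p are illustrative only. Unlike the pre-processed version, players with no plays in a bin appear at 0 rather than being omitted.

```r
library(ggplot2)

# Same data as above, built with base R to keep the sketch self-contained.
# lakers ships with lubridate.
plays <- subset(lubridate::lakers, team == "LAL" & nchar(player) > 0)
plays$date <- lubridate::ymd(plays$date)

p <- ggplot(plays) +
  stat_bin(
    aes(
      x = date,
      colour = player,
      group = player,
      # count is computed per player and bin; ave() sums the counts of all
      # players that share the same bin centre x, giving the bin total
      y = after_stat(count / ave(count, x, FUN = sum))
    ),
    binwidth = 31,          # bin width in days, as in the histogram example
    geom = "line",
    position = "identity"   # stat_bin stacks by default
  ) +
  scale_y_continuous(labels = scales::percent)

p
```

With facets, the bin total would also need to be split by PANEL, for example ave(count, x, PANEL, FUN = sum); that variant is untested here.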

@larmarange
Owner

Dear @kieran-mace,

This seems related to tidyverse/ggplot2#6478.

Would that be enough to achieve what you are looking for?
