I just wrote a post about grouped sequences in dplyr and following that, I’ve been made aware of another couple of solutions to this problem (credit John Mackintosh). The solution involves using the consecutive_id()
function, available in dplyr since v1.1.0. In the help page for this function, it’s mentioned that it was inspired by rleid()
function from the data.table
package. These functions work similarly to the rle()
function I used last time (in what I called ‘the complicated solution’) but provide neater outputs. I’ll try them both out here.
library(tidyverse)
sports_df <- tibble::tribble(
~team, ~date, ~home_or_away, ~num_matches_away,
"Team A", "07/10/2022", "H", 0L,
"Team A", "14/10/2022", "A", 1L,
"Team A", "21/10/2022", "A", 2L,
"Team A", "28/10/2022", "H", 0L,
"Team A", "04/11/2022", "A", 1L,
"Team A", "11/11/2022", "A", 2L,
"Team A", "18/11/2022", "A", 3L,
"Team A", "25/11/2022", "H", 0L,
"Team A", "02/12/2022", "H", 0L,
"Team A", "09/12/2022", "H", 0L,
"Team B", "07/10/2022", "H", 0L,
"Team B", "14/10/2022", "A", 1L,
"Team B", "21/10/2022", "A", 2L,
"Team B", "28/10/2022", "A", 3L,
"Team B", "04/11/2022", "H", 0L,
"Team B", "11/11/2022", "A", 1L,
"Team B", "18/11/2022", "A", 2L,
"Team B", "25/11/2022", "H", 0L,
"Team B", "02/12/2022", "H", 0L,
"Team B", "09/12/2022", "A", 1L
) %>%
mutate(date = as.Date(date, "%d/%m/%Y")) %>%
arrange(team, date)
dplyr::consecutive_id()
sports_df2 <- sports_df %>%
group_by(team) %>%
mutate(consec_id = consecutive_id(home_or_away)) %>%
group_by(team, consec_id) %>%
mutate(days_on_road = 1:n(),
days_on_road = if_else(home_or_away == "H", 0L, days_on_road)) %>%
ungroup()
sports_df2
## # A tibble: 20 × 6
## team date home_or_away num_matches_away consec_id days_on_road
## <chr> <date> <chr> <int> <int> <int>
## 1 Team A 2022-10-07 H 0 1 0
## 2 Team A 2022-10-14 A 1 2 1
## 3 Team A 2022-10-21 A 2 2 2
## 4 Team A 2022-10-28 H 0 3 0
## 5 Team A 2022-11-04 A 1 4 1
## 6 Team A 2022-11-11 A 2 4 2
## 7 Team A 2022-11-18 A 3 4 3
## 8 Team A 2022-11-25 H 0 5 0
## 9 Team A 2022-12-02 H 0 5 0
## 10 Team A 2022-12-09 H 0 5 0
## 11 Team B 2022-10-07 H 0 1 0
## 12 Team B 2022-10-14 A 1 2 1
## 13 Team B 2022-10-21 A 2 2 2
## 14 Team B 2022-10-28 A 3 2 3
## 15 Team B 2022-11-04 H 0 3 0
## 16 Team B 2022-11-11 A 1 4 1
## 17 Team B 2022-11-18 A 2 4 2
## 18 Team B 2022-11-25 H 0 5 0
## 19 Team B 2022-12-02 H 0 5 0
## 20 Team B 2022-12-09 A 1 6 1
identical(sports_df2$days_on_road, sports_df$num_matches_away)
## [1] TRUE
data.table::rleid()
sports_df3 <- sports_df %>%
group_by(team) %>%
mutate(rleid_id = data.table::rleid(home_or_away)) %>%
group_by(team, rleid_id) %>%
mutate(days_on_road = 1:n(),
days_on_road = if_else(home_or_away == "H", 0L, days_on_road)) %>%
ungroup()
sports_df3
## # A tibble: 20 × 6
## team date home_or_away num_matches_away rleid_id days_on_road
## <chr> <date> <chr> <int> <int> <int>
## 1 Team A 2022-10-07 H 0 1 0
## 2 Team A 2022-10-14 A 1 2 1
## 3 Team A 2022-10-21 A 2 2 2
## 4 Team A 2022-10-28 H 0 3 0
## 5 Team A 2022-11-04 A 1 4 1
## 6 Team A 2022-11-11 A 2 4 2
## 7 Team A 2022-11-18 A 3 4 3
## 8 Team A 2022-11-25 H 0 5 0
## 9 Team A 2022-12-02 H 0 5 0
## 10 Team A 2022-12-09 H 0 5 0
## 11 Team B 2022-10-07 H 0 1 0
## 12 Team B 2022-10-14 A 1 2 1
## 13 Team B 2022-10-21 A 2 2 2
## 14 Team B 2022-10-28 A 3 2 3
## 15 Team B 2022-11-04 H 0 3 0
## 16 Team B 2022-11-11 A 1 4 1
## 17 Team B 2022-11-18 A 2 4 2
## 18 Team B 2022-11-25 H 0 5 0
## 19 Team B 2022-12-02 H 0 5 0
## 20 Team B 2022-12-09 A 1 6 1
identical(sports_df3$days_on_road, sports_df$num_matches_away)
## [1] TRUE
We now have 4 solutions to this problem - it’s often the case that there are many ways to do the same thing in R! It can be extra effort to remember different solutions but they work in different ways and can be applied to different problems so I do think it’s worthwhile trying these out. Which solution is your favourite?