Feature request
Add a function `shuffle_rows()` to randomly reorder rows in the current tibble/slice:
```r
dataset_tbl |>
  dplyr::shuffle_rows()
```
Motivation
Currently, shuffling all rows in a slice/tibble requires

```r
dataset_tbl |> dplyr::slice_sample(prop = 1)
```

or a similar variation.
This approach has two main problems:
- Unclear intent: it is not obvious whether the goal is to shuffle rows or to sample them.
- Fragile code: small changes to the arguments can silently alter the result:
  - if `prop` is changed from `1`, the output is no longer a complete shuffle;
  - if `replace = TRUE` is used (accidentally or via copy-paste), some rows may repeat while others disappear.
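To make the fragility concrete, here is a minimal sketch on a hypothetical 100-row tibble `tbl` with distinct ids (it assumes dplyr >= 1.0.0, where `slice_sample()` was introduced). Each call differs from the intended shuffle by a single argument:

```r
# Minimal sketch: `tbl` is a hypothetical toy tibble with 100 distinct ids.
# Assumes dplyr >= 1.0.0 (where slice_sample() was introduced).
tbl <- dplyr::tibble(id = 1:100)
set.seed(1)

# Intended use: prop = 1 without replacement is a full shuffle
shuffled <- dplyr::slice_sample(tbl, prop = 1)

# Copy-pasted bootstrap argument: same row count, but ids repeat or vanish
resampled <- dplyr::slice_sample(tbl, prop = 1, replace = TRUE)

# Edited proportion: silently keeps only floor(0.5 * 100) rows
halved <- dplyr::slice_sample(tbl, prop = 0.5)

nrow(shuffled)                 # 100
anyDuplicated(shuffled$id)     # 0 (every id appears exactly once)
nrow(resampled)                # 100
length(unique(resampled$id))   # almost surely fewer than 100
nrow(halved)                   # 50
```

None of these calls warns or errors; the output looks plausible even when it is no longer a shuffle.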
Introducing `shuffle_rows()` would make the intent explicit and the code more robust.
Implementation sketch
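```r
shuffle_rows <- function(.data, ..., .by = NULL) {
  # `...` forces `.by` to be named; reject anything passed through it
  # (e.g. a stray `prop = 0.5`) instead of silently ignoring it
  rlang::check_dots_empty()
  dplyr::slice(
    .data,
    .by = {{ .by }},
    base::local({
      # n() is the size of the current group (or of the whole table)
      number_of_rows <- dplyr::n()
      # a permutation of 1..n: every row is kept exactly once
      base::sample.int(n = number_of_rows, size = number_of_rows, replace = FALSE)
    })
  )
}
```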
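For comparison, a compact variant is possible: `base::sample.int(n)` defaults to `size = n` and `replace = FALSE`, so the `local()` block can collapse into a single call. The name `shuffle_rows_compact` and the toy `tbl` below are hypothetical, and the `.by` argument of `slice()` assumes dplyr >= 1.1.0. The usage lines check the two properties a shuffle should have: every row survives exactly once, and with `.by` each group keeps its size.

```r
# Hypothetical compact variant (not a dplyr API); assumes dplyr >= 1.1.0.
shuffle_rows_compact <- function(.data, ..., .by = NULL) {
  rlang::check_dots_empty()  # reject stray arguments instead of ignoring them
  # sample.int(n) is a permutation of 1..n (size = n, replace = FALSE)
  dplyr::slice(.data, sample.int(dplyr::n()), .by = {{ .by }})
}

# Hypothetical toy data: three groups of sizes 2, 3 and 5
tbl <- dplyr::tibble(
  grp = rep(c("a", "b", "c"), times = c(2, 3, 5)),
  id  = 1:10
)

set.seed(123)
whole   <- shuffle_rows_compact(tbl)             # permutes all 10 rows
grouped <- shuffle_rows_compact(tbl, .by = grp)  # permutes rows within each grp

sort(whole$id)      # 1 through 10, each exactly once
table(grouped$grp)  # a: 2, b: 3, c: 5 (group sizes unchanged)

# A slice_sample()-style argument is an error rather than a silent no-op
res <- try(shuffle_rows_compact(tbl, prop = 0.5), silent = TRUE)
inherits(res, "try-error")  # TRUE
```

Whether the explicit `local()` form or the one-liner reads better is a style choice; both draw from R's RNG, so `set.seed()` makes either reproducible.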
Example use
```r
# datasets::penguins ships with R >= 4.5.0 as a data.frame;
# as_tibble() gives the tibble printout shown below
tibble::as_tibble(datasets::penguins) |>
  shuffle_rows()
# # A tibble: 344 × 8
#   species   island bill_len bill_dep flipper_len body_mass sex     year
#   <fct>     <fct>     <dbl>    <dbl>       <int>     <int> <fct>  <int>
# 1 Adelie    Biscoe     37.6     19.1         194      3750 male    2008
# 2 Chinstrap Dream      46.9     16.6         192      2700 female  2008
# 3 Chinstrap Dream      42.4     17.3         181      3600 female  2007
# ...
```