This function provides a summary of a dataset, including both numeric and
non-numeric variables. For numeric variables, it calculates basic descriptive
statistics such as minimum, maximum, median, mean, and count of non-missing
values. Users can also pass custom functions via the fn argument to compute
further statistics for numeric variables. For non-numeric variables, it
provides frequency counts and proportions for each unique value.
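As a rough illustration of the two kinds of summary described above, the following base R sketch computes the same quantities by hand for one numeric and one non-numeric column. It does not call the package and is not its implementation; the variable names (df, numeric_summary, counts, props) are chosen here for illustration, and proportions are expressed as percentages to match the Prop column in the example output.

```r
# Base R only: a hand-rolled sketch, not the package's implementation.
df <- data.frame(
  x = c(1:3, NA),
  z = c("A", "A", "B", "A")
)

# Numeric column: basic descriptive statistics, ignoring missing values.
numeric_summary <- c(
  Min    = min(df$x, na.rm = TRUE),
  Max    = max(df$x, na.rm = TRUE),
  Median = median(df$x, na.rm = TRUE),
  Mean   = mean(df$x, na.rm = TRUE),
  N      = sum(!is.na(df$x))   # count of non-missing values
)

# Non-numeric column: frequency counts and proportions (as percentages).
counts <- table(df$z)
props  <- 100 * prop.table(counts)
```

Printing numeric_summary, counts, and props shows the per-column values that the function gathers into a single table.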
Arguments
- data
A data frame containing the dataset to be summarized.
- ...
(Optional) Columns to include in the summary. If no columns are specified, all columns in the data are included.
- fn
A named list of functions to apply to numeric variables. Each function must accept x as a vector of numeric values and return a single value or a named vector. Additional arguments for these functions can be specified as a list. For example: fn = list('sum' = list(na.rm = TRUE), 'sd').
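To make the shape of the fn argument concrete, here is a minimal base R sketch of how a list of function names with argument lists could be applied to one numeric vector. The helper apply_fn_spec is hypothetical and not part of the package; it only handles the fully named form (every entry is name = list of arguments), and the package may process fn differently.

```r
# Hypothetical helper, not part of the package: call each named function
# on x, passing along any extra arguments supplied as a list.
apply_fn_spec <- function(x, fn) {
  lapply(stats::setNames(names(fn), names(fn)), function(name) {
    # c(list(x), fn[[name]]) builds the argument list: x first, then extras.
    do.call(name, c(list(x), fn[[name]]))
  })
}

y <- c(3:4, NA, NA)
res <- apply_fn_spec(
  y,
  list(sum = list(na.rm = TRUE), sd = list(na.rm = TRUE))
)
```

Here res is a named list with one element per function, which is the kind of per-variable result that would become extra columns such as sum and sd.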
Examples
ct_describe_df(
  data = data.frame(
    x = c(1:3, NA),
    y = c(3:4, NA, NA),
    z = c("A", "A", "B", "A")
  ),
  y, x, z,
  fn = list('sum' = list(na.rm = TRUE), 'sd' = list(na.rm = TRUE))
)
#> # A tibble: 4 × 12
#>   Group  Prop     N Variable   Min   Max   sum Median  Mean `CI Left` `CI Right`
#>   <chr> <dbl> <int> <chr>    <dbl> <dbl> <int>  <dbl> <dbl>     <dbl>      <dbl>
#> 1 NA       NA     2 y            3     4     7    3.5   3.5     -2.85       9.85
#> 2 NA       NA     3 x            1     3     6      2     2    -0.484       4.48
#> 3 A        75     3 z           NA    NA    NA     NA    NA        NA         NA
#> 4 B        25     1 z           NA    NA    NA     NA    NA        NA         NA
#> # ℹ 1 more variable: sd <dbl>
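The CI Left and CI Right columns are not described in the text above. Their values appear consistent with a 95% t-based confidence interval for the mean, but that is an assumption inferred from the numbers, not something the documentation states. The hypothetical helper t_ci below reproduces the interval under that assumption so the output can be checked by hand.

```r
# Assumption: the CI columns are a 95% t-interval,
# mean +/- t(0.975, n - 1) * sd / sqrt(n), computed on non-missing values.
t_ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(left = mean(x) - half_width, right = mean(x) + half_width)
}

ci_y <- t_ci(c(3:4, NA, NA))  # compare with the CI columns in row 1 (y)
ci_x <- t_ci(c(1:3, NA))      # compare with the CI columns in row 2 (x)
```

Rounded to three significant digits, ci_y and ci_x can be compared against the printed -2.85, 9.85 and -0.484, 4.48.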