Descriptive statistics for a dataset
mm_describe_df.Rd
This function provides a summary of a dataset, including both numeric and
non-numeric variables. For numeric variables, it calculates basic descriptive
statistics such as minimum, maximum, median, mean, and count of non-missing
values. Additionally, users can pass custom functions via the fn argument to
compute additional statistics for numeric variables. For non-numeric variables,
it provides frequency counts and proportions for each unique value.
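The two kinds of summary described above can be reproduced by hand in base R. The following is only a minimal sketch on a small made-up data frame (the names df, num_summary, counts and props are illustrative), not the function's actual implementation:

```r
# Hypothetical data, for illustration only
df <- data.frame(x = c(1:3, NA), z = c("A", "A", "B", "A"))

# Numeric column: basic statistics, ignoring missing values
num_summary <- c(
  N      = sum(!is.na(df$x)),
  Min    = min(df$x, na.rm = TRUE),
  Max    = max(df$x, na.rm = TRUE),
  Median = median(df$x, na.rm = TRUE),
  Mean   = mean(df$x, na.rm = TRUE)
)

# Non-numeric column: frequency counts and proportions per unique value
counts <- table(df$z)
props  <- prop.table(counts)
```

Here props holds fractions between 0 and 1; multiplying by 100 gives percentages.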
Arguments
- data
A data frame containing the dataset to be summarized.
- ...
(Optional) Columns to include in the summary. If no column is specified, all columns in the data will be included.
- fn
A named list of functions to apply to numeric variables. Each function must accept x, a vector of numeric values, and return a single value or a named vector. Additional arguments for these functions can be specified as a list. For example: fn = list('sum' = list(na.rm = TRUE), 'sd').
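To illustrate how a specification of this shape might be interpreted, here is a minimal sketch. The helper apply_fn_spec is hypothetical and not part of the package. It assumes that a named entry gives the function name with a list of extra arguments, and that an unnamed entry is a bare function name with no extra arguments:

```r
# Hypothetical helper: applies each function in `fn` to the numeric vector `x`.
# Assumption: names(fn) are function names and values are argument lists;
# unnamed entries are function names given as strings.
apply_fn_spec <- function(x, fn) {
  nms <- names(fn)
  if (is.null(nms)) nms <- rep("", length(fn))
  out <- list()
  for (i in seq_along(fn)) {
    if (nzchar(nms[i])) {
      # Named entry: the name is the function, the value its extra arguments
      f_name <- nms[i]
      args <- fn[[i]]
    } else {
      # Unnamed entry: the value is the function name, no extra arguments
      f_name <- fn[[i]]
      args <- list()
    }
    # Call the function with x as the first argument, followed by the extras
    out[[f_name]] <- do.call(f_name, c(list(x), args))
  }
  out
}

res <- apply_fn_spec(
  c(1:3, NA),
  list("sum" = list(na.rm = TRUE), "sd" = list(na.rm = TRUE))
)
```

With this input, res$sum and res$sd correspond to the sum and sd values for column x in the example output below. How mm_describe_df itself resolves the list may differ.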
Examples
mm_describe_df(
  data = data.frame(
    x = c(1:3, NA),
    y = c(3:4, NA, NA),
    z = c("A", "A", "B", "A")
  ),
  y, x, z,
  fn = list('sum' = list(na.rm = TRUE), 'sd' = list(na.rm = TRUE))
)
#> # A tibble: 4 × 12
#> Variable Group Prop N Min Max Median Mean `CI Left` `CI Right` sum
#> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 y NA NA 2 3 4 3.5 3.5 -2.85 9.85 7
#> 2 x NA NA 3 1 3 2 2 -0.484 4.48 6
#> 3 z A 75 3 NA NA NA NA NA NA NA
#> 4 z B 25 1 NA NA NA NA NA NA NA
#> # ℹ 1 more variable: sd <dbl>