tell us
write a little paragraph about the package that the function is from, what it does, and how it might be useful to other psychology students
describe() is a function in the psych package that takes a data set and returns a data.frame of descriptive statistics: the number of valid observations, mean, standard deviation, median, trimmed mean, median absolute deviation, minimum, maximum, range, skew, kurtosis, and standard error. This is useful for psychology students because these are some of the most frequently reported statistics, and because the minimum, maximum, and range of each variable make it easy to spot coding errors such as out-of-range values. The statistics are only meaningful for numeric data: non-numeric variables such as factors are converted to numeric codes and marked with an asterisk in the output (e.g., species*), so those rows should not be interpreted. There are also related functions: describeData() reports on the data types in the dataset, describeFast() produces fewer statistics for a quicker overview, and describeBy() describes the data separately for each group, similar to how one would use group_by() to group variables together.
show us
write a little demo to show how to install/load the package and use the function on some real data
install and load packages
library(palmerpenguins)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.4     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.6
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
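The output above only shows the library() calls, so here is a minimal sketch of the install step. It assumes the three packages are installed from CRAN; the mirror URL and the helper names `pkgs`, `is_installed`, and `to_install` are our own choices, not part of the original demo.

```r
# Packages used in this demo
pkgs <- c("palmerpenguins", "tidyverse", "psych")

# requireNamespace() returns FALSE (instead of erroring) when a package is absent
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!is_installed]

# Install only what is missing (assumed CRAN mirror; any mirror would do)
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

Installing only needs to happen once per machine, whereas library() has to be called in every new R session.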
get some data
penguins <- penguins
use the function
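#example 1, trial 1: describe() has no column argument, so select the column first
penguins %>%
  select(bill_length_mm) %>%
  describe()
#example 1, trial 2: or pass the column vector directly instead of piping the data frame
describe(penguins$bill_depth_mm)
#example 1 - looks at the descriptives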
penguins %>%
describe()
##                   vars   n    mean     sd  median trimmed    mad    min    max
## species*             1 344    1.92   0.89    2.00    1.90   1.48    1.0    3.0
## island*              2 344    1.66   0.73    2.00    1.58   1.48    1.0    3.0
## bill_length_mm       3 342   43.92   5.46   44.45   43.91   7.04   32.1   59.6
## bill_depth_mm        4 342   17.15   1.97   17.30   17.17   2.22   13.1   21.5
## flipper_length_mm    5 342  200.92  14.06  197.00  200.34  16.31  172.0  231.0
## body_mass_g          6 342 4201.75 801.95 4050.00 4154.01 889.56 2700.0 6300.0
## sex*                 7 333    1.50   0.50    2.00    1.51   0.00    1.0    2.0
## year                 8 344 2008.03   0.82 2008.00 2008.04   1.48 2007.0 2009.0
##                    range  skew kurtosis    se
## species*             2.0  0.16    -1.73  0.05
## island*              2.0  0.61    -0.91  0.04
## bill_length_mm      27.5  0.05    -0.89  0.30
## bill_depth_mm        8.4 -0.14    -0.92  0.11
## flipper_length_mm   59.0  0.34    -1.00  0.76
## body_mass_g       3600.0  0.47    -0.74 43.36
## sex*                 1.0 -0.02    -2.01  0.03
## year                 2.0 -0.05    -1.51  0.04
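Because describe() returns a data.frame, its columns can be used for quick data checks. The sketch below is one way to do this, not the only one: it counts missing values per variable by comparing the n column with the number of rows, and pulls out min and max to look for out-of-range values. The names `desc` and `n_missing` are ours.

```r
library(palmerpenguins)
library(psych)

desc <- describe(penguins)

# n is the number of non-missing values, so the difference is the missing count
n_missing <- nrow(penguins) - desc$n
names(n_missing) <- rownames(desc)
n_missing[n_missing > 0]

# min and max side by side; rows ending in * are factors coded as numbers
desc[, c("n", "min", "max")]
```

With the output shown above, sex has the most missing values (344 rows but n = 333).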
#example 2
describe(penguins[, c("body_mass_g", "flipper_length_mm")])
##                   vars   n    mean     sd median trimmed    mad  min  max range
## body_mass_g          1 342 4201.75 801.95   4050 4154.01 889.56 2700 6300  3600
## flipper_length_mm    2 342  200.92  14.06    197  200.34  16.31  172  231    59
##                   skew kurtosis    se
## body_mass_g       0.47    -0.74 43.36
## flipper_length_mm 0.34    -1.00  0.76
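The related functions mentioned in the first section can be tried on the same data. This is a rough sketch rather than a full tour: it converts the tibble to a plain data.frame as a precaution, and the names `peng`, `num_cols`, and `by_species` are ours.

```r
library(palmerpenguins)
library(psych)

# Plain data.frame copy (a precaution; describe() above worked on the tibble directly)
peng <- as.data.frame(penguins)
num_cols <- c("bill_length_mm", "body_mass_g")

# describeBy(): one describe() table per level of the grouping variable
by_species <- describeBy(peng[, num_cols], group = peng$species)
names(by_species)
by_species[["Chinstrap"]]

# describeFast(): a reduced set of statistics for a quick overview
describeFast(peng[, num_cols])

# describeData(): variable types plus the first and last few rows
describeData(peng)
```

describeBy() is the closest match to a group_by() workflow, but it returns a list of tables (one per group) rather than a single summarised data frame.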
more resources
write a little paragraph about how you learned about the function: what did you google? Include a list of the documentation that you found useful and resources that someone learning about the function might need. If you can find pictures or memes to include, great!!
To find out more about the function, we Googled it and found [this blog](https://methodenlehre.github.io/SGSCLM-R-course/descriptive-statistics.html), which gives the generic form of the function as follows.
"The generic form is: describe(x, na.rm = TRUE, interp = FALSE, skew = TRUE, ranges = TRUE, trim = .1, type = 3, check = TRUE).
x stands for the data frame or the variable to be analyzed (df$variable). The defaults are:
* interp = FALSE refers to the definition of the median (interp = TRUE uses our method of averaging adjacent values for an even n)
* skew = TRUE displays skewness, kurtosis and the trimmed mean
* ranges = TRUE displays the range
* trim = .1 refers to the proportion of the distribution that is trimmed at the lower and upper ends for the trimmed mean (default trimming is 10% on both sides, thus the trimmed mean is computed from the middle 80% of the data)
* type = 3 refers to the method of computing skewness and kurtosis (more here: ?psych::describe)
* check = TRUE refers to checking for non-numeric variables in the dataset (for which describe has no use); if check = FALSE [and] non-numeric variables exist, an error message is displayed."
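To see what some of these arguments do, the sketch below changes two of them on a single variable. It is only an illustration of the options quoted above; the object names `mass`, `full`, `slim`, and `wide` are ours, and the 20% trim is an arbitrary value chosen for contrast.

```r
library(palmerpenguins)
library(psych)

mass <- penguins$body_mass_g

full <- describe(mass)                # all defaults
slim <- describe(mass, skew = FALSE)  # leave out skew, kurtosis and the trimmed mean
wide <- describe(mass, trim = 0.2)    # trim 20% from each tail instead of 10%

# Compare which statistics each call returns
names(full)
names(slim)

# The trimmed mean changes with the trim proportion
c(default_trim = full$trimmed, wider_trim = wide$trimmed)
```

Setting skew = FALSE is handy when only the basic statistics (n, mean, sd, range) are needed for a report.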