Basic Statistics

Load Libraries

# if you haven't run this code before, you'll need to install the packages below first
# instructions on how to do this are included in the video
# but as a reminder, you can use the Packages tab to the right

library(psych) # for the describe() command
library(expss) # for the cross_cases() command

Import Data

# import our data for the lab
# for the homework, you will import the mydata.csv that we created in the Data Prep Lab

d2 <- read.csv(file = "Data/mydata.csv", header = TRUE)
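To confirm that the import worked, you can look at the structure of the imported data frame with `str()`. The sketch below is self-contained. Because `Data/mydata.csv` only exists inside your project, it writes a small hypothetical data frame (`toy`, made-up values) to a temporary CSV file. It then reads that file back with the same `read.csv()` call pattern.

```r
# Hypothetical toy data standing in for mydata.csv (values are made up)
toy <- data.frame(
  race_rc  = c("asian", "white", "white"),
  moa_role = c(2.50, 3.00, 3.75)
)

# Write the toy data to a temporary CSV file, then read it back as in the lab
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)
d_check <- read.csv(file = path, header = TRUE)

# str() lists each column with its type and first few values
str(d_check)
```

With the lab data, the equivalent check is simply `str(d2)`.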

Univariate Plots: Histograms & Tables

table(d2$race_rc) # table() shows the levels of this variable and how many participants are in each level

## 
##       asian       black    hispanic multiracial  nativeamer       other 
##         134         171         216         192           5          78 
##       white 
##        1240

table(d2$age)

## 
## 1 between 18 and 25 2 between 26 and 35 3 between 36 and 45           4 over 45 
##                1871                 111                  37                  17
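`table()` counts how many observations fall in each level. The short sketch below uses a hypothetical vector with made-up values. It also shows `prop.table()`, which turns the counts into proportions. It shows `useNA = "ifany"` as well, which adds a count of missing values that `table()` leaves out by default.

```r
# Hypothetical categorical vector standing in for a column such as d2$race_rc
race <- c("white", "asian", "white", "black", NA, "white", "asian")

counts <- table(race)           # levels and the number of cases in each (NA is dropped)
counts

round(prop.table(counts), 2)    # proportion of cases in each level

table(race, useNA = "ifany")    # same counts, plus a column for missing values
```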

hist(d2$moa_independence) #the hist command creates a histogram of the variable

hist(d2$moa_role)

hist(d2$moa_safety)

hist(d2$moa_maturity)
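`hist()` sorts the values into bins and draws one bar per bin, with height equal to the count. Setting `plot = FALSE` returns those bins and counts without drawing anything, which makes it clear what the bars represent. The scores below are hypothetical values on the same 1 to 4 range as the moa_ scales.

```r
# Hypothetical scores on a 1-4 scale, standing in for a column such as d2$moa_role
scores <- c(1.0, 1.5, 2.2, 2.8, 3.0, 3.1, 3.6, 3.9, 4.0)

# Bins of width 0.5 from 1 to 4; plot = FALSE returns the counts instead of drawing
h <- hist(scores, breaks = seq(1, 4, by = 0.5), plot = FALSE)
h$breaks   # bin edges
h$counts   # number of scores in each bin (these are the bar heights)
```

Changing `breaks` changes the bin width, which can make a histogram look smoother or bumpier.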

Univariate Normality

We analyzed the skew and kurtosis of our continuous variables, and all were within the accepted range (-2/+2) except the kurtosis of moa_independence (2.74). (True for the lab! May not be true for the homework!)

We analyzed the skew and kurtosis of our … and most were within the accepted range (-2/+2). However, some variables (list them in parentheses) were outside of the accepted range. For this analysis, we will use them anyway, but outside of this class this is bad practice.

# starred variables (race_rc*, age*) are categorical, so ignore their skew and kurtosis
describe(d2) # check univariate normality: skew and kurtosis should fall between -2 and +2

##                  vars    n mean   sd median trimmed  mad  min max range  skew
## race_rc*            1 2036 5.43 2.15   7.00    5.75 0.00 1.00   7  6.00 -0.87
## age*                2 2036 1.12 0.43   1.00    1.00 0.00 1.00   4  3.00  4.36
## moa_independence    3 2036 3.54 0.46   3.67    3.61 0.49 1.00   4  3.00 -1.49
## moa_role            4 2036 2.97 0.72   3.00    3.00 0.74 1.00   4  3.00 -0.33
## moa_safety          5 2036 3.21 0.65   3.25    3.27 0.74 1.00   4  3.00 -0.70
## moa_maturity        6 2036 3.61 0.43   3.67    3.67 0.49 1.33   4  2.67 -1.24
##                  kurtosis   se
## race_rc*            -0.89 0.05
## age*                20.52 0.01
## moa_independence     2.74 0.01
## moa_role            -0.81 0.02
## moa_safety          -0.09 0.01
## moa_maturity         1.74 0.01
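The -2/+2 rule can also be checked directly. This sketch defines two hypothetical helper functions, `skew_g1()` and `kurt_g2()`, which use the plain moment formulas for skewness and excess kurtosis. A third helper, `flag_nonnormal()`, marks any variable outside the range. None of these are part of psych. `describe()` applies a small-sample adjustment by default (its `type` argument), so its numbers may differ slightly from these. The data are made up.

```r
# Skewness from the sample moments: m3 / m2^(3/2)
skew_g1 <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis from the sample moments: m4 / m2^2 - 3 (a normal curve gives 0)
kurt_g2 <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# One row per numeric column; `outside` is TRUE when skew or kurtosis is beyond +/- limit
flag_nonnormal <- function(df, limit = 2) {
  res <- data.frame(
    variable  = names(df),
    skew      = sapply(df, skew_g1),
    kurtosis  = sapply(df, kurt_g2),
    row.names = NULL
  )
  res$outside <- abs(res$skew) > limit | abs(res$kurtosis) > limit
  res
}

# Hypothetical numeric columns (made-up values)
toy <- data.frame(
  spread_out  = 1:10,              # symmetric, skew = 0
  one_outlier = c(rep(1, 9), 10)   # one extreme value pulls the skew past +2
)
res <- flag_nonnormal(toy)
res
```

With the lab data, you would pass only the continuous columns, for example `flag_nonnormal(d2[, c("moa_independence", "moa_role", "moa_safety", "moa_maturity")])`.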

Bivariate Plots

Crosstabs

cross_cases(d2, race_rc, age) # replace race_rc and age with your own categorical variable names

race_rc \ age  1 between 18 and 25  2 between 26 and 35  3 between 36 and 45  4 over 45
asian                          129                    4                    1
black                          137                   27                    3          4
hispanic                       192                   18                    6
multiracial                    178                   10                    4
nativeamer                       5
other                           70                    5                    3
white                         1160                   47                   20         13
#Total cases                  1871                  111                   37         17
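You can get the same counts without expss. Base R's `xtabs()` builds a two-way table from a formula, and `addmargins()` adds totals like the `#Total cases` row. Base R prints 0 in cells that `cross_cases()` leaves blank. The data frame below is hypothetical (made-up values) but uses the same variable names as the lab.

```r
# Hypothetical data frame with two categorical variables (made-up values)
toy <- data.frame(
  race_rc = c("asian", "white", "white", "black", "white", "asian"),
  age     = c("1 between 18 and 25", "1 between 18 and 25", "4 over 45",
              "2 between 26 and 35", "1 between 18 and 25", "2 between 26 and 35")
)

# Rows = first variable, columns = second variable, cells = counts
tab <- xtabs(~ race_rc + age, data = toy)

# addmargins() appends row and column totals (labelled "Sum")
addmargins(tab)
```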

Scatterplots

plot(d2$moa_independence, d2$moa_safety,
     main="Scatterplot of moa_independence and moa_safety",
     xlab = "moa_independence",
     ylab = "moa_safety")

plot(d2$moa_role, d2$moa_maturity,
     main="Scatterplot of moa_role and moa_maturity",
     xlab = "moa_role",
     ylab = "moa_maturity")
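A misspelled column name (for example `mmoa_safety` instead of `moa_safety`) is easy to miss. With `$`, a data frame returns `NULL` for a column that doesn't exist instead of raising an error. The formula interface (`y ~ x` with `data =`) looks names up in the data frame, so a typo there stops with an error. The data frame below is hypothetical, with made-up values.

```r
# Hypothetical data frame (made-up values) with two continuous variables
toy <- data.frame(
  moa_independence = c(3.2, 3.8, 2.9, 4.0, 3.5),
  moa_safety       = c(3.0, 3.5, 2.5, 3.9, 3.1)
)

# `$` with a misspelled column name quietly returns NULL
is.null(toy$mmoa_safety)   # TRUE

# The formula interface looks names up in `data`, so a typo stops with an error
res <- tryCatch(
  model.frame(mmoa_safety ~ moa_independence, data = toy),
  error = function(e) "error: column not found"
)
res

# Correct spelling: y-variable before the tilde, x-variable after it
plot(moa_safety ~ moa_independence, data = toy,
     main = "Scatterplot of moa_independence and moa_safety")
```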

Boxplots

# boxplots use one categorical and one continuous variable
# make sure that you enter them in the right order!
# continuous variable goes BEFORE the tilde
# categorical variable goes AFTER the tilde!

boxplot(moa_safety ~ race_rc, data = d2,
        main="Boxplot of race_rc and moa_safety",
        xlab = "race_rc",
        ylab = "moa_safety")

boxplot(moa_maturity ~ age, data = d2,
        main="Boxplot of age and moa_maturity",
        xlab = "age",
        ylab = "moa_maturity")
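To see what each box summarizes, `boxplot()` can return its numbers instead of drawing them (`plot = FALSE`). In this sketch the data frame is hypothetical (made-up values) and uses the lab's variable names. The formula puts the continuous variable before the tilde and the grouping variable after it.

```r
# Hypothetical data (made-up values): one continuous and one categorical variable
toy <- data.frame(
  moa_maturity = c(3.7, 3.3, 4.0, 3.0, 3.7, 2.7, 3.3),
  age          = c("1 between 18 and 25", "1 between 18 and 25", "1 between 18 and 25",
                   "2 between 26 and 35", "2 between 26 and 35",
                   "4 over 45", "4 over 45")
)

# continuous ~ categorical: one box of moa_maturity per age group.
# plot = FALSE returns the numbers behind the boxes without drawing them.
b <- boxplot(moa_maturity ~ age, data = toy, plot = FALSE)
b$names   # one entry per group, in the order the boxes are drawn
b$n       # number of observations in each box
b$stats   # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
```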

P421 Lab - Basic Statistics Lab

Nge Li

2024-07-08