library(ggplot2)
library(foreign)
library(epiDisplay)
library(outliers)
library(psych)
hw2_psy5100
data <- read.csv("/Users/iancopeland/Documents/R/stats_1/datasets/hw2_data.csv")
1 Sample summary
1.2 Categorical variables
Gender
table(data$gender)
1 2
130 118
Education
table(data$education)
2 3 4 5 6
30 51 37 99 30
Ethnicity
table(data$ethnicity)
1 2 3 4 6
200 22 1 20 5
df <- data.frame(
  Group = c("White", "Black", "American Indian or Alaska Native", "Asian",
            "Native Hawaiian or Pacific Islanders", "Other"),
  value = c(200, 22, 1, 20, 0, 5)
)
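The counts in `df` are typed in by hand from the `table()` output. As a minimal sketch, the same data frame could be derived from the codes themselves, assuming codes 1 to 6 map to the six labels in the order listed (code 5 has no observations in this sample). The vector `ethnicity_codes` is rebuilt here from the printed table purely for illustration; in the actual analysis `data$ethnicity` would take its place. The names `ethnicity_labels` and `df_from_codes` are hypothetical.

```r
# Labels in code order 1..6 (assumed mapping, taken from the order used in df)
ethnicity_labels <- c("White", "Black", "American Indian or Alaska Native",
                      "Asian", "Native Hawaiian or Pacific Islanders", "Other")

# Stand-in for data$ethnicity, reconstructed from the table() output above
ethnicity_codes <- rep(c(1, 2, 3, 4, 6), times = c(200, 22, 1, 20, 5))

# Declaring all six levels keeps the empty category (code 5) with a count of 0
eth <- factor(ethnicity_codes, levels = 1:6, labels = ethnicity_labels)

df_from_codes <- as.data.frame(table(eth))
names(df_from_codes) <- c("Group", "value")
df_from_codes
```

Building the counts this way avoids transcription errors if the dataset changes.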
ggplot(df, aes(x = "", y = value, fill = Group)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  labs(
    y = "",
    x = "",
    title = "Ethnicity",
    subtitle = "Figure 1"
  )
1.3 Continuous variables
Hours worked
hours <- describe(data$hours_work,
                  na.rm = FALSE,
                  skew = TRUE)
print(hours,
      signif = 3)
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 248 41.6 7.19     40    41.3 5.93   9  72    63 0.58     3.73 0.46
grubbs.test(data$hours_work)
Grubbs test for one outlier
data: data$hours_work
G = 4.5349, U = 0.9164, p-value = 0.0004546
alternative hypothesis: lowest value 9 is an outlier
hist(data$hours_work,
     main = "Figure 2",
     xlab = "Hours Worked",
     col = "red"
)
Age
age <- describe(data$age,
                na.rm = FALSE,
                skew = TRUE)
print(age,
      signif = 3)
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 248 34.8 9.58     32    33.6 7.41  21  70    49 1.13     1.13 0.61
grubbs.test(data$age)
Grubbs test for one outlier
data: data$age
G = 3.67600, U = 0.94507, p-value = 0.02429
alternative hypothesis: highest value 70 is an outlier
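Both Grubbs statistics reported above can be sanity-checked by hand, since G is the distance of the most extreme value from the mean in standard deviation units. The sketch below uses the rounded means and standard deviations reported in this document, so the results should land close to, but not exactly on, the reported G = 4.5349 (hours worked) and G = 3.676 (age). The helper name `grubbs_g` is hypothetical and not part of the `outliers` package.

```r
# Grubbs statistic for a single suspected outlier:
# G = |extreme value - sample mean| / sample standard deviation
grubbs_g <- function(extreme, m, s) {
  abs(extreme - m) / s
}

# Inputs are the rounded summary statistics reported in this document
g_hours <- grubbs_g(extreme = 9,  m = 41.62, s = 7.19)
g_age   <- grubbs_g(extreme = 70, m = 34.79, s = 9.58)

round(c(hours = g_hours, age = g_age), 2)
```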
hist(data$age,
     main = "Figure 3",
     xlab = "Age",
     col = "blue"
)
1.4 Summary
- In terms of the distribution of our categorical variables, the sample is split fairly evenly by gender, with men holding a slight majority (52.4%) over women (47.6%). No observations identified as “Other” in this sample. Each level of education was reported by at least 30 subjects, although a plurality (40.1%) of the sample consisted of those with a bachelor’s degree. Individuals identifying as “White” made up a strong majority of the sample, as seen in Figure 1. Our two continuous variables, hours worked and age, are shown in Figure 2 and Figure 3. The number of hours worked was distributed roughly normally, with a pronounced peak near the mean of 41.62 hours. However, age was positively skewed around its mean of 34.79, with the maximum of 70 flagged as an outlier.
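The percentages quoted in the summary can be reproduced from the `table()` counts printed earlier. This is a minimal sketch that assumes gender code 1 corresponds to men and education code 5 to a bachelor’s degree, as the summary text implies; the vector names are hypothetical. Note that the education counts sum to 247 rather than 248, so the plurality figure is a share of the non-missing responses.

```r
# Counts copied from the table() output above
gender    <- c(men = 130, women = 118)   # assumed: code 1 = men, code 2 = women
education <- c(`2` = 30, `3` = 51, `4` = 37, `5` = 99, `6` = 30)

# prop.table() divides each count by the total of its vector
gender_pct    <- round(100 * prop.table(gender), 1)
education_pct <- round(100 * prop.table(education), 1)

gender_pct
education_pct
```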
2 Personality variables
2.1 Networking ability
shapiro.test(data$pskill_NA_mean)
Shapiro-Wilk normality test
data: data$pskill_NA_mean
W = 0.97524, p-value = 0.0002539
pskill_NA <- describe(data$pskill_NA_mean,
                      na.rm = FALSE,
                      skew = TRUE)
print(pskill_NA,
      signif = 3)
   vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 248 4.52 1.38   4.67    4.58 1.48   1   7     6 -0.36    -0.59 0.09
hist(data$pskill_NA_mean,
     prob = TRUE,
     main = "Figure 4",
     xlab = "Networking ability"
)
lines(density(data$pskill_NA_mean),
      col = 2,
      lwd = 3
)
- The self-reported networking ability scores in our sample are not normally distributed (Shapiro-Wilk test, p = .0002539), with a mean score of 4.52 and a standard deviation of 1.38. As shown in Figure 4, this variable has a negative skew.
2.3 Big 5 measure: Extraversion
big5ex <- describe(data$bigfiveEx,
                   na.rm = FALSE,
                   skew = TRUE)
print(big5ex,
      signif = 3)
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 248 2.91 1.07      3     2.9 1.48   1   5     4    0    -0.86 0.07
boxplot(bigfiveEx ~ education,
        data = data,
        main = "Figure 5",
        xlab = "Education",
        ylab = "Extraversion",
        col = "orange"
)
- With the median extraversion score (3.0) slightly larger than the mean (2.9), I was curious whether another variable could be influencing these scores. When scores are grouped by education level, Figure 5 shows that those with a postgraduate degree reported higher levels of extraversion than those at other education levels in this dataset.
2.4 Core self evaluations
shapiro.test(data$CSE_mean)
Shapiro-Wilk normality test
data: data$CSE_mean
W = 0.97662, p-value = 0.0004139
cse <- describe(data$CSE_mean,
                na.rm = FALSE,
                skew = TRUE)
print(cse,
      signif = 3)
   vars   n mean   sd median trimmed  mad  min max range  skew kurtosis   se
X1    1 248 3.71 0.78   3.75    3.75 0.86 1.25   5  3.75 -0.42    -0.22 0.05
hist(data$CSE_mean,
     prob = TRUE,
     main = "Figure 6",
     xlab = "Core self evaluations"
)
lines(density(data$CSE_mean),
      col = 2,
      lwd = 3
)
- This sample’s core self-evaluations were not normally distributed and were negatively skewed, as shown in Figure 6. With a mean of 3.71 and a standard deviation of .78, the spread of scores was low, with most scores clustered around the mean.
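One rough way to put the "low spread" claim in context is to compare each personality measure's standard deviation with its observed range, using the `describe()` output reported above. This is only a sketch under that assumption: the ratio is an informal yardstick rather than a standard test, and the data frame name `spread` and column `sd_over_range` are hypothetical.

```r
# Standard deviations and observed ranges copied from the describe() output
spread <- data.frame(
  measure = c("Networking ability", "Extraversion", "Core self-evaluations"),
  sd      = c(1.38, 1.07, 0.78),
  range   = c(6, 4, 3.75)
)

# Smaller ratios mean scores sit closer together relative to the observed range
spread$sd_over_range <- round(spread$sd / spread$range, 2)
spread
```

On this yardstick, core self-evaluations show the smallest relative spread of the three measures, which is consistent with the description above.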