hw2_psy5100

library(ggplot2)    # pie chart (Figure 1)
library(foreign)
library(epiDisplay)
library(outliers)   # grubbs.test()
library(psych)      # describe()
data <- read.csv("/Users/iancopeland/Documents/R/stats_1/datasets/hw2_data.csv")

1 Sample summary

1.2 Categorical variables

Gender

table(data$gender)

  1   2 
130 118 
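The percentages quoted in the summary can be computed from this table directly instead of by hand. The sketch below rebuilds a stand-in vector from the printed counts because hw2_data.csv is not reproduced here; with the real file, `data$gender` replaces `gender`. Like the summary, it assumes code 1 is men and code 2 is women.

```r
# Stand-in for data$gender, rebuilt from the printed counts (1 = 130, 2 = 118)
gender <- rep(c(1, 2), times = c(130, 118))

# Share of the sample in each gender code, as a percentage rounded to 1 decimal
gender_pct <- round(100 * prop.table(table(gender)), 1)
print(gender_pct)
```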

Education

table(data$education)

 2  3  4  5  6 
30 51 37 99 30 
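These counts sum to 247, one fewer than the 248 participants in the gender and ethnicity tables, because `table()` silently drops missing values. A minimal sketch, assuming the missing response is stored as `NA` and that code 5 is a bachelor's degree (as the summary states), shows how to surface the missing value and how the denominator changes the percentage:

```r
# Stand-in for data$education: the printed counts plus one NA (assumed missing)
education <- c(rep(2:6, times = c(30, 51, 37, 99, 30)), NA)

# useNA = "ifany" adds an <NA> column instead of silently dropping it
print(table(education, useNA = "ifany"))

# Percentages among valid responses (n = 247) versus the whole sample (n = 248)
pct_valid <- round(100 * prop.table(table(education)), 1)
pct_all   <- round(100 * table(education) / length(education), 1)
print(rbind(valid = pct_valid, all = pct_all))
```

The bachelor's-degree share is 40.1% of valid responses but 39.9% of the full sample, so it is worth stating which denominator a reported percentage uses.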

Ethnicity

table(data$ethnicity)

  1   2   3   4   6 
200  22   1  20   5 
df <- data.frame(
  Group = c("White", "Black", "American Indian or Alaska Native", "Asian", "Native Hawaiian or Pacific Islanders", "Other"),
  value = c(200, 22, 1, 20, 0, 5)
  )

ggplot(df, aes(x = "", y = value, fill = Group)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  labs(
    y = "",
    x = "",
    title = "Ethnicity",
    subtitle = "Figure 1"
    )
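The data frame above types the ethnicity counts in by hand, including a zero for code 5. A sketch of an alternative that derives the same counts from the data, assuming codes 1 to 6 map to the labels in the order used above, is to declare all six levels on a factor so that the unobserved category survives as a zero count:

```r
# Stand-in for data$ethnicity, rebuilt from the printed counts (code 5 never occurs)
ethnicity <- rep(c(1, 2, 3, 4, 6), times = c(200, 22, 1, 20, 5))

# Labels in code order 1-6, matching the hand-built data frame
eth_labels <- c("White", "Black", "American Indian or Alaska Native", "Asian",
                "Native Hawaiian or Pacific Islanders", "Other")

# Declaring every level keeps empty categories as zero counts
eth_counts <- table(factor(ethnicity, levels = 1:6, labels = eth_labels))
df <- data.frame(Group = names(eth_counts), value = as.vector(eth_counts))
print(df)
```

The resulting `df` has the same `Group` and `value` columns, so the `ggplot()` call can use it unchanged; with the real file, pass `data$ethnicity` to `factor()`.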

1.3 Continuous variables

Hours worked

hours <- describe(data$hours_work,
         na.rm=FALSE,
         skew=TRUE)

print(hours, 
      signif=3)
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 248 41.6 7.19     40    41.3 5.93   9  72    63 0.58     3.73 0.46
grubbs.test(data$hours_work)

    Grubbs test for one outlier

data:  data$hours_work
G = 4.5349, U = 0.9164, p-value = 0.0004546
alternative hypothesis: lowest value 9 is an outlier
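The Grubbs statistic G is the largest absolute deviation from the sample mean, expressed in standard deviations. A minimal sketch of that definition follows, plus a hand check using the rounded summary values (mean 41.62 from the summary text, SD 7.19 from `describe()`, lowest value 9), which reproduces the printed G up to rounding:

```r
# Grubbs statistic for a numeric vector: max |x_i - mean| / sd, ignoring NAs
grubbs_g <- function(x) {
  x <- x[!is.na(x)]
  max(abs(x - mean(x))) / sd(x)
}

# Hand check for hours worked from rounded summary statistics
g_hours_approx <- (41.62 - 9) / 7.19
print(round(g_hours_approx, 3))
```

Applied to the raw column, `grubbs_g(data$hours_work)` should agree with the G reported by `grubbs.test()`, since both use the sample standard deviation; the small gap in the hand check comes from the rounded SD.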
hist(data$hours_work, 
     main = "Figure 2", 
     xlab = "Hours Worked", 
     col = "red"
     )

Age

age <- describe(data$age,
         na.rm=FALSE,
         skew=TRUE)

print(age, 
      signif=3)
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 248 34.8 9.58     32    33.6 7.41  21  70    49 1.13     1.13 0.61
grubbs.test(data$age)

    Grubbs test for one outlier

data:  data$age
G = 3.67600, U = 0.94507, p-value = 0.02429
alternative hypothesis: highest value 70 is an outlier
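The p-value can also be read against a critical value. For the one-sided test that `grubbs.test()` runs by default, the standard cutoff is G_crit = ((n - 1) / sqrt(n)) * sqrt(t^2 / (n - 2 + t^2)), where t is the upper alpha/n quantile of a t distribution with n - 2 degrees of freedom. A sketch, assuming alpha = .05 and n = 248:

```r
# One-sided Grubbs critical value for sample size n at significance level alpha
grubbs_critical <- function(n, alpha = 0.05) {
  t <- qt(alpha / n, df = n - 2, lower.tail = FALSE)  # upper alpha/n quantile
  ((n - 1) / sqrt(n)) * sqrt(t^2 / (n - 2 + t^2))
}

g_crit <- grubbs_critical(n = 248)
g_age  <- 3.67600  # G printed by grubbs.test(data$age)
print(round(g_crit, 3))
print(g_age > g_crit)  # TRUE, consistent with p = .024 < .05
```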
hist(data$age, 
     main = "Figure 3", 
     xlab = "Age", 
     col = "blue"
     )

1.4 Summary

  • The sample was relatively evenly split by gender: men held a slight majority (52.4%) and women made up the remaining 47.6%. No participants identified their gender as “Other”. Each observed education level was reported by at least 30 participants, although a plurality (40.1% of the 247 participants who reported their education) held a bachelor’s degree. Participants identifying as “White” made up a strong majority of the sample (80.6%), as seen in Figure 1. Our two continuous variables, hours worked and age, are shown in Figures 2 and 3, respectively. Hours worked were distributed roughly normally, with a pronounced peak near the mean of 41.62 hours (median = 40, kurtosis = 3.73), although the Grubbs test flagged the lowest value (9 hours) as an outlier (p < .001). Age, by contrast, was positively skewed (skew = 1.13; mean = 34.79, median = 32), and the Grubbs test flagged the highest value (70) as an outlier (p = .024).

2 Personality variables

2.1 Networking ability

shapiro.test(data$pskill_NA_mean)

    Shapiro-Wilk normality test

data:  data$pskill_NA_mean
W = 0.97524, p-value = 0.0002539
pskill_NA <- describe(data$pskill_NA_mean,
         na.rm=FALSE,
         skew=TRUE)

print(pskill_NA, 
      signif=3)
   vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 248 4.52 1.38   4.67    4.58 1.48   1   7     6 -0.36    -0.59 0.09
hist(data$pskill_NA_mean, 
     prob = TRUE,
     main = "Figure 4",
     xlab = "Networking ability"
     )
lines(density(data$pskill_NA_mean), 
      col = 2, 
      lwd = 3
      )

  • The sample’s self-reported networking ability scores are not normally distributed (Shapiro-Wilk W = 0.975, p < .001), with a mean of 4.52 and a standard deviation of 1.38. As shown in Figure 4, the distribution has a mild negative skew (skew = -0.36).
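One way to gauge how strong this skew is relative to sampling noise is to divide it by the standard error of skewness, SE = sqrt(6n(n - 1) / ((n - 2)(n + 1)(n + 3))). Note that `describe()` reports its type 3 skewness by default, while this SE is derived for the adjusted (type 2) estimator, so with n = 248 the resulting z should be read as an approximation:

```r
# Standard error of skewness for a sample of size n
se_skew <- function(n) sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

# Approximate z for the networking-ability skew reported by describe()
z_network <- -0.36 / se_skew(248)
print(round(z_network, 2))
```

A |z| above roughly 1.96 suggests the skew is unlikely to be sampling noise at the .05 level; the same function applies to the core self-evaluations skew (-0.42).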

2.3 Big 5 measure: Extraversion

big5ex <- describe(data$bigfiveEx,
         na.rm=FALSE,
         skew=TRUE)

print(big5ex, 
      signif=3)
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 248 2.91 1.07      3     2.9 1.48   1   5     4    0    -0.86 0.07
boxplot(bigfiveEx~education,
        data=data, 
        main="Figure 5",
        xlab="Education", 
        ylab="Extraversion",
        col = "orange"
        )

  • Because the median extraversion score (3.0) was slightly larger than the mean (2.91), I was curious whether another variable might be related to these scores, even though the difference is small and the skew is essentially zero. Broken down by education level, Figure 5 suggests that participants with a postgraduate degree reported higher extraversion than those at other education levels in this dataset.
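The boxplot shows the group medians graphically; printing them alongside the group sizes makes the comparison easier to report. A minimal sketch using `aggregate()` on a small made-up data frame (the values are invented purely to show the call); with the real data, replace `toy` with `data`:

```r
# Hypothetical toy data with the same column names as hw2_data.csv
toy <- data.frame(
  education = c(2, 2, 3, 3, 6, 6),
  bigfiveEx = c(2.5, 3.0, 2.0, 3.5, 4.0, 4.5)
)

# Median and number of complete cases per education level
medians <- aggregate(bigfiveEx ~ education, data = toy, FUN = median)
sizes   <- aggregate(bigfiveEx ~ education, data = toy, FUN = length)
summary_by_edu <- data.frame(education = medians$education,
                             n = sizes$bigfiveEx,
                             median = medians$bigfiveEx)
print(summary_by_edu)
```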

2.4 Core self-evaluations

shapiro.test(data$CSE_mean)

    Shapiro-Wilk normality test

data:  data$CSE_mean
W = 0.97662, p-value = 0.0004139
cse <- describe(data$CSE_mean,
         na.rm=FALSE,
         skew=TRUE)

print(cse, 
      signif=3)
   vars   n mean   sd median trimmed  mad  min max range  skew kurtosis   se
X1    1 248 3.71 0.78   3.75    3.75 0.86 1.25   5  3.75 -0.42    -0.22 0.05
hist(data$CSE_mean, 
     prob = TRUE,
     main = "Figure 6",
     xlab = "Core self evaluations"
     )
lines(density(data$CSE_mean), 
      col = 2, 
      lwd = 3
      )

  • Core self-evaluation scores in this sample were not normally distributed (Shapiro-Wilk W = 0.977, p < .001) and were negatively skewed (skew = -0.42), as shown in Figure 6. With a mean of 3.71 and a standard deviation of 0.78, and observed scores ranging from 1.25 to 5, the scores were fairly tightly clustered around the mean.
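Whether the spread is "low" is easier to judge with a reference point. Using the `describe()` values above, the sketch below computes the variance, the coefficient of variation (SD divided by the mean), and the band one SD either side of the mean:

```r
cse_mean <- 3.71  # mean from describe(data$CSE_mean)
cse_sd   <- 0.78  # standard deviation from describe(data$CSE_mean)

cse_var  <- cse_sd^2                      # variance, in squared scale units
cse_cv   <- cse_sd / cse_mean             # SD relative to the mean
cse_band <- cse_mean + c(-1, 1) * cse_sd  # mean +/- 1 SD
print(round(c(variance = cse_var, cv = cse_cv), 3))
print(cse_band)
```

Here the SD is about 21% of the mean, and the band of one SD either side of the mean runs from 2.93 to 4.49, compared with an observed range of 1.25 to 5.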