Nazija Akter

Introduction

In this skills drill, you will practice the programming skills you have learned so far by investigating differences in work absenteeism between cocaine users and non-cocaine users.

The data you are analyzing come from the National Survey of Drug Use and Health (NSDUH), a survey conducted annually since 1971 by the Substance Abuse and Mental Health Services Administration. You will be studying the following two variables:

  • cocaine_month: Did this respondent report using cocaine in the past 30 days? (Yes/No)
  • SelectiveLeave: Number of times in the past 30 days that the respondent skipped work because they “just didn't want to be there”

Step 1: Load Packages

Load the packages necessary to (1) import, (2) manipulate, and (3) visualize data.

library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Step 2: Import Data

Import your data into R.

data <- read_csv("/Users/Nazija/Downloads/SkillsDrill1Data.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   marij_month = col_character(),
##   cocaine_month = col_character(),
##   pharmamonth = col_character()
## )
## See spec(...) for full column specifications.
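
The column specification message above lists which columns readr guessed as text. As a minimal sketch, the same specification can be passed explicitly through `col_types` so the import does not depend on guessing. The three-column file written here is a hypothetical stand-in for the real CSV, not the assignment data.

```r
library(readr)

# Hypothetical two-row file standing in for the real CSV (assumption:
# only three of the 47 columns are reproduced here).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "marij_ever,cocaine_month,SelectiveLeave",
  "1,Did not use Cocaine Past 30 days,0",
  "0,Used Cocaine Past 30 days,2"
), tmp)

# Declare the types up front: numeric by default, text for the label column.
toy <- read_csv(
  tmp,
  col_types = cols(
    .default = col_double(),
    cocaine_month = col_character()
  )
)
```

With the types declared, readr has no guesses to report, and a column that arrives in an unexpected format surfaces as a parsing problem rather than a silent type change.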

Step 3: Preview Data

Preview the first 6 rows of your data.

head(data)
## # A tibble: 6 x 47
##   marij_ever marij_month marij_year cocaine_ever cocaine_month cocaine_year
##        <dbl> <chr>            <dbl>        <dbl> <chr>                <dbl>
## 1          1 Used Marij…          1            1 Did not use …            0
## 2          0 Did not us…          0            0 Did not use …            0
## 3          1 Did not us…          0            0 Did not use …            0
## 4          0 Did not us…          0            0 Did not use …            0
## 5          0 Did not us…          0            0 Did not use …            0
## 6          0 Did not us…          0            0 Did not use …            0
## # … with 41 more variables: crack_ever <dbl>, crack_month <dbl>,
## #   crack_year <dbl>, heroin_ever <dbl>, heroin_month <dbl>, heroin_year <dbl>,
## #   hallucinogen_ever <dbl>, hallucinogen_month <dbl>, hallucinogen_year <dbl>,
## #   inhalant_ever <dbl>, inhalant_month <dbl>, inhalant_year <dbl>,
## #   meth_ever <dbl>, meth_month <dbl>, meth_year <dbl>, painrelieve_ever <dbl>,
## #   painrelieve_month <dbl>, painrelieve_year <dbl>, tranq_ever <dbl>,
## #   tranq_month <dbl>, tranq_year <dbl>, stimulant_ever <dbl>,
## #   stimulant_month <dbl>, stimulant_year <dbl>, sedative_ever <dbl>,
## #   sedative_month <dbl>, sedative_year <dbl>, anydrugever <dbl>,
## #   pharmamonth <chr>, nonpharmamonth <dbl>, nonpharmamonth_nomj <dbl>,
## #   anydrugmonth <dbl>, anydrugyear <dbl>, anydrugever_nomj <dbl>,
## #   anydrugmonth_nomj <dbl>, anydrugyear_nomj <dbl>, countofdrugs_ever <dbl>,
## #   countofdrugs_month <dbl>, countofdrugs_year <dbl>, SelectiveLeave <dbl>,
## #   SkipSick <dbl>

Step 4: Avg Days Skipped Work

Starting from the data:

  • Select the cocaine_month and SelectiveLeave variables.
  • Rename the SelectiveLeave variable to DaysSkippedWork.
  • Filter to keep only those observations where DaysSkippedWork is less than 31.
  • Calculate the mean of DaysSkippedWork.

data%>%
  select(cocaine_month, SelectiveLeave)%>%
  rename(DaysSkippedWork = SelectiveLeave)%>%
  filter(DaysSkippedWork < 31)%>%
  summarize(avgDaysSkipped = mean(DaysSkippedWork))
## # A tibble: 1 x 1
##   avgDaysSkipped
##            <dbl>
## 1          0.327
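
The filter step matters because a 30-day count cannot exceed 30, so larger values are presumably special codes rather than real day counts. The sketch below illustrates the effect on made-up data; the value 99 is an assumed placeholder code, not one taken from the NSDUH codebook.

```r
library(dplyr)

# Hypothetical toy data: 99 stands in for a non-response code (an assumption).
toy <- tibble(SelectiveLeave = c(0, 0, 1, 2, 99))

# Mean with the out-of-range code left in
unfiltered_mean <- toy %>%
  summarize(avgDaysSkipped = mean(SelectiveLeave)) %>%
  pull(avgDaysSkipped)

# Mean after keeping only plausible day counts
filtered_mean <- toy %>%
  filter(SelectiveLeave < 31) %>%
  summarize(avgDaysSkipped = mean(SelectiveLeave)) %>%
  pull(avgDaysSkipped)
```

In this toy example a single coded value moves the mean from 0.75 to 20.4, which is why the filter comes before the summary.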

Step 5: Avg Days Skipped Work by Cocaine Use

Starting from the data:

  • Select the cocaine_month and SelectiveLeave variables.
  • Rename the SelectiveLeave variable to DaysSkippedWork.
  • Filter to keep only those observations where DaysSkippedWork is less than 31.
  • Calculate the mean of DaysSkippedWork by cocaine use (cocaine_month).

data%>%
  select(cocaine_month, SelectiveLeave)%>%
  rename(DaysSkippedWork = SelectiveLeave)%>%
  filter(DaysSkippedWork < 31)%>%
  group_by(cocaine_month)%>%
  summarize(avgDaysSkipped = mean(DaysSkippedWork))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 2
##   cocaine_month                    avgDaysSkipped
##   <chr>                                     <dbl>
## 1 Did not use Cocaine Past 30 days          0.322
## 2 Used Cocaine Past 30 days                 0.85
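
A group mean is easier to judge when the group size is shown next to it. The following sketch, on hypothetical toy data, adds a count with `n()` and sets `.groups = "drop"`, which also suppresses the `summarise()` ungrouping message seen above.

```r
library(dplyr)

# Hypothetical toy data with shortened labels (not the survey values).
toy <- tibble(
  cocaine_month = c("Did not use", "Did not use", "Did not use", "Used"),
  DaysSkippedWork = c(0, 1, 2, 3)
)

by_use <- toy %>%
  group_by(cocaine_month) %>%
  summarize(
    avgDaysSkipped = mean(DaysSkippedWork), # group mean
    n = n(),                                # respondents per group
    .groups = "drop"                        # return an ungrouped tibble
  )
```

Applied to the real data, the `n` column would show how many respondents each average is based on.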

Step 6: Interpretation

Respondents who used cocaine in the past 30 days skipped more days of work on average (0.85 days) than respondents who did not use cocaine in the past 30 days (0.322 days).

Step 7: Visualization

Copy the code from Step 5 and paste it into this code chunk. Extend the code to create a visualization with:

  • a column (col) chart
  • cocaine use on the x-axis
  • average days of selective work absence on the y-axis
  • the fill of the columns varying according to the average days of selective work absence

data%>%
  select(cocaine_month, SelectiveLeave)%>%
  rename(DaysSkippedWork = SelectiveLeave)%>%
  filter(DaysSkippedWork < 31)%>%
  group_by(cocaine_month)%>%
  summarize(avgDaysSkipped = mean(DaysSkippedWork))%>%
  ggplot()+
  geom_col(aes(x = cocaine_month, y = avgDaysSkipped, fill = avgDaysSkipped))
## `summarise()` ungrouping output (override with `.groups` argument)
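
The chart's default axis and legend titles are the raw column names. As an optional sketch, `labs()` can supply readable titles. The two-row tibble below reuses the group means reported in Step 5 so the example runs on its own; the label wording is only a suggestion.

```r
library(dplyr)
library(ggplot2)

# Summary values copied from the Step 5 output above.
step5_summary <- tibble(
  cocaine_month = c("Did not use Cocaine Past 30 days",
                    "Used Cocaine Past 30 days"),
  avgDaysSkipped = c(0.322, 0.85)
)

# Same column chart as the assignment, with human-readable titles added.
p <- ggplot(step5_summary) +
  geom_col(aes(x = cocaine_month, y = avgDaysSkipped, fill = avgDaysSkipped)) +
  labs(
    x = "Cocaine use (past 30 days)",
    y = "Average days skipped",
    fill = "Average days skipped"
  )
```

Printing `p` draws the chart; the same `labs()` call can be appended to the full pipeline above.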

Extra Credit:

(1pt) In the following code chunk, produce the table which compares average absenteeism between marijuana users & non-users (marij_month). Be sure to filter to keep only those observations where DaysSkippedWork is less than 31.

data%>%
  select(marij_month, SelectiveLeave)%>%
  rename(DaysSkippedWork = SelectiveLeave)%>%
  filter(DaysSkippedWork < 31)%>%
  group_by(marij_month)%>%
  summarize(avgDaysSkipped = mean(DaysSkippedWork))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 2
##   marij_month                        avgDaysSkipped
##   <chr>                                       <dbl>
## 1 Did not use Marijuana Past 30 days          0.290
## 2 Used Marijuana Past 30 days                 0.591
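
This table uses the same pipeline as Step 5 with a different grouping column. One way to avoid repeating it is a small helper that takes the grouping column as an argument. The function name `avg_skipped_by` and the toy data are hypothetical and not part of the assignment; `{{ }}` is dplyr's syntax for passing a column name through a function.

```r
library(dplyr)

# Hypothetical helper: average days skipped, grouped by the column
# passed as `group_var`.
avg_skipped_by <- function(df, group_var) {
  df %>%
    rename(DaysSkippedWork = SelectiveLeave) %>%
    filter(DaysSkippedWork < 31) %>%
    group_by({{ group_var }}) %>%
    summarize(avgDaysSkipped = mean(DaysSkippedWork), .groups = "drop")
}

# Hypothetical toy data; 99 is an assumed out-of-range code.
toy <- tibble(
  marij_month = c("Used", "Used", "Did not use", "Did not use"),
  cocaine_month = c("Did not use", "Used", "Did not use", "Did not use"),
  SelectiveLeave = c(2, 4, 0, 99)
)

by_marij <- avg_skipped_by(toy, marij_month)
by_cocaine <- avg_skipped_by(toy, cocaine_month)
```

The same call works for either drug variable, so the cocaine and marijuana tables would differ only in the column name passed in.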

(0.5pt) In the following code chunk, produce a column chart to visualize the average absenteeism for marijuana users vs. non-users.

data%>%
  select(marij_month, SelectiveLeave)%>%
  rename(DaysSkippedWork = SelectiveLeave)%>%
  filter(DaysSkippedWork < 31)%>%
  group_by(marij_month)%>%
  summarize(avgDaysSkipped = mean(DaysSkippedWork))%>%
  ggplot()+
  geom_col(aes(x = marij_month, y = avgDaysSkipped, fill = avgDaysSkipped))
## `summarise()` ungrouping output (override with `.groups` argument)

(0.5pt) Interpret the result of your analysis of the relationship between marijuana use & absenteeism.

Respondents who used marijuana in the past 30 days skipped more days of work on average (0.591 days) than respondents who did not use marijuana in the past 30 days (0.290 days).

Step 8: Post to RPubs

Post this to RPubs and post the RPubs URL on Blackboard.