Variables of interest

  • cocaine_month: Did this respondent report using cocaine in the past 30 days? (Yes/No)
  • SelectiveLeave: Number of times in the past 30 days that the respondent skipped work because they “just didn't want to be there”

Load Packages

Load the packages necessary to (1) import, (2) manipulate, and (3) visualize data.

library(readr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.5

Import Data

Import your data into R.

workabsenteeism <- read_csv("C:/Users/12055/Downloads/SkillsDrill1Data.csv")
## Rows: 32040 Columns: 47
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr  (3): marij_month, cocaine_month, pharmamonth
## dbl (44): marij_ever, marij_year, cocaine_ever, cocaine_year, crack_ever, cr...
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
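
The column specification message above can be silenced, as the message itself suggests. The following is a minimal sketch, assuming a small stand-in CSV written to a temporary file; the object name `toy` and the two rows of data are hypothetical and are not taken from the real file.

```r
library(readr)

# Hypothetical stand-in data; the real file has 47 columns.
tmp <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    cocaine_month = c("Did not use Cocaine Past 30 days",
                      "Used Cocaine Past 30 days"),
    SelectiveLeave = c(0, 2)
  ),
  tmp
)

# show_col_types = FALSE suppresses the column specification message.
toy <- read_csv(tmp, show_col_types = FALSE)

# spec() retrieves the full column specification on demand.
spec(toy)
```

The same `show_col_types = FALSE` argument could be added to the `read_csv()` call on the real file.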

Preview Data

Preview the first 6 rows of your data.

head(workabsenteeism)
## # A tibble: 6 x 47
##   marij_ever marij_month      marij_year cocaine_ever cocaine_month cocaine_year
##        <dbl> <chr>                 <dbl>        <dbl> <chr>                <dbl>
## 1          1 Used Marijuana ~          1            1 Did not use ~            0
## 2          0 Did not use Mar~          0            0 Did not use ~            0
## 3          1 Did not use Mar~          0            0 Did not use ~            0
## 4          0 Did not use Mar~          0            0 Did not use ~            0
## 5          0 Did not use Mar~          0            0 Did not use ~            0
## 6          0 Did not use Mar~          0            0 Did not use ~            0
## # ... with 41 more variables: crack_ever <dbl>, crack_month <dbl>,
## #   crack_year <dbl>, heroin_ever <dbl>, heroin_month <dbl>, heroin_year <dbl>,
## #   hallucinogen_ever <dbl>, hallucinogen_month <dbl>, hallucinogen_year <dbl>,
## #   inhalant_ever <dbl>, inhalant_month <dbl>, inhalant_year <dbl>,
## #   meth_ever <dbl>, meth_month <dbl>, meth_year <dbl>, painrelieve_ever <dbl>,
## #   painrelieve_month <dbl>, painrelieve_year <dbl>, tranq_ever <dbl>,
## #   tranq_month <dbl>, tranq_year <dbl>, stimulant_ever <dbl>, ...
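
Because the data has 47 columns, head() truncates most of them. One alternative is glimpse() from dplyr, which prints one line per column. The sketch below uses a hypothetical three-column stand-in named `toy` rather than the real data.

```r
library(dplyr)

# Hypothetical stand-in for a few of the 47 columns.
toy <- data.frame(
  marij_ever = c(1, 0, 1),
  marij_month = c("Used Marijuana Past 30 days",
                  "Did not use Marijuana Past 30 days",
                  "Did not use Marijuana Past 30 days"),
  SelectiveLeave = c(0, 1, 0)
)

# glimpse() lists every column with its type and first few values.
glimpse(toy)
```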

Avg Days Skipped Work

  • Select the cocaine_month and SelectiveLeave variables from the data
  • Rename the SelectiveLeave variable to DaysSkippedWork
  • Filter to keep only those observations where DaysSkippedWork is less than 31
  • Calculate the mean of DaysSkippedWork

workabsenteeism%>%
  select(cocaine_month,SelectiveLeave)%>%
  rename(DaysSkippedWork = SelectiveLeave)%>%
  filter(DaysSkippedWork < 31)%>%
  summarize(AvgDaysSkipped = mean(DaysSkippedWork))
## # A tibble: 1 x 1
##   AvgDaysSkipped
##            <dbl>
## 1          0.327
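
The filter matters because a value of 31 or more cannot be a real count of days in a 30-day window. The sketch below assumes such values are non-response codes (99 is a hypothetical example, not a documented code for this data) and shows how a single one distorts the mean.

```r
library(dplyr)

# Hypothetical data: four real day counts and one assumed code of 99.
toy <- data.frame(SelectiveLeave = c(0, 2, 1, 99, 0))

# Mean with the out-of-range value left in.
unfiltered <- toy %>%
  summarize(AvgDaysSkipped = mean(SelectiveLeave))

# Mean after keeping only plausible day counts.
filtered <- toy %>%
  filter(SelectiveLeave < 31) %>%
  summarize(AvgDaysSkipped = mean(SelectiveLeave))

unfiltered$AvgDaysSkipped  # 20.4
filtered$AvgDaysSkipped    # 0.75
```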

Avg Days Skipped Work by Cocaine Use

  • Select the cocaine_month and SelectiveLeave variables from the data
  • Rename the SelectiveLeave variable to DaysSkippedWork
  • Filter to keep only those observations where DaysSkippedWork is less than 31
  • Calculate the mean of DaysSkippedWork, by cocaine use (cocaine_month)

workabsenteeism%>%
  select(cocaine_month,SelectiveLeave)%>%
  rename(DaysSkippedWork = SelectiveLeave)%>%
  filter(DaysSkippedWork < 31)%>%
  group_by(cocaine_month)%>%
  summarize(AvgDaysSkipped = mean(DaysSkippedWork))
## # A tibble: 2 x 2
##   cocaine_month                    AvgDaysSkipped
##   <chr>                                     <dbl>
## 1 Did not use Cocaine Past 30 days          0.322
## 2 Used Cocaine Past 30 days                 0.85
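
A group mean is easier to judge alongside the number of respondents behind it. As a sketch, n() can be added to the same summarize() call. The data below is a hypothetical stand-in, and the Respondents column is an addition that is not part of the original exercise.

```r
library(dplyr)

# Hypothetical stand-in data with unequal group sizes.
toy <- data.frame(
  cocaine_month = c("Did not use Cocaine Past 30 days",
                    "Did not use Cocaine Past 30 days",
                    "Did not use Cocaine Past 30 days",
                    "Used Cocaine Past 30 days"),
  DaysSkippedWork = c(0, 1, 0, 2)
)

# Report the group mean together with the group size.
by_use <- toy %>%
  group_by(cocaine_month) %>%
  summarize(
    AvgDaysSkipped = mean(DaysSkippedWork),
    Respondents = n()
  )

by_use
```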

Interpretation

Respondents who had used cocaine within the past 30 days skipped work an average of 0.85 days, more than double the 0.322-day average among those who had not.
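
The size of the gap can be checked directly from the two group means printed above. The variable names here are illustrative, and the inputs are the rounded values from the output, so the ratio is approximate.

```r
# Group means as printed in the summary table (rounded values).
avg_nonusers <- 0.322
avg_users <- 0.85

# Ratio of average days skipped, users relative to non-users.
ratio <- avg_users / avg_nonusers
round(ratio, 2)  # 2.64
```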

Visualization

Copy the code from step 5 and paste it into this code chunk. Extend the code to create a visualization with the following features:

  • a column (col) chart
  • cocaine use on the x-axis
  • average days of selective work absence on the y-axis
  • the fill of each column varying according to the average days of selective work absence

workabsenteeism%>%
  select(cocaine_month,SelectiveLeave)%>%
  rename(DaysSkippedWork = SelectiveLeave)%>%
  filter(DaysSkippedWork < 31)%>%
  group_by(cocaine_month)%>%
  summarize(AvgDaysSkipped = mean(DaysSkippedWork))%>%
  ggplot()+
  geom_col(aes(x=cocaine_month, y=AvgDaysSkipped, fill=AvgDaysSkipped))
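
The default axis titles are the raw variable names. As a sketch, labs() can supply readable titles. The data frame below is rebuilt from the rounded group means printed earlier so the example runs on its own, and the label text is a suggestion rather than part of the assignment.

```r
library(ggplot2)

# Summary table rebuilt from the printed (rounded) group means.
summary_tbl <- data.frame(
  cocaine_month = c("Did not use Cocaine Past 30 days",
                    "Used Cocaine Past 30 days"),
  AvgDaysSkipped = c(0.322, 0.85)
)

# Same column chart, with human-readable axis and legend titles.
p <- ggplot(summary_tbl) +
  geom_col(aes(x = cocaine_month, y = AvgDaysSkipped, fill = AvgDaysSkipped)) +
  labs(
    x = "Cocaine use (past 30 days)",
    y = "Average days skipped",
    fill = "Average days skipped"
  )

p
```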

Compare average absenteeism between marijuana users and non-users (marij_month). Be sure to filter to keep only those observations where DaysSkippedWork is less than 31.

workabsenteeism%>%
  select(marij_month,SelectiveLeave)%>%
  rename(DaysSkippedWork = SelectiveLeave)%>%
  filter(DaysSkippedWork < 31)%>%
  group_by(marij_month)%>%
  summarize(AvgDaysSkipped = mean(DaysSkippedWork))
## # A tibble: 2 x 2
##   marij_month                        AvgDaysSkipped
##   <chr>                                       <dbl>
## 1 Did not use Marijuana Past 30 days          0.290
## 2 Used Marijuana Past 30 days                 0.591
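
The marijuana analysis repeats the cocaine pipeline with only the grouping variable changed. One way to avoid the duplication is a small helper that takes the grouping column as an argument. This is a sketch: avg_skipped_by is a hypothetical name, the data is a stand-in, and the 99 is an assumed out-of-range value.

```r
library(dplyr)

# Hypothetical helper: average days skipped, grouped by any column.
# {{ }} passes the unquoted column name through to dplyr verbs.
avg_skipped_by <- function(data, group_var) {
  data %>%
    select({{ group_var }}, SelectiveLeave) %>%
    rename(DaysSkippedWork = SelectiveLeave) %>%
    filter(DaysSkippedWork < 31) %>%
    group_by({{ group_var }}) %>%
    summarize(AvgDaysSkipped = mean(DaysSkippedWork))
}

# Hypothetical stand-in data; the 99 is dropped by the filter.
toy <- data.frame(
  marij_month = c("Did not use Marijuana Past 30 days",
                  "Used Marijuana Past 30 days",
                  "Used Marijuana Past 30 days",
                  "Did not use Marijuana Past 30 days"),
  SelectiveLeave = c(0, 1, 3, 99)
)

result <- avg_skipped_by(toy, marij_month)
result
```

With the real data, the same helper could be called as avg_skipped_by(workabsenteeism, marij_month) or avg_skipped_by(workabsenteeism, cocaine_month).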

Produce a column chart to visualize the average absenteeism for marijuana users vs. non-users.

workabsenteeism%>%
  select(marij_month,SelectiveLeave)%>%
  rename(DaysSkippedWork = SelectiveLeave)%>%
  filter(DaysSkippedWork < 31)%>%
  group_by(marij_month)%>%
  summarize(AvgDaysSkipped = mean(DaysSkippedWork))%>%
  ggplot()+
  geom_col(aes(x=marij_month, y=AvgDaysSkipped, fill=AvgDaysSkipped))

Interpret the result of your analysis of the relationship between marijuana use & absenteeism.

Based on results from the National Survey on Drug Use and Health, respondents who used marijuana in the past 30 days skipped work an average of 0.591 days, roughly double the 0.290-day average among non-users.