Load the packages necessary to (1) import, (2) manipulate, and (3) visualize data.
library(readr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.5
Import your data into R
workabsenteeism <- read_csv("C:/Users/12055/Downloads/SkillsDrill1Data.csv")
## Rows: 32040 Columns: 47
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (3): marij_month, cocaine_month, pharmamonth
## dbl (44): marij_ever, marij_year, cocaine_ever, cocaine_year, crack_ever, cr...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
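The message above names two ways to quiet the column specification output. The following is a minimal sketch of both, assuming a small stand-in CSV written to a temporary file; the file, its rows, and the object names `tmp`, `quiet`, and `typed` are hypothetical, and only the two column names are borrowed from the real data.

```r
library(readr)

# Hypothetical stand-in for the real file: two of its columns, made-up rows.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "marij_ever,marij_month",
  "1,Used Marijuana Past 30 days",
  "0,Did not use Marijuana Past 30 days"
), tmp)

# Option 1: keep readr's type guessing but silence the message.
quiet <- read_csv(tmp, show_col_types = FALSE)

# Option 2: declare the column types explicitly, which also silences the message.
typed <- read_csv(tmp, col_types = cols(
  marij_ever  = col_double(),
  marij_month = col_character()
))
```

Declaring the types is more typing, but it makes the import fail loudly if a column ever arrives in an unexpected format.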
Preview the first 6 rows of your data
head(workabsenteeism)
## # A tibble: 6 x 47
##   marij_ever marij_month      marij_year cocaine_ever cocaine_month cocaine_year
##        <dbl> <chr>                 <dbl>        <dbl> <chr>                <dbl>
## 1          1 Used Marijuana ~          1            1 Did not use ~            0
## 2          0 Did not use Mar~          0            0 Did not use ~            0
## 3          1 Did not use Mar~          0            0 Did not use ~            0
## 4          0 Did not use Mar~          0            0 Did not use ~            0
## 5          0 Did not use Mar~          0            0 Did not use ~            0
## 6          0 Did not use Mar~          0            0 Did not use ~            0
## # ... with 41 more variables: crack_ever <dbl>, crack_month <dbl>,
## # crack_year <dbl>, heroin_ever <dbl>, heroin_month <dbl>, heroin_year <dbl>,
## # hallucinogen_ever <dbl>, hallucinogen_month <dbl>, hallucinogen_year <dbl>,
## # inhalant_ever <dbl>, inhalant_month <dbl>, inhalant_year <dbl>,
## # meth_ever <dbl>, meth_month <dbl>, meth_year <dbl>, painrelieve_ever <dbl>,
## # painrelieve_month <dbl>, painrelieve_year <dbl>, tranq_ever <dbl>,
## # tranq_month <dbl>, tranq_year <dbl>, stimulant_ever <dbl>, ...
Select the cocaine_month and SelectiveLeave variables from the data. Rename the SelectiveLeave variable to DaysSkippedWork. Filter to keep only those observations where DaysSkippedWork is less than 31. Calculate the mean of DaysSkippedWork.
workabsenteeism %>%
  select(cocaine_month, SelectiveLeave) %>%
  rename(DaysSkippedWork = SelectiveLeave) %>%
  filter(DaysSkippedWork < 31) %>%
  summarize(AvgDaysSkipped = mean(DaysSkippedWork))
## # A tibble: 1 x 1
## AvgDaysSkipped
## <dbl>
## 1 0.327
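One detail of the pipeline above is worth seeing on a small example: `filter()` keeps only rows where the condition is `TRUE`, so it drops missing values as well as values of 31 or more, and `mean()` then needs no `na.rm = TRUE`. The sketch below uses a made-up `toy` table; the values 99 and `NA` are assumptions standing in for entries that are presumably not real day counts.

```r
library(dplyr)

# Made-up values; 99 and NA stand in for entries that are not real day counts.
toy <- tibble(SelectiveLeave = c(0, 2, 99, NA, 1))

toy_avg <- toy %>%
  rename(DaysSkippedWork = SelectiveLeave) %>%
  filter(DaysSkippedWork < 31) %>%   # drops the 99 row and the NA row
  summarize(n = n(), AvgDaysSkipped = mean(DaysSkippedWork))

toy_avg
```

Three rows survive the filter (0, 2, and 1), so the average is 1.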
Select the cocaine_month and SelectiveLeave variables from the data. Rename the SelectiveLeave variable to DaysSkippedWork. Filter to keep only those observations where DaysSkippedWork is less than 31. Calculate the mean of DaysSkippedWork by cocaine use (cocaine_month).
workabsenteeism %>%
  select(cocaine_month, SelectiveLeave) %>%
  rename(DaysSkippedWork = SelectiveLeave) %>%
  filter(DaysSkippedWork < 31) %>%
  group_by(cocaine_month) %>%
  summarize(AvgDaysSkipped = mean(DaysSkippedWork))
## # A tibble: 2 x 2
## cocaine_month AvgDaysSkipped
## <chr> <dbl>
## 1 Did not use Cocaine Past 30 days 0.322
## 2 Used Cocaine Past 30 days 0.85
Individuals who had used cocaine within the past 30 days skipped an average of 0.85 days of work, more than double the 0.322-day average among those who had not.
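The size of that gap can be checked with one line of arithmetic. The sketch below copies the two group means by hand from the output above; the variable names are illustrative only.

```r
# Group means copied from the grouped summary output.
avg_used     <- 0.85
avg_not_used <- 0.322

# How many times higher is the average among past-30-day cocaine users?
ratio <- avg_used / avg_not_used
round(ratio, 2)
```

The ratio comes out at about 2.6, which supports the "more than double" wording.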
Copy the code from step 5 and paste it into this code chunk. Add to the code to create a visualization which shows the average days skipped by cocaine use.
workabsenteeism %>%
  select(cocaine_month, SelectiveLeave) %>%
  rename(DaysSkippedWork = SelectiveLeave) %>%
  filter(DaysSkippedWork < 31) %>%
  group_by(cocaine_month) %>%
  summarize(AvgDaysSkipped = mean(DaysSkippedWork)) %>%
  ggplot() +
  geom_col(aes(x = cocaine_month, y = AvgDaysSkipped, fill = AvgDaysSkipped))
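The chart above works as written. As an optional variation, the sketch below adds axis labels and colors the bars by group rather than by bar height. It is not part of the assignment: the small `cocaine_avgs` table is typed in by hand from the grouped summary output so that the example runs on its own, and the label text is an assumption.

```r
library(ggplot2)

# The two group means from the grouped summary, entered by hand.
cocaine_avgs <- data.frame(
  cocaine_month  = c("Did not use Cocaine Past 30 days",
                     "Used Cocaine Past 30 days"),
  AvgDaysSkipped = c(0.322, 0.85)
)

p <- ggplot(cocaine_avgs) +
  # One color per group; the legend would only repeat the x-axis labels.
  geom_col(aes(x = cocaine_month, y = AvgDaysSkipped, fill = cocaine_month),
           show.legend = FALSE) +
  labs(x = "Cocaine use", y = "Average days skipped")

p
```

Mapping `fill` to the grouping variable gives each bar a distinct color, whereas mapping it to `AvgDaysSkipped` encodes the bar height a second time.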
Compare average absenteeism between marijuana users & non-users (marij_month). Be sure to filter to keep only those observations where DaysSkippedWork is less than 31.
workabsenteeism %>%
  select(marij_month, SelectiveLeave) %>%
  rename(DaysSkippedWork = SelectiveLeave) %>%
  filter(DaysSkippedWork < 31) %>%
  group_by(marij_month) %>%
  summarize(AvgDaysSkipped = mean(DaysSkippedWork))
## # A tibble: 2 x 2
## marij_month AvgDaysSkipped
## <chr> <dbl>
## 1 Did not use Marijuana Past 30 days 0.290
## 2 Used Marijuana Past 30 days 0.591
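The marijuana pipeline is the cocaine pipeline with one column name swapped, so the shared steps could be wrapped in a function. The sketch below is one possible way to do that, assuming dplyr 1.0 or later for the `{{ }}` syntax; `avg_days_skipped_by()` is a hypothetical helper, not part of the assignment, and `toy` is made-up data.

```r
library(dplyr)

# Hypothetical helper: average days skipped, grouped by any one column.
avg_days_skipped_by <- function(data, group_var) {
  data %>%
    select({{ group_var }}, SelectiveLeave) %>%
    rename(DaysSkippedWork = SelectiveLeave) %>%
    filter(DaysSkippedWork < 31) %>%
    group_by({{ group_var }}) %>%
    summarize(AvgDaysSkipped = mean(DaysSkippedWork))
}

# Made-up data to show the call; 99 is removed by the filter.
toy <- tibble(
  marij_month    = c("Used", "Used", "Did not use", "Did not use"),
  SelectiveLeave = c(2, 4, 0, 99)
)

toy_result <- avg_days_skipped_by(toy, marij_month)
toy_result
```

On the real data, the equivalent call would be `avg_days_skipped_by(workabsenteeism, marij_month)`, and the same function would accept `cocaine_month`.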
Produce a column chart to visualize the average absenteeism for marijuana users vs. non-users
workabsenteeism %>%
  select(marij_month, SelectiveLeave) %>%
  rename(DaysSkippedWork = SelectiveLeave) %>%
  filter(DaysSkippedWork < 31) %>%
  group_by(marij_month) %>%
  summarize(AvgDaysSkipped = mean(DaysSkippedWork)) %>%
  ggplot() +
  geom_col(aes(x = marij_month, y = AvgDaysSkipped, fill = AvgDaysSkipped))
Interpret the result of your analysis of the relationship between marijuana use & absenteeism.
Based on results from the National Survey on Drug Use and Health, individuals who used marijuana in the past 30 days skipped about twice as many days of work on average as those who did not (0.591 vs. 0.290 days).