I will be analyzing data from the provided National Survey of Drug Use and Health data set. I hypothesize that there is a statistically significant difference in the mean risk of mental illness between non-meth users and meth users who responded to the survey. The two variables that I will be analyzing are meth use (meth_month, the categorical independent variable) and risk of mental illness (k6score, the numeric dependent variable).
library(readr)
library(dplyr)
library(ggplot2)
dandh<-read.csv("C:/Users/12055/Downloads/Skills drill 3/SOC333_NSDUH_2016.csv")
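# Keep the two variables of interest, then drop respondents without a Yes/No answer or a k6score.
DandH <- dandh %>%
  select(k6score, meth_month) %>%
  filter(meth_month %in% c("Yes", "No")) %>%
  filter(!is.na(k6score))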
DandH%>%
group_by(meth_month)%>%
summarize(k6score=mean(k6score, na.rm=TRUE))
## # A tibble: 2 x 2
## meth_month k6score
## <chr> <dbl>
## 1 No 4.40
## 2 Yes 11.0
The group means of k6score, which reflects risk of mental illness, show that meth users score much higher on average (11.0) than non-meth users (4.40).
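A group mean is easier to interpret alongside the group size and spread. The sketch below shows one way to report all three per group. It runs on simulated stand-in data, not the survey file: the data frame `toy`, its group sizes, and its score distribution are assumptions made up for illustration.

```r
# Simulated stand-in for the survey data; every value here is made up for illustration.
set.seed(1)
toy <- data.frame(
  meth_month = rep(c("No", "Yes"), times = c(500, 50)),
  k6score = c(rbinom(500, size = 24, prob = 0.18),
              rbinom(50, size = 24, prob = 0.45))
)

# One row per group: group size, mean, and standard deviation of the score.
group_summary <- do.call(rbind, lapply(split(toy, toy$meth_month), function(g) {
  data.frame(
    meth_month = g$meth_month[1],
    n = nrow(g),
    mean_k6 = mean(g$k6score),
    sd_k6 = sd(g$k6score)
  )
}))
print(group_summary)
```

With dplyr, the same table can be produced by adding n = n() and sd = sd(k6score) to the summarize() call used above.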
DandH%>%
group_by(meth_month)%>%
summarize(k6score = mean(k6score, na.rm=TRUE))%>%
ggplot()+
geom_col(aes(x=meth_month, y=k6score, fill=meth_month))+
scale_fill_manual(values = c("No" = "white", "Yes" ="black"))
The bar graph above compares the mean k6scores of non-meth users and meth users. The mean for meth users (11.0) is about 2.5 times the mean for non-users (4.40), a striking difference in risk of mental illness between the two groups.
DandH%>%
ggplot()+
geom_histogram(aes(x=k6score, fill=meth_month))+
facet_wrap(~meth_month)+
scale_fill_manual(values = c("No" = "White", "Yes" ="Black"))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The histograms of k6score for non-meth users and meth users show that a very small number of respondents reported using meth. The distribution of k6score for non-users does not follow a normal distribution, so I next look at the sampling distributions of the group means.
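# Split the data by reported meth use.
nonmethUsers_DandH <- DandH %>%
  filter(meth_month == "No")
methUsers_DandH <- DandH %>%
  filter(meth_month == "Yes")
# Fix the random seed (arbitrary value) so the simulated sampling distributions are reproducible.
set.seed(333)
# Draw 10,000 random samples of 40 scores from each group and record each sample mean.
nonmethUsers_Samp_Distro <- replicate(10000, sample(nonmethUsers_DandH$k6score, 40) %>%
    mean(na.rm = TRUE)) %>%
  data.frame() %>%
  rename("mean" = 1)
methUsers_Samp_Distro <- replicate(10000, sample(methUsers_DandH$k6score, 40) %>%
    mean(na.rm = TRUE)) %>%
  data.frame() %>%
  rename("mean" = 1)
# Overlay the two sampling distributions: white for non-users, black for users.
ggplot() +
  geom_histogram(data = nonmethUsers_Samp_Distro, aes(x = mean), fill = "white") +
  geom_histogram(data = methUsers_Samp_Distro, aes(x = mean), fill = "black")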
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The sampling distributions of the mean for non-meth users and meth users are both approximately normal, as expected for means of samples of 40, but the distribution for meth users lies farther right on the x-axis, reflecting the higher mean k6score, or risk of mental illness, of meth users.
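The spread of a sampling distribution like the ones above can be checked against the standard error formula, sd / sqrt(n). The sketch below does this on a simulated population. The vector `population` and its binomial shape are assumptions for illustration and are not the survey data.

```r
set.seed(42)
# Hypothetical population of scores on a 0-24 scale (not the survey data).
population <- rbinom(5000, size = 24, prob = 0.18)

n <- 40
# 10,000 sample means, each from a random sample of n scores drawn without replacement.
sample_means <- replicate(10000, mean(sample(population, n)))

# The standard deviation of the sample means should be close to sd / sqrt(n).
observed_se <- sd(sample_means)
predicted_se <- sd(population) / sqrt(n)
print(c(observed = observed_se, predicted = predicted_se))
```

The two values agree closely here because 40 is a small fraction of the simulated population. When the sample size is a large share of the group, sampling without replacement makes the observed spread smaller than sd / sqrt(n).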
t.test(k6score~meth_month, data=DandH)
##
## Welch Two Sample t-test
##
## data: k6score by meth_month
## t = -11.894, df = 146.49, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7.667249 -5.482351
## sample estimates:
## mean in group No mean in group Yes
## 4.404792 10.979592
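According to the results of the Welch two-sample t-test (t = -11.894, df = 146.49, p < 2.2e-16), there is a statistically significant difference between the mean k6scores of respondents who do not use meth and respondents who use meth at a significance level of alpha = 0.05. The 95 percent confidence interval for the difference in means (non-users minus users) runs from -7.67 to -5.48 and does not include 0, so meth users score roughly 5.5 to 7.7 points higher on average. This supports my hypothesis.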
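To show where the t statistic and the fractional degrees of freedom come from, the sketch below computes the Welch test by hand and compares it with t.test(). It uses simulated scores. The vectors `no` and `yes` and their sizes are assumptions for illustration, so the numbers will not match the survey output above.

```r
set.seed(7)
# Hypothetical scores for two groups of unequal size (not the survey data).
no  <- rbinom(500, size = 24, prob = 0.18)
yes <- rbinom(60, size = 24, prob = 0.45)

# Squared standard error of each group mean.
v_no  <- var(no) / length(no)
v_yes <- var(yes) / length(yes)

# Welch t statistic: difference in means over the combined standard error.
t_stat <- (mean(no) - mean(yes)) / sqrt(v_no + v_yes)

# Welch-Satterthwaite approximation for the degrees of freedom.
df <- (v_no + v_yes)^2 /
  (v_no^2 / (length(no) - 1) + v_yes^2 / (length(yes) - 1))

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df)

# t.test() runs the Welch test by default (var.equal = FALSE).
builtin <- t.test(no, yes)
print(c(by_hand_t = t_stat, builtin_t = unname(builtin$statistic)))
print(c(by_hand_df = df, builtin_df = unname(builtin$parameter)))
```

The t statistic is negative because the first group (non-users) is subtracted from in the same order as in the formula k6score~meth_month, where "No" comes before "Yes".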