Variable Selection & Research Question

I will analyze data from the provided National Survey on Drug Use and Health (NSDUH) data set. I hypothesize that the mean risk of mental illness differs significantly between survey respondents who reported meth use and those who did not. The two variables are meth use (meth_month, the categorical independent variable) and risk of mental illness (k6score, the numeric dependent variable).

Data Prep

library(readr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)

dandh<-read.csv("C:/Users/12055/Downloads/Skills drill 3/SOC333_NSDUH_2016.csv")

DandH<-dandh%>%
  select(k6score, meth_month)%>%
  filter(meth_month %in% c("Yes", "No"))%>%
  filter(!is.na(k6score))

Comparison of Means

Table

DandH%>%
  group_by(meth_month)%>%
  summarize(k6score=mean(k6score, na.rm=TRUE))

## # A tibble: 2 x 2
##   meth_month k6score
##   <chr>        <dbl>
## 1 No            4.40
## 2 Yes          11.0

The mean k6score, which reflects risk of mental illness, is 11.0 for meth users and 4.40 for non-meth users. Meth users in this sample therefore score about 6.6 points higher on average.
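A group mean is easier to judge alongside the group size and spread. The following is a minimal sketch of such a summary, assuming a small invented data frame called toy in place of DandH; the values and the column names n, mean_k6, and sd_k6 are illustrative only.

```r
library(dplyr)

# Hypothetical stand-in for DandH; values are invented for illustration
toy <- data.frame(
  k6score    = c(2, 4, 6, 8, 10, 14),
  meth_month = c("No", "No", "No", "No", "Yes", "Yes")
)

# Group size, mean, and standard deviation in one table
group_summary <- toy %>%
  group_by(meth_month) %>%
  summarize(
    n       = n(),
    mean_k6 = mean(k6score),
    sd_k6   = sd(k6score)
  )
group_summary
```

Reporting n next to each mean shows how much data stands behind the comparison, which matters when one group is far smaller than the other.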

Visualization

DandH%>%
  group_by(meth_month)%>%
  summarize(k6score = mean(k6score, na.rm=TRUE))%>%
  ggplot()+
  geom_col(aes(x=meth_month, y=k6score, fill=meth_month))+
     scale_fill_manual(values = c("No" = "white", "Yes" ="black"))

Interpretation

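The bar graph above compares the mean k6score of non-meth users and meth users. The mean for meth users is roughly two and a half times the mean for non-users. This shows a large difference between the two groups, but it does not by itself show that meth use causes the higher scores.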

Comparison of Distributions

Visualization

DandH%>%
ggplot()+
  geom_histogram(aes(x=k6score, fill=meth_month))+
  facet_wrap(~meth_month)+
  scale_fill_manual(values = c("No" = "White", "Yes" ="Black"))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Interpretation

The histograms of k6score for non-meth users and meth users show that only a very small number of respondents reported using meth. The k6score distribution for non-users does not follow a normal distribution.
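When one group is much smaller than the other, count-based histograms make the smaller group hard to read. One option is to plot density rather than counts, so each panel is scaled to its own group. The sketch below assumes simulated scores on a 0 to 24 scale in a data frame called toy; it is not the NSDUH data, and the group sizes are invented.

```r
library(ggplot2)

set.seed(333)  # arbitrary seed so the sketch is reproducible

# Hypothetical stand-in for DandH with very unequal group sizes
toy <- data.frame(
  k6score    = c(pmin(rpois(2000, 4), 24), pmin(rpois(50, 11), 24)),
  meth_month = rep(c("No", "Yes"), times = c(2000, 50))
)

# Plot density instead of raw counts so both panels are comparable in shape
p <- ggplot(toy) +
  geom_histogram(
    aes(x = k6score, y = after_stat(density), fill = meth_month),
    binwidth = 1, colour = "grey40"
  ) +
  facet_wrap(~meth_month) +
  scale_fill_manual(values = c("No" = "white", "Yes" = "black"))
p
```

Setting binwidth explicitly also avoids the default of 30 bins, and the grey outline keeps the white bars visible.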

Sampling Distribution and T-test

Sampling Distribution

nonmethUsers_DandH<-DandH%>%
  filter(meth_month=="No")

methUsers_DandH<-DandH%>%
  filter(meth_month=="Yes")

nonmethUsers_Samp_Distro<-replicate(10000, sample(nonmethUsers_DandH$k6score, 40)%>%
mean(na.rm=TRUE))%>%
data.frame()%>%
rename("mean"=1)

methUsers_DandH_Samp_Distro<-replicate(10000, sample(methUsers_DandH$k6score, 40)%>%
mean(na.rm=TRUE))%>%
data.frame()%>%
rename("mean"=1)

ggplot()+
  geom_histogram(data=nonmethUsers_Samp_Distro, aes(x=mean), fill="white")+
  geom_histogram(data=methUsers_DandH_Samp_Distro, aes(x=mean), fill="black")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Interpretation of Sampling Distributions

The sampling distributions of the mean for non-meth users and meth users are both approximately normal, even though the underlying scores are not. The distribution for meth users lies farther right on the x-axis, reflecting the higher mean k6score, or risk of mental illness, among meth users.
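The spread of a sampling distribution can be checked against the standard error formula, sd / sqrt(n). The sketch below uses simulated, skewed scores rather than the NSDUH data; the vector scores, the seed, and the Poisson parameter are assumptions chosen only for illustration.

```r
set.seed(333)  # arbitrary seed so the sketch is reproducible

# Hypothetical skewed scores capped at 24; not the NSDUH data
scores <- pmin(rpois(5000, lambda = 4), 24)

n <- 40
# 10,000 sample means, each from 40 scores drawn without replacement
sample_means <- replicate(10000, mean(sample(scores, n)))

# Spread of the sample means versus the theoretical standard error
observed_se    <- sd(sample_means)
theoretical_se <- sd(scores) / sqrt(n)
c(observed = observed_se, theoretical = theoretical_se)
```

The two values should be close, which is why means of samples of 40 cluster much more tightly than the individual scores do.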

T-test

t.test(k6score~meth_month, data=DandH)

## 
##  Welch Two Sample t-test
## 
## data:  k6score by meth_month
## t = -11.894, df = 146.49, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.667249 -5.482351
## sample estimates:
##  mean in group No mean in group Yes 
##          4.404792         10.979592

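According to the Welch two-sample t-test, there is a statistically significant difference between the mean k6score of respondents who do not use meth and respondents who do (t = -11.894, df = 146.49, p < 2.2e-16). The p-value is far below the significance level of alpha = 0.05, and the 95 percent confidence interval for the difference in means (No minus Yes), -7.67 to -5.48, does not include 0.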
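The non-integer degrees of freedom in the output come from the Welch correction, which does not assume equal variances in the two groups. As a rough sketch, the statistic and degrees of freedom can be computed by hand and compared with t.test. The two vectors below are simulated and hypothetical, so the numbers will not match the survey results.

```r
set.seed(333)  # arbitrary seed so the sketch is reproducible

# Hypothetical scores for two groups of unequal size; not the NSDUH data
no_grp  <- rnorm(500, mean = 4.4, sd = 4)
yes_grp <- rnorm(60,  mean = 11,  sd = 6)

# Variance of each group mean
v_no  <- var(no_grp)  / length(no_grp)
v_yes <- var(yes_grp) / length(yes_grp)

# Welch t statistic: difference in means over the combined standard error
t_manual <- (mean(no_grp) - mean(yes_grp)) / sqrt(v_no + v_yes)

# Welch-Satterthwaite degrees of freedom
df_manual <- (v_no + v_yes)^2 /
  (v_no^2 / (length(no_grp) - 1) + v_yes^2 / (length(yes_grp) - 1))

# Compare with R's built-in Welch test (the default for t.test)
res <- t.test(no_grp, yes_grp)
c(t_manual = t_manual, t_builtin = unname(res$statistic))
c(df_manual = df_manual, df_builtin = unname(res$parameter))
```

The sign of t is negative here for the same reason as in the report: the first group (No) has the lower mean.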

Analysis of risk of mental illness based on meth use
