I will analyze data from the provided NHIS data set. I hypothesize that the mean BMI of married respondents differs significantly from the mean BMI of never-married respondents. The two variables in the analysis are marital status (Demo_marital_C, the independent variable) and BMI (Health_BMI_N, the dependent variable).
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
NHISdata <- read.csv("C:/Users/12055/Documents/CUNY - Undergrad/Spring 2021 - CUNY/Data 333/Data/NHIS Data.csv")
nhisdata<-NHISdata%>%
select(Health_BMI_N, Demo_marital_C)%>%
filter(Demo_marital_C %in% c("Married", "Never Married"))
nhisdata%>%
summarize(Health_BMI_N=mean(Health_BMI_N, na.rm=TRUE))
## Health_BMI_N
## 1 27.21887
nhisdata%>%
group_by(Demo_marital_C)%>%
summarize(Health_BMI_N = mean(Health_BMI_N, na.rm=TRUE))
## # A tibble: 2 x 2
## Demo_marital_C Health_BMI_N
## <chr> <dbl>
## 1 Married 27.4
## 2 Never Married 26.9
nhisdata%>%
group_by(Demo_marital_C)%>%
summarize(Health_BMI_N = mean(Health_BMI_N, na.rm=TRUE))%>%
ggplot()+
geom_col(aes(x=Demo_marital_C, y=Health_BMI_N, fill=Demo_marital_C))+
scale_fill_manual(values = c("Married" = "purple", "Never Married" ="pink"))
## Interpretation
The mean BMI for the filtered data (married and never-married respondents combined) is 27.22. The group means sit close to this value and to each other: 27.4 for married respondents and 26.9 for never-married respondents, a gap of roughly 0.5 BMI units. On its face this suggests that BMI differs little by marital status, although descriptive statistics alone cannot show whether the gap is statistically significant.
nhisdata%>%
ggplot()+
geom_histogram(aes(x=Health_BMI_N, fill=Demo_marital_C))+
facet_wrap(~Demo_marital_C)+
scale_fill_manual(values = c("Married" = "purple", "Never Married" ="pink"))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 16905 rows containing non-finite values (stat_bin).
Both groups of respondents have roughly bell-shaped BMI distributions. Each is slightly right-skewed, with a longer tail toward high BMI values.
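One quick way to check a visual impression of right skew is to compare each group's mean with its median, since a longer right tail tends to pull the mean above the median. The sketch below uses simulated values as a hypothetical stand-in for nhisdata (the `example_bmi` data frame and its gamma-distributed values are assumptions, not the NHIS data), so only the pattern matters, not the numbers.

```r
# Hypothetical stand-in for nhisdata: simulated, right-skewed BMI-like values.
# The distribution and group sizes are assumptions for illustration only.
set.seed(1)
example_bmi<-data.frame(
  Demo_marital_C=rep(c("Married", "Never Married"), each=1000),
  Health_BMI_N=15+rgamma(2000, shape=4, scale=3)
)

# Mean and median of BMI within each marital-status group.
shape_check<-t(sapply(
  split(example_bmi$Health_BMI_N, example_bmi$Demo_marital_C),
  function(x) c(mean=mean(x, na.rm=TRUE), median=median(x, na.rm=TRUE))
))

# A mean above the median in a group is consistent with right skew.
print(shape_check)
```

Running the same summary on nhisdata itself would show whether the skew seen in the histograms is also reflected in the numbers.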
Married_nhisdata<-nhisdata%>%
filter(Demo_marital_C=="Married")
NeverMarried_nhisdata<-nhisdata%>%
filter(Demo_marital_C=="Never Married")
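# Drop missing BMI values before sampling so that every sample really has n = 40.
Married_BMI<-na.omit(Married_nhisdata$Health_BMI_N)
NeverMarried_BMI<-na.omit(NeverMarried_nhisdata$Health_BMI_N)
# Draw 10,000 samples of size 40 from each group and store each sample mean.
Married_Samp_Distro<-data.frame(mean=replicate(10000, mean(sample(Married_BMI, 40))))
NeverMarried_Samp_Distro<-data.frame(mean=replicate(10000, mean(sample(NeverMarried_BMI, 40))))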
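# Semi-transparent fills keep both sampling distributions visible where they overlap.
ggplot()+
geom_histogram(data=Married_Samp_Distro, aes(x=mean), fill="purple", alpha=0.5)+
geom_histogram(data=NeverMarried_Samp_Distro, aes(x=mean), fill="pink", alpha=0.5)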
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
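The sampling distributions above are much narrower than the raw BMI histograms because the standard deviation of a sample mean shrinks roughly with the square root of the sample size. The sketch below illustrates this for samples of size 40, using a simulated `population` vector as a hypothetical stand-in for one group's BMI values (it is an assumption, not the NHIS data).

```r
# Hypothetical population of BMI-like values (an assumption for illustration).
set.seed(1)
population<-15+rgamma(5000, shape=4, scale=3)

# Sampling distribution of the mean for samples of size 40.
sample_means<-replicate(10000, mean(sample(population, 40)))

# The spread of the sample means should be close to sd(population)/sqrt(40).
observed_se<-sd(sample_means)
expected_se<-sd(population)/sqrt(40)
print(c(observed=observed_se, expected=expected_se))
```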
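According to the Welch two-sample t-test, the mean BMI of married respondents (27.41) differs significantly from that of never-married respondents (26.86) at a significance level of alpha = 0.05 (t = 27.97, p < 2.2e-16). The 95 percent confidence interval for the difference in means, 0.51 to 0.59, excludes zero. This supports the hypothesis, although the difference itself is small, about 0.55 BMI units.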
t.test(Health_BMI_N~Demo_marital_C, data=nhisdata)
##
## Welch Two Sample t-test
##
## data: Health_BMI_N by Demo_marital_C
## t = 27.974, df = 277324, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.5135481 0.5909325
## sample estimates:
## mean in group Married mean in group Never Married
## 27.41481 26.86257
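To make the t, df, and p-value lines of this output easier to read, the sketch below computes Welch's statistic by hand and compares it with t.test(). The data are simulated: the group means echo the output above, but the standard deviations, the group sizes, and the helper name `welch_t` are assumptions for illustration, so the numbers will not match the NHIS results.

```r
# Hypothetical stand-in data; sd and n are assumptions, not NHIS values.
set.seed(1)
married<-rnorm(500, mean=27.4, sd=5)
never_married<-rnorm(400, mean=26.9, sd=6)

# Welch's t: difference in means divided by its estimated standard error,
# without assuming that the two groups share a common variance.
welch_t<-function(x, y){
  se2_x<-var(x)/length(x)
  se2_y<-var(y)/length(y)
  t_stat<-(mean(x)-mean(y))/sqrt(se2_x+se2_y)
  # Welch-Satterthwaite approximation to the degrees of freedom.
  df<-(se2_x+se2_y)^2/(se2_x^2/(length(x)-1)+se2_y^2/(length(y)-1))
  # Two-sided p-value from the t distribution.
  p_value<-2*pt(-abs(t_stat), df)
  c(t=t_stat, df=df, p_value=p_value)
}

manual<-welch_t(married, never_married)
builtin<-t.test(married, never_married)
print(manual)
print(c(t=unname(builtin$statistic), df=unname(builtin$parameter), p_value=builtin$p.value))
```

Because the standard error shrinks as the groups grow, a very large sample can make even a small difference in means statistically significant.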