Safiya
Instructor Turner
11/28/2018
Assignment 13: Data Analysis Draft
This analysis aims to identify whether there are any significant differences between the sexes (male and female) in their adolescent lives. Specifically, it explores how respondents differ in their feelings, actions, and thoughts toward suicide, sexual intercourse, and drug and alcohol abuse.
Note: Wherever ‘BioSex’ appears, it refers to the two categories (male and female) of the variable sex.
Note to viewer: For the sole purpose of this analysis, the variables ‘FirstSex’ and ‘SexPartners’ were re-coded to create continuous variables. The YRBSS codebook does not list these variables as such.
library(readr)
library(dplyr)
library(ggplot2)
library(knitr)
FinalYRBSS<-read_csv("/Users/safiesaf/Downloads/YRBS1991_2017(2).csv")
FinalYRBSS<-FinalYRBSS%>%
rename("BioSex"=sex,
"ConsiderSuicide"=qn26,
"ActualSuicide"=qn28,
"InjurySuicide"=qn29,
"FirstSex"=qn60,
"SexPartners"=qn61,
"BCMethod"=qn65,
"AlcoholConsumption"=qn40,
"FirstAlcohol"=qn41,
"MarijuanaUse"=qn46)%>%
mutate(BioSex=ifelse(BioSex==1,"Female",
ifelse(BioSex==2,"Male",NA)),
ConsiderSuicide=ifelse(ConsiderSuicide==1,"Considered attempting suicide",
ifelse(ConsiderSuicide==2,"Have not considered attempting suicide",NA)),
ActualSuicide=ifelse(ActualSuicide==1,"Have attempted suicide",
ifelse(ActualSuicide==2,"Have not attempted suicide",NA)),
InjurySuicide=ifelse(InjurySuicide==1,"Suicide attempt resulted in injury",
ifelse(InjurySuicide==2,"Suicide attempt did not result in injury",NA)),
FirstSex=ifelse(FirstSex==1,"Had sex for the first time before age 13",
ifelse(FirstSex==2,"Have not had sex for the first time before age 13",NA)),
SexPartners=ifelse(SexPartners==1,"Had sex with four or more persons in their life",
ifelse(SexPartners==2,"Have not had sex with four or more persons in their life",NA)),
BCMethod=ifelse(BCMethod==1,"Have used birth control before last sexual intercourse",
ifelse(BCMethod==2,"Have not used birth control before last sexual intercourse",NA)),
AlcoholConsumption=ifelse(AlcoholConsumption==1,"Have drank at least one drink of alcohol",
ifelse(AlcoholConsumption==2,"Have not drank at least one drink of alcohol",NA)),
FirstAlcohol=ifelse(FirstAlcohol==1,"Had first drink of alcohol before age 13",
ifelse(FirstAlcohol==2,"Have not had first drink of alcohol before age 13",NA)),
MarijuanaUse=ifelse(MarijuanaUse==1,"Have used marijuana one or more times in life",
ifelse(MarijuanaUse==2,"Have not used marijuana one or more times in life",NA)))%>%
select(BioSex,ConsiderSuicide,ActualSuicide,InjurySuicide,
FirstSex,SexPartners,BCMethod,AlcoholConsumption,
FirstAlcohol,MarijuanaUse)
#summary(FinalYRBSS)
#unique(FinalYRBSS)
#table(FinalYRBSS)
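The recoding above repeats the same nested ifelse() pattern for every variable. As a minimal sketch, the pattern could be wrapped in a small helper. The function name `recode_yes_no` is hypothetical (it is not part of dplyr or the YRBSS materials), and the input vector is toy data, not survey responses. It assumes, as the code above does, that 1 and 2 are the only meaningful codes.

```r
# Hypothetical helper: map the two codes used above (1 and 2) to text labels.
# Any other value, including NA, becomes NA, as in the nested ifelse() calls.
recode_yes_no <- function(x, label_1, label_2) {
  out <- rep(NA_character_, length(x))
  out[x %in% 1] <- label_1
  out[x %in% 2] <- label_2
  out
}

# Toy input, not real survey data
codes <- c(1, 2, NA, 2, 1, 3)
labels <- recode_yes_no(codes,
                        "Have attempted suicide",
                        "Have not attempted suicide")
print(labels)
```

Inside mutate(), a call such as `ActualSuicide = recode_yes_no(ActualSuicide, "Have attempted suicide", "Have not attempted suicide")` would then replace one nested ifelse() pair.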
To see how suicide varies by sex, we will study 3 indicators:
First I shall run a prop.table to observe how the two sexes differ in their serious consideration of attempting suicide.
prop.table(table(FinalYRBSS$BioSex,FinalYRBSS$ConsiderSuicide),1)%>%
round(2)%>%
kable()
| | Considered attempting suicide | Have not considered attempting suicide |
|---|---|---|
| Female | 0.24 | 0.76 |
| Male | 0.13 | 0.87 |
Results show that 76% of females and 87% of males have NOT seriously considered attempting suicide. However, females were more likely than males to have seriously considered it: 24% of females versus 13% of males, a gap of about 11 percentage points.
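The gap between the two groups can be stated either as a difference in percentage points or as a ratio, and the two readings are easy to confuse. The following is a minimal sketch. The counts are hypothetical (per 100 respondents of each sex) and were chosen only to reproduce the rounded proportions in the table above.

```r
# Hypothetical counts per 100 respondents, matching the rounded table above
counts <- matrix(c(24, 76,
                   13, 87),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Female", "Male"),
                                 c("Considered", "Not considered")))

# margin = 1 divides each cell by its row total, so each row sums to 1
row_props <- prop.table(counts, margin = 1)

p_female <- row_props["Female", "Considered"]
p_male   <- row_props["Male", "Considered"]

gap_points <- (p_female - p_male) * 100  # absolute gap, in percentage points
ratio      <- p_female / p_male          # relative gap, as a ratio

cat(sprintf("Gap: %.0f percentage points; ratio: %.2f\n", gap_points, ratio))
```

Under these assumed counts, the absolute gap is 11 percentage points, while the relative gap is a ratio of roughly 1.85.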
FinalYRBSS%>%
filter(BioSex=="Male"|
BioSex=="Female",
ConsiderSuicide=="Considered attempting suicide"|
ConsiderSuicide=="Have not considered attempting suicide")%>%
group_by(BioSex,ConsiderSuicide)%>%
summarize(n=n())%>%
mutate(freq=n/sum(n))%>%
ggplot()+
ggtitle("Consideration of Suicide by Sex")+
geom_col(aes(x=BioSex, y=freq, fill=ConsiderSuicide))
I shall then run a chi-squared test to observe whether or not there is a statistically significant relationship between a person’s sex and their serious consideration of attempting suicide.
According to the analysis, the p-value from Pearson’s chi-squared test is < 0.05; therefore, the difference IS statistically significant and we must reject the null hypothesis. In other words, there is a statistically significant difference between the sexes in consideration of attempting suicide.
chisq.test(FinalYRBSS$BioSex,FinalYRBSS$ConsiderSuicide)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: FinalYRBSS$BioSex and FinalYRBSS$ConsiderSuicide
## X-squared = 3612.9, df = 1, p-value < 2.2e-16
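With a sample this large, a very small p-value says little about how strong the association is. One common companion measure for a 2x2 table is the phi coefficient, the square root of the chi-squared statistic divided by the sample size. The following is a sketch under stated assumptions: the counts are hypothetical (10,000 respondents per sex) and only mimic the 24% versus 13% split. They are not the actual YRBSS cell counts, which this report does not show.

```r
# Hypothetical 2x2 counts (NOT the real YRBSS cell counts)
counts <- matrix(c(2400, 7600,
                   1300, 8700),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Female", "Male"),
                                 c("Considered", "Not considered")))

# Uncorrected Pearson statistic, so that phi = sqrt(X^2 / n) holds exactly
test <- chisq.test(counts, correct = FALSE)
n    <- sum(counts)
phi  <- sqrt(unname(test$statistic) / n)

cat(sprintf("p-value < 0.05: %s; phi = %.3f\n", test$p.value < 0.05, phi))
```

Phi ranges from 0 (no association) to 1 (perfect association), so it can be reported next to the p-value to describe the size of the difference.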
Next I shall run a prop.table to observe how both sexes differ in the frequency of suicide attempts.
prop.table(table(FinalYRBSS$BioSex,FinalYRBSS$ActualSuicide),1)%>%
round(2)%>%
kable()
| | Have attempted suicide | Have not attempted suicide |
|---|---|---|
| Female | 0.11 | 0.89 |
| Male | 0.06 | 0.94 |
Results show that 11% of females and 6% of males have attempted suicide one or more times during the past 12 months prior to being surveyed. On the other hand, 89% of females and 94% of males have NOT attempted suicide one or more times in the past 12 months prior to being surveyed.
FinalYRBSS%>%
filter(BioSex=="Male"|
BioSex=="Female",
ActualSuicide=="Have attempted suicide"|
ActualSuicide=="Have not attempted suicide")%>%
group_by(BioSex,ActualSuicide)%>%
summarize(n=n())%>%
mutate(freq=n/sum(n))%>%
ggplot()+
ggtitle("Actual Suicide Attempts by Sex")+
geom_col(aes(x=BioSex, y=freq, fill=ActualSuicide))
I shall then run a chi-squared test to observe whether or not there is a statistically significant relationship between a person’s sex and their having attempted suicide one or more times during the 12 months before being surveyed.
According to the analysis, the p-value from Pearson’s chi-squared test is < 0.05; therefore, the difference IS statistically significant and we must reject the null hypothesis. There is a statistically significant difference between the sexes in attempted suicide one or more times during the 12 months before being surveyed.
chisq.test(FinalYRBSS$BioSex,FinalYRBSS$ActualSuicide)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: FinalYRBSS$BioSex and FinalYRBSS$ActualSuicide
## X-squared = 1892.6, df = 1, p-value < 2.2e-16
Next I shall run a prop.table to observe how the two sexes differ in terms of injuries resulting from suicide attempts.
prop.table(table(FinalYRBSS$BioSex,FinalYRBSS$InjurySuicide),1)%>%
round(2)%>%
kable()
| | Suicide attempt did not result in injury | Suicide attempt resulted in injury |
|---|---|---|
| Female | 0.97 | 0.03 |
| Male | 0.98 | 0.02 |
Results show that 97% of females and 98% of males did NOT experience an injury, poisoning, or overdose resulting from a suicide attempt. By contrast, 3% of females and 2% of males reported such an injury.
FinalYRBSS%>%
filter(BioSex=="Male"|
BioSex=="Female",
InjurySuicide=="Suicide attempt resulted in injury"|
InjurySuicide=="Suicide attempt did not result in injury")%>%
group_by(BioSex,InjurySuicide)%>%
summarize(n=n())%>%
mutate(freq=n/sum(n))%>%
ggplot()+
ggtitle("Suicidal Injuries by Sex")+
geom_col(aes(x=BioSex, y=freq, fill=InjurySuicide))
I shall then run a chi-squared test to observe whether or not there is a statistically significant relationship between a person’s sex and injury resulting from a suicide attempt.
According to the analysis, the p-value from Pearson’s chi-squared test is < 0.05; therefore, the difference IS statistically significant and we must REJECT the null hypothesis. There is a statistically significant difference between the sexes in terms of injury resulting from suicide attempts.
chisq.test(FinalYRBSS$BioSex,FinalYRBSS$InjurySuicide)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: FinalYRBSS$BioSex and FinalYRBSS$InjurySuicide
## X-squared = 331.89, df = 1, p-value < 2.2e-16
To see how sexual intercourse varies by sex, we will study 3 indicators:
First I shall run a prop.table to observe how both sexes differ in having had sexual intercourse for the first time before age 13.
prop.table(table(FinalYRBSS$BioSex,FinalYRBSS$FirstSex),1)%>%
round(2)%>%
kable()
| | Had sex for the first time before age 13 | Have not had sex for the first time before age 13 |
|---|---|---|
| Female | 0.04 | 0.96 |
| Male | 0.13 | 0.87 |
Results show that 4% of females and 13% of males had sexual intercourse for the first time before age 13, whereas 96% of females and 87% of males did not.
FinalYRBSS%>%
filter(BioSex=="Male"|
BioSex=="Female",
FirstSex=="Had sex for the first time before age 13"|
FirstSex=="Have not had sex for the first time before age 13")%>%
group_by(BioSex,FirstSex)%>%
summarize(n=n())%>%
mutate(freq=n/sum(n))%>%
ggplot()+
ggtitle("Age of first sexual encounter by Sex")+
geom_col(aes(x=BioSex, y=freq, fill=FirstSex))
I shall then run a chi-squared test to observe whether or not there is a statistically significant relationship between a person’s sex and having had sexual intercourse for the first time before age 13.
According to the analysis, the p-value from Pearson’s chi-squared test is < 0.05; therefore, the difference IS statistically significant and we must REJECT the null hypothesis. There is a statistically significant difference between the sexes in terms of sexual intercourse for the first time before age 13.
chisq.test(FinalYRBSS$BioSex,FinalYRBSS$FirstSex)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: FinalYRBSS$BioSex and FinalYRBSS$FirstSex
## X-squared = 4683.8, df = 1, p-value < 2.2e-16
Next I shall run a prop.table to observe how both sexes differ in having had sexual intercourse with four or more persons during their life.
prop.table(table(FinalYRBSS$BioSex,FinalYRBSS$SexPartners),1)%>%
round(2)%>%
kable()
| | Had sex with four or more persons in their life | Have not had sex with four or more persons in their life |
|---|---|---|
| Female | 0.13 | 0.87 |
| Male | 0.23 | 0.77 |
Results show that 13% of females and 23% of males had sex with four or more persons during their life, whereas 87% of females and 77% of males reported that they have not had sex with four or more persons during their life.
FinalYRBSS%>%
filter(BioSex=="Male"|
BioSex=="Female",
SexPartners=="Had sex with four or more persons in their life"|
SexPartners=="Have not had sex with four or more persons in their life")%>%
group_by(BioSex,SexPartners)%>%
summarize(n=n())%>%
mutate(freq=n/sum(n))%>%
ggplot()+
ggtitle("Amount of Sexual Partners by Sex")+
geom_col(aes(x=BioSex, y=freq, fill=SexPartners))
I shall then run a Chi Square test to observe whether or not there is a statistically significant relationship between the sexes in terms of sexual intercourse with four or more persons during their life.
According to the analysis, the p-value from Pearson’s chi-squared test is < 0.05; therefore, the difference IS statistically significant and we must REJECT the null hypothesis. There is a statistically significant difference between the sexes in terms of sexual intercourse with four or more persons during their life.
chisq.test(FinalYRBSS$BioSex,FinalYRBSS$SexPartners)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: FinalYRBSS$BioSex and FinalYRBSS$SexPartners
## X-squared = 3119.8, df = 1, p-value < 2.2e-16
Next I shall run a prop.table to observe how both sexes differ in their use of birth control before their last sexual intercourse.
prop.table(table(FinalYRBSS$BioSex,FinalYRBSS$BCMethod),1)%>%
round(2)%>%
kable()
| | Have not used birth control before last sexual intercourse | Have used birth control before last sexual intercourse |
|---|---|---|
| Female | 0.82 | 0.18 |
| Male | 0.88 | 0.12 |
Results show that 18% of females and 12% of males reportedly used birth control before their last sexual intercourse, whereas 82% of females and 88% of males did not.
FinalYRBSS%>%
filter(BioSex=="Male"|
BioSex=="Female",
BCMethod=="Have used birth control before last sexual intercourse"|
BCMethod=="Have not used birth control before last sexual intercourse")%>%
group_by(BioSex,BCMethod)%>%
summarize(n=n())%>%
mutate(freq=n/sum(n))%>%
ggplot()+
ggtitle("Safe Sex Practices by Sex")+
geom_col(aes(x=BioSex, y=freq, fill=BCMethod))
I shall then run a chi-squared test to observe whether or not there is a statistically significant relationship between a person’s sex and their use of birth control before their last sexual intercourse.
According to the analysis, the p-value from Pearson’s chi-squared test is < 0.05; therefore, the difference IS statistically significant and we must REJECT the null hypothesis. There is a statistically significant difference between the sexes in terms of birth control use before their last sexual intercourse.
chisq.test(FinalYRBSS$BioSex,FinalYRBSS$BCMethod)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: FinalYRBSS$BioSex and FinalYRBSS$BCMethod
## X-squared = 438.94, df = 1, p-value < 2.2e-16
To see how alcohol and drug use varies by sex, we will study 3 indicators:
First I shall run a prop.table to observe how the two sexes differ in their alcohol consumption (at least one drink of alcohol on at least one day during their life).
prop.table(table(FinalYRBSS$BioSex,FinalYRBSS$AlcoholConsumption),1)%>%
round(2)%>%
kable()
| | Have drank at least one drink of alcohol | Have not drank at least one drink of alcohol |
|---|---|---|
| Female | 0.75 | 0.25 |
| Male | 0.74 | 0.26 |
Results show that 75% of females and 74% of males reportedly drank alcohol (at least one drink on at least one day during their life), whereas 25% of females and 26% of males have never had a drink of alcohol.
FinalYRBSS%>%
filter(BioSex=="Male"|
BioSex=="Female",
AlcoholConsumption=="Have drank at least one drink of alcohol"|
AlcoholConsumption=="Have not drank at least one drink of alcohol")%>%
group_by(BioSex,AlcoholConsumption)%>%
summarize(n=n())%>%
mutate(freq=n/sum(n))%>%
ggplot()+
ggtitle("Alcohol Consumption by Sex")+
geom_col(aes(x=BioSex, y=freq, fill=AlcoholConsumption))
I shall then run a chi-squared test to observe whether or not there is a statistically significant relationship between a person’s sex and whether they have ever drunk alcohol (at least one drink on at least one day during their life).
According to the analysis, the p-value from Pearson’s chi-squared test is < 0.05 (p = 6.592e-05); therefore, the difference IS statistically significant and we must REJECT the null hypothesis. There is a statistically significant difference between the sexes in terms of lifetime alcohol consumption, although the gap itself is small (75% of females versus 74% of males).
chisq.test(FinalYRBSS$BioSex,FinalYRBSS$AlcoholConsumption)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: FinalYRBSS$BioSex and FinalYRBSS$AlcoholConsumption
## X-squared = 15.924, df = 1, p-value = 6.592e-05
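A one-point gap such as 75% versus 74% can still yield a very small p-value when the sample is large. The sketch below illustrates this with prop.test() from base R's stats package. Both sets of counts are hypothetical and chosen only to show the effect of sample size; they are not the YRBSS counts.

```r
# Same 75% vs. 74% split at two hypothetical sample sizes (per sex)
n_large <- c(100000, 100000)
x_large <- c(75000, 74000)

n_small <- c(1000, 1000)
x_small <- c(750, 740)

# Two-sample test of equal proportions
p_large <- prop.test(x_large, n_large)$p.value
p_small <- prop.test(x_small, n_small)$p.value

cat(sprintf("Large sample: significant at 0.05? %s\n", p_large < 0.05))
cat(sprintf("Small sample: significant at 0.05? %s\n", p_small < 0.05))
```

Under these assumptions, the same proportions are significant in the large sample but not in the small one, so the p-value reflects sample size as well as the size of the gap.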
Next I shall run a prop.table to observe how the two sexes differ in having had their first drink of alcohol BEFORE age 13.
prop.table(table(FinalYRBSS$BioSex,FinalYRBSS$FirstAlcohol),1)%>%
round(2)%>%
kable()
| | Had first drink of alcohol before age 13 | Have not had first drink of alcohol before age 13 |
|---|---|---|
| Female | 0.22 | 0.78 |
| Male | 0.30 | 0.70 |
Results show that 22% of females and 30% of males had their first drink of alcohol BEFORE age 13, whereas 78% of females and 70% of males did not have their first drink of alcohol before age 13.
FinalYRBSS%>%
filter(BioSex=="Male"|
BioSex=="Female",
FirstAlcohol=="Had first drink of alcohol before age 13"|
FirstAlcohol=="Have not had first drink of alcohol before age 13")%>%
group_by(BioSex,FirstAlcohol)%>%
summarize(n=n())%>%
mutate(freq=n/sum(n))%>%
ggplot()+
ggtitle("First time alcohol use by Sex")+
geom_col(aes(x=BioSex, y=freq, fill=FirstAlcohol))
I shall then run a chi-squared test to observe whether or not there is a statistically significant relationship between a person’s sex and having had their first drink of alcohol before age 13.
According to the analysis, the p-value from Pearson’s chi-squared test is < 0.05; therefore, the difference IS statistically significant and we must REJECT the null hypothesis. There is a statistically significant difference between the sexes in terms of having had their first drink of alcohol before age 13.
chisq.test(FinalYRBSS$BioSex,FinalYRBSS$FirstAlcohol)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: FinalYRBSS$BioSex and FinalYRBSS$FirstAlcohol
## X-squared = 1798.5, df = 1, p-value < 2.2e-16
Next I shall run a prop.table to observe how the two sexes differ in their use of marijuana one or more times during their life.
prop.table(table(FinalYRBSS$BioSex,FinalYRBSS$MarijuanaUse),1)%>%
round(2)%>%
kable()
| | Have not used marijuana one or more times in life | Have used marijuana one or more times in life |
|---|---|---|
| Female | 0.63 | 0.37 |
| Male | 0.56 | 0.44 |
Results show that 37% of females and 44% of males have used marijuana one or more times during their life, whereas 63% of females and 56% of males have reportedly NOT used marijuana during their life.
FinalYRBSS%>%
filter(BioSex=="Male"|
BioSex=="Female",
MarijuanaUse=="Have used marijuana one or more times in life"|
MarijuanaUse=="Have not used marijuana one or more times in life")%>%
group_by(BioSex,MarijuanaUse)%>%
summarize(n=n())%>%
mutate(freq=n/sum(n))%>%
ggplot()+
ggtitle("Marijuana use by Sex")+
geom_col(aes(x=BioSex, y=freq, fill=MarijuanaUse))
I shall then run a chi-squared test to observe whether or not there is a statistically significant relationship between a person’s sex and their use of marijuana one or more times during their life.
According to the analysis, the p-value from Pearson’s chi-squared test is < 0.05; therefore, the difference IS statistically significant and we must REJECT the null hypothesis. There is a statistically significant difference between the sexes in terms of marijuana use one or more times during their life.
chisq.test(FinalYRBSS$BioSex,FinalYRBSS$MarijuanaUse)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: FinalYRBSS$BioSex and FinalYRBSS$MarijuanaUse
## X-squared = 1118, df = 1, p-value < 2.2e-16
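The nine chi-squared tests above all follow the same pattern, so they could be run in a single pass and collected into one table. The following is a minimal sketch of that idea. The data frame `toy` holds randomly generated placeholder values, not YRBSS responses, and only two outcome columns are included for brevity.

```r
set.seed(1)

# Toy stand-in for FinalYRBSS: random labels, NOT real survey responses
toy <- data.frame(
  BioSex          = sample(c("Female", "Male"), 200, replace = TRUE),
  ConsiderSuicide = sample(c("Yes", "No"), 200, replace = TRUE),
  MarijuanaUse    = sample(c("Yes", "No"), 200, replace = TRUE)
)

# Every column except BioSex is treated as an outcome
outcomes <- setdiff(names(toy), "BioSex")

# One chi-squared test per outcome, keeping only the p-value
results <- data.frame(
  outcome = outcomes,
  p_value = vapply(outcomes,
                   function(v) chisq.test(toy$BioSex, toy[[v]])$p.value,
                   numeric(1), USE.NAMES = FALSE)
)

print(results)
```

With the real data, `toy` would be replaced by `FinalYRBSS`, and the resulting table could be passed to kable() like the proportion tables above.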
After running various statistical tests on the data, it was found that, in terms of suicide, females were more likely than males to have seriously considered attempting suicide, to have attempted suicide, and to have been injured in a suicide attempt. In terms of sexual intercourse, males were more likely than females to have had sexual intercourse before age 13 and to have had sex with four or more people in their lifetime. In terms of safe sex practices, females were more likely than males to have used birth control before their last sexual intercourse. Lastly, when dealing with alcohol and drugs, females were slightly more likely than males to have had at least one drink of alcohol on at least one day during their life (75% versus 74%). Males, on the other hand, were more likely than females to have had their first alcoholic drink before age 13 and to have used marijuana one or more times during their life.