### load data
data<-read.csv("https://raw.githubusercontent.com/zahirf/Data606/master/raw.csv", sep=",")
names(data)
## [1] "ï..country" "year" "sex"
## [4] "age" "suicides_no" "population"
## [7] "suicides.100k.pop" "country.year" "HDI.for.year"
## [10] "gdp_for_year...." "gdp_per_capita...." "generation"
head(data)
## ï..country year sex age suicides_no population
## 1 Antigua and Barbuda 2014 female 15-24 years 0 8537
## 2 Antigua and Barbuda 2014 female 25-34 years 0 7578
## 3 Antigua and Barbuda 2014 female 35-54 years 0 15273
## 4 Antigua and Barbuda 2014 female 5-14 years 0 8296
## 5 Antigua and Barbuda 2014 female 55-74 years 0 6085
## 6 Antigua and Barbuda 2014 female 75+ years 0 1686
## suicides.100k.pop country.year HDI.for.year gdp_for_year....
## 1 0 Antigua and Barbuda2014 0.783 1280133333
## 2 0 Antigua and Barbuda2014 0.783 1280133333
## 3 0 Antigua and Barbuda2014 0.783 1280133333
## 4 0 Antigua and Barbuda2014 0.783 1280133333
## 5 0 Antigua and Barbuda2014 0.783 1280133333
## 6 0 Antigua and Barbuda2014 0.783 1280133333
## gdp_per_capita.... generation
## 1 14093 Millenials
## 2 14093 Millenials
## 3 14093 Generation X
## 4 14093 Generation Z
## 5 14093 Boomers
## 6 14093 Silent
summary(data)
## ï..country year sex age
## Antigua and Barbuda: 12 Min. :2014 female:468 15-24 years:156
## Argentina : 12 1st Qu.:2014 male :468 25-34 years:156
## Armenia : 12 Median :2014 35-54 years:156
## Australia : 12 Mean :2014 5-14 years :156
## Austria : 12 3rd Qu.:2014 55-74 years:156
## Bahrain : 12 Max. :2014 75+ years :156
## (Other) :864
## suicides_no population suicides.100k.pop
## Min. : 0.0 Min. : 960 Min. : 0.000
## 1st Qu.: 4.0 1st Qu.: 172142 1st Qu.: 1.268
## Median : 29.0 Median : 525932 Median : 5.565
## Mean : 238.2 Mean : 2042796 Mean : 11.011
## 3rd Qu.: 126.0 3rd Qu.: 1677010 3rd Qu.: 14.178
## Max. :11455.0 Max. :41858354 Max. :124.450
##
## country.year HDI.for.year gdp_for_year....
## Antigua and Barbuda2014: 12 Min. :0.6270 Min. :7.252e+08
## Argentina2014 : 12 1st Qu.:0.7500 1st Qu.:3.134e+10
## Armenia2014 : 12 Median :0.8180 Median :1.180e+11
## Australia2014 : 12 Mean :0.8085 Mean :7.185e+11
## Austria2014 : 12 3rd Qu.:0.8830 3rd Qu.:4.993e+11
## Bahrain2014 : 12 Max. :0.9440 Max. :1.743e+13
## (Other) :864 NA's :36
## gdp_per_capita.... generation
## Min. : 1465 Boomers :156
## 1st Qu.: 8849 Generation X:156
## Median : 15950 Generation Z:156
## Mean : 27420 Millenials :312
## 3rd Qu.: 41869 Silent :156
## Max. :126352
##
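The `ï..country` name in the output above is what a UTF-8 byte-order mark at the start of the file typically turns into, and the trailing dots in `gdp_for_year....` appear because `read.csv` replaces characters that are not valid in R names. The sketch below shows one possible cleanup on a small temporary file. The file contents and the raw header `gdp_for_year ($)` are assumptions made for illustration, not a copy of the real CSV.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (illustrative content only).
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("country,year,gdp_for_year ($)\nAntigua and Barbuda,2014,1280133333\n"), con)
close(con)

# fileEncoding = "UTF-8-BOM" tells R to skip the byte-order mark when reading.
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")

# Strip the trailing dots left behind when invalid name characters were replaced.
names(demo) <- sub("\\.+$", "", names(demo))
names(demo)

unlink(tmp)
```

The same `fileEncoding` argument could be passed when reading the project file, after which the later output would show `country` rather than `ï..country`.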
Is the suicide rate affected by Human Development Index (HDI) score and generation?
The question about the relationship between suicide rates and millennials came to mind after reading this article: https://www.businessinsider.com/perfectionism-causing-more-early-deaths-and-suicides-among-millennials-2018-9
Regarding HDI, as life expectancy and other human development measures in a country increase, suicide rates might be expected to fall.
There are 936 cases in the dataset. Each case gives the number of suicides in 2014 for one combination of country, sex, and age group, and each age group is assigned a generation. The generation categories are Millenials (spelled this way in the data), Boomers, Generation X, Generation Z, and Silent.
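The case count can be cross-checked against the structure visible in `summary(data)`: every country contributes 12 rows (2 sexes times 6 age groups), so 936 rows correspond to 78 countries. The sketch below rebuilds that grid with placeholder country labels (hypothetical) and the age-to-generation mapping read off `head(data)`. It is only a consistency check, not the real data.

```r
ages <- c("5-14 years", "15-24 years", "25-34 years",
          "35-54 years", "55-74 years", "75+ years")

# 78 placeholder countries x 2 sexes x 6 age groups
grid <- expand.grid(
  country = paste0("country_", 1:78),  # hypothetical labels
  sex = c("female", "male"),
  age = ages,
  stringsAsFactors = FALSE
)

# Age group to generation for 2014, as seen in head(data)
gen_2014 <- c(
  "5-14 years"  = "Generation Z",
  "15-24 years" = "Millenials",
  "25-34 years" = "Millenials",
  "35-54 years" = "Generation X",
  "55-74 years" = "Boomers",
  "75+ years"   = "Silent"
)
grid$generation <- unname(gen_2014[grid$age])

nrow(grid)              # 936
table(grid$generation)  # Millenials has 312 rows, every other generation 156
```

Because two age groups map to Millenials, that category has twice as many rows as the others, which matters for any statistic that sums over rows.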
summary(data$generation)
The data were collected from Kaggle. The original dataset covers 1985 to 2016 and contains socioeconomic data and suicide rates by year and country. I formed a subset using only the 2014 data, the latest year with the largest sample size. The subset can be found at https://raw.githubusercontent.com/zahirf/Data606/master/raw.csv
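The subsetting step itself is not shown here. A minimal sketch of how it might look, using a made-up stand-in for the full Kaggle table (the object `full` and all of its values are hypothetical):

```r
# Made-up stand-in for the full 1985-2016 table
full <- data.frame(
  country     = c("A", "A", "B", "B"),
  year        = c(2013, 2014, 2014, 2016),
  suicides_no = c(5, 7, 0, 2)
)

# Rows available per year, useful when deciding which year to keep
rows_per_year <- table(full$year)

# Keep only the 2014 rows
data_2014 <- subset(full, year == 2014)
data_2014
```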
This is an observational study.
The dependent variable is the suicide rate, that is, suicides as a share of the population (the data also report it per 100,000 people as suicides.100k.pop). It is a quantitative variable: the underlying count, suicides_no, is discrete, while the rate itself is continuous. I intend to run inference on differences in suicide rates between generation categories, and to examine the correlation between HDI scores and suicide rates.
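One way the planned inference could be set up is sketched below on simulated toy data, not the real dataset. It assumes a test of equal proportions on counts pooled by generation and a Pearson correlation between HDI and the rate per 100,000. Both are illustrative choices rather than a final analysis plan, and rows from the same country are not independent, which a full analysis would need to address.

```r
set.seed(606)  # toy data only; values are simulated

toy <- data.frame(
  generation   = rep(c("Boomers", "Generation X", "Millenials"), each = 20),
  population   = round(runif(60, 1e4, 1e6)),
  HDI.for.year = round(runif(60, 0.63, 0.94), 3)
)
toy$suicides_no <- rbinom(60, size = toy$population, prob = 1e-4)
toy$HDI.for.year[c(3, 17)] <- NA  # mimic the missing HDI values

# Pool suicides and population within each generation
pooled <- aggregate(cbind(suicides_no, population) ~ generation,
                    data = toy, FUN = sum)

# Test whether the pooled proportions are equal across generations
prop_res <- prop.test(x = pooled$suicides_no, n = pooled$population)

# Correlation of HDI with rate per 100k; cor.test drops incomplete pairs itself
toy$rate_100k <- toy$suicides_no / toy$population * 1e5
cor_res <- cor.test(toy$HDI.for.year, toy$rate_100k)

prop_res$p.value
cor_res$estimate
```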
There are two independent variables. The first one is the generation the person belongs to. It is a qualitative variable. The second one is the Human Development Index score of the country that the person belongs to. It is a continuous quantitative variable.
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plots, boxplots, etc.). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
summary(data$suicides_no)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 4.0 29.0 238.2 126.0 11455.0
summary(data$HDI.for.year)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.6270 0.7500 0.8180 0.8085 0.8830 0.9440 36
summary(data$generation)
## Boomers Generation X Generation Z Millenials Silent
## 156 156 156 312 156
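# Sum of row-level suicide rates within each generation (Millenials has twice as many rows)
plot(aggregate(data$suicides_no/data$population, by=list(Category=data$generation), FUN=sum))
# Sum of row-level suicide rates at each distinct HDI value (rows with missing HDI are dropped)
plot(aggregate(data$suicides_no/data$population, by=list(Category=data$HDI.for.year), FUN=sum))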
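Summing row-level ratios, as in the two plots above, weights each generation by its number of rows, and Millenials has 312 rows against 156 for the others. An alternative sketch pools suicides and population first and then computes a rate per 100,000. The toy values below are made up for illustration, and the column names follow the dataset.

```r
# Made-up toy rows; not the real data
toy <- data.frame(
  country      = c("A", "A", "A", "B", "B", "B"),
  HDI.for.year = c(0.70, 0.70, 0.70, 0.90, 0.90, 0.90),
  generation   = c("Boomers", "Millenials", "Millenials",
                   "Boomers", "Millenials", "Millenials"),
  suicides_no  = c(10, 5, 5, 30, 10, 20),
  population   = c(100000, 100000, 100000, 100000, 200000, 400000)
)

# Pooled rate per 100k by generation
by_gen <- aggregate(cbind(suicides_no, population) ~ generation,
                    data = toy, FUN = sum)
by_gen$rate_100k <- by_gen$suicides_no / by_gen$population * 1e5
barplot(by_gen$rate_100k, names.arg = by_gen$generation,
        ylab = "Suicides per 100k population")

# HDI is a country-level value, so pool to one row per country first
# (the formula interface drops rows with missing HDI)
by_country <- aggregate(cbind(suicides_no, population) ~ country + HDI.for.year,
                        data = toy, FUN = sum)
by_country$rate_100k <- by_country$suicides_no / by_country$population * 1e5
plot(by_country$HDI.for.year, by_country$rate_100k,
     xlab = "HDI", ylab = "Suicides per 100k population")
```

In this toy example the pooled rates are 20 per 100k for Boomers and 5 per 100k for Millenials, regardless of how many rows each generation has.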