Data Preparation

### load data
data<-read.csv("https://raw.githubusercontent.com/zahirf/Data606/master/raw.csv", sep=",")
names(data)
##  [1] "ï..country"         "year"               "sex"               
##  [4] "age"                "suicides_no"        "population"        
##  [7] "suicides.100k.pop"  "country.year"       "HDI.for.year"      
## [10] "gdp_for_year...."   "gdp_per_capita...." "generation"
head(data)
##            ï..country year    sex         age suicides_no population
## 1 Antigua and Barbuda 2014 female 15-24 years           0       8537
## 2 Antigua and Barbuda 2014 female 25-34 years           0       7578
## 3 Antigua and Barbuda 2014 female 35-54 years           0      15273
## 4 Antigua and Barbuda 2014 female  5-14 years           0       8296
## 5 Antigua and Barbuda 2014 female 55-74 years           0       6085
## 6 Antigua and Barbuda 2014 female   75+ years           0       1686
##   suicides.100k.pop            country.year HDI.for.year gdp_for_year....
## 1                 0 Antigua and Barbuda2014        0.783       1280133333
## 2                 0 Antigua and Barbuda2014        0.783       1280133333
## 3                 0 Antigua and Barbuda2014        0.783       1280133333
## 4                 0 Antigua and Barbuda2014        0.783       1280133333
## 5                 0 Antigua and Barbuda2014        0.783       1280133333
## 6                 0 Antigua and Barbuda2014        0.783       1280133333
##   gdp_per_capita....   generation
## 1              14093   Millenials
## 2              14093   Millenials
## 3              14093 Generation X
## 4              14093 Generation Z
## 5              14093      Boomers
## 6              14093       Silent
summary(data)
##                ï..country       year          sex               age     
##  Antigua and Barbuda: 12   Min.   :2014   female:468   15-24 years:156  
##  Argentina          : 12   1st Qu.:2014   male  :468   25-34 years:156  
##  Armenia            : 12   Median :2014                35-54 years:156  
##  Australia          : 12   Mean   :2014                5-14 years :156  
##  Austria            : 12   3rd Qu.:2014                55-74 years:156  
##  Bahrain            : 12   Max.   :2014                75+ years  :156  
##  (Other)            :864                                                
##   suicides_no        population       suicides.100k.pop
##  Min.   :    0.0   Min.   :     960   Min.   :  0.000  
##  1st Qu.:    4.0   1st Qu.:  172142   1st Qu.:  1.268  
##  Median :   29.0   Median :  525932   Median :  5.565  
##  Mean   :  238.2   Mean   : 2042796   Mean   : 11.011  
##  3rd Qu.:  126.0   3rd Qu.: 1677010   3rd Qu.: 14.178  
##  Max.   :11455.0   Max.   :41858354   Max.   :124.450  
##                                                        
##                   country.year  HDI.for.year    gdp_for_year....   
##  Antigua and Barbuda2014: 12   Min.   :0.6270   Min.   :7.252e+08  
##  Argentina2014          : 12   1st Qu.:0.7500   1st Qu.:3.134e+10  
##  Armenia2014            : 12   Median :0.8180   Median :1.180e+11  
##  Australia2014          : 12   Mean   :0.8085   Mean   :7.185e+11  
##  Austria2014            : 12   3rd Qu.:0.8830   3rd Qu.:4.993e+11  
##  Bahrain2014            : 12   Max.   :0.9440   Max.   :1.743e+13  
##  (Other)                :864   NA's   :36                          
##  gdp_per_capita....        generation 
##  Min.   :  1465     Boomers     :156  
##  1st Qu.:  8849     Generation X:156  
##  Median : 15950     Generation Z:156  
##  Mean   : 27420     Millenials  :312  
##  3rd Qu.: 41869     Silent      :156  
##  Max.   :126352                       
## 
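
The first column name is printed as ï..country, which typically happens when a CSV file starts with a UTF-8 byte order mark that is not decoded as such. The trailing dots on the GDP columns most likely come from characters that R replaced to make the names syntactically valid. A minimal clean-up sketch follows. It runs on a toy data frame (toy, with illustrative values only) that merely reproduces some of the header names shown above; the cleaned names are my own choice, not part of the original analysis.

```r
# Toy stand-in that only reproduces some of the header names printed above.
# The values are illustrative, not taken from the dataset.
toy <- data.frame(
  col1 = c("Argentina", "Armenia"),
  col2 = c(2014L, 2014L),
  col3 = c(1, 2),
  col4 = c(3, 4)
)
names(toy) <- c("\u00ef..country", "year", "gdp_for_year....", "suicides.100k.pop")

# Rename the first column by position, so the fix does not depend on
# how the byte order mark happened to be decoded.
names(toy)[1] <- "country"

# Drop the trailing dots left behind on other column names.
names(toy) <- sub("\\.+$", "", names(toy))

print(names(toy))
```

On the real data the same two assignments would apply to data. Alternatively, passing fileEncoding = "UTF-8-BOM" to read.csv should let R strip the mark while reading, assuming the file is indeed UTF-8 with a byte order mark.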

Research question

Is the suicide rate affected by Human Development Index (HDI) score and generation?

The research question about the relationship between suicide rates and millennials came to mind after reading the article cited below:

https://www.businessinsider.com/perfectionism-causing-more-early-deaths-and-suicides-among-millennials-2018-9

Regarding HDI: as life expectancy and other human development measures rise in a country, lower suicide rates may be expected.

Cases

There are 936 cases in the dataset. Each case summarizes the number of suicides in 2014 for one country, sex and age group, and each age group is labelled with a generation. The generation categories are Millenials, Boomers, Generation X, Generation Z and Silent.

summary(data$generation)
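
The generation counts can be cross-checked against the age groups. In the first six rows shown earlier each age group maps to one generation, and two age groups (15-24 and 25-34 years) both map to Millenials, which would explain why that level has twice as many cases as the others. The sketch below reproduces the counts from that mapping. It assumes the mapping seen for the first country holds for every country in the 2014 subset, and the object names are hypothetical.

```r
# Age group to generation mapping read off the first six rows of the 2014 subset.
# Assumption: the same mapping applies to every country.
age_to_generation <- c(
  "5-14 years"  = "Generation Z",
  "15-24 years" = "Millenials",
  "25-34 years" = "Millenials",
  "35-54 years" = "Generation X",
  "55-74 years" = "Boomers",
  "75+ years"   = "Silent"
)

# summary(data) reports 156 cases per age group; add them up per generation.
cases_per_age_group <- rep(156, length(age_to_generation))
cases_per_generation <- tapply(cases_per_age_group, age_to_generation, sum)

print(cases_per_generation)
```

On the loaded data, table(data$age, data$generation) would show the same relationship directly.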

Data collection

The data were collected from Kaggle. The original dataset covers 1985 to 2016 and contains socioeconomic data and suicide rates by year and country. I formed a subset using only the 2014 data, which is the most recent year with the largest sample size. The subset can be found at https://raw.githubusercontent.com/zahirf/Data606/master/raw.csv

Type of study

This is an observational study.

Data Source

**https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016**

Dependent Variable

The dependent variable is the suicide rate, that is, suicides as a proportion of the population (the dataset also reports it per 100,000 people in the suicides.100k.pop column). It is a continuous quantitative variable. I intend to run inference on the differences in suicide rates between generation categories, and to compute the correlation between HDI scores and suicide rates.
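
As a rough illustration of both planned analyses, the sketch below computes a rate per 100,000 from counts, compares two pooled proportions with prop.test, and correlates HDI with the rate using cor.test. All numbers are made up for illustration and do not come from the dataset; the object names are hypothetical.

```r
# Hypothetical pooled counts for two generation categories (not real data).
suicides   <- c(group_a = 120, group_b = 90)
population <- c(group_a = 1.0e6, group_b = 1.2e6)

# Rate per 100,000 people, the same scale as suicides.100k.pop.
rate_per_100k <- suicides / population * 1e5
print(rate_per_100k)

# Two-sample test for a difference in proportions between the two groups.
prop_result <- prop.test(x = suicides, n = population)
print(prop_result$p.value)

# Hypothetical country-level HDI scores and rates (not real data).
hdi  <- c(0.63, 0.75, 0.82, 0.88, 0.94)
rate <- c(4.1, 9.3, 7.8, 12.5, 11.0)

# Pearson correlation with a test of the null hypothesis of zero correlation.
cor_result <- cor.test(hdi, rate)
print(cor_result$estimate)
```

Whether a test of proportions is appropriate for the real data depends on how the cases are pooled, so this is only one possible setup.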

Independent Variable

There are two independent variables. The first is the generation that the group in each case belongs to. It is a qualitative variable. The second is the Human Development Index score of the country that the group belongs to. It is a continuous quantitative variable, and it is missing for 36 cases.

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

summary(data$suicides_no)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     4.0    29.0   238.2   126.0 11455.0
summary(data$HDI.for.year)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.6270  0.7500  0.8180  0.8085  0.8830  0.9440      36
summary(data$generation)
##      Boomers Generation X Generation Z   Millenials       Silent 
##          156          156          156          312          156
plot(aggregate(data$suicides_no/data$population, by=list(Category=data$generation), FUN=sum))

plot(aggregate(data$suicides_no/data$population, by=list(Category=data$HDI.for.year), FUN=sum))
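
Summing row-level ratios, as the two plots above do, gives more weight to groups with more rows (Millenials has 312 rows against 156 for each of the others). One alternative, sketched here on a toy data frame with made-up values, is to pool counts first and divide afterwards, and to use only rows with a non-missing HDI when correlating. The data frame toy and the new column names are hypothetical; the other column names mirror those of the dataset.

```r
# Toy data with the same column names as the dataset; values are made up.
toy <- data.frame(
  generation   = c("Millenials", "Millenials", "Boomers", "Silent", "Boomers"),
  suicides_no  = c(10, 20, 30, 5, 15),
  population   = c(100000, 200000, 150000, 50000, 150000),
  HDI.for.year = c(0.80, 0.80, 0.90, NA, 0.75)
)

# Pool suicides and population per generation, then compute one rate per group.
pooled <- aggregate(cbind(suicides_no, population) ~ generation,
                    data = toy, FUN = sum)
pooled$rate_per_100k <- pooled$suicides_no / pooled$population * 1e5
print(pooled)

# Row-level rate, then correlation with HDI using only complete pairs.
toy$rate_per_100k <- toy$suicides_no / toy$population * 1e5
hdi_cor <- cor(toy$HDI.for.year, toy$rate_per_100k, use = "complete.obs")
print(hdi_cor)
```

With the real data, replacing toy by data and passing pooled to a bar plot would give a per-generation comparison that does not depend on the number of rows in each group.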