Data Preparation

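# Load data
library(dplyr)

# GitHub repo: https://github.com/robertwelk/DATA606.git
# The file starts with a UTF-8 byte-order mark; reading it as 'UTF-8-BOM'
# keeps the first column name as `country` instead of `ï..country`.
dat <- read.csv('DATA606_project.csv', fileEncoding = 'UTF-8-BOM') %>%
  as_tibble() %>%
  # Limit the cases to 2010 statistics for males only
  filter(year == 2010, sex == 'male') %>%
  # Keep and rename the columns used in the analysis
  select(country, year, age,
         suicide.rate = suicides.100k.pop, gdp = gdp_per_capita....)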

head(dat)

Research question

Was the suicide rate for males in 2010 dependent on age group and GDP per capita?
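
One way to examine this question is an additive linear model with age group as a categorical predictor and GDP per capita as a quantitative one. The sketch below is only an illustration under that assumption: `toy` and its values are simulated, not the Kaggle data, and the proposal itself does not commit to this model.

```r
# Simulated illustration data: NOT the Kaggle values.
set.seed(1)
toy <- data.frame(
  age = factor(rep(c("15-24 years", "35-54 years", "75+ years"), each = 10)),
  gdp = rep(seq(2000, 47000, by = 5000), times = 3)
)

# Assumed for illustration: the rate differs by age group and not by gdp.
toy$suicide.rate <- c(10, 20, 35)[as.integer(toy$age)] +
  rnorm(nrow(toy), sd = 3)

# Additive model: categorical age plus quantitative gdp.
fit <- lm(suicide.rate ~ age + gdp, data = toy)

# Coefficient table: intercept, one row per non-reference age group, and gdp.
summary(fit)$coefficients
```

On the real data the same formula could be fit with `data = dat`.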

Cases

Each case is an age group of males within a country in the study. There are 528 cases: 6 age groups with 88 observations each.
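
The case count can be checked directly from the filtered data. The sketch below uses a small made-up stand-in (`toy`, with three hypothetical countries) in place of the real `dat`, so its numbers are illustrative only.

```r
# Toy stand-in for `dat`: three hypothetical countries, not the real data.
age_levels <- c("5-14 years", "15-24 years", "25-34 years",
                "35-54 years", "55-74 years", "75+ years")
toy <- expand.grid(country = c("A", "B", "C"),
                   age = age_levels,
                   stringsAsFactors = FALSE)

# One row per country-by-age-group combination.
n_cases <- nrow(toy)

# Number of rows in each age group.
cases_per_age <- table(toy$age)

n_cases
cases_per_age
```

Run against the real `dat`, the same two calls should agree with the 528 cases and the 88 rows per age group reported in this document.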

Data collection

This dataset was published on Kaggle and compiled from other sources, including the United Nations Development Programme, the World Bank, and the World Health Organization.

Type of study

This is an observational study.

Data Source

https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016#master.csv

Dependent Variable

The response variable is the suicide rate per 100,000 people; it is quantitative.

Independent Variable

Quantitative independent variable: GDP per capita

Qualitative independent variable: age group

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g., scatter plots, boxplots). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

# summary stats for each variable
library(ggplot2)

# factor() gives per-group counts even if age was read in as character
summary(factor(dat$age))
## 15-24 years 25-34 years 35-54 years  5-14 years 55-74 years   75+ years 
##          88          88          88          88          88          88
summary(dat$gdp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     991    7008   13818   23857   36326  111328
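# Variation in gdp by country. Each country appears once per age group,
# so keep one row per country to avoid stacking the same gdp value six times.
dat %>%
  distinct(country, gdp) %>%
  ggplot(aes(x = country, y = gdp)) +
  geom_col() +
  coord_flip()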

#distribution of suicide rate by age
dat %>% ggplot(aes(x=age,y=suicide.rate)) + geom_boxplot()

# suicide rate as a function of gdp - not much of a relationship
dat %>% ggplot(aes(x=gdp,y=suicide.rate)) + geom_point()
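
The numeric summaries above cover `age` and `gdp` but not the response. A minimal sketch of how the suicide rate could be summarized overall and within each age group follows; `toy` holds made-up values standing in for `dat`, so the printed numbers say nothing about the real data.

```r
# Made-up values standing in for `dat`; not the Kaggle data.
toy <- data.frame(
  age = rep(c("15-24 years", "75+ years"), each = 4),
  suicide.rate = c(8, 12, 10, 14, 30, 50, 40, 20)
)

# Overall summary of the response variable.
summary(toy$suicide.rate)

# Median and mean suicide rate within each age group.
med_by_age  <- tapply(toy$suicide.rate, toy$age, median)
mean_by_age <- tapply(toy$suicide.rate, toy$age, mean)

med_by_age
mean_by_age
```

With the real data, replacing `toy` with `dat` gives the numeric counterpart to the boxplot of suicide rate by age.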