Borjan data analysis

Borjan Jovanov

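#load the packages used below (%>%, filter, distinct come from dplyr)
library(dplyr)
library(ggplot2)

#open the data
countries <- read.csv("~/Downloads/country_indicators_merged.csv")
gdi <- read.csv("~/Downloads/data.csv")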

#combine the data
data = merge(countries,gdi,by = "country")

#remove rows with a missing gini index value
mydf = data %>%
   filter(!is.na(gini.index))

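#remove duplicates: keep one row per country
#(deduplicating on gini.index would drop countries that happen to share a value)
mydf = mydf %>%
   ungroup() %>%
   distinct(country, .keep_all = TRUE)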

Here we can plot the descriptives: which countries have the highest and lowest Gini index

#bar chart of the gini index by country, ordered from lowest to highest
#fill is mapped to gini.index so the colour palette below takes effect
q = ggplot(mydf, aes(x = reorder(country, gini.index), y = gini.index, fill = gini.index)) +
  geom_col(width = 0.5)
q = q + labs(title = "Gini index by country", x = "country", y = "gini index") +
  theme_bw() +
  theme(axis.text.x = element_text(size = rel(0.68), angle = 90))
q + scale_fill_distiller(palette = "RdPu")
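
The highest and lowest countries can also be listed as a table rather than read off the chart. The following is a minimal sketch on a made-up toy data frame: the country labels and values are placeholders, not real Gini figures, and only the column names `country` and `gini.index` are taken from the code above.

```r
library(dplyr)

# Toy stand-in for mydf; the values are invented for illustration only
toy <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  gini.index = c(25.1, 48.3, 33.0, 60.2, 41.7)
)

# Three highest values, sorted from highest down
highest <- toy %>% slice_max(gini.index, n = 3)

# Three lowest values, sorted from lowest up
lowest <- toy %>% slice_min(gini.index, n = 3)
```

Replacing `toy` with `mydf` would give the same listing for the merged data, assuming it still has one row per country.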

Here we can plot the analyses: which factors predict the Gini index most accurately, and whether we can build predictive models
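
One possible starting point is an ordinary linear regression with a held-out test set. The sketch below runs on simulated data, because the predictor columns in the merged file are not listed here: the names `x1`, `x2`, `x3` are hypothetical stand-ins for whichever indicators are chosen, and the coefficients used to simulate the outcome are arbitrary.

```r
set.seed(1)

# Simulated stand-in for mydf; x1, x2, x3 are hypothetical predictor names
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$gini.index <- 40 + 5 * toy$x1 - 2 * toy$x2 + rnorm(n)

# Hold out 20 rows to check accuracy on data the model has not seen
test_rows <- sample(n, size = 20)
train <- toy[-test_rows, ]
test <- toy[test_rows, ]

# Fit a linear model of the gini index on all three predictors
fit <- lm(gini.index ~ x1 + x2 + x3, data = train)

# Coefficient table: estimates, standard errors, t values, p values
coefs <- summary(fit)$coefficients

# Root mean squared error on the held-out rows
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$gini.index - pred)^2))
```

In the coefficient table, predictors with large absolute t values are the ones the model relies on most, and `rmse` gives a rough sense of predictive accuracy in Gini index units. With real indicators on different scales, standardizing the predictors first would make the coefficients easier to compare.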

References

Please add any references about the Gini index, the topic, or how the data were obtained