I had some trouble configuring Python, so I decided to do the first exercise in R; I will fix my installation later and continue the project in Python. I tried to use the same or similar commands as in Python.
setwd("~/Documentos/Coursera/DMV") #set working directory
options(scipen = 10) #avoid scientific notation when printing numbers
datos <- read.csv("gapminder.csv") #read the dataset into a dataframe
names(datos) #lists the names of the variables
## [1] "country" "incomeperperson" "alcconsumption"
## [4] "armedforcesrate" "breastcancerper100th" "co2emissions"
## [7] "femaleemployrate" "hivrate" "internetuserate"
## [10] "lifeexpectancy" "oilperperson" "polityscore"
## [13] "relectricperperson" "suicideper100th" "employrate"
## [16] "urbanrate"
My chosen variables are incomeperperson, suicideper100th and co2emissions.
Number of missing cases:
table(is.na(datos$incomeperperson))
##
## FALSE TRUE
## 190 23
table(is.na(datos$co2emissions))
##
## FALSE TRUE
## 200 13
table(is.na(datos$suicideper100th))
##
## FALSE TRUE
## 191 22
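The three per-variable counts above can also be obtained in one call by applying is.na() to all chosen columns at once. This is a minimal sketch, assuming a small made-up data frame (here called toy, a hypothetical stand-in for datos) so that it runs without gapminder.csv:

```r
# Toy stand-in for `datos`; the values are made up for illustration only.
toy <- data.frame(
  incomeperperson = c(1200, NA, 30500, 870),
  co2emissions    = c(5.1e8, 2.3e9, NA, NA),
  suicideper100th = c(9.8, 12.1, 4.4, NA)
)
vars <- c("incomeperperson", "co2emissions", "suicideper100th")

# is.na() on a data frame returns a logical matrix;
# colSums() then counts the TRUE values (missing cases) per column.
missing_counts <- colSums(is.na(toy[, vars]))
print(missing_counts)
```

With the real data, replacing toy with datos should reproduce the TRUE counts from the three tables above.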
To obtain the frequencies, we need to work only with complete cases, i.e. rows where all three variables are present:
vars <- c("incomeperperson", "co2emissions", "suicideper100th") #the three chosen variables
subset1 <- datos[complete.cases(datos[, vars]), vars] #keep only rows with no NA in those columns
dim(subset1)
## [1] 173 3
So there are 173 countries with all three variables available.
Now we can obtain frequency tables for the three variables:
as.data.frame(hist(subset1$incomeperperson,breaks=5)[c(4,2,3)])
## mids counts density
## 1 5000 136 0.0000786127168
## 2 15000 15 0.0000086705202
## 3 25000 12 0.0000069364162
## 4 35000 9 0.0000052023121
## 5 45000 0 0.0000000000000
## 6 55000 1 0.0000005780347
as.data.frame(hist(subset1$co2emissions,breaks=5)[c(4,2,3)])
## mids counts density
## 1 25000000000 170 0.0000000000196531792
## 2 75000000000 1 0.0000000000001156069
## 3 125000000000 1 0.0000000000001156069
## 4 175000000000 0 0.0000000000000000000
## 5 225000000000 0 0.0000000000000000000
## 6 275000000000 0 0.0000000000000000000
## 7 325000000000 1 0.0000000000001156069
as.data.frame(hist(subset1$suicideper100th,breaks=5)[c(4,2,3)])
## mids counts density
## 1 5 103 0.059537572
## 2 15 57 0.032947977
## 3 25 11 0.006358382
## 4 35 2 0.001156069
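Note that when breaks is a single number, hist() treats it only as a suggestion and picks "pretty" breakpoints, which is why the tables above have 6, 7 and 4 rows rather than 5. If fixed bins are wanted, one option is to set the bin edges explicitly with cut() and count with table(). The following is a minimal sketch, assuming a made-up vector x (a hypothetical stand-in for subset1$suicideper100th) and bin edges chosen only for illustration:

```r
# Toy stand-in for subset1$suicideper100th; values are made up for illustration.
x <- c(2.1, 4.8, 7.3, 9.9, 12.5, 14.0, 18.2, 23.7, 31.4, 36.0)

# Explicit bin edges; right = TRUE gives right-closed intervals of the form (a, b].
bins <- cut(x, breaks = seq(0, 40, by = 10), right = TRUE)

# Count observations per bin and express the counts as percentages.
counts <- table(bins)
freq <- data.frame(
  bin     = names(counts),
  count   = as.vector(counts),
  percent = round(100 * as.vector(prop.table(counts)), 1)
)
print(freq)
```

Unlike the density column returned by hist(), which is the count divided by the number of observations times the bin width, the percent column here sums to 100 and is easier to read as a frequency distribution.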