R Script for data management and visualization

I had some trouble configuring Python, so I decided to do the first exercise in R; later I will fix my installation and continue the project in Python. I tried to use the same or similar commands as in Python.

setwd("~/Documentos/Coursera/DMV")  # set the working directory
options(scipen = 10)                # avoid scientific notation in printed output
datos <- read.csv("gapminder.csv")  # read the dataset into a data frame
names(datos)                        # list the variable names

##  [1] "country"              "incomeperperson"      "alcconsumption"      
##  [4] "armedforcesrate"      "breastcancerper100th" "co2emissions"        
##  [7] "femaleemployrate"     "hivrate"              "internetuserate"     
## [10] "lifeexpectancy"       "oilperperson"         "polityscore"         
## [13] "relectricperperson"   "suicideper100th"      "employrate"          
## [16] "urbanrate"

My chosen variables are incomeperperson, suicideper100th and co2emissions.

Number of missing cases for each variable (the TRUE column counts the missing values):

table(is.na(datos$incomeperperson))

## 
## FALSE  TRUE 
##   190    23

table(is.na(datos$co2emissions))

## 
## FALSE  TRUE 
##   200    13

table(is.na(datos$suicideper100th))

## 
## FALSE  TRUE 
##   191    22
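The three counts above can also be obtained in a single call. The following is a minimal sketch, assuming a small made-up data frame (`toy`, a hypothetical stand-in for `datos`) so that it runs without the CSV file:

```r
# Toy stand-in for the gapminder data frame; all values are made up.
toy <- data.frame(
  incomeperperson = c(1200, NA, 8500, 300, 15000),
  co2emissions    = c(5e6, 2e8, NA, 1e7, 3e9),
  suicideper100th = c(9.1, 4.3, 12.7, NA, 6.0)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column,
# i.e. the number of missing values in each variable.
missing_counts <- colSums(is.na(toy))
print(missing_counts)
```

With the real data, the same call would be applied to `datos[, c("incomeperperson", "co2emissions", "suicideper100th")]`.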

To obtain frequency tables we work only with complete cases, that is, countries with no missing value in any of the three variables:

# keep only the rows where none of the three variables is missing
subset1 <- datos[!is.na(datos$incomeperperson) & !is.na(datos$co2emissions)
                 & !is.na(datos$suicideper100th),
                 c("incomeperperson", "co2emissions", "suicideper100th")]
dim(subset1)

## [1] 173   3

So there are 173 countries with all three variables available.
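The same filter can be written more compactly with `complete.cases()`. This is only a sketch on a made-up data frame (`toy` and its values are hypothetical, not the gapminder data):

```r
# Toy stand-in for the gapminder data frame; all values are made up.
toy <- data.frame(
  country         = c("A", "B", "C", "D", "E"),
  incomeperperson = c(1200, NA, 8500, 300, 15000),
  co2emissions    = c(5e6, 2e8, NA, 1e7, 3e9),
  suicideper100th = c(9.1, 4.3, 12.7, NA, 6.0)
)

vars <- c("incomeperperson", "co2emissions", "suicideper100th")

# complete.cases() is TRUE for rows with no NA in the selected columns.
keep <- complete.cases(toy[, vars])
complete_rows <- toy[keep, vars]

dim(complete_rows)  # rows kept, number of variables
```

Selecting the columns before calling `complete.cases()` matters: a missing value in a variable that is not part of the analysis should not cause a country to be dropped.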

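Now we can obtain frequency tables for the three variables. Note that breaks=5 is only a suggestion to hist(), which chooses "pretty" breakpoints, so the number of bins differs from one variable to the next.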

as.data.frame(hist(subset1$incomeperperson, breaks = 5)[c("mids", "counts", "density")])

##    mids counts         density
## 1  5000    136 0.0000786127168
## 2 15000     15 0.0000086705202
## 3 25000     12 0.0000069364162
## 4 35000      9 0.0000052023121
## 5 45000      0 0.0000000000000
## 6 55000      1 0.0000005780347

as.data.frame(hist(subset1$co2emissions, breaks = 5)[c("mids", "counts", "density")])

##           mids counts               density
## 1  25000000000    170 0.0000000000196531792
## 2  75000000000      1 0.0000000000001156069
## 3 125000000000      1 0.0000000000001156069
## 4 175000000000      0 0.0000000000000000000
## 5 225000000000      0 0.0000000000000000000
## 6 275000000000      0 0.0000000000000000000
## 7 325000000000      1 0.0000000000001156069

as.data.frame(hist(subset1$suicideper100th, breaks = 5)[c("mids", "counts", "density")])

##   mids counts     density
## 1    5    103 0.059537572
## 2   15     57 0.032947977
## 3   25     11 0.006358382
## 4   35      2 0.001156069
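As an alternative to reading the counts off `hist()`, a frequency table can be built with `cut()` and `table()`, which also makes the bin boundaries explicit. The sketch below uses a short made-up vector (`values` is hypothetical, not taken from the dataset) and bins of width 10:

```r
# Made-up values standing in for a variable such as suicideper100th.
values <- c(2.1, 4.8, 7.5, 11.0, 13.2, 18.9, 22.4, 31.0)

# cut() assigns each value to an interval; by default intervals are
# closed on the right, e.g. (0,10], (10,20], ...
bins <- cut(values, breaks = seq(0, 40, by = 10))

# table() counts the values per interval; prop.table() turns counts into shares.
freq <- table(bins)
freq_table <- data.frame(
  interval = names(freq),
  count    = as.vector(freq),
  percent  = as.vector(100 * prop.table(freq))
)
print(freq_table)
```

Unlike the density column returned by `hist()`, the percent column here sums to 100, which is usually easier to read in a frequency table.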

Jorge de la Vega

August 7, 2016