R TUTORIAL (I)

Note: If you get an SSL error when downloading data from GitHub, you can try this:

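# width = 320 keeps wide tables on one line when printed;
# ssl.verifypeer = FALSE turns off SSL certificate checking
options(width = 320, RCurlOptions = list(verbose = FALSE,
    capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"),
    ssl.verifypeer = FALSE))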
getOption("RCurlOptions")  # just to confirm ssl status!
## $verbose
## [1] FALSE
## 
## $capath
## [1] "/Library/Frameworks/R.framework/Versions/3.0/Resources/library/RCurl/CurlSSL/cacert.pem"
## 
## $ssl.verifypeer
## [1] FALSE

But first, try without the lines above: ssl.verifypeer = FALSE switches off certificate verification, so treat it as a last resort.

This is our first “basic” step-by-step tutorial, in which I simply show how to collect some data.

In general, social data come in a spreadsheet-like format, where different columns can hold different types of values (numeric or string), and numeric columns can themselves be measured on different scales. However, data collected from social events may not follow a strict protocol, which can make them challenging to work with. For instance, take a look at the image below:

If the source has been identified and it looks like the image above, R gives us several options. First, we can copy and paste the data into an MS Excel file and save it as CSV. Once the data are in that format, we can load them quickly with a simple command:

# Change this path to the folder where you saved warI.csv
setwd("~/Documents/GITHUBrepositories/Tutorials/TemplatesR")
DATA_LocalFile = read.csv("warI.csv")
head(DATA_LocalFile)
##   Start                      Finish                   Conflict                              Combatants Fatalities
## 1  1947                        1948 Indo-Pakistani War of 1947                    Pakistan \xd0  India     ~3,000
## 2  1950 Present (CF signed in 1953)                 Korean War              South Korea -  North Korea 2 419 010+
## 3  1962                        1962            Sino-Indian War                         PRC \xd0  India     ~4,000
## 4  1965                        1965 Indo-Pakistani War of 1965                    Pakistan \xd0  India     ~6,800
## 5  1966                        1989   South African Border War               Angola \xd0  South Africa    Unknown
## 6  1967                        1967             Chola incident  India \xd0  People's Republic of China         ~5
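Changing the working directory with setwd() ties the script to one machine. A minimal alternative, sketched below, is to give read.csv() a full path instead; the file name warI_sample.csv and its rows are made up so the snippet runs anywhere.

```r
# Write a tiny made-up CSV to a temporary folder so the example is self-contained
csv_path <- file.path(tempdir(), "warI_sample.csv")
writeLines(c("Start,Finish,Conflict",
             "1947,1948,Indo-Pakistani War of 1947",
             "1950,Present (CF signed in 1953),Korean War"),
           csv_path)

# Read it through its full path instead of calling setwd() first;
# stringsAsFactors = FALSE keeps text columns as plain strings
wars_local <- read.csv(csv_path, stringsAsFactors = FALSE)
str(wars_local)
```

With the tutorial's own file this would be something like read.csv(file.path("~/Documents/GITHUBrepositories/Tutorials/TemplatesR", "warI.csv")). The \xd0 characters in the output above look like en dashes stored in the Mac Roman encoding that Excel for Mac may use; if so, adding fileEncoding = "macintosh" may repair them, provided your system's iconv supports that encoding (see iconvlist()).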

It is sometimes a good idea to store the file in Google Drive, so that you do not depend on your own hard drive being available, as shown below:

The previous steps are described at http://blog.revolutionanalytics.com/.
In our case, the process is:

require(RCurl)
## Loading required package: RCurl
## Loading required package: bitops
CsvGDocs <- getURL("https://docs.google.com/spreadsheet/pub?key=0AhVqDdZgThPldEZpWlc2Z3FuVm8taTlBVlc1a3VnT2c&output=csv")
DATA_Gdocs = read.csv(text = CsvGDocs)
head(DATA_Gdocs)
##   Start                      Finish                   Conflict                                Combatants Fatalities
## 1  1947                        1948 Indo-Pakistani War of 1947                   Pakistan Ã\u0090  India     ~3,000
## 2  1950 Present (CF signed in 1953)                 Korean War                South Korea -  North Korea 2 419 010+
## 3  1962                        1962            Sino-Indian War                        PRC Ã\u0090  India     ~4,000
## 4  1965                        1965 Indo-Pakistani War of 1965                   Pakistan Ã\u0090  India     ~6,800
## 5  1966                        1989   South African Border War              Angola Ã\u0090  South Africa    Unknown
## 6  1967                        1967             Chola incident India Ã\u0090  People's Republic of China         ~5
str(DATA_Gdocs)
## 'data.frame':    20 obs. of  5 variables:
##  $ Start     : int  1947 1950 1962 1965 1966 1967 1969 1971 1979 1980 ...
##  $ Finish    : Factor w/ 18 levels "1948","1962",..: 1 18 2 3 12 4 5 6 13 11 ...
##  $ Conflict  : Factor w/ 20 levels "1982 EthiopianÃ\u0090Somali Border War",..: 7 12 16 8 19 5 17 9 18 10 ...
##  $ Combatants: Factor w/ 15 levels "Angola Ã\u0090  South Africa",..: 10 13 11 10 1 7 12 10 15 8 ...
##  $ Fatalities: Factor w/ 16 levels "~1,000","~10",..: 4 14 5 10 16 9 16 6 15 12 ...
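The str() output shows Finish stored as a factor, because before R 4.0.0 read.csv() converted strings to factors by default. The sketch below passes a few made-up rows through read.csv(text = ...), just as with the Google Docs download, to show why such a column should not be converted with as.numeric() directly.

```r
# A few made-up rows shaped like the Google Docs CSV, held in a string
csv_text <- "Start,Finish
1947,1948
1950,Present (CF signed in 1953)
1962,1962
1965,1965"

# stringsAsFactors = TRUE mimics the pre-R 4.0.0 default seen in str() above
wars <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Pitfall: as.numeric() on a factor returns the internal level codes, not the years
as.numeric(wars$Finish)

# Convert through the labels instead; "Present (...)" cannot be a number and becomes NA
finish_years <- suppressWarnings(as.numeric(as.character(wars$Finish)))
finish_years
```

Going through as.character() recovers the labels first; the warning about the non-numeric entry is suppressed on purpose. Passing stringsAsFactors = FALSE when reading avoids the factor step altogether.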

There are other alternatives, such as Dropbox, but a much better one may be GitHub, where you can store not only your data but, especially, your code:

library(RCurl)
DATA_Github <- getURL("https://raw.github.com/MAGALLANESJoseManuel/SocialScienceDataTools/master/TemplatesR/warI.csv")
DATA_git <- read.csv(text = DATA_Github)
head(DATA_git)
##   Start                      Finish                   Conflict                           Combatants Fatalities
## 1  1947                        1948 Indo-Pakistani War of 1947                    Pakistan Ð  India     ~3,000
## 2  1950 Present (CF signed in 1953)                 Korean War           South Korea -  North Korea 2 419 010+
## 3  1962                        1962            Sino-Indian War                         PRC Ð  India     ~4,000
## 4  1965                        1965 Indo-Pakistani War of 1965                    Pakistan Ð  India     ~6,800
## 5  1966                        1989   South African Border War               Angola Ð  South Africa    Unknown
## 6  1967                        1967             Chola incident  India Ð  People's Republic of China         ~5
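The raw-file URL used above follows a fixed pattern (host/user/repository/branch/path), so a small helper can build it from its parts. raw_github_url() is a hypothetical function written for this tutorial; GitHub now serves raw files from raw.githubusercontent.com, which can be passed as host.

```r
# Hypothetical helper: assemble the URL of a file's raw contents on GitHub
raw_github_url <- function(user, repo, path, branch = "master",
                           host = "https://raw.github.com") {
  # Join the pieces with "/" in the order GitHub expects
  paste(host, user, repo, branch, path, sep = "/")
}

u <- raw_github_url("MAGALLANESJoseManuel", "SocialScienceDataTools",
                    "TemplatesR/warI.csv")
u
```

The result can then be fed to the same two steps as before, for example DATA_git <- read.csv(text = getURL(u)).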

We can also get the table directly from where it was published (Wikipedia):

Link = "http://en.wikipedia.org/wiki/List_of_border_wars"
whichTable = 3  # position of the wanted table among the page's <table> elements
library(XML)
wikiDataWWW = getNodeSet(htmlParse(Link), "//table")[[whichTable]]
DATA_www = readHTMLTable(wikiDataWWW)
head(DATA_www)
##      V1                          V2                         V3                                 V4         V5
## 1 Start                      Finish                   Conflict                         Combatants Fatalities
## 2  1947                        1948 Indo-Pakistani War of 1947      Pakistan â\u0080\u0093  India     ~3,000
## 3  1950 Present (CF signed in 1953)                 Korean War         South Korea -  North Korea 2 419 010+
## 4  1962                        1962            Sino-Indian War           PRC â\u0080\u0093  India     ~4,000
## 5  1965                        1965 Indo-Pakistani War of 1965      Pakistan â\u0080\u0093  India     ~6,800
## 6  1966                        1989   South African Border War Angola â\u0080\u0093  South Africa    Unknown

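Evidently, some more work is needed (for instance, the real header ended up in the first row, and the dashes between combatants are garbled); we will see that later.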
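As a preview of that extra work, the sketch below promotes the first row of a Wikipedia-style table to column names and turns Start and Fatalities into numbers. The data frame is a made-up excerpt shaped like DATA_www, and promote_header() is a hypothetical helper, not part of the XML package.

```r
# Made-up excerpt shaped like DATA_www: the real header sits in row 1
DATA_example <- data.frame(
  V1 = c("Start", "1947", "1950", "1966"),
  V2 = c("Conflict", "Indo-Pakistani War of 1947", "Korean War",
         "South African Border War"),
  V3 = c("Fatalities", "~3,000", "2 419 010+", "Unknown"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: use the first row as column names, then drop it
promote_header <- function(df) {
  names(df) <- as.character(unlist(df[1, ]))
  df <- df[-1, , drop = FALSE]
  rownames(df) <- NULL  # renumber rows from 1
  df
}

wars <- promote_header(DATA_example)
wars$Start <- as.integer(wars$Start)

# Keep digits only: this drops "~", ",", spaces and "+"; "Unknown" becomes ""
digits_only <- gsub("[^0-9]", "", wars$Fatalities)
digits_only[digits_only == ""] <- NA
wars$Fatalities <- as.numeric(digits_only)
str(wars)
```

Keeping only digits throws away the "~" (approximately) and "+" (at least) qualifiers, and this simple rule would wrongly merge a range such as "1,000-2,000" into a single number, so check the column before relying on it.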