R Bridge Course Week 2 Assignment

One of the challenges in working with data is wrangling. In this assignment we will use R to perform several wrangling tasks. Here is a list of data sets: http://vincentarelbundock.github.io/Rdatasets/ We will select one data set from this CSV index.

Selecting a CSV, “Statistics of Deadly Quarrels”

#Statistics of Deadly Quarrels, 'Quarrels' from the HistData package
library( HistData )
#nrow( Quarrels )
#ncol( Quarrels )
head( Quarrels )
##   ID year international colonial revolution nat.grp grp.grpSame grp.grpDif
## 1  1 1914             1        0          0       0           0          0
## 2  2 1914             1        0          0       0           0          0
## 3  3 1914             1        0          0       0           0          0
## 4  4 1914             0        0          0       1           0          0
## 5  5 1914             1        0          0       0           0          0
## 6  6 1914             0        0          0       1           0          0
##   numGroups months pairs monthsPairs logDeaths   deaths exchangeGoods
## 1        16     52    44        1436       7.2 15900000             0
## 2        17     43    44        1436       7.2 15900000             0
## 3        17     52    44        1436       7.2 15900000             0
## 4        15     53    44        1436       7.2 15900000             0
## 5        17     52    44        1436       7.2 15900000             0
## 6        16     33    44        1436       7.2 15900000             0
##   obstacleGoods intermarriageOK intermarriageBan simBody difBody simDress
## 1             1               0                0       0       0        0
## 2             0               0                0       0       0        0
## 3             0               0                0       0       0        0
## 4             0               0                0       0       0        0
## 5             0               0                0       0       0        0
## 6             0               0                0       0       0        0
##   difDress eqWealth difWealth simMariagCust difMariagCust simRelig difRelig
## 1        0        0         0             0             0        0        0
## 2        0        0         0             0             0        1        0
## 3        0        0         0             0             0        1        0
## 4        0        0         0             0             0        0        2
## 5        0        0         0             0             0        1        0
## 6        0        0         0             0             0        0        2
##   philanthropy restrictMigration sameLanguage difLanguage simArtSci travel
## 1            0                 0            0           1         0      0
## 2            0                 0            0           1         0      0
## 3            0                 0            0           1         0      0
## 4            0                 0            0           1         0      0
## 5            1                 0            0           1         0      0
## 6            0                 0            0           1         0      0
##   ignorance simPersLiberty difPersLiberty sameGov sameGovYrs prevConflict
## 1         0              0              0       0          0            0
## 2         0              0              0       0          0            0
## 3         0              0              0       0          0            1
## 4         0              0              0       0          0            0
## 5         0              0              0       0          0            0
## 6         0              0              0       0          0            0
##   prevConflictYrs chronicFighting persFriendship persResentment difLegal
## 1               0               0              0              0        0
## 2               0               0              0              0        0
## 3              55               0              0              0        0
## 4               0               0              0              0        0
## 5               0               0              0              0        0
## 6               0               0              0              0        0
##   nonintervention thirdParty supportEnemy attackAlly rivalsLand rivalsTrade
## 1               0          0            0          0          0           0
## 2               0          0            0          0          0           0
## 3               0          0            0          0          0           0
## 4               0          0            0          0          0           0
## 5               0          0            0          0          0           0
## 6               0          0            0          0          0           0
##   churchPower noExtension territory habitation minerals StrongHold taxation
## 1           0           0         0          0        0          0        0
## 2           0           0         0          0        0          0        0
## 3           0           0         0          0        0          0        0
## 4           0           0         0          0        0          0        0
## 5           0           0         0          0        0          0        0
## 6           0           0         0          0        0          0        0
##   loot objectedWar enjoyFight pride overpopulated fightForPay joinWinner
## 1    0           0          0     0             0           0          0
## 2    0           0          0     0             0           0          0
## 3    0           0          0     0             0           0          0
## 4    0           0          0     0             0           0          0
## 5    0           0          0     0             0           0          0
## 6    0           0          0     0             0           0          0
##   otherDesiredWar propaganda3rd protection sympathy debt prevAllies yearsAllies
## 1               0             0          0        1    0          0           0
## 2               0             0          0        0    0          1          14
## 3               0             0          0        0    0          1          14
## 4               0             0          0        0    0          0           0
## 5               0             0          0        0    0          0           0
## 6               0             0          0        0    0          0           0
##   intermingled interbreeding propadanda orderedObey commerceOther feltStronger
## 1            0             0          0           0             0            0
## 2            0             0          0           0             0            0
## 3            0             0          0           0             0            0
## 4            0             0          0           0             0            0
## 5            0             0          0           0             0            0
## 6            0             0          0           0             0            0
##   competeIntellect insecureGovt prepWar RegionalError CasualtyError Auxiliaries
## 1                0            0       0             3             3           2
## 2                0            0       2             3             3           2
## 3                0            0       2             3             3           2
## 4                0            0       0             2             3           1
## 5                0            0       2             3             3           2
## 6                0            0       0             2             3           1

Task #1:

Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

#Using the summary function to gain an overview of the data set
#colnames( Quarrels )
#dim( Quarrels )
summary( Quarrels )
##        ID             year      international       colonial     
##  Min.   :  1.0   Min.   :1807   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:195.5   1st Qu.:1859   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :390.0   Median :1898   Median :0.0000   Median :0.0000  
##  Mean   :390.0   Mean   :1892   Mean   :0.2721   Mean   :0.1412  
##  3rd Qu.:584.5   3rd Qu.:1925   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :779.0   Max.   :1949   Max.   :1.0000   Max.   :1.0000  
##    revolution        nat.grp        grp.grpSame       grp.grpDif     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.00000  
##  Mean   :0.1399   Mean   :0.1759   Mean   :0.1926   Mean   :0.07831  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
##    numGroups          months           pairs        monthsPairs    
##  Min.   : 2.000   Min.   :  1.00   Min.   : 1.00   Min.   :   1.0  
##  1st Qu.: 3.000   1st Qu.:  6.00   1st Qu.: 2.00   1st Qu.:  13.0  
##  Median : 4.000   Median : 13.00   Median : 4.00   Median :  64.0  
##  Mean   : 6.593   Mean   : 27.25   Mean   :16.51   Mean   : 532.1  
##  3rd Qu.: 8.000   3rd Qu.: 36.00   3rd Qu.:10.50   3rd Qu.: 360.0  
##  Max.   :75.000   Max.   :642.00   Max.   :90.00   Max.   :3010.0  
##    logDeaths         deaths         exchangeGoods obstacleGoods intermarriageOK
##  Min.   :2.400   Min.   :       0   0:742         0:758         0:778          
##  1st Qu.:3.000   1st Qu.:    1000   1: 11         1: 21         1:  1          
##  Median :4.000   Median :   10000   2: 26                                      
##  Mean   :4.418   Mean   : 3272504                                              
##  3rd Qu.:5.400   3rd Qu.:  251000                                              
##  Max.   :7.300   Max.   :20000000                                              
##  intermarriageBan simBody difBody simDress difDress eqWealth difWealth
##  0:778            0:688   0:652   0:725    0:641    0:776    0:729    
##  2:  1            1: 91   1:127   1: 54    1:138    1:  3    1: 50    
##                                                                       
##                                                                       
##                                                                       
##                                                                       
##  simMariagCust difMariagCust simRelig difRelig philanthropy restrictMigration
##  0:774         0:767         0:611    0:438    0:777        0:762            
##  1:  5         1: 12         1:168    1:  9    1:  2        1: 17            
##                                       2:332                                  
##                                                                              
##                                                                              
##                                                                              
##  sameLanguage difLanguage simArtSci travel  ignorance simPersLiberty
##  0:612        0:316       0:775     0:777   0:773     0:779         
##  1:167        1:463       1:  4     1:  1   1:  2                   
##                                     2:  1   2:  4                   
##                                                                     
##                                                                     
##                                                                     
##  difPersLiberty sameGov   sameGovYrs    prevConflict prevConflictYrs
##  0:736          0:545   Min.   :  0.0   0:568        Min.   :  0.0  
##  1: 43          1:234   1st Qu.:  0.0   1:211        1st Qu.:  0.0  
##                         Median :  0.0                Median :  0.0  
##                         Mean   :132.7                Mean   : 10.4  
##                         3rd Qu.:  6.0                3rd Qu.:  1.0  
##                         Max.   :999.0                Max.   :999.0  
##  chronicFighting persFriendship persResentment difLegal nonintervention
##  0:764           0:778          0:749          0:768    0:777          
##  1: 15           1:  1          1: 14          1: 11    1:  2          
##                                 2: 16                                  
##                                                                        
##                                                                        
##                                                                        
##  thirdParty supportEnemy attackAlly rivalsLand rivalsTrade churchPower
##  0:739      0:765        0:757      0:773      0:776       0:768      
##  1: 40      1: 14        1: 22      1:  6      1:  3       1: 11      
##                                                                       
##                                                                       
##                                                                       
##                                                                       
##  noExtension territory habitation minerals StrongHold taxation loot   
##  0:778       0:634     0:771      0:769    0:769      0:751    0:751  
##  1:  1       1: 66     1:  5      1:  7    1:  8      1: 28    1: 26  
##              2: 79     2:  3      2:  3    2:  2               2:  2  
##                                                                       
##                                                                       
##                                                                       
##  objectedWar enjoyFight pride   overpopulated fightForPay joinWinner
##  0:777       0:768      0:766   0:770         0:770       0:770     
##  1:  2       1:  7      1:  7   1:  8         1:  7       1:  9     
##              2:  4      2:  6   2:  1         2:  2                 
##                                                                     
##                                                                     
##                                                                     
##  otherDesiredWar propaganda3rd protection sympathy debt    prevAllies
##  0:734           0:779         0:779      0:737    0:762   0:684     
##  1: 45                                    1: 39    1: 17   1: 95     
##                                           2:  3                      
##                                                                      
##                                                                      
##                                                                      
##   yearsAllies      intermingled interbreeding propadanda orderedObey
##  Min.   :  0.000   0:736        0:776         0:777      0:697      
##  1st Qu.:  0.000   1: 43        1:  3         1:  2      1: 82      
##  Median :  0.000                                                    
##  Mean   :  1.682                                                    
##  3rd Qu.:  0.000                                                    
##  Max.   :116.000                                                    
##  commerceOther feltStronger competeIntellect insecureGovt prepWar
##  0:771         0:771        0:778            0:769        0:753  
##  1:  8         1:  8        1:  1            1:  9        1: 12  
##                                              2:  1        2: 14  
##                                                                  
##                                                                  
##                                                                  
##  RegionalError   CasualtyError    Auxiliaries   
##  Min.   :1.000   Min.   :1.000   Min.   :0.000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:2.000  
##  Median :2.000   Median :3.000   Median :2.000  
##  Mean   :1.837   Mean   :2.616   Mean   :1.901  
##  3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:2.000  
##  Max.   :3.000   Max.   :6.000   Max.   :2.000
#displaying the mean and median of 2 attributes

deaths_descriptiveSTATS <- list( Mean = mean( Quarrels$deaths ),
                                 Median = median( Quarrels$deaths ) )
months_descriptiveSTATS <- list( Mean = mean( Quarrels$months ),
                                 Median = median( Quarrels$months ) )
example_descriptiveSTATS <- list( DeathSTATs = deaths_descriptiveSTATS,
                                  MonthSTATs = months_descriptiveSTATS )
example_descriptiveSTATS
## $DeathSTATs
## $DeathSTATs$Mean
## [1] 3272504
## 
## $DeathSTATs$Median
## [1] 10000
## 
## 
## $MonthSTATs
## $MonthSTATs$Mean
## [1] 27.24775
## 
## $MonthSTATs$Median
## [1] 13
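
The same pair of statistics can be computed for several columns in one pass. Below is a minimal sketch, assuming a small made-up data frame `toy` (hypothetical values, not the Quarrels data) so that it runs without the HistData package:

```r
# Hypothetical toy data standing in for two numeric columns of Quarrels
toy <- data.frame( deaths = c( 1000, 10000, 251000 ),
                   months = c( 6, 13, 36 ) )

# sapply() applies the function to each column and returns a matrix
# with one column per attribute and one row per statistic
toy_stats <- sapply( toy, function( x ) c( Mean = mean( x ), Median = median( x ) ) )
toy_stats
```

Individual values can then be pulled out by name, for example `toy_stats[ 'Median', 'deaths' ]`.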

Task #2:

Create a new data frame with a subset of the columns and rows. Make sure to rename it.

subQuarrels <- Quarrels
subQuarrels <- subQuarrels[, c( 'year', 'international', 'colonial', 'revolution', 'deaths', 'months' ) ]
head( subQuarrels )
##   year international colonial revolution   deaths months
## 1 1914             1        0          0 15900000     52
## 2 1914             1        0          0 15900000     43
## 3 1914             1        0          0 15900000     52
## 4 1914             0        0          0 15900000     53
## 5 1914             1        0          0 15900000     52
## 6 1914             0        0          0 15900000     33
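
The subset above keeps every row and narrows only the columns. A row condition can go in the first index position to narrow both at once. Below is a minimal sketch, assuming a hypothetical `toy` data frame whose column names mirror a few of the Quarrels columns; the cutoff year of 1900 is chosen only for illustration:

```r
# Hypothetical toy data; values are made up for illustration
toy <- data.frame( year = c( 1914, 1868, 1933, 1807 ),
                   colonial = c( 0, 1, 0, 1 ),
                   deaths = c( 15900000, 198000, 20000000, 1000 ),
                   months = c( 52, 110, 143, 6 ) )

# Rows: only years from 1900 onward. Columns: three selected by name.
toy_sub <- toy[ toy$year >= 1900, c( 'year', 'deaths', 'months' ) ]
toy_sub
```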

Task #3:

Create new column names for the new data frame

names( subQuarrels ) <- c( 'Year', 'International', 'Colonial',
                           'Revolution', 'Deaths', 'Months')
head( subQuarrels )
##   Year International Colonial Revolution   Deaths Months
## 1 1914             1        0          0 15900000     52
## 2 1914             1        0          0 15900000     43
## 3 1914             1        0          0 15900000     52
## 4 1914             0        0          0 15900000     53
## 5 1914             1        0          0 15900000     52
## 6 1914             0        0          0 15900000     33
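
Assigning a full vector to names() replaces every column name by position. When only one name needs to change, it can be matched by its current value instead. Below is a minimal sketch on a hypothetical `toy` data frame:

```r
# Hypothetical toy data with three columns
toy <- data.frame( year = c( 1914, 1868 ),
                   deaths = c( 15900000, 198000 ),
                   months = c( 52, 110 ) )

# Rename only the 'deaths' column; the other names are left untouched
names( toy )[ names( toy ) == 'deaths' ] <- 'Deaths'
names( toy )
```

Matching by name avoids depending on column order, which helps if columns are later added or rearranged.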

Task #4:

Use the summary function to create an overview of your new data frame. Print the mean and median for the same two attributes. Please compare.

summary( subQuarrels )
##       Year      International       Colonial        Revolution    
##  Min.   :1807   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1859   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1898   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :1892   Mean   :0.2721   Mean   :0.1412   Mean   :0.1399  
##  3rd Qu.:1925   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :1949   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##      Deaths             Months      
##  Min.   :       0   Min.   :  1.00  
##  1st Qu.:    1000   1st Qu.:  6.00  
##  Median :   10000   Median : 13.00  
##  Mean   : 3272504   Mean   : 27.25  
##  3rd Qu.:  251000   3rd Qu.: 36.00  
##  Max.   :20000000   Max.   :642.00
#class( subQuarrels )
deaths_descriptiveSTATS2 <- list( Mean = mean( subQuarrels$Deaths ),
                                  Median = median( subQuarrels$Deaths ) )
months_descriptiveSTATS2 <- list( Mean = mean( subQuarrels$Months ),
                                  Median = median( subQuarrels$Months ) )
example_descriptiveSTATS2 <- list( DeathSTATs = deaths_descriptiveSTATS2,
                                   MonthSTATs = months_descriptiveSTATS2 )
#display the descriptive stats for the same 2 features (deaths & months)
example_descriptiveSTATS2
## $DeathSTATs
## $DeathSTATs$Mean
## [1] 3272504
## 
## $DeathSTATs$Median
## [1] 10000
## 
## 
## $MonthSTATs
## $MonthSTATs$Mean
## [1] 27.24775
## 
## $MonthSTATs$Median
## [1] 13
#use the function 'identical' to directly compare these descriptive stats with those from the previous calculation.
#identical() returns TRUE, so the values are the same
identical( example_descriptiveSTATS, example_descriptiveSTATS2 )
## [1] TRUE
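
identical() succeeds here because both lists are computed from exactly the same numbers. When values come from different calculations, floating-point rounding can make identical() return FALSE for results that are equal for practical purposes, and all.equal() is the tolerant alternative. Below is a minimal sketch with made-up lists `exact_a` and `exact_b` (hypothetical names, not from the assignment):

```r
# Two lists whose 'Mean' differs only by floating-point rounding
exact_a <- list( Mean = 0.1 + 0.2, Median = 13 )
exact_b <- list( Mean = 0.3, Median = 13 )

# identical() demands an exact match, so the rounding difference matters
strict_match <- identical( exact_a, exact_b )

# all.equal() compares numbers within a small tolerance;
# wrap it in isTRUE() because it returns a message when values differ
tolerant_match <- isTRUE( all.equal( exact_a, exact_b ) )

c( strict = strict_match, tolerant = tolerant_match )
```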

Task #5:

For at least 3 values in a column, please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter ‘e’ in one column. Rename those values so that all 20 would show as excellent.

#reassigning the value 1 in the 'Colonial' and 'Revolution' columns to 2 and 3 respectively
#but first, I will display the first rows before any changes, ordering the data frame so that the entries that will be changed are listed at the top.
head( subQuarrels[ order( -subQuarrels$Colonial ), ] )
##     Year International Colonial Revolution Deaths Months
## 203 1868             0        1          0 198000    110
## 212 1895             0        1          0 200000     41
## 216 1905             0        1          0 251000     18
## 217 1873             0        1          0 251000    421
## 218 1880             0        1          0 251000    177
## 220 1903             0        1          0  79000      5
head( subQuarrels[ order( -subQuarrels$Revolution ), ] )
##     Year International Colonial Revolution   Deaths Months
## 12  1917             0        0          1 15900000     17
## 33  1915             0        0          1 15900000     44
## 55  1933             0        0          1 20000000    143
## 96  1941             0        0          1 20000000      1
## 134 1851             0        0          1  2000000    156
## 135 1861             0        0          1   631000     48
#now to change the values
subQuarrels$Colonial[ subQuarrels$Colonial == 1 ] <- 2
subQuarrels$Revolution[ subQuarrels$Revolution == 1 ] <- 3
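
The task's own example (every 'e' becoming 'excellent') can also be handled with a named lookup vector, which recodes all values of a column in a single step. Below is a minimal sketch; the `grades` vector and the labels for 'g' and 'p' are hypothetical, made up for illustration:

```r
# Hypothetical column of one-letter codes
grades <- c( 'e', 'g', 'p', 'e', 'g' )

# Named vector mapping each old value to its new label
lookup <- c( e = 'excellent', g = 'good', p = 'poor' )

# Indexing the lookup by the old values returns the new labels in order;
# unname() drops the old codes that would otherwise remain as names
renamed <- unname( lookup[ grades ] )
renamed
```

One design note: a code that is missing from `lookup` comes back as NA, which makes unmapped values easy to spot.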

Task #6:

Display enough rows to see examples of all of steps 1-5 above.

#redisplay the first rows, ordered so that we can verify the change
head( subQuarrels[ order( -subQuarrels$Colonial ), ] )
##     Year International Colonial Revolution Deaths Months
## 203 1868             0        2          0 198000    110
## 212 1895             0        2          0 200000     41
## 216 1905             0        2          0 251000     18
## 217 1873             0        2          0 251000    421
## 218 1880             0        2          0 251000    177
## 220 1903             0        2          0  79000      5
head( subQuarrels[ order( -subQuarrels$Revolution ), ] )
##     Year International Colonial Revolution   Deaths Months
## 12  1917             0        0          3 15900000     17
## 33  1915             0        0          3 15900000     44
## 55  1933             0        0          3 20000000    143
## 96  1941             0        0          3 20000000      1
## 134 1851             0        0          3  2000000    156
## 135 1861             0        0          3   631000     48

BONUS!

Place the original .csv in a GitHub file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.

#will start by writing the data
write.csv( Quarrels, file ='Quarrels.csv', row.names = FALSE)
#Great! I see it in my directory, so now I'll beam it up to GitHub.....
#......
#That worked, what a time to be alive.
#RCurl is loaded here, though read.csv() and url() below are base R and do not depend on it
library(RCurl)
## Loading required package: bitops
myURL <- 'https://raw.githubusercontent.com/SmilodonCub/MSDS2020_Bridge/master/Quarrels.csv'
gitQuarrels <- read.csv( url( myURL ) )
head( gitQuarrels )
##   ID year international colonial revolution nat.grp grp.grpSame grp.grpDif
## 1  1 1914             1        0          0       0           0          0
## 2  2 1914             1        0          0       0           0          0
## 3  3 1914             1        0          0       0           0          0
## 4  4 1914             0        0          0       1           0          0
## 5  5 1914             1        0          0       0           0          0
## 6  6 1914             0        0          0       1           0          0
##   numGroups months pairs monthsPairs logDeaths   deaths exchangeGoods
## 1        16     52    44        1436       7.2 15900000             0
## 2        17     43    44        1436       7.2 15900000             0
## 3        17     52    44        1436       7.2 15900000             0
## 4        15     53    44        1436       7.2 15900000             0
## 5        17     52    44        1436       7.2 15900000             0
## 6        16     33    44        1436       7.2 15900000             0
##   obstacleGoods intermarriageOK intermarriageBan simBody difBody simDress
## 1             1               0                0       0       0        0
## 2             0               0                0       0       0        0
## 3             0               0                0       0       0        0
## 4             0               0                0       0       0        0
## 5             0               0                0       0       0        0
## 6             0               0                0       0       0        0
##   difDress eqWealth difWealth simMariagCust difMariagCust simRelig difRelig
## 1        0        0         0             0             0        0        0
## 2        0        0         0             0             0        1        0
## 3        0        0         0             0             0        1        0
## 4        0        0         0             0             0        0        2
## 5        0        0         0             0             0        1        0
## 6        0        0         0             0             0        0        2
##   philanthropy restrictMigration sameLanguage difLanguage simArtSci travel
## 1            0                 0            0           1         0      0
## 2            0                 0            0           1         0      0
## 3            0                 0            0           1         0      0
## 4            0                 0            0           1         0      0
## 5            1                 0            0           1         0      0
## 6            0                 0            0           1         0      0
##   ignorance simPersLiberty difPersLiberty sameGov sameGovYrs prevConflict
## 1         0              0              0       0          0            0
## 2         0              0              0       0          0            0
## 3         0              0              0       0          0            1
## 4         0              0              0       0          0            0
## 5         0              0              0       0          0            0
## 6         0              0              0       0          0            0
##   prevConflictYrs chronicFighting persFriendship persResentment difLegal
## 1               0               0              0              0        0
## 2               0               0              0              0        0
## 3              55               0              0              0        0
## 4               0               0              0              0        0
## 5               0               0              0              0        0
## 6               0               0              0              0        0
##   nonintervention thirdParty supportEnemy attackAlly rivalsLand rivalsTrade
## 1               0          0            0          0          0           0
## 2               0          0            0          0          0           0
## 3               0          0            0          0          0           0
## 4               0          0            0          0          0           0
## 5               0          0            0          0          0           0
## 6               0          0            0          0          0           0
##   churchPower noExtension territory habitation minerals StrongHold taxation
## 1           0           0         0          0        0          0        0
## 2           0           0         0          0        0          0        0
## 3           0           0         0          0        0          0        0
## 4           0           0         0          0        0          0        0
## 5           0           0         0          0        0          0        0
## 6           0           0         0          0        0          0        0
##   loot objectedWar enjoyFight pride overpopulated fightForPay joinWinner
## 1    0           0          0     0             0           0          0
## 2    0           0          0     0             0           0          0
## 3    0           0          0     0             0           0          0
## 4    0           0          0     0             0           0          0
## 5    0           0          0     0             0           0          0
## 6    0           0          0     0             0           0          0
##   otherDesiredWar propaganda3rd protection sympathy debt prevAllies yearsAllies
## 1               0             0          0        1    0          0           0
## 2               0             0          0        0    0          1          14
## 3               0             0          0        0    0          1          14
## 4               0             0          0        0    0          0           0
## 5               0             0          0        0    0          0           0
## 6               0             0          0        0    0          0           0
##   intermingled interbreeding propadanda orderedObey commerceOther feltStronger
## 1            0             0          0           0             0            0
## 2            0             0          0           0             0            0
## 3            0             0          0           0             0            0
## 4            0             0          0           0             0            0
## 5            0             0          0           0             0            0
## 6            0             0          0           0             0            0
##   competeIntellect insecureGovt prepWar RegionalError CasualtyError Auxiliaries
## 1                0            0       0             3             3           2
## 2                0            0       2             3             3           2
## 3                0            0       2             3             3           2
## 4                0            0       0             2             3           1
## 5                0            0       2             3             3           2
## 6                0            0       0             2             3           1
#there is a god. & its name is StackOverflow
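
The write-then-read round trip can be checked locally before anything is uploaded. Below is a minimal sketch, assuming a hypothetical `toy` data frame and a temporary file in place of the GitHub link, so that it runs without a network connection:

```r
# Hypothetical toy data; integer columns so the types survive the round trip
toy <- data.frame( ID = 1:3, year = c( 1914L, 1868L, 1933L ) )

# Write to a temporary file, leaving out row names as in the step above
tmp <- tempfile( fileext = '.csv' )
write.csv( toy, file = tmp, row.names = FALSE )

# Read the file back and compare it with the original data frame
toy_back <- read.csv( tmp )
round_trip_ok <- isTRUE( all.equal( toy, toy_back ) )
round_trip_ok

# Remove the temporary file
unlink( tmp )
```

The `file` argument of read.csv() also accepts a complete URL as a string, so passing the raw GitHub link in place of `tmp` should read the hosted copy in the same way.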