One of the challenges in working with data is wrangling. In this assignment we will use R to perform several wrangling tasks on a data set selected from the CSV index at http://vincentarelbundock.github.io/Rdatasets/.
#Statistics of Deadly Quarrels, 'Quarrels' from the HistData package
library( HistData )
#nrow( Quarrels )
#ncol( Quarrels )
head( Quarrels )
## ID year international colonial revolution nat.grp grp.grpSame grp.grpDif
## 1 1 1914 1 0 0 0 0 0
## 2 2 1914 1 0 0 0 0 0
## 3 3 1914 1 0 0 0 0 0
## 4 4 1914 0 0 0 1 0 0
## 5 5 1914 1 0 0 0 0 0
## 6 6 1914 0 0 0 1 0 0
## numGroups months pairs monthsPairs logDeaths deaths exchangeGoods
## 1 16 52 44 1436 7.2 15900000 0
## 2 17 43 44 1436 7.2 15900000 0
## 3 17 52 44 1436 7.2 15900000 0
## 4 15 53 44 1436 7.2 15900000 0
## 5 17 52 44 1436 7.2 15900000 0
## 6 16 33 44 1436 7.2 15900000 0
## obstacleGoods intermarriageOK intermarriageBan simBody difBody simDress
## 1 1 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## difDress eqWealth difWealth simMariagCust difMariagCust simRelig difRelig
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 1 0
## 3 0 0 0 0 0 1 0
## 4 0 0 0 0 0 0 2
## 5 0 0 0 0 0 1 0
## 6 0 0 0 0 0 0 2
## philanthropy restrictMigration sameLanguage difLanguage simArtSci travel
## 1 0 0 0 1 0 0
## 2 0 0 0 1 0 0
## 3 0 0 0 1 0 0
## 4 0 0 0 1 0 0
## 5 1 0 0 1 0 0
## 6 0 0 0 1 0 0
## ignorance simPersLiberty difPersLiberty sameGov sameGovYrs prevConflict
## 1 0 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 1
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## prevConflictYrs chronicFighting persFriendship persResentment difLegal
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 55 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 0 0 0
## 6 0 0 0 0 0
## nonintervention thirdParty supportEnemy attackAlly rivalsLand rivalsTrade
## 1 0 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## churchPower noExtension territory habitation minerals StrongHold taxation
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0
## loot objectedWar enjoyFight pride overpopulated fightForPay joinWinner
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0
## otherDesiredWar propaganda3rd protection sympathy debt prevAllies yearsAllies
## 1 0 0 0 1 0 0 0
## 2 0 0 0 0 0 1 14
## 3 0 0 0 0 0 1 14
## 4 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0
## intermingled interbreeding propadanda orderedObey commerceOther feltStronger
## 1 0 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## competeIntellect insecureGovt prepWar RegionalError CasualtyError Auxiliaries
## 1 0 0 0 3 3 2
## 2 0 0 2 3 3 2
## 3 0 0 2 3 3 2
## 4 0 0 0 2 3 1
## 5 0 0 2 3 3 2
## 6 0 0 0 2 3 1
Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
#Using the summary function to gain an overview of the data set
#colnames( Quarrels )
#dim( Quarrels )
summary( Quarrels )
## ID year international colonial
## Min. : 1.0 Min. :1807 Min. :0.0000 Min. :0.0000
## 1st Qu.:195.5 1st Qu.:1859 1st Qu.:0.0000 1st Qu.:0.0000
## Median :390.0 Median :1898 Median :0.0000 Median :0.0000
## Mean :390.0 Mean :1892 Mean :0.2721 Mean :0.1412
## 3rd Qu.:584.5 3rd Qu.:1925 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :779.0 Max. :1949 Max. :1.0000 Max. :1.0000
## revolution nat.grp grp.grpSame grp.grpDif
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.00000
## Mean :0.1399 Mean :0.1759 Mean :0.1926 Mean :0.07831
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000
## numGroups months pairs monthsPairs
## Min. : 2.000 Min. : 1.00 Min. : 1.00 Min. : 1.0
## 1st Qu.: 3.000 1st Qu.: 6.00 1st Qu.: 2.00 1st Qu.: 13.0
## Median : 4.000 Median : 13.00 Median : 4.00 Median : 64.0
## Mean : 6.593 Mean : 27.25 Mean :16.51 Mean : 532.1
## 3rd Qu.: 8.000 3rd Qu.: 36.00 3rd Qu.:10.50 3rd Qu.: 360.0
## Max. :75.000 Max. :642.00 Max. :90.00 Max. :3010.0
## logDeaths deaths exchangeGoods obstacleGoods intermarriageOK
## Min. :2.400 Min. : 0 0:742 0:758 0:778
## 1st Qu.:3.000 1st Qu.: 1000 1: 11 1: 21 1: 1
## Median :4.000 Median : 10000 2: 26
## Mean :4.418 Mean : 3272504
## 3rd Qu.:5.400 3rd Qu.: 251000
## Max. :7.300 Max. :20000000
## intermarriageBan simBody difBody simDress difDress eqWealth difWealth
## 0:778 0:688 0:652 0:725 0:641 0:776 0:729
## 2: 1 1: 91 1:127 1: 54 1:138 1: 3 1: 50
##
##
##
##
## simMariagCust difMariagCust simRelig difRelig philanthropy restrictMigration
## 0:774 0:767 0:611 0:438 0:777 0:762
## 1: 5 1: 12 1:168 1: 9 1: 2 1: 17
## 2:332
##
##
##
## sameLanguage difLanguage simArtSci travel ignorance simPersLiberty
## 0:612 0:316 0:775 0:777 0:773 0:779
## 1:167 1:463 1: 4 1: 1 1: 2
## 2: 1 2: 4
##
##
##
## difPersLiberty sameGov sameGovYrs prevConflict prevConflictYrs
## 0:736 0:545 Min. : 0.0 0:568 Min. : 0.0
## 1: 43 1:234 1st Qu.: 0.0 1:211 1st Qu.: 0.0
## Median : 0.0 Median : 0.0
## Mean :132.7 Mean : 10.4
## 3rd Qu.: 6.0 3rd Qu.: 1.0
## Max. :999.0 Max. :999.0
## chronicFighting persFriendship persResentment difLegal nonintervention
## 0:764 0:778 0:749 0:768 0:777
## 1: 15 1: 1 1: 14 1: 11 1: 2
## 2: 16
##
##
##
## thirdParty supportEnemy attackAlly rivalsLand rivalsTrade churchPower
## 0:739 0:765 0:757 0:773 0:776 0:768
## 1: 40 1: 14 1: 22 1: 6 1: 3 1: 11
##
##
##
##
## noExtension territory habitation minerals StrongHold taxation loot
## 0:778 0:634 0:771 0:769 0:769 0:751 0:751
## 1: 1 1: 66 1: 5 1: 7 1: 8 1: 28 1: 26
## 2: 79 2: 3 2: 3 2: 2 2: 2
##
##
##
## objectedWar enjoyFight pride overpopulated fightForPay joinWinner
## 0:777 0:768 0:766 0:770 0:770 0:770
## 1: 2 1: 7 1: 7 1: 8 1: 7 1: 9
## 2: 4 2: 6 2: 1 2: 2
##
##
##
## otherDesiredWar propaganda3rd protection sympathy debt prevAllies
## 0:734 0:779 0:779 0:737 0:762 0:684
## 1: 45 1: 39 1: 17 1: 95
## 2: 3
##
##
##
## yearsAllies intermingled interbreeding propadanda orderedObey
## Min. : 0.000 0:736 0:776 0:777 0:697
## 1st Qu.: 0.000 1: 43 1: 3 1: 2 1: 82
## Median : 0.000
## Mean : 1.682
## 3rd Qu.: 0.000
## Max. :116.000
## commerceOther feltStronger competeIntellect insecureGovt prepWar
## 0:771 0:771 0:778 0:769 0:753
## 1: 8 1: 8 1: 1 1: 9 1: 12
## 2: 1 2: 14
##
##
##
## RegionalError CasualtyError Auxiliaries
## Min. :1.000 Min. :1.000 Min. :0.000
## 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000
## Median :2.000 Median :3.000 Median :2.000
## Mean :1.837 Mean :2.616 Mean :1.901
## 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:2.000
## Max. :3.000 Max. :6.000 Max. :2.000
#displaying the mean and median of 2 attributes
deaths_descriptiveSTATS <- list( Mean = mean( Quarrels$deaths ),
                                 Median = median( Quarrels$deaths ) )
months_descriptiveSTATS <- list( Mean = mean( Quarrels$months ),
                                 Median = median( Quarrels$months ) )
example_descriptiveSTATS <- list( DeathSTATs = deaths_descriptiveSTATS,
                                  MonthSTATs = months_descriptiveSTATS )
example_descriptiveSTATS
## $DeathSTATs
## $DeathSTATs$Mean
## [1] 3272504
##
## $DeathSTATs$Median
## [1] 10000
##
##
## $MonthSTATs
## $MonthSTATs$Mean
## [1] 27.24775
##
## $MonthSTATs$Median
## [1] 13
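The same two statistics can also be gathered for several numeric columns in one call. The following is a minimal sketch, assuming a small toy data frame (`toy`, a hypothetical name) that holds only the six `deaths` and `months` values printed by `head()` above, not the full Quarrels data:

```r
# Toy data frame (hypothetical): deaths and months from the six rows shown by head()
toy <- data.frame(
  deaths = c(15900000, 15900000, 15900000, 15900000, 15900000, 15900000),
  months = c(52, 43, 52, 53, 52, 33)
)

# Apply mean and median to every column; sapply returns a matrix with
# one row per statistic and one column per attribute
stats <- sapply(toy, function(x) c(Mean = mean(x), Median = median(x)))
stats
```

A single cell can then be read by name, for example `stats["Median", "months"]`.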
Create a new data frame with a subset of the columns and rows. Make sure to rename it.
#keep every row and select six of the columns by name
subQuarrels <- Quarrels[, c( 'year', 'international', 'colonial', 'revolution', 'deaths', 'months' ) ]
head( subQuarrels )
## year international colonial revolution deaths months
## 1 1914 1 0 0 15900000 52
## 2 1914 1 0 0 15900000 43
## 3 1914 1 0 0 15900000 52
## 4 1914 0 0 0 15900000 53
## 5 1914 1 0 0 15900000 52
## 6 1914 0 0 0 15900000 33
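The subset above keeps every row and narrows only the columns. If a row subset is wanted as well, one option is a logical condition in the row position of the same bracket call. This is a sketch on a toy data frame (`toy` and `toy_sub` are hypothetical names, with values copied from rows printed elsewhere in this document), not the subset used in the remaining steps:

```r
# Toy data frame (hypothetical): five quarrels taken from output shown in this document
toy <- data.frame(
  year     = c(1914, 1868, 1895, 1917, 1851),
  colonial = c(0, 1, 1, 0, 0),
  deaths   = c(15900000, 198000, 200000, 15900000, 2000000),
  months   = c(52, 110, 41, 17, 156)
)

# Rows: quarrels dated before 1900; columns: a chosen subset, selected by name
toy_sub <- toy[toy$year < 1900, c("year", "deaths", "months")]
toy_sub
```

Note that the mean and median of a row subset would generally differ from those of the full data, unlike a column-only subset.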
Create new column names for the new data frame
names( subQuarrels ) <- c( 'Year', 'International', 'Colonial',
'Revolution', 'Deaths', 'Months')
head( subQuarrels )
## Year International Colonial Revolution Deaths Months
## 1 1914 1 0 0 15900000 52
## 2 1914 1 0 0 15900000 43
## 3 1914 1 0 0 15900000 52
## 4 1914 0 0 0 15900000 53
## 5 1914 1 0 0 15900000 52
## 6 1914 0 0 0 15900000 33
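Assigning a full vector to `names()` depends on the columns being in the expected order. Where only some columns need new names, matching on the current name is an alternative. A minimal sketch, assuming a toy data frame (`toy`, hypothetical) with three of the columns used above:

```r
# Toy data frame (hypothetical) with lower-case column names
toy <- data.frame(
  year   = c(1914, 1914),
  deaths = c(15900000, 15900000),
  months = c(52, 43)
)

# Rename one column by matching its current name; the others are untouched
names(toy)[names(toy) == "deaths"] <- "Deaths"
names(toy)
```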
Use the summary function to create an overview of your new data frame. Print the mean and median for the same two attributes. Please compare.
summary( subQuarrels )
## Year International Colonial Revolution
## Min. :1807 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1859 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1898 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :1892 Mean :0.2721 Mean :0.1412 Mean :0.1399
## 3rd Qu.:1925 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1949 Max. :1.0000 Max. :1.0000 Max. :1.0000
## Deaths Months
## Min. : 0 Min. : 1.00
## 1st Qu.: 1000 1st Qu.: 6.00
## Median : 10000 Median : 13.00
## Mean : 3272504 Mean : 27.25
## 3rd Qu.: 251000 3rd Qu.: 36.00
## Max. :20000000 Max. :642.00
#class( subQuarrels )
deaths_descriptiveSTATS2 <- list( Mean = mean( subQuarrels$Deaths ),
                                  Median = median( subQuarrels$Deaths ) )
months_descriptiveSTATS2 <- list( Mean = mean( subQuarrels$Months ),
                                  Median = median( subQuarrels$Months ) )
example_descriptiveSTATS2 <- list( DeathSTATs = deaths_descriptiveSTATS2,
                                   MonthSTATs = months_descriptiveSTATS2 )
#display the descriptive stats for the same 2 features (deaths & months)
example_descriptiveSTATS2
## $DeathSTATs
## $DeathSTATs$Mean
## [1] 3272504
##
## $DeathSTATs$Median
## [1] 10000
##
##
## $MonthSTATs
## $MonthSTATs$Mean
## [1] 27.24775
##
## $MonthSTATs$Median
## [1] 13
#use the function 'identical' to directly compare the same descriptive stats from the previous calculation.
#identical returned 'TRUE', therefore the values are the same
identical( example_descriptiveSTATS, example_descriptiveSTATS2 )
## [1] TRUE
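`identical` returns TRUE here because selecting and renaming columns leaves the underlying values untouched. It is a strict test, though. When two results come from different arithmetic, tiny floating-point differences can make it return FALSE, and `all.equal` (which compares within a tolerance) may be the better check. A small sketch with hypothetical lists `a` and `b`:

```r
# Two lists (hypothetical) whose values are mathematically equal
# but differ in their floating-point representation
a <- list(Mean = 0.1 + 0.2)
b <- list(Mean = 0.3)

# Strict, bit-for-bit comparison
strict <- identical(a, b)

# Comparison within a small numeric tolerance; wrap in isTRUE because
# all.equal returns a description of the differences when they exist
tolerant <- isTRUE(all.equal(a, b))

c(strict = strict, tolerant = tolerant)
```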
For at least 3 values in a column, please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter ‘e’ in one column. Rename those values so that all 20 would show as excellent.
#reassigning the value 1 in the 'Colonial' and 'Revolution' columns to 2 and 3, respectively
#but first, I will display the first rows before any changes, ordering the data frame so that the entries that will be changed are listed at the top.
head( subQuarrels[ order( -subQuarrels$Colonial ), ] )
## Year International Colonial Revolution Deaths Months
## 203 1868 0 1 0 198000 110
## 212 1895 0 1 0 200000 41
## 216 1905 0 1 0 251000 18
## 217 1873 0 1 0 251000 421
## 218 1880 0 1 0 251000 177
## 220 1903 0 1 0 79000 5
head( subQuarrels[ order( -subQuarrels$Revolution ), ] )
## Year International Colonial Revolution Deaths Months
## 12 1917 0 0 1 15900000 17
## 33 1915 0 0 1 15900000 44
## 55 1933 0 0 1 20000000 143
## 96 1941 0 0 1 20000000 1
## 134 1851 0 0 1 2000000 156
## 135 1861 0 0 1 631000 48
#now to change the values
subQuarrels$Colonial[ subQuarrels$Colonial == 1 ] <- 2
subQuarrels$Revolution[ subQuarrels$Revolution == 1 ] <- 3
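The assignment's example renames a code to a descriptive word ('e' to excellent). One way to do the same for a 0/1 column is to convert it to a factor with labels. This is a sketch on a toy data frame (`toy`, hypothetical); the label text is illustrative and assumes that 1 marks a colonial quarrel and 0 marks any other:

```r
# Toy data frame (hypothetical) with a 0/1 indicator column
toy <- data.frame(Colonial = c(0, 1, 1, 0, 1))

# Map each code to a descriptive label; every 1 becomes "colonial"
# and every 0 becomes "not colonial" (labels are illustrative assumptions)
toy$ColonialLabel <- factor(toy$Colonial,
                            levels = c(0, 1),
                            labels = c("not colonial", "colonial"))

# Count how many rows carry each label
table(toy$ColonialLabel)
```

Storing the labels in a new column keeps the original codes available for comparison.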
Display enough rows to see examples of all of steps 1-5 above.
#redisplay the first rows, ordered so that we can verify the change
head( subQuarrels[ order( -subQuarrels$Colonial ), ] )
## Year International Colonial Revolution Deaths Months
## 203 1868 0 2 0 198000 110
## 212 1895 0 2 0 200000 41
## 216 1905 0 2 0 251000 18
## 217 1873 0 2 0 251000 421
## 218 1880 0 2 0 251000 177
## 220 1903 0 2 0 79000 5
head( subQuarrels[ order( -subQuarrels$Revolution ), ] )
## Year International Colonial Revolution Deaths Months
## 12 1917 0 0 3 15900000 17
## 33 1915 0 0 3 15900000 44
## 55 1933 0 0 3 20000000 143
## 96 1941 0 0 3 20000000 1
## 134 1851 0 0 3 2000000 156
## 135 1861 0 0 3 631000 48
Place the original .csv in a GitHub repository and have R read it from the link. This will be a very useful skill as you progress in your data science education and career.
#will start by writing the data
write.csv( Quarrels, file ='Quarrels.csv', row.names = FALSE)
#Great! I see it in my directory, so now I'll beam it up to git.....
#......
#That worked, what a time to be alive.
#RCurl is loaded here, although read.csv( url() ) below is base R and should work without it
library(RCurl)
## Loading required package: bitops
myURL <- 'https://raw.githubusercontent.com/SmilodonCub/MSDS2020_Bridge/master/Quarrels.csv'
gitQuarrels <- read.csv( url( myURL ) )
head( gitQuarrels )
## ID year international colonial revolution nat.grp grp.grpSame grp.grpDif
## 1 1 1914 1 0 0 0 0 0
## 2 2 1914 1 0 0 0 0 0
## 3 3 1914 1 0 0 0 0 0
## 4 4 1914 0 0 0 1 0 0
## 5 5 1914 1 0 0 0 0 0
## 6 6 1914 0 0 0 1 0 0
## numGroups months pairs monthsPairs logDeaths deaths exchangeGoods
## 1 16 52 44 1436 7.2 15900000 0
## 2 17 43 44 1436 7.2 15900000 0
## 3 17 52 44 1436 7.2 15900000 0
## 4 15 53 44 1436 7.2 15900000 0
## 5 17 52 44 1436 7.2 15900000 0
## 6 16 33 44 1436 7.2 15900000 0
## obstacleGoods intermarriageOK intermarriageBan simBody difBody simDress
## 1 1 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## difDress eqWealth difWealth simMariagCust difMariagCust simRelig difRelig
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 1 0
## 3 0 0 0 0 0 1 0
## 4 0 0 0 0 0 0 2
## 5 0 0 0 0 0 1 0
## 6 0 0 0 0 0 0 2
## philanthropy restrictMigration sameLanguage difLanguage simArtSci travel
## 1 0 0 0 1 0 0
## 2 0 0 0 1 0 0
## 3 0 0 0 1 0 0
## 4 0 0 0 1 0 0
## 5 1 0 0 1 0 0
## 6 0 0 0 1 0 0
## ignorance simPersLiberty difPersLiberty sameGov sameGovYrs prevConflict
## 1 0 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 1
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## prevConflictYrs chronicFighting persFriendship persResentment difLegal
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 55 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 0 0 0
## 6 0 0 0 0 0
## nonintervention thirdParty supportEnemy attackAlly rivalsLand rivalsTrade
## 1 0 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## churchPower noExtension territory habitation minerals StrongHold taxation
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0
## loot objectedWar enjoyFight pride overpopulated fightForPay joinWinner
## 1 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0
## otherDesiredWar propaganda3rd protection sympathy debt prevAllies yearsAllies
## 1 0 0 0 1 0 0 0
## 2 0 0 0 0 0 1 14
## 3 0 0 0 0 0 1 14
## 4 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0
## intermingled interbreeding propadanda orderedObey commerceOther feltStronger
## 1 0 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## competeIntellect insecureGovt prepWar RegionalError CasualtyError Auxiliaries
## 1 0 0 0 3 3 2
## 2 0 0 2 3 3 2
## 3 0 0 2 3 3 2
## 4 0 0 0 2 3 1
## 5 0 0 2 3 3 2
## 6 0 0 0 2 3 1
#there is a god, & its name is StackOverflow
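Before uploading, the write and read steps can be checked locally with a round trip through a temporary file. A minimal sketch, assuming a toy data frame (`toy`, hypothetical) in place of Quarrels. One caveat worth keeping in mind: a CSV stores only text, so columns that `summary( Quarrels )` reports as factors would be expected to come back from `read.csv` as plain integers.

```r
# Toy data frame (hypothetical) standing in for Quarrels
toy <- data.frame(
  year   = c(1914, 1868),
  deaths = c(15900000, 198000),
  months = c(52, 110)
)

# Write to a temporary file, dropping row names as in the step above
tmp <- tempfile(fileext = ".csv")
write.csv(toy, file = tmp, row.names = FALSE)

# Read the file back and compare the values cell by cell
toy_back <- read.csv(tmp)
all(toy == toy_back)

# Clean up the temporary file
unlink(tmp)
```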