Overview of the items covered in this submission:
1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
3. Create new column names for the new data frame.
4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
5. For at least 3 values in a column, rename them so that every occurrence of each value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.
6. Display enough rows to see examples of all of steps 1-5 above.
BONUS – place the original .csv in a GitHub file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.
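For the bonus item, a minimal sketch of reading a CSV straight from a link. The helper name `read_remote_csv` and the address in the comment are placeholders, not the ones used in this submission. `read.csv()` accepts a URL in place of a file path, so pointing it at the raw address of a file on GitHub should be enough.

```r
# Hypothetical helper: read a CSV from a local path or a URL.
# stringsAsFactors = TRUE turns text columns into factors, which is what
# makes summary() print a count per level.
read_remote_csv <- function(location) {
  read.csv(location, header = TRUE, stringsAsFactors = TRUE)
}

# Placeholder address; substitute the raw link of your own repository:
# BEPS <- read_remote_csv("https://raw.githubusercontent.com/<user>/<repo>/<branch>/BEPS.csv")

# Offline demonstration with a temporary file standing in for the link.
demo_path <- tempfile(fileext = ".csv")
writeLines(c("id,vote,age", "1,Labour,43", "2,Conservative,36"), demo_path)
demo <- read_remote_csv(demo_path)
summary(demo)
```

Reading from the link rather than a local copy means the script runs unchanged on any machine with network access.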
Assignment Items
##        id                     vote          age
##  Min.   :   1   Conservative    :462   Min.   :24.00
##  1st Qu.: 382   Labour          :720   1st Qu.:41.00
##  Median : 763   Liberal Democrat:343   Median :53.00
##  Mean   : 763                          Mean   :54.18
##  3rd Qu.:1144                          3rd Qu.:67.00
##  Max.   :1525                          Max.   :93.00
##  economic.cond.national economic.cond.household     Blair
##  Min.   :1.000          Min.   :1.00            Min.   :1.000
##  1st Qu.:3.000          1st Qu.:3.00            1st Qu.:2.000
##  Median :3.000          Median :3.00            Median :4.000
##  Mean   :3.246          Mean   :3.14            Mean   :3.334
##  3rd Qu.:4.000          3rd Qu.:4.00            3rd Qu.:4.000
##  Max.   :5.000          Max.   :5.00            Max.   :5.000
##      Hague          Kennedy          Europe       political.knowledge
##  Min.   :1.000   Min.   :1.000   Min.   : 1.000   Average:782
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.: 4.000   Expert :250
##  Median :2.000   Median :3.000   Median : 6.000   Little : 38
##  Mean   :2.747   Mean   :3.135   Mean   : 6.729   None   :455
##  3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:10.000
##  Max.   :5.000   Max.   :5.000   Max.   :11.000
##     gender
##  female:812
##  male  :713
##
##
##
##
## [1] The Mean of column AGE is 54.1822950819672 and the Median is 53
## [1] The Mean of column EUROPE is 6.72852459016393 and the Median is 6
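The chunk that printed the two lines above is not shown in this output. A minimal sketch that produces lines of the same shape follows; the data frame `toy` and the helper `describe_column` are made up for illustration and are not the code used in the submission.

```r
# Made-up stand-in for the BEPS data; only the column names match the original.
toy <- data.frame(age = c(24, 41, 53, 67, 93),
                  Europe = c(1, 4, 6, 10, 11))

# Hypothetical helper: build the sentence for one numeric column.
describe_column <- function(df, column) {
  values <- df[[column]]
  paste("The Mean of column", toupper(column), "is",
        mean(values, na.rm = TRUE), "and the Median is",
        median(values, na.rm = TRUE))
}

print(describe_column(toy, "age"), quote = FALSE)
print(describe_column(toy, "Europe"), quote = FALSE)
```

Passing `na.rm = TRUE` keeps a single missing value from turning the whole result into `NA`.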
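# Keep male respondents only. subset() expects a lowercase `select` with unquoted
# column names; id:gender keeps every column shown in the output below
# (select = vote:gender would additionally drop id).
sub_BEPS <- subset(BEPS, gender == "male",
                   select = id:gender)
# Rename the two columns that are compared against the original data set.
colnames(sub_BEPS)[colnames(sub_BEPS) == "age"] <- "sub_age"
colnames(sub_BEPS)[colnames(sub_BEPS) == "Europe"] <- "sub_Europe"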
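A minimal sketch of the row filter, column selection, and rename on a made-up data frame, so each step can be checked at a glance. The objects `toy`, `sub_toy`, `old_names`, and `new_names` are hypothetical; only the column names are borrowed from BEPS.

```r
# Made-up rows; only the column names are taken from the BEPS data.
toy <- data.frame(id = 1:4,
                  vote = c("Labour", "Labour", "Conservative", "Liberal Democrat"),
                  age = c(36, 35, 44, 57),
                  Europe = c(5, 3, 9, 11),
                  gender = c("male", "female", "male", "male"))

# Rows: male respondents only. Columns: vote through gender (drops id).
# select takes unquoted column names and must be spelled in lowercase.
sub_toy <- subset(toy, gender == "male", select = vote:gender)

# Rename two columns by looking up their current positions.
old_names <- c("age", "Europe")
new_names <- c("sub_age", "sub_Europe")
names(sub_toy)[match(old_names, names(sub_toy))] <- new_names

print(sub_toy)
```

Looking the names up with `match()` avoids hard-coding column positions, so the rename still works if the column order changes.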
print(paste ("The summary of original dataset is..."), quote = FALSE)
## [1] The summary of original dataset is...
summary (BEPS)
##        id                     vote          age
##  Min.   :   1   Conservative    :462   Min.   :24.00
##  1st Qu.: 382   Labour          :720   1st Qu.:41.00
##  Median : 763   Liberal Democrat:343   Median :53.00
##  Mean   : 763                          Mean   :54.18
##  3rd Qu.:1144                          3rd Qu.:67.00
##  Max.   :1525                          Max.   :93.00
##  economic.cond.national economic.cond.household     Blair
##  Min.   :1.000          Min.   :1.00            Min.   :1.000
##  1st Qu.:3.000          1st Qu.:3.00            1st Qu.:2.000
##  Median :3.000          Median :3.00            Median :4.000
##  Mean   :3.246          Mean   :3.14            Mean   :3.334
##  3rd Qu.:4.000          3rd Qu.:4.00            3rd Qu.:4.000
##  Max.   :5.000          Max.   :5.00            Max.   :5.000
##      Hague          Kennedy          Europe       political.knowledge
##  Min.   :1.000   Min.   :1.000   Min.   : 1.000   Average:782
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.: 4.000   Expert :250
##  Median :2.000   Median :3.000   Median : 6.000   Little : 38
##  Mean   :2.747   Mean   :3.135   Mean   : 6.729   None   :455
##  3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:10.000
##  Max.   :5.000   Max.   :5.000   Max.   :11.000
##     gender
##  female:812
##  male  :713
##
##
##
##
print(paste ("The summary of NEW dataset is..."), quote = FALSE)
## [1] The summary of NEW dataset is...
summary(sub_BEPS)
##        id                      vote        sub_age
##  Min.   :   2.0   Conservative    :203   Min.   :24.00
##  1st Qu.: 419.0   Labour          :348   1st Qu.:41.00
##  Median : 779.0   Liberal Democrat:162   Median :53.00
##  Mean   : 782.3                          Mean   :53.85
##  3rd Qu.:1161.0                          3rd Qu.:66.00
##  Max.   :1524.0                          Max.   :91.00
##  economic.cond.national economic.cond.household     Blair
##  Min.   :1.000          Min.   :1.000           Min.   :1.000
##  1st Qu.:3.000          1st Qu.:3.000           1st Qu.:2.000
##  Median :3.000          Median :3.000           Median :4.000
##  Mean   :3.297          Mean   :3.174           Mean   :3.422
##  3rd Qu.:4.000          3rd Qu.:4.000           3rd Qu.:4.000
##  Max.   :5.000          Max.   :5.000           Max.   :5.000
##      Hague          Kennedy        sub_Europe     political.knowledge
##  Min.   :1.000   Min.   :1.000   Min.   : 1.000   Average:360
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.: 4.000   Expert :165
##  Median :2.000   Median :4.000   Median : 6.000   Little : 15
##  Mean   :2.708   Mean   :3.116   Mean   : 6.456   None   :173
##  3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:10.000
##  Max.   :5.000   Max.   :5.000   Max.   :11.000
##     gender
##  female:  0
##  male  :713
##
##
##
##
# Mean and median of the same two attributes, now on the subset.
sub_BEPS_age_mean <- mean(sub_BEPS$sub_age, na.rm = TRUE)
sub_BEPS_age_median <- median(sub_BEPS$sub_age, na.rm = TRUE)
sub_BEPS_europe_mean <- mean(sub_BEPS$sub_Europe, na.rm = TRUE)
sub_BEPS_europe_median <- median(sub_BEPS$sub_Europe, na.rm = TRUE)
print(paste ("The Mean of original column AGE is", BEPS_age_mean, "and the Median is", BEPS_age_median,". After subsetting the data the values are", sub_BEPS_age_mean, "and", sub_BEPS_age_median), quote = FALSE)
## [1] The Mean of original column AGE is 54.1822950819672 and the Median is 53 . After subsetting the data the values are 53.851332398317 and 53
print(paste ("The Mean of original column Europe is", BEPS_europe_mean, "and the Median is", BEPS_europe_median, ". After subsetting the data the new values are", sub_BEPS_europe_mean, "and", sub_BEPS_europe_median), quote = FALSE)
## [1] The Mean of original column Europe is 6.72852459016393 and the Median is 6 . After subsetting the data the new values are 6.45582047685834 and 6
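The comparison above can also be laid out as a small table, which makes the size of the change easier to read than a sentence. This is a sketch on made-up values, not the submission's code; `full`, `males`, and the helper `centre` are hypothetical names.

```r
# Made-up values; the real comparison would use BEPS and sub_BEPS.
full <- data.frame(age = c(24, 41, 53, 67, 93),
                   gender = c("male", "female", "male", "female", "male"))
males <- subset(full, gender == "male")

# Hypothetical helper: mean and median of one vector as a named numeric vector.
centre <- function(x) {
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
}

# One row per data set, plus a row with the change after subsetting.
comparison <- rbind(original = centre(full$age), subset = centre(males$age))
comparison <- rbind(comparison,
                    difference = comparison["subset", ] - comparison["original", ])
print(comparison)
```

A difference row near zero, as with the medians reported above, indicates that the subset sits in roughly the same place as the full data set on that attribute.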
print(paste ("A view of the original column values:"), quote = FALSE)
## [1] A view of the original column values:
head(sub_BEPS, n=20)
## id vote sub_age economic.cond.national
## 2 2 Labour 36 4
## 3 3 Labour 35 4
## 5 5 Labour 41 2
## 6 6 Labour 47 3
## 7 7 Liberal Democrat 57 2
## 8 8 Labour 77 3
## 10 10 Labour 70 3
## 12 12 Labour 66 4
## 16 16 Labour 51 4
## 19 19 Labour 79 3
## 21 21 Labour 38 3
## 22 22 Liberal Democrat 53 2
## 23 23 Labour 59 3
## 24 24 Conservative 44 2
## 29 29 Labour 44 3
## 30 30 Labour 61 4
## 32 32 Labour 66 3
## 34 34 Labour 62 4
## 38 38 Labour 52 4
## 45 45 Labour 37 4
## economic.cond.household Blair Hague Kennedy sub_Europe
## 2 4 4 4 4 5
## 3 4 5 2 3 3
## 5 2 1 1 4 6
## 6 4 4 4 2 4
## 7 2 4 4 2 11
## 8 4 4 1 4 1
## 10 2 5 1 1 11
## 12 3 4 4 4 9
## 16 4 4 4 4 5
## 19 3 4 2 4 1
## 21 3 4 4 2 7
## 22 1 2 4 4 5
## 23 3 4 2 2 1
## 24 4 4 4 4 9
## 29 3 4 2 4 1
## 30 3 5 1 2 1
## 32 2 2 2 2 6
## 34 3 4 2 2 1
## 38 3 4 4 4 3
## 45 3 4 2 4 4
## political.knowledge gender
## 2 Average male
## 3 Average male
## 5 Average male
## 6 Average male
## 7 Average male
## 8 None male
## 10 Average male
## 12 Average male
## 16 None male
## 19 None male
## 21 None male
## 22 Average male
## 23 Average male
## 24 Average male
## 29 Average male
## 30 Average male
## 32 None male
## 34 Average male
## 38 Average male
## 45 Average male
# Recode the four knowledge labels to an ordinal 1-4 scale. The factor is converted
# to character first so its values can be overwritten, then to integer at the end.
sub_BEPS$political.knowledge <- as.character(sub_BEPS$political.knowledge)
sub_BEPS$political.knowledge[sub_BEPS$political.knowledge == "None"] <- 1
sub_BEPS$political.knowledge[sub_BEPS$political.knowledge == "Little"] <- 2
sub_BEPS$political.knowledge[sub_BEPS$political.knowledge == "Average"] <- 3
sub_BEPS$political.knowledge[sub_BEPS$political.knowledge == "Expert"] <- 4
sub_BEPS$political.knowledge <- as.integer(sub_BEPS$political.knowledge)
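The same recoding can be written with a named lookup vector, which keeps the label-to-value mapping in one place. This is an alternative sketch on made-up data, not the approach used above; `knowledge`, `recode_map`, and `grades` are hypothetical names.

```r
# Made-up column; the labels match the levels of political.knowledge.
knowledge <- factor(c("Average", "None", "Expert", "Little", "Average"))

# Named lookup vector: old label -> new value.
recode_map <- c(None = 1L, Little = 2L, Average = 3L, Expert = 4L)

# Index the map by the labels as character strings (indexing by the factor
# itself would use its internal integer codes); unname() drops the labels.
knowledge_num <- unname(recode_map[as.character(knowledge)])
print(knowledge_num)

# The same idea renames labels in place, as in the "e" to "excellent" example.
grades <- factor(c("e", "g", "e", "p"))
levels(grades)[levels(grades) == "e"] <- "excellent"
print(grades)
```

Renaming a factor level changes every occurrence of that value at once, so there is no need to touch the rows one by one.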
print(paste ("A view of the replaced column values:"), quote = FALSE)
## [1] A view of the replaced column values:
head(sub_BEPS, n=20)
## id vote sub_age economic.cond.national
## 2 2 Labour 36 4
## 3 3 Labour 35 4
## 5 5 Labour 41 2
## 6 6 Labour 47 3
## 7 7 Liberal Democrat 57 2
## 8 8 Labour 77 3
## 10 10 Labour 70 3
## 12 12 Labour 66 4
## 16 16 Labour 51 4
## 19 19 Labour 79 3
## 21 21 Labour 38 3
## 22 22 Liberal Democrat 53 2
## 23 23 Labour 59 3
## 24 24 Conservative 44 2
## 29 29 Labour 44 3
## 30 30 Labour 61 4
## 32 32 Labour 66 3
## 34 34 Labour 62 4
## 38 38 Labour 52 4
## 45 45 Labour 37 4
## economic.cond.household Blair Hague Kennedy sub_Europe
## 2 4 4 4 4 5
## 3 4 5 2 3 3
## 5 2 1 1 4 6
## 6 4 4 4 2 4
## 7 2 4 4 2 11
## 8 4 4 1 4 1
## 10 2 5 1 1 11
## 12 3 4 4 4 9
## 16 4 4 4 4 5
## 19 3 4 2 4 1
## 21 3 4 4 2 7
## 22 1 2 4 4 5
## 23 3 4 2 2 1
## 24 4 4 4 4 9
## 29 3 4 2 4 1
## 30 3 5 1 2 1
## 32 2 2 2 2 6
## 34 3 4 2 2 1
## 38 3 4 4 4 3
## 45 3 4 2 4 4
## political.knowledge gender
## 2 3 male
## 3 3 male
## 5 3 male
## 6 3 male
## 7 3 male
## 8 1 male
## 10 3 male
## 12 3 male
## 16 1 male
## 19 1 male
## 21 1 male
## 22 3 male
## 23 3 male
## 24 3 male
## 29 3 male
## 30 3 male
## 32 1 male
## 34 3 male
## 38 3 male
## 45 3 male
print(paste ("The bonus is answered in the first chunk ~ lines 12-15"), quote = FALSE)
## [1] The bonus is answered in the first chunk ~ lines 12-15