R Markdown

Overview of the items covered in this submission:

  1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

  2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.

  3. Create new column names for the new data frame.

  4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

  5. For at least 3 values in a column please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.

  6. Display enough rows to see examples of all of steps 1-5 above.

  7. BONUS – place the original .csv in a github file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.


Assignment Items

  1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
##        id                     vote          age       
##  Min.   :   1   Conservative    :462   Min.   :24.00  
##  1st Qu.: 382   Labour          :720   1st Qu.:41.00  
##  Median : 763   Liberal Democrat:343   Median :53.00  
##  Mean   : 763                          Mean   :54.18  
##  3rd Qu.:1144                          3rd Qu.:67.00  
##  Max.   :1525                          Max.   :93.00  
##  economic.cond.national economic.cond.household     Blair      
##  Min.   :1.000          Min.   :1.00            Min.   :1.000  
##  1st Qu.:3.000          1st Qu.:3.00            1st Qu.:2.000  
##  Median :3.000          Median :3.00            Median :4.000  
##  Mean   :3.246          Mean   :3.14            Mean   :3.334  
##  3rd Qu.:4.000          3rd Qu.:4.00            3rd Qu.:4.000  
##  Max.   :5.000          Max.   :5.00            Max.   :5.000  
##      Hague          Kennedy          Europe       political.knowledge
##  Min.   :1.000   Min.   :1.000   Min.   : 1.000   Average:782        
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.: 4.000   Expert :250        
##  Median :2.000   Median :3.000   Median : 6.000   Little : 38        
##  Mean   :2.747   Mean   :3.135   Mean   : 6.729   None   :455        
##  3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:10.000                      
##  Max.   :5.000   Max.   :5.000   Max.   :11.000                      
##     gender   
##  female:812  
##  male  :713  
##              
##              
##              
## 
## [1] The Mean of column AGE is 54.1822950819672 and the Median is 53
## [1] The Mean of column EUROPE is 6.72852459016393 and the Median is 6
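
The chunk that produced the output above is not shown in the rendered document. A minimal sketch of what such a step could look like follows. It uses a small made-up data frame (`toy`, a hypothetical stand-in for BEPS), so its values differ from those printed above.

```r
# Hypothetical stand-in for the BEPS data; values are made up for illustration.
toy <- data.frame(
  age    = c(36, 35, 41, 47, 57),
  Europe = c(5, 3, 6, 4, 11)
)

# Overview of every column
summary(toy)

# Mean and median for two attributes
toy_age_mean      <- mean(toy$age, na.rm = TRUE)
toy_age_median    <- median(toy$age, na.rm = TRUE)
toy_europe_mean   <- mean(toy$Europe, na.rm = TRUE)
toy_europe_median <- median(toy$Europe, na.rm = TRUE)

print(paste("The Mean of column AGE is", toy_age_mean,
            "and the Median is", toy_age_median), quote = FALSE)
print(paste("The Mean of column EUROPE is", toy_europe_mean,
            "and the Median is", toy_europe_median), quote = FALSE)
```

Storing the statistics in named variables makes them available for the comparison in a later step.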
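  2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
# Rows: male respondents only; every column is kept, matching the summary output shown later.
# (To drop columns as well, pass e.g. select = vote:gender, lowercase and unquoted.)
sub_BEPS <- subset(BEPS, gender == "male")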
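
For reference, `subset()` expects a lowercase `select` argument with unquoted column names. A minimal sketch on a made-up data frame (`toy` and `sub_toy` are hypothetical names, not part of the original analysis):

```r
# Made-up data for illustration only
toy <- data.frame(
  id     = 1:4,
  vote   = c("Labour", "Labour", "Conservative", "Liberal Democrat"),
  age    = c(36, 43, 35, 24),
  gender = c("male", "female", "male", "female")
)

# Rows: male only. Columns: everything from vote through gender (id is dropped).
sub_toy <- subset(toy, gender == "male", select = vote:gender)

names(sub_toy)  # "vote" "age" "gender"
nrow(sub_toy)   # 2
```

The `vote:gender` range works because `subset()` evaluates `select` against the column positions, so the colon picks a contiguous block of columns.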
  3. Create new column names for the new data frame.
colnames(sub_BEPS)[colnames(sub_BEPS) == "age"] <- "sub_age"
colnames(sub_BEPS)[colnames(sub_BEPS) == "Europe"] <- "sub_Europe"
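
When several columns need new names, the same idea can be expressed with a single lookup vector. This is only a sketch on a made-up data frame; `toy`, `new_names`, and `hits` are hypothetical names introduced for illustration.

```r
# Made-up data for illustration only
toy <- data.frame(id = 1:3, age = c(36, 35, 41), Europe = c(5, 3, 6))

# Named vector: old name -> new name
new_names <- c(age = "sub_age", Europe = "sub_Europe")

# Rename only the columns listed in new_names; all others are left alone
hits <- names(toy) %in% names(new_names)
names(toy)[hits] <- unname(new_names[names(toy)[hits]])

names(toy)  # "id" "sub_age" "sub_Europe"
```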
  4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
print(paste ("The summary of original dataset is..."), quote = FALSE)
## [1] The summary of original dataset is...
summary (BEPS)
##        id                     vote          age       
##  Min.   :   1   Conservative    :462   Min.   :24.00  
##  1st Qu.: 382   Labour          :720   1st Qu.:41.00  
##  Median : 763   Liberal Democrat:343   Median :53.00  
##  Mean   : 763                          Mean   :54.18  
##  3rd Qu.:1144                          3rd Qu.:67.00  
##  Max.   :1525                          Max.   :93.00  
##  economic.cond.national economic.cond.household     Blair      
##  Min.   :1.000          Min.   :1.00            Min.   :1.000  
##  1st Qu.:3.000          1st Qu.:3.00            1st Qu.:2.000  
##  Median :3.000          Median :3.00            Median :4.000  
##  Mean   :3.246          Mean   :3.14            Mean   :3.334  
##  3rd Qu.:4.000          3rd Qu.:4.00            3rd Qu.:4.000  
##  Max.   :5.000          Max.   :5.00            Max.   :5.000  
##      Hague          Kennedy          Europe       political.knowledge
##  Min.   :1.000   Min.   :1.000   Min.   : 1.000   Average:782        
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.: 4.000   Expert :250        
##  Median :2.000   Median :3.000   Median : 6.000   Little : 38        
##  Mean   :2.747   Mean   :3.135   Mean   : 6.729   None   :455        
##  3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:10.000                      
##  Max.   :5.000   Max.   :5.000   Max.   :11.000                      
##     gender   
##  female:812  
##  male  :713  
##              
##              
##              
## 
print(paste ("The summary of NEW dataset is..."), quote = FALSE)
## [1] The summary of NEW dataset is...
summary(sub_BEPS)
##        id                       vote        sub_age     
##  Min.   :   2.0   Conservative    :203   Min.   :24.00  
##  1st Qu.: 419.0   Labour          :348   1st Qu.:41.00  
##  Median : 779.0   Liberal Democrat:162   Median :53.00  
##  Mean   : 782.3                          Mean   :53.85  
##  3rd Qu.:1161.0                          3rd Qu.:66.00  
##  Max.   :1524.0                          Max.   :91.00  
##  economic.cond.national economic.cond.household     Blair      
##  Min.   :1.000          Min.   :1.000           Min.   :1.000  
##  1st Qu.:3.000          1st Qu.:3.000           1st Qu.:2.000  
##  Median :3.000          Median :3.000           Median :4.000  
##  Mean   :3.297          Mean   :3.174           Mean   :3.422  
##  3rd Qu.:4.000          3rd Qu.:4.000           3rd Qu.:4.000  
##  Max.   :5.000          Max.   :5.000           Max.   :5.000  
##      Hague          Kennedy        sub_Europe     political.knowledge
##  Min.   :1.000   Min.   :1.000   Min.   : 1.000   Average:360        
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.: 4.000   Expert :165        
##  Median :2.000   Median :4.000   Median : 6.000   Little : 15        
##  Mean   :2.708   Mean   :3.116   Mean   : 6.456   None   :173        
##  3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:10.000                      
##  Max.   :5.000   Max.   :5.000   Max.   :11.000                      
##     gender   
##  female:  0  
##  male  :713  
##              
##              
##              
## 
sub_BEPS_age_mean <- mean (sub_BEPS$sub_age, na.rm=TRUE)
sub_BEPS_age_median <- median (sub_BEPS$sub_age, na.rm=TRUE)

sub_BEPS_europe_mean <- mean (sub_BEPS$sub_Europe, na.rm=TRUE)
sub_BEPS_europe_median <- median (sub_BEPS$sub_Europe, na.rm=TRUE)

print(paste ("The Mean of original column AGE is", BEPS_age_mean, "and the Median is", BEPS_age_median,". After subsetting the data the values are", sub_BEPS_age_mean, "and", sub_BEPS_age_median), quote = FALSE)
## [1] The Mean of original column AGE is 54.1822950819672 and the Median is 53 . After subsetting the data the values are 53.851332398317 and 53
print(paste ("The Mean of original column Europe is", BEPS_europe_mean, "and the Median is", BEPS_europe_median, ". After subsetting the data the new values are", sub_BEPS_europe_mean, "and", sub_BEPS_europe_median), quote = FALSE)
## [1] The Mean of original column Europe is 6.72852459016393 and the Median is 6 . After subsetting the data the new values are 6.45582047685834 and 6
  5. For at least 3 values in a column please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.
print(paste ("A view of the original column values:"), quote = FALSE)
## [1] A view of the original column values:
head(sub_BEPS, n=20)
##    id             vote sub_age economic.cond.national
## 2   2           Labour      36                      4
## 3   3           Labour      35                      4
## 5   5           Labour      41                      2
## 6   6           Labour      47                      3
## 7   7 Liberal Democrat      57                      2
## 8   8           Labour      77                      3
## 10 10           Labour      70                      3
## 12 12           Labour      66                      4
## 16 16           Labour      51                      4
## 19 19           Labour      79                      3
## 21 21           Labour      38                      3
## 22 22 Liberal Democrat      53                      2
## 23 23           Labour      59                      3
## 24 24     Conservative      44                      2
## 29 29           Labour      44                      3
## 30 30           Labour      61                      4
## 32 32           Labour      66                      3
## 34 34           Labour      62                      4
## 38 38           Labour      52                      4
## 45 45           Labour      37                      4
##    economic.cond.household Blair Hague Kennedy sub_Europe
## 2                        4     4     4       4          5
## 3                        4     5     2       3          3
## 5                        2     1     1       4          6
## 6                        4     4     4       2          4
## 7                        2     4     4       2         11
## 8                        4     4     1       4          1
## 10                       2     5     1       1         11
## 12                       3     4     4       4          9
## 16                       4     4     4       4          5
## 19                       3     4     2       4          1
## 21                       3     4     4       2          7
## 22                       1     2     4       4          5
## 23                       3     4     2       2          1
## 24                       4     4     4       4          9
## 29                       3     4     2       4          1
## 30                       3     5     1       2          1
## 32                       2     2     2       2          6
## 34                       3     4     2       2          1
## 38                       3     4     4       4          3
## 45                       3     4     2       4          4
##    political.knowledge gender
## 2              Average   male
## 3              Average   male
## 5              Average   male
## 6              Average   male
## 7              Average   male
## 8                 None   male
## 10             Average   male
## 12             Average   male
## 16                None   male
## 19                None   male
## 21                None   male
## 22             Average   male
## 23             Average   male
## 24             Average   male
## 29             Average   male
## 30             Average   male
## 32                None   male
## 34             Average   male
## 38             Average   male
## 45             Average   male
sub_BEPS$political.knowledge <- as.character(sub_BEPS$political.knowledge)
sub_BEPS$political.knowledge[sub_BEPS$political.knowledge == "None"] <- 1
sub_BEPS$political.knowledge[sub_BEPS$political.knowledge == "Little"] <- 2
sub_BEPS$political.knowledge[sub_BEPS$political.knowledge == "Average"] <- 3
sub_BEPS$political.knowledge[sub_BEPS$political.knowledge == "Expert"] <- 4
sub_BEPS$political.knowledge <- as.integer(sub_BEPS$political.knowledge)
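
The same recoding can also be sketched with a named lookup vector, which keeps the whole mapping in one place. The example below uses a short made-up character vector (`knowledge`) rather than the real column; `codes` and `knowledge_int` are hypothetical names.

```r
# Made-up values for illustration only
knowledge <- c("Average", "None", "Expert", "Little", "Average")

# Lookup table: label -> integer code (same mapping as the step above)
codes <- c(None = 1L, Little = 2L, Average = 3L, Expert = 4L)

# Index the lookup by label; unname() drops the carried-over labels
knowledge_int <- unname(codes[knowledge])

knowledge_int  # 3 1 4 2 3
```

If the column is a factor, convert it with `as.character()` first, since indexing by a factor uses its integer codes rather than its labels.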
  6. Display enough rows to see examples of all of steps 1-5 above.
print(paste ("A view of the replaced column values:"), quote = FALSE)
## [1] A view of the replaced column values:
head(sub_BEPS, n=20)
##    id             vote sub_age economic.cond.national
## 2   2           Labour      36                      4
## 3   3           Labour      35                      4
## 5   5           Labour      41                      2
## 6   6           Labour      47                      3
## 7   7 Liberal Democrat      57                      2
## 8   8           Labour      77                      3
## 10 10           Labour      70                      3
## 12 12           Labour      66                      4
## 16 16           Labour      51                      4
## 19 19           Labour      79                      3
## 21 21           Labour      38                      3
## 22 22 Liberal Democrat      53                      2
## 23 23           Labour      59                      3
## 24 24     Conservative      44                      2
## 29 29           Labour      44                      3
## 30 30           Labour      61                      4
## 32 32           Labour      66                      3
## 34 34           Labour      62                      4
## 38 38           Labour      52                      4
## 45 45           Labour      37                      4
##    economic.cond.household Blair Hague Kennedy sub_Europe
## 2                        4     4     4       4          5
## 3                        4     5     2       3          3
## 5                        2     1     1       4          6
## 6                        4     4     4       2          4
## 7                        2     4     4       2         11
## 8                        4     4     1       4          1
## 10                       2     5     1       1         11
## 12                       3     4     4       4          9
## 16                       4     4     4       4          5
## 19                       3     4     2       4          1
## 21                       3     4     4       2          7
## 22                       1     2     4       4          5
## 23                       3     4     2       2          1
## 24                       4     4     4       4          9
## 29                       3     4     2       4          1
## 30                       3     5     1       2          1
## 32                       2     2     2       2          6
## 34                       3     4     2       2          1
## 38                       3     4     4       4          3
## 45                       3     4     2       4          4
##    political.knowledge gender
## 2                    3   male
## 3                    3   male
## 5                    3   male
## 6                    3   male
## 7                    3   male
## 8                    1   male
## 10                   3   male
## 12                   3   male
## 16                   1   male
## 19                   1   male
## 21                   1   male
## 22                   3   male
## 23                   3   male
## 24                   3   male
## 29                   3   male
## 30                   3   male
## 32                   1   male
## 34                   3   male
## 38                   3   male
## 45                   3   male
  7. BONUS – place the original .csv in a github file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.
print(paste ("The bonus is answered in the first chunk ~ lines 12-15"), quote = FALSE)
## [1] The bonus is answered in the first chunk ~ lines 12-15
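
The chunk referred to above is not visible in the rendered output. As a rough sketch of the general technique: `read.csv()` accepts a URL string in the same position as a local file path. The example below writes a tiny made-up CSV to a temporary file so that it runs offline; `csv_source` and `toy` are hypothetical names, and the URL pattern in the comment is a placeholder, not the link used in this submission.

```r
# Raw GitHub links typically look like:
#   https://raw.githubusercontent.com/<user>/<repo>/<branch>/<path>/file.csv
# Assumption: a local temporary file stands in for such a link here.
csv_source <- tempfile(fileext = ".csv")
writeLines(c("id,vote,age", "1,Labour,36", "2,Conservative,43"), csv_source)

# The same call works when csv_source is a URL string instead of a file path
toy <- read.csv(csv_source, stringsAsFactors = TRUE)

str(toy)
```

Setting `stringsAsFactors = TRUE` makes text columns factors, so `summary()` prints counts per level for them.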