7. BONUS - Place the original .csv file in a GitHub repository and have R read it from the raw-file link. This is a very useful skill as you progress in your data science education and career.

cdata<-read.csv(url("https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/Stat2Data/Election08.csv"))
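A minimal sketch of the same idea with basic error handling. `read.csv()` accepts a URL string directly, so the `url()` wrapper is optional. The helper name `read_csv_link` is hypothetical, and `stringsAsFactors = TRUE` is an assumption made so that text columns are summarized as factor counts, as in the output shown in this write-up (since R 4.0 the default is `FALSE`). The demonstration uses a temporary file addressed with a `file://` URL so that it runs offline; an `https://` link is passed the same way.

```r
# Hypothetical helper (not part of the original assignment): read a CSV from a
# link and return NULL with a warning instead of stopping if the read fails.
read_csv_link <- function(link) {
  tryCatch(
    read.csv(link, stringsAsFactors = TRUE),
    error = function(e) {
      warning("Could not read ", link, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Offline demonstration: write a tiny CSV to a temporary file and read it back
# through a file:// URL (assumes a Unix-style absolute path).
tmp <- tempfile(fileext = ".csv")
writeLines(c("State,Income", "Alabama,32404", "Alaska,40352"), tmp)
demo <- read_csv_link(paste0("file://", tmp))
str(demo)
```

Returning `NULL` on failure lets a script check the result with `is.null()` before continuing, which is one possible design; stopping with an error is equally reasonable.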

1. Use the summary function to gain an overview of the data set.

summary(cdata)
##        X                State         Abr         Income     
##  Min.   : 1.0   Alabama    : 1   AK     : 1   Min.   :28845  
##  1st Qu.:13.5   Alaska     : 1   AL     : 1   1st Qu.:33537  
##  Median :26.0   Arizona    : 1   AR     : 1   Median :36047  
##  Mean   :26.0   Arkansas   : 1   AZ     : 1   Mean   :37642  
##  3rd Qu.:38.5   California : 1   CA     : 1   3rd Qu.:40544  
##  Max.   :51.0   Colorado   : 1   CO     : 1   Max.   :61092  
##                 (Other)    :45   (Other):45                  
##        HS              BA           Dem.Rep          ObamaWin     
##  Min.   :78.50   Min.   :17.30   Min.   :-23.00   Min.   :0.0000  
##  1st Qu.:83.00   1st Qu.:24.20   1st Qu.:  3.00   1st Qu.:0.0000  
##  Median :87.00   Median :25.80   Median : 12.00   Median :1.0000  
##  Mean   :86.00   Mean   :27.15   Mean   : 12.31   Mean   :0.5686  
##  3rd Qu.:89.05   3rd Qu.:29.65   3rd Qu.: 19.00   3rd Qu.:1.0000  
##  Max.   :91.20   Max.   :47.50   Max.   : 75.00   Max.   :1.0000  
## 

Then display the mean and median for at least two attributes.

mean(cdata$Income)
## [1] 37641.57
mean(cdata$HS)
## [1] 86
median(cdata$Income)
## [1] 36047
median(cdata$HS)
## [1] 87
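The four calls above can also be collapsed into one small table. The sketch below assumes a five-row sample (`states`, copied from the first five rows of the data set) rather than the full `cdata`, so its numbers differ from the output above; the variable names `states`, `cols`, and `stats` are illustrative only.

```r
# Small sample copied from the first five rows of the data set
states <- data.frame(
  State  = c("Alabama", "Alaska", "Arizona", "Arkansas", "California"),
  Income = c(32404, 40352, 33029, 30060, 41571),
  HS     = c(80.4, 90.5, 83.5, 81.1, 80.2)
)

cols <- c("Income", "HS")

# One row per statistic, one column per attribute
stats <- rbind(
  mean   = sapply(states[cols], mean),
  median = sapply(states[cols], median)
)
print(stats)
```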

2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.

# Keep the rows for states with income above 40000 that Obama won (all columns are retained)
newData <- subset(cdata, Income > 40000 & ObamaWin == 1)
summary(newData)
##        X                          State        Abr        Income     
##  Min.   : 5.00   California          :1   CA     :1   Min.   :40322  
##  1st Qu.: 8.50   Colorado            :1   CO     :1   1st Qu.:40821  
##  Median :22.00   Connecticut         :1   CT     :1   Median :41512  
##  Mean   :22.27   Delaware            :1   DC     :1   Mean   :45015  
##  3rd Qu.:30.50   District of Columbia:1   DE     :1   3rd Qu.:48234  
##  Max.   :48.00   Illinois            :1   IL     :1   Max.   :61092  
##                  (Other)             :9   (Other):9                  
##        HS              BA           Dem.Rep         ObamaWin
##  Min.   :80.20   Min.   :21.80   Min.   : 9.00   Min.   :1  
##  1st Qu.:85.70   1st Qu.:29.90   1st Qu.:14.00   1st Qu.:1  
##  Median :87.40   Median :32.50   Median :19.00   Median :1  
##  Mean   :86.88   Mean   :32.68   Mean   :23.27   Mean   :1  
##  3rd Qu.:88.65   3rd Qu.:34.85   3rd Qu.:26.00   3rd Qu.:1  
##  Max.   :91.00   Max.   :47.50   Max.   :75.00   Max.   :1  
## 
print(newData)
##     X                State Abr Income   HS   BA Dem.Rep ObamaWin
## 5   5          California   CA  41571 80.2 29.5      19        1
## 6   6            Colorado   CO  41042 88.9 35.0      11        1
## 7   7         Connecticut   CT  54117 88.0 34.7      26        1
## 8   8            Delaware   DE  40608 87.4 26.1      23        1
## 9   9 District of Columbia  DC  61092 85.7 47.5      75        1
## 14 14            Illinois   IL  40322 85.7 29.5      24        1
## 21 21            Maryland   MD  46021 87.4 35.2      26        1
## 22 22       Massachusetts   MA  49082 88.4 37.9      34        1
## 24 24           Minnesota   MN  41034 91.0 31.0      15        1
## 29 29              Nevada   NV  40480 83.7 21.8      11        1
## 30 30       New Hampshire   NH  41512 90.5 32.5      13        1
## 31 31          New Jersey   NJ  49194 87.0 33.9      19        1
## 33 33            New York   NY  47385 84.1 31.7      27        1
## 47 47            Virginia   VA  41347 85.9 33.6       9        1
## 48 48          Washington   WA  40414 89.3 30.3      17        1
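The call above filters rows only and keeps every column. To take a subset of the columns as well, `subset()` has a `select` argument. A minimal sketch on a five-row sample copied from the data set; the names `states` and `highIncomeWins` are illustrative, not from the original.

```r
# Five rows copied from the data set, with only the columns needed here
states <- data.frame(
  State    = c("Alabama", "Alaska", "California", "Colorado", "Florida"),
  Income   = c(32404, 40352, 41571, 41042, 38444),
  HS       = c(80.4, 90.5, 80.2, 88.9, 84.9),
  ObamaWin = c(0, 0, 1, 1, 1)
)

# Rows: income above 40000 and an Obama win. Columns: State, Income, HS.
highIncomeWins <- subset(states, Income > 40000 & ObamaWin == 1,
                         select = c(State, Income, HS))
print(highIncomeWins)
```

Alaska passes the income filter but not the `ObamaWin` filter, and Florida the reverse, so only California and Colorado remain.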

3. Create new column names for the new data frame.

names(newData)
## [1] "X"        "State"    "Abr"      "Income"   "HS"       "BA"      
## [7] "Dem.Rep"  "ObamaWin"
names(newData)[4]<-"StateIncome"
names(newData)[8]<-"BarakWin"
newData
##     X                State Abr StateIncome   HS   BA Dem.Rep BarakWin
## 5   5          California   CA       41571 80.2 29.5      19        1
## 6   6            Colorado   CO       41042 88.9 35.0      11        1
## 7   7         Connecticut   CT       54117 88.0 34.7      26        1
## 8   8            Delaware   DE       40608 87.4 26.1      23        1
## 9   9 District of Columbia  DC       61092 85.7 47.5      75        1
## 14 14            Illinois   IL       40322 85.7 29.5      24        1
## 21 21            Maryland   MD       46021 87.4 35.2      26        1
## 22 22       Massachusetts   MA       49082 88.4 37.9      34        1
## 24 24           Minnesota   MN       41034 91.0 31.0      15        1
## 29 29              Nevada   NV       40480 83.7 21.8      11        1
## 30 30       New Hampshire   NH       41512 90.5 32.5      13        1
## 31 31          New Jersey   NJ       49194 87.0 33.9      19        1
## 33 33            New York   NY       47385 84.1 31.7      27        1
## 47 47            Virginia   VA       41347 85.9 33.6       9        1
## 48 48          Washington   WA       40414 89.3 30.3      17        1
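Renaming by position, as above, breaks silently if the column order changes. A hedged alternative is to match on the current name instead. The sketch uses a two-row stand-in data frame (`df`, illustrative only) and the same new names as above.

```r
# Two-row stand-in with the original column names
df <- data.frame(
  State    = c("California", "Colorado"),
  Income   = c(41571, 41042),
  ObamaWin = c(1, 1)
)

# Match on the existing name rather than on the column position
names(df)[names(df) == "Income"]   <- "StateIncome"
names(df)[names(df) == "ObamaWin"] <- "BarakWin"
names(df)
```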

4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes and compare them with the values for the full data set.

summary(newData)
##        X                          State        Abr     StateIncome   
##  Min.   : 5.00   California          :1   CA     :1   Min.   :40322  
##  1st Qu.: 8.50   Colorado            :1   CO     :1   1st Qu.:40821  
##  Median :22.00   Connecticut         :1   CT     :1   Median :41512  
##  Mean   :22.27   Delaware            :1   DC     :1   Mean   :45015  
##  3rd Qu.:30.50   District of Columbia:1   DE     :1   3rd Qu.:48234  
##  Max.   :48.00   Illinois            :1   IL     :1   Max.   :61092  
##                  (Other)             :9   (Other):9                  
##        HS              BA           Dem.Rep         BarakWin
##  Min.   :80.20   Min.   :21.80   Min.   : 9.00   Min.   :1  
##  1st Qu.:85.70   1st Qu.:29.90   1st Qu.:14.00   1st Qu.:1  
##  Median :87.40   Median :32.50   Median :19.00   Median :1  
##  Mean   :86.88   Mean   :32.68   Mean   :23.27   Mean   :1  
##  3rd Qu.:88.65   3rd Qu.:34.85   3rd Qu.:26.00   3rd Qu.:1  
##  Max.   :91.00   Max.   :47.50   Max.   :75.00   Max.   :1  
## 
mean(newData$StateIncome)
## [1] 45014.73
mean(newData$HS)
## [1] 86.88
median(newData$StateIncome)
## [1] 41512
median(newData$HS)
## [1] 87.4
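For the comparison, the values printed in steps 1 and 4 can be placed side by side. The sketch below simply retypes those reported numbers into a data frame (named `comparison` here for illustration) and subtracts them. Income is higher in the subset by construction, since the filter kept only states with income above 40000; the high school attainment figures differ by less than one percentage point.

```r
# Values copied from the output of steps 1 and 4
comparison <- data.frame(
  statistic  = c("mean Income", "median Income", "mean HS", "median HS"),
  all_states = c(37641.57, 36047, 86, 87),
  subset     = c(45014.73, 41512, 86.88, 87.4)
)

# Positive values mean the subset is higher than the full data set
comparison$difference <- comparison$subset - comparison$all_states
print(comparison)
```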

5. For at least 3 values in a column, rename the values so that every occurrence in that column is renamed. For example, suppose one column contains 20 values of the letter “e”. Rename those values so that all 20 show as “excellent”.

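# Replaces every letter "e" (either case) in every column; sapply() returns a character matrix, so all columns become text
rData <- as.data.frame(sapply(newData, gsub, pattern = "e", replacement = "Excellent", ignore.case = TRUE))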
print(rData)
##     X                               State        Abr StateIncome   HS   BA
## 1   5                         California          CA       41571 80.2 29.5
## 2   6                           Colorado          CO       41042 88.9   35
## 3   7                ConnExcellentcticut          CT       54117   88 34.7
## 4   8           DExcellentlawarExcellent  DExcellent       40608 87.4 26.1
## 5   9                District of Columbia         DC       61092 85.7 47.5
## 6  14                           Illinois          IL       40322 85.7 29.5
## 7  21                           Maryland          MD       46021 87.4 35.2
## 8  22              MassachusExcellenttts          MA       49082 88.4 37.9
## 9  24                  MinnExcellentsota          MN       41034   91   31
## 10 29                     NExcellentvada          NV       40480 83.7 21.8
## 11 30      NExcellentw HampshirExcellent          NH       41512 90.5 32.5
## 12 31 NExcellentw JExcellentrsExcellenty          NJ       49194   87 33.9
## 13 33                   NExcellentw York          NY       47385 84.1 31.7
## 14 47                           Virginia          VA       41347 85.9 33.6
## 15 48                         Washington          WA       40414 89.3 30.3
##    Dem.Rep BarakWin
## 1       19        1
## 2       11        1
## 3       26        1
## 4       23        1
## 5       75        1
## 6       24        1
## 7       26        1
## 8       34        1
## 9       15        1
## 10      11        1
## 11      13        1
## 12      19        1
## 13      27        1
## 14       9        1
## 15      17        1
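The call above substitutes the letter inside longer strings, which is why "Delaware" turns into "DExcellentlawarExcellent". If the goal is to rename whole values in a single column, a named lookup vector is one option. The sketch below uses a hypothetical `ratings` data frame with a `grade` column, which is not part of the election data, and also shows an anchored pattern that only matches a value consisting of exactly "e".

```r
# Hypothetical column of one-letter codes (not part of the election data)
ratings <- data.frame(
  id    = 1:6,
  grade = c("e", "g", "e", "p", "g", "e"),
  stringsAsFactors = FALSE
)

# Named lookup vector: names are the old values, elements are the new ones
lookup <- c(e = "excellent", g = "good", p = "poor")
ratings$grade <- unname(lookup[ratings$grade])
print(ratings)

# Alternative with gsub(): ^ and $ anchor the match to the whole value,
# so "Delaware" is left alone
anchored <- gsub("^e$", "excellent", c("e", "Delaware"))
print(anchored)
```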

6. Display enough rows to see examples of all of steps 1-5 above.

library(knitr)
kable(cdata)
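| X | State | Abr | Income | HS | BA | Dem.Rep | ObamaWin |
|---|---|---|---|---|---|---|---|
| 1 | Alabama | AL | 32404 | 80.4 | 21.4 | -1 | 0 |
| 2 | Alaska | AK | 40352 | 90.5 | 26.0 | -11 | 0 |
| 3 | Arizona | AZ | 33029 | 83.5 | 25.3 | 0 | 0 |
| 4 | Arkansas | AR | 30060 | 81.1 | 19.3 | 12 | 0 |
| 5 | California | CA | 41571 | 80.2 | 29.5 | 19 | 1 |
| 6 | Colorado | CO | 41042 | 88.9 | 35.0 | 11 | 1 |
| 7 | Connecticut | CT | 54117 | 88.0 | 34.7 | 26 | 1 |
| 8 | Delaware | DE | 40608 | 87.4 | 26.1 | 23 | 1 |
| 9 | District of Columbia | DC | 61092 | 85.7 | 47.5 | 75 | 1 |
| 10 | Florida | FL | 38444 | 84.9 | 25.8 | 9 | 1 |
| 11 | Georgia | GA | 33457 | 82.9 | 27.1 | 4 | 0 |
| 12 | Hawaii | HI | 39239 | 89.4 | 29.2 | 34 | 1 |
| 13 | Idaho | ID | 31197 | 88.4 | 24.5 | -15 | 0 |
| 14 | Illinois | IL | 40322 | 85.7 | 29.5 | 24 | 1 |
| 15 | Indiana | IN | 33616 | 85.8 | 22.1 | 9 | 1 |
| 16 | Iowa | IA | 35023 | 89.6 | 24.3 | 18 | 1 |
| 17 | Kansas | KS | 36768 | 89.1 | 28.8 | -2 | 0 |
| 18 | Kentucky | KY | 31111 | 80.1 | 20.0 | 13 | 0 |
| 19 | Louisiana | LA | 34756 | 79.9 | 20.4 | 9 | 0 |
| 20 | Maine | ME | 33722 | 89.4 | 26.7 | 19 | 1 |
| 21 | Maryland | MD | 46021 | 87.4 | 35.2 | 26 | 1 |
| 22 | Massachusetts | MA | 49082 | 88.4 | 37.9 | 34 | 1 |
| 23 | Michigan | MI | 35086 | 87.4 | 24.7 | 17 | 1 |
| 24 | Minnesota | MN | 41034 | 91.0 | 31.0 | 15 | 1 |
| 25 | Mississippi | MS | 28845 | 78.5 | 18.9 | 1 | 0 |
| 26 | Missouri | MO | 34389 | 85.6 | 24.5 | 11 | 0 |
| 27 | Montana | MT | 32458 | 90.0 | 27.0 | 4 | 0 |
| 28 | Nebraska | NE | 36471 | 89.6 | 27.5 | -7 | 0 |
| 29 | Nevada | NV | 40480 | 83.7 | 21.8 | 11 | 1 |
| 30 | New Hampshire | NH | 41512 | 90.5 | 32.5 | 13 | 1 |
| 31 | New Jersey | NJ | 49194 | 87.0 | 33.9 | 19 | 1 |
| 32 | New Mexico | NM | 31474 | 82.3 | 24.8 | 14 | 1 |
| 33 | New York | NY | 47385 | 84.1 | 31.7 | 27 | 1 |
| 34 | North Carolina | NC | 33636 | 83.0 | 25.6 | 11 | 1 |
| 35 | North Dakota | ND | 34846 | 89.0 | 25.7 | 1 | 0 |
| 36 | Ohio | OH | 34874 | 87.1 | 24.1 | 18 | 1 |
| 37 | Oklahoma | OK | 34153 | 84.8 | 22.8 | 6 | 0 |
| 38 | Oregon | OR | 34784 | 88.0 | 28.3 | 17 | 1 |
| 39 | Pennsylvania | PA | 38788 | 86.8 | 25.8 | 16 | 1 |
| 40 | Rhode Island | RI | 39463 | 83.0 | 29.8 | 37 | 1 |
| 41 | South Carolina | SC | 31013 | 82.1 | 23.5 | 0 | 0 |
| 42 | South Dakota | SD | 33905 | 88.2 | 25.0 | 1 | 0 |
| 43 | Tennessee | TN | 33280 | 81.4 | 21.8 | 5 | 0 |
| 44 | Texas | TX | 37187 | 79.1 | 25.2 | 2 | 0 |
| 45 | Utah | UT | 31189 | 90.2 | 28.7 | -23 | 0 |
| 46 | Vermont | VT | 36670 | 90.3 | 33.6 | 33 | 1 |
| 47 | Virginia | VA | 41347 | 85.9 | 33.6 | 9 | 1 |
| 48 | Washington | WA | 40414 | 89.3 | 30.3 | 17 | 1 |
| 49 | West Virginia | WV | 29537 | 81.2 | 17.3 | 19 | 0 |
| 50 | Wisconsin | WI | 36047 | 89.0 | 25.4 | 18 | 1 |
| 51 | Wyoming | WY | 43226 | 91.2 | 23.4 | -20 | 0 |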
library(knitr)
kable(newData)
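|  | X | State | Abr | StateIncome | HS | BA | Dem.Rep | BarakWin |
|---|---|---|---|---|---|---|---|---|
| 5 | 5 | California | CA | 41571 | 80.2 | 29.5 | 19 | 1 |
| 6 | 6 | Colorado | CO | 41042 | 88.9 | 35.0 | 11 | 1 |
| 7 | 7 | Connecticut | CT | 54117 | 88.0 | 34.7 | 26 | 1 |
| 8 | 8 | Delaware | DE | 40608 | 87.4 | 26.1 | 23 | 1 |
| 9 | 9 | District of Columbia | DC | 61092 | 85.7 | 47.5 | 75 | 1 |
| 14 | 14 | Illinois | IL | 40322 | 85.7 | 29.5 | 24 | 1 |
| 21 | 21 | Maryland | MD | 46021 | 87.4 | 35.2 | 26 | 1 |
| 22 | 22 | Massachusetts | MA | 49082 | 88.4 | 37.9 | 34 | 1 |
| 24 | 24 | Minnesota | MN | 41034 | 91.0 | 31.0 | 15 | 1 |
| 29 | 29 | Nevada | NV | 40480 | 83.7 | 21.8 | 11 | 1 |
| 30 | 30 | New Hampshire | NH | 41512 | 90.5 | 32.5 | 13 | 1 |
| 31 | 31 | New Jersey | NJ | 49194 | 87.0 | 33.9 | 19 | 1 |
| 33 | 33 | New York | NY | 47385 | 84.1 | 31.7 | 27 | 1 |
| 47 | 47 | Virginia | VA | 41347 | 85.9 | 33.6 | 9 | 1 |
| 48 | 48 | Washington | WA | 40414 | 89.3 | 30.3 | 17 | 1 |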
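When the full table is longer than needed, `head()` and `tail()` limit the display to a few rows, and their result can be passed to `kable()` in the same way as a full data frame. The sketch below is an assumption-laden stand-in: it builds a data frame from R's built-in `state.name` and `state.abb` vectors (50 states, so without the District of Columbia) instead of downloading the election data, and it only calls `kable()` if knitr is installed.

```r
# Stand-in data frame built from vectors that ship with base R
states <- data.frame(State = state.name, Abr = state.abb)

firstRows <- head(states, 5)  # first five rows
lastRows  <- tail(states, 3)  # last three rows
print(firstRows)
print(lastRows)

# Optional: render the shortened table with kable() when knitr is available
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(firstRows))
}
```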