7. BONUS - Place the original .csv file in a GitHub repository and have R read it from the link. This will be a very useful skill as you progress in your data science education and career.
cdata<-read.csv(url("https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/Stat2Data/Election08.csv"))
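A remote read can fail for reasons unrelated to the analysis (no network, a moved file), so it may help to wrap the call. The following is a minimal sketch, not part of the original assignment: `read_csv_source` is a hypothetical helper name, and `stringsAsFactors = TRUE` is set on the assumption that the factor-style summaries in this report should be reproducible on R 4.0 or later, where the default changed to `FALSE`.

```r
# URL copied from the read.csv() call above.
csv_url <- "https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/Stat2Data/Election08.csv"

# Hypothetical helper: read a CSV from a URL or a local path and
# stop with a readable message if the source cannot be read.
read_csv_source <- function(src) {
  tryCatch(
    # read.csv() accepts a URL string directly, so url() is optional.
    read.csv(src, stringsAsFactors = TRUE),
    error = function(e) {
      stop("Could not read CSV from ", src, ": ", conditionMessage(e),
           call. = FALSE)
    }
  )
}
```

With a network connection, `cdata <- read_csv_source(csv_url)` should load the same data frame as the call above.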
1. Use the summary function to gain an overview of the data set.
summary(cdata)
## X State Abr Income
## Min. : 1.0 Alabama : 1 AK : 1 Min. :28845
## 1st Qu.:13.5 Alaska : 1 AL : 1 1st Qu.:33537
## Median :26.0 Arizona : 1 AR : 1 Median :36047
## Mean :26.0 Arkansas : 1 AZ : 1 Mean :37642
## 3rd Qu.:38.5 California : 1 CA : 1 3rd Qu.:40544
## Max. :51.0 Colorado : 1 CO : 1 Max. :61092
## (Other) :45 (Other):45
## HS BA Dem.Rep ObamaWin
## Min. :78.50 Min. :17.30 Min. :-23.00 Min. :0.0000
## 1st Qu.:83.00 1st Qu.:24.20 1st Qu.: 3.00 1st Qu.:0.0000
## Median :87.00 Median :25.80 Median : 12.00 Median :1.0000
## Mean :86.00 Mean :27.15 Mean : 12.31 Mean :0.5686
## 3rd Qu.:89.05 3rd Qu.:29.65 3rd Qu.: 19.00 3rd Qu.:1.0000
## Max. :91.20 Max. :47.50 Max. : 75.00 Max. :1.0000
##
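In the output above, `ObamaWin` is summarised as a number, which is why its mean (0.5686) reads as the share of states won. As a small sketch of how `summary()` behaves by column type, the block below uses five rows copied from the data set; the labels "No" and "Yes" are illustrative assumptions, not part of the original data.

```r
# Five rows copied from the Election08 data shown in this report.
demo <- data.frame(
  Income   = c(32404, 40352, 33029, 30060, 41571),
  ObamaWin = c(0, 0, 0, 0, 1)
)

# Numeric column: summary() returns min, quartiles, mean and max.
income_summary <- summary(demo$Income)

# Converting the 0/1 flag to a factor makes summary() return counts per level.
# The labels "No" and "Yes" are hypothetical.
demo$ObamaWin <- factor(demo$ObamaWin, levels = c(0, 1), labels = c("No", "Yes"))
win_counts <- summary(demo$ObamaWin)

print(income_summary)
print(win_counts)
```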
2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
# Keep states with Income above 40000 that Obama won; all columns are retained.
newData <- subset(cdata, Income > 40000 & ObamaWin == 1)
summary(newData)
## X State Abr Income
## Min. : 5.00 California :1 CA :1 Min. :40322
## 1st Qu.: 8.50 Colorado :1 CO :1 1st Qu.:40821
## Median :22.00 Connecticut :1 CT :1 Median :41512
## Mean :22.27 Delaware :1 DC :1 Mean :45015
## 3rd Qu.:30.50 District of Columbia:1 DE :1 3rd Qu.:48234
## Max. :48.00 Illinois :1 IL :1 Max. :61092
## (Other) :9 (Other):9
## HS BA Dem.Rep ObamaWin
## Min. :80.20 Min. :21.80 Min. : 9.00 Min. :1
## 1st Qu.:85.70 1st Qu.:29.90 1st Qu.:14.00 1st Qu.:1
## Median :87.40 Median :32.50 Median :19.00 Median :1
## Mean :86.88 Mean :32.68 Mean :23.27 Mean :1
## 3rd Qu.:88.65 3rd Qu.:34.85 3rd Qu.:26.00 3rd Qu.:1
## Max. :91.00 Max. :47.50 Max. :75.00 Max. :1
##
print(newData)
## X State Abr Income HS BA Dem.Rep ObamaWin
## 5 5 California CA 41571 80.2 29.5 19 1
## 6 6 Colorado CO 41042 88.9 35.0 11 1
## 7 7 Connecticut CT 54117 88.0 34.7 26 1
## 8 8 Delaware DE 40608 87.4 26.1 23 1
## 9 9 District of Columbia DC 61092 85.7 47.5 75 1
## 14 14 Illinois IL 40322 85.7 29.5 24 1
## 21 21 Maryland MD 46021 87.4 35.2 26 1
## 22 22 Massachusetts MA 49082 88.4 37.9 34 1
## 24 24 Minnesota MN 41034 91.0 31.0 15 1
## 29 29 Nevada NV 40480 83.7 21.8 11 1
## 30 30 New Hampshire NH 41512 90.5 32.5 13 1
## 31 31 New Jersey NJ 49194 87.0 33.9 19 1
## 33 33 New York NY 47385 84.1 31.7 27 1
## 47 47 Virginia VA 41347 85.9 33.6 9 1
## 48 48 Washington WA 40414 89.3 30.3 17 1
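The call above filters rows only, so all eight columns are kept. If a subset of columns is also wanted, `subset()` takes a `select` argument. The sketch below is an illustration on five rows copied from the data set; the choice of columns (`State`, `Income`, `BA`) and the name `small` are assumptions made for the example.

```r
# Five rows copied from the Election08 data shown in this report.
demo <- data.frame(
  State    = c("Alabama", "Alaska", "California", "Colorado", "Florida"),
  Income   = c(32404, 40352, 41571, 41042, 38444),
  BA       = c(21.4, 26.0, 29.5, 35.0, 25.8),
  ObamaWin = c(0, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Rows: Income above 40000 and ObamaWin equal to 1.
# Columns: only the three named in select.
small <- subset(demo, Income > 40000 & ObamaWin == 1,
                select = c(State, Income, BA))

print(small)
```

Alaska passes the income filter but not the `ObamaWin` filter, and Florida the reverse, so both are dropped.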
3. Create new column names for the new data frame.
names(newData)
## [1] "X" "State" "Abr" "Income" "HS" "BA"
## [7] "Dem.Rep" "ObamaWin"
names(newData)[4]<-"StateIncome"
names(newData)[8]<-"BarakWin"
newData
## X State Abr StateIncome HS BA Dem.Rep BarakWin
## 5 5 California CA 41571 80.2 29.5 19 1
## 6 6 Colorado CO 41042 88.9 35.0 11 1
## 7 7 Connecticut CT 54117 88.0 34.7 26 1
## 8 8 Delaware DE 40608 87.4 26.1 23 1
## 9 9 District of Columbia DC 61092 85.7 47.5 75 1
## 14 14 Illinois IL 40322 85.7 29.5 24 1
## 21 21 Maryland MD 46021 87.4 35.2 26 1
## 22 22 Massachusetts MA 49082 88.4 37.9 34 1
## 24 24 Minnesota MN 41034 91.0 31.0 15 1
## 29 29 Nevada NV 40480 83.7 21.8 11 1
## 30 30 New Hampshire NH 41512 90.5 32.5 13 1
## 31 31 New Jersey NJ 49194 87.0 33.9 19 1
## 33 33 New York NY 47385 84.1 31.7 27 1
## 47 47 Virginia VA 41347 85.9 33.6 9 1
## 48 48 Washington WA 40414 89.3 30.3 17 1
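Renaming by position, as above, depends on the column order staying fixed. A possible alternative is to match on the old name, so the rename still hits the right column if columns are added or reordered. This is a minimal sketch on two rows copied from the data, reusing the new names from the code above.

```r
# Two rows copied from newData as printed in this report.
demo <- data.frame(
  State    = c("California", "Colorado"),
  Income   = c(41571, 41042),
  ObamaWin = c(1, 1),
  stringsAsFactors = FALSE
)

# Look up each column by its current name instead of its index.
names(demo)[names(demo) == "Income"]   <- "StateIncome"
names(demo)[names(demo) == "ObamaWin"] <- "BarakWin"

print(names(demo))
```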
5. For at least 3 values in a column, rename them so that every occurrence of each value in that column is renamed. For example, suppose a column contains the letter “e” 20 times; rename those values so that all 20 show as “excellent”.
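# Replaces every "e" or "E" in every column, including inside longer strings,
# and returns all columns as text because gsub() works on character values.
rData <- as.data.frame(sapply(newData, gsub, pattern = "e", replacement = "Excellent", ignore.case = TRUE))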
print(rData)
## X State Abr StateIncome HS BA
## 1 5 California CA 41571 80.2 29.5
## 2 6 Colorado CO 41042 88.9 35
## 3 7 ConnExcellentcticut CT 54117 88 34.7
## 4 8 DExcellentlawarExcellent DExcellent 40608 87.4 26.1
## 5 9 District of Columbia DC 61092 85.7 47.5
## 6 14 Illinois IL 40322 85.7 29.5
## 7 21 Maryland MD 46021 87.4 35.2
## 8 22 MassachusExcellenttts MA 49082 88.4 37.9
## 9 24 MinnExcellentsota MN 41034 91 31
## 10 29 NExcellentvada NV 40480 83.7 21.8
## 11 30 NExcellentw HampshirExcellent NH 41512 90.5 32.5
## 12 31 NExcellentw JExcellentrsExcellenty NJ 49194 87 33.9
## 13 33 NExcellentw York NY 47385 84.1 31.7
## 14 47 Virginia VA 41347 85.9 33.6
## 15 48 Washington WA 40414 89.3 30.3
## Dem.Rep BarakWin
## 1 19 1
## 2 11 1
## 3 26 1
## 4 23 1
## 5 75 1
## 6 24 1
## 7 26 1
## 8 34 1
## 9 15 1
## 10 11 1
## 11 13 1
## 12 19 1
## 13 27 1
## 14 9 1
## 15 17 1
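As the output above shows, an unanchored pattern also rewrites letters inside longer strings such as "Delaware", and the numeric columns come back as text. If the goal is to rename whole values in one column, matching the complete value is one option. The sketch below uses four rows copied from the data set; the replacement labels ("Lost", "Won", "Ala.") and the new column name `Result` are hypothetical.

```r
# Four rows copied from the Election08 data shown in this report.
demo <- data.frame(
  State    = c("Alabama", "Alaska", "California", "Colorado"),
  Abr      = c("AL", "AK", "CA", "CO"),
  ObamaWin = c(0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Lookup vector: names are the old values, elements are the new labels.
# "Lost" and "Won" are hypothetical labels.
win_labels <- c("0" = "Lost", "1" = "Won")

# Index the lookup by the old value to recode every row of the column at once.
demo$Result <- unname(win_labels[as.character(demo$ObamaWin)])

# Anchored pattern: only a cell that is exactly "AL" is changed.
demo$Abr <- sub("^AL$", "Ala.", demo$Abr)

print(demo)
```

Only the targeted column is touched in each step, so `ObamaWin` stays numeric and the state names are left intact.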
6. Display enough rows to see examples of all of steps 1-5 above.
library(knitr)
kable(cdata)
| X | State | Abr | Income | HS | BA | Dem.Rep | ObamaWin |
|--:|:------|:----|-------:|---:|---:|--------:|---------:|
| 1 | Alabama | AL | 32404 | 80.4 | 21.4 | -1 | 0 |
| 2 | Alaska | AK | 40352 | 90.5 | 26.0 | -11 | 0 |
| 3 | Arizona | AZ | 33029 | 83.5 | 25.3 | 0 | 0 |
| 4 | Arkansas | AR | 30060 | 81.1 | 19.3 | 12 | 0 |
| 5 | California | CA | 41571 | 80.2 | 29.5 | 19 | 1 |
| 6 | Colorado | CO | 41042 | 88.9 | 35.0 | 11 | 1 |
| 7 | Connecticut | CT | 54117 | 88.0 | 34.7 | 26 | 1 |
| 8 | Delaware | DE | 40608 | 87.4 | 26.1 | 23 | 1 |
| 9 | District of Columbia | DC | 61092 | 85.7 | 47.5 | 75 | 1 |
| 10 | Florida | FL | 38444 | 84.9 | 25.8 | 9 | 1 |
| 11 | Georgia | GA | 33457 | 82.9 | 27.1 | 4 | 0 |
| 12 | Hawaii | HI | 39239 | 89.4 | 29.2 | 34 | 1 |
| 13 | Idaho | ID | 31197 | 88.4 | 24.5 | -15 | 0 |
| 14 | Illinois | IL | 40322 | 85.7 | 29.5 | 24 | 1 |
| 15 | Indiana | IN | 33616 | 85.8 | 22.1 | 9 | 1 |
| 16 | Iowa | IA | 35023 | 89.6 | 24.3 | 18 | 1 |
| 17 | Kansas | KS | 36768 | 89.1 | 28.8 | -2 | 0 |
| 18 | Kentucky | KY | 31111 | 80.1 | 20.0 | 13 | 0 |
| 19 | Louisiana | LA | 34756 | 79.9 | 20.4 | 9 | 0 |
| 20 | Maine | ME | 33722 | 89.4 | 26.7 | 19 | 1 |
| 21 | Maryland | MD | 46021 | 87.4 | 35.2 | 26 | 1 |
| 22 | Massachusetts | MA | 49082 | 88.4 | 37.9 | 34 | 1 |
| 23 | Michigan | MI | 35086 | 87.4 | 24.7 | 17 | 1 |
| 24 | Minnesota | MN | 41034 | 91.0 | 31.0 | 15 | 1 |
| 25 | Mississippi | MS | 28845 | 78.5 | 18.9 | 1 | 0 |
| 26 | Missouri | MO | 34389 | 85.6 | 24.5 | 11 | 0 |
| 27 | Montana | MT | 32458 | 90.0 | 27.0 | 4 | 0 |
| 28 | Nebraska | NE | 36471 | 89.6 | 27.5 | -7 | 0 |
| 29 | Nevada | NV | 40480 | 83.7 | 21.8 | 11 | 1 |
| 30 | New Hampshire | NH | 41512 | 90.5 | 32.5 | 13 | 1 |
| 31 | New Jersey | NJ | 49194 | 87.0 | 33.9 | 19 | 1 |
| 32 | New Mexico | NM | 31474 | 82.3 | 24.8 | 14 | 1 |
| 33 | New York | NY | 47385 | 84.1 | 31.7 | 27 | 1 |
| 34 | North Carolina | NC | 33636 | 83.0 | 25.6 | 11 | 1 |
| 35 | North Dakota | ND | 34846 | 89.0 | 25.7 | 1 | 0 |
| 36 | Ohio | OH | 34874 | 87.1 | 24.1 | 18 | 1 |
| 37 | Oklahoma | OK | 34153 | 84.8 | 22.8 | 6 | 0 |
| 38 | Oregon | OR | 34784 | 88.0 | 28.3 | 17 | 1 |
| 39 | Pennsylvania | PA | 38788 | 86.8 | 25.8 | 16 | 1 |
| 40 | Rhode Island | RI | 39463 | 83.0 | 29.8 | 37 | 1 |
| 41 | South Carolina | SC | 31013 | 82.1 | 23.5 | 0 | 0 |
| 42 | South Dakota | SD | 33905 | 88.2 | 25.0 | 1 | 0 |
| 43 | Tennessee | TN | 33280 | 81.4 | 21.8 | 5 | 0 |
| 44 | Texas | TX | 37187 | 79.1 | 25.2 | 2 | 0 |
| 45 | Utah | UT | 31189 | 90.2 | 28.7 | -23 | 0 |
| 46 | Vermont | VT | 36670 | 90.3 | 33.6 | 33 | 1 |
| 47 | Virginia | VA | 41347 | 85.9 | 33.6 | 9 | 1 |
| 48 | Washington | WA | 40414 | 89.3 | 30.3 | 17 | 1 |
| 49 | West Virginia | WV | 29537 | 81.2 | 17.3 | 19 | 0 |
| 50 | Wisconsin | WI | 36047 | 89.0 | 25.4 | 18 | 1 |
| 51 | Wyoming | WY | 43226 | 91.2 | 23.4 | -20 | 0 |
library(knitr)
kable(newData)
|    | X | State | Abr | StateIncome | HS | BA | Dem.Rep | BarakWin |
|:---|--:|:------|:----|------------:|---:|---:|--------:|---------:|
| 5 | 5 | California | CA | 41571 | 80.2 | 29.5 | 19 | 1 |
| 6 | 6 | Colorado | CO | 41042 | 88.9 | 35.0 | 11 | 1 |
| 7 | 7 | Connecticut | CT | 54117 | 88.0 | 34.7 | 26 | 1 |
| 8 | 8 | Delaware | DE | 40608 | 87.4 | 26.1 | 23 | 1 |
| 9 | 9 | District of Columbia | DC | 61092 | 85.7 | 47.5 | 75 | 1 |
| 14 | 14 | Illinois | IL | 40322 | 85.7 | 29.5 | 24 | 1 |
| 21 | 21 | Maryland | MD | 46021 | 87.4 | 35.2 | 26 | 1 |
| 22 | 22 | Massachusetts | MA | 49082 | 88.4 | 37.9 | 34 | 1 |
| 24 | 24 | Minnesota | MN | 41034 | 91.0 | 31.0 | 15 | 1 |
| 29 | 29 | Nevada | NV | 40480 | 83.7 | 21.8 | 11 | 1 |
| 30 | 30 | New Hampshire | NH | 41512 | 90.5 | 32.5 | 13 | 1 |
| 31 | 31 | New Jersey | NJ | 49194 | 87.0 | 33.9 | 19 | 1 |
| 33 | 33 | New York | NY | 47385 | 84.1 | 31.7 | 27 | 1 |
| 47 | 47 | Virginia | VA | 41347 | 85.9 | 33.6 | 9 | 1 |
| 48 | 48 | Washington | WA | 40414 | 89.3 | 30.3 | 17 | 1 |
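Printing every row is not required to show the steps. `head()` can limit the display, and `kable()` has a `row.names` argument that drops the duplicated index column seen in the second table. The following is a minimal sketch on six rows copied from the data set; the names `high` and `preview` are illustrative, and the plain `print()` fallback is an assumption for sessions where knitr is not installed.

```r
# Six rows copied from the Election08 data shown in this report.
demo <- data.frame(
  X      = 1:6,
  State  = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  Income = c(32404, 40352, 33029, 30060, 41571, 41042),
  stringsAsFactors = FALSE
)

# Filtering keeps the original row names (2, 5, 6), which is why kable()
# prints an extra leading column for a subsetted data frame.
high <- demo[demo$Income > 40000, ]

# Show only the first two rows of the subset.
preview <- head(high, 2)

if (requireNamespace("knitr", quietly = TRUE)) {
  # row.names = FALSE suppresses the leading row-name column.
  print(knitr::kable(preview, row.names = FALSE))
} else {
  print(preview, row.names = FALSE)
}
```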