- Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
csv <- RCurl::getURL("https://raw.githubusercontent.com/baroncurtin2/bridgeworkshop/master/College.csv")
data <- read.csv(textConnection(csv))
summary(data)
##                             X       Private        Apps
##  Abilene Christian University:  1   No :212   Min.   :   81
##  Adelphi University          :  1   Yes:565   1st Qu.:  776
##  Adrian College              :  1             Median : 1558
##  Agnes Scott College         :  1             Mean   : 3002
##  Alaska Pacific University   :  1             3rd Qu.: 3624
##  Albertson College           :  1             Max.   :48094
##  (Other)                     :771
## Accept Enroll Top10perc Top25perc
## Min. : 72 Min. : 35 Min. : 1.00 Min. : 9.0
## 1st Qu.: 604 1st Qu.: 242 1st Qu.:15.00 1st Qu.: 41.0
## Median : 1110 Median : 434 Median :23.00 Median : 54.0
## Mean : 2019 Mean : 780 Mean :27.56 Mean : 55.8
## 3rd Qu.: 2424 3rd Qu.: 902 3rd Qu.:35.00 3rd Qu.: 69.0
## Max. :26330 Max. :6392 Max. :96.00 Max. :100.0
##
## F.Undergrad P.Undergrad Outstate Room.Board
## Min. : 139 Min. : 1.0 Min. : 2340 Min. :1780
## 1st Qu.: 992 1st Qu.: 95.0 1st Qu.: 7320 1st Qu.:3597
## Median : 1707 Median : 353.0 Median : 9990 Median :4200
## Mean : 3700 Mean : 855.3 Mean :10441 Mean :4358
## 3rd Qu.: 4005 3rd Qu.: 967.0 3rd Qu.:12925 3rd Qu.:5050
## Max. :31643 Max. :21836.0 Max. :21700 Max. :8124
##
## Books Personal PhD Terminal
## Min. : 96.0 Min. : 250 Min. : 8.00 Min. : 24.0
## 1st Qu.: 470.0 1st Qu.: 850 1st Qu.: 62.00 1st Qu.: 71.0
## Median : 500.0 Median :1200 Median : 75.00 Median : 82.0
## Mean : 549.4 Mean :1341 Mean : 72.66 Mean : 79.7
## 3rd Qu.: 600.0 3rd Qu.:1700 3rd Qu.: 85.00 3rd Qu.: 92.0
## Max. :2340.0 Max. :6800 Max. :103.00 Max. :100.0
##
## S.F.Ratio perc.alumni Expend Grad.Rate
## Min. : 2.50 Min. : 0.00 Min. : 3186 Min. : 10.00
## 1st Qu.:11.50 1st Qu.:13.00 1st Qu.: 6751 1st Qu.: 53.00
## Median :13.60 Median :21.00 Median : 8377 Median : 65.00
## Mean :14.09 Mean :22.74 Mean : 9660 Mean : 65.46
## 3rd Qu.:16.50 3rd Qu.:31.00 3rd Qu.:10830 3rd Qu.: 78.00
## Max. :39.80 Max. :64.00 Max. :56233 Max. :118.00
##
mean(data$Apps)
## [1] 3001.638
mean(data$Accept)
## [1] 2018.804
median(data$Apps)
## [1] 1558
median(data$Accept)
## [1] 1110
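The four calls above can also be collapsed into one step. The following is a minimal sketch, assuming a small hypothetical data frame `sample_df` built from the first five rows of the College data (the name and the five-row sample are illustrative, not part of the original exercise):

```r
# Hypothetical five-row sample taken from the first rows of the College data
sample_df <- data.frame(
  Apps   = c(1660, 2186, 1428, 417, 193),
  Accept = c(1232, 1924, 1097, 349, 146)
)

# Apply mean and median to each column in turn.
# The result is a matrix: one row per statistic, one column per attribute.
stats <- sapply(sample_df, function(x) c(mean = mean(x), median = median(x)))
print(stats)
```

On the full data set the same call would be `sapply(data[c("Apps", "Accept")], ...)`, which should reproduce the values printed above.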
- Create a new data frame with a subset of the columns and rows. Make sure to rename it.
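# Copy the data under a new name. This keeps every row and column; no subset is taken here.
collegeData <- data.frame(data, stringsAsFactors = FALSE)
# Convert every column to character. Numeric summaries on collegeData will no longer work after this step.
collegeData[] <- lapply(collegeData, as.character)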
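The task asks for a subset of rows and columns, which the copy above does not take. One possible way to do it is sketched below with bracket indexing. The data frame `sample_df`, the result name `college_subset`, and the 1000-application threshold are all hypothetical choices for illustration:

```r
# Hypothetical sample standing in for the full College data
sample_df <- data.frame(
  X       = c("Abilene Christian University", "Adelphi University",
              "Adrian College", "Agnes Scott College",
              "Alaska Pacific University"),
  Private = c("Yes", "Yes", "Yes", "Yes", "Yes"),
  Apps    = c(1660, 2186, 1428, 417, 193),
  Accept  = c(1232, 1924, 1097, 349, 146),
  stringsAsFactors = FALSE
)

# Rows: colleges with more than 1000 applications (arbitrary threshold).
# Columns: name, applications, and acceptances only.
college_subset <- sample_df[sample_df$Apps > 1000, c("X", "Apps", "Accept")]
print(college_subset)
```

Because no `as.character` conversion is applied, the numeric columns stay numeric and `mean` and `median` continue to work on the subset.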
- Create new column names for the new data frame.
colnames(collegeData) <- c("Name", "Private_Flag", "Applications", "Accepted", "Enrollees", "Top10", "Top25", "FullTime_Undergrad", "PartTime_Undergrad", "OutOfState_Tution", "Room.Board_Costs", "Book_Costs", "Personal_Spend", "Faculty_PHDperc", "Terminal_Degreeperc", "Faculty_Ratio", "AlumniDonation_perc", "InstructionalExpense", "GraduationRate")
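Assigning all names by position works, but it depends on the column order staying fixed. As an alternative sketch, columns can be renamed through an old-to-new mapping. The one-row data frame `sample_df` and the mapping `new_names` are hypothetical and cover only three of the columns:

```r
# Hypothetical one-row sample with three of the original column names
sample_df <- data.frame(X = "Adrian College", Apps = 1428, Accept = 1097,
                        stringsAsFactors = FALSE)

# Mapping from old column name (vector names) to new column name (values)
new_names <- c(X = "Name", Apps = "Applications", Accept = "Accepted")

# Find where each old name sits and overwrite it with the new name
idx <- match(names(new_names), names(sample_df))
names(sample_df)[idx] <- unname(new_names)
print(names(sample_df))
```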
- Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
summary(collegeData)
## Name Private_Flag Applications
## Length:777 Length:777 Length:777
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## Accepted Enrollees Top10
## Length:777 Length:777 Length:777
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## Top25 FullTime_Undergrad PartTime_Undergrad
## Length:777 Length:777 Length:777
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## OutOfState_Tution Room.Board_Costs Book_Costs
## Length:777 Length:777 Length:777
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## Personal_Spend Faculty_PHDperc Terminal_Degreeperc
## Length:777 Length:777 Length:777
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## Faculty_Ratio AlumniDonation_perc InstructionalExpense
## Length:777 Length:777 Length:777
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## GraduationRate
## Length:777
## Class :character
## Mode :character
mean(collegeData$Apps)
## Warning in mean.default(collegeData$Apps): argument is not numeric or
## logical: returning NA
## [1] NA
mean(collegeData$Accept)
## Warning in mean.default(collegeData$Accept): argument is not numeric or
## logical: returning NA
## [1] NA
median(collegeData$Apps)
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'NULL'
## NULL
median(collegeData$Accept)
## [1] "3446"
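Comparing the two runs: the original data gave a mean of 3001.638 and a median of 1558 for `Apps`, while the new data frame returns `NA` and `NULL`. Two things changed. Every column was converted to character, and the columns were renamed, so `collegeData$Apps` no longer matches any column, while `collegeData$Accept` partially matches `Accepted`, which now holds text. The sketch below reproduces the situation on a hypothetical five-row character data frame (`college_chr` is an illustrative name) and shows one way to recover the numeric summaries:

```r
# Hypothetical character-typed sample mimicking collegeData after as.character()
college_chr <- data.frame(
  Applications = c("1660", "2186", "1428", "417", "193"),
  Accepted     = c("1232", "1924", "1097", "349", "146"),
  stringsAsFactors = FALSE
)

# The old name is not a prefix of any current column name, so `$` returns NULL
old_col <- college_chr$Apps
print(is.null(old_col))

# Convert the count columns back to numeric before summarising
num_cols <- c("Applications", "Accepted")
college_chr[num_cols] <- lapply(college_chr[num_cols], as.numeric)

# Use the new column names for the summaries
app_mean   <- mean(college_chr$Applications)
acc_median <- median(college_chr$Accepted)
print(c(app_mean, acc_median))
```

Applied to the full `collegeData`, the same conversion should bring the mean and median back in line with the values computed from `data`.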
- For at least 3 values in a column, rename them so that every occurrence of each value in that column is replaced. For example, suppose one column contains 20 values of the letter “e”. Rename those values so that all 20 show as “excellent”.
collegeData$Private_Flag[collegeData$Private_Flag == "Yes"] <- 1
collegeData$Private_Flag[collegeData$Private_Flag == "No"] <- 0
collegeData$GraduationRate[collegeData$GraduationRate == 100] <- 1
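When several values in one column need new labels, a named lookup vector can replace the one-assignment-per-value pattern. This is a minimal sketch on a hypothetical vector `flags`. The labels "Private" and "Public" are illustrative and are not the 1/0 codes used above:

```r
# Hypothetical values standing in for the Private_Flag column
flags <- c("Yes", "No", "Yes", "Yes", "No")

# Lookup table: vector names are the old values, elements are the new labels
lookup <- c(Yes = "Private", No = "Public")

# Index the lookup by the old values; unname() drops the leftover old labels
recoded <- unname(lookup[flags])
print(recoded)
```

Every occurrence of each old value is replaced in a single step, and adding a third value only requires one more entry in `lookup`.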
- Display enough rows to see examples of all of steps 1-5 above.
head(collegeData, 20)
## Name Private_Flag Applications
## 1 Abilene Christian University 1 1660
## 2 Adelphi University 1 2186
## 3 Adrian College 1 1428
## 4 Agnes Scott College 1 417
## 5 Alaska Pacific University 1 193
## 6 Albertson College 1 587
## 7 Albertus Magnus College 1 353
## 8 Albion College 1 1899
## 9 Albright College 1 1038
## 10 Alderson-Broaddus College 1 582
## 11 Alfred University 1 1732
## 12 Allegheny College 1 2652
## 13 Allentown Coll. of St. Francis de Sales 1 1179
## 14 Alma College 1 1267
## 15 Alverno College 1 494
## 16 American International College 1 1420
## 17 Amherst College 1 4302
## 18 Anderson University 1 1216
## 19 Andrews University 1 1130
## 20 Angelo State University 0 3540
## Accepted Enrollees Top10 Top25 FullTime_Undergrad PartTime_Undergrad
## 1 1232 721 23 52 2885 537
## 2 1924 512 16 29 2683 1227
## 3 1097 336 22 50 1036 99
## 4 349 137 60 89 510 63
## 5 146 55 16 44 249 869
## 6 479 158 38 62 678 41
## 7 340 103 17 45 416 230
## 8 1720 489 37 68 1594 32
## 9 839 227 30 63 973 306
## 10 498 172 21 44 799 78
## 11 1425 472 37 75 1830 110
## 12 1900 484 44 77 1707 44
## 13 780 290 38 64 1130 638
## 14 1080 385 44 73 1306 28
## 15 313 157 23 46 1317 1235
## 16 1093 220 9 22 1018 287
## 17 992 418 83 96 1593 5
## 18 908 423 19 40 1819 281
## 19 704 322 14 23 1586 326
## 20 2001 1016 24 54 4190 1512
## OutOfState_Tution Room.Board_Costs Book_Costs Personal_Spend
## 1 7440 3300 450 2200
## 2 12280 6450 750 1500
## 3 11250 3750 400 1165
## 4 12960 5450 450 875
## 5 7560 4120 800 1500
## 6 13500 3335 500 675
## 7 13290 5720 500 1500
## 8 13868 4826 450 850
## 9 15595 4400 300 500
## 10 10468 3380 660 1800
## 11 16548 5406 500 600
## 12 17080 4440 400 600
## 13 9690 4785 600 1000
## 14 12572 4552 400 400
## 15 8352 3640 650 2449
## 16 8700 4780 450 1400
## 17 19760 5300 660 1598
## 18 10100 3520 550 1100
## 19 9996 3090 900 1320
## 20 5130 3592 500 2000
## Faculty_PHDperc Terminal_Degreeperc Faculty_Ratio AlumniDonation_perc
## 1 70 78 18.1 12
## 2 29 30 12.2 16
## 3 53 66 12.9 30
## 4 92 97 7.7 37
## 5 76 72 11.9 2
## 6 67 73 9.4 11
## 7 90 93 11.5 26
## 8 89 100 13.7 37
## 9 79 84 11.3 23
## 10 40 41 11.5 15
## 11 82 88 11.3 31
## 12 73 91 9.9 41
## 13 60 84 13.3 21
## 14 79 87 15.3 32
## 15 36 69 11.1 26
## 16 78 84 14.7 19
## 17 93 98 8.4 63
## 18 48 61 12.1 14
## 19 62 66 11.5 18
## 20 60 62 23.1 5
## InstructionalExpense GraduationRate
## 1 7041 60
## 2 10527 56
## 3 8735 54
## 4 19016 59
## 5 10922 15
## 6 9727 55
## 7 8861 63
## 8 11487 73
## 9 11644 80
## 10 8991 52
## 11 10932 73
## 12 11711 76
## 13 7940 74
## 14 9305 68
## 15 8127 55
## 16 7355 69
## 17 21424 1
## 18 7994 59
## 19 10908 46
## 20 4010 34