7. BONUS – place the original .csv file on GitHub and have R read it from the link.
Display original dataset columns.
library(readr)
MetroHealth83 <- read_csv("https://raw.githubusercontent.com/kleberperez1/RBridgeWeek2/master/MetroHealth83.csv")
## Parsed with column specification:
## cols(
##   Num = col_double(),
##   City = col_character(),
##   NumMDs = col_double(),
##   RateMDs = col_double(),
##   NumHospitals = col_double(),
##   NumBeds = col_double(),
##   RateBeds = col_double(),
##   NumMedicare = col_double(),
##   PctChangeMedicare = col_double(),
##   MedicareRate = col_double(),
##   SSBNum = col_double(),
##   SSBRate = col_double(),
##   SSBChange = col_double(),
##   NumRetired = col_double(),
##   SSINum = col_double(),
##   SSIRate = col_double(),
##   SqrtMDs = col_double()
## )
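The parse message above lists the columns only as a side effect of read_csv. To display them explicitly, names() or str() can be used. The following is a minimal sketch, assuming a small inline CSV stands in for the GitHub file: the three column names are borrowed from the data set, but the rows are hypothetical and only there so the example runs without a network connection.

```r
# Hypothetical stand-in for the remote CSV; the values are illustrative only
csv_text <- "Num,City,NumMDs
1,Alpha City,100
2,Beta Town,250"

# read.csv(text = ...) parses the string the same way it would parse a file or URL
demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

col_names <- names(demo)   # character vector of column names
print(col_names)
str(demo)                  # column types plus the first few values
```

With the real data, names(MetroHealth83) should list the same 17 columns shown in the column specification.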
1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
summary(MetroHealth83)
##      Num              City              NumMDs          RateMDs
##  Min.   : 1.0   Length:83          Min.   : 143.0   Min.   :104.0
##  1st Qu.:21.5   Class :character   1st Qu.: 336.5   1st Qu.:190.5
##  Median :42.0   Mode  :character   Median : 844.0   Median :267.0
##  Mean   :42.0                      Mean   :1643.3   Mean   :283.2
##  3rd Qu.:62.5                      3rd Qu.:2018.0   3rd Qu.:351.5
##  Max.   :83.0                      Max.   :9410.0   Max.   :743.0
##   NumHospitals      NumBeds        RateBeds       NumMedicare
##  Min.   : 2.000   Min.   : 141   Min.   :102.0   Min.   : 10306
##  1st Qu.: 2.000   1st Qu.: 467   1st Qu.:219.0   1st Qu.: 23050
##  Median : 5.000   Median : 975   Median :299.0   Median : 46661
##  Mean   : 7.193   Mean   :1517   Mean   :311.6   Mean   : 73290
##  3rd Qu.:10.000   3rd Qu.:1901   3rd Qu.:374.5   3rd Qu.: 84618
##  Max.   :32.000   Max.   :6177   Max.   :641.0   Max.   :330821
##  PctChangeMedicare   MedicareRate        SSBNum          SSBRate
##  Min.   :-2.200      Min.   : 8240   Min.   : 11245   Min.   :10068
##  1st Qu.: 2.350      1st Qu.:12033   1st Qu.: 26923   1st Qu.:14116
##  Median : 4.600      Median :14279   Median : 55110   Median :16205
##  Mean   : 4.531      Mean   :14699   Mean   : 84186   Mean   :16971
##  3rd Qu.: 6.000      3rd Qu.:16926   3rd Qu.:101140   3rd Qu.:19864
##  Max.   :12.800      Max.   :25474   Max.   :380405   Max.   :27674
##    SSBChange        NumRetired        SSINum         SSIRate
##  Min.   :-1.800   Min.   :  6775   Min.   : 1495   Min.   : 820
##  1st Qu.: 2.400   1st Qu.: 16843   1st Qu.: 3525   1st Qu.:1770
##  Median : 4.700   Median : 35070   Median : 6725   Median :2308
##  Mean   : 4.486   Mean   : 53179   Mean   :12025   Mean   :2353
##  3rd Qu.: 5.900   3rd Qu.: 61363   3rd Qu.:15077   3rd Qu.:2762
##  Max.   :13.500   Max.   :254260   Max.   :72225   Max.   :4746
##     SqrtMDs
##  Min.   :11.96
##  1st Qu.:18.34
##  Median :29.05
##  Mean   :35.00
##  3rd Qu.:44.92
##  Max.   :97.01
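The prompt also asks for the mean and median of at least two attributes, which summary() reports but which can be displayed on their own. The following is a rough sketch of one way to do that with sapply(), assuming a toy data frame named demo with made-up values. Only the column names NumHospitals and NumRetired come from the data set.

```r
# Hypothetical toy data; substitute MetroHealth83 for `demo` to get the real values
demo <- data.frame(NumHospitals = c(2, 5, 10, 3),
                   NumRetired   = c(7000, 35000, 60000, 18000))

# sapply() applies the function to each selected column and returns a named vector
col_means   <- sapply(demo[c("NumHospitals", "NumRetired")], mean)
col_medians <- sapply(demo[c("NumHospitals", "NumRetired")], median)

print(col_means)
print(col_medians)
```

On the real data, mean(MetroHealth83$NumHospitals) and median(MetroHealth83$NumHospitals) should agree with the Mean (7.193) and Median (5.000) lines of the summary above.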
2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
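# Keep columns 2, 5, and 14:16 (City, NumHospitals, NumRetired, SSINum, SSIRate); all 83 rows are retained
SubsetMetroHealth83 <- MetroHealth83[c(2, 5, 14:16)]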
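The subset above selects columns by position and keeps every row. If rows should be filtered in the same step, a logical condition can go in the row position of the brackets. This is a sketch under assumptions: the toy data frame demo and the threshold NumHospitals > 2 are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data with a few of the data set's column names
demo <- data.frame(City = c("Alpha City", "Beta Town", "Gamma Falls"),
                   NumMDs = c(100, 250, 400),
                   NumHospitals = c(2, 5, 10),
                   stringsAsFactors = FALSE)

# Rows where NumHospitals > 2; columns selected by name rather than position
demo_subset <- demo[demo$NumHospitals > 2, c("City", "NumHospitals")]
print(demo_subset)
```

Selecting columns by name is a little longer to type, but it keeps working if the column order in the CSV changes.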
3. Create new column names for the new data frame.
colnames(SubsetMetroHealth83) <- c("NewCity", "NHosp", "NRet", "SSIN", "SSIR")
DT::datatable(SubsetMetroHealth83, options = list(pageLength = 5))
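Assigning a full vector to colnames() depends on the columns being in the expected order. One alternative, sketched here on a hypothetical toy data frame, is a named lookup vector that maps each old name to its new name. The names new_names and demo are assumptions for illustration.

```r
# Hypothetical toy data using three of the subset's original column names
demo <- data.frame(City = c("Alpha City", "Beta Town"),
                   NumHospitals = c(2, 5),
                   NumRetired = c(7000, 35000))

# Named lookup: old name -> new name
new_names <- c(City = "NewCity", NumHospitals = "NHosp", NumRetired = "NRet")

# Index the lookup by the current names so column order does not matter
names(demo) <- unname(new_names[names(demo)])
print(names(demo))
```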
4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
summary(SubsetMetroHealth83)
##      NewCity            NHosp             NRet            SSIN
##  Length:83          Min.   : 2.000   Min.   :  6775   Min.   : 1495
##  Class :character   1st Qu.: 2.000   1st Qu.: 16843   1st Qu.: 3525
##  Mode  :character   Median : 5.000   Median : 35070   Median : 6725
##                     Mean   : 7.193   Mean   : 53179   Mean   :12025
##                     3rd Qu.:10.000   3rd Qu.: 61363   3rd Qu.:15077
##                     Max.   :32.000   Max.   :254260   Max.   :72225
##       SSIR
##  Min.   : 820
##  1st Qu.:1770
##  Median :2308
##  Mean   :2353
##  3rd Qu.:2762
##  Max.   :4746
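Because the column subset in step 2 keeps all 83 rows, the means and medians reported for NHosp and NRet match those of NumHospitals and NumRetired in the first summary (for example, a mean of 7.193 and a median of 5.000 hospitals in both). The comparison can also be laid out side by side. The sketch below assumes two toy data frames, original and renamed, with made-up values standing in for MetroHealth83 and SubsetMetroHealth83.

```r
# Hypothetical toy data; `renamed` mimics a column-only subset with new names
original <- data.frame(NumHospitals = c(2, 5, 10, 3),
                       NumRetired   = c(7000, 35000, 60000, 18000))
renamed <- original
names(renamed) <- c("NHosp", "NRet")

# One row per attribute, original and subset statistics next to each other
comparison <- data.frame(
  attribute       = c("NumHospitals", "NumRetired"),
  mean_original   = unname(sapply(original, mean)),
  mean_subset     = unname(sapply(renamed, mean)),
  median_original = unname(sapply(original, median)),
  median_subset   = unname(sapply(renamed, median))
)
print(comparison)
```

If the subset had also dropped rows, the two sets of columns would generally differ, which is what the comparison is meant to reveal.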
5. For at least 3 values in a column, rename them so that every occurrence of each value in that column is renamed. For example, suppose a column contains 20 values of the letter “e”. Rename those values so that all 20 show as “excellent”.
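# Keep only the rows whose city name sorts between "Ma" and "Na"
newdata <- subset(SubsetMetroHealth83, NewCity >= "Ma" & NewCity <= "Na")
# Recode every city name in that range to "excellent" (NewCity column only)
newdata$NewCity[newdata$NewCity >= "Ma" & newdata$NewCity <= "Na"] <- "excellent"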
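Range comparisons on strings depend on the locale's collation order, so the result can vary between machines. When the values to rename are known in advance, a named lookup vector is one locale-independent alternative. The sketch below follows the prompt's "e" to "excellent" example. The other two codes ("g", "p") and their labels are hypothetical, added only to show three values being renamed at once.

```r
# Hypothetical column of single-letter codes
ratings <- c("e", "g", "p", "e", "g")

# Lookup: old value -> new value ("e" is from the prompt; "g" and "p" are assumptions)
lookup <- c(e = "excellent", g = "good", p = "poor")

# Indexing the lookup by the old values recodes every occurrence in one step
recoded <- unname(lookup[ratings])
print(recoded)
```

Values that are missing from the lookup come back as NA, which makes unmapped codes easy to spot.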
6. Display enough rows to see examples of all of steps 1-5 above.
DT::datatable(newdata, options = list(pageLength = 5))
7. This question is answered at the beginning: BONUS
Please email kleber.perez@live.com with any suggestions.