MetroHealth83 Data


7. BONUS: Place the original .csv file in a GitHub repository and have R read it from the link.
Display the original dataset's columns.

library(readr)
# Read the CSV directly from its raw GitHub URL (the BONUS task, step 7)
MetroHealth83 <- read_csv("https://raw.githubusercontent.com/kleberperez1/RBridgeWeek2/master/MetroHealth83.csv")
## Parsed with column specification:
## cols(
##   Num = col_double(),
##   City = col_character(),
##   NumMDs = col_double(),
##   RateMDs = col_double(),
##   NumHospitals = col_double(),
##   NumBeds = col_double(),
##   RateBeds = col_double(),
##   NumMedicare = col_double(),
##   PctChangeMedicare = col_double(),
##   MedicareRate = col_double(),
##   SSBNum = col_double(),
##   SSBRate = col_double(),
##   SSBChange = col_double(),
##   NumRetired = col_double(),
##   SSINum = col_double(),
##   SSIRate = col_double(),
##   SqrtMDs = col_double()
## )
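The column listing above is readr's parsing message rather than an explicit display. A minimal base-R sketch for listing columns and their classes follows. The helper name `describe_csv` is hypothetical, and the demo reads a tiny made-up CSV from a temporary file so that it runs without network access.

```r
# Hypothetical helper: read a CSV (local path or URL) and report each column's class
describe_csv <- function(src) {
  df <- read.csv(src, stringsAsFactors = FALSE)
  classes <- vapply(df, function(x) class(x)[1], character(1))
  list(data = df, columns = classes)
}

# Demo on a tiny hypothetical CSV written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("Num,City,NumHospitals",
             "1,CityA,2",
             "2,CityB,9"), tmp)
res <- describe_csv(tmp)
print(res$columns)
```

With network access, passing the raw GitHub URL used above should work the same way, since recent versions of `read.csv()` accept https URLs. Note that base R reports whole-number columns as "integer", whereas readr parsed them as double. For a quick look at already-loaded data, `names(MetroHealth83)` or `str(MetroHealth83)` is enough.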


1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

summary(MetroHealth83)
##       Num           City               NumMDs          RateMDs     
##  Min.   : 1.0   Length:83          Min.   : 143.0   Min.   :104.0  
##  1st Qu.:21.5   Class :character   1st Qu.: 336.5   1st Qu.:190.5  
##  Median :42.0   Mode  :character   Median : 844.0   Median :267.0  
##  Mean   :42.0                      Mean   :1643.3   Mean   :283.2  
##  3rd Qu.:62.5                      3rd Qu.:2018.0   3rd Qu.:351.5  
##  Max.   :83.0                      Max.   :9410.0   Max.   :743.0  
##   NumHospitals       NumBeds        RateBeds      NumMedicare    
##  Min.   : 2.000   Min.   : 141   Min.   :102.0   Min.   : 10306  
##  1st Qu.: 2.000   1st Qu.: 467   1st Qu.:219.0   1st Qu.: 23050  
##  Median : 5.000   Median : 975   Median :299.0   Median : 46661  
##  Mean   : 7.193   Mean   :1517   Mean   :311.6   Mean   : 73290  
##  3rd Qu.:10.000   3rd Qu.:1901   3rd Qu.:374.5   3rd Qu.: 84618  
##  Max.   :32.000   Max.   :6177   Max.   :641.0   Max.   :330821  
##  PctChangeMedicare  MedicareRate       SSBNum          SSBRate     
##  Min.   :-2.200    Min.   : 8240   Min.   : 11245   Min.   :10068  
##  1st Qu.: 2.350    1st Qu.:12033   1st Qu.: 26923   1st Qu.:14116  
##  Median : 4.600    Median :14279   Median : 55110   Median :16205  
##  Mean   : 4.531    Mean   :14699   Mean   : 84186   Mean   :16971  
##  3rd Qu.: 6.000    3rd Qu.:16926   3rd Qu.:101140   3rd Qu.:19864  
##  Max.   :12.800    Max.   :25474   Max.   :380405   Max.   :27674  
##    SSBChange        NumRetired         SSINum         SSIRate    
##  Min.   :-1.800   Min.   :  6775   Min.   : 1495   Min.   : 820  
##  1st Qu.: 2.400   1st Qu.: 16843   1st Qu.: 3525   1st Qu.:1770  
##  Median : 4.700   Median : 35070   Median : 6725   Median :2308  
##  Mean   : 4.486   Mean   : 53179   Mean   :12025   Mean   :2353  
##  3rd Qu.: 5.900   3rd Qu.: 61363   3rd Qu.:15077   3rd Qu.:2762  
##  Max.   :13.500   Max.   :254260   Max.   :72225   Max.   :4746  
##     SqrtMDs     
##  Min.   :11.96  
##  1st Qu.:18.34  
##  Median :29.05  
##  Mean   :35.00  
##  3rd Qu.:44.92  
##  Max.   :97.01
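`summary()` prints the mean and median among other statistics, but the task also asks to display them for at least two attributes. Below is a small sketch that puts them side by side. The helper `mean_median` and the toy values are hypothetical; only the column names come from MetroHealth83.

```r
# Hypothetical helper: mean and median side by side for the named numeric columns
mean_median <- function(df, cols) {
  data.frame(
    attribute = cols,
    mean = vapply(cols, function(col) mean(df[[col]], na.rm = TRUE),
                  numeric(1), USE.NAMES = FALSE),
    median = vapply(cols, function(col) median(df[[col]], na.rm = TRUE),
                    numeric(1), USE.NAMES = FALSE)
  )
}

# Demo on a small hypothetical data frame using two MetroHealth83 column names
toy <- data.frame(NumHospitals = c(2, 5, 14), NumRetired = c(10, 20, 90))
print(mean_median(toy, c("NumHospitals", "NumRetired")))
```

On the loaded data, `mean_median(MetroHealth83, c("NumHospitals", "NumRetired"))` should reproduce the Mean and Median rows shown above (7.193 and 5 for NumHospitals, 53179 and 35070 for NumRetired), up to the rounding that `summary()` applies.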


Subsetting Data


2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.

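# Keep columns 2, 5, 14, 15 and 16 (City, NumHospitals, NumRetired, SSINum, SSIRate); all 83 rows are kept
SubsetMetroHealth83 <- MetroHealth83[c(2, 5, 14:16)]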
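Step 2 asks for a subset of both columns and rows, while the line above only selects columns (the summary in step 4 still reports 83 cities). A minimal sketch that filters rows and selects columns by name in a single indexing call follows. The toy data frame and the cutoff of 10 hospitals are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical miniature stand-in for MetroHealth83
toy <- data.frame(City = c("CityA", "CityB", "CityC", "CityD"),
                  NumHospitals = c(2, 12, 5, 30),
                  NumRetired = c(7000, 90000, 35000, 250000),
                  SSIRate = c(820, 2300, 1800, 4700),
                  stringsAsFactors = FALSE)

# Rows: metro areas with at least 10 hospitals (illustrative cutoff).
# Columns: chosen by name, so the code does not depend on column positions.
big_metros <- toy[toy$NumHospitals >= 10, c("City", "NumHospitals", "NumRetired")]
print(big_metros)
```

`subset(toy, NumHospitals >= 10, select = c(City, NumHospitals, NumRetired))` is an equivalent base-R spelling. Selecting by name has the advantage that the result does not silently change if the column order of the source file changes.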


3. Create new column names for the new data frame.

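# Rename the five columns by position: City, NumHospitals, NumRetired, SSINum, SSIRate
colnames(SubsetMetroHealth83) <- c("NewCity", "NHosp", "NRet", "SSIN", "SSIR")
# Show the renamed data frame as an interactive table, 5 rows per page
DT::datatable(SubsetMetroHealth83, options = list(pageLength = 5))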
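Assigning to `colnames()` as above relies on the columns appearing in a known order. A sketch of renaming by old name instead, using a named lookup vector, follows. The one-row data frame is hypothetical; the old and new names are the ones used in this step.

```r
# Hypothetical one-row data frame with the column names before renaming
df <- data.frame(City = "CityA", NumHospitals = 2, NumRetired = 7000,
                 SSINum = 1500, SSIRate = 820, stringsAsFactors = FALSE)

# Lookup vector: names are the old column names, values are the new ones
new_names <- c(City = "NewCity", NumHospitals = "NHosp", NumRetired = "NRet",
               SSINum = "SSIN", SSIRate = "SSIR")

# Rename only the columns that appear in the lookup, wherever they are
hit <- names(df) %in% names(new_names)
names(df)[hit] <- unname(new_names[names(df)[hit]])
print(names(df))
```

Columns missing from the lookup keep their original names, so the same lookup can be reused on data frames that contain extra columns.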


4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes and compare them.

summary(SubsetMetroHealth83)
##    NewCity              NHosp             NRet             SSIN      
##  Length:83          Min.   : 2.000   Min.   :  6775   Min.   : 1495  
##  Class :character   1st Qu.: 2.000   1st Qu.: 16843   1st Qu.: 3525  
##  Mode  :character   Median : 5.000   Median : 35070   Median : 6725  
##                     Mean   : 7.193   Mean   : 53179   Mean   :12025  
##                     3rd Qu.:10.000   3rd Qu.: 61363   3rd Qu.:15077  
##                     Max.   :32.000   Max.   :254260   Max.   :72225  
##       SSIR     
##  Min.   : 820  
##  1st Qu.:1770  
##  Median :2308  
##  Mean   :2353  
##  3rd Qu.:2762  
##  Max.   :4746
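For the comparison: the summary above gives NHosp a mean of 7.193 and a median of 5, and NRet a mean of 53179 and a median of 35070. These values are identical to NumHospitals and NumRetired in step 1, because the subset keeps all 83 rows and only drops and renames columns. In both attributes the mean exceeds the median, which suggests a right-skewed distribution driven by a few very large metro areas. The sketch below reproduces that check on a hypothetical four-row stand-in; the data values and the `stats` helper are assumptions for illustration.

```r
# Hypothetical miniature data frame standing in for MetroHealth83
orig <- data.frame(City = c("CityA", "CityB", "CityC", "CityD"),
                   NumHospitals = c(2, 4, 6, 20),
                   NumRetired = c(100, 300, 500, 1500),
                   stringsAsFactors = FALSE)

# Column-only subset plus renaming, as in steps 2 and 3 (no rows dropped)
sub <- orig[c("City", "NumHospitals", "NumRetired")]
colnames(sub) <- c("NewCity", "NHosp", "NRet")

# Mean and median of the same two attributes, before and after subsetting
stats <- function(x) c(mean = mean(x), median = median(x))
comparison <- rbind(
  NumHospitals = stats(orig$NumHospitals), NHosp = stats(sub$NHosp),
  NumRetired   = stats(orig$NumRetired),   NRet  = stats(sub$NRet)
)
print(comparison)
```

If step 2 also filtered rows, as the task requests, the two pairs of rows in this comparison would generally differ.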


5. Rename at least three values in a column so that every occurrence of each value is renamed. For example, suppose a column contains 20 values of the letter “e”. Rename those values so that all 20 show as “excellent”.

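# Keep the cities whose names start with "M" (startsWith() is locale-independent, unlike >= / <= on strings)
newdata <- subset(SubsetMetroHealth83, startsWith(NewCity, "M"))
# Recode every NewCity value in this subset to "excellent", leaving the numeric columns untouched
newdata$NewCity <- "excellent"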
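The step 5 prompt describes recoding repeated codes, such as every “e” becoming “excellent”. A common base-R pattern for this is a named lookup vector indexed by the old values. The sketch below uses a hypothetical `Grade` column, since MetroHealth83 has no such coded column.

```r
# Hypothetical ratings column with repeated codes, mirroring the "e" -> "excellent" example
ratings <- data.frame(City = c("CityA", "CityB", "CityC", "CityD", "CityE"),
                      Grade = c("e", "g", "e", "p", "e"),
                      stringsAsFactors = FALSE)

# Lookup vector: names are the old codes, values are the new labels
lookup <- c(e = "excellent", g = "good", p = "poor")

# Index the lookup by the old codes; unname() drops the names carried over from lookup
ratings$Grade <- unname(lookup[ratings$Grade])
print(ratings)
```

Every occurrence of a code is replaced in one step, however many times it repeats. Any code missing from `lookup` would become `NA`, which makes gaps in the mapping easy to spot.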


6. Display enough rows to see examples of all of steps 1-5 above.

DT::datatable(newdata, options = list(pageLength = 5))  


7. BONUS: This question is answered at the beginning of the document.


Please email any suggestions to kleber.perez@live.com.