R Week 2 Assignment

R Bridge Course Week 2 Assignment 
One of the challenges in working with data is wrangling.  In this assignment we will use R to perform this task. 
Here is a list of data sets:  http://vincentarelbundock.github.io/Rdatasets/ (click on the csv index for a list) 

Please select one, download it and perform the following tasks: 
1. Use the summary function to gain an overview of the data set.  Then display the mean and median for at least two attributes. 

> doc <- read.csv("doctor.csv", header = TRUE)
> summary(doc)
       X           doctor         children         access      
 Min.   :  1   Min.   : 0.00   Min.   :1.000   Min.   :0.0000  
 1st Qu.:122   1st Qu.: 0.00   1st Qu.:1.000   1st Qu.:0.2500  
 Median :243   Median : 1.00   Median :2.000   Median :0.3500  
 Mean   :243   Mean   : 1.61   Mean   :2.264   Mean   :0.3812  
 3rd Qu.:364   3rd Qu.: 2.00   3rd Qu.:3.000   3rd Qu.:0.5000  
 Max.   :485   Max.   :48.00   Max.   :9.000   Max.   :0.9200  
     health         
 Min.   :-1.524000  
 1st Qu.:-1.066000  
 Median :-0.421000  
 Mean   :-0.000041  
 3rd Qu.: 0.657000  
 Max.   : 7.217000  
> mean(doc$health)
[1] -4.123711e-05
> median(doc$health)
[1] -0.421
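
The task asks for at least two attributes, and only health is shown above. A second attribute, for example doctor, could be summarized the same way. The sketch below is illustrative only: it rebuilds the six rows that head(doc) prints later in this write-up as sample data (the name doc_sample is made up for this example), so its numbers describe those six rows and not the full data set.

```r
# Sample data: the six rows shown by head(doc) in this document (not the full file).
doc_sample <- data.frame(
  X        = 1:6,
  doctor   = c(0, 1, 0, 0, 11, 3),
  children = c(1, 3, 4, 2, 1, 1),
  access   = c(0.50, 0.17, 0.42, 0.33, 0.67, 0.25),
  health   = c(0.495, 0.520, -1.227, -1.524, 0.173, -0.905)
)

# Mean and median for two attributes at once.
# sapply() returns a matrix with one column per attribute.
attrs <- c("doctor", "health")
stats <- sapply(doc_sample[attrs], function(x) c(mean = mean(x), median = median(x)))
print(stats)
```

With the real file, the same sapply() call could be run on doc instead of doc_sample.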

2. Create a new data frame with a subset of the columns and rows.  Make sure to rename it. 

doc.sub <- subset(doc, doctor > 5 & children < 2)
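
The call above filters rows only and keeps every column. If a subset of the columns is wanted as well, subset() also accepts a select argument. A minimal sketch, assuming the six rows printed by head(doc) in this write-up as sample data; the names doc_sample and doc_small and the choice of columns are made up for illustration.

```r
# Sample data: the six rows shown by head(doc) in this document (not the full file).
doc_sample <- data.frame(
  X        = 1:6,
  doctor   = c(0, 1, 0, 0, 11, 3),
  children = c(1, 3, 4, 2, 1, 1),
  access   = c(0.50, 0.17, 0.42, 0.33, 0.67, 0.25),
  health   = c(0.495, 0.520, -1.227, -1.524, 0.173, -0.905)
)

# Keep rows with more than 5 doctor visits and fewer than 2 children,
# and keep only three of the five columns.
doc_small <- subset(doc_sample,
                    doctor > 5 & children < 2,
                    select = c(doctor, children, health))
print(doc_small)
```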

3. Create new column names for the new data frame. 

names(doc.sub)[1]<-"Record"
names(doc.sub)[2]<-"DocVisits"
names(doc.sub)[3]<-"NumOfChildren"
names(doc.sub)[4]<-"AccessRate"
names(doc.sub)[5]<-"HealthRate"
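
The five assignments above can also be written as a single assignment of a character vector, as long as it has exactly one name per column. A minimal sketch on sample data (the six rows printed by head(doc) in this write-up; doc_sample is a made-up name):

```r
# Sample data: the six rows shown by head(doc) in this document (not the full file).
doc_sample <- data.frame(
  X        = 1:6,
  doctor   = c(0, 1, 0, 0, 11, 3),
  children = c(1, 3, 4, 2, 1, 1),
  access   = c(0.50, 0.17, 0.42, 0.33, 0.67, 0.25),
  health   = c(0.495, 0.520, -1.227, -1.524, 0.173, -0.905)
)

# Rename all columns in one step; the vector must match the column count.
new_names <- c("Record", "DocVisits", "NumOfChildren", "AccessRate", "HealthRate")
stopifnot(length(new_names) == ncol(doc_sample))
names(doc_sample) <- new_names
print(names(doc_sample))
```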

4. Use the summary function to create an overview of your new data frame.  Then print the mean and median for the same two attributes.  Please compare. 

> summary(doc.sub)
     Record        DocVisits  NumOfChildren   AccessRate       HealthRate   
 Min.   :  5.0   Min.   : 6   Min.   :1     Min.   :0.0000   Min.   :0.000  
 1st Qu.:109.0   1st Qu.: 7   1st Qu.:1     1st Qu.:0.3300   1st Qu.:0.000  
 Median :248.0   Median : 9   Median :1     Median :0.4200   Median :2.000  
 Mean   :233.2   Mean   :12   Mean   :1     Mean   :0.4376   Mean   :1.882  
 3rd Qu.:320.0   3rd Qu.:11   3rd Qu.:1     3rd Qu.:0.6700   3rd Qu.:2.000  
 Max.   :483.0   Max.   :48   Max.   :1     Max.   :0.6700   Max.   :4.000  
> mean(doc.sub$HealthRate)
[1] 1.882353
> median(doc.sub$HealthRate)
[1] 2

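Based on the data, households with fewer than 2 children and more than 5 doctor visits have a higher mean health score (about 1.88) than the full sample (about 0). The two means are not on the same scale, however: the HealthRate values summarized here take only values from 0 to 4 with a median of 2, which suggests the summary was run after the recoding in step 5, while the original health values range from -1.524 to 7.217. This difference alone therefore does not show that these households are healthier than average.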
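
One way to make the comparison like-for-like is to compute both sets of statistics on the original health scale, before any recoding. A minimal sketch, using only the six rows printed by head(doc) in this write-up as sample data (doc_sample, in_subset and comparison are made-up names), so the numbers it prints describe those six rows and not the full data set:

```r
# Sample data: the six rows shown by head(doc) in this document (not the full file).
doc_sample <- data.frame(
  X        = 1:6,
  doctor   = c(0, 1, 0, 0, 11, 3),
  children = c(1, 3, 4, 2, 1, 1),
  access   = c(0.50, 0.17, 0.42, 0.33, 0.67, 0.25),
  health   = c(0.495, 0.520, -1.227, -1.524, 0.173, -0.905)
)

# Logical index for the same filter used to build the subset.
in_subset <- doc_sample$doctor > 5 & doc_sample$children < 2

# Overall vs. subset statistics, both on the unrecoded health values.
comparison <- c(
  overall_mean   = mean(doc_sample$health),
  subset_mean    = mean(doc_sample$health[in_subset]),
  overall_median = median(doc_sample$health),
  subset_median  = median(doc_sample$health[in_subset])
)
print(comparison)
```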

5. For at least 3 values in a column please rename so that every value in that column is renamed.  For example, suppose I have 20 values of the letter “e” in one column.  Rename those values so that all 20 would show as “excellent”. 

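# Recode the health score into three levels: negative scores become 0,
# scores between 0 and 2 become 2, scores above 2 become 4 (exact 0 and 2 are left as they are).
doc.sub$HealthRate[doc.sub$HealthRate < 0] <- 0
doc.sub$HealthRate[doc.sub$HealthRate > 0 & doc.sub$HealthRate < 2] <- 2
doc.sub$HealthRate[doc.sub$HealthRate > 2] <- 4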
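
The task's own example renames values to a text label such as "excellent". A possible alternative to the numeric recoding above is cut(), which bins a numeric column into labeled categories in one call. This is only a sketch: the labels "low", "medium" and "high" are made up and describe the size of the score, not the quality of health, and the sample values are the six health values printed by head(doc) in this write-up. Unlike the three assignments above, cut() with these breaks also places an exact 0 in the first band and an exact 2 in the second.

```r
# Sample health scores: the six values shown by head(doc) in this document.
health <- c(0.495, 0.520, -1.227, -1.524, 0.173, -0.905)

# Bands are (-Inf, 0], (0, 2] and (2, Inf); labels are hypothetical.
health_band <- cut(health,
                   breaks = c(-Inf, 0, 2, Inf),
                   labels = c("low", "medium", "high"))

# Count how many values fall into each band.
print(table(health_band))
```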

6. Display enough rows to see examples of all of steps 1-5 above. 

head(doc)
  X doctor children access health
1 1      0        1   0.50  0.495
2 2      1        3   0.17  0.520
3 3      0        4   0.42 -1.227
4 4      0        2   0.33 -1.524
5 5     11        1   0.67  0.173
6 6      3        1   0.25 -0.905

head(doc.sub)
    Record DocVisits NumOfChildren AccessRate HealthRate
5        5        11             1       0.67          2
8        8         6             1       0.67          2
13      13        15             1       0.67          0
36      36         7             1       0.58          2
109    109         6             1       0.67          4
150    150         9             1       0.08          0


7. BONUS – place the original .csv in a GitHub repository and have R read from the link.  This will be a very useful skill as you progress in your data science education and career. 

library(RCurl)  # provides getURL()
# Download the raw CSV text from GitHub, then parse it with read.csv()
url.data <- read.csv(text = getURL("https://raw.githubusercontent.com/apag101/MSDSBridge/master/temparature.csv"), header = TRUE)
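
In recent versions of R, read.csv() can usually open a URL directly, so the RCurl step may not be needed. The sketch below wraps that in a small helper; read_csv_from_url is a made-up name, not part of any package. To stay runnable without network access, it demonstrates the helper on a temporary local file addressed through a file:// URL, and the small data frame written to that file is sample data only.

```r
# Hypothetical helper: read a CSV from any URL that read.csv() can open.
read_csv_from_url <- function(u) {
  read.csv(u, header = TRUE, stringsAsFactors = FALSE)
}

# Offline demonstration: write a tiny CSV to a temporary file and read it
# back through a file:// URL.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(doctor = c(0, 1, 11), children = c(1, 3, 1)),
          tmp, row.names = FALSE)
local_url <- paste0("file://", normalizePath(tmp, winslash = "/"))
demo <- read_csv_from_url(local_url)
print(demo)

# With network access, the same helper could be pointed at the raw GitHub link:
# url.data <- read_csv_from_url("https://raw.githubusercontent.com/apag101/MSDSBridge/master/temparature.csv")
```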
 
Please submit your .rmd file and the .csv file as well as a link to your RPubs. 
