CUNY SPS R WORKSHOP
Week 2 Assignment
Tage N Singh
This assignment uses a dataset of physician visits, which is hosted on GitHub and read directly from there.
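A minimal sketch of how a CSV hosted on GitHub might be read with base R. The URL is a hypothetical placeholder, not the location used in this assignment, so the runnable part parses a small inline sample instead. The sample values are copied from the first three rows of the Question 6 output.

```r
# Hypothetical placeholder; the real repository path is not given in this write-up.
csv_url <- "https://raw.githubusercontent.com/<user>/<repo>/<branch>/<file>.csv"

# With a real URL, base R can read the file directly:
# visits <- read.csv(csv_url, stringsAsFactors = FALSE)

# Stand-in for the download: a few columns of the first three rows.
sample_csv <- "Physician_Visits,Age,Sex,Region
0,79,Female,Midwest
8,77,Female,Midwest
0,92,Female,Midwest"

visits <- read.csv(text = sample_csv, stringsAsFactors = FALSE)
str(visits)  # text columns stay character, matching the summaries in this report
```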
## The summary of the dataset is:
## Physician_Visits Non_Physician_Visits Outpatient_Visits
## Min. : 0.000 Min. : 0.000 Min. : 0.0000
## 1st Qu.: 1.000 1st Qu.: 0.000 1st Qu.: 0.0000
## Median : 4.000 Median : 0.000 Median : 0.0000
## Mean : 5.774 Mean : 1.618 Mean : 0.7508
## 3rd Qu.: 8.000 3rd Qu.: 1.000 3rd Qu.: 0.0000
## Max. :89.000 Max. :104.000 Max. :141.0000
## Non_Physician_Outpatient_Visits ER_Visits Hospitalizations
## Min. : 0.0000 Min. : 0.0000 Min. :0.000
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.:0.000
## Median : 0.0000 Median : 0.0000 Median :0.000
## Mean : 0.5361 Mean : 0.2635 Mean :0.296
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.:0.000
## Max. :155.0000 Max. :12.0000 Max. :8.000
## Chronic_Conditions Disability Age Black
## Min. :0.000 Min. :0.000 Min. : 66.00 Length:4406
## 1st Qu.:1.000 1st Qu.:0.000 1st Qu.: 69.00 Class :character
## Median :1.000 Median :0.000 Median : 73.00 Mode :character
## Mean :1.542 Mean :0.204 Mean : 74.02
## 3rd Qu.:2.000 3rd Qu.:0.000 3rd Qu.: 78.00
## Max. :8.000 Max. :1.000 Max. :109.00
## Sex Married Education_Years Fam_Income
## Length:4406 Length:4406 Min. : 0.00 Min. :-1013.0
## Class :character Class :character 1st Qu.: 8.00 1st Qu.: 912.2
## Mode :character Mode :character Median :11.00 Median : 1698.5
## Mean :10.29 Mean : 2527.2
## 3rd Qu.:12.00 3rd Qu.: 3172.8
## Max. :18.00 Max. :54835.0
## Employed Private_Ins Medicaid Region
## Length:4406 Length:4406 Length:4406 Length:4406
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Health_Status
## Length:4406
## Class :character
## Mode :character
##
##
##
## The following two summaries present the mean and median of the Age and Education_Years fields.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 66.00 69.00 73.00 74.02 78.00 109.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 8.00 11.00 10.29 12.00 18.00
## Verifying the row count
## [1] 4406
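The summaries and row count above can be produced with a few base R calls. This is a sketch on a toy data frame, not the assignment's own code. The name `visits` is an assumption, and the five values per column are copied from the first rows of the Question 6 output.

```r
# Toy stand-in for the full dataset (first five rows of two columns).
visits <- data.frame(
  Age             = c(79, 77, 92, 66, 70),
  Education_Years = c(4, 5, 8, 7, 14)
)

summary(visits)                  # min, quartiles, median, mean, max per column
summary(visits$Age)              # the same statistics for one column
summary(visits$Education_Years)

age_mean   <- mean(visits$Age)   # the two statistics called out in the text
age_median <- median(visits$Age)

nrow(visits)                     # verify the row count
```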
Answer to Question 2
## Hospitalizations Chronic_Conditions Disability Age
## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. : 66.00
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.: 69.00
## Median :0.0000 Median :1.000 Median :0.0000 Median : 73.00
## Mean :0.2233 Mean :1.251 Mean :0.1433 Mean : 73.99
## 3rd Qu.:0.0000 3rd Qu.:2.000 3rd Qu.:0.0000 3rd Qu.: 78.00
## Max. :7.0000 Max. :8.000 Max. :1.0000 Max. :109.00
## Black Sex Married Education_Years
## Length:1500 Length:1500 Length:1500 Min. : 0.0
## Class :character Class :character Class :character 1st Qu.: 8.0
## Mode :character Mode :character Mode :character Median :11.0
## Mean :10.5
## 3rd Qu.:12.0
## Max. :18.0
## Fam_Income Employed Private_Ins Medicaid
## Min. : 0.0 Length:1500 Length:1500 Length:1500
## 1st Qu.: 975.8 Class :character Class :character Class :character
## Median : 1771.0 Mode :character Mode :character Mode :character
## Mean : 2615.9
## 3rd Qu.: 3217.2
## Max. :54835.0
## Region
## Length:1500
## Class :character
## Mode :character
##
##
##
## Verifying the row count
## [1] 1500
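One way a row-and-column subset like the one above might be built is sketched here. How the 1500 rows were actually chosen is not stated in this report, so taking the first n rows is an assumption. The names `visits`, `keep_cols`, and `visits_sub` are illustrative.

```r
# Toy stand-in for the full dataset; values copied from the Question 6 output.
visits <- data.frame(
  Physician_Visits = c(0, 8, 0, 1, 1),
  Hospitalizations = c(0, 0, 0, 0, 0),
  Age              = c(79, 77, 92, 66, 70),
  Region           = rep("Midwest", 5),
  Health_Status    = rep("Excellent", 5),
  stringsAsFactors = FALSE
)

# Assumption: the subset is the first n rows and a named set of columns.
keep_cols <- c("Hospitalizations", "Age", "Region")
n_rows    <- 3  # would be 1500 for the assignment
visits_sub <- visits[seq_len(n_rows), keep_cols]

summary(visits_sub)
nrow(visits_sub)  # verify the row count
```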
Answer to Question 3
## [1] "Hospitalizations" "Chronic_Conditions" "Disability"
## [4] "Age" "Black" "Sex"
## [7] "Married" "Education_Years" "Fam_Income"
## [10] "Employed" "Private_Ins" "Medicaid"
## [13] "Region"
Answer to Question 4
## The summary of the Q3 dataset with the new column names is:
## hosps chcond disable time_on_earth
## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. : 66.00
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.: 69.00
## Median :0.0000 Median :1.000 Median :0.0000 Median : 73.00
## Mean :0.2233 Mean :1.251 Mean :0.1433 Mean : 73.99
## 3rd Qu.:0.0000 3rd Qu.:2.000 3rd Qu.:0.0000 3rd Qu.: 78.00
## Max. :7.0000 Max. :8.000 Max. :1.0000 Max. :109.00
## race gender hooked_up eduyrs
## Length:1500 Length:1500 Length:1500 Min. : 0.0
## Class :character Class :character Class :character 1st Qu.: 8.0
## Mode :character Mode :character Mode :character Median :11.0
## Mean :10.5
## 3rd Qu.:12.0
## Max. :18.0
## faminc working privins govtins
## Min. : 0.0 Length:1500 Length:1500 Length:1500
## 1st Qu.: 975.8 Class :character Class :character Class :character
## Median : 1771.0 Mode :character Mode :character Mode :character
## Mean : 2615.9
## 3rd Qu.: 3217.2
## Max. :54835.0
## where
## Length:1500
## Class :character
## Mode :character
##
##
##
## Verifying the row count
## [1] 1500
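The renaming shown above could be done with a lookup from old to new names. This is a sketch under assumptions, not the assignment's own code: only four of the thirteen columns are included, and the names `new_names` and `renamed` are illustrative.

```r
# Toy stand-in for the Question 2 subset (a few columns, first three rows).
visits_sub <- data.frame(
  Hospitalizations = c(0, 0, 0),
  Age              = c(79, 77, 92),
  Education_Years  = c(4, 5, 8),
  Region           = rep("Midwest", 3),
  stringsAsFactors = FALSE
)

# Old name -> new name, following the renamed summary in this report.
new_names <- c(
  Hospitalizations = "hosps",
  Age              = "time_on_earth",
  Education_Years  = "eduyrs",
  Region           = "where"
)

renamed <- visits_sub
names(renamed) <- unname(new_names[names(renamed)])  # look up each old name

names(renamed)
nrow(renamed) == nrow(visits_sub)  # renaming must not change the row count
```

Looking names up by their old value, rather than assigning a positional vector, keeps the mapping correct even if the column order changes.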
## --------------------------------------------------------------------------------------------------------------
## --------------------------------------------------------------------------------------------------------------
## The mean and median of the Age column, now named time_on_earth, are:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 66.00 69.00 73.00 73.99 78.00 109.00
## --------------------------------------------------------------------------------------------------------------
## The mean and median of the ORIGINAL AGE column are:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 66.00 69.00 73.00 74.02 78.00 109.00
## Note that the mean and median differ only minimally between the full dataset (4406 records) and the 1500-record subset: the mean is 74.02 versus 73.99, and the median is 73.00 in both.
## --------------------------------------------------------------------------------------------------------------
## --------------------------------------------------------------------------------------------------------------
## The mean and median of the Education column, now named eduyrs, are:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 8.0 11.0 10.5 12.0 18.0
## --------------------------------------------------------------------------------------------------------------
## The mean and median of the ORIGINAL EDUCATION column are:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 8.00 11.00 10.29 12.00 18.00
## Note that the mean and median differ only slightly between the full dataset (4406 records) and the 1500-record subset: the mean is 10.29 versus 10.5, and the median is 11 in both.
## --------------------------------------------------------------------------------------------------------------
## --------------------------------------------------------------------------------------------------------------
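The full-versus-subset comparison above can also be expressed as a small helper that reports the differences directly. This is an illustrative sketch: `compare_center` is a hypothetical function, and the toy vectors hold the first ten ages from the Question 6 output rather than the real 4406 and 1500 records.

```r
# Hypothetical helper: difference in mean and median between a subset and the full data.
compare_center <- function(full, sub) {
  c(
    mean_diff   = mean(sub) - mean(full),
    median_diff = median(sub) - median(full)
  )
}

full_age <- c(79, 77, 92, 66, 70, 67, 92, 68, 66, 70)  # toy "full" column
sub_age  <- full_age[1:5]                               # toy subset: first five rows

age_diff <- compare_center(full_age, sub_age)
print(age_diff)
```

With only a handful of rows the differences are large. The report's point is that they shrink to almost nothing at 1500 of 4406 records.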
Answer to Question 5
## Replacing values in column 'where'
## Replacing values of other with something_else
## Replacing values of Northeast with north_east
## Replacing values of Midwest with middle_country
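A sketch of how the three replacements could be applied in one pass with a named lookup vector. The exact strings stored in the data are assumptions: the spellings follow the messages above with typos normalized, "Midwest" is capitalized as in the printed rows, and "West" is a made-up value included to show that unmatched entries are left alone.

```r
# Toy column with one value per case; "West" has no replacement.
region <- data.frame(
  where = c("Midwest", "Northeast", "other", "West"),
  stringsAsFactors = FALSE
)

# Old value -> new value.
recode_map <- c(
  other     = "something_else",
  Northeast = "north_east",
  Midwest   = "middle_country"
)

hit <- region$where %in% names(recode_map)                 # rows that need replacing
region$where[hit] <- unname(recode_map[region$where[hit]]) # look up the new values

print(region)
```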
Answer to Question 6
## Our Original Dataset
## Physician_Visits Non_Physician_Visits Outpatient_Visits
## 1 0 0 0
## 2 8 0 0
## 3 0 0 0
## 4 1 0 0
## 5 1 0 0
## 6 0 0 0
## 7 2 0 0
## 8 1 0 0
## 9 5 0 0
## 10 2 5 20
## Non_Physician_Outpatient_Visits ER_Visits Hospitalizations
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## 7 0 0 0
## 8 0 0 0
## 9 0 0 0
## 10 0 0 0
## Chronic_Conditions Disability Age Black Sex Married Education_Years
## 1 1 1 79 yes Female No 4
## 2 0 0 77 no Female No 5
## 3 2 0 92 no Female No 8
## 4 0 0 66 no Female No 7
## 5 0 0 70 no Female Yes 14
## 6 0 0 67 no Male Yes 12
## 7 0 1 92 no Male Yes 13
## 8 0 0 68 no Male Yes 14
## 9 0 0 66 no Male Yes 16
## 10 0 0 70 yes Male Yes 12
## Fam_Income Employed Private_Ins Medicaid Region Health_Status
## 1 414 No No No Midwest Excellent
## 2 384 No No Yes Midwest Excellent
## 3 480 No No No Midwest Excellent
## 4 625 No No No Midwest Excellent
## 5 4175 No No No Midwest Excellent
## 6 1992 No No No Midwest Excellent
## 7 1857 No No No Midwest Excellent
## 8 4175 No No No Midwest Excellent
## 9 10300 Yes No No Midwest Excellent
## 10 10421 Yes No No Midwest Excellent
## Resultant dataset from Question 2
## Hospitalizations Chronic_Conditions Disability Age Black Sex Married
## 1 0 1 1 79 yes Female No
## 2 0 0 0 77 no Female No
## 3 0 2 0 92 no Female No
## 4 0 0 0 66 no Female No
## 5 0 0 0 70 no Female Yes
## 6 0 0 0 67 no Male Yes
## 7 0 0 1 92 no Male Yes
## 8 0 0 0 68 no Male Yes
## 9 0 0 0 66 no Male Yes
## 10 0 0 0 70 yes Male Yes
## Education_Years Fam_Income Employed Private_Ins Medicaid Region
## 1 4 414 No No No Midwest
## 2 5 384 No No Yes Midwest
## 3 8 480 No No No Midwest
## 4 7 625 No No No Midwest
## 5 14 4175 No No No Midwest
## 6 12 1992 No No No Midwest
## 7 13 1857 No No No Midwest
## 8 14 4175 No No No Midwest
## 9 16 10300 Yes No No Midwest
## 10 12 10421 Yes No No Midwest
## Resultant dataset from Question 3
## hosps chcond disable time_on_earth race gender hooked_up eduyrs faminc
## 1 0 1 1 79 yes Female No 4 414
## 2 0 0 0 77 no Female No 5 384
## 3 0 2 0 92 no Female No 8 480
## 4 0 0 0 66 no Female No 7 625
## 5 0 0 0 70 no Female Yes 14 4175
## 6 0 0 0 67 no Male Yes 12 1992
## 7 0 0 1 92 no Male Yes 13 1857
## 8 0 0 0 68 no Male Yes 14 4175
## 9 0 0 0 66 no Male Yes 16 10300
## 10 0 0 0 70 yes Male Yes 12 10421
## working privins govtins where
## 1 No No No Midwest
## 2 No No Yes Midwest
## 3 No No No Midwest
## 4 No No No Midwest
## 5 No No No Midwest
## 6 No No No Midwest
## 7 No No No Midwest
## 8 No No No Midwest
## 9 Yes No No Midwest
## 10 Yes No No Midwest
## Resultant dataset from Question 5
## hosps chcond disable time_on_earth race gender hooked_up eduyrs faminc
## 1 0 1 1 79 yes Female No 4 414
## 2 0 0 0 77 no Female No 5 384
## 3 0 2 0 92 no Female No 8 480
## 4 0 0 0 66 no Female No 7 625
## 5 0 0 0 70 no Female Yes 14 4175
## 6 0 0 0 67 no Male Yes 12 1992
## 7 0 0 1 92 no Male Yes 13 1857
## 8 0 0 0 68 no Male Yes 14 4175
## 9 0 0 0 66 no Male Yes 16 10300
## 10 0 0 0 70 yes Male Yes 12 10421
## working privins govtins where
## 1 No No No middle_country
## 2 No No Yes middle_country
## 3 No No No middle_country
## 4 No No No middle_country
## 5 No No No middle_country
## 6 No No No middle_country
## 7 No No No middle_country
## 8 No No No middle_country
## 9 Yes No No middle_country
## 10 Yes No No middle_country
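Each ten-row listing above is the kind of output `head()` gives when asked for ten rows. A minimal sketch on a made-up twelve-row data frame; the names and values are illustrative only.

```r
# Made-up toy data: twelve rows so that head() visibly truncates.
visits <- data.frame(
  row_id = 1:12,
  Age    = 65 + 1:12
)

first_rows <- head(visits, 10)  # first ten rows, all columns
print(first_rows)
```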