A nationwide survey of hospital costs conducted by the US Agency for Healthcare consists of hospital records of inpatient samples. The data used here is restricted to the state of Wisconsin and covers patients aged 0-17 years. The agency wants to analyze the data to research healthcare costs and their utilization.
setwd("C:/Learning/Data Analytics/Datasicence with R/Project/Healthcare - cost analysis")
HospitalCosts=read.csv("hospital_costs.csv", header=TRUE)
head(HospitalCosts)
names(HospitalCosts)
[1] "ï..AGE" "FEMALE" "LOS" "RACE" "TOTCHG" "APRDRG"
names(HospitalCosts)[1] = "AGE"  # the UTF-8 byte-order mark at the start of the file garbled the first header ("ï..AGE")
names(HospitalCosts)
[1] "AGE" "FEMALE" "LOS" "RACE" "TOTCHG" "APRDRG"
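The `ï..AGE` name is caused by a UTF-8 byte-order mark (BOM) at the start of the CSV file. Instead of renaming the column afterwards, `read.csv` can strip the BOM while reading if you pass `fileEncoding = "UTF-8-BOM"`. The sketch below writes a tiny hypothetical file with a BOM to a temporary location, so it does not depend on `hospital_costs.csv`. The file contents and variable names are illustrative.

```r
# Write a small hypothetical CSV whose first three bytes are the UTF-8 byte-order mark
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("AGE,FEMALE,TOTCHG\n0,1,532\n17,0,48388\n")), tmp)

# fileEncoding = "UTF-8-BOM" removes the mark while reading,
# so the first column is named AGE rather than a garbled variant
costs <- read.csv(tmp, header = TRUE, fileEncoding = "UTF-8-BOM")
print(names(costs))
```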
The agency wants to find the age category whose patients visit the hospital most frequently and incur the maximum expenditure.
summary(HospitalCosts)
AGE FEMALE LOS RACE TOTCHG APRDRG
Min. : 0.0 Min. :0.00 Min. : 0 Min. :1.0 Min. : 532 Min. : 21
1st Qu.: 0.0 1st Qu.:0.00 1st Qu.: 2 1st Qu.:1.0 1st Qu.: 1216 1st Qu.:640
Median : 0.0 Median :1.00 Median : 2 Median :1.0 Median : 1536 Median :640
Mean : 5.1 Mean :0.51 Mean : 3 Mean :1.1 Mean : 2774 Mean :616
3rd Qu.:13.0 3rd Qu.:1.00 3rd Qu.: 3 3rd Qu.:1.0 3rd Qu.: 2530 3rd Qu.:751
Max. :17.0 Max. :1.00 Max. :41 Max. :6.0 Max. :48388 Max. :952
NA's :1
summary(as.factor(HospitalCosts$AGE))
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
307 10 1 3 2 2 2 3 2 2 4 8 15 18 25 29 29 38
Infants (age 0) account for 307 of the 500 admissions, far more than any other age.
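# breaks centred on each integer age give one bar per age, so neighbouring ages are not merged into a single bar
hist(HospitalCosts$AGE, breaks=seq(-0.5, 17.5, by=1), main="Histogram of age group and hospital visits",
xlab="Age group", border="black", col=c("light green", "dark green"), xlim=c(-0.5,17.5), ylim=c(0,350))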
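You can also query the frequency table directly instead of reading it off the plot. This sketch rebuilds the 500 AGE values from the counts printed by `summary(as.factor(HospitalCosts$AGE))` and picks the most frequent age with `which.max(table(...))`. It also checks that unit-width bins centred on each integer give exactly one histogram bar per age. `hist()`'s default breaks can merge neighbouring ages into one bar. The variable names here are illustrative.

```r
# Visit counts per age, copied from the summary(as.factor(HospitalCosts$AGE)) output
age_counts <- c(307, 10, 1, 3, 2, 2, 2, 3, 2, 2, 4, 8, 15, 18, 25, 29, 29, 38)
names(age_counts) <- 0:17

# Rebuild one AGE value per admission (500 in total)
ages <- rep(0:17, times = age_counts)

# Most frequently admitted age: the label of the largest count in the table
visits <- table(ages)
most_frequent_age <- as.integer(names(which.max(visits)))
print(most_frequent_age)

# Bins (k - 0.5, k + 0.5] hold exactly one integer age each,
# so the bar heights reproduce the frequency table
h <- hist(ages, breaks = seq(-0.5, 17.5, by = 1), plot = FALSE)
```

Calling `hist()` with the same `breaks` and without `plot = FALSE` draws the chart.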
ExpenseBasedOnAge = aggregate(TOTCHG ~ AGE, FUN=sum, data=HospitalCosts)
# Age group with the highest total charges; the largest age-group total is 678118
ExpenseBasedOnAge[which.max(ExpenseBasedOnAge$TOTCHG), ]
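Grouping `tapply` by `TOTCHG` itself, as in `tapply(ExpenseBasedOnAge$TOTCHG, ExpenseBasedOnAge$TOTCHG, FUN=sum)`, labels each total with its own value. As a result, `which.max` reports the largest total (678118) and its position (18), not the age that produced it. The hypothetical toy data below shows the difference between that call and indexing the AGE column.

```r
# Hypothetical toy data: six admissions across three ages
hospital <- data.frame(AGE = c(0, 0, 1, 17, 17, 17),
                       TOTCHG = c(100, 200, 50, 80, 90, 120))

# Total charges per age (rows sorted by AGE): 0 -> 300, 1 -> 50, 17 -> 290
expense <- aggregate(TOTCHG ~ AGE, data = hospital, FUN = sum)

# Grouping by TOTCHG labels each total with itself, so the result
# is named after the largest total ("300"), not after an age
buggy <- which.max(tapply(expense$TOTCHG, expense$TOTCHG, FUN = sum))

# Correct: use the position of the largest total to index the AGE column
top_age <- expense$AGE[which.max(expense$TOTCHG)]
print(top_age)
```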
barplot(ExpenseBasedOnAge$TOTCHG, names.arg=ExpenseBasedOnAge$AGE, xlab="Age group", ylab="Total charges")
summary(HospitalCosts$FEMALE)
0 1
244 255
ageGenderInflModel = lm(TOTCHG ~ AGE + FEMALE, data = HospitalCosts)
summary(ageGenderInflModel)
Call:
lm(formula = TOTCHG ~ AGE + FEMALE, data = HospitalCosts)
Residuals:
Min 1Q Median 3Q Max
-3403 -1444 -873 -156 44950
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2719.4 261.4 10.40 < 2e-16 ***
AGE 86.0 25.5 3.37 0.00081 ***
FEMALE1 -744.2 354.7 -2.10 0.03638 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3850 on 496 degrees of freedom
Multiple R-squared: 0.0259, Adjusted R-squared: 0.0219
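F-statistic: 6.58 on 2 and 496 DF, p-value: 0.00151
AGE (p = 0.00081) and FEMALE (p = 0.036) are both significant. Charges rise by about 86 per year of age, and female patients are charged about 744 less than male patients of the same age. Together, though, the two variables explain only about 2.6% of the variation in TOTCHG (R-squared 0.0259).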
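The coefficient appears as `FEMALE1` because FEMALE is a factor in HospitalCosts. `lm` turns it into a 0/1 indicator for level `1`, measured against the baseline level `0`. With the factor as the only predictor, that coefficient is exactly the difference between the two group means. In the model above, which also includes AGE, the -744.2 is that difference with age held fixed. Here is a small sketch on hypothetical charges (the data frame `toy` is illustrative).

```r
# Hypothetical charges for three male (0) and three female (1) patients
toy <- data.frame(FEMALE = factor(c(0, 0, 0, 1, 1, 1)),
                  TOTCHG = c(1000, 1200, 1400, 800, 900, 1000))

fit <- lm(TOTCHG ~ FEMALE, data = toy)

# Default treatment contrasts: the intercept is the mean of level "0"
# and FEMALE1 is (mean of level "1") minus (mean of level "0")
group_means <- tapply(toy$TOTCHG, toy$FEMALE, mean)
print(coef(fit))
```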
ageGenderRaceInflModel = lm(LOS ~ AGE + FEMALE + RACE, data = HospitalCostsNA)
summary(ageGenderRaceInflModel)
Call:
lm(formula = LOS ~ AGE + FEMALE + RACE, data = HospitalCostsNA)
Residuals:
Min 1Q Median 3Q Max
-3.21 -1.21 -0.86 0.14 37.79
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.8569 0.2316 12.34 <2e-16 ***
AGE -0.0394 0.0226 -1.74 0.082 .
FEMALE 0.3539 0.3129 1.13 0.259
RACE2 -0.3750 1.3957 -0.27 0.788
RACE3 0.7892 3.3858 0.23 0.816
RACE4 0.5949 1.9572 0.30 0.761
RACE5 -0.8569 1.9627 -0.44 0.663
RACE6 -0.7188 2.3929 -0.30 0.764
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.4 on 491 degrees of freedom
Multiple R-squared: 0.0087, Adjusted R-squared: -0.00543
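F-statistic: 0.616 on 7 and 491 DF, p-value: 0.743
None of AGE, FEMALE or RACE is a significant predictor of length of stay at the 0.05 level, and the model as a whole is not significant (p-value 0.743).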
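A negative adjusted R-squared is not an error. The adjustment charges for every estimated coefficient: adj R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. Here n = 499 (491 residual degrees of freedom plus 8 coefficients) and p = 7 (AGE, FEMALE and five RACE dummies). With that many predictors, an R-squared as small as 0.0087 falls below zero after adjustment. The sketch below reproduces the printed values from the rounded R-squared figures. The helper name `adj_r2` is illustrative.

```r
# Adjusted R-squared from R-squared, number of observations n and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# LOS ~ AGE + FEMALE + RACE: 491 residual df + 8 coefficients -> n = 499, p = 7
print(adj_r2(0.0087, n = 499, p = 7))

# TOTCHG ~ AGE + FEMALE: 496 residual df + 3 coefficients -> n = 499, p = 2
print(adj_r2(0.0259, n = 499, p = 2))
```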
Significance approach: fit a model of the dependent variable (TOTCHG) on all independent variables, then remove non-significant predictors one at a time and compare the fits.
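The models below apply this approach by hand. hospitalCostModel uses every variable, hcm1 drops RACE, and hcm2 also drops FEMALE. What follows is a minimal automated sketch of p-value-based backward elimination. It assumes numeric predictors (so each term maps to one coefficient) and a 0.05 threshold. `backward_eliminate` and the simulated data are illustrative, not part of the original analysis. Under this stopping rule the process would end at hcm2, since APRDRG remains highly significant; hcm3 goes one step further. Base R's `step()` automates a similar search, but it uses AIC rather than p-values.

```r
# Backward elimination by p-value: refit, drop the least significant predictor,
# and stop once every remaining predictor has p <= alpha.
# Assumes numeric predictors, so each predictor name is also a coefficient name.
backward_eliminate <- function(data, response, alpha = 0.05) {
  predictors <- setdiff(names(data), response)
  while (length(predictors) > 0) {
    fit <- lm(reformulate(predictors, response = response), data = data)
    pvals <- summary(fit)$coefficients[, "Pr(>|t|)"][predictors]
    worst <- names(which.max(pvals))
    if (pvals[[worst]] <= alpha) return(fit)
    predictors <- setdiff(predictors, worst)
  }
  # No predictor survived: fall back to an intercept-only model
  lm(reformulate("1", response = response), data = data)
}

# Simulated data: y depends on x1 and x2, while x3 is pure noise
set.seed(42)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 5 + 3 * sim$x1 + 2 * sim$x2 + rnorm(n)

final <- backward_eliminate(sim, "y")
print(formula(final))
```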
hospitalCostModel = lm(TOTCHG ~ ., data = HospitalCosts)
summary(hospitalCostModel)
Call:
lm(formula = TOTCHG ~ ., data = HospitalCosts)
Residuals:
Min 1Q Median 3Q Max
-6377 -700 -174 122 43378
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5218.677 507.647 10.28 <2e-16 ***
AGE 134.695 17.471 7.71 7e-14 ***
FEMALE1 -390.692 247.739 -1.58 0.12
LOS 743.152 34.923 21.28 <2e-16 ***
RACE -212.429 227.933 -0.93 0.35
APRDRG -7.791 0.682 -11.43 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2610 on 493 degrees of freedom
Multiple R-squared: 0.554, Adjusted R-squared: 0.549
F-statistic: 122 on 5 and 493 DF, p-value: <2e-16
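Each coefficient gives the change in predicted TOTCHG per unit change in its variable, with the others held fixed. For example, one extra day of stay adds about 743 to the predicted charge. The sketch below plugs a hypothetical patient into the coefficients printed above: age 10, female, LOS 3, RACE 1, and APRDRG 640, the median code in the summary. `predict()` on the fitted object would give nearly the same figure, differing only by the rounding of the printed coefficients.

```r
# Coefficients as printed in summary(hospitalCostModel)
b <- c(intercept = 5218.677, AGE = 134.695, FEMALE1 = -390.692,
       LOS = 743.152, RACE = -212.429, APRDRG = -7.791)

# Hypothetical patient: age 10, female (FEMALE1 = 1), 3-day stay, RACE 1, APRDRG 640
x <- c(intercept = 1, AGE = 10, FEMALE1 = 1, LOS = 3, RACE = 1, APRDRG = 640)

# Linear predictor: sum of coefficient * value, matched by name
predicted_charge <- sum(b * x[names(b)])
print(predicted_charge)

# One more day in hospital changes the prediction by exactly the LOS coefficient
x_longer <- x
x_longer["LOS"] <- 4
extra_day <- sum(b * x_longer[names(b)]) - predicted_charge
```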
hcm1 = lm(TOTCHG ~ AGE + FEMALE + LOS + APRDRG, data = HospitalCostsNA)
summary(hcm1)
Call:
lm(formula = TOTCHG ~ AGE + FEMALE + LOS + APRDRG, data = HospitalCostsNA)
Residuals:
Min 1Q Median 3Q Max
-6344 -687 -168 132 43387
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4971.980 433.116 11.48 < 2e-16 ***
AGE 134.241 17.462 7.69 8.2e-14 ***
FEMALE -383.082 247.571 -1.55 0.12
LOS 743.618 34.914 21.30 < 2e-16 ***
APRDRG -7.767 0.681 -11.40 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2610 on 494 degrees of freedom
Multiple R-squared: 0.553, Adjusted R-squared: 0.549
F-statistic: 153 on 4 and 494 DF, p-value: <2e-16
hcm2 = lm(TOTCHG ~ AGE + LOS + APRDRG, data = HospitalCostsNA)
summary(hcm2)
Call:
lm(formula = TOTCHG ~ AGE + LOS + APRDRG, data = HospitalCostsNA)
Residuals:
Min 1Q Median 3Q Max
-6603 -719 -169 124 43350
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4960.170 433.658 11.44 < 2e-16 ***
AGE 128.552 17.095 7.52 2.6e-13 ***
LOS 740.806 34.916 21.22 < 2e-16 ***
APRDRG -8.006 0.664 -12.05 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2620 on 495 degrees of freedom
Multiple R-squared: 0.551, Adjusted R-squared: 0.548
F-statistic: 202 on 3 and 495 DF, p-value: <2e-16
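hcm2 drops FEMALE, whose p-value in hcm1 was 0.12. For nested linear models fitted to the same data, `anova(smaller, larger)` tests the dropped terms with an F test. When only one coefficient is dropped, that F statistic equals the square of its t value: here (-1.55)^2 ≈ 2.4, with the same p = 0.12. The sketch below checks this identity on simulated data with hypothetical variable names. With the document's objects, the call would be `anova(hcm2, hcm1)`.

```r
# Simulated data with three predictors (names are illustrative)
set.seed(7)
n <- 100
sim <- data.frame(age = rnorm(n), los = rnorm(n), female = rbinom(n, 1, 0.5))
sim$charge <- 10 + 2 * sim$age + 5 * sim$los + 0.3 * sim$female + rnorm(n)

# Nested models: the reduced model omits one predictor
full <- lm(charge ~ age + los + female, data = sim)
reduced <- lm(charge ~ age + los, data = sim)

# F test for the dropped term; row 2 compares reduced against full
cmp <- anova(reduced, full)
F_drop <- cmp$F[2]

# t value of the same coefficient in the full model
t_female <- summary(full)$coefficients["female", "t value"]
print(c(F = F_drop, t_squared = t_female^2))
```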
hcm3 = lm(TOTCHG ~ AGE + LOS, data = HospitalCostsNA)
summary(hcm3)
Call:
lm(formula = TOTCHG ~ AGE + LOS, data = HospitalCostsNA)
Residuals:
Min 1Q Median 3Q Max
-4783 -1103 -458 -133 41382
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 200.7 203.5 0.99 0.32
AGE 98.0 19.2 5.10 4.8e-07 ***
LOS 734.3 39.7 18.51 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2970 on 496 degrees of freedom
Multiple R-squared: 0.419, Adjusted R-squared: 0.416
F-statistic: 179 on 2 and 496 DF, p-value: <2e-16
| Data | Approach | Model name | Predictors | R2 | Adj. R2 | Residual std. error | R2 - adj. R2 | p-value |
|---|---|---|---|---|---|---|---|---|
| HospitalCosts | Ap1: significance | hospitalCostModel | All independent variables | 0.554 | 0.549 | 2610 | 0.005 | <2e-16 |
| HospitalCostsNA | Ap1: significance | hcm1 | All except RACE | 0.553 | 0.549 | 2610 | 0.004 | <2e-16 |
| HospitalCostsNA | Ap1: significance | hcm2 | All except RACE and FEMALE (gender) | 0.551 | 0.548 | 2620 | 0.003 | <2e-16 |
| HospitalCostsNA | Ap1: significance | hcm3 | AGE + LOS only | 0.419 | 0.416 | 2970 | 0.003 | <2e-16 |
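The comparison table can be built directly from the fitted models rather than copied by hand. This sketch uses R's built-in `mtcars` data as a stand-in, because the hospital data is not bundled with R. With the document's objects, the list would be `list(hospitalCostModel = hospitalCostModel, hcm1 = hcm1, hcm2 = hcm2, hcm3 = hcm3)`. The helper name `compare_models` is illustrative.

```r
# One row per model: R2, adjusted R2, residual standard error, their gap, overall F-test p-value
compare_models <- function(models) {
  t(sapply(models, function(m) {
    s <- summary(m)
    f <- s$fstatistic
    c(R2 = s$r.squared,
      adjR2 = s$adj.r.squared,
      sigma = s$sigma,  # residual standard error
      gap = s$r.squared - s$adj.r.squared,
      p_value = pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE))
  }))
}

# Stand-in nested models on the built-in mtcars data
models <- list(
  full    = lm(mpg ~ wt + hp + qsec + drat, data = mtcars),
  no_drat = lm(mpg ~ wt + hp + qsec, data = mtcars),
  wt_hp   = lm(mpg ~ wt + hp, data = mtcars)
)
tab <- compare_models(models)
print(round(tab, 3))
```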
Length of stay (LOS) increases the hospital cost: each additional day adds about 743 to the total charge (hospitalCostModel, t = 21.28).
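The All Patient Refined Diagnosis Related Group (APRDRG) also affects healthcare costs: dropping it (hcm3) lowers the adjusted R-squared from 0.548 to 0.416. Because APRDRG enters the models as a numeric code, its single coefficient (about -7.8 per unit of the code) assumes charges change linearly with the code number. It shows that the diagnosis group matters, not how much each group costs.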