I will start by creating a new data frame that holds the facility identifier and the variable I am interested in, the total weighted health survey score.
health_score <- Merge_Attempt |> select("CMS Certification Number (CCN)", "Total Weighted Health Survey Score")
Let’s describe this variable:
stat.desc(health_score$`Total Weighted Health Survey Score`)
##      nbr.val     nbr.null       nbr.na          min          max        range
## 1.469600e+04 1.290000e+02 5.600000e+01 0.000000e+00 1.723250e+03 1.723250e+03
##          sum       median         mean      SE.mean CI.mean.0.95          var
## 1.261394e+06 5.600000e+01 8.583245e+01 8.149997e-01 1.597502e+00 9.761443e+03
##      std.dev     coef.var
## 9.880002e+01 1.151080e+00
The total weighted health survey score is rather complicated. As detailed in the Centers for Medicare & Medicaid Services Five-Star Quality Rating System, the score is based on the two most recent required health inspections for nursing homes that participate in Medicare or Medicaid, with the most recent survey weighted more heavily. Deficiencies identified during these inspections are assigned points based on their scope and severity. Points are also added if more than one additional visit by inspectors is needed to confirm that deficiencies have been properly addressed. A lower total weighted health survey score reflects better performance.
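To make the weighting idea concrete, here is a minimal arithmetic sketch. The point totals and the weights (0.75 and 0.25) are placeholders I made up for illustration; they are not the values CMS publishes, and the variable names are hypothetical.

```r
# Hypothetical illustration only: the points and weights below are
# placeholders, not values published by CMS.
cycle_points  <- c(most_recent = 40, previous = 80)     # deficiency + revisit points per survey
cycle_weights <- c(most_recent = 0.75, previous = 0.25) # assumed weights that sum to 1

# Weighted total: each survey's points times its weight, then summed.
total_weighted_score <- sum(cycle_points * cycle_weights)
total_weighted_score
```

With these made-up inputs the more recent (and cleaner) survey dominates, so the total lands closer to 40 than to 80.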
The nbr.na entry in the stat.desc() output above already points to missing values. Let’s confirm the count with summary():
summary(health_score$`Total Weighted Health Survey Score`)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##    0.00   28.00   56.00   85.83  108.00 1723.25      56
There are 56 NA values. Let’s remove them, which should leave the 14,696 non-missing scores counted by nbr.val above:
health_score_clean <- health_score |> filter(!is.na(`Total Weighted Health Survey Score`))
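A quick sanity check after dropping rows is to confirm that the row count fell by exactly the number of missing values. The sketch below uses a small made-up data frame (toy, with hypothetical columns ccn and score) and base R subsetting so it runs without extra packages; the same check should carry over to health_score and health_score_clean.

```r
# Toy data standing in for health_score; the values are made up.
toy <- data.frame(
  ccn   = c("A1", "B2", "C3", "D4"),
  score = c(12, NA, 0, 88.5)
)

n_missing <- sum(is.na(toy$score))

# Base-R equivalent of filter(!is.na(score)).
toy_clean <- toy[!is.na(toy$score), ]

# The cleaned data should have lost exactly n_missing rows and contain no NAs.
stopifnot(nrow(toy_clean) == nrow(toy) - n_missing)
stopifnot(!anyNA(toy_clean$score))
```

Note that the score of 0 is kept: it is a valid value, not a missing one.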
Now, let’s visualize with a histogram:
hist(health_score_clean$`Total Weighted Health Survey Score`)
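That is definitely not a normal distribution. It is strongly right-skewed: the mean (85.83) sits well above the median (56), and the maximum is 1,723.25. Perhaps a log transformation will bring the variable closer to normal. I will create a new variable in my dataset: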
health_score_transformed <- health_score_clean |> mutate(TWHSS_log = log(`Total Weighted Health Survey Score`))
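One thing to watch: the stat.desc() output reports nbr.null = 129 and min = 0, so 129 facilities have a score of exactly 0, and log(0) evaluates to -Inf in R. A common workaround is log1p(), which computes log(1 + x). The sketch below compares the two on the five summary values from above; whether the shifted transformation suits this analysis is a judgment call, and the names are hypothetical.

```r
# Min, quartiles, and max taken from the summary() output above.
scores <- c(0, 28, 56, 108, 1723.25)

log_plain <- log(scores)    # the first element is -Inf because log(0) is -Inf
log_shift <- log1p(scores)  # log(1 + x), so a score of 0 maps to 0

sum(is.infinite(log_plain)) # how many values became infinite
all(is.finite(log_shift))   # TRUE when every transformed value is finite
```

If log1p() were used in the mutate() call instead, every facility would keep a finite transformed value, at the cost of a slightly different scale.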
Let’s see how the variable looks now:
hist(health_score_transformed$TWHSS_log)
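The data appear much more normal after applying the log transformation! One caveat: hist() drops non-finite values, so the 129 facilities with a score of 0 (which log() turns into -Inf) are not shown in this plot.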
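A histogram is only a visual check, so a normal Q-Q plot and a skewness estimate can back it up. The sketch below runs on simulated right-skewed data rather than the CMS scores; sim_scores, sim_log, and the skew() helper are hypothetical names I introduced for illustration.

```r
set.seed(1)

# Simulated right-skewed data standing in for the scores (not the CMS data).
sim_scores <- rlnorm(1000, meanlog = 4, sdlog = 1)
sim_log    <- log(sim_scores)

# Simple moment-based sample skewness; values near 0 suggest symmetry.
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_raw <- skew(sim_scores)
skew_log <- skew(sim_log)

# Side-by-side Q-Q plots: points close to the line suggest approximate normality.
op <- par(mfrow = c(1, 2))
qqnorm(sim_scores, main = "Raw");  qqline(sim_scores)
qqnorm(sim_log, main = "Log");     qqline(sim_log)
par(op)
```

Applied to the real data, the same calls would take the original score column and TWHSS_log as inputs, after first deciding how to handle the -Inf values, since skew() is not meaningful when non-finite values are present.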