Project Description

The Royal National Hospital for Rheumatic Diseases in Bath carried out a study to determine if additional stretching exercises improved range of motion in the hips of patients with Ankylosing Spondylitis (AS). AS is a chronic form of inflammatory arthritis that limits spine and muscle motion.

Thirty-nine patients with ‘typical’ AS were randomly allocated either to a control group receiving standard treatment or to a group receiving the additional stretching exercises. The study was designed so that patients were twice as likely to be assigned to the stretching group. On admission and again after three weeks, each hip was assessed with several measurements of flexion, extension, abduction, and rotation, in degrees on a 0 to 180 scale, to determine improvement. Only flexion and lateral rotation are of concern here.

The raw data provided by the client were analyzed to determine whether the additional stretching exercises were in fact more effective than the standard treatment in improving hip range of motion.


Research Questions

Question 1

Has the stretched group improved significantly more than the control group?

Question 2

Can a model be produced to predict the improvement for a patient?


Statistical Questions

Question 1

Is the Treatment Group’s improvement statistically significantly greater than the Control Group’s?

Question 2

Does the fact that the data is listed by hip and not by patient introduce any error that can be accounted for?

Question 3

Is there a model that incorporates both Rotation and Flexion so as to make it more intuitive to understand?


Variables

Variable   Description                        Role         Units              Variable Type  Range        Mean    Median
CPreFlex   Control Pre-Flexion                Control      Degrees            Quantitative   0 to 180     110     112
CPostFlex  Control Post-Flexion               Control      Degrees            Quantitative   0 to 180     113.8   115
CDiffFlex  Control Difference in Flexion      Control      Change in degrees  Quantitative   -180 to 180  3.792   3
CPreRot    Control Pre-Rotation               Control      Degrees            Quantitative   0 to 180     25      26
CPostRot   Control Post-Rotation              Control      Degrees            Quantitative   0 to 180     25.96   27
CDiffRot   Control Difference in Rotation     Control      Change in degrees  Quantitative   -180 to 180  0.9583  1.5
TPreFlex   Treatment Pre-Flexion              Explanatory  Degrees            Quantitative   0 to 180     116.5   120
TPostFlex  Treatment Post-Flexion             Explanatory  Degrees            Quantitative   0 to 180     124     126
TDiffFlex  Treatment Difference in Flexion    Response     Change in degrees  Quantitative   -180 to 180  7.481   6
TPreRot    Treatment Pre-Rotation             Explanatory  Degrees            Quantitative   0 to 180     24.78   25
TPostRot   Treatment Post-Rotation            Explanatory  Degrees            Quantitative   0 to 180     31.37   32
TDiffRot   Treatment Difference in Rotation   Explanatory  Change in degrees  Quantitative   -180 to 180  6.593   5
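Each Diff variable is the post-treatment measurement minus the pre-treatment measurement, so a positive value means the range of motion increased. This can be checked against the Raw Data table in the Appendix. A minimal R sketch of that derivation, using only the first five treatment rows of that table (the data frame name `raw` is our own):

```r
# First five treatment rows copied from the Raw Data table in the Appendix
raw <- data.frame(
  TPreFlex  = c(125, 120, 135, 135, 100),
  TPostFlex = c(126, 127, 135, 135, 113),
  TPreRot   = c(25, 35, 28, 24, 26),
  TPostRot  = c(36, 37, 40, 34, 30)
)

# Difference = Post - Pre, so a positive value is an improvement in range of motion
raw$TDiffFlex <- raw$TPostFlex - raw$TPreFlex
raw$TDiffRot  <- raw$TPostRot - raw$TPreRot

print(raw[, c("TDiffFlex", "TDiffRot")])
```

The resulting values (1, 7, 0, 0, 13 for flexion and 11, 2, 12, 10, 4 for rotation) match the TDiffFlex and TDiffRot columns of the Raw Data table.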

Exploratory Data Analysis

Read and Summarize Data

##     CPreFlex       CPostFlex        CPreRot         CPostRot    
##  Min.   : 81.0   Min.   : 96.0   Min.   : 4.00   Min.   : 2.00  
##  1st Qu.:105.0   1st Qu.:110.0   1st Qu.:21.75   1st Qu.:24.00  
##  Median :112.0   Median :115.0   Median :26.00   Median :27.00  
##  Mean   :110.0   Mean   :113.8   Mean   :25.00   Mean   :25.96  
##  3rd Qu.:114.2   3rd Qu.:120.0   3rd Qu.:29.00   3rd Qu.:30.25  
##  Max.   :126.0   Max.   :126.0   Max.   :36.00   Max.   :41.00  
##  NA's   :30      NA's   :30      NA's   :30      NA's   :30     
##     TPreFlex       TPostFlex      TPreRot         TPostRot    
##  Min.   : 77.0   Min.   : 88   Min.   : 2.00   Min.   :10.00  
##  1st Qu.:111.2   1st Qu.:120   1st Qu.:20.00   1st Qu.:26.00  
##  Median :120.0   Median :126   Median :25.00   Median :32.00  
##  Mean   :116.5   Mean   :124   Mean   :24.78   Mean   :31.37  
##  3rd Qu.:125.0   3rd Qu.:129   3rd Qu.:31.50   3rd Qu.:37.75  
##  Max.   :135.0   Max.   :139   Max.   :48.00   Max.   :50.00  
##                                                               
##    CDiffFlex         CDiffRot         TDiffFlex          TDiffRot     
##  Min.   :-5.000   Min.   :-9.0000   Min.   :-11.000   Min.   :-8.000  
##  1st Qu.: 0.750   1st Qu.:-1.2500   1st Qu.:  2.000   1st Qu.: 2.000  
##  Median : 3.000   Median : 1.5000   Median :  6.000   Median : 5.000  
##  Mean   : 3.792   Mean   : 0.9583   Mean   :  7.481   Mean   : 6.593  
##  3rd Qu.: 6.250   3rd Qu.: 4.0000   3rd Qu.:  9.750   3rd Qu.:10.750  
##  Max.   :30.000   Max.   :16.0000   Max.   : 49.000   Max.   :22.000  
##  NA's   :30       NA's   :30

The two plots below allow a visual check that the Treatment Group did improve. The dashed blue line marks zero improvement; points on or below it are hips that showed no change or got worse. The important point is that the majority of Treatment hips (in both flexion and rotation) lie above that line, which suggests there was improvement and justifies further analysis.
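To put a number on "the majority of hips", the share of hips strictly above the zero line can be counted. The sketch below uses only the first ten treatment hips listed in the Raw Data table in the Appendix, so its shares are illustrative rather than figures for the full group; the helper name `share_improved` is hypothetical.

```r
# TDiffFlex and TDiffRot for the first ten treatment hips (Raw Data table, Appendix)
tdiff_flex <- c(1, 7, 0, 0, 13, 5, 1, 3, 2, 11)
tdiff_rot  <- c(11, 2, 12, 10, 4, 2, 20, 13, 0, 3)

# Share of hips strictly above the zero-improvement line (NA values ignored)
share_improved <- function(diffs) mean(diffs > 0, na.rm = TRUE)

share_improved(tdiff_flex)  # 0.8: 8 of these 10 hips improved in flexion
share_improved(tdiff_rot)   # 0.9: 9 of these 10 hips improved in rotation
```

Applied to the full columns (for example `share_improved(data$TDiffFlex)`), the same helper would give the proportion for all 54 treatment hips.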

Plot Improvement for Treatment Flexion

Plot Improvement for Treatment Rotation


Plot of Hips (Treatment) Density

Plot of Hips (Control) Density

Noteworthy Points from the Figures Above

  • There appears to be some difference between the Control and Treatment pre-scores
  • With truly random allocation, the two groups’ pre-scores would be expected to be similar on average, so a difference of this size is worth noting
  • See Appendix for more detail

Statistical Analysis

Addressing Statistical Question 1

## 
##  Welch Two Sample t-test
## 
## data:  data$TDiffFlex and data$CDiffFlex
## t = 1.901, df = 62.271, p-value = 0.03097
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.4489492       Inf
## sample estimates:
## mean of x mean of y 
##  7.481481  3.791667
  • Here we have a one-sided Welch two-sample t-test, which tests whether the Treatment Group’s mean improvement in flexion is greater than the Control Group’s
  • The p-value of 0.031 is below the 0.05 significance level, so we reject the null hypothesis of equal mean improvement: the Treatment Group’s flexion improved more than the Control Group’s (7.48 versus 3.79 degrees on average)
  • This is evidence that the stretching therapy is effective for flexion, so we can move forward in producing a model to estimate the improvement for a patient after the treatment
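To show what the t-test computes, the sketch below reproduces the one-sided Welch statistic, its Welch-Satterthwaite degrees of freedom, and the p-value by hand, and compares them with base R’s `t.test()`. It uses only the first ten hips of each group from the Raw Data table, so its numbers differ from the full-data test above; the function name `welch_greater` is hypothetical.

```r
# Flexion improvement for the first ten hips of each group (Raw Data table, Appendix)
tdiff <- c(1, 7, 0, 0, 13, 5, 1, 3, 2, 11)    # Treatment (TDiffFlex)
cdiff <- c(0, -2, 1, 1, 3, -5, 5, -3, 3, -5)  # Control (CDiffFlex)

# One-sided Welch t-test of H1: mean(x) > mean(y), computed from its formulas
welch_greater <- function(x, y) {
  se2_x <- var(x) / length(x)  # squared standard error of mean(x)
  se2_y <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df_w <- (se2_x + se2_y)^2 /
    (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))
  p_val <- pt(t_stat, df_w, lower.tail = FALSE)  # upper-tail probability
  c(t = t_stat, df = df_w, p = p_val)
}

manual  <- welch_greater(tdiff, cdiff)
builtin <- t.test(tdiff, cdiff, alternative = "greater")  # Welch is the default
manual
```

Because the variances are not pooled, the degrees of freedom are generally non-integer, which is why the output above reports df = 62.271.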

Addressing Statistical Question 2

Check Similarity (ICC) between Even and Odd Hips

## [1] 0.6592946
  • We suspected that, because the data are listed by hip rather than by patient, the odd and even hips would essentially describe each other, which would violate the assumption of independent observations and lead to a poor model
  • To check this we used the intraclass correlation coefficient (ICC), which measures how similar the two hips of the same patient are, typically on a 0 to 1 scale: values of 0.01 to 0.50 indicate weak, 0.51 to 0.80 moderate, and 0.81 to 1.00 strong similarity
  • Our value of 0.6593 indicates moderate similarity: each pair of hips (1,2), (3,4), etc. comes from the same patient and reports similar values, so the hips cannot be treated as fully independent observations
  • Therefore, when building our model, we will consider this and adjust the model accordingly
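A minimal base-R sketch of a one-way random-effects ICC, often written ICC(1), which treats each patient as a group of two hips and compares between-patient and within-patient mean squares from a one-way ANOVA. It uses only the first ten treatment hips (five patients) from the Raw Data table, so it illustrates the statistic rather than recomputing the 0.6593 above. Our understanding is that `ICCest()` in the ICC package also works from a one-way ANOVA, with `x` as the grouping factor and `y` as the measurement; the helper `icc_oneway` is hypothetical.

```r
# TDiffFlex for the first ten treatment hips (Raw Data table, Appendix);
# rows (1,2), (3,4), ... are the two hips of one patient
tdiff   <- c(1, 7, 0, 0, 13, 5, 1, 3, 2, 11)
patient <- factor(rep(1:5, each = 2))

# One-way random-effects ICC(1) for groups of equal size k
icc_oneway <- function(y, group, k = 2) {
  tab <- anova(lm(y ~ group))
  msb <- tab[["Mean Sq"]][1]  # between-patient mean square
  msw <- tab[["Mean Sq"]][2]  # within-patient (residual) mean square
  (msb - msw) / (msb + (k - 1) * msw)
}

icc <- icc_oneway(tdiff, patient)
round(icc, 3)  # 0.157 for these five patients
```

The value approaches 1 when the two hips of each patient agree closely relative to the spread between patients, and approaches 0 when knowing one hip says little about the other.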

Addressing Statistical Question 3

Model Building and Testing

##  TPreRot TPreFlex 
## 1.058889 1.058889
  • Here we have the VIF (Variance Inflation Factor), which measures how much the variance of each coefficient estimate is inflated by correlation with the other predictors (multicollinearity)
  • VIF values less than 10 are generally considered acceptable for a model
  • Both TPreRot and TPreFlex have a VIF of about 1.06, well under 2, so multicollinearity between them is not a concern and both can be used in the model
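With only two predictors, each VIF reduces to 1 / (1 − r²), where r is the correlation between TPreRot and TPreFlex, so both VIFs are always equal; a VIF of 1.0589 corresponds to r² of about 0.056. The sketch below computes a VIF from its definition (regress one predictor on the others) using the first ten treatment hips from the Raw Data table; `vif_manual` is a hypothetical helper, not the rms `vif()` function used in the Appendix.

```r
# TPreRot and TPreFlex for the first ten treatment hips (Raw Data table, Appendix)
pre <- data.frame(
  TPreRot  = c(25, 35, 28, 24, 26, 24, 22, 24, 29, 28),
  TPreFlex = c(125, 120, 135, 135, 100, 110, 122, 122, 124, 124)
)

# VIF for one predictor: 1 / (1 - R^2) from regressing it on the other predictors
vif_manual <- function(df, var) {
  others <- setdiff(names(df), var)
  aux <- lm(reformulate(others, response = var), data = df)
  1 / (1 - summary(aux)$r.squared)
}

vif_rot  <- vif_manual(pre, "TPreRot")
vif_flex <- vif_manual(pre, "TPreFlex")

# With exactly two predictors, both VIFs equal 1 / (1 - r^2), r = their correlation
c(TPreRot = vif_rot, TPreFlex = vif_flex,
  from_r = 1 / (1 - cor(pre$TPreRot, pre$TPreFlex)^2))
```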
##                 Analysis of Variance          Response: TDiffFlex 
## 
##  Factor     d.f. Partial SS MS         F     P     
##  TPreRot     1    569.235    569.23496 12.21 0.001 
##  TPreFlex    1   2672.351   2672.35104 57.34 <.0001
##  REGRESSION  2   2816.502   1408.25096 30.22 <.0001
##  ERROR      51   2376.980     46.60744
  • Here we have an ANOVA table, which tests how much each variable contributes to explaining TDiffFlex
  • The important values are in the “P” column, where values less than 0.05 indicate a significant contribution
  • Both TPreRot (P = 0.001) and TPreFlex (P < 0.0001) are significant given the other variable, and the regression as a whole is significant (F = 30.22, P < 0.0001)
  • The point of this test was to perform an extra check on our model before continuing
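Each variable’s F-test in the table compares the model with and without that variable. A sketch of that comparison for TPreRot, using two nested `lm()` fits and base R’s `anova()` on the first ten treatment hips from the Raw Data table (so the numbers will not match the table). For a one-degree-of-freedom term, the partial F equals the square of the coefficient’s t statistic; we believe this is also what the rms table reports for these terms, but that is an assumption.

```r
# First ten treatment hips from the Raw Data table (Appendix)
hips <- data.frame(
  TDiffFlex = c(1, 7, 0, 0, 13, 5, 1, 3, 2, 11),
  TPreRot   = c(25, 35, 28, 24, 26, 24, 22, 24, 29, 28),
  TPreFlex  = c(125, 120, 135, 135, 100, 110, 122, 122, 124, 124)
)

full    <- lm(TDiffFlex ~ TPreRot + TPreFlex, data = hips)
reduced <- lm(TDiffFlex ~ TPreFlex, data = hips)  # same model without TPreRot

# Partial F-test: does TPreRot explain extra variation once TPreFlex is included?
partial <- anova(reduced, full)
f_rot <- partial[["F"]][2]
p_rot <- partial[["Pr(>F)"]][2]

# For a one-degree-of-freedom term, the partial F is the squared t statistic
t_rot <- summary(full)$coefficients["TPreRot", "t value"]
c(F = f_rot, t_squared = t_rot^2, p = p_rot)
```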

Model Summary

## [1] 0.5423148
  • Here we have the R-squared value, the proportion of the variation in TDiffFlex explained by the model, expressed as a decimal between 0 and 1
  • We treat values between 0.50 and 1.00 as good enough to continue with; this model explains about 54% of the variation
  • However, because this model uses both the even and odd hips, we believe that we can achieve a better-fitting model by using only the even or only the odd hips
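The R-squared value can also be recovered from the ANOVA table above: it is the regression sum of squares divided by the total (regression plus error) sum of squares. A short check in R using those printed values:

```r
# Sums of squares printed in the ANOVA table for testmodel1
ss_regression <- 2816.502
ss_error      <- 2376.980

# R-squared = regression SS / total SS, where total SS = regression SS + error SS
r_squared <- ss_regression / (ss_regression + ss_error)
round(r_squared, 4)  # 0.5423, matching summary(testmodel2)$r.squared above
```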

Odd Hips Model Option

## [1] 0.4912659
  • This is the R-Squared Value for the Model using only the Odd Hips, and as you can see it is lower than the model using both, so we won’t use this one

Even Hips Model Option

## [1] 0.6401323
  • Here we have the R-squared value for the model using only the even hips; it is higher than for the model using both, so we will use this model
  • It seems that the correlation between the even and odd hips we noted earlier is playing a role in our model
  • Fitting the model to only the even hips gives a closer fit to the data (R-squared of 0.64 versus 0.54)
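  • This may reflect a systematic difference between even and odd hips, although the even hip did not always score higher than its odd counterpart (for example, rows 3 and 4 of the Raw Data both have a TDiffFlex of 0)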

Regression Plot of Even Hips Model

  • Here we have a regression plot, which plots the model (red line) against the data (black dots)
  • The shaded region around the red line is the 95% confidence band for the fitted line, showing the uncertainty in the model’s estimate of the average improvement
  • Note that this band describes where the average improvement is likely to lie; a prediction for an individual patient would have a wider interval
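By default, ggplot2’s `stat_smooth()` shades a 95% confidence band for the fitted mean, not a prediction interval. To illustrate the difference, base R’s `predict()` can return both for a new set of pre-treatment measurements. The sketch fits the same formula as the even-hip model to the first ten treatment hips from the Raw Data table; the new patient’s values (TPreRot = 25, TPreFlex = 120) are hypothetical.

```r
# First ten treatment hips from the Raw Data table (Appendix)
hips <- data.frame(
  TDiffFlex = c(1, 7, 0, 0, 13, 5, 1, 3, 2, 11),
  TPreRot   = c(25, 35, 28, 24, 26, 24, 22, 24, 29, 28),
  TPreFlex  = c(125, 120, 135, 135, 100, 110, 122, 122, 124, 124)
)
fit <- lm(TDiffFlex ~ TPreRot + TPreFlex, data = hips)

# Hypothetical pre-treatment measurements for a new patient's hip
new_hip <- data.frame(TPreRot = 25, TPreFlex = 120)

# Interval for the average improvement of hips with these measurements
conf_int <- predict(fit, newdata = new_hip, interval = "confidence", level = 0.95)
# Interval for the improvement of one individual hip
pred_int <- predict(fit, newdata = new_hip, interval = "prediction", level = 0.95)

interval_width <- function(iv) unname(iv[, "upr"] - iv[, "lwr"])
rbind(confidence = conf_int[1, ], prediction = pred_int[1, ])
```

Both intervals share the same point estimate; the prediction interval also includes the residual variance of an individual hip, so it is always the wider of the two.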

Recommendations

Question 1

Has the stretched group improved significantly more than the control group?

There is statistical evidence to suggest that the daily stretching treatment is more successful than the standard treatment in improving hip flexion. On this basis, we recommend the stretching treatment over the standard treatment, bearing in mind the difference in pre-treatment scores between the groups noted in the Appendix.

Question 2

Can a model be produced to predict the improvement for a patient?

We were able to find a model that incorporates the pre-rotation and pre-flexion measurements to predict the improvement in flexion. This allows a patient’s expected flexion improvement from the stretching treatment to be estimated from their pre-treatment measurements.


Considerations


Appendix

R Code

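# Packages used below
library(xlsx)       # read.xlsx()
library(tableHTML)  # tableHTML() and add_css_*() helpers
library(magrittr)   # %>% pipe
library(ggplot2)    # ggplot() figures
library(rms)        # ols(), vif(), and anova() for ols fits
TABLE1 <- read.xlsx("E:/Dropbox/case study 2/varstable.xlsx", sheetName = "Sheet1")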
tableHTML(TABLE1, widths = c(100,600,100,300,200,200,200,200), theme = "rshiny-blue",rownames = FALSE) %>%
  add_css_header(css = list(c('font-size', 'border'), c('20px', '2px solid blue')),
                 headers = c(1, 2, 3, 4, 5, 6, 7, 8)) %>%
  add_css_row(css = list(c('background-color'), c('pink')), rows = 1:6) %>%
  add_css_row(css = list(c('background-color'), c('lightblue')), rows = 7:12) %>%
  add_css_column(css = list(c('border'), c('1px solid grey')), columns = 1:8)
data <- read.csv("E:/Dropbox/case study 2/data.csv")
summary(data)
plot(data$TDiffFlex ~ c(1:54))
title(main = "Improvement for Treatment Group Flexion", sub = "Blue Line Shows No Improvement Threshold")
abline(h=0, col="BLUE", lty=2)
plot(data$TDiffRot ~ c(1:54))
title(main = "Improvement for Treatment Group Rotation", sub = "Blue Line Shows No Improvement Threshold")
abline(h=0, col="BLUE", lty=2)
# Split the rows into odd and even hips (rows 1 and 2, 3 and 4, ... belong to the same patient)
even_indexes<-seq(2,54,2)
odd_indexes<-seq(1,54,2)
data_odd <- data[odd_indexes,]
data_even <- data[even_indexes,]

TDiffFlexOdd <- data_odd$TDiffFlex
TDiffFlexEven <- data_even$TDiffFlex

CDiffFlexOdd <- data_odd$CDiffFlex
CDiffFlexEven <- data_even$CDiffFlex
ggplot(data = data, aes(x=1, y=TPreFlex)) + geom_boxplot() + geom_count(color="blue") + ggtitle("Boxplot with Overlaid Density Plot of Treatment Group Flexion") + guides(size = guide_legend("Density"))
ggplot(data = data, aes(x=1, y=CPreFlex)) + geom_boxplot() + geom_count(color="blue") + ggtitle("Boxplot with Overlaid Density Plot of Control Group Flexion") + guides(size = guide_legend("Density"))
t.test(x=data$TDiffFlex,y=data$CDiffFlex,alternative = 'greater')
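# Intraclass correlation (ICC) between the two hips of each patient
library(ICC)
# Note: ICCest() treats x as the grouping factor (e.g. patient ID) and y as the
# measurement, coercing x to a factor. Passing the even-hip values as x groups the
# odd-hip values by even-hip value rather than by patient; a patient-level call
# would look like:
#   patient <- factor(rep(1:27, each = 2))
#   ICCest(x = patient, y = TDiffFlex, data = data.frame(patient, TDiffFlex = data$TDiffFlex))$ICC
ICCest(x=TDiffFlexEven, y=TDiffFlexOdd, data = NULL, alpha = 0.05, CI.type = c("THD", "Smith"))$ICC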
testmodel1 <- ols(TDiffFlex ~ TPreRot + TPreFlex, data = data)
vif(testmodel1)
anova(testmodel1)
testmodel2 <- lm(TDiffFlex ~ TPreRot + TPreFlex, data = data)

summary(testmodel2)$r.squared
odd_hip_model <- lm(TDiffFlexOdd ~ TPreRot + TPreFlex, data = data_odd)

summary(odd_hip_model)$r.squared
even_hip_model <- lm(TDiffFlexEven ~ TPreRot + TPreFlex, data = data_even)

summary(even_hip_model)$r.squared
# Plot observed against fitted values for a fitted lm model. Regressing the
# observed values on the fitted values reproduces the model's fit, so the red
# line shows the model's predictions and the shaded band is its 95% confidence band.
# The function can be reused for future models.
ggplotRegression <- function(fit) {
  plot_data <- data.frame(fitted = fitted(fit), observed = fit$model[[1]])
  ggplot(plot_data, aes(x = fitted, y = observed)) +
    geom_point() +
    stat_smooth(method = "lm", formula = y ~ x, col = "red") +
    labs(title = "Regression Plot", x = "Fitted value",
         y = paste("Observed", names(fit$model)[1]))
}
# Apply the function to the even-hip model
ggplotRegression(even_hip_model)
ggplot(data = data_even, aes(x=1, y=TDiffFlex)) + geom_boxplot() + geom_count(color="blue") + ggtitle("Boxplot with Overlaid Density Plot") + guides(size = guide_legend("Density"))
ggplot(data = data_even, aes(x=1, y=CDiffFlex)) + geom_boxplot() + geom_count(color="blue") + ggtitle("Boxplot with Overlaid Density Plot") + guides(size = guide_legend("Density"))
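# Randomly pair treatment pre-flexion scores with the 24 non-missing control scores;
# set.seed() makes the random pairing reproducible
set.seed(1)
cpre <- data$CPreFlex[!is.na(data$CPreFlex)]
pre_diff <- data.frame(diff = sample(data$TPreFlex, length(cpre)) - cpre)
ggplot(data = pre_diff, aes(x=1, y=diff)) + geom_boxplot() + geom_count(color="blue") + ggtitle("Boxplot with Overlaid Density Plot") + guides(size = guide_legend("Density")) + geom_hline(yintercept = 0, col="Red", lty=2) + labs(subtitle = "Note that the Average Difference between the Treatment and Control is NOT equal to 0")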
t.test(x=data$TPreFlex,y=data$CPreFlex,alternative = 'greater')
# p-value = 0.007 < 0.05, so we reject the null hypothesis of equal means: the Treatment Group's mean pre-flexion score is significantly greater than the Control Group's
head(data, 10)

Extra Figures

Plot of Even Hip (Treatment) Density

ggplot(data = data_even, aes(x=1, y=TDiffFlex)) + geom_boxplot() + geom_count(color="blue") + ggtitle("Boxplot with Overlaid Density Plot") + guides(size = guide_legend("Density"))

Plot of Even Hips (Control) Density

ggplot(data = data_even, aes(x=1, y=CDiffFlex)) + geom_boxplot() + geom_count(color="blue") + ggtitle("Boxplot with Overlaid Density Plot") + guides(size = guide_legend("Density"))

Plot of Difference between Control and Treatment Pre-Flex

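# Randomly pair treatment pre-flexion scores with the 24 non-missing control scores;
# set.seed() makes the random pairing reproducible
set.seed(1)
cpre <- data$CPreFlex[!is.na(data$CPreFlex)]
pre_diff <- data.frame(diff = sample(data$TPreFlex, length(cpre)) - cpre)
ggplot(data = pre_diff, aes(x=1, y=diff)) + geom_boxplot() + geom_count(color="blue") + ggtitle("Boxplot with Overlaid Density Plot") + guides(size = guide_legend("Density")) + geom_hline(yintercept = 0, col="Red", lty=2) + labs(subtitle = "Note that the Average Difference between the Treatment and Control is NOT equal to 0")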

Extra Detail Regarding Difference Between Control and Treatment Groups

## 
##  Welch Two Sample t-test
## 
## data:  data$TPreFlex and data$CPreFlex
## t = 2.5362, df = 55.374, p-value = 0.007027
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  2.226952      Inf
## sample estimates:
## mean of x mean of y 
##  116.5000  109.9583

Raw Data

##    CPreFlex CPostFlex CPreRot CPostRot TPreFlex TPostFlex TPreRot TPostRot
## 1       100       100      23       17      125       126      25       36
## 2       105       103      18       12      120       127      35       37
## 3       114       115      21       24      135       135      28       40
## 4       115       116      28       27      135       135      24       34
## 5       123       126      25       29      100       113      26       30
## 6       126       121      26       27      110       115      24       26
## 7       105       110      35       33      122       123      22       42
## 8       105       102      33       24      122       125      24       37
## 9       120       123      25       30      124       126      29       29
## 10      123       118      22       27      124       135      28       31
##    CDiffFlex CDiffRot TDiffFlex TDiffRot
## 1          0       -6         1       11
## 2         -2       -6         7        2
## 3          1        3         0       12
## 4          1       -1         0       10
## 5          3        4        13        4
## 6         -5        1         5        2
## 7          5       -2         1       20
## 8         -3       -9         3       13
## 9          3        5         2        0
## 10        -5        5        11        3