Final.Project.Draft

Author

B Braden

Caveats/Comments

The subscript formatting on the outline directions is only there so I can visually tell that text apart from the text I am writing. It will not be in the final version. Sorry for the visual clutter.

Part 1: The Introduction

1. Background Info on Dataset

2. Describing Data Source

_{Describe how the source for your data and how data was collected (randomized experiment or observations). Discuss any potential bias, if any.}

The data come from a randomized, controlled trial with 401 participants. Only 17 of the 401 participants had tension type headaches rather than migraines, so conclusions drawn about migraines might not apply to tension type headaches. Sources of potential bias: the patients were not blinded, there was one noted conflict of interest, and there was no placebo control such as sham acupuncture, so the paper cannot rule out that acupuncture was acting as a placebo.

3. Provide stats details

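The data frame shown in Part 2 has 14 columns. Six of them (pk1, pk2, pk5, f1, f2, f5) are outcome measurements of headache severity and frequency; the rest are an id plus explanatory variables such as age, sex, headache type, chronicity, and treatment group.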

4. Defining Variables

Contributors: Steve Grambow and Megan Neely

5. Background Research

Acupuncture has been practiced in China since before 2500 BCE.
“Most headaches occur because specific pain-sensitive structures in or around the head are overstimulated or damaged. Some of these structures are inside the skull, or intracranial; the remainder are in the tissues surrounding or covering the skull, or extracranial” (Britannica, headache)
“The needles release endorphins and other hormones, stimulating the circulatory system—a part of what helps weaken headache pain.” (Dower)
“There are four main types of acupuncture:
- Full body acupuncture: This standard acupuncture practice stimulates points in different parts of the body to release blockages and activate innate healing reserves.
- Auricular acupuncture: A form of acupuncture applied exclusively to points of the outer ear to treat various conditions including chronic pain
- Electroacupuncture: Pulses of weak electrical current are sent through acupuncture needles into acupuncture points on the skin
- Community acupuncture: offered in a group setting, recipients are fully clothed, sitting in chairs, while the practitioner inserts needles into exposed areas of the body.” (Dower)
“To help treat headaches, an acupuncturist will place needles in various pressure points, including those on your head and neck.” (Dower)

6. My Overarching Question

How does receiving acupuncture, compared with not receiving acupuncture, affect the frequency of headaches after 3 months?

Part 2: My Work With The Data

Loading Libraries

#Load Libraries 
library(tidyverse)
library(tidymodels)
library(ggplot2)
library(knitr)
library(kableExtra)

Loading Dataset

load("~/Desktop/Stats/acupuncture_data_reduced.RData")

Viewing the Dataset

head(data)

   id age sex migraine chronicity acupuncturist practice_id group   pk1   pk2
1 100  47   1        1         35            12          35     1 10.75    NA
2 101  52   1        1          8            12          35     0  9.50    NA
3 104  32   1        1         14            12          35     0 16.00    NA
4 105  53   1        1         10             9          25     0 32.50 44.00
5 108  56   1        1         40             9          25     0 16.50 17.50
6 112  45   1        1         27             9          25     1  9.25  4.75
       pk5 f1 f2    f5
1       NA  8  0  0.00
2       NA  4  0  0.00
3 15.33333 12  0 13.33
4       NA 21 18  0.00
5 23.25000 14 19 15.00
6  6.25000 15  8 13.00

Renaming Variables To Make Their Names More Intuitive

#Renaming Variables
renamed_data <- data %>%
  rename(
    migraine_type = migraine,
    trtmnt = group,
    frequency_base = f1,
    frequency_3m = f2,
    frequency_12m = f5,
    severity_base = pk1,
    severity_3m = pk2,
    severity_12m = pk5
  )
head(renamed_data)

   id age sex migraine_type chronicity acupuncturist practice_id trtmnt
1 100  47   1             1         35            12          35      1
2 101  52   1             1          8            12          35      0
3 104  32   1             1         14            12          35      0
4 105  53   1             1         10             9          25      0
5 108  56   1             1         40             9          25      0
6 112  45   1             1         27             9          25      1
  severity_base severity_3m severity_12m frequency_base frequency_3m
1         10.75          NA           NA              8            0
2          9.50          NA           NA              4            0
3         16.00          NA     15.33333             12            0
4         32.50       44.00           NA             21           18
5         16.50       17.50     23.25000             14           19
6          9.25        4.75      6.25000             15            8
  frequency_12m
1          0.00
2          0.00
3         13.33
4          0.00
5         15.00
6         13.00

# Making Binary 0s and 1s into Categorical Variables
new_df <- renamed_data |>
  mutate(cat_trtmnt = ifelse(trtmnt == 0, "control", "acupuncture"))
head(new_df)

   id age sex migraine_type chronicity acupuncturist practice_id trtmnt
1 100  47   1             1         35            12          35      1
2 101  52   1             1          8            12          35      0
3 104  32   1             1         14            12          35      0
4 105  53   1             1         10             9          25      0
5 108  56   1             1         40             9          25      0
6 112  45   1             1         27             9          25      1
  severity_base severity_3m severity_12m frequency_base frequency_3m
1         10.75          NA           NA              8            0
2          9.50          NA           NA              4            0
3         16.00          NA     15.33333             12            0
4         32.50       44.00           NA             21           18
5         16.50       17.50     23.25000             14           19
6          9.25        4.75      6.25000             15            8
  frequency_12m  cat_trtmnt
1          0.00 acupuncture
2          0.00     control
3         13.33     control
4          0.00     control
5         15.00     control
6         13.00 acupuncture

To clarify, the control treatment just means that participants did not receive acupuncture, whereas the acupuncture treatment means that they did.

#Making migraine_type categorical as well (done in a second mutate() call)
categ_df <- new_df |>
  mutate(cat_migraine_type=ifelse(migraine_type == 0, "tension type", "migraine"))
head(categ_df)

   id age sex migraine_type chronicity acupuncturist practice_id trtmnt
1 100  47   1             1         35            12          35      1
2 101  52   1             1          8            12          35      0
3 104  32   1             1         14            12          35      0
4 105  53   1             1         10             9          25      0
5 108  56   1             1         40             9          25      0
6 112  45   1             1         27             9          25      1
  severity_base severity_3m severity_12m frequency_base frequency_3m
1         10.75          NA           NA              8            0
2          9.50          NA           NA              4            0
3         16.00          NA     15.33333             12            0
4         32.50       44.00           NA             21           18
5         16.50       17.50     23.25000             14           19
6          9.25        4.75      6.25000             15            8
  frequency_12m  cat_trtmnt cat_migraine_type
1          0.00 acupuncture          migraine
2          0.00     control          migraine
3         13.33     control          migraine
4          0.00     control          migraine
5         15.00     control          migraine
6         13.00 acupuncture          migraine
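
Both categorical columns could probably be created in a single mutate() call, since mutate() accepts several new columns separated by commas. The sketch below uses a small made-up data frame (toy is hypothetical, not the trial data) with the same 0/1 coding.

```r
library(dplyr)

# Hypothetical toy rows that mimic the 0/1 coding in the dataset
toy <- data.frame(
  trtmnt        = c(1, 0, 0, 1),
  migraine_type = c(1, 1, 0, 1)
)

# One mutate() call can create several columns, separated by commas
toy_categ <- toy |>
  mutate(
    cat_trtmnt        = ifelse(trtmnt == 0, "control", "acupuncture"),
    cat_migraine_type = ifelse(migraine_type == 0, "tension type", "migraine")
  )

toy_categ
```

Applied to renamed_data, the same pattern would replace the new_df and categ_df steps with one step.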

#Filtering Just the Control Treatment 
control_df <- categ_df |>
  filter(cat_trtmnt=="control")

#Filtering Just the Acupuncture Treatment
acupuncture_df <- categ_df |>
  filter(cat_trtmnt=="acupuncture")

I could not figure out how to organize my data frames all at once, so to clarify:

Main data frame = categ_df
just control observations = control_df
just acupuncture treatment observations = acupuncture_df

7. Initial Summary Graphs

frequency_base: the headache frequency before treatment. frequency_3m: the headache frequency after three months of treatment

#Boxplot of Baseline Headache Frequency vs After 3 Months
rab=new_df$frequency_base
rayr=new_df$frequency_3m
boxplot(rab,rayr, main="Base Headache Frequency vs After 3 Months", ylab= "Headache Frequency", names = c("Base", "After 3 Months"),col="violet")

The frequency_3m variable appears to have some upper outliers, both in general and in comparison to the frequency_base variable, which appears more evenly distributed. The median is lower for frequency_3m than for frequency_base. Because this plot pools the control and acupuncture groups, the drop cannot be attributed to acupuncture, but it motivates comparing the two groups directly.

#Boxplots of 3 Month Post Treatment Headache Frequency for Control (No Acupuncture Treatment) vs Acupuncture Treatment
rab=control_df$frequency_3m
rayr=acupuncture_df$frequency_3m
boxplot(rab,rayr, main="3 Month Headache Frequency Control vs Acupuncture Treatment", ylab= "3 Month Headache Frequency", names = c("Control", "Acupuncture Treatment"),col="pink")
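
A side-by-side boxplot like this one can probably also be drawn straight from the main data frame with the formula interface, which avoids needing separate control and acupuncture data frames. This is only a sketch on made-up values (toy is hypothetical).

```r
# Hypothetical toy data with the same column names as categ_df
toy <- data.frame(
  cat_trtmnt   = factor(c("control", "control", "control",
                          "acupuncture", "acupuncture", "acupuncture")),
  frequency_3m = c(18, 19, 5, 0, 8, 12)
)

# y ~ group draws one box per level of the grouping variable
bp <- boxplot(frequency_3m ~ cat_trtmnt, data = toy,
              main = "3 Month Headache Frequency by Treatment",
              xlab = "Treatment",
              ylab = "3 Month Headache Frequency",
              col = "pink")
```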

#Bar graph of the proportion of migraines vs tension type headaches within the control and acupuncture groups
ggplot(categ_df, aes(x = cat_trtmnt, fill = cat_migraine_type))+ 
  geom_bar(position="fill")

There are a few more acupuncture observations than control ones but the split is pretty even. Across both groups there are far more migraines than tension type headaches.
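
Because position = "fill" scales every bar to the same height, the bar graph shows proportions rather than group sizes. A count table is one way to see both at once. The sketch below uses made-up rows (toy is hypothetical, not the trial data).

```r
library(dplyr)

# Hypothetical toy rows, one per participant
toy <- data.frame(
  cat_trtmnt        = c("acupuncture", "acupuncture", "acupuncture",
                        "control", "control"),
  cat_migraine_type = c("migraine", "migraine", "tension type",
                        "migraine", "migraine")
)

# Count each treatment / headache type combination,
# then convert counts to proportions within each treatment group
group_counts <- toy |>
  count(cat_trtmnt, cat_migraine_type) |>
  group_by(cat_trtmnt) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

group_counts
```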

8. Summary Stats

head(categ_df)

   id age sex migraine_type chronicity acupuncturist practice_id trtmnt
1 100  47   1             1         35            12          35      1
2 101  52   1             1          8            12          35      0
3 104  32   1             1         14            12          35      0
4 105  53   1             1         10             9          25      0
5 108  56   1             1         40             9          25      0
6 112  45   1             1         27             9          25      1
  severity_base severity_3m severity_12m frequency_base frequency_3m
1         10.75          NA           NA              8            0
2          9.50          NA           NA              4            0
3         16.00          NA     15.33333             12            0
4         32.50       44.00           NA             21           18
5         16.50       17.50     23.25000             14           19
6          9.25        4.75      6.25000             15            8
  frequency_12m  cat_trtmnt cat_migraine_type
1          0.00 acupuncture          migraine
2          0.00     control          migraine
3         13.33     control          migraine
4          0.00     control          migraine
5         15.00     control          migraine
6         13.00 acupuncture          migraine

summary(categ_df$frequency_3m)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    4.00   10.00   10.94   17.00   28.00
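
The summary above pools both groups. Since my question compares the groups, a per-group summary may be more useful. This is a sketch on made-up values (toy is hypothetical); the na.rm = TRUE arguments are only a precaution in case of missing values.

```r
library(dplyr)

# Hypothetical toy data with the same column names as categ_df
toy <- data.frame(
  cat_trtmnt   = c("acupuncture", "acupuncture", "acupuncture",
                   "control", "control", "control"),
  frequency_3m = c(0, 8, NA, 18, 19, 5)
)

# One row of summary statistics per treatment group
by_group <- toy |>
  group_by(cat_trtmnt) |>
  summarise(
    n_obs     = sum(!is.na(frequency_3m)),
    mean_3m   = mean(frequency_3m, na.rm = TRUE),
    median_3m = median(frequency_3m, na.rm = TRUE),
    sd_3m     = sd(frequency_3m, na.rm = TRUE)
  )

by_group
```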

9. Defining Parameters

_{Define the parameter or parameters you are trying to estimate}

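The parameter I am trying to estimate is the difference in mean headache frequency after 3 months of treatment between the control population and the acupuncture population. I am testing the null hypothesis that there is no difference in the headache frequency after 3 months of treatment between the acupuncture group and the control group.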
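
In symbols, this could be written as follows. The notation is my own choice: the two population means are labeled by group, and the order of subtraction (control minus acupuncture) is chosen to match the order used in the difference in means code later on.

```latex
\[
\theta = \mu_{\text{control}} - \mu_{\text{acupuncture}}, \qquad
H_0:\ \theta = 0 \quad \text{vs.} \quad H_a:\ \theta \neq 0
\]
```

The sample estimate of this parameter is the difference between the two group sample means of frequency_3m.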

10. 3 Statistical Techniques

_{Use at least 3 DIFFERENT statistical techniques you have learned throughout this course to attempt to answer your question. Techniques may include sampling, randomization testing, ChiSquare tests, ANOVA, bootstrapping confidence intervals, linear, multiple linear, and logistic regressions, nonparametric tests, etc. Be sure you check all basic assumptions for that technique before performing any HT or CI.}

χ² Test

H0: There is no association between migraine type and treatment type (receiving acupuncture or not).

Ha: There is an association between migraine type and treatment type (receiving acupuncture or not).

#Chi^2 test between migraine_type and trtmnt
chisq.test(categ_df$cat_migraine_type, categ_df$cat_trtmnt)


    Pearson's Chi-squared test with Yates' continuity correction

data:  categ_df$cat_migraine_type and categ_df$cat_trtmnt
X-squared = 0.10498, df = 1, p-value = 0.7459

With a p-value of 0.7459, which is greater than 0.05, there is not sufficient evidence to reject the null hypothesis that there is no association between migraine type and treatment type (receiving acupuncture or not). This is what would be expected if randomization balanced headache type across the two groups.
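
One assumption of the chi-squared test is that every expected cell count is at least about 5, which matters here because so few participants had tension type headaches. The object returned by chisq.test() stores the expected counts, so the check can be sketched as below. The counts in toy_table are made up for illustration and are not the real table.

```r
# Hypothetical 2x2 counts (NOT the real trial counts)
toy_table <- matrix(
  c(190, 10,
    185, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    cat_trtmnt        = c("acupuncture", "control"),
    cat_migraine_type = c("migraine", "tension type")
  )
)

chi_result <- chisq.test(toy_table)

# Expected counts under the null of no association
chi_result$expected

# TRUE when every expected count is at least 5
all(chi_result$expected >= 5)
```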

T Test

#Boxplot to compare distributions
ggplot(categ_df, aes(x=cat_trtmnt, y=frequency_3m, fill = cat_trtmnt))+
  geom_boxplot()+
  geom_jitter(alpha = .2)

#Density Plot
ggplot(categ_df, aes(frequency_3m, color = cat_trtmnt))+
  geom_density()

I am assuming the observations are independent. The density plot does not look normal, largely because there are so many 0 values, and both treatments show this. With roughly 200 observations in each group, the sample means should still be approximately normal by the Central Limit Theorem, so I am proceeding with the t test.

#T Test
t.test(categ_df$frequency_3m ~ categ_df$cat_trtmnt)


    Welch Two Sample t-test

data:  categ_df$frequency_3m by categ_df$cat_trtmnt
t = -0.9435, df = 391.45, p-value = 0.346
alternative hypothesis: true difference in means between group acupuncture and group control is not equal to 0
95 percent confidence interval:
 -2.4636757  0.8658484
sample estimates:
mean in group acupuncture     mean in group control 
                 10.54634                  11.34526

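With t = -0.9435, df = 391.45, and a p-value of 0.346, there is not sufficient evidence to reject the null hypothesis that mean headache frequency at 3 months is the same in both groups. The 95% confidence interval for the difference (acupuncture minus control), -2.46 to 0.87, includes 0.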

Difference of Means CI

# Calculate observed difference in means
diff_mean_obs <- categ_df |>
  # Specify the response and explanatory variables
  specify(frequency_3m ~ cat_trtmnt) |>           # syntax is y ~ x
  calculate(stat = "diff in means", order = c("control", "acupuncture")) 
diff_mean_obs

Response: frequency_3m (numeric)
Explanatory: cat_trtmnt (factor)
# A tibble: 1 × 1
   stat
  <dbl>
1 0.799

#Difference of Means CI
diff_ci_mean <- categ_df |>
  specify(frequency_3m ~ cat_trtmnt) |>           # syntax is y ~ x
  generate(reps = 1000, type = "bootstrap") |>    # resample with replacement 1000 times
  calculate(stat = "diff in means", order = c("control", "acupuncture")) 
# Specify to calculate a difference in means and what order of subtraction to use

#Histogram of the bootstrap distribution of the difference in means
ggplot(diff_ci_mean, aes(stat)) +
  geom_histogram() +
  geom_vline(xintercept = pull(diff_mean_obs), color = "red")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#P-value calculation
diff_ci_mean |>
  get_p_value(obs_stat = diff_mean_obs, direction = "two-sided")

# A tibble: 1 × 1
  p_value
    <dbl>
1   0.994

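This p-value should be read with caution. The distribution was generated with type = "bootstrap" and no null hypothesis was specified, so it is centered on the observed difference (0.799) rather than on 0. A p-value near 1 is expected in that setup and does not test the null. A bootstrap distribution is better suited to a confidence interval, while a randomization test would need hypothesize(null = "independence") and generate(type = "permute"). Based on the t test, the null still cannot be rejected, so this has not shown that the control and acupuncture groups differ in headache frequency at 3 months.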

#Visualization
diff_ci_mean |>
  visualize() +
  shade_p_value(diff_mean_obs, direction = "two-sided")
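
For comparison, here is a sketch of how I understand the two infer workflows: a randomization test (set the null, then permute the group labels) and a bootstrap confidence interval (no null, resample with replacement). It runs on simulated stand-in data (toy is hypothetical and the Poisson rates are arbitrary), so the numbers it produces say nothing about the trial.

```r
library(infer)

set.seed(1)  # for reproducible resampling

# Simulated stand-in data (hypothetical values, NOT the real trial data)
toy <- data.frame(
  cat_trtmnt   = factor(rep(c("control", "acupuncture"), each = 50)),
  frequency_3m = c(rpois(50, lambda = 11), rpois(50, lambda = 10))
)

# Observed difference in means (control - acupuncture)
obs_diff <- toy |>
  specify(frequency_3m ~ cat_trtmnt) |>
  calculate(stat = "diff in means", order = c("control", "acupuncture"))

# Randomization test: state the null, then shuffle the group labels
null_dist <- toy |>
  specify(frequency_3m ~ cat_trtmnt) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means", order = c("control", "acupuncture"))

p_val <- null_dist |>
  get_p_value(obs_stat = obs_diff, direction = "two-sided")

# Bootstrap CI: no null, resample rows with replacement
boot_dist <- toy |>
  specify(frequency_3m ~ cat_trtmnt) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("control", "acupuncture"))

ci <- boot_dist |>
  get_confidence_interval(level = 0.95, type = "percentile")

p_val
ci
```

Swapping toy for categ_df would give a permutation p-value that is centered on 0 under the null and a 95% interval for the control minus acupuncture difference.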

Linear Regression

#Scatterplot of Treatment to Headache Frequency
ggplot(categ_df, aes(x=cat_trtmnt, y=frequency_3m))+
  geom_point()+
   theme_bw()+
  labs(x="Treatment", 
       y="Headache Frequency After 3 Months",
       title = "Scatterplot of Treatment to Headache Frequency")

l_mod <- lm(data = categ_df, frequency_3m ~ cat_trtmnt)
summary(l_mod)


Call:
lm(formula = frequency_3m ~ cat_trtmnt, data = categ_df)

Residuals:
     Min       1Q   Median       3Q      Max 
-11.3453  -7.3453  -0.5463   5.6547  17.4537 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        10.5463     0.5907  17.853   <2e-16 ***
cat_trtmntcontrol   0.7989     0.8450   0.945    0.345    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.458 on 399 degrees of freedom
Multiple R-squared:  0.002236,  Adjusted R-squared:  -0.0002652 
F-statistic: 0.894 on 1 and 399 DF,  p-value: 0.345

#Scatterplot of Treatment to Headache Frequency Post 3 Months
ggplot(categ_df, aes(x=trtmnt, y=frequency_3m))+
  geom_point()+
  geom_smooth(method = "lm", se = TRUE)+
   theme_bw()+
  labs(x="Treatment", 
       y="Frequency of Headaches after 3 months",
       title = "Scatterplot of Treatment to Headache Frequency Post 3 months")

`geom_smooth()` using formula = 'y ~ x'

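The p-value for the treatment coefficient is 0.345, which is greater than 0.05, so the null hypothesis that the coefficient is 0 cannot be rejected. The intercept (10.5463) is the mean 3 month frequency for the acupuncture group, and the cat_trtmntcontrol coefficient (0.7989) is the estimated difference for the control group, matching the observed difference in means. The R squared value is very low (0.002236), so treatment explains almost none of the variation in headache frequency. I assumed independence; collinearity is not a concern with a single predictor. Diagnostic plots are not only for multiple linear regression, so the residuals for this model still need to be checked.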
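
A sketch of the diagnostic check for a one-predictor model, again on simulated stand-in data (toy and toy_mod are hypothetical). It draws the residuals vs fitted and Normal Q-Q plots, and also confirms that with a single binary predictor the slope equals the difference in group means.

```r
set.seed(2)

# Simulated stand-in data (hypothetical values, NOT the real trial data)
toy <- data.frame(
  cat_trtmnt   = factor(rep(c("acupuncture", "control"), each = 40)),
  frequency_3m = c(rpois(40, lambda = 10), rpois(40, lambda = 11))
)

toy_mod <- lm(frequency_3m ~ cat_trtmnt, data = toy)

# Residuals vs fitted (plot 1) and Normal Q-Q (plot 2), side by side
par(mfrow = c(1, 2))
plot(toy_mod, which = 1:2)

# With one binary predictor, the slope is the difference in group means
slope       <- unname(coef(toy_mod)["cat_trtmntcontrol"])
group_means <- tapply(toy$frequency_3m, toy$cat_trtmnt, mean)
diff_means  <- unname(group_means["control"] - group_means["acupuncture"])
all.equal(slope, diff_means)
```

Running plot(l_mod, which = 1:2) on the fitted model would give the same two plots for the real data.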

Part 3: The Conclusion

11. Statistical Analysis General Conclusion

Focusing only on the two treatment types and how they affected the frequency of headaches after three months, and using only these specific statistical techniques, the null hypotheses could not be rejected.

12. Important Results

13. Specific Conclusion

_{Write specific conclusions regarding implications of your results (useful to the general public). You can include your own personal opinions here.}

14. Statistical Analysis Reflection

The data I used was not the entirety of the original dataset, which may have affected my statistical analyses. I am also not sure if the specific type of regression I used fit the data best.

15. Bibliography

Dower, N. (2022, August 10). Acupuncture Can Help Headache and Migraine Pain. Lancastergeneralhealth.org. https://www.lancastergeneralhealth.org/health-hub-home/2022/august/acupuncture-can-help-headache-and-migraine-pain#:~:text=To%20help%20treat%20headaches%2C%20an,what%20helps%20weaken%20headache%20pain.
Encyclopædia Britannica. (n.d.). Acupuncture. Britannica School. Retrieved April 6, 2024, from https://school.eb.com/levels/high/article/acupuncture/3626
Encyclopædia Britannica. (n.d.). Headache. Britannica School. Retrieved April 6, 2024, from https://school.eb.com/levels/high/article/headache/1622