#Load Libraries
library(tidyverse)
library(tidymodels)
library(ggplot2)
library(knitr)
library(kableExtra)
Final.Project.Draft
Caveats/Comments
The subscript formatting on the outline directions is only there so I can visually tell that text apart from the text I am writing. It will not be in the final version; sorry for the visual clutter.
Part 1: The Introduction
1. Background Info on Dataset
2. Describing Data Source
Describe the source for your data and how the data was collected (randomized experiment or observations). Discuss any potential bias, if any.
The experiment was a randomized, controlled trial with 401 participants. Only 17 of the 401 participants had tension-type headaches rather than migraines, so conclusions drawn about migraines might not apply to tension-type headaches. Sources of bias include that the patients were not blinded and that there was one noted conflict of interest. The paper also notes that there was no placebo control, such as sham acupuncture, so the authors cannot rule out that acupuncture was acting as a placebo.
3. Provide stats details
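The reduced dataset has 401 observations and 14 columns, including the id column. Six of them are outcome measurements (headache severity and frequency at baseline, 3 months, and 12 months); the others are identifiers, patient characteristics, and the treatment assignment.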
4. Defining Variables
5. Background Research
Acupuncture has been practiced in China since before 2500 BCE.
“Most headaches occur because specific pain-sensitive structures in or around the head are overstimulated or damaged. Some of these structures are inside the skull, or intracranial; the remainder are in the tissues surrounding or covering the skull, or extracranial” (Britannica, headache)
“The needles release endorphins and other hormones, stimulating the circulatory system—a part of what helps weaken headache pain.” (Dower)
“There are four main types of acupuncture:
Full body acupuncture: This standard acupuncture practice stimulates points in different parts of the body to release blockages and activate innate healing reserves.
Auricular acupuncture: A form of acupuncture applied exclusively to points of the outer ear to treat various conditions including chronic pain
Electroacupuncture: Pulses of weak electrical current are sent through acupuncture needles into acupuncture points on the skin
Community acupuncture: offered in a group setting, recipients are fully clothed, sitting in chairs, while the practitioner inserts needles into exposed areas of the body.” (Dower)
“To help treat headaches, an acupuncturist will place needles in various pressure points, including those on your head and neck.” (Dower)
6. My Overarching Question
How does receiving acupuncture, compared with not receiving acupuncture, affect the frequency of headaches after 3 months?
Part 2: My Work With The Data
Loading Libraries
Loading Dataset
load("~/Desktop/Stats/acupuncture_data_reduced.RData")
Viewing the Dataset
head(data)
id age sex migraine chronicity acupuncturist practice_id group pk1 pk2
1 100 47 1 1 35 12 35 1 10.75 NA
2 101 52 1 1 8 12 35 0 9.50 NA
3 104 32 1 1 14 12 35 0 16.00 NA
4 105 53 1 1 10 9 25 0 32.50 44.00
5 108 56 1 1 40 9 25 0 16.50 17.50
6 112 45 1 1 27 9 25 1 9.25 4.75
pk5 f1 f2 f5
1 NA 8 0 0.00
2 NA 4 0 0.00
3 15.33333 12 0 13.33
4 NA 21 18 0.00
5 23.25000 14 19 15.00
6 6.25000 15 8 13.00
Renaming Variables To Make Their Names More Intuitive
#Renaming Variables
renamed_data <- data %>%
  rename(migraine_type = migraine, trtmnt = group, frequency_base = f1, frequency_3m = f2, frequency_12m = f5, severity_base = pk1, severity_3m = pk2, severity_12m = pk5)
head(renamed_data)
id age sex migraine_type chronicity acupuncturist practice_id trtmnt
1 100 47 1 1 35 12 35 1
2 101 52 1 1 8 12 35 0
3 104 32 1 1 14 12 35 0
4 105 53 1 1 10 9 25 0
5 108 56 1 1 40 9 25 0
6 112 45 1 1 27 9 25 1
severity_base severity_3m severity_12m frequency_base frequency_3m
1 10.75 NA NA 8 0
2 9.50 NA NA 4 0
3 16.00 NA 15.33333 12 0
4 32.50 44.00 NA 21 18
5 16.50 17.50 23.25000 14 19
6 9.25 4.75 6.25000 15 8
frequency_12m
1 0.00
2 0.00
3 13.33
4 0.00
5 15.00
6 13.00
# Making Binary 0s and 1s into Categorical Variables
new_df <- renamed_data |>
  mutate(cat_trtmnt = ifelse(trtmnt == 0, "control", "acupuncture"))
head(new_df)
id age sex migraine_type chronicity acupuncturist practice_id trtmnt
1 100 47 1 1 35 12 35 1
2 101 52 1 1 8 12 35 0
3 104 32 1 1 14 12 35 0
4 105 53 1 1 10 9 25 0
5 108 56 1 1 40 9 25 0
6 112 45 1 1 27 9 25 1
severity_base severity_3m severity_12m frequency_base frequency_3m
1 10.75 NA NA 8 0
2 9.50 NA NA 4 0
3 16.00 NA 15.33333 12 0
4 32.50 44.00 NA 21 18
5 16.50 17.50 23.25000 14 19
6 9.25 4.75 6.25000 15 8
frequency_12m cat_trtmnt
1 0.00 acupuncture
2 0.00 control
3 13.33 control
4 0.00 control
5 15.00 control
6 13.00 acupuncture
Note that the control treatment just means the participant did not receive acupuncture, while the acupuncture treatment means they did.
#I did not know how to mutate two columns at once, so this is a second step for making categorical variables
categ_df <- new_df |>
  mutate(cat_migraine_type = ifelse(migraine_type == 0, "tension type", "migraine"))
head(categ_df)
id age sex migraine_type chronicity acupuncturist practice_id trtmnt
1 100 47 1 1 35 12 35 1
2 101 52 1 1 8 12 35 0
3 104 32 1 1 14 12 35 0
4 105 53 1 1 10 9 25 0
5 108 56 1 1 40 9 25 0
6 112 45 1 1 27 9 25 1
severity_base severity_3m severity_12m frequency_base frequency_3m
1 10.75 NA NA 8 0
2 9.50 NA NA 4 0
3 16.00 NA 15.33333 12 0
4 32.50 44.00 NA 21 18
5 16.50 17.50 23.25000 14 19
6 9.25 4.75 6.25000 15 8
frequency_12m cat_trtmnt cat_migraine_type
1 0.00 acupuncture migraine
2 0.00 control migraine
3 13.33 control migraine
4 0.00 control migraine
5 15.00 control migraine
6 13.00 acupuncture migraine
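Both recodes can be written in a single `mutate()` call, since `mutate()` accepts several `name = expression` pairs. The sketch below is only an illustration: `toy` is a hypothetical stand-in built from the six rows printed by `head()` above, and `if_else()` is dplyr's stricter version of `ifelse()`.

```r
library(dplyr)

# Hypothetical stand-in for renamed_data, using the six rows shown by head()
toy <- tibble(
  id            = c(100, 101, 104, 105, 108, 112),
  migraine_type = c(1, 1, 1, 1, 1, 1),
  trtmnt        = c(1, 0, 0, 0, 0, 1)
)

# One mutate() call creates both categorical columns
categ_toy <- toy |>
  mutate(
    cat_trtmnt        = if_else(trtmnt == 0, "control", "acupuncture"),
    cat_migraine_type = if_else(migraine_type == 0, "tension type", "migraine")
  )

print(categ_toy)
```

Applied to the real data, the same call on `renamed_data` would produce `categ_df` directly, without the intermediate `new_df`.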
#Filtering Just the Control Treatment
control_df <- categ_df |>
  filter(cat_trtmnt=="control")
#Filtering Just the Acupuncture Treatment
acupuncture_df <- categ_df |>
  filter(cat_trtmnt=="acupuncture")
I could not figure out how to organize my data frames all at once, so to clarify:
Main data frame = categ_df
just control observations = control_df
just acupuncture treatment observations = acupuncture_df
7. Initial Summary Graphs
frequency_base: the headache frequency before treatment. frequency_3m: the headache frequency after three months of treatment
#Boxplot of Baseline Headache Frequency vs After 3 Months
rab=new_df$frequency_base
rayr=new_df$frequency_3m
boxplot(rab,rayr, main="Base Headache Frequency vs After 3 Months", ylab= "Headache Frequency", names = c("Base", "After 3 Months"),col="violet")
It appears that the frequency_3m variable has some upper outliers, both in general and in comparison to the frequency_base variable. The frequency_base variable appears more evenly distributed. The median is lower for frequency_3m than for frequency_base. That alone does not mean acupuncture is the reason, because this plot pools both treatment groups, but the idea is worth exploring further after looking at these boxplots.
#Boxplots of 3 Month Post Treatment Headache Frequency for Control (No Acupuncture Treatment) vs Acupuncture Treatment
rab=control_df$frequency_3m
rayr=acupuncture_df$frequency_3m
boxplot(rab,rayr, main="3 Month Headache Frequency Control vs Acupuncture Treatment", ylab= "3 Month Headache Frequency", names = c("Control", "Acupuncture Treatment"),col="pink")
#Bargraph of cat_trtmnt and the proportion of migraines vs tension type headaches for the control and acupuncture
ggplot(categ_df, aes(x = cat_trtmnt, fill = cat_migraine_type))+
  geom_bar(position="fill")
Because position="fill" scales each bar to proportions, this plot shows the mix of headache types within each group rather than the group sizes. There are a few more acupuncture observations than control ones, but the split is fairly even. In both groups there are far more migraines than tension-type headaches.
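A filled bar chart hides the raw counts, so a small table of counts and within-group proportions is a useful companion. This is a minimal sketch on made-up rows; `toy` and `type_counts` are hypothetical names, and in the report `categ_df` would take the place of `toy`.

```r
library(dplyr)

# Made-up rows for illustration only; not the study data
toy <- tibble(
  cat_trtmnt        = c("acupuncture", "acupuncture", "acupuncture", "control", "control"),
  cat_migraine_type = c("migraine", "migraine", "tension type", "migraine", "migraine")
)

# Count each treatment/headache-type combination,
# then convert counts to proportions within each treatment group
type_counts <- toy |>
  count(cat_trtmnt, cat_migraine_type) |>
  group_by(cat_trtmnt) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

print(type_counts)
```

The `n` column gives the group sizes that the bar chart cannot show, and `prop` matches the bar heights.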
8. Summary Stats
head(categ_df)
id age sex migraine_type chronicity acupuncturist practice_id trtmnt
1 100 47 1 1 35 12 35 1
2 101 52 1 1 8 12 35 0
3 104 32 1 1 14 12 35 0
4 105 53 1 1 10 9 25 0
5 108 56 1 1 40 9 25 0
6 112 45 1 1 27 9 25 1
severity_base severity_3m severity_12m frequency_base frequency_3m
1 10.75 NA NA 8 0
2 9.50 NA NA 4 0
3 16.00 NA 15.33333 12 0
4 32.50 44.00 NA 21 18
5 16.50 17.50 23.25000 14 19
6 9.25 4.75 6.25000 15 8
frequency_12m cat_trtmnt cat_migraine_type
1 0.00 acupuncture migraine
2 0.00 control migraine
3 13.33 control migraine
4 0.00 control migraine
5 15.00 control migraine
6 13.00 acupuncture migraine
summary(categ_df$frequency_3m)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 4.00 10.00 10.94 17.00 28.00
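Since the question compares the two treatment groups, the same summary broken down by group may be more informative than the overall one. The sketch below assumes a data frame shaped like `categ_df`; `toy` is a hypothetical stand-in holding only the six rows printed by `head()` above, and the summary column names are made up for the example.

```r
library(dplyr)

# Hypothetical stand-in for categ_df, using the six rows shown by head()
toy <- tibble(
  cat_trtmnt   = c("acupuncture", "control", "control", "control", "control", "acupuncture"),
  frequency_3m = c(0, 0, 0, 18, 19, 8)
)

# Per-group sample size, missing count, and basic summary statistics
freq_by_group <- toy |>
  group_by(cat_trtmnt) |>
  summarise(
    n_obs     = n(),
    n_missing = sum(is.na(frequency_3m)),
    mean_3m   = mean(frequency_3m, na.rm = TRUE),
    median_3m = median(frequency_3m, na.rm = TRUE),
    sd_3m     = sd(frequency_3m, na.rm = TRUE)
  )

print(freq_by_group)
```

Six rows are far too few to say anything about the study; the point is only the shape of the call.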
9. Defining Parameters
Define the parameter or parameters you are trying to estimate
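The parameter is the difference in population mean headache frequency at 3 months between the acupuncture group and the control group. The null hypothesis is that this difference is 0, meaning there is no difference in mean headache frequency after 3 months of treatment between the two groups. The analyses below test whether the data give enough evidence to reject that null.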
10. 3 Statistical Techniques
Use at least 3 DIFFERENT statistical techniques you have learned throughout this course to attempt to answer your question. Techniques may include sampling, randomization testing, ChiSquare tests, ANOVA, bootstrapping confidence intervals, linear, multiple linear, and logistic regressions, nonparametric tests, etc. Be sure you check all basic assumptions for that technique before performing any HT or CI.
χ² Test
Ho: There is no association between migraine type and treatment type: receiving acupuncture or not
Ha: There is an association between migraine type and treatment type: receiving acupuncture or not
#Chi^2 test between migraine_type and trtmnt
chisq.test(categ_df$cat_migraine_type, categ_df$cat_trtmnt)
Pearson's Chi-squared test with Yates' continuity correction
data: categ_df$cat_migraine_type and categ_df$cat_trtmnt
X-squared = 0.10498, df = 1, p-value = 0.7459
With a p-value of 0.7459, which is greater than 0.05, there is not sufficient evidence to reject the null hypothesis (There is no association between migraine type and treatment type: receiving acupuncture or not). This is consistent with the random assignment of treatment.
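One assumption of the chi-squared test is that the expected count in every cell is at least about 5, which matters here because tension-type headaches are rare. The sketch below shows one way to check that and to run Fisher's exact test as a fallback. The counts in `tab` are made up for illustration and are not the study's counts; with the real data, `table(categ_df$cat_migraine_type, categ_df$cat_trtmnt)` would take their place.

```r
# Made-up 2 x 2 table of counts, for illustration only
tab <- matrix(
  c(8, 192, 12, 188),
  nrow = 2,
  dimnames = list(
    type   = c("tension type", "migraine"),
    trtmnt = c("acupuncture", "control")
  )
)

chi <- chisq.test(tab)

# Expected counts under the null of no association
print(chi$expected)
expected_ok <- all(chi$expected >= 5)

# Fisher's exact test does not rely on the large-sample approximation
fisher <- fisher.test(tab)
print(fisher$p.value)
```

If `expected_ok` were FALSE for the real table, the Fisher result would be the safer one to report.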
T Test
#Boxplot to compare distributions
ggplot(categ_df, aes(x=cat_trtmnt, y=frequency_3m, fill = cat_trtmnt))+
  geom_boxplot()+
  geom_jitter(alpha = .2)
#Density Plot
ggplot(categ_df, aes(frequency_3m, color = cat_trtmnt))+
  geom_density()
I am assuming the observations are independent. The density plot does not look normally distributed, largely because there are so many 0 values in both groups. With 401 observations split fairly evenly between the two groups, the t-test is reasonably robust to this non-normality, so I am proceeding with it.
#T Test
t.test(categ_df$frequency_3m ~ categ_df$cat_trtmnt)
Welch Two Sample t-test
data: categ_df$frequency_3m by categ_df$cat_trtmnt
t = -0.9435, df = 391.45, p-value = 0.346
alternative hypothesis: true difference in means between group acupuncture and group control is not equal to 0
95 percent confidence interval:
-2.4636757 0.8658484
sample estimates:
mean in group acupuncture mean in group control
10.54634 11.34526
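With t = -0.9435, df = 391.45, and a p-value of 0.346, there is not sufficient evidence to reject the null hypothesis that the mean headache frequency at 3 months is the same in the two groups. The 95% confidence interval for the difference in means (acupuncture minus control), -2.46 to 0.87, includes 0.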
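Because the many 0 values make the normality assumption shaky, a nonparametric check alongside the t-test can show whether the conclusion depends on that assumption. The sketch below runs both tests on simulated zero-heavy counts; `toy` and the simulation settings are assumptions made for the example, not the study data.

```r
set.seed(123)
n <- 100

# Simulated zero-heavy headache counts; illustration only
toy <- data.frame(
  cat_trtmnt   = rep(c("acupuncture", "control"), each = n),
  frequency_3m = c(
    rbinom(n, 1, 0.8) * rpois(n, 12),
    rbinom(n, 1, 0.8) * rpois(n, 13)
  )
)

# Welch two-sample t-test, as used above
welch <- t.test(frequency_3m ~ cat_trtmnt, data = toy)

# Wilcoxon rank-sum test: compares the groups without assuming normality
rank_sum <- wilcox.test(frequency_3m ~ cat_trtmnt, data = toy)

print(welch$p.value)
print(rank_sum$p.value)
```

If the two tests agree on the real data, the non-normality is unlikely to be driving the result.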
Difference of Means CI
# Calculate observed difference in means
diff_mean_obs <- categ_df |>
  # Specify the response and explanatory variables
  specify(frequency_3m ~ cat_trtmnt) |> # syntax is y ~ x
  calculate(stat = "diff in means", order = c("control", "acupuncture"))
diff_mean_obs
Response: frequency_3m (numeric)
Explanatory: cat_trtmnt (factor)
# A tibble: 1 × 1
stat
<dbl>
1 0.799
#Bootstrap distribution for the difference in means
diff_ci_mean <- categ_df |>
  specify(frequency_3m ~ cat_trtmnt) |> # syntax is y ~ x
  generate(reps = 1000, type = "bootstrap") |> # resample with replacement 1000 times
  # Specify to calculate a difference in means and what order of subtraction to use
  calculate(stat = "diff in means", order = c("control", "acupuncture"))
#Histogram of the bootstrap distribution
ggplot(diff_ci_mean, aes(stat)) +
  geom_histogram() +
  geom_vline(xintercept = pull(diff_mean_obs), color = "red")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#P-value calculation
diff_ci_mean |>
  get_p_value(obs_stat = diff_mean_obs, direction = "two-sided")
# A tibble: 1 × 1
p_value
<dbl>
1 0.994
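The p-value of 0.994 should not be read as a test result. The distribution was generated by bootstrapping, so it is centered on the observed difference of 0.799 rather than on the null value of 0, and a p-value computed this way will almost always be close to 1. A bootstrap distribution supports a confidence interval, not a hypothesis test; a randomization test would need hypothesize(null = "independence") and generate(type = "permute"). This step therefore gives no evidence either way about whether headache frequency at 3 months differs between the control and acupuncture groups.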
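The infer package keeps the two tasks separate: a bootstrap distribution feeds `get_confidence_interval()`, and a permutation distribution built under `hypothesize(null = "independence")` feeds `get_p_value()`. The sketch below runs both on simulated data; `toy` and its simulation settings are assumptions for the example, and `categ_df` would take its place in the report.

```r
library(infer)

set.seed(1)

# Simulated stand-in for categ_df; illustration only
toy <- data.frame(
  frequency_3m = c(rpois(50, 10), rpois(50, 11)),
  cat_trtmnt   = rep(c("acupuncture", "control"), each = 50)
)

# Observed difference in means (control minus acupuncture)
obs <- toy |>
  specify(frequency_3m ~ cat_trtmnt) |>
  calculate(stat = "diff in means", order = c("control", "acupuncture"))

# Bootstrap distribution -> percentile confidence interval
boot_dist <- toy |>
  specify(frequency_3m ~ cat_trtmnt) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("control", "acupuncture"))
ci <- get_confidence_interval(boot_dist, level = 0.95, type = "percentile")

# Permutation distribution under the null of no association -> p-value
null_dist <- toy |>
  specify(frequency_3m ~ cat_trtmnt) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means", order = c("control", "acupuncture"))
p_val <- get_p_value(null_dist, obs_stat = obs, direction = "two-sided")

print(ci)
print(p_val)
```

If the interval contains 0, the bootstrap result agrees with a failure to reject the null in the permutation test.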
#Visualization
diff_ci_mean |>
  visualize() +
  shade_p_value(diff_mean_obs, direction = "two-sided")
Linear Regression
#Scatterplot of Treatment to Headache Frequency
ggplot(categ_df, aes(x=cat_trtmnt, y=frequency_3m))+
  geom_point()+
  theme_bw()+
  labs(x="Treatment",
       y="Headache Frequency After 3 Months",
       title = "Scatterplot of Treatment to Headache Frequency")
l_mod <- lm(data = categ_df, frequency_3m ~ cat_trtmnt)
summary(l_mod)
Call:
lm(formula = frequency_3m ~ cat_trtmnt, data = categ_df)
Residuals:
Min 1Q Median 3Q Max
-11.3453 -7.3453 -0.5463 5.6547 17.4537
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.5463 0.5907 17.853 <2e-16 ***
cat_trtmntcontrol 0.7989 0.8450 0.945 0.345
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.458 on 399 degrees of freedom
Multiple R-squared: 0.002236, Adjusted R-squared: -0.0002652
F-statistic: 0.894 on 1 and 399 DF, p-value: 0.345
#Scatterplot of Treatment to Headache Frequency Post 3 Months
ggplot(categ_df, aes(x=trtmnt, y=frequency_3m))+
  geom_point()+
  geom_smooth(method = "lm", se = TRUE)+
  theme_bw()+
  labs(x="Treatment",
       y="Frequency of Headaches after 3 months",
       title = "Scatterplot of Treatment to Headache Frequency Post 3 months")
`geom_smooth()` using formula = 'y ~ x'
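The p-value for the treatment coefficient is 0.345, which is greater than 0.05, so the null hypothesis that treatment has no effect on headache frequency at 3 months cannot be rejected. The R-squared value is very low (0.002), so treatment explains almost none of the variation in headache frequency. The estimated coefficient, 0.7989, is the difference between the control and acupuncture group means reported by the t-test. I assumed the observations were independent; collinearity is not a concern with a single predictor. Diagnostic plots (residuals vs. fitted and a normal Q-Q plot) apply to simple linear regression as well as multiple linear regression, and they still need to be checked for this model.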
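Diagnostic plots come straight from the fitted `lm` object, and with a single two-level predictor the slope is just the difference in group means. The sketch below shows both on simulated data; `toy` and `fit` are hypothetical names, and `l_mod` would take the place of `fit` in the report.

```r
set.seed(42)

# Simulated stand-in for categ_df; illustration only
toy <- data.frame(
  cat_trtmnt   = rep(c("acupuncture", "control"), each = 40),
  frequency_3m = c(rpois(40, 10), rpois(40, 11))
)

fit <- lm(frequency_3m ~ cat_trtmnt, data = toy)

# The coefficient on the treatment dummy equals the difference in group means
group_means <- tapply(toy$frequency_3m, toy$cat_trtmnt, mean)
slope       <- unname(coef(fit)["cat_trtmntcontrol"])
diff_means  <- unname(group_means["control"] - group_means["acupuncture"])
print(c(slope = slope, diff_means = diff_means))

# Base R diagnostic plots: residuals vs fitted, then normal Q-Q
par(mfrow = c(1, 2))
plot(fit, which = 1)
plot(fit, which = 2)
```

With a binary predictor the residuals-vs-fitted plot shows only two vertical strips, so the Q-Q plot is the more informative of the two.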
Part 3: The Conclusion
11. Statistical Analysis General Conclusion
Looking only at the two treatment types and their effect on the frequency of headaches after three months, and using only these specific statistical techniques, the null hypotheses could not be rejected.
12. Important Results
13. Specific Conclusion
Write specific conclusions regarding implications of your results (useful to the general public). You can include your own personal opinions here.
14. Statistical Analysis Reflection
The data I used was not the entirety of the original dataset so that definitely affected my statistical analyses. I am also not sure if the specific type of regression I used fit the data best.
15. Bibliography
Dower, N. (2022, August 10). Acupuncture Can Help Headache and Migraine Pain. Lancastergeneralhealth.org. https://www.lancastergeneralhealth.org/health-hub-home/2022/august/acupuncture-can-help-headache-and-migraine-pain#:~:text=To%20help%20treat%20headaches%2C%20an,what%20helps%20weaken%20headache%20pain.
Encyclopædia Britannica. (n.d.). Acupuncture. Britannica School. Retrieved April 6, 2024, from https://school.eb.com/levels/high/article/acupuncture/3626
Encyclopædia Britannica. (n.d.). Headache. Britannica School. Retrieved April 6, 2024, from https://school.eb.com/levels/high/article/headache/1622