My final project is centered on determining the differences between patients with hypothyroidism and patients without hypothyroidism. It will be divided into 3 main sections: blood test results, confounding health conditions, and medical interventions. It uses the same hypothyroidism dataset as all of my data dives. That dataset seems to have been assembled for machine learning purposes, which makes it well suited to determining the differences between patients with and without hypothyroidism.

First, the dataset and necessary libraries were loaded in:

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

hypothyroid <- read_delim("./hypothyroid data set.csv", delim = ",")

## Rows: 3163 Columns: 26
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (8): hypothyroid, sex, TSH_measured, T3_measured, TT4_measured, T4U_mea...
## dbl  (7): age, TSH, T3, TT4, T4U, FTI, TBG
## lgl (11): on_thyroxine, query_on_thyroxine, on_antithyroid_medication, thyro...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Visualization and Hypothesis Testing of Observed Differences in Hypothyroidism Patients

Comparing Blood Test Results in Patients With and Without Hypothyroidism

I am comparing TSH and TT4 to confirm existing literature about differences in TSH and TT4 levels between those with hypothyroidism and those without it. I would have liked to extend this to the TBG or FTI tests, but presentation time limits force me to keep my analysis narrow. A subset of the data will be created for this part first:

part1 <- hypothyroid |>
  select(hypothyroid, TSH, TT4) |>
  filter(!is.na(TSH)) |> #removing null TSH rows
  filter(!is.na(TT4)) |> #removing null TT4 rows
  filter(TSH <= 150) #to handle TSH outliers

TSH Test

The TSH test measures the level of Thyroid Stimulating Hormone in the blood. Existing literature states that TSH is higher in patients with hypothyroidism than in patients without it. First, a visualization of average TSH levels in patients with and without hypothyroidism will be created:

part1|>
  group_by(hypothyroid)|>
  summarize(meanTSH=mean(TSH)) |>
  ggplot(aes(x=hypothyroid,y=meanTSH))+geom_bar(stat="identity",fill="darkgreen")+theme_bw()+
  labs(title="Mean TSH for Patients With and Without Hypothyroidism",x="Patient Status for Hypothyroidism",
       y="Average TSH Test Result Value (microunits/milliliter)")

Visually, there is a massive difference in mean TSH value between patients with and without hypothyroidism. Patients with hypothyroidism have the higher values, which matches what is expected based on existing literature.

However, determining whether there is statistical evidence for rejecting the idea that this difference is due to random chance requires hypothesis testing. The null hypothesis for this test will be that there is no difference in the average TSH blood test result between hypothyroidism and non-hypothyroidism patients. I will conduct a 2-sample (Welch) t-test with a significance level of .01 as the standard for rejecting the null hypothesis. The null hypothesis is rejected only if the p-value is below .01. That means a difference at least as large as the one observed would occur less than 1% of the time by random chance if the null hypothesis were true. I chose this strict cutoff because I want to be highly certain that the observed differences are not due to random chance. This cutoff will be used for all hypothesis testing in this project. The assumptions for all of the tests are assumed to be met: each group is a random sample, and each group either comes from a normal population or has a large enough sample size:
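A minimal sketch of how the .01 significance level can be carried through a test in R. It uses hypothetical simulated data: the `toy` data frame, its group sizes, means and standard deviations are invented, not taken from the hypothyroid dataset. It also sets `conf.level = 1 - alpha` so that `t.test()` reports a 99% confidence interval instead of its default 95% interval:

```r
# Significance level used for every test in the project
alpha <- 0.01

# Hypothetical simulated data, NOT the hypothyroid dataset:
# group sizes, means, and standard deviations are made up
set.seed(42)
toy <- data.frame(
  group = rep(c("hypothyroid", "negative"), times = c(50, 500)),
  value = c(rnorm(50, mean = 40, sd = 30), rnorm(500, mean = 2.5, sd = 2))
)

# Welch two-sample t-test (t.test's default, var.equal = FALSE);
# conf.level = 1 - alpha reports a 99% confidence interval
res <- t.test(value ~ group, data = toy, conf.level = 1 - alpha)

# Reject the null hypothesis only when the p-value is below alpha
reject_null <- res$p.value < alpha
reject_null
res$conf.int
```

The test statistic and the interval come from the same t distribution. As a result, the 99% interval excludes 0 exactly when the two-sided p-value is below .01, so the interval and the decision rule agree.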

test1<-t.test(TSH ~ hypothyroid, data = part1)
test1

## 
##  Welch Two Sample t-test
## 
## data:  TSH by hypothyroid
## t = 13.537, df = 135.48, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group hypothyroid and group negative is not equal to 0
## 95 percent confidence interval:
##  36.91696 49.54918
## sample estimates:
## mean in group hypothyroid    mean in group negative 
##                 45.608824                  2.375755

The p-value is less than 2.2e-16, which is incredibly low. This is to be expected given the magnitude of the difference between average TSH blood test levels. It means the null hypothesis that the average TSH level is no different between hypothyroidism and non-hypothyroidism patients can be rejected. The normal range of TSH values is .5 - 4 microunits per milliliter, and the hypothyroidism group's mean of about 45.6 is far above that range. It could easily have been assumed that the null hypothesis of equal average TSH levels would be rejected, but running the test is good practice.
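R's print method for test results reports any p-value smaller than machine epsilon (`.Machine$double.eps`, about 2.22e-16) as `p-value < 2.2e-16`. The value 2.2e-16 is therefore a display limit, not the computed p-value. The small sketch below uses hypothetical simulated data (the vector names and parameters are invented). It shows that the unrounded value can be read from the `p.value` element of the object `t.test()` returns. With the real results, `test1$p.value` would work the same way:

```r
# Hypothetical simulated data with a very large difference between groups
set.seed(1)
high_group <- rnorm(100, mean = 45, sd = 20)
low_group <- rnorm(1000, mean = 2.4, sd = 2)

res <- t.test(high_group, low_group)  # Welch two-sample t-test by default

# Printing res shows "p-value < 2.2e-16"; the stored value is not rounded
res$p.value

# The printed floor is machine epsilon, about 2.22e-16
.Machine$double.eps
```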

TT4 Test

The TT4 test measures the level of both free and bound thyroxine, a thyroid hormone, in the blood. Existing literature states that total thyroxine levels are lower in patients with hypothyroidism than in patients without it. First, a visualization of average TT4 results in patients with and without hypothyroidism will be created:

part1|>
  group_by(hypothyroid)|>
  summarize(meanTT4=mean(TT4)) |>
  ggplot(aes(x=hypothyroid,y=meanTT4))+geom_bar(stat="identity",fill="navy")+theme_bw()+
  labs(title="Mean Total Thyroxine for Patients With and Without Hypothyroidism",x="Patient Status for Hypothyroidism",
       y="Average TT4 Test Result Value (nanomoles/liter)")

Visually, there is a sizable difference in mean TT4 value between patients with and without hypothyroidism. Patients with hypothyroidism have the lower values, which matches what is expected based on existing literature.

Next, a 2-sample t-test with a null hypothesis that average TT4 levels do not differ between hypothyroidism and non-hypothyroidism patients will be conducted:

test2<-t.test(TT4 ~ hypothyroid, data = part1)
test2

## 
##  Welch Two Sample t-test
## 
## data:  TT4 by hypothyroid
## t = -29.465, df = 170.03, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group hypothyroid and group negative is not equal to 0
## 95 percent confidence interval:
##  -80.18967 -70.11956
## sample estimates:
## mean in group hypothyroid    mean in group negative 
##                  37.70074                 112.85535

The p-value is again less than 2.2e-16, which is incredibly low (there was a large difference between groups, so a p-value this low is to be expected). This means the null hypothesis that the average TT4 level is no different between hypothyroidism and non-hypothyroidism patients can be rejected. The normal range of TT4 test results is 57-148 nanomoles per liter. The mean TT4 result of about 37.7 for hypothyroidism patients falls below that range, which could have indicated a notable difference even without hypothesis testing.

Conclusions and Visualizations from the Two Blood Tests

Finally, to highlight the blood test result differences between patients with and without hypothyroidism, a graph plotting TSH and TT4 test results together with color based on hypothyroidism status will be displayed:

part1|>
  ggplot()+geom_point(mapping=aes(x=TSH,y=TT4,color=hypothyroid, shape=hypothyroid))+
  labs(title="TT4 vs. TSH Test Results Grouped By Hypothyroidism Status",x="TSH Test Results (microunits per milliliter)",
       y="TT4 Test Results (nanomoles per liter)")+theme_bw()

This graph, which has appeared in several of the data dives, is included again here because it illustrates the two groups very well. The differences in blood test results between patients with and without hypothyroidism are obvious on this graph. Hypothyroidism patients have higher TSH test results and lower TT4 test results than those without hypothyroidism.

Comparing Confounding Factors Between Patients With and Without Hypothyroidism

I am comparing the prevalence of goiter and tumors to determine whether these confounding health conditions are more prevalent in one group than the other. Both conditions are more prevalent in people with thyroid conditions, including both hyperthyroidism and hypothyroidism. Some people in the non-hypothyroidism group may have hyperthyroidism. I am limiting this part to these two confounding medical conditions for three reasons: the illness variable is vague, there are few lithium cases, and pregnancy can cause hyperthyroidism-like symptoms.

I will make another data subset to use for this part of the final project:

part2<-hypothyroid |>
  select(hypothyroid, goitre,tumor)
part2$goitre<- as.integer(as.logical(part2$goitre))
part2$tumor<- as.integer(as.logical(part2$tumor))

Goiter

Goiters are enlargements of the thyroid or irregular lumps of cell growth on the thyroid. Some may be perfectly benign. Others may change the level of thyroid activity, resulting in either hypothyroidism or hyperthyroidism symptoms. As a result, it would be interesting to see whether hypothyroidism patients tend to have goiters more often than individuals in the negative group.

breakdown<-part2|>
  group_by(hypothyroid,goitre)|>
  summarize(count=n())

## `summarise()` has grouped output by 'hypothyroid'. You can override using the
## `.groups` argument.

print(breakdown) #for determining the goiter counts used to build the graph

## # A tibble: 4 × 3
## # Groups:   hypothyroid [2]
##   hypothyroid goitre count
##   <chr>        <int> <int>
## 1 hypothyroid      0   145
## 2 hypothyroid      1     6
## 3 negative         0  2919
## 4 negative         1    93

goitercounts<-c(6/151,93/3012)
goiterlabels<-c("Hypothyroidism","Negative")
df<-data.frame(name=goiterlabels,value=goitercounts)
ggplot(data=df,aes(x=name,y=value))+geom_bar(stat="identity",fill="purple")+theme_bw()+ #map df's own columns
  labs(title="Proportion of Individuals with Goiter Based on Hypothyroidism Status",x="Hypothyroidism Status of the Patient",y="Proportion of Patients With Goiters")
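The proportions above are typed in by hand from the printed counts. The sketch below computes them directly from the 0/1 goiter column instead, so they would update automatically if the data or filtering changed. Because `part2` is not reproduced here, the sketch rebuilds a stand-in data frame (the hypothetical name `goiter_data`) from the counts printed in `breakdown`: 145 + 6 hypothyroid and 2919 + 93 negative. It relies on the fact that the mean of a 0/1 variable is the proportion of 1s:

```r
# Stand-in for part2, rebuilt from the counts printed in `breakdown`
goiter_data <- data.frame(
  hypothyroid = rep(c("hypothyroid", "hypothyroid", "negative", "negative"),
                    times = c(145, 6, 2919, 93)),
  goitre = rep(c(0L, 1L, 0L, 1L), times = c(145, 6, 2919, 93))
)

# Mean of a 0/1 indicator = proportion of patients with goiter in each group
goiter_props <- tapply(goiter_data$goitre, goiter_data$hypothyroid, mean)
goiter_props
```

In the project's own tidyverse style, the same proportions could be computed from the real subset with `part2 |> group_by(hypothyroid) |> summarize(prop = mean(goitre))`.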

There is a visible difference between the two groups in the proportion of patients with goiter. However, the numerical difference is rather small, and it seems plausible that this difference is due to random chance.

Hypothesis testing will still be conducted, with a null hypothesis that there is no difference in the proportion of patients with goiter between those with and without hypothyroidism:

test3<-t.test(goitre ~ hypothyroid, data = part2)
test3

## 
##  Welch Two Sample t-test
## 
## data:  goitre by hypothyroid
## t = 0.54489, df = 161.94, p-value = 0.5866
## alternative hypothesis: true difference in means between group hypothyroid and group negative is not equal to 0
## 95 percent confidence interval:
##  -0.02324582  0.04096303
## sample estimates:
## mean in group hypothyroid    mean in group negative 
##                0.03973510                0.03087649

The p-value is .5866, which is fairly high. If the null hypothesis were true, a difference in goiter proportions at least this large would occur about 59% of the time by random chance. There is therefore not sufficient evidence to reject the null hypothesis. This is reasonable considering that goiters do not necessarily result in hypothyroidism-like symptoms.
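Running a Welch t-test on a 0/1 variable approximates a comparison of two proportions. A more conventional choice for this comparison is a two-sample test of proportions, which R provides as `prop.test()`. The sketch below uses the goiter counts from `breakdown`: 6 of 151 hypothyroid patients and 93 of 3012 negative patients. Its output is not reproduced here. R may warn that the chi-squared approximation may be incorrect, because the expected number of hypothyroid patients with goiter is below 5:

```r
# Goiter counts from `breakdown`: patients with goiter and group sizes
goiter_x <- c(hypothyroid = 6, negative = 93)
goiter_n <- c(hypothyroid = 151, negative = 3012)

# Two-sample test for equality of proportions (chi-squared based);
# conf.level matches the project's .01 significance level
goiter_test <- prop.test(x = goiter_x, n = goiter_n, conf.level = 0.99)
goiter_test$estimate  # sample proportion with goiter in each group
goiter_test$p.value
```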

Tumor

Tumors on the thyroid, like goiters, can affect the function of the thyroid. Again, the effect of tumors on the thyroid is highly variable, so it was worth considering if hypothyroidism patients were more or less likely to have tumors on their thyroids:

breakdown2<-part2|>
  group_by(hypothyroid,tumor)|>
  summarize(count=n())

## `summarise()` has grouped output by 'hypothyroid'. You can override using the
## `.groups` argument.

print(breakdown2)

## # A tibble: 3 × 3
## # Groups:   hypothyroid [2]
##   hypothyroid tumor count
##   <chr>       <int> <int>
## 1 hypothyroid     0   151
## 2 negative        0  2972
## 3 negative        1    40

tumorcounts<-c(0/151,40/3012)
tumorlabels<-c("Hypothyroidism","Negative")
df2<-data.frame(name=tumorlabels,value=tumorcounts)
ggplot(data=df2,aes(x=name,y=value))+geom_bar(stat="identity",fill="maroon")+theme_bw()+ #map df2's own columns
  labs(title="Proportion of Individuals with Tumors Based on Hypothyroidism Status",x="Hypothyroidism Status of the Patient",y="Proportion of Patients With Tumors")

Interestingly, no patients in the hypothyroidism group had tumors. One possible reason is that hypothyroidism can also result from removal of the thyroid due to tumors. Patients in the hypothyroid group may therefore have had tumors at one point that were removed before data collection. There were some tumors in the negative group, though, which is interesting as well. Overall, this visualization suggests that the proportion of patients with tumors may differ between these two groups. However, the difference is so small that it is likely due to random chance.

However, hypothesis testing will be conducted once again, with a null hypothesis that there is no difference in the proportion of patients with tumors between those with and without hypothyroidism:

test4<-t.test(tumor~ hypothyroid, data = part2)
test4

## 
##  Welch Two Sample t-test
## 
## data:  tumor by hypothyroid
## t = -6.3659, df = 3011, p-value = 2.237e-10
## alternative hypothesis: true difference in means between group hypothyroid and group negative is not equal to 0
## 95 percent confidence interval:
##  -0.017370622 -0.009189803
## sample estimates:
## mean in group hypothyroid    mean in group negative 
##                0.00000000                0.01328021

The p-value is 2.237e-10, which is surprisingly low. As a result, the null hypothesis that the proportion of patients with tumors does not differ between patients with and without hypothyroidism can be rejected. This is by far the most interesting difference found so far: patients without hypothyroidism appear more likely to have tumors than those with hypothyroidism. I speculate that a difference between the groups in the proportion of people who have had thyroid surgery (examined later in this project) may explain why.
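One caveat on this test: every patient in the hypothyroid group has `tumor = 0`, so that group's sample variance is zero. The Welch standard error then comes entirely from the negative group. This is why the reported df is 3011, one less than the negative group's 3012 patients. In effect, the test asks whether the negative group's tumor proportion differs from exactly 0. Fisher's exact test is designed for sparse 2x2 tables like this one. The sketch below applies it to the counts from `breakdown2` (0 of 151 versus 40 of 3012); its result is not reproduced here:

```r
# 2x2 table of counts from `breakdown2`
tumor_table <- matrix(
  c(0, 151,     # hypothyroid: with tumor, without tumor
    40, 2972),  # negative:    with tumor, without tumor
  nrow = 2, byrow = TRUE,
  dimnames = list(status = c("hypothyroid", "negative"),
                  tumor = c("yes", "no"))
)

# Fisher's exact test of independence between status and tumor
tumor_test <- fisher.test(tumor_table)
tumor_test$p.value
```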

Conclusions from Confounding Conditions

One of the important things to note about confounding health conditions is that they too are not particularly common, and so the total number of cases of any of the confounding conditions in this dataset is less than the total number of cases of hypothyroidism. A dataset more focused on representing people with these confounding health conditions would likely give a clearer picture of any differences in hypothyroidism status between individuals suffering from these conditions.

However, the available data support a difference between those with and without hypothyroidism in the proportion of people who have tumors. They do not show such a difference in the proportion of people who have goiters.

Comparing Medical Interventions Between Patients With and Without Hypothyroidism

I will look at two types of medical interventions for this part of the project: thyroxine medication and thyroid surgery. Thyroxine medication is taken by people who cannot produce enough thyroxine, such as people with hypothyroidism, which makes it important to study. Thyroid surgery is also important to study because complete removal of the thyroid essentially leaves the individual with hypothyroidism. Without the thyroid, nothing produces thyroid hormones, so differences between groups would be expected.

I will make a third data subset to use for this part of the final project:

part3<-hypothyroid |>
  select(hypothyroid, on_thyroxine,thyroid_surgery)
part3$on_thyroxine<- as.integer(as.logical(part3$on_thyroxine))
part3$thyroid_surgery<- as.integer(as.logical(part3$thyroid_surgery))

Thyroxine Medication

Thyroxine medication is essentially synthetic thyroxine. It replaces thyroxine that the thyroid is not making, or not making in great enough quantity. The most commonly used is levothyroxine. This is a common treatment for those with hypothyroidism, so there should be a notable difference in use between groups.

breakdown3<-part3|>
  group_by(hypothyroid,on_thyroxine)|>
  summarize(count=n())

## `summarise()` has grouped output by 'hypothyroid'. You can override using the
## `.groups` argument.

print(breakdown3)

## # A tibble: 4 × 3
## # Groups:   hypothyroid [2]
##   hypothyroid on_thyroxine count
##   <chr>              <int> <int>
## 1 hypothyroid            0   137
## 2 hypothyroid            1    14
## 3 negative               0  2565
## 4 negative               1   447

thyroxinecounts<-c(14/151,447/3012)
thyroxinelabels<-c("Hypothyroidism","Negative")
df3<-data.frame(name=thyroxinelabels,value=thyroxinecounts)
ggplot(data=df3,aes(x=name,y=value))+geom_bar(stat="identity",fill="darkred")+theme_bw()+ #map df3's own columns
  labs(title="Proportion of Individuals On Thyroxine Medication Based on Hypothyroidism Status",x="Hypothyroidism Status of the Patient",y="Proportion of Patients Taking Thyroxine Medication")

Interestingly, the proportion of patients on thyroxine medication is actually larger in the group without hypothyroidism than in the group with hypothyroidism, which is not what was expected. Perhaps there are individuals with low thyroxine levels in the negative group that have low thyroxine due to other confounding factors or are at sufficiently low thyroxine levels to be treated with thyroxine but not low enough levels to be diagnosed with hypothyroidism. This is one of the more sizable differences between proportions, so it seems plausible this difference may not be due to random chance.

However, hypothesis testing will be conducted once again, with a null hypothesis that there is no difference in the proportion of patients taking thyroxine medication between those with and without hypothyroidism:

test5<-t.test(on_thyroxine~ hypothyroid, data = part3)
test5

## 
##  Welch Two Sample t-test
## 
## data:  on_thyroxine by hypothyroid
## t = -2.2684, df = 173.25, p-value = 0.02454
## alternative hypothesis: true difference in means between group hypothyroid and group negative is not equal to 0
## 95 percent confidence interval:
##  -0.104149358 -0.007232927
## sample estimates:
## mean in group hypothyroid    mean in group negative 
##                0.09271523                0.14840637

The p-value is .02454, which is fairly low. If the null hypothesis were true, a difference at least this large would occur only about 2.5% of the time by random chance. However, the significance level selected earlier was .01, which is lower than the p-value for this test. As a result, by the set standard, there is not sufficient evidence to reject the null hypothesis. That null hypothesis is that the proportion of patients on thyroxine does not differ between patients who have hypothyroidism and those who do not. This is an interesting result as well, although the evidence is weaker than it was for tumors. Why so many individuals in the negative group may be on thyroxine could be worth a more in-depth analysis on its own. Due to the time constraints of this project, however, it will not be analyzed further here.

Thyroid Surgery

One treatment of thyroid conditions can be thyroid surgery, where part or all of the thyroid is removed. In the case of complete thyroid removal, patients are left without the capability to create thyroid hormones, which essentially leaves them with hypothyroidism as mentioned above. As a result, it is expected that individuals with hypothyroidism are more likely to have had thyroid surgery than those without hypothyroidism.

breakdown4<-part3|>
  group_by(hypothyroid,thyroid_surgery)|>
  summarize(count=n())

## `summarise()` has grouped output by 'hypothyroid'. You can override using the
## `.groups` argument.

print(breakdown4)

## # A tibble: 4 × 3
## # Groups:   hypothyroid [2]
##   hypothyroid thyroid_surgery count
##   <chr>                 <int> <int>
## 1 hypothyroid               0   141
## 2 hypothyroid               1    10
## 3 negative                  0  2918
## 4 negative                  1    94

surgerycounts<-c(10/151,94/3012)
surgerylabels<-c("Hypothyroidism","Negative")
df4<-data.frame(name=surgerylabels,value=surgerycounts)
ggplot(data=df4,aes(x=name,y=value))+geom_bar(stat="identity",fill="darkorange")+theme_bw()+ #map df4's own columns
  labs(title="Proportion of Individuals Who Had Thyroid Surgery By Disease State",x="Hypothyroidism Status of the Patient",y="Proportion of Patients Who Have Had Thyroid Surgery")

There is a notable difference between the two groups in the proportion of patients who have had thyroid surgery, and as expected, the proportion of patients who have had thyroid surgery is higher in individuals with hypothyroidism. However, these are small proportions being considered, and it is still reasonable that the observed difference could be due to random chance.

As a result, hypothesis testing will be conducted once again, with a null hypothesis that there is no difference in the proportion of patients who have had thyroid surgery between those with and without hypothyroidism:

test6<-t.test(thyroid_surgery~ hypothyroid, data = part3)
test6

## 
##  Welch Two Sample t-test
## 
## data:  thyroid_surgery by hypothyroid
## t = 1.704, df = 157.39, p-value = 0.09036
## alternative hypothesis: true difference in means between group hypothyroid and group negative is not equal to 0
## 95 percent confidence interval:
##  -0.005572758  0.075606091
## sample estimates:
## mean in group hypothyroid    mean in group negative 
##                0.06622517                0.03120850

The p-value is .09036, which is fairly low. If the null hypothesis were true, a difference at least this large would occur about 9% of the time by random chance. Again, because of the significance level selected, this is not low enough to reject the null hypothesis. That null hypothesis is that the proportion of patients who have had thyroid surgery does not differ between those with and without hypothyroidism. A difference between groups may be less evident than expected because doctors may be reluctant to label individuals without thyroids as having hypothyroidism. Alternatively, not all thyroid surgery is complete removal of the thyroid. Some of the people in the negative group who had thyroid surgery may not have lost enough thyroid function to be classified as having hypothyroidism.

Conclusions from Medical Interventions

Interestingly, neither medical intervention differed enough between groups to reject the null hypothesis that the proportions are the same for those with and without hypothyroidism. I had initially expected those differences to be clear-cut. Some of this may be due to confounding conditions, reluctance to label patients without thyroids as having hypothyroidism, or differences in the extent of surgery or medication received. As with the observed difference in tumors, there is room for further analysis of why differences were not detected here. Project limitations mean that this analysis will not be done here.

Building a Logistic Regression Model for Hypothyroidism

This section will not be in the presentation for this project. Its main purpose is to see how good a model can be built from the 3 differences observed in this project.

part4<-hypothyroid |>
  select(hypothyroid,tumor,TSH,TT4) |>
  filter(!is.na(TSH)) |> #removing null TSH rows
  filter(!is.na(TT4)) |> #removing null TT4 rows
  filter(TSH<=150)|>
  filter(TSH>0) |> #to handle TSH outliers and prepare for taking the log of TSH
  mutate(hypothyroid= ifelse(hypothyroid=="hypothyroid",1,0))
part4$tumor<- as.integer(as.logical(part4$tumor))
part4

## # A tibble: 1,785 × 4
##    hypothyroid tumor   TSH   TT4
##          <dbl> <int> <dbl> <dbl>
##  1           1     0  30    15  
##  2           1     0 145    19  
##  3           1     0   7.3  57  
##  4           1     0 138    27  
##  5           1     0   7.7  54  
##  6           1     0  21    34  
##  7           1     0  92    39  
##  8           1     0  48     7.6
##  9           1     0  21    53  
## 10           1     0  36    38  
## # ℹ 1,775 more rows

Next, a logistic regression model will be built:

model <- glm(hypothyroid~tumor+log(TSH)+TT4, data = part4, family = binomial(link = 'logit')) #log(TSH) is used based on lack of normality of TSH as determined in past data dives
summary(model)

## 
## Call:
## glm(formula = hypothyroid ~ tumor + log(TSH) + TT4, family = binomial(link = "logit"), 
##     data = part4)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.430970   0.646959  -0.666    0.505    
## tumor       -12.237776 722.471373  -0.017    0.986    
## log(TSH)      1.214736   0.140309   8.658   <2e-16 ***
## TT4          -0.062536   0.007308  -8.557   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 956.64  on 1784  degrees of freedom
## Residual deviance: 294.38  on 1781  degrees of freedom
## AIC: 302.38
## 
## Number of Fisher Scoring iterations: 16

Interestingly, the tumor coefficient was not significant, suggesting a better model could be built using just log(TSH) and TT4. The intercept was also not significant, though this matters less, since the intercept only describes the log-odds when all predictors are 0. Logistic regression coefficients are on the log-odds scale, so each coefficient has to be exponentiated (e raised to the power of the coefficient) to be read as an odds ratio:

tumorco<-exp(-12.237776)
TSHco<-exp(1.214736)
TT4co<-exp(-.062536)
intercept<-exp(-0.430970) #the intercept estimate is negative
paste("Tumor Coefficient:",round(tumorco,3),
      "Log(TSH) Coefficient:", round(TSHco,3),
      "TT4 Coefficient:",round(TT4co,3),
      "Intercept:", round(intercept,3))

## [1] "Tumor Coefficient: 0 Log(TSH) Coefficient: 3.369 TT4 Coefficient: 0.939 Intercept: 0.65"

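From here, the coefficients can be interpreted as odds ratios. For tumors, the estimated odds ratio rounds to 0, which would imply an almost 100% decrease in the odds of having hypothyroidism. However, no patients in the hypothyroidism group had tumors, so this estimate is unstable and should not be interpreted. Signs of this include its standard error of about 722, its p-value of .986, and the 16 Fisher scoring iterations. For every increase of 1 in the log of TSH, the odds of having hypothyroidism are multiplied by about 3.37, a 237% increase. For every 1 nanomole per liter increase in TT4 test results, there is a 6.1% decrease in the odds of having hypothyroidism.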
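The percent change in the odds per one-unit increase in a predictor is (odds ratio - 1) x 100. The sketch below makes that arithmetic explicit, using the estimates printed in `summary(model)` above (the vector name `est` is just for illustration). With the fitted model itself, `exp(coef(model))` returns all the odds ratios at once, and `exp(confint.default(model))` gives Wald confidence intervals for them:

```r
# Estimates copied from summary(model) above
est <- c(log_TSH = 1.214736, TT4 = -0.062536)

odds_ratio <- exp(est)              # multiplicative change in the odds
pct_change <- (odds_ratio - 1) * 100  # percent change in the odds

round(odds_ratio, 3)  # 3.369 and 0.939
round(pct_change, 1)  # 236.9 and -6.1
```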

Overall, this model indicates that the main differences between patients with and without hypothyroidism are their TSH and TT4 test results. This is consistent with these tests being the gold standards for diagnosing hypothyroidism.
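The model summary does not directly measure how well the model classifies patients. One common check converts predicted probabilities from `predict(..., type = "response")` into classes with a 0.5 cutoff and tabulates them against the true labels. The sketch below applies that check to hypothetical simulated data. The `sim` data frame, its distributions and the coefficients used to generate outcomes are all made up, and tumor is left out for simplicity:

```r
# Hypothetical simulated data loosely shaped like part4 (NOT the real data)
set.seed(7)
n <- 1000
sim <- data.frame(TSH = exp(rnorm(n, mean = 0.5, sd = 1.2)),
                  TT4 = rnorm(n, mean = 105, sd = 30))

# Generate 0/1 outcomes from a logistic model with made-up coefficients
lin <- 2 + 1.2 * log(sim$TSH) - 0.06 * sim$TT4
sim$hypothyroid <- rbinom(n, size = 1, prob = plogis(lin))

fit <- glm(hypothyroid ~ log(TSH) + TT4, data = sim,
           family = binomial(link = "logit"))

# Predicted probabilities, classified with a 0.5 cutoff (an arbitrary choice)
pred_prob <- predict(fit, type = "response")
pred_class <- as.integer(pred_prob > 0.5)

# Confusion matrix and overall in-sample accuracy
conf_mat <- table(predicted = pred_class, actual = sim$hypothyroid)
conf_mat
accuracy <- mean(pred_class == sim$hypothyroid)
accuracy
```

With the real data, `fit` and `sim` would be replaced by `model` and `part4`. Accuracy measured on the same data used to fit the model tends to be optimistic. Because hypothyroidism cases are a minority of the dataset, the confusion matrix is more informative than accuracy alone.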

Overall Conclusions

This analysis found 3 differences between individuals with and without hypothyroidism for which the null hypothesis of no difference between the two groups was rejected. Individuals with hypothyroidism had higher TSH test results, lower TT4 test results, and a lower proportion of individuals with tumors compared to individuals without hypothyroidism. A logistic regression model was built from those three variables, using log(TSH) in place of TSH due to the non-normality of TSH. It suggests that the difference in the proportion of tumors is not particularly notable either. The two main differences between patients with and without hypothyroidism are their TSH and TT4 test results.

There is room for further analysis of why medical interventions like thyroxine medication and thyroid surgery were not notably different between groups. The same is true of why tumors were present at a higher rate in patients without hypothyroidism. However, this dataset is insufficient to analyze these differences thoroughly. Future studies using datasets more tailored to these questions could further illuminate why these differences were not seen in this dataset and whether they should have been observed at all.

Intro to Stats in R- Final Project

Teresa Ortyl

2023-11-27