Introduction

Describe the population parameter of interest in words/context, your research question about its value including your initial conjecture of its value (that makes sense in the context), and whether you suspected (before you saw any data) the actual parameter value is higher or lower (or just different) than this conjectured value.

This investigation aimed to determine whether people with certain blood types have higher or lower IQs than those with a different blood type. The population parameter of interest is the difference between the mean IQs of groups with differing blood types. For this population, there were three recorded blood types: delta, lambda, and beta. In this study, I focus only on the difference between the mean IQ of those with beta blood type and those with delta blood type, denoted by \(\mu_{\beta}-\mu_{\delta}\); the research question is whether this difference is zero. Before collecting or analyzing any data, I suspected that there was a slight difference between the two means, but not a substantial one. In real humans, some blood types are associated with lower risk for certain diseases, such as strokes and heart attacks. I guessed that cognitive functioning could be another process affected by the different clotting processes in the human body. If the islanders truly mimic humans, then this theoretical difference may be measurable in the population. However, as I am not a biologist and have no idea whether this theory has any scientific basis, my initial conjecture was that the true value of the parameter is zero or close to it; if it is not zero, I suspected that it would be small enough that the difference could be explained by natural variation in the population.

Data Collection Methods

What were the observational units? How was the variable measured? Did anything go wrong during the course of the study? Note: You can never give me too much detail in this section! In particular, there should be enough information that someone else could replicate your study on their own based only on your description (and hopefully improve upon it based on your suggestions in section 4: Conclusion). You are describing your study protocol where someone else could mimic exactly the same study that you carried out. Discuss any potential sources of sampling or non-sampling errors. For example: What was the response rate? How often did you have to make repeat visits in order to obtain the observational units initially selected?

The Data: https://docs.google.com/spreadsheets/d/1wwRJlgiHLewo01yoFWI587-XU99tU5VfyqXcZUJO28U/edit?usp=sharing

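library(readr)   # provides read_csv()
library(mosaic)  # provides favstats(), dotPlot(), stat(), pval()
# read_csv() already returns the data as a tibble, so no data() call is needed
IQdata <- read_csv('~/Final Project Data - R Format (1).csv')
print(IQdata)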
## # A tibble: 102 × 2
##       IQ BType 
##    <dbl> <chr> 
##  1   117 Lambda
##  2   118 Delta 
##  3   119 Lambda
##  4   108 Lambda
##  5   123 Delta 
##  6   145 Lambda
##  7   121 Lambda
##  8   114 Lambda
##  9   106 Beta  
## 10   115 Beta  
## # … with 92 more rows
# Same data with the Lambda rows removed, for the two-group comparison
reduced <- read_csv('~/Final Project Data - No Lambda.csv')
print(reduced)
## # A tibble: 77 × 2
##       IQ BType
##    <dbl> <chr>
##  1   118 Delta
##  2   123 Delta
##  3   106 Beta 
##  4   115 Beta 
##  5   118 Beta 
##  6    93 Delta
##  7   104 Beta 
##  8   136 Delta
##  9   103 Beta 
## 10   111 Delta
## # … with 67 more rows

The observational units were the individual islanders. To make the sample as diverse as possible, I collected data on at least three individuals from each of the towns on the three islands. In each town, I used random.org’s random number generator to choose 6 numbers corresponding to house numbers. I restricted my sample to adults between the ages of 18 and 65. For each of the 6 houses, I recorded how many adults in that age range lived there and used the random number generator again to determine which adult I would approach. The inhabitants were listed on the information card for the house; I designated the top inhabitant as number 1, the next inhabitant as number 2, and so on.

Once an individual was selected, I asked for their consent to participate in my study; if they declined, I moved on to the next individual. If they consented, I first determined their blood type using the website’s task list and then administered an IQ test, again using the website’s task list. I recorded the individual’s blood type, IQ score, house number, the number of individuals residing in the house, and whether the individual was the first, second, third, etc. inhabitant listed for the home.

Two things went wrong during the study. First, I initially wanted to examine the relationship between resting heart rate and blood type. Unfortunately, I failed to realize that the available task measured the current heart rate rather than the resting heart rate. I had collected the heart rate data during the daytime and evening of a Saturday. I decided to change my second measurement from heart rate to IQ, and I collected the IQ data on Sunday morning. Second, I had not initially restricted my sample to adults of any specific age, so I had measurements taken from children and the elderly. I decided to restrict the sample to adults, as children and the elderly might skew the sample means due to differing cognitive abilities. Thus, I had to remove the individuals who were not within the age range, as well as a few individuals who had withdrawn from the study overnight. I used my original method to choose new individuals, 29 of whom consented.

A potential source of non-sampling error is nonresponse: some of the selected individuals declined to participate, and I am unsure whether their absence made my sample less representative or less reliable.
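
The selection procedure described above could also be scripted. The following is a minimal sketch in R, not the random.org workflow actually used; the number of houses in a town (`n_houses`) and the number of eligible adults in a house (`n_adults`) are hypothetical placeholders, since those counts are not listed in this report.

```r
# Sketch of the two-stage random selection described above.
# n_houses and n_adults are hypothetical placeholders, not values from the study.
set.seed(2024)  # arbitrary seed so the sketch is reproducible

n_houses <- 120  # hypothetical number of houses in one town

# Stage 1: choose 6 distinct house numbers at random
selected_houses <- sort(sample.int(n_houses, size = 6))

# Stage 2: within a house, choose one eligible adult (aged 18 to 65) at random.
# Inhabitants are numbered 1, 2, ... from the top of the information card.
pick_adult <- function(n_adults) {
  sample.int(n_adults, size = 1)
}

n_adults <- 3  # hypothetical count of eligible adults in one selected house
chosen_adult <- pick_adult(n_adults)

print(selected_houses)
print(chosen_adult)
```

Scripting the draw with a recorded seed would make the selection reproducible, which the random.org approach does not allow.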

Descriptive Statistics

Numerical and visual summaries that are appropriate for your data.

For two categorical variables in your data set, describe them, and include a two-way table, and discuss any relevant proportions. Does there appear to be an association between the two variables?

For the quantitative response variable and binary categorical explanatory variable describe each of the variables, including a side-by-side boxplot, and discuss it. Does there appear to be an association between the two variables? If so, describe it. Also, use some summary statistics to compare the groups.

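# Histograms of IQ, one panel per blood type
histogram(~ IQ | BType, data = IQdata, width = 9)

# Dot plots of IQ, one panel per blood type
dotPlot(~ IQ | BType, data = IQdata, cex = 0.45, width = 9)

# Side-by-side boxplots of IQ by blood type
bwplot(BType ~ IQ, horizontal = TRUE, data = IQdata)

# Five-number summary, mean, standard deviation, and sample size by blood type
favstats(IQ ~ BType, data = IQdata)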

The variables of interest were the islanders' IQ and blood type. IQ is a quantitative variable, and blood type is a categorical variable with three possible values: delta, lambda, and beta. For the purpose of this project, I analyze only the difference between the groups with beta and delta blood types. At first glance, there does not appear to be a strong association: the five-number summaries of the two groups overlap considerably, although the delta group sits somewhat higher. The five-number summary for the group with beta blood type is 77, 88, 99, 111, 130, and the mean and standard deviation of the distribution are 100.49 and 14.54, respectively. The five-number summary for the group with delta blood type is 73, 97, 107, 119, 136, and the mean and standard deviation of the distribution are 107.22 and 15.59, respectively.
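
As a rough illustration of how the group summaries are computed, the sketch below uses base R on only the ten rows of `reduced` that appear in the printout earlier in this report. It is not the full data set, so its results will not match the values reported above. The names `excerpt`, `group_means`, `group_sds`, and `diff_means` are introduced here for illustration.

```r
# First ten rows of `reduced`, copied from the printout in this report.
# This is an excerpt, so the numbers differ from the full-sample summaries.
excerpt <- data.frame(
  IQ    = c(118, 123, 106, 115, 118, 93, 104, 136, 103, 111),
  BType = c("Delta", "Delta", "Beta", "Beta", "Beta",
            "Delta", "Beta", "Delta", "Beta", "Delta")
)

# Mean and standard deviation of IQ within each blood type
group_means <- tapply(excerpt$IQ, excerpt$BType, mean)
group_sds   <- tapply(excerpt$IQ, excerpt$BType, sd)

# Observed statistic: beta mean minus delta mean
diff_means <- unname(group_means["Beta"] - group_means["Delta"])

print(group_means)
print(group_sds)
print(diff_means)
```

Replacing `excerpt` with `reduced` would give the full-sample versions of these quantities.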

Analysis of Results

In carrying out a test of significance and a confidence interval about your population parameter(s), make sure you:

  1. Define the population(s) and parameter(s) (again) in words

The population of interest is all adults between the ages of 18 and 65 on all three of the islands. The parameter of interest is the difference between the mean IQ of islanders with beta blood type and the mean IQ of those with delta blood type; in symbols, this is \(\mu_{type \beta} - \mu_{type \delta}\).

  2. State the null and alternative hypotheses in symbols and in words

The null hypothesis is that there is no difference in the population means - that is, the average IQ of islanders with the beta blood type equals that of the islanders with the delta blood type. In symbols, this is \(H_0: \mu_{type \beta}-\mu_{type \delta} = 0\), which can also be represented as \(H_0: \mu_{type \beta}= \mu_{type \delta}\). The alternative hypothesis is that there is a difference between the mean IQ of islanders with the beta blood type and that of islanders with the delta blood type. In symbols, \(H_a: \mu_{type \beta}-\mu_{type \delta} \neq 0\), or \(H_a: \mu_{type \beta} \neq \mu_{type \delta}\).

  3. State what a type I and a type II error would represent in this setting

A type I error would occur if I incorrectly rejected the null hypothesis, that is, if I concluded that the two population means differ when they are in fact equal. In this case, I would incorrectly state that there is enough evidence of a difference between the mean IQ of individuals with beta blood type and those with delta blood type. A type II error would occur if I failed to reject the null hypothesis when it is false and there really is a difference between the two population means. In this case, I would state that there is not enough evidence that the difference between the two population means differs from 0 when it actually does.

  4. Discuss/justify whether or not your measurements can reasonably be considered a representative sample from the population(s) of interest

My sampling method was random: a random number generator determined which houses I visited and which adult in each house I asked to participate. I believe that this is as close to a random sample as is possible, and that the sample is representative because I sampled people from all of the towns on every island. However, some of the selected residents declined to participate, and this nonresponse may have reduced how representative the sample is.

  5. Use a theory-based approach and appropriate R code to
     a. Find an appropriate standardized statistic and comment on appropriate validity conditions

In order to use the theory-based approach, there need to be at least 20 observations in each group and the distributions of the groups cannot have a strong skew. The distribution of IQs of individuals with the beta blood type is a bit skewed, but not strongly skewed. Therefore, we can use the theory-based approach to find the standardized statistic.
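
The sample-size part of this condition can be checked directly from the data. The helper below is a hypothetical sketch (the name `meets_size_condition` does not come from the textbook or from the mosaic package); it counts the observations in each group and compares every count with the threshold of 20.

```r
# Hypothetical helper: does every group have at least `min_n` observations?
meets_size_condition <- function(groups, min_n = 20) {
  counts <- table(groups)  # number of observations per group
  all(counts >= min_n)
}

# Intended use with the data in this report (requires the CSV to be loaded):
# table(reduced$BType)                 # group sizes
# meets_size_condition(reduced$BType)  # TRUE if both groups have 20 or more
```

The skewness part of the condition still has to be judged from the histograms or dot plots.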

stat(t.test(IQ ~ BType, data = reduced))
##         t 
## -1.951159

The standardized statistic has the value of -1.95. According to the guidelines in the textbook, a standardized statistic below -1.5 or above 1.5 provides at least moderate evidence against the null hypothesis, and a value of -1.95 falls in the moderate range. This means that there is evidence against the null hypothesis, but it is not very strong.
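
For reference, `t.test` in R uses the unequal-variance (Welch) form of the two-sample statistic by default. Written with the sample means and standard deviations from the descriptive statistics, and with the group sizes \(n_{\beta}\) and \(n_{\delta}\) left as symbols because they are not listed in this report, the statistic is:

\[
t = \frac{\bar{x}_{\beta} - \bar{x}_{\delta}}{\sqrt{\dfrac{s_{\beta}^{2}}{n_{\beta}} + \dfrac{s_{\delta}^{2}}{n_{\delta}}}}
  = \frac{100.49 - 107.22}{\sqrt{\dfrac{14.54^{2}}{n_{\beta}} + \dfrac{15.59^{2}}{n_{\delta}}}}
  = \frac{-6.73}{SE} \approx -1.95
\]

Working backwards from the reported statistic, the standard error of the difference is roughly \(6.73 / 1.95 \approx 3.45\) IQ points.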

     b. Find the p-value corresponding to your alternative hypothesis and provide a one-sentence interpretation of the p-value in context (use the definition of the p-value: i.e. probability of observing … assuming … is true)

pval(t.test(IQ ~ BType, data = reduced))
##   p.value 
## 0.0549249

The p-value is the probability of observing a sample statistic as extreme as or more extreme than the one observed in this study, assuming the null hypothesis is true. In this case, the p-value is 0.0549: if the mean IQs of islanders with beta and delta blood types were truly equal, there would be about a 5.5% chance of observing a difference in sample means at least as large, in either direction, as the one in my sample. According to the guidelines in the textbook, a p-value larger than 0.05 but smaller than 0.10 provides moderate evidence against the null hypothesis. This reinforces the result from the standardized statistic - we have again found that there is moderate evidence suggesting that the difference between the mean IQ of individuals with the beta blood type and those with the delta blood type is not zero.
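
The two-sided p-value is the area in both tails of the t distribution beyond the observed statistic. The sketch below illustrates that calculation on the ten-row excerpt of `reduced` printed earlier in this report (not the full data, so the numbers differ from those above) and checks it against the value that `t.test` reports. The names `excerpt`, `t_stat`, `df`, and `manual_p` are introduced here for illustration.

```r
# First ten rows of `reduced`, copied from the printout in this report (excerpt only)
excerpt <- data.frame(
  IQ    = c(118, 123, 106, 115, 118, 93, 104, 136, 103, 111),
  BType = c("Delta", "Delta", "Beta", "Beta", "Beta",
            "Delta", "Beta", "Delta", "Beta", "Delta")
)

# Welch two-sample t-test (R's default), same call form as used in this report
tt <- t.test(IQ ~ BType, data = excerpt)

t_stat <- unname(tt$statistic)  # standardized statistic
df     <- unname(tt$parameter)  # Welch-adjusted degrees of freedom

# Two-sided p-value: probability, assuming the null hypothesis is true,
# of a t statistic at least as far from 0 as the one observed
manual_p <- 2 * pt(-abs(t_stat), df = df)

print(c(manual = manual_p, reported = tt$p.value))
```

The two printed values should agree, which shows that the p-value comes from nothing more than the standardized statistic and its degrees of freedom.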

     c. Indicate what statistical decision this p-value leads you to draw about the null hypothesis

Because the p-value of 0.0549 provides only moderate evidence against the null hypothesis, I do not reject the null hypothesis.

     d. State your conclusion in the context of the problem

The p-value and the standardized statistic both show that there is moderate evidence against the null hypothesis. I hesitate to reject the null hypothesis for fear of making a type I error. I do not think the evidence is strong enough to reject the null hypothesis, although the moderate evidence suggests that there may be a difference in the population mean IQs of the two blood types, probably a quite small one.

     e. Use R to find an appropriate confidence interval to describe the plausible values of your population parameter

confint(t.test(IQ ~ BType, data = reduced))

     f. Interpret the confidence interval in the context of the problem. Make sure to also comment on whether zero is included in the confidence interval. Compare your conclusion to the conclusion in 5d

The 95% confidence interval spans from -13.615 to 0.145: we are 95% confident that the true difference in mean IQ (beta minus delta) lies in this interval. Because the interval contains zero, a difference of zero is a plausible value of the parameter, so there is not enough evidence to reject the null hypothesis; rejecting it would mean concluding that there is a difference in the mean IQ of the population with type beta blood and those with type delta blood. This agrees with the conclusion from the significance test above.
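
The interval has the usual form of a statistic plus or minus a margin of error. Its center and half-width can be recovered from the reported endpoints alone:

\[
(\bar{x}_{\beta} - \bar{x}_{\delta}) \pm t^{*} \cdot SE:
\qquad \frac{-13.615 + 0.145}{2} = -6.735,
\qquad \frac{0.145 - (-13.615)}{2} = 6.880
\]

So the interval is approximately \(-6.735 \pm 6.880\). The margin of error is slightly larger than the observed difference in sample means, which is why the interval just reaches past zero.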

Conclusion

Summarize the results of your study (there will be some repetition, and you should cite your evidence). You should tell a story: What did you learn? Did the data behave as you expected? Pay particular attention to whether or not it is reasonable to generalize your sample to the larger population or process. Is there anything you would do differently next time? What similar questions might someone choose to investigate in the future to build on your results?

Based on all three measures of evidence, we cannot reject the null hypothesis. The p-value of 0.0549 and the standardized statistic of -1.95 both provide only moderate evidence against the null hypothesis, so in order to avoid a type I error, we should not reject it. Additionally, the 95% confidence interval (-13.615 to 0.145) includes 0, meaning that a difference of zero remains plausible. Overall, the data behaved as expected: mean IQ varied slightly with blood type, but not by enough to provide convincing evidence against the null hypothesis. If I were to redo this study, I would increase my sample size. I would also use an ANOVA test to compare the mean IQs of all three blood type groups, as opposed to comparing only the beta and delta groups. Because of my random sampling technique, the results can reasonably be generalized to the greater population of adults aged 18 to 65 inhabiting the islands, with some caution because a number of selected individuals declined to participate. In order to further study the effects that blood type has on cognitive abilities, future participants should be required to complete more tests that measure cognitive functioning. Overall, even though I couldn’t reject my null hypothesis, I found the research itself fascinating.
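
As a sketch of the suggested follow-up, a one-way ANOVA comparing mean IQ across all three blood types could be run with base R's `aov`. The example uses only the ten rows of `IQdata` shown in the printout earlier in this report, so it illustrates the call and is not a result of this study; the names `excerpt3`, `fit`, and `anova_table` are introduced here for illustration.

```r
# First ten rows of `IQdata`, copied from the printout in this report (excerpt only)
excerpt3 <- data.frame(
  IQ    = c(117, 118, 119, 108, 123, 145, 121, 114, 106, 115),
  BType = c("Lambda", "Delta", "Lambda", "Lambda", "Delta",
            "Lambda", "Lambda", "Lambda", "Beta", "Beta")
)

# One-way ANOVA: the null hypothesis is that all three population mean IQs are equal
fit <- aov(IQ ~ BType, data = excerpt3)

# ANOVA table with columns Df, Sum Sq, Mean Sq, F value, Pr(>F)
anova_table <- summary(fit)[[1]]

print(anova_table)
```

With the full data loaded, the same call would be `aov(IQ ~ BType, data = IQdata)`, and a small p-value in the table would indicate that at least one group mean differs from the others.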

Presentation: Style, organization, layout, grammar, presentation of a written report, creativity. Make sure to cite any work/studies you used to come up with your research question.

R code: Also, make sure that all relevant R code and output are in the body of your report.