Describe the population parameter of interest in words/context, your research question about its value including your initial conjecture of its value (that makes sense in the context), and whether you suspected (before you saw any data) the actual parameter value is higher or lower (or just different) than this conjectured value.
The population parameters of interest in this study are the average (mean) number of children of people born in the year 275, \(\mu_{275}\), and of people born in the year 277, \(\mu_{277}\), in the city of Pauma. The parameter I am testing is the difference between them, \(\mu_{275} - \mu_{277}\). My research question is whether the year someone is born in (275 or 277) has an effect on how many children they have. My conjectured value for this difference is 0, meaning birth year makes no difference, and this is my null hypothesis. Before seeing any data, I suspected that the actual difference is higher than 0: that people born in the year 275 have more children on average than people born in the year 277. This is my alternative hypothesis.
What were the observational units? How was the variable measured? Did anything go wrong during the course of the study? Note: You can never give me too much detail in this section! In particular, there should be enough information that someone else could replicate your study on their own based only on your description (and hopefully improve upon it based on your suggestions in section 4: Conclusion). You are describing your study protocol where someone else could mimic exactly the same study that you carried out. Discuss any potential sources of sampling or non-sampling errors. For example: What was the response rate? How often did you have to make repeat visits in order to obtain the observational units initially selected?
The observational units were 60 people from the city of Pauma, located on the island of Bonne Santa in “The Islands” applet: 30 people born in the year 275 and 30 people born in the year 277. Two variables were recorded for each person. The response variable is the number of children the person has, which is quantitative; I then averaged it within each group of 30. The explanatory variable is birth year, which is categorical here because I only compared two years, 275 and 277.

To select the sample, I went to the Pauma city hall in the applet and looked at its birth records. I assigned a number to every person born in 275 and to every person born in 277, and then used a random number generator to select 30 people from each year, for 60 people in total. This gave a simple random sample from each birth year. The process was somewhat time-consuming, but random selection avoids selection bias, so the sample should be more representative of Pauma than a sample chosen by hand. Nothing went wrong during the study, although some of the results differed quite a bit from my predictions.
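The numbering and random-number-generator step can be sketched in R with `sample()`. This is a minimal sketch, not the exact procedure used in the applet: the record counts `n_records_275` and `n_records_277` are hypothetical placeholders, because the actual number of 275 and 277 birth records in Pauma is not reported here.

```r
# Hypothetical numbers of birth records found at Pauma's city hall
# (placeholders only; replace with the real counts from the records)
n_records_275 <- 120
n_records_277 <- 110

# Fix the seed so the same random selection can be reproduced later
set.seed(247)

# Each person in the records gets an ID from 1 to N;
# sample() draws 30 distinct IDs (without replacement) for each year
ids_275 <- sample(seq_len(n_records_275), size = 30)
ids_277 <- sample(seq_len(n_records_277), size = 30)

# Sorted IDs, ready to look up in the birth records
sort(ids_275)
sort(ids_277)
```

Setting a seed is a design choice that makes the random selection repeatable, which fits the goal of a protocol someone else can follow exactly.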
In carrying out a test of significance and a confidence interval about your population parameter(s), make sure you:
- Define the population(s) and parameter(s) (again) in words
The populations in my study are people born in the year 275 and people born in the year 277 in the city of Pauma, on the island of Bonne Santa in “The Islands” applet. From Pauma I sampled 60 people in total: 30 born in 275 and 30 born in 277. The variables are birth year (275 or 277) and the number of children each person has. The parameters are the average number of children of everyone born in 275 in Pauma (\(\mu_{275}\)) and of everyone born in 277 in Pauma (\(\mu_{277}\)), and I am estimating their difference, \(\mu_{275} - \mu_{277}\). In my sample, people born in 275 had 2.533 children on average, while people born in 277 had 1.733 children on average. These sample means are estimates of the parameters, not the parameters themselves.
- State the null and alternative hypotheses in symbols and in words
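Null: Birth year (275 vs. 277) has no effect on the number of children someone has throughout their lifetime; the average number of children is the same for both years. In symbols: \(H_{0} : \mu_{275} = \mu_{277}\), where \(\mu_{275}\) and \(\mu_{277}\) are the population average numbers of children of people born in Pauma in 275 and 277.
Alternative: People born in 275 have, on average, more children throughout their lifetime than people born in 277. In symbols: \(H_{A} : \mu_{275} > \mu_{277}\)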
- State what a type I and a type II error would represent in this setting
A Type I error means rejecting a null hypothesis that is actually true, and a Type II error means failing to reject a null hypothesis that is actually false.
Type I error: In the context of this study, a Type I error would mean concluding that people born in 275 have more children on average than people born in 277 when, in reality, birth year has no effect on the average number of children. Type II error: In the context of this study, a Type II error would mean failing to conclude that people born in 275 have more children on average than people born in 277 when, in reality, they do.
- Discuss/justify whether or not your measurements can reasonably be considered a representative sample from the population(s) of interest
Because I randomly selected people from Pauma's birth records for each year, I think my sample can reasonably be considered representative of people born in 275 and 277 in Pauma. However, I don't think my data can be generalized to a population larger than Pauma. I only recorded data from one city, so other cities could have different averages, and I cannot confirm or rule that out because my study only covered Pauma. If I had sampled multiple cities, including cities on islands other than Bonne Santa, I could generalize my results further. For example, if I randomly selected 10 people born in each year from different cities on one of the three islands in “The Islands” applet, then repeated this for all three islands, I would still end up with 60 people, as in my study. Those people would come from all three islands and many different cities instead of only Pauma, so the sample would better represent the population of the whole applet.
- Use a theory-based approach and appropriate R code to
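library(readr)   # read_csv()
library(mosaic)  # stat(), pval(), and confint() for t.test() results
# One row per person: columns `Birthday` (275 or 277) and `# Of Children`
Stats_mini_project_2_data <- read_csv("~/MATH 247/Stats mini project 2 data - Sheet1.csv")
head(Stats_mini_project_2_data, n = 2)  # check that the first two rows loaded correctly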
- Find an appropriate test statistic and comment on appropriate validity conditions
Our test statistic is the two-sample t-statistic, t = 1.655. The validity conditions for a theory-based test on two means are that the quantitative variable has a symmetric distribution in both groups, or that each group has at least 20 observations and neither sample distribution is strongly skewed. We have 30 people in each group, and the number of children in each group does not show strong skew, so we meet the validity conditions for this test on two means.
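One way to check these conditions in R is to summarize each birth-year group. The helper `check_groups()` below is hypothetical (it is not part of the original analysis), and the small data set `toy` is made up only to show the output format; it is not the Pauma sample.

```r
# Hypothetical helper: summarize each birth-year group so the
# validity conditions (group size, skewness) can be checked
check_groups <- function(df) {
  counts <- df[["# Of Children"]]
  years <- factor(df[["Birthday"]])
  data.frame(
    Birthday = levels(years),
    n = as.vector(tapply(counts, years, length)),
    mean = as.vector(tapply(counts, years, mean)),
    median = as.vector(tapply(counts, years, median)),
    sd = as.vector(tapply(counts, years, sd))
  )
}

# Small made-up data set, only to illustrate the output
toy <- data.frame(
  Birthday = rep(c(275, 277), each = 4),
  `# Of Children` = c(2, 3, 1, 4, 1, 2, 2, 0),
  check.names = FALSE
)
check_groups(toy)
```

With the real data, `check_groups(Stats_mini_project_2_data)` should show n = 30 in each group. Comparing each group's mean and median, or drawing `boxplot(`# Of Children` ~ Birthday, data = Stats_mini_project_2_data)`, gives a quick look at whether either distribution is strongly skewed.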
# Welch two-sample t-test of number of children by birth year; stat() extracts the t statistic
stat(t.test(`# Of Children` ~ Birthday, data = Stats_mini_project_2_data))
## t
## 1.654799
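For reference, the statistic that `t.test()` reports by default is Welch's two-sample t-statistic, which does not assume equal variances in the two groups. It divides the difference in sample means by its standard error, where \(s_{275}\), \(s_{277}\) are the sample standard deviations and \(n_{275} = n_{277} = 30\):

\[
t = \frac{\bar{x}_{275} - \bar{x}_{277}}{\sqrt{\dfrac{s_{275}^{2}}{n_{275}} + \dfrac{s_{277}^{2}}{n_{277}}}}
\]

With an observed difference of 0.8 children and t = 1.655, the implied standard error is about \(0.8 / 1.655 \approx 0.483\) children.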
- Find the p-value corresponding to your alternative hypothesis and provide a one-sentence interpretation of the p-value in context (use the definition of the p-value: i.e. probability of observing … assuming … is true)
Difference in sample means between years 275 and 277:
275: average # of children = 2.533
277: average # of children = 1.733
Difference: 2.533 - 1.733 = 0.8 children
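The same difference can be computed directly from a data frame with the same column names as the project data. The helper `diff_in_means()` is hypothetical, and the `toy` data are made up only to illustrate the calculation.

```r
# Hypothetical helper: difference in sample means, 275 minus 277
diff_in_means <- function(df) {
  # tapply() labels its results by group, here "275" and "277"
  group_means <- tapply(df[["# Of Children"]], df[["Birthday"]], mean)
  as.numeric(group_means["275"] - group_means["277"])
}

# Made-up data, only to illustrate the calculation
toy <- data.frame(
  Birthday = rep(c(275, 277), each = 4),
  `# Of Children` = c(2, 3, 1, 4, 1, 2, 2, 0),
  check.names = FALSE
)
diff_in_means(toy)

# The same subtraction using the reported sample means
round(2.533 - 1.733, 3)
```

Applied to `Stats_mini_project_2_data`, the helper should reproduce the 0.8 difference reported above, up to rounding.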
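The two-sided p-value we received is 0.103. By definition, the p-value is the probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true. In the context of this study: if birth year truly had no effect on the average number of children in Pauma, the probability of observing a t-statistic at least as extreme as 1.655 in either direction would be about 0.103. Because `t.test()` is two-sided by default, the p-value for our one-sided alternative \(H_{A} : \mu_{275} > \mu_{277}\) is half of this, about 0.052. Either way, this is not strong enough evidence against the null hypothesis to reject it at the 0.05 significance level.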
# Extract the p-value from the same t-test (two-sided by default)
pval(t.test(`# Of Children` ~ Birthday, data = Stats_mini_project_2_data))
## p.value
## 0.103445
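To match the one-sided alternative \(H_{A} : \mu_{275} > \mu_{277}\), `t.test()` accepts `alternative = "greater"`. The sketch below uses made-up `toy` data (not the Pauma sample) to show that, when the t statistic is positive, the one-sided p-value is exactly half of the two-sided one.

```r
# Made-up data, only to compare two-sided and one-sided p-values
toy <- data.frame(
  Birthday = rep(c(275, 277), each = 6),
  `# Of Children` = c(3, 2, 4, 1, 3, 2, 1, 2, 0, 3, 1, 2),
  check.names = FALSE
)

# Default Welch t-test: two-sided alternative
two_sided <- t.test(`# Of Children` ~ Birthday, data = toy)

# 275 is the first level of Birthday, so the test uses mean(275) - mean(277)
# and alternative = "greater" corresponds to H_A: mu_275 > mu_277
one_sided <- t.test(`# Of Children` ~ Birthday, data = toy,
                    alternative = "greater")

c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```

For the Pauma data the observed t statistic of 1.655 is positive, so the same call with `alternative = "greater"` would give half of 0.103445, about 0.052.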
- Indicate what statistical decision this p-value leads you to draw about the null hypothesis
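Because the p-value is greater than the 0.05 significance level (0.103 for the default two-sided test, and about 0.052, half of that, for our one-sided alternative), we fail to reject the null hypothesis. This does not mean we accept the null hypothesis as true; it only means our data do not provide strong enough evidence against it.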
- State your conclusion in the context of the problem
In the context of this problem, we do not have convincing evidence that people born in the year 275 in Pauma have more children, on average, than people born in the year 277, so the data do not show that birth year has an effect on the number of children someone has.
- Use R to find an appropriate confidence interval to describe the plausible values of your population parameter
The 95% confidence interval gives plausible values for the difference in population means, \(\mu_{275} - \mu_{277}\), ranging from -0.168 to 1.768 children.
# 95% confidence interval for the difference mu_275 - mu_277
confint(t.test(`# Of Children` ~ Birthday, data = Stats_mini_project_2_data))
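As a rough arithmetic check, the interval can be related to the t-test output using only numbers reported in this section. This sketch backs out the standard error from the t statistic and the critical value implied by the margin of error; the derived values are approximate because the reported inputs are rounded.

```r
# Values reported earlier in this section
diff_means <- 2.533 - 1.733   # observed difference in sample means
t_stat <- 1.654799            # t statistic from the t-test output
ci <- c(-0.168, 1.768)        # reported 95% confidence interval

se <- diff_means / t_stat      # implied standard error of the difference
center <- mean(ci)             # midpoint of the interval
margin <- (ci[2] - ci[1]) / 2  # margin of error
t_star <- margin / se          # implied critical value t*

round(c(se = se, center = center, margin = margin, t_star = t_star), 3)
```

The interval is centered at the observed difference of 0.8, and the implied critical value is close to 2, as expected for a 95% t-interval with two groups of 30.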
- Interpret the confidence interval in the context of the problem. Make sure to also comment on whether zero is included in the confidence interval. Compare your conclusion to the conclusion in 5d
In the context of this problem, we are 95% confident that the true difference in the average number of children between people born in 275 and people born in 277 in Pauma (\(\mu_{275} - \mu_{277}\)) is between -0.168 and 1.768 children. In other words, people born in 275 could plausibly have anywhere from about 0.17 fewer to about 1.77 more children on average than people born in 277. Because this interval includes 0, no difference between the two birth years is a plausible value for the parameter. This agrees with our conclusion in 5d: we failed to reject the null hypothesis and did not find convincing evidence that birth year affects the number of children someone has.
Summarize the results of your study (there will be some repetition, and you should cite your evidence). You should tell a story: What did you learn? Did the data behave as you expected? Pay particular attention to whether or not it is reasonable to generalize your sample to the larger population or process. Is there anything you would do differently next time? What similar questions might someone choose to investigate in the future to build on your results?
I started by asking whether a person's birth year has any effect on the number of children they have. I was interested in this question because I find the relationship between year and number of children really interesting. Before doing this study, I knew that the number of children born in a given year, say 2022, varies from year to year depending on real-world events. For example, after a war with high death rates, we might expect birth rates to increase in the following years.

From this study, I learned that my data do not provide convincing evidence that the year of someone's birth affects how many children they have. People born in 275 averaged 2.533 children and people born in 277 averaged 1.733, but the p-value was above 0.05 and the 95% confidence interval (-0.168 to 1.768) includes 0. Before doing this test I predicted there would be some kind of effect, so these results surprised me and caught me a little off guard.

I do not think this study can be generalized beyond the population I sampled. I only took people from one city, Pauma, on one island, Bonne Santa, in “The Islands” applet. If I had chosen people from all three islands and from many different cities, I would have been able to generalize my results to the whole applet. Because my data were collected so narrowly, I cannot generalize these results beyond Pauma.

If I were to redo this study, I would investigate year and birth rates instead. Taking the years 275 and 277 as an example, I would find the number of children born in each of these years per mother (an average number of children) and test whether that number differs between the two years. This would tell me whether the current year has any effect on the number of children born.
I think this would be a pretty interesting study to conduct.
Presentation: Style, organization, layout, grammar, presentation of a written report, creativity. Make sure to cite any work/studies you used to come up with your research question.
R code: Also, make sure that all relevant R code and output are in the body of your report.