In this statistical analysis, I examine how many days in the past year males and females used drugs and alcohol. I analyze the differences between male and female use, and also how use differs by age. Here are my findings:
The x-axis of this histogram represents the number of days in the past year on which males and females consumed alcohol. The y-axis represents the count of people in the data who drank on that many days. Taken as a whole, the histogram shows the distribution of drinking days for males and females of all ages over the past year. Looking at the histogram, you can conclude that most people who drank last year drank less than twice a week. The mean number of days that people drank in the past year was 88. Spread over 52 weeks, that is about 1.7 days a week on average, which is consistent with my claim that males and females drank less than twice a week this past year. The histogram excludes people who have never drunk alcohol and people who did not drink in the past year, so the conclusion applies only to those who drank at least once last year.
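The weekly rate implied by the mean can be checked with a line of arithmetic. The sketch below is only an illustration: it uses the mean of 88 days reported above and assumes a 52-week year; the variable names are my own.

```python
# Convert the reported mean number of drinking days per year into days per week.
MEAN_DRINKING_DAYS = 88   # mean reported in the text above
WEEKS_PER_YEAR = 52       # assumption: a year is treated as 52 weeks

days_per_week = MEAN_DRINKING_DAYS / WEEKS_PER_YEAR

# Number of drinking days per year that "twice a week" corresponds to.
twice_weekly_days = 2 * WEEKS_PER_YEAR

print(f"{days_per_week:.2f} drinking days per week on average")
print(f"twice a week corresponds to {twice_weekly_days} days per year")
```

A mean of 88 days sits below the 104 days that drinking twice a week all year would give, although a mean alone does not show what most individuals did; that part of the claim rests on the shape of the histogram.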
Histograms and frequency polygons suit different situations. A histogram is good for showing the distribution of one variable for a single group, while a frequency polygon is good for comparing that distribution across several groups, because overlaid lines stay readable where overlaid bars hide one another. Take the histogram above: if we wanted to separate males from females and compare which group drank on more days during the year, a single histogram would not let us do that easily. A frequency polygon with one line per sex would. Both are very good tools for analyzing data, but each has to be used in the right situation.
The larger binwidth makes this frequency polygon easier to read than the one above. With wider bins the lines are smoother, so the overall shape and slope of each line is much easier to see. This helps to show that most males and females drank on fewer than 100 days last year. The larger binwidth gives a clearer picture of how the number of people changes with the number of drinking days, which makes the conclusion easier to reach.
This boxplot shows the distribution of drinking days for each age group in the data set, with the line in each box marking the median. People aged 21 and over are of legal drinking age, and those under 21 are not. The boxplot suggests that as age increases, the number of drinking days per person increases, with a steady rise from ages 12-13 to 35+. Nothing in the boxplot explains why people under 21 drink on fewer days than people over 21; it only shows the pattern.
From these boxplots, you can conclude that people typically drink more frequently as they get older, and that males tend to drink more frequently than females. A tentative conclusion is that older males tend to drink more than older females, and younger males more than younger females. The two graphs do not establish this with certainty, because neither shows age and sex together, but it is a reasonable reading of the information in the two boxplots.
## 12-13 14-15 16-17 18-20 21-25 26-34 35+
## 26.22134 25.27678 34.79498 55.50665 81.85258 85.31651 95.65255
##
## Welch Two Sample t-test
##
## data: IRALCFY by SEX
## t = 26.05, df = 32491, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 23.12881 26.89249
## sample estimates:
## mean in group Male mean in group Female
## 93.87654 68.86589
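In this t-test, I see that the mean number of drinking days differs between males and females: males drank on about 25 more days than females on average (95% confidence interval 23.1 to 26.9). Because the p-value (< 2.2e-16) is far below 0.05, I reject the null hypothesis that the two means are equal.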
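As a consistency check on the output above, the difference in means, its standard error, and the confidence interval can be reconstructed from the printed numbers alone. This is a sketch under two stated assumptions: the standard error is recovered as the difference divided by the rounded t statistic, and the normal critical value stands in for the t critical value, which is reasonable with df = 32491. The variable names are my own.

```python
from statistics import NormalDist

# Values copied from the Welch t-test output for IRALCFY by SEX.
mean_male = 93.87654
mean_female = 68.86589
t_stat = 26.05

# Difference in means and the standard error implied by t = diff / se.
diff = mean_male - mean_female
se = diff / t_stat

# Assumption: with df = 32491 the t critical value is very close to the
# normal one, so the 97.5th percentile of the standard normal is used.
z = NormalDist().inv_cdf(0.975)
ci_low = diff - z * se
ci_high = diff + z * se

print(f"difference in means: {diff:.3f}")
print(f"implied standard error: {se:.3f}")
print(f"approximate 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

The reconstructed interval agrees with the reported one (23.12881 to 26.89249) to about three decimal places; the small remainder comes from the rounding of t in the printout and from using the normal approximation.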
##
## Welch Two Sample t-test
##
## data: underage$IRALCFY and legal$IRALCFY
## t = -17.099, df = 6867.3, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -29.36643 -23.32543
## sample estimates:
## mean of x mean of y
## 55.50665 81.85258
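In this t-test, I see that the mean number of drinking days for legal drinkers (81.9) is higher than that for underage drinkers (55.5), a difference of about 26 days (95% confidence interval 23.3 to 29.4). These two means match the 18-20 and 21-25 age groups in the table of means above, so the comparison appears to be between those two groups. Because the p-value (< 2.2e-16) is far below 0.05, I reject the null hypothesis that the two means are equal.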
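For readers who want to see what a Welch two-sample t-test computes, here is a minimal sketch of the t statistic and the Welch-Satterthwaite degrees of freedom. It is not the code that produced the output above; the function name `welch_t` and the two toy samples are hypothetical and are not taken from the survey data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Return (t, df) for Welch's two-sample t-test on two lists of numbers."""
    nx, ny = len(x), len(y)
    # Squared standard error of each sample mean (sample variance / n).
    vx = variance(x) / nx
    vy = variance(y) / ny
    # t statistic: difference in means over the combined standard error.
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df

# Hypothetical toy samples, used only to exercise the function.
group_a = [10, 20, 30, 40, 50]
group_b = [30, 40, 50, 60, 70]

t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The sign of t depends on the order of the groups: when the first group has the lower mean, t is negative, which is why the output above shows t = -17.099.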
We have about 5,500 data points in the female category of drinking days and about 8,000 in the male category. With samples this large, and judging by the frequency polygon above, it is reasonable to assume that the t-test calculations are valid. The t-test shows that the mean number of drinking days for females is smaller than for males, which agrees with the frequency polygon.
Based on my visual inspection of the age groups in the bar graph, I would expect the comparison between ages 12-13 and 14-15 to give a p-value greater than 0.05, because these two groups look the most alike.
##
## Welch Two Sample t-test
##
## data: Test1$IRALCFY and Test2$IRALCFY
## t = 0.28141, df = 390.74, p-value = 0.7785
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.654680 7.543813
## sample estimates:
## mean of x mean of y
## 26.22134 25.27678
Based on my visual inspection, age groups 12-13 and 14-15 seemed to have the closest medians, and the t-test agrees: the 12-13 group had a mean of about 26 drinking days and the 14-15 group a mean of about 25, and the p-value of 0.7785 is well above 0.05, so the difference is not statistically significant. The bar graph was a useful visual aid for predicting that these two age groups would be the closest.
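The reported p-value can be roughly reproduced from the t statistic alone. The sketch below assumes the normal distribution is a close enough stand-in for the t distribution at df = 390.74, so it gives an approximation rather than the exact value; the variable names are my own.

```python
from statistics import NormalDist

# t statistic copied from the Welch t-test output for ages 12-13 vs 14-15.
t_stat = 0.28141

# Two-sided p-value under the normal approximation (an assumption; the
# reported value uses the t distribution with df = 390.74).
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"approximate two-sided p-value: {p_approx:.3f}")
print("significant at 0.05" if p_approx < 0.05 else "not significant at 0.05")
```

A t statistic this close to zero means the observed difference of about one day is small relative to its standard error, which is why the p-value is so large.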