library(tidyverse)
Load the data used to answer the research questions.
load("brfss2013.RData")
The data come from the Behavioral Risk Factor Surveillance System (BRFSS). They are collected through landline and cellular telephone interviews, and the interviewees are selected at random. Because respondents are randomly sampled, the results can be generalized to the adult population of the United States of America. Because the study is observational, with no random assignment, it can show associations but not causal relationships.
Research question 1: Which 10 states have the highest average sleep time? This is to see whether sleeping patterns are uniform across the nation.
Research question 2: Is there a correlation between weight and height for each gender? This is to test whether height and weight are correlated among males and among females.
Research question 3: Do education category and income level have a relationship? This is to test whether education level is associated with income.
Research question 1: Which 10 states have the highest average sleep time?
From the data, we compute the average sleep time (hours per 24-hour period) for each state and plot a bar chart of the 10 states with the highest averages.
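# Keep only the state and the reported hours of sleep
data <- brfss2013 %>% select(X_state, sleptim1)
# Mean sleep time per state (ignoring missing values), top 10 by mean
top_10_states <- data %>%
  filter(X_state != "0") %>%
  group_by(X_state) %>%
  summarise(mean_time = mean(sleptim1, na.rm = TRUE)) %>%
  arrange(desc(mean_time)) %>%
  slice(1:10)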
top_10_states
## # A tibble: 10 x 2
## X_state mean_time
## <fct> <dbl>
## 1 Wyoming 7.23
## 2 South Dakota 7.20
## 3 Colorado 7.18
## 4 Kansas 7.18
## 5 Oregon 7.16
## 6 Nebraska 7.15
## 7 Iowa 7.15
## 8 Montana 7.15
## 9 Mississippi 7.14
## 10 Missouri 7.14
ggplot(top_10_states, aes(x = reorder(X_state, desc(mean_time)), y = mean_time)) +
  geom_bar(stat = "identity", color = "black", fill = "red", alpha = 0.5) +
  labs(title = "Top 10 Average sleeping time per State", x = "State", y = "Average time (hours)")
From the graph and the numeric summary, the state with the highest average sleeping time is Wyoming (7.23 hours). The averages for the top 10 states are very close: they range from 7.14 to 7.23 hours, a spread of roughly 0.1 hour, so the bars are nearly the same height. The top 10 alone do not show whether sleep time is uniform across the nation; that would require comparing the averages of all states.
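To judge whether sleep time is uniform across states, it helps to look at the spread of the state averages as a whole and not only at the top of the ranking. The following is a minimal sketch of that idea on a small invented data frame (`toy`, with made-up states "A" to "D"), not on `brfss2013` itself. It also shows `slice_max()` from dplyr as a compact alternative to `arrange(desc(...))` followed by `slice()`.

```r
library(dplyr)

# Hypothetical toy data standing in for brfss2013 (all values are invented).
toy <- data.frame(
  X_state  = c("A", "A", "B", "B", "C", "C", "D", "D"),
  sleptim1 = c(7, 8, 6, NA, 7, 7, 5, 6)
)

# Mean sleep time per state, ignoring missing answers.
state_means <- toy %>%
  group_by(X_state) %>%
  summarise(mean_time = mean(sleptim1, na.rm = TRUE), .groups = "drop")

# slice_max() keeps the n rows with the largest mean_time, sorted descending.
top_2 <- state_means %>% slice_max(mean_time, n = 2, with_ties = FALSE)

# Difference between the highest and lowest state average.
spread <- diff(range(state_means$mean_time))
```

Applied to the real data, a small `spread` relative to the typical sleep time would support the idea that sleep patterns are similar nationwide, while a large one would not.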
Research question 2: Is there a correlation between weight and height for each gender?
From the data, we will check whether height and weight are correlated for each gender. To keep the plot readable, we use only respondents from Wyoming to answer the research question.
data2 <- brfss2013 %>%
  select(X_state, htm4, wtkg3, sex) %>%
  drop_na(htm4, wtkg3) %>%
  filter(X_state == "Wyoming")
# Assumption: per the BRFSS 2013 codebook, htm4 and wtkg3 are stored with two
# implied decimal places, so divide by 100 to get meters and kilograms.
ggplot(data2, aes(y = wtkg3 / 100, x = htm4 / 100, color = sex)) +
  geom_point() +
  facet_wrap(~sex) +
  geom_smooth(method = "lm") +
  labs(title = "Height and Weight of Males and Females in Wyoming", x = "Height in Meters", y = "Weight in Kilograms")
## `geom_smooth()` using formula 'y ~ x'
There is a positive correlation between height and weight for both males and females: the fitted line slopes upward in each panel. There are also outliers in the data.
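The plot shows the direction of the relationship, but a correlation coefficient puts a number on its strength. A minimal sketch of computing Pearson's r separately for each group is shown here on a small invented data frame (`toy`, with made-up heights and weights), not on the Wyoming subset. Pearson's r does not depend on the units, so the implied decimal places in `htm4` and `wtkg3` do not affect it.

```r
library(dplyr)

# Hypothetical toy data (all values are invented for illustration).
toy <- data.frame(
  sex   = c("Male", "Male", "Male", "Male",
            "Female", "Female", "Female", "Female"),
  htm4  = c(170, 175, 180, 185, 155, 160, 165, 170),
  wtkg3 = c(7000, 7600, 8100, 9000, 5500, 6000, 6300, 7000)
)

# Pearson correlation between height and weight within each sex.
cor_by_sex <- toy %>%
  group_by(sex) %>%
  summarise(
    r = cor(htm4, wtkg3, use = "complete.obs"),
    n = n(),
    .groups = "drop"
  )

cor_by_sex
```

The same pipeline should work on `data2` as written, since it only relies on the `sex`, `htm4`, and `wtkg3` columns.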
Research question 3: Do education category and income level have a relationship?
To answer this question, we tabulate the number of respondents in each combination of education category and income level. This lets us compare the income levels of those who graduated from college or technical school with those of respondents who did not graduate from high school.
data3 <- brfss2013 %>% select(X_educag, X_incomg, sex)
# Count respondents in each education-by-income cell, dropping missing values
data3 %>%
  group_by(X_educag, X_incomg) %>%
  drop_na(X_educag, X_incomg) %>%
  summarise(count = n())
## `summarise()` has grouped output by 'X_educag'. You can override using the `.groups` argument.
## # A tibble: 20 x 3
## # Groups: X_educag [4]
## X_educag X_incomg count
## <fct> <fct> <int>
## 1 Did not graduate high school Less than $15,000 12771
## 2 Did not graduate high school $15,000 to less than $25,0~ 11207
## 3 Did not graduate high school $25,000 to less than $35,0~ 4055
## 4 Did not graduate high school $35,000 to less than $50,0~ 2551
## 5 Did not graduate high school $50,000 or more 2647
## 6 Graduated high school Less than $15,000 20250
## 7 Graduated high school $15,000 to less than $25,0~ 31512
## 8 Graduated high school $25,000 to less than $35,0~ 18153
## 9 Graduated high school $35,000 to less than $50,0~ 18856
## 10 Graduated high school $50,000 or more 29887
## 11 Attended college or technical school Less than $15,000 13191
## 12 Attended college or technical school $15,000 to less than $25,0~ 22116
## 13 Attended college or technical school $25,000 to less than $35,0~ 15396
## 14 Attended college or technical school $35,000 to less than $50,0~ 19798
## 15 Attended college or technical school $50,000 or more 45516
## 16 Graduated from college or technical school Less than $15,000 5910
## 17 Graduated from college or technical school $15,000 to less than $25,0~ 11619
## 18 Graduated from college or technical school $25,000 to less than $35,0~ 11191
## 19 Graduated from college or technical school $35,000 to less than $50,0~ 20249
## 20 Graduated from college or technical school $50,000 or more 102944
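From the above, the largest group in the top income level ($50,000 or more) is those who graduated from college or technical school (102,944 respondents, about 68% of that education category), and this category also has the fewest respondents earning less than $15,000 (5,910). Some of those who did not graduate from high school still earn $50,000 or more, but they are fewer than in any other education category (2,647 respondents, about 8% of that category). Because the education categories differ in size, the within-category percentages are a fairer comparison than the raw counts. There appears to be a positive relationship between income and education level.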
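Raw counts are hard to compare because the education categories differ a lot in size. One way to make the comparison fairer is to convert each row of the table to proportions and, as a rough check, run a chi-squared test of independence. The sketch below is an addition to the original analysis: it re-enters the counts from the table above by hand, uses shortened income labels chosen here for readability, and ignores the BRFSS survey weights, so the test result should be read as indicative only.

```r
# Counts copied from the table above; rows are education categories,
# columns are income levels from lowest to highest.
# The short income labels are abbreviations introduced for this sketch.
counts <- matrix(
  c(12771, 11207,  4055,  2551,   2647,
    20250, 31512, 18153, 18856,  29887,
    13191, 22116, 15396, 19798,  45516,
     5910, 11619, 11191, 20249, 102944),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    education = c("Did not graduate high school",
                  "Graduated high school",
                  "Attended college or technical school",
                  "Graduated from college or technical school"),
    income = c("<15k", "15-25k", "25-35k", "35-50k", "50k+")
  )
)

# Share of each income level within each education category (rows sum to 1).
row_share <- prop.table(counts, margin = 1)
round(row_share, 3)

# Chi-squared test of independence between education and income (unweighted).
test <- chisq.test(counts)
test$p.value
```

Reading `row_share` row by row shows how the income distribution shifts toward the higher levels as education increases, which is the pattern described in the text.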