Setup

Part 1: Data

The data sample was collected with the questionnaire of the Behavioral Risk Factor Surveillance System (BRFSS). The BRFSS questionnaire consists of a core component and optional modules. Many questions are taken from established national surveys, such as the National Health Interview Survey or the National Health and Nutrition Examination Survey.

1.1 Sample Description

In a telephone survey such as the BRFSS, a sample record is one telephone number in the list of all telephone numbers the system randomly selects for dialing.

BRFSS divides telephone numbers into two groups, or strata, which are sampled separately.

The target population for cellular telephone samples in 2013 consists of persons who reside in a private residence or college housing, have a working cellular telephone, are aged 18 or older, and receive 90 percent or more of their calls on cellular telephones.


Part 2: Research questions

I’d like to work in the socioeconomic context. Therefore we will use Module 19, which covers Social Context, and Section 8, which covers Demographics.

Research question 1: Do males make more money than females?

Research question 2: Which income range tends to drink more on average?

Research question 3: Overall, do females drink more alcohol than males?

Part 3: Exploratory data analysis

Research question 1:

Because the data set is huge and this is a purely academic exercise, we will work only with the state of California. Let’s save the female and male respondents in separate variables.
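
A minimal sketch of this subsetting step, using a toy data frame instead of the real survey. The data frame `survey` and the column names `state` and `sex` are placeholders chosen for illustration (the real BRFSS columns may be named differently); `cali_males` is the name used later in this report and `cali_females` is assumed by analogy.

```r
library(dplyr)

# Toy stand-in for the survey data; column names are assumptions.
survey <- data.frame(
  state   = c("California", "California", "Texas", "California", "Ohio"),
  sex     = factor(c("Male", "Female", "Male", "Female", "Female")),
  income2 = c("Less than $35,000", NA, "$75,000 or more",
              "Less than $10,000", "Less than $50,000")
)

# Keep only the California respondents.
cali <- survey %>% filter(state == "California")

# Store each sex in its own data frame.
cali_males   <- cali %>% filter(sex == "Male")
cali_females <- cali %>% filter(sex == "Female")
```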

Let’s check the proportions of males and females in this subsample.

## [1] 0.441917
## [1] 0.558083
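
Two proportions like the ones above could be computed as follows. This is only a sketch on toy data: the data frame `cali` and the column name `sex` are assumptions, and the numbers it prints are not the survey's.

```r
# Toy stand-in for the California subsample; `sex` is an assumed column name.
cali <- data.frame(
  sex = factor(c("Male", "Female", "Female", "Male", "Female"))
)

# The mean of a logical vector is the proportion of TRUE values.
# If `sex` contained missing values, na.rm = TRUE would be needed.
prop_male   <- mean(cali$sex == "Male")
prop_female <- mean(cali$sex == "Female")

print(prop_male)
print(prop_female)
```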

Let’s plot the income ranges in the male sample. Before that, let’s manipulate the data a little: I already played with the dataset a bit and found that there are some missing values. Missing values are really annoying because they make our EDA more complex when applying functions and so on.

My approach is to create, from the cali_males dataset, a new dataset that contains only the income2 variable with clean (non-missing) values, so I can keep working from there. We can use mutate to keep adding more variables.
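
A sketch of that cleaning step under stated assumptions: `cali_males` and `income2` come from the text, while the toy values, the `age` column, the result name `males_income` and the `top_bracket` flag are hypothetical and only there to make the example runnable.

```r
library(dplyr)

# Toy stand-in for cali_males; only income2 is named in the report.
cali_males <- data.frame(
  income2 = factor(c("Less than $35,000", NA, "$75,000 or more",
                     NA, "Less than $10,000")),
  age     = c(34, 51, 28, 45, 62)  # hypothetical extra column
)

# Keep only the income2 column and drop the rows where it is missing.
males_income <- cali_males %>%
  select(income2) %>%
  filter(!is.na(income2))

# mutate() can then add derived variables, here a hypothetical flag.
males_income <- males_income %>%
  mutate(top_bracket = income2 == "$75,000 or more")
```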

Now let’s plot this.

Now let’s do the same for females.

Now let’s do a scatter plot to compare. Blue dots are males; pink dots are females.
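
One possible way to draw such a comparison with ggplot2 is sketched here. It is not necessarily how the original plot was made: the toy data frame `cali`, the column name `sex` and the choice to plot respondent counts per income range are all assumptions.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the cleaned California data; names are assumptions.
cali <- data.frame(
  sex     = c("Male", "Male", "Male", "Female", "Female", "Female", "Female"),
  income2 = c("Less than $35,000", "$75,000 or more", "$75,000 or more",
              "Less than $35,000", "Less than $35,000", "$75,000 or more",
              "Less than $10,000")
)

# Count respondents per sex and income range (count() names the result n).
income_counts <- cali %>% count(sex, income2)

# One dot per (income range, sex) pair: blue for males, pink for females.
p <- ggplot(income_counts, aes(x = income2, y = n, colour = sex)) +
  geom_point(size = 3) +
  scale_colour_manual(values = c(Male = "blue", Female = "pink")) +
  labs(x = "Income range", y = "Respondents", colour = "Sex") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Call print(p) to draw the plot.
```

Plotting counts keeps the sketch simple; proportions within each sex would make the two groups easier to compare when their sample sizes differ.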

It seems females tend to do better in California.

Research question 2:

Moving on to question 2.

Because we have so many missing values, I propose using dplyr to build a small data frame and save it so we can plot it later. We will do this for both females and males.

## # A tibble: 8 x 2
##   income_cat        mean_drinks
##   <fct>                   <dbl>
## 1 Less than $35,000        3.14
## 2 Less than $25,000        3.12
## 3 Less than $20,000        3.09
## 4 Less than $10,000        2.92
## 5 Less than $50,000        2.86
## 6 Less than $15,000        2.82
## 7 Less than $75,000        2.72
## 8 $75,000 or more          2.14

It seems that males who earn between $25,000 and $35,000 tend to drink the most.
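
A summary table like the one above could be built with a dplyr pipeline along these lines. This is a sketch on toy data: `income_cat` and `mean_drinks` match the column names in the output, `income2` and `cali_males` come from the text, and `avg_drinks` and `males_drinks` are hypothetical names.

```r
library(dplyr)

# Toy stand-in for cali_males; `avg_drinks` is a hypothetical column name.
cali_males <- data.frame(
  income2    = c("Less than $35,000", "Less than $35,000", "$75,000 or more",
                 "$75,000 or more", NA, "Less than $10,000"),
  avg_drinks = c(4, 2, 2, NA, 3, 1)
)

males_drinks <- cali_males %>%
  filter(!is.na(income2), !is.na(avg_drinks)) %>%  # drop missing values
  group_by(income_cat = income2) %>%               # one group per income range
  summarise(mean_drinks = mean(avg_drinks)) %>%    # average drinks per group
  arrange(desc(mean_drinks))                       # highest average first
```

Saving the result in its own small data frame keeps the missing-value handling in one place, so the same object can be reused for plotting.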

Let’s plot this.

Now let’s do it for females.

## # A tibble: 8 x 2
##   income_cat        mean_drinks
##   <fct>                   <dbl>
## 1 Less than $10,000        2.14
## 2 Less than $20,000        1.85
## 3 Less than $15,000        1.84
## 4 Less than $50,000        1.81
## 5 Less than $25,000        1.80
## 6 $75,000 or more          1.73
## 7 Less than $35,000        1.67
## 8 Less than $75,000        1.56

Females who make less than $10,000 tend to drink the most.

Let’s plot this.

Research question 3:

##   gender     mean
## 1 Female 1.734375
## 2   Male 2.567489

We can see that men drink more on average.
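
A per-gender average like the one shown above could be computed as sketched here. The column names `gender` and `mean` match the output; the toy data frame `cali`, the column `avg_drinks` and the result name `drinks_by_gender` are assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for the California subsample; `avg_drinks` is an assumed name.
cali <- data.frame(
  gender     = c("Female", "Male", "Female", "Male", "Male"),
  avg_drinks = c(1, 3, 2, NA, 2)
)

# Average drinks per gender, ignoring missing answers.
drinks_by_gender <- cali %>%
  group_by(gender) %>%
  summarise(mean = mean(avg_drinks, na.rm = TRUE))
```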