#DATA SET DESCRIPTION:
#The data set was gathered from the National Health Interview Survey in 1991. For each employee it records whether they were subject to a workplace smoking ban and whether they smoked, along with age, education, race and ethnicity (African American, Hispanic), and gender (male or female).
#DATA ATTRIBUTE DESCRIPTION:
# 1. smoker - Whether the employee is a current smoker. The data set has 2,423 smokers and 7,577 non-smokers.
# 2. ban - Whether a smoking ban is in place at the employee's workplace. A ban applies to 6,098 employees and does not apply to 3,902.
# 3. age - The employee's age in years, ranging from 18 to 88.
# 4. education - The employee's level of education: high school dropout, high school, some college, college, or master's.
# 5. afam - Whether the employee is African American: 769 are and 9,231 are not.
# 6. hispanic - Whether the employee is Hispanic: 1,134 are and 8,866 are not.
# 7. gender - The employee's gender: 5,637 are female and 4,363 are male.
# The data set used
SmokeBan <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/AER/SmokeBan.csv", header = TRUE)
#DATA SET SUMMARY:
print("The summary of the data set is:")
## [1] "The summary of the data set is:"
summary(SmokeBan)
##        X            smoker              ban                 age
##  Min.   :    1   Length:10000       Length:10000       Min.   :18.00
##  1st Qu.: 2501   Class :character   Class :character   1st Qu.:29.00
##  Median : 5000   Mode  :character   Mode  :character   Median :37.00
##  Mean   : 5000                                         Mean   :38.69
##  3rd Qu.: 7500                                         3rd Qu.:47.00
##  Max.   :10000                                         Max.   :88.00
##   education             afam             hispanic            gender
##  Length:10000       Length:10000       Length:10000       Length:10000
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##
##
##
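# CROSS-CHECK OF THE CATEGORY COUNTS
# summary() reports only length and class for character columns, so the counts
# quoted in the attribute description cannot be read off the output above.
# A minimal sketch for tallying them, assuming the yes/no columns are read as
# character vectors (as the summary shows). count_levels() is a hypothetical
# helper name, not part of the original analysis.
count_levels <- function(df, cols) {
  # table() tallies each distinct value; lapply() returns one table per column
  lapply(df[cols], table)
}
count_levels(SmokeBan, c("smoker", "ban", "afam", "hispanic", "gender"))
# Each table should sum to the 10,000 rows reported by summary().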
#DATA SUBSET SUMMARY (OTHER RELEVANT STATISTICS)
# MEAN AND MEDIAN AGE OF ALL SMOKERS
print("The mean age of all smokers is:")
## [1] "The mean age of all smokers is:"
mean(SmokeBan[SmokeBan$smoker == 'yes', 'age'])
## [1] 37.96121
print("The median age of all smokers is:")
## [1] "The median age of all smokers is:"
median(SmokeBan[SmokeBan$smoker == 'yes', 'age'])
## [1] 36
# MEAN AND MEDIAN AGE OF ALL NON-SMOKERS
print("The mean age of all non-smokers is:")
## [1] "The mean age of all non-smokers is:"
mean(SmokeBan[SmokeBan$smoker == 'no', 'age'])
## [1] 38.92728
print("The median age of all non-smokers is:")
## [1] "The median age of all non-smokers is:"
median(SmokeBan[SmokeBan$smoker == 'no', 'age'])
## [1] 38
# MEAN AND MEDIAN AGE OF ALL MALE SMOKERS
print("The mean age of all smoking males is:")
## [1] "The mean age of all smoking males is:"
mean(SmokeBan[SmokeBan$gender == 'male' & SmokeBan$smoker == 'yes', 'age'])
## [1] 37.41637
print("The median age of all smoking males is:")
## [1] "The median age of all smoking males is:"
median(SmokeBan[SmokeBan$gender == 'male' & SmokeBan$smoker == 'yes', 'age'])
## [1] 36
# MEAN AND MEDIAN AGE OF ALL FEMALE SMOKERS
print("The mean age of all smoking females is:")
## [1] "The mean age of all smoking females is:"
mean(SmokeBan[SmokeBan$gender == 'female' & SmokeBan$smoker == 'yes', 'age'])
## [1] 38.43264
print("The median age of all smoking females is:")
## [1] "The median age of all smoking females is:"
median(SmokeBan[SmokeBan$gender == 'female' & SmokeBan$smoker == 'yes', 'age'])
## [1] 37
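# ALL GROUP MEANS AND MEDIANS IN ONE CALL
# The subset summaries above repeat the same subsetting pattern. As a sketch
# (an alternative, not the method used above), base R's aggregate() can
# compute the mean and median age for every gender/smoker combination at
# once. age_summary() is a hypothetical helper name.
age_summary <- function(df) {
  # One row per gender x smoker group; the age column of the result is a
  # two-column matrix holding that group's mean and median
  aggregate(age ~ gender + smoker, data = df,
            FUN = function(x) c(mean = mean(x), median = median(x)))
}
age_summary(SmokeBan)
# This also returns the two non-smoker groups, which are not computed above.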
#DATA SET CONCLUSION:
# The summaries of the full data set and its subsets support a few conclusions.
# First, the median age of all employees (37) lies above the median age of smokers (36) and below the median age of non-smokers (38), so smokers in this sample tend to be slightly younger than non-smokers.
# The means show the same pattern: smokers average 37.96 years and non-smokers 38.93 years. This suggests that the ban should target younger employees more.
# Finally, among smokers, women are on average older (mean 38.4 years) than men (mean 37.4 years).
#
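# SIDE-BY-SIDE COMPARISON FOR THE CONCLUSION
# The conclusion rests on comparing the age of all employees with that of
# smokers and non-smokers. A minimal sketch that puts those figures in one
# small matrix, assuming SmokeBan is loaded as above. compare_ages() is a
# hypothetical helper name.
compare_ages <- function(df) {
  # tapply() splits age by smoking status and applies the statistic per group
  rbind(
    mean   = c(all = mean(df$age),   tapply(df$age, df$smoker, mean)),
    median = c(all = median(df$age), tapply(df$age, df$smoker, median))
  )
}
compare_ages(SmokeBan)
# Columns are "all", "no" (non-smokers) and "yes" (smokers).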