#DATA SET DESCRIPTION:

#The data set is drawn from the National Health Interview Survey in 1991. It records, for each employee, whether they were subject to a workplace smoking ban and whether they smoked, along with attributes for age, education, race and ethnicity (African American, Hispanic) and gender (male or female).

#DATA ATTRIBUTE DESCRIPTION:

# 1. smoker - Whether the employee is a current smoker. The data set has 2,423 smokers and 7,577 non-smokers.
# 2. ban - Whether a smoking ban is in place in the employee's work area. The ban applies to 6,098 employees and does not apply to 3,902.
# 3. age - The employee's age in years, ranging from 18 to 88.
# 4. education - The employee's highest level of education: high school dropout, high school, some college, college, or master's.
# 5. afam - Whether the employee is African American: 769 are and 9,231 are not.
# 6. hispanic - Whether the employee is Hispanic: 1,134 are and 8,866 are not.
# 7. gender - The employee's gender: 5,637 are female and 4,363 are male.

# The data set used
SmokeBan <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/AER/SmokeBan.csv", header = TRUE)

#DATA SET SUMMARY:

print("The summary of the data set is:")
## [1] "The summary of the data set is:"
summary(SmokeBan)
##        X            smoker              ban                 age       
##  Min.   :    1   Length:10000       Length:10000       Min.   :18.00  
##  1st Qu.: 2501   Class :character   Class :character   1st Qu.:29.00  
##  Median : 5000   Mode  :character   Mode  :character   Median :37.00  
##  Mean   : 5000                                         Mean   :38.69  
##  3rd Qu.: 7500                                         3rd Qu.:47.00  
##  Max.   :10000                                         Max.   :88.00  
##   education             afam             hispanic            gender         
##  Length:10000       Length:10000       Length:10000       Length:10000      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 
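# Because read.csv() leaves the categorical columns as character vectors,
# summary() reports only their length and class. A minimal sketch of one way to
# obtain the per-category counts quoted in the attribute description is to
# tabulate each categorical column. It assumes SmokeBan has been loaded as
# above; the names cat_cols and SmokeBanCounts are illustrative and not part of
# the original analysis.
cat_cols <- c("smoker", "ban", "education", "afam", "hispanic", "gender")
# lapply() over the selected columns returns one frequency table per attribute
SmokeBanCounts <- lapply(SmokeBan[cat_cols], table)
SmokeBanCounts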
#DATA SUBSET SUMMARY (OTHER RELEVANT STATISTICS)
 # MEAN AND MEDIAN AGE OF ALL SMOKERS
   print("The mean age of all smokers is:")
## [1] "The mean age of all smokers is:"
   mean(SmokeBan[SmokeBan$smoker == 'yes', 'age'])
## [1] 37.96121
   print("The median age of all smokers is:")
## [1] "The median age of all smokers is:"
   median(SmokeBan[SmokeBan$smoker == 'yes', 'age'])
## [1] 36
 # MEAN AND MEDIAN AGE OF ALL NON-SMOKERS
   print("The mean age of all non-smokers is:")
## [1] "The mean age of all non-smokers is:"
   mean(SmokeBan[SmokeBan$smoker == 'no', 'age'])
## [1] 38.92728
   print("The median age of all non-smokers is:")
## [1] "The median age of all non-smokers is:"
   median(SmokeBan[SmokeBan$smoker == 'no', 'age'])
## [1] 38
 # MEAN AND MEDIAN AGE OF ALL MALE SMOKERS
   print("The mean age of all smoking males is:")
## [1] "The mean age of all smoking males is:"
   mean(SmokeBan[SmokeBan$gender == 'male' & SmokeBan$smoker == 'yes', 'age'])
## [1] 37.41637
   print("The median age of all smoking males is:")
## [1] "The median age of all smoking males is:"
   median(SmokeBan[SmokeBan$gender == 'male' & SmokeBan$smoker == 'yes', 'age'])
## [1] 36
 # MEAN AND MEDIAN AGE OF ALL FEMALE SMOKERS
   print("The mean age of all smoking females is:")
## [1] "The mean age of all smoking females is:"
   mean(SmokeBan[SmokeBan$gender == 'female' & SmokeBan$smoker == 'yes', 'age'])
## [1] 38.43264
   print("The median age of all smoking females is:")
## [1] "The median age of all smoking females is:"
   median(SmokeBan[SmokeBan$gender == 'female' & SmokeBan$smoker == 'yes', 'age'])
## [1] 37
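 # The subset statistics above repeat the same row filter for each group. As a
 # sketch of a more compact alternative, aggregate() can compute the mean and
 # median age for every smoker/gender combination in a single call. It assumes
 # SmokeBan has been loaded as above; the name AgeBySmokerGender is illustrative.
   AgeBySmokerGender <- aggregate(age ~ smoker + gender, data = SmokeBan,
                                  FUN = function(x) c(mean = mean(x), median = median(x)))
 # One row per smoker/gender pair; the age column holds the mean and the median
   AgeBySmokerGender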
 #DATA SET CONCLUSION:
   
 # The summaries of the full data set and of its subsets support a few conclusions.
 # First, the median age of smokers (36 years) is below the median age of all employees (37 years), which in turn is below the median age of non-smokers (38 years), so smokers in this sample tend to be slightly younger than non-smokers.
 # The means show the same pattern: smokers average 37.96 years and non-smokers 38.93 years. This suggests that the ban should be targeted more at the younger population, where smokers are more concentrated.
 # Second, among smokers, women are on average older (mean: 38.4 years) than men (mean: 37.4 years).
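 # One way to probe the age pattern behind this conclusion is to compare the
 # share of smokers across age bands. This is only a sketch: it assumes SmokeBan
 # has been loaded as above, the roughly ten-year breaks are an arbitrary choice,
 # and the names age_group and SmokingRateByAge are illustrative.
   age_group <- cut(SmokeBan$age, breaks = c(17, 29, 39, 49, 59, 88))
 # The mean of a logical vector is the proportion of TRUE values, here the smoking rate
   SmokingRateByAge <- tapply(SmokeBan$smoker == 'yes', age_group, mean)
   round(SmokingRateByAge, 3)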
