# MEANINGFUL QUESTIONS FOR ANALYSIS:

# 1. What are the mean and median ages of all smokers?
# 2. How do the mean and median ages of all smokers compare with those of female and male smokers?
# 3. What useful subset of data can I create from my original data set?
# 4. What column can I add to my subset to get a better idea of which employees most need the ban?
# 5. What is the relationship between age and gender among employees who smoke?
# 6. What is the relationship between age and years_smoked, broken down by gender?
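# Questions 1, 2 and 6 are computations, so a sketch may help. It assumes each smoker is a record
# (a dict) with "age", "gender" and "years_smoked" keys. These key names and the toy records in
# the usage example are hypothetical, made up for illustration, and are not taken from the data set.

```python
from statistics import mean, median

# Hypothetical record layout: {"age": int, "gender": str, "years_smoked": int}


def group_by_gender(smokers):
    """Split smoker records into lists keyed by gender."""
    groups = {}
    for s in smokers:
        groups.setdefault(s["gender"], []).append(s)
    return groups


def age_summary(smokers):
    """Q1: return (mean age, median age) for a list of smoker records."""
    ages = [s["age"] for s in smokers]
    return mean(ages), median(ages)


def age_summary_by_gender(smokers):
    """Q2: return {gender: (mean, median)} to compare against the overall figures."""
    return {g: age_summary(rows) for g, rows in group_by_gender(smokers).items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (sx * sy)


def age_vs_years_by_gender(smokers):
    """Q6: correlation between age and years_smoked, computed separately per gender."""
    return {
        g: pearson([r["age"] for r in rows], [r["years_smoked"] for r in rows])
        for g, rows in group_by_gender(smokers).items()
    }


# Toy records for illustration only (not real employee data).
toy = [
    {"age": 30, "gender": "F", "years_smoked": 5},
    {"age": 40, "gender": "F", "years_smoked": 12},
    {"age": 25, "gender": "M", "years_smoked": 8},
    {"age": 45, "gender": "M", "years_smoked": 20},
    {"age": 35, "gender": "M", "years_smoked": 15},
]
print(age_summary(toy))
print(age_summary_by_gender(toy))
print(age_vs_years_by_gender(toy))
```

# Grouping once and reusing the same summary function keeps the overall and per-gender figures
# directly comparable. Question 5 pairs a number (age) with a category (gender), so a correlation
# coefficient does not apply there; comparing per-gender summaries is the simpler reading.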


# CONCLUSION:

# Through data exploration, I saw that the mean and median ages of the female employee population are slightly higher than those of the male population. I was curious about how long the employees had been smoking. This mattered to me because the less time an employee has been smoking, the more effective the ban will be. For employees who have smoked for many years, a less strict, more gradual ban would be needed, because it will be harder for them to quit cold turkey. Thus, I added the "years_smoking" field to my data set. Graphs were very useful in giving me a better idea of the relationships among age, gender, years_smoked, and race. By creating a scatter plot, I learned that most of the African American employees who smoke are between 25 and 45 years old. The bar plot helped me understand that male employees started smoking at an earlier age and had smoked longer than the female employees. From this data I concluded that it is important to enforce the ban on employees aged from their mid-20s to their mid-40s. It is also important to recognize that the ban needs to be enforced differently depending on how long an employee has been smoking, in order to improve its efficacy.
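# The conclusion rests on a derived years-smoked field and on grading the ban by that field. A
# minimal sketch follows, assuming a hypothetical "age_started" key from which years smoked can be
# derived, a hypothetical 10-year cutoff between a strict and a gradual ban, and 25 to 45 as an
# approximation of "mid-20s to mid-40s". None of these values come from the original analysis.

```python
# Hypothetical cutoff: the original analysis does not specify one.
LONG_TERM_YEARS = 10


def add_years_smoked(smoker):
    """Return a copy of the record with years_smoked derived from a hypothetical age_started key."""
    return {**smoker, "years_smoked": smoker["age"] - smoker["age_started"]}


def ban_approach(smoker, threshold=LONG_TERM_YEARS):
    """Shorter-term smokers get a strict ban; long-term smokers get a gradual phase-out."""
    return "gradual" if smoker["years_smoked"] >= threshold else "strict"


def in_target_age_range(smoker, low=25, high=45):
    """True if the employee falls in the approximate mid-20s to mid-40s target range."""
    return low <= smoker["age"] <= high


# Illustrative record only (not real employee data).
emp = add_years_smoked({"age": 38, "age_started": 20, "gender": "M"})
print(emp["years_smoked"], ban_approach(emp), in_target_age_range(emp))
```

# Keeping the threshold and age bounds as parameters makes it easy to try other cutoffs once the
# real distribution of years smoked is known.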