Introduction:

This write-up follows the original post, How to Make a Histogram with Basic R, published at https://www.r-bloggers.com/how-to-make-a-histogram-with-basic-r/. First, we downloaded the chol dataset from https://s3.amazonaws.com/assets.datacamp.com/blog_assets/chol.txt?tap_a=5644-dce66f&tap_s=10907-287229 and opened it in RStudio; we use it in the follow-up. For the histogram itself we used AirPassengers, a dataset built into R. With the function hist(), we displayed AirPassengers as a histogram that is slightly skewed to the right, with the number of passengers on the x-axis and frequency on the y-axis. In the second step, we customized the plot by changing the x-axis and y-axis labels, coloring the bin borders, and filling the bins with green. In the third step, we set limits for both the x-axis and y-axis and set the number of bins. In the fourth step, we replaced the frequency on the y-axis with probability density, so that the area of each bin represents a probability. Last but not least, we overlaid a density curve on the histogram so that the shape of the distribution is easier to see at first glance.

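# Histogram of the built-in AirPassengers data, drawn on a density scale
hist(AirPassengers, main = "Histogram for Air Passengers", xlab = "Passengers", border = "blue", col = "green", xlim = c(100, 700), las = 1, breaks = 5, probability = TRUE)
# Overlay a kernel density estimate on the histogram
lines(density(AirPassengers))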
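
To make the switch from frequency to probability density more concrete, here is a minimal sketch that inspects the histogram object without drawing it. The variable names (h, bin_widths, density_by_hand, total_area) are ours and do not come from the original post. It assumes only base R and the built-in AirPassengers data, and it illustrates two points: breaks = 5 is treated by hist() as a suggestion, not an exact bin count, and on a density scale the bin areas add up to 1.

```r
# AirPassengers ships with R's built-in datasets package.
# plot = FALSE returns the binning information without drawing a plot.
h <- hist(AirPassengers, breaks = 5, plot = FALSE)

# A single number for breaks is only a suggestion; R chooses "pretty"
# break points, so the actual number of bins may differ from 5.
bin_widths <- diff(h$breaks)
n_bins <- length(h$counts)

# Density is the relative frequency of a bin divided by its width.
density_by_hand <- h$counts / sum(h$counts) / bin_widths

# Summing density times width over all bins gives the total area.
total_area <- sum(h$density * bin_widths)

print(h$breaks)
print(n_bins)
print(total_area)
```

Because the total area is 1, the density curve added with lines(density(AirPassengers)) is on the same scale as the bars, which is why the overlay only lines up when probability = TRUE is set.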

Follow Up:

# Read the chol dataset used in the original post
chol <- read.table(url("http://assets.datacamp.com/blog_assets/chol.txt"), header = TRUE)
# Box plot of age by smoking status
boxplot(AGE ~ SMOKE, data = chol, col = cm.colors(3), border = "blue", ylim = c(10, 60), xlab = "Smoking Status", ylab = "Age")

# Box plot of weight by smoking status
boxplot(WEIGHT ~ SMOKE, data = chol, col = cm.colors(3), border = "blue", ylim = c(50, 150), xlab = "Smoking Status", ylab = "Weight")

We decided to expand on the data from the article and examine whether age and weight are associated with smoking status in the chol dataset. We made two box plots, the first comparing age across smoking statuses. The graph shows that pipe smokers have the lowest median age by a very slight margin, but also the widest range of ages, from around 19 to 58. Non-smokers have a slightly higher median age than pipe and cigarette smokers, but a narrower range of ages overall. This ran against our prediction that non-smokers would have the lowest median age, given the more recent public health emphasis on smoking prevention. Next, we compared weight across smoking statuses. The results were similar to those for age: the box plot showed no clear difference in weight between the groups. Pipe smokers again had the widest range of weights, while non-smokers had the narrowest, indicating that their weights were more tightly clustered. Overall, the box plots showed no apparent association between smoking status and either weight or age in the data we examined, although we did not run a formal statistical test. We also do not know how the chol sample was collected, so sampling bias could have affected the findings we reproduced.
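
The box plots only support a visual comparison. One way to back up that reading with numbers is sketched below. The helper compare_by_group is a hypothetical name of ours, and the Kruskal-Wallis rank-sum test is our own choice for illustration; the original post does not run any test. The sketch assumes a data frame with one numeric column and one grouping column, such as AGE and SMOKE in chol.

```r
# Hypothetical helper (not from the original post): summarise a numeric
# column by group and run a Kruskal-Wallis rank-sum test across the groups.
compare_by_group <- function(data, value, group) {
  x <- data[[value]]
  g <- factor(data[[group]])
  groups <- split(x, g)

  list(
    # Median of each group: the centre line drawn by boxplot()
    medians = vapply(groups, median, numeric(1), na.rm = TRUE),
    # Spread of each group, measured as max minus min
    ranges = vapply(groups, function(v) diff(range(v, na.rm = TRUE)), numeric(1)),
    # A small p-value suggests at least one group differs from the others
    kruskal_p = kruskal.test(x, g)$p.value
  )
}
```

With chol loaded as above, compare_by_group(chol, "AGE", "SMOKE") and compare_by_group(chol, "WEIGHT", "SMOKE") would return the per-group medians and ranges alongside a p-value. A large p-value would be consistent with the box plots, but it would not prove that no association exists.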