

LATE POLICY


Late assignments will not be accepted for credit!

You can still receive the worked answers to use for study.

Partial credit will be given for attempts at working through problems,
so it’s best to always submit HW on time, even if it’s wrong or incomplete!




CODE SHOWCASE

Algebra


PROBLEM
\(103^3\)


CODE

103^3
## [1] 1092727




Trigonometry


PROBLEM
Cosine of my age


CODE

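# Time elapsed between today and the birth date, in days
myage <- difftime(Sys.Date(), "1995-09-17", units = "days")
# Convert days to years (365.25 allows for leap years) and drop the difftime class
myage <- as.numeric(myage / 365.25)
# cos() treats its argument as radians, so the result changes from day to day
cos(myage)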
## [1] -0.2766089




Word Length


PROBLEM
Count the number of letters in the longest town name in Wales:
Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch


CODE

nchar("Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch") 
## [1] 58




CAR ANALYSIS

EXAMINING THE DATASET

Data summary
Name                     mpg
Number of rows           234
Number of columns        11
Column type frequency:
  character              6
  numeric                5
Group variables          None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
manufacturer           0              1    4   10      0        15           0
model                  0              1    2   22      0        38           0
trans                  0              1    8   10      0        10           0
drv                    0              1    1    1      0         3           0
fl                     0              1    1    1      0         5           0
class                  0              1    3   10      0         7           0

Variable type: numeric

skim_variable  n_missing  complete_rate     mean    sd      p0     p25     p50     p75  p100  hist
displ                  0              1     3.47  1.29     1.6     2.4     3.3     4.6     7  ▇▆▆▃▁
year                   0              1  2003.50  4.51  1999.0  1999.0  2003.5  2008.0  2008  ▇▁▁▁▇
cyl                    0              1     5.89  1.61     4.0     4.0     6.0     8.0     8  ▇▁▇▁▇
cty                    0              1    16.86  4.26     9.0    14.0    17.0    19.0    35  ▆▇▃▁▁
hwy                    0              1    23.44  5.95    12.0    18.0    24.0    27.0    44  ▅▅▇▁▁
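
The summary above has the layout of skimr's skim() output. As a minimal sketch, assuming the data is the mpg dataset that ships with ggplot2, the headline numbers (rows, columns, and column type counts) can be checked with base R. The skim() call itself is left commented out because it assumes the skimr package is installed.

```r
library(ggplot2)  # provides the mpg dataset

# Rows and columns of the dataset
dims <- dim(mpg)

# Count columns by type, grouping integer and double columns as "numeric"
type_counts <- table(sapply(mpg, function(col) {
  if (is.numeric(col)) "numeric" else class(col)[1]
}))

dims
type_counts

# Assumption: skimr is installed; this would print the full summary tables
# skimr::skim(mpg)
```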




VISUALIZING THE DATA

From the summary statistics above, we know there are 234 records in our dataset, but the scatter plot appears to show far fewer. This is because many points overlap.

There are many ways to adjust our visualization. Some simple approaches include making the points partially transparent (alpha), slightly offsetting them (jitter), changing their shapes, or any combination of these.
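
The transparency and jitter adjustments can be sketched as follows. This is only an illustration: the axes (displ against hwy), the alpha value, and the jitter widths are assumptions, since the variables and settings behind the original plots are not shown.

```r
library(ggplot2)  # provides mpg and the plotting functions

# Assumption: engine displacement vs. highway mpg as the two axes
base <- ggplot(mpg, aes(x = displ, y = hwy))

# Default: records sharing the same (displ, hwy) values are drawn on top of each other
p_default <- base + geom_point()

# Transparency: darker spots mark positions where several records overlap
p_alpha <- base + geom_point(alpha = 0.3)

# Jitter: a small random offset separates records that share a position
p_jitter <- base + geom_jitter(width = 0.1, height = 0.5)

# Combination: jitter and transparency together
p_combo <- base + geom_jitter(width = 0.1, height = 0.5, alpha = 0.3)

# How many distinct positions the 234 records actually occupy
n_distinct_positions <- nrow(unique(mpg[, c("displ", "hwy")]))
n_distinct_positions
```

Printing any of the p_ objects draws the corresponding plot. Jitter moves points slightly away from their true values, so the offsets should stay small relative to the axis scale.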


DEFAULT


TRANSPARENCY

JITTER



SHAPES


Note: It’s generally not recommended to use more than six shapes per graph because, as you can see, they become very difficult to tell apart. In fact, by default ggplot2 will not draw more than six shapes; doing so requires a scale_shape_manual() override.
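
A minimal sketch of the override, assuming the shapes are mapped to class (which has 7 distinct values according to the summary table) and that displ and hwy are the axes. The shape codes chosen here are arbitrary.

```r
library(ggplot2)  # provides mpg and the plotting functions

# class has one more level than the default shape palette covers
n_classes <- length(unique(mpg$class))

p_shapes <- ggplot(mpg, aes(x = displ, y = hwy, shape = class)) +
  geom_point() +
  # Supply one shape code per class so that no class is dropped
  scale_shape_manual(values = c(0, 1, 2, 3, 4, 5, 6))

n_classes
```

Without the scale_shape_manual() line, ggplot2 warns that its shape palette handles at most six discrete values and leaves the remaining class unplotted.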




COMBINATION




QUESTIONS

Designing My Own Study

1. The object of observation: Car Manufacturers

2. The object of analysis: Car Manufacturers who released a new model every year between 1999 & 2008

3. The population: Car Manufacturers

4. List the available variables: city miles per gallon, highway miles per gallon.

5. Response variable: Highway miles per gallon (hwy)

6. What are you hoping to find out? For cars of the same class, do those produced by manufacturers specializing in 1 or 2 classes have better fuel efficiency than those made by manufacturers producing a variety of classes?
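
One way the question in item 6 could be explored is sketched below. The cutoff of two classes for a "specialist" comes from the question itself, but the labels, the helper names (cars, maker_type), and the choice of a simple mean are illustrative assumptions, not a finished analysis.

```r
library(ggplot2)  # provides the mpg dataset

cars <- as.data.frame(mpg)

# Number of distinct vehicle classes each manufacturer produces
classes_per_maker <- tapply(cars$class, cars$manufacturer,
                            function(x) length(unique(x)))

# Hypothetical grouping: manufacturers with one or two classes are "specialists"
specialists <- names(classes_per_maker)[classes_per_maker <= 2]
cars$maker_type <- ifelse(cars$manufacturer %in% specialists,
                          "specialist", "generalist")

# Mean highway mpg within each class, split by manufacturer type
hwy_by_group <- aggregate(hwy ~ class + maker_type, data = cars, FUN = mean)
hwy_by_group
```

Comparing rows of hwy_by_group that share a class gives a first, purely descriptive look at the question; it does not account for engine size or sample size within each group.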




Average Year

What is the average manufacturing year of the car models in the data set? 2003.5
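
As a quick check, assuming the data is ggplot2's mpg dataset, the average is the mean of the year column:

```r
library(ggplot2)  # provides the mpg dataset

avg_year <- mean(mpg$year)
avg_year
## [1] 2003.5

# The model years that actually occur in the data
sort(unique(mpg$year))
```

The second call is worth running because the average alone hides how the years are distributed.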





Engine Size & Highway MPG

## `geom_smooth()` using formula 'y ~ x'


The correlation coefficient is -0.7660, which indicates a relatively strong negative correlation: as engine displacement increases, highway mpg tends to decrease. We can also see this on the graph.


This suggests that larger car engines have worse fuel efficiency on the highway.
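
The coefficient can be reproduced with cor(), which computes the Pearson correlation by default. The plot code is a sketch: the linear smoother (method = "lm") is an assumption, since the original output only shows that geom_smooth() used the formula y ~ x.

```r
library(ggplot2)  # provides mpg and the plotting functions

# Pearson correlation between engine displacement and highway mpg
r <- cor(mpg$displ, mpg$hwy)
round(r, 3)
## [1] -0.766

# Scatter plot with a fitted trend line (assumed linear)
p_trend <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x)
```

Passing formula = y ~ x explicitly also suppresses the geom_smooth() message shown above.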


ACKNOWLEDGEMENTS

Thank you to:
https://ggplot2.tidyverse.org/ for help with graphs
https://shiny.rstudio.com/ for help with the interactive graph
https://rmarkdown.rstudio.com/articles_intro.html for help with markdown/html