⛔ Late assignments will not be accepted for credit!
⚠ You can still receive the worked answers to use for study.
✔ Partial credit will be given for attempts at working through problems, so it's best to always submit HW on time, even if it's wrong or incomplete!
PROBLEM
\(103^3\)
CODE
103^3
## [1] 1092727
PROBLEM
Cosine of my age
CODE
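# Age in years: days since the birth date divided by the average year length
myage <- difftime(Sys.Date(), as.Date("1995-09-17"), units = "days")
myage <- as.numeric(myage) / 365.25
cos(myage)  # cos() takes radians; the result changes with the run date
## [1] -0.2766089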
PROBLEM
Count the number of letters in the longest town name in Wales: Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch
CODE
nchar("Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch")
## [1] 58
| Name | mpg |
| Number of rows | 234 |
| Number of columns | 11 |
| _______________________ | |
| Column type frequency: | |
| character | 6 |
| numeric | 5 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| manufacturer | 0 | 1 | 4 | 10 | 0 | 15 | 0 |
| model | 0 | 1 | 2 | 22 | 0 | 38 | 0 |
| trans | 0 | 1 | 8 | 10 | 0 | 10 | 0 |
| drv | 0 | 1 | 1 | 1 | 0 | 3 | 0 |
| fl | 0 | 1 | 1 | 1 | 0 | 5 | 0 |
| class | 0 | 1 | 3 | 10 | 0 | 7 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| displ | 0 | 1 | 3.47 | 1.29 | 1.6 | 2.4 | 3.3 | 4.6 | 7 | ▇▆▆▃▁ |
| year | 0 | 1 | 2003.50 | 4.51 | 1999.0 | 1999.0 | 2003.5 | 2008.0 | 2008 | ▇▁▁▁▇ |
| cyl | 0 | 1 | 5.89 | 1.61 | 4.0 | 4.0 | 6.0 | 8.0 | 8 | ▇▁▇▁▇ |
| cty | 0 | 1 | 16.86 | 4.26 | 9.0 | 14.0 | 17.0 | 19.0 | 35 | ▆▇▃▁▁ |
| hwy | 0 | 1 | 23.44 | 5.95 | 12.0 | 18.0 | 24.0 | 27.0 | 44 | ▅▅▇▁▁ |
From the summary statistics above, we know there are 234 records in our dataset, but the scatter plot appears to show far fewer. This is because many points overlap: cars with identical values are drawn on top of one another.
There are many ways to adjust our visualization. Some simple approaches include making the points partially transparent (alpha), slightly offsetting them (jitter), changing the shapes, or any combination of these.
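The three adjustments can be sketched as follows. This is only an illustration: it assumes the scatter plot shows engine displacement (`displ`) against highway mpg (`hwy`), and the transparency and jitter amounts are arbitrary choices, not values taken from the original graph.

```r
library(ggplot2)  # provides the mpg data set and the plotting functions

cars <- as.data.frame(mpg)

# How much overlap is there? Compare all rows with the distinct (displ, hwy) pairs.
# Assumption: displ and hwy are the two plotted variables.
n_total  <- nrow(cars)
n_unique <- nrow(unique(cars[, c("displ", "hwy")]))

base <- ggplot(cars, aes(x = displ, y = hwy))

# 1. Partial transparency: stacked points appear darker
p_alpha <- base + geom_point(alpha = 0.3)

# 2. Jitter: add a small random offset to each point
p_jitter <- base + geom_jitter(width = 0.1, height = 0.5, alpha = 0.5)

# 3. Shapes: map a categorical variable (here drv, 3 levels) to point shape
p_shape <- base + geom_point(aes(shape = drv))
```

Printing `p_alpha`, `p_jitter`, or `p_shape` draws the corresponding plot. If `n_unique` is smaller than `n_total`, some points are necessarily hidden behind others.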
Note: It’s generally not recommended to use more than six shapes per graph because, as you can see, they become very difficult to tell apart. In fact, ggplot2's default shape palette handles at most six discrete values; any further groups are left unplotted (with a warning) unless you supply a scale_shape_manual() override.
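A minimal sketch of such an override, assuming the shape is mapped to `class` (7 levels according to the summary table) on a `displ` versus `hwy` plot. The particular shape codes are an arbitrary choice for illustration.

```r
library(ggplot2)

n_classes <- length(unique(mpg$class))  # 7 classes in the mpg data

# One shape code per class; these seven codes are hand-picked, not prescribed
class_shapes <- c(0, 1, 2, 3, 4, 5, 6)

p_classes <- ggplot(mpg, aes(x = displ, y = hwy, shape = class)) +
  geom_point() +
  scale_shape_manual(values = class_shapes)  # lifts the six-shape default limit
```

Printing `p_classes` draws all seven classes, though the result still illustrates why so many shapes are hard to read.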
1. The object of observation: Car Manufacturers
2. The object of analysis: Car Manufacturers who released a new model every year between 1999 and 2008
3. The population: Car Manufacturers
4. List the available variables: city miles per gallon, highway miles per gallon.
5. Response variable: Highway miles per gallon (hwy)
6. What are you hoping to find out? For cars of the same class, do those produced by manufacturers specializing in 1 or 2 classes have better fuel efficiency than those made by manufacturers producing a variety of classes?
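One possible way to start on the question in item 6 is sketched below. The labels `"specialist"` and `"generalist"` and the column name `maker_type` are hypothetical names introduced here for illustration; the cut-off of 2 classes comes from the question itself.

```r
library(ggplot2)  # for the mpg data set

cars <- as.data.frame(mpg)

# Number of distinct classes each manufacturer produces
classes_per_maker <- tapply(cars$class, cars$manufacturer,
                            function(x) length(unique(x)))

# Label each car by whether its manufacturer covers 1-2 classes or more
# (hypothetical labels, not part of the data set)
cars$maker_type <- unname(ifelse(classes_per_maker[cars$manufacturer] <= 2,
                                 "specialist", "generalist"))

# Mean highway mpg within each class, split by manufacturer type
hwy_by_group <- aggregate(hwy ~ class + maker_type, data = cars, FUN = mean)
hwy_by_group
```

Comparing the two manufacturer types within the same class row by row gives a first descriptive answer; it does not by itself establish that specialization causes the difference.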
What is the average manufacturing year of the car models in the data set? 2003.5
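This value can be read from the `mean` column of the numeric summary table, or computed directly. A short sketch, assuming the data are the `mpg` data set shipped with ggplot2:

```r
library(ggplot2)  # for the mpg data set

avg_year <- mean(mpg$year)
avg_year
## [1] 2003.5
```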
## `geom_smooth()` using formula 'y ~ x'
The correlation coefficient is -0.7660, which indicates a relatively strong negative correlation, i.e. as the engine displacement (displ) increases, the highway mpg (hwy) tends to decrease. We can also see this on the graph.
This suggests that cars with larger engines have worse fuel efficiency on the highway.
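The coefficient and a graph of this kind can be reproduced roughly as follows. The linear smoother (`method = "lm"`) and the transparency value are assumptions; the exact settings used for the original graph are not recoverable from this page.

```r
library(ggplot2)  # for the mpg data set and plotting

# Pearson correlation between engine displacement and highway mpg
r <- cor(mpg$displ, mpg$hwy)
round(r, 4)

# Scatter plot with a fitted trend line
# Assumption: a straight-line fit; the original smoother is unknown
p_trend <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x)
```

Printing `p_trend` shows the downward-sloping trend that the negative coefficient describes.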
Thank you to:
https://ggplot2.tidyverse.org/ for help with graphs
https://shiny.rstudio.com/ for help with the interactive graph
https://rmarkdown.rstudio.com/articles_intro.html for help with markdown/html