We will start with a little data cleaning to get the data into the form we need for this example. You might not have seen these tools before. Don’t worry, you’re not expected to know them yet, but you’ll have seen at least most of them by the end of the semester!
The bike_full data set has 8 columns:
## # A tibble: 62 × 8
## date month day trips distance move_time cumul_dist cumul_time
## <date> <ord> <int> <int> <dbl> <int> <dbl> <int>
## 1 2023-07-01 July 1 1 2.18 13 2.18 13
## 2 2023-07-02 July 2 2 7.36 46 9.54 59
## 3 2023-07-03 July 3 0 0 0 9.54 59
## 4 2023-07-04 July 4 1 11.2 67 20.7 126
## 5 2023-07-05 July 5 0 0 0 20.7 126
## 6 2023-07-06 July 6 0 0 0 20.7 126
## 7 2023-07-07 July 7 0 0 0 20.7 126
## 8 2023-07-08 July 8 0 0 0 20.7 126
## 9 2023-07-09 July 9 0 0 0 20.7 126
## 10 2023-07-10 July 10 0 0 0 20.7 126
## # ℹ 52 more rows
We want to make a graph that displays the cumulative distance traveled by each day of the month, comparing July and August, like the graph seen in Brightspace.
A line graph is a type of graph that uses a line to play “connect the dots” with the data points. It’s not required, but the x-axis on most line graphs represents time in some way.
Let’s create a blank graph below with day on the x-axis and cumul_dist on the y-axis. Have the x-axis label state “Day of the month” and the y-axis “Total distance traveled (mi)”. Uncomment the code below to change the tick marks on the x-axis.
Save the graph as gg_traveled.
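As a rough sketch of what the blank graph could look like, the code below uses a small made-up stand-in for bike_full (the name toy_bike and its values are hypothetical, chosen only so the code runs on its own). The tick-mark code mentioned above is not shown here.

```r
library(ggplot2)

# Hypothetical stand-in for bike_full: 3 days in each of 2 months (made-up values)
toy_bike <- data.frame(
  month = factor(rep(c("July", "August"), each = 3), levels = c("July", "August")),
  day = rep(1:3, times = 2),
  cumul_dist = c(2, 9, 9, 4, 4, 12)
)

# Map the columns to the axes and set the labels; no geom yet, so the graph is blank
gg_traveled <- ggplot(toy_bike, aes(x = day, y = cumul_dist)) +
  labs(x = "Day of the month", y = "Total distance traveled (mi)")
```

Typing gg_traveled on its own line displays the empty axes.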
Now that we have a blank graph, how do we add a line or lines? Can we just use geom_smooth()?
Not quite. geom_smooth() fits a smooth trend line across the graph. We want a geom that will connect the left-most dot with the next left-most dot, and so on. So which geom should we use?
It’s not much of a surprise that we should use geom_line()! Add it to the gg_traveled graph that was created in the previous code chunk:
Uh, that’s not quite what we wanted. Why are there so many vertical lines?
The way geom_line() works is that it connects the dots from left to right. If there are 2 dots with the same x-value, it will draw a vertical line to connect them!
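The behavior described above can be reproduced on a tiny data set. This is only a sketch: toy_bike is a hypothetical stand-in for bike_full with made-up values, in which each day appears once per month, so every x-value has 2 dots.

```r
library(ggplot2)

# Hypothetical stand-in for bike_full (made-up values)
toy_bike <- data.frame(
  month = factor(rep(c("July", "August"), each = 3), levels = c("July", "August")),
  day = rep(1:3, times = 2),
  cumul_dist = c(2, 9, 9, 4, 4, 12)
)

# No aesthetic is mapped to a categorical column, so all 6 dots form one group
# and geom_line() joins them in order of day, zig-zagging between the months
gg_zigzag <- ggplot(toy_bike, aes(x = day, y = cumul_dist)) +
  geom_line()

# layer_data() shows the data geom_line() actually draws, including its group column
zigzag_data <- layer_data(gg_zigzag)
```

Because both months sit in the same group, the line has to jump vertically wherever two rows share the same day.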
But is that what is going on in our graph?
Instead of geom_line(), add geom_point() to our graph:
We get a better look at the dots being connected now! Since there is a dot for each day in both July and August, by default geom_line() will connect the 1st of July with the 1st of August, then repeat for each day. In the code chunk above, color the dots by month to paint a clearer picture.
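One way to color the dots by month is sketched below, again on a hypothetical toy_bike data frame with made-up values standing in for bike_full.

```r
library(ggplot2)

# Hypothetical stand-in for bike_full (made-up values)
toy_bike <- data.frame(
  month = factor(rep(c("July", "August"), each = 3), levels = c("July", "August")),
  day = rep(1:3, times = 2),
  cumul_dist = c(2, 9, 9, 4, 4, 12)
)

# Mapping color to month inside aes() gives each month its own dot color
gg_points <- ggplot(toy_bike, aes(x = day, y = cumul_dist, color = month)) +
  geom_point()

points_data <- layer_data(gg_points)
```

With the dots colored, it is easier to see that each day has one dot per month stacked at the same x-value.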
So how do we fix it?
From what we saw when we were looking at making bar charts, ggplot() will form groups in the data whenever an aesthetic is mapped to a categorical variable. So how does that help us here?
geom_line() will only play connect-the-dots with points in the same group. So if we have ggplot() form groups in the data, it will draw multiple lines using a single geom_line() call. We just need to map an appropriate aesthetic to the column(s) that form the groups!
Some choices are:
color: the most popular aesthetic to use
linetype: changes how the line is drawn (solid, dashed, dotted, etc.)
group: won’t change the color or linetype of the line that is drawn, but will draw a separate line for each group
Let’s map color to month and see what happens!
Try changing color to linetype and see what changes!
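The sketch below shows both mappings side by side on a hypothetical toy_bike data frame (made-up values standing in for bike_full). In each case the categorical month column splits the data into 2 groups, so a single geom_line() draws 2 lines.

```r
library(ggplot2)

# Hypothetical stand-in for bike_full (made-up values)
toy_bike <- data.frame(
  month = factor(rep(c("July", "August"), each = 3), levels = c("July", "August")),
  day = rep(1:3, times = 2),
  cumul_dist = c(2, 9, 9, 4, 4, 12)
)

# color = month: one line per month, told apart by color
gg_color <- ggplot(toy_bike, aes(x = day, y = cumul_dist, color = month)) +
  geom_line()

# linetype = month: one line per month, told apart by how the line is drawn
gg_linetype <- ggplot(toy_bike, aes(x = day, y = cumul_dist, linetype = month)) +
  geom_line()

# Count the groups that geom_line() sees in each graph
n_groups_color <- length(unique(layer_data(gg_color)$group))
n_groups_linetype <- length(unique(layer_data(gg_linetype)$group))
```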
If we want our graph to look a little more similar to the one created by Strava, let’s change the colors of the lines to black and dark orange for July and August, respectively, using the correct scale_{aesthetic}_{type}() function.
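One function that fits the scale_{aesthetic}_{type}() pattern here is scale_color_manual(), since the aesthetic is color and the colors are chosen by hand. The sketch below is one possible approach, using a hypothetical toy_bike data frame with made-up values in place of bike_full.

```r
library(ggplot2)

# Hypothetical stand-in for bike_full (made-up values)
toy_bike <- data.frame(
  month = factor(rep(c("July", "August"), each = 3), levels = c("July", "August")),
  day = rep(1:3, times = 2),
  cumul_dist = c(2, 9, 9, 4, 4, 12)
)

# A named vector ties each month to its color regardless of level order
gg_strava <- ggplot(toy_bike, aes(x = day, y = cumul_dist, color = month)) +
  geom_line() +
  scale_color_manual(values = c(July = "black", August = "darkorange"))

strava_colors <- unique(layer_data(gg_strava)$colour)
```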
One caution about working with lines: make sure that there aren’t too many groups, otherwise geom_line() may not work correctly!
Let’s look over the example below:
What if I wanted all 31 days to appear on the x-axis? One quick way I could do that is by mapping the x-axis to factor(day) instead of day:
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
Uh oh, what happened?
The message gives a hint: each group consists of only one observation, and it asks whether we need to adjust the group aesthetic.
What that means is that since both x and color are now mapped to categorical columns, the groups are now formed for each combination of month and day. And since each combination of month and day only has 1 row, geom_line() doesn’t have 2 points it can play connect the dots with! geom_point() is unaffected since it just places a dot at each (x, y) coordinate. But if there aren’t 2 points in the same group, geom_line() can’t connect any of the dots!
So what could we do? We can use the group aesthetic to try and fix it!
Including group = month inside aes(), in either ggplot() or geom_line(), will override the initial groups so that the groups are defined by month only.
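The problem and the fix can be compared directly by counting groups. This is a sketch on a hypothetical toy_bike data frame (made-up values standing in for bike_full), with only 3 days per month to keep it short.

```r
library(ggplot2)

# Hypothetical stand-in for bike_full (made-up values)
toy_bike <- data.frame(
  month = factor(rep(c("July", "August"), each = 3), levels = c("July", "August")),
  day = rep(1:3, times = 2),
  cumul_dist = c(2, 9, 9, 4, 4, 12)
)

# Problem: x and color are both categorical, so each month-day pair is its own group
gg_broken <- ggplot(toy_bike, aes(x = factor(day), y = cumul_dist, color = month)) +
  geom_line()

# Fix: group = month overrides the default grouping, leaving one group per month
gg_fixed <- ggplot(toy_bike, aes(x = factor(day), y = cumul_dist,
                                 color = month, group = month)) +
  geom_line()

n_groups_broken <- length(unique(layer_data(gg_broken)$group))
n_groups_fixed <- length(unique(layer_data(gg_fixed)$group))
```

In the broken version every group holds a single row, so there is nothing to connect. In the fixed version each month has 3 rows, which is enough for geom_line() to draw a line.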