First, upload the .Rmd file from Moodle to RStudio. Also be sure to upload 192.ics, which contains some sample data. Once both files are in, open the .Rmd file and click the “Knit” button at the top of the window to put it all together and check that things work.
Once it’s working, edit some of the commands or text in RStudio and click “Knit” again; a window will pop up showing the updated document with your changes. Keep doing this periodically as you make changes, both to see what’s happening and to make sure that what you’ve entered works the way you intended. Eventually, you can even use this document to generate the write-up you will turn in.
This assignment involves the following three key steps:
.ics file.
.ics data with your partner.
You will definitely encounter different types of problems during Steps 1 and 2 above. I therefore recommend you iterate through Steps 1 and 2 in their entirety early and often. That way you can identify and address any problems early on. For a full demonstration of Steps 1 and 2, watch this 6m56s YouTube screencast.
Here is a blog post written by a statistics student who completed a similar assignment. You will notice that her analysis is very in-depth, and that she analyzed her own data.
Don’t collect data on everything! That will eventually make you sick of it. Instead, keep in mind the question you settled on ahead of time. What data do you need to answer it? If it turns out you need more than you can collect, it’s OK to narrow the scope of your question.
In class, I demonstrated how you can upload this file and the .ics file into RStudio. Once you do this, the code that appears here gets the calendar data into R to analyze. The lines beginning with # are comments that briefly explain what’s happening. If there is more analysis or wrangling that you want to do, feel free to ask me how, and I can just give you the code you need. The OPTIONAL lines give instructions for two things you might want to do. You can include them by removing the # symbol at the start of the line. If you include one, make sure the line before it ends with a pipe %>% so that the steps stay chained together.
# Load the packages used below (skip if they are already loaded in a setup chunk)
library(tidyverse)
library(lubridate)
library(ical)
# DO THIS: Replace "192.ics" with the name of your calendar file here
calendar_data <- "192.ics" %>%
  # Use ical package to import into R
  ical_parse_df() %>%
  # Convert to "tibble" data frame format
  as_tibble() %>%
  # Use lubridate package to wrangle dates, times, and timezones. For a list of
  # timezones run the following command in R: OlsonNames()
  mutate(
    start_datetime = with_tz(start, tzone = "America/New_York"),
    end_datetime = with_tz(end, tzone = "America/New_York"),
    duration = end_datetime - start_datetime,
    date = floor_date(start_datetime, unit = "day")
  ) %>%
  # Convert calendar entry to all lowercase and rename:
  mutate(activity = tolower(summary)) %>%
  # Do data wrangling to compute the total duration per day and activity
  group_by(date, activity) %>%
  summarize(duration = sum(duration) %>% as.numeric()) # %>%
  # OPTIONAL: After looking at your data, change the time interval length units
  # if necessary. The line below converts durations from minutes to hours.
  # mutate(hours = duration/60) # %>%
  # OPTIONAL: Filter rows to only include certain dates:
  # filter("2019-09-01" <= date, date <= "2019-09-06")
This line makes sure your data was uploaded by displaying the first few rows:
| date | activity | duration |
|---|---|---|
| 2019-09-02 | sleep | 480 |
| 2019-09-02 | study | 60 |
| 2019-09-03 | exercise | 60 |
| 2019-09-04 | sleep | 960 |
| 2019-09-04 | study | 180 |
| 2019-09-05 | sleep | 540 |
| 2019-09-06 | exercise | 30 |
| 2019-09-06 | study | 90 |
| 2019-09-07 | exercise | 30 |
| 2019-09-07 | sleep | 540 |
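To see what the two OPTIONAL steps in the import code do, here is a minimal sketch on a few made-up events. The `toy_events` data and the names `toy_summary` and `hours` are assumptions for illustration only; they are not part of the assignment files. The sketch also asks for minutes explicitly with `difftime(units = "mins")`, since plain subtraction of date-times lets R choose the units, which is why the import code suggests checking the units after looking at your data.

```r
library(dplyr)
library(lubridate)

# Made-up events standing in for parsed calendar entries (not real data)
toy_events <- tibble(
  summary = c("Sleep", "Study", "Study", "Exercise"),
  start = ymd_hm(c("2019-09-02 00:00", "2019-09-02 10:00",
                   "2019-09-02 14:00", "2019-09-03 17:00"),
                 tz = "America/New_York"),
  end = ymd_hm(c("2019-09-02 08:00", "2019-09-02 10:30",
                 "2019-09-02 14:30", "2019-09-03 18:00"),
               tz = "America/New_York")
)

toy_summary <- toy_events %>%
  mutate(
    activity = tolower(summary),
    # Ask for minutes explicitly instead of letting R pick the units
    duration = as.numeric(difftime(end, start, units = "mins")),
    date = floor_date(start, unit = "day")
  ) %>%
  group_by(date, activity) %>%
  summarize(duration = sum(duration), .groups = "drop") %>%
  # Optional step 1: add a column holding the same durations in hours
  mutate(hours = duration / 60) %>%
  # Optional step 2: keep only certain dates (here, a single day)
  filter(date >= ymd("2019-09-02", tz = "America/New_York"),
         date <= ymd("2019-09-02", tz = "America/New_York"))

toy_summary
```

Each step is joined to the next with `%>%`, so uncommenting an optional line in your own code only works if the line before it ends with a pipe.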
Using glimpse() from the dplyr package gives you an alternative look at your data. It also shows the data type of each column: <dttm> is date-time, <chr> is character (i.e. text), and <dbl> is double (i.e. decimal numerical values).
## Rows: 10
## Columns: 3
## Groups: date [6]
## $ date <dttm> 2019-09-02, 2019-09-02, 2019-09-03, 2019-09-04, 2019-09-04,…
## $ activity <chr> "sleep", "study", "exercise", "sleep", "study", "sleep", "ex…
## $ duration <dbl> 480, 60, 60, 960, 180, 540, 30, 90, 30, 540
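If you want to try glimpse() before your own calendar is imported, the sketch below builds a tiny data frame by retyping the first three rows of the sample table. The name `toy_data` is an assumption for illustration; with your real data you would call `glimpse(calendar_data)` instead.

```r
library(dplyr)

# First three rows of the sample table, retyped by hand (illustrative only)
toy_data <- tibble(
  date = as.POSIXct(c("2019-09-02", "2019-09-02", "2019-09-03"),
                    tz = "America/New_York"),
  activity = c("sleep", "study", "exercise"),
  duration = c(480, 60, 60)
)

# Prints one line per column: its name, its type, and the first few values
glimpse(toy_data)

# The same type information as a named character vector
column_types <- sapply(toy_data, function(col) class(col)[1])
column_types
```

Checking the types early is useful because date arithmetic and plotting behave differently for date-time, text, and numeric columns.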
RStudio’s spreadsheet viewer lets you see all the data at once. In the top-right window labeled “Environment”, click “calendar_data” to open it as a spreadsheet. Note that setting eval=FALSE in this code chunk tells R Markdown not to “evaluate” the chunk, so it is skipped when you knit.
Here you can play around with the data as you like. Remember that when you want to see the result of your code, you can click “Knit” and RStudio will show you the output.
The space below is for you to make a rough draft of your write-up. You could write everything here, click the black downward-pointing arrow next to “Knit”, choose “Knit to PDF”, and submit that. Or you could write everything up in a Google Doc, copy and paste your images from this document, export it as a PDF, and submit that.
Describe the question here.
Describe data visualization #1 (that you’ll create below) here:
# Write your code to create data visualization #1 here. Be sure to label your
# axes and include a title to explain your graphic.
ggplot(calendar_data, mapping = aes(x = date, y = duration, fill = activity)) +
  geom_col()
Describe data visualization #2 (that you’ll create below) here:
# Write your code to create data visualization #2 here. Be sure to label your
# axes and include a title to explain your graphic.
sleephours <- calendar_data %>%
  filter(activity == "sleep")
ggplot(sleephours, mapping = aes(x = date, y = duration)) +
  geom_line()
Notice that this line graph looks strange; that’s because we’re missing data for Sept. 3 and Sept. 6! There was no “sleep” event on those days for ggplot to use. Keep this in mind as you collect data.
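One possible way to make the missing days visible, rather than letting the line run straight across them, is to add a row for every calendar day before plotting. The sketch below is a suggestion, not a required part of the assignment. It retypes the four sleep rows from the sample table, stores the dates as plain `Date` values for simplicity (an assumption; `calendar_data` stores date-times), and uses `complete()` from the tidyr package. The names `sleep_toy`, `sleep_complete`, and `sleep_plot` are made up for illustration. It also labels the axes and adds a title, as the code comments above ask.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sleep rows retyped from the sample table (minutes per day, illustrative only)
sleep_toy <- tibble(
  date = as.Date(c("2019-09-02", "2019-09-04", "2019-09-05", "2019-09-07")),
  duration = c(480, 960, 540, 540)
)

# Add a row for every day in the range; days with no sleep event get NA
sleep_complete <- sleep_toy %>%
  complete(date = seq(min(date), max(date), by = "day"))

# geom_line() leaves a gap where duration is NA, and geom_point() marks the
# days that do have data, so the missing days are easy to spot
sleep_plot <- ggplot(sleep_complete, mapping = aes(x = date, y = duration)) +
  geom_line() +
  geom_point() +
  labs(
    x = "Date",
    y = "Sleep (minutes)",
    title = "Minutes of sleep recorded per day"
  )

sleep_plot
```

ggplot2 will warn that rows with missing values were removed; that warning is expected here. Whether a missing day should be shown as a gap or as zero minutes depends on your question, so decide that before you plot.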
Describe how these visualizations answer the question here.