Select a column of your data that encodes time (e.g., “date”, “time stamp”, “year”, etc.). Convert this into a Date in R. - Released
bechdel_data_movies$released_as_date <- as.Date(bechdel_data_movies$released, format = "%d %b %Y")
view(bechdel_data_movies$released_as_date)
Choose a column of data to analyze over time. This should be a “response-like” variable that is of particular interest. - binary
Create a tsibble object of just the date and response variable. Then, plot your data over time. Consider different windows of time.
bechdel_ts <- bechdel_data_movies |>
  filter(!is.na(released_as_date)) |>
  select(released_as_date, budget, imdb_id) |>
  as_tsibble(key = imdb_id, index = released_as_date)
bechdel_ts
## # A tsibble: 1,591 x 3 [1D]
## # Key: imdb_id [1,591]
## released_as_date budget imdb_id
## <date> <dbl> <chr>
## 1 1970-06-17 1000000 0065466
## 2 1971-05-21 2500000 0067065
## 3 1971-10-09 2200000 0067116
## 4 1971-07-02 53012938 0067741
## 5 1971-12-29 25000000 0067800
## 6 1972-11-17 4000000 0068156
## 7 1972-03-24 7000000 0068646
## 8 1973-08-22 15700000 0068699
## 9 1979-10-01 12000 0069089
## 10 1973-08-11 777000 0069704
## # ℹ 1,581 more rows
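The `filter(!is.na(released_as_date))` step matters because `as.Date()` returns `NA` for any string that does not match the given format. Here is a minimal sketch of that behavior. The example strings are made up for illustration and are not taken from the dataset, and `%b` relies on an English locale for the month abbreviations.

```r
# Hypothetical example strings; the real `released` column may look different
released <- c("17 Jun 1970", "21 May 1971", "N/A", "")

# Same format string as in the notebook: day, abbreviated month, 4-digit year
parsed <- as.Date(released, format = "%d %b %Y")
parsed

# Strings that do not match the format become NA instead of raising an error
is.na(parsed)

# Keep only the values that parsed successfully
parsed_ok <- parsed[!is.na(parsed)]
parsed_ok
```

Counting the `NA` values with `sum(is.na(parsed))` before filtering is a quick way to see how many rows the tsibble step drops.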
budate_graph <- ggplot(bechdel_ts, aes(x = released_as_date, y = budget)) +
  geom_point() +
  geom_smooth(method = 'lm', color = 'red', se = FALSE) +
  labs(title = 'Movie Gross Unadjusted Budget by Release Date')
budate_graph
## `geom_smooth()` using formula = 'y ~ x'
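To consider different windows of time, one option is to subset on the date column before plotting. The sketch below uses base R on simulated data (the `toy` data frame and the window boundaries are assumptions, not the Bechdel data); in the notebook the same condition could go inside `filter()` on `bechdel_ts`.

```r
set.seed(42)

# Simulated stand-in for the real data (NOT the Bechdel dataset)
toy <- data.frame(
  released_as_date = as.Date("1970-01-01") + sort(sample(0:16000, 300)),
  budget = runif(300, min = 1e5, max = 2e8)
)

# Assumed window of interest: the 2000s
window_start <- as.Date("2000-01-01")
window_end <- as.Date("2009-12-31")

# Keep only rows whose release date falls inside the window
toy_2000s <- subset(
  toy,
  released_as_date >= window_start & released_as_date <= window_end
)

range(toy_2000s$released_as_date)
nrow(toy_2000s)
```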
What stands out immediately?
Use linear regression to detect any upwards or downwards trends.
The fitted slope is about $3,569.83 per day, or roughly $1.3 million per year. Because the budgets are not adjusted for inflation, this positive trend likely reflects a combination of inflation, rising production standards, expensive technological advances, and the industry’s increasing emphasis on high‑budget films.
budget_model <- lm(budget ~ released_as_date, data = bechdel_ts)
budget_model$coefficients
## (Intercept) released_as_date
## 1940050.057 3569.834
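Since R stores a `Date` as the number of days since 1970-01-01, the slope is in dollars per day and the intercept is the fitted budget on 1970-01-01. The arithmetic behind the per-year figure can be sketched as follows; the coefficients are copied from the output above, and 365.25 days per year is my own averaging assumption.

```r
# Coefficients copied from the lm() output above
intercept <- 1940050.057      # fitted budget on 1970-01-01 (Date value 0)
slope_per_day <- 3569.834     # dollars per day

# A Date is stored as days since 1970-01-01
as.numeric(as.Date("1970-01-01"))

# Convert the daily slope to a yearly one (assuming 365.25 days per year)
slope_per_year <- slope_per_day * 365.25
slope_per_year                # about 1.30 million dollars per year

# Fitted budget on an example date, 2000-01-01
day_2000 <- as.numeric(as.Date("2000-01-01"))
fitted_2000 <- intercept + slope_per_day * day_2000
fitted_2000                   # about 41 million dollars
```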
budget_model2 <- loess(budget ~ as.numeric(released_as_date), data = bechdel_ts, span = .5, na.rm = TRUE)
budget_model2
## Call:
## loess(formula = budget ~ as.numeric(released_as_date), data = bechdel_ts,
## span = 0.5, na.rm = TRUE)
##
## Number of Observations: 1591
## Equivalent Number of Parameters: 6.99
## Residual Standard Error: 46640000
There is a modest upward trend.
Do you need to subset the data for multiple trends?
How strong are these trends?
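One way to put a number on trend strength is to look at R-squared together with the slope's standard error, p-value, and confidence interval from `summary()` and `confint()`. This is only a sketch on simulated data: the trend and noise sizes below are assumptions loosely based on the output above, not results from the Bechdel data.

```r
set.seed(1)
n <- 500

# Simulated release dates and budgets (assumed values, for illustration only)
dates <- as.Date("1970-01-01") + sample(0:16000, n, replace = TRUE)
budget <- 2e6 + 3500 * as.numeric(dates) + rnorm(n, sd = 4.5e7)
toy <- data.frame(released_as_date = dates, budget = budget)

fit <- lm(budget ~ released_as_date, data = toy)
fit_summary <- summary(fit)

# Share of budget variance explained by the linear trend
fit_summary$r.squared

# Slope estimate with its standard error and p-value
coef(fit_summary)["released_as_date", c("Estimate", "Std. Error", "Pr(>|t|)")]

# 95% confidence interval for the slope, in dollars per day
slope_ci <- confint(fit, "released_as_date")
slope_ci
```

A slope can be clearly different from zero while R-squared stays low, which would mean a real but weak trend relative to the spread of individual budgets.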
Use smoothing to detect at least one season in your data, and interpret your results.
budate_graph2 <- ggplot(bechdel_ts, aes(x = released_as_date, y = budget)) +
  geom_point() +
  geom_smooth(method = 'loess', span = .75, se = FALSE) +
  labs(title = 'Movie Gross Unadjusted Budget by Release Date with Loess')
budate_graph2
## `geom_smooth()` using formula = 'y ~ x'
There doesn’t appear to be seasonality in this data when fitted with a loess curve. I expected to see some wiggle within years to account for things like summer blockbusters and holiday movies, but that does not appear to be the case. Note that a span of 0.75 uses 75% of the points for each local fit, so it smooths over any within-year pattern; a much smaller span would be needed to reveal one. The curve shows a steady rise until about 2008, where the rate of increase gets somewhat steeper.
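Another way to check for within-year seasonality, independent of the smoothing span, is to group budgets by calendar month and compare a summary such as the median. The sketch below uses simulated data with a summer bump that I built in on purpose, so it only shows what the check looks like when a season exists; it says nothing about the Bechdel data itself.

```r
set.seed(7)
n <- 1000

# Simulated release dates over roughly 20 years (NOT the Bechdel dataset)
dates <- as.Date("1990-01-01") + sample(0:7300, n, replace = TRUE)
month_num <- as.integer(format(dates, "%m"))

# Assumed seasonal effect: June and July releases get a larger budget
budget <- 3e7 + ifelse(month_num %in% c(6, 7), 2e7, 0) + rnorm(n, sd = 1e7)

# Median budget per calendar month, pooled across all years
monthly_median <- tapply(budget, month_num, median)

# In millions of dollars; months 6 and 7 should stand out here by construction
round(monthly_median / 1e6, 1)
```

On the real data the same grouping could be applied to `released_as_date` and `budget`; a flat profile across the twelve months would support the conclusion that there is no clear season.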
Can you illustrate the seasonality using ACF or PACF?
acfbechdel <- acf(bechdel_ts$budget)
acfbechdel
##
## Autocorrelations of series 'bechdel_ts$budget', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12
## 1.000 0.085 0.054 0.043 0.086 0.062 0.057 0.040 0.072 0.096 0.036 0.050 0.031
## 13 14 15 16 17 18 19 20 21 22 23 24 25
## 0.031 0.048 0.100 0.043 0.059 0.041 0.068 0.043 0.057 0.063 0.073 0.090 0.053
## 26 27 28 29 30 31 32
## 0.074 0.047 0.055 0.058 0.050 0.027 0.063
pacfbechdel <- pacf(bechdel_ts$budget)
pacfbechdel
##
## Partial autocorrelations of series 'bechdel_ts$budget', by lag
##
## 1 2 3 4 5 6 7 8 9 10 11
## 0.085 0.047 0.035 0.078 0.046 0.040 0.023 0.055 0.075 0.009 0.029
## 12 13 14 15 16 17 18 19 20 21 22
## 0.006 0.005 0.027 0.078 0.014 0.033 0.012 0.039 0.012 0.030 0.037
## 23 24 25 26 27 28 29 30 31 32
## 0.037 0.054 0.017 0.039 0.012 0.018 0.024 0.008 -0.010 0.026
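All of the autocorrelations above are small and positive (about 0.03 to 0.10), with no spike at any particular lag. One caveat: `acf()` and `pacf()` count lags in rows, and the rows here are ordered by `imdb_id` with irregular gaps between release dates, so lag 12 does not mean 12 months. To look for a yearly season, the budgets would first need to be aggregated to a regular series, for example one value per month. The sketch below shows what that looks like on a simulated monthly series with an assumed 12-month cycle; none of the numbers come from the Bechdel data.

```r
set.seed(3)

# Simulated regular monthly series with an assumed 12-month cycle
n_months <- 240
month_index <- seq_len(n_months)
monthly_budget <- 3e7 +
  1e7 * sin(2 * pi * month_index / 12) +
  rnorm(n_months, sd = 5e6)

# A ts object with frequency = 12 tells R the data are monthly
monthly_ts <- ts(monthly_budget, start = c(1990, 1), frequency = 12)

# plot = FALSE returns the values without drawing the correlogram
acf_monthly <- acf(monthly_ts, lag.max = 36, plot = FALSE)

# Lags are reported in years here, so 0.5 is 6 months and 1.0 is 12 months
acf_monthly$lag[c(1, 7, 13)]

# Autocorrelation at lags of 0, 6, and 12 months
round(acf_monthly$acf[c(1, 7, 13)], 2)
```

With a real yearly season, the ACF is positive near lags of 12, 24, and 36 months and negative near 6, 18, and 30; a flat, low ACF on a regular monthly series would be better evidence of no seasonality than the row-order ACF above.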