library(dplyr)
library(pageviews)
library(tsibble)
library(tidyr)
library(imputeTS)
library(ggplot2)
library(fable)
library(feasts)
laptop_prices <- read.csv("~/Documents/statistics(1)/annotated-laptop_prices_reverted.csv")
laptop_views <- article_pageviews(
  project = "en.wikipedia",
  article = "Laptop",
  start = as.Date("2020-01-01"),
  end = as.Date("2024-11-15"),
  user_type = "user",
  granularity = "daily"
)
monthly_views <- laptop_views %>%
  mutate(month = format(date, "%Y-%m")) %>%
  group_by(month) %>%
  summarise(total_views = sum(views, na.rm = TRUE)) %>%
  mutate(date = as.Date(paste0(month, "-01"))) %>%
  select(date, total_views)
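The page-view request ends on 2024-11-15, so the November 2024 total covers only 15 days and will look like a drop in the monthly series. A minimal sketch of one way to flag partial months, using hypothetical toy data (a constant 100 views per day) in place of the article_pageviews() result; the extra columns n_days and mean_daily_views are illustrative and not part of the original pipeline:

```r
library(dplyr)

# Hypothetical daily data standing in for the article_pageviews() result:
# 100 views per day from 2024-10-01 through 2024-11-15
daily <- data.frame(
  date  = seq(as.Date("2024-10-01"), as.Date("2024-11-15"), by = "day"),
  views = 100
)

monthly_check <- daily %>%
  mutate(month = format(date, "%Y-%m")) %>%
  group_by(month) %>%
  summarise(
    total_views = sum(views, na.rm = TRUE),
    n_days = n(),                            # days actually observed in the month
    mean_daily_views = total_views / n_days  # comparable across full and partial months
  )

monthly_check
```

October has 31 days and 3,100 views while November has 15 days and 1,500 views; the totals differ only because November is incomplete, and the daily means are identical.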
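# Pair each month with one price record. This takes the first nrow(monthly_views)
# rows of Price_euros in file order, so it assumes those rows are ordered by month.
laptop_prices_ts <- data.frame(
  date = monthly_views$date,
  price = laptop_prices$Price_euros[1:nrow(monthly_views)],
  views = monthly_views$total_views
)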

laptop_tsibble <- as_tsibble(laptop_prices_ts, index = date) %>%
  fill_gaps()

Response Variable: Price_euros (laptop prices) was selected as the “response-like” variable. Additionally, views (Wikipedia page views) was included for comparative analysis.

laptop_tsibble <- laptop_tsibble %>%
  mutate(
    price = na_interpolation(price),
    views = na_interpolation(views)
  )

sum(is.na(laptop_tsibble$price))  
## [1] 0
sum(is.na(laptop_tsibble$views)) 
## [1] 0
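na_interpolation() from imputeTS uses linear interpolation by default (option = "linear"), so each run of missing values is replaced by points on a straight line between its observed neighbours. A small illustration on a made-up vector:

```r
library(imputeTS)

# A short series with two missing values between known points
x <- c(10, NA, NA, 16, 20)

# Default option = "linear": the gap between 10 and 16 is filled
# with evenly spaced values on the straight line joining them
filled <- na_interpolation(x)
filled
# 10 12 14 16 20
```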

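A tsibble object was created with date as the index and both price and views as variables. Because date is a Date, tsibble infers a one-day interval from the first-of-month dates, so fill_gaps() inserts every day between consecutive months and na_interpolation() fills those days by linear interpolation between the monthly values. The series analysed below is therefore daily, with one observed value per month; this is why both NA counts above are zero.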
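The interval that tsibble infers depends on the class of the index. A minimal sketch with hypothetical toy data (three first-of-month dates) compares a Date index with a yearmonth index. Converting the index with yearmonth() before as_tsibble() is one way to keep the series monthly, although the results reported in this section were computed with the Date index.

```r
library(dplyr)
library(tsibble)

# Hypothetical toy data: three monthly observations stored on the 1st of each month
toy <- data.frame(
  date  = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  value = c(10, 12, 11)
)

# Date index: the gaps are 31 and 29 days, whose greatest common divisor is 1,
# so tsibble infers a daily interval
daily_ts <- as_tsibble(toy, index = date)
interval(daily_ts)
daily_filled <- fill_gaps(daily_ts)    # every day from 2020-01-01 to 2020-03-01
nrow(daily_filled)                     # 61

# yearmonth index: the interval is one month, so fill_gaps() adds nothing here
monthly_ts <- toy %>%
  mutate(date = yearmonth(date)) %>%
  as_tsibble(index = date)
interval(monthly_ts)
monthly_filled <- fill_gaps(monthly_ts)
nrow(monthly_filled)                   # 3
```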

Plotting

# Plot prices and page views over time (views rescaled to thousands)
ggplot(laptop_tsibble, aes(x = date)) +
  geom_line(aes(y = price, color = "Price"), linetype = "dashed") +
  geom_line(aes(y = views / 1000, color = "Page Views (in 1000s)")) +
  scale_color_manual(values = c("Price" = "blue", "Page Views (in 1000s)" = "red")) +
  labs(title = "Laptop Prices and Wikipedia Page Views Over Time",
       x = "Date", y = "Price (EUR) / Page views (thousands)", color = NULL) +
  theme_minimal()

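Plotting both series over time makes it possible to focus on specific periods and reveals spikes, dips, and longer-run trends in prices and page views. Page views are divided by 1,000 so that both series share one y-axis: the axis reads in euros for price and in thousands of views for page views.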
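Dividing views by 1,000 puts both series on one axis, but the two scales remain hard to compare directly. An alternative sketch, using hypothetical toy data with the same column names as laptop_tsibble, reshapes the data with tidyr::pivot_longer() and gives each series its own panel and y-axis via facet_wrap(scales = "free_y"):

```r
library(ggplot2)
library(tidyr)

# Hypothetical toy data with the same column names as laptop_tsibble
toy <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = 12),
  price = seq(1200, 1100, length.out = 12),
  views = seq(350000, 250000, length.out = 12)
)

# Long format: one row per (date, series) pair
long <- pivot_longer(toy, c(price, views), names_to = "series", values_to = "value")

# One panel per series, each with its own y-axis, so no rescaling is needed
p <- ggplot(long, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~ series, ncol = 1, scales = "free_y") +
  labs(x = "Date", y = NULL) +
  theme_minimal()
p
```

The trade-off is that the series no longer overlap, so coinciding spikes must be matched by eye across panels.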

The large spike in page views at the start could correspond to a specific event driving traffic to the “Laptop” Wikipedia page. This may not correspond to changes in laptop prices.

The fluctuations in laptop prices suggest there may be a seasonal effect, possibly linked to factors such as holiday sales or new product releases.

Linear regression models

# linear regression model for prices
price_model <- laptop_tsibble %>%
  model(price_trend = TSLM(price ~ trend()))

# linear regression model for views
views_model <- laptop_tsibble %>%
  model(views_trend = TSLM(views ~ trend()))

tidy(price_model)
## # A tibble: 2 × 6
##   .model      term        estimate std.error statistic  p.value
##   <chr>       <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 price_trend (Intercept) 1212.      21.9         55.4 0       
## 2 price_trend trend()       -0.346    0.0214     -16.1 7.93e-55
tidy(views_model)
## # A tibble: 2 × 6
##   .model      term        estimate std.error statistic  p.value
##   <chr>       <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 views_trend (Intercept)  343192.   21941.       15.6 9.54e-52
## 2 views_trend trend()        -254.      21.5     -11.8 5.51e-31

The intercept of the views model, about 343,192 views, provides the baseline for the trend line.

The negative trend coefficient, about -254, indicates a downward trend in Wikipedia page views.

A negative coefficient -0.3460886 suggests a downward trend in laptop prices.

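The views slope looks small next to the intercept, but trend() counts index steps, and with a Date index each step is one day. Over the roughly 1,770 days from January 2020 to November 2024, a slope of about -254 views per day adds up to a decline of roughly 450,000 views, more than the intercept itself. The downward trend in page views is therefore substantial rather than subtle, and a straight line can only be a rough summary of it.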
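Because the trend slopes are per index step, converting them to more interpretable units is simple arithmetic. The sketch below copies the estimates from the tidy() output and assumes that one index step equals one day (the daily interval a Date index produces after fill_gaps()); the variable names are illustrative.

```r
# Trend coefficients copied from the tidy() output
price_slope <- -0.346   # euros per index step
views_slope <- -254     # views per index step

# Assumption: one index step is one day. Steps between the first and last month start:
n_steps <- as.numeric(as.Date("2024-11-01") - as.Date("2020-01-01"))  # 1766

price_slope * 365.25    # about -126 euros per year
price_slope * n_steps   # about -611 euros over the whole series
views_slope * n_steps   # about -448,564 views over the whole series
```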

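For the price model, both the intercept (p-value reported as 0) and the trend coefficient (p = 7.93e-55) are highly statistically significant. These p-values assume independent errors, however; because most rows are days interpolated between monthly observations, the residuals are strongly autocorrelated and the p-values are likely overstated.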

Laptop prices are experiencing a statistically significant and consistent decline over time.
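The TSLM p-values come from least squares, which assumes uncorrelated errors. A Ljung-Box test on the model residuals is one way to check that assumption. The sketch below uses simulated data (a linear trend plus AR(1) noise, a hypothetical stand-in for the price series); for the real model the same pipeline would start from augment(price_model).

```r
library(dplyr)
library(tsibble)
library(fable)
library(feasts)

set.seed(42)
n <- 200
# Hypothetical series: a linear downward trend plus strongly autocorrelated AR(1) noise
sim <- tsibble(
  step = 1:n,
  y = 1000 - 0.5 * (1:n) + as.numeric(arima.sim(list(ar = 0.9), n = n, sd = 10)),
  index = step
)

fit <- sim %>% model(trend = TSLM(y ~ trend()))

# Ljung-Box test on the innovation residuals: a small lb_pvalue means the
# residuals are autocorrelated, so the usual least-squares standard errors are too small
lb <- augment(fit) %>% features(.innov, ljung_box, lag = 10)
lb
```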

decomposition <- laptop_tsibble %>%
  model(STL(price ~ season(window = "periodic")))

# Plot the decomposition components
components(decomposition) %>%
  autoplot() +
  labs(title = "Seasonal Decomposition of Laptop Prices")

STL decomposition was applied to detect seasonality in price.

The Seasonal Decomposition of Laptop Prices plot breaks the time series into its components: the observed data (price), the trend, two seasonal components (yearly and weekly), and the remainder (residuals). STL estimates a weekly component here because the tsibble index has a daily interval.

Downward Trend: Laptop prices generally declined over the observed period, with a leveling off toward the end.

Annual Seasonality: There’s a clear yearly seasonal effect, suggesting that prices vary predictably within each year.

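No Weekly Seasonality: The weekly seasonal component shows no meaningful pattern. This is expected by construction: the daily values are linear interpolations between monthly observations, so the series contains no genuine within-week variation for STL to detect.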

Residuals: The remainder looks like random noise, suggesting that the trend and seasonal components capture most of the systematic variation.
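With a monthly index, STL estimates only a yearly seasonal component. A sketch using base R's monthly USAccDeaths series as a stand-in (not the laptop data) shows which components feasts returns in that case:

```r
library(dplyr)
library(tsibble)
library(feasts)

# USAccDeaths is a monthly ts from base R's datasets package;
# as_tsibble() gives it a yearmonth index and a `value` column
deaths <- as_tsibble(USAccDeaths)

cmp <- deaths %>%
  model(STL(value ~ season(window = "periodic"))) %>%
  components()

# With a monthly interval there is a season_year column but no season_week column
names(cmp)
```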

ACF and PACF

# ACF for prices
acf(laptop_tsibble$price, main = "ACF of Laptop Prices")

The ACF of laptop prices starts high and decays gradually, a pattern typical of a series with a trend component. Part of this persistence is mechanical: adjacent daily values are linear interpolations between the same pair of monthly observations, so short-lag autocorrelations are close to 1 by construction.
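Linear interpolation alone can produce a slowly decaying ACF. The sketch below draws hypothetical independent monthly values, spaces them 30 days apart, and interpolates the days in between (a simplification, since real months vary in length), then compares the lag-1 autocorrelation before and after:

```r
library(imputeTS)

set.seed(1)
# Hypothetical monthly values drawn independently: no real autocorrelation
monthly <- rnorm(60)

# Place each monthly value 30 days apart and fill the days in between linearly,
# mimicking fill_gaps() followed by na_interpolation()
daily <- rep(NA_real_, 30 * (length(monthly) - 1) + 1)
daily[seq(1, length(daily), by = 30)] <- monthly
daily <- na_interpolation(daily)

r_monthly <- acf(monthly, plot = FALSE)$acf[2]  # lag-1 autocorrelation: small
r_daily   <- acf(daily, plot = FALSE)$acf[2]    # lag-1 autocorrelation: close to 1
c(monthly = r_monthly, daily = r_daily)
```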

#PACF for prices
pacf(laptop_tsibble$price, main = "PACF of Laptop Prices")

The PACF of laptop prices has a significant spike at lag 1. Combined with the gradual decay in the ACF, this suggests the series is non-stationary and may benefit from differencing to achieve stationarity.
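The effect of differencing can be checked by comparing autocorrelations before and after. A minimal sketch on a hypothetical random walk with downward drift (not the laptop data), using tsibble::difference():

```r
library(tsibble)

set.seed(7)
# Hypothetical random walk with downward drift, standing in for a trending price series
y <- cumsum(rnorm(300, mean = -0.3))

# First difference; the first element is NA because it has no predecessor
dy <- difference(y)

r_level <- acf(y, plot = FALSE)$acf[2]        # lag-1 autocorrelation of the level: close to 1
r_diff  <- acf(dy[-1], plot = FALSE)$acf[2]   # lag-1 autocorrelation after differencing: near 0
c(level = r_level, differenced = r_diff)
```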

# ACF for views
acf(laptop_tsibble$views, main = "ACF of Page Views")

The ACF plot for Page Views exhibits a similar pattern to the ACF of laptop prices, with high autocorrelation at small lags and a gradual decline as the lag increases.

#PACF for views
pacf(laptop_tsibble$views, main = "PACF of Page Views")

The PACF plot for Page Views shows a strong spike at lag 1, with other lags close to zero and within the significance bounds.