International Business Machines (IBM) is a global technology company whose stock is widely followed in equity markets.
This report analyzes IBM’s historical daily and monthly stock data, focusing on data quality, trends across different time scales, correlations, and forecasting of closing prices.
Objectives
Clean and prepare IBM stock data (monthly, daily, yearly) for analysis.
Explore trends in closing prices and trading volume across time scales.
Examine correlations among key variables and perform basic hypothesis testing.
Build ARIMA-based forecasts for daily, monthly, and yearly closing prices and interpret the results.
This section retrieves IBM monthly and daily data from Alpha Vantage, converts JSON to data frames, and persists them to MySQL.
## [1] TRUE
## [1] TRUE
## [1] TRUE
This step produces clean, typed monthly and daily IBM stock tables in MySQL and R, providing the basis for all subsequent analysis.
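The JSON-to-table step can be sketched in Python. This is a minimal illustration, not the report's R code. It assumes the Alpha Vantage `TIME_SERIES_MONTHLY` layout, in which a `"Monthly Time Series"` object maps each date to string fields named `"1. open"` through `"5. volume"`. The payload is a hand-written two-month excerpt that uses values from the monthly table in this report. SQLite stands in for MySQL so the example runs without a server. The names `parse_series`, `persist`, and `ibm_monthly` are hypothetical.

```python
from __future__ import annotations

import json
import sqlite3
from datetime import date

# Hand-written excerpt in the assumed Alpha Vantage TIME_SERIES_MONTHLY layout.
# Values are copied from the monthly table in this report; real responses are longer.
RAW = """
{
  "Meta Data": {"2. Symbol": "IBM"},
  "Monthly Time Series": {
    "2025-11-25": {"1. open": "308.0000", "2. high": "324.9000",
                   "3. low": "288.0700", "4. close": "304.4500",
                   "5. volume": "82006551"},
    "2025-10-31": {"1. open": "280.2000", "2. high": "319.3500",
                   "3. low": "263.5623", "4. close": "307.4100",
                   "5. volume": "140510939"}
  }
}
"""

FIELDS = ("open", "high", "low", "close", "volume")


def parse_series(payload: str, series_key: str) -> list[dict]:
    """Turn the nested date -> fields object into typed rows, oldest first."""
    series = json.loads(payload)[series_key]
    rows = []
    for day, values in series.items():
        row = {"timestamp": date.fromisoformat(day)}
        for i, name in enumerate(FIELDS, start=1):
            # Keys look like "1. open"; every value arrives as a string.
            row[name] = float(values[f"{i}. {name}"])
        rows.append(row)
    return sorted(rows, key=lambda r: r["timestamp"])


def persist(rows: list[dict], conn: sqlite3.Connection, table: str) -> None:
    """Write rows to a table (SQLite here, as a stand-in for MySQL)."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(timestamp TEXT PRIMARY KEY, open REAL, high REAL, low REAL, close REAL, volume REAL)"
    )
    conn.executemany(
        f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?, ?, ?, ?)",
        [(r["timestamp"].isoformat(), *(r[f] for f in FIELDS)) for r in rows],
    )
    conn.commit()


if __name__ == "__main__":
    monthly = parse_series(RAW, "Monthly Time Series")
    conn = sqlite3.connect(":memory:")
    persist(monthly, conn, "ibm_monthly")
    print(conn.execute("SELECT COUNT(*), MAX(close) FROM ibm_monthly").fetchone())  # (2, 307.41)
```

The daily series would be parsed the same way with its own top-level key. Sorting oldest-first at this stage avoids carrying the API's newest-first order into later time-series steps.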
Structural Summary and Missing Values
This section inspects the structure, summary statistics, and missing values of the monthly and daily data.
## $timestamp
## [1] "POSIXct" "POSIXt"
##
## $open
## [1] "numeric"
##
## $high
## [1] "numeric"
##
## $low
## [1] "numeric"
##
## $close
## [1] "numeric"
##
## $volume
## [1] "numeric"
##
## timestamp open high low close volume
## 2025-11-25 2025-11-25 308.000 324.90 288.0700 304.45 82006551
## 2025-10-31 2025-10-31 280.200 319.35 263.5623 307.41 140510939
## 2025-09-30 2025-09-30 240.900 288.85 238.2500 282.16 110277863
## 2025-08-29 2025-08-29 251.405 255.00 233.3600 243.49 104957357
## 2025-07-31 2025-07-31 294.550 295.61 252.2200 253.15 109055173
## 2025-06-30 2025-06-30 257.850 296.16 257.2200 294.78 74395935
## timestamp open high low
## Min. :1999-12-31 00:00:00 Min. : 59.2 Min. : 74.2 Min. : 54.01
## 1st Qu.:2006-06-22 12:00:00 1st Qu.:103.9 1st Qu.:110.5 1st Qu.: 96.72
## Median :2012-12-15 12:00:00 Median :130.9 Median :137.2 Median :124.59
## Mean :2012-12-14 06:00:00 Mean :136.7 Mean :144.0 Mean :129.96
## 3rd Qu.:2019-06-07 00:00:00 3rd Qu.:161.8 3rd Qu.:166.3 3rd Qu.:157.31
## Max. :2025-11-25 00:00:00 Max. :308.0 Max. :324.9 Max. :288.07
## close volume
## Min. : 58.31 Min. : 53441600
## 1st Qu.:104.31 1st Qu.: 88677505
## Median :130.97 Median :110776200
## Mean :137.23 Mean :122389348
## 3rd Qu.:162.04 3rd Qu.:145822678
## Max. :307.41 Max. :314972000
## 'data.frame': 312 obs. of 6 variables:
## $ timestamp: POSIXct, format: "2025-11-25" "2025-10-31" ...
## $ open : num 308 280 241 251 295 ...
## $ high : num 325 319 289 255 296 ...
## $ low : num 288 264 238 233 252 ...
## $ close : num 304 307 282 243 253 ...
## $ volume : num 8.20e+07 1.41e+08 1.10e+08 1.05e+08 1.09e+08 ...
## timestamp open high low close volume
## 0 0 0 0 0 0
## $timestamp
## [1] "POSIXct" "POSIXt"
##
## $open
## [1] "numeric"
##
## $high
## [1] "numeric"
##
## $low
## [1] "numeric"
##
## $close
## [1] "numeric"
##
## $volume
## [1] "numeric"
##
## timestamp open high low close volume
## 2025-11-25 2025-11-25 304.34 306.0000 297.06 304.45 2820230
## 2025-11-24 2025-11-24 299.18 307.1800 297.51 304.12 6050640
## 2025-11-21 2025-11-21 293.48 300.4800 291.89 297.44 5710903
## 2025-11-20 2025-11-20 294.64 300.7100 290.16 290.40 5597028
## 2025-11-19 2025-11-19 290.50 291.1099 288.07 288.53 3595912
## 2025-11-18 2025-11-18 297.00 297.0000 289.92 289.95 4861928
## timestamp open high low
## Min. :2025-07-08 00:00:00 Min. :236.2 Min. :238.0 Min. :233.4
## 1st Qu.:2025-08-11 18:00:00 1st Qu.:252.6 1st Qu.:256.6 1st Qu.:249.2
## Median :2025-09-16 12:00:00 Median :280.0 Median :284.0 Median :276.3
## Mean :2025-09-15 18:00:00 Mean :274.0 Mean :277.4 Mean :270.9
## 3rd Qu.:2025-10-21 06:00:00 3rd Qu.:290.7 3rd Qu.:293.5 3rd Qu.:287.9
## Max. :2025-11-25 00:00:00 Max. :319.9 Max. :324.9 Max. :314.5
## close volume
## Min. :234.8 Min. : 2719918
## 1st Qu.:252.9 1st Qu.: 3484176
## Median :281.0 Median : 4591995
## Mean :274.3 Mean : 5339362
## 3rd Qu.:289.6 3rd Qu.: 5848993
## Max. :315.0 Max. :22647720
## 'data.frame': 100 obs. of 6 variables:
## $ timestamp: POSIXct, format: "2025-11-25" "2025-11-24" ...
## $ open : num 304 299 293 295 290 ...
## $ high : num 306 307 300 301 291 ...
## $ low : num 297 298 292 290 288 ...
## $ close : num 304 304 297 290 289 ...
## $ volume : num 2820230 6050640 5710903 5597028 3595912 ...
## timestamp open high low close volume
## 0 0 0 0 0 0
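Both IBM data sets are structurally consistent. The monthly table has 312 observations (1999-12-31 to 2025-11-25), and the daily table has 100 observations (2025-07-08 to 2025-11-25). All price and volume columns are numeric, and no column contains missing values.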
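The checks above list the classes of each column and count missing values per column. In R these are typically produced with `lapply(df, class)` and `colSums(is.na(df))`. A rough Python equivalent over a list of row dictionaries is shown below. The row layout and the helper names `missing_counts` and `column_types` are assumptions for illustration, and the two sample rows come from the daily table above.

```python
from __future__ import annotations

import math
from datetime import date

# Two rows from the daily table; None or NaN would mark a missing value.
ROWS = [
    {"timestamp": date(2025, 11, 25), "open": 304.34, "close": 304.45, "volume": 2820230},
    {"timestamp": date(2025, 11, 24), "open": 299.18, "close": 304.12, "volume": 6050640},
]


def is_missing(value) -> bool:
    """Treat None and float NaN as missing, similar to R's is.na for these types."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_counts(rows: list[dict]) -> dict[str, int]:
    """Count missing entries per column, like colSums(is.na(df))."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            counts[col] = counts.get(col, 0) + int(is_missing(value))
    return counts


def column_types(rows: list[dict]) -> dict[str, set[str]]:
    """Collect the type names seen in each column, to spot mixed or string columns."""
    types = {}
    for row in rows:
        for col, value in row.items():
            types.setdefault(col, set()).add(type(value).__name__)
    return types


if __name__ == "__main__":
    print(missing_counts(ROWS))
    print(column_types(ROWS))
```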
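Outliers are common in financial series, so the high, low, and volume columns of both data sets are screened for them and any flagged values are replaced. In each pair of lines below, the first refers to the monthly data and the second to the daily data, as the reported medians match the respective summaries above.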
## Column 'high': 11 outliers replaced with median (137.23)
## Column 'high': 0 outliers replaced with median (284.01)
## Column 'low': 4 outliers replaced with median (124.59)
## Column 'low': 0 outliers replaced with median (276.30)
## Column 'volume': 7 outliers replaced with median (110776200.00)
## Column 'volume': 8 outliers replaced with median (4591995.00)
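Boxplots show potential outliers in the price and volume series for monthly and daily data.
Values beyond the interquartile thresholds are replaced with the column median. This damps extreme volume spikes. On a price series whose monthly closes range from 58.31 to 307.41, however, the same rule flags recent record prices rather than data errors. In the monthly tables below, the 2025 highs are replaced by 137.23, which is below the close in the same row (for example, close 304.45 against high 137.23 for 2025-11-25).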
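The replacement rule can be sketched as follows. The sketch assumes the usual 1.5 × IQR fences and quartiles computed like R's default `quantile()` (type 7, which Python's `statistics.quantiles(..., method="inclusive")` reproduces). The report does not show whether it used exactly these fences, and the helper names are hypothetical. A second helper checks that each row's high and low still bracket its open and close. That check catches rows such as the 2025-11-25 monthly row after replacement (open 308.00, high 137.23, low 124.59, close 304.45).

```python
from __future__ import annotations

import statistics


def replace_iqr_outliers(values: list[float], k: float = 1.5) -> tuple[list[float], int]:
    """Replace points outside [Q1 - k*IQR, Q3 + k*IQR] with the column median.

    Quartiles use method="inclusive", matching R's default quantile type 7.
    The median is taken over the original values, outliers included.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    med = statistics.median(values)
    flagged = [v < lower or v > upper for v in values]
    cleaned = [med if bad else v for v, bad in zip(values, flagged)]
    return cleaned, sum(flagged)


def ohlc_violations(rows: list[dict]) -> list[int]:
    """Indices of rows whose high/low no longer bracket the open and close."""
    return [
        i
        for i, r in enumerate(rows)
        if r["high"] < max(r["open"], r["close"]) or r["low"] > min(r["open"], r["close"])
    ]


if __name__ == "__main__":
    cleaned, n = replace_iqr_outliers([1.0, 2.0, 3.0, 4.0, 100.0])
    print(cleaned, n)  # [1.0, 2.0, 3.0, 4.0, 3.0] 1
    row = {"open": 308.0, "high": 137.23, "low": 124.59, "close": 304.45}
    print(ohlc_violations([row]))  # [0]
```

Running a check like `ohlc_violations` after any cleaning step is a cheap way to confirm that the treated rows are still valid price bars.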
Conclusion
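After outlier handling, the volume series are less sensitive to extreme values. Closing prices, which feed the t-tests and ARIMA models, are unchanged by this step. The monthly high and low columns, however, now contain median substitutions for the most recent months and should be interpreted with care.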
Date-derived features are added for multi-scale trend analysis.
## timestamp open high low close volume date
## 2025-11-25 2025-11-25 304.34 306.0000 297.06 304.45 2820230 24
## 2025-11-24 2025-11-24 299.18 307.1800 297.51 304.12 6050640 23
## 2025-11-21 2025-11-21 293.48 300.4800 291.89 297.44 5710903 20
## 2025-11-20 2025-11-20 294.64 300.7100 290.16 290.40 5597028 19
## 2025-11-19 2025-11-19 290.50 291.1099 288.07 288.53 3595912 18
## 2025-11-18 2025-11-18 297.00 297.0000 289.92 289.95 4861928 17
## timestamp open high low close volume month
## 2025-11-25 2025-11-25 308.000 137.23 124.59 304.45 82006551 11
## 2025-10-31 2025-10-31 280.200 137.23 124.59 307.41 140510939 10
## 2025-09-30 2025-09-30 240.900 137.23 238.25 282.16 110277863 9
## 2025-08-29 2025-08-29 251.405 137.23 233.36 243.49 104957357 8
## 2025-07-31 2025-07-31 294.550 137.23 124.59 253.15 109055173 7
## 2025-06-30 2025-06-30 257.850 137.23 124.59 294.78 74395935 6
## timestamp open high low close volume year
## 2025-11-25 2025-11-25 308.000 137.23 124.59 304.45 82006551 2025
## 2025-10-31 2025-10-31 280.200 137.23 124.59 307.41 140510939 2025
## 2025-09-30 2025-09-30 240.900 137.23 238.25 282.16 110277863 2025
## 2025-08-29 2025-08-29 251.405 137.23 233.36 243.49 104957357 2025
## 2025-07-31 2025-07-31 294.550 137.23 124.59 253.15 109055173 2025
## 2025-06-30 2025-06-30 257.850 137.23 124.59 294.78 74395935 2025
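Year, month, and day-of-month components are extracted from the timestamps and attached as the year, month, and date columns. In every daily row shown, the extracted day is one less than the calendar date (for example, 24 for 2025-11-25), so day-level results should be read with that offset in mind.
Separate df_daily, df_monthly, and df_yearly objects allow analysis at multiple temporal resolutions. Note that df_yearly holds the same 312 monthly rows as df_monthly, with a year column in place of the month column, rather than one row per year.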
With explicit date features, the analysis can now describe IBM stock behavior across calendar cycles and long-term horizons.
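In the daily table above, the date column reads 24 for 2025-11-25 and 23 for 2025-11-24, one less than the calendar day. This is consistent with midnight-UTC timestamps being shown in a local time zone west of UTC before the day was taken. That cause is an inference, since the report's code and session time zone are not shown. The sketch below uses a fixed UTC-5 offset as an assumed example and a hypothetical helper `date_parts`.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

UTC_MINUS_5 = timezone(timedelta(hours=-5))  # assumed example offset, not the report's setting


def date_parts(stamp: datetime, tz: timezone = timezone.utc) -> tuple[int, int, int]:
    """Return (year, month, day) of an aware timestamp as seen in time zone tz."""
    local = stamp.astimezone(tz)
    return local.year, local.month, local.day


if __name__ == "__main__":
    stamp = datetime(2025, 11, 25, tzinfo=timezone.utc)
    print(date_parts(stamp))               # (2025, 11, 25): extracted in UTC
    print(date_parts(stamp, UTC_MINUS_5))  # (2025, 11, 24): shifted to the previous day
```

Extracting the parts in UTC, or directly from the date string, keeps the day index aligned with the trading date. A shifted first-of-month date also explains how a day index of 31 can appear: 2025-08-01 becomes 2025-07-31 under the same offset.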
Monthly Closing Price Patterns
Monthly patterns are explored using average closing prices by calendar month.
## [1] "Maximum monthly closing value: 307.41"
## [1] "Minimum monthly closing value: 58.31"
## [1] "Correlation of month over close price is -0.00273706398157688"
The plot shows average closing prices for each calendar month, making seasonal tendencies easier to spot.
The Pearson correlation between numeric month and close gives a rough measure of whether prices trend up or down across the year.
With r ≈ -0.003, calendar month has essentially no linear relationship with closing price, so there is no systematic drift from January to December. Any seasonal effect would have to show up as a non-linear pattern in the monthly averages.
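Grouping by calendar month and correlating the numeric month with the close can be sketched in Python as follows. The helpers `mean_by_key` and `pearson` are hypothetical. `pearson` computes the same sample Pearson coefficient as R's default `cor()`. The sample pairs are the six most recent monthly closes from the table above. The full analysis would use all 312 rows, and six points from one year are shown only to keep the example small.

```python
from __future__ import annotations

import math
from collections import defaultdict


def mean_by_key(pairs: list[tuple[int, float]]) -> dict[int, float]:
    """Average the values that share a key, e.g. calendar month -> mean close."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


def pearson(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation, the default method of R's cor(x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# (calendar month, close) pairs from the six most recent monthly rows.
RECENT = [(11, 304.45), (10, 307.41), (9, 282.16), (8, 243.49), (7, 253.15), (6, 294.78)]

if __name__ == "__main__":
    print(mean_by_key(RECENT))
    months, closes = zip(*RECENT)
    print(round(pearson(list(months), list(closes)), 3))
```

Because the month is coded 1 to 12, the correlation only measures a straight-line drift across the calendar. A peak in mid-year, for example, would leave r near zero, which is why the monthly-average plot carries the seasonal information.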
Average closing prices by day of the month highlight short-horizon behavior.
## [1] "Minimum day index in sample: 1"
## [1] "Maximum day index in sample: 31"
## [1] "Correlation of day index over close price is 0.0191385339658155"
Day-of-month averages show whether particular days tend to close higher or lower. The near-zero correlation (r ≈ 0.019) indicates that closing prices do not drift systematically from the start to the end of the month.
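A day-of-week grouping and a day-of-month grouping answer different questions, so the sketch below computes both. It uses Python's `date.weekday()` (Monday = 0) and `date.day`, a hypothetical helper `mean_close_by`, and six daily closes from the table above.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# (trading date, close) pairs taken from the daily table.
DAILY = [
    (date(2025, 11, 25), 304.45),
    (date(2025, 11, 24), 304.12),
    (date(2025, 11, 21), 297.44),
    (date(2025, 11, 20), 290.40),
    (date(2025, 11, 19), 288.53),
    (date(2025, 11, 18), 289.95),
]


def mean_close_by(rows: list[tuple[date, float]], by: str) -> dict:
    """Average close by "weekday" (Mon..Sun labels) or by "day" of month (1..31)."""
    if by not in ("weekday", "day"):
        raise ValueError("by must be 'weekday' or 'day'")
    groups = defaultdict(list)
    for d, close in rows:
        key = WEEKDAYS[d.weekday()] if by == "weekday" else d.day
        groups[key].append(close)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    print(mean_close_by(DAILY, "weekday"))  # Tuesday averages two sessions here
    print(mean_close_by(DAILY, "day"))
```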
Yearly aggregates reveal long-term stock behavior.
## [1] "Minimum year in sample: 1999"
## [1] "Maximum year in sample: 2025"
## [1] "Correlation of year over close price is 0.644367950261839"
Yearly average closing prices summarize IBM’s performance over time, smoothing noise into a structural trend.
The correlation between year and monthly close (r ≈ 0.64) indicates a moderately strong upward long-run trend.
The yearly trend plot and correlation place recent prices at the top of IBM’s historical range: the sample maximum monthly close of 307.41 occurred in October 2025.
Features are scaled to support models and distance-based methods.
Scaled close and volume variables are ready for use in machine learning models or clustering that are sensitive to feature magnitudes.
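Standardization of this kind can be sketched as a z-score. The sketch assumes the report used something equivalent to R's `scale()`, which centers on the mean and divides by the sample standard deviation. The helper `zscore` is hypothetical, and the input is the first five daily volumes from the table above. The t-tests and ARIMA models in this report work on unscaled closes, so scaling matters mainly for scale-sensitive methods such as clustering.

```python
import statistics


def zscore(values):
    """Center on the mean and divide by the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


VOLUMES = [2820230, 6050640, 5710903, 5597028, 3595912]

if __name__ == "__main__":
    print([round(z, 3) for z in zscore(VOLUMES)])
```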
Mean closing prices are compared across aggregation levels.
##
## Welch Two Sample t-test
##
## data: monthly_close and daily_close
## t = -39.778, df = 322.05, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -143.8417 -130.2840
## sample estimates:
## mean of x mean of y
## 137.2306 274.2935
##
## Welch Two Sample t-test
##
## data: yearly_close and monthly_close
## t = 0, df = 622, p-value = 1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.998683 6.998683
## sample estimates:
## mean of x mean of y
## 137.2306 137.2306
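A Welch t-test compares the mean monthly close (137.23, covering December 1999 to November 2025) with the mean daily close (274.29, covering July to November 2025). The difference of about 137 (95% CI -143.84 to -130.28, p < 2.2e-16) reflects the different periods the two samples span rather than the aggregation level itself, because the daily window lies entirely in 2025, near the top of the historical range.
A second Welch t-test compares yearly and monthly closes. Because df_yearly contains the same 312 monthly observations with an added year column, the two samples are identical: the means match exactly, t = 0, and p = 1 by construction, so this test cannot reveal structural differences between time scales.
Successive closing prices are also strongly autocorrelated, which violates the independence assumption of both tests, so the p-values are best read as descriptive rather than as formal evidence.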
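The Welch statistic and its Welch-Satterthwaite degrees of freedom are simple to compute directly. The computation also explains the second test's output. For two identical samples of size n, the mean difference is zero and the degrees of freedom reduce to 2(n - 1), which is 622 for n = 312. The sketch covers only t and df, since the p-value needs a t-distribution CDF, which is omitted here. `welch_t` is a hypothetical helper.

```python
import math
import statistics


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx = statistics.variance(x) / nx  # squared standard error of mean(x)
    vy = statistics.variance(y) / ny  # squared standard error of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


if __name__ == "__main__":
    sample = [304.45, 304.12, 297.44, 290.40, 288.53, 289.95]
    print(welch_t(sample, sample))  # t = 0 and df = 2 * (6 - 1) = 10, up to rounding
```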
Correlation matrices summarize relationships among prices, volume, and time indices.
## open high low close volume
## open 1.000000000 0.8213881617 0.90260324 0.975381271 -0.49839271
## high 0.821388162 1.0000000000 0.89526094 0.806482201 -0.51151716
## low 0.902603240 0.8952609435 1.00000000 0.903946116 -0.56323815
## close 0.975381271 0.8064822011 0.90394612 1.000000000 -0.50244028
## volume -0.498392708 -0.5115171615 -0.56323815 -0.502440279 1.00000000
## month 0.004574921 -0.0001538866 -0.02535645 -0.002737064 -0.09580112
## month
## open 0.0045749215
## high -0.0001538866
## low -0.0253564498
## close -0.0027370640
## volume -0.0958011160
## month 1.0000000000
## open high low close volume date
## open 1.0000000000 0.98722357 0.994394371 0.97883509 0.05291925 0.0009727259
## high 0.9872235733 1.00000000 0.989528537 0.99497227 0.07644769 0.0171448593
## low 0.9943943706 0.98952854 1.000000000 0.98688131 0.03069859 0.0051498119
## close 0.9788350917 0.99497227 0.986881307 1.00000000 0.04111387 0.0191385340
## volume 0.0529192547 0.07644769 0.030698594 0.04111387 1.00000000 0.0561938572
## date 0.0009727259 0.01714486 0.005149812 0.01913853 0.05619386 1.0000000000
## open high low close volume year
## open 1.0000000 0.8213882 0.9026032 0.9753813 -0.4983927 0.6405887
## high 0.8213882 1.0000000 0.8952609 0.8064822 -0.5115172 0.5638781
## low 0.9026032 0.8952609 1.0000000 0.9039461 -0.5632382 0.6155643
## close 0.9753813 0.8064822 0.9039461 1.0000000 -0.5024403 0.6443680
## volume -0.4983927 -0.5115172 -0.5632382 -0.5024403 1.0000000 -0.5417292
## year 0.6405887 0.5638781 0.6155643 0.6443680 -0.5417292 1.0000000
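Within the daily data, open, high, low, and close are almost perfectly correlated (about 0.98 or higher). In the monthly data, high and low correlate less with close (0.81 and 0.90) than open does (0.975), plausibly because of the median substitutions made during outlier handling. Volume is essentially uncorrelated with the daily close (r ≈ 0.04) but negatively correlated with the monthly close (r ≈ -0.50). Over the years, closes rise (r ≈ 0.64) while volume falls (r ≈ -0.54). Calendar month and day of month are nearly uncorrelated with every price field.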
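A correlation matrix like the ones above is the pairwise Pearson coefficient for every pair of columns, as R's `cor()` computes on a numeric data frame. The minimal Python sketch below uses hypothetical helper names and the four price columns from the first six daily rows.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_matrix(columns: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Pairwise correlations for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# First six daily rows (2025-11-25 back to 2025-11-18).
DAILY_PRICES = {
    "open": [304.34, 299.18, 293.48, 294.64, 290.50, 297.00],
    "high": [306.00, 307.18, 300.48, 300.71, 291.1099, 297.00],
    "low": [297.06, 297.51, 291.89, 290.16, 288.07, 289.92],
    "close": [304.45, 304.12, 297.44, 290.40, 288.53, 289.95],
}

if __name__ == "__main__":
    for name, row in corr_matrix(DAILY_PRICES).items():
        print(f"{name:>5}", " ".join(f"{v:6.3f}" for v in row.values()))
```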
ARIMA models capture temporal dependence and generate forecasts.
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 4.333333 290.6610 284.1544 297.1677 280.7100 300.6121
## 4.366667 290.6255 281.4237 299.8272 276.5526 304.6983
## 4.400000 290.9692 279.6994 302.2390 273.7335 308.2049
## 4.433333 290.5794 277.5661 303.5926 270.6773 310.4814
## 4.466667 291.2774 276.7282 305.8267 269.0262 313.5286
## 4.500000 291.0759 275.1380 307.0138 266.7010 315.4508
## 4.533333 290.2896 273.0747 307.5045 263.9617 316.6175
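ARIMA models for the daily, monthly, and yearly closing-price series use automatic order selection to match the observed autocorrelation. The table above lists the first seven daily forecasts.
Forecast plots show point predictions with 80% and 95% prediction intervals for IBM closing prices over short and medium horizons. The 80% band widens from about ±6.5 at the first step to about ±17 at the seventh. One result deserves a check: the first-step forecast of 290.66 has a 95% interval (280.71 to 300.61) that excludes the latest observed close of 304.45 on 2025-11-25. This is consistent with the daily series having been passed to the model in its stored newest-first order. If that is the case, the series should be sorted oldest-first before fitting.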
ARIMA forecasts provide a quantitative baseline for IBM’s future closing prices, illustrating expected trajectories and uncertainty bands while acknowledging that real-world events can shift prices outside model assumptions.
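Two preparation details can be sketched in Python: putting the closes in chronological order before fitting, and reading the forecast row labels. The labels 4.333333 through 4.533333 are consistent with a `ts` object with start 1 and frequency 30 holding the 100 daily closes. The i-th observation of such a series sits at time 1 + (i - 1)/30, so the h-step forecast sits at 1 + (100 + h - 1)/30. That reading is an inference from the printed labels, not something the report states. The helpers `to_chronological` and `forecast_time_labels` are hypothetical, and no model fitting is shown.

```python
from __future__ import annotations

from datetime import date


def to_chronological(rows: list[tuple[date, float]]) -> list[float]:
    """Return closes oldest-first; API tables such as this one arrive newest-first."""
    return [close for _, close in sorted(rows, key=lambda r: r[0])]


def forecast_time_labels(
    n_obs: int, horizon: int, start: float = 1.0, frequency: int = 30
) -> list[float]:
    """Time positions of h-step forecasts for a series starting at `start` with `frequency`."""
    return [start + (n_obs + h - 1) / frequency for h in range(1, horizon + 1)]


if __name__ == "__main__":
    newest_first = [(date(2025, 11, 25), 304.45), (date(2025, 11, 24), 304.12)]
    print(to_chronological(newest_first))  # [304.12, 304.45]: the latest close comes last
    print([round(t, 6) for t in forecast_time_labels(100, 7)])
```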
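IBM’s stock series show clear structure across monthly, daily, and yearly aggregations. Price fields are strongly correlated with one another, and closes trend upward over the years (r ≈ 0.64), while calendar month and day of month show essentially no linear relationship with price.
Outlier handling damps extreme volume values but also overwrites recent monthly highs and lows. Feature scaling prepares close and volume for scale-sensitive methods. The t-tests mainly reflect the different periods covered by each sample, and ARIMA models offer baseline forecasts under historical dynamics.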