Introduction

International Business Machines (IBM) is a global technology company whose stock is widely followed in equity markets.

This report analyzes IBM’s historical daily and monthly stock data, focusing on data quality, trends across different time scales, correlations, and forecasting of closing prices.

Objectives

  • Clean and prepare IBM stock data (monthly, daily, yearly) for analysis.

  • Explore trends in closing prices and trading volume across time scales.

  • Examine correlations among key variables and perform basic hypothesis testing.

  • Build ARIMA-based forecasts for daily, monthly, and yearly closing prices and interpret the results.

Data Acquisition and Storage

API Fetching and MySQL Ingestion

This section retrieves IBM monthly and daily data from Alpha Vantage, converts JSON to data frames, and persists them to MySQL.

## [1] TRUE
## [1] TRUE
## [1] TRUE

Description

  • The code calls Alpha Vantage’s monthly and daily time series endpoints for IBM and converts JSON responses into tidy R data frames.
  • The data frames are saved into MySQL tables, creating a local store for repeated analysis and future extensions.
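
The JSON-to-data-frame step described above might look like the following minimal sketch. It assumes the jsonlite package and the Alpha Vantage response layout in which each date maps to fields named "1. open" through "5. volume". The helper name parse_time_series and the table name in the trailing comment are hypothetical, and the inline payload reuses two daily rows from this report so no network call is needed.

```r
library(jsonlite)

# Hypothetical helper: convert one Alpha Vantage time-series payload to a data frame.
parse_time_series <- function(json_text, series_key) {
  payload <- fromJSON(json_text, simplifyVector = FALSE)
  series <- payload[[series_key]]  # named list, one entry per date
  # Pull one field from every date entry and convert it from string to numeric.
  field <- function(name) {
    as.numeric(vapply(series, function(row) row[[name]], character(1)))
  }
  data.frame(
    timestamp = as.POSIXct(names(series), tz = "UTC"),
    open      = field("1. open"),
    high      = field("2. high"),
    low       = field("3. low"),
    close     = field("4. close"),
    volume    = field("5. volume")
  )
}

sample_json <- '{
  "Time Series (Daily)": {
    "2025-11-25": {"1. open": "304.34", "2. high": "306.0000", "3. low": "297.06",
                   "4. close": "304.45", "5. volume": "2820230"},
    "2025-11-24": {"1. open": "299.18", "2. high": "307.1800", "3. low": "297.51",
                   "4. close": "304.12", "5. volume": "6050640"}
  }
}'

df_daily <- parse_time_series(sample_json, "Time Series (Daily)")
print(df_daily)

# Persisting to MySQL would then be a single DBI call, assuming `con` is an
# open DBI connection (the table name is illustrative):
# DBI::dbWriteTable(con, "ibm_daily", df_daily, overwrite = TRUE)
```

Converting the string fields to numeric at parse time is what makes the later structural checks report numeric columns.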

Conclusion

This step produces clean, typed monthly and daily IBM stock tables in MySQL and R, providing the basis for all subsequent analysis.

Data Structure and Preprocessing

Structural Summary and Missing Values

This section inspects structure, summary statistics, and missing values for monthly and daily data.

## $timestamp
## [1] "POSIXct" "POSIXt" 
## 
## $open
## [1] "numeric"
## 
## $high
## [1] "numeric"
## 
## $low
## [1] "numeric"
## 
## $close
## [1] "numeric"
## 
## $volume
## [1] "numeric"
## 
##             timestamp    open   high      low  close    volume
## 2025-11-25 2025-11-25 308.000 324.90 288.0700 304.45  82006551
## 2025-10-31 2025-10-31 280.200 319.35 263.5623 307.41 140510939
## 2025-09-30 2025-09-30 240.900 288.85 238.2500 282.16 110277863
## 2025-08-29 2025-08-29 251.405 255.00 233.3600 243.49 104957357
## 2025-07-31 2025-07-31 294.550 295.61 252.2200 253.15 109055173
## 2025-06-30 2025-06-30 257.850 296.16 257.2200 294.78  74395935
##    timestamp                        open            high            low        
##  Min.   :1999-12-31 00:00:00   Min.   : 59.2   Min.   : 74.2   Min.   : 54.01  
##  1st Qu.:2006-06-22 12:00:00   1st Qu.:103.9   1st Qu.:110.5   1st Qu.: 96.72  
##  Median :2012-12-15 12:00:00   Median :130.9   Median :137.2   Median :124.59  
##  Mean   :2012-12-14 06:00:00   Mean   :136.7   Mean   :144.0   Mean   :129.96  
##  3rd Qu.:2019-06-07 00:00:00   3rd Qu.:161.8   3rd Qu.:166.3   3rd Qu.:157.31  
##  Max.   :2025-11-25 00:00:00   Max.   :308.0   Max.   :324.9   Max.   :288.07  
##      close            volume         
##  Min.   : 58.31   Min.   : 53441600  
##  1st Qu.:104.31   1st Qu.: 88677505  
##  Median :130.97   Median :110776200  
##  Mean   :137.23   Mean   :122389348  
##  3rd Qu.:162.04   3rd Qu.:145822678  
##  Max.   :307.41   Max.   :314972000  
## 'data.frame':    312 obs. of  6 variables:
##  $ timestamp: POSIXct, format: "2025-11-25" "2025-10-31" ...
##  $ open     : num  308 280 241 251 295 ...
##  $ high     : num  325 319 289 255 296 ...
##  $ low      : num  288 264 238 233 252 ...
##  $ close    : num  304 307 282 243 253 ...
##  $ volume   : num  8.20e+07 1.41e+08 1.10e+08 1.05e+08 1.09e+08 ...
## NULL
## timestamp      open      high       low     close    volume 
##         0         0         0         0         0         0
## $timestamp
## [1] "POSIXct" "POSIXt" 
## 
## $open
## [1] "numeric"
## 
## $high
## [1] "numeric"
## 
## $low
## [1] "numeric"
## 
## $close
## [1] "numeric"
## 
## $volume
## [1] "numeric"
## 
##             timestamp   open     high    low  close  volume
## 2025-11-25 2025-11-25 304.34 306.0000 297.06 304.45 2820230
## 2025-11-24 2025-11-24 299.18 307.1800 297.51 304.12 6050640
## 2025-11-21 2025-11-21 293.48 300.4800 291.89 297.44 5710903
## 2025-11-20 2025-11-20 294.64 300.7100 290.16 290.40 5597028
## 2025-11-19 2025-11-19 290.50 291.1099 288.07 288.53 3595912
## 2025-11-18 2025-11-18 297.00 297.0000 289.92 289.95 4861928
##    timestamp                        open            high            low       
##  Min.   :2025-07-08 00:00:00   Min.   :236.2   Min.   :238.0   Min.   :233.4  
##  1st Qu.:2025-08-11 18:00:00   1st Qu.:252.6   1st Qu.:256.6   1st Qu.:249.2  
##  Median :2025-09-16 12:00:00   Median :280.0   Median :284.0   Median :276.3  
##  Mean   :2025-09-15 18:00:00   Mean   :274.0   Mean   :277.4   Mean   :270.9  
##  3rd Qu.:2025-10-21 06:00:00   3rd Qu.:290.7   3rd Qu.:293.5   3rd Qu.:287.9  
##  Max.   :2025-11-25 00:00:00   Max.   :319.9   Max.   :324.9   Max.   :314.5  
##      close           volume        
##  Min.   :234.8   Min.   : 2719918  
##  1st Qu.:252.9   1st Qu.: 3484176  
##  Median :281.0   Median : 4591995  
##  Mean   :274.3   Mean   : 5339362  
##  3rd Qu.:289.6   3rd Qu.: 5848993  
##  Max.   :315.0   Max.   :22647720  
## 'data.frame':    100 obs. of  6 variables:
##  $ timestamp: POSIXct, format: "2025-11-25" "2025-11-24" ...
##  $ open     : num  304 299 293 295 290 ...
##  $ high     : num  306 307 300 301 291 ...
##  $ low      : num  297 298 292 290 288 ...
##  $ close    : num  304 304 297 290 289 ...
##  $ volume   : num  2820230 6050640 5710903 5597028 3595912 ...
## NULL
## timestamp      open      high       low     close    volume 
##         0         0         0         0         0         0

Description

  • The helper function reports data types, previews rows, descriptive statistics, and counts of missing values for each column.
  • This confirms that numeric fields are correctly typed and any data gaps are visible before modeling.
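
A helper of the kind described above could be sketched as follows. The function name inspect_df and the small demo frame (including its deliberately missing value) are illustrative assumptions. The sequence of base R calls is chosen to produce output of the same shape as the listing above.

```r
# Hypothetical inspection helper built only from base R functions.
inspect_df <- function(df) {
  print(lapply(df, class))         # class of each column
  print(head(df))                  # first rows
  print(summary(df))               # descriptive statistics
  str(df)                          # compact structure
  missing <- colSums(is.na(df))    # NA count per column
  print(missing)
  invisible(missing)
}

# Demo frame with one deliberately missing close value.
demo <- data.frame(
  timestamp = as.POSIXct(c("2025-11-25", "2025-11-24", "2025-11-21"), tz = "UTC"),
  close     = c(304.45, 304.12, NA),
  volume    = c(2820230, 6050640, 5710903)
)

missing_counts <- inspect_df(demo)
```

Returning the missing-value counts invisibly lets a caller test them programmatically without printing them twice.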

Conclusion

Both monthly and daily IBM data sets are structurally consistent, with numeric price and volume columns ready for transformation and modeling.

Outliers: Detection and Treatment

Outliers are common in financial series, so they are visualized and optionally smoothed.

## Column 'high': 11 outliers replaced with median (137.23)
## Column 'high': 0 outliers replaced with median (284.01)
## Column 'low': 4 outliers replaced with median (124.59)
## Column 'low': 0 outliers replaced with median (276.30)
## Column 'volume': 7 outliers replaced with median (110776200.00)
## Column 'volume': 8 outliers replaced with median (4591995.00)

Description

  • Boxplots show potential outliers in price and volume series for monthly and daily data.

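  • Values beyond the interquartile-range thresholds in the high, low, and volume columns are replaced with the column median, which dampens extreme values. In the monthly data this also overwrites recent record prices (for example, the November 2025 high becomes 137.23 while the close remains 304.45), so the treated high and low columns should be read with care.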
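
The replacement rule can be sketched as below. The 1.5 multiplier on the interquartile range is an assumption (the report only says "interquartile thresholds"), and the function name and toy vector are illustrative.

```r
# Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the median of the
# original vector. k = 1.5 is the conventional boxplot rule (assumed here).
replace_outliers_with_median <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  is_outlier <- !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
  med <- median(x, na.rm = TRUE)
  x[is_outlier] <- med
  cat(sprintf("%d outliers replaced with median (%.2f)\n", sum(is_outlier), med))
  x
}

toy <- c(10, 11, 12, 13, 14, 100)   # 100 lies far above the upper fence
cleaned <- replace_outliers_with_median(toy)
print(cleaned)
```

Because the rule works on the whole column at once, a series with a strong trend can have its most recent values flagged simply for being far from the historical median.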

Conclusion

After outlier handling, IBM price and volume series are more robust for correlation analysis and ARIMA modeling.

Date Features: Year, Month, Day

Date-derived features are added for multi-scale trend analysis.

##             timestamp   open     high    low  close  volume date
## 2025-11-25 2025-11-25 304.34 306.0000 297.06 304.45 2820230   24
## 2025-11-24 2025-11-24 299.18 307.1800 297.51 304.12 6050640   23
## 2025-11-21 2025-11-21 293.48 300.4800 291.89 297.44 5710903   20
## 2025-11-20 2025-11-20 294.64 300.7100 290.16 290.40 5597028   19
## 2025-11-19 2025-11-19 290.50 291.1099 288.07 288.53 3595912   18
## 2025-11-18 2025-11-18 297.00 297.0000 289.92 289.95 4861928   17
##             timestamp    open   high    low  close    volume month
## 2025-11-25 2025-11-25 308.000 137.23 124.59 304.45  82006551    11
## 2025-10-31 2025-10-31 280.200 137.23 124.59 307.41 140510939    10
## 2025-09-30 2025-09-30 240.900 137.23 238.25 282.16 110277863     9
## 2025-08-29 2025-08-29 251.405 137.23 233.36 243.49 104957357     8
## 2025-07-31 2025-07-31 294.550 137.23 124.59 253.15 109055173     7
## 2025-06-30 2025-06-30 257.850 137.23 124.59 294.78  74395935     6
##             timestamp    open   high    low  close    volume year
## 2025-11-25 2025-11-25 308.000 137.23 124.59 304.45  82006551 2025
## 2025-10-31 2025-10-31 280.200 137.23 124.59 307.41 140510939 2025
## 2025-09-30 2025-09-30 240.900 137.23 238.25 282.16 110277863 2025
## 2025-08-29 2025-08-29 251.405 137.23 233.36 243.49 104957357 2025
## 2025-07-31 2025-07-31 294.550 137.23 124.59 253.15 109055173 2025
## 2025-06-30 2025-06-30 257.850 137.23 124.59 294.78  74395935 2025

Description

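  • Year, month, and day components are extracted from timestamps and attached to the data frames (the day of month is stored in the date column of the daily frame). In the daily preview the extracted day is one less than the day shown in the timestamp (24 for 2025-11-25), which may indicate a time-zone shift during extraction.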

  • Separate df_daily, df_monthly, and df_yearly objects allow analysis at multiple temporal resolutions.
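
One way to derive these components is sketched below. The helper name add_date_features is hypothetical. Passing an explicit time zone to format() is a design choice made here so that the extracted day always agrees with the printed date.

```r
# Hypothetical helper: attach year, month, and day-of-month columns.
add_date_features <- function(df, tz = "UTC") {
  df$year  <- as.integer(format(df$timestamp, "%Y", tz = tz))
  df$month <- as.integer(format(df$timestamp, "%m", tz = tz))
  df$day   <- as.integer(format(df$timestamp, "%d", tz = tz))
  df
}

demo <- data.frame(
  timestamp = as.POSIXct(c("2025-11-25", "2025-10-31"), tz = "UTC"),
  close     = c(304.45, 307.41)
)

with_features <- add_date_features(demo)
print(with_features)
```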

Conclusion

With explicit date features, the analysis can now describe IBM stock behavior across calendar cycles and long-term horizons.

Trend Analysis

Monthly Closing Price Patterns

Monthly patterns are explored using average closing prices by calendar month.

## [1] "Maximum monthly closing value: 307.41"
## [1] "Minimum monthly closing value: 58.31"

## [1] "Correlation of month over close price is -0.00273706398157688"

Description

  • The plot shows average closing prices for each calendar month, making seasonal tendencies easier to spot.

  • The Pearson correlation between numeric month and close gives a rough measure of whether prices trend up or down across the year.
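
The two quantities described above, per-month averages and the month-to-close correlation, can be computed with base R as in this sketch. The toy values are illustrative and are not IBM data.

```r
# Toy data: two observations for each of three calendar months.
monthly <- data.frame(
  month = c(1, 1, 2, 2, 3, 3),
  close = c(100, 110, 120, 130, 90, 110)
)

# Average close for each calendar month.
avg_by_month <- aggregate(close ~ month, data = monthly, FUN = mean)
print(avg_by_month)

# Pearson correlation between numeric month and close.
r <- cor(monthly$month, monthly$close)
print(r)
```

A correlation near zero means only that there is no linear drift across the year; individual months can still differ, which is why the per-month averages are inspected separately.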

Conclusion

IBM’s monthly averages show which calendar months tend to be stronger or weaker. The correlation between month and close is about -0.003, effectively zero, so there is no systematic linear drift in closing prices across the calendar year.

Daily Trend by Day of Week

Average closing prices by weekday highlight short-horizon behavior.

## [1] "Minimum day index in sample: 1"
## [1] "Maximum day index in sample: 31"

## [1] "Correlation of day index over close price is 0.0191385339658155"

Description

  • The weekday plot shows whether IBM tends to close higher or lower on certain days, which may relate to news flow and trading patterns.
  • The correlation of calendar day-of-month with close provides a coarse check for within-month drift.
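
Day of week and day of month are different features, and the following sketch shows one way to derive both. It uses three daily rows from this report; the variable names are illustrative.

```r
ts <- as.POSIXct(c("2025-11-25", "2025-11-24", "2025-11-21"), tz = "UTC")
close <- c(304.45, 304.12, 297.44)

lt <- as.POSIXlt(ts, tz = "UTC")
day_of_week  <- lt$wday   # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
day_of_month <- lt$mday   # 1 to 31

# Average close per weekday (what a weekday plot would summarize).
avg_by_weekday <- tapply(close, day_of_week, mean)
print(avg_by_weekday)

# Correlation of day-of-month with close (the within-month drift check).
print(cor(day_of_month, close))
```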

Conclusion

Day-of-week averages show any recurring weekly pattern in IBM closing prices. The correlation between day of month and close is about 0.019, so there is no meaningful linear drift within the month.

Yearly Trend in Closing Prices

Yearly aggregates reveal long-term stock behavior.

## [1] "Minimum year in sample: 1999"
## [1] "Maximum year in sample: 2025"

## [1] "Correlation of year over close price is 0.644367950261839"

Description

  • Yearly average closing prices summarize IBM’s performance over time, smoothing noise into a structural trend.

  • Correlation between year and close quantifies whether the long-term trend is generally upward or downward over the sample.

Conclusion

The correlation between year and close is about 0.64, indicating a moderately strong upward long-run trend. The highest monthly close in the sample, 307.41, occurs in October 2025, so recent prices are high relative to history.

Feature Scaling and Hypothesis Testing

Feature Scaling for Modeling

Features are scaled to support models and distance-based methods.

Description

  • Closing prices are standardized (zero mean, unit variance), and volume is rescaled to a common bounded range.
  • This makes features comparable in scale and prevents large-magnitude variables from dominating models.
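
Both transformations are short in base R, as this sketch shows. The target range for volume is not stated in the report, so min-max scaling to [0, 1] is an assumption here. The function names are illustrative and the values are five daily rows from this report.

```r
# z-score: subtract the mean, divide by the sample standard deviation.
standardize <- function(x) (x - mean(x)) / sd(x)

# Min-max scaling to [0, 1] (assumed target range).
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

close  <- c(304.45, 304.12, 297.44, 290.40, 288.53)
volume <- c(2820230, 6050640, 5710903, 5597028, 3595912)

close_scaled  <- standardize(close)
volume_scaled <- min_max(volume)

print(round(close_scaled, 3))
print(round(volume_scaled, 3))
```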

Conclusion

Scaled close and volume variables are ready for use in machine learning models or clustering that are sensitive to feature magnitudes.

Comparing Price Levels: t-Tests

Mean closing prices are compared across aggregation levels.

## 
##  Welch Two Sample t-test
## 
## data:  monthly_close and daily_close
## t = -39.778, df = 322.05, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -143.8417 -130.2840
## sample estimates:
## mean of x mean of y 
##  137.2306  274.2935
## 
##  Welch Two Sample t-test
## 
## data:  yearly_close and monthly_close
## t = 0, df = 622, p-value = 1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.998683  6.998683
## sample estimates:
## mean of x mean of y 
##  137.2306  137.2306

Description

  • A t-test compares average monthly vs daily closing prices to see if the aggregation level changes the estimated mean.

  • A second t-test compares yearly vs monthly average closing prices to test for structural differences.
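
The comparisons above use the Welch test, which is the default of t.test() in base R. The sketch below runs it on six monthly and six daily closes taken from the previews in this report, then compares a sample with itself to show what a t = 0, p = 1 result looks like. The variable names are illustrative.

```r
monthly_close <- c(304.45, 307.41, 282.16, 243.49, 253.15, 294.78)
daily_close   <- c(304.45, 304.12, 297.44, 290.40, 288.53, 289.95)

# Welch two-sample t-test (var.equal = FALSE is the default).
res <- t.test(monthly_close, daily_close)
print(res)

# Comparing a sample with itself: the means are identical, so t = 0 and p = 1,
# and the Welch degrees of freedom reduce to 2 * (n - 1).
res_same <- t.test(monthly_close, monthly_close)
print(res_same$statistic)
print(res_same$p.value)
print(res_same$parameter)
```

With 312 observations per sample, 2 * (n - 1) equals 622, which matches the degrees of freedom reported for the yearly vs monthly test.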

Conclusion

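The monthly and daily means differ significantly (137.23 vs 274.29, p < 2.2e-16), but this mainly reflects the sample periods: the monthly series spans 1999 to 2025, while the daily series covers only 100 trading days from July to November 2025. The yearly vs monthly test returns t = 0 and p = 1 with identical means (137.2306), which is consistent with the yearly data frame holding the same monthly observations, so it provides no evidence of a structural difference.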

Correlation Analysis

Correlation Matrices and Heatmaps

Correlation matrices summarize relationships among prices, volume, and time indices.

##                open          high         low        close      volume
## open    1.000000000  0.8213881617  0.90260324  0.975381271 -0.49839271
## high    0.821388162  1.0000000000  0.89526094  0.806482201 -0.51151716
## low     0.902603240  0.8952609435  1.00000000  0.903946116 -0.56323815
## close   0.975381271  0.8064822011  0.90394612  1.000000000 -0.50244028
## volume -0.498392708 -0.5115171615 -0.56323815 -0.502440279  1.00000000
## month   0.004574921 -0.0001538866 -0.02535645 -0.002737064 -0.09580112
##                month
## open    0.0045749215
## high   -0.0001538866
## low    -0.0253564498
## close  -0.0027370640
## volume -0.0958011160
## month   1.0000000000

##                open       high         low      close     volume         date
## open   1.0000000000 0.98722357 0.994394371 0.97883509 0.05291925 0.0009727259
## high   0.9872235733 1.00000000 0.989528537 0.99497227 0.07644769 0.0171448593
## low    0.9943943706 0.98952854 1.000000000 0.98688131 0.03069859 0.0051498119
## close  0.9788350917 0.99497227 0.986881307 1.00000000 0.04111387 0.0191385340
## volume 0.0529192547 0.07644769 0.030698594 0.04111387 1.00000000 0.0561938572
## date   0.0009727259 0.01714486 0.005149812 0.01913853 0.05619386 1.0000000000

##              open       high        low      close     volume       year
## open    1.0000000  0.8213882  0.9026032  0.9753813 -0.4983927  0.6405887
## high    0.8213882  1.0000000  0.8952609  0.8064822 -0.5115172  0.5638781
## low     0.9026032  0.8952609  1.0000000  0.9039461 -0.5632382  0.6155643
## close   0.9753813  0.8064822  0.9039461  1.0000000 -0.5024403  0.6443680
## volume -0.4983927 -0.5115172 -0.5632382 -0.5024403  1.0000000 -0.5417292
## year    0.6405887  0.5638781  0.6155643  0.6443680 -0.5417292  1.0000000

Description

  • Heatmaps visualize how strongly IBM’s open, high, low, close, and volume move together and how they relate to month, day, or year.
  • Strong correlations among price fields are expected, while volume correlations may signal whether large activity coincides with big moves.
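
A correlation matrix of this kind comes from a single cor() call on the numeric columns. The sketch below uses the six daily rows previewed earlier in this report; the heatmap call in the trailing comment is one base R option and is not necessarily the plotting method used for the report.

```r
prices <- data.frame(
  open   = c(304.34, 299.18, 293.48, 294.64, 290.50, 297.00),
  high   = c(306.0000, 307.1800, 300.4800, 300.7100, 291.1099, 297.0000),
  low    = c(297.06, 297.51, 291.89, 290.16, 288.07, 289.92),
  close  = c(304.45, 304.12, 297.44, 290.40, 288.53, 289.95),
  volume = c(2820230, 6050640, 5710903, 5597028, 3595912, 4861928)
)

# Pearson correlations between every pair of columns.
corr <- cor(prices)
print(round(corr, 3))

# A simple base R heatmap of the matrix:
# heatmap(corr, symm = TRUE)
```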

Conclusion

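The correlation structure confirms tight links among price components, especially in the daily data, where every pair of price fields correlates above 0.97. Volume is moderately negatively correlated with price in the monthly data (about -0.50 with close) and nearly uncorrelated with price in the daily data (about 0.04 with close).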

Time Series Modeling and Forecasting

ARIMA Models on Close Prices

ARIMA models capture temporal dependence and generate forecasts.

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 4.333333       290.6610 284.1544 297.1677 280.7100 300.6121
## 4.366667       290.6255 281.4237 299.8272 276.5526 304.6983
## 4.400000       290.9692 279.6994 302.2390 273.7335 308.2049
## 4.433333       290.5794 277.5661 303.5926 270.6773 310.4814
## 4.466667       291.2774 276.7282 305.8267 269.0262 313.5286
## 4.500000       291.0759 275.1380 307.0138 266.7010 315.4508
## 4.533333       290.2896 273.0747 307.5045 263.9617 316.6175

Description

  • ARIMA models for daily, monthly, and yearly closing price series use automatic order selection to match observed autocorrelation.

  • Forecast plots show point predictions and confidence intervals for IBM closing prices over short and medium horizons.
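
An automatic ARIMA fit and forecast could be sketched as follows, assuming the forecast package. The series is simulated (a random walk, not IBM data), the frequency of 30 is inferred from the 1/30 step of the time index in the forecast table above, and seasonal = FALSE is a simplification made here to keep the example fast.

```r
library(forecast)

set.seed(1)
# Simulated closing prices in chronological order (100 observations).
close <- 250 + cumsum(rnorm(100))

# If the data frame is stored newest-first, as in the previews above, sort it
# before building the series: df <- df[order(df$timestamp), ]
close_ts <- ts(close, frequency = 30)

# Automatic order selection, then a 7-step forecast with 80% and 95% intervals.
fit <- auto.arima(close_ts, seasonal = FALSE)
fc  <- forecast(fit, h = 7, level = c(80, 95))
print(fc)

# plot(fc) draws the point forecasts with shaded interval bands.
```

The series has to be in chronological order before fitting; otherwise the forecast extends from the oldest observation rather than the most recent one.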

Conclusion

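ARIMA forecasts provide a quantitative baseline for IBM’s future closing prices, while real-world events can still move prices outside model assumptions. In the forecast table shown, the point forecasts stay between about 290.3 and 291.3 over seven steps, and the 95% interval widens from 280.7 to 300.6 at the first step to 264.0 to 316.6 at the seventh.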

Overall Conclusions

  • IBM’s stock series show strong correlations among price fields and a moderately strong positive relationship between closing price and year (about 0.64), but essentially no linear relationship with calendar month or day of month.

  • Outlier handling and feature scaling improve stability of estimates, t-tests highlight differences in mean closing prices across time scales, and ARIMA models offer baseline forecasts under historical dynamics.