The primary purpose of technological advancement is to help firms and organizations arrive at the most appropriate decisions regarding production and efficiency. Among the technological advancements that organizations and businesses adopt, data mining stands out for its role in business decision-making. This paper therefore focuses on the impacts of data mining on business decisions and how these decisions help improve overall business performance. The broader question is: why do businesses carry out data mining? Businesses compete in the global market to retain their market power and relevance and to acquire customers, and they use various data mining techniques to arrive at the most appropriate decisions. This paper employs a univariate time series model to predict the adjusted stock prices of Apple Inc. for the five years from 2023 to 2027. A sample of forty-two annual observations from 1981 to 2022 was downloaded in RStudio using the company's ticker symbol (AAPL). The study also reviews and reports findings from published scholarly articles and journals.
In the past couple of years, businesses around the world have adopted various technological advancements, including but not limited to artificial intelligence, cloud computing and data analytics. The primary purpose of these advancements is to help firms and organizations arrive at the most appropriate decisions regarding production and efficiency. Among them, data mining stands out with respect to business decision-making. In plain terms, data mining is the process of transforming raw data into specific, useful information that supports business decisions. However, scholars define the term differently. For instance, according to Chen, Han, and Yu (1996), data mining is a subset of machine learning in which large raw data sets are explored to derive the information most helpful for business decision-making. Chen, Han, and Yu (1996) argued that data mining not only helps businesses make the most appropriate decisions but also helps them secure and maintain their competitive position in the global market. In this regard, data mining is not a static process but a dynamic and continuous one. Against this background, this paper focuses on the impacts of data mining on business decisions and how these decisions help improve overall business performance. The broader question is: why do businesses carry out data mining? In the global market, businesses compete to retain their market power and relevance and to acquire customers, and they use various data mining techniques to arrive at the most appropriate decisions. According to Scott (2013), regression analysis is one of the traditional data mining techniques that businesses have applied for many years.
Scott further argued that regression analysis, especially linear regression, helps businesses identify how one business factor, such as demand for a certain commodity, changes in response to a change in another factor, such as consumer preference for that commodity. For instance, real estate owners may want to find out how floor area (in square feet) influences house listing prices in certain areas. Using linear regression, they would fit a model of the linear effect of floor area on listing price, and the results would give them insight into the future of the real estate market. Despite the role linear regression has played in business decisions for many years, the technique has several shortcomings that have reduced its relevance in today's business analytics. Even so, many businesses still employ it, because the alternatives, though often presented as superior, have shortcomings of their own. Overall, this paper aims to evaluate how various data mining techniques help businesses make decisions. The resulting decisions help businesses uncover the following: their competitive advantage in the global market, a better understanding of their customers and clients, new business opportunities, and ways to acquire customers. Further, data mining can significantly improve oversight of business operations.
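A minimal sketch of the real-estate example in R, assuming a small hypothetical set of listings (the areas and prices below are invented for illustration, not data from Scott (2013) or any real market). It fits listing price on floor area with `lm()` and reads the slope as the estimated price change per additional square foot.

```r
# Hypothetical listings (invented values): floor area in square feet, listing price in US dollars.
listings <- data.frame(
  area  = c(850, 1200, 1500, 1800, 2100, 2500),
  price = c(170000, 235000, 290000, 350000, 405000, 480000)
)

# Fit price = b0 + b1 * area by ordinary least squares.
fit <- lm(price ~ area, data = listings)
print(coef(fit))

# b1: estimated change in listing price per additional square foot.
slope <- unname(coef(fit)["area"])

# Predicted listing price for a hypothetical 2,000 sq ft house.
predicted <- unname(predict(fit, newdata = data.frame(area = 2000)))
print(predicted)
```

The slope is the quantity the real estate owners in the example would interpret; with real listings one would also inspect the residuals before relying on the fit.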
As mentioned earlier, data mining is a dynamic and continuous process with several stages. Problem identification is the first stage: the business identifies a problem that is directly linked to its stated objectives and goals. Once the problem has been identified, data exploration follows as the second stage, in which the required data are collected and explored. Thirdly, the data are cleansed using various approaches. Once the data are cleansed, a model is developed to address the problem under consideration. The model is then evaluated to determine whether it actually addresses that problem. Finally, the fully evaluated model is deployed. Research involving data mining takes all of these steps into consideration. Recent work in this field includes, but is not limited to, data spectroscopic clustering, the k-means algorithm for data clustering, the Latent Dirichlet Allocation (LDA) model, and a MATLAB spectral clustering package.
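The stages above can be sketched end to end in R. This is an illustration under stated assumptions, not a prescribed workflow: the advertising-and-sales data are simulated and hypothetical, cleansing is reduced to dropping incomplete rows, the model is a simple linear regression, and evaluation uses the root mean squared error (RMSE) on a held-out 20% of the rows.

```r
set.seed(1)

# 1. Problem: how does advertising spend relate to sales? (hypothetical, simulated data)
n <- 60
ads <- runif(n, min = 10, max = 50)
sales <- 20 + 3 * ads + rnorm(n, sd = 5)
sales[c(5, 17)] <- NA                      # inject two missing values
raw <- data.frame(ads = ads, sales = sales)

# 2. Exploration: quick numeric summary.
print(summary(raw))

# 3. Cleansing: keep complete rows only.
clean <- raw[complete.cases(raw), ]

# 4. Modelling: fit a linear regression on an 80% training split.
train_idx <- sample(seq_len(nrow(clean)), size = floor(0.8 * nrow(clean)))
train <- clean[train_idx, ]
test  <- clean[-train_idx, ]
model <- lm(sales ~ ads, data = train)

# 5. Evaluation: RMSE on the held-out 20%.
rmse <- sqrt(mean((test$sales - predict(model, newdata = test))^2))
print(rmse)

# 6. Deployment: score new observations.
print(predict(model, newdata = data.frame(ads = c(25, 40))))
```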
In a study to establish the factors responsible for the success of startups using User Generated Content (UGC), Saura, Palos-Sanchez, and Grilo (2019) found that “the topics with positive feelings for the identification of key factors for the startup business success are startup tools, technology-based startup, the attitude of the founders, and the startup methodology development.” The study used a Support Vector Machine (SVM), implemented in Python, on data from the Twitter social network, and concluded that businesses currently operate in a technologically competitive environment that requires appropriate technological startup tools. As discussed earlier, businesses globally compete for greater market power and for customer retention, which makes further customer acquisition vital in the current business environment. Businesses and corporations are therefore doing what they can to improve customer acquisition, and many studies have tried to establish which techniques drive it. Customer acquisition applies not only to large corporations but to retail marketing as well. In the study “Data mining and machine learning: developing efficiency for better customer retention,” Kumar, Venkatesh, and Rahman (2021) applied Multi-Variant K-Means clustering to e-commerce data. They found that identifying purchasing patterns and users' interest in a commodity plays a vital role in customer retention and acquisition, and concluded that retention and acquisition were higher once customers' interests and purchase patterns had been identified. According to the available literature, data mining currently plays a central role in many companies' financial projections and performance.
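The kind of segmentation described by Kumar, Venkatesh, and Rahman (2021) can be sketched with base R's `kmeans()`. This is not their implementation: the customer features (annual spend and number of purchases) and all values are hypothetical, simulated with three well-separated groups so the clustering result is easy to read.

```r
set.seed(42)

# Hypothetical customers: annual spend and number of purchases (simulated, three groups of 20).
customers <- data.frame(
  spend     = c(rnorm(20, 200, 30), rnorm(20, 800, 60), rnorm(20, 1500, 90)),
  purchases = c(rnorm(20, 3, 1),    rnorm(20, 10, 2),   rnorm(20, 25, 3))
)

# Standardize so both features contribute on the same scale.
scaled <- scale(customers)

# k-means with 3 clusters; nstart = 25 random starts reduces the risk of a poor local optimum.
km <- kmeans(scaled, centers = 3, nstart = 25)

# Attach segment labels and profile each segment by its average spend and purchases.
customers$segment <- km$cluster
print(aggregate(cbind(spend, purchases) ~ segment, data = customers, FUN = mean))
```

In a retention setting, each segment's profile (for example, high spend with frequent purchases) is what would guide targeted offers.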
For this reason, many organizations gather and store very large amounts of data in their databases; notable examples of organizations with huge database systems include Taobao, Walmart, Facebook and Google. The data stored in these databases come in two main forms: structured and unstructured. According to Yafooz, Bakar, Fahad, and M Mithun (2019), organizations on average achieve a revenue increase of approximately 20% by analyzing structured data and of approximately 80% by analyzing unstructured data (such as XML and RSS feeds). This indicates that data mining not only helps businesses make decisions but also provides insight into available business opportunities. The research discussed above focused primarily on the inferential side of data mining as an aid to decision-making. This paper, by contrast, employs data visualization techniques, including but not limited to scatter plots, dual combination charts and side-by-side bar graphs, to help businesses arrive at the most appropriate decisions. Moreover, the available studies covered various data mining techniques, but few, if any, used the Autoregressive Integrated Moving Average (ARIMA) model to make predictions. This study therefore uses ARIMA modeling to make basic business predictions, which distinguishes it from the studies discussed above.
This section discusses the research design, data collection techniques, and analysis. The paper employs a univariate time series model to predict the adjusted stock prices of Apple Inc. for the next five years. A sample of forty-two annual observations from 1981 to 2022 was downloaded in RStudio using the company's ticker symbol (AAPL). The study also reviews and reports findings from published scholarly articles and journals.
This section discusses empirical literature on data mining and business decision-making. As discussed earlier, data mining is currently at the core of many businesses and organizations.
A couple of years ago, social media platforms such as Facebook, YouTube and Instagram experienced an increase in the number of live streams, and live streaming now plays a significant role in increasing revenue for various organizations. For instance, the study “Data mining analytics investigate Facebook Live stream users’ behavior and business model: The evidence from Thailand” by Liao, Widowati, and Puttong (2022) examines the behavior of Facebook users. The study employed an empirical survey with a sample of approximately 4,000 observations and used k-means clustering. The authors concluded that the growth of Facebook live streams over time had increased the opportunity to gain knowledge about users. In turn, this helps organizations analyze users' preferences, tastes and choices, which ultimately improves customer acquisition and retention. According to Gazzawe and Alturki (2022), “most businesses have invested heavily in the data mining process, which enables them to easily study and analyze the market environment and improve their domination in the market.” The study suggests that acquiring market power and dominance is a product of appropriate data mining techniques. Using categorical data, it found that “the effects of using data mining techniques are huge towards achieving success for various businesses,” and concluded that data mining is highly influential in determining an organization's future financial and overall performance. These results align with those of Francisca et al. (2019), who found in their study “Business Intelligence and Data Mining to Support Sales in Retail” that the success of retail businesses relies on business intelligence and appropriate data mining techniques.
As discussed earlier, many studies have focused on inferential statistics to determine the effect of one variable on another. For instance, according to a report released by the Federal Reserve, median house listing prices were positively related to area in square feet, with a linear association between the two. By contrast, this study employs ARIMA modeling to predict the adjusted stock prices of Apple Inc., an approach that, as noted above, the reviewed data mining studies have rarely used. The results will therefore add to the existing body of literature on data mining.
The Development of Data Mining
Data warehousing dates back to the late 1980s and early 1990s. Its emergence led to an increase in the amount of data collected and stored, which in turn spurred the development of data analytics techniques. Data mining, one of the critical disciplines of data science, emerged in 1995 during the First International Conference on Knowledge Discovery and Data Mining. Broadly speaking, data mining addresses business problems related to business objectives and goals, and the solution to the problem under consideration is reached through data analysis. In other words, data mining can be regarded as a critical subset of data analytics that involves several stages. Some scholars further argue that data mining is a core part of data science. It uses more advanced statistical techniques to derive meaning from raw data. As discussed earlier, data mining proceeds in stages, which are described below.
Data cleaning is the first stage of the data mining process. During this stage, the raw data from the field are examined and cleaned. The process involves removing observations with missing values or observations that are not needed. Alternatively, missing values may be filled in so that the raw data are complete. Outliers are also removed at this stage, and inconsistencies in the data set are resolved.
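A minimal sketch of these cleaning steps in base R, on a small hypothetical data frame (the `id` and `sales` values are invented). It drops an exact duplicate row, fills the missing value with the median of the observed values, and removes outliers using the common rule of 1.5 times the interquartile range (IQR), which is one of several possible outlier criteria rather than the only one.

```r
# Hypothetical raw data: one missing value, one duplicated row (id 4) and one outlier (990).
raw <- data.frame(
  id    = c(1, 2, 3, 4, 4, 5, 6, 7),
  sales = c(120, 135, NA, 128, 128, 131, 990, 126)
)

# 1. Resolve inconsistency: drop exact duplicate rows.
dedup <- raw[!duplicated(raw), ]

# 2. Fill the missing value with the median of the observed values.
med <- median(dedup$sales, na.rm = TRUE)
dedup$sales[is.na(dedup$sales)] <- med

# 3. Remove outliers outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q <- quantile(dedup$sales, c(0.25, 0.75))
iqr <- q[2] - q[1]
keep <- dedup$sales >= q[1] - 1.5 * iqr & dedup$sales <= q[2] + 1.5 * iqr
cleaned <- dedup[keep, ]
print(cleaned)
```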
Data integration combines the various data sets needed for analysis. For instance, addressing an organizational problem may require data from several sources and fields. Combining data sets from these sources can increase the accuracy of the results. Data integration relies on various tools, including but not limited to Microsoft SQL Server and Oracle Data Service Integrator (ODSI).
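Tools such as those named above perform joins across data sources. The same idea can be sketched in base R with `merge()`, which behaves like a SQL join; the two tables (`crm` and `orders`) and their values are hypothetical.

```r
# Two hypothetical sources keyed by customer_id (invented data).
crm <- data.frame(customer_id = c(101, 102, 103, 104),
                  region      = c("North", "South", "East", "West"))
orders <- data.frame(customer_id = c(101, 101, 103, 105),
                     amount      = c(250, 120, 90, 300))

# Inner join: only customers present in both sources (like SQL INNER JOIN).
inner <- merge(crm, orders, by = "customer_id")

# Left join: keep every CRM customer, with NA where no order exists (like SQL LEFT JOIN).
left <- merge(crm, orders, by = "customer_id", all.x = TRUE)

print(inner)
print(left)
```

Customer 105 appears only in `orders`, so it is dropped by both joins; an outer join (`all = TRUE`) would keep it.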
The data mining process may not use every observation of the raw data collected, so it is appropriate to reduce the data to a relevant and manageable size. Data reduction is done using several methods, including but not limited to neural networks, decision trees, and Naïve Bayes. Besides these methods, various data reduction strategies are used, including numerosity reduction and dimensionality reduction.
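The two strategies named above can be sketched in base R. As an assumption for illustration, dimensionality reduction is done with principal component analysis (`prcomp()`) on R's built-in `mtcars` data, keeping the fewest components that explain at least 90% of the variance; numerosity reduction is a simple random sample of half the rows. Neither step is taken from the studies cited here.

```r
# Dimensionality reduction: PCA on the built-in mtcars data (11 numeric columns).
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Smallest number of components whose cumulative share of variance reaches 90%.
k <- which(cumsum(var_explained) >= 0.90)[1]
reduced <- pca$x[, 1:k, drop = FALSE]
cat("Kept", k, "of", ncol(mtcars), "dimensions\n")

# Numerosity reduction: a simple random sample of half the rows.
set.seed(123)
sampled <- mtcars[sample(nrow(mtcars), size = nrow(mtcars) / 2), ]
print(dim(sampled))
```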
The data collected may not be in the form required for analysis, which necessitates data transformation. Transformation helps achieve consistency in the data. It may take the form of normalizing the data or applying a log transformation. Data coding is also one of the most important data transformation techniques.
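A short sketch of the transformations mentioned above, on hypothetical values: min-max normalization to [0, 1], z-score standardization, a log transformation for skewed data, and, as one assumed interpretation of data coding, converting a categorical variable into indicator (dummy) columns with `model.matrix()`.

```r
# Hypothetical, highly skewed revenue figures.
revenue <- c(1200, 5400, 300, 98000, 15000)

# Min-max normalization: rescale to the [0, 1] range.
min_max <- (revenue - min(revenue)) / (max(revenue) - min(revenue))

# z-score standardization: mean 0, standard deviation 1.
z <- as.vector(scale(revenue))

# Log transformation: compresses the long right tail.
log_rev <- log(revenue)

# Data coding (assumed interpretation): one indicator column per category.
segment <- factor(c("retail", "corporate", "retail", "government", "corporate"))
coded <- model.matrix(~ segment - 1)

print(round(min_max, 3))
print(round(log_rev, 2))
print(coded)
```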
In the data mining stage itself, data structuring methods are applied: the data are organized to reveal patterns from which various models are developed. This typically involves data clustering techniques.
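As an illustration of pattern formation by clustering, here is a sketch using agglomerative hierarchical clustering (`hclust()`), chosen only as one common option. The store profiles are hypothetical and built to contain three obvious groups.

```r
# Hypothetical store profiles: weekly footfall and average basket value (invented data).
stores <- data.frame(
  footfall = c(120, 130, 125, 560, 540, 575, 300, 310),
  basket   = c(15, 17, 16, 42, 40, 45, 25, 27)
)

# Euclidean distances on standardized features.
d <- dist(scale(stores))

# Average-linkage agglomerative clustering, then cut the tree into 3 groups.
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 3)
print(groups)
```

Plotting `hc` with `plot(hc)` would show the dendrogram from which the three patterns are read off.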
In the pattern evaluation stage, data patterns are assessed using various data visualization techniques. Commonly used techniques include, but are not limited to, scatter plots, combined line graphs, pie charts, and histograms. Visualization gives in-depth meaning to the data, which makes it particularly useful for audiences with no background in data analytics.
The final stage, knowledge presentation, is where the results of data mining are presented as charts, graphs and tables.
Application of Data Mining in Business Today
In the digitalized business world, data mining plays a key role in the overall performance of many businesses. It is applied in the retail, financial, and insurance markets, among others. For example, retailers use data mining to customize their advertisements to increase overall sales, while financial institutions apply it to predict future borrowing trends and to detect fraudulent transactions. Various organizations also use data mining techniques to predict their future stock prices; this paper uses ARIMA modeling to predict the adjusted stock prices of Apple Inc. for the next five years.
Data mining functions fall into two categories: supervised and unsupervised. The supervised function, also called the directed model, includes regression analysis, classification and attribute importance. In the literature, directed models are also called predictive models because they predict the values of a target variable. For instance, one may want to predict a company's sales over the next few years, months or days. ARIMA modeling, a time series technique in data science, is a widely used method for forecasting the future values of a single variable from its own past. The unsupervised function, by contrast, identifies structure and relationships in the data without a target variable; such models are also called descriptive models.
Sound decision-making is vital to the success of any business. According to Bara and Lungu (2012), data mining serves as a decision support system. They concluded, “In order to make a decision, the managers need knowledge. In the case of massive data amounts, issues may occur because of data analysis and necessary knowledge extract. Data is analyzed through an automated process, known as Knowledge Discovery in data mining techniques.” Data mining can solve various problems related to an organization's objectives and goals. For instance, business managers may want to establish the future performance of adjusted stock prices. Consider the data below, which contain the opening stock price, the lowest and highest stock prices, the closing stock price and the adjusted stock price, among other variables.
library(pacman)
# Install (if needed) and load the packages used below; magrittr supplies the %>% pipe
pacman::p_load(data.table, fixest, BatchGetSymbols, finreportr, ggplot2, lubridate, magrittr)
# Download about 41 years (15,000 days) of yearly AAPL prices from Yahoo Finance
first.date <- Sys.Date() - 15000
last.date <- Sys.Date()
freq.data <- "yearly"
tickers <- c("AAPL")
stocks <- BatchGetSymbols(tickers = tickers,
                          first.date = first.date,
                          last.date = last.date,
                          freq.data = freq.data,
                          do.cache = FALSE,
                          thresh.bad.data = 0)
# Convert to a data.table and sort by ticker and date
stocks_DT <- stocks$df.tickers %>% setDT() %>%
  .[order(ticker, ref.date)]
head(stocks_DT, 5)
ticker ref.date volume price.open price.high price.low price.close
1: AAPL 1982-04-29 17302745600 0.065290 0.155692 0.049107 0.133371
2: AAPL 1983-01-03 44513011200 0.133371 0.282366 0.077009 0.108817
3: AAPL 1984-01-03 41979033600 0.108817 0.153460 0.097656 0.130022
4: AAPL 1985-01-02 45492272000 0.130022 0.138951 0.064732 0.098214
5: AAPL 1986-01-02 53323222400 0.098214 0.195871 0.097098 0.180804
price.adjusted ret.adjusted.prices ret.closing.prices
1: 0.10348108 NA NA
2: 0.08442992 -0.1841029 -0.1841029
3: 0.10088265 0.1948685 0.1948684
4: 0.07620320 -0.2446353 -0.2446355
5: 0.14028388 0.8409186 0.8409188
tail(stocks_DT,5)
ticker ref.date volume price.open price.high price.low price.close
1: AAPL 2019-01-02 28254942800 38.7225 73.4925 35.5000 73.4125
2: AAPL 2020-01-02 39863855600 74.0600 138.7900 53.1525 132.6900
3: AAPL 2021-01-04 22812206100 133.5200 182.1300 116.2100 177.5700
4: AAPL 2022-01-03 22065504500 177.8300 182.9400 125.8700 129.9300
5: AAPL 2023-01-03 6204307300 130.2800 176.3900 124.1700 171.5600
price.adjusted ret.adjusted.prices ret.closing.prices
1: 71.71174 0.8895785 0.8616076
2: 130.73534 0.8230674 0.8074579
3: 176.03275 0.3464818 0.3382320
4: 129.55272 -0.2640419 -0.2682886
5: 171.56000 0.3242485 0.3204033
# View the structure of the data
str(stocks_DT)
Classes 'data.table' and 'data.frame': 42 obs. of 10 variables:
$ ticker : chr "AAPL" "AAPL" "AAPL" "AAPL" ...
$ ref.date : Date, format: "1982-04-29" "1983-01-03" ...
$ volume : num 1.73e+10 4.45e+10 4.20e+10 4.55e+10 5.33e+10 ...
$ price.open : num 0.0653 0.1334 0.1088 0.13 0.0982 ...
$ price.high : num 0.156 0.282 0.153 0.139 0.196 ...
$ price.low : num 0.0491 0.077 0.0977 0.0647 0.0971 ...
$ price.close : num 0.1334 0.1088 0.13 0.0982 0.1808 ...
$ price.adjusted : num 0.1035 0.0844 0.1009 0.0762 0.1403 ...
$ ret.adjusted.prices: num NA -0.184 0.195 -0.245 0.841 ...
$ ret.closing.prices : num NA -0.184 0.195 -0.245 0.841 ...
- attr(*, ".internal.selfref")=<externalptr>
class(stocks_DT)
[1] "data.table" "data.frame"
library(vars)
library(tseries)
library(tidyverse)
library(stargazer)
library(readxl)
library(forecast)
library(ggplot2)
library(ggthemes)
# Yearly AAPL prices, 1981-2022; adjust the path to your copy of data_mining.csv
data_mining <- read.csv("data_mining.csv")
head(data_mining,10)
ticker year volume price.open price.high price.low price.close
1 AAPL 1981 5193395200 0.136719 0.147321 0.063616 0.098772
2 AAPL 1982 21365008000 0.098772 0.155692 0.049107 0.133371
3 AAPL 1983 44513011200 0.133371 0.282366 0.077009 0.108817
4 AAPL 1984 41979033600 0.108817 0.153460 0.097656 0.130022
5 AAPL 1985 45492272000 0.130022 0.138951 0.064732 0.098214
6 AAPL 1986 53323222400 0.098214 0.195871 0.097098 0.180804
7 AAPL 1987 59771308800 0.180246 0.533482 0.179129 0.375000
8 AAPL 1988 41292977600 0.381696 0.426339 0.316964 0.359375
9 AAPL 1989 50905825600 0.359375 0.449777 0.290179 0.314732
10 AAPL 1990 44401940800 0.314732 0.426339 0.216518 0.383929
price.adjusted ret.adjusted.prices ret.closing.prices
1 0.077094 NA NA
2 0.104099 0.35028666 0.35029158
3 0.084934 -0.18410359 -0.18410299
4 0.101485 0.19486896 0.19486845
5 0.076658 -0.24463714 -0.24463552
6 0.141121 0.84091680 0.84091881
7 0.294160 1.08445235 1.07406916
8 0.284300 -0.03351917 -0.04166667
9 0.251429 -0.11562082 -0.12422400
10 0.310517 0.23500869 0.21986007
str(data_mining)
'data.frame': 42 obs. of 10 variables:
$ ticker : chr "AAPL" "AAPL" "AAPL" "AAPL" ...
$ year : int 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 ...
$ volume : num 5.19e+09 2.14e+10 4.45e+10 4.20e+10 4.55e+10 ...
$ price.open : num 0.1367 0.0988 0.1334 0.1088 0.13 ...
$ price.high : num 0.147 0.156 0.282 0.153 0.139 ...
$ price.low : num 0.0636 0.0491 0.077 0.0977 0.0647 ...
$ price.close : num 0.0988 0.1334 0.1088 0.13 0.0982 ...
$ price.adjusted : num 0.0771 0.1041 0.0849 0.1015 0.0767 ...
$ ret.adjusted.prices: num NA 0.35 -0.184 0.195 -0.245 ...
$ ret.closing.prices : num NA 0.35 -0.184 0.195 -0.245 ...
adjusted_price<-ts(data_mining$price.adjusted, start=min(data_mining$year), end=max(data_mining$year), frequency = 1)
head(adjusted_price,5)
Time Series:
Start = 1981
End = 1985
Frequency = 1
[1] 0.077094 0.104099 0.084934 0.101485 0.076658
class(adjusted_price)
[1] "ts"
ts.plot(adjusted_price)
The graph above shows that the adjusted price has generally increased over time since 1981. However, Apple Inc. experienced a rapid increase in its adjusted stock price from 2010 onward.
Test the stationarity of the data under consideration
adf.test(adjusted_price)
Augmented Dickey-Fuller Test
data: adjusted_price
Dickey-Fuller = 3.0647, Lag order = 3, p-value = 0.99
alternative hypothesis: stationary
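The ADF test returns a p-value of 0.99, so we fail to reject the null hypothesis of a unit root: the adjusted price series for Apple Inc. is non-stationary. The stationarity assumption needed to fit ARMA terms to the price levels is therefore violated, and the series must be differenced before modeling, which is what the d = 1 in the ARIMA models below does.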
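The non-stationarity found by the ADF test is what the "I" (integrated) component of ARIMA handles: with d = 1, the model is fitted to year-over-year changes rather than levels. A minimal sketch on a simulated random walk with drift (standing in for, not reproducing, the AAPL series) shows first differencing with `diff()` and, because forecasts made on differences must be converted back to prices, its inverse `diffinv()`.

```r
set.seed(7)

# Simulated random walk with drift: 42 annual values starting in 1981 (hypothetical data).
eps <- rnorm(42)
level <- ts(cumsum(0.5 + eps), start = 1981)

# First difference: the d = 1 step that ARIMA(p, 1, q) applies before fitting ARMA terms.
d1 <- diff(level)

# Differencing is invertible: cumulating the differences from the first value recovers the series.
rebuilt <- diffinv(d1, xi = level[1])

print(head(d1))
```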
acf(adjusted_price)
pacf(adjusted_price)
auto.arima(adjusted_price,ic="aic", trace = TRUE)
ARIMA(2,1,2) with drift : Inf
ARIMA(0,1,0) with drift : 337.8141
ARIMA(1,1,0) with drift : 335.7798
ARIMA(0,1,1) with drift : 331.4016
ARIMA(0,1,0) : 338.018
ARIMA(1,1,1) with drift : 333.3192
ARIMA(0,1,2) with drift : 326.8844
ARIMA(1,1,2) with drift : Inf
ARIMA(0,1,3) with drift : 323.768
ARIMA(1,1,3) with drift : Inf
ARIMA(0,1,4) with drift : Inf
ARIMA(1,1,4) with drift : Inf
ARIMA(0,1,3) : 323.1537
ARIMA(0,1,2) : 324.9334
ARIMA(1,1,3) : 323.6009
ARIMA(0,1,4) : Inf
ARIMA(1,1,2) : 333.3123
ARIMA(1,1,4) : Inf
Best model: ARIMA(0,1,3)
Series: adjusted_price
ARIMA(0,1,3)
Coefficients:
ma1 ma2 ma3
0.7583 0.1441 -0.4715
s.e. 0.1491 0.2060 0.1536
sigma^2 = 129.9: log likelihood = -157.58
AIC=323.15 AICc=324.26 BIC=330.01
# Store the selected model; the stepwise search is identical to the one above, so its trace is suppressed
ARIMAMODEL <- auto.arima(adjusted_price, ic = "aic", trace = FALSE)
summary(ARIMAMODEL)
Series: adjusted_price
ARIMA(0,1,3)
Coefficients:
ma1 ma2 ma3
0.7583 0.1441 -0.4715
s.e. 0.1491 0.2060 0.1536
sigma^2 = 129.9: log likelihood = -157.58
AIC=323.15 AICc=324.26 BIC=330.01
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 1.770245 10.84252 4.712395 -0.3101748 58.17462 0.8311495
ACF1
Training set 0.05248978
acf(ts(ARIMAMODEL$residuals))
pacf(ts(ARIMAMODEL$residuals))
forecasted_adjusted_p<-forecast(ARIMAMODEL,level=c(95), h=5)
forecasted_adjusted_p
Point Forecast Lo 95 Hi 95
2023 99.37089 77.02534 121.7164
2024 93.39623 48.19999 138.5925
2025 106.46761 44.42708 168.5081
2026 106.46761 36.67486 176.2604
2027 106.46761 29.70154 183.2337
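The forecast above shows the projected adjusted stock prices from 2023 to 2027. The projected adjusted price is approximately 99.37089 for 2023, 93.39623 for 2024 and 106.46761 for each of 2025, 2026 and 2027. The point forecast stops changing after 2025 because the selected model has no drift term and only three moving-average terms: beyond three years ahead, the predicted change in price is zero. Meanwhile, the 95% interval widens from roughly 77.03 to 121.72 in 2023 to roughly 29.70 to 183.23 in 2027, so Apple Inc. should treat these projections as a broad planning range rather than precise targets when evaluating its operations.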
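The flat forecasts for 2025 to 2027 can be reproduced in a small simulation. The sketch below is an illustration under stated assumptions, not the paper's code: it simulates an ARIMA(0,1,3) series using the MA coefficients reported above, fits it with base R's `stats::arima()` (rather than `forecast::auto.arima()`), and checks that the point forecasts stop changing after three steps while the standard errors keep growing.

```r
set.seed(2023)

# Simulate an ARIMA(0,1,3) series with MA coefficients equal to the estimates reported above.
y <- arima.sim(model = list(order = c(0, 1, 3), ma = c(0.7583, 0.1441, -0.4715)), n = 60)

# Fit ARIMA(0,1,3); stats::arima includes no mean (drift) term when d = 1.
fit <- arima(y, order = c(0, 1, 3))

# Five-step-ahead point forecasts and their standard errors.
fc <- predict(fit, n.ahead = 5)
print(round(fc$pred, 4))
print(round(fc$se, 4))
```

With only three MA terms, the forecast of the differenced series is zero from step 4 onward, so steps 3, 4 and 5 share the same point forecast, exactly as in the 2025 to 2027 rows of the table above.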
plot(forecasted_adjusted_p,type="l",main="Time Series plot of Forecasted adjusted price for AAPL (ARIMA 0,1,3)",xlab="Time in Years",ylab="Forecasted adjusted price")
Box.test(forecasted_adjusted_p$resid, lag=5,type="Ljung-Box")
Box-Ljung test
data: forecasted_adjusted_p$resid
X-squared = 1.8907, df = 5, p-value = 0.8641
Box.test(forecasted_adjusted_p$resid, lag=2,type="Ljung-Box")
Box-Ljung test
data: forecasted_adjusted_p$resid
X-squared = 1.2617, df = 2, p-value = 0.5322
Box.test(forecasted_adjusted_p$resid, lag=20,type="Ljung-Box")
Box-Ljung test
data: forecasted_adjusted_p$resid
X-squared = 2.7611, df = 20, p-value = 1
These results suggest that ARIMA(0,1,3) adequately captures the serial dependence in the adjusted price series. The Ljung-Box test gives a p-value of 0.8641 at lag 5, 0.5322 at lag 2 and approximately 1 at lag 20. In all three cases we fail to reject the null hypothesis of no residual autocorrelation, which supports using ARIMA(0,1,3) to predict the adjusted stock prices of Apple Inc.
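One refinement is worth noting. When `Box.test()` is applied to the residuals of a fitted ARIMA model, R's documentation for the function recommends its `fitdf` argument to subtract the number of estimated ARMA parameters (here p + q = 3) from the degrees of freedom. The tests above were run without it, so their degrees of freedom equal the lag; with the adjustment, the lag must be greater than 3 for the test to be defined. The sketch below uses simulated data rather than the AAPL residuals, so it does not reproduce the p-values reported above.

```r
set.seed(11)

# Simulated ARIMA(0,1,3) series (hypothetical stand-in for the adjusted prices).
y <- arima.sim(model = list(order = c(0, 1, 3), ma = c(0.7583, 0.1441, -0.4715)), n = 60)
fit <- arima(y, order = c(0, 1, 3))
res <- residuals(fit)

# Ljung-Box tests on the residuals, subtracting p + q = 3 estimated parameters.
lb5  <- Box.test(res, lag = 5,  type = "Ljung-Box", fitdf = 3)
lb10 <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 3)
print(lb5)
print(lb10)
```

The reported `df` is the lag minus 3 (here 2 and 7 would be wrong; they are 2 and 8 for lags 5 and 10), which lowers the degrees of freedom compared with an unadjusted test.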
The correlational analysis establishes the direction and degree of association between two variables. However, it should not be mistaken to mean causation. In this paper, a correlational analysis was performed to establish the association between opening stock prices and closing stock prices of Apple Incorporation. Consider the results below.
cor.test(data_mining$price.open,data_mining$price.close, method = "pearson")
Pearson's product-moment correlation
data: data_mining$price.open and data_mining$price.close
t = 16.902, df = 40, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8843898 0.9656345
sample estimates:
cor
0.9365755
The results show a strong positive correlation coefficient of approximately 0.9366, with a 95% confidence interval of 0.884 to 0.966. The correlation is statistically significant, with a p-value below 2.2e-16. These results imply that higher opening stock prices are associated with higher closing stock prices. Because both price series trend strongly upward over the sample, part of this association reflects their shared trend.
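The reported statistics can be checked directly from r = 0.9365755 and n = 42 observations. This sketch recomputes the t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), its two-sided p-value with n - 2 degrees of freedom, and the 95% confidence interval via the Fisher z-transformation, which is the approach `cor.test()` uses for Pearson's r.

```r
# Values reported in the cor.test output above.
r <- 0.9365755
n <- 42

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval: transform to z = atanh(r), add +/- 1.96 standard errors, transform back.
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(t_stat, 3))
print(p_value)
print(round(ci, 4))
```

The recomputed p-value is many orders of magnitude below 0.001, consistent with the "< 2.2e-16" printed by `cor.test()`.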
According to Dogan and Birant (2021), data mining is currently considered a tool for knowledge acquisition. In their study, Dogan and Birant argued, “data mining has been widely used as a fundamental tool for knowledge discovery from manufacturing databases, where the necessary data to be analyzed can be gathered throughout the ordinary manufacturing operations.” It is therefore evident that businesses are currently operating in a data-driven environment. In other words, the refined information from the raw data tells the company the course of action in the future. For instance, Dogan and Birant (2021) argued that to determine the number of products to produce, the company needs to establish the demand patterns. That is, what the company produces must be able to meet the current demand. Various techniques can be used in this case to forecast the future, including but not limited to linear regression analysis and Autoregressive Integrated Moving Average (ARIMA). By evaluating the current demand, the company will be able to control the quantity and quality of products as well as manage the resources. Their study concluded that data mining plays a more significant role in the manufacturing industry than ever before.
Having seen how important data mining is to businesses, it is important to acknowledge that the business world is data-driven. From the discussion above, this study found that data mining serves as a decision support system in which the organization decides based on information refined from raw data. However, the data-driven environment comes with several challenges, including sampling error and missing data. First, data reduction is a vital stage of data mining that keeps only the required data, but reducing the data to an unrepresentative sample can lead to incorrect estimates and, consequently, wrong decisions. For instance, if data on pizza preferences are collected only from people aged 55 to 70, it would not be appropriate to draw conclusions about the wider population, because the sample would lack external validity: the results could only be generalized to people aged 55 to 70. Secondly, sample size matters throughout the data mining process, since a sample smaller than required may lead to incorrect estimates and wrong decisions. Finally, very large data sets and dirty or noisy data can also lead to wrong estimates.
The business world is expected to keep changing as technology advances. In layman’s language, technological change means continuous change in manufacturing, production, supply chains, and customer satisfaction. Technology will affect not only business operations but nearly every industry globally, with effects ranging across manufacturing, production, and logistics, among other operations. In particular, the rapid spread of artificial intelligence in manufacturing and production is taking business operations to another level. As a result, the widespread adoption of artificial intelligence threatens the future labor force, which is expected to shrink as the popularity of and preference for artificial intelligence grow. Nevertheless, change is inevitable in the business world, and businesses globally will also change how they handle time tracking, expense management, and invoicing, among other operations.
As a critical discipline in data science, data mining has shaped, and continues to shape, the business world as far as manufacturing and production are concerned. This study found that data mining has a positive and significant effect on business operations. Data mining serves two functions, a supervised function and an unsupervised function, and both are vital in shaping businesses globally. Supervised functions, sometimes called predictive models, help predict business parameters such as demand and stock prices. Predictive models such as the Autoregressive Integrated Moving Average (ARIMA) model help businesses forecast their future financial performance. For example, this study used ARIMA modeling to predict the future adjusted stock price of Apple Inc. Predictive models are vital in helping businesses plan for an uncertain future, which in turn helps organizations retain their competitive position and market power. Because this study focused on the predictive function of data mining, it recommends that future studies focus on the unsupervised function.
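The minimal R sketch below illustrates the distinction between the two functions on simulated data. All variables (`ad_spend`, `sales`, `visits`, `basket`) are hypothetical. The supervised part uses `lm()` to learn a mapping to a known target. The unsupervised part uses k-means (`kmeans()`) to find customer groups without any target. K-means is one common unsupervised technique, chosen here only for illustration.

```r
# Minimal sketch contrasting the two data-mining functions on SIMULATED data.
set.seed(7)
n <- 200
ad_spend <- runif(n, 10, 100)                     # hypothetical feature
sales <- 50 + 2.5 * ad_spend + rnorm(n, sd = 15)  # hypothetical labeled target

# Supervised (predictive): learn a mapping from ad_spend to the known target,
# then predict sales for new spending levels.
sales_model <- lm(sales ~ ad_spend)
predicted <- predict(sales_model, newdata = data.frame(ad_spend = c(20, 80)))

# Unsupervised (descriptive): there is no target variable; k-means groups
# customers by similarity in two hypothetical behaviour variables.
customers <- rbind(
  cbind(visits = rnorm(100, 5, 1),  basket = rnorm(100, 20, 4)),
  cbind(visits = rnorm(100, 15, 1), basket = rnorm(100, 60, 4))
)
segments <- kmeans(scale(customers), centers = 2, nstart = 10)

predicted
table(segments$cluster)
```

The variables are standardized with `scale()` before clustering so that `basket`, which has the larger numeric range, does not dominate the distance calculation.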
Bâra, A., & Lungu, I. (2012). Improving Decision Support Systems with Data Mining Techniques. Advances in Data Mining Knowledge Discovery and Applications. https://doi.org/10.5772/47788
Chen, M.-S., Han, J., & Yu, P. S. (1996). Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6), 866–883. https://doi.org/10.1109/69.553155
Dogan, A., & Birant, D. (2021). Machine learning and data mining in manufacturing. Expert Systems with Applications, 166, 114060. https://doi.org/10.1016/j.eswa.2020.114060
Francisca, C.-B., Reis, J. L., Vieira, J. C., & Cayolla, R. (2019). Business Intelligence and Data Mining to Support Sales in Retail. Marketing and Smart Technologies, 406–419. https://doi.org/10.1007/978-981-15-1564-4_38
FRED. (2016). Housing Inventory: Median Listing Price per Square Feet in the United States. Stlouisfed.org. https://fred.stlouisfed.org/series/MEDLISPRIPERSQUFEEUS
Gazzawe, F., & Alturki, R. (2022). Data Mining and Soft Computing in Business Model for Decision Support System. Scientific Programming, 2022, 1–6. https://doi.org/10.1155/2022/9147444
Kumar, M. R., Venkatesh, J., & Rahman, A. M. J. M. Z. (2021). Data mining and machine learning in retail: developing efficiencies for better customer retention. Journal of Ambient Intelligence and Humanized Computing. https://doi.org/10.1007/s12652-020-02711-7
Liao, S.-H., Widowati, R., & Puttong, P. (2022). Data mining analytics investigate Facebook Live stream users’ behaviors and business models: The evidence from Thailand. Entertainment Computing, 41, 100478. https://doi.org/10.1016/j.entcom.2022.100478
Saura, J. R., Palos-Sanchez, P., & Grilo, A. (2019). Detecting Indicators for Startup Business Success: Sentiment Analysis Using Text Data Mining. Sustainability, 11(3), 917. https://doi.org/10.3390/su11030917
Scott, D. (2013). Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R by Daniel S. Putler, Robert E. Krider. International Statistical Review, 81(2), 328–328. https://doi.org/10.1111/insr.12020_19
Yafooz, W. M. S., Bakar, Z. B. A., Fahad, S. K. A., & Mithun, A. M. (2019). Business Intelligence Through Big Data Analytics, Data Mining and Machine Learning. Data Management, Analytics and Innovation, 217–230. https://doi.org/10.1007/978-981-13-9364-8_17