I. Introduction


      Exploratory Data Analysis (EDA) plays a pivotal role in the process of analyzing data, serving as a crucial step in the systematic investigation of datasets. It involves a methodical examination aimed at discovering patterns, identifying anomalies (outliers), and revealing concealed insights within the data. In the context of this report, our primary focus is to apply EDA techniques to a dataset specifically designed for stock market analysis. Through the exploration of this dataset, our objective is to gain a deeper understanding of the stock price movement. Leveraging EDA techniques such as summary statistics and visualizations, we aspire to unearth valuable information that can contribute significantly to our comprehension of a company’s stock market profile.
      This Exploratory Data Analysis (EDA) Report focuses on the stock price analysis of Alphabet Inc. (GOOGL). Alphabet Inc., the parent company of Google, was established in 2015 following a corporate restructuring. Featuring two types of stock, Class A and Class C, each with unique voting rights in comparison to Class B shares, Alphabet is distinctive in its ownership structure. Class B shares, held by insiders, grant higher voting power. Notably, Class C shares trade under the ticker symbol “GOOG,” while Class A shares are traded under “GOOGL.” Google is prominently featured in major stock market indices such as S&P 500 and NASDAQ-100, serving as benchmarks for gauging overall market performance.
      The dataset at hand represents Google’s stock price from August 19, 2004, when the company went public, to November 10, 2023, encompassing 4842 trading sessions. This dataset comprises seven distinct elements, providing comprehensive insights into the trading activity and performance of Google’s stock over the specified period. Sourced from a Kaggle dataset by SURAJ JOSHI, the data was acquired through a straightforward CSV file download.
      This dataset illustrates Alphabet’s Quarterly Shares Outstanding, sourced from macrotrends.net, a premier research platform for long-term investors. Acquiring the data involved extracting information from tables provided by the website, as direct web scraping was not permissible. The extracted information was then organized into a CSV file.
      The analysis employs R programming, focusing on Time Series Analysis as the primary technique. Additionally, Table Joining is applied to integrate insights derived from multiple datasets, offering a comprehensive perspective on Alphabet Inc.’s stock performance.
      Before concluding, I would like to extend my heartfelt thanks to everyone who has contributed to the development of this report. Your input has been invaluable. I welcome any comments, critiques, or suggestions that can help enhance the quality of our work. Please feel free to share your thoughts via email. Your feedback is highly appreciated. Thank you once again for your collaboration and contributions.


II. Dataset Properties


      In this section, we will begin by exploring the foundational aspects of our dataset. This includes the process of loading the data, understanding its overall dimensions, and gaining insights into the variables at our disposal. By delving into these key properties, we aim to lay the groundwork for a comprehensive exploration of the dataset’s nuances and characteristics.


2.1. Loading Data

      There are two different datasets that we are going to use. We save our first dataset (from Kaggle) to an object called google and second set to shares.
#first data 
google = read.csv("google_stock_price.csv")

#second data
shares = read.csv("alphabet_quarterly_shares_outstanding.csv")

2.2. Data Dimension

      We want to know how many observations and variables inside our datasets. Using dim function, we know that the google has 4842 observations and 7 variables. Meanwhile shares has 59 observations and 2 variables.
#first data
dim(google)
## [1] 4842    7

#second data
dim(shares)
## [1] 59  2

2.3. Variables

      These are the explanation of variables inside each dataset that we have. In our first data, google, we have 7 observations which are:


  1. Date (character): This column represents the calender date when the data about the stock is recorded.

  2. Open (numeric): This column represents the first recorded price of the stock for a trading session.

  3. High(numeric): The high price represents the highest traded price of the stock during a given trading session. It reflects the peak value that the stock reached during the day.

  4. Low (numeric): The low price is the lowest traded price of the stock during a specific trading session. It indicates the minimum value that the stock reached during the day.

  5. Close (numeric): The closing price is the last traded price of the stock at the end of a trading session. It reflects the final value at which the stock was traded before the market closes.

  6. Adj Close (numeric): The adjusted closing price accounts for corporate actions, such as dividends, stock splits, and new stock offerings, that may affect the stock’s price but are not directly related to its performance. The adjusted close is often used to assess the stock’s performance over time.

  7. Volume (integer): Volume represents the total number of shares traded during a specific time period. It gives an indication of the level of market activity and liquidity for that stock. High volume often suggests increased investor interest, while low volume may indicate less active trading.

In our second data, shares, we have 2 observations which are:

  1. Date (character): This column represents the calender date when the data about the shares outstanding is recorded.

  2. Shares (numeric): This column represents the company’s shares outstanding at that time. Shares outstanding refer to the total number of a company’s shares of stock that are currently held by shareholders, including both institutional investors and individual investors.


III. Problems


      Throughout the report, we will systematically analyze and address the following list of problems to uncover insightful answers.


  1. How has the stock market price for Google fluctuated over time?

  2. What does the historical annual stock price data for Alphabet reveal? Which year boasts the highest or lowest average stock price?

  3. What is the annual price change in Alphabet’s stock price, and which year exhibits the widest range (lowest to highest)?

  4. How does Alphabet’s stock price differ (highest minus lowest) per day for the entire 2023 records, and when did the highest range occur?

  5. During which periods of the year 2023 is the closing price higher than the opening price, and vice versa?

  6. Is there a prevailing trend where the closing price consistently surpasses the opening price, or vice versa?

  7. How does Alphabet’s moving average change over 50, 100, and 200 days, and does it mirror the movement of the close price?

  8. How has Alphabet’s cumulative return evolved over time?

  9. What is Alphabet’s average monthly return, and which month experiences the highest and lowest average return?

  10. How does Alphabet’s average daily return vary within a week, and which day witnesses the highest and lowest average return?

  11. What is Alphabet’s average daily trading volume within a week, and which day experiences the highest or lowest average volume?

  12. What are the average daily highest and lowest prices for Alphabet within a week, and which day exhibits the highest average high price and the lowest average low price?

  13. How has the stock market volume for Google changed over time?

  14. How do the variables in our dataset correlate with each other, and are there any variables that significantly influence one another?

  15. How has Alphabet’s volatility changed over time, and during which period did it reach its peak?

  16. How is Alphabet’s volatility distributed in a density curve, and does it adhere to a normal distribution?

  17. To what extent does Alphabet’s volatility align with a normal distribution?

  18. What is Alphabet’s shares outstanding per quarter from 2009 to 2023, and when does the shares outstanding reach its peak?

  19. How does Alphabet’s market capitalization change over time, and when does it reach its peak?


IV. Data Wrangling


      In this section, we conduct a comprehensive analysis of the statistical features within our dataset, evaluating the need for potential modifications to certain aspects of our observations.


4.1. Descriptive Statistics

      In this section, we provide a comprehensive summary of each variable in our dataset, encompassing key statistical measures such as length, class, mode, minimum value, median, maximum value, mean, and quartile values. This thorough examination allows us to grasp the range of our variables and identify potential outliers or missing values. Through this analysis, we can affirm the absence of any missing values in our dataset.
#first data 
knitr::kable(summary(google), caption = 'Table 4.1 Summary of "google"')
Table 4.1 Summary of “google”
Date Open High Low Close Adj.Close Volume
Length:4842 Min. : 2.47 Min. : 2.534 Min. : 2.39 Min. : 2.491 Min. : 2.491 Min. :1.584e+05
Class :character 1st Qu.: 12.82 1st Qu.: 12.949 1st Qu.: 12.70 1st Qu.: 12.828 1st Qu.: 12.828 1st Qu.:2.867e+07
Mode :character Median : 26.47 Median : 26.668 Median : 26.26 Median : 26.470 Median : 26.470 Median :6.202e+07
NA Mean : 41.16 Mean : 41.602 Mean : 40.75 Mean : 41.181 Mean : 41.181 Mean :1.192e+08
NA 3rd Qu.: 57.20 3rd Qu.: 57.730 3rd Qu.: 56.63 3rd Qu.: 57.315 3rd Qu.: 57.315 3rd Qu.:1.470e+08
NA Max. :151.86 Max. :152.100 Max. :149.89 Max. :150.709 Max. :150.709 Max. :1.651e+09
#second data
knitr::kable(summary(shares), caption = 'Table 4.2 Summary of "shares"')
Table 4.2 Summary of “shares”
Date Shares
Length:59 Length:59
Class :character Class :character
Mode :character Mode :character


4.2. Data Transformation

      Here we transform our data into shapes or types that make data processing convenient. For both data, we turn their data type from data.frame to tibble for simplicity. We also transform variable Date type in google and shares from character type to Date type. Next, we transform variable Shares in shares to match the unit that we want. Next, we create google_volume, a derivative of google that only contains the date and volume of our stock price. Meanwhile google_price excludes volume, containing only different types of stock prices, and google_2023 contains only stock prices recorded in 2023.
#first data: turn data type into tibble
google = as_tibble(google)

#first data: variable transformation
google = google %>% 
  mutate(Date = ymd(Date))

#first data: isolate variable "Date" and "Volume"
google_volume = google %>% 
  select(Date, Volume)

#first data: containing variables other than "Volume"
google_price = google %>% 
  select(-c(Volume)) %>% 
  pivot_longer(-Date, 
               names_to="Category",
               values_to="Price")

#first data: containing only 2023 price records
google_2023 = google %>%
  filter(year(Date) == 2023)

#second data: turn data type into tibble
shares = as_tibble(shares)

#second data: variable transformation
shares <- shares %>%
  mutate(Shares = as.numeric(gsub(",", "", Shares)) / 10**3) %>% 
  mutate(Date = mdy(Date))


4.3. Data Cleaning

      Here, we undertake the transformation of our data into structured formats or types that facilitate convenient data processing. This step is crucial for effective data cleaning and ensures optimal preparation for subsequent analyses.


      4.3.1 Irrelevant Features
      Irrelevant features are variables in our dataset that will not give us any particular information related to our problems. In our first dataset google, we found that the variable Close and Adj. Close have the same value. Hence we decided to remove the latter.
#check identical values
identical(google$Close, google$Adj.Close)
## [1] TRUE

#remove variable
google = google %>%
  select(-Adj.Close)
      4.3.2 Duplicate Records
      Duplicate Records refer to identical or nearly identical rows in a dataset. There are no duplicate records, in terms of double entry for the same date, in our datasets.
#first data
identical(length(unique(google$Date)), 
          nrow(google))
## [1] TRUE

#second data
identical(length(unique(shares$Date)), 
          nrow(shares))
## [1] TRUE
      4.3.3 Missing Values
      Missing value refer to the absence of data in a specific variable or observation within a dataset. We have shown in Descriptive Statistics section that through data summary, there are no missing values in our data.


      4.3.4 Outliers
      In the outliers subsection, we address unusual data points that deviate significantly from the majority of our dataset. Identifying and understanding these outliers is crucial as they can impact statistical analyses and interpretations. While looking at our data with boxplots, we found some unusual points (outliers) that stand out in our time series analysis. Though we see these outliers, our current knowledge does not cover specific methods to deal with them. Therefore, for this report, we have decided to keep the outliers as they are. We are optimistic that as we learn more in the future, we will be better equipped to handle and process outliers in time series analyses.
par(mfrow=c(1,6))  

boxplot(google$Open, main = "'Open'")
boxplot(google$Close, main = "'Close'")
boxplot(google$High, main = "'High'")
boxplot(google$Low, main = "'Low'")
boxplot(google$Volume, main = "'Volume'")
boxplot(shares$Shares, main = "'Shares'")


V. Exploratory Data Analysis


      Enter the Exploratory Data Analysis (EDA) section, where we employ visual representations and clear responses to address the previously raised questions. Through a thorough examination, we aim to uncover significant patterns and insights within our dataset, fostering a comprehensive understanding of its dynamics.


0. Libraries and Fonts


| Throughout our data processing, we use these libraries and fonts to improve our data visualization.

library(dslabs)
library(dplyr)
library(tidyverse)
library(lubridate)
library(gridExtra)
library(tidyr)
library(plotly)
library(extrafont)
library(reshape2)
font_import(pattern = "lmroman*")
## Importing fonts may take a few minutes, depending on the number of fonts and the speed of the system.
## Continue? [y/n]
loadfonts()
library("TTR")


1. Stock Market Price


| Problem: How has the stock market price for Google fluctuated over time?


| Explanation: The stock price represents the monetary value assigned to a single share of a company’s stock, determined by market forces such as supply and demand. Each year label on the x-axis marks the first recorded closing price at the beginning of that specific year.

| Brief Analysis: Our plot reveals a consistent upward trajectory over the years, with a notable peak in late 2021, marking the highest closing price. However, a substantial decline is observed from the commencement of 2022 until 2023. Interestingly, the stock demonstrates a renewed upward trend from the early months of 2023, showcasing dynamic market movements and suggesting a resilient growth pattern amid fluctuations.

a1 = google_price %>% 
  filter(Category == "Close") %>% 
  ggplot(aes(Date, Price)) +
  geom_line(col = "olivedrab")

a1 + 
  scale_x_date(date_breaks = "1 year", 
               date_labels = "%Y") +
  labs(y = "Price (in US Dollar) \n",
       x = "\n Date",
       title = "Alphabet Stock Market Price\n",
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, 
                                   hjust = 1),
        text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"))


2. Historical Annual Stock Price Data


| Problem: What does the historical annual stock price data for Alphabet reveal? Which year boasts the highest or lowest average stock price?


| Explanation: The table provides a comprehensive overview of Alphabet’s historical annual stock price data, featuring key metrics for each year. The “Year” column indicates the respective calendar year. “Average Stock Price” represents the mean value of stock prices throughout the year. “Year Open” and “Year Close” signify the opening and closing prices for the year, respectively. “Year High” and “Year Low” highlight the maximum and minimum stock prices during the year. The “Annual Change (%)” column reflects the percentage change in stock price from the opening to the closing of the year. This table is crucial for investors, analysts, and stakeholders as it encapsulates vital information for evaluating historical stock performance, identifying trends, and making informed investment decisions.


| Brief Analysis: The analysis of Alphabet’s historical annual stock price data reveals significant trends and variations. The year 2021 stands out with the highest average stock price, while 2005 marks the lowest. Notably, the year 2009 experienced a remarkable annual change, reaching almost 101 percent compared to the previous year. The annual change for each year showcases substantial variability, reflecting the dynamic nature of market conditions. Recent years, particularly 2022, witnessed a decline in annual change after three consecutive years of positive shifts.

t1 = google %>% 
  mutate(Year = year(Date), 
         Month = month(Date), 
         Date_in_a_month = mday(Date),
         Date_in_a_year = yday(Date)) %>% 
  filter(!Year %in% c(2004, 2023)) %>% 
  group_by(Year) %>%    #closing price only state in the table
  summarize("Average Stock Price" = mean(Close),
            Year_Open = Open[Date == min(Date)],
            Year_High = max(High),
            Year_Low = min(Low),
            Year_Close = Close[Date == max(Date)],
            Annual_Change = (Year_Close - Year_Open) / Year_Open * 100) %>% 
  arrange(desc(Year))

knitr::kable(t1, 
             col.names = c("Year", 
                           "Average Stock Price", 
                           "Year Open",
                           "Year High",
                           "Year Low",
                           "Year Close",
                           "Annual Change (%)"),
             caption = 'Table 2.1 Alphabet Historical Annual Stock Price Data')
Table 2.1 Alphabet Historical Annual Stock Price Data
Year Average Stock Price Year Open Year High Year Low Year Close Annual Change (%)
2022 115.193719 144.475494 152.10001 83.449997 88.730003 -38.584738
2021 125.530687 87.876999 151.85001 84.949997 144.679504 64.638650
2020 74.070191 67.077499 92.36000 50.676800 87.594002 30.586266
2019 59.419653 50.828499 68.25000 50.703499 66.850998 31.522668
2018 55.661257 52.417000 63.69450 48.505501 51.780499 -1.214303
2017 46.089042 38.940498 53.92450 38.790001 52.320000 34.358834
2016 37.174335 37.150002 40.83400 33.153000 38.591000 3.878864
2015 30.100284 26.378079 38.99900 24.311253 37.944000 43.846718
2014 27.990164 27.782366 30.60728 24.383057 26.247936 -5.523034
2013 22.023519 17.918339 27.92035 17.323069 27.913124 55.779643
2012 16.010401 16.262545 19.28721 13.861045 17.618462 8.337668
2011 14.171237 14.856315 16.10862 11.781341 16.087200 8.285268
2010 13.340574 15.615220 15.71236 10.800268 14.793799 -5.260385
2009 10.951161 7.686190 15.59131 7.042354 15.441621 100.900844
2008 11.577903 17.257067 17.36915 6.159413 7.662529 -55.597732
2007 13.418465 11.606496 18.61124 10.884203 17.222446 48.386271
2006 10.241246 10.523555 12.77711 8.257798 11.469011 8.984193
2005 6.918047 4.916571 11.11359 4.298140 10.332770 110.162124


3. Annual Price Range


| Problem: What is the annual price change in Alphabet’s stock price, and which year exhibits the widest range (lowest to highest)?


| Explanation: Plotting Alphabet’s stock market annual price range provides a concise overview of how the stock’s price has fluctuated throughout each year. This is important for investors and analysts to gauge the volatility and stability of Alphabet’s stock over time.


| Brief Analysis: The analysis of Alphabet’s stock market annual price range reveals distinct trends over the years. Initially, the range between the highest and lowest prices remained relatively stable during the early years of trading. However, in 2021, there was a notable surge in the highest price compared to previous years, indicating a potential uptrend in market sentiment or company performance. Interestingly, the widest range was observed in 2022, suggesting increased volatility or uncertainty in the market during that period.

a13 = google_price %>% 
  mutate(Year = year(Date)) %>% 
  filter(Category %in% c("Low", "High")) %>% 
  ggplot(aes(Year, Price, group = Year)) +
  geom_boxplot(col = "purple")

a13 +
  labs(y = "Price (in US Dollar) \n",
       x = "\n Year",
       title = "Alphabet Annual Price Range\n",
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"))


4. Daily Price Range in 2023


| Problem: How does Alphabet’s stock price differ (highest minus lowest) per day for the entire 2023 records, and when did the highest range occur?


| Explanation: Plotting Alphabet’s stock market daily price range in 2023 provides a granular view of how the stock’s price fluctuated on a day-to-day basis throughout the year. This information is crucial for investors and traders as it allows them to assess the intraday volatility, identify patterns or trends, and make timely decisions regarding buying, selling, or holding the stock.


| Brief Analysis: The analysis of Alphabet’s stock market daily price range in 2023 reveals consistent patterns, with ranges typically peaking at the beginning and end of each month while dipping during the mid-month period, except for March. Interestingly, March deviates from this trend, indicating potential market-specific factors influencing price movements during that month. Notably, the lowest range occurred at the end of February, suggesting decreased volatility or stability in the market during that period, while the highest range was observed at the beginning of June, signifying heightened intraday price fluctuations.

a14 = google_2023 %>% 
  mutate(price_range = High - Low) %>% 
  ggplot(aes(Date, price_range)) +
  geom_bar(stat = "identity",
           col = "olivedrab")

a14 +
  scale_x_date(date_breaks = "1 month",
               date_labels = "%b") +
  labs(y = "Price Difference (in US Dollar) \n",
       x = "\n Date",
       title = "Alphabet Daily Price Range in 2023\n",
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"))


5. Stock Market Candlestick Representation in 2023


| Problem: During which periods of the year 2023 is the closing price higher than the opening price, and vice versa?


| Explanation: Candlestick charts are a visual representation of stock prices, displaying the open, high, low, and close prices for a specific time period. Each “candlestick” on the chart represents the trading activity within that timeframe. The body of the candlestick is colored, typically green or red, to indicate whether the closing price is higher or lower than the opening price. A green (or hollow) candlestick suggests a bullish market, where the close is higher than the open, symbolizing upward price movement. Conversely, a red (or filled) candlestick indicates a bearish market, signifying a close lower than the open and a potential downward trend.


| Brief Analysis: Our plot reveals a consistent upward trajectory, aligning with the broader trend observed in the stock market. Noteworthy instances include a substantial reduction in price range around February and the end of October, indicative of potential market stability or specific influencing factors during those periods. The presence of a cluster of green candles in late April and mid-May suggests bullish market sentiment, corresponding to periods of increased closing prices compared to openings. Conversely, pronounced occurrences of red candles at the beginning of February, the end of July, and the end of October signal bearish tendencies.

a3 = google_2023 %>% 
  plot_ly(x = ~Date, 
          type = "candlestick",
          open = google_2023$Open, 
          close = google_2023$Close,
          high = google_2023$High, 
          low = google_2023$Low) 

a3 %>% 
  layout(title = list(text = "Alphabet Stock Market Analysis: Candlestick Representation",
                      font = list(size = 25)),
         xaxis = list(rangeslider = list(visible = F)),
         yaxis = list(title = 'Price (in US Dollar)'),
         font = list(family = 'Latin Modern'),
         margin = list(
           l = 100,   
           r = 70,   
           b = 80,   
           t = 100   
         ))


6. Comparison of Alphabet Open and Close Frequencies


| Problem: Is there a prevailing trend where the closing price consistently surpasses the opening price, or vice versa?


| Explanation: The comparison of Alphabet’s open and close frequencies provides valuable insights into the stock’s daily trading dynamics. An abundance of records where the close price consistently exceeds the open may suggest sustained positive market sentiment, while frequent instances of the open price surpassing the close could indicate potential bearish trends.


| Brief Analysis: The analysis of the plot depicting Alphabet’s open and close frequencies reveals a relatively balanced distribution, indicating a comparable occurrence of instances where the close price is higher than the open and vice versa. The near equilibrium between these occurrences suggests a dynamic and fluctuating market sentiment, with neither bullish nor bearish trends dominating consistently.

a4 = google %>% 
  mutate(Case = ifelse(Open > Close, 
                       "Open > Close", 
                       "Close > Open")) %>% 
  select(Case) %>% 
  group_by(Case) %>% 
  summarize(Count = n() / 4842*100) %>% 
  ggplot(aes(x = Case,y = Count, fill = Case)) +
  geom_bar(stat = "identity",
           width = 0.6)

a4 +
  labs(y = "Percentage \n",
       x = "\n Frequency Comparison",
       title = "Alphabet Open and Close Frequencies\n",
       caption = "\nSource: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"),
        legend.position = "none",
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        axis.line.x = element_line(),
        axis.ticks = element_blank(),
        axis.title.x=element_blank()) +
  coord_cartesian(ylim=c(0,65)) +
  geom_text(aes(x = "Close > Open", 
                y = 55, label = "50,4%")) +
  geom_text(aes(x = "Open > Close", 
                y = 54, label = "49,6%")) +
  scale_fill_manual(values = c("Close > Open" = "olivedrab", 
                               "Open > Close" = "firebrick"))


7. Simple Moving Average (Rolling Mean)


| Problem: How does Alphabet’s moving average change over 50, 100, and 200 days, and does it mirror the movement of the close price?


| Explanation: Plotting Alphabet’s stock market moving average changes over 50, 100, and 200 days provides insights into the stock’s trend and volatility over different time horizons. Moving averages are calculated by averaging the closing prices of a stock over a specified period, such as 50, 100, or 200 days. These averages smooth out short-term fluctuations, revealing underlying trends in the stock’s price movement. This analysis is crucial for investors as it helps identify long-term trends, short-term fluctuations, and potential buy or sell signals.


| Brief Analysis: The analysis of the plot reveals that the moving average changes over 50, 100, and 200 days exhibit a strong similarity to the close price plot across all observed days. This suggests that the moving averages effectively capture the underlying trend and volatility of Alphabet’s stock price over various time horizons.

a12 = google %>% 
  mutate(SMA_50 = SMA(Close, n = 50),
         SMA_100 = SMA(Close, n=100),
         SMA_200 = SMA(Close, n = 200)) %>% 
  pivot_longer(-Date, 
               names_to = "Category",
               values_to = "Value") %>% 
  filter(Category %in% c("Close",
                         "SMA_50",
                         "SMA_100",
                         "SMA_200")) %>% 
  filter(!is.na(Value)) %>% 
  ggplot(aes(Date, Value, col = Category)) +
  geom_line()

a12 +
  scale_x_date(date_breaks = "1 year",
               date_labels = "%Y") +
  labs(y = "Price (in US Dollar) \n",
       x = "\n Date",
       title = "Alphabet Moving Average\n",
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"),
        axis.text.x = element_text(angle = 30, 
                                   hjust = 1)) +
  scale_color_manual(
    values = c("Close" = "red", 
               "SMA_50" = "blue", 
               "SMA_100" = "green", 
               "SMA_200" = "yellow"),  # Specify colors
    breaks = c("Close",
               "SMA_50",
               "SMA_100",
               "SMA_200"),
    labels = c("Close",
               "SMA for 50 days",
               "SMA for 100 days",
               "SMA for 200 days")
  ) +
  theme(
    legend.position = c(0.215, 0.695),  
    legend.background = 
      element_rect(fill = "white", 
                   color = "black"))


8. Cumulative Return


| Problem: How has Alphabet’s cumulative return evolved over time?


| Explanation: The cumulative return plot for Alphabet provides a snapshot of the overall performance and growth of the stock over the specified time period. Cumulative return in stock market analysis refers to the average return over a period of time, such as a day, week, or month. This visualization is crucial for investors as it illustrates the overall profitability of an investment over time. Positive cumulative returns indicate profit, while negative values suggest losses.


| Brief Analysis: The cumulative return plot for Alphabet Inc. mirrors the movement observed in the stock market price plot, displaying a pattern of fluctuations over time. As the stock market experiences variations in value, the cumulative return, which accumulates these changes, naturally exhibits corresponding oscillations. This alignment is expected, as the cumulative return serves as a cumulative measure of the stock’s overall performance.

a7 = google %>% 
  mutate(daily_return = Close / lag(Close) - 1) %>% 
  filter(!is.na(daily_return)) %>% 
  mutate(cumulative_return = cumprod(1 + daily_return) - 1) %>% 
  ggplot(aes(Date, cumulative_return))+
  geom_line(col = "#003366", linewidth = 1)+
  geom_bar(stat = "identity",
           fill = "#FF8C00")

a7 +
  scale_x_date(date_breaks = "1 year",
               date_labels = "%Y") +
  labs(y = "Price (in US Dollar) \n",
       x = "\n Date",
       title = "Alphabet Cumulative Return\n", 
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"),
        axis.text.x = element_text(angle = 30, 
                                   hjust = 1))


9. Average Monthly Return


| Problem: What is Alphabet’s average monthly return, and which month experiences the highest and lowest average return?


| Explanation: Plotting Alphabet’s stock market average monthly return provides a summarized view of the stock’s performance on a monthly basis, showing the average change in price over each month. Monthly return refers to the percentage change in the stock’s price over a single month, calculated as the difference between the closing price at the end of the month and the closing price at the beginning of the month, divided by the initial price and multiplied by 100. This information is crucial for investors as it allows them to assess the stock’s trend over time, identify seasonal patterns or trends, and evaluate the stock’s overall volatility and stability.


| Brief Analysis: The analysis of Alphabet’s stock market average monthly return unveils several key insights. Firstly, the returns tend to fluctuate, indicating dynamic market conditions. Interestingly, the last quarter exhibits stability and positivity in returns, suggesting potential market resilience or favorable company developments. Notably, October emerges as a standout month with the highest return, nearly reaching 0.004, possibly driven by positive market catalysts or strong financial performance. Conversely, February stands out with the lowest return, dipping to -0.001, indicating a period of underperformance.

a15 = google %>% 
  mutate(daily_return = Close / lag(Close) - 1,
         Month = month(Date)) %>% 
  filter(!is.na(daily_return)) %>% 
  group_by(Month) %>% 
  summarize(avg = mean(daily_return)) %>% 
  ggplot(aes(factor(Month), avg)) +
  geom_bar(stat = "identity",
           aes(fill = avg < 0))

a15 +
  scale_fill_manual(values = c("olivedrab","firebrick"),
                    guide = FALSE) +
  scale_x_discrete(
    labels = month.abb) +
  labs(y = "Return \n",
       x = "\n Month",
       title = "Alphabet Average Monthly Return\n",
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 20, "pt"),
        legend.position = "none")


10. Day of the Week Effect


| Problem: How does Alphabet’s average daily return vary within a week, and which day witnesses the highest and lowest average return?


| Explanation: Plotting Alphabet’s stock market average daily return provides a snapshot of the stock’s performance on a day-to-day basis, showcasing the average change in price over each trading day. Daily return refers to the percentage change in the stock’s price over a single trading day, calculated as the difference between the closing price at the end of the day and the closing price at the beginning of the day, divided by the initial price and multiplied by 100.


| Brief Analysis: The analysis of Alphabet’s stock market average daily return reveals notable patterns in its performance throughout the trading week. Firstly, returns tend to surge on Tuesday and Wednesday, with the highest peak observed on Wednesday. This trend suggests increased investor activity and potentially positive market sentiment mid-week. Conversely, Monday stands out with the lowest return, reaching only 0.0003, indicating a subdued start to the trading week.

a16 = google %>% 
  mutate(daily_return = Close / lag(Close) - 1,
         Day = factor(weekdays(Date), 
                      levels = c("Monday", 
                                 "Tuesday", 
                                 "Wednesday", 
                                 "Thursday", 
                                 "Friday"))) %>% 
  filter(!is.na(daily_return)) %>% 
  group_by(Day) %>% 
  summarize(avg = mean(daily_return)) %>% 
  ggplot(aes(Day, avg)) +
  geom_bar(stat = "identity",
           aes(fill = Day))

a16 +
  labs(y = "Return \n",
       x = "\n Day",
       title = "Alphabet Day of the Week Effect\n",
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 20, "pt"),
        legend.position = "none") 


11. Daily Volume Average


| Problem: What is Alphabet’s average daily trading volume within a week, and which day experiences the highest or lowest average volume?


| Explanation: Analyzing Alphabet’s average daily trading volume within a week provides insights into the variations in trading volume on specific days, helping identify potential trends or anomalies. Monitoring these fluctuations is crucial for investors and traders as it can influence decision-making, such as determining optimal entry and exit points based on the market’s liquidity and activity levels.


| Brief Analysis: The analysis of the plot reveals that Friday stands out with the highest volume average, indicating increased market activity, possibly influenced by end-of-week trading strategies. In contrast, Monday records the lowest volume average, suggesting a subdued start to the trading week. The remaining days exhibit relatively similar average volume levels.

a5 = google_volume %>% 
  mutate(Day = wday(Date, label = TRUE, abb = FALSE)) %>% 
  group_by(Day) %>% 
  summarize(Average_per_Day = mean(Volume) / 10**8) %>% 
  ggplot(aes(Day, Average_per_Day))+
  geom_bar(stat = "identity", aes(fill = Day))+
  scale_y_continuous(trans = "log10")

a5 +
  labs(y = "Volume (in Hundred Millions) \n",
       x = "\n Day",
      title = "Alphabet Daily Volume Average\n",
      caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"),
        legend.position = "none") 


12. Daily High and Low Average


| Problem: What are the average daily highest and lowest prices for Alphabet within a week, and which day exhibits the highest average high price and the lowest average low price?


| Explanation: The plotted average daily highest and lowest prices for Alphabet within a week offer insights into the stock’s weekly performance dynamics. This analysis is crucial as it provides a concise overview of the price behavior throughout the trading week. Identifying patterns and trends helps market participants comprehend the stock’s weekly volatility and potential trading opportunities.


| Brief Analysis: The analysis of the plot reveals distinct patterns in Alphabet’s average daily high and low prices within a week. Notably, Thursday stands out as the day with the highest average highest and average lowest prices, reflecting potential market activity peaks. In contrast, Monday exhibits the lowest averages for both highest and lowest prices. The remaining days of the week show a relatively consistent pattern.

a6 = google %>% 
  mutate(Day = wday(Date, 
                    label = TRUE, 
                    abb = FALSE)) %>% 
  group_by(Day) %>% 
  summarize(average_high = mean(High),
            average_low = mean(Low)) %>% 
  pivot_longer(-Day,
               names_to = "Category",
               values_to = "Price") %>% 
  mutate(Category = as.factor(Category)) %>% 
  ggplot(aes(Day, Price, 
             group = Category, 
             fill = Category)) +
  geom_bar(stat="identity",
           position = "identity") +
  coord_cartesian(ylim = c(40,42))
  
a6 +
  labs(y = "Price (in US Dollar) \n",
       x = "\n Day",
       title = "Alphabet Daily Open and Close Average\n",
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"),
        legend.position = "bottom") +
  scale_fill_manual(values = c("average_high" = "#003366", 
                               "average_low" = "#FFD700"),
                    name = "Price Types: ",
                    labels = c("Average High",
                               "Average Low"))


13. Stock Market Volume


| Problem: How has the stock market volume for Google changed over time?


| Explanation: Stock market volume refers to the total number of shares traded within a specific period, typically during a trading day. It is a crucial indicator of market activity and liquidity. High volume often accompanies significant price movements. Each year label on the x-axis marks the first recorded closing price at the beginning of that specific year.

| Brief Analysis: Our plot reveals a general decreasing trend over the years, marked by notable fluctuations in shorter periods. Notably, the volume peaked at the beginning of 2006, reflecting heightened market activity during that period. However, a significant decline is observed towards the end of 2022, suggesting potential shifts in investor sentiment or changes in market dynamics.

a2 = google_volume %>% 
  ggplot(aes(Date, Volume /  10^9)) +
  geom_line(col = "#1f77b4")

a2 +
  scale_x_date(date_breaks = "1 year",
               date_labels = "%Y") +
  labs(y = "Volume (in Billions) \n",
       x = "\n Date",
       title = "Alphabet Stock Market Volume\n",
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, 
                                   hjust = 1),
        text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"))


14. Variable Correlation Analysis


| Problem: How do the variables in our dataset correlate with each other, and are there any variables that significantly influence one another?


| Explanation: The correlation analysis of Alphabet’s stock market variables, including open, close, high, low, and volume, reveals the strength and direction of relationships between variables. The correlation coefficient ranges from -1 to 1, indicating perfect positive (implying that as one variable increases, the other also increases proportionally), perfect negative (suggesting that as one variable increases, the other decreases proportionally), or no linear correlation (signifies a weak or no linear correlation between the variables.)


| Brief Analysis: The correlation analysis of Alphabet’s stock market variables reveals a perfect positive relationship (correlation coefficient of 1) among open, close, high, and low prices, indicating a strong linear association. On the other hand, volume demonstrates a moderate negative correlation (approximately -0.47) with the rest of the variables.

#new data: Alphabet's Variables Correlation Matrix
correlation_matrix = google %>% 
  select(-Date) %>% 
  cor()

correlation_melted = melt(correlation_matrix)

a11 = correlation_melted %>% 
  ggplot(aes(Var1, Var2, fill = value,
             label = sprintf("%.2f", value))) +
  geom_tile(color = "white")

a11 +
  geom_text(size = 3, color = "white", 
            show.legend = FALSE) +
  labs(y = "",
       x = "",
       title = "Variable Correlation Analysis",
       subtitle = "\n1: perfect positive, -1: perfect negative, ~0: non-linear \n", 
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  scale_fill_gradient(low = "#8B6969", high = "firebrick") +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"),
        legend.position = "none")


15. Volatility Analysis


| Problem: How has Alphabet’s volatility changed over time, and during which period did it reach its peak?


| Explanation: Alphabet Inc.’s volatility analysis plot captures the fluctuations and variations in the stock’s price over time. Stock market volatility is the measure of how far the current price of an asset deviates from its average past prices, indicating the strength or conviction behind a price movement. A higher volatility implies greater price variability, indicating a potentially riskier asset. Understanding volatility is fundamental for effective risk management and optimizing investment portfolios.


| Brief Analysis: The volatility analysis for Alphabet Inc. reveals noteworthy trends in the stock’s price fluctuation. Notably, volatility reached its zenith in the early months of 2008, coinciding with a period of heightened market uncertainty. Subsequently, it experienced a significant downturn at the close of 2008 and the onset of 2020, possibly indicative of more stable market conditions during those periods. The overall pattern showcases considerable fluctuations, highlighting the dynamic nature of the stock’s price movements over the analyzed timeframe. Interestingly, recent observations in 2023 exhibit periods with volatility registering below zero, suggesting potential anomalies (outliers) or unique market dynamics during that period.

a8 = google %>% 
  mutate(daily_return = Close / lag(Close) - 1) %>% 
  filter(!is.na(daily_return)) %>% 
  ggplot(aes(Date, daily_return)) +
  geom_line(col = "#4169E1")

a8 +
  scale_x_date(date_breaks = "1 year",
               date_labels = "%Y") +
  labs(y = "Volatility \n",
       x = "\n Date",
       title = "Alphabet Volatility Analysis",
       subtitle = "\nNegative volatility is uncommon and may indicate anomalies. \n", 
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"),
        axis.text.x = element_text(angle = 30, 
                                   hjust = 1))


16. Volatility Distribution


| Problem: How is Alphabet’s volatility distributed in a density curve, and does it adhere to a normal distribution?


| Explanation: Analyzing Alphabet’s volatility distribution with a density curve allows us to understand the pattern of volatility fluctuations over time. Comparing it to a normal density curve helps identify deviations from expected market behavior. This is crucial for assessing whether Alphabet’s volatility follows a typical market trend or if there are unique patterns requiring attention.


| Brief Analysis: The analysis of Alphabet’s volatility distribution reveals a peak higher than the normal distribution, suggesting periods of intensified market activity. Additionally, the distribution appears slimmer, indicating less variability than the expected normal market conditions. This can also be caused by anomalies incurred inside our data.

#new data: Alphabet's daily return
google_dr = google %>% 
  mutate(daily_return = Close / lag(Close) - 1) %>% 
  filter(!is.na(daily_return))

a9 = google_dr %>% 
  ggplot(aes(daily_return)) +
  geom_density(aes(col = "Volatility Dens."),
               size = 1) +
  stat_function(
    fun = dnorm,
    args = list(mean = mean(google_dr$daily_return), 
                sd = sd(google_dr$daily_return)),
    size = 1,
    linetype = 1,
    aes(color = "Normal Dist.")
  )

a9 +
  labs(y = "Density \n",
       x = "\n Daily Return",
       title = "Alphabet Volatility Distribution\n",
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt")) +
  scale_color_manual(values = c("#003366","firebrick"),
                     name = "Density Curve") +
  theme(
    legend.position = c(0.87, 0.8),  
    legend.background = 
      element_rect(fill = "white", 
                   color = "black"))


17. Volatility Normality


| Problem: To what extent does Alphabet’s volatility align with a normal distribution?


| Explanation: Plotting volatility distribution using a QQ plot is important to assess whether the distribution of volatility follows a normal distribution. The QQ plot visually compares the quantiles of the observed data with the quantiles of a theoretical normal distribution. If the points on the QQ plot closely align with a straight line, it suggests that the data is approximately normally distributed. This analysis is crucial in understanding the statistical properties of the volatility data and can guide further modeling or analysis decisions based on the assumed distribution.


| Brief Analysis: The analysis of Alphabet’s stock market volatility distribution using a QQ plot reveals a significant deviation from normality, indicating that a substantial portion of the data does not conform to a standard normal distribution. This departure from normality is likely influenced by potential outliers within the dataset or dynamic market at that time.

#new data: Alphabet's daily return
daily_return = google %>% 
  mutate(daily_return = Close / lag(Close) - 1) %>% 
  filter(!is.na(daily_return)) %>% 
  select(daily_return)

a10 = daily_return %>% 
  ggplot(aes(sample = daily_return)) +
  stat_qq(col= "#FF8C00") +
  stat_qq_line()

a10 +
  labs(y = "",
       x = "",
       title = "Alphabet Volatility Normality\n",
       caption = "Source: Kaggle, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"))


18. Quarterly Shares Outstanding 2009-2023


| Problem: What is Alphabet’s shares outstanding per quarter from 2009 to 2023, and when does the shares outstanding reach its peak?


| Explanation: Shares outstanding refer to the total number of a company’s shares that are currently owned by shareholders, including share blocks held by institutional investors and restricted shares owned by the company’s officers and insiders. The objective of plotting shares outstanding is to analyze the changes in the number of shares over time, which can provide insights into the company’s stock issuance and buyback activities, investor demand, and overall market strategy.


| Brief Analysis: The highest number of shares outstanding for Alphabet Inc. occurred in the second quarter of 2018, reaching over 15 million shares. Conversely, the lowest point was at the beginning of 2009, with less than 13 million shares outstanding. A sudden increase in shares outstanding was observed in the first quarter of 2014, which triggered a consistent upward trend until the end of 2019. This trend showed a decline starting in 2020, likely due to the impact of the COVID-19 pandemic.

a17 = shares %>% 
  ggplot(aes(Date, Shares)) +
  geom_bar(stat = "identity",
           fill = "firebrick") +
  coord_cartesian(ylim = c(12,15.))

a17 +
  scale_x_date(date_breaks = "1 year",
               date_labels = "%Y") +
  labs(y = "Shares (in Million) \n",
       x = "\n Date",
       title = "Alphabet Quarterly Shares Outstanding\n",
       caption = "Source: macrotrends.net, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 30, "pt"),
        axis.text.x = element_text(angle = 30, 
                                   hjust = 1))


19. Market Capitalization 2009-2023


| Problem: How does Alphabet’s market capitalization change over time, and when does it reach its peak?


| Explanation: Market capitalization (market cap) is the total market value of a company’s outstanding shares of stock, calculated by multiplying the current share price by the total number of outstanding shares. The objective of plotting market capitalization is to track the company’s valuation over time, reflecting its growth, investor sentiment, and market conditions.


| Brief Analysis: Alphabet’s market capitalization reached its peak at the beginning of 2022, surpassing 2000 (units). From 2009 to 2020, the market cap showed a gradual increase, followed by a rapid rise from 2020 to its peak in early 2022. In 2022-2023, there was a significant decline, with the market cap almost losing 500 units within a year. However, since the beginning of 2023, there has been a recovery trend, indicating a gradual increase in market capitalization.

a18 = left_join(shares, google_price, by = "Date") %>% 
  filter(!is.na(Price) & Category == "Close") %>% 
  mutate(Market_cap = Shares * Price) %>% 
  ggplot(aes(Date, Market_cap)) +
  geom_line(size = 1,
            col = "#4169E1")

a18 +
  scale_x_date(date_breaks = "1 year",
               date_labels = "%Y") +
  labs(y = "",
       x = "\n Date",
       title = "Alphabet Market Capitalization\n",
       caption = "Source: Kaggle and macrotrends.net, 2023\n Data Visualization by Radithya Chaidir") +
  theme_bw() +
  theme(text = element_text(family = "LM Roman 10"),
        plot.title = element_text(size = 20,
                                  hjust = 0.5),
        plot.margin = margin(30, 30, 20, 20, "pt"),
        axis.text.x = element_text(angle = 30, 
                                   hjust = 1))


VI. Conclusion


      This Exploratory Data Analysis (EDA) of Alphabet Inc. (GOOGL) stock price from 2004 to 2023 revealed significant patterns and trends. The stock price consistently increased, peaking in late 2021 before declining through 2022 and rebounding in early 2023. Historical data showed 2021 had the highest average stock price, while 2005 had the lowest. The most remarkable annual change was in 2009, with a nearly 101 percent increase.
      Daily price ranges in 2023 were highest at the beginning and end of each month, with June having the widest range. The frequency analysis showed a balanced distribution of open and close prices, and the correlation analysis highlighted a strong positive relationship among price variables, with volume showing a moderate negative correlation.
      Volatility peaked in early 2008 and again in early 2020, reflecting market sensitivity. Shares outstanding peaked in the second quarter of 2018 and declined after 2019 due to the COVID-19 pandemic. Market capitalization reached its highest in early 2022, with a significant decline in 2022-2023, followed by recovery in early 2023.
      In summary, this EDA provided a detailed view of Alphabet Inc.’s stock performance, revealing growth, volatility, and resilience, offering valuable insights for investors and analysts.


VII. Sources


[1] S. Navami. (2024). Complete Exploratory Data Analysis (EDA) using Python. Medium.

[2] Joshi, S. (2023). Google Stock Price (2004-2023). Kaggle. Retrieved January, 2024, from here.

[3] Slamet, I., Laela, N., & P. (2016). Analysis of Calendar Anomaly in Indonesia Stock Market using Stochastic Dominance. AIP Publishing.

[4] Alphabet Shares Outstanding 2010-2024|GOOGL. Retrieved January, 2024, from here.

[5] Selvam, M., Raja, M., & Mozhi, P. (2007). Forecasting the Time Volatility of Emerging Asian Stock Market Index. Asia Pasific Business Review.