In this data analysis case study, I will utilize R as my data analytics tool of choice. My task is to analyze and compare the performance of four prominent companies in the stock market - Apple, Microsoft, Netflix, and Google. We will be working with historical stock price data for these companies over the past three months.
The primary objective of this analysis is to gain insights into the trends and patterns exhibited by the stock prices of these companies. I will examine their price movements, calculate moving averages and assess volatility between the different stock prices.
The dataset utilized in this case study, was provided by Kaggle user Amir Motefaker. It encompasses a comprehensive set of data for each company, including the date, opening price, highest price, lowest price, closing price, adjusted closing price and volume.These data points will serve as the foundation for my analysis.
Commencing the preparation stage for the analysis phase by loading the required libraries, uploading the necessary dataset, and performing data cleaning.
Loading essential libraries needed for the analysis process
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(lubridate)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(tidyquant)
## Loading required package: PerformanceAnalytics
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'xts'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
##
##
## Attaching package: 'PerformanceAnalytics'
##
## The following object is masked from 'package:graphics':
##
## legend
##
## Loading required package: quantmod
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(TTR)
Load the CSV dataset into the working environment.
#importing csv file
stocks <- read.csv("stocks.csv")
Explore the dataset to determine the dimensions, data types, and overall structure of the data.
#look through the dataset
glimpse(stocks)
## Rows: 248
## Columns: 8
## $ Ticker <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAP…
## $ Date <chr> "2023-02-07", "2023-02-08", "2023-02-09", "2023-02-10", "202…
## $ Open <dbl> 150.64, 153.88, 153.78, 149.46, 150.95, 152.12, 153.11, 153.…
## $ High <dbl> 155.23, 154.58, 154.33, 151.34, 154.26, 153.77, 155.50, 156.…
## $ Low <dbl> 150.64, 151.17, 150.42, 149.22, 150.92, 150.86, 152.88, 153.…
## $ Close <dbl> 154.65, 151.92, 150.87, 151.01, 153.85, 153.20, 155.33, 153.…
## $ Adj.Close <dbl> 154.4142, 151.6884, 150.6400, 151.0100, 153.8500, 153.2000, …
## $ Volume <int> 83322600, 64120100, 56007100, 57450700, 62199000, 61707600, …
Upon examination, it was observed that the “Date” column in the dataset is being interpreted as a character data type. Given the historical nature of the dataset, it is essential to convert the data type of the “Date” column to the appropriate format, namely Date.
#convert data type from character to date
stocks$Date <- ymd(stocks$Date)
Verify that the data type has been successfully updated.
#confirm that the data type has been updated
class(stocks$Date)
## [1] "Date"
Modify the column name “Ticker” to “Company” for improved clarity and comprehension.
#rename column "Ticker" to "Company"
stocks <- stocks %>%
rename(Company = Ticker)
Confirm that the column name has been successfully updated, it is necessary to verify the changes.
#confirm that the column name has been updated
colnames(stocks)
## [1] "Company" "Date" "Open" "High" "Low" "Close"
## [7] "Adj.Close" "Volume"
Verifying that the data frame does not contains any missing values and eliminate any duplicated entries if found.
#check if data frame contains any missing values and delete any duplicates
unique (unlist (lapply (stocks, function (x) which (is.na (x)))))
## integer(0)
Inspect the data frame to identify the unique company names listed in the “Company” column.
#list distinct values in the Company column
unique(stocks$Company)
## [1] "AAPL" "MSFT" "NFLX" "GOOG"
Entering into the analysis phase, where the focus will be on investigating the movement of stock prices, computing moving averages, and evaluating volatility.
Generate visualizations depicting the daily stock price movement for each company.
#Price Movement in Daily stock Prices of each company
p <- ggplot(data = stocks) +
geom_line(aes(x = Date , y = Close, color = Company)) +
labs(title = "Stock Market Movement within the Last Three Months", subtitle = "Apple, Google, Microsoft and Netflix") +
ylab("Closing Price") +
scale_color_manual(values = c("red", "green", "black", "darkviolet")) +
theme_bw()
ggplotly(p) %>%
layout(title = list(text = paste0("Stock Market Movement Within The Last Three Months",
'<br>',
'<sup>',
"Apple, Google, Microsoft and Netflix",
'</sup>')))
Calculate the average volume for each company and create visualizations to depict the results.
#calculate average volume of stock for each company
average_volume <- stocks %>%
group_by(Company) %>%
summarise(Volume = mean(Volume))
#plot average volume of stock
ggplot(data = average_volume) +
geom_col(aes(x = Company, y = Volume, fill = Company)) +
labs(title = "Average Volume of Stock", subtitle = "Apple, Google, Microsoft and Netflix") +
scale_y_continuous(labels = scales::comma_format()) +
scale_fill_manual(values = c("red", "green", "black", "darkviolet")) +
theme_bw()
Calculate the 10-day and 20-day moving averages, and create corresponding plots to visualize their trends.
#Calculate 10 and 20 day moving average
selected_data <- stocks[,c("Company", "Date", "Close")]
selected_data <- selected_data %>%
group_by(Company) %>%
mutate(MA_10 = SMA(Close, n = 10)) %>%
mutate(MA_20 = SMA(Close, n = 20))
#Visualize moving Average of all companies
m <- ggplot(data = selected_data) +
geom_line(aes(x = Date, y = Close, color = Company), linetype = "solid") +
geom_line(aes(x = Date, y = MA_10, color = Company), linetype = "dashed") +
geom_line(aes(x = Date, y = MA_20, color = Company), linetype = "dotted") +
labs(x = "Date", y = "Price", title = "Moving Average Graph") +
scale_color_manual(values = c("red", "green", "black", "darkviolet")) +
theme_bw()
ggplotly(m)
Calculate and visualize the volatility of all four companies
#Calculate Volatility for all companies
volatility_data <- stocks %>%
group_by(Company) %>%
mutate(volatility = volatility(Close))
#plot volatilty
v <- ggplot(data = volatility_data) +
geom_line(aes(x = Date, y = volatility, color = Company)) +
scale_color_manual(values = c("red", "green", "black", "darkviolet")) +
labs(title = "Volatilty of All Companies") +
theme_bw()
ggplotly(v)
The stock prices of all four companies remained relatively stable throughout the examined three-month period, with no significant changes or fluctuations observed.
Apple emerged as the company with the highest stock volume sold during the analyzed timeframe, indicating strong market demand and desirability.
Analyzing the 10-day and 20-day moving averages provides valuable insights into the potential volatility of the four companies’ stocks. Notably, the visualization demonstrates that Netflix’s stock prices consistently deviate significantly from the moving averages.
The moving average graph offers valuable indications of stocks that may continue to experience a decline in prices, providing insights for potential downward trends.
Further analysis of each company’s volatility affirms that Netflix, along with Microsoft, exhibits higher levels of volatility compared to Apple and Google. These findings can assist buyers and investors in making informed decisions based on their risk tolerance.
Apple and Google stocks, in contrast, demonstrate lower volatility, suggesting a relatively more stable investment option compared to Netflix and Microsoft.
These key takeaways provide valuable insights into the stock performance, volume, volatility, and potential trends of the examined companies. They serve as a basis for making informed decisions and formulating investment strategies.
Consider Apple stocks: Apple has demonstrated strong market demand and desirability with the highest stock volume sold. This indicates a favorable market position and suggests that Apple stocks may have growth potential. It could be worth considering including Apple stocks in your investment portfolio.
Monitor Netflix stock closely: The analysis of the 10-day and 20-day moving averages indicates that Netflix’s stock prices consistently deviate significantly from the moving averages. This suggests higher volatility and potential for price fluctuations. If you are comfortable with higher risk, closely monitoring Netflix stocks may present opportunities for potential gains or strategic entry points.
Exercise caution with Microsoft stocks: Similar to Netflix, Microsoft exhibits higher levels of volatility. This implies greater potential for price swings and increased risk. If considering investing in Microsoft stocks, it is recommended to exercise caution, closely monitor market trends, and be prepared for potential fluctuations.
Evaluate risk tolerance: The analysis of volatility highlights the varying levels of risk associated with each company. For investors with a lower risk tolerance, Apple and Google stocks, which demonstrate lower volatility, may be more suitable options. On the other hand, investors with a higher risk tolerance might be open to exploring opportunities in Netflix and Microsoft stocks, considering their potential for higher returns but accompanied by higher risk.
It is important to note that these recommendations are based on the insights derived from this analysis, but individual investment goals, risk preferences, and financial circumstances should also be taken into account when making investment decisions. It is advisable to conduct further research and consult with a financial advisor before making any investment choices.