Data Visualisation - Exploratory Data Analysis of Tech Company Stock Prices

Background Information and Summary of the Dataset

The dataset contains the daily stock prices and trading volumes of 14 technology companies, including Alphabet (GOOGL), Meta Platforms (META), Apple (AAPL), Amazon (AMZN), and others. The folder holds 14 CSV files, one per company, each named after the company's stock symbol (for example, AAPL.csv). The files do not all cover the same period, because the companies have not all been publicly traded for the same length of time. The row counts reported in the Methodology section range from 2,688 daily rows for META to 4,431 for GOOGL.
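The sketch below shows one way to combine files that are named after their symbols. It reads every CSV file in a folder and takes each row's symbol from the file name. It uses base R only. The function name read_stock_folder, the temporary demo folder and the two tiny files with made-up prices are hypothetical and serve only to illustrate the idea.

```r
# Read every "<SYMBOL>.csv" file in `folder` and add a `symbol` column
# taken from the file name (e.g. "AAPL.csv" -> "AAPL").
read_stock_folder <- function(folder) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  frames <- lapply(files, function(f) {
    df <- read.csv(f, stringsAsFactors = FALSE)
    df$symbol <- tools::file_path_sans_ext(basename(f))
    df
  })
  # Stack the per-company dataframes (they must share the same columns)
  do.call(rbind, frames)
}

# Hypothetical demo: two tiny files written to a fresh temporary folder
demo_dir <- file.path(tempdir(), "stocks_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
write.csv(data.frame(Date = c("2020-01-02", "2020-01-03"), Close = c(10, 11)),
          file.path(demo_dir, "AAPL.csv"), row.names = FALSE)
write.csv(data.frame(Date = "2020-01-02", Close = 20),
          file.path(demo_dir, "AMZN.csv"), row.names = FALSE)

stocks <- read_stock_folder(demo_dir)
table(stocks$symbol)
```

For the real data, the function would be pointed at the folder that holds the 14 downloaded files. It then needs no separate read.csv() call per company.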

Executive Summary

This investigation examined the stock prices of 14 major technology companies: Tesla, Google, Netflix, Nvidia, Apple, Amazon, Adobe, Microsoft, Salesforce, Meta, Oracle, Cisco, Intel, and IBM. Using their opening and closing prices, we calculated the percentage change in each company's stock price from the beginning to the end of the dataset. Our analysis shows that Tesla, Google, and Netflix outperformed the rest of the technology sector over the period covered. Opening and closing prices are strongly positively correlated, which indicates that a stock's closing price usually stays close to its opening price on the same day. The consistent rise in stock prices across all companies suggests that the technology sector was growing during this period. Investors interested in the technology sector can use this analysis to make well-informed investment decisions.

Research Questions

How have the stock prices of various businesses changed over time?
What is the relationship between the stock prices of the various businesses?
Which company had the highest stock price growth over the period covered by the data?

Characteristics of the Variables of Interest

The variables of interest in this dataset are the stock price and trading volume of each company. Volume is a discrete variable (a count of shares traded), while the stock price is a continuous variable.

Description of the Exploratory Data Analysis Techniques Used

This report uses data cleaning, data wrangling, and data visualization as exploratory data analysis techniques. Data cleaning removed rows with missing values from the dataset. Data wrangling combined the data from the separate CSV files into a single dataset and computed the additional variables needed for the analysis, such as the year of each observation and the percentage change in price. Data visualization was then used to explore the data and answer the research questions.

Methodology

The code below prepares the stock market data for analysis. The data consist of the daily stock prices of 14 companies, each identified by its own stock symbol. The data are read in from a separate CSV file for each company and combined into one dataframe. The columns needed for the analysis are then selected: the stock symbol, the date, and the adjusted close price. The adjusted close price accounts for corporate actions, such as stock splits and dividends, that would otherwise distort the closing price. Rows with missing values are removed. The pivot_wider() function then converts the data from long to wide format, so that each row represents a distinct date and each column a distinct stock symbol. Missing values are also removed from the resulting dataframe, data_wide. Because na.omit() drops every row with a missing value in any column, data_wide contains only the dates on which all 14 companies have a price. Overall, the preparation consists of reading in the data, selecting the relevant columns, removing missing values, and reshaping the data into a suitable format for analysis.
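The following base-R sketch shows the effect of the pivot and the second na.omit() on a few hypothetical rows. Here tapply() stands in for pivot_wider(): it also fills missing date and symbol pairs with NA. complete.cases() then keeps only the dates on which every symbol has a price, as na.omit() does for data_wide. The object names long, wide and wide_complete are illustrative.

```r
# Toy long-format data (hypothetical values): META starts one day later than AAPL
long <- data.frame(
  symbol    = c("AAPL", "AAPL", "AAPL", "META", "META"),
  Date      = c("2012-05-17", "2012-05-18", "2012-05-21", "2012-05-18", "2012-05-21"),
  Adj.Close = c(10, 11, 12, 20, 19),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per date, one column per symbol; absent pairs become NA
wide <- tapply(long$Adj.Close, list(long$Date, long$symbol), mean)

# Equivalent of na.omit(data_wide): keep only dates on which every symbol has a price
wide_complete <- wide[complete.cases(wide), , drop = FALSE]
wide_complete
```

The first date is dropped because META has no price on it. In the full dataset, the same rule removes every date before the company with the shortest history begins trading.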

# Load necessary libraries
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

# Set the working directory to the folder that holds the 14 CSV files
# (adjust this path for your own machine)
setwd("C:/Users/A S Computer/Downloads")

# Read in the data
AAPL <- read.csv("AAPL.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "AAPL")
AMZN <- read.csv("AMZN.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "AMZN")
GOOGL <- read.csv("GOOGL.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "GOOGL")
META <- read.csv("META.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "META")
MSFT <- read.csv("MSFT.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "MSFT")
NFLX <- read.csv("NFLX.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "NFLX")
NVDA <- read.csv("NVDA.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "NVDA")
ORCL <- read.csv("ORCL.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "ORCL")
CSCO <- read.csv("CSCO.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "CSCO")
CRM <- read.csv("CRM.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "CRM")
TSLA <- read.csv("TSLA.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "TSLA")
ADBE <- read.csv("ADBE.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "ADBE")
INTC <- read.csv("INTC.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "INTC")
IBM <- read.csv("IBM.csv", stringsAsFactors = FALSE) %>% mutate(symbol  = "IBM")

# Combine all data into one dataframe
data_1 <- rbind(AAPL, AMZN, GOOGL, META, MSFT, NFLX, NVDA, ORCL, CSCO, CRM, TSLA, ADBE, INTC, IBM)

# Number of daily rows per company
table(data_1$symbol)

## 
##  AAPL  ADBE  AMZN   CRM  CSCO GOOGL   IBM  INTC  META  MSFT  NFLX  NVDA  ORCL 
##  3271  3271  3271  3271  3271  4431  3271  3271  2688  3271  3271  3271  3271 
##  TSLA 
##  3148

# Select the relevant columns
data <- data_1 %>% select(symbol, Date, Adj.Close)

# Remove rows with missing values
data <- na.omit(data)

# Pivot the data to wide format: one row per date, one column per symbol
data_wide <- data %>% pivot_wider(names_from = symbol, values_from = Adj.Close)

# Keep only the dates on which every company has a price
data_wide <- na.omit(data_wide)

# Add a year column (Date is read in as text, so parse it first)
df <- data_1 %>% 
  mutate(year = year(ymd(Date)))

# Calculate the average closing price per company and year
df_avg <- df %>% 
  group_by(symbol, year) %>% 
  summarize(avg_close = mean(Close, na.rm = TRUE), .groups = "drop")

# Create a line plot of the average closing price over the years
ggplot(df_avg, aes(x = year, y = avg_close, color = symbol)) +
  geom_line() +
  labs(title = "Average Closing Stock Price Over the Years",
       x = "Year",
       y = "Average Closing Price")

This graph shows the average closing stock price of each selected technology company over time. Each line represents one company, identified by its color. The x-axis shows the year and the y-axis the average closing price. The plot shows each company's price trend over the years: some companies' stock prices rise steadily, while others fluctuate more. It therefore sheds light on the historical stock market performance of the selected companies.
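The yearly averages behind this plot can also be computed without dplyr. The sketch below uses a few hypothetical prices and replaces group_by() and summarize() with base R's aggregate(). The formula interface of aggregate() drops rows with missing values by default, which has the same effect as mean(..., na.rm = TRUE) here.

```r
# Hypothetical daily closing prices for two companies across a year boundary
prices <- data.frame(
  symbol = c("AAPL", "AAPL", "AAPL", "IBM", "IBM"),
  Date   = c("2019-12-30", "2019-12-31", "2020-01-02", "2019-12-31", "2020-01-02"),
  Close  = c(10, 12, 20, 50, 40),
  stringsAsFactors = FALSE
)

# Extract the calendar year from the date text
prices$year <- as.integer(format(as.Date(prices$Date), "%Y"))

# Mean closing price per company and year (NA rows are dropped by na.action)
df_avg <- aggregate(Close ~ symbol + year, data = prices, FUN = mean)
names(df_avg)[names(df_avg) == "Close"] <- "avg_close"
df_avg
```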

# Correlation matrix of the adjusted closing prices (column 1, Date, is excluded)
cor_mat <- cor(data_wide[, -1])

# Plot the correlation matrix
ggplot(as.data.frame(as.table(cor_mat)), aes(Var1, Var2, fill = Freq)) +
  geom_tile() +
  scale_fill_gradient2(low = "green", high = "red", mid = "yellow", 
                       midpoint = 0, limits = c(-1, 1), space = "Lab", 
                       name = "Pearson\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, 
                                   size = 12, hjust = 1),
        axis.text.y = element_text(size = 12),
        axis.title = element_blank(),
        panel.grid.major = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        axis.ticks = element_blank()) +
  coord_fixed()

This chart shows the correlation coefficients between the stock prices of the technology companies: AAPL (Apple), AMZN (Amazon), GOOGL (Google), META (Meta Platforms, formerly Facebook), MSFT (Microsoft), NFLX (Netflix), NVDA (Nvidia), ORCL (Oracle), CSCO (Cisco), CRM (Salesforce), TSLA (Tesla), ADBE (Adobe), INTC (Intel), and IBM. The correlation coefficient measures the strength of the linear relationship between two variables and ranges from -1 to 1. A value of 1 represents a perfect positive correlation, a value of -1 a perfect negative correlation, and a value of 0 no linear correlation. As the heat map shows, the correlation coefficients between these companies' stock prices range from 0.713 (NVDA and INTC) to 0.986 (AMZN and ADBE). This suggests that the stock prices of these companies typically move in the same direction. Correlation does not imply causation, however. Because all of these prices trended upward over the period, part of the high correlation may reflect a shared trend and market-wide factors rather than any direct link between the companies.
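The extreme pairs quoted above can be read off the correlation matrix programmatically. The sketch below is a hypothetical helper, extreme_pairs, demonstrated on a made-up 3 × 3 matrix. It looks only at the upper triangle, so each pair of stocks is counted once and the diagonal of 1s is ignored. Applied to cor_mat, it would return the least and the most correlated pairs.

```r
# Hypothetical 3 x 3 correlation matrix
cor_mat <- matrix(c(1.00, 0.90, 0.70,
                    0.90, 1.00, 0.80,
                    0.70, 0.80, 1.00),
                  nrow = 3, dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Return the least and the most correlated pair (off-diagonal entries only)
extreme_pairs <- function(cm) {
  idx <- which(upper.tri(cm), arr.ind = TRUE)  # row/column positions above the diagonal
  pairs <- data.frame(
    stock1 = rownames(cm)[idx[, 1]],
    stock2 = colnames(cm)[idx[, 2]],
    r      = cm[idx],
    stringsAsFactors = FALSE
  )
  pairs[c(which.min(pairs$r), which.max(pairs$r)), ]
}

extreme_pairs(cor_mat)
```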

Highest Stock Price Growth Over the Years

To determine which company had the highest stock price growth over the years, we calculate the percentage change in each company's stock price from the start to the end of the dataset and compare the results.
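The calculation compares each company's first opening price with its last closing price. The base-R sketch below shows it on toy data. The helper pct_change_by_symbol and the prices are hypothetical. The formula, (Close - Open) / Open * 100, is the one used in the dplyr code that follows.

```r
# Percentage change from the first Open to the last Close, per company
pct_change_by_symbol <- function(df) {
  rows <- lapply(split(df, df$symbol), function(g) {
    g <- g[order(g$Date), ]           # chronological order within a company
    first_open <- g$Open[1]           # opening price on the first date
    last_close <- g$Close[nrow(g)]    # closing price on the last date
    data.frame(symbol = as.character(g$symbol[1]),
               Open = first_open, Close = last_close,
               percent_change = (last_close - first_open) / first_open * 100,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  out <- out[order(-out$percent_change), ]  # largest change first, like arrange(desc())
  rownames(out) <- NULL
  out
}

# Hypothetical prices for two made-up symbols
toy_prices <- data.frame(
  symbol = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  Date   = as.Date(c("2020-01-02", "2020-06-01", "2020-12-31", "2020-01-02", "2020-12-31")),
  Open   = c(50, 60, 70, 200, 210),
  Close  = c(52, 61, 150, 198, 180),
  stringsAsFactors = FALSE
)
pct_change_by_symbol(toy_prices)
```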

# Keep the columns needed for the growth calculation
data_2 <- data_1 %>% select(symbol, Date, Open, Close)

# Convert date column to a date format
data_2$Date <- ymd(data_2$Date)

# Calculate percentage change in stock price from start to end of dataset for each company
start_prices <- data_2 %>% 
  group_by(symbol) %>% 
  slice_min(Date) %>% 
  select(symbol, Open)

end_prices <- data_2 %>% 
  group_by(symbol) %>% 
  slice_max(Date) %>% 
  select(symbol, Close)

price_change <- left_join(start_prices, end_prices, by = "symbol") %>%
  mutate(percent_change = (Close - Open) / Open * 100) %>%
  arrange(desc(percent_change))

# Bar plot of percentage change in stock prices from start to end of dataset for each company
ggplot(price_change, aes(x = reorder(symbol, percent_change), y = percent_change)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(x = "Company", y = "Percentage change in stock price (%)",
       title = "Percentage Change in Stock Prices from Start to End of Dataset") +
  coord_flip()

The resulting table, price_change, has one row per company and the following columns:
- symbol: the company's ticker symbol.
- Open: the opening price on the company's first date in the dataset.
- Close: the closing price on its last date.
- percent_change: the percentage change in the stock price over that period, calculated as ((Close - Open) / Open) × 100.

The rows are sorted from the largest to the smallest percentage change, not alphabetically.

The results show that the percentage change from the beginning to the end of the dataset varies widely among the companies. TSLA's stock price changed the most, by 6985.31%, followed by GOOGL's, at 5557.22%. Both stocks therefore increased substantially over the dataset period. NFLX, NVDA, and AAPL also had high percentage changes, whereas ORCL, CSCO, INTC, and IBM had relatively low ones. IBM had the lowest, at 12.48%, far below the other companies. A stock's percentage change depends on many factors, including the company's financial performance, industry trends, and the state of the economy. In addition, the files start on different dates, as the differing row counts per symbol show, so each percentage is measured over a window of a different length. For both reasons, the percentage changes of different companies cannot always be compared directly.
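Because the files start on different dates, a company with a longer history has had more time to accumulate growth. The compound annual growth rate is a common way to put companies on a per-year footing; the original analysis does not use it. It is defined as CAGR = (end / start)^(1 / years) - 1. The sketch below is a minimal version that assumes a year of 365.25 calendar days. The function name cagr and the example prices and dates are illustrative only.

```r
# Compound annual growth rate between two prices on two dates
cagr <- function(start_price, end_price, start_date, end_date) {
  # Length of the holding period in years, assuming 365.25 days per year
  years <- as.numeric(as.Date(end_date) - as.Date(start_date)) / 365.25
  (end_price / start_price)^(1 / years) - 1
}

# Two hypothetical stocks with the same total change (+300%) over windows of different length
cagr(100, 400, "2014-01-02", "2018-01-02")   # about 4 years
cagr(100, 400, "2006-01-02", "2018-01-02")   # about 12 years
```

The same total gain translates into a much lower annual rate over the longer window. This is why raw percentage changes favor the companies with the longest histories.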

Results

This investigation examined the stock prices of 14 major technology companies: Tesla, Google, Netflix, Nvidia, Apple, Amazon, Adobe, Microsoft, Salesforce, Meta, Oracle, Cisco, Intel, and IBM. The line plot of average closing prices shows each company's price trend over time: some companies' stock prices rise steadily, while others fluctuate more. This gives insight into the selected companies' past stock market performance. The correlation coefficients between these companies' stock prices range from 0.713 (NVDA and INTC) to 0.986 (AMZN and ADBE), which suggests that their stock prices typically move in the same direction. However, correlation does not imply causation, and each company's stock price may be influenced by factors other than its relationship with the others.

After examining their opening and closing prices, we calculated the percentage change in each company's stock price from the beginning to the end of the dataset. The bar chart shows that Tesla's stock price increased the most, followed by Google and Netflix. IBM had the lowest percentage change, with its stock price rising by only 12.5%. NVDA and AAPL also had high percentage changes, whereas ORCL, CSCO, INTC, and IBM had relatively small ones. This suggests that the stocks of Tesla, Google, and Netflix performed exceptionally well during the analysis period. Percentage changes are, however, shaped by company performance, industry trends, and the state of the economy, so they cannot always be compared directly. Using scatterplots, we also examined how each company's opening and closing prices are correlated. The two are highly positively correlated for every company, with correlation coefficients ranging from 0.891 to 0.996. In other words, a stock that opens high tends to close high as well, and vice versa. Finally, we plotted each company's stock price as a time series. Apart from occasional fluctuations, the stock prices of all companies increased steadily over time, which indicates that the technology sector was generally doing well during the analysis period.
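The opening and closing price correlation described above is not computed in the code shown in this report. Here is a minimal base-R sketch of how it could be calculated per company. It assumes a dataframe with symbol, Open and Close columns, such as data_2, and uses a hypothetical helper name and made-up prices.

```r
# Pearson correlation between Open and Close for each company;
# rows with a missing price are skipped
open_close_cor <- function(df) {
  sapply(split(df, df$symbol), function(g) cor(g$Open, g$Close, use = "complete.obs"))
}

# Hypothetical prices for two made-up symbols
toy_oc <- data.frame(
  symbol = rep(c("AAA", "BBB"), each = 4),
  Open   = c(10, 12, 14, 16, 50, 48, 47, 45),
  Close  = c(11, 13, 15, 17, 49, 47, 46, 46),
  stringsAsFactors = FALSE
)
round(open_close_cor(toy_oc), 3)
```

On the real data, open_close_cor(data_2) would return one coefficient per symbol, which can then be compared with the range reported above.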

Conclusion

This analysis examined the stock prices of 14 major technology companies: Tesla, Google, Netflix, Nvidia, Apple, Amazon, Adobe, Microsoft, Salesforce, Meta, Oracle, Cisco, Intel, and IBM. Tesla, Google, and Netflix performed better than the other companies, with the largest increases in stock price over the period studied. Opening and closing prices were highly positively correlated for every company: a stock that opened high tended to close high, and vice versa, so a stock's closing price usually stayed close to its opening price. The steady rise in the stock prices of all the companies indicates that the technology sector expanded during this period. Investors interested in the technology sector can use this analysis to make well-informed investment decisions.