This project analyzes the stock prices of five major Indian
companies: HDFC Bank, Reliance
Industries, Infosys, TCS, and
ITC for the year 2024.
We explore their stock trends, perform descriptive statistics, and
visualize insights for informed investment decision-making.
library(readr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
# Set path
path <- "D:/etc/lpu/@4/cap 482/projects/data sets/"
# Import data
reliance <- read_csv(paste0(path, "Reliance Industries Stock Price History.csv"))
## Rows: 249 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Vol., Change %
## num (4): Price, Open, High, Low
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
infosys <- read_csv(paste0(path, "Infosys Stock Price History.csv"))
## Rows: 249 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Vol., Change %
## num (4): Price, Open, High, Low
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
tcs <- read_csv(paste0(path, "Tata Consultancy Stock Price History.csv"))
## Rows: 20 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Vol., Change %
## num (4): Price, Open, High, Low
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
itc <- read_csv(paste0(path, "ITC Stock Price History.csv"))
## Rows: 249 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Vol., Change %
## dbl (4): Price, Open, High, Low
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hdfc <- read_csv(paste0(path, "HDFC Bank Stock Price History.csv"))
## Rows: 20 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Vol., Change %
## num (4): Price, Open, High, Low
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The datasets for five leading Indian companies were successfully imported into R. Each dataset contains date-wise closing stock prices recorded throughout the year 2024. This import step forms the foundation for further analysis. Ensuring accurate and complete data is crucial before proceeding to deeper statistical exploration.
str(reliance)
## spc_tbl_ [249 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Date : chr [1:249] "12/31/2024" "12/30/2024" "12/27/2024" "12/26/2024" ...
## $ Price : num [1:249] 1215 1211 1221 1217 1223 ...
## $ Open : num [1:249] 1208 1216 1218 1224 1222 ...
## $ High : num [1:249] 1219 1223 1228 1228 1234 ...
## $ Low : num [1:249] 1206 1208 1217 1214 1221 ...
## $ Vol. : chr [1:249] "6.41M" "8.82M" "7.00M" "10.02M" ...
## $ Change %: chr [1:249] "0.39%" "-0.85%" "0.37%" "-0.51%" ...
## - attr(*, "spec")=
## .. cols(
## .. Date = col_character(),
## .. Price = col_number(),
## .. Open = col_number(),
## .. High = col_number(),
## .. Low = col_number(),
## .. Vol. = col_character(),
## .. `Change %` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(infosys)
## spc_tbl_ [249 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Date : chr [1:249] "12/31/2024" "12/30/2024" "12/27/2024" "12/26/2024" ...
## $ Price : num [1:249] 1880 1906 1917 1907 1909 ...
## $ Open : num [1:249] 1892 1916 1909 1909 1927 ...
## $ High : num [1:249] 1897 1916 1924 1920 1942 ...
## $ Low : num [1:249] 1845 1886 1904 1902 1840 ...
## $ Vol. : chr [1:249] "3.61M" "7.79M" "3.94M" "3.62M" ...
## $ Change %: chr [1:249] "-1.36%" "-0.56%" "0.49%" "-0.09%" ...
## - attr(*, "spec")=
## .. cols(
## .. Date = col_character(),
## .. Price = col_number(),
## .. Open = col_number(),
## .. High = col_number(),
## .. Low = col_number(),
## .. Vol. = col_character(),
## .. `Change %` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(tcs)
## spc_tbl_ [20 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Date : chr [1:20] "04/04/2025" "04/03/2025" "04/02/2025" "04/01/2025" ...
## $ Price : num [1:20] 3299 3403 3544 3551 3606 ...
## $ Open : num [1:20] 3363 3491 3544 3573 3651 ...
## $ High : num [1:20] 3400 3502 3564 3584 3661 ...
## $ Low : num [1:20] 3296 3396 3530 3525 3592 ...
## $ Vol. : chr [1:20] "5.32M" "4.54M" "1.76M" "2.62M" ...
## $ Change %: chr [1:20] "-3.05%" "-3.97%" "-0.19%" "-1.53%" ...
## - attr(*, "spec")=
## .. cols(
## .. Date = col_character(),
## .. Price = col_number(),
## .. Open = col_number(),
## .. High = col_number(),
## .. Low = col_number(),
## .. Vol. = col_character(),
## .. `Change %` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(itc)
## spc_tbl_ [249 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Date : chr [1:249] "12/31/2024" "12/30/2024" "12/27/2024" "12/26/2024" ...
## $ Price : num [1:249] 458 451 453 451 453 ...
## $ Open : num [1:249] 453 455 454 455 449 ...
## $ High : num [1:249] 458 455 456 456 454 ...
## $ Low : num [1:249] 449 450 450 450 447 ...
## $ Vol. : chr [1:249] "13.28M" "13.07M" "16.28M" "8.85M" ...
## $ Change %: chr [1:249] "1.39%" "-0.33%" "0.35%" "-0.31%" ...
## - attr(*, "spec")=
## .. cols(
## .. Date = col_character(),
## .. Price = col_double(),
## .. Open = col_double(),
## .. High = col_double(),
## .. Low = col_double(),
## .. Vol. = col_character(),
## .. `Change %` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
str(hdfc)
## spc_tbl_ [20 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Date : chr [1:20] "04/07/2025" "04/04/2025" "04/03/2025" "04/02/2025" ...
## $ Price : num [1:20] 1743 1817 1795 1797 1768 ...
## $ Open : num [1:20] 1763 1813 1781 1771 1802 ...
## $ High : num [1:20] 1780 1842 1804 1799 1810 ...
## $ Low : num [1:20] 1738 1811 1777 1771 1765 ...
## $ Vol. : chr [1:20] "12.40M" "16.77M" "5.63M" "5.79M" ...
## $ Change %: chr [1:20] "-4.10%" "1.25%" "-0.11%" "1.64%" ...
## - attr(*, "spec")=
## .. cols(
## .. Date = col_character(),
## .. Price = col_number(),
## .. Open = col_number(),
## .. High = col_number(),
## .. Low = col_number(),
## .. Vol. = col_character(),
## .. `Change %` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
summary(reliance)
## Date Price Open High Low
## Length:249 Min. :1205 Min. :1208 Min. :1219 Min. :1202
## Class :character 1st Qu.:1357 1st Qu.:1360 1st Qu.:1372 1st Qu.:1341
## Mode :character Median :1455 Median :1457 Median :1470 Median :1445
## Mean :1423 Mean :1424 Mean :1437 Mean :1411
## 3rd Qu.:1482 3rd Qu.:1483 3rd Qu.:1494 3rd Qu.:1468
## Max. :1601 Max. :1604 Max. :1609 Max. :1586
## Vol. Change %
## Length:249 Length:249
## Class :character Class :character
## Mode :character Mode :character
##
##
##
summary(infosys)
## Date Price Open High Low
## Length:249 Min. :1394 Min. :1377 Min. :1415 Min. :1358
## Class :character 1st Qu.:1515 1st Qu.:1516 1st Qu.:1529 1st Qu.:1498
## Mode :character Median :1676 Median :1676 Median :1689 Median :1661
## Mean :1701 Mean :1701 Mean :1717 Mean :1685
## 3rd Qu.:1880 3rd Qu.:1880 3rd Qu.:1900 3rd Qu.:1865
## Max. :2000 Max. :1995 Max. :2006 Max. :1978
## Vol. Change %
## Length:249 Length:249
## Class :character Class :character
## Mode :character Mode :character
##
##
##
summary(tcs)
## Date Price Open High Low
## Length:20 Min. :3299 Min. :3363 Min. :3400 Min. :3296
## Class :character 1st Qu.:3510 1st Qu.:3502 1st Qu.:3554 1st Qu.:3478
## Mode :character Median :3569 Median :3558 Median :3586 Median :3528
## Mean :3553 Mean :3552 Mean :3589 Mean :3519
## 3rd Qu.:3607 3rd Qu.:3602 3rd Qu.:3631 3rd Qu.:3576
## Max. :3658 Max. :3651 Max. :3710 Max. :3638
## Vol. Change %
## Length:20 Length:20
## Class :character Class :character
## Mode :character Mode :character
##
##
##
summary(itc)
## Date Price Open High
## Length:249 Min. :381.6 Min. :383.1 Min. :384.6
## Class :character 1st Qu.:405.8 1st Qu.:406.1 1st Qu.:409.1
## Mode :character Median :439.9 Median :439.1 Median :442.6
## Mean :434.5 Mean :435.0 Mean :438.8
## 3rd Qu.:461.8 3rd Qu.:461.6 3rd Qu.:466.6
## Max. :494.5 Max. :492.5 Max. :500.0
## Low Vol. Change %
## Min. :377.8 Length:249 Length:249
## 1st Qu.:402.8 Class :character Class :character
## Median :435.2 Mode :character Mode :character
## Mean :430.8
## 3rd Qu.:457.9
## Max. :492.2
summary(hdfc)
## Date Price Open High Low
## Length:20 Min. :1686 Min. :1682 Min. :1694 Min. :1670
## Class :character 1st Qu.:1711 1st Qu.:1713 1st Qu.:1723 1st Qu.:1704
## Mode :character Median :1768 Median :1759 Median :1777 Median :1752
## Mean :1760 Mean :1755 Mean :1774 Mean :1746
## 3rd Qu.:1802 3rd Qu.:1800 3rd Qu.:1814 3rd Qu.:1782
## Max. :1828 Max. :1829 Max. :1844 Max. :1811
## Vol. Change %
## Length:20 Length:20
## Class :character Class :character
## Mode :character Mode :character
##
##
##
Upon inspecting the structure and summary of the datasets, it was observed that the closing price values are stored as characters with commas. Additionally, each dataset includes a date column that needs to be appropriately formatted. Understanding the initial structure highlights necessary cleaning steps for accurate analysis.
clean_data <- function(data) {
data$Date <- as.Date(data$Date, format = "%d-%m-%Y")
data$Price <- as.numeric(gsub(",", "", data$Price))
data <- data[!is.na(data$Price), ]
return(data)
}
# Apply cleaning
reliance <- clean_data(reliance)
infosys <- clean_data(infosys)
tcs <- clean_data(tcs)
itc <- clean_data(itc)
hdfc <- clean_data(hdfc)
The datasets were cleaned by converting the Date column into a proper date format and transforming the Price column into numeric values by removing commas. Any rows containing NA values were also removed. These cleaning operations ensure that the datasets are ready for numerical and time-series analysis without errors.
calculate_stats <- function(data, company_name) {
data.frame(
Company = company_name,
Mean = round(mean(data$Price), 2),
Median = round(median(data$Price), 2),
SD = round(sd(data$Price), 2),
Variance = round(var(data$Price), 2)
)
}
# Calculate for all companies
stats <- bind_rows(
calculate_stats(reliance, "Reliance"),
calculate_stats(infosys, "Infosys"),
calculate_stats(tcs, "TCS"),
calculate_stats(itc, "ITC"),
calculate_stats(hdfc, "HDFC")
)
print(stats)
## Company Mean Median SD Variance
## 1 Reliance 1423.19 1455.03 90.43 8177.88
## 2 Infosys 1700.74 1675.81 186.67 34844.34
## 3 TCS 3553.03 3569.23 86.26 7440.82
## 4 ITC 434.48 439.85 31.68 1003.39
## 5 HDFC 1760.35 1768.30 49.95 2494.76
Descriptive statistics such as mean, median, standard deviation, and variance were calculated for each stock’s closing prices. These metrics provide insights into the central tendency and variability of each stock. Understanding these statistics helps compare the performance and stability of different companies in 2024.
highest_avg <- stats[which.max(stats$Mean), ]
lowest_avg <- stats[which.min(stats$Mean), ]
highest_avg
## Company Mean Median SD Variance
## 3 TCS 3553.03 3569.23 86.26 7440.82
lowest_avg
## Company Mean Median SD Variance
## 4 ITC 434.48 439.85 31.68 1003.39
From the analysis, the stock with the highest average closing price and the stock with the lowest average closing price in 2024 were identified. This comparison helps investors assess which companies are valued higher or lower relative to each other. It provides a basic understanding of stock pricing levels across the selected companies.
hdfc$Daily_Change <- c(NA, diff(hdfc$Price))
reliance$Daily_Change <- c(NA, diff(reliance$Price))
infosys$Daily_Change <- c(NA, diff(infosys$Price))
tcs$Daily_Change <- c(NA, diff(tcs$Price))
itc$Daily_Change <- c(NA, diff(itc$Price))
Daily changes in closing prices were computed by taking the difference between consecutive days. This allows observation of how much the stock price fluctuates daily. Understanding daily movements is crucial for evaluating short-term volatility and identifying potential trading opportunities.
volatility <- data.frame(
Company = c("HDFC", "Reliance", "Infosys", "TCS", "ITC"),
Volatility = round(c(
sd(na.omit(hdfc$Daily_Change)),
sd(na.omit(reliance$Daily_Change)),
sd(na.omit(infosys$Daily_Change)),
sd(na.omit(tcs$Daily_Change)),
sd(na.omit(itc$Daily_Change))
), 2)
)
print(volatility)
Volatility was measured as the standard deviation of daily changes in closing prices. A higher volatility value indicates larger fluctuations in stock prices, while a lower value suggests more stability. Comparing volatility across companies helps classify stocks into risk categories for investors.
trend_summary <- data.frame(
Company = c("HDFC", "Reliance", "Infosys", "TCS", "ITC"),
Positive_Days = c(
sum(hdfc$Daily_Change > 0, na.rm = TRUE),
sum(reliance$Daily_Change > 0, na.rm = TRUE),
sum(infosys$Daily_Change > 0, na.rm = TRUE),
sum(tcs$Daily_Change > 0, na.rm = TRUE),
sum(itc$Daily_Change > 0, na.rm = TRUE)
),
Negative_Days = c(
sum(hdfc$Daily_Change < 0, na.rm = TRUE),
sum(reliance$Daily_Change < 0, na.rm = TRUE),
sum(infosys$Daily_Change < 0, na.rm = TRUE),
sum(tcs$Daily_Change < 0, na.rm = TRUE),
sum(itc$Daily_Change < 0, na.rm = TRUE)
)
)
print(trend_summary)
The number of days with positive and negative returns was counted for each company. This analysis reflects the general market sentiment and performance trend of the stocks. Stocks with more positive return days are considered to have better momentum during the year 2024.
get_monthly_returns <- function(data, company_name) {
data %>%
mutate(Month = format(as.Date(Date), "%b")) %>%
arrange(Date) %>%
mutate(Daily_Return = (Price - lag(Price)) / lag(Price)) %>%
group_by(Month) %>%
summarise(Avg_Monthly_Return = mean(Daily_Return, na.rm = TRUE)) %>%
mutate(Company = company_name)
}
monthly_data <- bind_rows(
get_monthly_returns(hdfc, "HDFC"),
get_monthly_returns(reliance, "Reliance"),
get_monthly_returns(infosys, "Infosys"),
get_monthly_returns(tcs, "TCS"),
get_monthly_returns(itc, "ITC")
)
print(monthly_data)
Average monthly returns were calculated for each stock to evaluate performance trends across different months. Identifying the best and worst-performing months helps in understanding seasonal effects or market patterns. It also assists investors in planning entry or exit strategies based on historical trends.
Based on volatility and positive return analysis, recommendations were made for long-term and short-term investors. Low-volatility stocks are advised for long-term stable investments. High-volatility stocks with frequent positive days are suitable for short-term trading or speculative strategies. The final suggestion aligns investment style with stock characteristics.
ggplot(stats, aes(x = Company, y = Mean, fill = Company)) +
geom_col() +
ggtitle("Average Closing Price per Company (2024)") +
theme_minimal()
The bar chart depicting average closing prices shows that Reliance had the highest average stock price in 2024, whereas ITC was the most affordable. This visualization offers a quick comparative view of stock value among the selected companies. Price levels provide a context for investment sizing and budgeting.
# Calculate Positive and Negative Days for each company directly in this chunk
positive_days <- c(
sum(hdfc$Daily_Change > 0, na.rm = TRUE),
sum(reliance$Daily_Change > 0, na.rm = TRUE),
sum(infosys$Daily_Change > 0, na.rm = TRUE),
sum(tcs$Daily_Change > 0, na.rm = TRUE),
sum(itc$Daily_Change > 0, na.rm = TRUE)
)
negative_days <- c(
sum(hdfc$Daily_Change < 0, na.rm = TRUE),
sum(reliance$Daily_Change < 0, na.rm = TRUE),
sum(infosys$Daily_Change < 0, na.rm = TRUE),
sum(tcs$Daily_Change < 0, na.rm = TRUE),
sum(itc$Daily_Change < 0, na.rm = TRUE)
)
# Create a new data frame for positive and negative days
trend_summary <- data.frame(
Company = c("HDFC", "Reliance", "Infosys", "TCS", "ITC"),
Positive_Days = positive_days,
Negative_Days = negative_days
)
# Pivot data for plotting
trend_long <- trend_summary %>%
pivot_longer(cols = c(Positive_Days, Negative_Days),
names_to = "Type", values_to = "Days")
# Plot the results
ggplot(trend_long, aes(x = Company, y = Days, fill = Type)) +
geom_bar(stat = "identity", position = "dodge") +
ggtitle("Positive vs Negative Days (2024)") +
theme_minimal()
A grouped bar chart was created to compare the number of positive and negative trading days for each company. The visual comparison quickly highlights which stocks performed better on a daily basis. This is useful for assessing the consistency and resilience of stock performance throughout the year.
# Create monthly return data within this chunk to avoid 'monthly_data' not found error
get_monthly_returns <- function(data, company_name) {
data %>%
mutate(Month = format(as.Date(Date), "%b")) %>%
arrange(Date) %>%
mutate(Daily_Return = (Price - lag(Price)) / lag(Price)) %>%
group_by(Month) %>%
summarise(Avg_Monthly_Return = mean(Daily_Return, na.rm = TRUE)) %>%
mutate(Company = company_name)
}
# Calculate monthly returns for all companies
monthly_data <- bind_rows(
get_monthly_returns(hdfc, "HDFC"),
get_monthly_returns(reliance, "Reliance"),
get_monthly_returns(infosys, "Infosys"),
get_monthly_returns(tcs, "TCS"),
get_monthly_returns(itc, "ITC")
)
# Reorder Month as a factor
monthly_data$Month <- factor(monthly_data$Month,
levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
# Plotting the monthly average return trends
ggplot(monthly_data, aes(x = Month, y = Avg_Monthly_Return, group = Company, color = Company)) +
geom_line(size = 1.2) +
geom_point() +
ggtitle("Monthly Average Return Trends (2024)") +
theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
# Interpretation: The line plot of monthly average returns showcases the
ups and downs in stock performance over the year. Peaks represent months
of strong returns, while dips indicate weaker performance. Analyzing
these trends helps recognize seasonal influences and periods of higher
or lower market optimism.
The pie chart shows the share of total positive return days contributed by each company. Companies with larger slices had more frequent positive trading sessions. This visual representation offers a quick understanding of which stocks had better day-to-day consistency during 2024.
ggplot(reliance, aes(x = Daily_Change)) +
geom_histogram(binwidth = 5, fill = "steelblue", color = "white") +
ggtitle("Distribution of Daily Change – Reliance") +
theme_minimal()
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
# Interpretation The histogram shows that most daily price changes for
Reliance are clustered around zero, meaning that on most days, the stock
experiences small price movements. However, there are occasional large
positive and negative shifts, indicating periods of higher volatility.
Overall, Reliance exhibits moderate daily volatility, with the majority
of trading days showing minimal price change and fewer days experiencing
extreme fluctuations.