Introduction

This project analyzes the stock prices of five major Indian companies: HDFC Bank, Reliance Industries, Infosys, TCS, and ITC for the year 2024.
We explore their stock trends, perform descriptive statistics, and visualize insights for informed investment decision-making.


1. Importing and Understanding the Datasets

library(readr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
# Set path

path <- "D:/etc/lpu/@4/cap 482/projects/data sets/"

# Import data
reliance <- read_csv(paste0(path, "Reliance Industries Stock Price History.csv"))
## Rows: 249 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Vol., Change %
## num (4): Price, Open, High, Low
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
infosys  <- read_csv(paste0(path, "Infosys Stock Price History.csv"))
## Rows: 249 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Vol., Change %
## num (4): Price, Open, High, Low
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
tcs      <- read_csv(paste0(path, "Tata Consultancy Stock Price History.csv"))
## Rows: 20 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Vol., Change %
## num (4): Price, Open, High, Low
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
itc      <- read_csv(paste0(path, "ITC Stock Price History.csv"))
## Rows: 249 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Vol., Change %
## dbl (4): Price, Open, High, Low
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hdfc     <- read_csv(paste0(path, "HDFC Bank Stock Price History.csv"))
## Rows: 20 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Date, Vol., Change %
## num (4): Price, Open, High, Low
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Interpretation:

The datasets for five leading Indian companies were successfully imported into R. Each dataset contains date-wise closing stock prices recorded throughout the year 2024. This import step forms the foundation for further analysis. Ensuring accurate and complete data is crucial before proceeding to deeper statistical exploration.

2 Understanding the Structure of Datasets

str(reliance)
## spc_tbl_ [249 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Date    : chr [1:249] "12/31/2024" "12/30/2024" "12/27/2024" "12/26/2024" ...
##  $ Price   : num [1:249] 1215 1211 1221 1217 1223 ...
##  $ Open    : num [1:249] 1208 1216 1218 1224 1222 ...
##  $ High    : num [1:249] 1219 1223 1228 1228 1234 ...
##  $ Low     : num [1:249] 1206 1208 1217 1214 1221 ...
##  $ Vol.    : chr [1:249] "6.41M" "8.82M" "7.00M" "10.02M" ...
##  $ Change %: chr [1:249] "0.39%" "-0.85%" "0.37%" "-0.51%" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Date = col_character(),
##   ..   Price = col_number(),
##   ..   Open = col_number(),
##   ..   High = col_number(),
##   ..   Low = col_number(),
##   ..   Vol. = col_character(),
##   ..   `Change %` = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
str(infosys)
## spc_tbl_ [249 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Date    : chr [1:249] "12/31/2024" "12/30/2024" "12/27/2024" "12/26/2024" ...
##  $ Price   : num [1:249] 1880 1906 1917 1907 1909 ...
##  $ Open    : num [1:249] 1892 1916 1909 1909 1927 ...
##  $ High    : num [1:249] 1897 1916 1924 1920 1942 ...
##  $ Low     : num [1:249] 1845 1886 1904 1902 1840 ...
##  $ Vol.    : chr [1:249] "3.61M" "7.79M" "3.94M" "3.62M" ...
##  $ Change %: chr [1:249] "-1.36%" "-0.56%" "0.49%" "-0.09%" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Date = col_character(),
##   ..   Price = col_number(),
##   ..   Open = col_number(),
##   ..   High = col_number(),
##   ..   Low = col_number(),
##   ..   Vol. = col_character(),
##   ..   `Change %` = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
str(tcs)
## spc_tbl_ [20 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Date    : chr [1:20] "04/04/2025" "04/03/2025" "04/02/2025" "04/01/2025" ...
##  $ Price   : num [1:20] 3299 3403 3544 3551 3606 ...
##  $ Open    : num [1:20] 3363 3491 3544 3573 3651 ...
##  $ High    : num [1:20] 3400 3502 3564 3584 3661 ...
##  $ Low     : num [1:20] 3296 3396 3530 3525 3592 ...
##  $ Vol.    : chr [1:20] "5.32M" "4.54M" "1.76M" "2.62M" ...
##  $ Change %: chr [1:20] "-3.05%" "-3.97%" "-0.19%" "-1.53%" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Date = col_character(),
##   ..   Price = col_number(),
##   ..   Open = col_number(),
##   ..   High = col_number(),
##   ..   Low = col_number(),
##   ..   Vol. = col_character(),
##   ..   `Change %` = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
str(itc)
## spc_tbl_ [249 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Date    : chr [1:249] "12/31/2024" "12/30/2024" "12/27/2024" "12/26/2024" ...
##  $ Price   : num [1:249] 458 451 453 451 453 ...
##  $ Open    : num [1:249] 453 455 454 455 449 ...
##  $ High    : num [1:249] 458 455 456 456 454 ...
##  $ Low     : num [1:249] 449 450 450 450 447 ...
##  $ Vol.    : chr [1:249] "13.28M" "13.07M" "16.28M" "8.85M" ...
##  $ Change %: chr [1:249] "1.39%" "-0.33%" "0.35%" "-0.31%" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Date = col_character(),
##   ..   Price = col_double(),
##   ..   Open = col_double(),
##   ..   High = col_double(),
##   ..   Low = col_double(),
##   ..   Vol. = col_character(),
##   ..   `Change %` = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
str(hdfc)
## spc_tbl_ [20 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Date    : chr [1:20] "04/07/2025" "04/04/2025" "04/03/2025" "04/02/2025" ...
##  $ Price   : num [1:20] 1743 1817 1795 1797 1768 ...
##  $ Open    : num [1:20] 1763 1813 1781 1771 1802 ...
##  $ High    : num [1:20] 1780 1842 1804 1799 1810 ...
##  $ Low     : num [1:20] 1738 1811 1777 1771 1765 ...
##  $ Vol.    : chr [1:20] "12.40M" "16.77M" "5.63M" "5.79M" ...
##  $ Change %: chr [1:20] "-4.10%" "1.25%" "-0.11%" "1.64%" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Date = col_character(),
##   ..   Price = col_number(),
##   ..   Open = col_number(),
##   ..   High = col_number(),
##   ..   Low = col_number(),
##   ..   Vol. = col_character(),
##   ..   `Change %` = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
summary(reliance)
##      Date               Price           Open           High           Low      
##  Length:249         Min.   :1205   Min.   :1208   Min.   :1219   Min.   :1202  
##  Class :character   1st Qu.:1357   1st Qu.:1360   1st Qu.:1372   1st Qu.:1341  
##  Mode  :character   Median :1455   Median :1457   Median :1470   Median :1445  
##                     Mean   :1423   Mean   :1424   Mean   :1437   Mean   :1411  
##                     3rd Qu.:1482   3rd Qu.:1483   3rd Qu.:1494   3rd Qu.:1468  
##                     Max.   :1601   Max.   :1604   Max.   :1609   Max.   :1586  
##      Vol.             Change %        
##  Length:249         Length:249        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 
summary(infosys)
##      Date               Price           Open           High           Low      
##  Length:249         Min.   :1394   Min.   :1377   Min.   :1415   Min.   :1358  
##  Class :character   1st Qu.:1515   1st Qu.:1516   1st Qu.:1529   1st Qu.:1498  
##  Mode  :character   Median :1676   Median :1676   Median :1689   Median :1661  
##                     Mean   :1701   Mean   :1701   Mean   :1717   Mean   :1685  
##                     3rd Qu.:1880   3rd Qu.:1880   3rd Qu.:1900   3rd Qu.:1865  
##                     Max.   :2000   Max.   :1995   Max.   :2006   Max.   :1978  
##      Vol.             Change %        
##  Length:249         Length:249        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 
summary(tcs)
##      Date               Price           Open           High           Low      
##  Length:20          Min.   :3299   Min.   :3363   Min.   :3400   Min.   :3296  
##  Class :character   1st Qu.:3510   1st Qu.:3502   1st Qu.:3554   1st Qu.:3478  
##  Mode  :character   Median :3569   Median :3558   Median :3586   Median :3528  
##                     Mean   :3553   Mean   :3552   Mean   :3589   Mean   :3519  
##                     3rd Qu.:3607   3rd Qu.:3602   3rd Qu.:3631   3rd Qu.:3576  
##                     Max.   :3658   Max.   :3651   Max.   :3710   Max.   :3638  
##      Vol.             Change %        
##  Length:20          Length:20         
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 
summary(itc)
##      Date               Price            Open            High      
##  Length:249         Min.   :381.6   Min.   :383.1   Min.   :384.6  
##  Class :character   1st Qu.:405.8   1st Qu.:406.1   1st Qu.:409.1  
##  Mode  :character   Median :439.9   Median :439.1   Median :442.6  
##                     Mean   :434.5   Mean   :435.0   Mean   :438.8  
##                     3rd Qu.:461.8   3rd Qu.:461.6   3rd Qu.:466.6  
##                     Max.   :494.5   Max.   :492.5   Max.   :500.0  
##       Low            Vol.             Change %        
##  Min.   :377.8   Length:249         Length:249        
##  1st Qu.:402.8   Class :character   Class :character  
##  Median :435.2   Mode  :character   Mode  :character  
##  Mean   :430.8                                        
##  3rd Qu.:457.9                                        
##  Max.   :492.2
summary(hdfc)
##      Date               Price           Open           High           Low      
##  Length:20          Min.   :1686   Min.   :1682   Min.   :1694   Min.   :1670  
##  Class :character   1st Qu.:1711   1st Qu.:1713   1st Qu.:1723   1st Qu.:1704  
##  Mode  :character   Median :1768   Median :1759   Median :1777   Median :1752  
##                     Mean   :1760   Mean   :1755   Mean   :1774   Mean   :1746  
##                     3rd Qu.:1802   3rd Qu.:1800   3rd Qu.:1814   3rd Qu.:1782  
##                     Max.   :1828   Max.   :1829   Max.   :1844   Max.   :1811  
##      Vol.             Change %        
##  Length:20          Length:20         
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 

Interpretation:

Upon inspecting the structure and summary of the datasets, it was observed that the closing price values are stored as characters with commas. Additionally, each dataset includes a date column that needs to be appropriately formatted. Understanding the initial structure highlights necessary cleaning steps for accurate analysis.

3 Initial Cleaning (Date Format & NA Removal)

clean_data <- function(data) {
  data$Date <- as.Date(data$Date, format = "%d-%m-%Y")
  data$Price <- as.numeric(gsub(",", "", data$Price))
  data <- data[!is.na(data$Price), ]
  return(data)
}

# Apply cleaning
reliance <- clean_data(reliance)
infosys  <- clean_data(infosys)
tcs      <- clean_data(tcs)
itc      <- clean_data(itc)
hdfc     <- clean_data(hdfc)

Interpretation:

The datasets were cleaned by converting the Date column into a proper date format and transforming the Price column into numeric values by removing commas. Any rows containing NA values were also removed. These cleaning operations ensure that the datasets are ready for numerical and time-series analysis without errors.

4. Descriptive Statistics (Mean, Median, SD, Variance)

calculate_stats <- function(data, company_name) {
  data.frame(
    Company = company_name,
    Mean = round(mean(data$Price), 2),
    Median = round(median(data$Price), 2),
    SD = round(sd(data$Price), 2),
    Variance = round(var(data$Price), 2)
  )
}

# Calculate for all companies
stats <- bind_rows(
  calculate_stats(reliance, "Reliance"),
  calculate_stats(infosys, "Infosys"),
  calculate_stats(tcs, "TCS"),
  calculate_stats(itc, "ITC"),
  calculate_stats(hdfc, "HDFC")
)

print(stats)
##    Company    Mean  Median     SD Variance
## 1 Reliance 1423.19 1455.03  90.43  8177.88
## 2  Infosys 1700.74 1675.81 186.67 34844.34
## 3      TCS 3553.03 3569.23  86.26  7440.82
## 4      ITC  434.48  439.85  31.68  1003.39
## 5     HDFC 1760.35 1768.30  49.95  2494.76

Interpretation:

Descriptive statistics such as mean, median, standard deviation, and variance were calculated for each stock’s closing prices. These metrics provide insights into the central tendency and variability of each stock. Understanding these statistics helps compare the performance and stability of different companies in 2024.

5. Identify Stock with Highest and Lowest Average Price

highest_avg <- stats[which.max(stats$Mean), ]
lowest_avg  <- stats[which.min(stats$Mean), ]

highest_avg
##   Company    Mean  Median    SD Variance
## 3     TCS 3553.03 3569.23 86.26  7440.82
lowest_avg
##   Company   Mean Median    SD Variance
## 4     ITC 434.48 439.85 31.68  1003.39

Interpretation:

From the analysis, the stock with the highest average closing price and the stock with the lowest average closing price in 2024 were identified. This comparison helps investors assess which companies are valued higher or lower relative to each other. It provides a basic understanding of stock pricing levels across the selected companies.

6. Calculate Daily Change in Closing Price

hdfc$Daily_Change <- c(NA, diff(hdfc$Price))
reliance$Daily_Change <- c(NA, diff(reliance$Price))
infosys$Daily_Change <- c(NA, diff(infosys$Price))
tcs$Daily_Change <- c(NA, diff(tcs$Price))
itc$Daily_Change <- c(NA, diff(itc$Price))

Interpretation:

Daily changes in closing prices were computed by taking the difference between consecutive days. This allows observation of how much the stock price fluctuates daily. Understanding daily movements is crucial for evaluating short-term volatility and identifying potential trading opportunities.

7. Compute and Compare Volatility


volatility <- data.frame(
  Company = c("HDFC", "Reliance", "Infosys", "TCS", "ITC"),
  Volatility = round(c(
    sd(na.omit(hdfc$Daily_Change)),
    sd(na.omit(reliance$Daily_Change)),
    sd(na.omit(infosys$Daily_Change)),
    sd(na.omit(tcs$Daily_Change)),
    sd(na.omit(itc$Daily_Change))
  ), 2)
)

print(volatility)

Interpretation:

Volatility was measured as the standard deviation of daily changes in closing prices. A higher volatility value indicates larger fluctuations in stock prices, while a lower value suggests more stability. Comparing volatility across companies helps classify stocks into risk categories for investors.

8. Positive vs Negative Return Days

trend_summary <- data.frame(
  Company = c("HDFC", "Reliance", "Infosys", "TCS", "ITC"),
  Positive_Days = c(
    sum(hdfc$Daily_Change > 0, na.rm = TRUE),
    sum(reliance$Daily_Change > 0, na.rm = TRUE),
    sum(infosys$Daily_Change > 0, na.rm = TRUE),
    sum(tcs$Daily_Change > 0, na.rm = TRUE),
    sum(itc$Daily_Change > 0, na.rm = TRUE)
  ),
  Negative_Days = c(
    sum(hdfc$Daily_Change < 0, na.rm = TRUE),
    sum(reliance$Daily_Change < 0, na.rm = TRUE),
    sum(infosys$Daily_Change < 0, na.rm = TRUE),
    sum(tcs$Daily_Change < 0, na.rm = TRUE),
    sum(itc$Daily_Change < 0, na.rm = TRUE)
  )
)

print(trend_summary)

Interpretation:

The number of days with positive and negative returns was counted for each company. This analysis reflects the general market sentiment and performance trend of the stocks. Stocks with more positive return days are considered to have better momentum during the year 2024.

9. Monthly Performance (Best & Worst Months)

get_monthly_returns <- function(data, company_name) {
  data %>%
    mutate(Month = format(as.Date(Date), "%b")) %>%
    arrange(Date) %>%
    mutate(Daily_Return = (Price - lag(Price)) / lag(Price)) %>%
    group_by(Month) %>%
    summarise(Avg_Monthly_Return = mean(Daily_Return, na.rm = TRUE)) %>%
    mutate(Company = company_name)
}

monthly_data <- bind_rows(
  get_monthly_returns(hdfc, "HDFC"),
  get_monthly_returns(reliance, "Reliance"),
  get_monthly_returns(infosys, "Infosys"),
  get_monthly_returns(tcs, "TCS"),
  get_monthly_returns(itc, "ITC")
)

print(monthly_data)

Interpretation:

Average monthly returns were calculated for each stock to evaluate performance trends across different months. Identifying the best and worst-performing months helps in understanding seasonal effects or market patterns. It also assists investors in planning entry or exit strategies based on historical trends.

10. Investment Recommendation Based on Analysis

Interpretation:

Based on volatility and positive return analysis, recommendations were made for long-term and short-term investors. Low-volatility stocks are advised for long-term stable investments. High-volatility stocks with frequent positive days are suitable for short-term trading or speculative strategies. The final suggestion aligns investment style with stock characteristics.

11 Average Closing Price (2024)

ggplot(stats, aes(x = Company, y = Mean, fill = Company)) +
  geom_col() +
  ggtitle("Average Closing Price per Company (2024)") +
  theme_minimal()

Interpretation:

The bar chart depicting average closing prices shows that Reliance had the highest average stock price in 2024, whereas ITC was the most affordable. This visualization offers a quick comparative view of stock value among the selected companies. Price levels provide a context for investment sizing and budgeting.

12 Positive vs Negative Return Days

# Calculate Positive and Negative Days for each company directly in this chunk
positive_days <- c(
  sum(hdfc$Daily_Change > 0, na.rm = TRUE),
  sum(reliance$Daily_Change > 0, na.rm = TRUE),
  sum(infosys$Daily_Change > 0, na.rm = TRUE),
  sum(tcs$Daily_Change > 0, na.rm = TRUE),
  sum(itc$Daily_Change > 0, na.rm = TRUE)
)

negative_days <- c(
  sum(hdfc$Daily_Change < 0, na.rm = TRUE),
  sum(reliance$Daily_Change < 0, na.rm = TRUE),
  sum(infosys$Daily_Change < 0, na.rm = TRUE),
  sum(tcs$Daily_Change < 0, na.rm = TRUE),
  sum(itc$Daily_Change < 0, na.rm = TRUE)
)

# Create a new data frame for positive and negative days
trend_summary <- data.frame(
  Company = c("HDFC", "Reliance", "Infosys", "TCS", "ITC"),
  Positive_Days = positive_days,
  Negative_Days = negative_days
)

# Pivot data for plotting
trend_long <- trend_summary %>%
  pivot_longer(cols = c(Positive_Days, Negative_Days),
               names_to = "Type", values_to = "Days")

# Plot the results
ggplot(trend_long, aes(x = Company, y = Days, fill = Type)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("Positive vs Negative Days (2024)") +
  theme_minimal()

Interpretation:

A grouped bar chart was created to compare the number of positive and negative trading days for each company. The visual comparison quickly highlights which stocks performed better on a daily basis. This is useful for assessing the consistency and resilience of stock performance throughout the year.

14 Pie Chart of Positive Days Share

positive_days <- trend_summary$Positive_Days
names(positive_days) <- trend_summary$Company

pie(positive_days, col = rainbow(length(positive_days)),
    main = "Share of Positive Return Days (2024)")

Interpretation:

The pie chart shows the share of total positive return days contributed by each company. Companies with larger slices had more frequent positive trading sessions. This visual representation offers a quick understanding of which stocks had better day-to-day consistency during 2024.

15 Distribution of Daily Change (Reliance)

ggplot(reliance, aes(x = Daily_Change)) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "white") +
  ggtitle("Distribution of Daily Change – Reliance") +
  theme_minimal()
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

# Interpretation The histogram shows that most daily price changes for Reliance are clustered around zero, meaning that on most days, the stock experiences small price movements. However, there are occasional large positive and negative shifts, indicating periods of higher volatility. Overall, Reliance exhibits moderate daily volatility, with the majority of trading days showing minimal price change and fewer days experiencing extreme fluctuations.