# Load necessary libraries
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(tidyr)
df <- read.csv("C:/Users/aiden/mergedfile.csv")

Group 1

Testable hypothesis

For Group 1, a potential hypothesis is that sectors with higher average market capitalization have lower volatility than sectors with lower average market capitalization. This could be tested by measuring the historical volatility of stocks within each sector and comparing it to the sector’s average market cap.
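
As a rough sketch of the volatility step, the code below computes each stock's volatility as the standard deviation of its daily log returns, then averages by sector. The `prices` data frame is toy data invented for illustration, and its `Symbol`, `Date`, and `Close` columns are assumptions that may not match the merged file.

```r
library(dplyr)

# Toy price history, invented for illustration only.
# Symbol, Date and Close are hypothetical column names.
prices <- data.frame(
  Symbol = rep(c("AAA", "BBB"), each = 4),
  Sector = rep(c("S1", "S2"), each = 4),
  Date   = rep(as.Date("2024-01-01") + 0:3, times = 2),
  Close  = c(100, 101, 100, 101,   # calm stock
             100, 110,  95, 112)   # jumpy stock
)

# Volatility per stock: standard deviation of daily log returns
stock_vol <- prices |>
  arrange(Symbol, Date) |>
  group_by(Sector, Symbol) |>
  summarise(volatility = sd(diff(log(Close))), .groups = "drop")

# Average volatility per sector
sector_vol <- stock_vol |>
  group_by(Sector) |>
  summarise(mean_volatility = mean(volatility))

sector_vol
```

A table like `sector_vol` could then be joined to the per-sector mean market cap by `Sector` and the two columns compared, for example with a rank correlation.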

Explain

Group 1 focuses on the relationship between sectors and the average market capitalization of the companies within each sector. By grouping the data on the Sector column and calculating the mean of the Marketcap variable for each group, this analysis identifies which sectors tend to have higher or lower average market capitalizations. The results are visualized using a bar plot, which makes it easy to compare the relative market sizes across sectors. This analysis provides insights into the size of companies in different sectors, highlighting potential disparities between industries.

# Grouping by Sector and summarizing the mean Marketcap
group_by_sector <- df |>
  group_by(Sector) |>
  summarise(mean_marketcap = mean(Marketcap, na.rm = TRUE))

# Visualizing: Marketcap by Sector
ggplot(group_by_sector, aes(x = Sector, y = mean_marketcap)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle("Average Market Cap by Sector")

Group 2

Testable hypothesis

For Group 2, a hypothesis could be that countries with higher average Revenuegrowth also have higher average Ebitda, indicating that companies in these countries are not only growing quickly but are also more profitable. This hypothesis can be tested by examining the correlation between revenue growth and profitability across countries.
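
One possible way to run that check is a Spearman rank correlation on the per-country averages, sketched below. The `toy` data frame is invented for illustration (only the column names mirror the ones used in this analysis), and Spearman is chosen here as an assumption because it is less sensitive to a few very large Ebitda values than Pearson.

```r
library(dplyr)

# Toy data, invented for illustration only; two companies per country.
toy <- data.frame(
  Country       = c("W", "W", "X", "X", "Y", "Y", "Z", "Z"),
  Revenuegrowth = c(0.02, 0.04, 0.08, 0.10, 0.15, 0.17, 0.20, 0.24),
  Ebitda        = c(10, 14, 20, 22, 30, 34, 33, 41)
)

# Same summary shape as the per-country table built in this section
country_summary <- toy |>
  group_by(Country) |>
  summarise(mean_revenuegrowth = mean(Revenuegrowth, na.rm = TRUE),
            mean_ebitda        = mean(Ebitda, na.rm = TRUE))

# Rank correlation between the two country-level averages;
# a positive rho is consistent with the hypothesis
country_test <- cor.test(country_summary$mean_revenuegrowth,
                         country_summary$mean_ebitda,
                         method = "spearman")

country_test$estimate
country_test$p.value
```

With only a handful of countries the test has little power, so the estimate is better read as descriptive than as strong evidence.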

Explain

Group 2 analyzes the performance of companies across different countries by summarizing two key financial metrics: Revenuegrowth and Ebitda. By grouping the data based on the Country column, the analysis calculates the average values for these metrics in each country. A boxplot is used to visualize the distribution of Revenuegrowth across countries, providing insights into the variability and central tendency of company growth in different regions. This analysis highlights which countries have higher or lower revenue growth and profitability (measured by Ebitda), offering a view of geographic trends in financial performance.

# Grouping by Country and summarizing the mean Revenuegrowth and Ebitda
group_by_country <- df |>
  group_by(Country) |>
  summarise(mean_revenuegrowth = mean(Revenuegrowth, na.rm = TRUE),
            mean_ebitda = mean(Ebitda, na.rm = TRUE))

# Visualizing: Boxplot for Revenue Growth by Country
ggplot(df, aes(x = Country, y = Revenuegrowth)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle("Revenue Growth by Country")

Group 3

Testable hypothesis

For Group 3, a hypothesis might be that certain stock exchanges specialize in specific industries, leading to higher concentrations of companies in those sectors. For example, the technology sector might be more concentrated in certain exchanges. This can be tested by analyzing the distribution of industries across exchanges and determining if there is a significant concentration of certain industries in particular exchanges.
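
A chi-squared test of independence on the Exchange by Industry contingency table is one way to check for such concentration. The sketch below uses toy data invented for illustration, and it assumes a simulated p-value is preferable because a real Exchange by Industry table is likely to contain many sparse cells.

```r
# Toy data, invented for illustration only: exchange E1 is dominated by
# industry I1 and exchange E2 by industry I2.
toy <- data.frame(
  Exchange = rep(c("E1", "E2"), times = c(22, 21)),
  Industry = c(rep("I1", 20), rep("I2", 2),
               rep("I1", 3),  rep("I2", 18))
)

# Contingency table of company counts per Exchange/Industry pair
tab <- table(toy$Exchange, toy$Industry)

# Monte Carlo p-value avoids relying on the large-sample approximation
set.seed(1)
conc_test <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

tab
conc_test$p.value
```

A small p-value suggests that industries are not spread evenly across exchanges; it does not by itself show which exchange specializes in which industry, which is where the heatmap helps.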

Explain

Group 3 examines the relationship between stock exchanges and industries by analyzing the Currentprice of companies based on the combination of the Exchange and Industry columns. By grouping the data according to these two categorical variables, the mean Currentprice for each combination is calculated. A heatmap is used to visualize the count of companies in each Exchange and Industry pair, illustrating where the most and least common combinations occur. This analysis helps identify trends in how industries are distributed across different stock exchanges and reveals which exchange-industry combinations have more or fewer companies, offering insights into market concentration and diversity.

# Grouping by Exchange and Industry, summarizing mean Currentprice
group_by_exchange_industry <- df |>
  group_by(Exchange, Industry) |>
  summarise(mean_currentprice = mean(Currentprice, na.rm = TRUE),
            .groups = "drop")

# Creating a pivot table to show the count of combinations
pivot <- df |>
  group_by(Exchange, Industry) |>
  summarise(count = n(), .groups = "drop") |>
  spread(Industry, count, fill = 0)

# Convert pivot table to long format for visualization
pivot_long <- pivot |>
  gather(Industry, count, -Exchange)

# Visualizing: Heatmap of Exchange vs. Industry
ggplot(pivot_long, aes(x = Exchange, y = Industry, fill = count)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "blue") +
  ggtitle("Exchange and Industry Combinations")

Final Thoughts

In this project, we explored various dimensions of financial and business data by grouping and summarizing key metrics. Group 1 revealed how sectors differ in terms of average market capitalization, offering insights into which industries dominate in size. Group 2 provided a geographic view of financial performance by analyzing revenue growth and profitability across countries, highlighting regional economic strengths and weaknesses. Group 3 examined the intersection of stock exchanges and industries, uncovering patterns in market distribution and concentration through the analysis of company prices and counts.

Overall, these groupings allow us to better understand the underlying factors that influence company performance and market dynamics across sectors, regions, and industries. The visualizations provide a clear and intuitive way to identify trends, outliers, and areas of interest. These insights can be useful for investors, analysts, and decision-makers looking to explore specific sectors or regions, assess market diversity, or identify potential growth opportunities in less-represented groups.

DataDiveWeek3
