FinalProject_u1510051
For the final project, I will do project 2. I will start by reading in the data from my directory.
Forbes <- read.csv("Forbes2000.csv")
Relationship of profits to other variables
I will find the relationship of profits to the other variables (company type, country, and sales) using graphs and tables.
Relationship of profits to company type
First I will load the ggpubr and dplyr packages.
library(ggpubr)
Warning: package 'ggpubr' was built under R version 4.4.3
Loading required package: ggplot2
Warning: package 'ggplot2' was built under R version 4.4.3
library(dplyr)
Warning: package 'dplyr' was built under R version 4.4.3
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
    filter, lag
The following objects are masked from 'package:base':
    intersect, setdiff, setequal, union
Now I will create a bar plot using ggpubr showing the average profit for each company type.
comp.type.avg <- Forbes |>
  group_by(category) |>
  summarize(avgprofit = mean(profits, na.rm = TRUE))
ggbarplot(comp.type.avg, x = "category", y = "avgprofit",
          fill = "category", x.text.angle = 90, legend = "none") +
  font("x.text", size = 7)
From this graph, I can see that conglomerates, drugs & biotechnology, and oil & gas operations are the company types with the highest average profits. Telecommunications services, trading companies, and capital goods have the lowest average profits.
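The highest and lowest categories can also be read from a sorted table instead of the plot. This is only a minimal sketch in base R: the `toy` data frame and its values are made up for illustration and stand in for the `Forbes` data.

```r
# Hypothetical toy data standing in for Forbes (values are invented).
toy <- data.frame(
  category = c("Banking", "Banking", "Retailing", "Retailing", "Media"),
  profits  = c(1, 3, 0.5, NA, 4)
)

# Average profit per category, ignoring missing values.
avg <- tapply(toy$profits, toy$category, mean, na.rm = TRUE)

# Sort so the most profitable category comes first.
avg.sorted <- sort(avg, decreasing = TRUE)
print(avg.sorted)
```

With the real data, `head(avg.sorted, 3)` and `tail(avg.sorted, 3)` would list the three highest and three lowest categories.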
Relationship between countries and profits
Now I will make a table with the average profits per country listed in descending order.
country.avg <- Forbes |>
  group_by(country) |>
  summarize(avgprofit = mean(profits, na.rm = TRUE)) |>
  arrange(desc(avgprofit))
country.avg
# A tibble: 61 × 2
   country                     avgprofit
   <chr>                           <dbl>
 1 Netherlands/ United Kingdom     5.32 
 2 United Kingdom/ Australia       1.64 
 3 Russia                          1.24 
 4 Kong/China                      1.19 
 5 Panama/ United Kingdom          1.18 
 6 Australia/ United Kingdom       1.18 
 7 Islands                         0.74 
 8 United States                   0.652
 9 Finland                         0.641
10 Norway                          0.631
# ℹ 51 more rows
Because some of these companies are listed under more than one country, it is hard to say exactly which countries generate the most profit. Among the multi-country entries, Netherlands/United Kingdom (5.32) and United Kingdom/Australia (1.64) have the highest average profits. Among the single-country entries, Russia (1.24), Islands (0.74), and the United States (0.652) have the highest average profits.
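One way to handle the multi-country labels is to rank the two groups separately. The following is a rough sketch in base R, assuming that a slash in the `country` label marks a multi-country entry. The `toy` data frame and its values are invented for illustration.

```r
# Hypothetical toy data standing in for Forbes (values are invented).
toy <- data.frame(
  country = c("Russia", "Russia", "Netherlands/ United Kingdom",
              "United States", "United States", "Australia/ United Kingdom"),
  profits = c(1.0, 1.4, 5.3, 0.5, NA, 1.2)
)

# Flag rows whose country label contains a slash (assumed to mean
# the company is listed under more than one country).
toy$multi <- grepl("/", toy$country, fixed = TRUE)

# Average profit per country within each group; the formula interface
# of aggregate() drops rows with missing profits by default.
single.avg <- aggregate(profits ~ country, data = toy[!toy$multi, ], FUN = mean)
multi.avg  <- aggregate(profits ~ country, data = toy[toy$multi, ],  FUN = mean)

# Sort each table in descending order of average profit.
single.avg <- single.avg[order(-single.avg$profits), ]
multi.avg  <- multi.avg[order(-multi.avg$profits), ]
print(single.avg)
print(multi.avg)
```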
Relationship between countries and sales
Here I will make a table with the average sales per country listed in descending order.
country.avg.s <- Forbes |>
  group_by(country) |>
  summarize(avgsales = mean(sales, na.rm = TRUE)) |>
  arrange(desc(avgsales))
country.avg.s
# A tibble: 61 × 2
   country                     avgsales
   <chr>                          <dbl>
 1 Netherlands/ United Kingdom     92.1
 2 Germany                         20.8
 3 France                          20.1
 4 Netherlands                     17.0
 5 Korea                           15.0
 6 Luxembourg                      14.2
 7 Switzerland                     12.5
 8 Australia/ United Kingdom       11.6
 9 Norway                          10.8
10 United Kingdom                  10.4
# ℹ 51 more rows
This result has the same issue as the one comparing countries and profits. Among the multi-country entries, Netherlands/United Kingdom (92.1) and Australia/United Kingdom (11.6) have the highest average sales. Among the single-country entries, Germany (20.8), France (20.1), and the Netherlands (17.0) have the highest average sales.
Comparing the USA and Japan
Highest Forbes Rank
To find the country with the highest Forbes ranking, I will find the average ranking of all US companies and the average ranking of all Japanese companies.
# na.rm is an argument of mean(), not filter(), so it is passed to mean() here
US.avg <- filter(Forbes, country == "United States")
mean(US.avg$rank, na.rm = TRUE)
[1] 947.2756
Japan.avg <- filter(Forbes, country == "Japan")
mean(Japan.avg$rank, na.rm = TRUE)
[1] 1144.329
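US-based companies have a lower average rank number (about 947 versus about 1144). Since rank 1 is the top of the list, this means US-based companies have a higher average Forbes ranking than Japan-based companies.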
Most common company type in the US
I will find the most common company type on the Forbes list for US-based companies by filtering for US companies and making a table of how many companies are in each category.
comp.type.US <- filter(Forbes, country == "United States")
table(comp.type.US$category)

             Aerospace & defense                          Banking
                              10                               83
    Business services & supplies                    Capital goods
                              31                               10
                       Chemicals                    Conglomerates
                              13                               11
                    Construction                Consumer durables
                              18                               25
          Diversified financials            Drugs & biotechnology
                              60                               21
            Food drink & tobacco                     Food markets
                              28                                9
Health care equipment & services     Hotels restaurants & leisure
                              53                               17
   Household & personal products                        Insurance
                              20                               46
                       Materials                            Media
                              26                               28
            Oil & gas operations                        Retailing
                              32                               53
                  Semiconductors              Software & services
                              16                               21
 Technology hardware & equipment      Telecommunications services
                              33                               16
                  Transportation                        Utilities
                              17                               54
The most common company types are Banking (83 companies), Diversified Financials (60), and Utilities (54).
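The largest counts are easier to pick out when the table is sorted. A small sketch in base R follows; the `categories` vector is a made-up stand-in for `comp.type.US$category`.

```r
# Hypothetical category vector (values are invented for illustration).
categories <- c("Banking", "Utilities", "Banking", "Retailing",
                "Banking", "Utilities")

# Count each category, then sort so the most common comes first.
counts <- sort(table(categories), decreasing = TRUE)

# Keep only the two most common categories.
top2 <- head(counts, 2)
print(top2)
```

The same two calls applied to the real column, with `head(counts, 3)`, would list the three most common company types directly.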
Most common company type in Japan
I will find the most common company type on the Forbes list in Japan using the same methods.
comp.type.Japan <- filter(Forbes, country == "Japan")
table(comp.type.Japan$category)

                         Banking     Business services & supplies
                              69                               17
                   Capital goods                        Chemicals
                              19                               14
                    Construction                Consumer durables
                              18                               22
          Diversified financials            Drugs & biotechnology
                              24                                9
            Food drink & tobacco                     Food markets
                              10                                2
Health care equipment & services    Household & personal products
                               4                                7
                       Insurance                        Materials
                               9                               12
                           Media             Oil & gas operations
                               7                                3
                       Retailing                   Semiconductors
                              12                                3
             Software & services  Technology hardware & equipment
                               3                                6
     Telecommunications services                Trading companies
                               2                               13
                  Transportation                        Utilities
                              20                               11
The most common company types are Banking (69 companies), Diversified Financials (24), and Consumer Durables (22).
Multiple Linear Regression Model
Building the model
I will build a model to find profits as a function of assets, sales, and market value.
profit.est <- lm(profits ~ assets + marketvalue + sales, data = Forbes)
Find coefficients and goodness-of-fit
I will use the summary function to get more information about the model.
summary(profit.est)

Call:
lm(formula = profits ~ assets + marketvalue + sales, data = Forbes)

Residuals:
     Min       1Q   Median       3Q      Max
-29.2169  -0.0189   0.1160   0.2107   8.9495

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1186259  0.0380039  -3.121  0.00183 **
assets      -0.0008395  0.0003781  -2.220  0.02651 *
marketvalue  0.0363340  0.0018183  19.982  < 2e-16 ***
sales        0.0098892  0.0024331   4.064    5e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.472 on 1991 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.3059,    Adjusted R-squared:  0.3049
F-statistic: 292.5 on 3 and 1991 DF,  p-value: < 2.2e-16
The coefficients show that all three predictors are significant at the 5% level, but to different degrees. Market value (p < 2e-16) and sales (p = 5e-05) have very small p-values, while assets has a larger p-value (0.0265), so the evidence for its effect is weaker. The overall fit is only moderate. The R-squared value of 0.306 means the model explains about 31% of the variation in profits. The F-statistic is high (292.5) with a low p-value, but this only shows that the model does better than a model with no predictors, not that it fits the data closely.
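The numbers discussed above can also be pulled out of the summary object instead of being read from the printout. This is a sketch on simulated data: the `sim` data frame and the coefficients used to generate it are assumptions made for illustration, not the Forbes data.

```r
# Simulated stand-in for the Forbes data (all values are invented).
set.seed(1)
n <- 200
sim <- data.frame(
  assets      = runif(n, 0, 100),
  marketvalue = runif(n, 0, 50),
  sales       = runif(n, 0, 30)
)
sim$profits <- 0.04 * sim$marketvalue + 0.01 * sim$sales + rnorm(n)

fit <- lm(profits ~ assets + marketvalue + sales, data = sim)
s <- summary(fit)

# Coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|).
coef.table <- coef(s)

# Goodness-of-fit measures.
r2     <- s$r.squared
adj.r2 <- s$adj.r.squared

# Names of the terms that are significant at the 5% level.
p.values    <- coef.table[, "Pr(>|t|)"]
significant <- names(p.values)[p.values < 0.05]
print(round(c(r2 = r2, adj.r2 = adj.r2), 3))
print(significant)
```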
Find variable of effect
Now I will use the anova() function to find the variable that affects the profits the most.
anova(profit.est)
Analysis of Variance Table

Response: profits
              Df Sum Sq Mean Sq F value    Pr(>F)
assets         1  312.8  312.84  144.40 < 2.2e-16 ***
marketvalue    1 1552.8 1552.77  716.71 < 2.2e-16 ***
sales          1   35.8   35.79   16.52 5.001e-05 ***
Residuals   1991 4313.6    2.17
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
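The table shows that market value accounts for the most variation in profits: it has the largest sum of squares (1552.8) and the largest F-value (716.71), with a very small p-value. Note that anova() tests the terms sequentially, in the order they appear in the formula, so these values depend on that order.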
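Because the anova() table is sequential, the sum of squares credited to a term can change when the formula order changes. The sketch below illustrates this on simulated data with two correlated predictors; the names `x1`, `x2`, and `y` are hypothetical. It also uses drop1(), which tests each term given all the others and so does not depend on the order.

```r
# Simulated data with two correlated predictors (values are invented).
set.seed(2)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
y  <- x1 + x2 + rnorm(n)
d  <- data.frame(y, x1, x2)

# Same model, two different term orders.
fit.a <- lm(y ~ x1 + x2, data = d)
fit.b <- lm(y ~ x2 + x1, data = d)

# Sequential sum of squares for x1: entered first vs. entered last.
ss.a <- anova(fit.a)["x1", "Sum Sq"]
ss.b <- anova(fit.b)["x1", "Sum Sq"]
print(c(x1.first = ss.a, x1.last = ss.b))

# drop1() tests each term with all other terms already in the model,
# so the result is the same for both orderings.
d1.a <- drop1(fit.a, test = "F")
d1.b <- drop1(fit.b, test = "F")
print(d1.a)
```

Applied to the profit model, `drop1(profit.est, test = "F")` would give an order-independent comparison of assets, market value, and sales.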
Plot residuals
Now I will plot residuals to make sure there is no bias.
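hist(residuals(profit.est))
The residuals are relatively symmetric around 0, which suggests the model is not systematically biased. The summary output does show at least one large negative residual (a minimum of about -29.2), so a few companies are predicted poorly.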
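A histogram can hide a few extreme residuals, so a numeric check is a useful complement. This is a minimal sketch on simulated data with one planted outlier. The data, the variable names, and the cutoff of 3 standardized units are assumptions for illustration.

```r
# Simulated data with one planted outlier (values are invented).
set.seed(3)
n <- 100
sim <- data.frame(x = runif(n, 0, 10))
sim$y <- 2 + 0.5 * sim$x + rnorm(n)
sim$y[1] <- sim$y[1] - 15          # force a large negative residual

fit <- lm(y ~ x, data = sim)
res <- residuals(fit)

# Five-number summary of the residuals (min, quartiles, max).
print(quantile(res))

# Standardized residuals; flag observations beyond 3 in absolute value.
std.res  <- rstandard(fit)
outliers <- which(abs(std.res) > 3)
print(outliers)
```

For a visual check, `plot(fit, which = 1)` draws residuals against fitted values.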
Comparing Japanese and American companies
Model for Japan
I will use the same methods as I did before to build a model for Japan and run the anova() function on it. I will also make a plot of residuals to check for bias.
Forbes.Japan <- filter(Forbes, country == "Japan")
profit.est.Japan <- lm(profits ~ assets + marketvalue + sales, data = Forbes.Japan)
anova(profit.est.Japan)
Analysis of Variance Table

Response: profits
             Df  Sum Sq Mean Sq F value Pr(>F)
assets        1 284.846 284.846 433.311 <2e-16 ***
marketvalue   1 162.574 162.574 247.310 <2e-16 ***
sales         1   0.024   0.024   0.036 0.8497
Residuals   312 205.100   0.657
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
hist(residuals(profit.est.Japan))
Assets seem to be the most important variable for Japan because this term has the highest F-value (433.3) and a very small p-value. Sales does not add significant explanatory power once assets and market value are in the model, since its p-value is high (0.85). The histogram of residuals is roughly symmetric around 0, which suggests the model is not systematically biased.
Model for US
I will use the same method that I used for Japan to find the most important variable for the US and graph its residuals to look for bias.
Forbes.US <- filter(Forbes, country == "United States")
profit.est.US <- lm(profits ~ assets + marketvalue + sales, data = Forbes.US)
anova(profit.est.US)
Analysis of Variance Table

Response: profits
             Df Sum Sq Mean Sq  F value    Pr(>F)
assets        1 975.23  975.23 1479.783 < 2.2e-16 ***
marketvalue   1 895.96  895.96 1359.498 < 2.2e-16 ***
sales         1  30.41   30.41   46.142 2.253e-11 ***
Residuals   744 490.32    0.66
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
hist(residuals(profit.est.US))
Assets seem to be the most important variable for the US as well, though its F-value (1479.8) is similar to the F-value for market value (1359.5), and both p-values are below 2.2e-16. The main difference between the US and Japan is that sales is a significant variable for the US (p = 2.253e-11) but not for Japan (p = 0.85). The histogram of US residuals is roughly symmetric around 0, which suggests the model is not systematically biased.
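The two country models can also be fitted in one step and their F-values placed side by side. The following is only a sketch in base R; the `toy` data frame and the coefficients used to simulate it are invented, so the numbers it prints say nothing about the real Forbes data.

```r
# Simulated stand-in for the Forbes data (all values are invented).
set.seed(4)
toy <- data.frame(
  country     = rep(c("Japan", "United States"), each = 50),
  assets      = runif(100, 0, 100),
  marketvalue = runif(100, 0, 50),
  sales       = runif(100, 0, 30)
)
toy$profits <- 0.02 * toy$assets + 0.03 * toy$marketvalue + rnorm(100)

# Fit the same model separately for each country.
fits <- lapply(split(toy, toy$country), function(d) {
  lm(profits ~ assets + marketvalue + sales, data = d)
})

# Collect the sequential F-values into one matrix:
# one row per term, one column per country.
terms    <- c("assets", "marketvalue", "sales")
f.values <- sapply(fits, function(m) anova(m)[terms, "F value"])
rownames(f.values) <- terms
print(round(f.values, 2))
```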