FinalProject_u1510051

Author

Chloe Horn

For the final project, I will do Project 2. I will start by reading in the data from my working directory.

Forbes <- read.csv("Forbes2000.csv")

Relationship of profits to other variables

I will find the relationship of profits to the other variables (company type, country, and sales) using graphs and tables.

Relationship of profits to company type

First I will load the ggpubr and dplyr packages.

library(ggpubr)

Warning: package 'ggpubr' was built under R version 4.4.3

Loading required package: ggplot2

Warning: package 'ggplot2' was built under R version 4.4.3

library(dplyr)

Warning: package 'dplyr' was built under R version 4.4.3


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Now I will create a bar plot using ggpubr showing the average profit for each company type.

comp.type.avg <- Forbes|>
  group_by(category)|>
  summarize(avgprofit = mean(profits, na.rm = TRUE))
ggbarplot(comp.type.avg, x = "category", y = "avgprofit",
          fill = "category", x.text.angle = 90, legend = "none") + 
  font("x.text", size = 7)

From this graph, I can see that conglomerates, drugs & biotechnology, and oil & gas operations are the company types with the highest average profits. Telecommunications services, trading companies, and capital goods have the lowest average profits.

Relationship between countries and profits

Now I will make a table with the average profits per country listed in descending order.

country.avg <- Forbes|>
  group_by(country)|>
  summarize(avgprofit = mean(profits, na.rm = TRUE)) |>
  arrange(desc(avgprofit))
country.avg

# A tibble: 61 × 2
   country                     avgprofit
   <chr>                           <dbl>
 1 Netherlands/ United Kingdom     5.32 
 2 United Kingdom/ Australia       1.64 
 3 Russia                          1.24 
 4 Kong/China                      1.19 
 5 Panama/ United Kingdom          1.18 
 6 Australia/ United Kingdom       1.18 
 7 Islands                         0.74 
 8 United States                   0.652
 9 Finland                         0.641
10 Norway                          0.631
# ℹ 51 more rows

Since some of these companies are listed under more than one country, it is hard to get an exact answer for which countries generate the highest average profits. Among the multi-country entries, Netherlands/United Kingdom and United Kingdom/Australia have the highest average profits. Among the single-country entries, Russia, the entry labeled "Islands" (the label appears to be truncated in the data), and the United States have the highest average profits.
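One way to handle the multi-country labels is to flag them and summarize the two groups separately. The following is only a sketch in base R on a small made-up data frame: `toy`, `multi`, `single.avg`, and `multi.avg` are hypothetical names, and the profit values are invented for illustration, not taken from the Forbes data.

```r
# Toy data standing in for the Forbes table (values are made up for illustration).
toy <- data.frame(
  country = c("Russia", "Netherlands/ United Kingdom", "United States",
              "United States", "United Kingdom/ Australia", "Russia"),
  profits = c(1.0, 5.0, 0.5, 0.7, 1.6, 1.4)
)

# A "/" in the label marks a company listed under more than one country.
toy$multi <- grepl("/", toy$country, fixed = TRUE)

# Average profit per country label, computed separately for each group.
single.avg <- aggregate(profits ~ country, data = toy[!toy$multi, ], FUN = mean)
multi.avg  <- aggregate(profits ~ country, data = toy[toy$multi, ],  FUN = mean)

# Sort each table in descending order of average profit.
single.avg <- single.avg[order(-single.avg$profits), ]
multi.avg  <- multi.avg[order(-multi.avg$profits), ]

single.avg
multi.avg
```

Keeping the two groups apart avoids ranking a single company's profit against the average of hundreds of companies.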

Relationship between countries and sales

Here I will make a table with the average sales per country listed in descending order.

country.avg.s <- Forbes|>
  group_by(country)|>
  summarize(avgsales = mean(sales, na.rm = TRUE)) |>
  arrange(desc(avgsales))
country.avg.s

# A tibble: 61 × 2
   country                     avgsales
   <chr>                          <dbl>
 1 Netherlands/ United Kingdom     92.1
 2 Germany                         20.8
 3 France                          20.1
 4 Netherlands                     17.0
 5 Korea                           15.0
 6 Luxembourg                      14.2
 7 Switzerland                     12.5
 8 Australia/ United Kingdom       11.6
 9 Norway                          10.8
10 United Kingdom                  10.4
# ℹ 51 more rows

This result has the same issue as the one comparing countries and profits. Among the multi-country entries, Netherlands/United Kingdom and Australia/United Kingdom have the highest average sales, and among the single-country entries, Germany, France, and the Netherlands have the highest average sales.

Comparing the USA and Japan

Highest Forbes Rank

To find which country ranks better on the Forbes list, I will find the average rank of all US companies and the average rank of all Japanese companies.

# filter() has no na.rm argument; rows are selected by the country condition alone.
US.avg <- filter(Forbes, country == "United States")
mean(US.avg$rank)

[1] 947.2756

# filter() has no na.rm argument; rows are selected by the country condition alone.
Japan.avg <- filter(Forbes, country == "Japan")
mean(Japan.avg$rank)

[1] 1144.329

Since rank 1 is the top of the list, a lower number means a better ranking. US-based companies have a better average Forbes rank (about 947) than Japan-based companies (about 1144).
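The two averages can also be computed in one step, together with the median, which is less sensitive to a few very high or very low ranks. This is a sketch in base R on made-up ranks; `toy`, `two`, `mean.rank`, and `median.rank` are hypothetical names.

```r
# Toy ranks (made up for illustration); a lower rank is better.
toy <- data.frame(
  country = c("United States", "Japan", "United States", "Japan", "Germany"),
  rank    = c(10, 400, 900, 1500, 50)
)

# Keep only the two countries being compared.
two <- toy[toy$country %in% c("United States", "Japan"), ]

# Mean and median rank per country.
mean.rank   <- tapply(two$rank, two$country, mean)
median.rank <- tapply(two$rank, two$country, median)
mean.rank
median.rank

# Country with the better (numerically lowest) average rank.
names(which.min(mean.rank))
```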

Most common company type in the US

I will find the most common company type on the Forbes list for US-based companies by filtering for US companies and making a table of how many companies fall in each category.

comp.type.US <- filter(Forbes, country == "United States")
table(comp.type.US$category)


             Aerospace & defense                          Banking 
                              10                               83 
    Business services & supplies                    Capital goods 
                              31                               10 
                       Chemicals                    Conglomerates 
                              13                               11 
                    Construction                Consumer durables 
                              18                               25 
          Diversified financials            Drugs & biotechnology 
                              60                               21 
            Food drink & tobacco                     Food markets 
                              28                                9 
Health care equipment & services     Hotels restaurants & leisure 
                              53                               17 
   Household & personal products                        Insurance 
                              20                               46 
                       Materials                            Media 
                              26                               28 
            Oil & gas operations                        Retailing 
                              32                               53 
                  Semiconductors              Software & services 
                              16                               21 
 Technology hardware & equipment      Telecommunications services 
                              33                               16 
                  Transportation                        Utilities 
                              17                               54

The most common company types in the US are Banking (83 companies), Diversified financials (60), and Utilities (54).
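Reading the largest counts off a wide table is error-prone, so it can help to sort the table first. A minimal sketch in base R, using a made-up vector of categories (`toy.category` and `counts` are hypothetical names):

```r
# Toy category labels (made up for illustration).
toy.category <- c("Banking", "Utilities", "Banking", "Retailing",
                  "Banking", "Utilities", "Insurance")

# Count each category and sort from most to least common.
counts <- sort(table(toy.category), decreasing = TRUE)

# The three most common categories.
head(counts, 3)
```

The same two calls should work on a column such as `comp.type.US$category`.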

Most common company type in Japan

I will find the most common company type on the Forbes list in Japan using the same methods.

comp.type.Japan <- filter(Forbes, country == "Japan")
table(comp.type.Japan$category)


                         Banking     Business services & supplies 
                              69                               17 
                   Capital goods                        Chemicals 
                              19                               14 
                    Construction                Consumer durables 
                              18                               22 
          Diversified financials            Drugs & biotechnology 
                              24                                9 
            Food drink & tobacco                     Food markets 
                              10                                2 
Health care equipment & services    Household & personal products 
                               4                                7 
                       Insurance                        Materials 
                               9                               12 
                           Media             Oil & gas operations 
                               7                                3 
                       Retailing                   Semiconductors 
                              12                                3 
             Software & services  Technology hardware & equipment 
                               3                                6 
     Telecommunications services                Trading companies 
                               2                               13 
                  Transportation                        Utilities 
                              20                               11

The most common company types in Japan are Banking (69 companies), Diversified financials (24), and Consumer durables (22).

Multiple Linear Regression Model

Building the model

I will build a model to find profits as a function of assets, sales, and market value.

profit.est <- lm(profits ~ assets + marketvalue + sales, data = Forbes)

Find coefficients and goodness-of-fit

I will use the summary function to get more information about the model.

summary(profit.est)


Call:
lm(formula = profits ~ assets + marketvalue + sales, data = Forbes)

Residuals:
     Min       1Q   Median       3Q      Max 
-29.2169  -0.0189   0.1160   0.2107   8.9495 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.1186259  0.0380039  -3.121  0.00183 ** 
assets      -0.0008395  0.0003781  -2.220  0.02651 *  
marketvalue  0.0363340  0.0018183  19.982  < 2e-16 ***
sales        0.0098892  0.0024331   4.064    5e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.472 on 1991 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.3059,    Adjusted R-squared:  0.3049 
F-statistic: 292.5 on 3 and 1991 DF,  p-value: < 2.2e-16

All three predictors are statistically significant at the 5% level. Market value (p < 2e-16) and sales (p = 5e-05) are highly significant, while assets is only weakly significant (p = 0.027) and has a small negative coefficient. The R-squared value of 0.306 means the model explains only about 31% of the variation in profits, so the fit is modest. The F-statistic (292.5, p < 2.2e-16) shows that the three predictors together explain significantly more variation than a model with no predictors, but a significant F-statistic alone does not mean the model fits the data well.
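To make the R-squared value concrete, the sketch below recomputes it by hand as 1 minus the residual sum of squares over the total sum of squares, and checks it against `summary()`. It uses simulated data, not the Forbes data; `toy`, `fit`, `rss`, `tss`, `r2.manual`, and `r2.summary` are hypothetical names.

```r
# Simulated data (for illustration only; not the Forbes data).
set.seed(1)
n <- 200
toy <- data.frame(marketvalue = runif(n, 0, 50), sales = runif(n, 0, 30))
toy$profits <- 0.04 * toy$marketvalue + 0.01 * toy$sales + rnorm(n, sd = 1.5)

fit <- lm(profits ~ marketvalue + sales, data = toy)

# R-squared by hand: share of total variation not left in the residuals.
rss <- sum(residuals(fit)^2)
tss <- sum((toy$profits - mean(toy$profits))^2)
r2.manual <- 1 - rss / tss

# R-squared as reported by summary().
r2.summary <- summary(fit)$r.squared
all.equal(r2.manual, r2.summary)

# Coefficient table: estimates, standard errors, t values, p-values.
coef(summary(fit))
```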

Find variable of effect

Now I will use the anova() function to find the variable that affects profits the most.

anova(profit.est)

Analysis of Variance Table

Response: profits
              Df Sum Sq Mean Sq F value    Pr(>F)    
assets         1  312.8  312.84  144.40 < 2.2e-16 ***
marketvalue    1 1552.8 1552.77  716.71 < 2.2e-16 ***
sales          1   35.8   35.79   16.52 5.001e-05 ***
Residuals   1991 4313.6    2.17                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

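The ANOVA table suggests that market value accounts for the most variation in profits, because it has the largest sum of squares and F-value, with a very small p-value. Note that anova() reports sequential sums of squares, so each value depends on the order in which the variables enter the model.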
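Because `anova()` on an `lm` fit adds terms one at a time, the sum of squares credited to a variable can change when the formula order changes. The sketch below illustrates this on simulated data with two correlated predictors, and then uses `drop1()`, which tests each term given all the others and so does not depend on order. The data and the names `toy`, `fit.a`, `fit.b`, `ss.a`, and `ss.b` are assumptions made for illustration.

```r
# Simulated data (for illustration only): marketvalue is correlated with assets,
# and profits depends on marketvalue alone.
set.seed(2)
n <- 300
assets      <- runif(n, 0, 100)
marketvalue <- 0.5 * assets + rnorm(n, sd = 10)
profits     <- 0.04 * marketvalue + rnorm(n)
toy <- data.frame(profits, assets, marketvalue)

# Same model, two different term orders.
fit.a <- lm(profits ~ assets + marketvalue, data = toy)
fit.b <- lm(profits ~ marketvalue + assets, data = toy)

# Sum of squares credited to assets when it enters first versus last.
ss.a <- anova(fit.a)["assets", "Sum Sq"]
ss.b <- anova(fit.b)["assets", "Sum Sq"]
c(assets.first = ss.a, assets.last = ss.b)

# Order-independent check: F-test for dropping each term from the full model.
drop1(fit.a, test = "F")
```

When a variable entered first shares variation with later variables, it is credited with that shared variation, which is why the first row of a sequential table can look more important than it is.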

Plot residuals

Now I will plot residuals to make sure there is no bias.

hist(residuals(profit.est))

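The residuals are centered near 0 and look roughly symmetric in the histogram, which suggests there is little systematic bias. However, the residual summary above (minimum -29.2, maximum 8.9) shows that a few companies have very large negative residuals.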
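A histogram shows the overall shape of the residuals but can hide patterns and outliers. A residuals-versus-fitted plot and a normal Q-Q plot are common companions to it. The following is a sketch on simulated data in base R; `toy`, `fit`, and `res` are hypothetical names.

```r
# Simulated data (for illustration only; not the Forbes data).
set.seed(3)
n <- 200
toy <- data.frame(x = runif(n, 0, 10))
toy$y <- 1 + 0.5 * toy$x + rnorm(n)

fit <- lm(y ~ x, data = toy)
res <- residuals(fit)

# Numeric summary: extreme minimum or maximum values point to outliers.
summary(res)

# Residuals against fitted values: look for curvature or a funnel shape.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line indicate heavy tails.
qqnorm(res)
qqline(res)
```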

Comparing Japanese and American companies

Model for Japan

I will use the same methods as I did before to build a model for Japan and run the anova() function on it. I will also make a plot of residuals to check for bias.

Forbes.Japan <- filter(Forbes, country == "Japan")
profit.est.Japan <- lm(profits ~ assets + marketvalue + sales, data = Forbes.Japan)
anova(profit.est.Japan)

Analysis of Variance Table

Response: profits
             Df  Sum Sq Mean Sq F value Pr(>F)    
assets        1 284.846 284.846 433.311 <2e-16 ***
marketvalue   1 162.574 162.574 247.310 <2e-16 ***
sales         1   0.024   0.024   0.036 0.8497    
Residuals   312 205.100   0.657                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

hist(residuals(profit.est.Japan))

Assets appears to be the most important variable for Japan because it has the highest F-value (433.3) and a very small p-value. Sales is not significant once assets and market value are in the model, since its p-value is high (0.85). The histogram of residuals is roughly symmetric around 0, which suggests the model is not noticeably biased.

Model for US

I will use the same method that I used for Japan to find the most important variable for the US and graph its residuals to look for bias.

Forbes.US <- filter(Forbes, country == "United States")
profit.est.US <- lm(profits ~ assets + marketvalue + sales, data = Forbes.US)
anova(profit.est.US)

Analysis of Variance Table

Response: profits
             Df Sum Sq Mean Sq  F value    Pr(>F)    
assets        1 975.23  975.23 1479.783 < 2.2e-16 ***
marketvalue   1 895.96  895.96 1359.498 < 2.2e-16 ***
sales         1  30.41   30.41   46.142 2.253e-11 ***
Residuals   744 490.32    0.66                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

hist(residuals(profit.est.US))

Assets appears to be the most important variable for the US as well, though its F-value (1479.8) is close to the F-value for market value (1359.5), and both p-values are below 2.2e-16. The main difference between the US and Japan is that sales is a significant variable for the US (p = 2.3e-11) but not for Japan. The histogram of US residuals is roughly symmetric around 0, which suggests the model is not noticeably biased.
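Since the Japan and US models use the same formula, the per-country fits can also be produced in one pass by splitting the data on country. This is a sketch on simulated data in base R; `toy` and `fits` are hypothetical names, and the simulated coefficients are arbitrary.

```r
# Simulated data (for illustration only; not the Forbes data).
set.seed(4)
n <- 120
toy <- data.frame(
  country     = rep(c("Japan", "United States"), each = n / 2),
  assets      = runif(n, 0, 100),
  marketvalue = runif(n, 0, 50),
  sales       = runif(n, 0, 30)
)
toy$profits <- 0.01 * toy$assets + 0.03 * toy$marketvalue + rnorm(n, sd = 0.8)

# Fit the same model separately for each country.
fits <- lapply(split(toy, toy$country), function(d) {
  lm(profits ~ assets + marketvalue + sales, data = d)
})

# One ANOVA table per country, in the same layout as above.
lapply(fits, anova)
```

Fitting both models from a single formula keeps the two analyses consistent if the formula changes later.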