Project 2

Author

u1535008

Project 2: Market Prices

Reading in the csv file

forb <- read.csv("Forbes2000.csv")

Question 1

Using this file, explore the relationship of profits to the other variables. Which company types generate the highest (and lowest) profits? Which countries generate the most profits? What about sales?

To look at high and low profits and sales, I use the average and the total for each of these categories.

head(forb)
  X rank                name        country             category  sales profits
1 1    1           Citigroup  United States              Banking  94.71   17.85
2 2    2    General Electric  United States        Conglomerates 134.19   15.59
3 3    3 American Intl Group  United States            Insurance  76.66    6.46
4 4    4          ExxonMobil  United States Oil & gas operations 222.88   20.96
5 5    5                  BP United Kingdom Oil & gas operations 232.57   10.27
6 6    6     Bank of America  United States              Banking  49.01   10.81
   assets marketvalue
1 1264.03      255.30
2  626.93      328.54
3  647.66      194.87
4  166.99      277.02
5  177.57      173.54
6  736.45      117.55
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Looking at profits

First, I looked at the average profit by company type (using the group_by() function) and then arranged it first in descending order, then ascending order to get the max and min. I did the same afterwards for total profit for company type. After this, I use the same process for country, but I group the data by country instead of company type.

forb |>
  group_by(category) |>
  summarise(
    avgprofit = mean(profits)
  ) |>
  arrange(desc(avgprofit))
# A tibble: 27 × 2
   category                      avgprofit
   <chr>                             <dbl>
 1 Drugs & biotechnology             1.45 
 2 Oil & gas operations              1.31 
 3 Conglomerates                     1.01 
 4 Food drink & tobacco              0.594
 5 Software & services               0.568
 6 Consumer durables                 0.566
 7 Household & personal products     0.550
 8 Diversified financials            0.500
 9 Retailing                         0.476
10 Semiconductors                    0.437
# ℹ 17 more rows
forb |>
  group_by(category) |>
  summarise(
    avgprofit = mean(profits)
  ) |>
  arrange(avgprofit)
# A tibble: 27 × 2
   category                        avgprofit
   <chr>                               <dbl>
 1 Trading companies                  0.028 
 2 Capital goods                      0.0955
 3 Business services & supplies       0.171 
 4 Materials                          0.196 
 5 Construction                       0.198 
 6 Technology hardware & equipment    0.206 
 7 Media                              0.211 
 8 Utilities                          0.211 
 9 Food markets                       0.249 
10 Hotels restaurants & leisure       0.259 
# ℹ 17 more rows

On average, the drugs and biotechnology company type makes the most profit (1.45), while trading companies make the least profit on average (0.028).

forb |>
  group_by(category) |>
  summarise(
    totalprofit = sum(profits)
  ) |>
  arrange(desc(totalprofit))
# A tibble: 27 × 2
   category                         totalprofit
   <chr>                                  <dbl>
 1 Banking                                132. 
 2 Oil & gas operations                   118. 
 3 Diversified financials                  78.9
 4 Drugs & biotechnology                   65.2
 5 Food drink & tobacco                    49.3
 6 Consumer durables                       41.9
 7 Retailing                               41.9
 8 Conglomerates                           31.4
 9 Household & personal products           24.2
10 Health care equipment & services        23.5
# ℹ 17 more rows
forb |>
  group_by(category) |>
  summarise(
    totalprofit = sum(profits)
  ) |>
  arrange(totalprofit)
# A tibble: 27 × 2
   category                        totalprofit
   <chr>                                 <dbl>
 1 Trading companies                      0.7 
 2 Capital goods                          5.06
 3 Aerospace & defense                    5.48
 4 Food markets                           8.22
 5 Hotels restaurants & leisure           9.57
 6 Semiconductors                        11.4 
 7 Business services & supplies          12.0 
 8 Technology hardware & equipment       12.1 
 9 Media                                 12.8 
10 Chemicals                             13.0 
# ℹ 17 more rows

When we look at total profits among a company types, banking made the most total profit (132), while trading companies made the least total profit (0.7).

forb |>
  group_by(country) |>
  summarise(
    avgprofit = mean(profits)
  ) |>
  arrange(desc(avgprofit))
# A tibble: 61 × 2
   country                     avgprofit
   <chr>                           <dbl>
 1 Netherlands/ United Kingdom     5.32 
 2 United Kingdom/ Australia       1.64 
 3 Russia                          1.24 
 4 Kong/China                      1.19 
 5 Panama/ United Kingdom          1.18 
 6 Australia/ United Kingdom       1.18 
 7 Islands                         0.74 
 8 Finland                         0.641
 9 Norway                          0.631
10 China                           0.622
# ℹ 51 more rows
forb |>
  group_by(country) |>
  summarise(
    avgprofit = mean(profits)
  ) |>
  arrange(avgprofit)
# A tibble: 61 × 2
   country                      avgprofit
   <chr>                            <dbl>
 1 France/ United Kingdom        -2.83   
 2 Luxembourg                    -0.125  
 3 United Kingdom/ South Africa  -0.1    
 4 Netherlands                   -0.0389 
 5 Germany                       -0.0382 
 6 Africa                        -0.00500
 7 Japan                          0.0224 
 8 Korea                          0.03   
 9 Israel                         0.0987 
10 Austria                        0.108  
# ℹ 51 more rows

The Netherlands & UK (probably joint companies) have the highest average profit overall (5.32), while a joint company between France & the UK had the lowest average profit overall (-2.83).

forb |>
  group_by(country) |>
  summarise(
    totalprofit = sum(profits)
  ) |>
  arrange(desc(totalprofit))
# A tibble: 61 × 2
   country                     totalprofit
   <chr>                             <dbl>
 1 Canada                            23.3 
 2 South Korea                       15.6 
 3 China                             15.5 
 4 Russia                            14.9 
 5 Switzerland                       13.8 
 6 Spain                             11.7 
 7 Netherlands/ United Kingdom       10.6 
 8 India                              9.37
 9 Bermuda                            9.11
10 Italy                              7.52
# ℹ 51 more rows
forb |>
  group_by(country) |>
  summarise(
    totalprofit = sum(profits)
  ) |>
  arrange(totalprofit)
# A tibble: 61 × 2
   country                      totalprofit
   <chr>                              <dbl>
 1 France/ United Kingdom           -2.83  
 2 Germany                          -2.48  
 3 Netherlands                      -1.09  
 4 Luxembourg                       -0.25  
 5 United Kingdom/ South Africa     -0.1   
 6 Africa                           -0.0100
 7 Peru                              0.11  
 8 Venezuela                         0.12  
 9 Korea                             0.12  
10 United Kingdom/ Netherlands       0.14  
# ℹ 51 more rows

When looking at total profit, Canada has the highest total profit (23.3), while joint companies between France & the UK have the lowest total profit (-2.83).

Looking at sales

forb |>
  group_by(category) |>
  summarise(
    avgsales = mean(sales)
  ) |>
  arrange(desc(avgsales))
# A tibble: 27 × 2
   category                        avgsales
   <chr>                              <dbl>
 1 Trading companies                   29.1
 2 Consumer durables                   24.1
 3 Food markets                        21.3
 4 Oil & gas operations                19.5
 5 Conglomerates                       16.0
 6 Aerospace & defense                 14.4
 7 Retailing                           13.8
 8 Telecommunications services         12.7
 9 Insurance                           12.0
10 Technology hardware & equipment     10.8
# ℹ 17 more rows
forb |>
  group_by(category) |>
  summarise(
    avgsales = mean(sales)
  ) |>
  arrange(avgsales)
# A tibble: 27 × 2
   category                      avgsales
   <chr>                            <dbl>
 1 Banking                           5.31
 2 Software & services               5.38
 3 Hotels restaurants & leisure      5.70
 4 Diversified financials            5.74
 5 Semiconductors                    5.90
 6 Media                             6.35
 7 Materials                         6.54
 8 Business services & supplies      6.70
 9 Construction                      6.70
10 Household & personal products     7.19
# ℹ 17 more rows

On average, trading companies make the most sales (29.1), while banking companies make the least sales (5.31).

forb |>
  group_by(category) |>
  summarise(
    totalsales = sum(sales)
  ) |>
  arrange(desc(totalsales))
# A tibble: 27 × 2
   category                    totalsales
   <chr>                            <dbl>
 1 Consumer durables                1780.
 2 Oil & gas operations             1755.
 3 Banking                          1663.
 4 Insurance                        1340.
 5 Retailing                        1211.
 6 Utilities                         950.
 7 Diversified financials            907.
 8 Telecommunications services       853.
 9 Food drink & tobacco              758.
10 Trading companies                 727.
# ℹ 17 more rows
forb |>
  group_by(category) |>
  summarise(
    totalsales = sum(sales)
  ) |>
  arrange(totalsales)
# A tibble: 27 × 2
   category                      totalsales
   <chr>                              <dbl>
 1 Semiconductors                      153.
 2 Software & services                 167.
 3 Hotels restaurants & leisure        211.
 4 Aerospace & defense                 274.
 5 Household & personal products       316.
 6 Media                               387.
 7 Chemicals                           408.
 8 Drugs & biotechnology               412.
 9 Capital goods                       433.
10 Business services & supplies        469.
# ℹ 17 more rows

When we look at total sales among a company types, consumer durable companies made the most total sales (1780), while semiconductor companies made the least total sales (153).

forb |>
  group_by(country) |>
  summarise(
    avgsales = mean(sales)
  ) |>
  arrange(desc(avgsales))
# A tibble: 61 × 2
   country                     avgsales
   <chr>                          <dbl>
 1 Netherlands/ United Kingdom     92.1
 2 Germany                         20.8
 3 France                          20.1
 4 Netherlands                     17.0
 5 Korea                           15.0
 6 Luxembourg                      14.2
 7 Switzerland                     12.5
 8 Australia/ United Kingdom       11.6
 9 Norway                          10.8
10 United Kingdom                  10.4
# ℹ 51 more rows
forb |>
  group_by(country) |>
  summarise(
    avgsales = mean(sales)
  ) |>
  arrange(avgsales)
# A tibble: 61 × 2
   country                avgsales
   <chr>                     <dbl>
 1 Peru                       0.17
 2 Venezuela                  0.98
 3 France/ United Kingdom     1.01
 4 Pakistan                   1.23
 5 Jordan                     1.33
 6 Bahamas                    1.35
 7 Philippines                1.56
 8 Chile                      1.60
 9 Cayman Islands             1.66
10 Malaysia                   1.72
# ℹ 51 more rows

The Netherlands & UK (joint companies) had the highest average sales overall (92.1), while a Peru had the lowest average sales overall (0.17).

forb |>
  group_by(country) |>
  summarise(
    totalsales = sum(sales)
  ) |>
  arrange(desc(totalsales))
# A tibble: 61 × 2
   country        totalsales
   <chr>               <dbl>
 1 United States       7554.
 2 Japan               3220.
 3 United Kingdom      1431.
 4 Germany             1351.
 5 France              1266.
 6 Netherlands          477.
 7 Switzerland          424.
 8 Italy                419.
 9 Canada               360.
10 South Korea          359.
# ℹ 51 more rows
forb |>
  group_by(country) |>
  summarise(
    totalsales = sum(sales)
  ) |>
  arrange(totalsales)
# A tibble: 61 × 2
   country                      totalsales
   <chr>                             <dbl>
 1 Peru                               0.17
 2 Venezuela                          0.98
 3 France/ United Kingdom             1.01
 4 Pakistan                           1.23
 5 Jordan                             1.33
 6 Bahamas                            1.35
 7 United Kingdom/ South Africa       2.06
 8 New Zealand                        2.64
 9 Philippines                        3.13
10 Czech Republic                     3.61
# ℹ 51 more rows

When looking at the total sales, the United States had the highest sales overall (7554), while Peru had the lowest total sales overall.

Question 2

Compare the USA and Japan. Which country has the highest rank using Forbes ranking? Which company types are more common in the USA? In Japan?

First, I create a new dataset for the United States and then one for Japan where I can see the data arranged by rank. Then I create a table of the company type so that I can see the count for each of these datasets and sort it.

forbusa = forb |>
  filter(country == "United States") |>
  arrange(rank)
head(forbusa)
  X rank                name       country               category  sales
1 1    1           Citigroup United States                Banking  94.71
2 2    2    General Electric United States          Conglomerates 134.19
3 3    3 American Intl Group United States              Insurance  76.66
4 4    4          ExxonMobil United States   Oil & gas operations 222.88
5 6    6     Bank of America United States                Banking  49.01
6 9    9          Fannie Mae United States Diversified financials  53.13
  profits  assets marketvalue
1   17.85 1264.03      255.30
2   15.59  626.93      328.54
3    6.46  647.66      194.87
4   20.96  166.99      277.02
5   10.81  736.45      117.55
6    6.48 1019.17       76.84
forbjapan = forb |>
  filter(country == "Japan") |>
  arrange(rank)
head(forbjapan)
   X rank                 name country                    category  sales
1  8    8         Toyota Motor   Japan           Consumer durables 135.82
2 30   30     Nippon Tel & Tel   Japan Telecommunications services  92.41
3 49   49          Honda Motor   Japan           Consumer durables  67.44
4 51   51         Nissan Motor   Japan           Consumer durables  57.77
5 72   72 Tokyo Electric Power   Japan                   Utilities  41.62
6 82   82                 Sony   Japan           Consumer durables  63.23
  profits assets marketvalue
1    7.99 171.71      115.40
2    2.17 150.87       73.00
3    3.61  63.09       40.61
4    4.19  60.56       41.71
5    1.40 116.68       30.63
6    0.98  68.04       38.00

The United States has the highest rank using the Forbes ranking. It has a company that is ranked 1st, while Japan’s highest ranked company is 8th.

compusa <- table(forbusa$category)
compusa <- sort(compusa)
compusa

                    Food markets              Aerospace & defense 
                               9                               10 
                   Capital goods                    Conglomerates 
                              10                               11 
                       Chemicals                   Semiconductors 
                              13                               16 
     Telecommunications services     Hotels restaurants & leisure 
                              16                               17 
                  Transportation                     Construction 
                              17                               18 
   Household & personal products            Drugs & biotechnology 
                              20                               21 
             Software & services                Consumer durables 
                              21                               25 
                       Materials             Food drink & tobacco 
                              26                               28 
                           Media     Business services & supplies 
                              28                               31 
            Oil & gas operations  Technology hardware & equipment 
                              32                               33 
                       Insurance Health care equipment & services 
                              46                               53 
                       Retailing                        Utilities 
                              53                               54 
          Diversified financials                          Banking 
                              60                               83 
compjapan <- table(forbjapan$category)
compjapan <- sort(compjapan)
compjapan

                    Food markets      Telecommunications services 
                               2                                2 
            Oil & gas operations                   Semiconductors 
                               3                                3 
             Software & services Health care equipment & services 
                               3                                4 
 Technology hardware & equipment    Household & personal products 
                               6                                7 
                           Media            Drugs & biotechnology 
                               7                                9 
                       Insurance             Food drink & tobacco 
                               9                               10 
                       Utilities                        Materials 
                              11                               12 
                       Retailing                Trading companies 
                              12                               13 
                       Chemicals     Business services & supplies 
                              14                               17 
                    Construction                    Capital goods 
                              18                               19 
                  Transportation                Consumer durables 
                              20                               22 
          Diversified financials                          Banking 
                              24                               69 

It looks like banking companies are the most common type of company for both the US and Japan.

Question 3

Now build a multiple linear regression model to estimate profits using assets, market value and sales. Use the summary() function to find the coefficients and goodness-of-fit of the model. Use the anova() function to identify which variable appears to have the greatest effect on profits. Remember to look at the distribution of residuals.

fit1 <- lm(profits ~ assets + marketvalue + sales, data = forb)
summary(fit1)

Call:
lm(formula = profits ~ assets + marketvalue + sales, data = forb)

Residuals:
     Min       1Q   Median       3Q      Max 
-29.2169  -0.0189   0.1160   0.2107   8.9495 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.1186259  0.0380039  -3.121  0.00183 ** 
assets      -0.0008395  0.0003781  -2.220  0.02651 *  
marketvalue  0.0363340  0.0018183  19.982  < 2e-16 ***
sales        0.0098892  0.0024331   4.064    5e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.472 on 1991 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.3059,    Adjusted R-squared:  0.3049 
F-statistic: 292.5 on 3 and 1991 DF,  p-value: < 2.2e-16

The coefficients for the multiple linear regression model are the following:

  • Assets (-0.0008395): As assets held by a company increase by $1 million, profits decrease by $839.5. This coefficient has a p-value of 0.02651, which means that it is only statistically significant at the 90% confidence level.

  • Market Value (0.0363340): As the market value of a company increases by $1 million, the profits of the company increase by $36,334. This coefficient has a p-value of < 0.001, so it is statistically significant at the 99% confidence level.

  • Sales (0.0098892): As the sales of a company increase by $1 million, profits increase by $9,889.20. This coefficient also has a p-value of < 0.001, so it is statistically significant at the 99% confidence level.

The adjusted R-squared for this model is 0.3049, which means that it explains about 30.49% of the variance in the data.

anova(fit1)
Analysis of Variance Table

Response: profits
              Df Sum Sq Mean Sq F value    Pr(>F)    
assets         1  312.8  312.84  144.40 < 2.2e-16 ***
marketvalue    1 1552.8 1552.77  716.71 < 2.2e-16 ***
sales          1   35.8   35.79   16.52 5.001e-05 ***
Residuals   1991 4313.6    2.17                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on the results of the anova(), market value seems to have the biggest effect on profits because it has the highest value for the sum sq and also the highest F-value. It is also statistically significant at the 99% confidence level with a p-value < 0.001.

hist(residuals(fit1))

plot(fit1, which = 1)

The line on the residual plot is fairly flat, which means that we may have a linear relationship between the variables that we chose; however, they do seem to generally be clustered in one area.

Question 4

Build two models using the same variables but for Japanese and American companies. Use the anova() function to look at which variables are the most important for each region? What differences do you observe?

fitusa <- lm(profits ~ assets + marketvalue + sales, data = forbusa)
summary(fitusa)

Call:
lm(formula = profits ~ assets + marketvalue + sales, data = forbusa)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.4954 -0.0620  0.1217  0.1953  7.8811 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.1465337  0.0336610  -4.353 1.53e-05 ***
assets       0.0043831  0.0003655  11.993  < 2e-16 ***
marketvalue  0.0339639  0.0012747  26.645  < 2e-16 ***
sales        0.0138408  0.0020376   6.793 2.25e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8118 on 744 degrees of freedom
  (3 observations deleted due to missingness)
Multiple R-squared:  0.795, Adjusted R-squared:  0.7942 
F-statistic: 961.8 on 3 and 744 DF,  p-value: < 2.2e-16

The coefficients for the multiple linear regression model of the US are the following:

  • Assets (0.0043831): As assets held by a company increase by $1 million, profits increase by $4383.10.

  • Market Value (0.0339639): As the market value of a company increases by $1 million, the profits of the company increase by $33,963.90.

  • Sales (0.0138408): As the sales of a company increase by $1 million, profits increase by $13,840.80.

Each of these coefficients has a p-value of < 0.001, so they are statistically significant at the 99% confidence level.

The adjusted R-squared for this model is 0.7942, which means that it explains about 79.42% of the variance in the data (more than the model that included all countries).

anova(fitusa)
Analysis of Variance Table

Response: profits
             Df Sum Sq Mean Sq  F value    Pr(>F)    
assets        1 975.23  975.23 1479.783 < 2.2e-16 ***
marketvalue   1 895.96  895.96 1359.498 < 2.2e-16 ***
sales         1  30.41   30.41   46.142 2.253e-11 ***
Residuals   744 490.32    0.66                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on the results of the anova(), assets seems to have the biggest effect on profits, closely followed by market value. These results are both statistically significant at the 99% confidence level with a p-value < 0.001.

fitjapan <- lm(profits ~ assets + marketvalue + sales, data = forbjapan)
summary(fitjapan)

Call:
lm(formula = profits ~ assets + marketvalue + sales, data = forbjapan)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.2316 -0.1651  0.0184  0.2160  5.3909 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.0731901  0.0551181  -1.328    0.185    
assets      -0.0126813  0.0004999 -25.368   <2e-16 ***
marketvalue  0.0765111  0.0062517  12.239   <2e-16 ***
sales       -0.0006585  0.0034725  -0.190    0.850    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8108 on 312 degrees of freedom
Multiple R-squared:  0.6857,    Adjusted R-squared:  0.6827 
F-statistic: 226.9 on 3 and 312 DF,  p-value: < 2.2e-16

The coefficients for the multiple linear regression model of Japan are the following:

  • Assets (-0.0128613): As assets held by a company increase by $1 million, profits decrease by $12,861.30.

  • Market Value (0.0765111): As the market value of a company increases by $1 million, the profits of the company increase by $76,511.10.

  • Sales (-0.0006585): As the sales of a company increase by $1 million, profits decrease by $658.50.

Only the assets and market value coefficients have a p-value of < 0.001, so they are statistically significant at the 99% confidence level. The sales coefficient is not statistically significant.

The adjusted R-squared for this model is 0.6827, which means that it explains about 68.27% of the variance in the data (more than the model that included all countries, but less than the model for the US).

anova(fitjapan)
Analysis of Variance Table

Response: profits
             Df  Sum Sq Mean Sq F value Pr(>F)    
assets        1 284.846 284.846 433.311 <2e-16 ***
marketvalue   1 162.574 162.574 247.310 <2e-16 ***
sales         1   0.024   0.024   0.036 0.8497    
Residuals   312 205.100   0.657                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on the results of the anova(), assets seems to have the biggest effect on profits. These results are statistically significant at the 99% confidence level with a p-value < 0.001.

There are some slight differences in the results of the model for the US and Japan. First, the sales coefficient was not significant for Japan, but it was significant for the US. Additionally, the assets coefficient for Japan was negative, while it was positive for the US. Finally, the adjusted R-squared was slightly higher for the US compared to Japan, meaning that the model is slightly better at explaining the variance in the data for the US.