Final Project Part 2

Author

Cienna Kim

Published

June 18, 2026

Introduction

The file Forbes2000.csv contains information on 2,000 companies from a ranking published by Forbes, including each company's rank, country, category, sales, profits, assets, and market value. The purpose of this analysis is to explore how profits relate to other company characteristics, to compare companies from the United States and Japan, and to build linear regression models that explain profits.

Load Data

Forbes <- read.csv("Forbes2000.csv")

str(Forbes)
'data.frame':   2000 obs. of  9 variables:
 $ X          : int  1 2 3 4 5 6 7 8 9 10 ...
 $ rank       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ name       : chr  "Citigroup" "General Electric" "American Intl Group" "ExxonMobil" ...
 $ country    : chr  "United States" "United States" "United States" "United States" ...
 $ category   : chr  "Banking" "Conglomerates" "Insurance" "Oil & gas operations" ...
 $ sales      : num  94.7 134.2 76.7 222.9 232.6 ...
 $ profits    : num  17.85 15.59 6.46 20.96 10.27 ...
 $ assets     : num  1264 627 648 167 178 ...
 $ marketvalue: num  255 329 195 277 174 ...
summary(Forbes)
Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 32 is
invalid UTF-8
Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 54 is
invalid UTF-8
Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 68 is
invalid UTF-8
Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 186 is
invalid UTF-8
Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 278 is
invalid UTF-8
       X               rank               name           country    
 Min.   :   1.0   Min.   :   1.0   Length   :2000   Length   :2000  
 1st Qu.: 500.8   1st Qu.: 500.8   N.unique :2000   N.unique :  61  
 Median :1000.5   Median :1000.5   N.blank  :   0   N.blank  :   0  
 Mean   :1000.5   Mean   :1000.5   Min.nchar:  NA   Min.nchar:   4  
 3rd Qu.:1500.2   3rd Qu.:1500.2   Max.nchar:  NA   Max.nchar:  28  
 Max.   :2000.0   Max.   :2000.0                                    
                                                                    
      category        sales            profits             assets        
 Length   :2000   Min.   :  0.010   Min.   :-25.8300   Min.   :   0.270  
 N.unique :  27   1st Qu.:  2.018   1st Qu.:  0.0800   1st Qu.:   4.025  
 N.blank  :   0   Median :  4.365   Median :  0.2000   Median :   9.345  
 Min.nchar:   5   Mean   :  9.697   Mean   :  0.3811   Mean   :  34.042  
 Max.nchar:  32   3rd Qu.:  9.547   3rd Qu.:  0.4400   3rd Qu.:  22.793  
                  Max.   :256.330   Max.   : 20.9600   Max.   :1264.030  
                                    NAs    :5                            
  marketvalue    
 Min.   :  0.02  
 1st Qu.:  2.72  
 Median :  5.15  
 Mean   : 11.88  
 3rd Qu.: 10.60  
 Max.   :328.54  
                 

Profits by Company Type and Country

Company Types

profits_category <- aggregate(profits ~ category,
                              data = Forbes,
                              mean)

profits_category[order(-profits_category$profits), ]
                           category    profits
10            Drugs & biotechnology  1.4477778
19             Oil & gas operations  1.3055556
6                     Conglomerates  1.0145161
11             Food drink & tobacco  0.5938554
22              Software & services  0.5677419
8                 Consumer durables  0.5663514
15    Household & personal products  0.5497727
9            Diversified financials  0.4995570
20                        Retailing  0.4759091
21                   Semiconductors  0.4365385
2                           Banking  0.4220767
13 Health care equipment & services  0.3609231
16                        Insurance  0.3430000
1               Aerospace & defense  0.2884211
5                         Chemicals  0.2606000
14     Hotels restaurants & leisure  0.2586486
12                     Food markets  0.2490909
27                        Utilities  0.2114545
18                            Media  0.2106557
23  Technology hardware & equipment  0.2055932
7                      Construction  0.1981013
17                        Materials  0.1959794
3      Business services & supplies  0.1707143
26                   Transportation  0.1388462
4                     Capital goods  0.0954717
25                Trading companies  0.0280000
24      Telecommunications services -0.9080303

Interpretation

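The table shows that Drugs & biotechnology has the highest average profits (1.45), followed by Oil & gas operations (1.31) and Conglomerates (1.01). Telecommunications services has the lowest average (-0.91) and is the only company type with a negative mean. This indicates that average profits differ considerably across company types. However, a few very large gains or losses can pull a mean far away from the typical value in a category.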
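One way to check whether a category's mean reflects its typical company is to report the median and the number of companies alongside it. The sketch below runs on a small made-up data frame (`toy`, with hypothetical values) that stands in for `Forbes`. With the real data, the same call would use `data = Forbes`.

```r
# Hypothetical toy data standing in for Forbes (values are made up)
toy <- data.frame(
  category = c("A", "A", "A", "B", "B", "B"),
  profits  = c(0.5, 0.6, -6.0, 0.2, 0.3, 0.4)
)

# FUN returns a named vector, so aggregate() stores the result
# as a single matrix column called "profits"
stats <- aggregate(profits ~ category, data = toy,
                   FUN = function(x) c(mean   = mean(x),
                                       median = median(x),
                                       n      = length(x)))

# Flatten the matrix column into ordinary columns:
# profits.mean, profits.median, profits.n
stats <- do.call(data.frame, stats)
stats
```

In this toy example, one large loss makes category A's mean negative while its median stays positive. Comparing the two columns for Telecommunications services would show whether something similar is happening there.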

Countries

profits_country <- aggregate(profits ~ country,
                             data = Forbes,
                             sum)

profits_country[order(-profits_country$profits), ]
                        country profits
60                United States  487.40
9                        Canada   23.30
56               United Kingdom   21.72
2                     Australia   18.08
49                  South Korea   15.60
12                        China   15.54
46                       Russia   14.87
52                  Switzerland   13.75
50                        Spain   11.74
37  Netherlands/ United Kingdom   10.64
22                        India    9.37
7                       Bermuda    9.11
27                        Italy    7.52
16                       France    7.37
20              Hong Kong/China    7.33
28                        Japan    7.07
15                      Finland    7.05
8                        Brazil    6.78
48                 South Africa    6.33
53                       Taiwan    5.68
14                      Denmark    5.22
39                       Norway    5.05
51                       Sweden    4.78
30                   Kong/China    4.76
24                      Ireland    3.88
47                    Singapore    3.58
6                       Belgium    3.54
35                       Mexico    3.41
34                     Malaysia    3.41
55                       Turkey    2.71
23                    Indonesia    2.69
3     Australia/ United Kingdom    2.35
19                       Greece    2.20
54                     Thailand    1.64
57    United Kingdom/ Australia    1.64
45                     Portugal    1.55
41       Panama/ United Kingdom    1.18
4                       Austria    0.86
26                       Israel    0.79
10               Cayman Islands    0.74
25                      Islands    0.74
11                        Chile    0.65
21                      Hungary    0.55
13               Czech Republic    0.42
38                  New Zealand    0.42
40                     Pakistan    0.41
32                      Liberia    0.28
44                       Poland    0.28
29                       Jordan    0.23
43                  Philippines    0.22
5                       Bahamas    0.20
58  United Kingdom/ Netherlands    0.14
31                        Korea    0.12
61                    Venezuela    0.12
42                         Peru    0.11
1                        Africa   -0.01
59 United Kingdom/ South Africa   -0.10
33                   Luxembourg   -0.25
36                  Netherlands   -1.09
18                      Germany   -2.48
17       France/ United Kingdom   -2.83

Interpretation

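The United States has by far the highest total profits (487.40), followed by Canada (23.30) and the United Kingdom (21.72). At the other end, France/United Kingdom (-2.83), Germany (-2.48), and the Netherlands (-1.09) have the most negative totals, and Luxembourg, United Kingdom/South Africa, and Africa are also slightly below zero. Because these are totals, they depend on two things: how many companies a country has in the list, and how profitable each of those companies is. They therefore show where profits are concentrated, not which countries' companies are most profitable on average. Some labels, such as "Kong/China", "Islands", "Africa", and "Korea", appear to be truncated versions of longer country names, so the totals for those countries may be split across rows.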
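Several country labels in the table above, such as "Kong/China" and "Korea", look like truncated versions of longer names that also appear. If they do refer to the same countries, they could be merged before aggregating. The sketch below uses made-up data and an assumed mapping (for example, that "Kong/China" means "Hong Kong/China"). This mapping is a hypothetical guess and should be checked against the company names before it is used.

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(
  country = c("Hong Kong/China", "Kong/China", "South Korea", "Korea"),
  profits = c(1.0, 0.5, 2.0, 0.1)
)

# ASSUMED mapping from truncated labels to full country names
fix_labels <- c("Kong/China" = "Hong Kong/China",
                "Korea"      = "South Korea")

# Replace only the labels that appear in the mapping
hit <- toy$country %in% names(fix_labels)
toy$country[hit] <- fix_labels[toy$country[hit]]

# Re-aggregate with the merged labels
totals <- aggregate(profits ~ country, data = toy, FUN = sum)
totals
```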

Sales by Country

sales_country <- aggregate(sales ~ country,
                           data = Forbes,
                           sum)

sales_country[order(-sales_country$sales), ]
                        country   sales
60                United States 7553.75
28                        Japan 3220.24
56               United Kingdom 1430.98
18                      Germany 1350.79
16                       France 1266.43
36                  Netherlands  476.58
52                  Switzerland  423.53
27                        Italy  418.77
9                        Canada  360.06
49                  South Korea  358.62
50                        Spain  227.46
51                       Sweden  199.31
2                     Australia  194.05
37  Netherlands/ United Kingdom  184.20
7                       Bermuda  136.81
12                        China  127.49
15                      Finland  113.21
22                        India  104.44
53                       Taiwan   96.30
8                        Brazil   95.08
46                       Russia   92.07
6                       Belgium   91.03
39                       Norway   86.24
35                       Mexico   66.94
14                      Denmark   63.49
48                 South Africa   61.86
31                        Korea   60.02
47                    Singapore   58.96
55                       Turkey   56.56
20              Hong Kong/China   40.88
24                      Ireland   38.12
4                       Austria   33.14
19                       Greece   30.34
33                   Luxembourg   28.37
34                     Malaysia   27.46
45                     Portugal   27.19
3     Australia/ United Kingdom   23.19
30                   Kong/China   22.87
54                     Thailand   22.62
23                    Indonesia   17.15
26                       Israel   16.48
1                        Africa   13.64
57    United Kingdom/ Australia   10.01
10               Cayman Islands    8.30
58  United Kingdom/ Netherlands    7.54
21                      Hungary    6.74
25                      Islands    6.67
11                        Chile    6.41
41       Panama/ United Kingdom    5.93
44                       Poland    4.41
32                      Liberia    3.78
13               Czech Republic    3.61
43                  Philippines    3.13
38                  New Zealand    2.64
59 United Kingdom/ South Africa    2.06
5                       Bahamas    1.35
29                       Jordan    1.33
40                     Pakistan    1.23
17       France/ United Kingdom    1.01
61                    Venezuela    0.98
42                         Peru    0.17

Interpretation

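The United States has the highest total sales (7553.75), followed by Japan (3220.24), the United Kingdom, Germany, and France. The United States alone accounts for about 39% of the combined sales of all 2,000 companies (7553.75 out of roughly 19,400, based on mean sales of 9.697). This indicates that sales are concentrated in a small number of countries. Comparing this table with the profit totals also shows that high sales do not guarantee high profits. Japan ranks second in total sales but only 16th in total profits (7.07), and Germany ranks fourth in sales but has negative total profits (-2.48).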
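Concentration can be quantified as the cumulative share of total sales held by the top k countries. The sketch below uses made-up totals. With the real data, the input would be `setNames(sales_country$sales, sales_country$country)`.

```r
# Hypothetical totals by country (made-up numbers, not the Forbes values)
totals <- c(A = 50, B = 25, C = 15, D = 10)

# Sort from largest to smallest and convert to shares of the overall total
shares <- prop.table(sort(totals, decreasing = TRUE))

# Cumulative share: fraction of all sales held by the top k countries
cum_share <- cumsum(shares)
round(cum_share, 2)
```

Reading down `cum_share` shows how quickly the total is reached. Here the top two countries already hold 75% of sales.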

Comparing the United States and Japan

Forbes Rank

min(Forbes$rank[Forbes$country == "United States"])
[1] 1
min(Forbes$rank[Forbes$country == "Japan"])
[1] 8

Interpretation

The top-ranked company in the dataset (rank 1, Citigroup) is from the United States, while the highest-ranked Japanese company is ranked 8th.
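The same comparison can be made for every country at once, returning the best-ranked company (with its name) in each country rather than only the minimum rank. This is a minimal sketch on made-up data. With the real data, `toy` would be replaced by `Forbes`.

```r
# Hypothetical toy data (made-up names and ranks)
toy <- data.frame(
  name    = c("Co1", "Co2", "Co3", "Co4", "Co5"),
  country = c("United States", "Japan", "United States", "Japan", "France"),
  rank    = c(1, 8, 3, 12, 20)
)

# For each country, keep the row with the smallest (best) rank
best <- do.call(rbind, lapply(split(toy, toy$country),
                              function(d) d[which.min(d$rank), ]))

# Order the countries by their best rank
best[order(best$rank), ]
```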

Company Types

sort(table(Forbes$category[Forbes$country == "United States"]),
     decreasing = TRUE)

                         Banking           Diversified financials 
                              83                               60 
                       Utilities Health care equipment & services 
                              54                               53 
                       Retailing                        Insurance 
                              53                               46 
 Technology hardware & equipment             Oil & gas operations 
                              33                               32 
    Business services & supplies             Food drink & tobacco 
                              31                               28 
                           Media                        Materials 
                              28                               26 
               Consumer durables            Drugs & biotechnology 
                              25                               21 
             Software & services    Household & personal products 
                              21                               20 
                    Construction     Hotels restaurants & leisure 
                              18                               17 
                  Transportation                   Semiconductors 
                              17                               16 
     Telecommunications services                        Chemicals 
                              16                               13 
                   Conglomerates              Aerospace & defense 
                              11                               10 
                   Capital goods                     Food markets 
                              10                                9 
sort(table(Forbes$category[Forbes$country == "Japan"]),
     decreasing = TRUE)

                         Banking           Diversified financials 
                              69                               24 
               Consumer durables                   Transportation 
                              22                               20 
                   Capital goods                     Construction 
                              19                               18 
    Business services & supplies                        Chemicals 
                              17                               14 
               Trading companies                        Materials 
                              13                               12 
                       Retailing                        Utilities 
                              12                               11 
            Food drink & tobacco            Drugs & biotechnology 
                              10                                9 
                       Insurance    Household & personal products 
                               9                                7 
                           Media  Technology hardware & equipment 
                               7                                6 
Health care equipment & services             Oil & gas operations 
                               4                                3 
                  Semiconductors              Software & services 
                               3                                3 
                    Food markets      Telecommunications services 
                               2                                2 

Interpretation

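Banking is the most common company type in both countries. However, it makes up a much larger share of the Japanese companies (69 of 316, about 22%) than of the US companies (83 of 751, about 11%). Because the United States has more than twice as many companies as Japan, the counts are best compared as within-country shares. On that basis, the United States has larger shares in Utilities (7.2% vs. 3.5%) and Health care equipment & services (7.1% vs. 1.3%). Japan has larger shares in Consumer durables (7.0% vs. 3.3%), Transportation (6.3% vs. 2.3%), and Capital goods (6.0% vs. 1.3%), and Trading companies appear only among the Japanese companies. This indicates differences in the industrial composition of the two countries.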
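The two countries contribute different numbers of companies: the counts in the tables above sum to 751 for the United States and 316 for Japan. For that reason, dividing each country's counts by its own total gives a fairer comparison than raw counts. The sketch below uses made-up rows. With the real data, the input could be `us_jp <- subset(Forbes, country %in% c("United States", "Japan"))`, where `us_jp` is a hypothetical name.

```r
# Hypothetical toy data (made-up rows)
toy <- data.frame(
  country  = c(rep("United States", 8), rep("Japan", 2)),
  category = c("Banking", "Banking", "Banking",
               "Utilities", "Utilities", "Utilities",
               "Transportation", "Transportation",
               "Banking", "Transportation")
)

# Cross-tabulate country by category
counts <- table(toy$country, toy$category)

# margin = 1 divides each row by its total, so each country's shares sum to 1
shares <- prop.table(counts, margin = 1)
round(100 * shares, 1)
```

In this toy table, the United States has more Transportation companies in absolute terms (2 vs. 1). Japan, however, has the larger share (50% vs. 25%).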

Multiple Linear Regression

A multiple linear regression model was fit using assets, marketvalue, and sales as predictors of profits.

model1 <- lm(profits ~ assets + marketvalue + sales,
             data = Forbes)

summary(model1)

Call:
lm(formula = profits ~ assets + marketvalue + sales, data = Forbes)

Residuals:
     Min       1Q   Median       3Q      Max 
-29.2169  -0.0189   0.1160   0.2107   8.9495 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.1186259  0.0380039  -3.121  0.00183 ** 
assets      -0.0008395  0.0003781  -2.220  0.02651 *  
marketvalue  0.0363340  0.0018183  19.982  < 2e-16 ***
sales        0.0098892  0.0024331   4.064    5e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.472 on 1991 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.3059,    Adjusted R-squared:  0.3049 
F-statistic: 292.5 on 3 and 1991 DF,  p-value: < 2.2e-16

ANOVA

anova(model1)
Analysis of Variance Table

Response: profits
              Df Sum Sq Mean Sq F value    Pr(>F)    
assets         1  312.8  312.84  144.40 < 2.2e-16 ***
marketvalue    1 1552.8 1552.77  716.71 < 2.2e-16 ***
sales          1   35.8   35.79   16.52 5.001e-05 ***
Residuals   1991 4313.6    2.17                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residuals

hist(residuals(model1),
     main = "Residual Distribution",
     xlab = "Residuals")

Interpretation

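The regression model estimates profits from assets, marketvalue, and sales and explains about 31% of the variation in profits (R² = 0.306). All three predictors are statistically significant at the 5% level. Marketvalue has by far the largest t value (19.98), compared with 4.06 for sales and -2.22 for assets. The coefficient for assets is negative: holding market value and sales fixed, companies with more assets tend to have slightly lower profits.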

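The ANOVA table attributes the largest sum of squares to marketvalue (1552.8), followed by assets (312.8) and sales (35.8). These are sequential (Type I) sums of squares. Assets is credited first, marketvalue is credited only with the variation it explains beyond assets, and sales only with what it adds beyond both. The ranking therefore depends on the order of the terms in the formula. Taken together with the t values in the regression summary, however, it is consistent with market value being the strongest predictor of profits in the overall dataset.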
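Because `anova()` on a single `lm` fit reports sequential sums of squares, reordering the formula can change how much variation each predictor is credited with. The sketch below shows this with simulated data. The variables `x1`, `x2`, and `y` are made up for illustration and are not part of the Forbes data.

```r
# Simulated data with two correlated predictors (made up for illustration)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)   # x2 is strongly correlated with x1
y  <- x1 + x2 + rnorm(n)
d  <- data.frame(y, x1, x2)

# Sequential (Type I) sums of squares: each term is credited only
# with what it adds after the terms listed before it
ss_12 <- anova(lm(y ~ x1 + x2, data = d))[["Sum Sq"]]  # x1 first
ss_21 <- anova(lm(y ~ x2 + x1, data = d))[["Sum Sq"]]  # x2 first
ss_12
ss_21

# drop1() tests each term after all the others, so the result does not
# depend on order; for one-df terms, F equals the squared t from summary()
drop1(lm(y ~ x1 + x2, data = d), test = "F")
```

The total explained and residual sums of squares are the same in both orders, but x1 receives far more credit when it is entered first. For the Forbes model, `drop1(model1, test = "F")` would give order-free tests for assets, marketvalue, and sales.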

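The residual histogram shows that the residuals are not symmetric and include extreme values. The middle half of the residuals lies between -0.02 and 0.21, yet they range from -29.22 to 8.95, so a few companies are fit very poorly. These outliers suggest that the normality assumption is questionable and that the p-values should be interpreted with some caution.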
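A histogram shows that outliers exist, but not which observations they are. A minimal sketch for flagging them uses standardized residuals, with |value| > 3 as a common (though arbitrary) cutoff, together with a normal Q-Q plot. The data below are simulated, with one outlier planted for illustration.

```r
# Simulated data with one planted outlier (made up for illustration)
set.seed(2)
x <- rnorm(100)
y <- 1 + 0.5 * x + rnorm(100, sd = 0.3)
y[10] <- y[10] - 10            # plant a large negative residual
fit <- lm(y ~ x)

# Standardized residuals put all observations on a common scale
r <- rstandard(fit)

# Observations with |standardized residual| > 3 are candidate outliers
flagged <- which(abs(r) > 3)
flagged

# Normal Q-Q plot: heavy tails appear as points bending away from the line
qqnorm(r)
qqline(r)
```

With the Forbes model, `which(abs(rstandard(model1)) > 3)` would return the row names of the flagged companies.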

Separate Models for the United States and Japan

United States

model_us <- lm(profits ~ assets + marketvalue + sales,
               data = subset(Forbes,
                             country == "United States"))

summary(model_us)

Call:
lm(formula = profits ~ assets + marketvalue + sales, data = subset(Forbes, 
    country == "United States"))

Residuals:
    Min      1Q  Median      3Q     Max 
-4.4954 -0.0620  0.1217  0.1953  7.8811 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.1465337  0.0336610  -4.353 1.53e-05 ***
assets       0.0043831  0.0003655  11.993  < 2e-16 ***
marketvalue  0.0339639  0.0012747  26.645  < 2e-16 ***
sales        0.0138408  0.0020376   6.793 2.25e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8118 on 744 degrees of freedom
  (3 observations deleted due to missingness)
Multiple R-squared:  0.795, Adjusted R-squared:  0.7942 
F-statistic: 961.8 on 3 and 744 DF,  p-value: < 2.2e-16
anova(model_us)
Analysis of Variance Table

Response: profits
             Df Sum Sq Mean Sq  F value    Pr(>F)    
assets        1 975.23  975.23 1479.783 < 2.2e-16 ***
marketvalue   1 895.96  895.96 1359.498 < 2.2e-16 ***
sales         1  30.41   30.41   46.142 2.253e-11 ***
Residuals   744 490.32    0.66                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Japan

model_japan <- lm(profits ~ assets + marketvalue + sales,
                  data = subset(Forbes,
                                country == "Japan"))

summary(model_japan)

Call:
lm(formula = profits ~ assets + marketvalue + sales, data = subset(Forbes, 
    country == "Japan"))

Residuals:
    Min      1Q  Median      3Q     Max 
-8.2316 -0.1651  0.0184  0.2160  5.3909 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.0731901  0.0551181  -1.328    0.185    
assets      -0.0126813  0.0004999 -25.368   <2e-16 ***
marketvalue  0.0765111  0.0062517  12.239   <2e-16 ***
sales       -0.0006585  0.0034725  -0.190    0.850    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8108 on 312 degrees of freedom
Multiple R-squared:  0.6857,    Adjusted R-squared:  0.6827 
F-statistic: 226.9 on 3 and 312 DF,  p-value: < 2.2e-16
anova(model_japan)
Analysis of Variance Table

Response: profits
             Df  Sum Sq Mean Sq F value Pr(>F)    
assets        1 284.846 284.846 433.311 <2e-16 ***
marketvalue   1 162.574 162.574 247.310 <2e-16 ***
sales         1   0.024   0.024   0.036 0.8497    
Residuals   312 205.100   0.657                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation

The regression models for the United States and Japan show some important differences.

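For the United States, assets, marketvalue, and sales are all significant predictors of profits, and all three coefficients are positive. In the ANOVA table, assets (Sum Sq = 975.23) and marketvalue (895.96) account for most of the explained variation, while sales adds relatively little (30.41). The ANOVA credits each term only with the variation it explains beyond the terms listed before it, and assets is listed first. The small margin between assets and marketvalue therefore does not show that assets is the more important predictor.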

For Japan, assets and marketvalue are significant predictors of profits, but sales is not (p = 0.850). The assets coefficient is negative (-0.0127): holding market value and sales fixed, Japanese companies with more assets tend to have lower profits. In the ANOVA table, assets has the largest sum of squares (284.85), followed by marketvalue (162.57), while sales contributes almost nothing (0.02).

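Both country-specific models fit much better than the pooled model (R² = 0.795 for the United States and 0.686 for Japan, compared with 0.306 overall). Profits are related to assets and market value in both countries, although the assets coefficient is positive in the United States and negative in Japan. Sales appears to be more important in the United States than in Japan, where it adds essentially nothing once assets and market value are in the model.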
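When the same model is fit within several groups, a compact pattern is to split the data, fit each piece, and collect the statistics to compare side by side. The sketch below uses simulated data with two hypothetical countries, "A" and "B", and only two predictors. By construction, sales affects profits in A but not in B. With the real data, `toy` would be a subset of `Forbes` and the formula would be `profits ~ assets + marketvalue + sales`.

```r
# Simulated two-group data (made up); sales only matters in group "A"
set.seed(3)
n   <- 100
toy <- data.frame(
  country = rep(c("A", "B"), each = n),
  assets  = runif(2 * n, 0, 100),
  sales   = runif(2 * n, 0, 50)
)
toy$profits <- ifelse(toy$country == "A",
                      0.01 * toy$assets + 0.05 * toy$sales,
                      0.01 * toy$assets) + rnorm(2 * n, sd = 0.5)

# Fit the same formula separately within each country
fits <- lapply(split(toy, toy$country),
               function(d) lm(profits ~ assets + sales, data = d))

# Collect R-squared and the p-value for sales from each fit
res <- sapply(fits, function(f) {
  s <- summary(f)
  c(r_squared = s$r.squared,
    sales_p   = s$coefficients["sales", "Pr(>|t|)"])
})
res
```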

Conclusion

This analysis explored how profits vary across company types and countries. Drugs & biotechnology had the highest average profits, while Telecommunications services had the lowest average profits. The United States generated the highest total profits and total sales by a large margin compared with all other countries.

The comparison between the United States and Japan showed that the highest-ranked company in the dataset was from the United States. Banking was the most common company type in both countries, although the distribution of company types differed between them.

The regression analysis showed that marketvalue was the strongest predictor of profits in the overall dataset. Separate models for the United States and Japan fit considerably better than the pooled model. In both countries, assets and marketvalue were important predictors, although the assets coefficient had opposite signs in the two countries. Sales was significant only in the United States. These findings suggest that the factors associated with company profits vary across regions.