Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 32 is
invalid UTF-8
Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 54 is
invalid UTF-8
Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 68 is
invalid UTF-8
Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 186 is
invalid UTF-8
Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 278 is
invalid UTF-8
       X               rank              name             country
 Min.   :   1.0   Min.   :   1.0   Length   :2000   Length   :2000
 1st Qu.: 500.8   1st Qu.: 500.8   N.unique :2000   N.unique :  61
 Median :1000.5   Median :1000.5   N.blank  :   0   N.blank  :   0
 Mean   :1000.5   Mean   :1000.5   Min.nchar:  NA   Min.nchar:   4
 3rd Qu.:1500.2   3rd Qu.:1500.2   Max.nchar:  NA   Max.nchar:  28
 Max.   :2000.0   Max.   :2000.0
    category          sales             profits             assets
 Length   :2000   Min.   :  0.010   Min.   :-25.8300   Min.   :   0.270
 N.unique :  27   1st Qu.:  2.018   1st Qu.:  0.0800   1st Qu.:   4.025
 N.blank  :   0   Median :  4.365   Median :  0.2000   Median :   9.345
 Min.nchar:   5   Mean   :  9.697   Mean   :  0.3811   Mean   :  34.042
 Max.nchar:  32   3rd Qu.:  9.547   3rd Qu.:  0.4400   3rd Qu.:  22.793
                  Max.   :256.330   Max.   : 20.9600   Max.   :1264.030
                                    NAs    :5
  marketvalue
 Min.   :  0.02
 1st Qu.:  2.72
 Median :  5.15
 Mean   : 11.88
 3rd Qu.: 10.60
 Max.   :328.54
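The "invalid UTF-8" warnings, together with the NA values for Min.nchar and Max.nchar in the name column, suggest that some company names are stored in a non-UTF-8 encoding. The following is a minimal sketch of how such strings could be located and re-encoded. It assumes the bytes are Latin-1, which the output above does not confirm, and `company_names` is a made-up vector standing in for `forbes$name`.

```r
# Toy vector standing in for forbes$name (hypothetical values).
# "\xe9" is the Latin-1 byte for "é"; on its own it is not valid UTF-8.
company_names <- c("Citigroup", "Nestl\xe9", "Toyota Motor")

# validUTF8() returns FALSE for strings whose bytes are not valid UTF-8
bad <- !validUTF8(company_names)
which(bad)

# Re-encode, ASSUMING the original bytes are Latin-1
fixed_names <- iconv(company_names, from = "latin1", to = "UTF-8")

# After conversion every string is valid and nchar() no longer fails
all(validUTF8(fixed_names))
nchar(fixed_names)
```

If the assumption holds for the real file, passing `fileEncoding = "latin1"` to `read.csv()` when loading the data would be another way to avoid the warnings.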
# Calculate mean and total profits by company category
cat_profits <- aggregate(profits ~ category, data = forbes,
                         FUN = function(x) c(Mean = mean(x, na.rm = TRUE),
                                             Total = sum(x, na.rm = TRUE)))
print(cat_profits)
# Aggregate total profits and sales by country
country_summary <- aggregate(cbind(profits, sales) ~ country, data = forbes,
                             FUN = sum, na.rm = TRUE)
# Sort to find the highest countries
head(country_summary[order(-country_summary$profits), ], 5)
          country profits   sales
60  United States  487.40 7540.27
9          Canada   23.30  360.06
56 United Kingdom   21.72 1425.30
2       Australia   18.08  188.65
49    South Korea   15.60  358.62
# Same summary sorted by total sales
head(country_summary[order(-country_summary$sales), ], 5)
          country profits   sales
60  United States  487.40 7540.27
28          Japan    7.07 3220.24
56 United Kingdom   21.72 1425.30
18        Germany   -2.48 1350.79
16         France    7.37 1266.43
Profitability varies considerably across company types. Drugs & Biotechnology and Oil & Gas Operations have the highest average profits, while Telecommunications Services has the lowest average profit and is the only category whose profits are negative in aggregate. Banking generates the highest total profits.
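Because the function passed to aggregate() returns two values, the profits column of cat_profits is a two-column matrix rather than two ordinary columns, which makes ranking the categories less direct. A minimal sketch of one way to flatten and sort such a result is shown here. The data frame `toy` and its values are hypothetical, not the Forbes data.

```r
# Hypothetical toy data, used only to illustrate the structure
toy <- data.frame(
  category = c("Banking", "Banking", "Retailing", "Retailing", "Telecom"),
  profits  = c(2.0, 4.0, 1.0, NA, -3.0)
)

# Same pattern as cat_profits: FUN returns a named vector of length two
agg <- aggregate(profits ~ category, data = toy,
                 FUN = function(x) c(Mean = mean(x), Total = sum(x)))

# agg$profits is a matrix; do.call(data.frame, ...) expands it into
# ordinary columns named profits.Mean and profits.Total
flat <- do.call(data.frame, agg)
names(flat)

# Rank categories from highest to lowest mean profit
flat[order(-flat$profits.Mean), ]
```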
By country, the United States generates the greatest total profits (487.40), followed by Canada and the United Kingdom. The United States also records the highest total sales (7540.27), followed by Japan and the United Kingdom.
Overall, profits and sales are concentrated in a few industries and countries, with the United States dominating both measures.
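One detail of the country totals is worth checking. With the formula interface, aggregate() applies na.action = na.omit by default, so any row with a missing profit is dropped before summing and its sales are excluded too. The sketch below illustrates the difference on made-up data (`toy` is hypothetical); whether it changes the Forbes totals noticeably depends on which five companies have missing profits.

```r
# Hypothetical toy data with one missing profit
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  profits = c(1, NA, 2, 3),
  sales   = c(10, 20, 30, 40)
)

# Formula interface: the row with NA profits is removed entirely,
# so its sales value (20) does not count towards country A
by_formula <- aggregate(cbind(profits, sales) ~ country, data = toy, FUN = sum)

# Data-frame interface: every row is kept and NAs are removed per column
by_column <- aggregate(toy[c("profits", "sales")],
                       by = list(country = toy$country),
                       FUN = sum, na.rm = TRUE)

by_formula
by_column
```

The two results agree for country B and differ only in the sales total for country A.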
2. Country Comparison: USA vs Japan
# Subset data for USA and Japan
usa_data <- subset(forbes, country == "United States")
japan_data <- subset(forbes, country == "Japan")
# 1. Compare the Best (Minimum) Rank
min(usa_data$rank)
[1] 1
min(japan_data$rank)
[1] 8
# 2. Compare the Average (Mean) Rank
mean(usa_data$rank)
[1] 947.2756
mean(japan_data$rank)
[1] 1144.329
# Frequency of company categories for USA and Japan
head(sort(table(usa_data$category), decreasing = TRUE), 5)
                         Banking           Diversified financials
                              83                               60
                       Utilities Health care equipment & services
                              54                               53
                       Retailing
                              53
The United States has the highest-ranked company in the dataset, with a rank of 1, while Japan’s highest-ranked company is ranked 8th. In addition, U.S. companies have a better average rank (947.28) than Japanese companies (1144.33), indicating stronger overall performance because lower rank values represent higher positions.
In both countries, Banking is the most common company category. However, the United States has a larger presence of Diversified Financials, Utilities, Health Care Equipment & Services, and Retailing, whereas Japan has relatively more companies in Consumer Durables, Transportation, and Capital Goods. These results suggest that the U.S. companies on the list are concentrated more heavily in financial and service-oriented industries, while the Japanese companies are more strongly represented in manufacturing and industrial sectors.
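Raw category counts are hard to compare when the two countries contribute different numbers of companies to the list. A minimal sketch of comparing within-country shares with prop.table() follows. The data frame `toy` is hypothetical and much smaller than the real subsets.

```r
# Hypothetical toy data: 6 U.S. companies and 4 Japanese companies
toy <- data.frame(
  country  = c(rep("United States", 6), rep("Japan", 4)),
  category = c("Banking", "Banking", "Banking", "Utilities",
               "Retailing", "Retailing",
               "Banking", "Banking", "Capital goods", "Transportation")
)

# Cross-tabulate category by country
counts <- table(toy$category, toy$country)

# margin = 2 divides each column by its total, giving the share of each
# category within a country
shares <- prop.table(counts, margin = 2)

round(100 * shares, 1)
```

In this toy example Banking has more U.S. companies in absolute terms, yet it accounts for the same share in both countries, which is the kind of distinction raw counts hide.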
3. Multiple Linear Regression Model: All Companies
# Build the global multiple linear regression model
model_all <- lm(profits ~ assets + marketvalue + sales, data = forbes)
# Summary for coefficients and R-squared (goodness-of-fit)
summary(model_all)
Call:
lm(formula = profits ~ assets + marketvalue + sales, data = forbes)
Residuals:
     Min       1Q   Median       3Q      Max
-29.2169  -0.0189   0.1160   0.2107   8.9495
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1186259  0.0380039  -3.121  0.00183 **
assets      -0.0008395  0.0003781  -2.220  0.02651 *
marketvalue  0.0363340  0.0018183  19.982  < 2e-16 ***
sales        0.0098892  0.0024331   4.064    5e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.472 on 1991 degrees of freedom
(5 observations deleted due to missingness)
Multiple R-squared: 0.3059, Adjusted R-squared: 0.3049
F-statistic: 292.5 on 3 and 1991 DF, p-value: < 2.2e-16
# Residual diagnostics
hist(residuals(model_all), main = "Histogram of Residuals (All Companies)",
     xlab = "Residuals", col = "salmon")
plot(model_all, which = 1)
Model Summary & Goodness-of-Fit: The multiple linear regression model explains approximately 30.6% of the variation in corporate profits (R² = 0.3059). Market value (β = 0.0363, p < 0.001) and sales (β = 0.0099, p < 0.001) have significant positive effects on profits, while assets have a small but statistically significant negative effect (β = -0.00084, p = 0.0265).
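To make the coefficient interpretation concrete, the fitted equation can be evaluated by hand. The sketch below copies the estimates from the summary(model_all) output; the company values are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary(model_all) output
coefs <- c(intercept   = -0.1186259,
           assets      = -0.0008395,
           marketvalue =  0.0363340,
           sales       =  0.0098892)

# Hypothetical company (illustrative values, same units as the data)
company <- c(assets = 50, marketvalue = 20, sales = 10)

# Predicted profits = intercept + sum of coefficient * predictor value
predicted <- coefs[["intercept"]] + sum(coefs[names(company)] * company)
predicted

# Expected change in profits for a 10-unit increase in market value,
# holding assets and sales fixed
effect_mv_10 <- 10 * coefs[["marketvalue"]]
effect_mv_10
```

With the fitted model object available, `predict(model_all, newdata = ...)` would give the same prediction up to the rounding of the printed coefficients.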
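Variable with the Greatest Effect: In the ANOVA table, market value has the largest Sum of Squares (1552.77) and F-statistic (716.71), which suggests that it is the strongest predictor of profitability among the variables considered. Because anova() on an lm fit reports sequential sums of squares, this ranking depends on the order in which the predictors enter the formula; here it agrees with the coefficient table, where market value has by far the largest t value (19.98).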
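The ANOVA table produced by anova() for an lm fit uses sequential sums of squares, so the value attributed to a predictor can change with its position in the formula when predictors are correlated. The following sketch illustrates this on simulated data (all variable names and values are made up) and shows drop1() as one order-independent alternative.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)      # deliberately correlated with x1
y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)

fit_a <- lm(y ~ x1 + x2)   # x1 enters first
fit_b <- lm(y ~ x2 + x1)   # x2 enters first

# Sequential sums of squares for the same two predictors in each ordering
ss_a <- anova(fit_a)[c("x1", "x2"), "Sum Sq"]
ss_b <- anova(fit_b)[c("x1", "x2"), "Sum Sq"]
rbind(x1_first = ss_a, x2_first = ss_b)

# Marginal F-tests: each term is tested after all the others,
# so the result does not depend on the formula order
drop1(fit_a, test = "F")
```

The two orderings explain the same total variation but split it differently between the predictors, which is why a ranking based on sequential sums of squares is best read alongside the coefficient table.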
Residual Diagnostics: The residuals are centered near zero (median 0.116), although the distribution contains several extreme observations, with residuals ranging from -29.22 to 8.95. The Residuals vs Fitted plot does not indicate severe departures from linearity, but a few outliers and some variation in residual spread are visible. Overall, the regression assumptions appear reasonably satisfied, though the estimates may be influenced by these extreme observations.
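One way to check how much a few extreme observations drive a fit is to flag points with a large Cook's distance and refit without them. The sketch below does this on simulated data with one planted outlier. The data, the object names, and the 4/n cutoff (a common rule of thumb, not a formal test) are all assumptions made for illustration.

```r
set.seed(42)
n <- 100
d <- data.frame(x = runif(n, 0, 10))
d$y <- 0.5 + 0.3 * d$x + rnorm(n, sd = 0.5)
d$y[n] <- d$y[n] - 30          # plant one extreme observation

fit <- lm(y ~ x, data = d)

# Cook's distance measures how much each observation shifts the fit
cd <- cooks.distance(fit)
flagged <- which(cd > 4 / n)   # rule-of-thumb cutoff (an assumption)

# Refit without the flagged rows and compare
fit_trim <- lm(y ~ x, data = d[setdiff(seq_len(n), flagged), ])

rbind(all_rows = coef(fit), trimmed = coef(fit_trim))
c(all_rows = summary(fit)$sigma, trimmed = summary(fit_trim)$sigma)
```

If the coefficients and residual standard error change substantially after trimming, the conclusions drawn from the full model deserve extra caution.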
4. Regional Comparison Models: USA vs Japan
# Build regression model for American companies
model_usa <- lm(profits ~ assets + marketvalue + sales, data = usa_data)
summary(model_usa)
Call:
lm(formula = profits ~ assets + marketvalue + sales, data = usa_data)
Residuals:
    Min      1Q  Median      3Q     Max
-4.4954 -0.0620  0.1217  0.1953  7.8811
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1465337  0.0336610  -4.353 1.53e-05 ***
assets       0.0043831  0.0003655  11.993  < 2e-16 ***
marketvalue  0.0339639  0.0012747  26.645  < 2e-16 ***
sales        0.0138408  0.0020376   6.793 2.25e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8118 on 744 degrees of freedom
(3 observations deleted due to missingness)
Multiple R-squared: 0.795, Adjusted R-squared: 0.7942
F-statistic: 961.8 on 3 and 744 DF, p-value: < 2.2e-16
Both models indicate that assets and market value are the most important predictors of profits. In the ANOVA results, assets have the largest F-statistic in both the United States (F = 1479.78) and Japan (F = 433.31). Because these are sequential sums of squares and assets enter the formula first, part of that effect reflects variation shared with market value and sales; in the U.S. coefficient table, market value has the largest t value (26.65). A key difference is that sales significantly affect profits in the U.S. model (p < 0.001) but are not significant in the Japanese model (p = 0.85). The U.S. model also has a higher R² (0.795) than the Japanese model (0.686), indicating a better in-sample fit.
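A predictor being significant in one country's model and not in the other does not by itself show that the two slopes differ. One way to test the difference directly is a single model with a country interaction. The sketch below uses simulated data in which the sales slope is nonzero in only one group; the data frame `toy`, the slopes, and the noise level are all hypothetical.

```r
set.seed(7)
n <- 300
toy <- data.frame(
  country = rep(c("United States", "Japan"), each = n / 2),
  sales   = runif(n, 1, 50)
)

# Simulated truth: sales affect profits in one group only (hypothetical)
slope <- ifelse(toy$country == "United States", 0.05, 0)
toy$profits <- 0.1 + slope * toy$sales + rnorm(n, sd = 0.5)

# Separate models, as in the country-by-country approach
separate_us <- lm(profits ~ sales, data = subset(toy, country == "United States"))
separate_jp <- lm(profits ~ sales, data = subset(toy, country == "Japan"))

# Pooled model: the sales:country term estimates the slope difference
# and its t-test asks whether that difference is distinguishable from zero
pooled <- lm(profits ~ sales * country, data = toy)
summary(pooled)$coefficients
```

The interaction estimate equals the difference between the two separately fitted sales slopes, and its standard error and p-value come with it. The same idea would extend to the three-predictor models by interacting each predictor with country.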