Top 2000 Companies Forbes 2017

An Introduction

This data processing report is part of a series of assignments for my course at Full Stack Academy - Algoritma Data Science Education, Jakarta, Indonesia. This first assignment practises the competencies of the Introduction to Programming in Data Science with base R subject. The data come from a Forbes magazine report on the top 2000 companies in the world, which I downloaded from Kaggle.com.

Data source: https://www.kaggle.com/ash316/forbes-top-2000-companies

Feedback and suggestions for improvement are welcome.

01. Read Data

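# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0.0),
# which matches the str() and summary() output shown below
forbes <- read.csv("Forbes Top2000 2017.csv", stringsAsFactors = TRUE)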

02. Data Inspection

a. View the first 15 records with head()

head(forbes, 15)

b. View the last 15 records with tail()

tail(forbes, 15)

c. Check the data dimensions with dim()

dim(forbes)

## [1] 2000   10

d. List the column names with names()

names(forbes)

##  [1] "X"            "Rank"         "Company"      "Country"      "Sales"       
##  [6] "Profits"      "Assets"       "Market.Value" "Sector"       "Industry"

03. Data Cleansing

a. Check the data structure with str()

str(forbes)

## 'data.frame':    2000 obs. of  10 variables:
##  $ X           : logi  NA NA NA NA NA NA ...
##  $ Rank        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Company     : Factor w/ 1999 levels "3i Group","3M",..: 921 393 277 1021 1927 43 216 219 124 1824 ...
##  $ Country     : Factor w/ 61 levels "Argentina","Australia",..: 10 10 59 59 59 10 59 10 59 26 ...
##  $ Sales       : num  151.4 134.2 222.9 102.5 97.6 ...
##  $ Profits     : num  42 35 24.1 24.2 21.9 27.8 16.6 24.9 45.2 17.1 ...
##  $ Assets      : num  3473 3017 621 2513 1943 ...
##  $ Market.Value: num  230 200 410 307 274 ...
##  $ Sector      : Factor w/ 11 levels "","Consumer Discretionary",..: 5 5 5 5 5 5 5 5 8 2 ...
##  $ Industry    : Factor w/ 81 levels "","Advertising",..: 53 69 50 53 53 69 53 53 19 9 ...
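The str() output reports 1999 levels for Company across 2000 rows, so one name appears twice (the summary in section 04 lists Merck with a count of 2). A repeated name is not necessarily a duplicated record, so it is worth inspecting the full rows. A minimal sketch on a small hypothetical data frame (`companies` and all of its values are invented for illustration); on the real data, use `forbes` in place of `companies`:

```r
# Hypothetical stand-in for forbes; names and values are invented
companies <- data.frame(
  Rank    = 1:5,
  Company = c("Alpha", "Beta", "Gamma", "Alpha", "Delta"),
  Country = c("United States", "Japan", "China", "Germany", "United States"),
  stringsAsFactors = TRUE
)

# duplicated() flags every occurrence of a value after its first appearance
dup_names <- unique(as.character(companies$Company[duplicated(companies$Company)]))
print(dup_names)

# Show every row sharing a repeated name, to check whether the rows differ
print(companies[companies$Company %in% dup_names, ])
```

If the rows differ in Country or the financial columns, they are most likely two distinct companies with the same name rather than a data-entry duplicate.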

b. Check for missing values with colSums(is.na()) and anyNA()

colSums(is.na(forbes))

##            X         Rank      Company      Country        Sales      Profits 
##         2000            0            0            0            0            0 
##       Assets Market.Value       Sector     Industry 
##            0            0            0            0

anyNA(forbes)

## [1] TRUE
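colSums(is.na()) reports 0 for Sector and Industry, yet str() shows that both factors have an empty "" level, so some companies have blank values that is.na() does not count. A minimal sketch of counting blanks and converting them to NA, using an invented data frame `sample_df`; with the real file, read.csv()'s `na.strings = c("", "NA")` argument turns blanks into NA at read time instead.

```r
# Hypothetical data with blank strings, similar to the Sector/Industry columns
sample_df <- data.frame(
  X        = c(NA, NA, NA, NA),          # an all-NA column like forbes$X
  Company  = c("A", "B", "C", "D"),
  Sector   = c("Financials", "", "Materials", ""),
  Industry = c("Regional Banks", "", "", "Steel"),
  stringsAsFactors = FALSE
)

# is.na() does not treat "" as missing
print(colSums(is.na(sample_df)))

# TRUE where a cell holds an empty string; the !is.na() guard keeps NA cells FALSE
blank <- !is.na(sample_df) & sample_df == ""
blank_counts <- colSums(blank)
print(blank_counts)

# Replace empty strings with NA so that is.na() and anyNA() report them
sample_df[blank] <- NA
print(colSums(is.na(sample_df)))
```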

c. Remove the unnecessary column X (all 2000 of its values are NA) and assign the result to forbes_clean

forbes_clean <- forbes[, names(forbes) != "X"]  # drop column X by name
head(forbes_clean)
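Removing X relies on already knowing that X is the empty column. A more general check, sketched below on an invented data frame `raw`, drops every column whose values are all NA; this is an alternative approach, not the one used in this report.

```r
# Hypothetical stand-in: an all-NA column like X next to regular columns
raw <- data.frame(
  X     = c(NA, NA, NA),
  Rank  = 1:3,
  Sales = c(151.4, 134.2, 222.9)
)

# Keep only the columns that contain at least one non-missing value
keep <- colSums(!is.na(raw)) > 0
cleaned <- raw[, keep, drop = FALSE]
print(names(cleaned))

# Alternative: drop a known column by name rather than by position
cleaned_by_name <- raw[, setdiff(names(raw), "X"), drop = FALSE]
print(names(cleaned_by_name))
```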

04. Data Explanation

a. Brief overview with summary()

summary(forbes_clean)

##       Rank                             Company               Country   
##  Min.   :   1.0   Merck                    :   2   United States :564  
##  1st Qu.: 500.8   3i Group                 :   1   Japan         :229  
##  Median :1000.5   3M                       :   1   China         :200  
##  Mean   :1000.5   77 Bank                  :   1   United Kingdom: 91  
##  3rd Qu.:1500.2   AAC Technologies Holdings:   1   South Korea   : 64  
##  Max.   :2000.0   Aareal Bank              :   1   Hong Kong     : 62  
##                   (Other)                  :1993   (Other)       :790  
##      Sales            Profits            Assets          Market.Value    
##  Min.   :  0.001   Min.   :-13.000   Min.   :   0.001   Min.   :  0.072  
##  1st Qu.:  4.000   1st Qu.:  0.318   1st Qu.:  10.875   1st Qu.:  6.600  
##  Median :  8.800   Median :  0.613   Median :  22.900   Median : 11.950  
##  Mean   : 17.665   Mean   :  1.241   Mean   :  84.534   Mean   : 24.418  
##  3rd Qu.: 17.425   3rd Qu.:  1.300   3rd Qu.:  52.400   3rd Qu.: 24.400  
##  Max.   :485.300   Max.   : 45.200   Max.   :3473.200   Max.   :752.000  
##                                                                          
##                     Sector                    Industry   
##  Financials            :583                       : 491  
##  Consumer Discretionary:237   Regional Banks      : 183  
##  Industrials           :209   Oil & Gas Operations:  67  
##                        :197   Electric Utilities  :  64  
##  Materials             :174   Investment Services :  62  
##  Information Technology:130   Major Banks         :  61  
##  (Other)               :470   (Other)             :1072

Summary (Sales, Profits, Assets and Market.Value are in billions of US dollars):
1. The United States has the most companies (564), followed by Japan (229) and China (200).
2. Financials is the sector with the most companies (583); 197 companies have a blank sector.
3. Regional Banks is the largest named industry (183 companies); 491 companies have a blank industry.
4. Sales: mean 17.665, median 8.800, max 485.300.
5. Profits: mean 1.241, median 0.613, max 45.200.
6. Assets: mean 84.534, median 22.900, max 3,473.200.
7. Market value: mean 24.418, median 11.950, max 752.000.

For all four financial measures the mean is well above the median, which suggests right-skewed distributions: a few very large companies pull the averages up.
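The country counts and the mean/median comparison can also be computed directly, which helps when more than the six rows printed by summary() are needed. A minimal sketch on invented data (`toy_df`); with the real data, use `forbes_clean` and its columns instead:

```r
# Hypothetical data; with the real data use forbes_clean instead of toy_df
toy_df <- data.frame(
  Country = c("United States", "Japan", "United States", "China", "United States", "Japan"),
  Sales   = c(485.3, 8.8, 4.0, 17.4, 0.5, 12.0),
  stringsAsFactors = FALSE
)

# Number of companies per country, largest first
country_counts <- sort(table(toy_df$Country), decreasing = TRUE)
print(head(country_counts, 3))

# A mean far above the median indicates a right-skewed distribution
sales_stats <- c(mean   = mean(toy_df$Sales),
                 median = median(toy_df$Sales),
                 max    = max(toy_df$Sales))
print(round(sales_stats, 3))
```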

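b. Check for outliers

Each boxplot draws values lying more than 1.5 times the box length (the interquartile range) beyond the box as individual outlier points.

Outliers in Profits

boxplot(forbes_clean$Profits)

Outliers in Sales

boxplot(forbes_clean$Sales)

Outliers in Assets

boxplot(forbes_clean$Assets)

Outliers in Market Value

boxplot(forbes_clean$Market.Value)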
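The boxplots show outliers visually; base R's boxplot.stats() returns the same points as numbers, using the rule boxplot() applies by default. A minimal sketch on an invented vector; on the real data, `boxplot.stats(forbes_clean$Profits)$out` would list the flagged profits.

```r
# Hypothetical profit values; the last one is far above the others
profits <- c(0.3, 0.6, 1.3, 0.9, -0.2, 45.2)

# boxplot.stats() applies the same 1.5 * IQR rule that boxplot() draws
profit_box <- boxplot.stats(profits, coef = 1.5)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(profit_box$stats)

# Values flagged as outliers and how many there are
print(profit_box$out)
print(length(profit_box$out))
```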

05. Data Transformation

a. Companies with the highest Sales

forbes_clean[forbes_clean$Sales == max(forbes_clean$Sales), ]

b. Companies with the highest Profits

forbes_clean[forbes_clean$Profits == max(forbes_clean$Profits), ]

c. Companies with the highest Assets

forbes_clean[forbes_clean$Assets == max(forbes_clean$Assets), ]

d. Companies with the highest Market Value

forbes_clean[forbes_clean$Market.Value == max(forbes_clean$Market.Value), ]
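Comparing against a typed constant such as 485.300 only works while that number matches the data exactly. Comparing against max() adapts to the data and keeps ties, whereas which.max() returns only the first matching row. A minimal sketch on invented data (`toy_sales`):

```r
# Hypothetical data with a tie for the highest Sales value
toy_sales <- data.frame(
  Company = c("A", "B", "C"),
  Sales   = c(120.5, 485.3, 485.3),
  stringsAsFactors = FALSE
)

# All rows whose Sales equals the column maximum (ties are kept)
top_rows <- toy_sales[toy_sales$Sales == max(toy_sales$Sales), ]
print(top_rows)

# which.max() returns the index of the first maximum only
first_top <- toy_sales[which.max(toy_sales$Sales), ]
print(first_top)
```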

e. Top 20 companies by profit

# Row indices sorted by Profits, largest first
forbes_clean_profit <- order(forbes_clean$Profits, decreasing = TRUE)
pro <- forbes_clean[forbes_clean_profit, ]
head(pro)

# Bar chart of the 20 largest profits
histpro <- barplot(head(pro$Profits, 20),
                   cex.axis = 1.0, cex.names = 0.8,
                   ylim = c(0, 60), main = "Companies with Highest Profit")

# Rotated company names below each bar (xpd = TRUE lets them extend outside the plot region)
text(x = histpro, y = -2,
     labels = head(pro$Company, 20),
     offset = -0.1, cex = 0.6, srt = 45,
     xpd = TRUE, pos = 2)

f. Top 20 companies by market value

# Row indices sorted by Market.Value, largest first
forbes_clean_Market <- order(forbes_clean$Market.Value, decreasing = TRUE)
mv <- forbes_clean[forbes_clean_Market, ]
head(mv)

# Bar chart of the 20 largest market values
histmv <- barplot(head(mv$Market.Value, 20),
                  cex.axis = 1.0, cex.names = 0.8,
                  ylim = c(0, 1000), main = "Companies with Highest Market Value")

# Rotated company names below each bar (xpd = TRUE lets them extend outside the plot region)
text(x = histmv, y = -60,
     labels = head(mv$Company, 20),
     offset = -0.1, cex = 0.6, srt = 45,
     xpd = TRUE, pos = 2)

g. Top 20 companies by assets

# Row indices sorted by Assets, largest first
forbes_clean_assets <- order(forbes_clean$Assets, decreasing = TRUE)
asset <- forbes_clean[forbes_clean_assets, ]
head(asset)

# Bar chart of the 20 largest asset values
histasset <- barplot(head(asset$Assets, 20),
                     cex.axis = 1.0, cex.names = 0.8,
                     ylim = c(0, 4000), main = "Companies with Highest Assets")

# Rotated company names below each bar (xpd = TRUE lets them extend outside the plot region)
text(x = histasset, y = -100,
     labels = head(asset$Company, 20),
     offset = -0.1, cex = 0.6, srt = 45,
     xpd = TRUE, pos = 2)
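The three charts repeat the same order, barplot and text steps. A sketch of a reusable helper follows; `top_n_by()` and `plot_top_n()` are hypothetical names introduced here, and the helper swaps the manual text() labels for barplot()'s `names.arg` with `las = 2` (vertical labels) and a wider bottom margin, which is a different labelling approach from the one used above.

```r
# Return the n rows with the largest values in `column` (hypothetical helper)
top_n_by <- function(df, column, n = 20) {
  ord <- order(df[[column]], decreasing = TRUE)
  head(df[ord, , drop = FALSE], n)
}

# Bar chart of the top n rows, labelled with company names (hypothetical helper)
plot_top_n <- function(df, column, n = 20,
                       main = paste("Companies with Highest", column)) {
  top <- top_n_by(df, column, n)
  old_par <- par(mar = c(10, 4, 4, 2))  # widen the bottom margin for long names
  on.exit(par(old_par))                 # restore the previous margins afterwards
  barplot(top[[column]],
          names.arg = as.character(top$Company),
          las = 2,                      # labels perpendicular to the axis
          cex.names = 0.6,
          main = main)
  invisible(top)
}

# Invented example data
toy_companies <- data.frame(
  Company = c("A", "B", "C", "D"),
  Profits = c(1.2, 45.2, -3.0, 24.1),
  stringsAsFactors = FALSE
)
top_profit <- top_n_by(toy_companies, "Profits", n = 2)
print(top_profit$Company)
```

With the real data, `plot_top_n(forbes_clean, "Market.Value")` would draw the market-value chart in one call.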

06. A Short Explanation

From the data processing stage we can form some preliminary views of the top companies from three main angles:

1st, the companies with the highest profits, such as Apple, ICBC, China Construction Bank, Agricultural Bank of China and Bank of China

2nd, the companies with the highest market value, such as Apple, Alphabet, Microsoft, Amazon and Berkshire Hathaway

3rd, the companies with the highest assets, such as ICBC, Fannie Mae, China Construction Bank, Agricultural Bank of China and Bank of China
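Several companies appear in more than one of these lists. Using the names listed above, base R's set functions show how many distinct companies the three lists contain and where they overlap. A minimal sketch:

```r
# Top-5 names from the three lists above
by_profit <- c("Apple", "ICBC", "China Construction Bank",
               "Agricultural Bank of China", "Bank of China")
by_market_value <- c("Apple", "Alphabet", "Microsoft", "Amazon",
                     "Berkshire Hathaway")
by_assets <- c("ICBC", "Fannie Mae", "China Construction Bank",
               "Agricultural Bank of China", "Bank of China")

# Distinct companies across all three lists
all_names <- Reduce(union, list(by_profit, by_market_value, by_assets))
print(length(all_names))

# Companies that lead on both profit and assets
print(intersect(by_profit, by_assets))

# Companies present in all three lists (none here)
print(Reduce(intersect, list(by_profit, by_market_value, by_assets)))
```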

Recommendation for investment:
From the Forbes top 2000 list, a long-term investment plan could narrow its preliminary focus to the companies named above (ten distinct companies, since several appear in more than one list), because leading on profit, market value or assets suggests the strength to sustain and expand their business.