This data processing report is part of a series of tasks for my course at Full Stack Academy, Algoritma Data Science Education, Jakarta, Indonesia. This first task practises the competencies from the Introduction to Programming in Data Science with base R subject. The data comes from Forbes magazine's report on the top 2000 companies in the world, which I downloaded from Kaggle.com.
Data source: https://www.kaggle.com/ash316/forbes-top-2000-companies
Feel free to send me suggestions for improvement.
a. Check the first 15 records with the head command
b. Check the last 15 records with the tail command
c. Check the data dimensions with the dim command
## [1] 2000 10
d. Column names with names
## [1] "X" "Rank" "Company" "Country" "Sales"
## [6] "Profits" "Assets" "Market.Value" "Sector" "Industry"
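The four inspection steps above might look like the following sketch. The path of the downloaded CSV is not shown in this report, so a small made-up data frame named `forbes_demo` (a hypothetical stand-in, not the Forbes data) is used instead.

```r
# Hypothetical stand-in for the Forbes data; values are made up for illustration.
forbes_demo <- data.frame(
  Rank    = 1:20,
  Company = paste("Company", LETTERS[1:20]),
  Sales   = seq(200, 10, by = -10)
)

head(forbes_demo, 15)   # first 15 rows
tail(forbes_demo, 15)   # last 15 rows
dim(forbes_demo)        # number of rows, then number of columns
names(forbes_demo)      # column names
```

On the real data, the same calls would be applied to the data frame read from the Kaggle file.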
a. Check data structure with str command
## 'data.frame': 2000 obs. of 10 variables:
## $ X : logi NA NA NA NA NA NA ...
## $ Rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Company : Factor w/ 1999 levels "3i Group","3M",..: 921 393 277 1021 1927 43 216 219 124 1824 ...
## $ Country : Factor w/ 61 levels "Argentina","Australia",..: 10 10 59 59 59 10 59 10 59 26 ...
## $ Sales : num 151.4 134.2 222.9 102.5 97.6 ...
## $ Profits : num 42 35 24.1 24.2 21.9 27.8 16.6 24.9 45.2 17.1 ...
## $ Assets : num 3473 3017 621 2513 1943 ...
## $ Market.Value: num 230 200 410 307 274 ...
## $ Sector : Factor w/ 11 levels "","Consumer Discretionary",..: 5 5 5 5 5 5 5 5 8 2 ...
## $ Industry : Factor w/ 81 levels "","Advertising",..: 53 69 50 53 53 69 53 53 19 9 ...
b. Check missing values with colSums(is.na()) and anyNA
## X Rank Company Country Sales Profits
## 2000 0 0 0 0 0
## Assets Market.Value Sector Industry
## 0 0 0 0
## [1] TRUE
c. Remove the unnecessary column X (all 2000 values are missing) and assign the result to forbes_clean
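A minimal sketch of steps a to c, assuming a data frame with an all-NA column `X` like the one shown in the output above. The names `forbes_demo` and `forbes_clean_demo` and all values are hypothetical and only illustrate the calls.

```r
# Hypothetical stand-in with an all-NA column X, mirroring the structure above.
forbes_demo <- data.frame(
  X       = NA,
  Rank    = 1:3,
  Company = c("Company A", "Company B", "Company C"),
  Sales   = c(30.5, 20.1, 10.7)
)

str(forbes_demo)                           # column types and a preview of values

na_counts <- colSums(is.na(forbes_demo))   # missing values per column
na_counts
anyNA(forbes_demo)                         # TRUE if any value is missing

# Drop the X column by name and keep everything else.
forbes_clean_demo <- forbes_demo[, names(forbes_demo) != "X"]
anyNA(forbes_clean_demo)                   # no missing values remain in this demo
```

Dropping the column by name rather than by position keeps the step valid even if the column order changes.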
a. Brief Explanation with summary command
## Rank Company Country
## Min. : 1.0 Merck : 2 United States :564
## 1st Qu.: 500.8 3i Group : 1 Japan :229
## Median :1000.5 3M : 1 China :200
## Mean :1000.5 77 Bank : 1 United Kingdom: 91
## 3rd Qu.:1500.2 AAC Technologies Holdings: 1 South Korea : 64
## Max. :2000.0 Aareal Bank : 1 Hong Kong : 62
## (Other) :1993 (Other) :790
## Sales Profits Assets Market.Value
## Min. : 0.001 Min. :-13.000 Min. : 0.001 Min. : 0.072
## 1st Qu.: 4.000 1st Qu.: 0.318 1st Qu.: 10.875 1st Qu.: 6.600
## Median : 8.800 Median : 0.613 Median : 22.900 Median : 11.950
## Mean : 17.665 Mean : 1.241 Mean : 84.534 Mean : 24.418
## 3rd Qu.: 17.425 3rd Qu.: 1.300 3rd Qu.: 52.400 3rd Qu.: 24.400
## Max. :485.300 Max. : 45.200 Max. :3473.200 Max. :752.000
##
## Sector Industry
## Financials :583 : 491
## Consumer Discretionary:237 Regional Banks : 183
## Industrials :209 Oil & Gas Operations: 67
## :197 Electric Utilities : 64
## Materials :174 Investment Services : 62
## Information Technology:130 Major Banks : 61
## (Other) :470 (Other) :1072
Summary:
1. The United States has the highest number of companies, followed by Japan and China
2. The Financials sector has the highest number of companies (197 records have no sector label)
3. Regional Banks is the industry with the highest number of companies among labelled records (491 records have no industry label)
4. Average Sales across all companies is 17.665, Median is 8.800, Max is 485.300
5. Average Profit across all companies is 1.241, Median is 0.613, Max is 45.200
6. Average Total Assets across all companies is 84.534, Median is 22.900, Max is 3473.200
7. Average Market Value across all companies is 24.418, Median is 11.950, Max is 752.000
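The category rankings in this summary come from frequency counts, and blank labels can rank first if they are not set aside. The sketch below illustrates this with a made-up vector `sector_demo` (hypothetical values, not the Forbes data).

```r
# Made-up sector labels for illustration; "" mimics the blank entries seen above.
sector_demo <- c("Financials", "Financials", "Industrials", "", "", "", "Materials")

# Count each label and sort from most to least frequent.
counts <- sort(table(sector_demo), decreasing = TRUE)
counts                                    # the blank label ranks first here

# Set the blank label aside before naming the largest category.
labelled <- counts[names(counts) != ""]
names(labelled)[1]                        # most frequent labelled sector
```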
b. Check Outlier
Outlier profit
Outlier Sales
Outlier Assets
Outlier Market Value
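One common way to check for outliers in base R is the box plot rule, sketched below. The code that produced the original plots is not shown in this report, so this is an assumption about the approach, and `profits_demo` holds made-up values.

```r
# Made-up profit values for illustration; one value is far from the rest.
profits_demo <- c(0.2, 0.3, 0.5, 0.6, 0.8, 1.1, 1.3, 40)

# boxplot.stats() flags points lying more than 1.5 times the box length
# beyond the hinges (the default coef = 1.5).
outliers <- boxplot.stats(profits_demo)$out
outliers

# boxplot() draws the same flagged points individually.
boxplot(profits_demo, horizontal = TRUE, main = "Outlier profit (demo data)")
```

The same two calls could be repeated for Sales, Assets and Market.Value.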
a. Companies with the highest Sales
b. Companies with the highest Profits
c. Companies with the highest Assets
d. Companies with the highest Market Value
e. List of companies with highest profit
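# Sort the rows by profit, largest first
forbes_clean_profit <- order(forbes_clean$Profits, decreasing = TRUE)
pro <- forbes_clean[forbes_clean_profit, ]
head(pro)

# Bar plot of the 20 highest profits; barplot() returns the bar midpoints
histpro <- barplot(head(pro$Profits, 20),
                   cex.axis = 1.0, cex.names = 0.8,
                   ylim = c(0, 60), main = "Companies with Highest Profit")

# Company names as rotated labels below the bars
# (xpd = TRUE allows drawing outside the plot region)
text(x = histpro, y = -2,
     labels = head(pro$Company, 20),
     offset = -0.1, cex = 0.6, srt = 45, xpd = TRUE, pos = 2)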
f. List of companies with highest market value
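# Sort the rows by market value, largest first
forbes_clean_Market <- order(forbes_clean$Market.Value, decreasing = TRUE)
mv <- forbes_clean[forbes_clean_Market, ]
head(mv)

# Bar plot of the 20 highest market values; barplot() returns the bar midpoints
histmv <- barplot(head(mv$Market.Value, 20),
                  cex.axis = 1.0, cex.names = 0.8,
                  ylim = c(0, 1000), main = "Companies with Highest Market Value")

# Company names as rotated labels below the bars
text(x = histmv, y = -60,
     labels = head(mv$Company, 20),
     offset = -0.1, cex = 0.6, srt = 45, xpd = TRUE, pos = 2)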
g. List of companies with highest assets
# Sort the rows by assets, largest first
forbes_clean_assets <- order(forbes_clean$Assets, decreasing = TRUE)
asset <- forbes_clean[forbes_clean_assets, ]
head(asset)

# Bar plot of the 20 largest asset values; barplot() returns the bar midpoints
histasset <- barplot(head(asset$Assets, 20),
                     cex.axis = 1.0, cex.names = 0.8,
                     ylim = c(0, 4000), main = "Companies with Highest Assets")

# Company names as rotated labels below the bars
text(x = histasset, y = -100,
     labels = head(asset$Company, 20),
     offset = -0.1, cex = 0.6, srt = 45, xpd = TRUE, pos = 2)
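The three listings above repeat the same order-then-head pattern, which also applies to the Sales ranking in item a. A small helper could factor it out. The function `top_n_by` and the data frame `demo` below are hypothetical names with made-up values, shown only as a sketch.

```r
# Hypothetical helper: returns the n rows with the largest values in `column`.
top_n_by <- function(data, column, n = 20) {
  idx <- order(data[[column]], decreasing = TRUE)
  head(data[idx, , drop = FALSE], n)
}

# Made-up data for illustration only.
demo <- data.frame(
  Company = paste("Company", LETTERS[1:6]),
  Sales   = c(12, 48, 5, 30, 22, 9),
  Profits = c(1.5, 0.4, 2.2, 3.1, 0.9, 0.1)
)

top_sales <- top_n_by(demo, "Sales", n = 3)
top_sales

# names.arg places the company labels under the bars; las = 2 rotates them.
barplot(top_sales$Sales, names.arg = as.character(top_sales$Company),
        las = 2, cex.names = 0.8, main = "Companies with Highest Sales (demo)")
```

Passing the labels through `names.arg` is an alternative to placing them with a separate `text()` call, at the cost of less control over the label angle.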
From the data processing stage, we can form some preliminary views about these top companies. We can look at the companies from three main angles:
1st, the companies with the highest profits, such as Apple, ICBC, China Construction Bank, Agricultural Bank of China and Bank of China
2nd, the companies with the highest market value, such as Apple, Alphabet, Microsoft, Amazon and Berkshire Hathaway
3rd, the companies with the highest assets, such as ICBC, Fannie Mae, China Construction Bank, Agricultural Bank of China and Bank of China
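Several names appear on more than one of these lists. The overlap can be checked with base R set functions, as in this sketch that uses only the company names listed above (the vector names are illustrative).

```r
# Top-five names as listed in the text above.
top_profit <- c("Apple", "ICBC", "China Construction Bank",
                "Agricultural Bank of China", "Bank of China")
top_market <- c("Apple", "Alphabet", "Microsoft", "Amazon", "Berkshire Hathaway")
top_assets <- c("ICBC", "Fannie Mae", "China Construction Bank",
                "Agricultural Bank of China", "Bank of China")

# Distinct companies across the three lists.
all_names <- unique(c(top_profit, top_market, top_assets))
length(all_names)

intersect(top_profit, top_assets)   # on both the profit and the asset list
intersect(top_profit, top_market)   # on both the profit and the market value list
```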
Recommendation for investment:
From the Forbes top 2000 companies, we could narrow our preliminary long-term investment view to the companies on these three lists (15 entries, 10 distinct companies), because on these measures they appear to have the highest potential and strength to sustain and expand their business.