This document is the Final Project submission for CUNY R Bridge Summer 2018.
R Bridge Course Final Project
This is a final project to show off what you have learned. Select your data set from the list below: http://vincentarelbundock.github.io/Rdatasets/ (click on the csv index for a list). Another good source is found here: https://archive.ics.uci.edu/ml/datasets.html
The presentation approach is up to you but it should contain the following:
1. Data Exploration: This should include summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.
2. Data wrangling: Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example, if it makes sense, you could sum two columns together).
3. Graphics: Please make sure to display at least one scatter plot, box plot and histogram. Don’t be limited to this. Please explore the many other options in R packages such as ggplot2.
4. Meaningful question for analysis: Please state at the beginning a meaningful question for analysis. Use the first three steps and anything else that would be helpful to answer the question you are posing from the data set you chose. Please write a brief conclusion paragraph in R Markdown at the end.
5. BONUS: Place the original .csv in a GitHub file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.
Data Set: This project uses the Forbes2000 data set from the “HSAUR” package. It contains the Forbes 2000 ranking of the world’s biggest companies.
Introduction: The Forbes 2000 list is a ranking of the world’s biggest companies, measured by sales, profits, assets and market value.
Format: data frame with 2000 observations on the following 8 variables.
rank - the ranking of the company.
name - the name of the company.
country - a factor giving the country the company is situated in.
category - a factor describing the products the company produces.
sales - the amount of sales of the company in billion USD.
profits - the profit of the company in billion USD.
assets - the assets of the company in billion USD.
marketvalue - the market value of the company in billion USD.
Loading the relevant packages.
Completed the loading of relevant packages.
## [1] "Structure of the data set - Forbes2000 is : "
## 'data.frame': 2000 obs. of 8 variables:
## $ rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ name : chr "Citigroup" "General Electric" "American Intl Group" "ExxonMobil" ...
## $ country : Factor w/ 61 levels "Africa","Australia",..: 60 60 60 60 56 60 56 28 60 60 ...
## $ category : Factor w/ 27 levels "Aerospace & defense",..: 2 6 16 19 19 2 2 8 9 20 ...
## $ sales : num 94.7 134.2 76.7 222.9 232.6 ...
## $ profits : num 17.85 15.59 6.46 20.96 10.27 ...
## $ assets : num 1264 627 648 167 178 ...
## $ marketvalue: num 255 329 195 277 174 ...
## [1] "Summary of all vectors of the data set - Forbes 2000 is : "
## rank name country
## Min. : 1.0 Length:2000 United States :751
## 1st Qu.: 500.8 Class :character Japan :316
## Median :1000.5 Mode :character United Kingdom:137
## Mean :1000.5 Germany : 65
## 3rd Qu.:1500.2 France : 63
## Max. :2000.0 Canada : 56
## (Other) :612
## category sales profits
## Banking : 313 Min. : 0.010 Min. :-25.8300
## Diversified financials: 158 1st Qu.: 2.018 1st Qu.: 0.0800
## Insurance : 112 Median : 4.365 Median : 0.2000
## Utilities : 110 Mean : 9.697 Mean : 0.3811
## Materials : 97 3rd Qu.: 9.547 3rd Qu.: 0.4400
## Oil & gas operations : 90 Max. :256.330 Max. : 20.9600
## (Other) :1120 NA's :5
## assets marketvalue
## Min. : 0.270 Min. : 0.02
## 1st Qu.: 4.025 1st Qu.: 2.72
## Median : 9.345 Median : 5.15
## Mean : 34.042 Mean : 11.88
## 3rd Qu.: 22.793 3rd Qu.: 10.60
## Max. :1264.030 Max. :328.54
##
The output above shows the structure and summary statistics for the Forbes 2000 companies. Sales, assets and market value are all strongly right-skewed: in each case the mean is well above the median (for example, mean sales of 9.697 versus a median of 4.365 billion USD). The rest of this document presents descriptive analysis and inferences based on this data.
Initial conclusions:
a. The data supports useful rankings, such as the top 5 companies in the world by market value, by sales and by profits.
b. The companies can be grouped by sector (category), country and continent.
c. The key findings are summarized at the end of the document.
A new column, continent, is derived from the country column. Some rows list more than one country, so the conversion code emits warning messages; these are suppressed in this document. For such rows, the continent column contains the text ‘Located in more than 1 country’.
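The derivation of the continent column can be sketched as follows. This is a minimal base R illustration, not the code used to produce the report: the lookup table `continent_lookup`, the helper `add_continent`, the made-up fourth sample row and the assumption that multi-country entries contain a "/" separator are all hypothetical.

```r
# Hypothetical lookup table: only the countries in the sample rows are covered.
continent_lookup <- c(
  "United States"  = "Americas",
  "United Kingdom" = "Europe",
  "Japan"          = "Asia"
)

# Illustrative helper (not taken from the original report).
add_continent <- function(df, lookup) {
  country <- as.character(df$country)
  # Named-vector lookup; countries missing from the table become NA.
  continent <- unname(lookup[country])
  # Assumption: entries spanning several countries contain a "/" separator.
  multi <- grepl("/", country, fixed = TRUE)
  continent[multi] <- "Located in more than 1 country"
  df$continent <- continent
  df
}

# Three rows from the report plus one made-up multi-country row.
sample_rows <- data.frame(
  name    = c("Citigroup", "BP", "Toyota Motor", "Hypothetical Dual-Listed Co"),
  country = c("United States", "United Kingdom", "Japan",
              "Netherlands/ United Kingdom"),
  stringsAsFactors = FALSE
)

result <- add_continent(sample_rows, continent_lookup)
print(result)
```

Handling the multi-country rows explicitly, as above, avoids the conversion warnings instead of suppressing them.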
## [1] "Some of the top rows of the Forbes2000 along with the new column - continent : "
## rank name country category sales
## 1 1 Citigroup United States Banking 94.71
## 2 2 General Electric United States Conglomerates 134.19
## 3 3 American Intl Group United States Insurance 76.66
## 4 4 ExxonMobil United States Oil & gas operations 222.88
## 5 5 BP United Kingdom Oil & gas operations 232.57
## 6 6 Bank of America United States Banking 49.01
## 7 7 HSBC Group United Kingdom Banking 44.33
## 8 8 Toyota Motor Japan Consumer durables 135.82
## 9 9 Fannie Mae United States Diversified financials 53.13
## 10 10 Wal-Mart Stores United States Retailing 256.33
## profits assets marketvalue continent
## 1 17.85 1264.03 255.30 Americas
## 2 15.59 626.93 328.54 Americas
## 3 6.46 647.66 194.87 Americas
## 4 20.96 166.99 277.02 Americas
## 5 10.27 177.57 173.54 Europe
## 6 10.81 736.45 117.55 Americas
## 7 6.66 757.60 177.96 Europe
## 8 7.99 171.71 115.40 Asia
## 9 6.48 1019.17 76.84 Americas
## 10 9.05 104.91 243.74 Americas
## [1] "Top 5 companies according to market value : "
## name country category sales profits
## 2 General Electric United States Conglomerates 134.19 15.59
## 31 Microsoft United States Software & services 34.27 8.88
## 24 Pfizer United States Drugs & biotechnology 40.36 6.20
## 4 ExxonMobil United States Oil & gas operations 222.88 20.96
## 1 Citigroup United States Banking 94.71 17.85
## assets marketvalue continent
## 2 626.93 328.54 Americas
## 31 85.94 287.02 Americas
## 24 120.06 285.27 Americas
## 4 166.99 277.02 Americas
## 1 1264.03 255.30 Americas
## [1] "Top 5 companies according to sales : "
## name country category sales profits
## 10 Wal-Mart Stores United States Retailing 256.33 9.05
## 5 BP United Kingdom Oil & gas operations 232.57 10.27
## 4 ExxonMobil United States Oil & gas operations 222.88 20.96
## 29 General Motors United States Consumer durables 185.52 3.82
## 75 Ford Motor United States Consumer durables 164.20 0.76
## assets marketvalue continent
## 10 104.91 243.74 Americas
## 5 177.57 173.54 Europe
## 4 166.99 277.02 Americas
## 29 450.00 27.47 Americas
## 75 312.56 26.29 Americas
## [1] "Top 5 companies according to profits : "
## name country category sales profits
## 4 ExxonMobil United States Oil & gas operations 222.88 20.96
## 1 Citigroup United States Banking 94.71 17.85
## 2 General Electric United States Conglomerates 134.19 15.59
## 6 Bank of America United States Banking 49.01 10.81
## 5 BP United Kingdom Oil & gas operations 232.57 10.27
## assets marketvalue continent
## 4 166.99 277.02 Americas
## 1 1264.03 255.30 Americas
## 2 626.93 328.54 Americas
## 6 736.45 117.55 Americas
## 5 177.57 173.54 Europe
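The three rankings above can be reproduced with a single sort-and-truncate step. The sketch below is one possible approach, not necessarily the one used for this report; the helper name `top_by` is hypothetical. It uses only the ten rows printed earlier, so its results match the full-data rankings only where the top companies happen to be among those ten rows.

```r
# The ten highest-ranked rows, copied from the output shown earlier.
forbes_top10 <- data.frame(
  name = c("Citigroup", "General Electric", "American Intl Group",
           "ExxonMobil", "BP", "Bank of America", "HSBC Group",
           "Toyota Motor", "Fannie Mae", "Wal-Mart Stores"),
  sales = c(94.71, 134.19, 76.66, 222.88, 232.57,
            49.01, 44.33, 135.82, 53.13, 256.33),
  profits = c(17.85, 15.59, 6.46, 20.96, 10.27,
              10.81, 6.66, 7.99, 6.48, 9.05),
  marketvalue = c(255.30, 328.54, 194.87, 277.02, 173.54,
                  117.55, 177.96, 115.40, 76.84, 243.74),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort by one column in descending order, keep n rows.
# order() places NA values last by default, so rows with a missing value
# never displace real values at the top.
top_by <- function(df, column, n = 5) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked, n)
}

top_sales   <- top_by(forbes_top10, "sales")
top_profits <- top_by(forbes_top10, "profits")
print(top_sales)
print(top_profits)
```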
## [1] "Number of companies in Forbes2000 List - country wise"
## [1] "Number of companies in Forbes2000 List - continent wise"
## [1] "Number of companies in Forbes2000 List - category wise"
## [1] "Market value vs Category"
## [1] "Total assets by category for Forbes2000 companies shown below: "
## category assets
## 2 Banking 29653.55
## 9 Diversified financials 10792.00
## 16 Insurance 7790.93
## 27 Utilities 2296.86
## 8 Consumer durables 2249.91
## 24 Telecommunications services 1908.00
## 19 Oil & gas operations 1793.82
## 6 Conglomerates 1093.81
## 26 Transportation 1076.79
## 18 Media 958.24
## 17 Materials 927.21
## 11 Food drink & tobacco 830.62
## 20 Retailing 718.47
## 10 Drugs & biotechnology 662.42
## 23 Technology hardware & equipment 628.91
## 7 Construction 580.23
## 5 Chemicals 533.25
## 4 Capital goods 499.01
## 3 Business services & supplies 498.53
## 13 Health care equipment & services 465.21
## 25 Trading companies 366.10
## 1 Aerospace & defense 349.81
## 15 Household & personal products 317.51
## 12 Food markets 306.54
## 14 Hotels restaurants & leisure 287.82
## 22 Software & services 260.46
## 21 Semiconductors 237.69
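A table like the one above can be built by summing assets within each category and sorting the totals. The following is a minimal sketch using base R's `aggregate`, assuming this is roughly how the totals were computed. It uses only the ten rows printed earlier, so the totals are much smaller than those of the full data set.

```r
# Category and assets for the ten rows printed earlier in this document.
forbes_top10 <- data.frame(
  category = c("Banking", "Conglomerates", "Insurance",
               "Oil & gas operations", "Oil & gas operations",
               "Banking", "Banking", "Consumer durables",
               "Diversified financials", "Retailing"),
  assets = c(1264.03, 626.93, 647.66, 166.99, 177.57,
             736.45, 757.60, 171.71, 1019.17, 104.91),
  stringsAsFactors = FALSE
)

# Sum assets within each category.
totals <- aggregate(assets ~ category, data = forbes_top10, FUN = sum)

# Sort so the category with the largest total comes first.
totals <- totals[order(totals$assets, decreasing = TRUE), ]
print(totals)
```

Even in this small sample the financial categories dominate, which agrees with the full table, where Banking, Diversified financials and Insurance hold the largest totals.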
## [1] "Assets vs Category for top 10 categories by assets"
## [1] "Market value density of the Forbes2000 companies"
## [1] "Market value density of the Forbes2000 companies - category wise"
Final Conclusions:
## [1] "Top 3 countries with the maximum presence in Forbes2000 Top companies of the world are:"
## country freq
## 60 United States 751
## 28 Japan 316
## 56 United Kingdom 137
## [1] "Top 3 continents by number of companies in Forbes2000 are:"
## continent freq
## 2 Americas 871
## 3 Asia 548
## 4 Europe 515
## [1] "Top 10 categories by number of companies in Forbes2000 are:"
## category freq
## 2 Banking 313
## 9 Diversified financials 158
## 16 Insurance 112
## 27 Utilities 110
## 17 Materials 97
## 19 Oil & gas operations 90
## 20 Retailing 88
## 11 Food drink & tobacco 83
## 26 Transportation 80
## 7 Construction 79
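Frequency tables like the three above count the rows in each group and sort the counts. The sketch below shows one way to do this in base R; the `freq` column name is chosen to mirror the output above, and the actual report may have used a different function. Only the ten rows printed earlier are used, so the counts are illustrative.

```r
# Countries of the ten rows printed earlier in this document.
country <- c("United States", "United States", "United States",
             "United States", "United Kingdom", "United States",
             "United Kingdom", "Japan", "United States", "United States")

# Count rows per country and convert the table to a data frame.
counts <- as.data.frame(table(country = country),
                        responseName = "freq",
                        stringsAsFactors = FALSE)

# Sort by frequency, largest first, and keep the top 3.
counts <- counts[order(counts$freq, decreasing = TRUE), ]
top3 <- head(counts, 3)
print(top3)
```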
## [1] "Summary (range and median) of market value for the Forbes2000 companies:"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.02 2.72 5.15 11.88 10.60 328.54
## [1] "List of companies with more than 300 billion USD of market value:"
## name country category marketvalue
## 2 General Electric United States Conglomerates 328.54
## [1] "List of companies with more than 200 billion USD of market value:"
## name country category marketvalue
## 1 Citigroup United States Banking 255.30
## 2 General Electric United States Conglomerates 328.54
## 4 ExxonMobil United States Oil & gas operations 277.02
## 10 Wal-Mart Stores United States Retailing 243.74
## 24 Pfizer United States Drugs & biotechnology 285.27
## 31 Microsoft United States Software & services 287.02
## [1] "Based on the market value density of the Forbes2000 companies, the maximum market value is above 300 billion USD, but only 1 company reaches that level. Only 6 companies, including that one, have a market value of over 200 billion USD. The median is about 5 billion USD, which is far below the maximum market value. "
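The threshold lists above come down to simple row filters on marketvalue. The sketch below illustrates this with eight rows copied from the earlier output (the six companies above 200 billion USD plus BP and Toyota Motor for contrast); the helper name `above` is hypothetical.

```r
# Market values (billion USD) copied from the output shown earlier.
companies <- data.frame(
  name = c("Citigroup", "General Electric", "ExxonMobil", "Wal-Mart Stores",
           "Pfizer", "Microsoft", "BP", "Toyota Motor"),
  marketvalue = c(255.30, 328.54, 277.02, 243.74,
                  285.27, 287.02, 173.54, 115.40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose market value exceeds a threshold.
above <- function(df, threshold) {
  df[df$marketvalue > threshold, ]
}

over_300 <- above(companies, 300)
over_200 <- above(companies, 200)

print(over_300)
print(over_200)
```

On these rows, `over_300` contains only General Electric and `over_200` contains six companies, in line with the lists above.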
The last segment reads the csv file from the raw GitHub link into a data.frame through R code.
Forbes2000_from_github_2 <- read.csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/HSAUR/Forbes2000.csv", header = TRUE, sep = ",")