This document is the Final Project submission for CUNY R Bridge Summer 2018.

R Bridge Course Final Project

This is a final project to show off what you have learned. Select your data set from the list below: http://vincentarelbundock.github.io/Rdatasets/ (click on the csv index for a list). Another good source is found here: https://archive.ics.uci.edu/ml/datasets.html

The presentation approach is up to you but it should contain the following:

  1. Data Exploration: This should include summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.
  2. Data wrangling: Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example, if it makes sense you could sum two columns together).
  3. Graphics: Please make sure to display at least one scatter plot, box plot and histogram. Don’t be limited to this. Please explore the many other options in R packages such as ggplot2.
  4. Meaningful question for analysis: Please state at the beginning a meaningful question for analysis. Use the first three steps and anything else that would be helpful to answer the question you are posing from the data set you chose. Please write a brief conclusion paragraph in R Markdown at the end.
  5. BONUS - place the original .csv in a GitHub file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.

Data Set: For this project, I used the Forbes2000 data set from the “HSAUR” package. It contains the Forbes 2000 ranking of the world’s biggest companies.

Introduction: The Forbes 2000 list is a ranking of the world’s biggest companies, measured by sales, profits, assets and market value.

Format: data frame with 2000 observations on the following 8 variables.

rank - the ranking of the company.
name - the name of the company.
country - a factor giving the country the company is situated in.
category - a factor describing the products the company produces.
sales - the amount of sales of the company in billion USD.
profits - the profit of the company in billion USD.
assets - the assets of the company in billion USD.
marketvalue - the market value of the company in billion USD.

Loading the relevant packages.

Completed the loading of relevant packages.

  1. Important Statistics and relevant information about the data set.
## [1] "Structure of the data set - Forbes2000 is : "
## 'data.frame':    2000 obs. of  8 variables:
##  $ rank       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ name       : chr  "Citigroup" "General Electric" "American Intl Group" "ExxonMobil" ...
##  $ country    : Factor w/ 61 levels "Africa","Australia",..: 60 60 60 60 56 60 56 28 60 60 ...
##  $ category   : Factor w/ 27 levels "Aerospace & defense",..: 2 6 16 19 19 2 2 8 9 20 ...
##  $ sales      : num  94.7 134.2 76.7 222.9 232.6 ...
##  $ profits    : num  17.85 15.59 6.46 20.96 10.27 ...
##  $ assets     : num  1264 627 648 167 178 ...
##  $ marketvalue: num  255 329 195 277 174 ...
## [1] "Summary of all vectors of the data set - Forbes 2000 is : "
##       rank            name                     country   
##  Min.   :   1.0   Length:2000        United States :751  
##  1st Qu.: 500.8   Class :character   Japan         :316  
##  Median :1000.5   Mode  :character   United Kingdom:137  
##  Mean   :1000.5                      Germany       : 65  
##  3rd Qu.:1500.2                      France        : 63  
##  Max.   :2000.0                      Canada        : 56  
##                                      (Other)       :612  
##                    category        sales            profits        
##  Banking               : 313   Min.   :  0.010   Min.   :-25.8300  
##  Diversified financials: 158   1st Qu.:  2.018   1st Qu.:  0.0800  
##  Insurance             : 112   Median :  4.365   Median :  0.2000  
##  Utilities             : 110   Mean   :  9.697   Mean   :  0.3811  
##  Materials             :  97   3rd Qu.:  9.547   3rd Qu.:  0.4400  
##  Oil & gas operations  :  90   Max.   :256.330   Max.   : 20.9600  
##  (Other)               :1120                     NA's   :5         
##      assets          marketvalue    
##  Min.   :   0.270   Min.   :  0.02  
##  1st Qu.:   4.025   1st Qu.:  2.72  
##  Median :   9.345   Median :  5.15  
##  Mean   :  34.042   Mean   : 11.88  
##  3rd Qu.:  22.793   3rd Qu.: 10.60  
##  Max.   :1264.030   Max.   :328.54  
## 
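The structure and summary printed above are the kind of output that `str()` and `summary()` produce. Here is a minimal sketch, assuming a ten-row excerpt typed in from the table of top-ranked rows in this document. The name `forbes_sample` is made up for illustration. With HSAUR installed, `data("Forbes2000", package = "HSAUR")` would load the full data frame instead.

```r
# Ten-row excerpt copied from the top rows shown in this document.
# 'forbes_sample' is an illustrative name, not an object from the report.
forbes_sample <- data.frame(
  name = c("Citigroup", "General Electric", "American Intl Group",
           "ExxonMobil", "BP", "Bank of America", "HSBC Group",
           "Toyota Motor", "Fannie Mae", "Wal-Mart Stores"),
  sales = c(94.71, 134.19, 76.66, 222.88, 232.57,
            49.01, 44.33, 135.82, 53.13, 256.33),
  profits = c(17.85, 15.59, 6.46, 20.96, 10.27,
              10.81, 6.66, 7.99, 6.48, 9.05),
  marketvalue = c(255.30, 328.54, 194.87, 277.02, 173.54,
                  117.55, 177.96, 115.40, 76.84, 243.74),
  stringsAsFactors = FALSE
)

# Column types and a preview of the values.
str(forbes_sample)

# Min, quartiles, median, mean and max for every numeric column.
summary(forbes_sample)

# Individual statistics; na.rm = TRUE matters on the full data,
# where the profits column has missing values.
mv_median <- median(forbes_sample$marketvalue, na.rm = TRUE)
mv_quartiles <- quantile(forbes_sample$marketvalue, na.rm = TRUE)

# Number of missing values per column.
na_counts <- colSums(is.na(forbes_sample))
```

The excerpt only covers the ten highest-ranked companies, so its statistics are far larger than those of the full 2000-row data set shown above.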

The data set describes the companies on the Forbes 2000 list. With the structure and summary statistics in hand, the rest of this document presents descriptive analysis and inferences drawn from this data.

Initial conclusions:
a. The data lets us extract key facts such as the top 5 companies in the world by market value, the top 5 companies by sales, and so on.
b. We will group the companies by sector, country and continent.
c. Finally, we will summarize the main findings at the end.

  2. Some data wrangling relevant to the data set: Creating a new column - continent - holding the continent in which each company’s country is located. This helps show how geography relates to a company’s presence on the Forbes2000 list.

Warnings are suppressed while creating the continent column: some rows list more than 1 country, and the code emits a warning message for each of them, which is hidden in this document. For such rows, the continent column contains the text ‘Located in more than 1 country’.
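The code that built the continent column is not shown in this document. The following is only a base R sketch of one way to get the same effect, assuming a hand-written lookup table. The lookup table, the variable names and the spelling of the multi-country value are all assumptions made for illustration.

```r
# Hypothetical excerpt of the country column; the last value stands in for
# a row that lists more than one country (its exact spelling is assumed).
country <- factor(c("United States", "United Kingdom", "Japan",
                    "Netherlands/ United Kingdom"))

# Hand-written lookup table (an assumption; a real one needs every country).
continent_lookup <- c("United States"  = "Americas",
                      "United Kingdom" = "Europe",
                      "Japan"          = "Asia")

# Index by character, not by factor: a factor would be used as integer codes.
continent <- unname(continent_lookup[as.character(country)])

# Rows without a single matching country get the fallback text.
continent[is.na(continent)] <- "Located in more than 1 country"

data.frame(country, continent)
```

Because unmatched values are handled explicitly here, this variant produces no warnings and nothing needs to be suppressed.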

## [1] "Some of the top rows of the Forbes2000 along with the new column - continent : "
##    rank                name        country               category  sales
## 1     1           Citigroup  United States                Banking  94.71
## 2     2    General Electric  United States          Conglomerates 134.19
## 3     3 American Intl Group  United States              Insurance  76.66
## 4     4          ExxonMobil  United States   Oil & gas operations 222.88
## 5     5                  BP United Kingdom   Oil & gas operations 232.57
## 6     6     Bank of America  United States                Banking  49.01
## 7     7          HSBC Group United Kingdom                Banking  44.33
## 8     8        Toyota Motor          Japan      Consumer durables 135.82
## 9     9          Fannie Mae  United States Diversified financials  53.13
## 10   10     Wal-Mart Stores  United States              Retailing 256.33
##    profits  assets marketvalue continent
## 1    17.85 1264.03      255.30  Americas
## 2    15.59  626.93      328.54  Americas
## 3     6.46  647.66      194.87  Americas
## 4    20.96  166.99      277.02  Americas
## 5    10.27  177.57      173.54    Europe
## 6    10.81  736.45      117.55  Americas
## 7     6.66  757.60      177.96    Europe
## 8     7.99  171.71      115.40      Asia
## 9     6.48 1019.17       76.84  Americas
## 10    9.05  104.91      243.74  Americas
  3. Some important and insightful statistics from the data set:
## [1] "Top 5 companies according to market value : "
##                name       country              category  sales profits
## 2  General Electric United States         Conglomerates 134.19   15.59
## 31        Microsoft United States   Software & services  34.27    8.88
## 24           Pfizer United States Drugs & biotechnology  40.36    6.20
## 4        ExxonMobil United States  Oil & gas operations 222.88   20.96
## 1         Citigroup United States               Banking  94.71   17.85
##     assets marketvalue continent
## 2   626.93      328.54  Americas
## 31   85.94      287.02  Americas
## 24  120.06      285.27  Americas
## 4   166.99      277.02  Americas
## 1  1264.03      255.30  Americas
## [1] "Top 5 companies according to sales : "
##               name        country             category  sales profits
## 10 Wal-Mart Stores  United States            Retailing 256.33    9.05
## 5               BP United Kingdom Oil & gas operations 232.57   10.27
## 4       ExxonMobil  United States Oil & gas operations 222.88   20.96
## 29  General Motors  United States    Consumer durables 185.52    3.82
## 75      Ford Motor  United States    Consumer durables 164.20    0.76
##    assets marketvalue continent
## 10 104.91      243.74  Americas
## 5  177.57      173.54    Europe
## 4  166.99      277.02  Americas
## 29 450.00       27.47  Americas
## 75 312.56       26.29  Americas
## [1] "Top 5 companies according to profits : "
##               name        country             category  sales profits
## 4       ExxonMobil  United States Oil & gas operations 222.88   20.96
## 1        Citigroup  United States              Banking  94.71   17.85
## 2 General Electric  United States        Conglomerates 134.19   15.59
## 6  Bank of America  United States              Banking  49.01   10.81
## 5               BP United Kingdom Oil & gas operations 232.57   10.27
##    assets marketvalue continent
## 4  166.99      277.02  Americas
## 1 1264.03      255.30  Americas
## 2  626.93      328.54  Americas
## 6  736.45      117.55  Americas
## 5  177.57      173.54    Europe
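Rankings like the three top-5 tables above can be produced by sorting the data frame on one column and keeping the first rows. Here is a minimal sketch on a ten-row excerpt copied from this document (`forbes_sample` is an illustrative name). Since the excerpt holds only the ten highest-ranked companies, its top 5 by market value differs from the full-data result above.

```r
# Ten-row excerpt copied from the top rows shown in this document.
forbes_sample <- data.frame(
  name = c("Citigroup", "General Electric", "American Intl Group",
           "ExxonMobil", "BP", "Bank of America", "HSBC Group",
           "Toyota Motor", "Fannie Mae", "Wal-Mart Stores"),
  sales = c(94.71, 134.19, 76.66, 222.88, 232.57,
            49.01, 44.33, 135.82, 53.13, 256.33),
  marketvalue = c(255.30, 328.54, 194.87, 277.02, 173.54,
                  117.55, 177.96, 115.40, 76.84, 243.74),
  stringsAsFactors = FALSE
)

# order() returns row positions sorted by the given column;
# decreasing = TRUE puts the largest value first.
by_value <- forbes_sample[order(forbes_sample$marketvalue, decreasing = TRUE), ]
top5_value <- head(by_value, 5)

# Same pattern for sales.
by_sales <- forbes_sample[order(forbes_sample$sales, decreasing = TRUE), ]
top5_sales <- head(by_sales, 5)

top5_value
top5_sales
```

Row names are kept through the sort, which is why the tables above show the original rank positions (2, 31, 24, ...) in the left margin.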
## [1] "Number of companies in Forbes2000 List - country wise"

## [1] "Number of companies in Forbes2000 List - continent wise"

## [1] "Number of companies in Forbes2000 List - category wise"
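Counts per country, continent or category can be obtained with `table()`. The sketch below uses the countries of the ten top-ranked rows from this document and draws a base R bar chart. How the original charts were drawn is not recoverable from this document, so the plotting call is an assumption.

```r
# Countries of the ten top-ranked rows shown in this document.
country <- c("United States", "United States", "United States",
             "United States", "United Kingdom", "United States",
             "United Kingdom", "Japan", "United States", "United States")

# Count occurrences and sort so the most frequent country comes first.
country_counts <- sort(table(country), decreasing = TRUE)
country_counts

# Base R bar chart of the counts (one bar per country).
barplot(country_counts,
        main = "Number of companies per country (10-row excerpt)",
        ylab = "Companies")
```

The same two steps apply unchanged to the continent and category columns.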

## [1] "Market value vs Category"
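One way to show market value against category is a box plot per category. Whether the original figure was a box plot is not recoverable here, so treat this as an assumption. The sketch uses the ten top-ranked rows from this document, which leaves most categories with a single company.

```r
# Category and market value of the ten top-ranked rows in this document.
forbes_sample <- data.frame(
  category = factor(c("Banking", "Conglomerates", "Insurance",
                      "Oil & gas operations", "Oil & gas operations",
                      "Banking", "Banking", "Consumer durables",
                      "Diversified financials", "Retailing")),
  marketvalue = c(255.30, 328.54, 194.87, 277.02, 173.54,
                  117.55, 177.96, 115.40, 76.84, 243.74)
)

# One box per category; las = 2 rotates the long category labels.
b <- boxplot(marketvalue ~ category, data = forbes_sample,
             las = 2, xlab = "",
             ylab = "Market value (billion USD)",
             main = "Market value vs Category (10-row excerpt)")

# boxplot() also returns the statistics it drew: group names and sizes.
b$names
b$n
```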

## [1] "Total assets by category for Forbes2000 companies shown below: "
##                            category   assets
## 2                           Banking 29653.55
## 9            Diversified financials 10792.00
## 16                        Insurance  7790.93
## 27                        Utilities  2296.86
## 8                 Consumer durables  2249.91
## 24      Telecommunications services  1908.00
## 19             Oil & gas operations  1793.82
## 6                     Conglomerates  1093.81
## 26                   Transportation  1076.79
## 18                            Media   958.24
## 17                        Materials   927.21
## 11             Food drink & tobacco   830.62
## 20                        Retailing   718.47
## 10            Drugs & biotechnology   662.42
## 23  Technology hardware & equipment   628.91
## 7                      Construction   580.23
## 5                         Chemicals   533.25
## 4                     Capital goods   499.01
## 3      Business services & supplies   498.53
## 13 Health care equipment & services   465.21
## 25                Trading companies   366.10
## 1               Aerospace & defense   349.81
## 15    Household & personal products   317.51
## 12                     Food markets   306.54
## 14     Hotels restaurants & leisure   287.82
## 22              Software & services   260.46
## 21                   Semiconductors   237.69
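A per-category total like the table above can be computed with `aggregate()` and then sorted. Here is a minimal sketch on the ten top-ranked rows from this document. The object names are illustrative, and the totals are for the excerpt only, not the full data.

```r
# Category and assets of the ten top-ranked rows in this document.
forbes_sample <- data.frame(
  category = c("Banking", "Conglomerates", "Insurance",
               "Oil & gas operations", "Oil & gas operations",
               "Banking", "Banking", "Consumer durables",
               "Diversified financials", "Retailing"),
  assets = c(1264.03, 626.93, 647.66, 166.99, 177.57,
             736.45, 757.60, 171.71, 1019.17, 104.91),
  stringsAsFactors = FALSE
)

# Sum the assets within each category.
assets_by_cat <- aggregate(assets ~ category, data = forbes_sample, FUN = sum)

# Largest total first.
assets_by_cat <- assets_by_cat[order(assets_by_cat$assets, decreasing = TRUE), ]
assets_by_cat
```

Note that the formula interface of `aggregate()` drops rows with missing values by default, which is harmless here because the assets column has none.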
## [1] "Assets vs Category for top 10 categories by assets"

## [1] "Market value density of the Forbes2000 companies"

## [1] "Market value density of the Forbes2000 companies - category wise"
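The distribution of market value can be inspected with a histogram and a kernel density estimate. The sketch below uses base R on the market values of the ten top-ranked rows from this document. The original plotting code is not shown, so function choices and settings here are assumptions.

```r
# Market values (billion USD) of the ten top-ranked rows in this document.
mv <- c(255.30, 328.54, 194.87, 277.02, 173.54,
        117.55, 177.96, 115.40, 76.84, 243.74)

# Histogram: counts of companies per market value bin.
h <- hist(mv,
          main = "Market value histogram (10-row excerpt)",
          xlab = "Market value (billion USD)")

# Kernel density estimate with the default bandwidth.
d <- density(mv)
plot(d,
     main = "Market value density (10-row excerpt)",
     xlab = "Market value (billion USD)")
```

On the full data the distribution is strongly right-skewed (median 5.15 against a maximum of 328.54), so most of the mass sits close to zero.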

Final Conclusions:

## [1] "Top 3 countries with the maximum presence in Forbes2000 Top companies of the world are:"
##           country freq
## 60  United States  751
## 28          Japan  316
## 56 United Kingdom  137
## [1] "Top 3 continents with break-up presence in Forbes2000 are:"
##   continent freq
## 2  Americas  871
## 3      Asia  548
## 4    Europe  515
## [1] "Top 10 categories with break-up presence in Forbes2000 are:"
##                  category freq
## 2                 Banking  313
## 9  Diversified financials  158
## 16              Insurance  112
## 27              Utilities  110
## 17              Materials   97
## 19   Oil & gas operations   90
## 20              Retailing   88
## 11   Food drink & tobacco   83
## 26         Transportation   80
## 7            Construction   79
## [1] "Median and range of market value for the Forbes2000 companies:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.02    2.72    5.15   11.88   10.60  328.54
## [1] "List of companies with more than 300 billion USD of market value:"
##               name       country      category marketvalue
## 2 General Electric United States Conglomerates      328.54
## [1] "List of companies with more than 200 billion USD of market value:"
##                name       country              category marketvalue
## 1         Citigroup United States               Banking      255.30
## 2  General Electric United States         Conglomerates      328.54
## 4        ExxonMobil United States  Oil & gas operations      277.02
## 10  Wal-Mart Stores United States             Retailing      243.74
## 24           Pfizer United States Drugs & biotechnology      285.27
## 31        Microsoft United States   Software & services      287.02
## [1] "Based on the market value density of the Forbes2000 companies, we see that although the maximum market value is above 300 billion USD, only 1 company reaches that level. Only 6 companies, including that one, have a market value of over 200 billion USD. The median is about 5 billion USD, which is very small compared to the maximum market value."
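The threshold lists above amount to filtering rows on market value. Here is a minimal sketch with `subset()` on the ten top-ranked rows from this document (`forbes_sample` is an illustrative name). The excerpt does not include Pfizer or Microsoft, so it yields fewer rows above 200 billion USD than the full data.

```r
# Name and market value of the ten top-ranked rows in this document.
forbes_sample <- data.frame(
  name = c("Citigroup", "General Electric", "American Intl Group",
           "ExxonMobil", "BP", "Bank of America", "HSBC Group",
           "Toyota Motor", "Fannie Mae", "Wal-Mart Stores"),
  marketvalue = c(255.30, 328.54, 194.87, 277.02, 173.54,
                  117.55, 177.96, 115.40, 76.84, 243.74),
  stringsAsFactors = FALSE
)

# Keep only the rows above each threshold (values are in billion USD).
over_300 <- subset(forbes_sample, marketvalue > 300)
over_200 <- subset(forbes_sample, marketvalue > 200)

over_300
over_200
```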

Now, the last segment is to read the csv file from the raw GitHub link. We will now read this file into a data.frame through R code.

Forbes2000_from_github_2 <- read.csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/HSAUR/Forbes2000.csv", header = TRUE, sep = ",")
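`read.csv()` accepts a URL in the same way as a local path, so the read can be wrapped in a small helper and tried offline first. In the sketch below, the helper `read_forbes`, the temporary file and its two rows are illustrative. The leading unnamed index column in the sample file is an assumption about how such exported CSV files may look, not something this document confirms.

```r
# Hypothetical helper: reads a Forbes2000-style csv from a path or a URL.
read_forbes <- function(src) {
  read.csv(src, header = TRUE, stringsAsFactors = FALSE)
}

# Offline stand-in for the remote file: two rows typed in from this document,
# with an assumed unnamed row-index column in front.
tmp <- tempfile(fileext = ".csv")
writeLines(c('"","rank","name","marketvalue"',
             '"1",1,"Citigroup",255.3',
             '"2",2,"General Electric",328.54'), tmp)

local_copy <- read_forbes(tmp)

# read.csv() names an unnamed column "X"; drop it if present.
local_copy$X <- NULL
local_copy

# The same helper works on the raw GitHub link used in this document
# (not called here so the sketch runs without network access):
forbes_url <- "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/HSAUR/Forbes2000.csv"
# forbes_github <- read_forbes(forbes_url)
```

After reading the remote file, comparing `dim()` and `names()` against the HSAUR copy is a quick way to confirm that both sources agree.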