1 Brief History Of Formula One

Formula 1 (a.k.a. F1 or Formula One) is the highest class of single-seater auto racing sanctioned by the Fédération Internationale de l’Automobile (FIA) and owned by the Formula One Group. The FIA Formula One World Championship has been one of the premier forms of racing around the world since its inaugural season in 1950. The word “formula” in the name refers to the set of rules to which all participants’ cars must conform. A Formula One season consists of a series of races, known as Grands Prix, which take place worldwide on purpose-built circuits and on public roads.

2 Input Data

We will use a dataset of Formula One drivers.

drivers <- read.csv("data_input/drivers.csv")

2.1 Data Inspection

head(drivers)
##   driverId  driverRef number code forename    surname        dob nationality
## 1        1   hamilton     44  HAM    Lewis   Hamilton 1985-01-07     British
## 2        2   heidfeld    \\N  HEI     Nick   Heidfeld 1977-05-10      German
## 3        3    rosberg      6  ROS     Nico    Rosberg 1985-06-27      German
## 4        4     alonso     14  ALO Fernando     Alonso 1981-07-29     Spanish
## 5        5 kovalainen    \\N  KOV   Heikki Kovalainen 1981-10-19     Finnish
## 6        6   nakajima    \\N  NAK   Kazuki   Nakajima 1985-01-11    Japanese
##                                              url
## 1    http://en.wikipedia.org/wiki/Lewis_Hamilton
## 2     http://en.wikipedia.org/wiki/Nick_Heidfeld
## 3      http://en.wikipedia.org/wiki/Nico_Rosberg
## 4   http://en.wikipedia.org/wiki/Fernando_Alonso
## 5 http://en.wikipedia.org/wiki/Heikki_Kovalainen
## 6   http://en.wikipedia.org/wiki/Kazuki_Nakajima
tail(drivers)
##     driverId         driverRef number code forename    surname        dob
## 849      850 pietro_fittipaldi     51  FIT   Pietro Fittipaldi 1996-06-25
## 850      851            aitken     89  AIT     Jack     Aitken 1995-09-23
## 851      852           tsunoda     22  TSU     Yuki    Tsunoda 2000-05-11
## 852      853           mazepin      9  MAZ   Nikita    Mazepin 1999-03-02
## 853      854   mick_schumacher     47  MSC     Mick Schumacher 1999-03-22
## 854      855              zhou    \\N  ZHO   Guanyu       Zhou 1999-05-30
##     nationality                                            url
## 849   Brazilian http://en.wikipedia.org/wiki/Pietro_Fittipaldi
## 850     British       http://en.wikipedia.org/wiki/Jack_Aitken
## 851    Japanese      http://en.wikipedia.org/wiki/Yuki_Tsunoda
## 852     Russian    http://en.wikipedia.org/wiki/Nikita_Mazepin
## 853      German   http://en.wikipedia.org/wiki/Mick_Schumacher
## 854     Chinese      https://en.wikipedia.org/wiki/Guanyu_Zhou
dim(drivers)
## [1] 854   9
names(drivers)
## [1] "driverId"    "driverRef"   "number"      "code"        "forename"   
## [6] "surname"     "dob"         "nationality" "url"

From our inspection we can conclude:

* The drivers data contain 854 rows and 9 columns.
* The column names are “driverId”, “driverRef”, “number”, “code”, “forename”, “surname”, “dob”, “nationality” and “url”.

2.2 Data Cleansing & Coercions

Check the data type of each column:

str(drivers)
## 'data.frame':    854 obs. of  9 variables:
##  $ driverId   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ driverRef  : chr  "hamilton" "heidfeld" "rosberg" "alonso" ...
##  $ number     : chr  "44" "\\N" "6" "14" ...
##  $ code       : chr  "HAM" "HEI" "ROS" "ALO" ...
##  $ forename   : chr  "Lewis" "Nick" "Nico" "Fernando" ...
##  $ surname    : chr  "Hamilton" "Heidfeld" "Rosberg" "Alonso" ...
##  $ dob        : chr  "1985-01-07" "1977-05-10" "1985-06-27" "1981-07-29" ...
##  $ nationality: chr  "British" "German" "German" "Spanish" ...
##  $ url        : chr  "http://en.wikipedia.org/wiki/Lewis_Hamilton" "http://en.wikipedia.org/wiki/Nick_Heidfeld" "http://en.wikipedia.org/wiki/Nico_Rosberg" "http://en.wikipedia.org/wiki/Fernando_Alonso" ...

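The output shows that some columns are not stored in an appropriate type: dob holds dates as character strings, and nationality is a categorical variable stored as plain character. We convert them to the correct types (data coercion):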

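# Parse the date-of-birth strings (YYYY-MM-DD) into Date objects
drivers$dob <- as.Date(drivers$dob, format = "%Y-%m-%d")
# Store nationality as a categorical variable (factor)
drivers$nationality <- as.factor(drivers$nationality)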

str(drivers)
## 'data.frame':    854 obs. of  9 variables:
##  $ driverId   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ driverRef  : chr  "hamilton" "heidfeld" "rosberg" "alonso" ...
##  $ number     : chr  "44" "\\N" "6" "14" ...
##  $ code       : chr  "HAM" "HEI" "ROS" "ALO" ...
##  $ forename   : chr  "Lewis" "Nick" "Nico" "Fernando" ...
##  $ surname    : chr  "Hamilton" "Heidfeld" "Rosberg" "Alonso" ...
##  $ dob        : Date, format: "1985-01-07" "1977-05-10" ...
##  $ nationality: Factor w/ 42 levels "American","American-Italian",..: 9 20 20 37 18 26 19 18 32 20 ...
##  $ url        : chr  "http://en.wikipedia.org/wiki/Lewis_Hamilton" "http://en.wikipedia.org/wiki/Nick_Heidfeld" "http://en.wikipedia.org/wiki/Nico_Rosberg" "http://en.wikipedia.org/wiki/Fernando_Alonso" ...

Both columns now have the desired types: dob is a Date and nationality is a factor with 42 levels.
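One caveat: when a format is given, as.Date() does not raise an error for strings that fail to parse; it silently returns NA. A minimal sketch of checking for such failures after coercion, using a hypothetical toy vector dob_raw whose third value is deliberately in the wrong format:

```r
# Hypothetical toy input; the third value does not match "%Y-%m-%d"
dob_raw <- c("1985-01-07", "1977-05-10", "07/01/1985")

# as.Date() returns NA for strings it cannot parse with the given format
dob <- as.Date(dob_raw, format = "%Y-%m-%d")

# Values present before coercion but NA afterwards failed to parse
failed <- is.na(dob) & !is.na(dob_raw)
print(dob_raw[failed])  # [1] "07/01/1985"

# as.factor() stores each distinct string once, as a sorted level
nationality <- as.factor(c("British", "German", "German", "Spanish"))
print(levels(nationality))
print(nlevels(nationality))
```

On the real data, the same check would be `sum(is.na(drivers$dob))` right after the coercion; the anyNA() result below suggests every date parsed.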

Check for missing values:

colSums(is.na(drivers))
##    driverId   driverRef      number        code    forename     surname 
##           0           0           0           0           0           0 
##         dob nationality         url 
##           0           0           0
anyNA(drivers)
## [1] FALSE

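No column contains NA values. Note, however, that some entries in the number and code columns hold the placeholder string \N (printed by R as \\N) instead of a value. read.csv() does not treat that string as missing by default, so these placeholders are not counted here.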
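read.csv() only treats the strings listed in its na.strings argument (by default "NA") as missing. A minimal sketch of declaring \N as a missing-value marker, using a small hypothetical CSV excerpt built in memory from the first rows of the head() output:

```r
# Hypothetical CSV excerpt; "\\N" in an R string literal is the text \N
csv_text <- paste(
  "driverId,driverRef,number,code",
  "1,hamilton,44,HAM",
  "2,heidfeld,\\N,HEI",
  "3,rosberg,6,ROS",
  sep = "\n"
)

# Default reading: \N is kept as an ordinary string, so nothing is NA
raw <- read.csv(text = csv_text)
print(anyNA(raw))

# Declaring \N as a missing-value marker turns it into NA, which also
# lets the number column be read as integer
clean <- read.csv(text = csv_text, na.strings = "\\N")
print(colSums(is.na(clean)))
print(class(clean$number))
```

Applying the same argument to the real file, `read.csv("data_input/drivers.csv", na.strings = "\\N")`, would change the number column type and the missing-value counts shown above.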

Create a fullname column by concatenating the forename and surname columns:

drivers$fullname <- paste(drivers$forename, drivers$surname)
head(drivers)
##   driverId  driverRef number code forename    surname        dob nationality
## 1        1   hamilton     44  HAM    Lewis   Hamilton 1985-01-07     British
## 2        2   heidfeld    \\N  HEI     Nick   Heidfeld 1977-05-10      German
## 3        3    rosberg      6  ROS     Nico    Rosberg 1985-06-27      German
## 4        4     alonso     14  ALO Fernando     Alonso 1981-07-29     Spanish
## 5        5 kovalainen    \\N  KOV   Heikki Kovalainen 1981-10-19     Finnish
## 6        6   nakajima    \\N  NAK   Kazuki   Nakajima 1985-01-11    Japanese
##                                              url          fullname
## 1    http://en.wikipedia.org/wiki/Lewis_Hamilton    Lewis Hamilton
## 2     http://en.wikipedia.org/wiki/Nick_Heidfeld     Nick Heidfeld
## 3      http://en.wikipedia.org/wiki/Nico_Rosberg      Nico Rosberg
## 4   http://en.wikipedia.org/wiki/Fernando_Alonso   Fernando Alonso
## 5 http://en.wikipedia.org/wiki/Heikki_Kovalainen Heikki Kovalainen
## 6   http://en.wikipedia.org/wiki/Kazuki_Nakajima   Kazuki Nakajima
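paste() works element-wise over its arguments and inserts sep = " " between them by default, which is why each full name has a single space. A small sketch on toy vectors copied from the first rows above, contrasting it with paste0(), which uses no separator:

```r
# Toy vectors copied from the first rows shown above
forename <- c("Lewis", "Nick", "Nico")
surname  <- c("Hamilton", "Heidfeld", "Rosberg")

# paste() joins element-wise with sep = " " by default
fullname <- paste(forename, surname)
print(fullname)  # "Lewis Hamilton" "Nick Heidfeld" "Nico Rosberg"

# paste0() uses no separator, so the names run together
print(paste0(forename, surname))
```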

Now the drivers dataset is ready to be processed and analyzed.

3 Data Explanation

summary(drivers)
##     driverId      driverRef            number              code          
##  Min.   :  1.0   Length:854         Length:854         Length:854        
##  1st Qu.:214.2   Class :character   Class :character   Class :character  
##  Median :427.5   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :427.6                                                           
##  3rd Qu.:640.8                                                           
##  Max.   :855.0                                                           
##                                                                          
##    forename           surname               dob                nationality 
##  Length:854         Length:854         Min.   :1896-12-28   British  :165  
##  Class :character   Class :character   1st Qu.:1922-12-29   American :157  
##  Mode  :character   Mode  :character   Median :1936-12-28   Italian  : 99  
##                                        Mean   :1941-03-29   French   : 73  
##                                        3rd Qu.:1956-12-28   German   : 50  
##                                        Max.   :2000-05-11   Brazilian: 32  
##                                                             (Other)  :278  
##      url              fullname        
##  Length:854         Length:854        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 

Summary:

  1. The oldest F1 driver in the dataset was born on 28 December 1896, while the youngest was born on 11 May 2000.

  2. British drivers are the most numerous, with 165 drivers, followed by American drivers with 157.
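Both summary points can also be computed directly instead of being read off the summary() printout. A minimal sketch on a hypothetical toy data frame of six rows copied from the outputs above (so its counts do not reflect the full dataset):

```r
# Hypothetical toy data: six rows copied from the outputs above
toy <- data.frame(
  surname = c("Heidfeld", "Rosberg", "Schumacher",
              "Hamilton", "Aitken", "Tsunoda"),
  dob = as.Date(c("1977-05-10", "1985-06-27", "1999-03-22",
                  "1985-01-07", "1995-09-23", "2000-05-11")),
  nationality = factor(c("German", "German", "German",
                         "British", "British", "Japanese"))
)

# range() on a Date column returns the earliest and latest birth dates
dob_range <- range(toy$dob)
print(dob_range)

# Drivers per nationality, most common first
counts <- sort(table(toy$nationality), decreasing = TRUE)
print(counts)
```

On the real data, `range(drivers$dob)` and `sort(table(drivers$nationality), decreasing = TRUE)` would give the same information as the summary() output.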

4 Data Manipulation & Transformation

Who is the oldest F1 driver, and what is his nationality?

drivers[drivers$dob == "1896-12-28",]
##     driverId driverRef number code forename    surname        dob nationality
## 742      741 etancelin    \\N  \\N Philippe Étancelin 1896-12-28      French
##                                                      url            fullname
## 742 http://en.wikipedia.org/wiki/Philippe_%C3%89tancelin Philippe Étancelin

Answer: Philippe Étancelin from France.

Who is the youngest F1 driver, and what is his nationality?

drivers[drivers$dob == "2000-05-11",]
##     driverId driverRef number code forename surname        dob nationality
## 851      852   tsunoda     22  TSU     Yuki Tsunoda 2000-05-11    Japanese
##                                           url     fullname
## 851 http://en.wikipedia.org/wiki/Yuki_Tsunoda Yuki Tsunoda

Answer: Yuki Tsunoda from Japan.
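The two lookups above hard-code the birth dates taken from summary(). A minimal sketch of finding the same rows with which.min() and which.max(), on a hypothetical toy data frame of six rows copied from the earlier outputs:

```r
# Hypothetical toy data: six rows copied from the outputs above
toy <- data.frame(
  fullname = c("Nick Heidfeld", "Nico Rosberg", "Mick Schumacher",
               "Lewis Hamilton", "Jack Aitken", "Yuki Tsunoda"),
  dob = as.Date(c("1977-05-10", "1985-06-27", "1999-03-22",
                  "1985-01-07", "1995-09-23", "2000-05-11")),
  nationality = c("German", "German", "German",
                  "British", "British", "Japanese")
)

# which.min()/which.max() return the row index of the earliest/latest date,
# so no birth date has to be typed in by hand
oldest   <- toy[which.min(toy$dob), ]
youngest <- toy[which.max(toy$dob), ]
print(oldest)
print(youngest)
```

One design difference: which.min() keeps only the first row if two drivers share the earliest date, whereas `drivers[drivers$dob == min(drivers$dob), ]` would return all of them.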

What proportion (in percent) of F1 drivers comes from each nationality?

round(prop.table(sort(table(drivers$nationality),decreasing = TRUE))*100,2)
## 
##           British          American           Italian            French 
##             19.32             18.38             11.59              8.55 
##            German         Brazilian         Argentine           Belgian 
##              5.85              3.75              2.81              2.69 
##     South African             Swiss          Japanese        Australian 
##              2.69              2.69              2.34              1.99 
##             Dutch          Austrian           Spanish          Canadian 
##              1.99              1.76              1.76              1.64 
##           Swedish           Finnish     New Zealander           Mexican 
##              1.17              1.05              1.05              0.70 
##            Danish             Irish        Monegasque        Portuguese 
##              0.59              0.59              0.47              0.47 
##         Rhodesian           Russian         Uruguayan         Colombian 
##              0.47              0.47              0.47              0.35 
##       East German        Venezuelan            Indian              Thai 
##              0.35              0.35              0.23              0.23 
##  American-Italian Argentine-Italian           Chilean           Chinese 
##              0.12              0.12              0.12              0.12 
##             Czech         Hungarian        Indonesian   Liechtensteiner 
##              0.12              0.12              0.12              0.12 
##         Malaysian            Polish 
##              0.12              0.12

Answer: Only three nationalities account for more than 10% of all drivers: British (19.32%), American (18.38%) and Italian (11.59%).
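The one-liner above chains four steps: count, sort, convert to shares, and round. A minimal sketch that spells them out and then filters the nationalities above 10%, on a hypothetical toy vector (the counts are invented for illustration and are not the real data):

```r
# Hypothetical toy vector: 4 British, 3 American, 2 Italian, 1 French
nationality <- factor(c(rep("British", 4), rep("American", 3),
                        rep("Italian", 2), "French"))

counts <- sort(table(nationality), decreasing = TRUE)  # counts per level
pct    <- round(prop.table(counts) * 100, 2)           # share of total, in %
print(pct)

# Nationalities whose share is strictly above 10 percent
print(names(pct)[pct > 10])  # "British" "American" "Italian"
```

French sits at exactly 10% here and is excluded because the comparison is strict.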

Has Indonesia ever had an F1 driver?

drivers[drivers$nationality == "Indonesian",]
##     driverId driverRef number code forename  surname        dob nationality
## 837      837  haryanto     88  HAR      Rio Haryanto 1993-01-22  Indonesian
##                                           url     fullname
## 837 http://en.wikipedia.org/wiki/Rio_Haryanto Rio Haryanto

Answer: Rio Haryanto is the only Indonesian F1 driver in history.
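The same question can be answered with subset(), which evaluates the condition inside the data frame and treats rows where the condition is NA as FALSE (plain `[` indexing would instead return a row of NAs for them), or with a simple count. A minimal sketch on a hypothetical three-row toy data frame:

```r
# Hypothetical toy data frame with three drivers
toy <- data.frame(
  fullname = c("Rio Haryanto", "Lewis Hamilton", "Yuki Tsunoda"),
  nationality = factor(c("Indonesian", "British", "Japanese"))
)

# subset() looks up column names inside toy and drops non-matching rows
indonesian <- subset(toy, nationality == "Indonesian")
print(indonesian)

# Counting matches answers "how many?" without printing the rows
print(sum(toy$nationality == "Indonesian"))
```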

5 Explanatory Text

Since the first championship season in 1950, British drivers have made up the largest share of the field, with 165 drivers (19.32%). Drivers' dates of birth range from 1896 to 2000. Indonesia has had only one F1 driver, Rio Haryanto. This may be because F1 is less popular in Indonesia than other racing competitions such as MotoGP, and because a successful F1 career requires a large financial investment. You may also notice that some drivers have no number or code (stored as \N); this is due to changes in the regulations over F1 history.
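How many drivers lack a number or code can be checked by counting the \N placeholders. A minimal sketch on toy vectors copied from rows shown earlier (the six head() rows plus Philippe Étancelin); on the full data the same expressions would apply to drivers$number and drivers$code:

```r
# Toy columns copied from rows shown earlier; "\\N" is the text \N
number <- c("44", "\\N", "6", "14", "\\N", "\\N", "\\N")
code   <- c("HAM", "HEI", "ROS", "ALO", "KOV", "NAK", "\\N")

print(sum(number == "\\N"))         # drivers without a recorded number
print(sum(code == "\\N"))           # drivers without a three-letter code
print(mean(number == "\\N") * 100)  # share without a number, in percent
```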