Formula 1 (a.k.a. F1 or Formula One) is the highest class of single-seater auto racing sanctioned by the Fédération Internationale de l’Automobile (FIA) and owned by the Formula One Group. The FIA Formula One World Championship has been one of the premier forms of racing around the world since its inaugural season in 1950. The word “formula” in the name refers to the set of rules to which all participants’ cars must conform. A Formula One season consists of a series of races, known as Grands Prix, which take place worldwide on purpose-built circuits and on public roads.
We will use a dataset of Formula One drivers.
drivers <- read.csv("data_input/drivers.csv")
head(drivers)
## driverId driverRef number code forename surname dob nationality
## 1 1 hamilton 44 HAM Lewis Hamilton 1985-01-07 British
## 2 2 heidfeld \\N HEI Nick Heidfeld 1977-05-10 German
## 3 3 rosberg 6 ROS Nico Rosberg 1985-06-27 German
## 4 4 alonso 14 ALO Fernando Alonso 1981-07-29 Spanish
## 5 5 kovalainen \\N KOV Heikki Kovalainen 1981-10-19 Finnish
## 6 6 nakajima \\N NAK Kazuki Nakajima 1985-01-11 Japanese
## url
## 1 http://en.wikipedia.org/wiki/Lewis_Hamilton
## 2 http://en.wikipedia.org/wiki/Nick_Heidfeld
## 3 http://en.wikipedia.org/wiki/Nico_Rosberg
## 4 http://en.wikipedia.org/wiki/Fernando_Alonso
## 5 http://en.wikipedia.org/wiki/Heikki_Kovalainen
## 6 http://en.wikipedia.org/wiki/Kazuki_Nakajima
tail(drivers)
## driverId driverRef number code forename surname dob
## 849 850 pietro_fittipaldi 51 FIT Pietro Fittipaldi 1996-06-25
## 850 851 aitken 89 AIT Jack Aitken 1995-09-23
## 851 852 tsunoda 22 TSU Yuki Tsunoda 2000-05-11
## 852 853 mazepin 9 MAZ Nikita Mazepin 1999-03-02
## 853 854 mick_schumacher 47 MSC Mick Schumacher 1999-03-22
## 854 855 zhou \\N ZHO Guanyu Zhou 1999-05-30
## nationality url
## 849 Brazilian http://en.wikipedia.org/wiki/Pietro_Fittipaldi
## 850 British http://en.wikipedia.org/wiki/Jack_Aitken
## 851 Japanese http://en.wikipedia.org/wiki/Yuki_Tsunoda
## 852 Russian http://en.wikipedia.org/wiki/Nikita_Mazepin
## 853 German http://en.wikipedia.org/wiki/Mick_Schumacher
## 854 Chinese https://en.wikipedia.org/wiki/Guanyu_Zhou
dim(drivers)
## [1] 854 9
names(drivers)
## [1] "driverId" "driverRef" "number" "code" "forename"
## [6] "surname" "dob" "nationality" "url"
From our inspection we can conclude:
* The drivers data contains 854 rows and 9 columns.
* The column names are: “driverId”, “driverRef”, “number”, “code”, “forename”, “surname”, “dob”, “nationality”, “url”.
Check the data type of each column
str(drivers)
## 'data.frame': 854 obs. of 9 variables:
## $ driverId : int 1 2 3 4 5 6 7 8 9 10 ...
## $ driverRef : chr "hamilton" "heidfeld" "rosberg" "alonso" ...
## $ number : chr "44" "\\N" "6" "14" ...
## $ code : chr "HAM" "HEI" "ROS" "ALO" ...
## $ forename : chr "Lewis" "Nick" "Nico" "Fernando" ...
## $ surname : chr "Hamilton" "Heidfeld" "Rosberg" "Alonso" ...
## $ dob : chr "1985-01-07" "1977-05-10" "1985-06-27" "1981-07-29" ...
## $ nationality: chr "British" "German" "German" "Spanish" ...
## $ url : chr "http://en.wikipedia.org/wiki/Lewis_Hamilton" "http://en.wikipedia.org/wiki/Nick_Heidfeld" "http://en.wikipedia.org/wiki/Nico_Rosberg" "http://en.wikipedia.org/wiki/Fernando_Alonso" ...
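From this result, we can see that some columns are not stored in an appropriate type: dob is character but should be a Date, and nationality is character but works better as a factor. We need to convert them to the correct types (data coercion).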
drivers$dob <- as.Date(drivers$dob, format = "%Y-%m-%d")
drivers$nationality <- as.factor(drivers$nationality)
str(drivers)
## 'data.frame': 854 obs. of 9 variables:
## $ driverId : int 1 2 3 4 5 6 7 8 9 10 ...
## $ driverRef : chr "hamilton" "heidfeld" "rosberg" "alonso" ...
## $ number : chr "44" "\\N" "6" "14" ...
## $ code : chr "HAM" "HEI" "ROS" "ALO" ...
## $ forename : chr "Lewis" "Nick" "Nico" "Fernando" ...
## $ surname : chr "Hamilton" "Heidfeld" "Rosberg" "Alonso" ...
## $ dob : Date, format: "1985-01-07" "1977-05-10" ...
## $ nationality: Factor w/ 42 levels "American","American-Italian",..: 9 20 20 37 18 26 19 18 32 20 ...
## $ url : chr "http://en.wikipedia.org/wiki/Lewis_Hamilton" "http://en.wikipedia.org/wiki/Nick_Heidfeld" "http://en.wikipedia.org/wiki/Nico_Rosberg" "http://en.wikipedia.org/wiki/Fernando_Alonso" ...
Both columns have now been converted to the desired data types.
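A coercion like the one above can fail quietly: as.Date() returns NA for strings that do not match the given format, without raising an error. The sketch below uses a small made-up data frame (toy, not the real drivers.csv) to repeat the same two conversions and then check that nothing was turned into NA. The malformed date stored in bad is invented purely for illustration.

```r
# Toy data standing in for two columns of drivers (values are illustrative).
toy <- data.frame(
  dob = c("1985-01-07", "1977-05-10"),
  nationality = c("British", "German"),
  stringsAsFactors = FALSE
)

# Same coercions as in the text.
toy$dob <- as.Date(toy$dob, format = "%Y-%m-%d")
toy$nationality <- as.factor(toy$nationality)

# Confirm the resulting classes and that no value became NA.
col_classes <- sapply(toy, class)
print(col_classes)
stopifnot(!anyNA(toy$dob))

# A string in a different layout does not match the format and becomes NA.
bad <- as.Date("07/01/1985", format = "%Y-%m-%d")
print(is.na(bad))
```

Running anyNA() on a column right after converting it is a cheap way to catch this kind of silent failure.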
Check for missing values
colSums(is.na(drivers))
## driverId driverRef number code forename surname
## 0 0 0 0 0 0
## dob nationality url
## 0 0 0
anyNA(drivers)
## [1] FALSE
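Great! No NA values are reported. Note, however, that the number and code columns contain the placeholder string "\\N" (visible in the output above), which R treats as ordinary text rather than as NA, so those entries are not counted here.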
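One caveat worth checking: is.na() only detects real NA values, so a text placeholder is not counted. In the output above, number and code contain the string "\\N". The sketch below works on a small inline CSV that imitates the file's layout (the rows are copied from the head() output, but the inline text is an assumption, not the real file). It counts the placeholder per column and shows how the na.strings argument of read.csv() can turn it into NA at load time.

```r
# Inline CSV imitating drivers.csv; "\\N" in R source is the two characters \N.
csv_text <- "driverId,driverRef,number,code
1,hamilton,44,HAM
2,heidfeld,\\N,HEI
3,rosberg,6,ROS
5,kovalainen,\\N,KOV"

# Read as-is: the placeholder is kept as text, so is.na() finds nothing.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(anyNA(raw))

# Count placeholder occurrences in each column.
placeholder_counts <- sapply(raw, function(col) sum(col == "\\N"))
print(placeholder_counts)

# Read again, declaring the placeholder as a missing-value marker.
clean <- read.csv(text = csv_text, na.strings = c("NA", "\\N"),
                  stringsAsFactors = FALSE)
print(colSums(is.na(clean)))
```

With the placeholder declared in na.strings, number can also be read as an integer column instead of character.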
Create a fullname column by concatenating the forename and surname columns
drivers$fullname <- paste(drivers$forename, drivers$surname)
head(drivers)
## driverId driverRef number code forename surname dob nationality
## 1 1 hamilton 44 HAM Lewis Hamilton 1985-01-07 British
## 2 2 heidfeld \\N HEI Nick Heidfeld 1977-05-10 German
## 3 3 rosberg 6 ROS Nico Rosberg 1985-06-27 German
## 4 4 alonso 14 ALO Fernando Alonso 1981-07-29 Spanish
## 5 5 kovalainen \\N KOV Heikki Kovalainen 1981-10-19 Finnish
## 6 6 nakajima \\N NAK Kazuki Nakajima 1985-01-11 Japanese
## url fullname
## 1 http://en.wikipedia.org/wiki/Lewis_Hamilton Lewis Hamilton
## 2 http://en.wikipedia.org/wiki/Nick_Heidfeld Nick Heidfeld
## 3 http://en.wikipedia.org/wiki/Nico_Rosberg Nico Rosberg
## 4 http://en.wikipedia.org/wiki/Fernando_Alonso Fernando Alonso
## 5 http://en.wikipedia.org/wiki/Heikki_Kovalainen Heikki Kovalainen
## 6 http://en.wikipedia.org/wiki/Kazuki_Nakajima Kazuki Nakajima
Now the drivers dataset is ready to be processed and analyzed.
summary(drivers)
## driverId driverRef number code
## Min. : 1.0 Length:854 Length:854 Length:854
## 1st Qu.:214.2 Class :character Class :character Class :character
## Median :427.5 Mode :character Mode :character Mode :character
## Mean :427.6
## 3rd Qu.:640.8
## Max. :855.0
##
## forename surname dob nationality
## Length:854 Length:854 Min. :1896-12-28 British :165
## Class :character Class :character 1st Qu.:1922-12-29 American :157
## Mode :character Mode :character Median :1936-12-28 Italian : 99
## Mean :1941-03-29 French : 73
## 3rd Qu.:1956-12-28 German : 50
## Max. :2000-05-11 Brazilian: 32
## (Other) :278
## url fullname
## Length:854 Length:854
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
Summary:
The oldest F1 driver in the dataset was born on 28 December 1896, while the youngest was born on 11 May 2000.
Britain has the highest number of F1 drivers with 165, followed by America with 157.
Who is the oldest F1 driver and what’s his nationality?
drivers[drivers$dob == "1896-12-28", ]
## driverId driverRef number code forename surname dob nationality
## 742 741 etancelin \\N \\N Philippe Étancelin 1896-12-28 French
## url fullname
## 742 http://en.wikipedia.org/wiki/Philippe_%C3%89tancelin Philippe Étancelin
Answer: Philippe Étancelin from France.
Who is the youngest F1 driver and what’s his nationality?
drivers[drivers$dob == "2000-05-11", ]
## driverId driverRef number code forename surname dob nationality
## 851 852 tsunoda 22 TSU Yuki Tsunoda 2000-05-11 Japanese
## url fullname
## 851 http://en.wikipedia.org/wiki/Yuki_Tsunoda Yuki Tsunoda
Answer: Yuki Tsunoda from Japan.
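The two lookups above hard-code the dates read off summary(). One alternative that avoids copying dates by hand is to compare against min() and max() directly. The sketch below does this on a three-row toy data frame built from rows shown in this section (the name toy is an assumption for illustration).

```r
# Three drivers taken from the outputs above, as a stand-in for drivers.
toy <- data.frame(
  fullname = c("Lewis Hamilton", "Philippe Étancelin", "Yuki Tsunoda"),
  dob = as.Date(c("1985-01-07", "1896-12-28", "2000-05-11")),
  nationality = c("British", "French", "Japanese"),
  stringsAsFactors = FALSE
)

# Earliest date of birth = oldest driver; latest = youngest.
# Using == keeps every row in case several drivers share the date.
oldest <- toy[toy$dob == min(toy$dob), ]
youngest <- toy[toy$dob == max(toy$dob), ]

print(oldest[, c("fullname", "nationality")])
print(youngest[, c("fullname", "nationality")])
```

The same two lines should work on drivers itself once dob has been converted to Date.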
What is the proportion (in percent) of F1 drivers by nationality?
round(prop.table(sort(table(drivers$nationality), decreasing = TRUE)) * 100, 2)
##
## British American Italian French
## 19.32 18.38 11.59 8.55
## German Brazilian Argentine Belgian
## 5.85 3.75 2.81 2.69
## South African Swiss Japanese Australian
## 2.69 2.69 2.34 1.99
## Dutch Austrian Spanish Canadian
## 1.99 1.76 1.76 1.64
## Swedish Finnish New Zealander Mexican
## 1.17 1.05 1.05 0.70
## Danish Irish Monegasque Portuguese
## 0.59 0.59 0.47 0.47
## Rhodesian Russian Uruguayan Colombian
## 0.47 0.47 0.47 0.35
## East German Venezuelan Indian Thai
## 0.35 0.35 0.23 0.23
## American-Italian Argentine-Italian Chilean Chinese
## 0.12 0.12 0.12 0.12
## Czech Hungarian Indonesian Liechtensteiner
## 0.12 0.12 0.12 0.12
## Malaysian Polish
## 0.12 0.12
Answer: As the proportions above show, only three nationalities each account for more than 10% of drivers: British (19.32%), American (18.38%), and Italian (11.59%).
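As a quick arithmetic check, each percentage is the nationality count divided by the 854 rows, times 100. The sketch below redoes that calculation from the six counts printed by summary() earlier and then keeps only the shares above 10%. The names counts, pct, and above_10 are illustrative.

```r
# Counts per nationality as printed by summary(drivers).
counts <- c(British = 165, American = 157, Italian = 99,
            French = 73, German = 50, Brazilian = 32)
total_drivers <- 854  # number of rows reported by dim(drivers)

# Share of each nationality in percent, rounded to two decimals.
pct <- round(counts / total_drivers * 100, 2)
print(pct)

# Keep only the nationalities with more than 10% of all drivers.
above_10 <- pct[pct > 10]
print(names(above_10))
```

For example, 165 / 854 * 100 rounds to 19.32, which matches the British entry in the table.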
Does Indonesia have an F1 driver?
drivers[drivers$nationality == "Indonesian", ]
## driverId driverRef number code forename surname dob nationality
## 837 837 haryanto 88 HAR Rio Haryanto 1993-01-22 Indonesian
## url fullname
## 837 http://en.wikipedia.org/wiki/Rio_Haryanto Rio Haryanto
Answer: Yes. Rio Haryanto is the only Indonesian F1 driver in the dataset.
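To back up the word "only", it can help to count the matching rows rather than rely on reading the printed subset. The sketch below does that on a toy factor of nationalities (values taken from rows shown in this section; toy_nat is a made-up name), using sum() on the logical comparison.

```r
# Toy nationality column; the real one is drivers$nationality.
toy_nat <- factor(c("British", "German", "Indonesian", "Japanese", "German"))

# TRUE counts as 1, so sum() gives the number of matching drivers.
n_indonesian <- sum(toy_nat == "Indonesian")
print(n_indonesian)

# The same count is available from the frequency table.
print(table(toy_nat)[["Indonesian"]])
```

On the real data, sum(drivers$nationality == "Indonesian") should give 1, consistent with the single row printed above.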
Throughout F1 history since 1950, British drivers have made up the largest share of F1 drivers, with 165 drivers (19.32%). The drivers' dates of birth range from 1896 to 2000. Indonesia has had only one F1 driver, Rio Haryanto. The lack of Indonesian F1 drivers might be due to F1's low popularity in Indonesia compared with other racing competitions such as MotoGP, and to the amount of money that needs to be invested to succeed in F1. You may also notice that some F1 drivers do not have numbers or codes; this is due to changes in regulations throughout F1 history.