## Our Dataset

We used a public-domain dataset from Kaggle containing the results of 38,759 international football matches played between 1872 and 2018, including FIFA World Cup matches, friendlies, and other fixtures.


Our dataset looks like this:


## Data Preprocessing

We performed the following preprocessing operations on the data:

### Data Cleaning

Some dates in the dataset were written as yyyy-mm-dd and others as yyyy/mm/dd. We converted all of them to the yyyy/mm/dd format using Python and the pandas library.
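A minimal sketch of this cleaning step in pandas, assuming the dates are stored as strings in a column named `date` (the column name is an assumption, not taken from the text). It unifies the separator first and then parses with an explicit format, so a date matching neither pattern raises an error instead of silently slipping through.

```python
import pandas as pd


def normalize_dates(df: pd.DataFrame, column: str = "date") -> pd.DataFrame:
    """Return a copy of df with every date in `column` written as yyyy/mm/dd."""
    out = df.copy()
    # Turn yyyy-mm-dd into yyyy/mm/dd so both spellings share one separator.
    unified = out[column].astype(str).str.replace("-", "/", regex=False)
    # Strict parse: a malformed date raises ValueError rather than being kept.
    parsed = pd.to_datetime(unified, format="%Y/%m/%d")
    out[column] = parsed.dt.strftime("%Y/%m/%d")
    return out


matches = pd.DataFrame({"date": ["1872-11-30", "2018/07/15"]})
print(normalize_dates(matches)["date"].tolist())  # ['1872/11/30', '2018/07/15']
```

Parsing before reformatting costs a little speed compared with a plain string replace, but it doubles as a validation pass over the whole column.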

### Adding New Columns to the Dataset

  • a column for the winning team
  • a column for the score difference
  • a column for the home team's continent
  • a column for the away team's continent

The modified dataset looks like this:

We used Excel, Python, and the pandas library to add these columns.
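The pandas part of this step could look roughly like the sketch below. It assumes columns named `home_team`, `away_team`, `home_score`, and `away_score`, records a draw as the label `"Draw"`, and takes the score difference as the absolute goal margin; all of these are assumptions. The `TEAM_TO_CONTINENT` mapping is a hypothetical, deliberately tiny stand-in for whatever team-to-continent table was actually used.

```python
import pandas as pd

# Hypothetical, deliberately tiny lookup table for illustration only;
# a real mapping would need an entry for every team in the dataset.
TEAM_TO_CONTINENT = {
    "Argentina": "South America",
    "Brazil": "South America",
    "England": "Europe",
    "Scotland": "Europe",
}


def add_match_columns(df: pd.DataFrame, continents: dict = TEAM_TO_CONTINENT) -> pd.DataFrame:
    """Add winner, score difference and continent columns to a copy of df."""
    out = df.copy()
    home_won = out["home_score"] > out["away_score"]
    away_won = out["home_score"] < out["away_score"]

    # Start every row as a draw, then overwrite the rows that had a winner.
    out["winner"] = "Draw"
    out.loc[home_won, "winner"] = out.loc[home_won, "home_team"]
    out.loc[away_won, "winner"] = out.loc[away_won, "away_team"]

    # Absolute goal margin, so a 4-2 and a 2-4 both give 2.
    out["score_difference"] = (out["home_score"] - out["away_score"]).abs()

    # Teams missing from the lookup table end up as NaN.
    out["home_continent"] = out["home_team"].map(continents)
    out["away_continent"] = out["away_team"].map(continents)
    return out
```

Working on a copy keeps the original frame untouched, which makes it easy to compare the raw and modified datasets side by side.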

### Creating a New Dataset

We also created a new dataset derived from the original one. It looks like this:
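The text does not say which columns the derived dataset contains, so the following is only one hypothetical example of such a derivation: a per-team table of matches, wins, draws, losses, and goals. It assumes the same `home_team`, `away_team`, `home_score`, and `away_score` columns as the original data and is not necessarily the dataset shown here.

```python
import pandas as pd


def team_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Build one row per team with matches played, results and goals."""
    # View every match once from the home side and once from the away side.
    home = pd.DataFrame({
        "team": df["home_team"],
        "goals_for": df["home_score"],
        "goals_against": df["away_score"],
    })
    away = pd.DataFrame({
        "team": df["away_team"],
        "goals_for": df["away_score"],
        "goals_against": df["home_score"],
    })
    long = pd.concat([home, away], ignore_index=True)

    # Boolean result flags; summing them per team counts the outcomes.
    long["win"] = long["goals_for"] > long["goals_against"]
    long["draw"] = long["goals_for"] == long["goals_against"]
    long["loss"] = long["goals_for"] < long["goals_against"]

    return long.groupby("team").agg(
        matches=("goals_for", "size"),
        wins=("win", "sum"),
        draws=("draw", "sum"),
        losses=("loss", "sum"),
        goals_for=("goals_for", "sum"),
        goals_against=("goals_against", "sum"),
    )
```

Reshaping to one row per team per match first means every statistic is computed the same way regardless of whether a team played at home or away.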

## Insights From Data

We used Tableau to explore the data and extract insights from it.

  • Some of the insights can be found at the following link
  • Interactive Tableau session

We also wrote a Python program that takes the names of two teams as input and outputs a summary of all the matches played between them.
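A minimal sketch of such a head-to-head summary, assuming the match data is a pandas DataFrame with `home_team`, `away_team`, `home_score`, and `away_score` columns. The function name `head_to_head` and the keys of the returned dictionary are hypothetical choices for this illustration, not the original program's interface.

```python
import pandas as pd


def head_to_head(df: pd.DataFrame, team_a: str, team_b: str) -> dict:
    """Summarize every match between team_a and team_b, home or away."""
    mask = ((df["home_team"] == team_a) & (df["away_team"] == team_b)) | (
        (df["home_team"] == team_b) & (df["away_team"] == team_a)
    )
    games = df[mask]
    a_at_home = games["home_team"] == team_a

    # Re-express each score from team_a's and team_b's point of view.
    goals_a = games["home_score"].where(a_at_home, games["away_score"])
    goals_b = games["away_score"].where(a_at_home, games["home_score"])

    return {
        "matches": len(games),
        "wins_a": int((goals_a > goals_b).sum()),
        "wins_b": int((goals_b > goals_a).sum()),
        "draws": int((goals_a == goals_b).sum()),
        "goals_a": int(goals_a.sum()),
        "goals_b": int(goals_b.sum()),
    }
```

To read the team names interactively, the function could be called as `head_to_head(pd.read_csv("results.csv"), input("Team A: "), input("Team B: "))`, where `results.csv` is a hypothetical file name for the dataset. Team names must match the dataset's spelling exactly; two teams that never met simply produce zero counts.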