Kaggle:

Kaggle is a well-known source of data. Its repository hosts a large volume of datasets that are open to the public. Beyond being a data source, Kaggle is known for its competitions and as the home of passionate data scientists. The word passionate recognizes their work and their constant contributions to the analytics community, which you can see in their blog posts and Kaggle notebooks. Data scientists take pride in participating in Kaggle competitions and hosting their solutions on Kaggle. These competitions are not just a way of showcasing an individual’s skills; they are also a distinctive way of building a portfolio. Access to Kaggle data is free, and the datasets are publicly listed in the data section of the Kaggle website at “https://www.kaggle.com/datasets”. You do not need to register with Kaggle to get data onto your PC; it is only a few clicks away.

Another option is to access the Kaggle API from the command line, the way a coder likes it. This, however, requires a Kaggle account. In this tutorial, I will walk through the few commands needed to import a Kaggle competition dataset onto your computer.

Step-1: Getting started on Kaggle

Signing/Registering to Kaggle

If you have already registered with Kaggle, press the Sign-in button and select the appropriate login option. You need to be signed in on the website to create the API token and to accept competition rules; the command-line interface (CLI) itself authenticates with that token.

Signing options

If you do not yet have a Kaggle account, press the Register button and select the appropriate sign-up option.

Registration

Creating an API token for authentication:

Once you are signed in to Kaggle, you need to create an API token so that your commands can access the Kaggle API. The API token is a credential file in JSON format, which the command-line client uses to authenticate against the Kaggle API. Each profile gets its own unique credentials.
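To make the idea of a JSON credential file concrete, here is a minimal Python sketch that reads a token file and checks that it holds the two fields the Kaggle client expects, “username” and “key”. The function name load_kaggle_credentials is hypothetical and not part of the Kaggle package, and the default location is assumed to be the “.kaggle” folder in your home directory.

```python
import json
from pathlib import Path


def load_kaggle_credentials(token_path):
    """Read a kaggle.json token file and return its username and key."""
    data = json.loads(Path(token_path).read_text(encoding="utf-8"))
    # The token is expected to hold exactly these two fields.
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError(f"token file is missing: {', '.join(missing)}")
    return {"username": data["username"], "key": data["key"]}


if __name__ == "__main__":
    # Assumed default location of the token on your machine.
    default_path = Path.home() / ".kaggle" / "kaggle.json"
    if default_path.exists():
        creds = load_kaggle_credentials(default_path)
        # Print only the username; the key is a secret.
        print("Token found for user:", creds["username"])
    else:
        print("No token found at", default_path)
```

Treat the key like a password: avoid printing it or committing the file to a public repository.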

To download the API token, sign in first. Then, on your homepage, click your profile picture to open a menu.

Click My Account to open the page with your profile details, where you can view your username, display name, email address and other details. Scroll down to the API section to create and download the API token.

Press “Create New API Token” and a JSON file named “kaggle.json” will download automatically. After the download, a message asks you to move “kaggle.json” into the “.kaggle” directory.

Note that in most cases the “.kaggle” directory will not yet exist in your home directory. You can check by enabling the option to view hidden folders. If you do not see it, create the folder from the command prompt, since the file manager may not let you manually create a folder whose name begins with a special character.

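Mac user (Terminal):
mkdir -p ~/.kaggle
cp ~/Downloads/kaggle.json ~/.kaggle/

Windows user (Command Prompt), replacing <Username> with your own user name:
mkdir C:\Users\<Username>\.kaggle
copy C:\Users\<Username>\Downloads\kaggle.json C:\Users\<Username>\.kaggle\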

Note that during authentication the Kaggle API client looks for a file named “kaggle.json” in the “.kaggle” directory, so keep both names exactly as they are.
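If you prefer one script that works on both Mac and Windows, the same step can be sketched in Python. This is only an illustration: install_token is a hypothetical helper name, and the download location in the usage note is an assumption about where your browser saved the file. On Mac and Linux the sketch also restricts the file to its owner, because the Kaggle client warns when the token is readable by other users.

```python
import os
import shutil
from pathlib import Path


def install_token(downloaded_token, home_dir=None):
    """Copy a downloaded token to <home>/.kaggle/kaggle.json and return the new path."""
    home = Path(home_dir) if home_dir is not None else Path.home()
    config_dir = home / ".kaggle"
    # Create the hidden folder if it does not exist yet.
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "kaggle.json"
    shutil.copyfile(downloaded_token, target)
    if os.name != "nt":
        # On Mac/Linux, make the token readable and writable by its owner only.
        os.chmod(target, 0o600)
    return target
```

A typical call, assuming the token sits in your Downloads folder, would be install_token(Path.home() / "Downloads" / "kaggle.json").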

Once these steps are complete, you can install the Kaggle API client on your system through the command line/command prompt.

Installing Kaggle API through command prompt/Terminal:

This step assumes that you already have a recent version of Python installed on your computer. If you need to install Python, please follow the links below.

Mac users may follow this link: “https://www.youtube.com/watch?v=8BiYGIDCvvA”

Windows users may find the instructions at this link: “https://www.youtube.com/watch?v=ndrCfBJkkvE”

To install the Kaggle API client, open the command prompt/terminal on your computer and type the following command, which downloads and installs the client:

pip install kaggle
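To confirm that the installation worked, a small Python check along these lines may help. It is a sketch rather than an official procedure: kaggle_client_status is a hypothetical helper name, and it only looks up the installed package version and whether a “kaggle” command is on your PATH.

```python
import shutil
from importlib import metadata


def kaggle_client_status():
    """Report whether the kaggle package and its command are available."""
    try:
        version = metadata.version("kaggle")
    except metadata.PackageNotFoundError:
        # The package is not installed in this Python environment.
        version = None
    # shutil.which returns None when no "kaggle" command is on the PATH.
    return {"version": version, "executable": shutil.which("kaggle")}


if __name__ == "__main__":
    status = kaggle_client_status()
    print("Installed version:", status["version"])
    print("Command location:", status["executable"])
```

If the version is reported but the command location is empty, the folder where pip places scripts is probably not on your PATH.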

Next, you need to enter a Kaggle competition to download its dataset. Kaggle is particular about its rules: to import a competition’s data through the Kaggle API, you must first join that competition and accept its rules.

To participate in a competition, log in and follow these steps:

Hit the Competitions button on your homepage > select a competition > hit the Rules button and select “I Understand and Accept” > note down the competition_id

Click the highlighted button

Select from the list of competitions

Accept the competition rules by clicking “I Understand and Accept”. Once you have accepted the rules, all you need to do is note down the competition_id from the URL in the address bar.

For example, we have entered “Severstal: Steel Defect Detection” by accepting the rules of the competition. Now we can read the competition_id from the address bar (highlighted). For this competition, the id is “severstal-steel-defect-detection”. Every other competition likewise has its own id, so each time we import competition data through the command line, we need to use that id as the reference.
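If you would rather not copy the id by hand, it can be pulled out of the address with a few lines of Python. The sketch below assumes the competition address has the form “https://www.kaggle.com/c/<id>” or “https://www.kaggle.com/competitions/<id>”; the function name competition_id_from_url is hypothetical.

```python
from urllib.parse import urlparse


def competition_id_from_url(url):
    """Return the competition id from a Kaggle competition address."""
    # Split the path into its non-empty segments, e.g. ["c", "<id>", "rules"].
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) >= 2 and parts[0] in ("c", "competitions"):
        return parts[1]
    raise ValueError(f"not a competition address: {url}")


if __name__ == "__main__":
    address = "https://www.kaggle.com/c/severstal-steel-defect-detection"
    print(competition_id_from_url(address))  # severstal-steel-defect-detection
```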

Now, run the command below to import the data.

kaggle competitions download -c <competition_id>

Upon successful execution, you will see something similar in your command prompt window

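By default, recent versions of the Kaggle client save the data in the directory from which you run the command; add the -p option to choose another folder, such as your home directory.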
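The download can also be started from a Python script by calling the command-line client. The following is a minimal sketch, assuming the client is installed and the token is in place; build_download_command and download_competition are hypothetical helper names, while “competitions download”, -c and -p are options of the Kaggle command-line client.

```python
import subprocess


def build_download_command(competition_id, dest_dir=None):
    """Build the argument list for downloading a competition's data."""
    command = ["kaggle", "competitions", "download", "-c", competition_id]
    if dest_dir is not None:
        # -p selects the folder the files are saved to.
        command += ["-p", str(dest_dir)]
    return command


def download_competition(competition_id, dest_dir=None):
    """Run the Kaggle client; raises CalledProcessError if the download fails."""
    subprocess.run(build_download_command(competition_id, dest_dir), check=True)
```

For example, download_competition("severstal-steel-defect-detection", "data") would save the files into a folder named “data”, provided you have already accepted the rules of that competition.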