Introduction to Python

Machine Learning 1, Bachelor's Degree in Applied Statistics

Python programming language

Integrated Development Environments

Python can be run in batch mode, but there are also a number of excellent integrated development environments (IDEs) for it. Among the most used are:

  • JupyterLab / Jupyter Notebook
  • Spyder
  • PyCharm
  • RStudio

JupyterLab installation (Windows)

  • Download the latest Python release from the Microsoft Store

  • Install it the usual way

  • Open the Command Prompt

    • Make sure the Python package manager pip is installed and up to date:
python -m ensurepip --upgrade
  • Install JupyterLab with pip:
pip install jupyterlab
  • Add the following folder to the PATH system environment variable (adapt it to fit your particular installation):
C:\Users\dmori\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\Scripts

  • Run JupyterLab from the Command Prompt:
jupyter lab

A new tab or window of the default browser will open. If the browser shows an error, go back to the Command Prompt, copy the address that looks similar to the following, and paste it into the browser:

http://localhost:8888/lab?token=e1c50f3b897c86536a6ecd7b7dcb148b5de74d007e22d83f

JupyterLab installation (macOS)

  • Install Homebrew (https://brew.sh/)

  • Install Python and JupyterLab with Homebrew from the Terminal app:

brew install python
python -m ensurepip --upgrade
brew install jupyterlab

JupyterLab installation (Debian-based Linux)

  • Install Python and JupyterLab:
sudo apt install python3
sudo snap install jupyterlab-desktop --classic

JupyterLab

A Jupyter notebook is divided into individual, vertically arranged cells, which can be executed separately:

[Figure: JupyterLab screenshot]

Essential concepts

Python evolution

[Figure: Python timeline]

Getting help

Information on Python objects can be obtained quickly in an interactive environment:

help(len)
Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.
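
The text shown by help() comes from the object's docstring, so functions you write yourself can be documented the same way. A minimal sketch (the function mean_of is a hypothetical example, not part of the course material):

```python
def mean_of(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

# The docstring is stored on the function object...
doc = mean_of.__doc__

# ...and help() prints it together with the function signature
help(mean_of)
```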

Basic programming

Programs can be written very quickly; the following is a minimal example. You can save this command in a text file of your choice and run it directly on your system:

print("Hello there!")
Hello there!
  • The program is a single call to print() (a built-in function, not a keyword),

  • The function displays its argument (a string) on screen,

  • Arguments are passed to the function in parentheses,

  • A string must be wrapped in " " or ' ',

  • No semicolon is needed at the end of the statement.

Types of objects

  • Numbers (integers, floating-point numbers, and complex numbers)

  • Booleans

  • The “null” type, None (comparable to NA or NULL in some other languages)

  • Strings

  • Lists
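
As a minimal sketch (the values and variable names are illustrative), the built-in type() function reports which of these types a value has:

```python
# Illustrative values, one for each type listed above
examples = [42, 3.14, 2 + 3j, True, None, "hello", [1, 2, 3]]

# Collect the name of the type of each value
type_names = [type(value).__name__ for value in examples]
print(type_names)
# ['int', 'float', 'complex', 'bool', 'NoneType', 'str', 'list']
```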

Operations

  • Sum: x + y

  • Difference: x - y

  • Product: x * y

  • Quotient: x / y

  • Remainder: x % y

  • Power: x ** y

  • Absolute value: abs(x)
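
A short sketch of these operators on two illustrative integers (the variable names are arbitrary):

```python
x, y = 7, 2

total = x + y        # 9
difference = x - y   # 5
product = x * y      # 14
quotient = x / y     # 3.5 (the / operator always returns a float)
remainder = x % y    # 1
power = x ** y       # 49
absolute = abs(-x)   # 7
```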

Comparison

  • x == y

  • x != y

  • x > y / x >= y

  • x < y / x <= y

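  • And: and (use & for element-wise comparisons, e.g. on pandas columns)

  • Or: or (use | for element-wise comparisons, e.g. on pandas columns)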
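
A minimal sketch of how comparisons combine, assuming two illustrative integers and a small made-up pandas Series. Plain Python values are combined with and, or and not, while & and | work element by element on pandas objects and need parentheses around each comparison:

```python
import pandas as pd

x, y = 3, 5

both = (x < y) and (x != y)    # True
either = (x > y) or (x == 3)   # True
negated = not (x >= y)         # True

# Element-wise version on a made-up Series of ages
ages = pd.Series([22, 38, 54])
mask = (ages > 30) & (ages < 50)
print(mask.tolist())           # [False, True, False]
```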

Importing additional packages

import pandas as pd

Importing only a function from a package

from datetime import datetime
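
The two import styles differ in how the imported names are used afterwards. A brief sketch (the alias m and the date are arbitrary choices for illustration):

```python
import math as m                 # whole module, accessed through the alias m
from datetime import datetime    # a single name, used without a prefix

root = m.sqrt(16)                        # 4.0
some_day = datetime(2024, 9, 12)         # hypothetical date
year = some_day.year                     # 2024
```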

Operations (extension)

import math

math.factorial(6)
720

math.log(1)
0.0

math.exp(1)
2.718281828459045

Comments in Python (JupyterLab)

# This is
# a multiline comment

Declaring new variables

a = 2
vec = [0] * 1000  # List of 1000 zeros ([] * 1000 would stay empty)
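
Multiplying a list repeats its elements, so the starting list has to contain at least one element to obtain a list of a given size. A minimal sketch (variable names are illustrative):

```python
empty = [] * 1000                      # repeating an empty list gives an empty list
zeros = [0] * 1000                     # a list holding 1000 zeros
squares = [i ** 2 for i in range(5)]   # list comprehension

print(len(empty), len(zeros), squares)
# 0 1000 [0, 1, 4, 9, 16]
```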

Changing the working directory

import os

os.getcwd()
'/home/dmorina/Insync/dmorina@ub.edu/OneDrive Biz/Docència/UAB/2024-2025/Aprenentatge Automàtic 1 (GEA)/Pràctiques/Pràctica 1'

os.chdir('/home/dmorina/')
os.getcwd()
'/home/dmorina'

Defining (and using) new functions:

def newFunction(x, y):
  return x % y

newFunction(3, 2)
1

def newFunction2(x):
  if x > 5:
    return x+5
  elif x == 5:
    return x+10
  else:
    return x+100
newFunction2(3)
103
newFunction2(5)
15
newFunction2(20)
25

def newFunction3():
  for x in range(6):
    if x == 3: continue
    print(x)
  else:
    print("Finally finished!")
newFunction3()
0
1
2
4
5
Finally finished!

def newFunction4():
  for x in range(6):
    if x == 3: break
    print(x)
  else:
    print("Finally finished!")
newFunction4()
0
1
2
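
The two functions above differ only in using continue versus break: the else branch of a for loop runs only when the loop ends without reaching break. A minimal sketch of a typical use, a search loop (the function find_first_negative is a hypothetical example):

```python
def find_first_negative(values):
    """Return the first negative number in values, or None if there is none."""
    for value in values:
        if value < 0:
            found = value
            break          # leaves the loop and skips the else branch
    else:
        found = None       # runs only if the loop never reached break
    return found

result_a = find_first_negative([3, -1, 4, -5])   # -1
result_b = find_first_negative([3, 1, 4])        # None
```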

pandas import and export

Reading and basic work with data (pandas!)

import pandas as pd
import os

os.chdir("/home/dmorina/Insync/dmorina@ub.edu/OneDrive Biz/Docència/UAB/2024-2025/Aprenentatge Automàtic 1 (GEA)/Pràctiques/Pràctica 1/examples/")
newData = pd.read_csv("titanic.csv")
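
If titanic.csv is not at hand, the same call can be tried on an in-memory file. A minimal sketch, assuming a tiny made-up CSV text (the three rows and the name toy are illustrative and unrelated to the Titanic data):

```python
import io
import pandas as pd

# Made-up CSV content standing in for a file on disk
csv_text = """name,age,fare
Alice,30,7.25
Bob,,71.28
Carol,41,8.05
"""

# read_csv accepts a path or any file-like object
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)                  # (3, 3)
print(toy["age"].isna().sum())    # 1 (the empty field becomes NaN)
```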

newData.head(2)
   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch     Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0  A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0   PC 17599  71.2833    C85         C

newData.tail(5)
     PassengerId  Survived  Pclass                                      Name     Sex   Age  SibSp  Parch      Ticket   Fare  Cabin  Embarked
886          887         0       2                     Montvila, Rev. Juozas    male  27.0      0      0      211536  13.00    NaN         S
887          888         1       1              Graham, Miss. Margaret Edith  female  19.0      0      0      112053  30.00    B42         S
888          889         0       3  Johnston, Miss. Catherine Helen "Carrie"  female   NaN      1      2  W./C. 6607  23.45    NaN         S
889          890         1       1                     Behr, Mr. Karl Howell    male  26.0      0      0      111369  30.00   C148         C
890          891         0       3                       Dooley, Mr. Patrick    male  32.0      0      0      370376   7.75    NaN         Q

newData.shape
(891, 12)
len(newData)
891
newData.size
10692
newData.ndim
2
newData.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

newData.count()
PassengerId    891
Survived       891
Pclass         891
Name           891
Sex            891
Age            714
SibSp          891
Parch          891
Ticket         891
Fare           891
Cabin          204
Embarked       889
dtype: int64
newData['Age'].count()
np.int64(714)

newData.describe()
       PassengerId    Survived      Pclass         Age       SibSp       Parch        Fare
count   891.000000  891.000000  891.000000  714.000000  891.000000  891.000000  891.000000
mean    446.000000    0.383838    2.308642   29.699118    0.523008    0.381594   32.204208
std     257.353842    0.486592    0.836071   14.526497    1.102743    0.806057   49.693429
min       1.000000    0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%     223.500000    0.000000    2.000000   20.125000    0.000000    0.000000    7.910400
50%     446.000000    0.000000    3.000000   28.000000    0.000000    0.000000   14.454200
75%     668.500000    1.000000    3.000000   38.000000    1.000000    0.000000   31.000000
max     891.000000    1.000000    3.000000   80.000000    8.000000    6.000000  512.329200

newData.groupby(["Sex", "Pclass"])["Fare"].describe()
               count        mean        std      min       25%       50%         75%       max
Sex    Pclass
female 1        94.0  106.125798  74.259988  25.9292  57.24480  82.66455  134.500000  512.3292
       2        76.0   21.970121  10.891796  10.5000  13.00000  22.00000   26.062500   65.0000
       3       144.0   16.118810  11.690314   6.7500   7.85420  12.47500   20.221875   69.5500
male   1       122.0   67.226127  77.548021   0.0000  27.72810  41.26250   78.459375  512.3292
       2       108.0   19.741782  14.922235   0.0000  12.33125  13.00000   26.000000   73.5000
       3       347.0   12.661633  11.681696   0.0000   7.75000   7.92500   10.008300   69.5500

Selecting rows

newData.iloc[:3]
   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch            Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833    C85         C
2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250    NaN         S

Selecting rows (conditionally)

newData.query('Age>40 & Sex=="female"').head(2)
    PassengerId  Survived  Pclass                             Name     Sex   Age  SibSp  Parch  Ticket   Fare  Cabin  Embarked
11           12         1       1         Bonnell, Miss. Elizabeth  female  58.0      0      0  113783  26.55   C103         S
15           16         1       2  Hewlett, Mrs. (Mary D Kingcome)  female  55.0      0      0  248706  16.00    NaN         S

newData[(newData.Age > 40) & (newData.Sex == "female")].head(2)
    PassengerId  Survived  Pclass                             Name     Sex   Age  SibSp  Parch  Ticket   Fare  Cabin  Embarked
11           12         1       1         Bonnell, Miss. Elizabeth  female  58.0      0      0  113783  26.55   C103         S
15           16         1       2  Hewlett, Mrs. (Mary D Kingcome)  female  55.0      0      0  248706  16.00    NaN         S
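
Both forms of conditional selection return the same rows. A minimal sketch on made-up data (the DataFrame toy and its four rows are illustrative) that checks this, and shows that rows with a missing Age never satisfy the condition:

```python
import pandas as pd

# Made-up data, including one missing age
toy = pd.DataFrame({
    "Age": [22.0, 58.0, None, 55.0],
    "Sex": ["male", "female", "female", "male"],
})

by_query = toy.query('Age > 40 & Sex == "female"')
by_mask = toy[(toy.Age > 40) & (toy.Sex == "female")]

print(by_query.equals(by_mask))   # True
print(by_mask.index.tolist())     # [1]
```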

Selecting rows (randomly)

newData.sample(n=2)
     PassengerId  Survived  Pclass                          Name   Sex   Age  SibSp  Parch  Ticket  Fare  Cabin  Embarked
463          464         0       2  Milling, Mr. Jacob Christian  male  48.0      0      0  234360  13.0    NaN         S
808          809         0       2             Meyer, Mr. August  male  39.0      0      0  248723  13.0    NaN         S
newData.sample(frac=0.001)
     PassengerId  Survived  Pclass                           Name     Sex  Age  SibSp  Parch  Ticket     Fare  Cabin  Embarked
140          141         0       3  Boulos, Mrs. Joseph (Sultana)  female  NaN      0      2    2678  15.2458    NaN         C

Selecting columns

newData[['Age', 'Sex']].head(2)
    Age     Sex
0  22.0    male
1  38.0  female

newData.loc[:, 'Age':'Ticket'].head(2)
    Age  SibSp  Parch     Ticket
0  22.0      1      0  A/5 21171
1  38.0      1      0   PC 17599

Rename columns

newData.rename(columns={'Age': 'age'}).head(3)
   PassengerId  Survived  Pclass                                               Name     Sex   age  SibSp  Parch            Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833    C85         C
2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250    NaN         S

Drop columns

newData.drop(['Age', 'Sex'], axis=1).head(2)
   PassengerId  Survived  Pclass                                               Name  SibSp  Parch     Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris      1      0  A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...      1      0   PC 17599  71.2833    C85         C

Drop duplicates

newData.drop_duplicates().head(3)
   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch            Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833    C85         C
2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250    NaN         S

Create a new column

newData["AgeGroup"] = pd.cut(newData.Age, range(0, 105, 10), right=False)
newData[['Age', 'AgeGroup']].head(8)
    Age      AgeGroup
0  22.0  [20.0, 30.0)
1  38.0  [30.0, 40.0)
2  26.0  [20.0, 30.0)
3  35.0  [30.0, 40.0)
4  35.0  [30.0, 40.0)
5   NaN           NaN
6  54.0  [50.0, 60.0)
7   2.0   [0.0, 10.0)
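
The right=False argument decides which side of each interval is closed, and this matters for values that fall exactly on a bin edge. A minimal sketch on made-up ages (the names ages, bins, left_closed and right_closed are illustrative), comparing the bin number assigned to each value:

```python
import pandas as pd

ages = pd.Series([9, 10, 29.5, 30])
bins = range(0, 45, 10)   # edges 0, 10, 20, 30, 40

left_closed = pd.cut(ages, bins, right=False)   # intervals of the form [a, b)
right_closed = pd.cut(ages, bins)               # default: intervals (a, b]

# .cat.codes gives the position of the interval each value falls into
print(left_closed.cat.codes.tolist())    # [0, 1, 2, 3]
print(right_closed.cat.codes.tolist())   # [0, 0, 2, 2]
```

With right=False an age of exactly 10 belongs to the 10 to 20 group, whereas with the default it would be counted in the 0 to 10 group.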

Join DataFrames vertically

less50 = newData[newData.Age <= 50]
over50 = newData[newData.Age > 50]
total = pd.concat([less50, over50])
total.head(2)
   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch     Ticket     Fare  Cabin  Embarked  AgeGroup
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0  A/5 21171   7.2500    NaN         S  [20, 30)
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0   PC 17599  71.2833    C85         C  [30, 40)
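
A comparison with a missing value is always False, so passengers without a recorded Age end up in neither subset and are absent from the concatenated result. A minimal sketch on made-up data (the names toy, younger, older and combined are illustrative):

```python
import pandas as pd

# Made-up data with one missing age
toy = pd.DataFrame({"Age": [22.0, None, 54.0]})

younger = toy[toy.Age <= 50]   # NaN <= 50 evaluates to False
older = toy[toy.Age > 50]      # NaN > 50 evaluates to False as well

# ignore_index=True renumbers the rows of the stacked result
combined = pd.concat([younger, older], ignore_index=True)
print(len(toy), len(combined))   # 3 2
```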

Join DataFrames horizontally

df1 = pd.DataFrame({
    'A': [1,2,3,4,5],
    'B': [1,2,3,4,5]
})

df2 = pd.DataFrame({
    'C': [1,2,3,4,5],
    'D': [1,2,3,4,5]
})

df_concat = pd.concat([df1, df2], axis=1)
df_concat.head(2)
   A  B  C  D
0  1  1  1  1
1  2  2  2  2

Merge DataFrames

df1 = pd.DataFrame({
    'id': [1,2,3,4,5],
    'col1': [1,2,3,4,5]
})

df2 = pd.DataFrame({
    'id': [1,2,3,4,5],
    'col2': [6,7,8,9,10]
})

df_merge = df1.merge(df2, on='id')
df_merge.head(2)
   id  col1  col2
0   1     1     6
1   2     2     7
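
In the example above both tables contain exactly the same ids. When they do not, the how argument of merge controls which rows are kept. A minimal sketch on made-up tables (left and right are illustrative names):

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "col1": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "col2": ["x", "y", "z"]})

inner = left.merge(right, on="id")                   # default how="inner": ids in both tables
outer = left.merge(right, on="id", how="outer")      # ids from either table, NaN where missing
left_join = left.merge(right, on="id", how="left")   # every id of the left table

print(inner["id"].tolist())              # [2, 3]
print(sorted(outer["id"]))               # [1, 2, 3, 4]
print(left_join["col2"].isna().sum())    # 1 (id 1 has no match)
```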

Exporting data (pandas!)

import pandas as pd
newData.to_excel("titanic.xlsx", sheet_name="passengers", index=False)
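
Writing .xlsx files requires an Excel engine such as openpyxl to be installed. Exporting to CSV works with pandas alone; a minimal round-trip sketch on made-up data (the file name toy.csv and the DataFrame toy are hypothetical):

```python
import os
import tempfile
import pandas as pd

toy = pd.DataFrame({"id": [1, 2], "fare": [7.25, 71.28]})

# Write to a temporary folder so that no file is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy.csv")
    toy.to_csv(path, index=False)     # index=False omits the row labels
    restored = pd.read_csv(path)

print(restored.equals(toy))   # True
```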

Generating basic graphs with pandas

import matplotlib.pyplot as plt

newData.plot()
plt.show()

newData.plot.scatter(x="Age", y="Fare", alpha=0.5)
plt.show()

newData.plot.box(y="Fare")
plt.show()

newData.plot.hist(y="Fare")
plt.show()

newData2 = newData.groupby(['Sex','AgeGroup']).size()
newData2 = newData2.unstack()
newData2.plot.bar()
plt.show()

newData3 = newData.groupby(['AgeGroup']).size()
newData3.plot.pie(y="AgeGroup", title="Age group", legend=False,
                   autopct='%1.1f%%', 
                   shadow=True, startangle=0)
plt.show()

Remove objects

del newData, newData2, newData3

More information on pandas: https://pandas.pydata.org

Practice

Exercise 1

  • Download all CSV files from the practice/data folder and set the working directory accordingly

  • Import all the files into the JupyterLab session

Exercise 2

  • Combine all the files into a single DataFrame

  • Sort the resulting DataFrame by date (‘fecha Siniestro Acto’)

  • Rename the column ‘fecha Siniestro Acto’ to Date

Exercise 3

  • Group by week number and sum the values of ‘Unidades Acto’ (Hint: first use the to_datetime function from the pandas library to convert the column to datetime, then use the .dt.isocalendar().week accessor, also from pandas).
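
The hint can be sketched on made-up data as follows. The column names date and units and the four rows are placeholders, not the columns of the exercise files:

```python
import pandas as pd

# Made-up data; dates are stored as text, as they usually are after read_csv
toy = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-03", "2024-01-08", "2024-01-14"],
    "units": [5, 7, 2, 4],
})

toy["date"] = pd.to_datetime(toy["date"])          # text -> datetime
toy["week"] = toy["date"].dt.isocalendar().week    # ISO week number

weekly = toy.groupby("week")["units"].sum()
print(weekly.tolist())   # [12, 6]
```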

Exercise 4

  • Plot the evolution of the variable ‘Unidades Acto’.

Exercise 5

  • Export the final DataFrame to a CSV file