Aprenentatge automàtic 1 (Machine Learning 1), Grau d’Estadística Aplicada (Bachelor’s Degree in Applied Statistics)
Python can be used in batch mode, but there are also a number of excellent integrated development environments (IDEs) for Python; among the most widely used are:
Download the latest Python release from the Microsoft Store
Install it in the usual way
Open the Command Prompt
A new instance of the default browser will open. If it shows an error, go back to the Command Prompt, then copy the address that looks similar to the following and paste it into the browser:
Install Homebrew: https://brew.sh/
Install Python and JupyterLab with brew from the Terminal app:
A Jupyter notebook is divided into individual, vertically arranged cells, which can be executed separately:
JupyterLab screenshot
Python timeline
Information on Python objects can be obtained quickly in an interactive environment:
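A minimal sketch of three standard ways to inspect an object, using built-ins only (the list x is just an illustration). In JupyterLab, which runs on IPython, appending a question mark to a name (for example x.sort?) also displays its documentation.

```python
# Ways to inspect an object interactively
x = [3, 1, 2]
print(type(x))                                        # the object's type: <class 'list'>
print([m for m in dir(x) if not m.startswith("_")])   # its public attributes and methods
help(x.sort)                                          # the docstring of one of its methods
```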
Programs can be written very quickly. This is a very minimal example: you can save this command in a text file of your choice and run it directly on your system:
Only one function is used, print() (highlighted here like a keyword),
The function displays its argument (a string) on screen,
Arguments are passed to the function in parentheses,
A string must be wrapped in " " or ' ',
No semicolon is needed at the end.
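The original program is not shown, so the message below is only an illustration of the points listed: a single print() call, a string argument in parentheses, either quote style, and no trailing semicolon.

```python
# A minimal Python program: one call to print() with a string argument.
# Double and single quotes are equivalent; no semicolon ends the statement.
print("Hello, world!")
print('Hello, world!')
```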
Types of objects
Numbers (integers, floating-point numbers, and complex numbers)
Booleans
The “null” type, None (NA in some other languages)
Strings
Lists
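One illustrative value of each type listed above, with the name of its type as reported by Python:

```python
# One example of each basic object type
examples = [42, 3.14, 2 + 1j, True, None, "text", [1, "a", 2.5]]
for obj in examples:
    print(repr(obj), type(obj).__name__)
# int, float, complex, bool, NoneType, str, list
```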
Operations
Sum: x + y
Difference: x - y
Product: x * y
Quotient: x / y
Remainder: x % y
Power: x ** y
Absolute value: abs(x)
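The operations above applied to two small integers (the values 7 and 2 are arbitrary):

```python
x, y = 7, 2
print(x + y)    # 9
print(x - y)    # 5
print(x * y)    # 14
print(x / y)    # 3.5 (/ always returns a float)
print(x % y)    # 1
print(x ** y)   # 49
print(abs(-x))  # 7
```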
Comparison
x == y
x != y
x > y / x >= y
x < y / x <= y
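And: & (element-wise, as required in pandas/NumPy conditions; for plain Python booleans the keyword and is the usual choice)
Or: | (element-wise; the plain Python keyword is or)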
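A short sketch of the comparison and logical operators. Note that & and | bind more tightly than comparisons, so each comparison has to be wrapped in parentheses; this is the same pattern used later to filter pandas DataFrames.

```python
x, y = 3, 5
print(x == y, x != y)       # False True
print(x > y, x >= y)        # False False
print(x < y, x <= y)        # True True

# & and | bind more tightly than comparisons: parenthesize each condition
print((x > 1) & (y < 10))   # True
print((x > 4) | (y > 4))    # True

# With plain Python booleans the keywords and / or are more idiomatic
print(x > 1 and y < 10)     # True
```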
Importing additional packages
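A minimal sketch using the standard math module so that it runs without extra installs; the same syntax applies to third-party packages, for example the usual import pandas as pd.

```python
import math          # import the whole module; access its members as math.<name>
import math as m     # the same module under a shorter alias

print(math.pi)       # 3.141592653589793
print(m.sqrt(16))    # 4.0
```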
Importing only a function from a package
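With from ... import, only the named functions are brought into the session and can be called without the module prefix (math is again used as a stand-in package):

```python
from math import sqrt, log   # import only these two names

print(sqrt(2))   # no "math." prefix needed
print(log(1))    # 0.0 (natural logarithm)
```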
Operations (extension)
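The extended operations from the original slides are not reproduced here. As an assumption about what they cover, a few further arithmetic operations from the built-ins and the standard math module:

```python
import math

x, y = 7, 2
print(x // y)              # 3: floor (integer) division
print(divmod(x, y))        # (3, 1): quotient and remainder at once
print(round(3.14159, 2))   # 3.14
print(math.exp(1))         # e, about 2.718
print(math.log(100, 10))   # logarithm in base 10
print(math.factorial(5))   # 120
```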
Comments in Python (JupyterLab)
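In Python a comment starts with # and runs to the end of the line. In JupyterLab, Markdown cells are another way to put formatted explanations between code cells. A small illustration:

```python
# A comment runs from the hash sign to the end of the line and is ignored
total = 2 + 3  # a comment can also follow code on the same line

"""
A triple-quoted string on its own is not a true comment, but it is often
used for multi-line notes (and as docstrings inside functions).
"""
print(total)  # 5
```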
Declaring new variables
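Variables are created simply by assigning to a name; no type declaration is needed. The names below are illustrative:

```python
n = 10             # an integer
name = "Titanic"   # a string
a, b = 1, 2.5      # several variables in one statement
a, b = b, a        # swap the two values without a temporary variable
n = "ten"          # dynamic typing: a name can be rebound to another type
print(a, b, n)     # 2.5 1 ten
```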
Changing the working directory
'/home/dmorina/Insync/dmorina@ub.edu/OneDrive Biz/Docència/UAB/2024-2025/Aprenentatge Automàtic 1 (GEA)/Pràctiques/Pràctica 1'
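The path above looks like the output of asking for the current working directory. A minimal sketch with the standard os module; the system temporary folder stands in for your own course folder, which is an assumption of this example.

```python
import os
import tempfile

print(os.getcwd())               # current working directory

# Hypothetical target folder: replace with the folder that holds your data
target = tempfile.gettempdir()
os.chdir(target)                 # relative paths such as "data.csv" now resolve here
print(os.getcwd())
```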
Defining (and using) new functions:
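A minimal sketch of defining a function with def, a default argument and a docstring, and of calling it positionally and by keyword; the function name power is hypothetical. The last lines show lambda, used for small anonymous functions passed as arguments.

```python
def power(base, exponent=2):
    """Return base raised to exponent; exponent defaults to 2."""
    return base ** exponent

print(power(3))                   # 9 (default exponent)
print(power(2, 10))               # 1024
print(power(exponent=3, base=2))  # 8 (arguments passed by name)

# lambda: a one-expression function, here used as a sort key
by_length = sorted(["Braund", "Behr", "Heikkinen"], key=lambda s: len(s))
print(by_length)                  # ['Behr', 'Braund', 'Heikkinen']
```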
pandas import and export
Reading and basic work with data (pandas!)
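A minimal sketch of reading the data and taking a first look at it, assuming a CSV file (the actual filename is not shown, so pd.read_csv("<your file>.csv") is a placeholder). To run on its own, the sketch embeds a few rows copied from the tables in this section instead of reading a file; on the full 891-row dataset, calls like these give outputs of the kind shown below.

```python
import io
import pandas as pd

# Stand-in for a file on disk; in practice: df = pd.read_csv("<your file>.csv")
csv_text = '''PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
'''
df = pd.read_csv(io.StringIO(csv_text))

print(df.tail())       # last 5 rows
df.info()              # columns, non-null counts and dtypes
print(df.count())      # number of non-missing values per column
print(df.describe())   # summary statistics of the numeric columns
```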
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.00 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.00 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.45 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.00 | C148 | C |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.75 | NaN | Q |
Reading and basic work with data (pandas!)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
Reading and basic work with data (pandas!)
PassengerId 891
Survived 891
Pclass 891
Name 891
Sex 891
Age 714
SibSp 891
Parch 891
Ticket 891
Fare 891
Cabin 204
Embarked 889
dtype: int64
Reading and basic work with data (pandas!)
| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
Reading and basic work with data (pandas!)
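The grouped table below appears to summarize Fare by Sex and Pclass (its overall maximum, 512.3292, matches the Fare column of the summary statistics, and the counts add up to 891). A sketch of one way to produce such a table, on a small illustrative subset of rows from this section:

```python
import pandas as pd

df = pd.DataFrame({
    "Sex":    ["male", "female", "male", "female", "female", "male"],
    "Pclass": [3, 3, 2, 1, 3, 1],
    "Fare":   [7.25, 7.925, 13.0, 30.0, 23.45, 30.0],
})

# Summary statistics of Fare for every (Sex, Pclass) combination
summary = df.groupby(["Sex", "Pclass"])["Fare"].describe()
print(summary)
```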
| Sex | Pclass | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|
| female | 1 | 94.0 | 106.125798 | 74.259988 | 25.9292 | 57.24480 | 82.66455 | 134.500000 | 512.3292 |
| female | 2 | 76.0 | 21.970121 | 10.891796 | 10.5000 | 13.00000 | 22.00000 | 26.062500 | 65.0000 |
| female | 3 | 144.0 | 16.118810 | 11.690314 | 6.7500 | 7.85420 | 12.47500 | 20.221875 | 69.5500 |
| male | 1 | 122.0 | 67.226127 | 77.548021 | 0.0000 | 27.72810 | 41.26250 | 78.459375 | 512.3292 |
| male | 2 | 108.0 | 19.741782 | 14.922235 | 0.0000 | 12.33125 | 13.00000 | 26.000000 | 73.5000 |
| male | 3 | 347.0 | 12.661633 | 11.681696 | 0.0000 | 7.75000 | 7.92500 | 10.008300 | 69.5500 |
Selecting rows
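The exact call behind the three rows below is not shown; any of these standard selections returns them (small illustrative DataFrame). Note that iloc slices by position and excludes the end, while loc slices by index label and includes it.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina",
             "Montvila, Rev. Juozas", "Graham, Miss. Margaret Edith"],
    "Age":  [22.0, 26.0, 27.0, 19.0],
})

print(df.head(3))     # first 3 rows
print(df.iloc[0:3])   # rows by position (end excluded)
print(df.loc[0:2])    # rows by index label (end included)
```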
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
Selecting rows (conditionally)
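A sketch of conditional row selection with a boolean mask, on an illustrative subset. Several conditions are combined with & or |, each in parentheses; DataFrame.query offers an equivalent string syntax.

```python
import pandas as pd

df = pd.DataFrame({
    "Name":   ["Braund", "Heikkinen", "Montvila", "Graham", "Behr"],
    "Sex":    ["male", "female", "male", "female", "male"],
    "Pclass": [3, 3, 2, 1, 1],
    "Age":    [22.0, 26.0, 27.0, 19.0, 26.0],
})

over_25 = df[df.Age > 25]                                        # one condition
first_class_women = df[(df.Sex == "female") & (df.Pclass == 1)]  # combined conditions
same_with_query = df.query("Sex == 'female' and Pclass == 1")    # string syntax
print(first_class_women)
```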
Selecting rows (randomly)
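Random rows can be drawn with DataFrame.sample. In this sketch a DataFrame holding only passenger ids stands in for the data; the rows drawn will differ from those shown below, and fixing random_state makes the draw reproducible.

```python
import pandas as pd

df = pd.DataFrame({"PassengerId": list(range(1, 892))})

two_rows = df.sample(n=2, random_state=1)         # 2 random rows, reproducible
ten_percent = df.sample(frac=0.1, random_state=1) # a random 10% of the rows
print(two_rows)
```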
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 463 | 464 | 0 | 2 | Milling, Mr. Jacob Christian | male | 48.0 | 0 | 0 | 234360 | 13.0 | NaN | S |
| 808 | 809 | 0 | 2 | Meyer, Mr. August | male | 39.0 | 0 | 0 | 248723 | 13.0 | NaN | S |
Selecting columns
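Common ways of selecting columns, on an illustrative subset: a single name gives a Series, a list of names gives a DataFrame, and columns can also be chosen by dtype or by a label range.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
    "Sex":  ["male", "female"],
    "Age":  [22.0, 26.0],
    "Fare": [7.25, 7.925],
})

ages = df["Age"]                             # one column -> Series (df.Age also works)
subset = df[["Name", "Age"]]                 # list of columns -> DataFrame
numeric = df.select_dtypes(include="number") # columns chosen by dtype
block = df.loc[:, "Sex":"Fare"]              # label range of columns (end included)
```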
Rename columns
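The table below shows Age renamed to age. A sketch with DataFrame.rename, which takes a mapping from old to new names and returns a new DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris"], "Sex": ["male"], "Age": [22.0]})

renamed = df.rename(columns={"Age": "age"})   # df itself is left unchanged
print(renamed.columns.tolist())               # ['Name', 'Sex', 'age']
```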
| | PassengerId | Survived | Pclass | Name | Sex | age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
Drop columns
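A sketch of removing columns with DataFrame.drop; which columns the slides dropped is not shown, so Ticket and Cabin are only an example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name":   ["Braund, Mr. Owen Harris"],
    "Ticket": ["A/5 21171"],
    "Cabin":  [None],
    "Fare":   [7.25],
})

reduced = df.drop(columns=["Ticket", "Cabin"])  # new DataFrame without these columns
# Equivalent form: df.drop(["Ticket", "Cabin"], axis=1)
print(reduced.columns.tolist())                 # ['Name', 'Fare']
```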
Drop duplicates
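A sketch of DataFrame.drop_duplicates on a small hypothetical DataFrame with one repeated row. By default rows must match in every column; subset restricts the comparison and keep chooses which occurrence survives.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Braund", "Heikkinen", "Braund"],
                   "Age":  [22.0, 26.0, 22.0]})

unique_rows = df.drop_duplicates()                                # identical in all columns
unique_names = df.drop_duplicates(subset=["Name"], keep="first")  # compare Name only
print(unique_rows)
```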
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
Create a new column
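The AgeGroup column that appears later, with values such as [20, 30), looks like the result of binning Age into left-closed intervals. A sketch with pd.cut, assuming 10-year bins (the original bin edges are not shown), plus a column computed directly from another one:

```python
import pandas as pd

df = pd.DataFrame({"Age": [22.0, 38.0, 26.0, None]})

# Assumed bins [0, 10), [10, 20), ..., [80, 90); right=False closes them on the left
df["AgeGroup"] = pd.cut(df["Age"], bins=list(range(0, 100, 10)), right=False)
# A new column computed from an existing one
df["AgeMonths"] = df["Age"] * 12
print(df)
```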
Join DataFrames vertically
less50 = newData[newData.Age <= 50]
over50 = newData[newData.Age > 50]
total = pd.concat([less50, over50])
total.head(2)
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | AgeGroup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | [20, 30) |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | [30, 40) |
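One detail of the split by age: a missing Age (NaN) makes both Age <= 50 and Age > 50 false, so those rows end up in neither piece and total has fewer rows than newData. A small sketch with a hypothetical three-row newData:

```python
import pandas as pd

newData = pd.DataFrame({"Name": ["Braund", "Cumings", "Johnston"],
                        "Age":  [22.0, 38.0, None]})

less50 = newData[newData.Age <= 50]
over50 = newData[newData.Age > 50]
missing = newData[newData.Age.isna()]   # NaN is neither <= 50 nor > 50

total = pd.concat([less50, over50])
print(len(newData), len(total))         # 3 2: the missing-Age row was lost

total_all = pd.concat([less50, over50, missing]).sort_index()  # keeps every row
total_reset = pd.concat([less50, over50], ignore_index=True)   # fresh 0..n-1 index
```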
Join DataFrames horizontally
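Joining horizontally uses the same pd.concat with axis=1: the DataFrames are placed side by side and rows are aligned on the index. Illustrative data:

```python
import pandas as pd

names = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"]})
ages = pd.DataFrame({"Age": [22.0, 26.0]})

wide = pd.concat([names, ages], axis=1)   # columns appended, rows matched by index
print(wide)
```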
Merge DataFrames
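Unlike concatenation, pd.merge matches rows by the values of a key column, as a database join does. A sketch with two small hypothetical tables sharing PassengerId (the id 999 is invented so that one key is unmatched):

```python
import pandas as pd

passengers = pd.DataFrame({"PassengerId": [1, 3, 887],
                           "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina",
                                    "Montvila, Rev. Juozas"]})
outcomes = pd.DataFrame({"PassengerId": [1, 3, 999], "Survived": [0, 1, 0]})

inner = pd.merge(passengers, outcomes, on="PassengerId")             # ids present in both
left = pd.merge(passengers, outcomes, on="PassengerId", how="left")  # all passengers
print(left)   # Survived is NaN for id 887, which has no match
```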
Exporting data (pandas!)
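A sketch of exporting a DataFrame to CSV and reading it back; the output path in the temporary folder is hypothetical. index=False avoids writing the row index as an extra column.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
                   "Age":  [22.0, 26.0]})

path = os.path.join(tempfile.gettempdir(), "titanic_subset.csv")  # hypothetical file
df.to_csv(path, index=False)   # write without the row index
back = pd.read_csv(path)       # reading it back gives the same table
```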
Generating basic graphs with pandas
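pandas plotting is built on matplotlib, which must be installed. In JupyterLab the figures appear inline; this sketch selects the non-interactive Agg backend and saves to a hypothetical file so it also runs as a script. The values are rows from the tables in this section.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")          # non-interactive backend (not needed inside Jupyter)
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Age":    [22.0, 26.0, 27.0, 19.0, 26.0, 32.0],
    "Fare":   [7.25, 7.925, 13.0, 30.0, 30.0, 7.75],
    "Pclass": [3, 3, 2, 1, 1, 3],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
df["Age"].plot.hist(bins=5, ax=axes[0], title="Age")   # histogram of one column
df.plot.scatter(x="Age", y="Fare", ax=axes[1])         # two columns against each other
df["Pclass"].value_counts().sort_index().plot.bar(ax=axes[2], title="Passengers per class")

out_path = os.path.join(tempfile.gettempdir(), "titanic_plots.png")  # hypothetical file
fig.savefig(out_path)
```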
Remove objects
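Objects are removed from the session with del. In JupyterLab, the IPython magic %reset -f clears all user-defined variables at once. A minimal sketch:

```python
x = 5
y = [1, 2, 3]

del x                     # remove one name
del y                     # several names can also be removed together: del a, b
print("x" in globals())   # False
```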
More information on pandas: https://pandas.pydata.org
Download all the CSV files in the practice/data folder and set the proper working directory
Import all the files into the JupyterLab session
Combine all the files into a single DataFrame
Sort the resulting DataFrame by date (column ‘fecha Siniestro Acto’)
Rename the column ‘fecha Siniestro Acto’ to Date
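A possible sketch of the exercise steps, under assumptions worth checking against the real files: they are comma-separated, share the same columns, and store ‘fecha Siniestro Acto’ as day-first date strings. The function name combine_csv_folder is hypothetical; adjust the sep and date-parsing arguments to the actual data.

```python
import glob
import os

import pandas as pd


def combine_csv_folder(folder, date_col="fecha Siniestro Acto"):
    """Read every CSV in folder, stack them, sort by date and rename the date column."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    frames = [pd.read_csv(p) for p in paths]          # assumption: comma-separated
    combined = pd.concat(frames, ignore_index=True)   # one DataFrame with all rows
    # Assumption: dates such as 15/03/2020 (day first)
    combined[date_col] = pd.to_datetime(combined[date_col], dayfirst=True)
    combined = combined.sort_values(date_col).reset_index(drop=True)
    return combined.rename(columns={date_col: "Date"})


# Usage (hypothetical folder name): data = combine_csv_folder("data")
```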