Before anything else, you need to set Python up on your computer. The version of Python we will use is Anaconda Python - see https://www.anaconda.com/download/. Using instructions on this page, install Python (it’s free). You will want the Python 3.7 version (not 2.7). If you are offered a choice between 64-bit and 32-bit versions, unless your laptop is very old you’ll want 64-bit. The download is large (~614Mb) so it may take a while. It may also ask you if you want to install something called VS Code - you won’t need this, so I would suggest saying ‘no’.
There are a number of ways to actually run Python. Here, we will use Navigator to launch the Jupyter app. This is done by first starting Navigator, and from there launching Jupyter. To start up Navigator, follow the instructions here: https://docs.anaconda.com/anaconda/navigator/getting-started/#navigator-starting-navigator. When it has started, you will see a window similar to this:
Anaconda Navigator Window
If the JupyterLab window has a button that says Install click on this - it installs some software on to your machine. Follow any instructions you get. When installed the button will say Launch instead of Install. When it says Launch then click on it, to start up the JupyterLab app. This actually runs Python on a web page. First up, you may see a number of options - click on the Python 3 option in the Notebook section.
Next, you should see something like this:
JupyterLab Entry Screen
Although it is capable of quite a lot more than this, the console can function as a REPL (Read Evaluate Print Loop) interface - so you type a line in to the box at the top and press shift+enter, Python reads in the line and evaluates it as a Python expression, and then prints the result (or an error message) - it then loops back to read in the next entry - in this case by creating a new box to type Python code into. This is a good way to find out about Python expressions, as in the next few sections. However, before you begin, you may want to re-start Python in your own folder (the default one is not in a particularly helpful location). Firstly, close the Python tab. Then click on the folder icon to the left, and navigate to a folder you would like to work in. The ‘folder +’ icon at the top allows you to create a new folder if you wish. This will be called Untitled Folder but if you right click (or control click for Macs) a new menu comes up allowing you to rename it. If you make a new folder, navigate into it, then click on the ‘+’ option. This starts a new Python session. Click on ‘Python 3’ in Notebook and finally use File / Rename to rename the tab. Finally got there. I think this was designed by the same person who thought up the interfaces for 1980s video recorders and microwaves.
First off, Python can do simple arithmetic.
print(3 + 7)
## 10
print(4.5 * 6.1)
## 27.45
#
# A hashtag makes a comment - try some division
#
print(4 / 2)
## 2.0
print(5.7 / 8.8)
## 0.6477272727272727
Powers are possible, using **
print(2 ** 6)
## 64
print(2 ** 0.5)
## 1.4142135623730951
Self-test: what do you think %
does here?
print(8 % 3)
## 2
print(9 % 3)
## 0
print(17 % 8)
## 1
print(17 % 6)
## 5
print(17 % 4)
## 1
Up until now, most attention has been on representing information in Python, rather than doing very much with it. In this section Python programming will be introduced. From one viewpoint, Python programs can simply be lists of instructions such as those introduced above, stored in a text file and executed in sequence - although some more sophisticated ideas will need to be considered. However, rather than just typing things into Jupyter line by line, it is a good idea to create your code in a single chunk, and store the programs. You can do this in Jupyter. You need to press shift+enter to send the Python code to be run; just pressing enter on its own creates a new line of code in the box, without running it. As an example of multiple line entry, note that you can also store the results of calculations in variables:
x = 6
print(x)
## 6
print(x * 6)
## 36
y = 9 / 4
print(y)
## 2.25
Can you explain the following? What do you think //
does?
z = 9 // 4
print(z)
## 2
There are different data types in Python - the two you saw there were float and int types. To write an int in Python, simply write an integer without any decimal - eg 25 or -6. To write a float, enter a decimal (even if the value being written is a whole number) - eg 3.0 or -9.54. If you mix an int and a float in a calculation, the result is a float.
print(8 + 1.0)
## 9.0
If both numbers are floats, then unsurprisingly the result here is a float.
print(8.0 - 7.0)
## 1.0
If both numbers are ints then the result is an int.
print(8 - 12)
## -4
Note that this is not the case for division when the result of dividing one int into another might be a float:
print(23 / 8)
## 2.875
The answer to the simple ‘how many times does 8 go into 23’ version of division is provided by the //
operator:
print(23 // 8)
## 2
This time the result is an int
.
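The // and % operators fit together: for whole numbers, the quotient from // and the remainder from % rebuild the original number. The following is a small illustrative check; the variable names a, b, quotient and remainder are chosen here for the example and do not appear elsewhere in this tutorial.

```python
# Quotient and remainder for 23 divided by 8
a = 23
b = 8
quotient = a // b     # how many whole times b goes into a
remainder = a % b     # what is left over
print(quotient, remainder)
## 2 7
# Multiplying back and adding the remainder recovers a
print(quotient * b + remainder)
## 23
```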
Python also has functions - these are similar to those in R - for example
a_number = -7
another_number = 67
print(abs(a_number))
## 7
print(abs(another_number))
## 67
As well as int and float types, Python handles quite a few other data types. string is another commonly used one.
my_name = "Chris Brunsdon"
print(my_name)
## Chris Brunsdon
The operator +
also works for strings - it joins them together.
first_name = "Chris"
surname = "Brunsdon"
print(first_name + surname)
# Note that '+' doesn't insert spaces between strings
# unless you explicitly tell it to
## ChrisBrunsdon
print(first_name + " " + surname)
## Chris Brunsdon
The *
operator works when it has one term as a string and the other as an int, where it repeats the string argument n
times, if n
is the int:
print("Ha " * 3)
# Order of int and string doesn't matter
## Ha Ha Ha
underline = 20 * "-"
print(underline)
## --------------------
Note that the numerical term here must be an int - floats give an error. This makes sense, as you can’t repeat a string 3.4 times, for example. However, note that you get an error even for a float value that is a whole number, such as 4.0
.
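If the repeat count arrives as a float, one possible workaround is to convert it with int() first. The sketch below also uses try and except to catch the error so the rest of the code keeps running; that construct is not covered in this tutorial and is included only to show the failure safely. The names count and repeated are made up for the example.

```python
count = 4.0   # a float, even though it is a whole number

# Repeating a string by a float raises a TypeError
try:
    print("Ha " * count)
except TypeError as err:
    print("TypeError:", err)

# Converting the float to an int first makes the repetition work
repeated = "Ha " * int(count)
print(repeated)
## Ha Ha Ha Ha
```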
The %
operator also has a use if the first argument is a string. Here it represents a printing format. For example
print("%6.4f" % (1/3))
## 0.3333
Reformats the result of 1/3
as a floating point number (hence the f
) taking up 6 characters in total, with 4 digits after the decimal point. To find out more about possible formats, see this link: https://pyformat.info.
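Python offers other ways to write the same format, and you may meet them in other people's code. As a brief comparison (not a recommendation of one style over another), the sketch below formats 1/3 with the % operator, with the format method of strings, and with an f-string, which needs Python 3.6 or later. The variable names are illustrative.

```python
value = 1 / 3

old_style = "%6.4f" % value              # the % operator used above
with_format = "{:6.4f}".format(value)    # the string method 'format'
f_string = f"{value:6.4f}"               # an f-string (Python 3.6+)

print(old_style, with_format, f_string)
## 0.3333 0.3333 0.3333
```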
In most languages, other functions - such as log, sin and so on are provided. They are available in Python as well, but they are part of a library rather than in the core Python made available when you start up the REPL interpreter - as you did earlier on. Individual items in the library are called modules (or, for larger collections, packages; the two terms are used interchangeably below) and you can access them via the import statement. Functions called sin, cos and so on are available in a module called math - a number of constants are also provided, such as pi. Here is an example of their use:
import math
theta = math.pi / 3.0
print(math.cos(theta))
## 0.5000000000000001
print(math.sin(theta))
## 0.8660254037844386
Note that although the functions for sin, cos and so on are quite accurate, they aren’t perfect: floats are stored with a limited number of digits, so math.pi / 3.0 is only an approximation to the true angle, hence the slight error in cos(theta) here. It’s probably a good idea to round the number of decimals when printing out results like this - for example:
print('%7.4f' % math.cos(theta))
## 0.5000
From the example, you can see that functions imported from math
are named math.<fn>
where <fn>
is the function name. Using functions from packages generally works this way. However, sometimes if a lot of functions from a package are used this gets cumbersome - particularly if the package has a long name. One way to get round this is to use the import ... as variation:
import math as m
theta = m.pi / 3.0
print('%7.4f' % m.cos(theta))
## 0.5000
print('%7.4f' % m.sin(theta))
## 0.8660
Here we tell Python to import the math
package, but to refer to it as m
once it is imported. Another approach is to use the from ... import
variation. Here the functions to be used from the package are directly stated, and afterwards they may be referred to directly.
from math import sin, cos, pi
theta = pi / 3.0
print('%7.4f' % cos(theta))
## 0.5000
print('%7.4f' % sin(theta))
## 0.8660
However, the complication with the above approach is that there is nothing to stop several packages having functions with the same names - it then becomes hard to distinguish between which one is to be used. Thus the last approach, although the most convenient in some ways, is best confined to short snippets of Python.
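To make the name clash concrete, here is a small sketch using two standard library modules, math and cmath (the complex-number version of math), which both provide a function called sqrt. cmath is not used elsewhere in this tutorial; it is brought in only to illustrate the point.

```python
from math import sqrt
print(sqrt(4))
## 2.0

# Importing another 'sqrt' silently replaces the first one
from cmath import sqrt
print(sqrt(4))
## (2+0j)

# Keeping the module name in front avoids any ambiguity
import math
import cmath
print(math.sqrt(4), cmath.sqrt(4))
## 2.0 (2+0j)
```

Python gives no warning when the second import replaces the first, which is why the prefixed form is safer in longer programs.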
Until now, you have seen Python variables containing a single value - either float, int or string. Python also has variables that contain lists of values. Items in a list are separated by commas, and enclosed in square brackets:
ages = [52,21,43,23,19]
print(ages)
## [52, 21, 43, 23, 19]
A function that applies to lists is len
- returning the length of the list (ie the number of items):
print(len(ages))
## 5
You can also pick out individual items in the list like an array in R:
print(ages[1], ages[0])
## 21 52
However, unlike R (but like C++ and Java) the first element in the list is indexed at zero, not one. Thus, the last element in ages
is ages[4]
not ages[5]
- forgetting this is a common source of errors in Python, particularly if you are used to coding in R or FORTRAN.
You can also use negative numbers to select items relative to the end of the list: ages[-1]
refers to the last item in ages
, ages[-2]
the one before that, and so on:
print(ages[-1], ages[-3])
## 19 43
You can also pick out sub-lists using slicing - specifying a sequential list of indices:
print(ages[0:2])
## [52, 21]
Note that this picks out elements 0 and 1 - but not 2 - slicing operators do not include the final element. What this does mean is that, for example, ages[0:len(ages)]
selects the entire list ages
-
print(ages[0:len(ages)])
## [52, 21, 43, 23, 19]
However, a classic Python ‘gotcha’ is that ages[len(ages)] without slicing gives an error, as it refers to ages[5]. It is also possible to omit the expression before or after the : in a slice. If the left hand one is omitted the slice starts at element zero; if the right hand one is omitted, the slice runs to the end of the list and includes the last element.
print(ages[:2])
## [52, 21]
print(ages[1:])
## [21, 43, 23, 19]
print(ages[-2:])
## [23, 19]
print(ages[:-1])
## [52, 21, 43, 23]
Note the last example returns all of the elements in ages except the last one - the term to the right of the :
isn’t included, as with the ages[:2]
example. You can also mix positive and negative indexes in a slice:
# Print all elements except the first and last
print(ages[1:-1])
## [21, 43, 23]
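Slices and single indexes behave differently when they go past the end of a list. As an illustrative sketch: a slice is quietly trimmed to fit, whereas a single index raises an error. The try and except lines are not covered in this tutorial and are used here only to catch the error and print it.

```python
ages = [52, 21, 43, 23, 19]

# A slice that runs past the end is trimmed to what exists
print(ages[3:100])
## [23, 19]

# A single index past the end is an error
try:
    print(ages[5])
except IndexError as err:
    print("IndexError:", err)
## IndexError: list index out of range
```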
The expression []
refers to an empty list. Sometimes you can get this as a result of a slice in which the left hand term exceeds the right.
dead_list = ages[2:1]
print(dead_list)
## []
print(len(dead_list))
## 0
Empty lists are useful in other situations - as will be seen later.
Another function for lists is sorted
. This sorts the items in the list.
print(sorted(ages))
## [19, 21, 23, 43, 52]
You can apply slicing to results of functions, provided they are also lists. The following prints all values of ages
except the largest and smallest.
print(sorted(ages)[1:-1])
## [21, 23, 43]
It is possible for the individual elements of lists to be of different types.
mixed_up = ['Chris','Brunsdon',1,'Jan',2014]
print(mixed_up)
## ['Chris', 'Brunsdon', 1, 'Jan', 2014]
print(mixed_up[0:2])
## ['Chris', 'Brunsdon']
Interestingly, it is also possible to have other lists as elements of lists.
my_details = [['Chris','Brunsdon'],[1,'Jan',2014]]
print(my_details[0])
# Use successive indexing to access elements
# of lists inside other lists
## ['Chris', 'Brunsdon']
print(my_details[1][2])
## 2014
You can use this to represent contiguity between geographical zones - if the provinces of Ireland are indexed by the numbers 0 to 3 for Ulster, Connaught, Leinster, and Munster respectively then the province contiguities (ie the information as to which pairs of provinces share a boundary) can be represented by a list of lists:
province_nbrs = [[1,2],[0,2,3],[0,1,3],[1,2]]
print(province_nbrs[0])
## [1, 2]
The list of neighbours of province 0 (Ulster) is the first element of the list province_nbrs
and is itself a list - [1,2]
- meaning that the neighbouring provinces are Connaught and Leinster.
Python treats strings as lists of a kind - it is possible to access individual characters in a string via list item indexing.
name = 'Chris Brunsdon'
print(name[0])
## C
print(name[0:5])
## Chris
print(name[-1])
## n
Functions that work on lists often work on strings by treating them as a list of characters:
print(len(name))
## 14
print(sorted(name))
## [' ', 'B', 'C', 'd', 'h', 'i', 'n', 'n', 'o', 'r', 'r', 's', 's', 'u']
Note that in the sorted
example, the result is literally a list of characters - they are sorted, but not reconstructed into a string. As before, the slicing operator could be applied to the right of any expression resulting in a string - for example to pick out the initial for my first name:
print(my_details[0][0][0])
## C
and to create a list with both my initials:
my_inits = [my_details[0][0][0],my_details[0][1][0]]
print(my_inits)
## ['C', 'B']
As well as functions that return values as lists, there are also methods. Methods differ from functions in a number of ways - but in some ways are similar. If x is some kind of Python object, then a method is called by x.<meth>() where <meth> is the method name. For example, the append method adds a new item to the end of a list.
print(ages)
## [52, 21, 43, 23, 19]
ages.append(34)
print(ages)
## [52, 21, 43, 23, 19, 34]
A key point here is that append
modified the actual list. Whereas functions such as sorted
left the variable ages
unaltered, append actually changed it. This is not always the case with methods, but quite often it is. There is also a sort
method which sorts the items in a list, but actually changes the list, rather than providing a new list.
# Use the 'sort' method on 'ages'
ages.sort()
# Check 'ages' has actually been permanently altered
print(ages)
## [19, 21, 23, 34, 43, 52]
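The difference between the sorted function and the sort method can be seen side by side. The short sketch below uses a made-up list called scores; note in particular that the sort method hands back None, so writing scores = scores.sort() would lose the list.

```python
scores = [3, 1, 2]            # hypothetical example list

in_order = sorted(scores)     # a new list; 'scores' is untouched
print(in_order, scores)
## [1, 2, 3] [3, 1, 2]

returned = scores.sort()      # sorts 'scores' itself and returns None
print(returned, scores)
## None [1, 2, 3]
```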
Some methods also return values - pop
returns the value of a particular index in a list, but then removes that value.
# Pop the last value from 'ages'
oldest = ages.pop(-1)
print(oldest)
# Show that the last value has been removed
## 52
print(ages)
## [19, 21, 23, 34, 43]
Another useful method is insert - this places a new value inside an existing list before a specified position:
# Put the oldest value back in the 'ages' list, but just before position 2
ages.insert(2,oldest)
print(ages)
# Now add a new age at the beginning
## [19, 21, 52, 23, 34, 43]
ages.insert(0,63)
print(ages)
# Sort it to keep it in order
## [63, 19, 21, 52, 23, 34, 43]
ages.sort()
print(ages)
## [19, 21, 23, 34, 43, 52, 63]
Recall that +
was used for joining strings together. It can also be used for joining lists
list1 = ['Chris','Brunsdon']
list2 = ['NUIM',2014]
print(list1 + list2)
## ['Chris', 'Brunsdon', 'NUIM', 2014]
A dictionary
is similar to a list, but a key difference is that items are referred to by a name, rather than by location:
new_car = {'colour':'blue','cylinders':4, 'capacity':1200}
The variable new_car
is a kind of list, but each of the three items are referred to by names. The items in the dictionary are accessed in a similar way to lists, except that a string containing the name is used, rather than an int, as with lists.
print(new_car['colour'])
## blue
print(new_car['cylinders'])
## 4
These are useful data types when you wish to associate items of information with a list of people, places and so on. For example they can associate geographical data with the names of locations
population = {'Ulster':294803,'Connaught':542547,'Leinster':2504814,
'Munster':1246088}
print(population['Leinster'])
## 2504814
The two entries in each dictionary element (ie the look-up name and the associated value) are called the key and value respectively. As with lists, the value can take any form - including a list or another dictionary. For example, the contiguity information for provinces stored earlier as a list could also be stored as a dictionary:
neighbours = {'Ulster':['Connaught','Leinster'],
'Connaught':['Ulster','Leinster','Munster'],
'Leinster':['Ulster','Connaught','Munster'],
'Munster':['Connaught','Leinster']}
print(neighbours['Munster'])
## ['Connaught', 'Leinster']
As there is the idea of an empty list, there is also an empty dictionary, represented by {}
. This is useful, as new items in a dictionary can be created by statements of the form dict[key]=value
. If key
already exists in the dictionary the item will be overwritten, but if it isn’t, a new key/value pair is added. Thus, another way to enter the neighbours of the provinces in Ireland is
# Start with an empty dictionary
neighbours = {}
# Add entries one by one
neighbours['Ulster'] = ['Connaught','Leinster']
neighbours['Connaught'] = ['Ulster','Leinster','Munster']
neighbours['Leinster'] = ['Ulster','Connaught','Munster']
neighbours['Munster'] = ['Connaught','Leinster']
# Prove the dictionary works as before
print(neighbours['Munster'])
## ['Connaught', 'Leinster']
Two methods sometimes helpful for dictionaries are keys and values. The keys method extracts all of the key fields for a dictionary, and the values method extracts all of the values. In Python 3 both return list-like ‘view’ objects rather than true lists, as the output shows.
print(population.keys())
## dict_keys(['Ulster', 'Connaught', 'Leinster', 'Munster'])
print(population.values())
## dict_values([294803, 542547, 2504814, 1246088])
Note that the order of the values corresponds to the order of the keys. In Python 3.7 and later, both are extracted in the order in which the items were added to the dictionary; earlier versions of Python made no such guarantee. Even so, dictionaries are designed for looking items up by key: unlike lists, you cannot refer to an item by its position.
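If you need an ordinary list of keys, for example to pick one out by position, the result of keys can be passed to the built-in list function. The sketch below also shows that the object returned by keys stays linked to the dictionary, while the list is a separate copy. The entry 'Test' is a made-up key added purely for illustration.

```python
population = {'Ulster':294803,'Connaught':542547,'Leinster':2504814,
              'Munster':1246088}

keys_view = population.keys()   # stays linked to the dictionary
key_list = list(keys_view)      # an independent, ordinary list
print(key_list[0])
## Ulster

# Add a made-up entry: the view notices, the list copy does not
population['Test'] = 0
print(len(keys_view), len(key_list))
## 5 4
```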
You will have already encountered loops in other languages. In Python a basic for
loop looks like this:
for n in [2,4,6,9]:
    print(n)
## 2
## 4
## 6
## 9
The main ingredients are the looping variable n and the list to loop through - here [2,4,6,9]. Note that the ‘body’ of the loop is indented by a tab. Typing the tab is essential - it is actually part of Python’s syntax. In this case, the loop takes every value in this list and prints it out. Also note that this is two lines of Python, and that each line on its own is insufficient to create the loop. Both lines must be entered in a box in Jupyter (lines separated by ‘enter’) and then shift+enter to run the loop.
For each cycle of the loop, n
refers to the corresponding item in the list. Looping through lists is a useful tool if you want to add up their values. Consider the following code:
age_total = 0.0
for age in ages:
    age_total = age_total + age
print(age_total)
## 255.0
Note that the last line in the code has no indent (ie the first character is not tab) - this tells Python that the line is executed after the loop is completed - if it had been indented, Python would execute it on every cycle of the loop. This would result in all of the running totals being printed as well as the final result. To see this, add a tab to the last line and run the code again.
age_total = 0.0
for age in ages:
    age_total = age_total + age
    print(age_total)
## 19.0
## 40.0
## 63.0
## 97.0
## 140.0
## 192.0
## 255.0
The output now shows the running total as predicted. If you had wanted an average age rather than a total age, the code is relatively easy to modify - again do this by editing the code in your text editor, and run it.
age_total = 0.0
for age in ages:
    age_total = age_total + age
age_average = age_total / len(ages)
print(age_average)
## 36.42857142857143
Finally on this code snippet, a useful tool is the use of +=
- this operator adds the right hand value to the left hand variable, and stores it in that variable. For example, x = x + 1
can be replaced by x += 1
. The code now becomes
age_total = 0.0
for age in ages:
    age_total += age
age_average = age_total / len(ages)
print(age_average)
## 36.42857142857143
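Writing the loop out by hand is good practice, but for this particular job Python also has a built-in function called sum that adds up the items in a list. As a quick cross-check, assuming ages holds the seven values it has at this point in the tutorial:

```python
ages = [19, 21, 23, 34, 43, 52, 63]

# The loop version from above
age_total = 0.0
for age in ages:
    age_total += age

# The built-in 'sum' gives the same total in one step
print(sum(ages) / len(ages))
## 36.42857142857143
print(age_total == sum(ages))
## True
```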
Now you may see why the keys
method for dictionaries is useful - it provides a list of keys in a dictionary to loop through.
for province in neighbours.keys():
    print(province,"has", len(neighbours[province]), "bordering provinces")
## Ulster has 2 bordering provinces
## Connaught has 3 bordering provinces
## Leinster has 3 bordering provinces
## Munster has 2 bordering provinces
However, the output is rather messy. The % operator, seen earlier as a formatting tool, helps here. The expression fmt % x creates a string in which the variable x is formatted according to a specification in the string fmt - the range of possible formats is large, but for now note that the format '%10s' takes a string and pads it out with spaces to have a length of 10, if it is shorter than this beforehand. The code below adds a statement in the loop to do this.
for province in neighbours.keys():
    fmt_prov = '%10s' % province
    print(fmt_prov,"has", len(neighbours[province]), "bordering provinces")
##     Ulster has 2 bordering provinces
##  Connaught has 3 bordering provinces
##   Leinster has 3 bordering provinces
##    Munster has 2 bordering provinces
Also, it is possible to left-justify the province names, by replacing 10 with -10 in the format statement.
for province in neighbours.keys():
    fmt_prov = '%-10s' % province
    print(fmt_prov,"has", len(neighbours[province]), "bordering provinces")
## Ulster     has 2 bordering provinces
## Connaught  has 3 bordering provinces
## Leinster   has 3 bordering provinces
## Munster    has 2 bordering provinces
As well as the functions that are built in to Python (such as len
) it is possible to define your own functions. For example, to define a function to compute the average of a list of numbers, the following can be used:
def average(data_set):
    total = 0.0
    for item in data_set:
        total += item
    return total / len(data_set)
As with for loops the indents (via tabbing) are actually part of the syntax. The indented code under the def statement is part of the function - and when the indenting stops, the function definition is complete. Note also the loop inside the function. The loop body is doubly indented, because it sits inside both the for loop and the function definition.
If you enter this into a Jupyter box and send it to Python by hitting shift+enter, you have then added a new function to Python, called average. At this stage you will see no print-out because you have only defined the function - not used it. In the next box enter the following to test it out:
height = [160.0,163.0,157.0,171.0,168.0,176.0]
print(average(height))
## 165.83333333333334
You can see that average
now works like any other Python function. This could be made to look neater by formatting the result:
print('%8.2f' % average(height))
## 165.83
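Note that functions can return lists and dictionaries as well as single values. A related example is the built-in function range: range(n) returns a sequence of the numbers from 0 to n - 1. In Python 3 this is a special range object rather than an actual list, which is why printing it shows the range itself rather than the individual numbers.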
print(range(5))
## range(0, 5)
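To see the numbers that a range stands for, it can be passed to the built-in list function. The sketch below also shows two optional forms of range that are not otherwise used in this tutorial: one with a start value, and one with a start, a stop and a step.

```python
# Convert a range to a list to see its contents
print(list(range(5)))
## [0, 1, 2, 3, 4]

# Start at 2 and stop before 6
print(list(range(2, 6)))
## [2, 3, 4, 5]

# Start at 0, stop before 10, go up in steps of 3
print(list(range(0, 10, 3)))
## [0, 3, 6, 9]
```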
This is useful in basic loops that count through the value of some index:
for i in range(10):
    print(i, i*i)
## 0 0
## 1 1
## 2 4
## 3 9
## 4 16
## 5 25
## 6 36
## 7 49
## 8 64
## 9 81
Again, watch out for the zero indexing - people often expect the index to run from 1 to 10, not 0 to 9.
range
is also useful for loops that have to be run a fixed number of times, but without actually referring to the index variable. For example, to create a list with 10 entries, all equal to zero, use
l = []
for i in range(10):
    l += [0]
print(l)
## [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
The +=
here works in list mode, ie the statement is equivalent to l = l + [0]
, which appends a new element 0
to the existing list l
. Because the list is initially empty, doing this 10 times gives a list of 10 zeroes. The value of i
is not used in the loop, but because it loops over 10 values, the desired effect is achieved.
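For this specific task there is also a shortcut: the * operator repeats lists in the same way that it repeats strings. The sketch below compares it with the loop; the names zeros and looped are chosen just for this example.

```python
# Repeat a one-item list 10 times
zeros = [0] * 10
print(zeros)
## [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# The loop version gives the same result
looped = []
for i in range(10):
    looped += [0]
print(zeros == looped)
## True
```

The loop form remains the more general tool, since each pass can add a different value.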
Putting some of these ideas together, the function below returns a list of length n where each element in the list is twice the value of its predecessor. Enter the following into a new Jupyter box and run it.
def doubling_up(n):
    latest = 1
    result = []
    for i in range(n):
        result += [latest]
        latest *= 2
    return result
print(doubling_up(4))
## [1, 2, 4, 8]
You can now combine the two functions you have written:
print(average(doubling_up(12)))
## 341.25
When you create a function via def you often use variables inside the function. These are known as local variables. An interesting characteristic of these is that they only exist inside the function definition - so in the function doubling_up the variables latest and result do not exist once the function has been run. The other characteristic of local variables is that if in the main program there are variables with the same names, they won’t be altered when you call the function.
Hence:
latest = 'Hello'
print(doubling_up(6))
## [1, 2, 4, 8, 16, 32]
print(latest)
## Hello
You can also verify that print(result) leads to an error, as result is undefined outside of the function.
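The following sketch runs that check without stopping on the error. It repeats the definition of doubling_up so that it can be run on its own, and uses try and except (not covered in this tutorial) to catch the error and print it. The names values and result_exists are made up for the example, and the sketch assumes no variable called result has been created in the main program.

```python
def doubling_up(n):
    latest = 1
    result = []
    for i in range(n):
        result += [latest]
        latest *= 2
    return result

latest = 'Hello'
values = doubling_up(3)
print(values, latest)
## [1, 2, 4] Hello

# 'result' only ever existed inside the function
try:
    print(result)
    result_exists = True
except NameError as err:
    print("NameError:", err)
    result_exists = False
## NameError: name 'result' is not defined
```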
Python also supports conditional statements - these are sections of code that are only run if some condition is true or false. To begin with, note that Python can also evaluate logical expressions:
x = 6
y = 9
print(x < 8)
## True
print(x == 6)
## True
print(x >= y)
## False
These are expressions that have the value True
or False
depending on the truth of the statement. The general comparison operators are:
Operator | Meaning |
---|---|
== | Equal to |
!= | Not equal to |
< | Less than |
> | Greater than |
<= | Less than or equal to |
>= | Greater than or equal to |
Note the difference between =
and ==
. x == 6
is an expression having the value True
or False
depending on whether x
has the value 6
or not, but x = 6
assigns the value 6
to x
, overwriting any previous value.
In addition, these can be combined using not
, and
and or
. For example
print(not x < 8)
## False
print(x == 6 or y == 5)
## True
print(x > 5 or y > 15)
## True
Another useful operator is in
:
print(x in [5,7,9])
## False
print(x in range(12))
## True
These can be used in conjunction with the if
statement - this works using the tabbing approach, in the same way as def
and for
:
z = 4
if z < 8 :
    print('z is less than 8')
    print('So it must be pretty small...')
## z is less than 8
## So it must be pretty small...
The lines that are indented with tabs after the if
statement are only executed if the logical expression is true. Once the tabbing stops, the lines are executed regardless of the test condition:
if z < 8 :
    print('z is less than 8')
    print('So it must be pretty small...')
## z is less than 8
## So it must be pretty small...
print('This gets printed anyway')
## This gets printed anyway
z = 12
if z < 8 :
    print('z is less than 8')
    print('So it must be pretty small...')
print('This gets printed anyway')
## This gets printed anyway
There is also an else
statement - this specifies code to be executed if the test in the if
statement isn’t true. Its use is demonstrated here:
z = 12
if z < 8 :
    print('z is less than 8')
    print('So it must be pretty small...')
else:
    print('z is at least 8')
    print('So it is fairly big')
## z is at least 8
## So it is fairly big
print('This gets printed anyway')
## This gets printed anyway
Once again, the code associated with the else
statement is indented with a tab. Now set z
to some value less than 8
and try running the code above again.
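The if statement, the in operator and for loops also work well with the dictionaries from earlier. As an illustrative sketch, the function below checks that the province contiguity information is consistent: if A lists B as a neighbour, then B should list A. The function name is_symmetric and the deliberately faulty dictionary broken are invented for this example.

```python
neighbours = {'Ulster':['Connaught','Leinster'],
              'Connaught':['Ulster','Leinster','Munster'],
              'Leinster':['Ulster','Connaught','Munster'],
              'Munster':['Connaught','Leinster']}

def is_symmetric(nbrs):
    # Look at every (place, neighbour) pair in turn
    for place in nbrs.keys():
        for other in nbrs[place]:
            # The neighbour must list 'place' in return
            if not (place in nbrs[other]):
                return False
    return True

print(is_symmetric(neighbours))
## True

# A made-up faulty example: B does not list A
broken = {'A':['B'], 'B':[]}
print(is_symmetric(broken))
## False
```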
As before, you can incorporate all of the ideas together. For example, define a function to compute factorials. The factorial of a number n
is defined as n * (n -1) * (n - 2) * ... * 2 * 1
, so factorial(4)
is 4 * 3 * 2 * 1 = 24
. An exception is that the factorial of zero is one. A function to compute factorials is
def factorial(n):
    if n == 0:
        return 1
    else:
        result = 1
        for i in range(n):
            result *= i + 1
        return result
If you define this in a Jupyter box, you can then try it out.
print(factorial(5))
## 120
print(factorial(0))
## 1
There are a few things to note about the function definition. Firstly, note the multiple tab nesting - there is a for loop inside an else statement inside a def of a new function. Another thing to note is the result *= i + 1 statement - this keeps a running result of the multiplications, but because range gives values going from 0 to n - 1, it is necessary to use i + 1 as the multiplier. As a self test question, what would happen if you used result *= i instead?
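One way to gain confidence in a function you have written is to compare it with an existing one. The math module in the standard library has its own factorial function, so the sketch below (which repeats the definition above so that it runs on its own) checks that the two agree for a handful of values. The list of test values is arbitrary.

```python
import math

def factorial(n):
    if n == 0:
        return 1
    else:
        result = 1
        for i in range(n):
            result *= i + 1
        return result

# Compare with the library version for a few values
for n in [0, 1, 5, 20]:
    print(n, factorial(n) == math.factorial(n))
## 0 True
## 1 True
## 5 True
## 20 True
```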
Next, an interesting aside - factorials can get very large quite rapidly - for example the factorial of 20 is 2,432,902,008,176,640,000. Try to compute the factorial of 50:
print(factorial(50))
## 30414093201713378043612608166064768844377641568960512000000000000
In Python 3, the int type can hold integers of arbitrary length, so a result like this is stored exactly rather than overflowing. (Python 2 had a separate data type called long for this: when an integer calculation got too large for a standard machine integer, the result was converted to a long.) Here is a more extreme example
print(factorial(1000))
## 40238726007709377354370243392300398571937486421071463254379991042993851239862902059204420848696940480047998861019719605863166687299480855890132382966994459099742450408707375991882362772718873251977950595099527612087497546249704360141827809464649629105639388743788648733711918104582578364784997701247663288983595573543251318532395846307555740911426241747434934755342864657661166779739666882029120737914385371958824980812686783837455973174613608537953452422158659320192809087829730843139284440328123155861103697680135730421616874760967587134831202547858932076716913244842623613141250878020800026168315102734182797770478463586817016436502415369139828126481021309276124489635992870511496497541990934222156683257208082133318611681155361583654698404670897560290095053761647584772842188967964624494516076535340819890138544248798495995331910172335555660213945039973628075013783761530712776192684903435262520001588853514733161170210396817592151090778801939317811419454525722386554146106289218796022383897147608850627686296714667469756291123408243920816015378088989396451826324367161676217916890977991190375403127462228998800519544441428201218736174599264295658174662830295557029902432415318161721046583203678690611726015878352075151628422554026517048330422614397428693306169089796848259012545832716822645806652676995865268227280707578139185817888965220816434834482599326604336766017699961283186078838615027946595513115655203609398818061213855860030143569452722420634463179746059468257310379008402443243846565724501440282188525247093519062092902313649327349756551395872055965422874977401141334696271542284586237738753823048386568897646192738381490014076731044664025989949022222176590433990188601856652648506179970235619389701786004081188972991831102117122984590164192106888438712185564612496079872290851929681937238864261483965738229112312502418664935314397013742853192664987533721894069428143411852015801412334482801505139969429015348307764456909907315243327828826986460278986432113908350621709500259738986
3554277196742822248757586765752344220207573630569498825087968928162753848863396909959826280956121450994871701244516461260379029309120889086942028510640182154399457156805941872748998094254742173582401063677404595741785160829230135358081840096996372524230560855903700624271243416909004153690105933983835777939410970027753472000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Factorials are not defined for negative numbers. However the current function does not check for this -
print(factorial(-4))
## 1
so the answer does not make sense. It might be better to modify the function to test whether the number is negative and, instead of returning a numerical result, return the string 'Undefined'. One way to do this is to test whether the number is negative, return 'Undefined' if that is true, and then put the existing code in an else clause.
def factorial(n):
    if n < 0:
        return 'Undefined'
    else:
        if n == 0:
            return 1
        else:
            result = 1
            for i in range(n):
                result *= i + 1
            return result
It is possible to click on a Jupyter box you have already entered, and edit the code. When you have done that, pressing shift+enter re-submits the function to Python. Doing this to the factorial function, so the edited version is the one above, results in new behaviour:
print(factorial(-4))
## Undefined
print(factorial(4))
## 24
Note that the code above is checking for three statuses of n - either n < 0, or n == 0, or n > 0. The above approach deals with this, but requires that you nest several if statements. A shorthand version uses elif - the template here is
1. if first condition to test
2. ‘tabbed in’ code to execute if the above is true
3. elif next condition (tested only if the first condition is not true)
4. ‘tabbed in’ code to execute (if the above condition is true)
5. repeat steps 3 and 4 for each further condition (each is tested only if no previous condition is true)
6. else - do this if none of the above conditions are true - this is the catch-all
7. ‘tabbed in’ code to execute
Steps 6 and 7 can be omitted if no catch-all code is required.
def factorial(n):
    if n < 0:
        return 'Undefined'
    elif n == 0:
        return 1
    else:
        result = 1
        for i in range(n):
            result *= i + 1
        return result
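The same if / elif / else template fits any test with several mutually exclusive cases. As a small illustrative sketch (the function name describe_sign is made up for this example and is not used elsewhere in these notes), here is the three-way test on n written out on its own:

```python
def describe_sign(n):
    # Step 1: the first condition to test
    if n < 0:
        return 'negative'
    # Step 3: only tested if the first condition was not true
    elif n == 0:
        return 'zero'
    # Step 6: the catch-all, reached only if no condition above was true
    else:
        return 'positive'

print(describe_sign(-4))  # prints negative
print(describe_sign(0))   # prints zero
print(describe_sign(7))   # prints positive
```

Because each condition is only tested when all the earlier ones have failed, the order in which the conditions are written can matter.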
The factorial function now checks for negative numbers - but an alternative to returning a value if one is found is to cause an error to be raised. Python has its own errors, but it is also possible to create new ones, via the raise statement - as in this code:
def factorial(n):
    if n < 0:
        raise Exception('Factorial not defined for negative integers')
    elif n == 0:
        return 1
    else:
        result = 1
        for i in range(n):
            result *= i + 1
        return result
If you edit the function again then you can test it out:
print(factorial(-6))
You will see an error of the form
Exception: Factorial not defined for negative integers
returned. This functions in the same way as a Python built-in error - but is more helpful in identifying the problem, since it relates directly to the function you are defining.
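An error raised in this way stops the code that called the function, unless that code chooses to deal with it. The sketch below goes slightly beyond what these notes have covered so far: it assumes the factorial function defined above, and uses Python's try / except statement to catch the error and print its message rather than halting.

```python
def factorial(n):
    if n < 0:
        raise Exception('Factorial not defined for negative integers')
    elif n == 0:
        return 1
    else:
        result = 1
        for i in range(n):
            result *= i + 1
        return result

try:
    # This call raises the error, so the print itself never happens
    print(factorial(-6))
except Exception as err:
    # err holds the error object; printing it shows the message given to raise
    print('Could not compute factorial:', err)
```

This is only one way of responding to the error. In many cases it is better to leave the error uncaught, so that the problem is noticed straight away.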
The while loop is another kind of loop, making use of a logical expression. The number of times a for loop cycles is determined when the loop is started - it is just the length of the list in the expression for i in list :. A while loop begins with a logical expression and loops as long as the expression is true. In this case, the number of times the loop cycles is not known in advance. Below, a while loop is used to construct a ‘doubling up’ sequence, as before, but this time instead of going for a fixed length it carries on until the value exceeds 1000 -
latest = 1
result = []
while latest <= 1000:
    result += [latest]
    latest *= 2
print(result)
## [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
This can also be used in functions - again note the use of tabbing. In this code block, the function double_up_until is defined, and then also run.
def double_up_until(n_max):
    latest = 1
    result = []
    while latest <= n_max:
        result += [latest]
        latest *= 2
    return result
print(double_up_until(10000))
## [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192]
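A while loop is also useful when the thing you want to know is how many times the loop ran. As a sketch along the same lines (doublings_needed is a made-up name for this example), the function below counts the doublings rather than storing them in a list:

```python
def doublings_needed(n_max):
    latest = 1
    count = 0
    # Keep doubling until the value exceeds n_max, counting each doubling
    while latest <= n_max:
        latest *= 2
        count += 1
    return count

print(doublings_needed(1000))   # prints 10
print(doublings_needed(10000))  # prints 14
```

The counts match the lengths of the lists printed above, since one value is added to the list each time the loop cycles.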
When you have finished the exercise (or at any stage during the exercise) it is possible to save your work. You do this by clicking on File and then Save Notebook as… from the menu. Choose a suitable name for your notebook (it should have the file ending .ipynb) and click Save. I suggest something like pythonweek1.ipynb as a good name. When you come back to this, click on the folder icon, and select the file you saved. This loads all of the commands you entered, but at this stage they won’t have been run. To run all of the Jupyter boxes in the same order you put them in, click on Run and then Run all cells. You can now see everything you already entered, and any printouts they produced. Also, any variables you created will exist for use in future code you put in.
If you have got this far you will have a reasonable grasp of the key ideas of programming in Python. To finish, here are a few exercises that you can use to practice your programming skills for next week. Also, to learn more Python ideas, try visiting .
Write a Python function to add up all of the numbers from 1 to n.
Euclid’s algorithm to find the greatest common divisor (GCD) of two integers is one of the oldest documented algorithms in the world, dating back to around 300BC. The GCD is the largest number that divides exactly into the two numbers supplied - so for example the GCD of 18 and 15 is 3. The algorithm can be described as follows:
1. Start with two integers a and b - the aim is to find their GCD
2. While b is greater than zero, repeat the following two steps:
   - Replace b with the remainder when a is divided by b
   - Replace a with the old value of b
3. When b is zero, a is the GCD
Write a Python function, gcd(a,b), that returns the GCD of a and b.
actually plus one, see above.