Course Description
NumPy is an essential Python library. TensorFlow and scikit-learn use NumPy arrays as inputs, and pandas and Matplotlib are built on top of NumPy. In this Introduction to NumPy course, you’ll become a master wrangler of NumPy’s core object: arrays! Using data from New York City’s tree census, you’ll create, sort, filter, and update arrays. You’ll discover why NumPy is so efficient and use broadcasting and vectorization to make your NumPy code even faster. By the end of the course, you’ll be using 3D arrays to alter a Claude Monet painting, and you’ll understand why such array alterations are essential tools for machine learning.
Meet the incredible NumPy array! Learn how to create and change array shapes to suit your needs. Finally, discover NumPy’s many data types and how they contribute to speedy array operations.
Once you’re comfortable with NumPy, you’ll find yourself converting Python lists into NumPy arrays all the time for increased speed and access to NumPy’s excellent array methods.
sudoku_list is a Python list containing a sudoku game:
[[0, 0, 4, 3, 0, 0, 2, 0, 9],
[0, 0, 5, 0, 0, 9, 0, 0, 1],
[0, 7, 0, 0, 6, 0, 0, 4, 3],
[0, 0, 6, 0, 0, 2, 0, 8, 7],
[1, 9, 0, 0, 0, 7, 4, 0, 0],
[0, 5, 0, 0, 8, 3, 0, 0, 0],
[6, 0, 0, 0, 0, 0, 1, 0, 5],
[0, 0, 3, 5, 0, 8, 6, 9, 0],
[0, 4, 2, 9, 1, 0, 3, 0, 0]]
You’re going to change sudoku_list into a NumPy array so you can practice with it in later lessons, for example by creating a 4D array of sudoku games along with their solutions!
Convert sudoku_list into a NumPy array called sudoku_array. Print the type() of sudoku_array to check that your code has worked properly.
# edited/added
import numpy as np
sudoku_list = np.load('sudoku_game.npy')
# Import NumPy
import numpy as np
# Convert sudoku_list into an array
sudoku_array = np.array(sudoku_list)
# Print the type of sudoku_array
print(type(sudoku_array))
It can be helpful to know how to create quick NumPy arrays from scratch in order to test your code. For example, when you are doing math with large multi-dimensional arrays, it’s nice to check whether the math works as expected on small test arrays before applying your code to the larger arrays. NumPy has many options for creating smaller synthetic arrays.
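For instance, a quick sanity check might look like the following sketch (the tiny array and the doubling operation are purely illustrative and not part of the exercise):
import numpy as np
# Build a small test array whose expected result is easy to verify by hand
test_array = np.array([[1, 2], [3, 4]])
# Apply the operation you eventually plan to run on a much larger array
doubled = test_array * 2
# Confirm the math behaves as expected before scaling up
print(np.array_equal(doubled, np.array([[2, 4], [6, 8]])))  # prints True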
With this in mind, it’s time for you to create some arrays from scratch! numpy is imported for you as np.
Create an array of zeros called zero_array, which has two rows and four columns. Create an array of random floats between 0 and 1 called random_array, which has three rows and six columns.
# Create an array of zeros which has four columns and two rows
zero_array = np.zeros((2, 4))
print(zero_array)
# Create an array of random floats which has six columns and three rows
random_array = np.random.random((3, 6))
print(random_array)
np.arange() has especially useful applications in graphing. Your task is to create a scatter plot with the values from doubling_array on the y-axis.
doubling_array = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
Recall that a scatter plot can be created using the following code:
plt.scatter(x_values, y_values)
plt.show()
With doubling_array on the y-axis, you now need values for the x-axis, which you can create with np.arange()!
numpy is loaded for you as np, and matplotlib.pyplot is imported as plt.
Using np.arange(), create a 1D array called one_to_ten which holds all integers from one to ten (inclusive). Create a scatter plot with doubling_array as the y values and one_to_ten as the x values.
# edited/added
from matplotlib import pyplot as plt
doubling_array = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
# Create an array of integers from one to ten
one_to_ten = np.arange(1, 11)
# Create your scatterplot
plt.scatter(one_to_ten, doubling_array)
plt.show()
In the first lesson, you created a sudoku_game two-dimensional NumPy array. Perhaps you have hundreds of sudoku game arrays, and you’d like to save the solution for this one, sudoku_solution, as part of the same array as its corresponding game in order to organize your sudoku data better. You could accomplish this by stacking the two 2D arrays on top of each other to create a 3D array.
numpy is loaded as np, and the sudoku_game and sudoku_solution arrays are available.
Create a 3D array called game_and_solution by stacking the two 2D arrays, sudoku_game and sudoku_solution, on top of one another; in the final array, sudoku_game should appear before sudoku_solution. Print game_and_solution.
# edited/added
sudoku_solution = np.load('sudoku_solution.npy')
sudoku_list = np.load('sudoku_game.npy')
sudoku_game = np.array(sudoku_list)
# Create the game_and_solution 3D array
game_and_solution = np.array([sudoku_game, sudoku_solution])
# Print game_and_solution
print(game_and_solution)
Printing arrays is a good way to check code output for small arrays like game_and_solution, but it becomes unwieldy when dealing with bigger arrays and those with higher dimensions. Another important check is to look at the array’s .shape.
Now, you’ll create a 4D array that contains two sudoku games and their solutions. numpy is loaded as np. The game_and_solution 3D array you created in the previous example is available, along with new_sudoku_game and new_sudoku_solution.
Create another 3D array called new_game_and_solution with a different 2D game and 2D solution pair: new_sudoku_game and new_sudoku_solution. new_sudoku_game should appear before new_sudoku_solution. Then create a 4D array called games_and_solutions by making an array out of the two 3D arrays, game_and_solution and new_game_and_solution, in that order. Print the shape of games_and_solutions.
# edited/added
new_sudoku_game = np.load('new_sudoku_game.npy')
new_sudoku_solution = np.load('new_sudoku_solution.npy')
game_and_solution = np.load('game_and_solution.npy')
# Create a second 3D array of another game and its solution
new_game_and_solution = np.array([new_sudoku_game, new_sudoku_solution])
# Create a 4D array of both game and solution 3D arrays
games_and_solutions = np.array([game_and_solution, new_game_and_solution])
# Print the shape of your 4D array
print(games_and_solutions.shape)
You’ve learned to change not only array shape but also the number of dimensions that an array has. To test these skills, you’ll change sudoku_game from a 2D array to a 1D array and back again. Can we trust NumPy to keep the array elements in the same order after being flattened and reshaped? Time to find out.
numpy is imported as np, and sudoku_game is loaded for you.
Flatten sudoku_game so that it is a 1D array, and save it as flattened_game. Print the .shape of flattened_game. Then reshape flattened_game back to its original shape of nine rows and nine columns; save the new array as reshaped_game.
# edited/added
sudoku_game = np.load('sudoku_game_new.npy')
# Flatten sudoku_game
flattened_game = sudoku_game.flatten()
# Print the shape of flattened_game
print(flattened_game.shape)
# Reshape flattened_game back to a nine by nine array
reshaped_game = flattened_game.reshape((9, 9))
# Print sudoku_game and reshaped_game
print(sudoku_game)
print(reshaped_game)
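As an extra check (not part of the original exercise), you can confirm that flattening and reshaping preserved the element order by comparing the two arrays directly:
# Verify that no elements were reordered by the flatten/reshape round trip
print(np.array_equal(sudoku_game, reshaped_game))  # prints True if the order is preserved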
One way to control the data type of a NumPy array is to declare it when the array is created using the dtype keyword argument. Take a look at the data type NumPy uses by default when creating an array with np.zeros(). Could it be updated?
numpy is loaded as np.
Using np.zeros(), create an array of zeros that has three rows and two columns; call it zero_array. Print the data type of zero_array. Then create a new array called zero_int_array, which will also have three rows and two columns, but whose data type should be np.int32. Print the data type of zero_int_array.
# Create an array of zeros with three rows and two columns
zero_array = np.zeros((3, 2))
# Print the data type of zero_array
print(zero_array.dtype)
# Create a new array of int32 zeros with three rows and two columns
zero_int_array = np.zeros((3, 2), dtype=np.int32)
# Print the data type of zero_int_array
print(zero_int_array.dtype)
Anticipating what data type an array will have is very important since some NumPy functionality only works with certain data types. Let’s see what you’ve got.
np.array([78.988, "NumPy", True])
np.array([9, 1.12, True]).astype("<U5")
np.array([34.62, 70.13, 9]).astype(np.int64)
np.array([45.67, True], dtype=np.int8)
np.array([[6, 15.7], [True, False]])
np.random.random((4, 5))
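If you want to verify your answers empirically, a minimal sketch like the one below prints the data type each expression above produces; the exact dtype names (for example, the width of the string dtype) are an assumption and can vary slightly between NumPy versions.
import numpy as np
# Print the resulting dtype of each expression from the list above
expressions = [
    np.array([78.988, "NumPy", True]),             # mixed types are coerced to strings
    np.array([9, 1.12, True]).astype("<U5"),       # floats cast to fixed-width strings
    np.array([34.62, 70.13, 9]).astype(np.int64),  # floats truncated to 64-bit integers
    np.array([45.67, True], dtype=np.int8),        # dtype argument forces 8-bit integers
    np.array([[6, 15.7], [True, False]]),          # numeric mix becomes float64
    np.random.random((4, 5)),                      # random floats default to float64
]
for arr in expressions:
    print(arr.dtype)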
NumPy data types, which emphasize speed, are more specific than Python data types, which emphasize flexibility. When working with large amounts of data in NumPy, it’s good practice to check the data type and consider whether a smaller data type is large enough for your data, since smaller data types use less memory.
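For example, the .nbytes attribute reports how many bytes an array’s data occupies, so a quick comparison like this sketch shows the savings from a smaller integer type (the 9-by-9 shape simply mirrors a sudoku grid):
import numpy as np
# The same 81 values stored at two different bit sizes
big = np.zeros((9, 9), dtype=np.int64)
small = np.zeros((9, 9), dtype=np.int8)
# int64 uses eight bytes per element; int8 uses one byte per element
print(big.nbytes, small.nbytes)  # 648 81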
It’s time to make your sudoku game more memory-efficient using your knowledge of data types! sudoku_game has been loaded for you as a NumPy array. numpy is imported as np.
Print the data type of sudoku_game. The current data type of sudoku_game is int64. Which of the following NumPy integers is the smallest bitsize that is still large enough to hold the data in sudoku_game? If you have never played sudoku, know that sudoku games only ever store integers from one to nine.
np.int64
np.int32
np.int16
np.int8
Change the data type of sudoku_game to be int8, an 8-bit integer; name the new array small_sudoku_game. Print the data type of small_sudoku_game to be sure that your change to int8 is reflected.
# Print the data type of sudoku_game
print(sudoku_game.dtype)
# Change the data type of sudoku_game to int8
small_sudoku_game = sudoku_game.astype(np.int8)
# Print the data type of small_sudoku_game
print(small_sudoku_game.dtype)
Sharpen your NumPy data wrangling skills by slicing, filtering, and sorting New York City’s tree census data. Create new arrays by pulling data based on conditional statements, and add and remove data along any dimension to suit your purpose. Along the way, you’ll learn the shape and dimension compatibility principles to prepare for super-fast array math.
Imagine you are a researcher working with data from New York City’s
tree census. Each row of the tree_census
2D array lists
information for a different tree: the tree ID, block ID, trunk diameter,
and stump diameter in that order. Living trees do not have stump
diameters, which explains why there are so many zeros in that column.
Column order is important because NumPy does not have column names! The
first and last three rows of tree_census
are shown
below.
array([[ 3, 501451, 24, 0],
[ 4, 501451, 20, 0],
[ 7, 501911, 3, 0],
...,
[ 1198, 227387, 11, 0],
[ 1199, 227387, 11, 0],
[ 1210, 227386, 6, 0]])
In this exercise, you’ll be working specifically with the second column, representing block IDs: your research requires you to select specific city blocks for further analysis using NumPy slicing and indexing. numpy is loaded as np, and the tree_census 2D array is available.
Select all rows of block ID data from the second column, saving the result as block_ids. Print the first five block IDs from block_ids. Select the tenth block ID from block_ids, saving the result as tenth_block_id. Finally, select five consecutive block IDs from block_ids, starting with the tenth ID, and save them as block_id_slice.
# edited/added
tree_census = np.load('tree_census.npy')
# Select all rows of block ID data from the second column
block_ids = tree_census[:, 1]
# Print the first five block_ids
print(block_ids[:5])
# Select the tenth block ID from block_ids
tenth_block_id = block_ids[9]
print(tenth_block_id)
# Select five block IDs from block_ids starting with the tenth ID
block_id_slice = block_ids[9:14]
print(block_id_slice)
Now assume that your research requires you to take an admittedly unrepresentative sample of trunk diameters, which are located in the third column of tree_census. Getting just a selection of trunk diameters can be done with NumPy’s slicing and stepping functionality.
numpy is loaded as np, and the tree_census 2D array is available.
Create an array called hundred_diameters which contains the first 100 trunk diameters in tree_census. Then create an array called every_other_diameter, which contains only trunk diameters for trees with even row indices from 50 to 100, inclusive.
# Create an array of the first 100 trunk diameters from tree_census
hundred_diameters = tree_census[:100, 2]
print(hundred_diameters)
# Create an array of trunk diameters with even row indices from 50 to 100 inclusive
every_other_diameter = tree_census[50:101:2, 2]
print(every_other_diameter)
Sometimes it’s easiest to understand data when it is sorted according to the value you are most interested in. Your new research task is to create an array containing the trunk diameters in the New York City tree census, sorted in order from smallest to largest.
numpy is loaded as np, and the tree_census 2D array is available.
Create an array called sorted_trunk_diameters which selects only the trunk diameter column from tree_census and sorts it so that the smallest trunk diameters are at the top of the array and the largest at the bottom.
# Extract trunk diameters information and sort from smallest to largest
sorted_trunk_diameters = np.sort(tree_census[:, 2])
print(sorted_trunk_diameters)
In the last lesson, you sorted trees from smallest to largest. Now, you’ll use fancy indexing to return the row of data representing the largest tree in tree_census. You’ll also examine other trees located on the same block as the largest tree: are they also large?
numpy is loaded as np, and the tree_census array is available. As a reminder, the tree_census columns in order refer to a tree’s ID, its block ID, its trunk diameter, and its stump diameter.
Using fancy indexing, create an array called largest_tree_data, which contains the row of data on the largest tree in tree_census, corresponding to the tree with a diameter of 51.
# Create an array which contains row data on the largest tree in tree_census
largest_tree_data = tree_census[tree_census[:, 2] == 51]
print(largest_tree_data)
# Slice largest_tree_data to get only the block id
largest_tree_block_id = largest_tree_data[:, 1]
print(largest_tree_block_id)
# Create an array which contains row data on all trees with largest_tree_block_id
trees_on_largest_tree_block = tree_census[tree_census[:, 1] == largest_tree_block_id]
print(trees_on_largest_tree_block)
You and your tree research team are double-checking collection data by visiting a few trees in person to confirm their measurements. You’ve been assigned to check the data for trees on block 313879, and you’d like to make a small array of just the tree data that relates to your work.
numpy is loaded as np, and the tree_census array is available. As a reminder, the tree_census columns in order refer to a tree’s ID, its block ID, its trunk diameter, and its stump diameter.
Using fancy indexing, create an array called block_313879 which only contains data for trees with a block ID of 313879. Then, using np.where(), create an array of row_indices for trees with a block ID of 313879, and use row_indices to create block_313879 again, containing data for trees on block 313879.
# Create the block_313879 array containing trees on block 313879
block_313879 = tree_census[tree_census[:, 1] == 313879]
print(block_313879)
# Create an array of row_indices for trees on block 313879
row_indices = np.where(tree_census[:, 1] == 313879)
# Create an array which only contains data for trees on block 313879
block_313879 = tree_census[row_indices]
print(block_313879)
Currently, the stump diameter and trunk diameter values in
tree_census
are in two different columns. Living trees have
a stump diameter of zero while stumps have a trunk diameter of zero. If
you’d like to include both living trees and stumps in certain research
calculations, it might be useful to have their diameters together in
just one column.
numpy is loaded as np, and the tree_census array is available. As a reminder, the tree census columns in order refer to a tree’s ID, its block ID, its trunk diameter, and its stump diameter.
Create and print a 1D array called trunk_stump_diameters, which replaces a tree’s trunk diameter with its stump diameter if the trunk diameter is zero.
# Create and print a 1D array of tree and stump diameters
trunk_stump_diameters = np.where(tree_census[:, 2] == 0, tree_census[:, 3], tree_census[:, 2])
print(trunk_stump_diameters)
Before concatenating, it’s important to check whether two arrays can be concatenated together. If not, the array may need to be reshaped before concatenation.
(4, 2) and (6, 2)
(15, 5) and (100, 5)
(4, 2) and (4, 3)
(5, 2) and (7, 4)
(4, 2) and (4,)
(4, 2) and (2,)
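One way to check compatibility in code is sketched below with made-up arrays of zeros (not part of the exercise): concatenating along rows only works when the remaining dimensions match, and a 1D array must first be reshaped into a 2D row.
import numpy as np
a = np.zeros((4, 2))
b = np.zeros((6, 2))                    # same number of columns, so row-wise concatenation works
print(np.concatenate((a, b)).shape)     # (10, 2)
c = np.zeros((2,))                      # a 1D array cannot be concatenated with a 2D array as-is...
c_2d = c.reshape((1, 2))                # ...so reshape it into a single 2D row first
print(np.concatenate((a, c_2d)).shape)  # (5, 2)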
The research team has discovered two trees that were left off the tree_census. Your task is to add rows containing the data for these new trees to the end of the tree_census array. The new trees’ data is saved in a 2D array called new_trees:
new_trees = np.array([[1211, 227386, 20, 0], [1212, 227386, 8, 0]])
numpy is loaded as np, and the tree_census and new_trees arrays are available.
Print the shapes of tree_census and new_trees to confirm they are compatible to concatenate. Then add rows to the end of tree_census containing data for the new trees from the new_trees 2D array; save the new array as updated_tree_census.
# edited/added
new_trees = np.array([[1211, 227386, 20, 0], [1212, 227386, 8, 0]])
# Print the shapes of tree_census and new_trees
print(tree_census.shape, new_trees.shape)
# Add rows to tree_census which contain data for the new trees
updated_tree_census = np.concatenate((tree_census, new_trees))
print(updated_tree_census)
You finished the last set of exercises by creating an array called trunk_stump_diameters, which combined data from the trunk diameter and stump diameter columns into a 1D array. Now, you’ll add that 1D array as a column to the tree_census array.
numpy is loaded as np, and both the tree_census and trunk_stump_diameters arrays are available.
Print the shapes of tree_census and trunk_stump_diameters. Reshape trunk_stump_diameters so that it can be appended as the last column in tree_census; call the reshaped array reshaped_diameters. Then concatenate reshaped_diameters to the end of tree_census so that it becomes the last column; call the new array concatenated_tree_census.
# Print the shapes of tree_census and trunk_stump_diameters
print(trunk_stump_diameters.shape, tree_census.shape)
# Reshape trunk_stump_diameters
reshaped_diameters = trunk_stump_diameters.reshape((1000, 1))
# Concatenate reshaped_diameters to tree_census as the last column
concatenated_tree_census = np.concatenate((tree_census, reshaped_diameters), axis=1)
print(concatenated_tree_census)
What if your tree research focuses only on living trees on publicly-owned city blocks? It might be helpful to delete some unneeded data like the stump diameter column and some trees located on private blocks.
You’ve learned that NumPy’s np.delete()
function takes
three arguments: the original array, the index or indices to be deleted,
and the axis to delete along. If you don’t know the index or indices of
the array you’d like to delete, recall that when it is only passed one
argument, np.where()
returns an array of indices where a
condition is met!
numpy is loaded as np, and the tree_census 2D array is available. The columns in order refer to a tree’s ID, block number, trunk diameter, and stump diameter.
Delete the stump diameter column from tree_census, and save the new 2D array as tree_census_no_stumps. Using np.where(), find the indices of any trees on block 313879, a private block, and save the indices in an array called private_block_indices. Using the indices you just found with np.where(), delete the rows for trees on block 313879 from tree_census_no_stumps, saving the new 2D array as tree_census_clean. Print the shape of tree_census_clean.
# Delete the stump diameter column from tree_census
tree_census_no_stumps = np.delete(tree_census, 3, axis=1)
# Save the indices of the trees on block 313879
private_block_indices = np.where(tree_census[:, 1] == 313879)
# Delete the rows for trees on block 313879 from tree_census_no_stumps
tree_census_clean = np.delete(tree_census_no_stumps, private_block_indices, axis=0)
# Print the shape of tree_census_clean
print(tree_census_clean.shape)
Leverage NumPy’s speedy vectorized operations to gather summary insights on sales data for American liquor stores, restaurants, and department stores. Vectorize Python functions for use in your NumPy code. Finally, use broadcasting logic to perform mathematical operations between arrays of different sizes.
The dataset you’ll be working with during this chapter is one year’s
sales data by month for three different industries. Each row in this
monthly_sales
array represents a month from January to
December. The first column has monthly sales data for liquor stores, the
second column has data for restaurants, and the last column tracks sales
for department stores.
array([[ 4134, 23925, 8657],
[ 4116, 23875, 9142],
[ 4673, 27197, 10645],
[ 4580, 25637, 10456],
[ 5109, 27995, 11299],
[ 5011, 27419, 10625],
[ 5245, 27305, 10630],
[ 5270, 27760, 11550],
[ 4680, 24988, 9762],
[ 4913, 25802, 10456],
[ 5312, 25405, 13401],
[ 6630, 27797, 18403]])
Your task is to create an array with all the information from monthly_sales as well as a fourth column which totals the monthly sales across industries for each month.
numpy is loaded for you as np, and the monthly_sales array is available.
Create a 2D array of total monthly sales across industries; call it monthly_industry_sales. Then concatenate monthly_industry_sales with monthly_sales into a new array called monthly_sales_with_total, with the monthly cross-industry sales information in the final column.
# edited/added
monthly_sales = np.load('monthly_sales.npy')
# Create a 2D array of total monthly sales across industries
monthly_industry_sales = monthly_sales.sum(axis=1, keepdims=True)
print(monthly_industry_sales)
# Add this column as the last column in monthly_sales
monthly_sales_with_total = np.concatenate((monthly_sales, monthly_industry_sales), axis=1)
print(monthly_sales_with_total)
Perhaps you have a hunch that department stores see greater increased sales than average during the end of the year as people rush to buy gifts. You’d like to test this theory by comparing monthly department store sales to average sales across all three industries.
numpy is loaded for you as np, and the monthly_sales array is available. The monthly_sales columns in order refer to liquor store, restaurant, and department store sales.
Create a 1D array called avg_monthly_sales, which contains an average sales amount for each month across the three industries. Plot the months one through twelve on the x-axis and avg_monthly_sales on the y-axis; on the same plot, add department store sales from monthly_sales on the y-axis.
# Create the 1D array avg_monthly_sales
avg_monthly_sales = monthly_sales.mean(axis=1)
print(avg_monthly_sales)
# Plot avg_monthly_sales by month
plt.plot(np.arange(1, 13), avg_monthly_sales, label="Average sales across industries")
# Plot department store sales by month
plt.plot(np.arange(1, 13), monthly_sales[:, 2], label="Department store sales")
plt.legend()
plt.show()
In the last exercise, you established that December is a big month for department stores. Are there other months where sales increase or decrease significantly?
Your task now is to look at monthly cumulative sales for each industry. The slope of the cumulative sales line will explain a lot about how steady sales are over time: a straight line will indicate steady growth, and changes in slope will indicate relative increases or decreases in sales.
numpy is loaded for you as np, and the monthly_sales array is available. The monthly_sales columns in order refer to liquor store, restaurant, and department store sales.
Find cumulative monthly sales for each industry, saving the result in an array called cumulative_monthly_industry_sales. Then plot each industry’s cumulative sales by month as separate lines.
# Find cumulative monthly sales for each industry
cumulative_monthly_industry_sales = monthly_sales.cumsum(axis=0)
print(cumulative_monthly_industry_sales)
# Plot each industry's cumulative sales by month as separate lines
plt.plot(np.arange(1, 13), cumulative_monthly_industry_sales[:, 0], label="Liquor Stores")
plt.plot(np.arange(1, 13), cumulative_monthly_industry_sales[:, 1], label="Restaurants")
plt.plot(np.arange(1, 13), cumulative_monthly_industry_sales[:, 2], label="Department stores")
plt.legend()
plt.show()
It’s possible to use vectorized operations to calculate taxes for the liquor, restaurant, and department store industries! We’ll simplify the tax calculation process here and assume that the government collects 5% of all sales across these industries as tax revenue.
Your task is to calculate the tax owed by each industry related to each month’s sales. numpy is loaded for you as np, and the monthly_sales array is available.
Create an array called tax_collected which calculates tax collected by industry and month by multiplying each element in monthly_sales by 0.05. Then create an array of total_tax_and_revenue collected by each industry and month by adding each element in tax_collected to its corresponding element in monthly_sales.
# Create an array of tax collected by industry and month
tax_collected = monthly_sales * 0.05
print(tax_collected)
# Create an array of sales revenue plus tax collected by industry and month
total_tax_and_revenue = tax_collected + monthly_sales
print(total_tax_and_revenue)
You’d like to be able to plan for next year’s operations by projecting what sales will be, and you’ve gathered multipliers specific to each month and industry. These multipliers are saved in an array called monthly_industry_multipliers. For example, the multiplier at monthly_industry_multipliers[0, 0] of 0.98 means that the liquor store industry is projected to have 98% of this January’s sales in January of next year.
array([[0.98, 1.02, 1. ],
[1.00, 1.01, 0.97],
[1.06, 1.03, 0.98],
[1.08, 1.01, 0.98],
[1.08, 0.98, 0.98],
[1.1 , 0.99, 0.99],
[1.12, 1.01, 1. ],
[1.1 , 1.02, 1. ],
[1.11, 1.01, 1.01],
[1.08, 0.99, 0.97],
[1.09, 1. , 1.02],
[1.13, 1.03, 1.02]])
numpy is loaded for you as np, and the monthly_sales and monthly_industry_multipliers arrays are available. The monthly_sales columns in order refer to liquor store, restaurant, and department store sales.
Create an array called projected_monthly_sales which contains projected sales for all three industries based on the multipliers you have gathered. Then graph current and projected liquor store sales by month.
# edited/added
monthly_industry_multipliers = np.load('monthly_industry_multipliers.npy')
# Create an array of monthly projected sales for all industries
projected_monthly_sales = monthly_sales * monthly_industry_multipliers
print(projected_monthly_sales)
# Graph current liquor store sales and projected liquor store sales by month
plt.plot(np.arange(1, 13), monthly_sales[:, 0], label="Current liquor store sales")
plt.plot(np.arange(1, 13), projected_monthly_sales[:, 0], label="Projected liquor store sales")
plt.legend()
plt.show()
There are many situations where you might want to use Python methods and functions on array elements in NumPy. You can always write a for loop to do this, but vectorized operations are much faster and more efficient, so consider using np.vectorize()!
You’ve got an array called names which contains first and last names:
names = np.array([["Izzy", "Monica", "Marvin"],
["Weber", "Patel", "Hernandez"]])
You’d like to use one of the Python methods you learned on DataCamp, .upper(), to make all the letters of every name in the array uppercase. As a reminder, .upper() is a string method, meaning that it must be called on an instance of a string: str.upper().
Your task is to vectorize this Python method. numpy is loaded for you as np, and the names array is available.
Create a vectorized function called vectorized_upper from the Python .upper() string method. Apply vectorized_upper() to the names array and save the resulting array as uppercase_names.
# edited/added
names = np.array([["Izzy", "Monica", "Marvin"],
["Weber", "Patel", "Hernandez"]])
# Vectorize the .upper() string method
vectorized_upper = np.vectorize(str.upper)
# Apply vectorized_upper to the names array
uppercase_names = vectorized_upper(names)
print(uppercase_names)
Broadcasting takes the power of vectorized operations in NumPy one step further, saving memory and computing power. But before broadcasting, you’ll need to check whether it’s even possible to use broadcasting in your mathematical operations!
(3, 4) and (1, 4)
(3, 4) and (4,)
(3, 4) and (3, 1)
(3, 4) and (1, 2)
(3, 4) and (4, 1)
(3, 4) and (3,)
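To check pairs like these yourself, remember that NumPy compares shapes from right to left, and each pair of dimensions must either match or contain a 1. A small sketch with a few of the shapes above (the remaining pairs can be tested the same way):
import numpy as np
arr = np.zeros((3, 4))
print((arr + np.zeros((1, 4))).shape)  # (3, 4): trailing dimensions match
print((arr + np.zeros((4,))).shape)    # (3, 4): a (4,) array broadcasts across each row
print((arr + np.zeros((3, 1))).shape)  # (3, 4): a (3, 1) column broadcasts across each column
# arr + np.zeros((3,)) raises an error: the trailing dimensions 4 and 3 do not match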
Recall that when broadcasting across columns, NumPy requires you to be explicit that it should broadcast a vertical array, and horizontal and vertical 1D arrays do not exist in NumPy. Instead, you must first create a 2D array to declare that you have vertical data. Then, NumPy creates a copy of this 2D vertical array for each column and applies the desired operation.
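Here is a compact illustration of that idea with made-up numbers, separate from the exercise that follows: the per-row values are reshaped into a vertical 2D array so NumPy broadcasts them down each column.
import numpy as np
data = np.array([[1, 2, 3],
                 [4, 5, 6]])           # shape (2, 3)
row_multipliers = np.array([10, 100])  # one value per row, shape (2,)
# A (2,) array will not broadcast against (2, 3), so declare it as a vertical (2, 1) array
vertical = row_multipliers.reshape((2, 1))
print(data * vertical)  # every column is multiplied by the same vertical array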
A Python list called monthly_growth_rate with a len() of 12 is available. This list represents monthly year-over-year expected growth for the economy; your task is to use broadcasting to multiply this list by each column in the monthly_sales array. The monthly_sales array is loaded, along with numpy as np.
Convert monthly_growth_rate, currently a Python list, into a one-dimensional NumPy array called monthly_growth_1D. Reshape monthly_growth_1D so that it is broadcastable column-wise across monthly_sales; call the new array monthly_growth_2D. Then multiply each column in monthly_sales by monthly_growth_2D.
# edited/added
monthly_growth_rate = [1.01, 1.03, 1.03, 1.02, 1.05, 1.03, 1.06, 1.04, 1.03, 1.04, 1.02, 1.01]
# Convert monthly_growth_rate into a NumPy array
monthly_growth_1D = np.array(monthly_growth_rate)
# Reshape monthly_growth_1D
monthly_growth_2D = monthly_growth_1D.reshape((12, 1))
# Multiply each column in monthly_sales by monthly_growth_2D
print(monthly_growth_2D * monthly_sales)
In the last set of exercises, you used monthly_industry_multipliers to create sales predictions. Recall that monthly_industry_multipliers looks like this:
array([[0.98, 1.02, 1. ],
[1.00, 1.01, 0.97],
[1.06, 1.03, 0.98],
[1.08, 1.01, 0.98],
[1.08, 0.98, 0.98],
[1.1 , 0.99, 0.99],
[1.12, 1.01, 1. ],
[1.1 , 1.02, 1. ],
[1.11, 1.01, 1.01],
[1.08, 0.99, 0.97],
[1.09, 1. , 1.02],
[1.13, 1.03, 1.02]])
Assume you’re not comfortable being so specific with your estimates.
Rather, you’d like to use monthly_industry_multipliers
to
find a single average multiplier for each industry. Then you’ll use that
multiplier to project next year’s sales.
numpy is loaded for you as np, and the monthly_sales and monthly_industry_multipliers arrays are available. The monthly_sales columns in order refer to liquor store, restaurant, and department store sales.
Find the mean sales projection multiplier for each industry; call the resulting array mean_multipliers. Print the shapes of mean_multipliers and monthly_sales to ensure they are suitable for broadcasting. Then multiply each value in monthly_sales by the multiplier for that industry, saving the result as projected_sales.
# Find the mean sales projection multiplier for each industry
mean_multipliers = monthly_industry_multipliers.mean(axis=0)
print(mean_multipliers)
# Print the shapes of mean_multipliers and monthly_sales
print(mean_multipliers.shape, monthly_sales.shape)
# Multiply each value by the multiplier for that industry
projected_sales = monthly_sales * mean_multipliers
print(projected_sales)
NumPy meets the art world in this final chapter as we use image data from a Monet masterpiece to explore how NumPy can be used to augment image data. You’ll use flipping and transposing functionality to quickly transform the masterpiece. Next, you’ll pull the Monet array apart, make changes, and reconstruct it using array stacking to see the results.
The exercises for this chapter will use a NumPy array holding an image in RGB format. Which image? You’ll have to load the array from the mystery_image.npy file to find out!
numpy is loaded as np, and mystery_image.npy is available.
Open the mystery_image.npy file using the alias f, saving the contents as an array called rgb_array. Then plot rgb_array to see the image.
# Load the mystery_image.npy file
with open("mystery_image.npy", "rb") as f:
rgb_array = np.load(f)
plt.imshow(rgb_array)
plt.show()
You’ll need to use the .astype()
array method we covered
in the first chapter of this course for the next exercise. If you forget
exactly how .astype()
works, you could check out the course
slides or NumPy’s documentation on numpy.org. There is, however, an even
faster way to jog your memory…
numpy is loaded as np.
Use help() to display the documentation for the .astype() method.
# Display the documentation for .astype()
help(np.ndarray.astype)
Perhaps you are training a machine learning model to recognize ocean
scenes. You’d like the model to understand that oceans are not only
associated with bright, summery colors, so you’re careful to include
images of oceans in bad weather or evening light as well. You may have
to manually transform some images in order to balance the data, so your
task is to darken the Monet ocean scene rgb_array
.
Recall from the video that white is associated with the maximum RGB
value of 255, while darker colors are associated with lower values.
numpy is loaded as np, and the 3D Monet rgb_array that you loaded in the last exercise is available.
Reduce every value in rgb_array by 50 percent, saving the resulting array as darker_rgb_array. Convert darker_rgb_array into an array of integers called darker_rgb_int_array so that it can be plotted. Then save darker_rgb_int_array as an .npy file called darker_monet.npy using the alias f.
# edited/added
rgb_array = np.load('rgb_array.npy')
# Reduce every value in rgb_array by 50 percent
darker_rgb_array = rgb_array * 0.5
# Convert darker_rgb_array into an array of integers
darker_rgb_int_array = darker_rgb_array.astype(np.int8)
plt.imshow(darker_rgb_int_array)
plt.show()
# Save darker_rgb_int_array to an .npy file called darker_monet.npy
with open("darker_monet.npy", "wb") as f:
np.save(f, darker_rgb_int_array)
Perhaps you’re still working on that machine learning model that identifies ocean scenes in paintings. You’d like to generate a few extra images to augment your existing data. After all, a human can tell that a painting is of an ocean even if the painting is upside-down: why shouldn’t your machine learning model?
numpy is loaded as np, and the 3D Monet rgb_array is available.
Flip rgb_array so that it is the mirror image of the original, with the ocean on the right and the grassy knoll on the left. Then flip rgb_array so that it is upside down but otherwise remains the same.
# Flip rgb_array so that it is the mirror image of the original
mirrored_monet = np.flip(rgb_array, axis=1)
plt.imshow(mirrored_monet)
plt.show()
# Flip rgb_array so that it is upside down
upside_down_monet = np.flip(rgb_array, axis=(0, 1))
plt.imshow(upside_down_monet)
plt.show()
You’ve learned that transposing an array reverses the order of the array’s axes. To transpose the axes in a different order, you can pass the desired axes order as arguments. You’ll practice with the 3D Monet rgb_array, loaded for you. numpy has been imported as np.
Transpose rgb_array so that the image appears rotated 90 degrees left and as a mirror image of itself.
# Transpose rgb_array
transposed_rgb = np.transpose(rgb_array, axes=(1, 0, 2))
plt.imshow(transposed_rgb)
plt.show()
Splitting and stacking skills aren’t just useful with 3D RGB arrays: they are excellent for subsetting and organizing data of any type and dimension!
You’ll now take a quick trip down memory lane to reorganize the monthly_sales array as a 3D array. Recall that the first dimension of monthly_sales is rows of a single month’s sales across three industries, and the second dimension is columns of monthly sales data for a single industry.
Your task is to split this data into quarterly sales data and stack the quarterly sales data so that the new third dimension represents the four 2D arrays of quarterly sales. numpy is loaded as np, and the monthly_sales array is available.
Split monthly_sales into four arrays representing quarterly data across industries; print q1_sales. Then stack the four quarterly sales arrays to create a 3D array, quarterly_sales, made up of the four quarterly 2D arrays in order from the first to last quarter.
# Split monthly_sales into quarterly data
q1_sales, q2_sales, q3_sales, q4_sales = np.split(monthly_sales, 4)
print(q1_sales)
# Stack the four quarterly sales arrays
quarterly_sales = np.stack([q1_sales, q2_sales, q3_sales, q4_sales])
print(quarterly_sales)
Perhaps you’d like to better understand Monet’s use of the color
blue. Your task is to create a version of the Monet
rgb_array
that emphasizes parts of the painting that use
lots of blue by making them even bluer! You’ll perform the splitting
portion of this task in this exercise and the stacking portion in the
next.
numpy is loaded as np, and the Monet rgb_array is available.
Split rgb_array into red, green, and blue only pixel data; save the results as red_array, green_array, and blue_array. Create emphasized_blue_array, which replaces blue_array values with 255 if they are higher than the mean value of blue_array; otherwise, the value remains the same. Print the .shape of emphasized_blue_array. Then reshape emphasized_blue_array to remove the trailing third dimension; save the result as emphasized_blue_array_2D.
# Split rgb_array into red, green, and blue arrays
red_array, green_array, blue_array = np.split(rgb_array, 3, axis=2)
# Create emphasized_blue_array
emphasized_blue_array = np.where(blue_array > blue_array.mean(), 255, blue_array)
# Print the shape of emphasized_blue_array
print(emphasized_blue_array.shape)
# Remove the trailing dimension from emphasized_blue_array
emphasized_blue_array_2D = emphasized_blue_array.reshape((675, 843)) # edited/added
Now you’ll combine red_array, green_array, and emphasized_blue_array_2D to see what Monet’s painting looks like with the blues emphasized!
numpy is loaded as np, and the red_array, green_array, blue_array, and emphasized_blue_array_2D objects that you created in the last exercise are available.
Print the shapes of blue_array and emphasized_blue_array_2D. Reshape red_array and green_array so that they can be stacked with emphasized_blue_array_2D. Then stack red_array_2D, green_array_2D, and emphasized_blue_array_2D together (in that order) into a 3D array called emphasized_blue_monet.
# Print the shapes of blue_array and emphasized_blue_array_2D
print(blue_array.shape, emphasized_blue_array_2D.shape)
# Reshape red_array and green_array
red_array_2D = red_array.reshape((675, 843)) # edited/added
green_array_2D = green_array.reshape((675, 843)) # edited/added
# Stack red_array_2D, green_array_2D, and emphasized_blue_array_2D
emphasized_blue_monet = np.stack([red_array_2D, green_array_2D, emphasized_blue_array_2D], axis=2)
plt.imshow(emphasized_blue_monet)
plt.show()
You made it!
You’re now a card-carrying member of Team NumPy. I hope I’ve convinced you that NumPy is amazing.
It leverages the immense calculating power of C while keeping the friendly syntax of Python. In some places, it even makes Python syntax simpler by vectorizing operations!
What’s next on your NumPy journey? You could check out one of the many libraries built on top of NumPy’s API. Perhaps you’d like to explore data analysis with pandas, data visualization with Seaborn, parallel programming with Dask, machine learning with scikit-learn, or deep learning with TensorFlow. Every one of these libraries is built on top of NumPy, and DataCamp has courses dedicated to all of them.
No matter where you take your NumPy knowledge, thanks for taking this course all the way to the end. I’m Izzy Weber, and on behalf of all of us at DataCamp, congratulations on your wide new array of NumPy skills!
Thanks again!