Course Description
NumPy is an essential Python library. TensorFlow and scikit-learn use NumPy arrays as inputs, and pandas and Matplotlib are built on top of NumPy. In this Introduction to NumPy course, you’ll become a master wrangler of NumPy’s core object: arrays! Using data from New York City’s tree census, you’ll create, sort, filter, and update arrays. You’ll discover why NumPy is so efficient and use broadcasting and vectorization to make your NumPy code even faster. By the end of the course, you’ll be using 3D arrays to alter a Claude Monet painting, and you’ll understand why such array alterations are essential tools for machine learning.
Meet the incredible NumPy array! Learn how to create and change array shapes to suit your needs. Finally, discover NumPy’s many data types and how they contribute to speedy array operations.
Once you’re comfortable with NumPy, you’ll find yourself converting Python lists into NumPy arrays all the time for increased speed and access to NumPy’s excellent array methods.
sudoku_list is a Python list containing a sudoku
game:
[[0, 0, 4, 3, 0, 0, 2, 0, 9],
[0, 0, 5, 0, 0, 9, 0, 0, 1],
[0, 7, 0, 0, 6, 0, 0, 4, 3],
[0, 0, 6, 0, 0, 2, 0, 8, 7],
[1, 9, 0, 0, 0, 7, 4, 0, 0],
[0, 5, 0, 0, 8, 3, 0, 0, 0],
[6, 0, 0, 0, 0, 0, 1, 0, 5],
[0, 0, 3, 5, 0, 8, 6, 9, 0],
[0, 4, 2, 9, 1, 0, 3, 0, 0]]
You’re going to change sudoku_list into a NumPy array so
you can practice with it in later lessons, for example by creating a 4D
array of sudoku games along with their solutions!
Convert sudoku_list into a NumPy array called sudoku_array.
Print the type() of sudoku_array to check that your code has worked properly.
# edited/added
import numpy as np
sudoku_list = np.load('sudoku_game.npy')
# Import NumPy
import numpy as np
# Convert sudoku_list into an array
sudoku_array = np.array(sudoku_list)
# Print the type of sudoku_array
print(type(sudoku_array))
It can be helpful to know how to create quick NumPy arrays from scratch in order to test your code. For example, when you are doing math with large multi-dimensional arrays, it’s nice to check whether the math works as expected on small test arrays before applying your code to the larger arrays. NumPy has many options for creating smaller synthetic arrays.
With this in mind, it’s time for you to create some arrays from
scratch! numpy is imported for you as np.
Create an array of zeros called zero_array, which has two rows and four columns.
Create an array of random floats between 0 and 1 called random_array, which has three rows and six columns.
# Create an array of zeros which has four columns and two rows
zero_array = np.zeros((2, 4))
print(zero_array)
# Create an array of random floats which has six columns and three rows
random_array = np.random.random((3, 6))
print(random_array)
np.arange() has especially useful applications in
graphing. Your task is to create a scatter plot with the values from
doubling_array on the y-axis.
doubling_array = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
Recall that a scatter plot can be created using the following code:
plt.scatter(x_values, y_values)
plt.show()
With doubling_array on the y-axis, you now need values
for the x-axis, which you can create with np.arange()!
numpy is loaded for you as np, and
matplotlib.pyplot is imported as plt.
Using np.arange(), create a 1D array called one_to_ten which holds all integers from one to ten (inclusive).
Create a scatter plot with doubling_array as the y values and one_to_ten as the x values.
# edited/added
from matplotlib import pyplot as plt
doubling_array = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
# Create an array of integers from one to ten
one_to_ten = np.arange(1, 11)
# Create your scatterplot
plt.scatter(one_to_ten, doubling_array)
plt.show()
In the first lesson, you created a sudoku_game
two-dimensional NumPy array. Perhaps you have hundreds of sudoku game
arrays, and you’d like to save the solution for this one,
sudoku_solution, as part of the same array as its
corresponding game in order to organize your sudoku data better. You
could accomplish this by stacking the two 2D arrays on top of each other
to create a 3D array.
numpy is loaded as np, and the
sudoku_game and sudoku_solution arrays are
available.
Create a 3D array called game_and_solution by stacking the two 2D arrays, sudoku_game and sudoku_solution, on top of one another; in the final array, sudoku_game should appear before sudoku_solution.
Print game_and_solution.
# edited/added
sudoku_solution = np.load('sudoku_solution.npy')
sudoku_list = np.load('sudoku_game.npy')
sudoku_game = np.array(sudoku_list)
# Create the game_and_solution 3D array
game_and_solution = np.array([sudoku_game, sudoku_solution])
# Print game_and_solution
print(game_and_solution)
Printing arrays is a good way to check code output for small arrays
like game_and_solution, but it becomes unwieldy when
dealing with bigger arrays and those with higher dimensions. Another
important check is to look at the array’s .shape.
Now, you’ll create a 4D array that contains two sudoku games and
their solutions. numpy is loaded as np. The
game_and_solution 3D array you created in the previous
example is available, along with new_sudoku_game and
new_sudoku_solution.
Create another 3D array called new_game_and_solution with a different 2D game and 2D solution pair: new_sudoku_game and new_sudoku_solution. new_sudoku_game should appear before new_sudoku_solution.
Create a 4D array called games_and_solutions by making an array out of the two 3D arrays: game_and_solution and new_game_and_solution, in that order.
Print the shape of games_and_solutions.
# edited/added
new_sudoku_game = np.load('new_sudoku_game.npy')
new_sudoku_solution = np.load('new_sudoku_solution.npy')
game_and_solution = np.load('game_and_solution.npy')
# Create a second 3D array of another game and its solution
new_game_and_solution = np.array([new_sudoku_game, new_sudoku_solution])
# Create a 4D array of both game and solution 3D arrays
games_and_solutions = np.array([game_and_solution, new_game_and_solution])
# Print the shape of your 4D array
print(games_and_solutions.shape)
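As a quick illustration of how each level of nesting adds a leading dimension, here is a minimal sketch. The arrays game and solution are hypothetical stand-ins filled with zeros and ones, not the real sudoku data.

```python
import numpy as np

# Hypothetical stand-ins for a 9x9 game and its 9x9 solution
game = np.zeros((9, 9), dtype=np.int64)
solution = np.ones((9, 9), dtype=np.int64)

# Wrapping two 2D arrays in np.array() adds a new leading dimension
pair = np.array([game, solution])
print(pair.shape, pair.ndim)

# Wrapping two 3D arrays adds another leading dimension
pairs = np.array([pair, pair])
print(pairs.shape, pairs.ndim)

# np.stack() builds the same 3D array and names the stacking axis explicitly
stacked = np.stack([game, solution], axis=0)
```

Checking .shape and .ndim this way scales to arrays that are too large to print.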
You’ve learned to change not only array shape but also the number of
dimensions that an array has. To test these skills, you’ll change
sudoku_game from a 2D array to a 1D array and back again.
Can we trust NumPy to keep the array elements in the same order after
being flattened and reshaped? Time to find out.
numpy is imported as np, and
sudoku_game is loaded for you.
Flatten sudoku_game so that it is a 1D array, and save it as flattened_game.
Print the .shape of flattened_game.
Reshape flattened_game back to its original shape of nine rows and nine columns; save the new array as reshaped_game.
# edited/added
sudoku_game = np.load('sudoku_game_new.npy')
# Flatten sudoku_game
flattened_game = sudoku_game.flatten()
# Print the shape of flattened_game
print(flattened_game.shape)
# Reshape flattened_game back to a nine by nine array
reshaped_game = flattened_game.reshape((9, 9))
# Print sudoku_game and reshaped_game
print(sudoku_game)
print(reshaped_game)
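Comparing two printed grids by eye is error-prone, so the round trip can also be checked programmatically. This is a small sketch on a hypothetical 3x3 grid rather than the real sudoku_game array.

```python
import numpy as np

# Hypothetical 3x3 grid standing in for sudoku_game
grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# .flatten() reads the array row by row (row-major order) by default
flat = grid.flatten()

# .reshape() refills rows in the same order, so the round trip is lossless
restored = flat.reshape(grid.shape)

order_preserved = np.array_equal(grid, restored)
print(flat)
print(order_preserved)
```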
One way to control the data type of a NumPy array is to declare it
when the array is created using the dtype keyword argument.
Take a look at the data type NumPy uses by default when creating an
array with np.zeros(). Could it be updated?
numpy is loaded as np.
Using np.zeros(), create an array of zeros that has three rows and two columns; call it zero_array.
Print the data type of zero_array.
Create a new array of zeros called zero_int_array, which will also have three rows and two columns, but the data type should be np.int32.
Print the data type of zero_int_array.
# Create an array of zeros with three rows and two columns
zero_array = np.zeros((3, 2))
# Print the data type of zero_array
print(zero_array.dtype)
# Create a new array of int32 zeros with three rows and two columns
zero_int_array = np.zeros((3, 2), dtype=np.int32)
# Print the data type of zero_int_array
print(zero_int_array.dtype)
Anticipating what data type an array will have is very important since some NumPy functionality only works with certain data types. Let’s see what you’ve got.
np.array([78.988, "NumPy", True])
np.array([9, 1.12, True]).astype("<U5")
np.array([34.62, 70.13, 9]).astype(np.int64)
np.array([45.67, True], dtype=np.int8)
np.array([[6, 15.7], [True, False]])
np.random.random((4, 5))
NumPy data types, which emphasize speed, are more specific than Python data types, which emphasize flexibility. When working with large amounts of data in NumPy, it’s good practice to check the data type and consider whether a smaller data type is large enough for your data, since smaller data types use less memory.
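To make the memory claim concrete, the sketch below compares the bytes used by the same hypothetical 9x9 grid stored as int64 and as int8. The array names are illustrative only.

```python
import numpy as np

# Hypothetical 9x9 grid of digits stored with two different integer types
grid_int64 = np.zeros((9, 9), dtype=np.int64)
grid_int8 = grid_int64.astype(np.int8)

# .nbytes is the number of elements times the bytes per element
print(grid_int64.nbytes)  # 81 elements * 8 bytes each
print(grid_int8.nbytes)   # 81 elements * 1 byte each

# np.iinfo reports the range of values an integer type can hold
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)
```

Checking the range with np.iinfo before downcasting helps avoid silently overflowing values that do not fit the smaller type.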
It’s time to make your sudoku game more memory-efficient using your
knowledge of data types! sudoku_game has been loaded for
you as a NumPy array. numpy is imported as
np.
Print the data type of sudoku_game.
The current data type of sudoku_game is
int64. Which of the following NumPy integers is the
smallest bitsize that is still large enough to hold the data in
sudoku_game? If you have never played sudoku, know that
sudoku games only ever store integers from one to nine.
np.int64
np.int32
np.int16
np.int8
Change the data type of sudoku_game to be int8, an
8-bit integer; name the new array
small_sudoku_game.
Print the data type of small_sudoku_game to be sure
that your change to int8 is reflected.
# Print the data type of sudoku_game
print(sudoku_game.dtype)
# Change the data type of sudoku_game to int8
small_sudoku_game = sudoku_game.astype(np.int8)
# Print the data type of small_sudoku_game
print(small_sudoku_game.dtype)
Sharpen your NumPy data wrangling skills by slicing, filtering, and sorting New York City’s tree census data. Create new arrays by pulling data based on conditional statements, and add and remove data along any dimension to suit your purpose. Along the way, you’ll learn the shape and dimension compatibility principles to prepare for super-fast array math.
Imagine you are a researcher working with data from New York City’s
tree census. Each row of the tree_census 2D array lists
information for a different tree: the tree ID, block ID, trunk diameter,
and stump diameter in that order. Living trees do not have stump
diameters, which explains why there are so many zeros in that column.
Column order is important because NumPy does not have column names! The
first and last three rows of tree_census are shown
below.
array([[ 3, 501451, 24, 0],
[ 4, 501451, 20, 0],
[ 7, 501911, 3, 0],
...,
[ 1198, 227387, 11, 0],
[ 1199, 227387, 11, 0],
[ 1210, 227386, 6, 0]])
In this exercise, you’ll be working specifically with the second
column, representing block IDs: your research requires you to select
specific city blocks for further analysis using NumPy slicing and
indexing. numpy is loaded as np, and the
tree_census 2D array is available.
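Select all rows of data from the second column, representing block IDs; save the resulting array as block_ids.
Print the first five block IDs from block_ids.
Select the tenth block ID from block_ids, saving the result as tenth_block_id.
Select five block IDs from block_ids, starting with the tenth ID, and save as block_id_slice.
# edited/added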
tree_census = np.load('tree_census.npy')
# Select all rows of block ID data from the second column
block_ids = tree_census[:, 1]
# Print the first five block_ids
print(block_ids[:5])
# Select the tenth block ID from block_ids
tenth_block_id = block_ids[9]
print(tenth_block_id)
# Select five block IDs from block_ids starting with the tenth ID
block_id_slice = block_ids[9:14]
print(block_id_slice)
Now assume that your research requires you to take an admittedly
unrepresentative sample of trunk diameters, which are located in the
third column of tree_census. Getting just a selection of
trunk diameters can be done with NumPy’s slicing and stepping
functionality.
numpy is loaded as np, and the
tree_census 2D array is available.
Create an array called hundred_diameters which contains the first 100 trunk diameters in tree_census.
Create an array, every_other_diameter, which contains only trunk diameters for trees with even row indices from 50 to 100, inclusive.
# Create an array of the first 100 trunk diameters from tree_census
hundred_diameters = tree_census[:100, 2]
print(hundred_diameters)
# Create an array of trunk diameters with even row indices from 50 to 100 inclusive
every_other_diameter = tree_census[50:101:2, 2]
print(every_other_diameter)
Sometimes it’s easiest to understand data when it is sorted according to the value you are most interested in. Your new research task is to create an array containing the trunk diameters in the New York City tree census, sorted in order from smallest to largest.
numpy is loaded as np, and the
tree_census 2D array is available.
Create an array called sorted_trunk_diameters which selects only the trunk diameter column from tree_census and sorts it so that the smallest trunk diameters are at the top of the array and the largest at the bottom.
# Extract trunk diameters information and sort from smallest to largest
sorted_trunk_diameters = np.sort(tree_census[:, 2])
print(sorted_trunk_diameters)
In the last lesson, you sorted trees from smallest to largest. Now,
you’ll use fancy indexing to return the row of data representing the
largest tree in tree_census. You’ll also examine other
trees located on the same block as the largest tree: are they also
large?
numpy is loaded as np, and the
tree_census array is available. As a reminder, the
tree_census columns in order refer to a tree’s ID, its
block ID, its trunk diameter, and its stump diameter.
Create an array, largest_tree_data, which contains the row of data on the largest tree in tree_census corresponding to the tree with a diameter of 51.
# Create an array which contains row data on the largest tree in tree_census
largest_tree_data = tree_census[tree_census[:, 2] == 51]
print(largest_tree_data)
# Slice largest_tree_data to get only the block id
largest_tree_block_id = largest_tree_data[:, 1]
print(largest_tree_block_id)
# Create an array which contains row data on all trees with largest_tree_block_id
trees_on_largest_tree_block = tree_census[tree_census[:, 1] == largest_tree_block_id]
print(trees_on_largest_tree_block)
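The solution above hard-codes the largest diameter, 51. When that value is not known in advance, it can be computed first. The following is a minimal sketch on a hypothetical four-row census (the rows and the name census are made up for illustration), using the same column order as tree_census.

```python
import numpy as np

# Hypothetical mini census: tree ID, block ID, trunk diameter, stump diameter
census = np.array([[3, 501451, 24, 0],
                   [4, 501451, 20, 0],
                   [7, 501911, 3, 0],
                   [8, 501911, 30, 0]])

# Find the largest trunk diameter instead of hard-coding it
largest_diameter = census[:, 2].max()

# Boolean mask: keep only the rows whose trunk diameter equals the maximum
largest_rows = census[census[:, 2] == largest_diameter]

# Take the block ID of the first match as a scalar for the next comparison
largest_block = largest_rows[0, 1]
same_block = census[census[:, 1] == largest_block]
print(same_block)
```

Extracting the block ID as a scalar keeps the second comparison well defined even if several trees tie for the largest diameter.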
You and your tree research team are double-checking collection data by visiting a few trees in person to confirm their measurements. You’ve been assigned to check the data for trees on block 313879, and you’d like to make a small array of just the tree data that relates to your work.
numpy is loaded as np, and the
tree_census array is available. As a reminder, the
tree_census columns in order refer to a tree’s ID, its
block ID, its trunk diameter, and its stump diameter.
Create an array called block_313879 which only contains data for trees with a block ID of 313879.
Using np.where(), create an array of row_indices for trees with a block ID of 313879.
Using row_indices, create block_313879, which contains data for trees on block 313879.
# Create the block_313879 array containing trees on block 313879
block_313879 = tree_census[tree_census[:, 1] == 313879]
print(block_313879)
# Create an array of row_indices for trees on block 313879
row_indices = np.where(tree_census[:, 1] == 313879)
# Create an array which only contains data for trees on block 313879
block_313879 = tree_census[row_indices]
print(block_313879)
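Both approaches above select the same rows. The sketch below, on a hypothetical three-row census, shows what np.where() actually returns when given only a condition: a tuple holding one index array per dimension.

```python
import numpy as np

# Hypothetical mini census: tree ID, block ID, trunk diameter, stump diameter
census = np.array([[3, 501451, 24, 0],
                   [4, 501451, 20, 0],
                   [7, 501911, 3, 0]])

# Boolean mask with one True/False value per row
mask = census[:, 1] == 501451

# With a single argument, np.where() returns a tuple of index arrays
indices = np.where(mask)
print(type(indices), indices)

# Indexing with the mask or with the indices selects the same rows
by_mask = census[mask]
by_index = census[indices]
```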
Currently, the stump diameter and trunk diameter values in
tree_census are in two different columns. Living trees have
a stump diameter of zero while stumps have a trunk diameter of zero. If
you’d like to include both living trees and stumps in certain research
calculations, it might be useful to have their diameters together in
just one column.
numpy is loaded as np, and the
tree_census array is available. As a reminder, the tree
census columns in order refer to a tree’s ID, its block ID, its trunk
diameter, and its stump diameter.
Create and print a 1D array called trunk_stump_diameters, which replaces a tree’s trunk diameter with its stump diameter if the trunk diameter is zero.
# Create and print a 1D array of tree and stump diameters
trunk_stump_diameters = np.where(tree_census[:, 2] == 0, tree_census[:, 3], tree_census[:, 2])
print(trunk_stump_diameters)
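The three-argument form of np.where() reads as "where the condition holds, take from the second argument, otherwise from the third". A small sketch with made-up diameters (trunk and stump are hypothetical arrays):

```python
import numpy as np

# Hypothetical diameters: living trees have stump 0, stumps have trunk 0
trunk = np.array([24, 0, 3, 0])
stump = np.array([0, 17, 0, 9])

# Element by element: use stump where trunk is zero, otherwise keep trunk
combined = np.where(trunk == 0, stump, trunk)
print(combined)
```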
Before concatenating, it’s important to check whether two arrays can be concatenated together. If not, the array may need to be reshaped before concatenation.
(4, 2) and (6, 2)
(15, 5) and (100, 5)
(4, 2) and (4, 3)
(5, 2) and (7, 4)
(4, 2) and (4,)
(4, 2) and (2,)
The research team has discovered two trees that were left off the
tree_census. Your task is to add rows containing the data
for these new trees to the end of the tree_census array.
The new trees’ data is saved in a 2D array called
new_trees:
new_trees = np.array([[1211, 227386, 20, 0], [1212, 227386, 8, 0]])
numpy is loaded as np, and the
tree_census and new_trees arrays are
available.
Print the shapes of tree_census and new_trees to confirm they are compatible to concatenate.
Add rows to the end of tree_census containing data for the new trees from the new_trees 2D array; save the new array as updated_tree_census.
# edited/added
new_trees = np.array([[1211, 227386, 20, 0], [1212, 227386, 8, 0]])
# Print the shapes of tree_census and new_trees
print(tree_census.shape, new_trees.shape)
# Add rows to tree_census which contain data for the new trees
updated_tree_census = np.concatenate((tree_census, new_trees))
print(updated_tree_census)
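The compatibility rule for concatenation (same number of dimensions, and matching lengths on every axis except the one being joined) can be tested directly. The helper can_concatenate below is a hypothetical name introduced for this sketch, not a NumPy function; it simply tries the operation on arrays of zeros.

```python
import numpy as np

def can_concatenate(shape_a, shape_b, axis):
    """Return True if arrays with these shapes concatenate along axis."""
    try:
        np.concatenate((np.zeros(shape_a), np.zeros(shape_b)), axis=axis)
        return True
    except ValueError:
        # Raised when dimensions or the non-joined axis lengths do not match
        return False

print(can_concatenate((4, 2), (6, 2), axis=0))  # rows differ, columns match
print(can_concatenate((4, 2), (4, 3), axis=1))  # columns differ, rows match
print(can_concatenate((5, 2), (7, 4), axis=0))  # no axis matches
print(can_concatenate((4, 2), (4,), axis=0))    # 2D with 1D needs a reshape
```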
You finished the last set of exercises by creating an array called
trunk_stump_diameters, which combined data from the trunk
diameter and stump diameter columns into a 1D array. Now, you’ll add
that 1D array as a column to the tree_census array.
numpy is loaded as np, and both the
tree_census and trunk_stump_diameters arrays
are available.
Print the shapes of tree_census and trunk_stump_diameters.
Reshape trunk_stump_diameters so that it can be appended as the last column in tree_census; call the reshaped array reshaped_diameters.
Concatenate reshaped_diameters to the end of tree_census so that it becomes the last column; call the new array concatenated_tree_census.
# Print the shapes of tree_census and trunk_stump_diameters
print(trunk_stump_diameters.shape, tree_census.shape)
# Reshape trunk_stump_diameters
reshaped_diameters = trunk_stump_diameters.reshape((1000, 1))
# Concatenate reshaped_diameters to tree_census as the last column
concatenated_tree_census = np.concatenate((tree_census, reshaped_diameters), axis=1)
print(concatenated_tree_census)
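The reshape above spells out the row count, 1000. As a sketch of a variant that does not depend on knowing the length, passing -1 lets NumPy infer that dimension. The census array here is a hypothetical three-row example.

```python
import numpy as np

# Hypothetical mini census: tree ID, block ID, trunk diameter, stump diameter
census = np.array([[3, 501451, 24, 0],
                   [4, 501451, 0, 17],
                   [7, 501911, 3, 0]])
diameters = np.where(census[:, 2] == 0, census[:, 3], census[:, 2])

# -1 asks NumPy to infer the number of rows from the array's size
column = diameters.reshape((-1, 1))
print(diameters.shape, column.shape)

# The 2D column now matches census along axis 0 and can be appended
with_column = np.concatenate((census, column), axis=1)
print(with_column)
```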
What if your tree research focuses only on living trees on publicly-owned city blocks? It might be helpful to delete some unneeded data like the stump diameter column and some trees located on private blocks.
You’ve learned that NumPy’s np.delete() function takes
three arguments: the original array, the index or indices to be deleted,
and the axis to delete along. If you don’t know the index or indices of
the array you’d like to delete, recall that when it is only passed one
argument, np.where() returns the indices where a
condition is met!
numpy is loaded as np, and the
tree_census 2D array is available. The columns in order
refer to a tree’s ID, block number, trunk diameter, and stump
diameter.
Delete the stump diameter column from tree_census, and save the new 2D array as tree_census_no_stumps.
Using np.where(), find the indices of any trees on block 313879, a private block. Save the indices in an array called private_block_indices.
Using the indices you just found using np.where(), delete the rows for trees on block 313879 from tree_census_no_stumps, saving the new 2D array as tree_census_clean.
Print the shape of tree_census_clean.
# Delete the stump diameter column from tree_census
tree_census_no_stumps = np.delete(tree_census, 3, axis=1)
# Save the indices of the trees on block 313879
private_block_indices = np.where(tree_census[:, 1] == 313879)
# Delete the rows for trees on block 313879 from tree_census_no_stumps
tree_census_clean = np.delete(tree_census_no_stumps, private_block_indices, axis=0)
# Print the shape of tree_census_clean
print(tree_census_clean.shape)
Leverage NumPy’s speedy vectorized operations to gather summary insights on sales data for American liquor stores, restaurants, and department stores. Vectorize Python functions for use in your NumPy code. Finally, use broadcasting logic to perform mathematical operations between arrays of different sizes.
The dataset you’ll be working with during this chapter is one year’s
sales data by month for three different industries. Each row in this
monthly_sales array represents a month from January to
December. The first column has monthly sales data for liquor stores, the
second column has data for restaurants, and the last column tracks sales
for department stores.
array([[ 4134, 23925, 8657],
[ 4116, 23875, 9142],
[ 4673, 27197, 10645],
[ 4580, 25637, 10456],
[ 5109, 27995, 11299],
[ 5011, 27419, 10625],
[ 5245, 27305, 10630],
[ 5270, 27760, 11550],
[ 4680, 24988, 9762],
[ 4913, 25802, 10456],
[ 5312, 25405, 13401],
[ 6630, 27797, 18403]])
Your task is to create an array with all the information from
monthly_sales as well as a fourth column which totals the
monthly sales across industries for each month.
numpy is loaded for you as np, and the
monthly_sales array is available.
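Create a 2D array which contains a single column of total monthly sales across industries; call it monthly_industry_sales.
Concatenate monthly_industry_sales with monthly_sales into a new array called monthly_sales_with_total, with the monthly cross-industry sales information in the final column.
# edited/added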
monthly_sales = np.load('monthly_sales.npy')
# Create a 2D array of total monthly sales across industries
monthly_industry_sales = monthly_sales.sum(axis=1, keepdims=True)
print(monthly_industry_sales)
# Add this column as the last column in monthly_sales
monthly_sales_with_total = np.concatenate((monthly_sales, monthly_industry_sales), axis=1)
print(monthly_sales_with_total)
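The keepdims=True argument is what makes the totals directly concatenable as a column. The sketch below uses only the first two rows of monthly_sales shown earlier (stored here under the illustrative name sales) to compare the shapes with and without it.

```python
import numpy as np

# First two rows of the monthly_sales array shown above
sales = np.array([[4134, 23925, 8657],
                  [4116, 23875, 9142]])

# Without keepdims, the summed axis disappears and the result is 1D
totals_1d = sales.sum(axis=1)
print(totals_1d.shape)

# With keepdims=True, the summed axis is kept with length 1
totals_2d = sales.sum(axis=1, keepdims=True)
print(totals_2d.shape)

# Only the 2D version lines up with sales for concatenation along axis 1
with_total = np.concatenate((sales, totals_2d), axis=1)
print(with_total)
```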
Perhaps you have a hunch that department stores see greater increased sales than average during the end of the year as people rush to buy gifts. You’d like to test this theory by comparing monthly department store sales to average sales across all three industries.
numpy is loaded for you as np, and the
monthly_sales array is available. The
monthly_sales columns in order refer to liquor store,
restaurant, and department store sales.
Create a 1D array called avg_monthly_sales, which contains an average sales amount for each month across the three industries.
Plot avg_monthly_sales on the y-axis.
Plot department store sales from monthly_sales on the y-axis.
# Create the 1D array avg_monthly_sales
avg_monthly_sales = monthly_sales.mean(axis=1)
print(avg_monthly_sales)
# Plot avg_monthly_sales by month
plt.plot(np.arange(1, 13), avg_monthly_sales, label="Average sales across industries")
# Plot department store sales by month
plt.plot(np.arange(1, 13), monthly_sales[:, 2], label="Department store sales")
plt.legend()
plt.show()
In the last exercise, you established that December is a big month for department stores. Are there other months where sales increase or decrease significantly?
Your task now is to look at monthly cumulative sales for each industry. The slope of the cumulative sales line will explain a lot about how steady sales are over time: a straight line will indicate steady growth, and changes in slope will indicate relative increases or decreases in sales.
numpy is loaded for you as np, and the
monthly_sales array is available. The
monthly_sales columns in order refer to liquor store,
restaurant, and department store sales.
Find cumulative monthly sales for each industry; call the result cumulative_monthly_industry_sales.
# Find cumulative monthly sales for each industry
cumulative_monthly_industry_sales = monthly_sales.cumsum(axis=0)
print(cumulative_monthly_industry_sales)
# Plot each industry's cumulative sales by month as separate lines
plt.plot(np.arange(1, 13), cumulative_monthly_industry_sales[:, 0], label="Liquor Stores")
plt.plot(np.arange(1, 13), cumulative_monthly_industry_sales[:, 1], label="Restaurants")
plt.plot(np.arange(1, 13), cumulative_monthly_industry_sales[:, 2], label="Department stores")
plt.legend()
plt.show()
It’s possible to use vectorized operations to calculate taxes for the liquor, restaurant, and department store industries! We’ll simplify the tax calculation process here and assume that the government collects 5% of all sales across these industries as tax revenue.
Your task is to calculate the tax owed by each industry related to
each month’s sales. numpy is loaded for you as
np, and the monthly_sales array is
available.
Create an array called tax_collected which calculates tax collected by industry and month by multiplying each element in monthly_sales by 0.05.
Create an array that calculates total_tax_and_revenue collected by each industry and month by adding each element in tax_collected to its corresponding element in monthly_sales.
# Create an array of tax collected by industry and month
tax_collected = monthly_sales * 0.05
print(tax_collected)
# Create an array of sales revenue plus tax collected by industry and month
total_tax_and_revenue = tax_collected + monthly_sales
print(total_tax_and_revenue)
You’d like to be able to plan for next year’s operations by
projecting what sales will be, and you’ve gathered multipliers specific
to each month and industry. These multipliers are saved in an array
called monthly_industry_multipliers. For example, the
multiplier at monthly_industry_multipliers[0, 0] of
0.98 means that the liquor store industry is projected to
have 98% of this January’s sales in January of next year.
array([[0.98, 1.02, 1. ],
[1.00, 1.01, 0.97],
[1.06, 1.03, 0.98],
[1.08, 1.01, 0.98],
[1.08, 0.98, 0.98],
[1.1 , 0.99, 0.99],
[1.12, 1.01, 1. ],
[1.1 , 1.02, 1. ],
[1.11, 1.01, 1.01],
[1.08, 0.99, 0.97],
[1.09, 1. , 1.02],
[1.13, 1.03, 1.02]])
numpy is loaded for you as np, and the
monthly_sales and monthly_industry_multipliers
arrays are available. The monthly_sales columns in order
refer to liquor store, restaurant, and department store sales.
Create an array called projected_monthly_sales which contains projected sales for all three industries based on the multipliers you have gathered.
# edited/added
monthly_industry_multipliers = np.load('monthly_industry_multipliers.npy')
# Create an array of monthly projected sales for all industries
projected_monthly_sales = monthly_sales * monthly_industry_multipliers
print(projected_monthly_sales)
# Graph current liquor store sales and projected liquor store sales by month
plt.plot(np.arange(1, 13), monthly_sales[:, 0], label="Current liquor store sales")
plt.plot(np.arange(1, 13), projected_monthly_sales[:, 0], label="Projected liquor store sales")
plt.legend()
plt.show()
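There are many situations where you might want to use Python methods
and functions on array elements in NumPy. You can always write a
for loop to do this, but np.vectorize() lets you apply such a function
to a whole array with the same concise syntax as NumPy’s built-in
vectorized operations. Keep in mind that np.vectorize() is a convenience:
it still calls the Python function once per element, so it does not match
the speed of NumPy’s native vectorized operations.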
You’ve got an array called names which contains first
and last names:
names = np.array([["Izzy", "Monica", "Marvin"],
["Weber", "Patel", "Hernandez"]])
You’d like to use one of the Python methods you learned on DataCamp,
.upper(), to make all the letters of every name in the
array uppercase. As a reminder, .upper() is a string
method, meaning that it must be called on an instance of a string:
str.upper().
Your task is to vectorize this Python method. numpy is
loaded for you as np, and the names array is
available.
Create a vectorized function called vectorized_upper from the Python .upper() string method.
Apply vectorized_upper() to the names array and save the resulting array as uppercase_names.
# edited/added
names = np.array([["Izzy", "Monica", "Marvin"],
["Weber", "Patel", "Hernandez"]])
# Vectorize the .upper() string method
vectorized_upper = np.vectorize(str.upper)
# Apply vectorized_upper to the names array
uppercase_names = vectorized_upper(names)
print(uppercase_names)
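For string operations specifically, NumPy also ships element-wise string functions in its np.char module. The sketch below compares np.vectorize(str.upper) with np.char.upper() on the names array from the exercise. It is offered as an alternative to be aware of, not as the course’s intended solution.

```python
import numpy as np

names = np.array([["Izzy", "Monica", "Marvin"],
                  ["Weber", "Patel", "Hernandez"]])

# Approach from the exercise: wrap the Python string method
via_vectorize = np.vectorize(str.upper)(names)

# Alternative: NumPy's own element-wise string function
via_char = np.char.upper(names)

print(via_char)
print(np.array_equal(via_vectorize, via_char))
```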
Broadcasting takes the power of vectorized operations in NumPy one step further, saving memory and computing power. But before broadcasting, you’ll need to check whether it’s even possible to use broadcasting in your mathematical operations!
(3, 4) and (1, 4)
(3, 4) and (4, )
(3, 4) and (3, 1)
(3, 4) and (1, 2)
(3, 4) and (4, 1)
(3, 4) and (3, )
Recall that when broadcasting across columns, NumPy requires you to be explicit that it should broadcast a vertical array, and horizontal and vertical 1D arrays do not exist in NumPy. Instead, you must first create a 2D array to declare that you have vertical data. Then, NumPy behaves as if it had copied this 2D vertical array for each column, without actually allocating those copies, and applies the desired operation.
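Broadcasting compatibility can be checked without building any arrays. This sketch assumes NumPy 1.20 or later, where np.broadcast_shapes() is available; the helper broadcast_result is a hypothetical name introduced here.

```python
import numpy as np

def broadcast_result(shape_a, shape_b):
    """Return the broadcast shape, or None if the shapes are incompatible."""
    try:
        return np.broadcast_shapes(shape_a, shape_b)
    except ValueError:
        # Raised when a pair of trailing dimensions is neither equal nor 1
        return None

pairs = [((3, 4), (1, 4)), ((3, 4), (4,)), ((3, 4), (3, 1)),
         ((3, 4), (1, 2)), ((3, 4), (4, 1)), ((3, 4), (3,))]

for shape_a, shape_b in pairs:
    print(shape_a, shape_b, broadcast_result(shape_a, shape_b))
```

Shapes are compared from the rightmost dimension leftward, which is why a (3,) array does not broadcast against (3, 4) while a (3, 1) array does.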
A Python list called monthly_growth_rate with
len() of 12 is available. This list represents
monthly year-over-year expected growth for the economy; your task is to
use broadcasting to multiply this list by each column in the
monthly_sales array. The monthly_sales array
is loaded, along with numpy as np.
Convert monthly_growth_rate, currently a Python list, into a one-dimensional NumPy array called monthly_growth_1D.
Reshape monthly_growth_1D so that it is broadcastable column-wise across monthly_sales; call the new array monthly_growth_2D.
Multiply each column in monthly_sales by monthly_growth_2D.
# edited/added
monthly_growth_rate = [1.01, 1.03, 1.03, 1.02, 1.05, 1.03, 1.06, 1.04, 1.03, 1.04, 1.02, 1.01]
# Convert monthly_growth_rate into a NumPy array
monthly_growth_1D = np.array(monthly_growth_rate)
# Reshape monthly_growth_1D
monthly_growth_2D = monthly_growth_1D.reshape((12, 1))
# Multiply each column in monthly_sales by monthly_growth_2D
print(monthly_growth_2D * monthly_sales)
In the last set of exercises, you used
monthly_industry_multipliers to create sales predictions.
Recall that monthly_industry_multipliers looks like
this:
array([[0.98, 1.02, 1. ],
[1.00, 1.01, 0.97],
[1.06, 1.03, 0.98],
[1.08, 1.01, 0.98],
[1.08, 0.98, 0.98],
[1.1 , 0.99, 0.99],
[1.12, 1.01, 1. ],
[1.1 , 1.02, 1. ],
[1.11, 1.01, 1.01],
[1.08, 0.99, 0.97],
[1.09, 1. , 1.02],
[1.13, 1.03, 1.02]])
Assume you’re not comfortable being so specific with your estimates.
Rather, you’d like to use monthly_industry_multipliers to
find a single average multiplier for each industry. Then you’ll use that
multiplier to project next year’s sales.
numpy is loaded for you as np, and the
monthly_sales and monthly_industry_multipliers
arrays are available. The monthly_sales columns in order
refer to liquor store, restaurant, and department store sales.
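Find the mean sales projection multiplier for each industry; save the result in an array called mean_multipliers.
Print the shapes of mean_multipliers and monthly_sales to ensure they are suitable for broadcasting.
Multiply each monthly sales value by the mean multiplier for that industry; save the result in an array called projected_sales.
# Find the mean sales projection multiplier for each industry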
mean_multipliers = monthly_industry_multipliers.mean(axis=0)
print(mean_multipliers)
# Print the shapes of mean_multipliers and monthly_sales
print(mean_multipliers.shape, monthly_sales.shape)
# Find the mean sales projection multiplier for each industry
mean_multipliers = monthly_industry_multipliers.mean(axis=0)
print(mean_multipliers)
# Print the shapes of mean_multipliers and monthly_sales
print(mean_multipliers.shape, monthly_sales.shape)
# Multiply each value by the multiplier for that industry
projected_sales = monthly_sales * mean_multipliers
print(projected_sales)
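To see why axis=0 is the right choice here, it can help to compare both axes on a tiny array. This is a minimal sketch with invented numbers; `toy_multipliers` and `toy_sales` are hypothetical names used only for illustration.

```python
import numpy as np

# Hypothetical multipliers: 2 months (rows) x 3 industries (columns)
toy_multipliers = np.array([[1.0, 2.0, 3.0],
                            [3.0, 4.0, 5.0]])

# Hypothetical sales with the same layout
toy_sales = np.array([[10, 10, 10],
                      [20, 20, 20]])

# axis=0 collapses the rows, leaving one mean per column (per industry)
col_means = toy_multipliers.mean(axis=0)

# axis=1 collapses the columns, leaving one mean per row (per month)
row_means = toy_multipliers.mean(axis=1)

# Shape (3,) broadcasts across (2, 3) because the trailing dimensions match
projected = toy_sales * col_means

print(col_means)  # [2. 3. 4.]
print(row_means)  # [2. 4.]
print(projected)
```

Because the per-industry means have shape (3,), no reshape is needed before multiplying by an array whose last dimension is also 3.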
NumPy meets the art world in this final chapter as we use image data from a Monet masterpiece to explore how you can use NumPy to augment image data. You’ll use flipping and transposing functionality to quickly transform the masterpiece. Next, you’ll pull the Monet array apart, make changes, and reconstruct it using array stacking to see the results.
The exercises for this chapter will use a NumPy array holding an
image in RGB format. Which image? You’ll have to load the array from the
mystery_image.npy file to find out!
numpy is loaded as np, and
mystery_image.npy is available.
Load the mystery_image.npy file using the alias f, saving the contents as an array called rgb_array.
import matplotlib.pyplot as plt  # edited/added
# Load the mystery_image.npy file
with open("mystery_image.npy", "rb") as f:
    rgb_array = np.load(f)
# Display the loaded image
plt.imshow(rgb_array)
plt.show()
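If mystery_image.npy is not at hand, the same open/np.load pattern can be tried on a small array saved to a temporary directory. The sketch below is only an illustration; `toy_image` and the file name `toy_image.npy` are made up for this example.

```python
import os
import tempfile

import numpy as np

# Hypothetical 2 x 4 "image" with 3 color channels
toy_image = np.arange(24, dtype=np.uint8).reshape((2, 4, 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_image.npy")

    # "wb" opens the file for writing in binary mode
    with open(path, "wb") as f:
        np.save(f, toy_image)

    # "rb" opens the file for reading in binary mode
    with open(path, "rb") as f:
        loaded = np.load(f)

# The .npy format stores shape and dtype alongside the values
print(loaded.shape, loaded.dtype)
print(np.array_equal(loaded, toy_image))
```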
You’ll need to use the .astype() array method we covered
in the first chapter of this course for the next exercise. If you forget
exactly how .astype() works, you could check out the course
slides or NumPy’s documentation on numpy.org. There is, however, an even
faster way to jog your memory…
numpy is loaded as np.
Display the documentation for .astype().
# Display the documentation for .astype()
help(np.ndarray.astype)
Perhaps you are training a machine learning model to recognize ocean
scenes. You’d like the model to understand that oceans are not only
associated with bright, summery colors, so you’re careful to include
images of oceans in bad weather or evening light as well. You may have
to manually transform some images in order to balance the data, so your
task is to darken the Monet ocean scene rgb_array.
Recall from the video that white is associated with the maximum RGB
value of 255, while darker colors are associated with lower values.
numpy is loaded as np, and the 3D Monet
rgb_array that you loaded in the last exercise is
available.
Reduce every value in rgb_array by 50 percent, saving the resulting array as darker_rgb_array.
Convert darker_rgb_array into an array of integers called darker_rgb_int_array so that it can be plotted.
Save darker_rgb_int_array as an .npy file called darker_monet.npy using the alias f.
# edited/added
rgb_array = np.load('rgb_array.npy')
# Reduce every value in rgb_array by 50 percent
darker_rgb_array = rgb_array * 0.5
# Convert darker_rgb_array into an array of integers
darker_rgb_int_array = darker_rgb_array.astype(np.int8)
plt.imshow(darker_rgb_int_array)
plt.show()
# Save darker_rgb_int_array to an .npy file called darker_monet.npy
with open("darker_monet.npy", "wb") as f:
    np.save(f, darker_rgb_int_array)
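It may be worth checking what the multiply-then-convert steps do to individual values. The sketch below uses a handful of made-up pixel values (`toy_pixels` is a hypothetical name) and assumes the original image holds values from 0 to 255. Halving keeps every value at or below 127.5, which still fits in np.int8 after conversion; np.uint8 is shown as the more common choice for image data in general.

```python
import numpy as np

# Hypothetical pixel values spanning the 0-255 range
toy_pixels = np.array([0, 1, 51, 255], dtype=np.uint8)

# Multiplying by a Python float produces a float64 array
halved = toy_pixels * 0.5

# .astype() with an integer dtype drops the fractional part
as_int8 = halved.astype(np.int8)
as_uint8 = halved.astype(np.uint8)

print(halved.dtype)             # float64
print(as_int8)                  # [  0   0  25 127]
print(np.iinfo(np.int8).max)    # 127
print(np.iinfo(np.uint8).max)   # 255
```

An image that had not been darkened first would contain values above 127, so converting it to np.int8 would not preserve them.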
Perhaps you’re still working on that machine learning model that identifies ocean scenes in paintings. You’d like to generate a few extra images to augment your existing data. After all, a human can tell that a painting is of an ocean even if the painting is upside-down: why shouldn’t your machine learning model?
numpy is loaded as np, and the 3D Monet
rgb_array is available.
Flip rgb_array so that it is the mirror image of the original, with the ocean on the right and grassy knoll on the left.
Flip rgb_array so that it is upside down but otherwise remains the same.
# Flip rgb_array so that it is the mirror image of the original
mirrored_monet = np.flip(rgb_array, axis=1)
plt.imshow(mirrored_monet)
plt.show()
# Flip rgb_array so that it is upside down
upside_down_monet = np.flip(rgb_array, axis=(0, 1))
plt.imshow(upside_down_monet)
plt.show()
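The effect of each axis argument is easier to follow on an array small enough to read. In this sketch, `toy` is a hypothetical 2 x 3 "image" with two channels per pixel. Flipping axis 1 mirrors the columns, while flipping axes 0 and 1 together matches a 180-degree rotation, which is how a painting hung upside down would look. The channel axis is left alone in both cases.

```python
import numpy as np

# Hypothetical image: 2 rows, 3 columns, 2 channels per pixel
toy = np.arange(12).reshape((2, 3, 2))

# Reverse the column order only: a left-right mirror image
mirrored = np.flip(toy, axis=1)

# Reverse rows and columns: the image turned upside down
upside_down = np.flip(toy, axis=(0, 1))

# The top-left pixel of the mirror image is the original top-right pixel
print(mirrored[0, 0], toy[0, 2])

# The top-left pixel of the upside-down image is the original bottom-right pixel
print(upside_down[0, 0], toy[1, 2])

# Flipping both image axes gives the same result as rotating by 180 degrees
print(np.array_equal(upside_down, np.rot90(toy, 2)))
```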
You’ve learned that transposing an array reverses the order of the
array’s axes. To transpose the axes in a different order, you can pass
the desired axes order as arguments. You’ll practice with the 3D Monet
rgb_array, loaded for you. numpy has been
imported as np.
Transpose rgb_array so that the image appears rotated 90 degrees left and as a mirror image of itself.
# Transpose rgb_array
transposed_rgb = np.transpose(rgb_array, axes=(1, 0, 2))
plt.imshow(transposed_rgb)
plt.show()
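As a quick check of what axes=(1, 0, 2) does, the sketch below swaps the row and column axes of a small hypothetical array called `toy` while leaving the channel axis in place. It also compares the result with a left rotation followed by a vertical flip, which is one way to read "rotated and mirrored".

```python
import numpy as np

# Hypothetical image: 2 rows, 3 columns, 2 channels per pixel
toy = np.arange(12).reshape((2, 3, 2))

# Swap axes 0 and 1; axis 2 (the channels) stays last
transposed = np.transpose(toy, axes=(1, 0, 2))

print(toy.shape, transposed.shape)  # (2, 3, 2) (3, 2, 2)

# Pixel (i, j) in the result is pixel (j, i) in the original
print(np.array_equal(transposed[2, 1], toy[1, 2]))

# Same result as rotating 90 degrees left, then flipping top to bottom
print(np.array_equal(transposed, np.flip(np.rot90(toy), axis=0)))
```

Calling np.transpose(toy) without the axes argument would instead reverse all three axes, giving shape (2, 3, 2) with the channel axis moved to the front.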
Splitting and stacking skills aren’t just useful with 3D RGB arrays: they are excellent for subsetting and organizing data of any type and dimension!
You’ll now take a quick trip down memory lane to reorganize the
monthly_sales array as a 3D array. Recall that the first
dimension of monthly_sales is rows of a single month’s
sales across three industries, and the second dimension is columns of
monthly sales data for a single industry.
Your task is to split this data into quarterly sales data and stack
the quarterly sales data so that the new third dimension represents the
four 2D arrays of quarterly sales.
numpy is loaded as
np, and the monthly_sales array is
available.
Split monthly_sales into four arrays representing quarterly data across industries; print q1_sales.
Stack the four quarterly arrays into quarterly_sales, made up of the four quarterly 2D arrays in order from the first to last quarter.
# Split monthly_sales into quarterly data
q1_sales, q2_sales, q3_sales, q4_sales = np.split(monthly_sales, 4)
print(q1_sales)
# Stack the four quarterly sales arrays
quarterly_sales = np.stack([q1_sales, q2_sales, q3_sales, q4_sales])
print(quarterly_sales)
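The shapes involved in the split and stack steps can be confirmed on a placeholder array. In this sketch `toy_sales` is a hypothetical 12 x 3 array of consecutive integers, not the real sales data. Note that np.stack places the new dimension first unless an axis argument says otherwise.

```python
import numpy as np

# Hypothetical stand-in for monthly_sales: 12 months x 3 industries
toy_sales = np.arange(36).reshape((12, 3))

# Split along axis 0 (the default) into four equal groups of three months
q1, q2, q3, q4 = np.split(toy_sales, 4)
print(q1.shape)  # (3, 3)

# Stack along a new leading axis: quarter, month within quarter, industry
stacked = np.stack([q1, q2, q3, q4])
print(stacked.shape)  # (4, 3, 3)

# The third quarter starts with the seventh month of the original array
print(np.array_equal(stacked[2, 0], toy_sales[6]))
```

For equal-sized groups taken in order, this gives the same result as toy_sales.reshape((4, 3, 3)).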
Perhaps you’d like to better understand Monet’s use of the color
blue. Your task is to create a version of the Monet
rgb_array that emphasizes parts of the painting that use
lots of blue by making them even bluer! You’ll perform the splitting
portion of this task in this exercise and the stacking portion in the
next.
numpy is loaded as np, and the Monet
rgb_array is available.
Split rgb_array into red, green, and blue only pixel data; save the results as red_array, green_array, and blue_array.
Create emphasized_blue_array, which replaces blue_array values with 255 if they are higher than the mean value of blue_array; otherwise, the value remains the same.
Print the .shape of emphasized_blue_array.
Reshape emphasized_blue_array to remove the trailing third dimension; save as emphasized_blue_array_2D.
# Split rgb_array into red, green, and blue arrays
red_array, green_array, blue_array = np.split(rgb_array, 3, axis=2)
# Create emphasized_blue_array
emphasized_blue_array = np.where(blue_array > blue_array.mean(), 255, blue_array)
# Print the shape of emphasized_blue_array
print(emphasized_blue_array.shape)
# Remove the trailing dimension from emphasized_blue_array
emphasized_blue_array_2D = emphasized_blue_array.reshape((675, 843)) # edited/added
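The reshape above hard-codes the image size (675, 843). One possible alternative, sketched here on a made-up 2 x 2 image called `toy_rgb`, is to take the first two dimensions from the array itself so the same code works for any image size. This is an illustrative variation rather than the course solution.

```python
import numpy as np

# Hypothetical 2 x 2 image with (red, green, blue) values per pixel
toy_rgb = np.array([[[10, 20, 30], [40, 50, 200]],
                    [[70, 80, 90], [100, 110, 120]]])

# Splitting on axis 2 keeps a trailing dimension of length 1
red, green, blue = np.split(toy_rgb, 3, axis=2)
print(blue.shape)  # (2, 2, 1)

# The blue values are 30, 200, 90, and 120, so the mean is 110
emphasized = np.where(blue > blue.mean(), 255, blue)

# Drop the trailing dimension without hard-coding the image size
emphasized_2D = emphasized.reshape(emphasized.shape[:2])
print(emphasized_2D.shape)  # (2, 2)
print(emphasized_2D)
```

np.squeeze(emphasized, axis=2) would remove the length-1 axis as well.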
Now you’ll combine red_array, green_array,
and emphasized_blue_array_2D to see what Monet’s painting
looks like with the blues emphasized!
numpy is loaded as np, and the
red_array, green_array,
blue_array and emphasized_blue_array_2D
objects that you created in the last exercise are available.
Print the shapes of blue_array and emphasized_blue_array_2D.
Reshape red_array and green_array so that they can be stacked with emphasized_blue_array_2D.
Stack red_array_2D, green_array_2D, and emphasized_blue_array_2D together (in that order) into a 3D array called emphasized_blue_monet.
# Print the shapes of blue_array and emphasized_blue_array_2D
print(blue_array.shape, emphasized_blue_array_2D.shape)
# Reshape red_array and green_array
red_array_2D = red_array.reshape((675, 843)) # edited/added
green_array_2D = green_array.reshape((675, 843)) # edited/added
# Stack red_array_2D, green_array_2D, and emphasized_blue_array_2D
emphasized_blue_monet = np.stack([red_array_2D, green_array_2D, emphasized_blue_array_2D], axis=2)
plt.imshow(emphasized_blue_monet)
plt.show()
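To check that stacking with axis=2 puts the color channels last, the sketch below rebuilds a tiny image from three constant 2D arrays. The names and values are hypothetical. It also shows, as an assumption-labeled alternative, that arrays which still carry the trailing length-1 dimension could be joined with np.concatenate instead of being reshaped first.

```python
import numpy as np

# Hypothetical 2 x 2 single-channel arrays with easy-to-spot values
red_2D = np.full((2, 2), 1)
green_2D = np.full((2, 2), 2)
blue_2D = np.full((2, 2), 3)

# np.stack creates a new axis; axis=2 makes it the last one
rebuilt = np.stack([red_2D, green_2D, blue_2D], axis=2)
print(rebuilt.shape)  # (2, 2, 3)
print(rebuilt[0, 0])  # [1 2 3]

# Alternative: keep the trailing dimension and join along the existing axis 2
red_3D = red_2D[:, :, np.newaxis]
green_3D = green_2D[:, :, np.newaxis]
blue_3D = blue_2D[:, :, np.newaxis]
joined = np.concatenate([red_3D, green_3D, blue_3D], axis=2)
print(np.array_equal(rebuilt, joined))
```

The difference is that np.stack adds a dimension, while np.concatenate joins along one that already exists.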
You made it!
You’re now a card-carrying member of Team NumPy. I hope I’ve convinced you that NumPy is amazing.
It leverages the immense calculating power of C while keeping the friendly syntax of Python. In some places, it even makes Python syntax simpler by vectorizing operations!
What’s next on your NumPy journey? You could check out one of the many libraries built on top of NumPy’s API. Perhaps you’d like to explore data analysis with pandas, data visualization with Seaborn, parallel programming with Dask, machine learning with scikit-learn, or deep learning with TensorFlow. Every one of these libraries is built on top of NumPy, and DataCamp has courses dedicated to all of them.
No matter where you take your NumPy knowledge, thanks for taking this course all the way to the end. I’m Izzy Weber, and on behalf of all of us at DataCamp, congratulations on your wide new array of NumPy skills!
Thanks again!