Building an image retrieval system with deep features

Fire up GraphLab Create

In [243]:
import graphlab

Load the CIFAR-10 dataset

We will use a popular benchmark dataset in computer vision called CIFAR-10.

(We've reduced the data to just 4 categories = {'cat','bird','automobile','dog'}.)

This dataset is already split into a training set and test set. In this simple retrieval example, there is no notion of "testing", so we will only use the training data.

In [244]:
image_train = graphlab.SFrame('image_train_data/')

Computing deep features for our images

The two lines below allow us to compute deep features. This computation takes a little while, so we have already computed them and saved the results as a column in the data you loaded.

(Note that if you would like to compute such deep features and have a GPU on your machine, you should use the GPU-enabled GraphLab Create, which will be significantly faster for this task.)

In [245]:
#deep_learning_model = graphlab.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')
#image_train['deep_features'] = deep_learning_model.extract_features(image_train)
In [246]:
image_train.head()
Out[246]:
+-----+----------------------+------------+-------------------------------------------------------+--------------------------------------------------------+
| id  | image                | label      | deep_features                                         | image_array                                            |
+-----+----------------------+------------+-------------------------------------------------------+--------------------------------------------------------+
| 24  | Height: 32 Width: 32 | bird       | [0.242871761322, 1.09545373917, 0.0, ...              | [73.0, 77.0, 58.0, 71.0, 68.0, 50.0, 77.0, 69.0, ...   |
| 33  | Height: 32 Width: 32 | cat        | [0.525087952614, 0.0, 0.0, 0.0, 0.0, 0.0, ...         | [7.0, 5.0, 8.0, 7.0, 5.0, 8.0, 5.0, 4.0, 6.0, 7.0, ... |
| 36  | Height: 32 Width: 32 | cat        | [0.566015958786, 0.0, 0.0, 0.0, 0.0, 0.0, ...         | [169.0, 122.0, 65.0, 131.0, 108.0, 75.0, ...           |
| 70  | Height: 32 Width: 32 | dog        | [1.12979578972, 0.0, 0.0, 0.778194487095, 0.0, ...    | [154.0, 179.0, 152.0, 159.0, 183.0, 157.0, ...         |
| 90  | Height: 32 Width: 32 | bird       | [1.71786928177, 0.0, 0.0, 0.0, 0.0, 0.0, ...          | [216.0, 195.0, 180.0, 201.0, 178.0, 160.0, ...         |
| 97  | Height: 32 Width: 32 | automobile | [1.57818555832, 0.0, 0.0, 0.0, 0.0, 0.0, ...          | [33.0, 44.0, 27.0, 29.0, 44.0, 31.0, 32.0, 45.0, ...   |
| 107 | Height: 32 Width: 32 | dog        | [0.0, 0.0, 0.220677852631, 0.0, ...                   | [97.0, 51.0, 31.0, 104.0, 58.0, 38.0, 107.0, 61.0, ... |
| 121 | Height: 32 Width: 32 | bird       | [0.0, 0.23753464222, 0.0, 0.0, 0.0, 0.0, ...          | [93.0, 96.0, 88.0, 102.0, 106.0, 97.0, 117.0, ...      |
| 136 | Height: 32 Width: 32 | automobile | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.5737862587, 0.0, ... | [35.0, 59.0, 53.0, 36.0, 56.0, 56.0, 42.0, 62.0, ...   |
| 138 | Height: 32 Width: 32 | bird       | [0.658935725689, 0.0, 0.0, 0.0, 0.0, 0.0, ...         | [205.0, 193.0, 195.0, 200.0, 187.0, 193.0, ...         |
+-----+----------------------+------------+-------------------------------------------------------+--------------------------------------------------------+
[10 rows x 5 columns]

Train a nearest-neighbors model for retrieving images using deep features

We will now build a simple image retrieval system that finds the nearest neighbors for any image.

Use image retrieval model with deep features to find similar images

Let's find similar images to this car picture.

In [247]:
knn_model = graphlab.nearest_neighbors.create(image_train,features=['deep_features'],
                                             label='id')
Starting brute force nearest neighbors model training.
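
To make the "brute force" search above concrete, here is a minimal pure-Python sketch of what such a query does. It assumes Euclidean distance on the feature vectors, and the names euclidean, knn_query, and the toy data are illustrative rather than part of GraphLab Create. The output rows mimic the query_label, reference_label, distance, and rank columns that the model returns.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_query(reference, queries, k=5):
    """Brute-force k-nearest-neighbor search (illustrative helper).

    reference: list of (label, feature_vector) pairs
    queries:   list of feature vectors
    Returns one dict per (query, neighbor) pair, ordered by query, then rank.
    """
    rows = []
    for query_label, q in enumerate(queries):
        # Score every reference point, then keep the k closest.
        scored = sorted((euclidean(q, feats), label) for label, feats in reference)
        for rank, (dist, label) in enumerate(scored[:k], start=1):
            rows.append({'query_label': query_label, 'reference_label': label,
                         'distance': dist, 'rank': rank})
    return rows

# Toy 2-D vectors standing in for the deep features (made-up values).
toy_reference = [(24, [0.0, 0.0]), (33, [3.0, 4.0]), (36, [1.0, 0.0])]
toy_result = knn_query(toy_reference, [[0.0, 0.0]], k=2)
for row in toy_result:
    print(row)
```

Because the query point here is itself one of the reference points, its first neighbor is itself at distance 0. The same thing happens whenever a training image is used to query a model built on the training data.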
In [248]:
graphlab.canvas.set_target('ipynb')

We are going to create a simple function to view the nearest neighbors to save typing:

In [249]:
def get_images_from_ids(query_result):
    return image_train.filter_by(query_result['reference_label'],'id')

Very cool results showing similar cats.

Finding similar images to a car

In [250]:
car = image_train[8:9]
car['image'].show()
In [251]:
get_images_from_ids(knn_model.query(car))['image'].show()
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.0498753   | 14.723ms     |
| Done         |         | 100         | 155.886ms    |
+--------------+---------+-------------+--------------+

Just for fun, let's create a lambda to find and show nearest neighbor images

In [252]:
show_neighbors = lambda i: get_images_from_ids(knn_model.query(image_train[i:i+1]))['image'].show()
In [289]:
show_neighbors(83)
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.0498753   | 17.145ms     |
| Done         |         | 100         | 168.98ms     |
+--------------+---------+-------------+--------------+
In [254]:
show_neighbors(24)
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.0498753   | 8.443ms      |
| Done         |         | 100         | 181.955ms    |
+--------------+---------+-------------+--------------+
In [255]:
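# Note: `cat` is not defined in any cell shown above; it must be a one-row slice of image_train.
# The distance-0 match in the output below suggests it is the training image with id 494.
knn_model.query(cat)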
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.0498753   | 17.518ms     |
| Done         |         | 100         | 171.584ms    |
+--------------+---------+-------------+--------------+
Out[255]:
query_label reference_label distance rank
0 494 0.0 1
0 3410 32.1222653422 2
0 3514 32.3677991738 3
0 47820 32.524424589 4
0 46729 32.9860859525 5
[5 rows x 4 columns]

Split the SFrame with the training data into 4 different SFrames. Each of these will contain data for 1 of the 4 categories above. Hint: if you use a logical filter to select the rows where the ‘label’ column equals ‘dog’, you can create an SFrame with only the data for images labeled ‘dog’.

In [256]:
image_train['label'].sketch_summary()
Out[256]:
+------------------+-------+----------+
|       item       | value | is exact |
+------------------+-------+----------+
|      Length      |  2005 |   Yes    |
| # Missing Values |   0   |   Yes    |
| # unique values  |   4   |    No    |
+------------------+-------+----------+

Most frequent items:
+-------+------------+-----+-----+------+
| value | automobile | cat | dog | bird |
+-------+------------+-----+-----+------+
| count |    509     | 509 | 509 | 478  |
+-------+------------+-----+-----+------+

Similarly to the image retrieval notebook you downloaded, you are going to create a nearest neighbor model using the 'deep_features' as the features, but this time create one such model for each category, using the corresponding subset of the training data. You can call the model with the ‘dog’ data the dog_model, the one with the ‘cat’ data the cat_model, and so on. (The code below names them dog_knn_model, cat_knn_model, and so on.)

In [257]:
dog_train=image_train[image_train['label']=='dog']
cat_train=image_train[image_train['label']=='cat']
bird_train=image_train[image_train['label']=='bird']
automobile_train=image_train[image_train['label']=='automobile']
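
The four filters above all follow the same pattern: group rows by the value of the 'label' column. As a rough illustration of that idea outside GraphLab, the sketch below groups plain Python dicts by label. The helper name split_by_label is hypothetical, and the toy rows reuse a few id/label pairs from the table printed earlier.

```python
from collections import defaultdict

def split_by_label(rows, label_key='label'):
    """Group rows (dicts) into one list per distinct label value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    return dict(groups)

# A few (id, label) pairs taken from the head of image_train shown above.
toy_rows = [
    {'id': 24, 'label': 'bird'},
    {'id': 33, 'label': 'cat'},
    {'id': 36, 'label': 'cat'},
    {'id': 70, 'label': 'dog'},
    {'id': 97, 'label': 'automobile'},
]
subsets = split_by_label(toy_rows)
print(sorted(subsets))                      # the four category names
print([r['id'] for r in subsets['cat']])    # ids of the 'cat' rows
```

Keeping the subsets in a dictionary keyed by label makes it easy to loop over categories later instead of repeating one line per category.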
In [258]:
dog_knn_model = graphlab.nearest_neighbors.create(dog_train,features=['deep_features'],
                                             label='id')
cat_knn_model = graphlab.nearest_neighbors.create(cat_train,features=['deep_features'],
                                             label='id')
bird_knn_model = graphlab.nearest_neighbors.create(bird_train,features=['deep_features'],
                                             label='id')
automobile_knn_model = graphlab.nearest_neighbors.create(automobile_train,features=['deep_features'],
                                             label='id')
Starting brute force nearest neighbors model training.
Starting brute force nearest neighbors model training.
Starting brute force nearest neighbors model training.
Starting brute force nearest neighbors model training.

You now have a nearest neighbors model that can find the nearest ‘dog’ to any image you give it, the dog_model; one that can find the nearest ‘cat’, the cat_model; and so on.

Using these models, answer the following questions. The cat image below is the first in the test data:

You can access this image, similarly to what we did in the IPython notebooks above, with this command:

image_test[0:1]

What is the nearest ‘cat’ labeled image in the training data to the cat image above (the first image in the test data)?

In [259]:
image_test = graphlab.SFrame('image_test_data/')
mycat=image_test[0:1]
In [260]:
image_test[0:1]['image'].show()
In [261]:
mycat_neighbors = get_images_from_ids(cat_knn_model.query(mycat))
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.196464    | 9.218ms      |
| Done         |         | 100         | 51.393ms     |
+--------------+---------+-------------+--------------+
In [262]:
mycat_neighbors['image'].show()
In [263]:
mycat_neighbors['id']
Out[263]:
dtype: int
Rows: 5
[331, 16289, 25713, 32139, 45646]
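
Note that the ids above come back in ascending order, not in rank order: the raw query output further down (Out[265]) ranks 16289 first and 331 last. This suggests that get_images_from_ids returns matching rows in the table's own order. The sketch below illustrates the difference with plain Python lists; filter_by_ids and order_by_rank are hypothetical helpers, not GraphLab functions.

```python
def filter_by_ids(rows, ids):
    """Keep rows whose 'id' is in ids, preserving the table's row order."""
    wanted = set(ids)
    return [row for row in rows if row['id'] in wanted]

def order_by_rank(rows, ranked_ids):
    """Return the rows matching ranked_ids, in the order given by ranked_ids."""
    by_id = {row['id']: row for row in rows}
    return [by_id[i] for i in ranked_ids if i in by_id]

# reference_label values from cat_knn_model.query(mycat), in rank order.
ranked_ids = [16289, 45646, 32139, 25713, 331]
# Stand-in for the training table, stored in ascending id order.
table = [{'id': i} for i in [331, 16289, 25713, 32139, 45646]]

table_order = [row['id'] for row in filter_by_ids(table, ranked_ids)]
rank_order = [row['id'] for row in order_by_rank(table, ranked_ids)]
print(table_order)
print(rank_order)
```

So to answer "which image is nearest?", read the rank-1 row of the query output rather than the first image displayed.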
In [264]:
mycat_neighbors
Out[264]:
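+-------+----------------------+-------+----------------------------------------------------+------------------------------------------------------+
| id    | image                | label | deep_features                                      | image_array                                          |
+-------+----------------------+-------+----------------------------------------------------+------------------------------------------------------+
| 331   | Height: 32 Width: 32 | cat   | [0.0, 0.0, 0.510963916779, 0.0, ...                | [45.0, 65.0, 92.0, 72.0, 95.0, 110.0, 106.0, ...     |
| 16289 | Height: 32 Width: 32 | cat   | [0.964287519455, 0.0, 0.0, 0.0, 1.12515509129, ... | [215.0, 219.0, 231.0, 215.0, 219.0, 232.0, ...       |
| 25713 | Height: 32 Width: 32 | cat   | [0.536971271038, 0.0, 0.0, 0.0894458889961, ...    | [228.0, 222.0, 236.0, 224.0, 213.0, 222.0, ...       |
| 32139 | Height: 32 Width: 32 | cat   | [1.29409468174, 0.0, 0.0, 0.513800263405, ...      | [217.0, 220.0, 205.0, 221.0, 227.0, 218.0, ...       |
| 45646 | Height: 32 Width: 32 | cat   | [0.983677506447, 0.0, 0.0, 0.0, 0.0, ...           | [51.0, 42.0, 26.0, 56.0, 47.0, 31.0, 59.0, 50.0, ... |
+-------+----------------------+-------+----------------------------------------------------+------------------------------------------------------+
[5 rows x 5 columns]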
In [265]:
cat_knn_model.query(mycat)
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.196464    | 10.653ms     |
| Done         |         | 100         | 69.039ms     |
+--------------+---------+-------------+--------------+
Out[265]:
query_label reference_label distance rank
0 16289 34.623719208 1
0 45646 36.0068799284 2
0 32139 36.5200813436 3
0 25713 36.7548502521 4
0 331 36.8731228168 5
[5 rows x 4 columns]

What is the nearest ‘dog’ labeled image in the training data to the cat image above (the first image in the test data)?

In [266]:
dog_knn_model.query(mycat)
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.196464    | 12.725ms     |
| Done         |         | 100         | 69.55ms      |
+--------------+---------+-------------+--------------+
Out[266]:
query_label reference_label distance rank
0 16976 37.4642628784 1
0 13387 37.5666832169 2
0 35867 37.6047267079 3
0 44603 37.7065585153 4
0 6094 38.5113254907 5
[5 rows x 4 columns]

For the first image in the test data (image_test[0:1]), which we used above, compute the mean distance between this image and its 5 nearest neighbors that were labeled ‘cat’ in the training data (similarly to what you did in the previous question). Save this result.

Similarly, for the first image in the test data (image_test[0:1]), which we used above, compute the mean distance between this image and its 5 nearest neighbors that were labeled ‘dog’ in the training data (similarly to what you did in the previous question). Save this result.

On average, is the first image in the test data closer to its 5 nearest neighbors in the ‘cat’ data or in the ‘dog’ data? (In a later course, we will see that this is an example of what is called a k-nearest neighbors classifier, where we use the label of neighboring points to predict the label of a test point.)

In [267]:
cat_knn_model.query(mycat)['distance'].mean()
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.196464    | 8.181ms      |
| Done         |         | 100         | 62.731ms     |
+--------------+---------+-------------+--------------+
Out[267]:
36.15573070978294
In [268]:
get_images_from_ids(dog_knn_model.query(mycat))['image'].show()
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.196464    | 8.748ms      |
| Done         |         | 100         | 52.204ms     |
+--------------+---------+-------------+--------------+
In [269]:
cat_knn_model.query(mycat)['distance']
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.196464    | 9.676ms      |
| Done         |         | 100         | 56.706ms     |
+--------------+---------+-------------+--------------+
Out[269]:
dtype: float
Rows: 5
[34.62371920804245, 36.00687992842462, 36.52008134363789, 36.754850252057054, 36.87312281675268]
In [270]:
dog_knn_model.query(mycat)['distance'].mean()
Starting pairwise querying.
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 0            | 1       | 0.196464    | 7.304ms      |
| Done         |         | 100         | 49.865ms     |
+--------------+---------+-------------+--------------+
Out[270]:
37.77071136184157
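
The two means can be checked by hand from the distances printed in Out[265] and Out[266]. The short sketch below redoes that arithmetic in plain Python and picks the closer class; the variable names are illustrative.

```python
# Distances from the first test image to its 5 nearest 'cat' and 'dog'
# training images, copied from the query outputs above.
cat_distances = [34.623719208, 36.0068799284, 36.5200813436,
                 36.7548502521, 36.8731228168]
dog_distances_to_mycat = [37.4642628784, 37.5666832169, 37.6047267079,
                          37.7065585153, 38.5113254907]

def mean(values):
    """Arithmetic mean; float() keeps the division exact-typed on Python 2 as well."""
    return sum(values) / float(len(values))

cat_mean = mean(cat_distances)
dog_mean = mean(dog_distances_to_mycat)

# The class whose neighbors are closer on average is the predicted label.
closer_class = 'cat' if cat_mean < dog_mean else 'dog'
print(cat_mean, dog_mean, closer_class)
```

On average the test image is closer to its 'cat' neighbors (about 36.16) than to its 'dog' neighbors (about 37.77), so this comparison labels it a cat.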

Splitting test data by label: Above, you split the train data SFrame into one SFrame for images labeled ‘dog’, another for those labeled ‘cat’, etc. Now, do the same for the test data. You can call the resulting SFrames:

image_test_cat, image_test_dog, image_test_bird, image_test_automobile

In [271]:
image_test_dog=image_test[image_test['label']=='dog']
image_test_cat=image_test[image_test['label']=='cat']
image_test_bird=image_test[image_test['label']=='bird']
image_test_automobile=image_test[image_test['label']=='automobile']

Finding nearest neighbors in the training set for each part of the test set: Thus far, we have queried our nearest neighbors models, e.g.,

dog_model.query()

with a single image as the input. You can also query with a whole set of data, and the model will find the nearest neighbors for each data point. Note that the index of each input row is stored in the ‘query_label’ column of the output SFrame.

In [272]:
dog_query=dog_knn_model.query(image_test_dog)
dog_query
Starting blockwise querying.
max rows per data block: 4348
number of reference data blocks: 8
number of query data blocks: 1
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 1000         | 63000   | 12.3772     | 288.29ms     |
| Done         | 509000  | 100         | 390.308ms    |
+--------------+---------+-------------+--------------+
Out[272]:
query_label reference_label distance rank
0 49803 33.4773590373 1
0 21235 34.415221599 2
0 23803 34.8138630061 3
0 41752 34.9289313468 4
0 13865 37.1546409194 5
1 5755 32.8458495684 1
1 38013 35.6379572518 2
1 10669 37.0042463585 3
1 11933 37.0051632125 4
1 48566 37.9279031587 5
[5000 rows x 4 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

In [273]:
dog_query[dog_query['query_label']==4]
Out[273]:
query_label reference_label distance rank
4 12089 37.4849250909 1
4 31474 39.202748492 2
4 36093 39.2197656421 3
4 542 39.560248151 4
4 14346 39.6794828182 5
[? rows x 4 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
In [274]:
cat_query=cat_knn_model.query(image_test_cat)
cat_query[cat_query['query_label']==1]
Starting blockwise querying.
max rows per data block: 4348
number of reference data blocks: 8
number of query data blocks: 1
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 1000         | 64000   | 12.5737     | 380.709ms    |
| Done         | 509000  | 100         | 470.867ms    |
+--------------+---------+-------------+--------------+
Out[274]:
query_label reference_label distance rank
1 13094 33.8680579302 1
1 10883 34.4414068951 2
1 43295 34.8348775045 3
1 6304 34.9159478835 4
1 8302 35.1367177322 5
[? rows x 4 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
In [275]:
bird_query=bird_knn_model.query(image_test_bird)
bird_query[bird_query['query_label']==2]
Starting blockwise querying.
max rows per data block: 4348
number of reference data blocks: 8
number of query data blocks: 1
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 1000         | 60000   | 12.5523     | 340.315ms    |
| Done         | 478000  | 100         | 557.734ms    |
+--------------+---------+-------------+--------------+
Out[275]:
query_label reference_label distance rank
2 26768 35.806619825 1
2 40225 35.8951118406 2
2 34669 36.1523802473 3
2 16041 37.3203952055 4
2 44181 37.3771969728 5
[? rows x 4 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
In [276]:
automobile_query=automobile_knn_model.query(image_test_automobile)
automobile_query[automobile_query['query_label']==2]
Starting blockwise querying.
max rows per data block: 4348
number of reference data blocks: 8
number of query data blocks: 1
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 1000         | 63000   | 12.3772     | 308.67ms     |
| Done         | 509000  | 100         | 485.894ms    |
+--------------+---------+-------------+--------------+
Out[276]:
query_label reference_label distance rank
2 41471 35.7511498408 1
2 35086 35.9272826235 2
2 536 36.5601606568 3
2 44415 36.7238115205 4
2 23107 36.82806943 5
[? rows x 4 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.

Using this knowledge, find the closest neighbor of each image in the dog test data using each of the trained models, e.g.,

dog_cat_neighbors = cat_model.query(image_test_dog, k=1)

finds 1 neighbor (that’s what k=1 does) to the dog test images (image_test_dog) in the cat portion of the training data (used to train the cat_model).

Now, do this for every combination of the labels in the training and test data.

In [277]:
dog_cat_neighbors = cat_knn_model.query(image_test_dog, k=1)
Starting blockwise querying.
max rows per data block: 4348
number of reference data blocks: 8
number of query data blocks: 1
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 1000         | 63000   | 12.3772     | 328.685ms    |
| Done         | 509000  | 100         | 422.232ms    |
+--------------+---------+-------------+--------------+
In [278]:
dog_cat_neighbors[0]
Out[278]:
{'distance': 36.41960770675437,
 'query_label': 0,
 'rank': 1,
 'reference_label': 33}
In [279]:
dog_bird_neighbors = bird_knn_model.query(image_test_dog, k=1)
dog_bird_neighbors[0]
Starting blockwise querying.
max rows per data block: 4348
number of reference data blocks: 8
number of query data blocks: 1
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 1000         | 60000   | 12.5523     | 297.33ms     |
| Done         | 478000  | 100         | 378.513ms    |
+--------------+---------+-------------+--------------+
Out[279]:
{'distance': 41.753864730351246,
 'query_label': 0,
 'rank': 1,
 'reference_label': 44658}
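
With k=1, each query returns exactly one row per test image, so the results from the four class models line up row by row. The sketch below shows that structure on toy data: for every test point it records the distance to the nearest training point of each class. It assumes Euclidean distance, and nearest_distance, train_by_class, and the 2-D vectors are made up for illustration.

```python
import math

def nearest_distance(query, reference_features):
    """Euclidean distance from query to its single nearest neighbor (k=1)."""
    return min(math.sqrt(sum((a - b) ** 2 for a, b in zip(query, feats)))
               for feats in reference_features)

# Toy 2-D features per training class (made-up values, not the CIFAR-10 data).
train_by_class = {
    'dog': [[0.0, 1.0], [1.0, 1.0]],
    'cat': [[2.0, 2.0]],
    'bird': [[5.0, 5.0]],
    'automobile': [[9.0, 0.0]],
}
test_dogs = [[0.0, 0.0], [3.0, 2.0]]

# One dict per test image, with one 'dog-<class>' entry per training class.
toy_distances = [
    {'dog-' + label: nearest_distance(point, feats)
     for label, feats in train_by_class.items()}
    for point in test_dogs
]
for row in toy_distances:
    print(row)
```

Looping over a dictionary of classes avoids writing one query per label pair by hand, which helps when the same steps are repeated for the cat, bird, and automobile test sets.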

Create an SFrame with the distances from ‘dog’ test examples to the respective nearest neighbors in each class in the training data: The ‘distance’ column in dog_cat_neighbors above contains the distance between each ‘dog’ image in the test set and its nearest ‘cat’ image in the training set. The question we want to answer is how many of the test set ‘dog’ images are closer to a ‘dog’ in the training set than to a ‘cat’, ‘automobile’ or ‘bird’. So, next we will create an SFrame containing just these distances per data point. The goal is to create an SFrame called dog_distances with 4 columns:

i. dog_distances['dog-dog'] ---- storing dog_dog_neighbors['distance']

ii. dog_distances['dog-cat'] ---- storing dog_cat_neighbors['distance']

iii. dog_distances['dog-automobile'] ---- storing dog_automobile_neighbors['distance']

iv. dog_distances['dog-bird'] ---- storing dog_bird_neighbors['distance']

In [280]:
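# Note: k=5 returns five rows per test image (1000 images x 5 = 5000 rows), so a table
# built from these results compares neighbors rank by rank. For the 1-nearest-neighbor
# comparison described above, pass k=1 so that each row corresponds to one test image.
dog_cat_neighbors = cat_knn_model.query(image_test_dog, k=5)
dog_bird_neighbors = bird_knn_model.query(image_test_dog, k=5)
dog_dog_neighbors = dog_knn_model.query(image_test_dog, k=5)
dog_automobile_neighbors = automobile_knn_model.query(image_test_dog, k=5)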
Starting blockwise querying.
max rows per data block: 4348
number of reference data blocks: 8
number of query data blocks: 1
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 1000         | 63000   | 12.3772     | 262.748ms    |
| Done         | 509000  | 100         | 388.716ms    |
+--------------+---------+-------------+--------------+
Starting blockwise querying.
max rows per data block: 4348
number of reference data blocks: 8
number of query data blocks: 1
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 1000         | 59000   | 12.3431     | 277.665ms    |
| Done         | 478000  | 100         | 378.885ms    |
+--------------+---------+-------------+--------------+
Starting blockwise querying.
max rows per data block: 4348
number of reference data blocks: 8
number of query data blocks: 1
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 1000         | 63000   | 12.3772     | 273.763ms    |
| Done         | 509000  | 100         | 412.749ms    |
+--------------+---------+-------------+--------------+
Starting blockwise querying.
max rows per data block: 4348
number of reference data blocks: 8
number of query data blocks: 1
+--------------+---------+-------------+--------------+
| Query points | # Pairs | % Complete. | Elapsed Time |
+--------------+---------+-------------+--------------+
| 1000         | 64000   | 12.5737     | 323.01ms     |
| Done         | 509000  | 100         | 361.02ms     |
+--------------+---------+-------------+--------------+
In [281]:
dog_distances = graphlab.SFrame({'dog-dog': dog_dog_neighbors['distance'],
                                 'dog-automobile': dog_automobile_neighbors['distance'],
                                 'dog-bird': dog_bird_neighbors['distance'],
                                 'dog-cat': dog_cat_neighbors['distance']})
In [282]:
dog_distances
Out[282]:
dog-automobile dog-bird dog-cat dog-dog
41.9579761457 41.7538647304 36.4196077068 33.4773590373
44.1437321415 41.8665828436 36.5151634774 34.415221599
44.477613787 41.9679651276 37.0642580438 34.8138630061
45.1042420361 42.2266731821 37.1217725492 34.9289313468
45.286528164 42.330722322 38.2837145836 37.1546409194
46.0021331807 41.3382958925 38.8353268874 32.8458495684
46.3466070395 41.8627183618 38.9588618578 35.6379572518
46.7436666554 42.7126939536 39.0535355661 37.0042463585
47.7992911825 43.6588676158 39.9301919908 37.0051632125
47.9857098244 44.445440652 40.3811553784 37.9279031587
[5000 rows x 4 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

Computing the number of correct predictions using 1-nearest neighbors for the dog class: Now that you have created the SFrame dog_distances, you will learn to use the method .apply() on this SFrame to iterate line by line and compute the number of ‘dog’ test examples where the distance to the nearest ‘dog’ was lower than that to the other classes. You will do this in three steps:

i. Consider one row of the SFrame dog_distances. Let’s call this variable row. You can access each distance by calling, for example, row['dog-cat'], which, in the example table above, has the value 36.4196077068 for the first row. Create a function starting with

def is_dog_correct(row):

which returns 1 if the value for row['dog-dog'] is lower than that of the other columns, and 0 otherwise. That is, it returns 1 if this row is correctly classified by 1-nearest neighbors, and 0 otherwise.

ii. Using the function is_dog_correct(row), you can check if 1 row is correctly classified. Now, you want to count how many rows are correctly classified. You could do a for loop iterating through each row and applying the function is_dog_correct(row). This method will be really slow, because the SFrame is not optimized for this type of operation. Instead, we will use the .apply() method to apply the function is_dog_correct to each row of the SFrame. Read about the .apply() method in the SFrame documentation.

iii. Computing the number of correct predictions for ‘dog’: You can now call

dog_distances.apply(is_dog_correct)

which will return an SArray (a column of data) with a 1 for every correct row and a 0 for every incorrect one. You can call .sum() on the result to get the total number of correctly classified ‘dog’ images in the test set!

In [283]:
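def is_dog_correct(row):
    # Return 1 if the 'dog-dog' distance is strictly smaller than the distance
    # to every other class, and 0 otherwise.
    if (row['dog-dog'] < row['dog-cat'] and row['dog-dog'] < row['dog-bird']
            and row['dog-dog'] < row['dog-automobile']):
        return 1
    return 0

dog_distances.apply(is_dog_correct).sum()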
Out[283]:
3554
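
As a small worked example of the row-wise rule, the sketch below applies the same comparison to three plain-Python rows. The first two are the rank-1 rows for test dogs 0 and 1, read from rows 1 and 6 of the dog_distances table above (assuming every column is ordered by query, then rank, as the 'dog-dog' column is); the third row is hypothetical and added only to show a misclassification.

```python
def is_dog_correct(row):
    """Return 1 if the nearest 'dog' is strictly closer than every other class."""
    others = [dist for key, dist in row.items() if key != 'dog-dog']
    return 1 if row['dog-dog'] < min(others) else 0

rank1_rows = [
    # Test dog 0, rank 1 (first row of the table above).
    {'dog-automobile': 41.9579761457, 'dog-bird': 41.7538647304,
     'dog-cat': 36.4196077068, 'dog-dog': 33.4773590373},
    # Test dog 1, rank 1 (sixth row of the table above).
    {'dog-automobile': 46.0021331807, 'dog-bird': 41.3382958925,
     'dog-cat': 38.8353268874, 'dog-dog': 32.8458495684},
    # Hypothetical row: the nearest cat is closer than the nearest dog.
    {'dog-automobile': 45.0, 'dog-bird': 42.0, 'dog-cat': 35.0, 'dog-dog': 36.0},
]

num_correct = sum(is_dog_correct(row) for row in rank1_rows)
accuracy = num_correct / float(len(rank1_rows))
print(num_correct, accuracy)
```

Comparing against min(others) gives the same answer as chaining three "<" tests, and it keeps working if more classes are added as columns.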

Accuracy of predicting dog in the test data: Using the work you did in this question, what is the accuracy of the 1-nearest neighbor classifier at classifying ‘dog’ images from the test set?

In [284]:
3554./5000  # share of the 5000 rows in dog_distances (five rows per test image, since k=5 was used above)
Out[284]:
0.7108