Introduction

Proportion of pixels that are exactly 0

## [1] 0.8137054

\[ \text{Approximately } 81.37\% \text{ of pixels are exactly 0.} \]
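A minimal sketch of how such a proportion could be computed. The function name `zero_proportion` and the toy data are hypothetical; the original data set and the code that produced 0.8137054 are not shown here.

```python
def zero_proportion(images):
    """Return the fraction of pixel values that are exactly 0."""
    total = 0
    zeros = 0
    for image in images:
        for pixel in image:
            total += 1
            if pixel == 0:
                zeros += 1
    return zeros / total


# Hypothetical toy data: two flattened "images" of four pixels each.
toy_images = [
    [0, 0, 0, 128],
    [0, 255, 0, 0],
]
proportion = zero_proportion(toy_images)
print(proportion)  # 6 of the 8 pixels are zero, so this prints 0.75
```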

Plot 4 observations (images) in a 2x2 plot grid

In which corners does sparsity appear higher?

\[ \text{Sparsity appears higher when } 0.25 < x < 0.75 \]

Question 3

Explain k-means clustering

\[ \text{K-means clustering starts by randomly assigning each data point to one of k groups. A centroid is computed for each group as the} \\ \text{mean of its points. Every point is then reassigned to the closest* centroid, the centroids are recomputed, and the process repeats until the} \\ \text{cluster assignments stop changing.} \\ \text{*Closest usually means smallest Euclidean distance.} \]
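The procedure described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the implementation used for the results in this report: the function names, the `seed` and `max_iter` parameters, the toy points, and the rule for re-seeding an empty group are all hypothetical choices.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: randomly assign each point to one of k groups.
    labels = [rng.randrange(k) for _ in points]
    centroids = []
    for _ in range(max_iter):
        # Step 2: place a centroid at the mean of each group.
        centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumption: an empty group is re-seeded with a random point.
            centroids.append(mean_point(members) if members else rng.choice(points))
        # Step 3: reassign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points.
toy_points = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
labels, centroids = kmeans(toy_points, k=2)
print(labels)
print(centroids)
```

Which group ends up labelled 0 and which 1 depends on the random start, so only the grouping itself is meaningful, not the label values.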

Explain how we would choose k?

\[ \text{It's easy to choose k with this data set. There are 10 digits (0 through 9), so we expect to be able to divide the data into 10 clusters.} \\ \text{If we don't have a way to predetermine k, then we must use the withinss of the fitted models. The total within-cluster sum of squares} \\ \text{decreases with each additional cluster, until k = n and the sum of squares is zero. We are looking for the "elbow point" at which} \\ \text{additional clusters cease to significantly decrease the total within-cluster sum of squares.} \]
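To make the elbow idea concrete, the sketch below computes the total within-cluster sum of squares for a few partitions of a tiny one-dimensional data set. The function name `total_withinss`, the data, and the label vectors are hypothetical; the labels are hand-picked partitions for k = 1, 2, 3, and n rather than the output of a fitted model.

```python
def total_withinss(points, labels):
    """Sum of squared distances from each point to its own cluster mean."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        centre = [sum(c) / len(members) for c in zip(*members)]
        for p in members:
            total += sum((a - b) ** 2 for a, b in zip(p, centre))
    return total


# Hypothetical data: two tight pairs of points, far apart.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]

print(total_withinss(points, [0, 0, 0, 0]))  # k = 1 -> 101.0
print(total_withinss(points, [0, 0, 1, 1]))  # k = 2 -> 1.0
print(total_withinss(points, [0, 0, 1, 2]))  # k = 3 -> 0.5
print(total_withinss(points, [0, 1, 2, 3]))  # k = n -> 0.0
```

The drop from k = 1 to k = 2 is large, and every later drop is small, so the elbow for this toy data sits at k = 2.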

## [1] 4 2 6 0
## [1] 8 0 8 3
## [1] 9 7 3 1
## [1] 1 9 6 4

\[ \text{The modes mostly correspond with the images, but there are problems. My assessment of box [4,4] is that it could be a 1, 7, or 9,} \\ \text{but 4 occurs the most in that cluster. There are no clusters with a mode of 5. My intuition is that 5 should cluster into} \\ \text{box [2,4] or [3,3], but both of those boxes are mode 3.} \]
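Per-cluster modes like those listed above could be computed along the following lines. This is a sketch only: `cluster_modes` and the toy cluster assignments and labels are hypothetical and do not reproduce the report's data.

```python
from collections import Counter


def cluster_modes(cluster_ids, true_labels):
    """Map each cluster id to the most frequent true label inside it."""
    by_cluster = {}
    for cid, lab in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cid, Counter())[lab] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in by_cluster.items()}


# Hypothetical toy data: cluster 1 holds digits 4, 4, 9; cluster 2 holds 3, 3, 3, 5.
cluster_ids = [1, 1, 1, 2, 2, 2, 2]
true_labels = [4, 4, 9, 3, 3, 3, 5]

modes = cluster_modes(cluster_ids, true_labels)
print(modes)  # {1: 4, 2: 3}

# Digits that are not the mode of any cluster, like the 5 discussed above.
missing = sorted(set(true_labels) - set(modes.values()))
print(missing)  # [5, 9]
```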