Assignment 4

Author

Connor Perfetto

Go to the shared posit.cloud workspace for this class and open the assign04 project. Open the assign04.qmd file and complete the exercises.

We will be using pay-per-click (PPC) data from a 31-day campaign run by a company that sells USB keys and USB hubs. The data has 555 observations and 3 columns (day, keyword, and price); each row represents one click on an internet ad triggered by a keyword search.

In this assignment you will be examining each column for data validity. Each exercise presents one or more questions for you to answer.

We’ll start by loading the tidyverse family of packages along with the janitor and skimr packages, and our data.

library(tidyverse)
library(janitor)
library(skimr)
ppc_data <- read_csv("https://jsuleiman.com/datasets/ppc_data.csv")
glimpse(ppc_data)
Rows: 555
Columns: 3
$ day     <dbl> 9, 30, 19, 4, 30, 17, 5, 8, 17, 21, 4, 13, 29, 25, 25, 7, 4, 5…
$ keyword <chr> "usb", "usb hub", "usb hub", "usb key", "key", "usb hub", "hub…
$ price   <dbl> 5.9, 8.0, 2.8, 7.7, 1.7, 5.5, 5.2, 3.8, 6.0, 2.0, 9.7, 7.0, 8.…

Exercises

There are six exercises in this assignment. The Grading Rubric is available at the end of this document.

Exercise 1

Create a graph of number of clicks (i.e., observations) for each day (1-31). Use geom_bar() for your geometry. In the narrative below your code note which days had zero clicks.

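# ggplot2 and dplyr are already attached by library(tidyverse) above
# Re-read the file as a base R data.frame; the later exercises work on `data`
data <- read.csv("https://jsuleiman.com/datasets/ppc_data.csv")

# Count clicks (rows) per day
clicks_per_day <- data %>%
  group_by(day) %>%
  summarise(clicks = n())

# stat = "identity" plots the precomputed counts (equivalent to geom_col());
# a numeric x-axis with breaks 1:31 leaves visible gaps for days with no clicks
ggplot(clicks_per_day, aes(x = day, y = clicks)) +
  geom_bar(stat = "identity", fill = "maroon") +
  scale_x_continuous(breaks = 1:31) +
  labs(title = "Clicks per Day", x = "Day of the Month", y = "Number of Clicks") +
  theme_minimal()

# Join against every day 1-31; days with no clicks get NA
all_days <- data.frame(day = 1:31)
clicks_full <- left_join(all_days, clicks_per_day, by = "day")
clicks_full[is.na(clicks_full$clicks), ]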
   day clicks
1    1     NA
27  27     NA

Days 1 and 27 had zero clicks; every other day had at least one click.
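Days with zero clicks have no rows, so they never appear in a `group_by()`/`summarise()` count; that is why the join against `1:31` is needed. As a sketch of a base R alternative, `tabulate()` returns one count per bin from 1 to `nbins`, including empty bins. This assumes every `day` value is a whole number between 1 and 31, and the short `day` vector below is hypothetical toy data standing in for `data$day`.

```r
# Hypothetical toy vector standing in for data$day
day <- c(2, 2, 3, 5, 5, 5, 31)

# tabulate() returns one count per bin 1..nbins, so empty days get 0
clicks_by_day <- tabulate(day, nbins = 31)

# Days whose count is zero had no clicks
zero_click_days <- which(clicks_by_day == 0)
zero_click_days
```

Applied to `data$day`, the same two lines should return days 1 and 27, consistent with the `left_join()` check.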

Exercise 2

Insert a code cell to show how many NA (i.e., missing) values there are in price. In the narrative below that code cell write out how many NA values there are for price and what percent of the observations that represents.

num_na_price <- sum(is.na(data$price))
total_observations <- nrow(data)
percent_na_price <- (num_na_price / total_observations) * 100
num_na_price
[1] 6
percent_na_price
[1] 1.081081

There are 6 NA values for price, which represent approximately 1.08% of the 555 observations.
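The count and the percentage can also be computed more compactly, since the mean of a logical vector is the proportion of `TRUE` values. A minimal sketch on a hypothetical toy `price` vector standing in for `data$price`:

```r
# Hypothetical toy vector standing in for data$price
price <- c(5.9, NA, 2.8, 0, NA, 7.7, 1.7, 5.5)

na_count   <- sum(is.na(price))          # number of missing values
na_percent <- mean(is.na(price)) * 100   # share of missing values, in percent

na_count
na_percent
```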

Exercise 3

Valid values for price are 0.1 or greater. Insert a code cell that displays the number of values of price that are less than 0.1. In the narrative below that code cell write how many values are below 0.1.

# na.rm = TRUE skips missing prices when counting
num_invalid_price <- sum(data$price < 0.1, na.rm = TRUE)
# which() drops NA comparisons, so missing prices do not come back as all-NA rows
invalid_price_rows <- data[which(data$price < 0.1), ]
num_invalid_price
[1] 10
invalid_price_rows
    day keyword price
18    5     key     0
79    4     key     0
101  18     ubs     0
220   6 usb key     0
301  19     key     0
309  14 usb hub     0
345  16 usb key     0
365  23 usb key     0
421  10     hub     0
426  19 usb key     0

There are 10 values of price below 0.1, and all of them are exactly 0.
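Comparing a missing value with a number gives `NA`, not `FALSE`, and indexing with a logical vector that contains `NA` returns an `NA` element (or an all-`NA` row for a data frame) at each such position. A small sketch of the pitfall and two ways around it, using a hypothetical toy `price` vector:

```r
# Hypothetical toy vector standing in for data$price
price <- c(5.9, NA, 0, 2.8, 0)

# Comparisons with NA give NA, so the logical index contains an NA
is_low <- price < 0.1

# Indexing with that logical vector keeps an NA element for every NA
naive <- price[is_low]

# which() returns only the positions that are TRUE, dropping NA
safe <- price[which(is_low)]

# sum(..., na.rm = TRUE) counts the TRUE values and ignores NA
n_low <- sum(is_low, na.rm = TRUE)
```

For a data frame the same idea applies: `data[which(data$price < 0.1), ]` returns only the matching rows, without the `NA` rows that plain logical indexing produces.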

Exercise 4

Insert a code cell that drops all of the rows that contain invalid or NA values for price.

# Keep rows whose price is present and at least 0.1
cleaned_data <- data[!is.na(data$price) & data$price >= 0.1, ]
# 555 rows minus 6 NA prices and 10 prices below 0.1
nrow(cleaned_data)
[1] 539
head(cleaned_data)
  day keyword price
1   9     usb   5.9
2  30 usb hub   8.0
3  19 usb hub   2.8
4   4 usb key   7.7
5  30     key   1.7
6  17 usb hub   5.5
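Base R's `subset()` is a more readable alternative: unlike `[` with a logical index, it treats an `NA` condition as `FALSE`, so rows with missing prices are dropped without a separate `is.na()` test. A sketch on a hypothetical toy data frame standing in for `data`:

```r
# Hypothetical toy data frame standing in for data
toy <- data.frame(
  day     = c(1, 2, 3, 4, 5),
  keyword = c("usb", "hub", "key", "usb key", "usb hub"),
  price   = c(5.9, NA, 0, 0.1, 2.8)
)

# subset() treats NA in the condition as FALSE, so NA prices are dropped too
cleaned <- subset(toy, price >= 0.1)

# Equivalent explicit form with an is.na() guard
cleaned2 <- toy[!is.na(toy$price) & toy$price >= 0.1, ]
```

Both forms keep rows 1, 4, and 5; the explicit form makes the `NA` handling visible, while `subset()` relies on its documented behavior.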

Exercise 5

Insert a code cell that shows a tabyl of the counts of each keyword. In the narrative below the code cell, list the misspellings and counts if there are any.

library(janitor)
keyword_counts <- tabyl(data$keyword)
keyword_counts
 data$keyword   n    percent
          hub  90 0.16216216
          key  85 0.15315315
          ubs  11 0.01981982
      ubs key  10 0.01801802
          usb  75 0.13513514
      usb hub 121 0.21801802
      usb key 163 0.29369369

There are two misspelled keywords, both with “usb” transposed as “ubs”: “ubs” (11 clicks) and “ubs key” (10 clicks), 21 observations in total.
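Rather than scanning the tabyl by eye, misspellings can be flagged by comparing each keyword against a list of valid values. A minimal sketch; the `valid_keywords` list is an assumption based on the products named above, and `keyword` is hypothetical toy data standing in for `data$keyword`:

```r
# Hypothetical toy vector standing in for data$keyword
keyword <- c("usb", "ubs", "usb key", "ubs key", "hub", "key", "usb hub", "ubs")

# Assumed list of valid keywords for this campaign
valid_keywords <- c("usb", "usb key", "usb hub", "key", "hub")

# Keep only the values that are not in the valid list, then count them
invalid <- keyword[!keyword %in% valid_keywords]
invalid_counts <- table(invalid)
invalid_counts
```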

Exercise 6

Insert a code cell that corrects all the misspellings for keyword, then rerun tabyl to verify.

# Normalize case first (all keywords are already lowercase here, so this is a safeguard)
data$keyword <- tolower(data$keyword)
# Fix the two misspellings shown in the tabyl above: "ubs" and "ubs key"
data$keyword[data$keyword == "ubs"] <- "usb"
data$keyword[data$keyword == "ubs key"] <- "usb key"
keyword_counts_corrected <- tabyl(data$keyword)
keyword_counts_corrected
 data$keyword   n   percent
          hub  90 0.1621622
          key  85 0.1531532
          usb  86 0.1549550
      usb hub 121 0.2180180
      usb key 173 0.3117117
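When there are several misspellings, a named lookup vector keeps every correction in one place: the names are the wrong spellings and the values are the fixes. A sketch of that approach in base R, using hypothetical toy data standing in for `data$keyword`:

```r
# Hypothetical toy vector standing in for data$keyword
keyword <- c("usb", "ubs", "usb key", "ubs key", "hub", "ubs")

# Lookup table: names are the misspellings, values are the corrections
fixes <- c("ubs" = "usb", "ubs key" = "usb key")

# Replace only the values that appear as names in the lookup table
needs_fix <- keyword %in% names(fixes)
keyword[needs_fix] <- fixes[keyword[needs_fix]]

table(keyword)
```

Adding a new correction then means adding one entry to `fixes` rather than another assignment line.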

Submission

To submit your assignment:

  • Change the author name to your name in the YAML portion at the top of this document
  • Render your document to HTML and publish it to RPubs.
  • Submit the link to your RPubs document in the Brightspace comments section for this assignment.
  • Click on the “Add a File” button and upload your .qmd file for this assignment to Brightspace.

Grading Rubric

Items are listed with their percent of the overall grade. Each is scored at one of four levels: 100% (flawless), 67% (minor issues), 33% (moderate issues), or 0% (major issues or not attempted).

  • Narrative: typos and grammatical errors (7%)
  • Document formatting: correctly implemented instructions (7%)
  • Exercises (13% each)
  • Submitted properly to Brightspace (8%): you must submit according to instructions to receive any credit for this portion.