Q1) SWIRL

a.

Tell me the two best things you learned in module 5, and how you could apply them.

1. NA is a placeholder for a missing value. This is good to know because applying functions to a dataset that includes NAs can quickly go awry: the NAs propagate and can leave the results full of NAs.

2. NaN is “not a number”, the result of a mathematical operation whose value is undefined (for example 0/0 or Inf-Inf). If I get this result, I now know what it means!
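A minimal sketch of both points, using a made-up vector `x` (the values are hypothetical and only for illustration). It shows how a single NA propagates through `mean()`, how `na.rm = TRUE` avoids that, and how NaN arises from an undefined operation.

```r
# Hypothetical measurements with one missing value
x <- c(3, NA, 7)

# A single NA propagates: the mean of x is NA
mean_with_na <- mean(x)

# Dropping NAs first gives the mean of the observed values, (3 + 7) / 2
mean_no_na <- mean(x, na.rm = TRUE)

# An undefined operation produces NaN rather than NA
undefined <- 0 / 0

# is.na() flags both NA and NaN; is.nan() flags only NaN
is.na(x)
is.na(undefined)
is.nan(NA)
```

Checking for missing values with `is.na()` before summarizing is one simple way to apply the first point in practice.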

b.

What was the last question in module 6?

> Inf-Inf
[1] NaN

Q2) Data frames

a.

What does fake.vals[c(2,3),1:2] return?

> fake.vals[c(2,3),1:2]
  x y
2 2 2
3 3 1

b.

What is the value of y in the second row of fake.vals[c(2,3),1:2]

1
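The indexing above can be sketched with a stand-in data frame. The real `fake.vals` is not shown in this write-up, so the one below is hypothetical and chosen only so that rows 2 and 3 match the printed output. The sketch also shows why "second row" has to be read by position: the subset keeps the original row labels 2 and 3.

```r
# Hypothetical stand-in for fake.vals (assumption: only rows 2-3 of x and y
# are known from the output above; everything else is invented)
fake.vals <- data.frame(x = 1:4,
                        y = c(3, 2, 1, 0),
                        z = c("a", "b", "c", "d"))

# Rows 2 and 3, columns 1 and 2
sub <- fake.vals[c(2, 3), 1:2]
sub

# Second row of the subset BY POSITION (its row label is 3)
by_position <- sub[2, "y"]

# Row LABELLED "2" is the first row of the subset
by_label <- sub["2", "y"]
```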

Q3) Cracking seeds

a.

What is the mode of the frequency distribution?

Mode = 12-13 mm

b.

Estimate, by eye, the fraction of birds whose measurements are in the interval representing the mode.

0.5

c.

There is a hint of a second peak in the frequency distribution between 15 and 16 mm. What strategy would you recommend be used to explore more fully the possibility of a second peak?

I would decrease the interval width of the histogram (use more, narrower bins) to better resolve the shape of the distribution in that range.
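A rough sketch of that strategy, using simulated measurements because the original data are not included here. The vector `measurement`, its two underlying groups, and the bin widths are all assumptions made for illustration.

```r
# Simulated data (assumption): a main group near 12.5 mm and a smaller
# group near 15.5 mm; these are NOT the real measurements
set.seed(1)
measurement <- c(rnorm(150, mean = 12.5, sd = 0.8),
                 rnorm(50,  mean = 15.5, sd = 0.5))

# Bin edges covering the full range, at two different widths
lo <- floor(min(measurement))
hi <- ceiling(max(measurement))
wide_breaks   <- seq(lo, hi, by = 1)     # 1 mm intervals
narrow_breaks <- seq(lo, hi, by = 0.25)  # 0.25 mm intervals

# Draw the two histograms side by side for comparison
par(mfrow = c(1, 2))
h_wide <- hist(measurement, breaks = wide_breaks,
               main = "1 mm bins", xlab = "Measurement (mm)")
h_narrow <- hist(measurement, breaks = narrow_breaks,
                 main = "0.25 mm bins", xlab = "Measurement (mm)")
```

Very narrow bins also make the plot noisier, so it is worth trying a few widths before deciding whether a second peak is real.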

d.

What name is given to a frequency distribution having two distinct peaks?

Bimodal.

Q4) Do not pass go, do not collect $200

a.

What type of table is this?

Frequency distribution table.

b.

How many variables are presented in this table?

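1 (the number of convictions; the frequency column counts the boys at each value rather than being a second measured variable)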

c.

How many boys had exactly two convictions by the end of the study?

21

d.

What fraction of boys had no convictions? [useR]

> sum(freq)
[1] 395
> 265/395
[1] 0.6708861

e.

What is the appropriate graph for these data?

Histogram

f.

WITH R: Display the frequency distribution in a graph.
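One possible way to draw this, as a sketch. Only the counts for 0 convictions (265), 2 convictions (21), and the total (395) appear in this write-up; the remaining entries of `freq` below are assumed and should be replaced with the values from the table in the assignment.

```r
# Number of convictions and how many boys had each number.
# Assumption: apart from 265 (zero convictions) and 21 (two convictions),
# these counts are not given in this document and must be checked
# against the original table.
convictions <- 0:14
freq <- c(265, 49, 21, 19, 10, 10, 2, 2, 4, 2, 1, 4, 3, 1, 2)

# Total number of boys; should match sum(freq) reported above
total <- sum(freq)

# Fraction of boys with no convictions
frac_none <- freq[convictions == 0] / total

# Bars drawn with no gaps, so the plot reads as a histogram of a
# discrete numerical variable
barplot(freq, names.arg = convictions, space = 0,
        xlab = "Number of convictions", ylab = "Frequency (number of boys)",
        main = "Convictions per boy")
```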

g.

Describe the shape of the frequency distribution. Is it skewed or symmetric? Is it unimodal or bimodal? Where is the mode in the number of criminal convictions? Are there outliers in the number of convictions?

Right-skewed. Bimodal. Mode = 0. Extreme values, but no outliers.

h.

Does the sample of boys in the study represent a random sample of British boys? Why or why not?

No. This study took only boys from 6 schools in northern London, so the sample is not random and the results cannot be extrapolated to all British boys.

Q5) Sneaky f*$%@rs

a.

Display these results in a table [perhaps use data.frame with appropriate row and column names]. Identify the type of table you used.

> sneaky <- data.frame(ate.eggs, no.eating, row.names = 0:2)
> sneaky
  ate.eggs no.eating
0       61       389
1       18        17
2       16        14

Contingency table.

b.

Illustrate the same result using a graphical technique instead. Identify the type of graph you used.

> mosaicplot(sneaky, main="Number of Sneaky Males")

Mosaic plot.
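For reference, a self-contained sketch that rebuilds the table from the counts printed above and draws the plot. The conversion with `as.matrix()` and the row-proportion step are additions for illustration, not part of the original answer.

```r
# Counts copied from the printed table above
ate.eggs  <- c(61, 18, 16)
no.eating <- c(389, 17, 14)
sneaky <- data.frame(ate.eggs, no.eating, row.names = 0:2)

# A numeric matrix keeps the row and column labels for plotting
counts <- as.matrix(sneaky)

# Mosaic plot: tile areas are proportional to the cell counts
mosaicplot(counts, main = "Number of Sneaky Males", color = TRUE)

# Proportion in the ate.eggs column within each row, which is what
# the tile splits in the plot display
prop_ate <- counts[, "ate.eggs"] / rowSums(counts)
prop_ate
```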

Q6) Over the bar

In Poland, students are required to achieve a score of 21 or higher on the high-school Polish language “maturity exam” to be eligible for university. The following graph shows the frequency distribution of scores (Freakonomics 2011).

a.

Examine the graph and identify the most conspicuous pattern in these data.

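The distribution is roughly bell-shaped and somewhat skewed, except for one conspicuous irregularity: a sharp dip in frequency just below the passing score of 21, followed by a sharp spike at about the passing score.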

b.

Generate a hypothesis to explain the pattern.

The sharp decrease followed by a sharp increase in the 17-22 region could indicate that the test contains a set of linked questions: a student who gets one of them right will typically get them all right, so scores tend to land on either 17 or 21 rather than in between.

Q7) Table types

What is the difference between a data table and a display table? When would you use one over the other?

A data table typically contains all of the data and is used for analysis and interpretation of the study, whereas a display table summarizes the significant results and makes them visually understandable for presentation to an audience.