Understanding your data is a critical step in analysis. Describe in words the kind of data each of the following produces. Is it numeric? Continuous? Bounded at 0?
Don’t worry about the distributions yet; the point is to recognize the fundamental characteristics of the data.
- Draw 1000 random normal points with a mean of 0 and an sd of 1.
- Draw 1000 random Poisson points with lambda = 1.
- Draw 1000 random binomial points with prob = 0.5 (a coin flip: one trial per draw, i.e. size = 1 in rbinom()).
- Create a histogram of each distribution above.
- Re-bin each histogram into fewer bins (e.g., 5). See ?hist
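The steps in the list above can be sketched as follows. This is only one possible approach, not a model answer: the variable names (norm_draws, pois_draws, binom_draws) and the seed are assumptions made for illustration.

```r
set.seed(123)  # arbitrary seed (an assumption) so the draws are reproducible

# 1000 draws from each distribution
norm_draws  <- rnorm(1000, mean = 0, sd = 1)
pois_draws  <- rpois(1000, lambda = 1)
binom_draws <- rbinom(1000, size = 1, prob = 0.5)  # size = 1: one coin flip per draw

# Quick looks at the basic character of each sample
summary(norm_draws)   # range and quartiles of the normal draws
table(pois_draws)     # how often each distinct value occurs
table(binom_draws)

# Default histograms
hist(norm_draws)
hist(pois_draws)
hist(binom_draws)

# A single number for breaks is treated as a suggestion only;
# hist() may settle on a nearby "pretty" number of bins
hist(norm_draws, breaks = 5)
```

To force exact bin edges rather than a suggested count, breaks also accepts a vector of breakpoints.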
Using curve(), overlay a distribution curve on a histogram of your data, following the example below.
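x <- rnorm(100)   # 100 draws from a standard normal
head(x)
## [1] 1.74637874 -0.01461965 1.40216756 -0.35058284 0.65140411 -1.50337140
m <- mean(x)      # estimate the parameters before calling curve(): inside
s <- sd(x)        # curve(), x refers to the plotting grid, not to the data
hist(x, prob = TRUE)
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red")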
What does the prob=TRUE argument do in hist()? Why is it needed here?
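One way to explore this question is to inspect the object that hist() returns instead of the plot itself. The sketch below is an illustration under assumptions (the seed and the name h are not from the exercise); it compares the counts and density components of the same histogram.

```r
set.seed(42)  # arbitrary seed (an assumption), only for reproducibility
x <- rnorm(100)

# Compute the bins without drawing anything
h <- hist(x, plot = FALSE)

# Bar heights when plotting counts: they add up to the sample size
sum(h$counts)

# Bar heights when plotting densities: height times bin width adds up to 1,
# which is the same scale that dnorm() works on
sum(h$density * diff(h$breaks))
```

Comparing the two sums against the height of a dnorm() curve suggests why the two need to share a scale before one is drawn on top of the other.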
- Fit the data from questions 2 and 3 (the Poisson and binomial draws) to their respective distributions.
- Plot all distributions together using the add=TRUE argument. Color them separately and note which distribution is which color.
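A minimal sketch of the last two items, offered as one possible approach rather than the intended solution. The variable names, seed, colors, axis limits, and the choice to evaluate curve() only at integer values (via from, to, and n) are all assumptions made here, since dpois() and dbinom() are defined on counts rather than on a continuous axis.

```r
set.seed(1)  # arbitrary seed (an assumption) for reproducibility
pois_draws  <- rpois(1000, lambda = 1)
binom_draws <- rbinom(1000, size = 1, prob = 0.5)

# Estimate parameters outside curve(), where x would mean the plotting grid
lam_hat <- mean(pois_draws)
p_hat   <- mean(binom_draws)
k_max   <- max(pois_draws)

# Poisson: unit-width bins centred on the integers, so bar height = proportion
hist(pois_draws, breaks = seq(-0.5, k_max + 0.5, by = 1),
     prob = TRUE, ylim = c(0, 0.5))
# n = k_max + 1 points between 0 and k_max puts the grid exactly on 0, 1, 2, ...
curve(dpois(x, lambda = lam_hat), from = 0, to = k_max, n = k_max + 1,
      add = TRUE, type = "b", col = "blue")

# Binomial with one trial: only the values 0 and 1 occur
hist(binom_draws, breaks = c(-0.5, 0.5, 1.5), prob = TRUE, ylim = c(0, 1))
curve(dbinom(x, size = 1, prob = p_hat), from = 0, to = 1, n = 2,
      add = TRUE, type = "p", col = "darkgreen")

# All three theoretical distributions on one set of axes
curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 6, ylim = c(0, 0.6),
      col = "red", ylab = "density / probability")
curve(dpois(x, lambda = 1), from = 0, to = 6, n = 7,
      add = TRUE, type = "h", col = "blue")
curve(dbinom(x, size = 1, prob = 0.5), from = 0, to = 1, n = 2,
      add = TRUE, type = "p", col = "darkgreen")
legend("topright", legend = c("normal", "Poisson", "binomial"),
       col = c("red", "blue", "darkgreen"), lty = c(1, 1, NA), pch = c(NA, NA, 1))
```

The legend records which color belongs to which distribution; the discrete distributions are drawn as spikes or points so they are not mistaken for continuous curves.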