library(psych)
dim(iris)
## [1] 150 5
There are 150 rows and 5 columns in this data frame. As we learned in earlier readings, each row represents a unique observational unit. Therefore, 150 cases are included in this data.
summary(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
We can see from the table above that there are 4 numerical variables: Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width. These are all continuous because each one is a measured quantity that can take any value within a range.
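The same classification can be checked programmatically instead of read off the summary table. This is a minimal sketch in base R; the name `numeric_cols` is introduced here for illustration only.

```r
# Flag each column of iris as numeric (TRUE) or not (FALSE)
numeric_cols <- sapply(iris, is.numeric)

# Names of the numeric variables
names(iris)[numeric_cols]

# Names of the remaining (non-numeric) variables
names(iris)[!numeric_cols]
```

Note that `is.numeric()` only separates numeric columns from non-numeric ones; deciding that a numeric variable is continuous rather than discrete is still a judgment about what was measured.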
unique(iris$Species)
## [1] setosa versicolor virginica
## Levels: setosa versicolor virginica
We knew from the previous question that Species was the only categorical variable. The code snippet above shows that the Species variable takes 3 values: setosa, versicolor, and virginica.
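Because Species is stored as a factor, its categories can also be counted and tallied directly. This is a short sketch in base R; `species_counts` is a name chosen here for illustration.

```r
# nlevels() returns the number of categories in a factor
nlevels(iris$Species)

# table() tallies how many rows fall into each category
species_counts <- table(iris$Species)
species_counts
```

The tally should agree with the Species column of the `summary(iris)` output shown earlier.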
I chose to go with the penguins dataset, as penguins are my second favorite animal (behind pandas).
summary(penguins)
## species island bill_len bill_dep
## Adelie :152 Biscoe :168 Min. :32.10 Min. :13.10
## Chinstrap: 68 Dream :124 1st Qu.:39.23 1st Qu.:15.60
## Gentoo :124 Torgersen: 52 Median :44.45 Median :17.30
## Mean :43.92 Mean :17.15
## 3rd Qu.:48.50 3rd Qu.:18.70
## Max. :59.60 Max. :21.50
## NA's :2 NA's :2
## flipper_len body_mass sex year
## Min. :172.0 Min. :2700 female:165 Min. :2007
## 1st Qu.:190.0 1st Qu.:3550 male :168 1st Qu.:2007
## Median :197.0 Median :4050 NA's : 11 Median :2008
## Mean :200.9 Mean :4202 Mean :2008
## 3rd Qu.:213.0 3rd Qu.:4750 3rd Qu.:2009
## Max. :231.0 Max. :6300 Max. :2009
## NA's :2 NA's :2
The summary above shows we are working with cross-sectional data, as this is a study of many subjects (penguins). The dataset has 8 variables: species, island, bill_len, bill_dep, flipper_len, body_mass, sex, and year.
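The variable count and the missing values flagged in the summary can be confirmed with a few base R calls. This sketch assumes R 4.5.0 or later, where `penguins` is available as a built-in dataset; the names `n_penguins`, `var_names`, and `na_counts` are introduced here for illustration.

```r
# Assumes R >= 4.5.0, where `penguins` ships with the built-in datasets package
n_penguins <- nrow(penguins)   # number of observed penguins (rows)
var_names <- names(penguins)   # the variable names
var_names

# Count the missing values (NA) in each variable
na_counts <- colSums(is.na(penguins))
na_counts
```

The per-variable NA counts should match the `NA's` rows in the summary output above.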
Now let’s see if there’s any kind of relationship between the length of a penguin’s bill (in millimeters) and how much that penguin weighs (in grams).
plot(
x=penguins$bill_len,
y=penguins$body_mass,
xlab = 'Bill Length of Penguin (millimeters)',
ylab = 'Body Mass of Penguin (grams)'
)
The scatterplot above suggests that the longer a penguin’s bill is, the more that penguin weighs.
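One way to put a number on this visual impression is the Pearson correlation coefficient. The sketch below again assumes R 4.5.0 or later for the built-in `penguins` data; `bill_mass_cor` is a name chosen here for illustration.

```r
# Assumes R >= 4.5.0 for the built-in `penguins` data.
# use = "complete.obs" drops the rows where either measurement is missing.
bill_mass_cor <- cor(
  penguins$bill_len,
  penguins$body_mass,
  use = "complete.obs"
)
bill_mass_cor
```

A value closer to 1 indicates a stronger positive linear relationship, while a value near 0 would indicate little linear relationship. Since the data mixes three species, this overall figure may hide differences between species.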