32+3
## [1] 35
R is a programming language used for data analysis and statistical software, whereas RStudio is an environment in which to write and run R code for statistical analysis. The R program that you download gives your computer the language itself, and RStudio provides a place to use that language.
tan(1)-(4.2^(0.3)*175)
## [1] -267.604
abs(-7.5^(0.832)+cos(pi/(abs(12))))-0.7
## [1] 3.680345
Sin 0.7: R cannot parse this, because a name followed by a space and a number is not valid syntax (the error is "unexpected numeric constant"). tin(0.7): The “tin” function does not exist, so R cannot find it and no value is produced. Sin0.7: Without parentheses R treats this as an object name, and no object with that name has been created. You need parentheses for R to recognize it as a function call; otherwise it will think it is an object name.
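The three failures can be reproduced safely by keeping each input as text and catching the error. This is a minimal sketch: the `inputs` and `results` names are made up for illustration, and `sin(0.7)` is added as the working form for comparison.

```r
# Each input is stored as a string so that even the syntax error can be caught.
inputs <- c("Sin 0.7", "tin(0.7)", "Sin0.7", "sin(0.7)")

results <- sapply(inputs, function(txt) {
  tryCatch(
    format(eval(parse(text = txt))),                       # parse, then evaluate
    error = function(e) paste("Error:", conditionMessage(e))
  )
})

# Named character vector: one message (or value) per input.
print(results)
```

The first input fails while parsing, the second and third fail while evaluating, and only the last returns a number.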
happy <- c(1, 3, 5) #first object with vector
happy
## [1] 1 3 5
glamper <- c(7, 9, 11) #second object with vector
glamper
## [1]  7  9 11
happy*glamper #multiply vectors
## [1]  7 27 55
happyglamper <- happy*glamper #creating new object
happyglamper #new object with vectors multiplied
## [1]  7 27 55
print(happyglamper)
## [1]  7 27 55
summary(CO2)
##      Plant             Type         Treatment       conc
##  Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95
##  Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175
##  Qn3    : 7                                    Median : 350
##  Qc1    : 7                                    Mean   : 435
##  Qc3    : 7                                    3rd Qu.: 675
##  Qc2    : 7                                    Max.   :1000
##  (Other):42
##      uptake
##  Min.   : 7.70
##  1st Qu.:17.90
##  Median :28.30
##  Mean   :27.21
##  3rd Qu.:37.12
##  Max.   :45.50
##
str(CO2)
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':   84 obs. of  5 variables:
##  $ Plant    : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
##  $ Type     : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
##  $ conc     : num  95 175 250 350 500 675 1000 95 175 250 ...
##  $ uptake   : num  16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
##  - attr(*, "formula")=Class 'formula'  language uptake ~ conc | Plant
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
##  - attr(*, "outer")=Class 'formula'  language ~Treatment * Type
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
##  - attr(*, "labels")=List of 2
##   ..$ x: chr "Ambient carbon dioxide concentration"
##   ..$ y: chr "CO2 uptake rate"
##  - attr(*, "units")=List of 2
##   ..$ x: chr "(uL/L)"
##   ..$ y: chr "(umol/m^2 s)"
head(CO2)
##   Plant   Type  Treatment conc uptake
## 1   Qn1 Quebec nonchilled   95   16.0
## 2   Qn1 Quebec nonchilled  175   30.4
## 3   Qn1 Quebec nonchilled  250   34.8
## 4   Qn1 Quebec nonchilled  350   37.2
## 5   Qn1 Quebec nonchilled  500   35.3
## 6   Qn1 Quebec nonchilled  675   39.2
The summary() function gives a summary of every column in the dataset: counts for each level of a factor, and the minimum, quartiles, mean, and maximum for each numeric column. str() shows the structure of the dataset: its classes, the number of observations and variables, the type of each column (factor or numeric), the levels of each factor, and any attributes attached to the data frame. head() returns the first 6 rows of the data frame.
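The same information can be pulled out piece by piece. The sketch below uses only base R; the variable names (`column_types`, `first_three`, `last_six`) are made up for illustration.

```r
# Number of rows and columns, as reported in the first line of str().
dim(CO2)

# First class of each column (Plant is an ordered factor, so it has two classes).
column_types <- vapply(CO2, function(col) class(col)[1], character(1))
print(column_types)

# head() takes an n argument when something other than 6 rows is wanted.
first_three <- head(CO2, n = 3)
print(first_three)

# tail() is the counterpart of head() and returns the last rows.
last_six <- tail(CO2)
print(last_six)
```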
boxplot(uptake ~ Treatment, data = CO2)
Boxplots can tell you whether a data set is symmetric or skewed. The formula uptake ~ Treatment draws one box of CO2 uptake values for each treatment (nonchilled and chilled), so the two distributions can be compared side by side. A treatment is close to symmetric, and so closer to a normal distribution, when the median line splits the box evenly and the whiskers are about the same length. A median line that sits near one end of the box, or one whisker that is much longer than the other, indicates that the uptake values for that treatment are skewed.
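To compare uptake between the two treatments, the plot needs one box per treatment, which is what the formula interface of boxplot() produces. The sketch below also prints the numbers behind the boxes so that symmetry can be judged from the values as well as from the picture. The variable names and axis labels are choices made here for illustration.

```r
# Five-number summary (min, lower hinge, median, upper hinge, max) per treatment.
uptake_by_treatment <- tapply(CO2$uptake, CO2$Treatment, fivenum)
print(uptake_by_treatment)

# plot = FALSE returns the box statistics without drawing anything.
box_stats <- boxplot(uptake ~ Treatment, data = CO2, plot = FALSE)
print(box_stats$names)   # group labels, one per box
print(box_stats$stats)   # one column of whisker/hinge/median values per box

# Draw the plot with labelled axes.
boxplot(uptake ~ Treatment, data = CO2,
        xlab = "Treatment", ylab = "CO2 uptake rate (umol/m^2 s)")
```

If the median in a column of `box_stats$stats` lies roughly halfway between the two hinges, that treatment is close to symmetric.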
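No, there are no missing data: summary(CO2) reports no NA counts for any column. It is important to check for NA values because many functions return NA, or stop with an error, when the data contain them. Missing values must be handled before performing statistical analysis, for example by dropping them with na.rm = TRUE or na.omit(). Changing them to zeros would make the commands run but would distort the results.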
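A direct check for missing values, plus a small demonstration of how an NA affects a calculation, might look like the sketch below. The vector `x` is a made-up example, not part of the CO2 data.

```r
# TRUE if any cell in the data frame is NA.
any_missing <- anyNA(CO2)
print(any_missing)

# Number of NA values in each column.
na_per_column <- colSums(is.na(CO2))
print(na_per_column)

# Hypothetical vector with one missing value.
x <- c(1, NA, 3)
mean_with_na  <- mean(x)                 # NA propagates through the calculation
mean_dropped  <- mean(x, na.rm = TRUE)   # NA is dropped before averaging
mean_as_zero  <- mean(c(1, 0, 3))        # NA replaced by zero pulls the mean down
print(c(mean_with_na, mean_dropped, mean_as_zero))
```

Dropping the NA averages the two observed values, while substituting a zero treats the missing observation as if a zero had actually been measured.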
head(CO2)
##   Plant   Type  Treatment conc uptake
## 1   Qn1 Quebec nonchilled   95   16.0
## 2   Qn1 Quebec nonchilled  175   30.4
## 3   Qn1 Quebec nonchilled  250   34.8
## 4   Qn1 Quebec nonchilled  350   37.2
## 5   Qn1 Quebec nonchilled  500   35.3
## 6   Qn1 Quebec nonchilled  675   39.2
#CriteriaOne
quebec <- CO2[which(CO2$Type=='Quebec' & CO2$conc>=435), ] #Quebec and conc >=435
quebec$Plant <- NULL
quebec$Type <- NULL
quebec$conc <- NULL
head(quebec) #dataset with chilled/nonchilled data for Quebec where concentration values were greater than or equal to 435
##     Treatment uptake
## 5  nonchilled   35.3
## 6  nonchilled   39.2
## 7  nonchilled   39.7
## 12 nonchilled   40.6
## 13 nonchilled   41.4
## 14 nonchilled   44.3
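The same subset can be built in one step with subset(), which filters rows and selects columns together. This is an alternative sketch, not a replacement for the approach above; the name `quebec_alt` is made up for illustration.

```r
# Rows: Quebec plants with conc >= 435. Columns: Treatment and uptake only.
quebec_alt <- subset(CO2,
                     Type == "Quebec" & conc >= 435,
                     select = c(Treatment, uptake))

# Quebec has 6 plants, each measured at 3 concentrations at or above 435
# (500, 675, and 1000), so the subset has 18 rows.
print(nrow(quebec_alt))
print(head(quebec_alt))
```

Selecting the wanted columns by name avoids having to set each unwanted column to NULL afterwards.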
#CriteriaTwo
missquebec <- CO2[which(CO2$Treatment=='chilled' & CO2$conc %in% c(95, 350)), ] #only chilled treatment and conc = 95 or 350
missquebec$Plant <- NULL
missquebec$conc <- NULL
missquebec$Treatment <- NULL
head(missquebec) #dataset with Type and uptake for chilled plants where concentration was 95 or 350
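When a filter needs to match one of several values, a condition written as `x == 95 | 350` does not mean "x is 95 or 350". R evaluates `x == 95` first and then combines it with the bare number 350, which counts as TRUE, so every row passes. The sketch below shows the difference on a small made-up vector (`conc` here is hypothetical, not the CO2 column).

```r
# Hypothetical concentrations for illustration.
conc <- c(95, 175, 350, 500)

# Any non-zero number is treated as TRUE, so this is TRUE for every element.
wrong <- conc == 95 | 350
print(wrong)

# Two equivalent ways to ask "is conc equal to 95 or to 350?".
right_in <- conc %in% c(95, 350)
right_or <- conc == 95 | conc == 350
print(right_in)
print(right_or)
```

A quick way to catch this kind of mistake is to check nrow() of the subset against the number of rows expected.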
qqnorm(iris$Sepal.Width)
qqline(iris$Sepal.Width)
Figure: Q-Q plot of sepal width from the iris dataset. The points fall close to the line, which supports the normality assumption of an ANOVA.
boxplot(iris$Sepal.Width~iris$Species)
Figure: Boxplot of sepal width by species from the iris dataset. The boxes are roughly the same size, which supports the equal-variance assumption of an ANOVA.
3 Assumptions of an ANOVA: (1) Data are independent, meaning that there is no sampling bias and no data point is correlated with other points. (2) Data are normally distributed, meaning that in a Q-Q plot the points sit close to the line and do not curl away from it. (3) Data from the groups within the set have equal variance, meaning that each category has a similar spread around its mean. The relationship displayed in the Q-Q plot and boxplot qualifies for an ANOVA because the data are independent, the distribution is normal (the points are all close to the line), and the variance is equal (the boxes are roughly the same size).
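The plots can be backed up with formal tests. The sketch below is one possible way to do this, assuming a Shapiro-Wilk test on the model residuals for normality and Bartlett's test for equal variance. Neither test is prescribed above, and other choices are equally reasonable.

```r
# One-way ANOVA of sepal width by species.
fit <- aov(Sepal.Width ~ Species, data = iris)
print(summary(fit))

# Normality: Shapiro-Wilk test on the residuals of the model.
# A large p-value gives no evidence against normality.
shapiro <- shapiro.test(residuals(fit))
print(shapiro)

# Equal variance: Bartlett's test across the three species.
# A large p-value gives no evidence that the variances differ.
bartlett <- bartlett.test(Sepal.Width ~ Species, data = iris)
print(bartlett)
```

Independence cannot be tested from the data alone; it depends on how the samples were collected.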
Assumptions of Linear Regression: Data are independent, the residuals are normally distributed, the residuals have equal variance across the range of the predictor, and the predictor and response have a linear relationship.
The main difference between the assumptions of an ANOVA and a linear regression is that a linear regression has the additional assumption that the data have a linear relationship. You can check for a linear relationship by plotting the data in a scatterplot. Another difference is the type of data each one uses: you would run an ANOVA with a categorical predictor and a continuous response, whereas you would run a linear regression when both the predictor and the response are continuous.
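Checking for a linear relationship by plotting can be sketched as follows. The choice of petal length and petal width from the iris dataset is an assumption made purely for illustration; any two continuous variables would do.

```r
# Scatterplot of two continuous variables: a roughly straight band of points
# suggests a linear relationship.
plot(iris$Petal.Length, iris$Petal.Width,
     xlab = "Petal length", ylab = "Petal width")

# Fit the regression and overlay the fitted line on the scatterplot.
fit_lm <- lm(Petal.Width ~ Petal.Length, data = iris)
abline(fit_lm)
print(coef(fit_lm))

# Residuals against fitted values: points scattered evenly around zero,
# with no curve or funnel shape, support linearity and equal variance.
plot(fitted(fit_lm), residuals(fit_lm),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Q-Q plot of the residuals to check normality.
qqnorm(residuals(fit_lm))
qqline(residuals(fit_lm))
```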