R Basics Homework

Example Question

32+3

## [1] 35

Question One

R is the programming language used for statistical computing and data analysis, whereas RStudio is an integrated development environment (IDE) for writing and running R code. The R program downloaded to the computer provides the language itself, and RStudio is an environment in which to use that language.

Question Two

tan(1)-(4.2^(0.3)*175)

## [1] -267.604

abs(-7.5^(0.832)+cos(pi/(abs(12))))-0.7

## [1] 3.680345

Question Three

Sin 0.7: R reports an unexpected numeric constant, because a name followed by a space and a number is not valid syntax. tin(0.7): The “tin” function does not exist, so R cannot find it and returns no value. Sin0.7: R reads this as an object name, and no object with that name has been created. Parentheses are needed for R to recognize a function call; otherwise it looks for an object with that name. R is also case-sensitive, so the working call is sin(0.7) with a lowercase s.
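The three failures can be reproduced without stopping the script by capturing each error message. This is a minimal sketch; run_safely() is a hypothetical helper written for this illustration and is not part of the assignment.

```r
# Hypothetical helper: parse and evaluate a line of code given as text,
# returning the error message instead of halting if something goes wrong.
run_safely <- function(code_text) {
  tryCatch(
    eval(parse(text = code_text)),
    error = function(e) conditionMessage(e)
  )
}

msg_space  <- run_safely("Sin 0.7")   # fails while parsing: name, space, number
msg_tin    <- run_safely("tin(0.7)")  # fails: no function named tin
msg_object <- run_safely("Sin0.7")    # fails: read as an object name that does not exist
fixed      <- run_safely("sin(0.7)")  # works: lowercase name plus parentheses

print(msg_space)
print(msg_tin)
print(msg_object)
print(fixed)
```

The first case fails before anything is evaluated, which is why it is reported as a syntax problem rather than a missing object or function.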

Question Four

happy <- c(1, 3, 5) #first object with vector
happy

## [1] 1 3 5

glamper <- c(7, 9, 11) #second object with vector
glamper

## [1]  7  9 11

happy*glamper #multiply vectors

## [1]  7 27 55

happyglamper <- happy*glamper #creating new object
happyglamper #new object with vectors multiplied

## [1]  7 27 55

print(happyglamper)

## [1]  7 27 55

Question Five

summary(CO2)

##      Plant             Type         Treatment       conc     
##  Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95  
##  Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175  
##  Qn3    : 7                                    Median : 350  
##  Qc1    : 7                                    Mean   : 435  
##  Qc3    : 7                                    3rd Qu.: 675  
##  Qc2    : 7                                    Max.   :1000  
##  (Other):42                                                  
##      uptake     
##  Min.   : 7.70  
##  1st Qu.:17.90  
##  Median :28.30  
##  Mean   :27.21  
##  3rd Qu.:37.12  
##  Max.   :45.50  
##

str(CO2)

## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':   84 obs. of  5 variables:
##  $ Plant    : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
##  $ Type     : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
##  $ conc     : num  95 175 250 350 500 675 1000 95 175 250 ...
##  $ uptake   : num  16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
##  - attr(*, "formula")=Class 'formula'  language uptake ~ conc | Plant
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##  - attr(*, "outer")=Class 'formula'  language ~Treatment * Type
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##  - attr(*, "labels")=List of 2
##   ..$ x: chr "Ambient carbon dioxide concentration"
##   ..$ y: chr "CO2 uptake rate"
##  - attr(*, "units")=List of 2
##   ..$ x: chr "(uL/L)"
##   ..$ y: chr "(umol/m^2 s)"

head(CO2)

##   Plant   Type  Treatment conc uptake
## 1   Qn1 Quebec nonchilled   95   16.0
## 2   Qn1 Quebec nonchilled  175   30.4
## 3   Qn1 Quebec nonchilled  250   34.8
## 4   Qn1 Quebec nonchilled  350   37.2
## 5   Qn1 Quebec nonchilled  500   35.3
## 6   Qn1 Quebec nonchilled  675   39.2

The summary function gives a summary of every column in the dataset: counts for each factor level, and the minimum, quartiles, mean, and maximum for each numeric column. str shows the structure of the dataset: its class, the number of observations and variables, the type of each column (for example factor or numeric), and the levels of each factor. head returns the first 6 rows of the data frame.
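A few related calls give the same information in smaller pieces. This is a sketch using only base R; the variable names (first_three, last_six, col_types) are chosen here for illustration.

```r
# CO2 ships with base R (datasets package), so nothing needs to be loaded.
print(dim(CO2))                  # number of rows and columns

first_three <- head(CO2, n = 3)  # n changes how many rows head() returns
last_six    <- tail(CO2)         # tail() is the counterpart for the last rows

# First class of each column, similar to the type information str() prints
col_types <- vapply(CO2, function(col) class(col)[1], character(1))
print(col_types)
```

The n argument is useful when six rows are too few or too many to get a feel for the data.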

Question Six

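boxplot(uptake ~ Treatment, data = CO2) #uptake grouped by treatment
#the formula is needed: boxplot(x, y) draws one box per argument instead of grouping y by x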

Boxplots show whether a data set is symmetric or skewed. In this boxplot of the CO2 dataset, the chilled treatment looks close to symmetric, which is consistent with a normal distribution, because the median line splits the box evenly and the whiskers are close to equal in length. The nonchilled treatment is skewed toward zero, meaning that more of the nonchilled CO2 uptake values sit closer to zero.
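The visual reading of a boxplot can be backed up with the numbers the boxes are drawn from. The sketch below computes the five-number summary of uptake for each treatment; box_asymmetry() is a hypothetical helper introduced here, not a base R function.

```r
# Five-number summary (min, lower hinge, median, upper hinge, max) per treatment
five_by_group <- tapply(CO2$uptake, CO2$Treatment, fivenum)
print(five_by_group)

# Hypothetical helper: distance from median to upper hinge minus distance from
# median to lower hinge. Near 0 means the median splits the box evenly;
# a positive value means the median sits nearer the lower hinge.
box_asymmetry <- function(f) (f[4] - f[3]) - (f[3] - f[2])

asym <- vapply(five_by_group, box_asymmetry, numeric(1))
print(asym)
```

Comparing these values with the plot is a quick way to confirm which group is actually skewed and in which direction.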

Question Seven

summary(CO2)

##      Plant             Type         Treatment       conc     
##  Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95  
##  Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175  
##  Qn3    : 7                                    Median : 350  
##  Qc1    : 7                                    Mean   : 435  
##  Qc3    : 7                                    3rd Qu.: 675  
##  Qc2    : 7                                    Max.   :1000  
##  (Other):42                                                  
##      uptake     
##  Min.   : 7.70  
##  1st Qu.:17.90  
##  Median :28.30  
##  Mean   :27.21  
##  3rd Qu.:37.12  
##  Max.   :45.50  
##

str(CO2)

## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':   84 obs. of  5 variables:
##  $ Plant    : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
##  $ Type     : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
##  $ conc     : num  95 175 250 350 500 675 1000 95 175 250 ...
##  $ uptake   : num  16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
##  - attr(*, "formula")=Class 'formula'  language uptake ~ conc | Plant
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##  - attr(*, "outer")=Class 'formula'  language ~Treatment * Type
##   .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
##  - attr(*, "labels")=List of 2
##   ..$ x: chr "Ambient carbon dioxide concentration"
##   ..$ y: chr "CO2 uptake rate"
##  - attr(*, "units")=List of 2
##   ..$ x: chr "(uL/L)"
##   ..$ y: chr "(umol/m^2 s)"

head(CO2)

##   Plant   Type  Treatment conc uptake
## 1   Qn1 Quebec nonchilled   95   16.0
## 2   Qn1 Quebec nonchilled  175   30.4
## 3   Qn1 Quebec nonchilled  250   34.8
## 4   Qn1 Quebec nonchilled  350   37.2
## 5   Qn1 Quebec nonchilled  500   35.3
## 6   Qn1 Quebec nonchilled  675   39.2

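No, there are no missing data: summary() reports a count of NA's for any column that contains them, and none appear here. It is important to check for NA values because many functions return NA or fail when NAs are present. Missing values should be handled deliberately, for example by removing them or by using an argument such as na.rm = TRUE. They should not simply be changed to zeros, because zeros would be treated as real measurements and would bias the results.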
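A direct check for missing values, plus a small demonstration of how NAs affect a calculation, might look like the sketch below. The vector x is made-up illustration data and is not taken from CO2.

```r
# Direct checks for missing values in the CO2 data frame
has_missing   <- anyNA(CO2)             # TRUE if any cell is NA
na_per_column <- colSums(is.na(CO2))    # count of NAs in each column
print(has_missing)
print(na_per_column)

# Made-up vector to show how an NA behaves in a calculation
x <- c(2, 4, NA)
mean_default <- mean(x)                 # NA: the missing value propagates
mean_removed <- mean(x, na.rm = TRUE)   # 3: the NA is dropped first
mean_zeroed  <- mean(replace(x, is.na(x), 0))  # 2: the zero is counted as real data
print(c(mean_default, mean_removed, mean_zeroed))
```

The last two results differ, which shows why replacing NAs with zeros changes the answer instead of just making the command run.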

Question Eight

head(CO2)

##   Plant   Type  Treatment conc uptake
## 1   Qn1 Quebec nonchilled   95   16.0
## 2   Qn1 Quebec nonchilled  175   30.4
## 3   Qn1 Quebec nonchilled  250   34.8
## 4   Qn1 Quebec nonchilled  350   37.2
## 5   Qn1 Quebec nonchilled  500   35.3
## 6   Qn1 Quebec nonchilled  675   39.2

#CriteriaOne

quebec <- CO2[which(CO2$Type=='Quebec' & CO2$conc>=435), ] #Quebec and conc >=435

quebec$Plant <- NULL
quebec$Type <- NULL
quebec$conc <- NULL

head(quebec) #dataset with chilled/nonchilled data for quebec where concentration values were greater than or equal to 435.

##     Treatment uptake
## 5  nonchilled   35.3
## 6  nonchilled   39.2
## 7  nonchilled   39.7
## 12 nonchilled   40.6
## 13 nonchilled   41.4
## 14 nonchilled   44.3
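As an alternative sketch, base R's subset() can filter rows and keep only the wanted columns in one call, instead of setting columns to NULL afterwards. The name quebec_subset is introduced here for illustration.

```r
# Rows: Quebec plants with conc >= 435. Columns: only Treatment and uptake.
quebec_subset <- subset(CO2,
                        Type == "Quebec" & conc >= 435,
                        select = c(Treatment, uptake))
head(quebec_subset)
```

Both approaches should return the same rows; subset() is convenient for interactive work, while bracket indexing is the more explicit form.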

#CriteriaTwo

missquebec <- CO2[which(CO2$Treatment=='chilled' & CO2$conc %in% c(95, 350)), ] #only chilled treatment and conc = 95 or 350
#note: CO2$conc==95|350 is TRUE for every row, because the bare 350 counts as TRUE, so it filters nothing
missquebec$Plant <- NULL
missquebec$conc <- NULL
missquebec$Treatment <- NULL

head(missquebec) #first six chilled rows with conc of 95 or 350 (rows 22, 25, 29, 32, 36, 39, all from Quebec)
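A condition written as conc==95|350 does not mean "conc is 95 or 350". The sketch below shows how R reads it and one way to test membership in a set of values with %in%. The names always_true, keep, and chilled_two_conc are introduced here for illustration.

```r
# == is evaluated before &, and & before |, so the original condition is read as
# (Treatment == "chilled" & conc == 95) | 350. Any nonzero number counts as TRUE.
always_true <- (CO2$Treatment == "chilled" & CO2$conc == 95) | 350
print(all(always_true))   # every row passes, so nothing is filtered out

# %in% checks each conc value against the whole set of allowed values
keep <- CO2$Treatment == "chilled" & CO2$conc %in% c(95, 350)
chilled_two_conc <- CO2[keep, c("Type", "uptake")]
print(nrow(chilled_two_conc))
```

Writing CO2$conc == 95 | CO2$conc == 350 inside parentheses would also work; %in% is shorter when the list of values grows.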

Question Nine

qqnorm(iris$Sepal.Width)
qqline(iris$Sepal.Width)

Figure: Q-Q plot of sepal width from the Iris dataset. The points fall close to the reference line, so sepal width looks approximately normal, which supports using an ANOVA.

boxplot(iris$Sepal.Width~iris$Species)

Figure: Boxplot of sepal width by species from the Iris dataset. The boxes are roughly the same size, so the groups appear to have similar variance, which supports using an ANOVA.

3 Assumptions of an ANOVA: (1) Data are independent, meaning that there is no sampling bias and each data point is not correlated with other points. (2) Data are normally distributed, meaning that the points in a Q-Q plot fall close to the line and do not curl away from it. (3) Groups within the set have equal variance, meaning that each category has a similar spread around its mean. The relationship displayed in the plots above qualifies for an ANOVA because the data are independent, the distribution is normal (points in the Q-Q plot are all close to the line), and there is equal variance (the boxes in the boxplot are roughly the same size).
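The plots can be supplemented with numeric checks on the fitted model. The sketch below is one possible approach, not the only one: it assumes a one-way ANOVA of sepal width by species, and uses shapiro.test() and bartlett.test() from base R's stats package, which are additions here and were not part of the original answer.

```r
# Fit the one-way ANOVA and pull out the residuals
fit <- aov(Sepal.Width ~ Species, data = iris)
res <- residuals(fit)

# qqnorm(res); qqline(res) would draw the Q-Q plot of the residuals

# Normality of residuals: a small p-value suggests departure from normality
normality <- shapiro.test(res)
print(normality$p.value)

# Equal variance across groups: a small p-value suggests unequal variances
variance_check <- bartlett.test(Sepal.Width ~ Species, data = iris)
print(variance_check$p.value)

# Per-group variances, for a direct look at the spread in each species
group_var <- tapply(iris$Sepal.Width, iris$Species, var)
print(group_var)
```

Checking the residuals rather than the raw response is the stricter test, because the normality assumption applies to what is left over after the group means are removed.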

Question Ten

Assumptions of Linear Regression: Data are independent, residuals are normally distributed, residuals have equal variance across the range of the predictor, and the variables have a linear relationship.

The main difference between the assumptions of an ANOVA and a linear regression is that a linear regression adds the assumption that the data have a linear relationship. You can check for a linear relationship by plotting the data in a scatterplot. Another difference is the type of data used: an ANOVA is typically run with a categorical predictor and a continuous response, whereas a linear regression is typically run with a predictor and a response that are both continuous.
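A minimal sketch of a simple linear regression follows. The choice of Sepal.Length and Petal.Length from the Iris dataset is an assumption made here for illustration, since the answer above does not name specific variables.

```r
# Both variables are continuous, so a linear regression is the usual choice
# plot(iris$Petal.Length, iris$Sepal.Length) would show the scatterplot used
# to judge whether the relationship looks linear

fit_lm <- lm(Sepal.Length ~ Petal.Length, data = iris)
print(coef(fit_lm))        # intercept and slope of the fitted line

# Residuals are what the normality and equal-variance assumptions refer to
res_lm    <- residuals(fit_lm)
fitted_lm <- fitted(fit_lm)

# plot(fitted_lm, res_lm) would show whether the spread of the residuals
# stays roughly constant across the fitted values
print(summary(res_lm))
```

A residuals-versus-fitted plot with no curve and no funnel shape supports both the linearity and the equal-variance assumptions.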