Class 3 - Data Structures and Basic Stats

Beginning with a Matrix

# A matrix is a rectangular (usually numeric) data structure.
# A matrix must consist of a single data type, and matrices are essential for image analysis.

# We will usually work with data sets that are also rectangular but can hold multiple data types - think mtcars.

matrix(c(0,10), nrow = 2, ncol = 5, byrow = TRUE)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0   10    0   10    0
## [2,]   10    0   10    0   10
# The numbers are filled row by row because of the last argument, byrow = TRUE.
# The two values (0 and 10) are recycled until all 10 cells are filled.
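
To see what byrow changes, and why a matrix can hold only one data type, here is a small sketch. The object names (by_row, by_col, mixed) are made up for illustration and were not part of the class example.

```r
# Same values, filled row by row (as above) and column by column (the default)
by_row <- matrix(c(0, 10), nrow = 2, ncol = 5, byrow = TRUE)
by_col <- matrix(c(0, 10), nrow = 2, ncol = 5)

by_row[1, ]  # first row alternates: 0 10 0 10 0
by_col[1, ]  # first row is all 0, because each column is filled with 0 then 10

# Mixing numbers and text forces every cell to a single type (character)
mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
class(mixed[1, 1])  # "character"
```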

A Dataset

As mentioned above, a dataset is a rectangular structure that can hold multiple data types.

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# mtcars is an example of a data set.
# head(mtcars) prints the first 6 rows of the set (or of any data set you pass to it).
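
As a rough illustration of "multiple data types", the sketch below builds a tiny data frame by hand. The students data frame and its values are invented for this example; only mtcars comes from the class.

```r
# A small, made-up data frame with three different column types
students <- data.frame(
  name   = c("Ana", "Ben", "Cy"),
  score  = c(88.5, 92.0, 79.5),
  passed = c(TRUE, TRUE, FALSE)
)

str(students)    # one line per column, showing its type
dim(mtcars)      # rows and columns of mtcars: 32 11
head(mtcars, 3)  # head() takes an optional row count; the default is 6
```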

Importing specific data from an Excel sheet & dealing with missing data

# Missing data can ruin an analysis if it is not handled properly.
# How do you handle missing data?

# Do you leave the cell blank?
# Do you put "NONE" or something similar in the cell?

# R deals with blank cells and "NONE" differently

You will see an NA = box in the window that pops up under “Import Dataset -> Excel”. If you mark empty cells with NONE or something similar, enter that value in the box. If you leave your empty cells blank, leave the box blank too.
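
The same idea can be tried without an Excel file. The sketch below is an assumption-laden stand-in: it uses base R's read.csv() on made-up inline text, where the na.strings argument plays the role of the NA = box. The column names and values are invented.

```r
# Made-up data: one missing score is marked with "NONE"
raw_text <- "student,score
A,90
B,NONE
C,85"

# Without telling R about "NONE", the whole column is read as text
as_text <- read.csv(text = raw_text)
is.numeric(as_text$score)  # FALSE

# Declaring "NONE" as the missing-value marker gives a numeric column with NA
cleaned <- read.csv(text = raw_text, na.strings = "NONE")
cleaned$score              # 90 NA 85
```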

R represents missing data with NA (not available)

MissingValueVector = c(1,2,NA,4,5,NA,6,7,8,9)
MissingValueVector
##  [1]  1  2 NA  4  5 NA  6  7  8  9
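
Once the NA values are in place, most summary functions need to be told what to do with them. A minimal sketch using the same vector:

```r
MissingValueVector <- c(1, 2, NA, 4, 5, NA, 6, 7, 8, 9)

is.na(MissingValueVector)       # TRUE where a value is missing
sum(is.na(MissingValueVector))  # number of missing values: 2

mean(MissingValueVector)                # NA, because missing values propagate
mean(MissingValueVector, na.rm = TRUE)  # 5.25, the mean of the 8 known values
```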

The next topic presented in class 3 was an introductory look at statistics and how we define and classify them.

The example compared math students at Taft High School with all American math students.

Following this example, we should know the following terms:

Statistics vs. Parameters
Sample vs. Population
Central Tendency vs. Spread
Distribution
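
These terms can be made concrete with a small simulation. Everything in this sketch is invented for illustration: the scores are simulated rather than real, and treating the school's students as a random sample of the population is an assumption.

```r
set.seed(1)  # makes the simulated numbers reproducible

# Simulated, made-up population: math scores for 100,000 students
population <- rnorm(100000, mean = 70, sd = 10)

# A sample: 30 students drawn at random from that population
taft_sample <- sample(population, size = 30)

# Parameters describe the population; statistics describe the sample
mean(population)   # parameter (central tendency)
mean(taft_sample)  # statistic that estimates it
sd(population)     # spread of the population
sd(taft_sample)    # spread of the sample

# A quick numeric look at the distribution of the sample
summary(taft_sample)
```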

Introduction to Linear Regression

attach(mtcars)
plot(wt,hp, main="Weight vs. HP", col = 'blue', type = 'p')
abline(lm(hp ~ wt), col = 'red')

# In three lines of code, we have attached a data set, meaning that its column names can be used directly as variables (apparent in the next line).
# The third line creates the trend line in the plot. abline() takes an intercept and slope, or a fitted model such as the result of lm(), and draws a straight line.
# lm() fits the linear model that describes the relationship between the two variables.
# The code below prints the coefficients that abline() needs.
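
A possible variant, not the version used in class, avoids attach() by naming the data set in each call. The name fit is made up for this sketch.

```r
# Fit the model once, naming the data set instead of attaching it
fit <- lm(hp ~ wt, data = mtcars)

# Scatter plot via the formula interface: hp on the y-axis, wt on the x-axis
plot(hp ~ wt, data = mtcars, main = "Weight vs. HP", col = "blue")

# abline() accepts a fitted lm object and draws its regression line
abline(fit, col = "red")
```

Passing data = mtcars keeps the column names out of the global search path, so they cannot clash with other variables named wt or hp.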

lm(hp~wt)
## 
## Call:
## lm(formula = hp ~ wt)
## 
## Coefficients:
## (Intercept)           wt  
##      -1.821       46.160
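
To show where these two numbers come from, the sketch below recomputes them by hand. It assumes the standard least-squares formulas for a single predictor; the names fit, slope, and intercept are made up for illustration.

```r
fit <- lm(hp ~ wt, data = mtcars)
coef(fit)  # named vector with (Intercept) and wt

# For one predictor, the least-squares slope is cov(x, y) / var(x),
# and the fitted line passes through the point of means
slope     <- cov(mtcars$wt, mtcars$hp) / var(mtcars$wt)
intercept <- mean(mtcars$hp) - slope * mean(mtcars$wt)

# Predicted horsepower for a hypothetical car with wt = 3 (wt is in 1000 lbs)
predict(fit, newdata = data.frame(wt = 3))
```

The slope says that, in this data set, each additional 1000 lbs of weight goes with roughly 46 more horsepower on average.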