Unlocking Statistics and Regression with R and RStudio Cloud

Motivation

Why I should learn Statistics with R - GP initiative

Statistics is used everywhere.
- Research, M.Sc., Ph.D., Business Analytics
- Avoids dependency on SPSS, SAS, STATA, IBM, MATLAB, etc.
- R is open source and available for all platforms
Current AI wave and data science jobs
- Deep Learning = Domain Knowledge + Statistics + Coding Skill
- Machine Learning = Domain Knowledge + Statistics + Coding Skill
  - Regression vs. Business Predictive Analytics
  - Classification
- Data science salaries are quite high
  - Data Scientist
  - Data Analyst
  - Data Engineer
Learners need support to study statistics from a young age

Objectives

Assist and support all interested R learners
Provide technical assistance and services to them
- Professional Analysis
- Thesis, Dissertation and Business report writing and web publishing
- Dashboard development
Teach statistics with R for data science careers
- Machine Learning
- Deep Learning
- ChatGPT and Generative AI integration

Prerequisite - Software Installation

R
RStudio (Posit Cloud) IDE
Rtools (Github)
Quarto (Reporting)
Anaconda (Jupyter Notebook, JupyterLab, IPython)

Lecture 1: Introduction to R and Basic Coding

Introduction to R and Basic Coding

What is R? (brief overview of R and its uses)
Basic syntax and data types in R (vectors, variables, basic operators)
Example code for basic arithmetic operations
Hands-on exercise: Write a simple R script to perform basic arithmetic operations
Solutions to hands-on exercise

What is R?

R is a programming language and environment for statistical computing and graphics

R is an implementation of the S language; its first stable version (1.0.0) was released in 2000

R runs on different platforms such as Windows, macOS, and Linux

R is widely used in academia, research, and industry for data analysis and visualization

R is free and open source. The current version at the time of writing is 4.4.1

R vs. RStudio vs. Positron

R is the main workhorse, like the engine of a car

RStudio is an Integrated Development Environment (IDE) that serves as the car body

Packages are the tools you work with, like a shovel, a knife, a sachet, or the other parts of the car

There are 21,140 packages available at the time of writing.

The top 10 packages and their download counts are:

- ggplot2: 147,070,480
- rlang: 135,195,761
- magrittr: 125,445,993
- dplyr: 110,729,472
- vctrs: 98,242,310
- cli: 95,558,907
- tibble: 93,010,991
- devtools: 91,765,271
- jsonlite: 91,013,906
- Rcpp: 87,825,074

Positron is an IDE that supports both R and Python for data science projects

R package downloads

Motivation

Data science has been called one of the sexiest jobs of the 21st century.
R, Python, and Julia are top languages for data science projects
Salary range: 5,000 to 12,000 USD per month
What do you need to know?
- Machine Learning
  - Classification
    - Supervised
    - Unsupervised
  - Regression
  - Reinforcement Learning
- Statistics
- Data Engineering
  - Big Data management
  - Data manipulation

Basic syntax and data types in R

Scalars: only one value

Vectors: a collection of values of the same type

Variables: a name given to a value or a vector

Basic operators: `+`, `-`, `*`, `/`, `%%`, `>`, `>=`, `<`, `<=`, `!=`, etc.

R is used in

Mathematical calculation
Matrices (linear programming with lpSolve)
Statistics
Probability and Distribution
Text mining
Data Visualization
Spatial Data Analysis
Reporting
Dashboard
Business Analytics

Function

name_of_function <- function(){}

mean( ), median( ), sd( ), summary( ), sample( )
mode( ) # note: returns the storage mode (e.g. "numeric"), not the statistical mode
cor( )
lm( ) # $y = \beta_0 + \beta_1 x$
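A minimal sketch of defining and calling functions. Because base R's mode() reports an object's storage mode, this example defines a hypothetical helper, stat_mode() (not part of base R), that returns the most frequent value. The scores vector is made-up data.

```r
# Define a function: name_of_function <- function(arguments) { body }
square <- function(x) {
  x^2  # the last evaluated expression is returned
}

# Hypothetical helper for the statistical mode (most frequent value).
# On ties it returns the value that appears first in x.
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

scores <- c(2, 4, 4, 5, 7, 4, 9)

square(3)          # 9
mean(scores)       # 5
median(scores)     # 4
mode(scores)       # "numeric" (storage mode, not the most frequent value)
stat_mode(scores)  # 4
sd(scores)         # standard deviation
summary(scores)    # min, quartiles, median, mean, max
```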

Basic Syntax

Example code for basic arithmetic operations

https://webr.r-wasm.org/latest/
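Here is a short script that exercises the basic operators. It can be pasted into an R console or the webR page linked above. `%/%` (integer division) and `^` (exponent) are standard R operators that were not in the earlier operator list.

```r
# Basic arithmetic operators in R
a <- 17
b <- 5

a + b     # addition: 22
a - b     # subtraction: 12
a * b     # multiplication: 85
a / b     # division: 3.4
a %% b    # modulus (remainder): 2
a %/% b   # integer division: 3
a ^ b     # exponent: 1419857
sqrt(16)  # built-in function: 4

# Comparison operators return logical values
a > b     # TRUE
a != b    # TRUE
a <= b    # FALSE
```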

Lecture 2: Data Types and Structures

Data Types and Structures

Data types in R (numeric, character, logical, factor)
Data structures in R (vectors, matrices, data frames)
Example code for creating vectors, matrices, and data frames
Hands-on exercise: Create a data frame from vectors
Solutions to hands-on exercise

Data types in R

Numeric: 1, 2, 3, etc.

Character: “hello”, “world”, etc.

Logical: TRUE or FALSE

Factor: categorical data, e.g. “male”, “female”, etc.

Data types in detail

String (character)

Strings use single quotes ' ' or double quotes " "; the escape character is the backslash \

read_csv("/Users/kyawmoeaung/Downloads/iris.csv") # Mac
read_csv("C:/Users/kyawmoeaung/Downloads/iris.csv") # Windows: use forward slashes
read_csv("C:\\Users\\kyawmoeaung\\Downloads\\iris.csv") # Windows: or escape each backslash
Numeric
- double (floating point, decimal)
- integer
Factor
- binary (yes, no; gender: male, female)
- ordinal (0 - 100, 101 - 200, 201 - 300, 301 - 400)
- nominal (e.g. forest types, rice varieties, endangered species)
Logical
- TRUE # can be used in calculations (TRUE counts as 1)
- FALSE # counts as 0
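This sketch shows each data type in practice. The Windows path and the example values are hypothetical and only illustrate escaping and types.

```r
# Numeric: double (default) vs integer (L suffix)
x_dbl <- 3.14
x_int <- 42L
typeof(x_dbl)  # "double"
typeof(x_int)  # "integer"

# Character: a backslash escapes special characters
greeting <- "He said \"hello\""
path_win <- "C:\\Users\\me\\data.csv"  # hypothetical path; each \\ is one backslash
writeLines(greeting)                   # He said "hello"

# Factor (nominal/binary): categorical data with fixed levels
gender <- factor(c("male", "female", "female", "male"))
levels(gender)  # "female" "male" (sorted alphabetically by default)

# Ordered factor (ordinal): levels have a meaningful order
size <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"), ordered = TRUE)
size < "high"   # TRUE FALSE TRUE

# Logical values work in calculations: TRUE is 1, FALSE is 0
passed <- c(TRUE, FALSE, TRUE, TRUE)
sum(passed)     # 3 (number of TRUE values)
mean(passed)    # 0.75 (proportion of TRUE values)
```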

Data structures in R

Vectors: a collection of values of the same type

Matrices: a two-dimensional array of values of the same type

Data frames: a collection of equal-length vectors (columns) that can be of different types

Lists: can store objects of any type

Vectors

Vectors are created with the c() function. length() returns the number of elements, and the type can be checked with typeof() and class().

numeric vector

integer data

logical vector

factor
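A minimal sketch of the four vector kinds listed above. The values are made up.

```r
# Numeric (double) vector
heights <- c(1.62, 1.75, 1.80)
typeof(heights)   # "double"
class(heights)    # "numeric"
length(heights)   # 3

# Integer vector (the L suffix or the : operator creates integers)
counts <- c(3L, 7L, 10L)
ids <- 1:5
typeof(ids)       # "integer"

# Logical vector, here produced by a comparison
is_tall <- heights > 1.70
is_tall           # FALSE TRUE TRUE

# Factor
blood <- factor(c("A", "B", "A", "O"))
class(blood)      # "factor"
typeof(blood)     # "integer" (factors are stored as integer codes)
table(blood)      # counts per level

# c() coerces mixed values to one common type
mixed <- c(1, "two", TRUE)
typeof(mixed)     # "character"
```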

Matrices

Matrices are created with the matrix() function.
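A short sketch of matrix() showing its default column-wise filling, indexing, and matrix multiplication.

```r
# 2 x 3 matrix, filled column by column (the default)
m <- matrix(1:6, nrow = 2, ncol = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

dim(m)       # 2 3
m[2, 3]      # element in row 2, column 3: 6
t(m)         # transpose (3 x 2)
m %*% t(m)   # matrix multiplication (2 x 2)

# byrow = TRUE fills the matrix row by row instead
m2 <- matrix(1:6, nrow = 2, byrow = TRUE)
m2[1, ]      # 1 2 3
```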

DataFrame

The most commonly used data structure, created with the data.frame() function.
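This sketch matches the hands-on exercise of building a data frame from vectors. The names and values are hypothetical.

```r
# Three vectors of equal length but different types
name   <- c("Aung", "Su", "Min")
age    <- c(25L, 31L, 28L)
member <- c(TRUE, FALSE, TRUE)

# Each vector becomes one column
df <- data.frame(name, age, member)
str(df)              # 3 obs. of 3 variables
df$age               # access one column with $
df[df$age > 26, ]    # keep rows where age is greater than 26
mean(df$age)         # 28
```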

List

Lists can store any type of object, including vectors, matrices, data frames, and other lists. The `list()` function is used to create a list.
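A minimal sketch of a list that holds a vector, a matrix, a data frame, and a nested list, with the usual ways to pull elements back out.

```r
my_list <- list(
  numbers = c(1, 2, 3),
  mat     = matrix(1:4, nrow = 2),
  df      = data.frame(x = 1:2, y = c("a", "b")),
  nested  = list(flag = TRUE)
)

length(my_list)       # 4 elements
my_list$numbers       # access an element by name with $
my_list[["mat"]]      # or with [[ ]]
my_list$nested$flag   # TRUE
str(my_list)          # overview of every element
```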

Converting from one type to another

Use the as.*() functions.

For vectors

as.numeric()
as.character()
as.factor()

For matrix and Data Frame

as.data.frame()
as_tibble() # requires tibble package
as.matrix()
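This sketch demonstrates the as.*() conversions. It also shows one common pitfall: as.numeric() applied to a factor returns the internal level codes. as_tibble() is left out so that the example needs only base R.

```r
# Vector conversions
chr_nums <- c("10", "20", "30")
as.numeric(chr_nums)         # 10 20 30
as.character(c(1.5, 2))      # "1.5" "2"
as.factor(c("b", "a", "b"))  # levels: a b
as.numeric("abc")            # NA, with a warning

# A factor of numbers: convert via character first,
# otherwise as.numeric() returns the level codes
f <- factor(c("10", "5", "10"))
as.numeric(f)                # 1 2 1 (codes, usually not what you want)
as.numeric(as.character(f))  # 10 5 10

# Matrix <-> data frame
m  <- matrix(1:4, nrow = 2)
df <- as.data.frame(m)       # columns are named V1, V2
as.matrix(df)
```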

checking data type/structure with `is.`

is.numeric() # checking double or integer type
is.vector() # vector data
is.matrix() # matrix
is.data.frame() # data.frame
is_tibble() # tibble
is.character() # character
is.na() # NA value
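A short sketch of the is.*() checks, including the common pattern of counting and skipping missing values. is_tibble() is omitted because it needs the tibble package.

```r
x  <- c(1.5, NA, 3)
df <- data.frame(a = 1:3)

is.numeric(x)              # TRUE (double)
is.numeric(1L)             # TRUE (integer also counts as numeric)
is.character("R")          # TRUE
is.vector(x)               # TRUE
is.matrix(matrix(1:4, 2))  # TRUE
is.data.frame(df)          # TRUE
is.na(x)                   # FALSE TRUE FALSE
sum(is.na(x))              # 1 missing value
mean(x, na.rm = TRUE)      # 2.25 (NA ignored)
```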

Lecture 3: Importing and Exporting Data

Importing and Exporting Data

Importing data into R (e.g. CSV, Excel)
Example code for importing data using read.csv() and the readxl package
Exporting data from R (e.g. CSV, Excel)
Example code for exporting data using write.csv() and write_xlsx()
Hands-on exercise: Import and export a sample dataset

Importing data into R

Import files using RStudio IDE (GUI)

* Go to the Environment pane

* Click 'Import Dataset'

* Choose the file type: Text, Excel, SPSS, SAS, or Stata

* Choose the file

* Click Import

Learn the code that RStudio generates while importing

importing data

read_csv() # requires readr package
read_tsv() # requires readr package
read_excel() # requires readxl package
read_sav() # requires haven package
read_sas() # requires haven package
read_dta() # requires haven package

Exporting data

write_csv() # require readr
write_tsv() # require readr
write_xlsx() # require writexl package
write_sav() # require haven
write_sas() # require haven
write_dta() # require haven

Example code for importing data
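A minimal round-trip sketch that uses base R's write.csv() and read.csv() (mentioned in the lecture outline) on the built-in iris data, writing to a temporary file. The readr equivalents appear as comments because they need the readr package.

```r
# Temporary file so that no real path is needed
csv_path <- tempfile(fileext = ".csv")

write.csv(iris, csv_path, row.names = FALSE)              # export
iris_back <- read.csv(csv_path, stringsAsFactors = TRUE)  # import

dim(iris_back)    # 150 5
head(iris_back)

# readr equivalents (require the readr package):
# readr::write_csv(iris, csv_path)
# iris_back <- readr::read_csv(csv_path)
```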

Lecture 4: Summary Statistics and Data Visualization

Summary Statistics and Data Visualization

Summary statistics in R (mean, median, mode, standard deviation, variance)
Example code for calculating summary statistics using summary()
Data visualization in R (histograms, box plots)
Example code for creating visualizations using ggplot2
Hands-on exercise: Create summary statistics and visualizations for a sample dataset

Summary statistics in R

The gapminder dataset contains population, GDP per capita, and life expectancy data for countries around the world.

Life Expectancy, Population, GDP summary

continent	mean_pop	mean_lifeExp	mean_gdp	sd_pop	sd_lifeExp	sd_gdp
Africa	9916003	48.86533	2193.755	15490923	9.150210	9.150210
Americas	24504795	64.65874	7136.110	50979430	9.345088	9.345088
Asia	77038722	60.06490	7902.150	206885205	11.864532	11.864532
Europe	17169765	71.90369	14469.476	20519438	5.433178	5.433178
Oceania	8874672	74.32621	18621.609	6506342	3.795611	3.795611

Note: the sd_gdp values are identical to sd_lifeExp, which suggests sd(lifeExp) was used for both columns; sd_gdp should be computed with sd(gdpPercap).
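This sketch shows one way such a per-continent table could be produced. It assumes the gapminder and dplyr packages, and the author's original code may differ. Here sd_gdp is computed from gdpPercap.

```r
library(gapminder)  # gapminder dataset
library(dplyr)      # group_by(), summarise()

continent_summary <- gapminder |>
  group_by(continent) |>
  summarise(
    mean_pop     = mean(pop),
    mean_lifeExp = mean(lifeExp),
    mean_gdp     = mean(gdpPercap),
    sd_pop       = sd(pop),
    sd_lifeExp   = sd(lifeExp),
    sd_gdp       = sd(gdpPercap)   # standard deviation of GDP per capita
  )

continent_summary
```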

Summary statistics in R

summary() calculates summary statistics. It can be used on a whole dataset or on individual columns.

summary of gapminder data

country	continent	year	lifeExp	pop	gdpPercap
Afghanistan: 12	Africa :624	Min. :1952	Min. :23.60	Min. :6.001e+04	Min. : 241.2
Albania : 12	Americas:300	1st Qu.:1966	1st Qu.:48.20	1st Qu.:2.794e+06	1st Qu.: 1202.1
Algeria : 12	Asia :396	Median :1980	Median :60.71	Median :7.024e+06	Median : 3531.8
Angola : 12	Europe :360	Mean :1980	Mean :59.47	Mean :2.960e+07	Mean : 7215.3
Argentina : 12	Oceania : 24	3rd Qu.:1993	3rd Qu.:70.85	3rd Qu.:1.959e+07	3rd Qu.: 9325.5
Australia : 12		Max. :2007	Max. :82.60	Max. :1.319e+09	Max. :113523.1
(Other) :1632					

Example code for calculating summary statistics

print("Summary of Pop")
summary(gapminder$pop)
print("Summary of lifeExp")
summary(gapminder$lifeExp)
print("Summary of gdp")
summary(gapminder$gdpPercap)

[1] "Summary of Pop"

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
60011	2793664	7023596	29601212	19585222	1318683096

[1] "Summary of lifeExp"

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
23.60	48.20	60.71	59.47	70.85	82.60

[1] "Summary of gdp"

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
241.2	1202.1	3531.8	7215.3	9325.5	113523.1

Visualization

# Explore gapminder data
library(gapminder) # load the gapminder data package
library(tidyverse) # load tidyverse (includes ggplot2 and dplyr)
library(knitr) # load knitr
glimpse(gapminder) # explore data structure and data types
plot(gapminder) # look at relationships among factor and numeric variables
# gdpPercap spans a much wider range than lifeExp, so put the x-axis on a log10 scale
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()

Gapminder data visualization

Life expectancy vs. gdpPercap (before scaling)

Gapminder data visualization

Add scale_x_log10() to put the x-axis (gdpPercap) on a log10 scale.

Life expectancy vs. gdpPercap (after scaling)

Life expectancy vs. gdpPercap (after scaling and adding color)

Life expectancy vs. gdpPercap (after scaling and adding color, adding size)

Life expectancy vs. gdpPercap (after scaling and adding color, adding size and shape)

Life expectancy vs. gdpPercap (after scaling and adding color, adding size) by grouping continent (only 1952 and 2007)

Life expectancy vs. gdpPercap (after scaling and adding color, adding size with life Expectancy) by grouping continent (only 1952 and 2007)

Life expectancy vs. gdpPercap (after scaling and adding color, adding size with `population`) by grouping continent (only 1952 and 2007)
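This sketch reconstructs the final plot in this sequence: log10 x-axis, color by continent, point size by population, and separate panels for 1952 and 2007. It assumes the gapminder, ggplot2, and dplyr packages, and the axis labels and alpha value are illustrative choices.

```r
library(gapminder)
library(ggplot2)
library(dplyr)

gap_years <- gapminder |>
  filter(year %in% c(1952, 2007))   # keep only 1952 and 2007

p <- ggplot(gap_years, aes(x = gdpPercap, y = lifeExp,
                           color = continent, size = pop)) +
  geom_point(alpha = 0.7) +
  scale_x_log10() +                 # log10 scale for GDP per capita
  facet_wrap(~ year) +              # one panel per year
  labs(x = "GDP per capita (log10 scale)", y = "Life expectancy (years)",
       color = "Continent", size = "Population")

p
```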

Lecture 5: Correlation and Regression

Correlation and Regression

Correlation analysis in R (Pearson’s r, scatter plots)
Example code for calculating correlation using cor()
Simple linear regression in R
Example code for performing simple linear regression using lm()
Hands-on exercise: Perform correlation analysis and simple linear regression on a sample dataset

Correlation analysis in R

Correlation measures the strength and direction of the linear relationship between two variables.

The correlation coefficient ranges from -1 to 1.
SPSS Survival Manual - Julie Pallant explains how to interpret it. The guide applies to the absolute value of r:
- 1 - perfect correlation
- above 0.7 - very strong correlation
- 0.6 to 0.69 - strong correlation
- 0.4 to 0.59 - medium correlation
- 0.2 to 0.39 - weak correlation
- 0.1 to 0.19 - very weak correlation (close to none)
- 0 - no correlation
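This sketch computes Pearson's r from its definition, r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2)), and checks the result against cor(). It uses weight and mpg from the built-in mtcars data as an assumed example pair.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Pearson's r from its definition
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(x, y)

r_manual
all.equal(r_manual, r_builtin)  # TRUE

# Apply the interpretation guide to the absolute value of r
abs(r_builtin) > 0.7            # TRUE: a very strong (negative) correlation
```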

cor(): calculates the correlation between two variables

corrplot(): visualizes the correlations among variables (requires the corrplot package)

Example code for calculating correlation

cor(data$x, data$y)

cor(x, y) takes two arguments, x and y. Each column is accessed with the data frame name, followed by $ and the column name. Given a single numeric data frame or matrix, cor() returns a correlation matrix.

[The correlation between carat and price is: 0.92159130119347]

We can conclude that there is a very strong positive correlation between price and carat in the diamonds dataset.
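This is a minimal sketch of the diamonds calculation. It assumes the diamonds data from the ggplot2 package, which is where that dataset ships.

```r
library(ggplot2)  # provides the diamonds dataset

r <- cor(diamonds$carat, diamonds$price)
paste("The correlation between carat and price is:", r)
```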

Correlation with cor() on palmerpenguins

penguins <- na.omit(penguins) # cor() returns NA when values are missing, so drop NA rows first
cor(penguins[3:6]) |> knitr::kable()

	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g
bill_length_mm	1.0000000	-0.2286256	0.6530956	0.5894511
bill_depth_mm	-0.2286256	1.0000000	-0.5777917	-0.4720157
flipper_length_mm	0.6530956	-0.5777917	1.0000000	0.8729789
body_mass_g	0.5894511	-0.4720157	0.8729789	1.0000000

round(cor(penguins[3:6]), 2) |> knitr::kable()

	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g
bill_length_mm	1.00	-0.23	0.65	0.59
bill_depth_mm	-0.23	1.00	-0.58	-0.47
flipper_length_mm	0.65	-0.58	1.00	0.87
body_mass_g	0.59	-0.47	0.87	1.00

Correlation with corrplot()

corrplot() takes a correlation matrix (for example, the output of cor()) as its first argument and a method argument for the visual style.

corrplot(corr, method)

method can be: 1. "circle", 2. "square", 3. "ellipse", 4. "number", 5. "shade", 6. "color", 7. "pie".

Example code with corrplot()
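Here is a short sketch on the built-in iris data, assuming the corrplot package is installed. The key step is to compute the correlation matrix with cor() first and then pass it to corrplot().

```r
library(corrplot)  # install.packages("corrplot") if needed

# corrplot() expects a correlation matrix, so compute it first
iris_cor <- cor(iris[, 1:4])   # numeric columns only (drop Species)

corrplot(iris_cor, method = "circle")
corrplot(iris_cor, method = "number")  # print the coefficients instead
```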

corrplot() on penguin dataset

corrplot() on iris dataset

Rows: 150
Columns: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
$ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
$ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
$ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…

corrplot() on iris dataset

	Sepal.Length	Sepal.Width	Petal.Length	Petal.Width
Sepal.Length	1.00	-0.12	`0.87`	`0.82`
Sepal.Width	-0.12	1.00	-0.43	-0.37
Petal.Length	`0.87`	-0.43	1.00	`0.96`
Petal.Width	`0.82`	-0.37	`0.96`	1.00

corrplot() on iris dataset

Final exercise for correlation

Use the mtcars dataset with the cor() and corrplot() functions
Step 1. Look at your data
Step 2. Remove NA values using the na.omit() function
Step 3. Remove all character and factor columns
Step 4. Examine the correlations among variables
Step 5. Show your corrplot()
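Here is one possible solution sketch for the exercise, not the only one. mtcars has no missing values and no character columns, so Steps 2 and 3 change nothing on this dataset. They are kept so the same code works on other data. The corrplot step runs only if that package is installed.

```r
# Step 1. Look at the data
str(mtcars)

# Step 2. Remove rows with NA values (mtcars has none)
cars <- na.omit(mtcars)

# Step 3. Keep numeric columns only (drops character and factor columns)
cars_num <- cars[, sapply(cars, is.numeric)]

# Step 4. Correlation matrix, rounded for reading
cor_mat <- round(cor(cars_num), 2)
cor_mat["mpg", ]  # correlation of mpg with every variable

# Step 5. Visualize, if the corrplot package is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor(cars_num), method = "number")
}
```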

Linear Regression - Introduction

Linear regression is widely used in research, academia, finance, banking, and business to predict a response variable from one or more explanatory variables.

There are many regression models and some of them are:

Simple Linear Regression Model
Multiple Linear Regression
Simple Non-Linear Regression
Multiple Non-Linear Regression

Linear Regression formula in R

lm(formula, data)

glm(formula, family, data) # e.g. for binary (yes/no) data

Linear Model

lm : linear model

formula : y ~ x

data : dataset

Generalized Linear Model

glm : Generalized Linear Model

formula : y ~ x

family : gaussian, binomial or poisson

data : dataset

Linear Model formula
lm(y ~ x, data = data)

Generalized Linear Model
glm(y ~ x, family = binomial, data = data)
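This sketch fits a generalized linear model with family = binomial (logistic regression). It assumes am in mtcars (0 = automatic, 1 = manual) as the binary response and weight as the explanatory variable. Both choices are illustrative.

```r
# Logistic regression: probability of a manual transmission given weight
logit_fit <- glm(am ~ wt, family = binomial, data = mtcars)
summary(logit_fit)

# Predicted probabilities at 2,500 and 3,500 lbs (wt is in 1,000 lbs)
probs <- predict(logit_fit, newdata = data.frame(wt = c(2.5, 3.5)),
                 type = "response")
probs
```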

Some definitions

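coefficients : intercept and slope terms
residuals : actual - predicted
R-squared : proportion of the variance in the response explained by the model (model performance)
Adjusted R-squared : R-squared adjusted for the number of predictors (model performance)
RSE : Residual Standard Error, the estimated standard deviation of the residuals
RMSE : Root Mean Squared Error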
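This sketch computes each definition by hand for lm(mpg ~ wt, data = mtcars) and compares the results with the values summary() reports. Here p counts the intercept and the slope.

```r
fit <- lm(mpg ~ wt, data = mtcars)

res <- residuals(fit)       # actual - predicted
n   <- nrow(mtcars)
p   <- length(coef(fit))    # number of coefficients (intercept + slope = 2)

rss <- sum(res^2)                              # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total sum of squares

r_squared     <- 1 - rss / tss
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p)
rse           <- sqrt(rss / (n - p))           # residual standard error
rmse          <- sqrt(mean(res^2))             # root mean squared error

# Compare with summary()
s <- summary(fit)
c(r_squared, s$r.squared)
c(adj_r_squared, s$adj.r.squared)
c(rse, s$sigma)
rmse
```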

Steps to analyze Linear Model

Load packages: tidyverse, modelr, broom, palmerpenguins
Datasets: mpg, iris, gapminder, penguins, mtcars, fish
Look at the structure of the data and identify the dependent (response) variable and the independent (explanatory) variable
Explore the data with glimpse(), head(), tail(), summary(), and plot() to identify data types and correlation patterns among variables
Call lm() with a formula (response ~ explanatory variable) and the dataset
Assign the model to a variable named fit (or any name you like)
Run the lm() model
Print the results with the summary() function

Example code

mtcars data exploration

str(mtcars)

'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 …
$ cyl : num 6 6 4 6 8 6 8 4 4 6 …
$ disp: num 160 160 108 258 360 …
$ hp : num 110 110 93 110 175 105 245 62 95 123 …
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 …
$ wt : num 2.62 2.88 2.32 3.21 3.44 …
$ qsec: num 16.5 17 18.6 19.4 17 …
$ vs : num 0 0 1 1 0 1 0 1 1 1 …
$ am : num 1 1 1 0 0 0 0 0 0 0 …
$ gear: num 4 4 4 3 3 3 3 4 4 4 …
$ carb: num 4 4 1 1 2 1 4 2 2 4 …

mtcars data exploration

head(mtcars)

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
Mazda RX4 Wag	21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
Hornet 4 Drive	21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
Hornet Sportabout	18.7	8	360	175	3.15	3.440	17.02	0	0	3	2
Valiant	18.1	6	225	105	2.76	3.460	20.22	1	0	3	1

mtcars data exploration

tail(mtcars)

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Porsche 914-2	26.0	4	120.3	91	4.43	2.140	16.7	0	1	5	2
Lotus Europa	30.4	4	95.1	113	3.77	1.513	16.9	1	1	5	2
Ford Pantera L	15.8	8	351.0	264	4.22	3.170	14.5	0	1	5	4
Ferrari Dino	19.7	6	145.0	175	3.62	2.770	15.5	0	1	5	6
Maserati Bora	15.0	8	301.0	335	3.54	3.570	14.6	0	1	5	8
Volvo 142E	21.4	4	121.0	109	4.11	2.780	18.6	1	1	4	2

mtcars data exploration

summary(mtcars)

mpg	cyl	disp	hp
Min. :10.40	Min. :4.000	Min. : 71.1	Min. : 52.0
1st Qu.:15.43	1st Qu.:4.000	1st Qu.:120.8	1st Qu.: 96.5
Median :19.20	Median :6.000	Median :196.3	Median :123.0
Mean :20.09	Mean :6.188	Mean :230.7	Mean :146.7
3rd Qu.:22.80	3rd Qu.:8.000	3rd Qu.:326.0	3rd Qu.:180.0
Max. :33.90	Max. :8.000	Max. :472.0	Max. :335.0

drat	wt	qsec	vs
Min. :2.760	Min. :1.513	Min. :14.50	Min. :0.0000
1st Qu.:3.080	1st Qu.:2.581	1st Qu.:16.89	1st Qu.:0.0000
Median :3.695	Median :3.325	Median :17.71	Median :0.0000
Mean :3.597	Mean :3.217	Mean :17.85	Mean :0.4375
3rd Qu.:3.920	3rd Qu.:3.610	3rd Qu.:18.90	3rd Qu.:1.0000
Max. :4.930	Max. :5.424	Max. :22.90	Max. :1.0000

am	gear	carb
Min. :0.0000	Min. :3.000	Min. :1.000
1st Qu.:0.0000	1st Qu.:3.000	1st Qu.:2.000
Median :0.0000	Median :4.000	Median :2.000
Mean :0.4062	Mean :3.688	Mean :2.812
3rd Qu.:1.0000	3rd Qu.:4.000	3rd Qu.:4.000
Max. :1.0000	Max. :5.000	Max. :8.000

mtcars data exploration

plot(mtcars)

mpg: miles per (US) gallon

cyl: number of cylinders

disp: engine displacement (cubic inches)

hp: gross horsepower

drat: rear axle ratio

wt: weight (1,000 lbs)

mtcars Simple Linear Regression

Call:
lm(formula = mpg ~ wt, data = mtcars)

Coefficients:
(Intercept)           wt
     37.285       -5.344

response = Intercept + slope * explanatory variable

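Each additional 1,000 lbs of weight lowers the predicted fuel economy by 5.344 miles per gallon.

A weight of 1,000 lbs (wt = 1) corresponds to 37.285 + (-5.344) * 1 = 31.941 miles per gallon (wt = 1 is below the observed range of 1.513 to 5.424, so this prediction is an extrapolation).

A weight of 2,000 lbs (wt = 2) corresponds to 37.285 + (-5.344) * 2 = 26.597 miles per gallon.

A weight of 3,000 lbs (wt = 3) corresponds to 37.285 + (-5.344) * 3 = 21.253 miles per gallon.

A weight of 4,000 lbs (wt = 4) corresponds to 37.285 + (-5.344) * 4 = 15.909 miles per gallon.

A weight of 5,000 lbs (wt = 5) corresponds to 37.285 + (-5.344) * 5 = 10.565 miles per gallon.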
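The same predictions can be produced with predict(). This short sketch does that. The results differ from the hand calculation in the second or third decimal place because predict() uses the unrounded coefficients.

```r
fit <- lm(mpg ~ wt, data = mtcars)
round(coef(fit), 3)   # (Intercept) 37.285, wt -5.344

new_cars <- data.frame(wt = 1:5)  # 1,000 to 5,000 lbs
preds <- predict(fit, newdata = new_cars)
preds
```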

Interpretation and visualization

Interpretation and visualization (change color to local variable)

Interpretation and visualization (change color to local variable and method to ‘lm’)
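This sketch shows the plot sequence above. It assumes that "local variable" means mapping color inside geom_point() only, which colors the points by number of cylinders while geom_smooth(method = "lm") still draws a single fitted line. It assumes the ggplot2 package.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl))) +   # color mapped locally, points only
  geom_smooth(method = "lm", se = TRUE) +  # one regression line with confidence band
  labs(x = "Weight (1,000 lbs)", y = "Miles per gallon", color = "Cylinders")

p
```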

Interpreting model performance with broom package

Install the broom and modelr packages
Load the packages
Fit the model
Use augment() and glance() from broom, and pull() from dplyr
Pull r.squared, adj.r.squared, and sigma from the glance() output

Interpreting model performance with broom package
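A minimal sketch of the steps above. It assumes the broom and dplyr packages and reuses lm(mpg ~ wt, data = mtcars) as the example model.

```r
library(broom)  # tidy(), glance(), augment()
library(dplyr)  # pull()

fit <- lm(mpg ~ wt, data = mtcars)

tidy(fit)                   # coefficient table
model_stats <- glance(fit)  # one-row model summary
aug <- augment(fit)         # data plus .fitted and .resid columns

model_stats |> pull(r.squared)
model_stats |> pull(adj.r.squared)
model_stats |> pull(sigma)  # residual standard error
```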

Linear Model exercise with palmerpenguins dataset

Load the required libraries
Check the structure and data types of the penguins dataset
Identify two variables (response vs. explanatory)
Remove NA values with na.omit(), overwriting the penguins dataset
Write the formula lm(formula, data) and assign it to a variable you like
Print the fitted model
Print summary() of the model
Interpret the model performance and visualize the data
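This is one possible solution sketch. It assumes body_mass_g as the response and flipper_length_mm as the explanatory variable, which are illustrative choices, and uses the palmerpenguins and ggplot2 packages. The dataset is referenced explicitly as palmerpenguins::penguins.

```r
library(palmerpenguins)
library(ggplot2)

penguins <- palmerpenguins::penguins
str(penguins)                    # structure and data types

# Remove rows with NA values, overwriting the dataset
penguins <- na.omit(penguins)

# Response: body_mass_g; explanatory: flipper_length_mm
penguin_fit <- lm(body_mass_g ~ flipper_length_mm, data = penguins)
penguin_fit
summary(penguin_fit)

ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  geom_smooth(method = "lm")
```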

Lecture 6: Cluster Analysis

Cluster Analysis

Cluster analysis in R (k-means, hierarchical clustering)
Example code for performing k-means clustering using kmeans()
Example code for performing hierarchical clustering using hclust()
Example code for visualizing clusters using fviz_cluster()
Hands-on exercise: Perform k-means clustering and hierarchical clustering on a sample dataset

Cluster analysis in R

K-means clustering: partition data into k clusters based on similarity
Hierarchical clustering: build a hierarchy of clusters by merging or splitting existing clusters
kmeans() requires two arguments: x (the data) and centers (the number of clusters, or a set of initial cluster centers)

Example code for performing k-means and hierarchical clustering
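This sketch runs both methods on the four numeric iris measurements using base R. It makes three assumptions: the data are standardized first, k = 3 clusters are used, and Ward's linkage (ward.D2) is used for hclust(). These are common choices rather than requirements. fviz_cluster() comes from the factoextra package and appears only as a comment.

```r
# Standardize so each measurement contributes equally to distances
iris_num <- scale(iris[, 1:4])

# k-means: random starting centers, so set a seed and try several starts
set.seed(123)
km <- kmeans(iris_num, centers = 3, nstart = 25)
km$size                          # flowers per cluster
table(km$cluster, iris$Species)  # compare clusters with known species

# Hierarchical clustering: distances -> tree -> cut into k groups
d  <- dist(iris_num, method = "euclidean")
hc <- hclust(d, method = "ward.D2")
plot(hc, labels = FALSE, main = "Dendrogram of iris (Ward.D2)")
hc_groups <- cutree(hc, k = 3)
table(hc_groups, iris$Species)

# factoextra::fviz_cluster(km, data = iris_num) would plot the k-means result
```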

Additional Resources

R documentation: https://www.r-project.org/
R tutorials: https://www.datacamp.com/tutorial/r-tutorial
R cheat sheets: https://www.rstudio.com/online-learning/

General Tips

Use clear and concise language in your slides
Use images, diagrams, and charts to illustrate complex concepts
Use code examples to demonstrate how to perform tasks in R
Leave space for notes and comments
Use a consistent font and color scheme throughout the presentation
