Unlocking Statistics and Regression with R and RStudio Cloud
Motivation
Why should I learn statistics with R? - GP initiative
Statistics is used everywhere.
Research, M.Sc, Ph.D, Business Analytics
Dependency on SPSS, SAS, Stata, IBM, MATLAB, etc.
R is open source and runs on all platforms
Current AI wave and data science jobs
Deep Learning = Domain Knowledge + Statistics + Coding Skill
Machine Learning = Domain Knowledge + Statistics + Coding Skill
Regression vs. Business Predictive Analytics
Classification
Data Science Salary is quite high
Data Scientist
Data Analyst
Data Engineer
Need to help people learn statistics from a young age
Objectives
Assist and support all interested R learners
Provide technical assistance and services to them
Professional Analysis
Thesis, Dissertation and Business report writing and web publishing
Dashboard development
Teach statistics with R for data science careers
Machine Learning
Deep Learning
ChatGPT and Generative AI integration
Prerequisite - Software Installation
R
RStudio (Posit Cloud) IDE
Rtools (Github)
Quarto (Reporting)
Anaconda (Jupyter Notebook, JupyterLab, IPython)
Lecture 1: Introduction to R and Basic Coding
Introduction to R and Basic Coding
What is R? (brief overview of R and its uses)
Basic syntax and data types in R (vectors, variables, basic operators)
Example code for basic arithmetic operations
Hands-on exercise: Write a simple R script to perform basic arithmetic operations
Solutions to hands-on exercise
What is R?
R is a programming language and environment for statistical computing and graphics
R first appeared in 1993 as an implementation of the S language; version 1.0.0 was released in 2000
R runs on different platforms such as Windows, macOS, and Linux
R is widely used in academia, research, and industry for data analysis and visualization
R is free and open-source, Current version - 4.4.1
R vs. RStudio vs Positron
R is the main workhorse: the engine that makes the car run
RStudio is an Integrated Development Environment (IDE); it serves as the car body
Packages are the tools you work with, like a shovel, knife, or sachet, or the other parts of a car
There are 21,140 packages by now.
Top 10 packages are ‘ggplot2’, ‘rlang’, ‘magrittr’, ‘dplyr’, ‘vctrs’, ‘cli’, ‘tibble’, ‘devtools’, ‘jsonlite’, ‘Rcpp’
Their respective downloads are 147,070,480; 135,195,761; 125,445,993; 110,729,472; 98,242,310; 95,558,907; 93,010,991; 91,765,271; 91,013,906; 87,825,074
Positron is an IDE that supports both R and Python for data science projects
Motivation
Data scientist has been called the sexiest job of the 21st century.
R, Python, and Julia are top languages for data science projects
Salary range: 5,000 to 12,000 USD per month
What you need to know?
Machine Learning
Classification
Supervised
Unsupervised
Regression
Reinforcement Learning
Statistics
Data Engineering
Big Data management
Data manipulation
Basic syntax and data types in R
Scalars: only one value
Vectors: a collection of values of the same type
Variables: a name given to a value or a vector
Basic operators: `+`, `-`, `*`, `/`, `%%`, `>`, `>=`, `<`, `<=`, `!=`, etc.
Used in
Mathematical Calculation
Matrix (Linear Programming (lpSolve))
Statistics
Probability and Distribution
Text mining
Data Visualization
Spatial Data Analysis
Reporting
Dashboard
Business Analytics
Function
name_of_function <- function(){}
mean( ), median( ), mode( ) (returns the storage mode of an object, not the statistical mode), sd( ), summary( ), sample( )
cor( )
lm( ) # \(y = \beta + \alpha x\)
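A minimal sketch of defining and calling a function, alongside the built-in functions listed above. The helper name `stat_mode` and the example vectors are made up for illustration; they are not part of the slides.

```r
# Hypothetical helper (not from the slides): the statistical mode of a vector.
# Base R's mode() reports the storage mode instead.
stat_mode <- function(x) {
  counts <- table(x)                      # frequency of each distinct value
  names(counts)[counts == max(counts)]    # most frequent value(s), as character
}

x <- c(2, 4, 4, 5, 7, 9)
y <- c(1, 3, 4, 6, 8, 9)

mean(x)            # arithmetic mean
median(x)          # middle value
sd(x)              # sample standard deviation
stat_mode(x)       # most frequent value
mode(x)            # storage mode of x, not the statistical mode
cor(x, y)          # Pearson correlation between x and y
fit <- lm(y ~ x)   # simple linear regression of y on x
coef(fit)          # intercept and slope
```

The body of a function goes between the braces, and the value of the last expression is what the function returns.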
Basic Syntax
Example code for basic arithmetic operations
https://webr.r-wasm.org/latest/
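A short sketch of basic arithmetic that can be pasted into any R console, including the webR page linked above. The variable names and values are arbitrary.

```r
# Variables: names bound to values with the assignment operator <-
a <- 7          # a scalar (in R, a vector of length 1)
b <- 2

a + b           # addition
a - b           # subtraction
a * b           # multiplication
a / b           # division
a %% b          # remainder (modulo)
a %/% b         # integer division
a^b             # exponentiation

# Comparison operators return logical values
a > b
a != b

# Operators are vectorized: they apply element by element
v <- c(1, 2, 3, 4)
v * 2
v >= 3
```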
Lecture 2: Data Types and Structures
Data Types and Structures
Data types in R (numeric, character, logical, factor)
Data structures in R (vectors, matrices, data frames)
Example code for creating vectors, matrices, and data frames
Hands-on exercise: Create a data frame from vectors
Solutions to hands-on exercise
Data types in R
Numeric: 1, 2, 3, etc.
Character: “hello”, “world”, etc.
Logical: TRUE, FALSE, etc.
Factor: categorical data, e.g. “male”, “female”, etc.
Data types in details
String (character)
single quote ' ' or double quote " "
escape character "\"
read_csv("C:/Users/kyawmoeaung/Downloads/iris.csv") - Mac (forward slashes also work on Windows)
read_csv("C:\\Users\\kyawmoeaung\\Downloads\\iris.csv") - Windows (each backslash must be escaped as \\)
Numeric
double (floating point, decimal)
int (integer)
Factor
binary (yes, no; gender- male, female)
ordinal (0 - 100, 101 - 200, 201 - 300, 301 - 400)
nominal data - (eg. forest types, rice varieties, endangered species)
Logical
TRUE # must be in capital letters
FALSE
Data structures in R
Vectors: a collection of values of the same type
Matrices: a two-dimensional array of values of the same type
Data frames: a collection of equal-length vectors (columns) that can have different types
Lists: can store objects of any type
Vectors
Vectors can be created with the c() function. The number of elements, given by length(), is usually more than 1. The type can be verified with typeof() and class().
- numeric vector
- integer data
- logical vector
- factor
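The vector types above can be sketched as follows. The values are made up for illustration.

```r
num_vec <- c(1.5, 2, 3.25)                       # numeric (double) vector
int_vec <- c(1L, 2L, 3L)                         # integer vector (note the L suffix)
lgl_vec <- c(TRUE, FALSE, TRUE)                  # logical vector
chr_vec <- c("a", "b", "c")                      # character vector
fct_vec <- factor(c("male", "female", "male"))   # factor (categorical data)

length(num_vec)   # number of elements
typeof(num_vec)   # "double"
typeof(int_vec)   # "integer"
class(fct_vec)    # "factor"
typeof(fct_vec)   # "integer": factors are stored as integer codes
levels(fct_vec)   # the categories, sorted alphabetically
```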
Matrices
Can be created with matrix() function.
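A brief sketch of `matrix()`, using the integers 1 to 6 as placeholder data:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column by default
m
dim(m)                                 # number of rows and columns
t(m)                                   # transpose

m_by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill row by row instead
m_by_row[1, ]                          # first row
m_by_row[, 2]                          # second column
```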
DataFrame
The most commonly used data structure. Can be created with the data.frame() function.
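One way to approach the hands-on exercise is to build a data frame from vectors of equal length. The column names and values below are invented for illustration.

```r
# Three vectors of the same length but different types (made-up values)
name   <- c("Aye", "Ko", "Su")
height <- c(160.5, 172.0, 158.2)
member <- c(TRUE, FALSE, TRUE)

# Each vector becomes one column
df <- data.frame(name, height, member, stringsAsFactors = FALSE)

str(df)           # structure: column names and types
df$height         # access one column with $
nrow(df)          # number of rows
ncol(df)          # number of columns
mean(df$height)   # summary of a single column
```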
List
Stores all types of data: vectors, matrices, data frames, or any other objects. The `list()` function is used to create a list.
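A small sketch of a list holding objects of different kinds. The element names are arbitrary.

```r
my_list <- list(
  numbers = c(1, 2, 3),              # a numeric vector
  chars   = c("a", "b"),             # a character vector
  mat     = matrix(1:4, nrow = 2),   # a matrix
  df      = head(iris, 3)            # a data frame
)

names(my_list)       # element names
length(my_list)      # number of elements
my_list$numbers      # access an element by name
my_list[["mat"]]     # the same kind of access with double brackets
class(my_list$df)    # each element keeps its own class
```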
Converting from one type to another
Use the as.*() functions.
For vectors
as.numeric()
as.character()
as.factor()
For matrix and Data Frame
as.data.frame()
as_tibble()
as.matrix()
Checking data type/structure with is.*()
is.numeric() # checking double or integer type
is.vector() # vector data
is.matrix() # matrix
is.data.frame() # data.frame
is_tibble() # tibble
is.character() # character
is.na() # NA value
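A minimal sketch of converting and checking types with base R. `as_tibble()` and `is_tibble()` come from the tibble package and are left out here so that the example runs without extra packages.

```r
x_chr <- c("1", "2", "3.5")
x_num <- as.numeric(x_chr)       # character -> numeric
is.numeric(x_num)                # check the result
as.character(x_num)              # numeric -> character

grp <- as.factor(c("low", "high", "low"))   # character -> factor
is.factor(grp)

m <- as.matrix(head(mtcars[, 1:3]))   # data frame -> matrix
is.matrix(m)
d <- as.data.frame(m)                 # matrix -> data frame
is.data.frame(d)

# Values that cannot be converted become NA (R also raises a warning)
bad <- suppressWarnings(as.numeric(c("10", "ten")))
is.na(bad)
```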
Lecture 3: Importing and Exporting Data
Importing and Exporting Data
Importing data into R (e.g. CSV, Excel)
Example code for importing data using read.csv() and readxl
Exporting data from R (e.g. CSV, Excel)
Example code for exporting data using write.csv() and write_xlsx()
Hands-on exercise: Import and export a sample dataset
Importing data into R
Import files using RStudio IDE (GUI)
* Go to Environment Pane
* Click on 'Import Dataset'
* Choose Text, Excel, SPSS, Stata, SAS
* Choose file
* click import
Learn the code while importing
importing data
read_csv() # load the readr package
read_tsv() # load the readr package
read_excel() # load the readxl package
read_sav() # load the haven package
read_sas() # load the haven package
read_dta() # load the haven package
Exporting data
write_csv() # requires readr
write_tsv() # requires readr
write_xlsx() # requires the writexl package
write_sav() # requires haven
write_sas() # requires haven
write_dta() # requires haven
Example code for importing data
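A self-contained sketch of a CSV round trip using base R's `write.csv()` and `read.csv()`, which need no extra packages. The built-in `iris` data and a temporary file stand in for a real dataset and path; both are assumptions for illustration. With readr installed, `write_csv()` and `read_csv()` can be used in the same way.

```r
# Write to a temporary file so that nothing on disk is overwritten
path <- tempfile(fileext = ".csv")

# Export: row.names = FALSE avoids writing an extra column of row numbers
write.csv(iris, path, row.names = FALSE)

# Import: stringsAsFactors = TRUE turns the Species column back into a factor
iris_back <- read.csv(path, stringsAsFactors = TRUE)

str(iris_back)     # check column names and types after import
head(iris_back)    # look at the first rows
dim(iris_back)     # rows and columns
```

Inspecting the result with `str()` right after import is a quick way to catch columns that were read with an unexpected type.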
Lecture 4: Summary Statistics and Data Visualization
Summary Statistics and Data Visualization
Summary statistics in R (mean, median, mode, standard deviation, variance)
Example code for calculating summary statistics using summary()
Data visualization in R (histograms, box plots)
Example code for creating visualizations using ggplot2
Hands-on exercise: Create summary statistics and visualizations for a sample dataset
Summary statistics in R
The gapminder data contains world population, GDP per capita, and life expectancy data.
Life Expectancy, Population, GDP summary
| continent | mean_pop | mean_lifeExp | mean_gdp | sd_pop | sd_lifeExp | sd_gdp |
|---|---|---|---|---|---|---|
| Africa | 9916003 | 48.86533 | 2193.755 | 15490923 | 9.150210 | 9.150210 |
| Americas | 24504795 | 64.65874 | 7136.110 | 50979430 | 9.345088 | 9.345088 |
| Asia | 77038722 | 60.06490 | 7902.150 | 206885205 | 11.864532 | 11.864532 |
| Europe | 17169765 | 71.90369 | 14469.476 | 20519438 | 5.433178 | 5.433178 |
| Oceania | 8874672 | 74.32621 | 18621.609 | 6506342 | 3.795611 | 3.795611 |
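A sketch of how a per-continent summary like the one above could be computed, assuming the gapminder and dplyr packages are installed. In the table, the sd_gdp column shows the same values as sd_lifeExp; in this sketch sd_gdp is computed from gdpPercap, so that column will differ from the table.

```r
library(gapminder)   # assumed installed: provides the gapminder data
library(dplyr)       # assumed installed: group_by() and summarise()

continent_summary <- gapminder |>
  group_by(continent) |>
  summarise(
    mean_pop     = mean(pop),
    mean_lifeExp = mean(lifeExp),
    mean_gdp     = mean(gdpPercap),
    sd_pop       = sd(pop),
    sd_lifeExp   = sd(lifeExp),
    sd_gdp       = sd(gdpPercap)   # standard deviation of gdpPercap, not lifeExp
  )

continent_summary
```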
Summary statistics in R
summary(): calculates summary statistics for a dataset. The summary() function can be used on the whole dataset or on individual columns that we specify.
summary of gapminder data
| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|
| Afghanistan: 12 | Africa :624 | Min. :1952 | Min. :23.60 | Min. :6.001e+04 | Min. : 241.2 |
| Albania : 12 | Americas:300 | 1st Qu.:1966 | 1st Qu.:48.20 | 1st Qu.:2.794e+06 | 1st Qu.: 1202.1 |
| Algeria : 12 | Asia :396 | Median :1980 | Median :60.71 | Median :7.024e+06 | Median : 3531.8 |
| Angola : 12 | Europe :360 | Mean :1980 | Mean :59.47 | Mean :2.960e+07 | Mean : 7215.3 |
| Argentina : 12 | Oceania : 24 | 3rd Qu.:1993 | 3rd Qu.:70.85 | 3rd Qu.:1.959e+07 | 3rd Qu.: 9325.5 |
| Australia : 12 | NA | Max. :2007 | Max. :82.60 | Max. :1.319e+09 | Max. :113523.1 |
| (Other) :1632 | NA | NA | NA | NA | NA |
Example code for calculating summary statistics
summary(gapminder$pop)
summary(gapminder$gdpPercap)
summary(gapminder$lifeExp)
[1] “Summary of Pop”
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 60011 | 2793664 | 7023596 | 29601212 | 19585222 | 1318683096 |
[1] “Summary of lifeExp”
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 23.60 | 48.20 | 60.71 | 59.47 | 70.85 | 82.60 |
[1] “Summary of gdp”
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 241.2 | 1202.1 | 3531.8 | 7215.3 | 9325.5 | 113523.1 |
Visualization
Explore gapminder data
library(gapminder) # load the gapminder package
library(tidyverse) # load tidyverse
library(knitr) # load knitr
glimpse(gapminder) # data structure and data type exploration
plot(gapminder) # looking into correlated variables, factor and numeric variables
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
add geom_point() +
Because the scales of lifeExp and gdpPercap are very different, we need to transform the x values with a base-10 logarithm:
add scale_x_log10()
Gapminder data visualization
Life expectancy vs. gdpPercap (before scaling)
- add the scale_x_log10() function to put the x axis on a log scale.
Life expectancy vs. gdpPercap (after scaling)
Life expectancy vs. gdpPercap (after scaling and adding color)
Life expectancy vs. gdpPercap (after scaling, adding color and size)
Life expectancy vs. gdpPercap (after scaling, adding color, size, and shape)
Life expectancy vs. gdpPercap (after scaling, adding color and size), grouped by continent (only 1952 and 2007)
Life expectancy vs. gdpPercap (after scaling, adding color, and size mapped to life expectancy), grouped by continent (only 1952 and 2007)
Life expectancy vs. gdpPercap (after scaling, adding color, and size mapped to population), grouped by continent (only 1952 and 2007)
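The plots behind these slide titles are not reproduced here. The sketch below is one possible reading of the last variant, assuming the gapminder, ggplot2, and dplyr packages are installed. Mapping color to continent, size to population, and one panel per continent are assumptions, not the author's confirmed choices.

```r
library(gapminder)   # assumed installed
library(ggplot2)     # assumed installed
library(dplyr)       # assumed installed

# Keep only the two years named in the slide titles
gap_two_years <- gapminder |>
  filter(year %in% c(1952, 2007))

p <- ggplot(gap_two_years, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent, size = pop), alpha = 0.6) +   # color and size
  scale_x_log10() +                # log scale on the x axis only
  facet_wrap(~ continent) +        # one panel per continent (assumed grouping)
  labs(x = "GDP per capita (log10 scale)", y = "Life expectancy")

p   # printing the object draws the plot
```

Removing layers from the bottom up (the facet, then the size and color mappings, then `scale_x_log10()`) gives the simpler plots named in the earlier titles.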
Lecture 5: Correlation and Regression
Correlation and Regression
Correlation analysis in R (Pearson’s r, scatter plots)
Example code for calculating correlation using cor()
Simple linear regression in R
Example code for performing simple linear regression using lm()
Hands-on exercise: Perform correlation analysis and simple linear regression on a sample dataset
Correlation analysis in R
Correlation measures the strength of the relationship between two variables.
Correlation strength ranges from -1 to 1.
The SPSS Survival Manual by Julie Pallant discusses how to interpret these values. The scale below applies to the absolute value of the coefficient:
1 - perfect correlation
above 0.7 - very strong correlation
0.6 to 0.69 - strong correlation
0.4 to 0.59 - medium correlation
0.2 to 0.39 - weak correlation
0.1 to 0.19 - very weak correlation (effectively no correlation)
0 - no correlation
cor(): calculates the correlation between two variables
corrplot(): visualizes the correlation among variables; requires installing the corrplot package
Example code for calculating correlation
cor(data$x, data$y)
cor() requires two arguments for x and y variables. Each column can be accessed by data followed by $ and by column name.
[The correlation between carat and price is: 0.92159130119347]
We can conclude that there is a very strong correlation between price and carat in the diamonds dataset.
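A short sketch of the carat and price calculation, assuming the `diamonds` data from the ggplot2 package is the dataset meant here.

```r
library(ggplot2)   # assumed installed: provides the diamonds dataset

# Pearson correlation (the default method) between two columns
r <- cor(diamonds$carat, diamonds$price)
r
round(r, 2)

# A rank-based alternative that is less sensitive to outliers
cor(diamonds$carat, diamonds$price, method = "spearman")
```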
Correlation with cor() on palmerpenguins (rows with NA values must be removed first, e.g. with na.omit(); otherwise cor() returns NA)
cor(penguins[3:6]) |> knitr::kable()
| | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|---|---|---|---|---|
| bill_length_mm | 1.0000000 | -0.2286256 | 0.6530956 | 0.5894511 |
| bill_depth_mm | -0.2286256 | 1.0000000 | -0.5777917 | -0.4720157 |
| flipper_length_mm | 0.6530956 | -0.5777917 | 1.0000000 | 0.8729789 |
| body_mass_g | 0.5894511 | -0.4720157 | 0.8729789 | 1.0000000 |
round(cor(penguins[3:6]), 2) |> knitr::kable()
| | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|---|---|---|---|---|
| bill_length_mm | 1.00 | -0.23 | 0.65 | 0.59 |
| bill_depth_mm | -0.23 | 1.00 | -0.58 | -0.47 |
| flipper_length_mm | 0.65 | -0.58 | 1.00 | 0.87 |
| body_mass_g | 0.59 | -0.47 | 0.87 | 1.00 |
Correlation with corrplot()
The corrplot() function takes two main arguments: a correlation matrix of the variables (for example, the output of cor()) and a method.
corrplot(cor(data), method)
method can be: 1. “circle”, 2. “square”, 3. “ellipse”, 4. “number”, 5. “shade”, 6. “color”, 7. “pie”.
Example code with corrplot()
corrplot() on penguin dataset
corrplot() on iris dataset
Rows: 150
Columns: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
$ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
$ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
$ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
corrplot() on iris dataset
| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|---|---|---|---|---|
| Sepal.Length | 1.00 | -0.12 | 0.87 | 0.82 |
| Sepal.Width | -0.12 | 1.00 | -0.43 | -0.37 |
| Petal.Length | 0.87 | -0.43 | 1.00 | 0.96 |
| Petal.Width | 0.82 | -0.37 | 0.96 | 1.00 |
corrplot() on iris dataset
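A minimal sketch of a correlation plot for iris, assuming the corrplot package is installed. The choice of the "circle" and "number" methods is only an example.

```r
library(corrplot)   # assumed installed

# corrplot() expects a correlation matrix, so drop the Species factor
# and compute the correlations of the four numeric columns first
M <- cor(iris[, 1:4])
round(M, 2)

corrplot(M, method = "circle")   # circle size and color show strength and sign
corrplot(M, method = "number")   # print the coefficients instead
```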
Final exercise for correlation
Use the mtcars dataset with the cor() function and the corrplot() function.
Step 1. Look at your data.
Step 2. Remove NA values using the na.omit() function.
Step 3. Remove all character and factor columns.
Step 4. Search for correlation among variables.
Step 5. Show your corrplot().
Linear Regression - Introduction
Linear regression is widely used in research, academia, finance, banking, and business to predict variables.
There are many regression models and some of them are:
Simple Linear Regression Model
Multiple Linear Regression
Simple Non-Linear Regression
Multiple Non-Linear Regression
Linear Regression formula in R
lm(formula, data)
glm(formula, family, data) # for binary data
Linear Model
lm : linear model
formula : y ~ x
data : dataset
Generalized Linear Model
glm : Generalized Linear Model
formula : y ~ x
family : gaussian, binomial or poisson
data : dataset
Linear Model formula
lm(y ~ x, data = data)
Generalized Linear Model
glm(y ~ x, family = family, data = data)
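A small sketch of both calls on the built-in `mtcars` data. The choice of `mpg ~ wt` and `am ~ wt` is only for illustration.

```r
# Linear model: continuous response (miles per gallon) explained by weight
fit_lm <- lm(mpg ~ wt, data = mtcars)
coef(fit_lm)

# Generalized linear model: binary response (am is 0 or 1), binomial family
fit_glm <- glm(am ~ wt, family = binomial, data = mtcars)
coef(fit_glm)

# With the gaussian family, glm() fits the same model as lm()
fit_gauss <- glm(mpg ~ wt, family = gaussian, data = mtcars)
all.equal(coef(fit_lm), coef(fit_gauss))
```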
Some definitions
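coefficients: intercept or slope terms
residuals: actual - predicted
R-squared: model performance (share of the variance in the response explained by the model)
Adjusted R-squared: model performance, penalized for the number of predictors
RSE: Residual Standard Error
RMSE: Root Mean Squared Error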
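The definitions above can be checked by computing each quantity by hand. This sketch uses `lm(mpg ~ wt, data = mtcars)` as an example model and compares the results with what `summary()` reports.

```r
fit <- lm(mpg ~ wt, data = mtcars)

res <- mtcars$mpg - fitted(fit)                  # residuals: actual - predicted
rss <- sum(res^2)                                # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)    # total sum of squares

n <- nrow(mtcars)   # number of observations
p <- 1              # number of predictors (wt only)

r2     <- 1 - rss / tss                          # R-squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)   # adjusted R-squared
rse    <- sqrt(rss / (n - p - 1))                # residual standard error
rmse   <- sqrt(mean(res^2))                      # root mean squared error

c(r2 = r2, adj_r2 = adj_r2, rse = rse, rmse = rmse)

# The same values as reported by summary()
s <- summary(fit)
c(s$r.squared, s$adj.r.squared, s$sigma)
```

RSE divides by the residual degrees of freedom and RMSE by the number of observations, so RMSE is always slightly smaller on the same data.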
Steps to analyze Linear Model
load packages - tidyverse, modelr, broom, palmerpenguins
datasets: mpg, iris, gapminder, penguins, mtcars, fish
take a look at the structure of the data and identify the dependent variable (response variable) and the independent variable (explanatory variable)
explore the data with glimpse(), head(), tail(), summary(), plot() to identify data types and correlation patterns among variables.
start the lm() function, add the formula (response ~ explanatory variable), and add the dataset.
assign the model to a variable named fit or any name you like
run the lm() model
print with the summary() function.
Example code
mtcars data exploration
str(mtcars)
‘data.frame’: 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 …
$ cyl : num 6 6 4 6 8 6 8 4 4 6 …
$ disp: num 160 160 108 258 360 …
$ hp : num 110 110 93 110 175 105 245 62 95 123 …
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 …
$ wt : num 2.62 2.88 2.32 3.21 3.44 …
$ qsec: num 16.5 17 18.6 19.4 17 …
$ vs : num 0 0 1 1 0 1 0 1 1 1 …
$ am : num 1 1 1 0 0 0 0 0 0 0 …
$ gear: num 4 4 4 3 3 3 3 4 4 4 …
$ carb: num 4 4 1 1 2 1 4 2 2 4 …
mtcars data exploration
head(mtcars)
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
mtcars data exploration
tail(mtcars)
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.7 | 0 | 1 | 5 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.9 | 1 | 1 | 5 | 2 |
| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.5 | 0 | 1 | 5 | 4 |
| Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.5 | 0 | 1 | 5 | 6 |
| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.6 | 0 | 1 | 5 | 8 |
| Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.6 | 1 | 1 | 4 | 2 |
mtcars data exploration
summary(mtcars)
| mpg | cyl | disp | hp |
|---|---|---|---|
| Min. :10.40 | Min. :4.000 | Min. : 71.1 | Min. : 52.0 |
| 1st Qu.:15.43 | 1st Qu.:4.000 | 1st Qu.:120.8 | 1st Qu.: 96.5 |
| Median :19.20 | Median :6.000 | Median :196.3 | Median :123.0 |
| Mean :20.09 | Mean :6.188 | Mean :230.7 | Mean :146.7 |
| 3rd Qu.:22.80 | 3rd Qu.:8.000 | 3rd Qu.:326.0 | 3rd Qu.:180.0 |
| Max. :33.90 | Max. :8.000 | Max. :472.0 | Max. :335.0 |
| drat | wt | qsec | vs |
|---|---|---|---|
| Min. :2.760 | Min. :1.513 | Min. :14.50 | Min. :0.0000 |
| 1st Qu.:3.080 | 1st Qu.:2.581 | 1st Qu.:16.89 | 1st Qu.:0.0000 |
| Median :3.695 | Median :3.325 | Median :17.71 | Median :0.0000 |
| Mean :3.597 | Mean :3.217 | Mean :17.85 | Mean :0.4375 |
| 3rd Qu.:3.920 | 3rd Qu.:3.610 | 3rd Qu.:18.90 | 3rd Qu.:1.0000 |
| Max. :4.930 | Max. :5.424 | Max. :22.90 | Max. :1.0000 |
| am | gear | carb |
|---|---|---|
| Min. :0.0000 | Min. :3.000 | Min. :1.000 |
| 1st Qu.:0.0000 | 1st Qu.:3.000 | 1st Qu.:2.000 |
| Median :0.0000 | Median :4.000 | Median :2.000 |
| Mean :0.4062 | Mean :3.688 | Mean :2.812 |
| 3rd Qu.:1.0000 | 3rd Qu.:4.000 | 3rd Qu.:4.000 |
| Max. :1.0000 | Max. :5.000 | Max. :8.000 |
mtcars data exploration
plot(mtcars)
mpg: miles per gallon
cyl: number of cylinders
disp: engine displacement
hp: horsepower
drat: rear axle ratio
wt: weight (in 1,000 lb)
mtcars Simple Linear Regression
Call:
lm(formula = mpg ~ wt, data = mtcars)
Coefficients:
(Intercept) wt
37.285 -5.344
response = intercept + slope * explanatory variable
In mtcars, wt is measured in units of 1,000 lb.
A weight of 1 (1,000 lb) corresponds to 37.285 + (-5.344) * 1 = 31.941 miles per gallon.
A weight of 2 (2,000 lb) corresponds to 37.285 + (-5.344) * 2 = 26.597 miles per gallon.
A weight of 3 (3,000 lb) corresponds to 37.285 + (-5.344) * 3 = 21.253 miles per gallon.
A weight of 4 (4,000 lb) corresponds to 37.285 + (-5.344) * 4 = 15.909 miles per gallon.
A weight of 5 (5,000 lb) corresponds to 37.285 + (-5.344) * 5 = 10.565 miles per gallon.
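The hand calculations above can be reproduced with `predict()`. In this sketch the data frame name `new_cars` is arbitrary. The results can differ from the figures above in the third decimal place, because `predict()` uses the unrounded coefficients.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# New data must use the same column name as the explanatory variable
new_cars <- data.frame(wt = 1:5)   # weights in units of 1,000 lb

pred <- predict(fit, newdata = new_cars)
round(pred, 3)

# The same calculation by hand: intercept + slope * wt
coef(fit)[["(Intercept)"]] + coef(fit)[["wt"]] * new_cars$wt
```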
Interpretation and visualization
Interpretation and visualization (change color to local variable)
Interpretation and visualization (change color to local variable and method to ‘lm’)
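The plots for these slides are not reproduced here. The sketch below is one possible reading, assuming ggplot2 is installed: "local" is taken to mean that color is mapped inside `geom_point()` rather than in the top-level `ggplot()` call, and coloring by number of cylinders is an assumption.

```r
library(ggplot2)   # assumed installed

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # color is mapped locally, so it applies to the points only
  geom_point(aes(color = factor(cyl))) +
  # method = "lm" draws the fitted straight line with a confidence band
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Weight (1,000 lb)", y = "Miles per gallon", color = "Cylinders")

p   # printing the object draws the plot
```

Because the color mapping is local to the points, `geom_smooth()` fits a single line through all cars. Moving `color` into the top-level `aes()` would give one line per cylinder group instead.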
Interpreting model performance with broom package
install the broom and modelr packages
load packages
fit the model
use augment(), glance(), pull()
pull r.squared, adj.r.squared, sigma
Interpreting model performance with broom package
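A minimal sketch of the steps above, assuming the broom and dplyr packages are installed and using `lm(mpg ~ wt, data = mtcars)` as the example model. In the output of `glance()`, the adjusted R-squared column is named `adj.r.squared`.

```r
library(broom)   # assumed installed: augment(), glance()
library(dplyr)   # assumed installed: pull()

fit <- lm(mpg ~ wt, data = mtcars)

glance(fit)             # one-row summary of the whole model
head(augment(fit))      # per-observation fitted values (.fitted) and residuals (.resid)

# Pull single performance metrics out of the glance() table
glance(fit) |> pull(r.squared)
glance(fit) |> pull(adj.r.squared)
glance(fit) |> pull(sigma)          # residual standard error
```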
Linear Model exercise with palmerpenguins dataset
load required libraries
check the structure and data type of penguins dataset
identify two variables (response vs explanatory variables)
Remove NA values using the na.omit() function on penguins, overwriting the penguins dataset
write the formula lm(formula, data) and assign the result to a variable you like
print the fitted variable or the model
print summary() of the model
Interpret model performance and visualize the data
Lecture 6: Cluster Analysis
Cluster Analysis
Cluster analysis in R (k-means, hierarchical clustering)
Example code for performing k-means clustering using kmeans()
Example code for performing hierarchical clustering using hclust()
Example code for visualizing clusters using fviz_cluster()
Hands-on exercise: Perform k-means clustering and hierarchical clustering on a sample dataset
Cluster analysis in R
- K-means clustering: partition data into k clusters based on similarity
- Hierarchical clustering: build a hierarchy of clusters by merging or splitting existing clusters
kmeans() requires two arguments: data and centers
Example code for performing k-means and hierarchical clustering
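A sketch of both methods with base R only, using the four numeric columns of `iris` as a sample dataset. Choosing three clusters, standardizing with `scale()`, and complete linkage are assumptions made for illustration.

```r
# Clustering works on numeric data: drop Species and standardize the columns
x <- scale(iris[, 1:4])

# K-means: partition the rows into k = 3 clusters
set.seed(42)                                 # k-means starts from random centers
km <- kmeans(x, centers = 3, nstart = 25)    # nstart tries several random starts
table(km$cluster, iris$Species)              # compare clusters with the known species

# Hierarchical clustering: build a tree from pairwise distances
hc <- hclust(dist(x), method = "complete")
groups <- cutree(hc, k = 3)                  # cut the tree into 3 groups
table(groups, iris$Species)

plot(hc, labels = FALSE)                     # dendrogram

# With the factoextra package installed, the k-means result can be drawn with:
# factoextra::fviz_cluster(km, data = x)
```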
Additional Resources
- R documentation: https://www.r-project.org/
- R tutorials: https://www.datacamp.com/tutorial/r-tutorial
- R cheat sheets: https://www.rstudio.com/online-learning/
General Tips
- Use clear and concise language in your slides
- Use images, diagrams, and charts to illustrate complex concepts
- Use code examples to demonstrate how to perform tasks in R
- Leave space for notes and comments
- Use a consistent font and color scheme throughout the presentation