1. INTRODUCTION TO R

1.1 What is R?

R is a statistical computing environment and programming language used for:

Data entry
Data analysis
Graphs and Charts
Summary statistics
Reporting results

It is widely used in:

Research
Business
Economics
Health sciences
Social sciences

1.2 Why use R for Statistics Practical?

R helps students to:

Analyze real data
Produce tables and graphs
Compute statistics quickly
Understand statistical concepts practically
Build reproducible analyses

1.3 Installing R and RStudio

Students should install:

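R - Download from: https://cran.r-project.org
RStudio - Download from: https://posit.co/download/rstudio-desktop/

Install R first, then RStudio, because RStudio needs R to run code.

RStudio Interface

RStudio is where you write and run R code. Its main panes are:

Source (Scripts): Write and save R code here.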
Console: Execute commands interactively.
Environment: View variables, data frames, and functions in memory.
Plots: Display graphical outputs.

A package is a collection of functions designed to perform specific tasks. A package needs to be installed only once per machine.

For this course, two important packages are:

1. `ggplot2`

ggplot2 is used for drawing graphs and charts in a clear and attractive way.
It helps students create:

Bar charts
Histograms
Boxplots
Scatter plots
Line graphs

2. `dplyr`

dplyr is used for data manipulation and summarization.
It helps students to:

Select variables
Filter rows
Create new variables
Arrange data
Summarize data easily

Install the two packages (remove the leading # to run each line):

# install.packages("ggplot2")
# install.packages("dplyr")

Load packages at the start of every new session:

# library(ggplot2)
# library(dplyr)
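
The remaining examples in this guide use base R functions, so the following is only a minimal sketch of what the two packages look like in use. The data frame `demo` and the object names `by_gender` and `p` are made up for illustration, and the code is skipped if either package is not installed.

```r
# Hypothetical mini data set, used only for this illustration
demo <- data.frame(
  Gender = c("Female", "Male", "Female", "Male", "Female"),
  Score  = c(78, 65, 88, 70, 90)
)

# Run the package examples only if both packages are installed
if (requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {

  library(dplyr)
  library(ggplot2)

  # dplyr: keep scores of 70 or more, then average them by gender
  by_gender <- demo %>%
    filter(Score >= 70) %>%
    group_by(Gender) %>%
    summarise(MeanScore = mean(Score))
  print(by_gender)

  # ggplot2: bar chart of the mean score for each gender
  p <- ggplot(by_gender, aes(x = Gender, y = MeanScore)) +
    geom_col(fill = "skyblue") +
    labs(title = "Mean Score by Gender", x = "Gender", y = "Mean score")
  print(p)
}
```

The `%>%` operator passes the result of one step to the next, which lets a chain of dplyr steps be read from top to bottom.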

2. BASIC R TRAINING FOR BEGINNERS

2.1 R as a calculator

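R can perform simple arithmetic. Each line below stores its result in an object; type the object's name (for example, `a`) to display the value.

a = 2 + 3      # addition: 5

b = 10 - 4     # subtraction: 6

c = 6 * 5      # multiplication: 30

d = 20 / 4     # division: 5

e = 2^3        # exponentiation: 8

f = sqrt(49)   # square root: 7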
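
A few more calculator-style expressions may be useful at this stage. This is a small sketch showing the order of operations, integer division, remainders, and rounding; the numbers are arbitrary.

```r
# Multiplication and division are done before addition and subtraction
2 + 3 * 4          # 14
(2 + 3) * 4        # 20, brackets change the order

# Integer division and remainder
17 %/% 5           # 3
17 %% 5            # 2

# Rounding to a chosen number of decimal places
round(10 / 3, 2)   # 3.33
```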

2.2 Assigning values to objects

In R, we store values in objects using `<-`.

x <- 10
y <- 5
x + y

[1] 15

x * y

[1] 50

You can also use `=`, but `<-` is preferred.

z = 7
z

[1] 7

2.3 Creating vectors

A vector is an ordered collection of values of the same type, created with c().

scores <- c(56, 67, 45, 80, 72, 61, 59, 90)
scores

[1] 56 67 45 80 72 61 59 90

Useful commands:

length(scores)

[1] 8

sum(scores)

[1] 530

mean(scores)

[1] 66.25

min(scores)

[1] 45

max(scores)

[1] 90

sort(scores)

[1] 45 56 59 61 67 72 80 90
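
Individual values can be picked out of a vector with square brackets, and arithmetic is applied to every element at once. The sketch below reuses the `scores` vector from this section; the particular positions and the cut-off of 70 are chosen only for illustration.

```r
scores <- c(56, 67, 45, 80, 72, 61, 59, 90)

scores[1]            # first value: 56
scores[c(2, 4)]      # second and fourth values: 67 80
scores[scores > 70]  # values above 70: 80 72 90

# Arithmetic is applied to every element at once
bonus <- scores + 5          # adds 5 marks to every score
proportion <- scores / 100   # each score as a proportion of 100
```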

2.4 Types of data in R

R handles different kinds of data:

Numeric data

age <- c(18, 19, 20, 21)

Character data

names <- c("Ama", "Kojo", "Esi", "Yaw")

Logical data

passed <- c(TRUE, FALSE, TRUE, TRUE)

Factor (categorical data)

gender <- factor(c("Male", "Female", "Female", "Male"))
gender

[1] Male   Female Female Male  
Levels: Female Male

2.5 Creating a data frame

A data frame is like a spreadsheet table.

students <- data.frame(
  Name = c("Ama", "Kojo", "Esi", "Yaw", "Akosua"),
  Age = c(19, 20, 18, 21, 19),
  Gender = c("Female", "Male", "Female", "Male", "Female"),
  Score = c(78, 65, 88, 70, 90)
)

students

    Name Age Gender Score
1    Ama  19 Female    78
2   Kojo  20   Male    65
3    Esi  18 Female    88
4    Yaw  21   Male    70
5 Akosua  19 Female    90

Check the structure:

str(students)

'data.frame':   5 obs. of  4 variables:
 $ Name  : chr  "Ama" "Kojo" "Esi" "Yaw" ...
 $ Age   : num  19 20 18 21 19
 $ Gender: chr  "Female" "Male" "Female" "Male" ...
 $ Score : num  78 65 88 70 90

summary(students)

     Name                Age          Gender              Score     
 Length:5           Min.   :18.0   Length:5           Min.   :65.0  
 Class :character   1st Qu.:19.0   Class :character   1st Qu.:70.0  
 Mode  :character   Median :19.0   Mode  :character   Median :78.0  
                    Mean   :19.4                      Mean   :78.2  
                    3rd Qu.:20.0                      3rd Qu.:88.0  
                    Max.   :21.0                      Max.   :90.0

2.6 Viewing parts of data

students$Score

[1] 78 65 88 70 90

students$Name

[1] "Ama"    "Kojo"   "Esi"    "Yaw"    "Akosua"

students[1, ]      # first row

  Name Age Gender Score
1  Ama  19 Female    78

students[, 2]      # second column

[1] 19 20 18 21 19

students[1:3, ]    # first three rows

  Name Age Gender Score
1  Ama  19 Female    78
2 Kojo  20   Male    65
3  Esi  18 Female    88
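
Rows can also be selected by a condition rather than by position. The following sketch rebuilds the `students` data frame from section 2.5 so that it runs on its own; the object names `high`, `female_names`, and `ranked`, and the cut-off of 75, are illustrative assumptions.

```r
students <- data.frame(
  Name = c("Ama", "Kojo", "Esi", "Yaw", "Akosua"),
  Age = c(19, 20, 18, 21, 19),
  Gender = c("Female", "Male", "Female", "Male", "Female"),
  Score = c(78, 65, 88, 70, 90)
)

# Rows where the score is above 75
high <- students[students$Score > 75, ]
high

# Names of the female students only
female_names <- students$Name[students$Gender == "Female"]
female_names

# Two columns selected by name
students[, c("Name", "Score")]

# Rows sorted from the highest to the lowest score
ranked <- students[order(students$Score, decreasing = TRUE), ]
ranked
```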

3. DATA COLLECTION AND SURVEYS

3.1 Meaning of data

Data are facts, figures, or observations collected for analysis.

Examples:

Age of students
Test scores
Gender
Height
Income
Marital status

3.2 Sources of data

Primary data: collected directly by the researcher
Secondary data: obtained from existing sources

3.3 Methods of data collection

Questionnaires
Interviews
Observation
Experiments
Administrative records

3.4 Surveys

A survey is a method of collecting information from individuals.

Types of surveys

Census
Sample survey

Sampling methods

Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling
Convenience sampling
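
Three of the sampling methods above can be sketched in R with the base function sample(). The population of 50 ID numbers, the seed, the sample sizes, and the small sampling frame are all hypothetical values chosen for illustration.

```r
set.seed(123)                 # arbitrary seed, so the draw can be repeated

population <- 1:50            # hypothetical student ID numbers

# Simple random sampling: 10 IDs drawn without replacement
srs <- sample(population, size = 10)
srs

# Systematic sampling: random start, then every 5th ID
k <- 5
start <- sample(1:k, 1)
systematic <- seq(from = start, to = length(population), by = k)
systematic

# Stratified sampling: 2 students drawn at random from each gender
frame <- data.frame(
  ID = 1:10,
  Gender = rep(c("Female", "Male"), each = 5)
)
stratified <- unlist(lapply(split(frame$ID, frame$Gender), sample, size = 2))
stratified
```

Changing or removing the set.seed() line gives a different random sample each time the code is run.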

3.5 Entering survey data into R

Example survey data:

survey <- data.frame(
  Gender = c("Male", "Female", "Female", "Male", "Male", "Female", "Male", "Female", "Female", "Male"),
  Age = c(20, 19, 21, 22, 20, 18, 23, 19, 20, 21),
  StudyHours = c(3, 4, 5, 2, 6, 4, 3, 5, 4, 2),
  Satisfaction = c("High", "Medium", "High", "Low", "High", "Medium", "Low", "High", "Medium", "Low")
)

survey

   Gender Age StudyHours Satisfaction
1    Male  20          3         High
2  Female  19          4       Medium
3  Female  21          5         High
4    Male  22          2          Low
5    Male  20          6         High
6  Female  18          4       Medium
7    Male  23          3          Low
8  Female  19          5         High
9  Female  20          4       Medium
10   Male  21          2          Low

Check summary:

summary(survey)

    Gender               Age          StudyHours   Satisfaction      
 Length:10          Min.   :18.00   Min.   :2.00   Length:10         
 Class :character   1st Qu.:19.25   1st Qu.:3.00   Class :character  
 Mode  :character   Median :20.00   Median :4.00   Mode  :character  
                    Mean   :20.30   Mean   :3.80                     
                    3rd Qu.:21.00   3rd Qu.:4.75                     
                    Max.   :23.00   Max.   :6.00

4. DATA PRESENTATION

4.1 Types of data

1. Qualitative (Categorical) data

These describe qualities or categories.

Examples:

Gender
Religion
Marital status
Blood group

2. Quantitative (Numerical) data

These are numbers.

(a) Discrete data

Countable values

Examples:

Number of children
Number of cars
Number of students

(b) Continuous data

Measured values

Examples:

Height
Weight
Temperature
Time

4.2 Frequency distributions and tabulation

Example 1: Frequency table for categorical data

gender <- c("Male", "Female", "Female", "Male", "Male", "Female", "Male", "Female")
table(gender)

gender
Female   Male 
     4      4

Relative frequency:

prop.table(table(gender))

gender
Female   Male 
   0.5    0.5

Percentage frequency:

prop.table(table(gender)) * 100

gender
Female   Male 
    50     50

Example 2: Frequency table for numerical data

scores <- c(45, 50, 55, 60, 60, 65, 70, 70, 70, 75, 80, 85, 90)

table(scores)

scores
45 50 55 60 65 70 75 80 85 90 
 1  1  1  2  1  3  1  1  1  1

Grouped frequency distribution

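Suppose we want class intervals of width 10. With `right = FALSE`, each class includes its lower limit and excludes its upper limit, so the breaks must extend to 100 for the mark 90 to be counted.

marks <- c(34, 45, 56, 67, 78, 89, 90, 43, 55, 61, 73, 84, 39, 48, 52)

classes <- cut(marks,
               breaks = c(30, 40, 50, 60, 70, 80, 90, 100),
               right = FALSE)

table(classes)

classes
 [30,40)  [40,50)  [50,60)  [60,70)  [70,80)  [80,90) [90,100) 
       2        3        3        2        2        2        1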
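
A grouped frequency table is often presented with cumulative and percentage frequencies as well. The sketch below builds such a table for the same marks; the breaks are assumed to run to 100 so that every mark falls into a class, and the name `freq_table` and its column names are illustrative.

```r
marks <- c(34, 45, 56, 67, 78, 89, 90, 43, 55, 61, 73, 84, 39, 48, 52)

# Breaks run to 100 here (an assumption) so that the mark 90 has a class
classes <- cut(marks, breaks = seq(30, 100, by = 10), right = FALSE)

freq <- table(classes)

freq_table <- data.frame(
  Class = names(freq),
  Frequency = as.vector(freq),
  CumulativeFrequency = cumsum(as.vector(freq)),
  Percent = round(as.vector(freq) / length(marks) * 100, 1)
)
freq_table
```

The last cumulative frequency should equal the number of observations, which is a quick check that no value was left out of the classes.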

4.3 Cross-tabulation

Used to summarize two categorical variables.

table(survey$Gender, survey$Satisfaction)

        
         High Low Medium
  Female    2   0      3
  Male      2   3      0

Add proportions:

prop.table(table(survey$Gender, survey$Satisfaction))

        
         High Low Medium
  Female  0.2 0.0    0.3
  Male    0.2 0.3    0.0

Row proportions:

prop.table(table(survey$Gender, survey$Satisfaction), 1)

        
         High Low Medium
  Female  0.4 0.0    0.6
  Male    0.4 0.6    0.0

Column proportions:

prop.table(table(survey$Gender, survey$Satisfaction), 2)

        
         High Low Medium
  Female  0.5 0.0    1.0
  Male    0.5 1.0    0.0
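
In the tables above the satisfaction categories appear in alphabetical order (High, Low, Medium). One way to show them in their natural order is to turn the variable into a factor with explicit levels. The sketch below re-enters the two survey columns as plain vectors so that it runs on its own; the name `tab` is illustrative.

```r
satisfaction <- c("High", "Medium", "High", "Low", "High",
                  "Medium", "Low", "High", "Medium", "Low")
gender <- c("Male", "Female", "Female", "Male", "Male",
            "Female", "Male", "Female", "Female", "Male")

# Put the categories in their natural order instead of alphabetical order
satisfaction <- factor(satisfaction, levels = c("Low", "Medium", "High"))

tab <- table(gender, satisfaction)
tab

# Add row and column totals
addmargins(tab)
```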

5. GRAPHICAL REPRESENTATION OF DATA

5.1 Bar chart

Suitable for categorical data.

gender_tab <- table(survey$Gender)
barplot(gender_tab,
        main = "Bar Chart of Gender",
        xlab = "Gender",
        ylab = "Frequency",
        col = c("skyblue", "pink"))

5.2 Pie chart

pie(gender_tab,
    main = "Pie Chart of Gender",
    col = c("skyblue", "pink"))

5.3 Histogram

Suitable for continuous numerical data.

hist(survey$Age,
     main = "Histogram of Age",
     xlab = "Age",
     col = "lightgreen",
     border = "black")

5.4 Frequency polygon

hist(survey$Age, plot = FALSE)

$breaks
[1] 18 19 20 21 22 23

$counts
[1] 3 3 2 1 1

$density
[1] 0.3 0.3 0.2 0.1 0.1

$mids
[1] 18.5 19.5 20.5 21.5 22.5

$xname
[1] "survey$Age"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

h <- hist(survey$Age, plot = FALSE)
plot(h$mids, h$counts, type = "b",
     main = "Frequency Polygon of Age",
     xlab = "Age",
     ylab = "Frequency",
     col = "blue")

5.5 Boxplot

Useful for showing spread and outliers.

boxplot(survey$StudyHours,
        main = "Boxplot of Study Hours",
        ylab = "Hours",
        col = "orange")

Compare groups:

boxplot(Score ~ Gender, data = students,
        main = "Scores by Gender",
        xlab = "Gender",
        ylab = "Score",
        col = c("pink", "lightblue"))

5.6 Stem-and-leaf plot

stem(scores)


  The decimal point is 1 digit(s) to the right of the |

  4 | 5
  5 | 05
  6 | 005
  7 | 0005
  8 | 05
  9 | 0

5.7 Scatter plot

For two numerical variables.

plot(survey$Age, survey$StudyHours,
     main = "Scatter Plot of Age and Study Hours",
     xlab = "Age",
     ylab = "Study Hours",
     pch = 19,
     col = "red")

6. PRINCIPLES OF EFFECTIVE DATA VISUALIZATION

Students should learn the following:

Give every graph a clear title
Label axes properly
Use appropriate graph for the data type
Avoid too many colors
Keep graphs simple and readable
Show units where necessary
Avoid misleading scales
Use legends when needed

Example of a well-labeled plot

hist(marks,
     main = "Distribution of Students' Marks",
     xlab = "Marks",
     ylab = "Frequency",
     col = "lightblue",
     border = "black")

7. MEASURES OF CENTRAL TENDENCY

Measures of central tendency describe the center of the data.

7.1 Mean

The arithmetic average.

scores <- c(56, 67, 45, 80, 72, 61, 59, 90)
mean(scores)

[1] 66.25

7.2 Median

The middle value after arranging data.

scores <- c(56, 67, 45, 80, 72, 61, 59, 90)
median(scores)

[1] 64

7.3 Mode

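R has no built-in function for the statistical mode (the built-in mode() function reports how an object is stored), so we define our own. If several values are tied, this function returns the one that appears first.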

get_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

data1 <- c(2, 4, 4, 5, 6, 7, 4, 8)
get_mode(data1)

[1] 4
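
Some data sets have more than one mode. A possible variation, sketched here under the hypothetical name `get_all_modes`, returns every value that shares the highest frequency. If every value occurs only once, it returns all of them, which signals that there is no meaningful mode.

```r
get_all_modes <- function(x) {
  ux <- unique(x)                       # distinct values, in order of appearance
  counts <- tabulate(match(x, ux))      # how often each distinct value occurs
  ux[counts == max(counts)]             # keep every value with the top count
}

get_all_modes(c(2, 4, 4, 5, 6, 7, 4, 8))   # 4
get_all_modes(c(1, 1, 2, 2, 3))            # 1 2 (two modes)
get_all_modes(c("Tea", "Coffee", "Tea"))   # "Tea"
```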

7.4 Comparing mean, median, and mode

data2 <- c(10, 12, 14, 15, 15, 16, 18, 100)

mean(data2)

[1] 25

median(data2)

[1] 15

get_mode(data2)

[1] 15

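The single outlier (100) pulls the mean up to 25, while the median and mode stay at 15. The median is therefore the better summary of a typical value here.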

8. MEASURES OF DISPERSION

These describe how spread out the data are.

8.1 Range

range(scores)

[1] 45 90

max(scores) - min(scores)

[1] 45

8.2 Variance

var(scores)

[1] 203.3571

8.3 Standard deviation

sd(scores)

[1] 14.26033
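
To see what var() and sd() compute, the same results can be reproduced step by step. This sketch uses the sample formulas (sum of squared deviations divided by n - 1); the names `deviations`, `variance_by_hand`, and `sd_by_hand` are illustrative.

```r
scores <- c(56, 67, 45, 80, 72, 61, 59, 90)

n <- length(scores)
deviations <- scores - mean(scores)   # distance of each score from the mean

# Sample variance: sum of squared deviations divided by n - 1
variance_by_hand <- sum(deviations^2) / (n - 1)

# Standard deviation: square root of the variance
sd_by_hand <- sqrt(variance_by_hand)

variance_by_hand   # matches var(scores)
sd_by_hand         # matches sd(scores)
```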

8.4 Interquartile range (IQR)

IQR(scores)

[1] 15.75
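
The IQR is also the basis of a common rule of thumb for flagging outliers: values more than 1.5 times the IQR below the first quartile or above the third quartile. The sketch below applies this rule to the `data2` values from section 7.4; the names `lower_fence`, `upper_fence`, and `outliers` are illustrative, and the rule is a convention rather than a strict test.

```r
data2 <- c(10, 12, 14, 15, 15, 16, 18, 100)

q1 <- quantile(data2, 0.25)   # first quartile
q3 <- quantile(data2, 0.75)   # third quartile
iqr <- IQR(data2)

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values outside the fences are flagged as possible outliers
outliers <- data2[data2 < lower_fence | data2 > upper_fence]
outliers
```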

8.5 Five-number summary

scores <- c(56, 67, 45, 80, 72, 61, 59, 90)
fivenum(scores)

[1] 45.0 57.5 64.0 76.0 90.0

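summary() also reports the mean, and it computes the quartiles by a different method from fivenum(), so the two sets of quartiles can differ slightly:

summary(scores)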

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  45.00   58.25   64.00   66.25   74.00   90.00

9. SHAPE OF DISTRIBUTIONS

The shape of a distribution can be:

symmetric
positively skewed (right-skewed)
negatively skewed (left-skewed)

9.1 Visual inspection using histogram

hist(data2,
     main = "Histogram of Data2",
     xlab = "Values",
     col = "purple")

Because of the value 100, the distribution is positively skewed.

9.2 Skewness idea using mean and median

If mean > median, distribution may be right-skewed
If mean < median, distribution may be left-skewed
If mean ≈ median, distribution may be symmetric

Example:

mean(data2)

[1] 25

median(data2)

[1] 15
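
The comparison of mean and median can be turned into a number. The sketch below computes two standard skewness measures for `data2` without any extra package: Pearson's second coefficient, 3 * (mean - median) / sd, and the moment coefficient of skewness. The object names are illustrative. A positive value points to right skew and a negative value to left skew.

```r
data2 <- c(10, 12, 14, 15, 15, 16, 18, 100)

# Pearson's second coefficient of skewness
pearson_skew <- 3 * (mean(data2) - median(data2)) / sd(data2)

# Moment coefficient: third central moment divided by the
# second central moment raised to the power 3/2
n  <- length(data2)
m  <- mean(data2)
m2 <- sum((data2 - m)^2) / n
m3 <- sum((data2 - m)^3) / n
moment_skew <- m3 / m2^(3 / 2)

pearson_skew > 0   # TRUE, consistent with right skew
moment_skew > 0    # TRUE, consistent with right skew
```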

10. APPLICATIONS IN SUMMARIZING REAL DATA

We now analyze a small realistic dataset.

class_data <- data.frame(
  Student = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  Gender = c("Male", "Female", "Female", "Male", "Female", "Male", "Male", "Female", "Female", "Male"),
  Age = c(18, 19, 18, 20, 21, 19, 20, 18, 19, 21),
  Score = c(65, 78, 82, 55, 91, 73, 68, 84, 79, 60)
)

class_data

   Student Gender Age Score
1        A   Male  18    65
2        B Female  19    78
3        C Female  18    82
4        D   Male  20    55
5        E Female  21    91
6        F   Male  19    73
7        G   Male  20    68
8        H Female  18    84
9        I Female  19    79
10       J   Male  21    60

10.1 Summary statistics

summary(class_data)

   Student             Gender               Age            Score      
 Length:10          Length:10          Min.   :18.00   Min.   :55.00  
 Class :character   Class :character   1st Qu.:18.25   1st Qu.:65.75  
 Mode  :character   Mode  :character   Median :19.00   Median :75.50  
                                       Mean   :19.30   Mean   :73.50  
                                       3rd Qu.:20.00   3rd Qu.:81.25  
                                       Max.   :21.00   Max.   :91.00

mean(class_data$Score)

[1] 73.5

median(class_data$Score)

[1] 75.5

sd(class_data$Score)

[1] 11.38469

var(class_data$Score)

[1] 129.6111

IQR(class_data$Score)

[1] 15.5

10.2 Frequency table for gender

table(class_data$Gender)


Female   Male 
     5      5

prop.table(table(class_data$Gender)) * 100


Female   Male 
    50     50

10.3 Graphs

Bar chart of gender

barplot(table(class_data$Gender),
        main = "Gender Distribution",
        xlab = "Gender",
        ylab = "Frequency",
        col = c("pink", "lightblue"))

Histogram of scores

hist(class_data$Score,
     main = "Distribution of Scores",
     xlab = "Score",
     col = "gold",
     border = "black")

Boxplot of scores

boxplot(class_data$Score,
        main = "Boxplot of Scores",
        ylab = "Score",
        col = "cyan")

10.4 Group comparison

Compare scores by gender:

aggregate(Score ~ Gender, data = class_data, mean)

  Gender Score
1 Female  82.8
2   Male  64.2

aggregate(Score ~ Gender, data = class_data, median)

  Gender Score
1 Female    82
2   Male    65

aggregate(Score ~ Gender, data = class_data, sd)

  Gender    Score
1 Female 5.167204
2   Male 6.978539

Boxplot by gender:

boxplot(Score ~ Gender, data = class_data,
        main = "Scores by Gender",
        xlab = "Gender",
        ylab = "Score",
        col = c("pink", "lightblue"))

11. MISSING VALUES IN R

Sometimes data contain missing values written as `NA`.

x <- c(12, 15, NA, 20, 18)

mean(x)

[1] NA

mean(x, na.rm = TRUE)

[1] 16.25

sum(x, na.rm = TRUE)

[1] 65

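Important: functions such as mean(), sum(), median(), and sd() return `NA` if any value is missing. Use `na.rm = TRUE` to leave the missing values out of the calculation.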
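
Before removing missing values it helps to know how many there are and where they sit. A minimal sketch using base R functions follows; the data frame `df` and the names `x_complete` and `complete_rows` are hypothetical.

```r
x <- c(12, 15, NA, 20, 18)

is.na(x)           # FALSE FALSE TRUE FALSE FALSE
sum(is.na(x))      # 1 missing value
which(is.na(x))    # position 3

# Keep only the observed values
x_complete <- x[!is.na(x)]
x_complete

# In a data frame, complete.cases() flags rows with no missing values
df <- data.frame(Age = c(19, NA, 21), Score = c(78, 65, NA))
complete_rows <- df[complete.cases(df), ]
complete_rows
```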

12. SIMPLE CLASS PRACTICAL EXERCISES

Exercise 1: Create a vector

Create a vector containing the ages: 18, 19, 20, 18, 21, 22, 19, 20

Tasks:

find the mean
find the median
find the range
find the standard deviation

ages <- c(18, 19, 20, 18, 21, 22, 19, 20)

mean(ages)

[1] 19.625

median(ages)

[1] 19.5

range(ages)

[1] 18 22

sd(ages)

[1] 1.407886

Exercise 2: Frequency table

Use the following data on preferred drink:

Tea, Coffee, Tea, Juice, Coffee, Tea, Juice, Tea

drink <- c("Tea", "Coffee", "Tea", "Juice", "Coffee", "Tea", "Juice", "Tea")

table(drink)

drink
Coffee  Juice    Tea 
     2      2      4

prop.table(table(drink)) * 100

drink
Coffee  Juice    Tea 
    25     25     50

barplot(table(drink), col = c("brown", "orange", "green"))

pie(table(drink), col = c("brown", "orange", "green"))

Exercise 3: Histogram and boxplot

heights <- c(150, 155, 160, 162, 158, 170, 172, 168, 165, 159)

hist(heights,
     main = "Histogram of Heights",
     xlab = "Height (cm)",
     col = "lightblue")

boxplot(heights,
        main = "Boxplot of Heights",
        ylab = "Height (cm)",
        col = "lightgreen")

Exercise 4: Data frame practice

mydata <- data.frame(
  Name = c("John", "Mary", "Peter", "Linda"),
  Age = c(20, 21, 19, 22),
  Score = c(75, 88, 67, 90)
)

mydata

   Name Age Score
1  John  20    75
2  Mary  21    88
3 Peter  19    67
4 Linda  22    90

summary(mydata)

     Name                Age            Score     
 Length:4           Min.   :19.00   Min.   :67.0  
 Class :character   1st Qu.:19.75   1st Qu.:73.0  
 Mode  :character   Median :20.50   Median :81.5  
                    Mean   :20.50   Mean   :80.0  
                    3rd Qu.:21.25   3rd Qu.:88.5  
                    Max.   :22.00   Max.   :90.0

mean(mydata$Score)

[1] 80

13. INTRODUCTION TO SOME USEFUL R COMMANDS

help(mean)    # open the help page for mean()

starting httpd help server ... done

?mean         # shortcut for help(mean)
ls()          # list objects in memory

 [1] "a"          "age"        "ages"       "b"          "c"         
 [6] "class_data" "classes"    "d"          "data1"      "data2"     
[11] "drink"      "e"          "f"          "gender"     "gender_tab"
[16] "get_mode"   "h"          "heights"    "marks"      "mydata"    
[21] "names"      "passed"     "scores"     "students"   "survey"    
[26] "x"          "y"          "z"

rm(x)         # remove object x
rm(list = ls())   # remove all objects

Get working directory:

## getwd()

Set working directory:

## setwd("C:/Users/YourName/Documents")

14. IMPORTING DATA FROM CSV

If students have data in Excel, save it as CSV first.

# data <- read.csv("students.csv")
# head(data)
# str(data)
# summary(data)
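
The import step can be tried without an existing file. The sketch below first writes a small, made-up data frame to a temporary CSV file and then reads it back with read.csv(); the names `csv_path` and `imported` are illustrative. With a real file, replace `csv_path` with the file's name or path.

```r
# Hypothetical data, written to a temporary file so the example runs on its own
students <- data.frame(
  Name = c("Ama", "Kojo", "Esi"),
  Score = c(78, 65, 88)
)

csv_path <- tempfile(fileext = ".csv")
write.csv(students, csv_path, row.names = FALSE)

# Read the file back, as you would with your own CSV file
imported <- read.csv(csv_path)

head(imported)   # first rows
str(imported)    # column types
```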

15. SIMPLE TEACHING FLOW FOR BEGINNERS

A practical training can follow this order:

Week 1: Introduction to R

opening RStudio
arithmetic operations
assigning objects
creating vectors

Week 2: Data types and data frames

numeric and categorical data
factors
data frames
summary()

Week 3: Frequency tables and tabulation

table()
prop.table()
cross-tabulation

Week 4: Graphs

bar charts
pie charts
histograms
boxplots
scatter plots

Week 5: Descriptive statistics

mean
median
mode
range
variance
standard deviation
IQR

Week 6: Real data applications

importing CSV files
summarizing data
interpreting results

16. SAMPLE INTERPRETATION OF RESULTS

Suppose:

scores <- c(56, 67, 45, 80, 72, 61, 59, 90)
mean(scores)

[1] 66.25

median(scores)

[1] 64

sd(scores)

[1] 14.26033

Possible interpretation:

The mean score represents the average performance of students.
The median score gives the middle score and is less affected by extreme values.
The standard deviation shows how much student scores vary around the mean.
A small standard deviation means scores are close together.
A large standard deviation means scores are widely spread.

17. VERY IMPORTANT BEGINNER TIPS

R is case-sensitive: `score` and `Score` are different objects.
Always match opening and closing parentheses.
Check the spelling of object names.
Use str() and summary() to inspect data.
Save your script regularly.
Comment your code with #.

Example:

# Calculate mean score
mean(scores)

[1] 66.25

18. A COMPLETE BEGINNER EXAMPLE IN R

Students can copy and run this full example:

# Create dataset
students <- data.frame(
  Name = c("Ama", "Kojo", "Esi", "Yaw", "Akosua", "Kofi", "Abena", "Kwame"),
  Gender = c("Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male"),
  Age = c(19, 20, 18, 21, 19, 22, 20, 21),
  Score = c(78, 65, 88, 70, 90, 55, 84, 73)
)

# View data
students

    Name Gender Age Score
1    Ama Female  19    78
2   Kojo   Male  20    65
3    Esi Female  18    88
4    Yaw   Male  21    70
5 Akosua Female  19    90
6   Kofi   Male  22    55
7  Abena Female  20    84
8  Kwame   Male  21    73

# Structure and summary
str(students)

'data.frame':   8 obs. of  4 variables:
 $ Name  : chr  "Ama" "Kojo" "Esi" "Yaw" ...
 $ Gender: chr  "Female" "Male" "Female" "Male" ...
 $ Age   : num  19 20 18 21 19 22 20 21
 $ Score : num  78 65 88 70 90 55 84 73

summary(students)

     Name              Gender               Age         Score      
 Length:8           Length:8           Min.   :18   Min.   :55.00  
 Class :character   Class :character   1st Qu.:19   1st Qu.:68.75  
 Mode  :character   Mode  :character   Median :20   Median :75.50  
                                       Mean   :20   Mean   :75.38  
                                       3rd Qu.:21   3rd Qu.:85.00  
                                       Max.   :22   Max.   :90.00

# Frequency table for gender
table(students$Gender)


Female   Male 
     4      4

prop.table(table(students$Gender)) * 100


Female   Male 
    50     50

# Measures of central tendency for scores
mean(students$Score)

[1] 75.375

median(students$Score)

[1] 75.5

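# Mode function (returns the first value if several are tied)
get_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
get_mode(students$Score)

[1] 78

# Every score occurs exactly once here, so there is no true mode;
# 78 is simply the first value in the data.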

# Measures of dispersion
range(students$Score)

[1] 55 90

sd(students$Score)

[1] 12.02304

var(students$Score)

[1] 144.5536

IQR(students$Score)

[1] 16.25

# Graphs
barplot(table(students$Gender),
        main = "Gender Distribution",
        col = c("pink", "lightblue"))

hist(students$Score,
     main = "Histogram of Scores",
     xlab = "Score",
     col = "lightgreen")

boxplot(students$Score,
        main = "Boxplot of Scores",
        ylab = "Score",
        col = "orange")

boxplot(Score ~ Gender, data = students,
        main = "Scores by Gender",
        xlab = "Gender",
        ylab = "Score",
        col = c("pink", "lightblue"))

19. SUGGESTED PRACTICAL QUESTIONS FOR LEARNERS

Enter a dataset of 10 students with variables:

Name
Age
Gender
Test score

Produce:

frequency table for gender
bar chart for gender
histogram for test scores
boxplot for test scores

Calculate:

mean
median
mode
range
variance
standard deviation
IQR

Interpret the results.

20. CONCLUSION

Using R in introductory statistics practicals helps students move from theory to practice. With simple commands, they can:

Organize data
Summarize data
Visualize data
Interpret statistical measures

For beginners, start with:

Vectors
Data frames
Tables
Charts
Mean, Median, and Standard deviation
