1. install and load necessary packages
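A minimal sketch of this step, assuming the session relies on `ggplot2` and `dplyr` (both show up in the package messages later on). The CRAN mirror URL is an assumption; the guard only installs packages that are not already available, so nothing that is already in use gets reinstalled.

```r
# Packages assumed for this walkthrough; adjust the vector as needed.
pkgs <- c("ggplot2", "dplyr")

# Check which packages are already installed.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!is_installed]

# Install only the missing ones (mirror URL is an assumption).
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Attach every package to the session.
for (pkg in pkgs) {
  library(pkg, character.only = TRUE)
}
```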

setting a working directory

import data into R
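The two steps above (setting the working directory, then importing) might look like the sketch below. The file name `survey_example.csv` and its values are purely illustrative, and a temporary folder stands in for a real project directory so the example runs anywhere.

```r
# Use a temporary folder as the working directory so the example is
# self-contained; in practice this would be your project folder.
old_wd <- setwd(tempdir())
getwd()

# Write a tiny CSV to import. File name and values are illustrative only.
writeLines(
  c("gender,income", "Female,52000", "Male,48000", "Female,61000"),
  "survey_example.csv"
)

# Import the file; the path is resolved relative to the working directory.
survey <- read.csv("survey_example.csv", stringsAsFactors = FALSE)
head(survey)
str(survey)

# Restore the previous working directory.
setwd(old_wd)
```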

creating basic plot

#facet make use of the ~ operator


## NULL
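As a rough illustration of faceting with the `~` operator: the data frame below is hypothetical (the `gender` and `income` column names are taken from the plot titles in this session, the values are invented), and `facet_wrap(~ gender)` draws one panel per value of `gender`.

```r
library(ggplot2)

# Hypothetical survey data; values are invented for illustration.
survey <- data.frame(
  gender = c("Female", "Male", "Female", "Male", "Female", "Male"),
  income = c(52000, 48000, 61000, 39000, 45000, 57000)
)

# `~ gender` is a one-sided formula: one panel per value of `gender`.
p_facet <- ggplot(survey, aes(x = income)) +
  geom_histogram(binwidth = 10000, fill = "steelblue", colour = "white") +
  facet_wrap(~ gender)

p_facet
```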

BAR PLOT OF GENDER

#histogram of income

#boxplot of ‘income’

#COMBINING PLOT
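One possible way to combine the bar plot, histogram and boxplot on a single page is `grid.arrange()` from the `gridExtra` package. That package choice is an assumption (the original may have used a different helper), and the data are the same kind of hypothetical `gender`/`income` values.

```r
library(ggplot2)
library(gridExtra)  # assumed helper package for arranging plots

# Hypothetical survey data; values are invented for illustration.
survey <- data.frame(
  gender = c("Female", "Male", "Female", "Male", "Female", "Male"),
  income = c(52000, 48000, 61000, 39000, 45000, 57000)
)

# Three individual plots.
p_bar <- ggplot(survey, aes(x = gender)) +
  geom_bar(fill = "darkorange")
p_hist <- ggplot(survey, aes(x = income)) +
  geom_histogram(binwidth = 10000, fill = "steelblue", colour = "white")
p_box <- ggplot(survey, aes(x = gender, y = income)) +
  geom_boxplot()

# Draw them side by side; the arranged layout is returned invisibly.
combined <- grid.arrange(p_bar, p_hist, p_box, ncol = 3)
```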

another session

## Loading required package: NLP
## 
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
## 
##     annotate
## Loading required package: RColorBrewer
##     Region Year Month     Age Intervention_Type Total Positive Incidence_Rate
## 1 Low Risk 2022   Sep 5 to 14      Vaccinations    25        1          0.038
## 2  Coastal 2022   May 5 to 14      Vaccinations   380      158          0.416
## 3 Highland 2022   Jul 5 to 14          Bed nets   271      103          0.379
## 4 Low Risk 2022   Dec 5 to 14      Vaccinations    26        1          0.037
## 5 Seasonal 2022   Jul 5 to 14              None   275      124          0.450
## 6 Highland 2022   Apr 5 to 14              None    60       27          0.442
##    Latitude Longitude period
## 1  1.398151  38.45110      1
## 2 -3.858256  40.21352      2
## 3  0.539970  35.55555      3
## 4  1.355270  38.20233      4
## 5  0.451452  36.96706      5
## 6  0.498972  35.98203      6
##  [1] "Region"            "Year"              "Month"            
##  [4] "Age"               "Intervention_Type" "Total"            
##  [7] "Positive"          "Incidence_Rate"    "Latitude"         
## [10] "Longitude"         "period"
## 'data.frame':    500 obs. of  11 variables:
##  $ Region           : chr  "Low Risk" "Coastal" "Highland" "Low Risk" ...
##  $ Year             : int  2022 2022 2022 2022 2022 2022 2022 2022 2022 2022 ...
##  $ Month            : chr  "Sep" "May" "Jul" "Dec" ...
##  $ Age              : chr  "5 to 14" "5 to 14" "5 to 14" "5 to 14" ...
##  $ Intervention_Type: chr  "Vaccinations" "Vaccinations" "Bed nets" "Vaccinations" ...
##  $ Total            : int  25 380 271 26 275 60 420 106 14 452 ...
##  $ Positive         : int  1 158 103 1 124 27 95 28 0 197 ...
##  $ Incidence_Rate   : num  0.038 0.416 0.379 0.037 0.45 0.442 0.226 0.265 0.029 0.436 ...
##  $ Latitude         : num  1.398 -3.858 0.54 1.355 0.451 ...
##  $ Longitude        : num  38.5 40.2 35.6 38.2 37 ...
##  $ period           : int  1 2 3 4 5 6 7 8 9 10 ...
##     Region               Year         Month               Age           
##  Length:500         Min.   :2022   Length:500         Length:500        
##  Class :character   1st Qu.:2022   Class :character   Class :character  
##  Mode  :character   Median :2023   Mode  :character   Mode  :character  
##                     Mean   :2023                                        
##                     3rd Qu.:2025                                        
##                     Max.   :2025                                        
##  Intervention_Type      Total           Positive      Incidence_Rate  
##  Length:500         Min.   : 10.00   Min.   :  0.00   Min.   :0.0100  
##  Class :character   1st Qu.: 91.75   1st Qu.: 17.00   1st Qu.:0.1015  
##  Mode  :character   Median :212.50   Median : 48.00   Median :0.2285  
##                     Mean   :225.94   Mean   : 61.44   Mean   :0.2331  
##                     3rd Qu.:357.50   3rd Qu.: 95.25   3rd Qu.:0.3700  
##                     Max.   :499.00   Max.   :226.00   Max.   :0.4980  
##     Latitude         Longitude         period      
##  Min.   :-4.4967   Min.   :34.03   Min.   : 1.000  
##  1st Qu.:-0.3005   1st Qu.:35.68   1st Qu.: 3.000  
##  Median : 0.3778   Median :36.41   Median : 6.000  
##  Mean   :-0.3074   Mean   :36.97   Mean   : 6.468  
##  3rd Qu.: 0.7547   3rd Qu.:38.26   3rd Qu.: 9.000  
##  Max.   : 1.9860   Max.   :40.50   Max.   :12.000
## Warning: package 'ggplot2' is in use and will not be installed
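The output above comes from the usual inspection calls (`head()`, `names()`, `str()`, `summary()`). Since the import path is not shown, the sketch below rebuilds only the six rows printed by `head()`; the object name `incidence` is hypothetical.

```r
# The six rows printed by head(), re-entered by hand.
# `incidence` is a hypothetical object name.
incidence <- data.frame(
  Region = c("Low Risk", "Coastal", "Highland", "Low Risk", "Seasonal", "Highland"),
  Year = rep(2022L, 6),
  Month = c("Sep", "May", "Jul", "Dec", "Jul", "Apr"),
  Age = rep("5 to 14", 6),
  Intervention_Type = c("Vaccinations", "Vaccinations", "Bed nets",
                        "Vaccinations", "None", "None"),
  Total = c(25L, 380L, 271L, 26L, 275L, 60L),
  Positive = c(1L, 158L, 103L, 1L, 124L, 27L),
  Incidence_Rate = c(0.038, 0.416, 0.379, 0.037, 0.450, 0.442),
  Latitude = c(1.398151, -3.858256, 0.539970, 1.355270, 0.451452, 0.498972),
  Longitude = c(38.45110, 40.21352, 35.55555, 38.20233, 36.96706, 35.98203),
  period = 1:6
)

head(incidence)     # first rows
names(incidence)    # column names
str(incidence)      # column types
summary(incidence)  # per-column summaries
```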

#data visualisation

#change the colour of the point to my choice

#colour point in the plot by region
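The difference between a fixed colour and a colour mapped to a variable can be sketched as follows. Plotting `Total` against `Positive` is an assumption (the original axes are not shown), and only the six rows printed by `head()` are used.

```r
library(ggplot2)

# Rows from the head() output; `incidence` is a hypothetical object name.
incidence <- data.frame(
  Region = c("Low Risk", "Coastal", "Highland", "Low Risk", "Seasonal", "Highland"),
  Total = c(25, 380, 271, 26, 275, 60),
  Positive = c(1, 158, 103, 1, 124, 27)
)

# Fixed colour: set outside aes(), applies to every point.
p_fixed <- ggplot(incidence, aes(x = Total, y = Positive)) +
  geom_point(colour = "darkred", size = 3)

# Mapped colour: set inside aes(), one colour per region plus a legend.
p_region <- ggplot(incidence, aes(x = Total, y = Positive, colour = Region)) +
  geom_point(size = 3)

p_fixed
p_region
```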

use geom_histogram

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#change the colour
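The `stat_bin()` message appears whenever no bin width is given. A small sketch that sets `binwidth` explicitly and changes the bar colours; the choice of `Incidence_Rate` as the variable and the width of 0.05 are assumptions.

```r
library(ggplot2)

# Incidence rates from the head() output; `incidence` is a hypothetical name.
incidence <- data.frame(
  Incidence_Rate = c(0.038, 0.416, 0.379, 0.037, 0.450, 0.442)
)

# An explicit binwidth silences the "bins = 30" message;
# `fill` colours the bars and `colour` their outlines.
p_hist <- ggplot(incidence, aes(x = Incidence_Rate)) +
  geom_histogram(binwidth = 0.05, fill = "steelblue", colour = "white")

p_hist
```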

use geom_bar

#changing colour
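A bar chart with `geom_bar()` counts rows per category. The sketch below counts records per `Region` (an assumed choice of variable) using the rows printed by `head()`, first with a single fill colour and then with one colour per region.

```r
library(ggplot2)

# Regions from the head() output; `incidence` is a hypothetical name.
incidence <- data.frame(
  Region = c("Low Risk", "Coastal", "Highland", "Low Risk", "Seasonal", "Highland")
)

# geom_bar() counts the rows in each category; one fill colour for all bars.
p_bar <- ggplot(incidence, aes(x = Region)) +
  geom_bar(fill = "forestgreen")

# Mapping fill to Region gives each bar its own colour.
p_bar_region <- ggplot(incidence, aes(x = Region, fill = Region)) +
  geom_bar()

p_bar
p_bar_region
```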

#adding titles and labels for better readability

#using geom_boxplot
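Titles and axis labels are added with `labs()`. A sketch with a boxplot of `Incidence_Rate` by `Region`; the variable choice and the label texts are assumptions, and with only the six `head()` rows each box is based on one or two values.

```r
library(ggplot2)

# Rows from the head() output; `incidence` is a hypothetical name.
incidence <- data.frame(
  Region = c("Low Risk", "Coastal", "Highland", "Low Risk", "Seasonal", "Highland"),
  Incidence_Rate = c(0.038, 0.416, 0.379, 0.037, 0.450, 0.442)
)

# labs() sets the title and axis labels; the texts here are illustrative.
p_box <- ggplot(incidence, aes(x = Region, y = Incidence_Rate, fill = Region)) +
  geom_boxplot() +
  labs(
    title = "Incidence rate by region",
    x = "Region",
    y = "Incidence rate"
  )

p_box
```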

#BOX PLOT

#FACET

##  [1] "Region"            "Year"              "Month"            
##  [4] "Age"               "Intervention_Type" "Total"            
##  [7] "Positive"          "Incidence_Rate"    "Latitude"         
## [10] "Longitude"         "period"

##facet

## Saving 7 x 5 in image
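The "Saving 7 x 5 in image" message is what `ggsave()` prints when no size is given. A sketch that facets by `Intervention_Type` and saves with an explicit size; the plotted variables, the file name and the PDF format are assumptions, and the file goes to a temporary folder.

```r
library(ggplot2)

# Rows from the head() output; `incidence` is a hypothetical name.
incidence <- data.frame(
  Intervention_Type = c("Vaccinations", "Vaccinations", "Bed nets",
                        "Vaccinations", "None", "None"),
  Total = c(25, 380, 271, 26, 275, 60),
  Positive = c(1, 158, 103, 1, 124, 27)
)

# One panel per intervention type.
p_facet <- ggplot(incidence, aes(x = Total, y = Positive)) +
  geom_point(size = 3) +
  facet_wrap(~ Intervention_Type)

# The file extension selects the output device; width and height are in inches.
out_file <- file.path(tempdir(), "facet_plot.pdf")
ggsave(out_file, plot = p_facet, width = 7, height = 5)
```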

#violin
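A violin plot shows the shape of a distribution rather than just its quartiles. The sketch below uses `Incidence_Rate` grouped by `Age` (an assumed choice); the six `head()` rows all share one age group, so it draws a single violin, whereas the full data would give one per group.

```r
library(ggplot2)

# Rows from the head() output; `incidence` is a hypothetical name.
incidence <- data.frame(
  Age = rep("5 to 14", 6),
  Incidence_Rate = c(0.038, 0.416, 0.379, 0.037, 0.450, 0.442)
)

# geom_violin() draws a mirrored density; the points show the raw values.
p_violin <- ggplot(incidence, aes(x = Age, y = Incidence_Rate)) +
  geom_violin(fill = "lightblue") +
  geom_point(size = 2) +
  labs(x = "Age group", y = "Incidence rate")

p_violin
```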

#add dynamic animation to your visualisation

#load required libraries

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

#animate with gganimate

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#add animation by gganimate

#save or display animation

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
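A minimal sketch of the animation steps, assuming `gganimate` is installed. It animates a scatter plot over `period` rather than the histogram used above, the output file name is hypothetical, and rendering is limited to interactive sessions because it can be slow.

```r
library(ggplot2)
library(gganimate)

# Rows from the head() output; `incidence` is a hypothetical name.
incidence <- data.frame(
  Region = c("Low Risk", "Coastal", "Highland", "Low Risk", "Seasonal", "Highland"),
  Total = c(25, 380, 271, 26, 275, 60),
  Positive = c(1, 158, 103, 1, 124, 27),
  period = 1:6
)

# Static plot; {closest_state} is filled in by transition_states().
p <- ggplot(incidence, aes(x = Total, y = Positive, colour = Region)) +
  geom_point(size = 4) +
  labs(title = "Period: {closest_state}")

# Add the animation layer: one state per value of `period`.
anim <- p + transition_states(period, transition_length = 2, state_length = 1)

# Render, display and save only when run interactively.
if (interactive()) {
  rendered <- animate(anim, nframes = 60, fps = 10)
  print(rendered)
  anim_save(file.path(tempdir(), "incidence_animation.gif"), animation = rendered)
}
```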

Load necessary libraries

Sample text data

Step 1: Create a Corpus

Step 2: Preprocess the text

## Warning in tm_map.SimpleCorpus(corpus, content_transformer(tolower)):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(corpus, removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(corpus, removeNumbers): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(corpus, removeWords, stopwords("en")):
## transformation drops documents

Step 3: Create a Term-Document Matrix

Step 4: Calculate word frequencies
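Steps 1 to 4 can be sketched as below, assuming the `tm` package. The sample sentences are invented for illustration. `VCorpus()` is used here instead of `Corpus()`; this is a commonly suggested way to avoid the "transformation drops documents" warnings shown above, offered here as an assumption rather than as the original approach.

```r
library(tm)

# Hypothetical sample text.
text <- c(
  "Data visualisation in R makes data easier to read.",
  "ggplot2 builds plots from data in layers.",
  "Word clouds show frequent words in larger text."
)

# Step 1: create a corpus (VCorpus keeps each document as a full object).
corpus <- VCorpus(VectorSource(text))

# Step 2: preprocess the text.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, stripWhitespace)

# Step 3: create a term-document matrix (terms in rows, documents in columns).
tdm <- TermDocumentMatrix(corpus)

# Step 4: total frequency of each term across documents, most frequent first.
word_freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
head(word_freq)
```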

Step 5: Generate a Word Cloud

Optional: Visualize word frequencies as a bar plot
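Step 5 and the optional bar plot might look like this, assuming the `wordcloud`, `RColorBrewer` and `ggplot2` packages. The frequency vector is a hypothetical stand-in for the result of Step 4, and the palette is an arbitrary choice.

```r
library(wordcloud)
library(ggplot2)

# Hypothetical word frequencies standing in for the Step 4 result.
word_freq <- c(data = 3, plots = 2, words = 2, text = 1, layers = 1, clouds = 1)

# Step 5: word cloud; the seed makes the layout reproducible.
set.seed(123)
wordcloud(
  words = names(word_freq),
  freq = word_freq,
  min.freq = 1,
  random.order = FALSE,
  colors = RColorBrewer::brewer.pal(8, "Dark2")
)

# Optional: the same frequencies as a horizontal bar plot.
freq_df <- data.frame(word = names(word_freq), freq = as.numeric(word_freq))
p_words <- ggplot(freq_df, aes(x = reorder(word, freq), y = freq)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(x = "Word", y = "Frequency")

p_words
```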