Analysis Of Dummy Variables

A generalized linear model analysis was conducted on dummy data describing participants' responses to a drug.

Importing the dummy data from a CSV file and reading it into R

library(readr)
Dummy_2_ <- read_csv("Dummy (2).csv")

## New names:
## • `` -> `...1`
## Rows: 15 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): dose
## dbl (5): ...1, person, symptoms, dummy1, dummy2
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(Dummy_2_) #Opens a spreadsheet-style viewer (interactive sessions only)

The data set has six columns: person, dose, symptoms, dummy1 and dummy2, plus an unnamed row-index column that read_csv() renames to ...1.

Before exploring the data, we load the packages used in the rest of the analysis.

library(ggplot2) #Enhances plots
library(granova)

## Loading required package: car

## Loading required package: carData

library(Rcmdr) #Launches the R Commander GUI in interactive sessions; close its window to continue

## Loading required package: splines

## Loading required package: RcmdrMisc

## Loading required package: sandwich

## Loading required package: effects

## lattice theme set by effectsTheme()
## See ?effectsTheme for details.

## The Commander GUI is launched only in interactive sessions

## 
## Attaching package: 'Rcmdr'

## The following object is masked from 'package:base':
## 
##     errorCondition

library(car) #Used for Levene's test
library(pastecs) #stat.desc() for descriptive statistics
library(multcomp) #Used for post hoc test

## Loading required package: mvtnorm

## Loading required package: survival

## Loading required package: TH.data

## Loading required package: MASS

## 
## Attaching package: 'TH.data'

## The following object is masked from 'package:MASS':
## 
##     geyser

library(compute.es) #Used for effect size
library(WRS2) #Robust statistical tests
library(multcompView)
library(fastDummies)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:MASS':
## 
##     select

## The following objects are masked from 'package:pastecs':
## 
##     first, last

## The following object is masked from 'package:car':
## 
##     recode

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(flextable)

We can now take a first look at the data set.

head(Dummy_2_) #Shows the first six rows of our dataset

## # A tibble: 6 × 6
##    ...1 person dose     symptoms dummy1 dummy2
##   <dbl>  <dbl> <chr>       <dbl>  <dbl>  <dbl>
## 1     1      1 Placebo         3      0      0
## 2     2      2 Placebo         2      0      0
## 3     3      3 Placebo         1      0      0
## 4     4      4 Placebo         1      0      0
## 5     5      5 Placebo         4      0      0
## 6     6      6 Low Dose        5      0      1

tail(Dummy_2_) #Shows the last six rows of our dataset

## # A tibble: 6 × 6
##    ...1 person dose      symptoms dummy1 dummy2
##   <dbl>  <dbl> <chr>        <dbl>  <dbl>  <dbl>
## 1    10     10 Low Dose         3      0      1
## 2    11     11 High Dose        7      1      0
## 3    12     12 High Dose        4      1      0
## 4    13     13 High Dose        5      1      0
## 5    14     14 High Dose        3      1      0
## 6    15     15 High Dose        6      1      0

dim(Dummy_2_) #Tells the total number of rows and columns

## [1] 15  6

summary(Dummy_2_) #Summary statistics for every column

##       ...1          person         dose              symptoms    
##  Min.   : 1.0   Min.   : 1.0   Length:15          Min.   :1.000  
##  1st Qu.: 4.5   1st Qu.: 4.5   Class :character   1st Qu.:2.000  
##  Median : 8.0   Median : 8.0   Mode  :character   Median :3.000  
##  Mean   : 8.0   Mean   : 8.0                      Mean   :3.467  
##  3rd Qu.:11.5   3rd Qu.:11.5                      3rd Qu.:4.500  
##  Max.   :15.0   Max.   :15.0                      Max.   :7.000  
##      dummy1           dummy2      
##  Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000  
##  Mean   :0.3333   Mean   :0.3333  
##  3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000

str(Dummy_2_) #Type and first values of each column

## spc_tbl_ [15 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ ...1    : num [1:15] 1 2 3 4 5 6 7 8 9 10 ...
##  $ person  : num [1:15] 1 2 3 4 5 6 7 8 9 10 ...
##  $ dose    : chr [1:15] "Placebo" "Placebo" "Placebo" "Placebo" ...
##  $ symptoms: num [1:15] 3 2 1 1 4 5 2 4 2 3 ...
##  $ dummy1  : num [1:15] 0 0 0 0 0 0 0 0 0 0 ...
##  $ dummy2  : num [1:15] 0 0 0 0 0 1 1 1 1 1 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ...1 = col_double(),
##   ..   person = col_double(),
##   ..   dose = col_character(),
##   ..   symptoms = col_double(),
##   ..   dummy1 = col_double(),
##   ..   dummy2 = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

Simple ANOVA

Suppose we want to test whether a new drug is effective. Participants are divided into three groups: one receives a placebo (sugar pill), one a low dose of the drug and one a high dose.
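Stated formally, with μ denoting the mean symptom score of each group, the one-way ANOVA tests the hypotheses below. The second line is a sketch of the equivalent regression on the data set's dummy columns (dummy1 = 1 for High Dose, dummy2 = 1 for Low Dose); the β notation is introduced here for illustration and is not part of the original analysis. Under this coding, H0 holds exactly when β1 = β2 = 0.

```latex
H_0:\ \mu_{\text{Placebo}} = \mu_{\text{Low}} = \mu_{\text{High}}
\qquad
H_1:\ \text{at least one group mean differs}

\text{symptoms}_i = \beta_0 + \beta_1\,\text{dummy1}_i + \beta_2\,\text{dummy2}_i + \varepsilon_i,
\quad
\beta_0 = \mu_{\text{Placebo}},\;
\beta_1 = \mu_{\text{High}} - \mu_{\text{Placebo}},\;
\beta_2 = \mu_{\text{Low}} - \mu_{\text{Placebo}}
```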

We first re-create the data in R, entering the person numbers, symptom scores and dose groups as vectors:

person<-1:15
person

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

symptoms<-c(3,2,1,1,4,5,2,4,2,3,7,4,5,3,6)
symptoms

##  [1] 3 2 1 1 4 5 2 4 2 3 7 4 5 3 6

dose<-gl(3,5, labels = c("Placebo","Low dose","High dose")) #Factor with 3 levels, 5 observations each
dose

##  [1] Placebo   Placebo   Placebo   Placebo   Placebo   Low dose  Low dose 
##  [8] Low dose  Low dose  Low dose  High dose High dose High dose High dose
## [15] High dose
## Levels: Placebo Low dose High dose
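gl(n, k, labels = ...) is a compact way to build a factor with n levels repeated k times each. The sketch below (the names dose_gl and dose_rep are only for illustration) checks that it matches the more explicit rep() plus factor() construction. The order of the labels also fixes the first level, Placebo, which lm() and aov() use as the baseline group.

```r
# gl(n, k, labels): n levels, each repeated k times, in label order
dose_gl <- gl(3, 5, labels = c("Placebo", "Low dose", "High dose"))

# The same factor written out explicitly with rep() and factor()
dose_rep <- factor(rep(c("Placebo", "Low dose", "High dose"), each = 5),
                   levels = c("Placebo", "Low dose", "High dose"))

identical(dose_gl, dose_rep)  # same integer codes and same level order
levels(dose_gl)               # the first level is the baseline in lm()/aov()
```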

Because gl() already returns a factor, calling factor() on dose leaves it unchanged and confirms its three levels:

factor(dose)

##  [1] Placebo   Placebo   Placebo   Placebo   Placebo   Low dose  Low dose 
##  [8] Low dose  Low dose  Low dose  High dose High dose High dose High dose
## [15] High dose
## Levels: Placebo Low dose High dose

Next, we combine the three vectors into a data frame:

DrugData<-data.frame(person,dose,symptoms)
DrugData

##    person      dose symptoms
## 1       1   Placebo        3
## 2       2   Placebo        2
## 3       3   Placebo        1
## 4       4   Placebo        1
## 5       5   Placebo        4
## 6       6  Low dose        5
## 7       7  Low dose        2
## 8       8  Low dose        4
## 9       9  Low dose        2
## 10     10  Low dose        3
## 11     11 High dose        7
## 12     12 High dose        4
## 13     13 High dose        5
## 14     14 High dose        3
## 15     15 High dose        6

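To use the categorical variable dose in a regression model, we represent it with dummy variables. With three dose groups, two dummies are enough: in the data file, dummy1 = 1 for High Dose and dummy2 = 1 for Low Dose, so Placebo (both dummies 0) is the baseline. Adding a third dummy for Placebo would make the dummies perfectly collinear with the intercept, so we keep only two.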

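Collinearity occurs when two or more independent variables in a regression model are highly correlated. In the extreme case, one variable is an exact linear combination of the others and the regression coefficients cannot be estimated uniquely.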
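The sketch below re-creates the data from the vectors used in this section and shows three things: hand-made dummies coded like dummy1 and dummy2 in the CSV, the regression on those dummies compared with R's own factor coding, and the dummy-variable trap. The names dummy0, X, dummy_fit, factor_fit and trap_fit are hypothetical and used only for illustration.

```r
# Data as entered in the document
symptoms <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)
dose <- gl(3, 5, labels = c("Placebo", "Low dose", "High dose"))

# Two dummies with Placebo as the baseline (same coding as dummy1/dummy2 in the CSV)
dummy1 <- as.numeric(dose == "High dose")
dummy2 <- as.numeric(dose == "Low dose")

# model.matrix() builds the same treatment-coded columns from the factor
X <- model.matrix(~ dose)
all(X[, "doseHigh dose"] == dummy1) && all(X[, "doseLow dose"] == dummy2)

# Regression on the dummies: intercept = placebo mean,
# slopes = difference of each dose group's mean from placebo
dummy_fit <- lm(symptoms ~ dummy1 + dummy2)
coef(dummy_fit)

# Same model using the factor directly, plus the overall ANOVA F-test
factor_fit <- lm(symptoms ~ dose)
anova(factor_fit)

# Dummy-variable trap: dummy0 + dummy1 + dummy2 = 1, a copy of the intercept
dummy0 <- as.numeric(dose == "Placebo")
trap_fit <- lm(symptoms ~ dummy0 + dummy1 + dummy2)
coef(trap_fit)  # the aliased dummy2 coefficient is reported as NA
```

With Placebo as the baseline, the coefficients are the placebo mean (2.2) and the differences of the High and Low dose means from it (2.8 and 1.0), consistent with the group means in the descriptive statistics. In trap_fit, lm() detects that dummy2 is an exact linear combination of the intercept and the other two dummies and reports its coefficient as NA instead of failing.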

A line graph of the group means, with bootstrapped 95% confidence intervals, gives a first look at how symptoms change with dose:

# Mean symptoms per dose group with bootstrapped 95% CIs
# (mean_cl_boot needs the Hmisc package installed; linewidth replaces the deprecated size for lines)
line <- ggplot(DrugData, aes(dose, symptoms))
line +
  stat_summary(fun = mean, geom = "line", linewidth = 1, aes(group = 1), colour = "#FF6633") +
  stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2, linewidth = 0.75, colour = "#990000") +
  stat_summary(fun = mean, geom = "point", size = 4, colour = "#990000") +
  stat_summary(fun = mean, geom = "point", size = 3, colour = "#FF6633") +
  labs(x = "Dose of Drug", y = "Mean symptoms")

Descriptive statistics for each dose group

by(DrugData$symptoms, DrugData$dose, stat.desc)

## DrugData$dose: Placebo
##      nbr.val     nbr.null       nbr.na          min          max        range 
##    5.0000000    0.0000000    0.0000000    1.0000000    4.0000000    3.0000000 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
##   11.0000000    2.0000000    2.2000000    0.5830952    1.6189318    1.7000000 
##      std.dev     coef.var 
##    1.3038405    0.5926548 
## ------------------------------------------------------------ 
## DrugData$dose: Low dose
##      nbr.val     nbr.null       nbr.na          min          max        range 
##    5.0000000    0.0000000    0.0000000    2.0000000    5.0000000    3.0000000 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
##   16.0000000    3.0000000    3.2000000    0.5830952    1.6189318    1.7000000 
##      std.dev     coef.var 
##    1.3038405    0.4074502 
## ------------------------------------------------------------ 
## DrugData$dose: High dose
##      nbr.val     nbr.null       nbr.na          min          max        range 
##    5.0000000    0.0000000    0.0000000    3.0000000    7.0000000    4.0000000 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
##   25.0000000    5.0000000    5.0000000    0.7071068    1.9632432    2.5000000 
##      std.dev     coef.var 
##    1.5811388    0.3162278
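stat.desc() comes from the pastecs package. As a dependency-free cross-check, base R's tapply() reproduces the group sizes, means and standard deviations in the output above. This is an alternative sketch, not the method used in the original analysis; group_n, group_means and group_sds are illustrative names.

```r
# Data as entered in the document
symptoms <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)
dose <- gl(3, 5, labels = c("Placebo", "Low dose", "High dose"))

# tapply() splits symptoms by dose and applies the function to each group
group_n     <- tapply(symptoms, dose, length)
group_means <- tapply(symptoms, dose, mean)
group_sds   <- tapply(symptoms, dose, sd)

round(cbind(n = group_n, mean = group_means, sd = group_sds), 4)
```

The means (2.2, 3.2 and 5.0) and standard deviations (1.3038, 1.3038 and 1.5811) match the mean and std.dev rows reported by stat.desc().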

Testing the assumptions of ANOVA

1 Levene's test

Null_Hypothesis: The variance of the residuals is the same in every dose group (homogeneous).

Alt_Hypothesis: The variance of the residuals differs between dose groups (not homogeneous).

leveneTest(DrugData$symptoms, DrugData$dose, center = median)

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.1176   0.89
##       12

Since p = 0.89 > 0.05, we fail to reject the null hypothesis: there is no evidence that the variances differ between groups, so the assumption of homogeneity of variance is reasonable.
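Levene's test with center = median (often called the Brown-Forsythe variant) is an ordinary one-way ANOVA on the absolute deviations of each score from its group median. The sketch below rebuilds it by hand in base R to show where the F value comes from; abs_dev and bf are illustrative names.

```r
# Data as entered in the document
symptoms <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)
dose <- gl(3, 5, labels = c("Placebo", "Low dose", "High dose"))

# ave() returns each observation's group median; take absolute deviations from it
abs_dev <- abs(symptoms - ave(symptoms, dose, FUN = median))

# One-way ANOVA on the deviations: its F test is Levene's test (center = median)
bf <- anova(lm(abs_dev ~ dose))
bf
```

The F statistic in the first row of bf agrees with the 0.1176 reported by leveneTest(), with the same 2 and 12 degrees of freedom.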

2 Shapiro-Wilk Normality Test

Null_Hypothesis: The residuals are normally distributed.

Alt_Hypothesis: The residuals are not normally distributed.
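The hypotheses refer to residuals, while the by() call below tests each group's raw symptom scores. For a one-way design these are equivalent: a residual is a score minus its group mean, and the Shapiro-Wilk W statistic does not change when data are shifted by a constant. The sketch below checks this for the Placebo group and, as an alternative not used in the original analysis, runs a single test on all 15 pooled residuals; res, w_raw and w_res are illustrative names. The same shift invariance explains why Placebo and Low dose give identical results: the sorted Low dose scores (2, 2, 3, 4, 5) are the sorted Placebo scores (1, 1, 2, 3, 4) plus one.

```r
# Data as entered in the document
symptoms <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)
dose <- gl(3, 5, labels = c("Placebo", "Low dose", "High dose"))

# Residuals of the one-way model: each score minus its group mean
res <- residuals(aov(symptoms ~ dose))

# W for the raw Placebo scores equals W for their residuals (shift invariance)
w_raw <- shapiro.test(symptoms[dose == "Placebo"])$statistic
w_res <- shapiro.test(res[dose == "Placebo"])$statistic
c(raw = unname(w_raw), residuals = unname(w_res))

# Alternative: one Shapiro-Wilk test on all 15 residuals pooled together
shapiro.test(res)
```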

shapiro_test <- by(DrugData$symptoms, DrugData$dose, shapiro.test)
shapiro_test

## DrugData$dose: Placebo
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.90202, p-value = 0.4211
## 
## ------------------------------------------------------------ 
## DrugData$dose: Low dose
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.90202, p-value = 0.4211
## 
## ------------------------------------------------------------ 
## DrugData$dose: High dose
## 
##  Shapiro-Wilk normality test
## 
## data:  dd[x, ]
## W = 0.98676, p-value = 0.9672

Since all three p-values (0.4211, 0.4211 and 0.9672) are greater than 0.05, we fail to reject the null hypothesis and conclude that there is no evidence against normality in any of the dose groups.

OLWALO ABRAHAM

2025-12-22