The “Birthweight” dataset comes from a survey of risk factors associated with low-birth-weight infants (data collected at Baystate Medical Center in Massachusetts during 1986). Low birth weight has interested physicians for many years because infant mortality and abnormality rates are very high among low-birth-weight infants. A woman’s behavior during pregnancy (diet, smoking habits, etc.) can significantly alter the chances of carrying the pregnancy to term, and therefore of giving birth to a child of normal weight. A child is considered to have low birth weight if the weight at birth is less than 2500 g.
knitr::opts_chunk$set(comment = NA)
Install and load the following libraries, which help with data manipulation and the display of summary statistics:
if(!require(dplyr)){install.packages('dplyr')} # install dplyr if it is not already available
Loading required package: dplyr
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(stargazer)
Please cite as:
Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(dplyr) #loading the library
library(gtsummary)
Warning: package 'gtsummary' was built under R version 4.2.2
library(epiDisplay)
Warning: package 'epiDisplay' was built under R version 4.2.2
Loading required package: foreign
Loading required package: survival
Loading required package: MASS
Warning: package 'MASS' was built under R version 4.2.2
Attaching package: 'MASS'
The following object is masked from 'package:gtsummary':
select
The following object is masked from 'package:dplyr':
select
Loading required package: nnet
library(vtable)
Warning: package 'vtable' was built under R version 4.2.2
Loading required package: kableExtra
Warning: package 'kableExtra' was built under R version 4.2.2
Warning in !is.null(rmarkdown::metadata$output) && rmarkdown::metadata$output
%in% : 'length(x) = 3 > 1' in coercion to 'logical(1)'
Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':
group_rows
library(wilson)
Warning: package 'wilson' was built under R version 4.2.2
Attaching package: 'wilson'
The following object is masked from 'package:stats':
heatmap
library(fastR2)
Warning: package 'fastR2' was built under R version 4.2.2
Loading required package: mosaic
Registered S3 method overwritten by 'mosaic':
method from
fortify.SpatialPolygonsDataFrame ggplot2
The 'mosaic' package masks several functions from core packages in order to add
additional features. The original behavior of these functions should not be affected by this.
Attaching package: 'mosaic'
The following object is masked from 'package:Matrix':
mean
The following object is masked from 'package:ggplot2':
stat
The following objects are masked from 'package:dplyr':
count, do, tally
The following objects are masked from 'package:stats':
binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
quantile, sd, t.test, var
The following objects are masked from 'package:base':
max, mean, min, prod, range, sample, sum
Attaching package: 'fastR2'
The following object is masked from 'package:MASS':
Traffic
library(readxl)
library(epitools)
Attaching package: 'epitools'
The following object is masked from 'package:survival':
ratetable
library(mosaic)
library(finalfit)
Warning: package 'finalfit' was built under R version 4.2.2
library(dplyr)
library(magrittr)
Attaching package: 'magrittr'
The following object is masked from 'package:wilson':
and
library(finalfit)
library(ggplot2)
library(psych)
Warning: package 'psych' was built under R version 4.2.2
Attaching package: 'psych'
The following objects are masked from 'package:mosaic':
logit, rescale
The following objects are masked from 'package:ggplot2':
%+%, alpha
The following object is masked from 'package:wilson':
pca
The following objects are masked from 'package:epiDisplay':
alpha, cs, lookup
library(corrplot)
Warning: package 'corrplot' was built under R version 4.2.2
corrplot 0.92 loaded
library(VIM)
Warning: package 'VIM' was built under R version 4.2.2
Loading required package: colorspace
Loading required package: grid
VIM is ready to use.
Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
Attaching package: 'VIM'
The following object is masked from 'package:vtable':
countNA
The following object is masked from 'package:datasets':
sleep
library(gridExtra)
Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':
combine
library(car)
Warning: package 'car' was built under R version 4.2.2
Loading required package: carData
Attaching package: 'car'
The following object is masked from 'package:psych':
logit
The following objects are masked from 'package:mosaic':
deltaMethod, logit
The following object is masked from 'package:dplyr':
recode
library(knitr)
library(gmodels)
Warning: package 'gmodels' was built under R version 4.2.2
Attaching package: 'gmodels'
The following object is masked from 'package:epiDisplay':
ci
Poids_naissance <- read.csv("C:\\Users\\user\\Downloads\\Poids_naissance.csv")
head(Poids_naissance,3)
Poids_naissance <-data.frame(Poids_naissance)
attach(Poids_naissance)
Structuring categorical variables means organizing and preparing categorical data for analysis in R. Categorical variables represent qualitative data, such as gender, race, education level, or favorite color.
In R, categorical variables are structured as factors, a data type that represents a categorical variable with a fixed set of levels. Factors allow efficient storage of categorical data and let statistical tests and plots treat the variable as categorical rather than numeric.
To structure a categorical variable as a factor, use the factor() function. It takes a vector of categorical data as its first argument and, optionally, the set of levels as its second argument (if not specified, the levels are determined from the unique values in the data).
#### Factor the categorical variables
Poids_naissance$RACE<-factor(Poids_naissance$RACE, levels = c(1,2,3),
labels = c("White", "Black", "Other"))
Poids_naissance<- Poids_naissance %>%mutate(RACE = factor(RACE))
Poids_naissance<- Poids_naissance %>%mutate(LOW = factor(LOW))
Poids_naissance<- Poids_naissance %>%mutate(SMOKE = factor(SMOKE))
Poids_naissance<- Poids_naissance %>%mutate(PTL = factor(PTL))
Poids_naissance<- Poids_naissance %>%mutate(HT = factor(HT))
Poids_naissance<- Poids_naissance %>%mutate(UI = factor(UI))
Poids_naissance<- Poids_naissance %>%mutate(FVT = factor(FVT))
tab1(Poids_naissance$RACE)
Poids_naissance$RACE :
Frequency Percent Cum. percent
White 96 50.8 50.8
Black 26 13.8 64.6
Other 67 35.4 100.0
Total 189 100.0 100.0
From the results above, 96 individuals are White (race category 1), 26 are Black (category 2) and 67 are Other (category 3). These groups make up 51%, 14% and 35% of the sample, respectively. The bar chart produced by tab1() is the visual representation of this distribution.
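As a quick check, the percentages can be recomputed from the frequencies in the tab1() output. This is a minimal sketch in base R; the names race_counts and race_pct are introduced here for illustration and are not part of the original analysis.

```r
# Frequencies copied from the tab1() output above
race_counts <- c(White = 96, Black = 26, Other = 67)

# Share of each category in percent, rounded to one decimal place
race_pct <- round(100 * prop.table(race_counts), 1)
race_pct
```

The rounded shares should agree with the Percent column of the frequency table.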
sum_stat <- data.frame(Poids_naissance$AGE, Poids_naissance$LWT, Poids_naissance$BWT)
stargazer(Poids_naissance[,-1], type = "text")
==========================================
Statistic N Mean St. Dev. Min Max
------------------------------------------
AGE 189 23.238 5.299 14 45
LWT 189 129.815 30.579 80 250
BWT 189 2,944.656 729.022 709 4,990
------------------------------------------
st(sum_stat, col.breaks = 3,
summ = list(
c('notNA(x)','mean(x)', 'median(x)','sd(x)','min(x)','max(x)'),
c('notNA(x)','mean(x)')
),
summ.names = list(
c('N','Mean','median','SD','Min','Max'),
c('Count','Percent')
))
| Variable | N | Mean | median | SD | Min | Max |
|---|---|---|---|---|---|---|
| Poids_naissance.AGE | 189 | 23.238 | 23 | 5.299 | 14 | 45 |
| Poids_naissance.LWT | 189 | 129.815 | 121 | 30.579 | 80 | 250 |
| Poids_naissance.BWT | 189 | 2944.656 | 2977 | 729.022 | 709 | 4990 |
st(Poids_naissance[,-1], col.breaks = 11,
summ = list(
c('notNA(x)','mean(x)', 'median(x)','sd(x)','min(x)','max(x)', 'skew(x)'),
c('notNA(x)','mean(x)')
),
summ.names = list(
c('N','Mean','median','SD','Min','Max', 'skew'),
c('Count','Percent')
))
| Variable | N | Mean | median | SD | Min | Max | skew |
|---|---|---|---|---|---|---|---|
| AGE | 189 | 23.238 | 23 | 5.299 | 14 | 45 | 0.711 |
| LWT | 189 | 129.815 | 121 | 30.579 | 80 | 250 | 1.38 |
| RACE | 189 | | | | | | |
| … White | 96 | 50.8% | | | | | |
| … Black | 26 | 13.8% | | | | | |
| … Other | 67 | 35.4% | | | | | |
| SMOKE | 189 | | | | | | |
| … 0 | 115 | 60.8% | | | | | |
| … 1 | 74 | 39.2% | | | | | |
| PTL | 189 | | | | | | |
| … 0 | 159 | 84.1% | | | | | |
| … 1 | 24 | 12.7% | | | | | |
| … 2 | 5 | 2.6% | | | | | |
| … 3 | 1 | 0.5% | | | | | |
| HT | 189 | | | | | | |
| … 0 | 177 | 93.7% | | | | | |
| … 1 | 12 | 6.3% | | | | | |
| UI | 189 | | | | | | |
| … 0 | 161 | 85.2% | | | | | |
| … 1 | 28 | 14.8% | | | | | |
| FVT | 189 | | | | | | |
| … 0 | 100 | 52.9% | | | | | |
| … 1 | 47 | 24.9% | | | | | |
| … 2 | 30 | 15.9% | | | | | |
| … 3 | 7 | 3.7% | | | | | |
| … 4 | 4 | 2.1% | | | | | |
| … 6 | 1 | 0.5% | | | | | |
| BWT | 189 | 2944.656 | 2977 | 729.022 | 709 | 4990 | -0.207 |
| LOW | 189 | | | | | | |
| … 0 | 130 | 68.8% | | | | | |
| … 1 | 59 | 31.2% | | | | | |
describe(Poids_naissance[,2:11])
Poids_naissance[, 2:10] %>%
tbl_summary(by = SMOKE) %>%
add_p() %>%
add_overall() %>%
bold_labels()
| Characteristic | Overall, N = 189¹ | 0, N = 115¹ | 1, N = 74¹ | p-value² |
|---|---|---|---|---|
| AGE | 23 (19, 26) | 23 (20, 26) | 22 (19, 26) | 0.5 |
| LWT | 121 (110, 140) | 124 (112, 142) | 120 (107, 137) | 0.2 |
| RACE | | | | <0.001 |
| White | 96 (51%) | 44 (38%) | 52 (70%) | |
| Black | 26 (14%) | 16 (14%) | 10 (14%) | |
| Other | 67 (35%) | 55 (48%) | 12 (16%) | |
| PTL | | | | 0.036 |
| 0 | 159 (84%) | 103 (90%) | 56 (76%) | |
| 1 | 24 (13%) | 10 (8.7%) | 14 (19%) | |
| 2 | 5 (2.6%) | 2 (1.7%) | 3 (4.1%) | |
| 3 | 1 (0.5%) | 0 (0%) | 1 (1.4%) | |
| HT | | | | >0.9 |
| 0 | 177 (94%) | 108 (94%) | 69 (93%) | |
| 1 | 12 (6.3%) | 7 (6.1%) | 5 (6.8%) | |
| UI | | | | 0.4 |
| 0 | 161 (85%) | 100 (87%) | 61 (82%) | |
| 1 | 28 (15%) | 15 (13%) | 13 (18%) | |
| FVT | | | | 0.12 |
| 0 | 100 (53%) | 55 (48%) | 45 (61%) | |
| 1 | 47 (25%) | 35 (30%) | 12 (16%) | |
| 2 | 30 (16%) | 19 (17%) | 11 (15%) | |
| 3 | 7 (3.7%) | 3 (2.6%) | 4 (5.4%) | |
| 4 | 4 (2.1%) | 3 (2.6%) | 1 (1.4%) | |
| 6 | 1 (0.5%) | 0 (0%) | 1 (1.4%) | |
| BWT | 2,977 (2,414, 3,475) | 3,100 (2,509, 3,622) | 2,776 (2,370, 3,246) | 0.007 |

¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Pearson's Chi-squared test; Fisher's exact test
tab1(Poids_naissance$LOW)
Poids_naissance$LOW :
Frequency Percent Cum. percent
0 130 68.8 68.8
1 59 31.2 100.0
Total 189 100.0 100.0
A one-sample proportion test determines whether the proportion of successes in a sample differs significantly from a hypothesized value (in this case, 0.5). From the output above, we have 59 successes (birth weight less than 2500 g) and 130 failures. prop.test() also reports a Wilson score confidence interval for the proportion, computed below with and without continuity correction.
prop.test(59,189)
1-sample proportions test with continuity correction
data: 59 out of 189
X-squared = 25.926, df = 1, p-value = 3.548e-07
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.2479596 0.3841585
sample estimates:
p
0.3121693
prop.test(59,189, correct=FALSE)
1-sample proportions test without continuity correction
data: 59 out of 189
X-squared = 26.672, df = 1, p-value = 2.411e-07
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.2504031 0.3814188
sample estimates:
p
0.3121693
The sample size is 189, and out of those 189, 59 were successes. The sample proportion of successes (p) is 0.3121693. The results of the test with continuity correction show a chi-squared value of 25.926, with 1 degree of freedom and a p-value of 3.548e-07. The alternative hypothesis is that the true proportion (p) is not equal to 0.5. The 95% confidence interval for the true proportion ranges from 0.2479596 to 0.3841585.
The results of the test without continuity correction show a chi-squared value of 26.672, with 1 degree of freedom and a p-value of 2.411e-07. The alternative hypothesis is the same as before. The 95% confidence interval for the true proportion ranges from 0.2504031 to 0.3814188.
Both tests reject the null hypothesis that the true proportion is equal to 0.5, since the p-values are far below the 0.05 significance level. Consistent with this, neither confidence interval contains 0.5. The difference between the two tests is minor; the continuity correction yields a slightly wider confidence interval.
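The interval reported without continuity correction is the Wilson score interval, which can be reproduced by hand. The following is a minimal sketch in base R; the variable names (p_hat, centre, half_width, wilson_ci) are illustrative and not part of the original analysis.

```r
# Counts from the LOW frequency table: 59 low-birth-weight infants out of 189
x <- 59
n <- 189
p_hat <- x / n
z <- qnorm(0.975)  # critical value for a 95% interval

# Wilson score interval: centre and half-width
centre <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
half_width <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson_ci <- c(lower = centre - half_width, upper = centre + half_width)
wilson_ci

# Same interval from the stats implementation (explicit namespace, since
# mosaic masks prop.test in this session)
stats::prop.test(x, n, correct = FALSE)$conf.int
```

Both approaches should agree with the uncorrected interval shown in the output above.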
tab1(Poids_naissance$UI)
Poids_naissance$UI :
Frequency Percent Cum. percent
0 161 85.2 85.2
1 28 14.8 100.0
Total 189 100.0 100.0
# Input the counts
n <- 28
x <- (18.7 / 100)*n
In the code x <- (18.7 / 100) * n, x is the number of women with UI who had preterm labor, obtained by multiplying the proportion of women with UI who had preterm labor (18.7%) by the total number of women with UI (28). This gives x = 0.187 × 28 = 5.236.
So x is the estimated count of women with UI who had preterm labor in the sample. Note that it is not a whole number, because 18.7% of 28 does not correspond to an integer count.
# Perform the test
prop.test(x, n, p = 0.5, alternative = "two.sided", conf.level = 0.95)
1-sample proportions test with continuity correction
data: x out of n
X-squared = 9.7562, df = 1, p-value = 0.001787
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.07286863 0.38510001
sample estimates:
p
0.187
The p-value is 0.001787, which is less than 0.05, so we reject the null hypothesis. The 95% confidence interval for the true proportion ranges from 0.07286863 to 0.38510001 and does not include 0.5. This suggests that, among women with UI, the proportion who had preterm labor differs from 0.5 at the 5% level of significance.
Note that this one-sample test compares the proportion only with the hypothesized value of 0.5. It does not compare women with UI against women without this complication; that comparison would require a two-sample test using the preterm labor counts in both groups.
A contingency table, also known as a cross-tabulation table, is a table that displays the distribution of two or more categorical variables. The table presents the frequency or count of each combination of the variables in rows and columns.
Each row in the table represents a level of one categorical variable, while each column represents a level of the other categorical variable. The cells in the table contain the number of observations that fall into each combination of the two variables. The table can also display percentages or proportions instead of counts, depending on the purpose of the analysis.
Contingency tables are commonly used in statistical analysis to examine the relationship between two or more categorical variables, such as gender and occupation or region of residence and political party affiliation. They can help identify patterns and associations between variables, and can be used to test hypotheses about the relationship between them. Chi-square tests and Fisher’s exact test are commonly used statistical tests for analyzing contingency tables.
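The layout described above can be sketched in base R using the SMOKE by LOW counts that appear in this document's output. The object names smoke_low and row_pct are illustrative; the 0/1 labels follow the coding of the dataset.

```r
# 2x2 contingency table: rows are SMOKE (0/1), columns are LOW (0/1)
smoke_low <- matrix(c(86, 29,
                      44, 30),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(SMOKE = c("0", "1"), LOW = c("0", "1")))

# Counts with row and column totals
addmargins(smoke_low)

# Row percentages: distribution of LOW within each level of SMOKE
row_pct <- round(100 * prop.table(smoke_low, margin = 1), 1)
row_pct
```

Row percentages answer the question "within each smoking group, what share of births were low weight?", whereas column percentages (margin = 2) describe smoking status within each birth-weight group.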
dat <- Poids_naissance[,c(5,11)] %>%
tbl_summary(by = LOW) %>%
add_p() %>%
add_overall() %>%
bold_labels()
dat
| Characteristic | Overall, N = 189¹ | 0, N = 130¹ | 1, N = 59¹ | p-value² |
|---|---|---|---|---|
| SMOKE | | | | 0.026 |
| 0 | 115 (61%) | 86 (66%) | 29 (49%) | |
| 1 | 74 (39%) | 44 (34%) | 30 (51%) | |

¹ n (%)
² Pearson's Chi-squared test
Poids_naissance [,c(2,3,5,10)] %>%
tbl_summary(by = SMOKE) %>%
add_p() %>%
add_overall() %>%
bold_labels()
| Characteristic | Overall, N = 189¹ | 0, N = 115¹ | 1, N = 74¹ | p-value² |
|---|---|---|---|---|
| AGE | 23 (19, 26) | 23 (20, 26) | 22 (19, 26) | 0.5 |
| LWT | 121 (110, 140) | 124 (112, 142) | 120 (107, 137) | 0.2 |
| BWT | 2,977 (2,414, 3,475) | 3,100 (2,509, 3,622) | 2,776 (2,370, 3,246) | 0.007 |

¹ Median (IQR)
² Wilcoxon rank sum test
The first table above is a contingency table of two categorical variables, “SMOKE” and “LOW”, showing the count and column percentage for each combination of the two variables.
The “LOW” variable has two levels, 0 and 1, with 130 and 59 observations respectively. The “SMOKE” variable also has two levels, 0 and 1, with 115 and 74 observations respectively, for 189 observations in total.
The p-value of 0.026 comes from Pearson’s Chi-squared test of the association between the two variables. The null hypothesis is that the two variables are independent, meaning that the distribution of one variable does not depend on the level of the other. The alternative hypothesis is that the two variables are dependent. The p-value of 0.026 is less than the commonly used threshold of 0.05, so we reject the null hypothesis and conclude that there is a statistically significant association between “SMOKE” and “LOW”. Specifically, 51% of the mothers in level 1 of “LOW” smoked, compared with 34% of the mothers in level 0. In other words, mothers who smoked during pregnancy were more likely to give birth to a child weighing less than 2500 g.
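The reported p-value can be reproduced directly from the four cell counts. This sketch assumes the test was run without continuity correction, which is consistent with the p-value of 0.026 in the table; the object names smoke_low and chi are illustrative.

```r
# SMOKE (rows) by LOW (columns) counts from the table above
smoke_low <- matrix(c(86, 29,
                      44, 30),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(SMOKE = c("0", "1"), LOW = c("0", "1")))

# Pearson's chi-squared test of independence, no continuity correction
chi <- stats::chisq.test(smoke_low, correct = FALSE)

chi$expected   # counts expected under independence
chi$statistic  # chi-squared statistic
chi$p.value    # p-value on 1 degree of freedom
```

All expected counts are well above 5, so the chi-squared approximation is reasonable for this table.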
TBL<-table(Poids_naissance$SMOKE, Poids_naissance$LOW)
TBL
0 1
0 86 29
1 44 30
test <- fisher.test(TBL)
test
Fisher's Exact Test for Count Data
data: TBL
p-value = 0.03618
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.028780 3.964904
sample estimates:
odds ratio
2.014137
Fisher’s Exact Test is used to test the association between two categorical variables, and it also gives a confidence interval for the odds ratio. Applied to the SMOKE by LOW counts, it yields a p-value of 0.03618. The null hypothesis is that there is no association between the two categorical variables, while the alternative hypothesis is that there is an association.
The estimated odds ratio is 2.014137, which means that the odds of low birth weight among smokers are about twice the odds among non-smokers. The 95% confidence interval for the odds ratio ranges from 1.028780 to 3.964904; intervals built this way cover the true odds ratio in 95% of repeated samples, and this one excludes 1. Since the p-value is less than 0.05, a commonly used significance level, we reject the null hypothesis and conclude that there is evidence of an association between smoking and low birth weight.
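To make the odds ratio concrete, it can be computed by hand from the cell counts and compared with the value that fisher.test() reports. fisher.test() returns a conditional maximum likelihood estimate, so it differs slightly from the simple cross-product ratio. The names below (smoke_low, or_sample, or_fisher) are illustrative.

```r
# SMOKE (rows) by LOW (columns) counts
smoke_low <- matrix(c(86, 29,
                      44, 30),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(SMOKE = c("0", "1"), LOW = c("0", "1")))

# Odds of low birth weight within each smoking group
odds_smokers     <- smoke_low["1", "1"] / smoke_low["1", "0"]  # 30 / 44
odds_non_smokers <- smoke_low["0", "1"] / smoke_low["0", "0"]  # 29 / 86

# Sample (cross-product) odds ratio
or_sample <- odds_smokers / odds_non_smokers

# Conditional maximum likelihood estimate from Fisher's exact test
or_fisher <- unname(fisher.test(smoke_low)$estimate)

c(sample = or_sample, fisher = or_fisher)
```

The two estimates are close; both indicate roughly doubled odds of low birth weight among smokers.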
tbl <- matrix(c(86, 29, 44, 30), nrow = 2, byrow = TRUE)
colnames(tbl) <- c("LOW[yes]", "LOW[no]")
rownames(tbl) <- c("SMOKE[yes]", "SMOKE[no]")
tbl
LOW[yes] LOW[no]
SMOKE[yes] 86 29
SMOKE[no] 44 30
# conduct the test
prop.test(tbl)
Warning in stats::prop.test(x = count, n = n, p = p, alternative =
alternative, : Chi-squared approximation may be incorrect
1-sample proportions test with continuity correction
data: tbl [with success = 29]
X-squared = 0.25, df = 1, p-value = 0.6171
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.01319116 0.78057348
sample estimates:
p
0.25
The output above is not the two-sample comparison of proportions that a 2x2 table calls for. Because the mosaic package masks prop.test() (see the loading messages above), the call appears to have treated the four cell counts of tbl as raw observations and defined a “success” as the value 29. One of the four cells equals 29, so the function ran a 1-sample proportions test with continuity correction on 1 success out of 4 trials, against the null hypothesis that the true proportion is 0.5.
The output shows a chi-squared statistic of 0.25 with 1 degree of freedom, a p-value of 0.6171, a sample proportion of 0.25 and a 95% confidence interval from 0.01319116 to 0.78057348. These figures describe the four cell counts, not the 189 births, so they say nothing about the relationship between smoking and low birth weight. The warning that the chi-squared approximation may be incorrect reflects the tiny effective sample size of 4.
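A two-sample comparison of the low-birth-weight proportions can be obtained by calling the stats version of prop.test() explicitly and passing a matrix whose columns are successes and failures. This is a sketch of one way to do it, not the call used in the original analysis; the object names low_by_smoke and two_sample are illustrative, and the counts come from TBL.

```r
# Rows: smoking group; columns: successes (low birth weight) and failures
low_by_smoke <- rbind(smoker     = c(low = 30, normal = 44),
                      non_smoker = c(low = 29, normal = 86))

# Two-sample test of equal proportions, bypassing the mosaic wrapper
two_sample <- stats::prop.test(low_by_smoke, correct = FALSE)

two_sample$estimate  # proportion of low-birth-weight infants in each group
two_sample$conf.int  # 95% CI for the difference in proportions
two_sample$p.value   # equals the uncorrected Pearson chi-squared p-value
```

Using the stats:: prefix avoids any ambiguity about which prop.test() is dispatched when several attached packages define a function of that name.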
epitools::riskratio(tbl,rev='both',method = 'wald')
$data
LOW[no] LOW[yes] Total
SMOKE[no] 30 44 74
SMOKE[yes] 29 86 115
Total 59 130 189
$measure
NA
risk ratio with 95% C.I. estimate lower upper
SMOKE[no] 1.000000 NA NA
SMOKE[yes] 1.257708 1.013374 1.560953
$p.value
NA
two-sided midp.exact fisher.exact chi.square
SMOKE[no] NA NA NA
SMOKE[yes] 0.02914865 0.0361765 0.02649064
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
The output above is from a 2x2 contingency table analysis that estimates a risk ratio with a Wald confidence interval and reports p-values from several tests. The “data” section shows the frequency table after rev = 'both' reversed the row and column order, along with the marginal totals. Note that the labels assigned to tbl are reversed relative to the 0/1 coding of TBL: the row labelled “SMOKE[yes]” (86, 29) holds the non-smokers, and the column labelled “LOW[yes]” (86, 44) holds the births of normal weight.
The “measure” section shows the estimated relative risk (risk ratio) and its 95% confidence interval (CI). Given the label reversal, the estimate of 1.2577 is the proportion of normal-weight births among non-smokers (86/115 = 0.748) divided by that among smokers (44/74 = 0.595). Non-smokers were therefore about 1.26 times as likely as smokers to have a baby of normal weight. The 95% CI ranges from 1.0134 to 1.5610 and excludes 1.
The “p.value” section shows p-values for the null hypothesis that the proportions in the two groups are equal, computed by three methods: mid-p exact (0.0291), Fisher’s exact (0.0362) and chi-square (0.0265). All three are below the 0.05 significance level, suggesting a significant difference between smokers and non-smokers. These p-values are unaffected by how the rows and columns are labelled.
The “correction” section of the output indicates whether a continuity correction was applied to the test statistic. In this case, the continuity correction was not applied. The “method” attribute indicates that the unconditional maximum likelihood estimation (MLE) and normal approximation (Wald) CI method was used to estimate the relative risk and its confidence interval.
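The Wald risk ratio and its confidence interval can be reproduced by hand, which also makes it easy to compute the ratio for low birth weight itself. The helper wald_rr() below is a hypothetical function written for this illustration (it is not part of epitools); the counts come from TBL, where SMOKE = 1 denotes smokers and LOW = 1 denotes low birth weight.

```r
# Hypothetical helper: risk ratio of group 1 versus group 0 with a Wald CI
wald_rr <- function(events1, n1, events0, n0, conf.level = 0.95) {
  rr <- (events1 / n1) / (events0 / n0)
  # Standard error of log(RR) under the normal approximation
  se_log <- sqrt(1 / events1 - 1 / n1 + 1 / events0 - 1 / n0)
  z <- qnorm(1 - (1 - conf.level) / 2)
  c(estimate = rr,
    lower = exp(log(rr) - z * se_log),
    upper = exp(log(rr) + z * se_log))
}

# Ratio shown in the output: normal-weight births,
# non-smokers (86 of 115) versus smokers (44 of 74)
rr_reported <- wald_rr(86, 115, 44, 74)
rr_reported

# Ratio for low birth weight: smokers (30 of 74) versus non-smokers (29 of 115)
rr_low <- wald_rr(30, 74, 29, 115)
rr_low
```

The first call should match the estimate and interval in the “measure” section; the second expresses the same association as the risk of low birth weight among smokers relative to non-smokers.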
TBL2<-table(Poids_naissance$UI, Poids_naissance$LOW)
TBL2
0 1
0 116 45
1 14 14
tbl_1 <- matrix(c(116, 45, 14, 14), nrow = 2, byrow = TRUE)
colnames(tbl_1) <- c("LOW[yes]", "LOW[no]")
rownames(tbl_1) <- c("UI[yes]", "UI[no]")
tbl_1
LOW[yes] LOW[no]
UI[yes] 116 45
UI[no] 14 14
test <- fisher.test(tbl_1)
test
Fisher's Exact Test for Count Data
data: tbl_1
p-value = 0.02692
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.041921 6.324135
sample estimates:
odds ratio
2.563399
The p-value for the test is 0.02692, which is less than the significance level of 0.05, indicating a significant association between UI and LOW. The alternative hypothesis is that the true odds ratio is not equal to 1.
The 95% confidence interval for the odds ratio ranges from 1.0419 to 6.3241. Since the interval excludes 1, the odds of low birth weight differ significantly between women with and without UI.
The estimated odds ratio is 2.5634, the point estimate based on the contingency table. It suggests that the odds of low birth weight are about 2.56 times as high among women with UI as among women without it. The odds ratio is unchanged when both the row and the column labels are swapped, so this conclusion does not depend on how tbl_1 is labelled.
prop.test(tbl_1)
Warning in stats::prop.test(x = count, n = n, p = p, alternative =
alternative, : Chi-squared approximation may be incorrect
1-sample proportions test without continuity correction
data: tbl_1 [with success = 14]
X-squared = 0, df = 1, p-value = 1
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.150039 0.849961
sample estimates:
p
0.5
Because the mosaic package masks prop.test(), the call appears to have treated the four cell counts of tbl_1 as raw observations, with a “success” defined as the value 14. Two of the four cells equal 14, so the test was run on 2 successes out of 4 trials. The test statistic is X-squared = 0 with 1 degree of freedom and the p-value is 1, because the sample proportion of 0.5 coincides exactly with the null value.
The 95% confidence interval for this proportion ranges from 0.1500 to 0.8500. Like the test itself, it describes the four cell counts rather than the 189 births, so it provides no information about the association between UI and low birth weight.
epitools::riskratio(tbl_1,rev='both',method = 'wald')
$data
LOW[no] LOW[yes] Total
UI[no] 14 14 28
UI[yes] 45 116 161
Total 59 130 189
$measure
NA
risk ratio with 95% C.I. estimate lower upper
UI[no] 1.000000 NA NA
UI[yes] 1.440994 0.9827936 2.112817
$p.value
NA
two-sided midp.exact fisher.exact chi.square
UI[no] NA NA NA
UI[yes] 0.02648209 0.02691811 0.02012792
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
The contingency table used in the analysis shows the counts of two categorical variables, UI and LOW, with two levels each. The measure section of the output shows the estimated risk ratio with a 95% confidence interval. A risk ratio is the probability of an event in one group divided by the probability of the same event in another group. For example, if the probability of an event were 0.05 in the reference group, a risk ratio of 1.44 would imply a probability of 0.05 × 1.44 = 0.072 in the comparison group. Here the labels assigned to tbl_1 are reversed relative to the 0/1 coding of TBL2: the row labelled “UI[yes]” (116, 45) holds the women without UI, and the column labelled “LOW[yes]” holds the births of normal weight. The estimate of 1.44 is therefore the proportion of normal-weight births among women without UI (116/161 = 0.720) divided by that among women with UI (14/28 = 0.500). The 95% confidence interval ranges from 0.98 to 2.11 and narrowly includes 1.
The p-value section shows two-sided p-values for the null hypothesis that the risk ratio is equal to 1: 0.026 (mid-p exact), 0.027 (Fisher’s exact) and 0.020 (chi-square), all below the typical significance level of 0.05. The tests and the Wald interval rely on different approximations, so they can disagree in borderline cases such as this one. Taken together, the results point to an association between UI and birth weight, although the evidence from this particular risk ratio is borderline.
The correction section of the output shows that no continuity correction was applied in the analysis. Finally, the method section of the output describes the statistical method used to compute the confidence interval. In this case, an unconditional maximum likelihood estimate (MLE) was used with a normal approximation (Wald) confidence interval.
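For comparison, the risk ratio can also be expressed for low birth weight itself, using the counts in TBL2 (UI = 1 for women with UI, LOW = 1 for low birth weight). This is a minimal base R sketch using the same Wald formula as the output above; the variable names are illustrative and not part of the original analysis.

```r
# Counts taken from TBL2 (rows: UI = 0, 1; columns: LOW = 0, 1)
low_ui    <- 14; n_ui    <- 28    # low-birth-weight births among women with UI
low_no_ui <- 45; n_no_ui <- 161   # low-birth-weight births among women without UI

# Risk ratio of low birth weight, UI versus no UI
rr_ui <- (low_ui / n_ui) / (low_no_ui / n_no_ui)

# Wald 95% confidence interval on the log scale
se_log_rr <- sqrt(1 / low_ui - 1 / n_ui + 1 / low_no_ui - 1 / n_no_ui)
ci_ui <- exp(log(rr_ui) + c(-1, 1) * qnorm(0.975) * se_log_rr)

c(estimate = rr_ui, lower = ci_ui[1], upper = ci_ui[2])
```

Oriented this way, the estimate is about 1.79 and the Wald interval lies above 1, which agrees with the p-values reported in the output.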