library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
datas <- read.csv("C:\\Users\\karth\\Downloads\\Child Growth and Malnutrition.csv")
view(datas)
datas$Wasting <- as.numeric(datas$Wasting)
## Warning: NAs introduced by coercion
datas$Overweight <- as.numeric(datas$Overweight)
## Warning: NAs introduced by coercion
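The coercion warnings above mean that some entries in Wasting and Overweight are not numeric text. A minimal sketch, using an invented vector rather than the real columns, of how one might list the offending entries before converting:

```r
# Toy character vector standing in for a column such as datas$Wasting
# (the values here are invented for illustration).
raw <- c("5.2", "n/a", "", "7", NA)

# Convert quietly, then find entries that were present but failed to parse.
num <- suppressWarnings(as.numeric(raw))
bad <- raw[is.na(num) & !is.na(raw)]

print(unique(bad))
```

Inspecting the distinct failing values first makes it easier to decide whether they are genuine gaps or rows that were parsed incorrectly.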
a <- datas |>
  group_by(Country.Short.Name, Year.period) |>
  summarise(sum(Wasting))
## `summarise()` has grouped output by 'Country.Short.Name'. You can override
## using the `.groups` argument.
a
## # A tibble: 1,197 × 3
## # Groups: Country.Short.Name [173]
## Country.Short.Name Year.period `sum(Wasting)`
## <chr> <chr> <dbl>
## 1 "" "0. - 4.… NA
## 2 "" "Not selec… NA
## 3 "" "Selected … NA
## 4 " Office of the Chief Government Statistician (OC… " and ICF\… NA
## 5 " Uverhkangai and" "" NA
## 6 " and Khovd aimags\"" "" NA
## 7 "Afghanistan" "1997" 18.2
## 8 "Afghanistan" "2004" 169.
## 9 "Afghanistan" "2013" 629.
## 10 "Afghanistan" "2018" 386.
## # ℹ 1,187 more rows
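The first rows of the summary are NA because `sum()` returns NA as soon as one value in a group is missing, and because some rows have an empty or malformed country name. The sketch below uses invented rows, and the output name `total_wasting` is hypothetical. It shows one possible way to skip missing values, drop rows with an empty name, and silence the grouping message with `.groups`. Whether empty names are the right filter for the real file is an assumption.

```r
library(dplyr)

# Invented rows mimicking the shape of the data: two valid surveys, one
# missing value, and one malformed row with an empty country name.
toy <- data.frame(
  Country.Short.Name = c("Afghanistan", "Afghanistan", "Afghanistan", ""),
  Year.period = c("1997", "1997", "2004", "Not selected"),
  Wasting = c(10.0, NA, 8.5, NA)
)

wasting_totals <- toy |>
  filter(Country.Short.Name != "") |>  # drop rows without a country name
  group_by(Country.Short.Name, Year.period) |>
  summarise(total_wasting = sum(Wasting, na.rm = TRUE), .groups = "drop")

print(wasting_totals)
```

Note that `na.rm = TRUE` effectively treats a missing value as zero in the sum, so a group with many gaps will look smaller than it really is.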
# Derived columns used below; a Stunting value of 0 makes a ratio Inf or NaN
datas_new <- datas |>
  mutate(BMI_Over = Overweight / Stunting) |>
  mutate(BMI_Under = Underweight / Stunting) |>
  mutate(SD_Difference = Wasting - Overweight)
view(datas_new)
datas_1 <- datas_new |>
  select(Overweight, Stunting, BMI_Over)
datas_2 <- datas_new |>
  select(Underweight, Stunting, BMI_Under)
datas_3 <- datas_new |>
  select(Wasting, Overweight, SD_Difference)
view(datas_1)
# Scatter plot of the Overweight / Stunting ratio against Overweight
p_over <- datas_1 |>
  ggplot() +
  geom_point(mapping = aes(x = Overweight, y = BMI_Over))
p_over
## Warning: Removed 2357 rows containing missing values (`geom_point()`).
The plot above shows BMI_Over, the ratio of Overweight to Stunting used
here as a BMI-like measure, against Overweight. Most values are small,
which suggests that these children are not healthy, but some have very
large ratios, meaning that their weight for height is too high.
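Part of the reason some ratios are so large is arithmetic: a ratio grows without bound as its denominator approaches zero and becomes Inf when the denominator is exactly zero. A minimal sketch with invented numbers follows. The helper name `safe_ratio` is hypothetical and not part of the original analysis. It returns NA instead of Inf when the denominator is not positive.

```r
# Hypothetical helper: divide only where the denominator is positive,
# otherwise return NA so the value is treated as missing downstream.
safe_ratio <- function(num, den) {
  ifelse(den > 0, num / den, NA_real_)
}

overweight <- c(5, 5, 5, NA)    # invented values
stunting   <- c(20, 0.5, 0, 10)  # note the small and the zero denominator

ratio <- safe_ratio(overweight, stunting)
print(ratio)
```

With a guard like this, `mean(..., na.rm = TRUE)` and `sd(..., na.rm = TRUE)` would skip the undefined ratios rather than returning Inf or NaN.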
# Scatter plot of the Underweight / Stunting ratio against Underweight
p_under <- datas_2 |>
  ggplot() +
  geom_point(mapping = aes(x = Underweight, y = BMI_Under))
p_under
## Warning: Removed 1812 rows containing missing values (`geom_point()`).
The plot above shows BMI_Under, the ratio of Underweight to Stunting,
against Underweight. Most values are small, which again suggests that
these children are not healthy, but some have very large ratios,
meaning that their weight is too high for their somewhat short height.
The plot also suggests that height goes up slightly as weight increases
for these children.
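# Scatter plot of Wasting - Overweight against Wasting
# (named p_diff so the object does not share a name with base::c)
p_diff <- datas_3 |>
  ggplot() +
  geom_point(mapping = aes(x = Wasting, y = SD_Difference))
p_diff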
## Warning: Removed 2172 rows containing missing values (`geom_point()`).
The plot above shows SD_Difference, the difference between the standard
deviation values for Wasting and Overweight children, against Wasting.
Both measures are based on the weight-for-height ratio, and some of the
differences are very large.
any(!is.finite(datas_2$BMI_Under))
## [1] TRUE
view(datas_2)
cor(na.omit(datas_2))
## Underweight Stunting BMI_Under
## Underweight 1.0000000 0.8129589 NaN
## Stunting 0.8129589 1.0000000 NaN
## BMI_Under NaN NaN 1
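Underweight and Stunting are strongly positively correlated (Pearson coefficient of about 0.81): they move in the same direction, though not at the same rate. This makes sense, as a malnourished child is likely to be both underweight and stunted, so as one increases, so does the other. The correlations involving BMI_Under are NaN because that column contains infinite values, which `na.omit()` does not remove.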
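The NaN entries in the correlation matrix come from non-finite ratios: `na.omit()` drops rows containing NA but keeps rows containing Inf. A small sketch on invented data, not the real columns, shows one way to keep only rows where every value is finite before calling `cor()`:

```r
# Invented data: row 4 has a zero denominator, row 5 has a missing value.
toy <- data.frame(
  Underweight = c(10, 20, 30, 5, NA),
  Stunting    = c(20, 30, 50, 0, 40)
)
toy$BMI_Under <- toy$Underweight / toy$Stunting

# na.omit() removes the NA row but keeps the Inf row, so NaN appears.
print(cor(na.omit(toy)))

# Keep only rows where all three columns are finite.
keep <- is.finite(toy$Underweight) &
  is.finite(toy$Stunting) &
  is.finite(toy$BMI_Under)
clean_cor <- cor(toy[keep, ])
print(clean_cor)
```

The same filter could be applied to the Overweight columns. The resulting coefficients would then be based on fewer rows than the ones reported here.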
cor(na.omit(datas_1))
## Overweight Stunting BMI_Over
## Overweight 1.0000000 -0.2899611 NaN
## Stunting -0.2899611 1.0000000 NaN
## BMI_Over NaN NaN 1
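Overweight and Stunting are weakly negatively correlated (about -0.29): they move in opposite directions, at different rates. This also makes sense, as an overweight child is less likely to be stunted. The correlations involving BMI_Over are NaN because that column contains infinite values.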
cor(na.omit(datas_3))
## Wasting Overweight SD_Difference
## Wasting 1.0000000 -0.2582857 0.8231593
## Overweight -0.2582857 1.0000000 -0.7611542
## SD_Difference 0.8231593 -0.7611542 1.0000000
Wasting and Overweight are weakly negatively correlated (about -0.26): they move in different directions at different rates. Since both are defined in terms of weight-for-height, and their definitions are almost polar opposites of one another, this correlation makes sense.
x_m <- mean(datas_1$BMI_Over, na.rm = TRUE)
y_m <- mean(datas_2$BMI_Under, na.rm = TRUE)
z_m <- mean(datas_3$SD_Difference, na.rm = TRUE)
cat(x_m, y_m, z_m, "\n")
## Inf Inf 1.114192
x_sd <- sd(datas_1$BMI_Over, na.rm = TRUE)
y_sd <- sd(datas_2$BMI_Under, na.rm = TRUE)
z_sd <- sd(datas_3$SD_Difference, na.rm = TRUE)
cat(x_sd, y_sd, z_sd)
## NaN NaN 8.647883
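# Margin of error for a 95% t-interval. Note: nrow() also counts rows with
# missing values, while the sd values above were computed with na.rm = TRUE.
x_ma <- qt(0.975, df=nrow(datas_1)-1)*x_sd/sqrt(nrow(datas_1))
y_ma <- qt(0.975, df=nrow(datas_2)-1)*y_sd/sqrt(nrow(datas_2))
z_ma <- qt(0.975, df=nrow(datas_3)-1)*z_sd/sqrt(nrow(datas_3))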
x_lo <- x_m - x_ma
x_hi <- x_m + x_ma
y_lo <- y_m - y_ma
y_hi <- y_m + y_ma
z_lo <- z_m - z_ma
z_hi <- z_m + z_ma
cat(x_lo, x_hi, "\n")
## NaN NaN
cat(y_lo, y_hi, "\n")
## NaN NaN
cat(z_lo, z_hi, "\n")
## 1.028928 1.199457
Only the SD_Difference column yields a confidence interval, because the means of BMI_Over and BMI_Under are Inf and their standard deviations are NaN. For SD_Difference, the mean is 1.114, the standard deviation is 8.648, and the 95% confidence interval of the mean is (1.028928, 1.199457). In other words, we can be 95% confident that the true mean difference between Wasting and Overweight lies between about 1.03 and 1.20.
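The interval above uses `nrow()` as the sample size, which counts every row, whereas `mean()` and `sd()` were computed with `na.rm = TRUE`. A minimal sketch of a helper that uses only the finite values for all three quantities is given below. The name `mean_ci` is hypothetical, and the input is an invented vector, not the real column.

```r
# Hypothetical helper: t-based confidence interval for the mean,
# computed from the finite values only (NA, NaN and Inf are dropped).
mean_ci <- function(x, level = 0.95) {
  x <- x[is.finite(x)]
  n <- length(x)
  m <- mean(x)
  margin <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(mean = m, lower = m - margin, upper = m + margin)
}

# Invented values, including one missing and one infinite entry.
ci <- mean_ci(c(1, 2, 3, 4, NA, Inf))
print(ci)

# Cross-check against the interval reported by t.test() on the same values.
print(t.test(c(1, 2, 3, 4))$conf.int)
```

Applied to a column with infinite ratios, such as BMI_Over, a helper like this would return a finite interval, though it would describe only the rows where the ratio is defined.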