leprosy <- read.csv("/Users/cindyfan/Desktop/SDS 313/homework2/Homework2_leprosy.csv")
max(leprosy$Cases, na.rm = TRUE)
## [1] 75394
# Look up the row with the most cases without hard-coding the value; which.max() skips NAs
leprosy[which.max(leprosy$Cases), ]
## Country Code Region Population GDP LandArea Cases
## 76 India IND Asia/Pacific 1399179585 3176.295 1147955 75394
India reported the highest number of new leprosy cases in 2021 (75,394). Comparing raw case counts is not a fair comparison, however, because countries differ greatly in population; a rate such as cases per 100k people accounts for this.
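As a quick worked example of the per-100k adjustment, the sketch below applies it to the India row printed above. The variable names are illustrative only; the two input values are copied from the output.

```r
# Values copied from the India row in the output above
india_cases      <- 75394
india_population <- 1399179585

# New cases per 100,000 people
india_rate <- india_cases / india_population * 1e5
print(round(india_rate, 2))
```

Despite having the largest raw count, India's rate works out to only a few cases per 100k people, which is why the rest of the analysis uses the rate rather than the count.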
library(ggplot2)
# New cases per 100k people for each country
leprosy$density <- (leprosy$Cases/leprosy$Population) * 1e+05
dp_ggplot <- ggplot(leprosy)
# Histogram of cases per 100k across all countries
dp_ggplot + geom_histogram(aes(x = density), binwidth = 3, col = "black",
fill = "aquamarine") + labs(title = "Distribution of Leprosy Cases per 100k People Across All Countries",
x = "Leprosy cases per 100k people", y = "Frequency")
fivenum(leprosy$density)
## [1] 0.000000000 0.002424845 0.242289314 1.278899578 30.450669915
min(leprosy$density, na.rm = TRUE)
## [1] 0
median(leprosy$density, na.rm = TRUE)
## [1] 0.2422893
max(leprosy$density, na.rm = TRUE)
## [1] 30.45067
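The five-number summary for leprosy cases per 100k people (minimum 0, lower hinge 0.0024, median 0.2423, upper hinge 1.2789, maximum 30.4507) shows that the distribution is strongly right-skewed: the median of 0.2423 sits very close to the minimum, while the maximum is far above it. The values range from 0 to 30.4507.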
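One way to read the skew directly off the five-number summary is to compare the distances on either side of the median. The following is a minimal sketch using the values printed by `fivenum()` above; the names `five`, `lower_half`, and so on are illustrative, not part of the original analysis.

```r
# Five-number summary reported above (min, lower hinge, median, upper hinge, max)
five <- c(min = 0.000000000, lower = 0.002424845, median = 0.242289314,
          upper = 1.278899578, max = 30.450669915)

# Distance from the median to each hinge and to each extreme
lower_half <- unname(five["median"] - five["lower"])
upper_half <- unname(five["upper"] - five["median"])
lower_tail <- unname(five["median"] - five["min"])
upper_tail <- unname(five["max"] - five["median"])

# Right skew: both upper distances exceed the matching lower distances
right_skewed <- upper_half > lower_half && upper_tail > lower_tail
print(right_skewed)
```

This is only a rough rule of thumb, but here both the upper half of the box and the upper tail are much longer than their lower counterparts, which agrees with the shape of the histogram.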
library(ggplot2)
dp_ggplot + geom_histogram(aes(x = density), col = "black", fill = "red",
alpha = 1, binwidth = 10, position = "identity") + labs(title = "Frequency of New Leprosy Cases per 100k by Region",
x = "New Leprosy Cases per 100k people", y = "Frequency") +
facet_grid(~Region) + theme(legend.position = "bottom")
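# Number of countries per region
TotalCases <- as.data.frame(table(leprosy$Region))
# Median cases per 100k by region; table() and aggregate() both order regions alphabetically
TotalCases$Median <- round(aggregate(density ~ Region,
data = leprosy, FUN = median)[, 2], 2)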
library(kableExtra)
kable_styling(kbl(TotalCases, col.names = c("Regions", "Number of Countries",
"Median Cases per 100k")))
| Regions | Number of Countries | Median Cases per 100k |
|---|---|---|
| Africa | 45 | 1.07 |
| Americas | 34 | 0.18 |
| Asia/Pacific | 33 | 0.30 |
| Europe | 51 | 0.00 |
| Middle East | 20 | 0.07 |
Differences in leprosy prevalence across regions: Leprosy is relatively more prevalent in Africa and Asia/Pacific, whose median cases per 100k (Africa: 1.07; Asia/Pacific: 0.30) are higher than the overall median of 0.24. It is relatively less prevalent in the Americas, the Middle East, and Europe, whose medians (Americas: 0.18; Middle East: 0.07; Europe: 0.00) are lower than the overall median. Ranked from most to least prevalent, the regions are: Africa > Asia/Pacific > Americas > Middle East > Europe.
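The comparison of regional medians against the overall median can also be done in code rather than by eye. Below is a minimal sketch on a small, made-up data frame; `toy`, its two regions "A" and "B", and all of its values are hypothetical and are not the homework data.

```r
# Hypothetical toy data: NOT the leprosy data set
toy <- data.frame(
  Region  = c("A", "A", "A", "B", "B", "B"),
  density = c(0.5, 1.0, 4.0, 0.0, 0.1, 0.2)
)

# Overall median across all rows
overall_median <- median(toy$density, na.rm = TRUE)

# Median per region, returned as a named numeric vector
region_median <- sapply(split(toy$density, toy$Region), median, na.rm = TRUE)

# Regions ranked from highest to lowest median
ranking <- sort(region_median, decreasing = TRUE)

# Regions whose median exceeds the overall median
above_overall <- names(region_median)[region_median > overall_median]

print(ranking)
print(above_overall)
```

With the real data, the same steps would presumably be applied to `leprosy$density` and `leprosy$Region`.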
library(ggplot2)
dp_ggplot + geom_point(aes(x = density, y = GDP)) + labs(title = "Relationship Between Cases per 100k and GDP",
x = "Cases per 100k", y = "GDP") + theme_classic()
The correlation coefficient between cases per 100k and GDP is approximately -0.055, indicating a negative but very weak linear relationship between the two variables.
The number of leprosy cases per 100k in a country is negatively correlated with its GDP, so countries with better economic conditions tend to have a slightly lower prevalence of leprosy, which might be explained by better medical technology and better access to medicines. Because the correlation is very weak, however, GDP alone explains little of the variation between countries.
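The code that produced the correlation coefficient is not shown in this report. A minimal sketch of one way to compute it with base R's `cor()` is given below, assuming that rows with missing values should be dropped (`use = "complete.obs"`). The data frame `toy` and its values are hypothetical and are not the homework data.

```r
# Hypothetical toy data: NOT the leprosy data set
toy <- data.frame(
  density = c(0.0, 0.2, 1.5, 3.0, NA),
  GDP     = c(40000, 12000, 3000, 1500, 8000)
)

# Pearson correlation, using only rows where both values are present
r <- cor(toy$density, toy$GDP, use = "complete.obs")
print(r)

# The sign gives the direction; the absolute value gives the strength
direction <- if (r < 0) "negative" else "non-negative"
print(direction)
```

On the real data, the equivalent call would presumably be `cor(leprosy$density, leprosy$GDP, use = "complete.obs")`.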
To learn more about leprosy, click on the following hyperlink: International Leprosy Association’s website