This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
library(ggplot2)
data <- read.csv("~/TranProject/mydataset.csv")
head(data)
##        City    Country     Continent Population..2024. Population..2023.
## 1     Tokyo      Japan          Asia          37115035          37194105
## 2     Delhi      India          Asia          33807403          32941309
## 3  Shanghai      China          Asia          29867918          29210808
## 4     Dhaka Bangladesh          Asia          23935652          23209616
## 5 Sao Paulo     Brazil South America          22806704          22619736
## 6     Cairo      Egypt        Africa          22623874          22183201
##   Growth.Rate
## 1     -0.0021
## 2      0.0263
## 3      0.0225
## 4      0.0313
## 5      0.0083
## 6      0.0199
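The doubled dots in names such as `Population..2024.` are most likely a by-product of how `read.csv()` cleans column headers: by default it passes them through `make.names()`, which replaces characters that are not valid in R names with dots. The sketch below illustrates this, assuming the CSV header was originally something like `Population (2024)`. The actual header text is not shown in this document, so the raw names here are hypothetical.

```r
# Hypothetical raw CSV headers (assumed; the original file is not shown here)
raw_headers <- c("City", "Population (2024)", "Growth Rate")

# make.names() is what read.csv() applies when check.names = TRUE (the default)
cleaned <- make.names(raw_headers)
print(cleaned)
```

Passing `check.names = FALSE` to `read.csv()` would keep the headers as written, at the cost of needing backticks to refer to those columns.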
# The first graph is a histogram of population growth rates across cities. The distribution looks fairly symmetric, with no strong skew to the left or right, so cities growing faster than the typical rate are roughly balanced by cities growing slower.
ggplot(data, aes(x = Growth.Rate)) +
  geom_histogram(binwidth = 0.01, fill = "blue", color = "black") +
  labs(title = "Distribution of Population Growth Rate", x = "Growth Rate", y = "Frequency") +
  theme_minimal()
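The visual impression of balance in the histogram can be checked with a number. The sketch below defines a small helper, `sample_skewness()` (a hypothetical name, not a base R function), that computes moment-based skewness, and applies it to the six growth rates printed by `head(data)`. Values near zero suggest symmetry. With the full dataset the argument would be `data$Growth.Rate`, and six rows are far too few to draw a conclusion from.

```r
# Moment-based skewness: third central moment divided by the 1.5 power of the
# second central moment. Hypothetical helper for illustration; base R has no
# built-in skewness function.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  m2 <- mean(centered^2)
  m3 <- mean(centered^3)
  m3 / m2^1.5
}

# Growth rates of the six cities shown in the head(data) output
growth_head <- c(-0.0021, 0.0263, 0.0225, 0.0313, 0.0083, 0.0199)

# Negative values indicate a longer left tail, positive a longer right tail
print(sample_skewness(growth_head))
```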
ggplot(data, aes(x = Population..2023., y = Population..2024.)) +
  geom_point(color = "red") +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Population in 2023 vs Population in 2024", x = "Population in 2023", y = "Population in 2024") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# The second graph is a scatter plot of each city's 2023 population against its 2024 population, with a fitted linear trend line. It shows the relationship between the two years: cities that were large in 2023 are still large in 2024, so population sizes are very stable from one year to the next.
# Calculate mean growth rate
mean_growth_rate <- mean(data$Growth.Rate)
mean_growth_rate
## [1] 0.02005131
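# I calculated the mean growth rate to summarize how populations are changing across all cities. The mean is about 0.020, or roughly 2% from 2023 to 2024, so on average the cities in the dataset are growing. This is an unweighted average: every city counts equally regardless of its size.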
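One way to sanity-check the `Growth.Rate` column is to recompute it from the two population columns. The sketch below does this for the six cities shown in the `head(data)` output, assuming the rate is defined as (2024 population - 2023 population) / 2023 population. The document does not state the definition, so this is an assumption to verify. The sketch also contrasts the simple mean with a population-weighted mean, since a simple mean treats a small city and a megacity alike. The data frame `cities` and its short column names are illustrative, not part of the original analysis.

```r
# Six rows copied from the head(data) output (not the full dataset)
cities <- data.frame(
  city     = c("Tokyo", "Delhi", "Shanghai", "Dhaka", "Sao Paulo", "Cairo"),
  pop_2024 = c(37115035, 33807403, 29867918, 23935652, 22806704, 22623874),
  pop_2023 = c(37194105, 32941309, 29210808, 23209616, 22619736, 22183201),
  growth   = c(-0.0021, 0.0263, 0.0225, 0.0313, 0.0083, 0.0199)
)

# Assumed definition: relative change from 2023 to 2024
recomputed <- (cities$pop_2024 - cities$pop_2023) / cities$pop_2023

# Largest gap between the recomputed and the reported rates;
# only rounding error should remain if the assumed definition is right
max_gap <- max(abs(recomputed - cities$growth))

# Simple mean: every city counts equally
simple_mean <- mean(cities$growth)

# Population-weighted mean: larger cities count more
weighted_mean <- weighted.mean(cities$growth, w = cities$pop_2023)

print(c(max_gap = max_gap, simple = simple_mean, weighted = weighted_mean))
```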
# Calculate correlation between 2023 and 2024 population.
correlation <- cor(data$Population..2023., data$Population..2024.)
correlation
## [1] 0.999896
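# I ran a correlation analysis between the population sizes in 2023 and 2024. The coefficient of about 0.9999 indicates an almost perfectly linear positive relationship: a city's 2023 population predicts its 2024 population very closely.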
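A correlation this close to 1 is expected when comparing population levels one year apart, because a city's size changes slowly relative to the differences between cities. The sketch below uses only the six cities from the `head(data)` output, so its numbers will differ from the full-dataset result. It computes the same Pearson correlation and then uses `cor.test()` to attach a confidence interval. The data frame and column names are illustrative.

```r
# Six rows copied from the head(data) output (not the full dataset)
cities <- data.frame(
  pop_2024 = c(37115035, 33807403, 29867918, 23935652, 22806704, 22623874),
  pop_2023 = c(37194105, 32941309, 29210808, 23209616, 22619736, 22183201)
)

# Pearson correlation of population levels
r_levels <- cor(cities$pop_2023, cities$pop_2024)

# cor.test() adds a p-value and a 95% confidence interval for the coefficient
test <- cor.test(cities$pop_2023, cities$pop_2024)

print(r_levels)
print(test$conf.int)
```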
# Divide dataset into two groups: positive growth and zero or negative growth
positive_growth <- subset(data, Growth.Rate > 0)
negative_growth <- subset(data, Growth.Rate <= 0)
result <- t.test(positive_growth$Population..2024., negative_growth$Population..2024.)
result
##
## Welch Two Sample t-test
##
## data: positive_growth$Population..2024. and negative_growth$Population..2024.
## t = -0.92518, df = 34.851, p-value = 0.3612
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3394842 1269499
## sample estimates:
## mean of x mean of y
## 2607893 3670565
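# I performed a Welch two-sample t-test comparing the 2024 population of cities with positive growth rates against those with zero or negative growth rates. The difference is not statistically significant (t = -0.93, p = 0.36), and the 95% confidence interval for the difference in means (-3,394,842 to 1,269,499) includes zero. In this sample the positive-growth cities actually have the smaller mean population (about 2.61 million versus 3.67 million), but the data do not provide evidence that the two groups differ in size.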
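Reading a t-test result comes down to two checks: whether the p-value is below the chosen significance level, and whether the confidence interval for the difference in means excludes zero. The sketch below applies both checks to the numbers printed in the test output, copied in by hand. In a live session the same values are available as `result$p.value`, `result$conf.int`, and `result$estimate`. The 0.05 threshold is a conventional choice assumed here, not something stated in the analysis.

```r
# Values copied from the Welch t-test output
p_value  <- 0.3612
conf_int <- c(-3394842, 1269499)   # 95% CI for mean(x) - mean(y)
mean_pos <- 2607893                # mean 2024 population, positive-growth cities
mean_neg <- 3670565                # mean 2024 population, zero-or-negative-growth cities

alpha <- 0.05                      # assumed significance level

significant      <- p_value < alpha
ci_excludes_zero <- conf_int[1] > 0 || conf_int[2] < 0
mean_difference  <- mean_pos - mean_neg

print(significant)        # FALSE
print(ci_excludes_zero)   # FALSE
print(mean_difference)    # -1062672
```

Both checks agree: the test gives no evidence of a difference, and the negative sign of the difference shows that the positive-growth group has the smaller sample mean.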