R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

library(ggplot2)
data <- read.csv("~/TranProject/mydataset.csv")
head(data)
##        City    Country     Continent Population..2024. Population..2023.
## 1     Tokyo      Japan          Asia          37115035          37194105
## 2     Delhi      India          Asia          33807403          32941309
## 3  Shanghai      China          Asia          29867918          29210808
## 4     Dhaka Bangladesh          Asia          23935652          23209616
## 5 Sao Paulo     Brazil South America          22806704          22619736
## 6     Cairo      Egypt        Africa          22623874          22183201
##   Growth.Rate
## 1     -0.0021
## 2      0.0263
## 3      0.0225
## 4      0.0313
## 5      0.0083
## 6      0.0199
# The first graph is a histogram showing the distribution of population growth rates across cities. This helps me understand whether most cities are growing or shrinking, and how the growth rates are distributed. 
ggplot(data, aes(x = Growth.Rate)) +
  geom_histogram(binwidth = 0.01, fill = "blue", color = "black") +
  labs(title = "Distribution of Population Growth Rate",
       x = "Growth Rate", y = "Frequency") +
  theme_minimal()
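
To put a number on whether most cities are growing or shrinking, the share of positive growth rates can be computed directly. The sketch below is only an illustration: it uses the six rows printed by `head(data)` above, and the names `growth`, `growing`, `n_growing`, and `share_growing` are assumptions introduced here. The full dataset would give different counts.

```r
# Growth rates of the six cities shown in the head(data) output above
growth <- c(Tokyo = -0.0021, Delhi = 0.0263, Shanghai = 0.0225,
            Dhaka = 0.0313, "Sao Paulo" = 0.0083, Cairo = 0.0199)

# TRUE for cities with a strictly positive growth rate
growing <- growth > 0

# Number and share of growing cities among these six rows
n_growing <- sum(growing)
share_growing <- mean(growing)

n_growing
share_growing

# Cities that are not growing (zero or negative growth rate)
names(growth)[!growing]
```

On the real data, the same three lines applied to `data$Growth.Rate` would give the counts that the histogram shows visually.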

ggplot(data, aes(x = Population..2023., y = Population..2024.)) +
  geom_point(color = "red") +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Population in 2023 vs Population in 2024",
       x = "Population in 2023", y = "Population in 2024") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

# In the scatter plot, I compared the populations of cities in 2023 and 2024. Each point represents a city, showing how its population changed over the year. Points that lie close to the diagonal indicate that cities with large populations in 2023 kept similar sizes in 2024. This plot helps me visualize the relationship between the two years and identify any significant outliers or trends in population change.
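
The year-over-year change behind each point can also be computed from the two population columns. The sketch below rebuilds the six rows printed by `head(data)` and assumes that `Growth.Rate` is the relative change `(pop2024 - pop2023) / pop2023`. The source does not state this formula, but it reproduces the six printed values after rounding to four decimals. The names `cities` and `recomputed` are introduced here for illustration.

```r
# The six rows shown in the head(data) output above
cities <- data.frame(
  City = c("Tokyo", "Delhi", "Shanghai", "Dhaka", "Sao Paulo", "Cairo"),
  Population..2024. = c(37115035, 33807403, 29867918,
                        23935652, 22806704, 22623874),
  Population..2023. = c(37194105, 32941309, 29210808,
                        23209616, 22619736, 22183201),
  Growth.Rate = c(-0.0021, 0.0263, 0.0225, 0.0313, 0.0083, 0.0199)
)

# Assumed definition: relative change from 2023 to 2024
recomputed <- (cities$Population..2024. - cities$Population..2023.) /
  cities$Population..2023.

# Compare the recomputed values with the Growth.Rate column
cities$Recomputed <- round(recomputed, 4)
cities[, c("City", "Growth.Rate", "Recomputed")]
```

Because one-year changes are only a few percent of each city's size, every point in the scatter plot sits very close to the diagonal.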

# Calculate mean growth rate
mean_growth_rate <- mean(data$Growth.Rate)
mean_growth_rate
## [1] 0.02005131
# The mean growth rate summarizes the overall trend in population change across the cities. A positive mean indicates that cities are growing on average, and a negative mean indicates that they are declining. Here the mean is about 0.020, so the cities in this dataset grew by roughly 2% on average between 2023 and 2024.
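
Note that `mean()` weights every city equally, regardless of its size. As a hedged illustration, the sketch below uses only the six rows printed by `head(data)` to compare the plain mean with the growth of the combined population. The names `pop_2023`, `pop_2024`, `city_growth`, `unweighted_mean`, and `aggregate_growth` are assumptions introduced here, and the growth formula is the relative change, which the source does not state explicitly.

```r
# Populations of the six cities shown in the head(data) output above
pop_2023 <- c(37194105, 32941309, 29210808, 23209616, 22619736, 22183201)
pop_2024 <- c(37115035, 33807403, 29867918, 23935652, 22806704, 22623874)

# Per-city relative change (assumed definition of the growth rate)
city_growth <- (pop_2024 - pop_2023) / pop_2023

# Plain mean: each city counts the same
unweighted_mean <- mean(city_growth)

# Growth of the combined population: large cities count more
aggregate_growth <- sum(pop_2024) / sum(pop_2023) - 1

unweighted_mean
aggregate_growth

# The aggregate figure equals the mean weighted by 2023 population
weighted.mean(city_growth, w = pop_2023)
```

Which of the two figures is more useful depends on whether the question is about the typical city or about the total number of people.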


# Calculate correlation between 2023 and 2024 population
correlation <- cor(data$Population..2023., data$Population..2024.)
correlation
## [1] 0.999896
# A correlation of about 0.9999 means that 2024 populations track 2023 populations almost perfectly. This is expected, because a city's size changes by only a few percent in a single year.
# Divide dataset into two groups: positive and negative growth
positive_growth <- subset(data, Growth.Rate > 0)
negative_growth <- subset(data, Growth.Rate <= 0)
result <- t.test(positive_growth$Population..2024., negative_growth$Population..2024.)
result
## 
##  Welch Two Sample t-test
## 
## data:  positive_growth$Population..2024. and negative_growth$Population..2024.
## t = -0.92518, df = 34.851, p-value = 0.3612
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3394842  1269499
## sample estimates:
## mean of x mean of y 
##   2607893   3670565
# The Welch t-test compares the mean 2024 population of cities with positive growth rates against that of cities with zero or negative growth rates. A p-value below 0.05 would indicate a significant difference in population size between the two groups. Here the p-value is 0.3612 and the 95% confidence interval (-3,394,842 to 1,269,499) includes zero, so the test finds no significant difference, even though the sample means differ (about 2.61 million for growing cities versus 3.67 million for the others).
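
The decision rule described above can also be applied in code, because `t.test()` returns an object whose `p.value` and `conf.int` components can be read directly. The sketch below is a minimal illustration on simulated data, not on the city dataset: the two groups are random log-normal draws, and the names `group_a`, `group_b`, `sim_result`, `p_value`, `ci`, and `significant` are assumptions introduced here.

```r
# Simulated, purely illustrative group sizes (NOT the city data)
set.seed(42)
group_a <- rlnorm(30, meanlog = 14.5, sdlog = 0.8)
group_b <- rlnorm(12, meanlog = 14.7, sdlog = 0.9)

# Welch two-sample t-test (the default, var.equal = FALSE)
sim_result <- t.test(group_a, group_b)

# Pull out the pieces needed for the decision
p_value <- sim_result$p.value
ci <- sim_result$conf.int

# Apply the 0.05 threshold
significant <- p_value < 0.05
if (significant) {
  message("Significant difference in means at the 5% level")
} else {
  message("No significant difference in means at the 5% level")
}
```

With the real analysis, the same components are available as `result$p.value` and `result$conf.int`.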

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.