This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the written content and the output of any embedded R code chunks. You can embed an R code chunk like this:
library(ggplot2)
data <- read.csv("~/TranProject/mydataset.csv")
head(data)
##        City    Country     Continent Population..2024. Population..2023. Growth.Rate
## 1     Tokyo      Japan          Asia          37115035          37194105     -0.0021
## 2     Delhi      India          Asia          33807403          32941309      0.0263
## 3  Shanghai      China          Asia          29867918          29210808      0.0225
## 4     Dhaka Bangladesh          Asia          23935652          23209616      0.0313
## 5 Sao Paulo     Brazil South America          22806704          22619736      0.0083
## 6     Cairo      Egypt        Africa          22623874          22183201      0.0199
# The first graph is a histogram showing the distribution of population growth rates across cities. This helps me understand whether most cities are growing or shrinking, and how the growth rates are distributed.
ggplot(data, aes(x = Growth.Rate)) +
  geom_histogram(binwidth = 0.01, fill = "blue", color = "black") +
  labs(title = "Distribution of Population Growth Rate",
       x = "Growth Rate", y = "Frequency") +
  theme_minimal()
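The histogram answers the growing-versus-shrinking question visually, and the same question can also be checked with a count. The following is a minimal sketch, assuming the six rows printed by head(data) stand in for the full dataset; the data frame name `cities` is hypothetical and not part of the original analysis.

```r
# Toy data: only the six rows shown by head(data), not the full dataset.
cities <- data.frame(
  City = c("Tokyo", "Delhi", "Shanghai", "Dhaka", "Sao Paulo", "Cairo"),
  Growth.Rate = c(-0.0021, 0.0263, 0.0225, 0.0313, 0.0083, 0.0199)
)

# Share of cities with a strictly positive growth rate.
share_growing <- mean(cities$Growth.Rate > 0)

# Counts of growing vs. not-growing cities.
growth_counts <- table(ifelse(cities$Growth.Rate > 0, "growing", "not growing"))

print(share_growing)
print(growth_counts)
```

On the full dataset, replacing `cities` with `data` would give the corresponding share for all cities.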
ggplot(data, aes(x = Population..2023., y = Population..2024.)) +
  geom_point(color = "red") +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Population in 2023 vs Population in 2024",
       x = "Population in 2023", y = "Population in 2024") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# The scatter plot compares each city's population in 2023 and 2024. Each point is a city, and the blue line is the linear fit. Points that lie close to the line indicate that cities with large populations in 2023 kept similar sizes in 2024. This plot helps me visualize the relationship between the two years and identify any significant outliers or trends in population change.
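Because the points sit so close to the line, large changes are hard to spot by eye. One way to surface them is to rank cities by their absolute year-over-year change. This is only a sketch, assuming the six rows printed by head(data) as toy input; the names `cities`, `Change`, and `ranked` are hypothetical.

```r
# Toy data: only the six rows shown by head(data), not the full dataset.
cities <- data.frame(
  City = c("Tokyo", "Delhi", "Shanghai", "Dhaka", "Sao Paulo", "Cairo"),
  Population..2024. = c(37115035, 33807403, 29867918, 23935652, 22806704, 22623874),
  Population..2023. = c(37194105, 32941309, 29210808, 23209616, 22619736, 22183201)
)

# Absolute change from 2023 to 2024; a city on the line y = x has change 0.
cities$Change <- cities$Population..2024. - cities$Population..2023.

# Rank cities by the size of the change, largest first.
ranked <- cities[order(abs(cities$Change), decreasing = TRUE), c("City", "Change")]
print(ranked)
```

Ranking by relative change (the Growth.Rate column) instead would favor smaller cities, so which ranking is more useful depends on the question being asked.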
# Calculate mean growth rate
mean_growth_rate <- mean(data$Growth.Rate)
mean_growth_rate
## [1] 0.02005131
# The mean growth rate summarizes the overall trend in population change across the cities. Here the mean is about 0.020, or roughly 2% per year, so on average the cities in this dataset are growing. A negative mean would instead have suggested that cities were generally losing population. Note that this is an unweighted mean, so each city counts equally regardless of its size.
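The Growth.Rate column appears to be the relative change between the two population columns, that is, Population 2024 / Population 2023 - 1. That definition is an assumption, since the dataset's documentation is not shown here, but it can be checked against the six rows printed by head(data). A minimal sketch, with `cities` and `computed` as hypothetical names:

```r
# Toy data: only the six rows shown by head(data), not the full dataset.
cities <- data.frame(
  City = c("Tokyo", "Delhi", "Shanghai", "Dhaka", "Sao Paulo", "Cairo"),
  Population..2024. = c(37115035, 33807403, 29867918, 23935652, 22806704, 22623874),
  Population..2023. = c(37194105, 32941309, 29210808, 23209616, 22619736, 22183201),
  Growth.Rate = c(-0.0021, 0.0263, 0.0225, 0.0313, 0.0083, 0.0199)
)

# Assumed definition: relative change from 2023 to 2024.
computed <- cities$Population..2024. / cities$Population..2023. - 1

# Compare with the reported column after rounding to four decimals.
matches <- isTRUE(all.equal(round(computed, 4), cities$Growth.Rate))
print(data.frame(City = cities$City, Reported = cities$Growth.Rate,
                 Computed = round(computed, 4)))
print(matches)
```

If the check holds on the full dataset as well, the mean above is the average of per-city relative changes, not the relative change of the combined population.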
# Calculate correlation between 2023 and 2024 population
correlation <- cor(data$Population..2023., data$Population..2024.)
correlation
## [1] 0.999896
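By default, cor() computes a Pearson correlation on the raw counts, which the largest cities tend to dominate. As a rough robustness check, the same relationship can be measured on a log scale or with a rank-based (Spearman) correlation. This sketch uses the six rows printed by head(data) as toy input, so its values will differ from the full-dataset result; the variable names are hypothetical.

```r
# Toy data: only the six rows shown by head(data), not the full dataset.
cities <- data.frame(
  Population..2024. = c(37115035, 33807403, 29867918, 23935652, 22806704, 22623874),
  Population..2023. = c(37194105, 32941309, 29210808, 23209616, 22619736, 22183201)
)

# Pearson correlation on raw counts (what cor() does by default).
pearson_raw <- cor(cities$Population..2023., cities$Population..2024.)

# Pearson correlation on log10 populations, which reduces the pull of the largest cities.
pearson_log <- cor(log10(cities$Population..2023.), log10(cities$Population..2024.))

# Spearman rank correlation: depends only on the ordering of the cities.
spearman <- cor(cities$Population..2023., cities$Population..2024., method = "spearman")

print(c(pearson_raw = pearson_raw, pearson_log = pearson_log, spearman = spearman))
```

A correlation this close to 1 mostly reflects that city sizes change little in one year, so it says more about persistence than about growth.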
# Divide dataset into two groups: positive growth and zero or negative growth
positive_growth <- subset(data, Growth.Rate > 0)
negative_growth <- subset(data, Growth.Rate <= 0)
result <- t.test(positive_growth$Population..2024., negative_growth$Population..2024.)
result
##
## Welch Two Sample t-test
##
## data: positive_growth$Population..2024. and negative_growth$Population..2024.
## t = -0.92518, df = 34.851, p-value = 0.3612
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3394842 1269499
## sample estimates:
## mean of x mean of y
## 2607893 3670565
# The Welch t-test compares the mean 2024 population of cities with positive growth rates against cities with zero or negative growth rates. A p-value below 0.05 would indicate a significant difference between the two groups. Here the p-value is 0.3612, and the 95% confidence interval for the difference in means (-3394842 to 1269499) includes zero, so the test does not show a significant difference. The sample means differ (about 2.61 million for growing cities versus 3.67 million for the others), but not by more than chance could explain in this sample.
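The decision rule described above can be read directly from the object that t.test() returns, instead of from the printed output. The sketch below uses made-up population values (in millions) purely for illustration; they are not taken from the dataset, and the names `pop_positive`, `pop_nonpositive`, and `alpha` are hypothetical.

```r
# Hypothetical 2024 populations in millions; illustrative values only.
pop_positive <- c(1.2, 2.5, 3.1, 0.9, 4.0, 2.2)
pop_nonpositive <- c(2.0, 5.5, 3.3, 6.1, 1.8)

# t.test() runs Welch's unequal-variance test by default.
result <- t.test(pop_positive, pop_nonpositive)

alpha <- 0.05
p_value <- result$p.value   # two-sided p-value
ci <- result$conf.int       # 95% confidence interval for the difference in means

# Two equivalent ways to state the decision for a two-sided test at the 5% level.
significant <- p_value < alpha
ci_contains_zero <- ci[1] <= 0 && ci[2] >= 0

cat("p-value:", p_value, "\n")
cat("significant at 5% level:", significant, "\n")
cat("95% CI contains zero:", ci_contains_zero, "\n")
```

With the real analysis, the same fields on `result` would give the 0.3612 p-value and the interval shown in the output above.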