Setting the working directory and loading the “healthylifestye21” dataset

setwd("~/Data 110 Folder")
healthylifestye21 <- read.csv("healthylifestye21.csv")

Loading libraries and viewing the data

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(dplyr)
view(healthylifestye21)

The structure of the data

str(healthylifestye21)

## 'data.frame':    44 obs. of  12 variables:
##  $ City                                  : chr  "Amsterdam" "Sydney" "Vienna" "Stockholm" ...
##  $ Rank                                  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Sunshine.hours.Annual.Avg..City.      : int  1858 2636 1884 1821 1630 1662 2769 1626 2591 1938 ...
##  $ Cost.of.a.bottle.of.water.City.       : chr  "$2.09" "$1.61" "$2.11" "$1.87" ...
##  $ Obesity.levels.Country.               : chr  "20.40%" "29.00%" "20.10%" "20.60%" ...
##  $ Life.expectancy.years...Country.      : num  81.2 82.1 81 81.8 79.8 80.4 83.2 80.6 82.2 81.7 ...
##  $ Pollution.Index.score...City.         : num  30.9 26.9 17.3 19.6 21.2 ...
##  $ Annual.avg..hours.worked              : int  1434 1712 1501 1452 1380 1540 1644 1386 1686 1670 ...
##  $ Happiness.levels.Country.             : num  7.44 7.22 7.29 7.35 7.64 7.8 5.87 7.07 6.4 7.23 ...
##  $ Outdoor.activities.Annual.Hours.City. : int  422 406 132 129 154 113 35 254 585 218 ...
##  $ Number.of.take.out.places.City.       : int  1048 1103 1008 598 523 309 539 1729 2344 788 ...
##  $ Cost.of.a.monthly.gym.membership.City.: chr  "$37.97" "$45.33" "$28.01" "$40.59" ...
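The structure above shows that the cost and obesity columns are stored as character strings ("$2.09", "20.40%"), so they cannot be plotted or averaged directly. A minimal sketch of one way to convert them, using a toy data frame built from the first four rows shown above; `lifestyle_sample` and `to_number()` are hypothetical names used only for illustration:

```r
# Toy data frame mirroring the first rows of the str() output above.
# `lifestyle_sample` and `to_number()` are hypothetical names for illustration.
lifestyle_sample <- data.frame(
  City = c("Amsterdam", "Sydney", "Vienna", "Stockholm"),
  Cost.of.a.bottle.of.water.City. = c("$2.09", "$1.61", "$2.11", "$1.87"),
  Obesity.levels.Country. = c("20.40%", "29.00%", "20.10%", "20.60%"),
  stringsAsFactors = FALSE
)

# Strip everything except digits, the decimal point, and a minus sign, then convert.
to_number <- function(x) as.numeric(gsub("[^0-9.-]", "", x))

lifestyle_sample$water_cost_usd <- to_number(lifestyle_sample$Cost.of.a.bottle.of.water.City.)
lifestyle_sample$obesity_pct <- to_number(lifestyle_sample$Obesity.levels.Country.)

str(lifestyle_sample[c("water_cost_usd", "obesity_pct")])
```

The original columns are left in place, so the formatted strings are still available for labels.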

Finding the top ten healthiest city ranks by subsetting columns

city_ranks10 <- subset(healthylifestye21, select = c("City", "Rank")) %>%
  head(10)
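The subset above relies on the file already being sorted by rank, because head() simply takes the first rows. As a hedged alternative, the sketch below picks the best ranks explicitly with dplyr's slice_min(); the `ranks_sample` data frame is a hypothetical, deliberately unsorted sample:

```r
library(dplyr)

# Hypothetical, deliberately unsorted sample; names and ranks follow the str() output above.
ranks_sample <- data.frame(
  City = c("Vienna", "Amsterdam", "Stockholm", "Sydney"),
  Rank = c(3L, 1L, 4L, 2L)
)

# Keep the two columns, then take the n best (lowest) ranks regardless of row order.
top2 <- ranks_sample %>%
  select(City, Rank) %>%
  slice_min(Rank, n = 2)

top2
```

With the full dataset, `n = 10` would play the role of `head(10)`.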

View the top ten list of healthiest cities

view(city_ranks10)

Creating a bar graph in order of top cities

ggplot(data = city_ranks10, aes(x = reorder(City, Rank), y = Rank, fill = City)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90)) +
  ggtitle("Top Ten Healthiest Cities") +
  xlab("City") +
  ylab("Rank")

Loading the janitor package

library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

Using the janitor package to clean up and format the column names

healthylifestye21 <- healthylifestye21 %>%
  janitor::clean_names()
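To see what clean_names() does to names like the ones in the structure output, here is a small sketch on a toy data frame; `raw_sample` and `cleaned_sample` are hypothetical names, and the two columns are taken from the str() output earlier in this report:

```r
library(janitor)

# Toy data frame with two of the raw column names from the str() output above.
raw_sample <- data.frame(
  City = c("Amsterdam", "Sydney"),
  Happiness.levels.Country. = c(7.44, 7.22)
)

# Lowercase the names and replace the dots with single underscores.
cleaned_sample <- clean_names(raw_sample)
names(cleaned_sample)
```

The cleaned names match the ones used in the later code, such as `city` and `happiness_levels_country`.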

Assigning happiness ratings to happiness levels

# Split the country-level happiness scores into three bands:
# low below 5, medium from 5 up to (but not including) 7, high 7 and above.
# The happiness_levels_country1 column itself is created in the next step.
scores <- healthylifestye21$happiness_levels_country
Low_Happiness    <- scores[scores < 5]
Medium_Happiness <- scores[scores >= 5 & scores < 7]
High_Happiness   <- scores[scores >= 7]

Only taking the top ten cities

happiness <- mutate(healthylifestye21, happiness_levels_country1 = happiness_levels_country)%>%
  head(10)
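The column created above is a copy of the numeric score. If a categorical low/medium/high column is wanted, one option is base R's cut(). The sketch below assumes the same cut-offs as the bands above (5 and 7) and uses the ten scores from the str() output; `happiness_band` is a hypothetical name:

```r
# Happiness scores of the top ten cities, copied from the str() output above.
scores <- c(7.44, 7.22, 7.29, 7.35, 7.64, 7.80, 5.87, 7.07, 6.40, 7.23)

# right = FALSE makes each interval closed on the left: [-Inf, 5), [5, 7), [7, Inf).
happiness_band <- cut(
  scores,
  breaks = c(-Inf, 5, 7, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE
)

table(happiness_band)
```

A factor like this could then be passed to facet_wrap() or used as a color, giving one panel or color per band rather than per distinct score.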

Is happiness associated with healthiness? Let us take a look

Graph of top 10 cities in regard to happiness

ggplot(data = happiness, mapping = aes(city, happiness_levels_country1, color = city)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90)) +
  facet_wrap(vars(happiness_levels_country1)) +
  ggtitle("Is Happiness Contributing to Health?") +
  ylab("Happiness Level Rating")

According to the data, eight of the ten healthiest cities have a happiness rating of 7 or higher; the cities ranked 7th and 9th score 5.87 and 6.4.

Creating a vector of latitudes for the top ten cities

latitude <- c(52.353218, -33.8696, 48.20254, 59.332787, 55.567576, 60.169807, 33.58325, 52.501408, 41.3875, 49.26038)

Creating a vector of longitudes for the top ten cities

longitude <- c(5.0027695, 151.20695, 16.3688, 18.064487, 12.569023, 24.938128, 130.3836, 13.4023285, 2.16835, -123.11336)

Adding column named “latitude”

city_ranks10['latitude'] <- latitude

Adding column named “longitude”

city_ranks10['longitude'] <- longitude
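Assigning the vectors by position works only while the coordinates are typed in exactly the same order as the rows. A hedged alternative is to keep the coordinates in a small lookup table and join on the city name. The sketch below uses the first four cities from the str() output; `ranks_sample`, `coords`, and `ranks_with_coords` are hypothetical names:

```r
library(dplyr)

# First four cities from the str() output, with the matching coordinates from the vectors above.
ranks_sample <- data.frame(
  City = c("Amsterdam", "Sydney", "Vienna", "Stockholm"),
  Rank = 1:4
)

# Hypothetical lookup table, listed in a different order on purpose.
coords <- data.frame(
  City = c("Stockholm", "Vienna", "Sydney", "Amsterdam"),
  latitude = c(59.332787, 48.20254, -33.8696, 52.353218),
  longitude = c(18.064487, 16.3688, 151.20695, 5.0027695)
)

# left_join() matches rows on City, so row order in `coords` does not matter.
ranks_with_coords <- left_join(ranks_sample, coords, by = "City")
ranks_with_coords
```

A city missing from the lookup table would show up as NA coordinates, which is easier to spot than a silently shifted row.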

View “city_ranks10”

view(city_ranks10)

Assigning map_data to “world”

world <- map_data("world")

View “world”

view(world)

Introducing a new dataset

setwd("~/Data 110 Folder")
Book2 <- read.csv("Book2.csv")

View new dataset with Longitude and Latitude

view(Book2)

Remove “Rank” from “city_ranks10”

city_ranks10 <- subset(city_ranks10, select = -Rank)

Combine both datasets

City_15 <- rbind(Book2, city_ranks10)
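rbind() on data frames only works when both have the same column names, which is why “Rank” was removed first. A minimal sketch of checking that before combining; `bottom_sample` stands in for Book2 and `top_sample` for city_ranks10, and the placeholder cities and coordinates in `bottom_sample` are invented for illustration because the contents of Book2 are not shown here:

```r
# Hypothetical stand-ins: `bottom_sample` plays the role of Book2, `top_sample` of city_ranks10.
# The placeholder cities and coordinates in `bottom_sample` are invented for illustration only.
bottom_sample <- data.frame(
  City = c("City A", "City B"),
  latitude = c(10, 20),
  longitude = c(30, 40)
)
top_sample <- data.frame(
  City = c("Amsterdam", "Sydney"),
  latitude = c(52.353218, -33.8696),
  longitude = c(5.0027695, 151.20695)
)

# rbind() on data frames requires the same set of column names, so check first.
stopifnot(setequal(names(bottom_sample), names(top_sample)))

combined <- rbind(bottom_sample, top_sample)
nrow(combined)
```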

Create a world map of top 10 cities

ggplot() +
  geom_map(
    data = world, map = world,
    aes(long, lat, map_id = region),
    color = "white", fill = "lightgray", size = 0.1
  ) +
  ggtitle("Where the Healthiest Cities Are Located") +
  geom_point(
    data = city_ranks10,
    aes(longitude, latitude, color = City),
    alpha = 0.7)

## Warning: Ignoring unknown aesthetics: x, y
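The warning appears because geom_map() positions regions through map_id and does not use x and y aesthetics. A hedged alternative sketch that draws the same outlines with geom_polygon(), which takes x and y directly and so should not trigger this warning; `cities_sample` and `map_plot` are hypothetical names, and the line width setting is left out because its argument name differs between ggplot2 versions:

```r
library(ggplot2)

# map_data() needs the 'maps' package to be installed.
world <- map_data("world")

# Two of the cities and coordinates from the vectors above.
cities_sample <- data.frame(
  City = c("Amsterdam", "Sydney"),
  latitude = c(52.353218, -33.8696),
  longitude = c(5.0027695, 151.20695)
)

# geom_polygon() understands x and y directly; `group` keeps each landmass separate.
map_plot <- ggplot() +
  geom_polygon(
    data = world,
    aes(x = long, y = lat, group = group),
    color = "white", fill = "lightgray"
  ) +
  geom_point(
    data = cities_sample,
    aes(x = longitude, y = latitude, color = City),
    alpha = 0.7
  ) +
  ggtitle("Where the Healthiest Cities Are Located")

map_plot
```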

View “City_15”

view(City_15)

Add new column “hours_worked”

City_15['hours_worked'] <- c(2137, 1965, 1779, 1779, 1718, 1434, 1712, 1501, 1452, 1380, 1540, 1644, 1386, 1686, 1670)

World map of hours worked for top 10 and lowest 5 cities

ggplot() +
  geom_map(
    data = world, map = world,
    aes(long, lat, map_id = region),
    color = "white", fill = "lightgray", size = 0.1
  ) +
  ggtitle("Does Work Affect Health?")+
  geom_point(
    data = City_15,
    aes(longitude, latitude, color = City, size = hours_worked), 
    alpha = 0.9
  )

## Warning: Ignoring unknown aesthetics: x, y

The dataset is from Kaggle.com and covers healthy-lifestyle cities around the world in 2021. It reports healthy-lifestyle metrics for 44 cities. The measures are: sunshine hours (city), cost of a bottle of water (city), obesity levels (country), life expectancy in years (country), pollution index score (city), annual average hours worked, happiness levels (country), outdoor activities in annual hours (city), number of take-out places (city), and cost of a monthly gym membership (city).

I cleaned the data in several ways. I converted the water bottle and gym membership costs from euros to US dollars so that the prices are easier to interpret. I also categorized the data by assigning the “happiness level” ratings to low, medium, and high groups. I reduced the data to the first 10 cities and the last 5 cities so that viewers can see the extremes of the ranking more clearly. I also removed and added columns to make new datasets. For example, for “city_ranks10” I took the city and rank of the top 10 healthiest cities and then added latitude and longitude columns for each city.

I had problems with the column names and kept getting error messages whenever I used them. To fix this, I used clean_names() from the “janitor” package, which returns names with only lowercase letters and “_” as a separator. It handles special characters and spaces, appends numbers to duplicated names, and converts “%” to “percent” to retain meaning. More generally, the main janitor functions format data frame column names, isolate partially duplicated records, and provide quick tabulations (frequency tables and crosstabs).
I chose this dataset because I wanted data that would let me evaluate and hypothesize about different causes or characteristics of a healthy city. The dataset includes both quantitative and categorical variables, which allows for a wider range of comparisons. Quantitative variables can be collected quickly, including from randomized samples, can reach larger groups, and are easy to replicate. They also let you focus on facts that do not require direct observation and can be anonymous, which makes the analysis easier to complete. Categorical data does not support the same kinds of statistical analysis as other data, but its results are concrete, without subjective open-ended questions.

The first visualization is a bar graph of the top ten cities and their rank. The rank combines many different ratings of what a healthy city has more or less of. For example, the highest-ranked cities have the lowest cost of water, the greatest amount of outdoor activity, the fewest hours worked, and the lowest obesity percentage.

The second visualization shows the level of happiness for each of those top ten cities. I assigned each happiness rating to a low, medium, or high level based on a scale I created. Most of the cities within the top ten rankings had high levels of happiness (a rating of 7 or above). I used facet wrap for the graph because I wanted viewers to see each city’s rating.

The third visualization is a map. It shows where the top ten healthiest cities are: most of them are in Europe, and only one is in North America. This could be attributed to Europe’s health care systems and attention to the environment (air and water quality).
These cities also have high levels of education, low unemployment rates, and general happiness. A city like Helsinki, Finland, has a fairly low number of cars, low pollution levels, and high life expectancy. Helsinki also has a great work-life balance (the city ranks 16th in the world) and a high number of annual vacation days. Some features of Vienna that aren’t displayed in the data are low crime rates, good public services, and a great doctor-to-citizen ratio. Berlin has proved to have a strong business ecosystem, including the number of jobs and the availability of essentials like housing, food, and transport. Openness in terms of gender equality and tolerance is also very high in the German capital. Amsterdam has one of the highest numbers of annual vacation days in the world. The city is also known for its high tolerance and diversity. Additionally, it has all of the advantages of a big city while remaining very compact. It is bike-friendly and has the second-highest number of electric car charging points in the world, which contributes to its low pollution rates.

My last visualization included the top 10 and lowest 5 ranked cities in the world. I think this visual was one of the more comprehensive ones that I made, since it combines 15 cities with average hours worked. The longest hours worked were in the Western hemisphere, which could be attributed to many things, such as culture and the overall importance placed on work-life balance.

In conclusion, for the last two visuals I looked up the coordinates (longitude and latitude) of the cities to make the maps, and I added longitude and latitude columns to the new datasets that I made. One thing that I could not get to work was a line graph, or some other type of graph, that could display every variable for each city in the data frame. I didn’t want to use different shapes or sizes to display the visual. I think it would be beneficial to include the top 10 and bottom 5 cities and see how they compare regarding the observable measures.
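One possible way to display several variables for each city without relying on shapes or sizes is to reshape the data to long form and give each measure its own panel. The sketch below is only an illustration under assumptions: it uses the first four cities from the str() output, the shortened column names and the object names (`measures_sample`, `measures_long`, `measures_plot`) are hypothetical, and the same idea would extend to the top 10 and bottom 5 cities.

```r
library(tidyr)
library(ggplot2)

# Values for the first four cities, copied from the str() output; column names are shortened.
measures_sample <- data.frame(
  city = c("Amsterdam", "Sydney", "Vienna", "Stockholm"),
  sunshine_hours = c(1858, 2636, 1884, 1821),
  life_expectancy = c(81.2, 82.1, 81.0, 81.8),
  annual_hours_worked = c(1434, 1712, 1501, 1452),
  happiness_level = c(7.44, 7.22, 7.29, 7.35)
)

# Reshape to one row per city and measure.
measures_long <- pivot_longer(
  measures_sample,
  cols = -city,
  names_to = "measure",
  values_to = "value"
)

# One panel per measure; free y scales because the units differ.
measures_plot <- ggplot(measures_long, aes(x = city, y = value, fill = city)) +
  geom_col() +
  facet_wrap(vars(measure), scales = "free_y") +
  theme(axis.text.x = element_text(angle = 90))

measures_plot
```

Free y scales keep measures with small ranges, such as happiness, readable next to measures in the thousands, at the cost of making bar heights not comparable across panels.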

Project 1 Healthy Cities

Nate Jack

3/8/2022
