Project 4

Author

Andrew George

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Machine-specific path: point this at the folder that contains upwork-jobs.csv
setwd("C:/Users/andre/Downloads/project 4")
jobs <- read_csv("upwork-jobs.csv")
Rows: 53058 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): title, link, description, country
dbl  (3): hourly_low, hourly_high, budget
lgl  (1): is_hourly
dttm (1): published_date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Removing rows with missing values (NAs) in country or is_hourly

jobs2 <- jobs |>
  filter(!is.na(country) & !is.na(is_hourly)) 
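As a quick check of what this filter does, here is a minimal sketch on a small made-up tibble (the values are hypothetical, not from upwork-jobs.csv). It keeps only the rows where both country and is_hourly are present, and tidyr's drop_na() restricted to the same two columns should return the same rows.

```r
library(dplyr)
library(tidyr)

## Hypothetical toy data, not taken from upwork-jobs.csv
toy <- tibble(
  country   = c("India", NA, "France", "Canada"),
  is_hourly = c(TRUE, FALSE, NA, FALSE)
)

## Keep rows where neither column is missing (same condition as jobs2)
kept_filter <- toy |>
  filter(!is.na(country) & !is.na(is_hourly))

## drop_na() limited to the same two columns
kept_drop_na <- toy |>
  drop_na(country, is_hourly)

print(kept_filter)
```

Only the India and Canada rows survive; the other two each have one missing value.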

Task 3: “Identify countries with the highest number of job postings”

jobs2 |>
  count(country, sort = TRUE)
# A tibble: 181 × 2
   country                  n
   <chr>                <int>
 1 United States        17831
 2 United Kingdom        3701
 3 India                 3095
 4 Australia             2211
 5 Canada                1971
 6 Pakistan              1150
 7 Germany                924
 8 Netherlands            763
 9 United Arab Emirates   714
10 France                 614
# ℹ 171 more rows

This shows that the United States has by far the most job postings on Upwork over the past 2 weeks, followed by the United Kingdom, India, Australia, Canada, Pakistan, Germany, the Netherlands, the UAE and France, which together make up the top 10.

jobs3 <- jobs2 |>
  filter(country %in% c("United States", "United Kingdom", "India", "Australia", "Canada", "Pakistan", "Germany", "Netherlands", "United Arab Emirates", "France"))
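Typing the ten country names by hand works, but the list has to be edited whenever the data changes. A minimal sketch of deriving it from the counts instead, using a small hypothetical tibble and a hypothetical helper name, top_n_countries():

```r
library(dplyr)

## Hypothetical toy postings, one row per job (not the real data)
toy_jobs <- tibble(
  country = c(rep("United States", 5), rep("India", 3),
              rep("France", 2), "Pakistan")
)

## Hypothetical helper: count postings per country, keep the n largest,
## and return just the country names
top_n_countries <- function(df, n = 10) {
  df |>
    count(country, sort = TRUE) |>
    slice_head(n = n) |>
    pull(country)
}

top3 <- top_n_countries(toy_jobs, n = 3)
print(top3)

## Subset the postings to those countries, mirroring how jobs3 is built
toy_top <- toy_jobs |>
  filter(country %in% top3)
```

On the real data this would be something like top10 <- top_n_countries(jobs2) followed by filter(country %in% top10). Note that slice_head() keeps exactly n rows, so a tie at the cut-off would silently drop a country; slice_max(n, n = 10) keeps ties instead.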

Plot of the 10 countries with the most job postings, split into hourly and non-hourly jobs

ggplot(jobs3) +
  geom_bar(aes(x = country, fill = is_hourly), position = "dodge") +
  ## fill each bar by whether the job is hourly or not
  coord_flip() +
  labs(title = "Top ten countries of job postings on Upwork",
       x = "Country",
       y = "Number of job postings", 
       fill = "Hourly",
       caption = "Source: Upwork Job Postings Dataset 2024")
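Exact counts are hard to read off a dodged bar chart. A minimal sketch of the table behind such a chart, on hypothetical toy data; the "hourly"/"not_hourly" labels are my own, added only for readability.

```r
library(dplyr)
library(tidyr)

## Hypothetical toy postings, not the real data
toy_jobs <- tibble(
  country   = c("India", "India", "India", "France", "France"),
  is_hourly = c(TRUE, FALSE, TRUE, FALSE, FALSE)
)

hourly_counts <- toy_jobs |>
  ## Readable labels in place of TRUE/FALSE
  mutate(type = if_else(is_hourly, "hourly", "not_hourly")) |>
  ## One row per country and job type, i.e. one bar in the plot
  count(country, type) |>
  ## One column per job type; combinations with no postings become 0
  pivot_wider(names_from = type, values_from = n, values_fill = 0L)

print(hourly_counts)
```

Here France has 2 non-hourly and 0 hourly postings, and India has 1 non-hourly and 2 hourly.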

Another task I will explore is the hourly rate for jobs in each country, estimated as the average of each job's hourly_low and hourly_high.

jobs4 <- jobs3 |>
  filter(is_hourly == TRUE) |>
  mutate(
    avg_rate = (hourly_low + hourly_high) / 2
  )
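The midpoint (hourly_low + hourly_high) / 2 is NA whenever either bound is missing, so those jobs drop out later at the avg_rate < 50 filter. A minimal sketch on hypothetical toy rates; the rowMeans(..., na.rm = TRUE) column is an alternative added here for comparison, not what the analysis above uses.

```r
library(dplyr)

## Hypothetical hourly bounds in dollars, not from the real data
toy_rates <- tibble(
  hourly_low  = c(15, 30, NA),
  hourly_high = c(25, 60, 40)
)

toy_rates <- toy_rates |>
  mutate(
    ## Same formula as jobs4: NA if either bound is missing
    avg_rate = (hourly_low + hourly_high) / 2,
    ## Alternative (not used above): average whichever bounds are present
    avg_rate_any = rowMeans(cbind(hourly_low, hourly_high), na.rm = TRUE)
  )

print(toy_rates)
```

This gives avg_rate values of 20, 45 and NA, while avg_rate_any gives 20, 45 and 40. If both bounds were missing, rowMeans() with na.rm = TRUE would return NaN rather than NA.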

Specifically, I look at hourly jobs with an average rate below $50 per hour in the top ten countries by number of job postings.

jobs5 <- jobs4 |>
  filter(avg_rate < 50) |>
  group_by(country) |>
  ## Take a random sample of up to 50 postings per country to see how the results vary
  slice_sample(n = 50)
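Because slice_sample() draws at random, the boxplot changes every time the document is rendered. A minimal sketch on hypothetical toy data showing that calling set.seed() first makes the draw repeatable, and that a country with fewer than n postings simply keeps all of them. The seed value 42 and the helper name sample_rates() are arbitrary choices for illustration.

```r
library(dplyr)

## Hypothetical toy rates, not from the real data
toy <- tibble(
  country  = c(rep("India", 6), rep("France", 2)),
  avg_rate = c(10, 12, 15, 20, 22, 30, 18, 40)
)

## Hypothetical helper: fix the RNG seed, then sample up to n rows per country
sample_rates <- function(df, n, seed) {
  set.seed(seed)
  df |>
    group_by(country) |>
    slice_sample(n = n) |>
    ungroup()
}

s1 <- sample_rates(toy, n = 3, seed = 42)
s2 <- sample_rates(toy, n = 3, seed = 42)

## Same seed, same sample
print(identical(s1, s2))

## India is sampled down to 3; France only has 2 rows, so it keeps both
print(count(s1, country))
```

In the analysis above, this would mean adding a line such as set.seed(42) just before the slice_sample() pipeline.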

Plot of the sampled average hourly rates by country

jobs5 |>
  ggplot(aes(country, avg_rate, fill = country)) +
  geom_boxplot() +
  coord_flip() +
  labs(title = "Average hourly rate in the top 10 countries for jobs paying under $50/hour",
       subtitle = "Random sample of up to 50 hourly postings per country",
       x = "Country",
       y = "Average hourly rate (USD)",
       fill = "Country",
       caption = "Source: Upwork Job Postings Dataset 2024")
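The boxes summarise each country's quartiles. A minimal sketch of computing those numbers directly, on hypothetical toy data, in case the exact values are wanted alongside the plot. It uses quantile()'s default method, which as far as I know is also what ggplot2's boxplot statistic uses, so the values should line up with the box hinges.

```r
library(dplyr)

## Hypothetical sampled rates, not from the real data
toy <- tibble(
  country  = c("India", "India", "India", "France", "France", "France"),
  avg_rate = c(10, 20, 30, 15, 25, 45)
)

## Per-country sample size, quartiles and median
rate_summary <- toy |>
  group_by(country) |>
  summarise(
    n      = n(),
    q1     = unname(quantile(avg_rate, 0.25)),
    median = median(avg_rate),
    q3     = unname(quantile(avg_rate, 0.75)),
    .groups = "drop"
  )

print(rate_summary)
```

Applied to jobs5, the same summarise() call would give one row per country.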