DAT 301 Midterm: Data Analysis

2025-03-30

Data

The dataset covers freelance jobs and earnings, with each entry representing one freelance worker. Some fields describe the work, such as job category and platform, and others describe the worker, such as experience level, total earnings (USD), and job success rate. The dataset has 1950 entries and 15 fields. (Source: kaggle.com | User: Shohinur Pervez Shohan)

The dataset was already clean, so no further cleaning or filtering was necessary. The absence of missing values and duplicated rows was confirmed with the following code:

sum(is.na(df))

## [1] 0

sum(duplicated(df))

## [1] 0
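If either total had been non-zero, a per-column breakdown would show where the problem sits. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not taken from the dataset); with the real data, `df` would take its place.

```r
# Hypothetical stand-in data with one missing value and one repeated row
toy <- data.frame(
  Freelancer_ID = c(1, 2, 3, 3),
  Hourly_Rate   = c(95.79, NA, 85.17, 85.17)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Row numbers that repeat an earlier row exactly
duplicate_rows <- which(duplicated(toy))
print(duplicate_rows)
```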

Data

3 Samples

##   Freelancer_ID    Job_Category Platform Experience_Level Client_Region
## 1             1 Web Development   Fiverr         Beginner          Asia
## 2             2 App Development   Fiverr         Beginner     Australia
## 3             3 Web Development   Fiverr         Beginner            UK

##   Payment_Method Job_Completed Earnings_USD Hourly_Rate Job_Success_Rate
## 1 Mobile Banking           180         1620       95.79            68.73
## 2 Mobile Banking           218         9078       86.38            97.54
## 3         Crypto            27         3455       85.17            86.60

##   Client_Rating Job_Duration_Days Project_Type Rehire_Rate Marketing_Spend
## 1          3.18                 1        Fixed       40.19              53
## 2          3.44                54        Fixed       36.53             486
## 3          4.20                46       Hourly       74.05             489

Categories of Jobs

The job categories are almost evenly represented in the data. This suggests the sample was not drawn at random, but was selected so that each job category has roughly equal representation.
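The evenness can also be checked numerically rather than only by eye. This is a minimal sketch using a small made-up data frame (`toy` and its category values are hypothetical); with the real data, `df$Job_Category` would be tabulated instead.

```r
# Hypothetical stand-in for the Job_Category column of the real data
toy <- data.frame(
  Job_Category = c("Web Development", "App Development", "Web Development",
                   "Data Entry", "App Development", "Data Entry")
)

# Number of workers in each job category
category_counts <- table(toy$Job_Category)
print(category_counts)

# Share of the dataset that each category makes up
category_share <- prop.table(category_counts)
print(round(category_share, 3))
```

The same two calls applied to the `Platform` column would give the equivalent breakdown by marketplace.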

Spread of the Marketplace

Once again we see a fairly even distribution, this time across the five included platforms. This may be a reflection of market trends or an attempt to get samples that cover a wide variety of marketplaces.

Jobs Completed, Hourly Rate, and Rehire Rate

Unexpectedly, no clear pattern emerges when comparing the number of jobs completed, hourly rate, and rehire rate. A correlation heatmap may be better suited to spotting a relationship visually.
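One way to follow up on the heatmap idea is to compute the pairwise correlations first and then plot them. The sketch below uses randomly generated stand-in values (`toy` is hypothetical and says nothing about the real data); the column names match the dataset, so `df` could be substituted directly.

```r
# Hypothetical stand-in values; with the real data, use df in place of toy
set.seed(1)
toy <- data.frame(
  Job_Completed = sample(5:300, 50, replace = TRUE),
  Hourly_Rate   = runif(50, 5, 100),
  Rehire_Rate   = runif(50, 10, 80)
)

# Pairwise Pearson correlations between the three numeric fields
cor_matrix <- cor(toy[, c("Job_Completed", "Hourly_Rate", "Rehire_Rate")])
print(round(cor_matrix, 2))

# Base R heatmap of the correlation matrix, without reordering or rescaling
heatmap(cor_matrix, Rowv = NA, Colv = NA, scale = "none")
```

Values near 0 off the diagonal would support the impression that the three fields are largely unrelated.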

Hourly Rates of Different Job Categories

While the average hourly rate differs from job to job, I was curious to see whether it also differs between project types within each job category. For most job categories, fixed projects appear to have a higher average hourly rate than hourly projects. This may be something to consider, depending on the type of job a freelance worker takes.
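The group averages behind a comparison like this can be computed with `tapply()`. This is a minimal sketch on made-up numbers (`toy` and its values are hypothetical and chosen only to make the arithmetic easy to follow); the column names match the dataset.

```r
# Hypothetical stand-in data: two job categories, two project types
toy <- data.frame(
  Job_Category = rep(c("Web Development", "App Development"), each = 4),
  Project_Type = rep(c("Fixed", "Fixed", "Hourly", "Hourly"), times = 2),
  Hourly_Rate  = c(60, 80, 50, 70,   # Web Development
                   40, 50, 55, 45)   # App Development
)

# Mean hourly rate for every job category / project type combination
group_means <- tapply(toy$Hourly_Rate,
                      list(toy$Job_Category, toy$Project_Type),
                      mean)
print(group_means)

# Positive values mean fixed projects average a higher rate in that category
fixed_minus_hourly <- group_means[, "Fixed"] - group_means[, "Hourly"]
print(fixed_minus_hourly)
```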

T-Test on Hourly Rates of Project Types

To test whether the mean hourly rate differs significantly between project types, I will perform a two-sample T-Test using the t.test() function, which runs Welch's T-Test (unequal variances) by default.

ttest <- t.test(Hourly_Rate ~ Project_Type, data = df)

## data:  Hourly_Rate by Project_Type

## confidence level: 95%

## p-value:  0.128673

## confidence interval:  -0.5375771 4.241801

T-Test Results

We can look at two pieces of information from the T-Test output to determine significance. First is the p-value. Since the p-value (0.129) is greater than our alpha (0.05), we fail to reject the null hypothesis, so the difference is not statistically significant. Second, the 95% confidence interval includes 0, which also tells us the difference in means is not statistically significant. Based on these results, the data give no evidence that project type (fixed versus hourly) affects hourly rate, so it need not be a deciding factor when looking for higher hourly rates.
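Both checks can be read directly from the object that t.test() returns, through its `p.value` and `conf.int` components. The sketch below runs on randomly generated stand-in data (`toy` is hypothetical, so its results are unrelated to the ones reported above) and shows that the two checks always agree for a two-sided test at the matching confidence level.

```r
# Hypothetical stand-in data; with the real data, use df in place of toy
set.seed(42)
toy <- data.frame(
  Project_Type = rep(c("Fixed", "Hourly"), each = 30),
  Hourly_Rate  = c(rnorm(30, mean = 55, sd = 25),
                   rnorm(30, mean = 52, sd = 25))
)

# Welch two-sample t-test (the default for t.test)
ttest <- t.test(Hourly_Rate ~ Project_Type, data = toy)

alpha <- 0.05

# Check 1: is the p-value below alpha?
p_significant <- ttest$p.value < alpha

# Check 2: does the 95% confidence interval exclude 0?
ci_excludes_zero <- ttest$conf.int[1] > 0 || ttest$conf.int[2] < 0

print(c(p_significant = p_significant, ci_excludes_zero = ci_excludes_zero))
```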