Use of AI:
Acknowledgement: We used Copilot and ChatGPT to assist us with this assignment.
How We Used AI:
• AI was used to help us understand and debug coding errors.
• We used AI to generate simple code snippets, which we then adapted to fit our dataset and analysis.
• AI assisted in answering specific technical questions related to R programming, such as data visualisation and cleaning techniques.
How We Did Not Use AI:
• AI was not used to replace critical thinking or analysis.
• We did not use AI to rewrite sections of our assignment for clarity or grammar.
• AI was not used to generate content beyond our understanding; all interpretations and explanations are our own.
• All conceptualisation, critical analysis, and final edits were conducted by the authors to ensure accuracy and adherence to academic integrity. Any errors or misinterpretations remain the responsibility of the authors.
Prompts We Used:
1. “How to add percentages and labels to pie charts in RStudio?”
values <- c(30, 20, 50) # Example values
labels <- c("A", "B", "C") # Labels
percentages <- round(values / sum(values) * 100, 1) # Calculate percentages
labels_with_percent <- paste(labels, percentages, "%") # Combine labels with percentages
# Create pie chart
pie(values, labels = labels_with_percent, main = "Pie Chart with Percentages")
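2. “How to use data cleaning to get more accurate data in RStudio?”
# Remove outliers (assumes lower_bound and upper_bound have already been
# computed, e.g. with the 1.5 * IQR rule, and that data has a column named values)
clean_data <- subset(data, values >= lower_bound & values <= upper_bound)
# Print cleaned data
print(clean_data)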
3. “How to add colors to graphs?”
values <- c(30, 20, 50)
labels <- c("A", "B", "C")
# Define colors
colors <- c("skyblue", "lightgreen", "salmon")
# Create pie chart with colors
pie(values, labels = labels, col = colors, main = "Colored Pie Chart")
4. “How to Data clean to removing unnecessary data”
library(dplyr)
# Drop every row in which any column contains the text "value" (case-insensitive)
data1_clean <- data1 %>%
  filter(!if_any(everything(), ~ grepl("value", ., ignore.case = TRUE)))
5. “How to make a Comparative histogram”
Limits + IQR for the comparative histogram
# Load necessary libraries
library(ggplot2)
library(dplyr)
# Example dataset
set.seed(123)
data1_clean <- data.frame(
  student_type = rep(c("Full-time", "Part-time"), each = 50),
  hours_work = c(rnorm(50, mean = 10, sd = 5), rnorm(50, mean = 20, sd = 7))
)
# Compute summary statistics per student type (Q1 and Q3 give the IQR)
summary_stats <- data1_clean %>%
  group_by(student_type) %>%
  summarise(
    Mean = mean(hours_work, na.rm = TRUE),
    Median = median(hours_work, na.rm = TRUE),
    Q1 = quantile(hours_work, 0.25, na.rm = TRUE),
    Q3 = quantile(hours_work, 0.75, na.rm = TRUE)
  )
print(summary_stats) # Display the summary statistics
# Create Comparative Histogram
# Note: values outside the axis limits are dropped from the plot (with a warning)
ggplot(data1_clean, aes(x = hours_work, fill = student_type)) +
  geom_histogram(alpha = 0.6, position = "identity", bins = 15) +
  scale_x_continuous(limits = c(0, 30), breaks = seq(0, 30, by = 5)) +
  scale_y_continuous(limits = c(0, 20), breaks = seq(0, 20, by = 5)) +
  labs(title = "Comparative Histogram of Hours Worked", x = "Hours Worked", y = "Count", fill = "Student Type") +
  theme_minimal()
6. “How to remove outliers from the scatterplot”
# Function to flag outliers: returns TRUE for values to keep, FALSE for outliers
remove_outliers <- function(x) {
  Q1 <- quantile(x, 0.25, na.rm = TRUE)
  Q3 <- quantile(x, 0.75, na.rm = TRUE)
  IQR_value <- Q3 - Q1
  lower_bound <- Q1 - 1.5 * IQR_value
  upper_bound <- Q3 + 1.5 * IQR_value
  return(x >= lower_bound & x <= upper_bound)
}
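The function above returns a logical vector rather than a filtered dataset, so it still has to be applied to the data before plotting. The following is a minimal sketch of one way to do that, not the exact code used in the assignment: the data frame df, the column hours_study, and the two injected extreme values are hypothetical and exist only for illustration.

```r
# Hypothetical example data; df and hours_study are illustrative names only
set.seed(123)
df <- data.frame(
  hours_work  = c(rnorm(48, mean = 15, sd = 4), 60, 75), # two injected extreme values
  hours_study = rnorm(50, mean = 20, sd = 5)
)

# Same 1.5 * IQR rule as the function above: TRUE marks values to keep
remove_outliers <- function(x) {
  Q1 <- quantile(x, 0.25, na.rm = TRUE)
  Q3 <- quantile(x, 0.75, na.rm = TRUE)
  IQR_value <- Q3 - Q1
  x >= Q1 - 1.5 * IQR_value & x <= Q3 + 1.5 * IQR_value
}

# Keep only rows where both variables fall inside their own bounds
keep <- remove_outliers(df$hours_work) & remove_outliers(df$hours_study)
df_clean <- df[keep, ]

# Scatterplot of the filtered data
plot(df_clean$hours_work, df_clean$hours_study,
     xlab = "Hours Worked", ylab = "Hours Studied",
     main = "Scatterplot without Outliers")
```

Filtering on both variables at once removes a row if either value is an outlier, which keeps the x and y values paired correctly in the scatterplot.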
7. “How to create a Plot with color customization and scale limits”
ggplot(data, aes(x = hours_work)) +
  geom_histogram(binwidth = 2, fill = "skyblue", color = "black", alpha = 0.7) +
  scale_x_continuous(limits = c(10, 30), breaks = seq(10, 30, by = 5)) + # Set X-axis limits
  scale_y_continuous(limits = c(0, 20), breaks = seq(0, 20, by = 5)) + # Set Y-axis limits
  labs(title = "Histogram of Hours Worked", x = "Hours Worked", y = "Frequency") +
  theme_minimal()