library(readr)
library(dplyr)
library(kableExtra) #enhance and customize tables
Attaching package: ‘kableExtra’
The following object is masked from ‘package:dplyr’:
group_rows
library(magrittr) #provides pipe operator
Attaching package: ‘magrittr’
The following object is masked from ‘package:purrr’:
set_names
The following object is masked from ‘package:tidyr’:
extract
Student name | Student number | Percentage of contribution |
---|---|---|
Chamudi Abeysinghe | s4150303 | 100% |
According to Akshay Kumar (2025), the data set describes real-world trends in children’s screen time usage. It includes data on educational, recreational, and total screen time for children aged 5 to 15 years, with breakdowns by gender (Male, Female, Other/Prefer not to say) and day type (Weekday, Weekend).
Available at: https://www.kaggle.com/datasets/ak0212/average-daily-screen-time-for-children
Variable Description:
Age: Age of the child, between 5 and 15 years. Helps analyze how screen time habits change as children grow older.
Gender: The gender of the child (Male, Female, Other/Prefer not to say). Allows exploration of gender-based differences in screen time behavior.
Screen Time Type: The category of screen use being measured (educational, recreational, or total screen time).
Day Type: Indicates whether the data was collected for a weekday or weekend. Helps analyze how screen time varies between school days and off days.
Average Screen Time (hours): The average number of hours spent on screens per day for the given category.
Sample Size: Number of survey respondents represented by the data point. Larger sample sizes usually mean more reliable data.
getwd()
[1] "/Users/chamudi/Desktop/RMIT/1st Year 1st Semester/Data Wrangling/Assignments/Updated Assignment 1"
screen_time_ds <- read.csv('Datasets/screen_time.csv')
head(screen_time_ds)
str(screen_time_ds) #to show the structure of the data (types, columns, etc.)
'data.frame': 198 obs. of 6 variables:
$ Age : int 5 5 5 5 5 5 5 5 5 5 ...
$ Gender : chr "Male" "Male" "Male" "Male" ...
$ Screen.Time.Type : chr "Educational" "Recreational" "Total" "Educational" ...
$ Day.Type : chr "Weekday" "Weekday" "Weekday" "Weekend" ...
$ Average.Screen.Time..hours.: num 0.44 1.11 1.55 0.5 1.44 1.93 0.49 0.96 1.45 0.5 ...
$ Sample.Size : int 500 500 500 500 500 500 500 500 500 500 ...
summary(screen_time_ds) #to get descriptive statistics
Age Gender Screen.Time.Type Day.Type
Min. : 5 Length:198 Length:198 Length:198
1st Qu.: 7 Class :character Class :character Class :character
Median :10 Mode :character Mode :character Mode :character
Mean :10
3rd Qu.:13
Max. :15
Average.Screen.Time..hours. Sample.Size
Min. :0.440 Min. :300
1st Qu.:1.403 1st Qu.:340
Median :2.490 Median :400
Mean :2.993 Mean :400
3rd Qu.:4.397 3rd Qu.:460
Max. :8.190 Max. :500
dim(screen_time_ds) #number of rows and columns
[1] 198 6
names(screen_time_ds) #column names
[1] "Age" "Gender"
[3] "Screen.Time.Type" "Day.Type"
[5] "Average.Screen.Time..hours." "Sample.Size"
First, I downloaded a CSV data set from Kaggle and loaded the libraries. I then checked the current working directory with getwd(). Using the read.csv() function, I imported the data set into an object called ‘screen_time_ds’. Next, I printed the first few rows of the data frame with head() to confirm that the data was read correctly.
Besides head(), I also used str(), summary(), dim() and names() to inspect the data.
In R, when we call read.csv() and assign the result to an object (like screen_time_ds), the data set is stored as a data frame. A data frame is an object that holds tabular data, and read.csv() returns one by default.
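The dotted column names seen in the output above can be reproduced with a small sketch. It writes a two-row CSV to a temporary file and reads it back. The file contents and the names `tmp_csv` and `demo_ds` are made up for illustration, and the header text is only assumed to match the column titles in the Kaggle file.

```r
# Hypothetical two-row CSV written to a temporary file (not the real data set)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("Age,Average Screen Time (hours)",
             "5,0.44",
             "6,0.51"), tmp_csv)

# read.csv() returns a data frame; with the default check.names = TRUE,
# spaces and brackets in the header are replaced by dots
demo_ds <- read.csv(tmp_csv)

is.data.frame(demo_ds)   # TRUE
names(demo_ds)           # "Age" "Average.Screen.Time..hours."
```

Passing check.names = FALSE to read.csv() would keep the original header text, at the cost of needing backticks around such names in later code.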
#Check the dimensions of the data frame.
dim(screen_time_ds)
[1] 198 6
#Check the column names in the data frame, rename them if required.
colnames(screen_time_ds)
[1] "Age" "Gender"
[3] "Screen.Time.Type" "Day.Type"
[5] "Average.Screen.Time..hours." "Sample.Size"
screen_time_ds <- screen_time_ds %>%
rename(
Screen_Type = Screen.Time.Type,
Day_Type = Day.Type,
Avg_Screen_Time = Average.Screen.Time..hours.,
Sample_Size = Sample.Size
)
#Summarise the types of variables
sapply(screen_time_ds, class) #https://stackoverflow.com/questions/21125222/determine-the-data-types-of-a-data-frames-columns
Age Gender Screen_Type Day_Type Avg_Screen_Time
"integer" "character" "character" "character" "numeric"
Sample_Size
"integer"
str(screen_time_ds)
'data.frame': 198 obs. of 6 variables:
$ Age : int 5 5 5 5 5 5 5 5 5 5 ...
$ Gender : chr "Male" "Male" "Male" "Male" ...
$ Screen_Type : chr "Educational" "Recreational" "Total" "Educational" ...
$ Day_Type : chr "Weekday" "Weekday" "Weekday" "Weekend" ...
$ Avg_Screen_Time: num 0.44 1.11 1.55 0.5 1.44 1.93 0.49 0.96 1.45 0.5 ...
$ Sample_Size : int 500 500 500 500 500 500 500 500 500 500 ...
#Check the levels of factor variables, rename/rearrange them if required.
levels(screen_time_ds$Age)
NULL
levels(screen_time_ds$Gender)
NULL
levels(screen_time_ds$Screen_Type)
NULL
levels(screen_time_ds$Day_Type)
NULL
levels(screen_time_ds$Avg_Screen_Time)
NULL
levels(screen_time_ds$Sample_Size)
NULL
#convert 'Gender' and 'Day_Type' character column into factor variable
screen_time_ds$Gender <- as.factor(screen_time_ds$Gender)
screen_time_ds$Day_Type <- as.factor(screen_time_ds$Day_Type)
#Check the levels of factor variables after converting
levels(screen_time_ds$Gender)
[1] "Female" "Male" "Other/Prefer not to say"
levels(screen_time_ds$Day_Type)
[1] "Weekday" "Weekend"
str(screen_time_ds)
'data.frame': 198 obs. of 6 variables:
$ Age : int 5 5 5 5 5 5 5 5 5 5 ...
$ Gender : Factor w/ 3 levels "Female","Male",..: 2 2 2 2 2 2 1 1 1 1 ...
$ Screen_Type : chr "Educational" "Recreational" "Total" "Educational" ...
$ Day_Type : Factor w/ 2 levels "Weekday","Weekend": 1 1 1 2 2 2 1 1 1 2 ...
$ Avg_Screen_Time: num 0.44 1.11 1.55 0.5 1.44 1.93 0.49 0.96 1.45 0.5 ...
$ Sample_Size : int 500 500 500 500 500 500 500 500 500 500 ...
First, I checked the dimensions of the data frame with the dim() function, which reported 198 observations and 6 variables.
Secondly, I checked the column names with the colnames() function. I decided to rename the “Screen.Time.Type”, “Day.Type”, “Average.Screen.Time..hours.” and “Sample.Size” columns, replacing long and awkward names with cleaner versions. To do this, I piped the data frame into rename() and saved the result back into the same object, screen_time_ds.
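For comparison, the same kind of renaming can be sketched in base R without dplyr. The data frame `toy_ds` and its values are invented for this example; only the column names mirror the ones above.

```r
# Toy data frame with the same kind of awkward names (values are made up)
toy_ds <- data.frame(Day.Type = c("Weekday", "Weekend"),
                     Sample.Size = c(500L, 480L))

# Base R alternative to dplyr::rename(): overwrite the matching names
names(toy_ds)[names(toy_ds) == "Day.Type"] <- "Day_Type"
names(toy_ds)[names(toy_ds) == "Sample.Size"] <- "Sample_Size"

colnames(toy_ds)   # "Day_Type" "Sample_Size"
```

rename() is usually easier to read when several columns change at once, because each new = old pair sits on its own line.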
Next, I summarised the variable types. I used sapply(screen_time_ds, class), which I found on Stack Overflow while looking for an alternative to str(). I then used str() to show the structure of the data set, including the class/type of each column. All data types in this data set are correct, so no conversion is needed.
If a column had an incorrect type, for example an integer column that should be numeric, I would convert it with screen_time_ds$columnName <- as.numeric(screen_time_ds$columnName).
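A minimal sketch of such a conversion, using an invented data frame `conv_ds` rather than the real data set:

```r
# Made-up column stored as integer
conv_ds <- data.frame(Sample_Size = c(500L, 480L, 460L))
class(conv_ds$Sample_Size)   # "integer"

# Convert the column in place to numeric (double)
conv_ds$Sample_Size <- as.numeric(conv_ds$Sample_Size)
class(conv_ds$Sample_Size)   # "numeric"
```

Note that as.numeric() applied to a character column returns NA, with a warning, for any value that is not a valid number, so it is worth checking for new missing values after converting.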
Finally, columns like Gender and Day_Type have a small set of categories, so storing them as factors is more appropriate than storing them as strings. I therefore converted them into factor variables. I did not relabel the factor levels and kept them as they are.
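If relabelling or reordering were wanted, factor() could be used instead of as.factor(). The sketch below uses a made-up vector `day_type`, and the labels “School day” and “Non-school day” are hypothetical and not part of the data set.

```r
day_type <- c("Weekday", "Weekend", "Weekday")   # made-up values

# as.factor() orders the levels alphabetically
day_f <- as.factor(day_type)
levels(day_f)            # "Weekday" "Weekend"

# factor() lets you choose the level order and, optionally, new labels
day_relabelled <- factor(day_type,
                         levels = c("Weekend", "Weekday"),
                         labels = c("Non-school day", "School day"))
levels(day_relabelled)   # "Non-school day" "School day"
table(day_relabelled)    # counts per level
```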
#subset data
subset_ds_10 <- screen_time_ds[1:10, ] #rows 1 to 10; the empty index after the comma keeps all columns
#convert it to a matrix
matrix_10 <- as.matrix(subset_ds_10)
print(matrix_10)
Age Gender Screen_Type Day_Type Avg_Screen_Time Sample_Size
1 "5" "Male" "Educational" "Weekday" "0.44" "500"
2 "5" "Male" "Recreational" "Weekday" "1.11" "500"
3 "5" "Male" "Total" "Weekday" "1.55" "500"
4 "5" "Male" "Educational" "Weekend" "0.50" "500"
5 "5" "Male" "Recreational" "Weekend" "1.44" "500"
6 "5" "Male" "Total" "Weekend" "1.93" "500"
7 "5" "Female" "Educational" "Weekday" "0.49" "500"
8 "5" "Female" "Recreational" "Weekday" "0.96" "500"
9 "5" "Female" "Total" "Weekday" "1.45" "500"
10 "5" "Female" "Educational" "Weekend" "0.50" "500"
str(matrix_10)
chr [1:10, 1:6] "5" "5" "5" "5" "5" "5" "5" "5" "5" "5" "Male" "Male" "Male" ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:10] "1" "2" "3" "4" ...
..$ : chr [1:6] "Age" "Gender" "Screen_Type" "Day_Type" ...
First, I subset the first 10 observations with all variables. In screen_time_ds[1:10, ], 1:10 selects rows 1 to 10 and the comma separates the row index from the column index. Leaving the column index empty after the comma returns all the columns. I saved the subset into a new object called ‘subset_ds_10’.
Then I used the as.matrix() function to convert the data frame into a matrix and saved it in a new object called ‘matrix_10’.
A matrix can only hold one data type. Because my data frame contains both numeric and non-numeric columns (such as Age and Gender), R converts every value into a character string in the matrix. This is why str(matrix_10) reports the whole matrix as chr.
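The coercion rule can be checked on a small example. The data frame `mixed_df` below is invented for illustration; it shows that a mixed subset becomes a character matrix, while a subset of numeric columns stays numeric.

```r
# Made-up data frame with one character column and two numeric columns
mixed_df <- data.frame(Age = c(5L, 5L, 6L),
                       Gender = c("Male", "Female", "Male"),
                       Avg_Screen_Time = c(0.44, 0.49, 0.51))

first_two <- mixed_df[1:2, ]        # rows 1 to 2, all columns

# Mixed columns: everything is coerced to character
mixed_mat <- as.matrix(first_two)
typeof(mixed_mat)                   # "character"

# Numeric columns only: the matrix stays numeric
num_mat <- as.matrix(first_two[, c("Age", "Avg_Screen_Time")])
typeof(num_mat)                     # "double"
dim(num_mat)                        # 2 2
```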
# Create integer variable (Number of minutes the child spends on physical activity)
physical_activity_minutes <- c(60,30,20,120,40,20,100,15,10,90)
# Create ordinal variable (Parental monitoring or restrictions over the child’s screen usage)
parental_control_level <- c("Low", "High", "High", "None", "Moderate", "High", "None", "High", "High", "Low")
str(parental_control_level)
chr [1:10] "Low" "High" "High" "None" "Moderate" "High" "None" "High" "High" ...
class(parental_control_level)
[1] "character"
# Convert parental_control_level to ordered factor
parental_control_level <- factor(parental_control_level,
levels = c("None", "Low", "Moderate", "High"),
ordered = TRUE)
str(parental_control_level)
Ord.factor w/ 4 levels "None"<"Low"<"Moderate"<..: 2 4 4 1 3 4 1 4 4 2
levels(parental_control_level)
[1] "None" "Low" "Moderate" "High"
class(parental_control_level)
[1] "ordered" "factor"
# Combine into data frame
child_df <- data.frame(Physical_Activity_Minutes = physical_activity_minutes, Parental_Control_Level = parental_control_level)
str(child_df)
'data.frame': 10 obs. of 2 variables:
$ Physical_Activity_Minutes: num 60 30 20 120 40 20 100 15 10 90
$ Parental_Control_Level : Ord.factor w/ 4 levels "None"<"Low"<"Moderate"<..: 2 4 4 1 3 4 1 4 4 2
# Create numeric variable (Sleep Duration (hours))
sleep_duration_hours <- c(9.2,8.7, 7.5, 9.0, 6.8, 8.3, 7.0, 9.5, 6.5, 8.0)
#use cbind() to add this vector to your data frame.
child_df <- cbind(child_df, sleep_duration_hours)
#get dimensions
dim(child_df)
[1] 10 3
str(child_df)
'data.frame': 10 obs. of 3 variables:
$ Physical_Activity_Minutes: num 60 30 20 120 40 20 100 15 10 90
$ Parental_Control_Level : Ord.factor w/ 4 levels "None"<"Low"<"Moderate"<..: 2 4 4 1 3 4 1 4 4 2
$ sleep_duration_hours : num 9.2 8.7 7.5 9 6.8 8.3 7 9.5 6.5 8
Although these variables are independent of the screen time data set, I chose one integer variable and one ordinal variable that relate to it, each with 10 observations. Because the minutes were entered without the L suffix, R stores them as doubles and str() reports them as num; as.integer() or values such as 60L would store them as integers. ‘parental_control_level’ starts as a character vector. As instructed, I converted it into an ordered factor with the levels None < Low < Moderate < High, after which class() returns “ordered” “factor”. I then combined the two vectors into a data frame with data.frame(). Next, I created a numeric vector of sleep duration and added it to the data frame with cbind(). Finally, dim() confirms that the data frame has 10 observations and 3 variables.
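The difference between a double and a true integer vector, and the cbind() step, can be sketched as follows. The names `minutes_dbl`, `minutes_int`, `demo_df` and `sleep_hours` are hypothetical and the vectors are shortened to three values.

```r
# Without the L suffix, whole numbers are stored as doubles
minutes_dbl <- c(60, 30, 20)
class(minutes_dbl)   # "numeric"

# The L suffix (or as.integer()) stores them as integers
minutes_int <- c(60L, 30L, 20L)
class(minutes_int)   # "integer"
identical(as.integer(minutes_dbl), minutes_int)   # TRUE

# Build a data frame, then add a named column with cbind()
demo_df <- data.frame(Minutes = minutes_int)
sleep_hours <- c(9.2, 8.7, 7.5)
demo_df <- cbind(demo_df, Sleep_Hours = sleep_hours)

dim(demo_df)         # 3 2
str(demo_df)
```

Naming the argument inside cbind() sets the new column name directly, which keeps it consistent with the capitalised names of the other columns.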
Presentation link: https://rmit-arc.instructuremedia.com/embed/f20ce682-1340-4aba-b1a5-3ff3139a24d1
Stack Overflow (2025) Determine the data types of a data frame's columns, Stack Overflow website, accessed 09 April 2025. https://stackoverflow.com/questions/21125222/determine-the-data-types-of-a-data-frames-columns
W3Schools (2025) R Tutorial, W3Schools website, accessed 29 March 2025. https://www.w3schools.com/r/r_data_frames.asp
Akshay Kumar (2025) Average Daily Screen Time for Children, Kaggle website, accessed 29 March 2025. https://www.kaggle.com/datasets/ak0212/average-daily-screen-time-for-children
DataCamp (2024) Subsetting in R Tutorial, DataCamp website, accessed 10 April 2025. https://www.datacamp.com/tutorial/subsets-in-r