Intro: Tasked with gathering movie ratings for my imaginary friends,
I took their responses and wrote them into a table using SQL. After
connecting to the SQL database, I removed the null values from the
ratings table.
Connecting to the DB in R
# Load libraries
library('DBI')
library('RMySQL')
# Enter parameters and connect to SQL DB in R
mydb <- dbConnect(MySQL(), user='william.berritt09', password='southdrive7', dbname='william.berritt09', host='cunydata607sql.mysql.database.azure.com')
# Preview tables in my DB
dbListTables(mydb)
## [1] "movies"
Create data frame and preview it
# Query data from table into df1
df1 <- dbGetQuery(mydb,'select * from movies')
# Preview df1
print(df1)
## id movie_title friend rating
## 1 1 Dune Alice 4
## 2 2 No Time to Die Alice 3
## 3 3 Eternals Alice <NA>
## 4 4 Shang-Chi and the Legend of the Ten Rings Alice 4
## 5 5 The French Dispatch Alice <NA>
## 6 6 Spider-Man: No Way Home Alice 5
## 7 7 Dune Bob 3
## 8 8 No Time to Die Bob <NA>
## 9 9 Eternals Bob 4
## 10 10 Shang-Chi and the Legend of the Ten Rings Bob 3
## 11 11 The French Dispatch Bob <NA>
## 12 12 Spider-Man: No Way Home Bob 4
## 13 13 Dune Charlie 4
## 14 14 No Time to Die Charlie <NA>
## 15 15 Eternals Charlie 3
## 16 16 Shang-Chi and the Legend of the Ten Rings Charlie <NA>
## 17 17 The French Dispatch Charlie 4
## 18 18 Spider-Man: No Way Home Charlie 4
## 19 19 Dune David <NA>
## 20 20 No Time to Die David 4
## 21 21 Eternals David 3
## 22 22 Shang-Chi and the Legend of the Ten Rings David 4
## 23 23 The French Dispatch David 3
## 24 24 Spider-Man: No Way Home David 4
## 25 25 Dune Emily <NA>
## 26 26 No Time to Die Emily 4
## 27 27 Eternals Emily 3
## 28 28 Shang-Chi and the Legend of the Ten Rings Emily <NA>
## 29 29 The French Dispatch Emily 4
## 30 30 Spider-Man: No Way Home Emily 4
## 31 31 Dune Frank 5
## 32 32 No Time to Die Frank <NA>
## 33 33 Eternals Frank 4
## 34 34 Shang-Chi and the Legend of the Ten Rings Frank 3
## 35 35 The French Dispatch Frank 5
## 36 36 Spider-Man: No Way Home Frank 4
## 37 37 Dune Grace 3
## 38 38 No Time to Die Grace <NA>
## 39 39 Eternals Grace 4
## 40 40 Shang-Chi and the Legend of the Ten Rings Grace <NA>
## 41 41 The French Dispatch Grace 4
## 42 42 Spider-Man: No Way Home Grace <NA>
## 43 43 Dune Henry 4
## 44 44 No Time to Die Henry <NA>
## 45 45 Eternals Henry 3
## 46 46 Shang-Chi and the Legend of the Ten Rings Henry 4
## 47 47 The French Dispatch Henry 3
## 48 48 Spider-Man: No Way Home Henry 3
## 49 49 Dune Isabel <NA>
## 50 50 No Time to Die Isabel 4
## 51 51 Eternals Isabel 3
## 52 52 Shang-Chi and the Legend of the Ten Rings Isabel 4
## 53 53 The French Dispatch Isabel 3
## 54 54 Spider-Man: No Way Home Isabel 4
## 55 55 Dune Jack 3
## 56 56 No Time to Die Jack 3
## 57 57 Eternals Jack 4
## 58 58 Shang-Chi and the Legend of the Ten Rings Jack <NA>
## 59 59 The French Dispatch Jack 3
## 60 60 Spider-Man: No Way Home Jack 4
## 61 61 Dune Katherine <NA>
## 62 62 No Time to Die Katherine 4
## 63 63 Eternals Katherine 3
## 64 64 Shang-Chi and the Legend of the Ten Rings Katherine 4
## 65 65 The French Dispatch Katherine 3
## 66 66 Spider-Man: No Way Home Katherine 4
Missing data strategy
# Remove null values
no_null_df <- na.omit(df1)
# Preview new data frame
no_null_df
## id movie_title friend rating
## 1 1 Dune Alice 4
## 2 2 No Time to Die Alice 3
## 4 4 Shang-Chi and the Legend of the Ten Rings Alice 4
## 6 6 Spider-Man: No Way Home Alice 5
## 7 7 Dune Bob 3
## 9 9 Eternals Bob 4
## 10 10 Shang-Chi and the Legend of the Ten Rings Bob 3
## 12 12 Spider-Man: No Way Home Bob 4
## 13 13 Dune Charlie 4
## 15 15 Eternals Charlie 3
## 17 17 The French Dispatch Charlie 4
## 18 18 Spider-Man: No Way Home Charlie 4
## 20 20 No Time to Die David 4
## 21 21 Eternals David 3
## 22 22 Shang-Chi and the Legend of the Ten Rings David 4
## 23 23 The French Dispatch David 3
## 24 24 Spider-Man: No Way Home David 4
## 26 26 No Time to Die Emily 4
## 27 27 Eternals Emily 3
## 29 29 The French Dispatch Emily 4
## 30 30 Spider-Man: No Way Home Emily 4
## 31 31 Dune Frank 5
## 33 33 Eternals Frank 4
## 34 34 Shang-Chi and the Legend of the Ten Rings Frank 3
## 35 35 The French Dispatch Frank 5
## 36 36 Spider-Man: No Way Home Frank 4
## 37 37 Dune Grace 3
## 39 39 Eternals Grace 4
## 41 41 The French Dispatch Grace 4
## 43 43 Dune Henry 4
## 45 45 Eternals Henry 3
## 46 46 Shang-Chi and the Legend of the Ten Rings Henry 4
## 47 47 The French Dispatch Henry 3
## 48 48 Spider-Man: No Way Home Henry 3
## 50 50 No Time to Die Isabel 4
## 51 51 Eternals Isabel 3
## 52 52 Shang-Chi and the Legend of the Ten Rings Isabel 4
## 53 53 The French Dispatch Isabel 3
## 54 54 Spider-Man: No Way Home Isabel 4
## 55 55 Dune Jack 3
## 56 56 No Time to Die Jack 3
## 57 57 Eternals Jack 4
## 59 59 The French Dispatch Jack 3
## 60 60 Spider-Man: No Way Home Jack 4
## 62 62 No Time to Die Katherine 4
## 63 63 Eternals Katherine 3
## 64 64 Shang-Chi and the Legend of the Ten Rings Katherine 4
## 65 65 The French Dispatch Katherine 3
## 66 66 Spider-Man: No Way Home Katherine 4
# How many nulls are there?
paste('There are ', nrow(df1)-nrow(no_null_df), ' rows with NULL values in this data set', sep='')
## [1] "There are 17 rows with NULL values in this data set"
Conclusion: There are a number of reasons you may want to remove
Null values. In this case, doing analyses with missing data can
negatively impact the accuracy of your analysis as the data is
techincally incomplete. Another reason is null values can introduce
biases. If the missing data is not random and instead part of a pattern,
exlcuding the Nulls can reduce bias.