Hw6HuidiDing

An exploration of LeetCode questions and other determining factors.

Huidi Ding
2022-07-04

Introduction

LeetCode questions seem to be an unavoidable topic in the tech interview process, and LeetCode is one of the most recognizable coding websites for practicing questions on data structures and algorithms.

LeetCode is one of the tools applicants most commonly use to improve at interview-style algorithm problems. The goal of this project is not to illustrate how to solve specific problems related to hash tables, dynamic programming, depth-first search or greedy algorithms, but to help anyone who is interested know where to start, and to describe some characteristics of LeetCode questions as a whole.

Research questions

The questions identified so far for this dataset are:

- How is the difficulty of the questions correlated with the total submitted attempts?

- How does the acceptance rate, which we add as a new column, correlate with the difficulty of the questions?

- Are all hard questions more difficult than all of the medium and easy questions?

- How does the video/written solution format correlate with the premium/paid subscription service?

- Are more of the medium questions locked behind the premium service than the easy ones?

Data:

Identify the dataset and its variables

Read in and clean the dataset leetcode.csv, which will be used in the analysis

Load the packages and read the CSV dataset into a variable

  library(readr)
  library(tidyverse)
  library(tidyr) 
  library(rstudioapi)
  library(ggplot2)
  library(dplyr)
  library(plotrix)
  library(reshape2)

# Read the dataset; leetcode.csv is expected to be in the current working directory
leetcode <- read_csv("leetcode.csv")
#newData <- data.frame(leetcode)
view(leetcode)
# Preview and get the column names of the dataset
head(leetcode)
# A tibble: 6 × 8
  question_id video title      link  `total Accepted` `total Submitt…`
        <dbl> <lgl> <chr>      <chr>            <dbl>            <dbl>
1        1959 NA    Minimum P… http…              262              434
2        1949 NA    Implement… http…              691             1126
3        1947 NA    Number of… http…             1976             7740
4        1946 NA    Minimum A… http…             5411            13595
5        1945 NA    Finding t… http…             5890             7460
6        1944 NA    Truncate … http…             7056             8979
# … with 2 more variables: difficulty <dbl>, isPaid <lgl>
colnames(leetcode)
[1] "question_id"     "video"           "title"          
[4] "link"            "total Accepted"  "total Submitted"
[7] "difficulty"      "isPaid"         

As shown by colnames(), the dataset has eight variables; solution, rate and trueDifficulty are added later in the analysis.

Variable type classification:

String/char type: title, link, trueDifficulty

title definition: the title of the LeetCode question on the official website

link definition: official link to the specific question

trueDifficulty definition: the difficulty as a label, where we transform 1 to easy, 2 to medium and 3 to hard. Usually, the higher the difficulty, the more time a user is likely to spend on the question

Numeric type: question_id, total Accepted, total Submitted, difficulty, rate

question_id definition: the number which corresponds to each question

total Accepted definition: number of submitted solutions that were accepted, i.e. that compile and run correctly

total Submitted definition: number of solutions/coding attempts submitted to the website, including wrong ones

difficulty definition: in general, this represents the learning curve and how challenging a typical question could be (1, 2 or 3)

rate definition: total Accepted divided by total Submitted, i.e. the fraction of submitted solutions that were accepted by the server

Logical type: video, isPaid

video definition: TRUE means the solution has a video format, FALSE means it is written, NA means there is none

isPaid definition: TRUE means the question requires a premium membership, FALSE means it does not

# Use filter() to list all the questions without a solution (video is NA)
noSolutionQuestions <- filter(leetcode, is.na(`video`)) 
head(noSolutionQuestions)
# A tibble: 6 × 8
  question_id video title      link  `total Accepted` `total Submitt…`
        <dbl> <lgl> <chr>      <chr>            <dbl>            <dbl>
1        1959 NA    Minimum P… http…              262              434
2        1949 NA    Implement… http…              691             1126
3        1947 NA    Number of… http…             1976             7740
4        1946 NA    Minimum A… http…             5411            13595
5        1945 NA    Finding t… http…             5890             7460
6        1944 NA    Truncate … http…             7056             8979
# … with 2 more variables: difficulty <dbl>, isPaid <lgl>

Preview the whole dataset before any operations. We find the dimensions of this dataset (which we have imported as a data frame) with the R function dim(), preview its rows with head(), and then count the missing values in each column with colSums() applied to is.na().

dim(leetcode)
[1] 1650    8
head(leetcode, n = 1650)
# A tibble: 1,650 × 8
   question_id video title     link  `total Accepted` `total Submitt…`
         <dbl> <lgl> <chr>     <chr>            <dbl>            <dbl>
 1        1959 NA    Minimum … http…              262              434
 2        1949 NA    Implemen… http…              691             1126
 3        1947 NA    Number o… http…             1976             7740
 4        1946 NA    Minimum … http…             5411            13595
 5        1945 NA    Finding … http…             5890             7460
 6        1944 NA    Truncate… http…             7056             8979
 7        1943 NA    Count Pa… http…              584              850
 8        1937 NA    Maximize… http…              571              819
 9        1936 NA    Maximize… http…             2958            10959
10        1935 NA    Minimum … http…             7134            10125
# … with 1,640 more rows, and 2 more variables: difficulty <dbl>,
#   isPaid <lgl>
# Check NA availability if any
colSums(is.na(leetcode))
    question_id           video           title            link 
              0             701               0               0 
 total Accepted total Submitted      difficulty          isPaid 
              0               0               0               0 

We add a solution column that labels the NA values in video as “No solution”, TRUE as “Video” and FALSE as “Written”. We can observe that a lot of data is missing in the video column (701 of the 1,650 rows). Let’s carry out the analysis ignoring those missing values.

#replace_na(leetcode, list(video = "No solution"))
# Calling str() function to confirm
str(leetcode)
spec_tbl_df [1,650 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ question_id    : num [1:1650] 1959 1949 1947 1946 1945 ...
 $ video          : logi [1:1650] NA NA NA NA NA NA ...
 $ title          : chr [1:1650] "Minimum Path Cost in a Hidden Grid" "Implement Trie II (Prefix Tree)" "Number of Different Subsequences GCDs" "Minimum Absolute Sum Difference" ...
 $ link           : chr [1:1650] "https://leetcode.com/problems/minimum-path-cost-in-a-hidden-grid" "https://leetcode.com/problems/implement-trie-ii-prefix-tree" "https://leetcode.com/problems/number-of-different-subsequences-gcds" "https://leetcode.com/problems/minimum-absolute-sum-difference" ...
 $ total Accepted : num [1:1650] 262 691 1976 5411 5890 ...
 $ total Submitted: num [1:1650] 434 1126 7740 13595 7460 ...
 $ difficulty     : num [1:1650] 2 2 3 2 2 1 2 3 3 2 ...
 $ isPaid         : logi [1:1650] TRUE TRUE FALSE FALSE FALSE FALSE ...
 - attr(*, "spec")=
  .. cols(
  ..   question_id = col_double(),
  ..   video = col_logical(),
  ..   title = col_character(),
  ..   link = col_character(),
  ..   `total Accepted` = col_double(),
  ..   `total Submitted` = col_double(),
  ..   difficulty = col_double(),
  ..   isPaid = col_logical()
  .. )
 - attr(*, "problems")=<externalptr> 
leetcode <- leetcode %>%
  mutate(solution = case_when(
    # Test for NA with is.na(); `video == NA` evaluates to NA and never matches
    is.na(video) ~ "No solution",
    video == TRUE ~ "Video",
    video == FALSE ~ "Written"))
# The rate column is total Accepted divided by total Submitted
# Extract the two count columns (kept for reference; not used below)
totalAccepted <- select(leetcode, `total Accepted`)
totalSubmitted <- select(leetcode, `total Submitted`)
# Add a new column named rate to the leetcode dataset
leetcode <- mutate(leetcode, rate = (`total Accepted` / `total Submitted`))
#rate <- transform(rate, ar = accepetedRate / totalSubmitted)
# Preview the rate to check whether it is right
head(leetcode, n = 1650)
# A tibble: 1,650 × 10
   question_id video title     link  `total Accepted` `total Submitt…`
         <dbl> <lgl> <chr>     <chr>            <dbl>            <dbl>
 1        1959 NA    Minimum … http…              262              434
 2        1949 NA    Implemen… http…              691             1126
 3        1947 NA    Number o… http…             1976             7740
 4        1946 NA    Minimum … http…             5411            13595
 5        1945 NA    Finding … http…             5890             7460
 6        1944 NA    Truncate… http…             7056             8979
 7        1943 NA    Count Pa… http…              584              850
 8        1937 NA    Maximize… http…              571              819
 9        1936 NA    Maximize… http…             2958            10959
10        1935 NA    Minimum … http…             7134            10125
# … with 1,640 more rows, and 4 more variables: difficulty <dbl>,
#   isPaid <lgl>, solution <chr>, rate <dbl>
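
One pitfall when labelling a logical column that contains missing values is that a comparison with NA using == returns NA rather than TRUE. The following is a minimal sketch on a made-up three-row tibble (the names toy, with_equals and with_is_na are hypothetical and not part of the dataset) that contrasts the two ways of writing the NA branch in case_when():

```r
library(dplyr)

# Toy data standing in for the `video` column (values are made up)
toy <- tibble(video = c(TRUE, FALSE, NA))

labelled <- toy %>%
  mutate(
    # `video == NA` evaluates to NA for every row, so this branch never applies
    with_equals = case_when(
      video == TRUE  ~ "Video",
      video == FALSE ~ "Written",
      video == NA    ~ "No solution"
    ),
    # is.na() returns TRUE for the missing row, so the NA branch applies
    with_is_na = case_when(
      is.na(video) ~ "No solution",
      video        ~ "Video",
      !video       ~ "Written"
    )
  )

labelled
```

In the first version the missing row is left as NA, while the second version labels it "No solution".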

Compute descriptive statistics for each variable, including mean, median, and standard deviation for numerical variables, and frequencies for categorical variables.

# Compute the mean for the accepted solutions and submitted solutions
summarize(leetcode, mean.totalAccepted = mean(`total Accepted`, na.rm = TRUE))
# A tibble: 1 × 1
  mean.totalAccepted
               <dbl>
1            128905.
summarize(leetcode, mean.totalSubmitted = mean(`total Submitted`, na.rm = TRUE))
# A tibble: 1 × 1
  mean.totalSubmitted
                <dbl>
1             280704.
# Compute the average difficulty: 1.98, which is just below medium (2)
summarize(leetcode, mean.difficulty = mean(difficulty, na.rm = TRUE))
# A tibble: 1 × 1
  mean.difficulty
            <dbl>
1            1.98
# Compute the sd for the accepted solutions
summarize(leetcode, sd.totalAccepted = sd(`total Accepted`, na.rm = TRUE))
# A tibble: 1 × 1
  sd.totalAccepted
             <dbl>
1          233157.
# Group the dataset by difficulty level and compute, for each level, the mean submitted count, the mean accepted count and their ratio
leetcode %>%
  group_by(difficulty) %>%
  summarize(mean.totoalSubmittedDifficult=mean(`total Submitted`), mean.totalAcceptedDifficult=mean(`total Accepted`), mean.totalAcceptedRate=(`mean.totalAcceptedDifficult`/`mean.totoalSubmittedDifficult`))
# A tibble: 3 × 4
  difficulty mean.totoalSubmittedDi… mean.totalAccep… mean.totalAccep…
       <dbl>                   <dbl>            <dbl>            <dbl>
1          1                 418797.          213609.            0.510
2          2                 269992.          119730.            0.443
3          3                 155624.           58597.            0.377

From the above, we can conclude that hard questions have a lower average accepted ratio than easy and medium questions. In general, the ratio decreases as the difficulty of the questions increases. In addition, the harder the difficulty, the fewer submitted and accepted attempts a question receives on average.

Compute a frequency table for leetcode$difficulty; this shows the number of questions, and the proportion of questions, at each difficulty level
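
Note that the ratio column in the table above divides the mean accepted count by the mean submitted count within each group, which is not the same as averaging the per-question rate. The sketch below uses a made-up two-question group (the tibble toy and its column names are assumptions for illustration only) to show how far the two numbers can differ when one question attracts far more submissions than the other:

```r
library(dplyr)

# Hypothetical toy data: two questions in the same difficulty group
toy <- tibble(
  difficulty = c(1, 1),
  accepted   = c(90, 100),
  submitted  = c(100, 1000)
)

rates <- toy %>%
  group_by(difficulty) %>%
  summarize(
    # Ratio of the group means: dominated by heavily attempted questions
    pooled_rate = mean(accepted) / mean(submitted),
    # Mean of the per-question rates: every question counts equally
    mean_rate = mean(accepted / submitted)
  )

rates
```

Which of the two is more appropriate depends on whether the question of interest is about attempts overall or about a typical question.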

frequencyDifficulty <- leetcode %>%
 select(difficulty) %>%
 table()
frequencyDifficulty
difficulty
  1   2   3 
397 891 362 
frequencyDifficulty %>%
 prop.table()
difficulty
        1         2         3 
0.2406061 0.5400000 0.2193939 

Visualize using ggplot2 with both univariate and bivariate graphs; explain these graphs with variables, questions and conclusions.

Identify limitations of the visualization if any.

A pie chart of the frequency table shows the share of questions at each of the 3 difficulty levels: about 54% medium questions, 24% easy questions and 22% hard questions.

#pie(frequencyDifficulty,labels = frequencyDifficulty, main = "Total count for each difficulty")
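
As a sanity check, the percentage shares can be recomputed directly from the counts printed in the frequency table (397, 891 and 362). This is a small base R sketch; the names counts, shares and pie_labels are introduced here for illustration, and the easy/medium/hard names assume the 1/2/3 coding described in the variable definitions:

```r
# Counts copied from the frequency table printed earlier in this report
counts <- c(easy = 397, medium = 891, hard = 362)

# Percentage share of each difficulty level, rounded to one decimal place
shares <- round(100 * counts / sum(counts), 1)

# Labels that could be passed to pie(counts, labels = pie_labels)
pie_labels <- paste0(names(shares), ": ", shares, "%")
pie_labels
```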

# Descend to find the questions with highest accepted rate
rateDescending <- arrange(leetcode, desc(`rate`))
head(rateDescending, n = 1650)
# A tibble: 1,650 × 10
   question_id video title     link  `total Accepted` `total Submitt…`
         <dbl> <lgl> <chr>     <chr>            <dbl>            <dbl>
 1        1404 NA    Print Im… http…            23790            25230
 2        1713 FALSE Dot Prod… http…            35694            39106
 3        1089 NA    Remove V… http…            64124            70819
 4        1603 FALSE Running … http…           287068           323746
 5        1205 NA    Defangin… http…           297302           336414
 6        1791 NA    Richest … http…            92205           104538
 7        1528 NA    Kids Wit… http…           197377           223831
 8        1580 NA    Shuffle … http…           186267           211271
 9        1572 NA    Subrecta… http…            39364            44719
10        1635 NA    Number o… http…           156334           178288
# … with 1,640 more rows, and 4 more variables: difficulty <dbl>,
#   isPaid <lgl>, solution <chr>, rate <dbl>

Likewise, we could rank the questions in ascending order of rate and list the top 30 most difficult questions.

From the above, 5 hard questions appear in the list of the easiest questions by acceptance rate, so they are easy to solve compared with other hard ones. These questions are: 1022, 771, 865, 1503 and 1061.

There are also questions with an easy or medium difficulty level that are difficult to solve, including: 1646, 1644, 1648, 1634, 1635, 1637, 1638, 1639, 1640, 1642, 1643, 1625, 1626, 1627, 1631, 1632, 1633 and so on.

We can conclude that not every hard question is guaranteed to be more challenging than every easy or medium question.

On the contrary, there are many medium-level problems that are difficult to solve, as they have a low accepted/submitted ratio. (In general, a question is considered very difficult when the ratio is below 0.3 = 30%.)
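
A possible way to pull out such mismatches programmatically is sketched below on made-up data. The question ids and rates are hypothetical, the 0.3 cutoff is the rule of thumb quoted above, and the 0.6 cutoff for an "easy" hard question is purely an assumption chosen for illustration:

```r
library(dplyr)

# Hypothetical toy data with the same column names as the report's dataset
toy <- tibble(
  question_id = c(101, 102, 103, 104),
  difficulty  = c(3, 3, 1, 2),
  rate        = c(0.75, 0.25, 0.28, 0.55)
)

# Hard questions with a high acceptance rate (0.6 is an assumed cutoff)
easy_hard <- filter(toy, difficulty == 3, rate > 0.6)

# Easy or medium questions below the 0.3 rule of thumb
hard_easy_medium <- filter(toy, difficulty < 3, rate < 0.3)

easy_hard
hard_easy_medium
```

Applied to the full dataset, the same two filters would list the hard questions that behave like easy ones and the easy or medium questions that behave like hard ones.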

#Visualization

#Visualization #1: 

# Visualize the bar graph showing the question difficulty as x-axis and the number of questions as y-axis
# difficulty is numeric, so convert it to a factor before applying discrete axis labels
ggplot(leetcode, aes(factor(difficulty))) + geom_bar(color="grey", fill="black") +
labs(title = "difficulty frequency", x = "difficulty") +
  scale_x_discrete(labels=c("Easy", "Medium", "Hard"))
#Visualization #2:
# Visualize the bar graph showing the isPaid as x-axis and the count for each question as y-axis
ggplot(leetcode, aes(`isPaid`)) + geom_bar(color="grey", fill="black") +
labs(title = "Premium sub and question counts") +
theme_minimal()
# Visualization #3: Combine the two bar charts
# make the plot
ggplot(leetcode, aes(x = `difficulty`, y = `total Submitted` + `total Accepted`, fill = isPaid)) +
  geom_bar( 
           stat="identity", position = "dodge", width = 0.7) +
  scale_fill_manual("isPaid\n", values = c("red","blue"), 
                    labels = c(" unpaid", " paid")) +
  labs(x="\ndifficulty",y="totalAttempts\n", title ="Total Attempts for each difficulty group related to premium service ") +              
  theme_bw(base_size = 14)
# Among medium questions, premium questions account for a larger share of the submitted and accepted attempts than among easy and hard questions. Easy questions have the smallest premium share.

# Visualization #4:

# Visualize the bar graph showing the question difficulty as x-axis and the average submission count as y-axis, add standard error bars to visualize the uncertainty around your estimate
leetcode %>%
  group_by(difficulty) %>%
  summarize(mean.totoalSubmittedDifficult=mean(`total Submitted`), mean.totalAcceptedDifficult=mean(`total Accepted`), mean.totalAcceptedRate=(`mean.totalAcceptedDifficult`/`mean.totoalSubmittedDifficult`), sd.totalSubmitted=sd(`total Submitted`))
# A tibble: 3 × 5
  difficulty mean.totoalSubmittedDi… mean.totalAccep… mean.totalAccep…
       <dbl>                   <dbl>            <dbl>            <dbl>
1          1                 418797.          213609.            0.510
2          2                 269992.          119730.            0.443
3          3                 155624.           58597.            0.377
# … with 1 more variable: sd.totalSubmitted <dbl>
ggplot(leetcode, aes(x = `difficulty`, y = `total Submitted`)) +
    geom_bar(stat="identity", fill="grey", position=position_dodge()) +
    geom_errorbar( aes(ymin=`total Submitted`-533771, ymax=`total Submitted`+533771), colour="orange", position=position_dodge(.9))
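
For comparison, one common way to draw a bar of group means with standard-error bars is to summarize first and plot the summary table. This is only a sketch on made-up numbers: the tibble toy, the column names mean_submitted and se_submitted, and the choice of the standard error (sd divided by the square root of the group size) are assumptions for illustration, not the method used in the plot above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: three questions per difficulty level
toy <- tibble(
  difficulty = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  submitted  = c(400, 500, 600, 200, 300, 400, 100, 150, 200)
)

# One row per difficulty level: the mean and its standard error
summary_tbl <- toy %>%
  group_by(difficulty) %>%
  summarize(
    mean_submitted = mean(submitted),
    se_submitted   = sd(submitted) / sqrt(n())
  )

# Bars show the group means; error bars span mean - SE to mean + SE
p <- ggplot(summary_tbl, aes(x = factor(difficulty), y = mean_submitted)) +
  geom_col(fill = "grey") +
  geom_errorbar(aes(ymin = mean_submitted - se_submitted,
                    ymax = mean_submitted + se_submitted),
                width = 0.2, colour = "orange") +
  labs(x = "difficulty", y = "mean total submitted")
```

Printing p draws the chart. Because the summary has one row per group, each bar gets exactly one error bar instead of one per question.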
# Visualization #5:

# Recode the numeric difficulty values as labels to make a boxplot
# The result should be three separate box plots, one for each difficulty level
leetcode<-mutate(leetcode, trueDifficulty = recode(difficulty, `1` = "easy", `2` = "medium", `3` = "hard"))

# Visualize the boxplot graph showing the question difficulty as x-axis and the total submitted count of each question as y-axis
ggplot(leetcode, aes(`trueDifficulty`, `total Submitted`)) + 
  geom_boxplot() +
   labs(title = "Difficulty vs. total submitted attempts", x = "difficulty level", y = "total submitted attempts")
# The graph has variables including question difficulty and the count for total submitted

# Visualization #6:

# Visualize the histogram showing the question id as x-axis (in bins of 50 ids) and the number of questions as y-axis, filled by isPaid and faceted by difficulty

ggplot(leetcode, aes(question_id, fill = isPaid)) + 
  geom_histogram(binwidth = 50) + 
  labs(title = "question number in all difficulty levels") + 
  theme_bw() +
  facet_wrap(vars(difficulty), scales = "free")

Reflection:

#Conclusion

#Answer the questions

– What is missing (if anything) in your analysis process so far?

Some complex graphs are missing, since I mainly utilized a few univariate basics and should focus more on bivariate ones. I added standard error bars to visualize the uncertainty, but they are not correct.

– What conclusions can you make about your research questions at this point?

In summary, we could answer the questions as follows:

“How is the difficulty of the questions correlated with the total submitted attempts?” We can conclude that the harder the difficulty, the fewer submitted and accepted attempts a question receives on average, and the lower the accepted ratio.

“How does the acceptance rate, which we add as a new column, correlate with the difficulty of the questions?” We can conclude that hard questions have a lower average ratio than easy and medium questions. In general, the ratio decreases as the difficulty of the questions increases.

“Are all hard questions more difficult than all of the medium and easy questions?” We can conclude that not every hard question is guaranteed to be more challenging than every easy or medium question. On the contrary, there are many medium-level problems that are difficult to solve, as they have a low accepted/submitted ratio.

“How does the video/written solution format correlate with the premium/paid subscription service?” We can conclude that only 3.6 percent of questions have video solutions. A larger share of premium questions have video solutions compared with non-premium questions (10% vs 2%).

“Are more of the medium questions locked behind the premium service than the easy ones?” Among medium questions, premium questions account for a larger share of the submitted and accepted attempts than among easy and hard questions. Easy questions have the smallest premium share.
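
The premium share per difficulty level can also be computed by counting questions rather than attempts. The sketch below runs on made-up data (the tibble toy and the column names n_questions and share_paid are hypothetical); it relies on the fact that the mean of a logical vector is the proportion of TRUE values:

```r
library(dplyr)

# Hypothetical toy data: four easy and four medium questions
toy <- tibble(
  difficulty = c(1, 1, 1, 1, 2, 2, 2, 2),
  isPaid     = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Number of questions and share of premium questions per difficulty level
paid_share <- toy %>%
  group_by(difficulty) %>%
  summarize(
    n_questions = n(),
    share_paid  = mean(isPaid)
  )

paid_share
```

Grouping by isPaid instead and taking the mean of a logical video indicator would give the video share for premium and non-premium questions in the same way.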

– What do you think a naive reader would need to fully understand your graphs?

From my perspective, a naive reader needs to know something about coding websites and their business model, for instance how LeetCode Premium works. He or she should also understand the elements in the dataset spreadsheet. Some coding background would be ideal, since for an attempt to be accepted the code needs to compile and pass the additional test cases.

– Is there anything you want to answer with your dataset, but can’t?

There are a number of questions I initially wished to answer, including:

1. In general, do questions with a higher difficulty level usually have longer question titles/descriptions?

2. Based on the provided information, is there a model or algorithm that we could utilize to predict the difficulty of each question? If so, how are we going to train the model?

3. How can I provide a list of questions suitable for each difficulty level that separates out the outliers, such as some hard medium questions or some easy hard questions?
