Project 1

Author

Long Lin

Project 1 Overview

The goal of this project is to use a given dataset, of players in a chess tournament, and create an R Markdown file that can generate a .csv file with the following information for all of the players:

  • Player’s Name
  • Player’s State
  • Total Number of Points
  • Player’s Pre-Rating
  • Average Pre Chess Rating of Opponents

Source: https://raw.githubusercontent.com/longflin/DATA-607/refs/heads/main/Project%201/tournamentinfo.txt

Creating the data frame

First, I’ll read the data from my github and parse the info into the tournament_df dataframe.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ ggplot2   4.0.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# 1. Load the raw text
raw_lines <- readLines("https://raw.githubusercontent.com/longflin/DATA-607/refs/heads/main/Project%201/tournamentinfo.txt")
Warning in
readLines("https://raw.githubusercontent.com/longflin/DATA-607/refs/heads/main/Project%201/tournamentinfo.txt"):
incomplete final line found on
'https://raw.githubusercontent.com/longflin/DATA-607/refs/heads/main/Project%201/tournamentinfo.txt'
# 2. Remove dashes and empty lines
clean_lines <- raw_lines[!grepl("^-+$", raw_lines) & nzchar(raw_lines)]

# 3. Skip the header (first 2 lines of the remaining text)
data_lines <- clean_lines[3:length(clean_lines)]

# 4. Separate the two-row structure
# Every odd line is a "Name" row, every even line is a "ID/Rating" row
name_rows <- data_lines[seq(1, length(data_lines), by = 2)]
info_rows <- data_lines[seq(2, length(data_lines), by = 2)]

# 5. Parse the Name Rows (Fixed Width)
# Column positions: Pair Num (1-5), Name (9-40), Total Pts (36-40), Rounds (42-end)
names_df <- data.frame(
  pair_num = as.numeric(str_sub(name_rows, 1, 5)),
  player_name = str_trim(str_sub(name_rows, 9, 40)),
  total_pts = as.numeric(str_sub(name_rows, 42, 44)),
  stringsAsFactors = FALSE
)

# 6. Parse the Info Rows (Extracting State and Pre-Rating)
info_df <- data.frame(
  state = str_trim(str_sub(info_rows, 1, 5)),
  uscf_id = str_extract(info_rows, "\\d{8}"),
  pre_rating = as.numeric(str_extract(info_rows, "(?<=R: )\\s*\\d+")),
  stringsAsFactors = FALSE
)

# 7. Merge names with info
tournament_df <- cbind(names_df, info_df)

head(tournament_df, 100)
   pair_num                player_name total_pts state  uscf_id pre_rating
1         1                   GARY HUA       6.0    ON 15445895       1794
2         2            DAKSHESH DARURI       6.0    MI 14598900       1553
3         3               ADITYA BAJAJ       6.0    MI 14959604       1384
4         4        PATRICK H SCHILLING       5.5    MI 12616049       1716
5         5                 HANSHI ZUO       5.5    MI 14601533       1655
6         6                HANSEN SONG       5.0    OH 15055204       1686
7         7          GARY DEE SWATHELL       5.0    MI 11146376       1649
8         8           EZEKIEL HOUGHTON       5.0    MI 15142253       1641
9         9                STEFANO LEE       5.0    ON 14954524       1411
10       10                  ANVIT RAO       5.0    MI 14150362       1365
11       11   CAMERON WILLIAM MC LEMAN       4.5    MI 12581589       1712
12       12             KENNETH J TACK       4.5    MI 12681257       1663
13       13          TORRANCE HENRY JR       4.5    MI 15082995       1666
14       14               BRADLEY SHAW       4.5    MI 10131499       1610
15       15     ZACHARY JAMES HOUGHTON       4.5    MI 15619130       1220
16       16               MIKE NIKITIN       4.0    MI 10295068       1604
17       17         RONALD GRZEGORCZYK       4.0    MI 10297702       1629
18       18              DAVID SUNDEEN       4.0    MI 11342094       1600
19       19               DIPANKAR ROY       4.0    MI 14862333       1564
20       20                JASON ZHENG       4.0    MI 14529060       1595
21       21              DINH DANG BUI       4.0    ON 15495066       1563
22       22           EUGENE L MCCLURE       4.0    MI 12405534       1555
23       23                   ALAN BUI       4.0    ON 15030142       1363
24       24          MICHAEL R ALDRICH       4.0    MI 13469010       1229
25       25           LOREN SCHWIEBERT       3.5    MI 12486656       1745
26       26                    MAX ZHU       3.5    ON 15131520       1579
27       27             GAURAV GIDWANI       3.5    MI 14476567       1552
28       28 SOFIA ADINA STANESCU-BELLU       3.5    MI 14882954       1507
29       29           CHIEDOZIE OKORIE       3.5    MI 15323285       1602
30       30         GEORGE AVERY JONES       3.5    ON 12577178       1522
31       31               RISHI SHETTY       3.5    MI 15131618       1494
32       32      JOSHUA PHILIP MATHEWS       3.5    ON 14073750       1441
33       33                    JADE GE       3.5    MI 14691842       1449
34       34     MICHAEL JEFFERY THOMAS       3.5    MI 15051807       1399
35       35           JOSHUA DAVID LEE       3.5    MI 14601397       1438
36       36              SIDDHARTH JHA       3.5    MI 14773163       1355
37       37       AMIYATOSH PWNANANDAM       3.5    MI 15489571        980
38       38                  BRIAN LIU       3.0    MI 15108523       1423
39       39              JOEL R HENDON       3.0    MI 12923035       1436
40       40               FOREST ZHANG       3.0    MI 14892710       1348
41       41        KYLE WILLIAM MURPHY       3.0    MI 15761443       1403
42       42                   JARED GE       3.0    MI 14462326       1332
43       43          ROBERT GLEN VASEY       3.0    MI 14101068       1283
44       44         JUSTIN D SCHILLING       3.0    MI 15323504       1199
45       45                  DEREK YAN       3.0    MI 15372807       1242
46       46   JACOB ALEXANDER LAVALLEY       3.0    MI 15490981        377
47       47                ERIC WRIGHT       2.5    MI 12533115       1362
48       48               DANIEL KHAIN       2.5    MI 14369165       1382
49       49           MICHAEL J MARTIN       2.5    MI 12531685       1291
50       50                 SHIVAM JHA       2.5    MI 14773178       1056
51       51             TEJAS AYYAGARI       2.5    MI 15205474       1011
52       52                  ETHAN GUO       2.5    MI 14918803        935
53       53              JOSE C YBARRA       2.0    MI 12578849       1393
54       54                LARRY HODGE       2.0    MI 12836773       1270
55       55                  ALEX KONG       2.0    MI 15412571       1186
56       56               MARISA RICCI       2.0    MI 14679887       1153
57       57                 MICHAEL LU       2.0    MI 15113330       1092
58       58               VIRAJ MOHILE       2.0    MI 14700365        917
59       59          SEAN M MC CORMICK       2.0    MI 12841036        853
60       60                 JULIA SHEN       1.5    MI 14579262        967
61       61              JEZZEL FARKAS       1.5    ON 15771592        955
62       62              ASHWIN BALAJI       1.0    MI 15219542       1530
63       63       THOMAS JOSEPH HOSMER       1.0    MI 15057092       1175
64       64                     BEN LI       1.0    MI 15006561       1163

Round Data

Next I parsed the round data in order to calculate the average ratings of each person’s opponents.

# Extract the string containing the rounds
rounds_raw <- str_sub(name_rows, 42, -1)

# Extract just the digits (Opponent Pair Numbers)
# We use str_extract_all to get every number in the row
opponents_list <- str_extract_all(rounds_raw, "(?<=[WLD])\\s*\\d+")

# Convert the list into a data frame where each row is a game
opponents_df <- data.frame(
  pair_num = rep(tournament_df$pair_num, lengths(opponents_list)),
  opponent_id = as.numeric(unlist(opponents_list))
)

# Join with tournament_df to get the rating for each opponent_id
avg_ratings <- opponents_df |>
  left_join(tournament_df |> 
    select(pair_num, pre_rating), 
    by = c("opponent_id" = "pair_num")) |>
      group_by(pair_num) |>
        summarize(avg_opp_rating = round(mean(pre_rating, na.rm = TRUE), 0))

# Merge back into your main data frame
tournament_df <- tournament_df |>
  left_join(avg_ratings, by = "pair_num")

head(tournament_df, 100)
   pair_num                player_name total_pts state  uscf_id pre_rating
1         1                   GARY HUA       6.0    ON 15445895       1794
2         2            DAKSHESH DARURI       6.0    MI 14598900       1553
3         3               ADITYA BAJAJ       6.0    MI 14959604       1384
4         4        PATRICK H SCHILLING       5.5    MI 12616049       1716
5         5                 HANSHI ZUO       5.5    MI 14601533       1655
6         6                HANSEN SONG       5.0    OH 15055204       1686
7         7          GARY DEE SWATHELL       5.0    MI 11146376       1649
8         8           EZEKIEL HOUGHTON       5.0    MI 15142253       1641
9         9                STEFANO LEE       5.0    ON 14954524       1411
10       10                  ANVIT RAO       5.0    MI 14150362       1365
11       11   CAMERON WILLIAM MC LEMAN       4.5    MI 12581589       1712
12       12             KENNETH J TACK       4.5    MI 12681257       1663
13       13          TORRANCE HENRY JR       4.5    MI 15082995       1666
14       14               BRADLEY SHAW       4.5    MI 10131499       1610
15       15     ZACHARY JAMES HOUGHTON       4.5    MI 15619130       1220
16       16               MIKE NIKITIN       4.0    MI 10295068       1604
17       17         RONALD GRZEGORCZYK       4.0    MI 10297702       1629
18       18              DAVID SUNDEEN       4.0    MI 11342094       1600
19       19               DIPANKAR ROY       4.0    MI 14862333       1564
20       20                JASON ZHENG       4.0    MI 14529060       1595
21       21              DINH DANG BUI       4.0    ON 15495066       1563
22       22           EUGENE L MCCLURE       4.0    MI 12405534       1555
23       23                   ALAN BUI       4.0    ON 15030142       1363
24       24          MICHAEL R ALDRICH       4.0    MI 13469010       1229
25       25           LOREN SCHWIEBERT       3.5    MI 12486656       1745
26       26                    MAX ZHU       3.5    ON 15131520       1579
27       27             GAURAV GIDWANI       3.5    MI 14476567       1552
28       28 SOFIA ADINA STANESCU-BELLU       3.5    MI 14882954       1507
29       29           CHIEDOZIE OKORIE       3.5    MI 15323285       1602
30       30         GEORGE AVERY JONES       3.5    ON 12577178       1522
31       31               RISHI SHETTY       3.5    MI 15131618       1494
32       32      JOSHUA PHILIP MATHEWS       3.5    ON 14073750       1441
33       33                    JADE GE       3.5    MI 14691842       1449
34       34     MICHAEL JEFFERY THOMAS       3.5    MI 15051807       1399
35       35           JOSHUA DAVID LEE       3.5    MI 14601397       1438
36       36              SIDDHARTH JHA       3.5    MI 14773163       1355
37       37       AMIYATOSH PWNANANDAM       3.5    MI 15489571        980
38       38                  BRIAN LIU       3.0    MI 15108523       1423
39       39              JOEL R HENDON       3.0    MI 12923035       1436
40       40               FOREST ZHANG       3.0    MI 14892710       1348
41       41        KYLE WILLIAM MURPHY       3.0    MI 15761443       1403
42       42                   JARED GE       3.0    MI 14462326       1332
43       43          ROBERT GLEN VASEY       3.0    MI 14101068       1283
44       44         JUSTIN D SCHILLING       3.0    MI 15323504       1199
45       45                  DEREK YAN       3.0    MI 15372807       1242
46       46   JACOB ALEXANDER LAVALLEY       3.0    MI 15490981        377
47       47                ERIC WRIGHT       2.5    MI 12533115       1362
48       48               DANIEL KHAIN       2.5    MI 14369165       1382
49       49           MICHAEL J MARTIN       2.5    MI 12531685       1291
50       50                 SHIVAM JHA       2.5    MI 14773178       1056
51       51             TEJAS AYYAGARI       2.5    MI 15205474       1011
52       52                  ETHAN GUO       2.5    MI 14918803        935
53       53              JOSE C YBARRA       2.0    MI 12578849       1393
54       54                LARRY HODGE       2.0    MI 12836773       1270
55       55                  ALEX KONG       2.0    MI 15412571       1186
56       56               MARISA RICCI       2.0    MI 14679887       1153
57       57                 MICHAEL LU       2.0    MI 15113330       1092
58       58               VIRAJ MOHILE       2.0    MI 14700365        917
59       59          SEAN M MC CORMICK       2.0    MI 12841036        853
60       60                 JULIA SHEN       1.5    MI 14579262        967
61       61              JEZZEL FARKAS       1.5    ON 15771592        955
62       62              ASHWIN BALAJI       1.0    MI 15219542       1530
63       63       THOMAS JOSEPH HOSMER       1.0    MI 15057092       1175
64       64                     BEN LI       1.0    MI 15006561       1163
   avg_opp_rating
1            1605
2            1469
3            1564
4            1574
5            1501
6            1519
7            1372
8            1468
9            1523
10           1554
11           1468
12           1506
13           1498
14           1515
15           1484
16           1386
17           1499
18           1480
19           1426
20           1411
21           1470
22           1300
23           1214
24           1357
25           1363
26           1507
27           1222
28           1522
29           1314
30           1144
31           1260
32           1379
33           1277
34           1375
35           1150
36           1388
37           1385
38           1539
39           1430
40           1391
41           1248
42           1150
43           1107
44           1327
45           1152
46           1358
47           1392
48           1356
49           1286
50           1296
51           1356
52           1495
53           1345
54           1206
55           1406
56           1414
57           1363
58           1391
59           1319
60           1330
61           1327
62           1186
63           1350
64           1263
# Check GARY HUA (Pair 1)
print(tournament_df[1, c("player_name", "avg_opp_rating")])
  player_name avg_opp_rating
1    GARY HUA           1605

In order to check the results, I pulled the average opponent rating value from the dataframe for the first person (Gary Hua) and got 1605. I then pulled the values from the .txt file to verify that it was correct. From the .txt file, I got the following:

    1 | GARY HUA                        |6.0  |W  39|W  21|W  18|W  14|W   7|D  12|D   4|
   ON | 15445895 / R: 1794   ->1817     |N:2  |W    |B    |W    |B    |W    |B    |W    |

From there, I went through each round opponent and got their value pre rating:

W 39 corresponds to pair num 39 with a pre-rating of 1436:

   39 | JOEL R HENDON                   |3.0  |L   1|W  54|W  40|L  16|W  44|L  21|L  24|
   MI | 12923035 / R: 1436P23->1413     |N:4  |B    |W    |B    |W    |B    |W    |W    |

W 21 corresponds to pair num 21 with a pre-rating of 1563:

   21 | DINH DANG BUI                   |4.0  |W  43|L   1|W  47|L   3|W  40|W  39|L   6|
   ON | 15495066 / R: 1563P22->1562     |N:3  |B    |W    |B    |W    |W    |B    |W    |

w 18 corresponds to pair num 18 with a pre-rating of 1600:

   18 | DAVID SUNDEEN                   |4.0  |W  47|W   9|L   1|W  32|L  19|W  38|L  10|
   MI | 11342094 / R: 1600   ->1600     |N:3  |B    |W    |B    |W    |B    |W    |B    |

W 14 corresponds to pair num 14 with a pre-rating of 1610:

   14 | BRADLEY SHAW                    |4.5  |W  54|W  44|W   8|L   1|D  27|L   5|W  31|
   MI | 10131499 / R: 1610   ->1618     |N:3  |W    |B    |W    |W    |B    |B    |W    |

W 7 corresponds to pair num 7 with a pre-rating of 1649:

    7 | GARY DEE SWATHELL               |5.0  |W  57|W  46|W  13|W  11|L   1|W   9|L   2|
   MI | 11146376 / R: 1649   ->1673     |N:3  |W    |B    |W    |B    |B    |W    |W    |

D 12 corresponds to pair num 12 with a pre-rating of 1663:

   12 | KENNETH J TACK                  |4.5  |W  42|W  33|D   5|W  38|H    |D   1|L   3|
   MI | 12681257 / R: 1663   ->1670     |N:3  |W    |B    |W    |B    |     |W    |B    |

D 4 corresponds to pair num 4 with a pre-rating of 1716:

    4 | PATRICK H SCHILLING             |5.5  |W  23|D  28|W   2|W  26|D   5|W  19|D   1|
   MI | 12616049 / R: 1716   ->1744     |N:2  |W    |B    |W    |B    |W    |B    |B    |

Now, taking the total of the pre-ratings, we get 1436+1563+1600+1610+1649+1663+1716 = 11237 / 7 = 1605. That matches up with the data frame above.

  pre_ratings <- c(1436, 1563, 1600, 1610, 1649, 1663, 1716)
  total <- sum(pre_ratings)
  print(total)
[1] 11237
  average = mean(pre_ratings)
  print(average)
[1] 1605.286

Creating the .csv file

The .csv file will be created using the readr library and the write_csv function.

# Load the library
library(readr)

# Grab the relevant columns and rename the titles for readability
export_df <- tournament_df[order(tournament_df$pair_num), c("player_name", "state", "total_pts", "pre_rating", "avg_opp_rating")] |>
  rename(
    "Player Name" = "player_name",
    "Player State" = "state",
    "Total Number of Points" = "total_pts",
    "Player Pre-Rating" = "pre_rating",
    "Average Pre Chess Rating of Opponents" = "avg_opp_rating"
  )

head(export_df, 64)
                  Player Name Player State Total Number of Points
1                    GARY HUA           ON                    6.0
2             DAKSHESH DARURI           MI                    6.0
3                ADITYA BAJAJ           MI                    6.0
4         PATRICK H SCHILLING           MI                    5.5
5                  HANSHI ZUO           MI                    5.5
6                 HANSEN SONG           OH                    5.0
7           GARY DEE SWATHELL           MI                    5.0
8            EZEKIEL HOUGHTON           MI                    5.0
9                 STEFANO LEE           ON                    5.0
10                  ANVIT RAO           MI                    5.0
11   CAMERON WILLIAM MC LEMAN           MI                    4.5
12             KENNETH J TACK           MI                    4.5
13          TORRANCE HENRY JR           MI                    4.5
14               BRADLEY SHAW           MI                    4.5
15     ZACHARY JAMES HOUGHTON           MI                    4.5
16               MIKE NIKITIN           MI                    4.0
17         RONALD GRZEGORCZYK           MI                    4.0
18              DAVID SUNDEEN           MI                    4.0
19               DIPANKAR ROY           MI                    4.0
20                JASON ZHENG           MI                    4.0
21              DINH DANG BUI           ON                    4.0
22           EUGENE L MCCLURE           MI                    4.0
23                   ALAN BUI           ON                    4.0
24          MICHAEL R ALDRICH           MI                    4.0
25           LOREN SCHWIEBERT           MI                    3.5
26                    MAX ZHU           ON                    3.5
27             GAURAV GIDWANI           MI                    3.5
28 SOFIA ADINA STANESCU-BELLU           MI                    3.5
29           CHIEDOZIE OKORIE           MI                    3.5
30         GEORGE AVERY JONES           ON                    3.5
31               RISHI SHETTY           MI                    3.5
32      JOSHUA PHILIP MATHEWS           ON                    3.5
33                    JADE GE           MI                    3.5
34     MICHAEL JEFFERY THOMAS           MI                    3.5
35           JOSHUA DAVID LEE           MI                    3.5
36              SIDDHARTH JHA           MI                    3.5
37       AMIYATOSH PWNANANDAM           MI                    3.5
38                  BRIAN LIU           MI                    3.0
39              JOEL R HENDON           MI                    3.0
40               FOREST ZHANG           MI                    3.0
41        KYLE WILLIAM MURPHY           MI                    3.0
42                   JARED GE           MI                    3.0
43          ROBERT GLEN VASEY           MI                    3.0
44         JUSTIN D SCHILLING           MI                    3.0
45                  DEREK YAN           MI                    3.0
46   JACOB ALEXANDER LAVALLEY           MI                    3.0
47                ERIC WRIGHT           MI                    2.5
48               DANIEL KHAIN           MI                    2.5
49           MICHAEL J MARTIN           MI                    2.5
50                 SHIVAM JHA           MI                    2.5
51             TEJAS AYYAGARI           MI                    2.5
52                  ETHAN GUO           MI                    2.5
53              JOSE C YBARRA           MI                    2.0
54                LARRY HODGE           MI                    2.0
55                  ALEX KONG           MI                    2.0
56               MARISA RICCI           MI                    2.0
57                 MICHAEL LU           MI                    2.0
58               VIRAJ MOHILE           MI                    2.0
59          SEAN M MC CORMICK           MI                    2.0
60                 JULIA SHEN           MI                    1.5
61              JEZZEL FARKAS           ON                    1.5
62              ASHWIN BALAJI           MI                    1.0
63       THOMAS JOSEPH HOSMER           MI                    1.0
64                     BEN LI           MI                    1.0
   Player Pre-Rating Average Pre Chess Rating of Opponents
1               1794                                  1605
2               1553                                  1469
3               1384                                  1564
4               1716                                  1574
5               1655                                  1501
6               1686                                  1519
7               1649                                  1372
8               1641                                  1468
9               1411                                  1523
10              1365                                  1554
11              1712                                  1468
12              1663                                  1506
13              1666                                  1498
14              1610                                  1515
15              1220                                  1484
16              1604                                  1386
17              1629                                  1499
18              1600                                  1480
19              1564                                  1426
20              1595                                  1411
21              1563                                  1470
22              1555                                  1300
23              1363                                  1214
24              1229                                  1357
25              1745                                  1363
26              1579                                  1507
27              1552                                  1222
28              1507                                  1522
29              1602                                  1314
30              1522                                  1144
31              1494                                  1260
32              1441                                  1379
33              1449                                  1277
34              1399                                  1375
35              1438                                  1150
36              1355                                  1388
37               980                                  1385
38              1423                                  1539
39              1436                                  1430
40              1348                                  1391
41              1403                                  1248
42              1332                                  1150
43              1283                                  1107
44              1199                                  1327
45              1242                                  1152
46               377                                  1358
47              1362                                  1392
48              1382                                  1356
49              1291                                  1286
50              1056                                  1296
51              1011                                  1356
52               935                                  1495
53              1393                                  1345
54              1270                                  1206
55              1186                                  1406
56              1153                                  1414
57              1092                                  1363
58               917                                  1391
59               853                                  1319
60               967                                  1330
61               955                                  1327
62              1530                                  1186
63              1175                                  1350
64              1163                                  1263
# Export the dataframe to a CSV file
write_csv(export_df, file = "output_file_readr.csv")