Final Project
Analysis of TCG data
Introduction
In the last few years there has been a rise in the number of people participating in the Pokémon Trading Card Game (PTCG) market, both buying and selling cards. With this growth in popularity and the increased demand that comes with it, many cards have seen drastic price increases. Some participants want to collect new and rare cards, others collect as a hobby, and others buy in for the profit that can be made. Over the past year there has also been a growing number of incidents of scalpers joining the market and inflating prices. Using this data set, I want to look at factors such as average price, rarity, and other per-card variables that may be influenced by current market conditions, and see what trends emerge. The data comes from the Pokémon TCG API, documented at https://docs.pokemontcg.io/.
Analysis Question
Is there any correlation between a higher price and other variables such as rarity, set, supertype, and the various price variables?
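As a minimal sketch of how the numeric part of this question could be checked, the chunk below computes a rank correlation matrix on a small invented data frame. The column names follow the data dictionary, but `toy_cards` and all of its values are hypothetical and are not taken from the real data set. Categorical variables such as rarity, set, and supertype do not have a correlation coefficient in this sense and are better compared group by group.

```r
# Toy stand-in for the card data; the values are invented for illustration only.
toy_cards <- data.frame(
  hp                   = c(60, 90, 120, 180, 70, 200),
  convertedRetreatCost = c(1, 2, 3, 3, 1, 4),
  price_low            = c(0.5, 1.0, 4.0, 9.0, 0.8, 15.0),
  price_market         = c(1.2, 2.5, 8.0, 20.0, 1.5, 35.0)
)

# Spearman rank correlation is less sensitive to the long right tail of card
# prices than Pearson; pairwise.complete.obs skips rows with missing values.
price_cor <- cor(
  toy_cards,
  method = "spearman",
  use    = "pairwise.complete.obs"
)

# Correlation of every numeric column with the market price
round(price_cor[, "price_market"], 2)
```

Spearman is used here as one reasonable choice for skewed prices, not as the only valid one; swapping `method = "pearson"` would measure linear association instead.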
Data Dictionary
```{r}
#| label: Data Dictionary Creation
#| echo: FALSE

library(tidyverse)
library(httr)
library(jsonlite)
library(purrr)
library(knitr)  # provides kable()

tcg_data <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/rylec_xavier_edu/Ec0aL0uXxelFhoH8rGdXJwABSAwPdJlLvO2dOntdEDjnAA?e=P9pcp3")

# Later chunks refer to the same data as tcg_df
tcg_df <- tcg_data

# One description per column, in column order
data_dictionary_tcg <- tibble(
  Column_Name = colnames(tcg_df),
  Data_Type = sapply(tcg_df, class),
  Description = c(
    "Unique identifier for each card",
    "Name of the card",
    "Supertype of the card (e.g., Pokemon, Trainer, Energy)",
    "Subtypes of the card (e.g., Basic, Stage 1)",
    "Hit points (HP) of the card",
    "Types of the card (e.g., Fire, Water)",
    "Rarity of the card (e.g., Common, Rare, Holo)",
    "National Pokedex numbers associated with the card",
    "Converted retreat cost of the card (in energy units)",
    "Lowest price for a holofoil card",
    "Mid-range price for a holofoil card",
    "Highest price for a holofoil card",
    "Market price for a holofoil card",
    "Primary type of the card (extracted from types)",
    "Secondary type of the card (if available, extracted from types)"
  )
)

kable(data_dictionary_tcg)
```
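The dictionary notes that the primary and secondary types are extracted from `types`. A minimal sketch of one way to do that is below. It assumes `types` is stored as a comma-separated string, which may not match the real file. `toy_cards` and its values are hypothetical.

```r
# Hypothetical rows; the real 'types' column may be stored differently.
toy_cards <- data.frame(
  name  = c("Card A", "Card B", "Card C"),
  types = c("Grass", "Fire, Colorless", NA),
  stringsAsFactors = FALSE
)

# Split on commas (with optional trailing whitespace)
type_parts <- strsplit(toy_cards$types, ",\\s*")

# First element is the primary type, second (if any) the secondary type;
# indexing past the end of a vector yields NA.
toy_cards$type_1 <- sapply(type_parts, `[`, 1)
toy_cards$type_2 <- sapply(type_parts, `[`, 2)

toy_cards
```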
Summary Statistics
```{r}
#| label: Summary Statistics
#| echo: FALSE

library(knitr)
library(kableExtra)

# Numeric columns to summarise
selected_numeric <- tcg_data[, c("hp", "convertedRetreatCost", "price_low", "price_mid", "price_high", "price_market")]

summary_df <- as.data.frame(t(summary(selected_numeric)))

kable(summary_df, format = "html", digits = 2, caption = "Summary Statistics") %>%
  kable_styling("striped", full_width = FALSE, position = "center") %>%
  row_spec(0, bold = TRUE)
```
Descriptive Statistics
This section explores key patterns in the TCG dataset using five visualizations. We examine how card rarity relates to market price, the frequency of different card types, and the distribution of rarities across types. We also compare card prices by type and present a faceted boxplot showing price variations by rarity and price tier. Together, these visuals highlight core trends in rarity, type, and value within the card market.
```{r}
#| label: Library Loading
#| echo: FALSE

library(ggplot2)
library(dplyr)
library(corrplot)
library(tidyverse)
```
Card Rarity by Market Price
The following box plot compares market price across card rarities. The aim is to see which rarities have the widest spread in price per card. The output shows that Rare Holo and Rare Holo EX are the two rarities with the highest market prices. It is interesting that cards that could be considered non-EX can have higher prices than regular holos. It would also be worth looking at which factors drive this difference, as well as the differences between the other categories.
```{r}
#| label: Market Price by Rarity
#| echo: FALSE

ggplot(tcg_df, aes(x = rarity, y = price_market)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Market Price by Rarity", x = "Rarity", y = "Market Price") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
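Reading medians and spreads off a box plot can be imprecise, so a small table of the same quantities may help back up the comparison. The sketch below uses base R on an invented data frame. `toy_cards` and its prices are hypothetical and only mirror the `rarity` and `price_market` columns.

```r
# Invented prices for illustration; not taken from the real data set.
toy_cards <- data.frame(
  rarity = c("Common", "Common", "Rare Holo", "Rare Holo",
             "Rare Holo EX", "Rare Holo EX", "Promo"),
  price_market = c(0.25, 0.40, 12.00, 30.00, 18.00, 22.00, NA),
  stringsAsFactors = FALSE
)

# Drop cards without a market price before summarising
priced <- toy_cards[!is.na(toy_cards$price_market), ]

# Group the prices by rarity and compute count, median, and IQR per group
by_rarity <- split(priced$price_market, priced$rarity)
rarity_summary <- data.frame(
  rarity = names(by_rarity),
  n      = sapply(by_rarity, length),
  median = sapply(by_rarity, median),
  iqr    = sapply(by_rarity, IQR),
  row.names = NULL
)

# Highest median first
rarity_summary <- rarity_summary[order(-rarity_summary$median), ]
rarity_summary
```

Reporting the count next to the median makes it easier to spot rarities whose box is based on only a handful of cards.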
Frequency of Card Types
The following bar chart shows the frequency of each Pokémon type. Grass is the most frequent type, followed by Fire and Water. This could reflect the popularity of certain Pokémon of these types, or the fact that these are the three main types. Colorless also appears frequently. Colorless corresponds to the Normal type, which shows up in many packs. This could be because it is a common card in each pack, which allows it to appear more often.
```{r}
#| label: Frequency of Card Types
#| echo: FALSE

ggplot(tcg_df, aes(x = type_1)) +
  geom_bar(fill = "coral") +
  labs(title = "Frequency of Card Types", x = "Primary Type", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
Type Vs Rarity
The following bar graph shows the distribution of card rarity across types. This is useful because it helps identify which Pokémon types are more likely to have rare cards, offering insight into potential value trends and the design balance of the trading card game. It may not be surprising that Grass has one of the largest counts, since it also had the highest frequency in the previous graph. It is interesting that the most common rarity for Grass is Rare Holo, and the same holds for Water, while for Colorless the most common rarity is Promo and for Fire it is Rare Holo EX.
```{r}
#| label: Type Vs Rarity
#| echo: FALSE

ggplot(tcg_df, aes(x = type_1, fill = rarity)) +
  geom_bar(position = "dodge") +
  labs(title = "Distribution of Rarity Across Card Types", x = "Primary Type", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
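Raw counts in a dodged bar chart are dominated by how many cards each type has overall. Converting the counts to within-type shares separates "this type has many cards" from "this type skews toward a rarity". The sketch below shows that on an invented data frame; `toy_cards` and its rows are hypothetical.

```r
# Invented rows for illustration; not taken from the real data set.
toy_cards <- data.frame(
  type_1 = c("Grass", "Grass", "Grass", "Fire", "Fire",
             "Water", "Colorless", "Colorless"),
  rarity = c("Rare Holo", "Rare Holo", "Common", "Rare Holo EX", "Common",
             "Rare Holo", "Promo", "Promo"),
  stringsAsFactors = FALSE
)

# Counts of cards for every type/rarity combination
type_by_rarity <- table(toy_cards$type_1, toy_cards$rarity)

# Share of each rarity within a type (each row sums to 1)
rarity_share <- prop.table(type_by_rarity, margin = 1)
round(rarity_share, 2)
```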
Price by Type
The following chart shows card prices by Pokémon card type. It can indicate which types may be more valuable for someone who is in the market. The main result is that most types range from near $0 to about $50. Counting the number of cards in each $10 price increment could give more insight. For a type like Grass, which has a high frequency and a high count of Rare Holo cards, it is interesting to see such a narrow spread. Other types with lower frequency show a wider range of prices. This could indicate a pattern in which cards of less frequent types are more expensive, but answering that would need a more in-depth analysis.
```{r}
#| label: Price by Type
#| echo: FALSE

ggplot(tcg_df, aes(x = type_1, y = price_market)) +
  geom_boxplot(fill = "skyblue") +
  labs(
    title = "Market Price by Primary Card Type",
    x = "Primary Type",
    y = "Market Price"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
```
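The $10 increment idea mentioned above can be sketched with `cut()`, which assigns each price to a fixed-width bin. The data frame below is invented for illustration; `toy_cards` and its prices are hypothetical and only mirror the `price_market` column.

```r
# Invented prices for illustration; not taken from the real data set.
toy_cards <- data.frame(
  price_market = c(0.5, 3, 9.99, 10, 14, 27, 48, 55, NA)
)

# Drop cards without a market price
priced <- toy_cards[!is.na(toy_cards$price_market), , drop = FALSE]

# Bin edges every $10, up to the first multiple of 10 at or above the max price
upper  <- ceiling(max(priced$price_market) / 10) * 10
breaks <- seq(0, upper, by = 10)

# right = FALSE makes bins [0,10), [10,20), ...; include.lowest closes the last bin
priced$price_bin <- cut(
  priced$price_market,
  breaks         = breaks,
  right          = FALSE,
  include.lowest = TRUE
)

# Number of cards in each $10 bin
bin_counts <- table(priced$price_bin)
bin_counts
```

Adding `type_1` as a second argument to `table()` would give the same counts split by type, which is closer to the comparison discussed in the text.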
Price by Rarity, Faceted by Price Type
The following analysis shows the price distribution by rarity for each price estimate (low, mid, and high), which indicates how rarity relates to card value across the three estimates. This allows a clearer comparison of price variability both within and between rarities. Notable observations are that, in the high price estimate, cards with the Holo rarity tend to have the highest prices, while Holo EX cards exhibit a larger median price spread. Across the other price estimates, there appears to be a consistent pattern in the spread of values for each category.
```{r}
#| label: Price by Rarity Faceted
#| echo: FALSE

# Reshape the three price estimates into one long column
price_long <- tcg_df %>%
  select(rarity, price_low, price_mid, price_high) %>%
  pivot_longer(cols = starts_with("price_"), names_to = "price_type", values_to = "price")

# Boxplot: Price by Rarity, Faceted by Price Type
ggplot(price_long, aes(x = rarity, y = price)) +
  geom_boxplot(fill = "lightgreen") +
  facet_wrap(~ price_type, scales = "free_y") +
  labs(
    title = "Price Distribution by Rarity and Price Type",
    x = "Rarity",
    y = "Price"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
Secondary Data Source
The secondary data source I am going to use is a Magic: The Gathering card data set pulled from an API.
```{r}
#| label: Secondary Data Data Set Loading
#| echo: FALSE

# Named mtg_df to match the plotting chunk that follows
mtg_df <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/rylec_xavier_edu/EYo28ntV491AjYC3WpDOFRkBGxntE3LCWdO-IRJZHUWzEg?e=zft1L0")
```
```{r}
#| label: Type by Rarity
#| echo: FALSE

library(ggplot2)

ggplot(mtg_df, aes(x = type_line, fill = rarity)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Card Type Count by Rarity",
    x = "Card Type",
    y = "Count",
    fill = "Rarity"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
What I find interesting about these two is that while both games may seem similar at first (both are collectible card games with strategic elements), they are in fact quite different, both in gameplay mechanics and in the communities that engage with them. This graph highlights that difference effectively. In the Pokémon TCG (pkn_tcg), the card types are more limited and standardized (such as Pokémon, Trainer, and Energy), while in Magic: The Gathering (mgck_tcg) there is a much wider variety of card types, including Creatures, Enchantments, Instants, Sorceries, Artifacts, and more.
One particularly notable pattern is the relationship between card type and rarity. For example, among Enchantment and Creature cards, the Rare rarity appears with higher frequency. This could indicate that non-creature card types, like Enchantments, tend to be designed with more specialized or powerful effects, making them less common and more valuable. Creature cards, on the other hand, are central to gameplay and are produced in a broader range of rarities, but still show a noticeable presence in the Rare category. This trend suggests that Magic: The Gathering places emphasis on broader strategic diversity and complexity in its card design, compared to the more role-focused card type system of Pokémon.
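Grouping by the broad card type (Creature, Enchantment, and so on) is easier if the full type line is first reduced to a single main type. The sketch below assumes `type_line` follows the pattern "supertypes and types, then a long dash, then subtypes", for example "Creature — Elf Druid"; that format is an assumption about this data set. `toy_mtg`, its rows, and the `card_type` column are hypothetical, and taking the last word before the dash is a simplification.

```r
# Invented rows for illustration; "\u2014" is the long dash used as a separator.
toy_mtg <- data.frame(
  type_line = c("Creature \u2014 Elf Druid",
                "Legendary Creature \u2014 Dragon",
                "Enchantment \u2014 Aura",
                "Instant",
                "Artifact Creature \u2014 Golem"),
  rarity = c("common", "rare", "rare", "uncommon", "uncommon"),
  stringsAsFactors = FALSE
)

# Keep only the part before the dash, dropping subtypes
main_part <- trimws(sub("\\s*\u2014.*$", "", toy_mtg$type_line))

# Simplification: treat the last word of that part as the main card type,
# so "Legendary Creature" and "Artifact Creature" both count as "Creature".
toy_mtg$card_type <- sub("^.*\\s", "", main_part)

# Counts of main card type by rarity
table(toy_mtg$card_type, toy_mtg$rarity)
```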
Conclusion
As the Pokémon Trading Card Game (PTCG) market continues to grow in popularity, driven by collectors, hobbyists, and profit-seekers alike, understanding the factors that influence card prices has become increasingly important. Through our analysis of the dataset, several trends emerged that offer insights into how rarity, type, and market conditions impact card value.
Our visualizations revealed that rarer cards, particularly Rare Holo and Rare Holo EX, tend to command higher market prices, though their price variability differs: Holo EX cards showed a broader spread in high-end price estimates. Additionally, certain Pokémon types like Grass and Fire appeared more frequently, but this high frequency did not always translate into higher prices. In contrast, less common types sometimes exhibited wider price ranges, suggesting that scarcity within a type may contribute to value.
The distribution of rarity across types highlighted which Pokémon are more likely to appear in high-value rarities, with Fire and Water types often associated with premium cards. Finally, faceted boxplots comparing price tiers across rarities provided a clear view of how value is distributed across the market, from low-end to high-end cards, reinforcing the influence of rarity and highlighting pricing consistencies and outliers.
Altogether, these findings reflect the complexity of the current PTCG market, where card value is not solely determined by rarity or type, but by a combination of factors influenced by both consumer demand and collector behavior. As market conditions continue to evolve, further analysis could explore how external influences such as promotional releases or resale trends impact these dynamics over time.