This document was made using R Markdown (Xie, Allaire, and Grolemund 2018) and reticulate (Ushey, Allaire, and Tang 2020) as an interface to Python. All plots are made using Plotly (Plotly Technologies Inc. 2015). Along the right side of the document are buttons to hide/show the code.
The data used in these visualizations was provided by Jim McDonald using the chaind tool: https://github.com/wealdtech/chaind. Go to the medalla-data-challenge channel of the EthStaker Discord to find links to .dmp files of various sizes and instructions for loading them into a PostgreSQL database.
My understanding of the data comes mostly from the EthStaker Discord and this blog post. I skip some code chunks for brevity; if you want to see the complete code, please see the R Markdown document in this repo.
from plotly import graph_objects as go
import pandas as pd
import numpy as np
import plotly.io as pio
import pprint
from pyspark import SparkContext, SparkConf, SQLContext
from pyspark.sql import SparkSession
from pyspark.sql.types import *
import pyspark.sql.functions as F
pp = pprint.PrettyPrinter()
spark = (
SparkSession.builder
.appName("chain")
.master("local[4]")
.config("spark.driver.memory", '8G')
.config("spark.driver.maxResultsSize", "2G")
.config("spark.kryoserializer.buffer.max", "500m")
.config("spark.jars", "/mnt/sda5/java_jars/postgresql-42.2.16.jar")
.getOrCreate()
)
dbname = "chain-388705"
with open('/mnt/sda5/git_repos/medalla-viz/.user-chain-cred.txt', 'r') as f:
password = f.readline().rstrip("\n")
# To load a whole table
dbloader = spark.read \
.format("jdbc") \
.option("url", "jdbc:postgresql://127.0.0.1:5432/{}".format(dbname)) \
.option("user", "chain") \
.option("password", password) \
.option("driver", "org.postgresql.Driver")
# If we need to query parts of large tables (t_attestations is the only real offender)
dbquery = spark.read \
.format("jdbc") \
.option("url", "jdbc:postgresql://127.0.0.1:5432/{}".format(dbname)) \
.option("user", "chain") \
.option("password", password) \
.option("driver", "org.postgresql.Driver")
The database contains many tables with information on committees, validators, attestations, slashings, blocks and more.
The two tables we will look at are t_blocks and t_attester_slashings, which contain information about blocks and slashings that result from bad attestations (I do not include proposer slashings, as there are very few examples in the database). The two types of slashable offences contained in the data are surround votes and double votes, as described in the “Slashable Offences” section of the above-mentioned article. It will probably also be useful to review that article's section on beacon chain checkpoints.
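To make the two offences concrete, here is a minimal sketch of the slashing conditions in terms of source and target epochs (this is the consensus-spec logic, not code from the database or the article):
# Minimal sketch of the two slashable offences, per the beacon chain spec.
# a and b are (source_epoch, target_epoch) pairs from two attestations
# signed by the same validator.
def is_double_vote(a, b):
    # two distinct attestations voting for the same target epoch
    return a != b and a[1] == b[1]

def is_surround_vote(a, b):
    # attestation a surrounds attestation b: earlier source, later target
    return a[0] < b[0] and b[1] < a[1]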
We load the tables using pyspark and take a look at the columns…
blocks = dbloader.option("dbtable", "t_blocks").load()
attSlashings = dbloader.option("dbtable", "t_attester_slashings").load()
block_cols = ", ".join(blocks.columns)
slash_cols = ", ".join(attSlashings.columns)
Block Columns: f_slot, f_proposer_index, f_root, f_graffiti, f_randao_reveal, f_body_root, f_parent_root, f_state_root, f_eth1_block_hash, f_eth1_deposit_count, f_eth1_deposit_root
Slashing Columns: f_inclusion_slot, f_inclusion_block_root, f_inclusion_index, f_attestation_1_indices, f_attestation_1_slot, f_attestation_1_committee_index, f_attestation_1_beacon_block_root, f_attestation_1_source_epoch, f_attestation_1_source_root, f_attestation_1_target_epoch, f_attestation_1_target_root, f_attestation_1_signature, f_attestation_2_indices, f_attestation_2_slot, f_attestation_2_committee_index, f_attestation_2_beacon_block_root, f_attestation_2_source_epoch, f_attestation_2_source_root, f_attestation_2_target_epoch, f_attestation_2_target_root, f_attestation_2_signature
…we can see that the blocks table gives us the slot number and block root hash of each block. The slashings table describes the pairs of attestations that resulted in a slashing, tagged as attestation_1 and attestation_2. Each row lists the validator ids that made each attestation (f_attestation_(1/2)_indices); at least one validator appears in both lists, which is what constitutes the slashable offence. We also have the block root hashes of the sources and targets of all attestations.
A more complete explanation of the columns can be found in this article.
I want a table with the slot numbers of the source and target for each pair of attestations, so I perform several joins on the block root hash to append the slot information, and I compute the intersection of the columns containing the validator ids to get the slashed validators:
slashings_blocks = (
attSlashings
# Get the slot # for the source of first attestation ...
.join(
blocks.select(F.col("f_root").alias("f_attestation_1_source_root"),
F.col("f_slot").alias("src_1_slot"),
F.col("f_parent_root").alias("src_1_parent_root")
),
on = "f_attestation_1_source_root"
)
# ... slot # for the target of first attestation ...
.join(
blocks.select(F.col("f_root").alias("f_attestation_1_target_root"),
F.col("f_slot").alias("tgt_1_slot"),
F.col("f_parent_root").alias("tgt_1_parent_root")
),
on = "f_attestation_1_target_root"
)
#... slot # for the source of second attestation ...
.join(
blocks.select(F.col("f_root").alias("f_attestation_2_source_root"),
F.col("f_slot").alias("src_2_slot"),
F.col("f_parent_root").alias("src_2_parent_root")
),
on = "f_attestation_2_source_root"
)
#... slot # for the target of second attestation ...
.join(
blocks.select(F.col("f_root").alias("f_attestation_2_target_root"),
F.col("f_slot").alias("tgt_2_slot"),
F.col("f_parent_root").alias("tgt_2_parent_root")
),
on = "f_attestation_2_target_root"
)
# ... and the indices of the slashed validators.
.withColumn("slashed_validators", F.array_intersect('f_attestation_1_indices', 'f_attestation_2_indices'))
)
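One of the chunks skipped for brevity collects this Spark dataframe into pandas so the R chunks can reach it through reticulate's py$ object; presumably something along these lines:
# Assumed (skipped in the original): collect the joined result into pandas.
# The slashings table is small, so bringing it to the driver is cheap.
sb_df = slashings_blocks.toPandas()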
py$sb_df %>%
dplyr::select(c('src_1_slot', 'tgt_1_slot', 'src_2_slot', 'tgt_2_slot', 'slashed_validators')) %>%
head(20) %>%
DT::datatable(options = list(dom = 'tp'))
You can see most pairs of attestations have only a single offender, though there are some with multiple.
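A quick tabulation backs this up (my own check, not in the original):
# Count attestation pairs by number of offenders; most should have exactly one.
print(sb_df['slashed_validators'].apply(len).value_counts())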
To plot the slashing activity, I need to include slots that did not have any slashing activity, so I join the dataframe to a single-column dataframe containing all slot numbers. I also compute a rolling average of slashings across 500 slots to smooth out the plot.
# number of validators slashed for a given bad pair of attestations
sb_df['n_slashed_validators'] = sb_df['slashed_validators'].apply(len)
# total number of slashed validators per slot
join_df = sb_df.groupby(['f_inclusion_slot'], as_index = False).agg({"n_slashed_validators":"sum"})
# join to create zero rows for slots with no slashings
slot_df = pd.DataFrame({"slot":range(sb_df.f_inclusion_slot.max() + 1)})
slashing_activity = pd.merge(slot_df, join_df, left_on="slot", right_on = "f_inclusion_slot", how="left").fillna(0)
# create a rolling average and downsample for a more descriptive plot
slashing_activity['avgslashings'] = slashing_activity.rolling(500)['n_slashed_validators'].mean()
slashing_activity_ds = slashing_activity.iloc[::5,:]
The roughtime event
The medalla chain went through a bit of a rough patch, described here. As such, it seems useful to separate out the data from the slots corresponding to this event; I eyeball it and choose slots 100000 to 150000. The graph clearly shows the roughtime event; click the legend entry to toggle it off and see only the data for ‘normal operation’.
# plot the roughtime event separately
roughtime_df = slashing_activity_ds.query("slot > 100000 & slot < 150000")
normal_df = slashing_activity_ds.query("slot < 100000 | slot > 150000")
# plot slashing activity
fig = go.Figure(
[go.Scatter(x = roughtime_df['slot'],
y = roughtime_df['avgslashings'],
mode = 'lines', line = {'color':"red"}, name = "Roughtime event"),
go.Scatter(x = normal_df['slot'],
y = normal_df['avgslashings'],
mode = 'lines', line = {'color':"blue"}, name = "Normal operation")],
layout = go.Layout(xaxis_title="Slot", yaxis_title="Average slashings over 500 slots")
)
pio.write_html(fig, file='./html_plots/average_slashings.html', auto_open=False)
I decided to look into the distances between the sources and targets for pairs of attestations: specifically, the difference in slot number between the source of attestation 1 and the source of attestation 2 (and similarly for the targets). For each combination of these differences, I also get the total number of slashings occurring at that combination.
# differences in source and target between two attestations
sb_df['diff_src'] = sb_df['src_1_slot'] - sb_df['src_2_slot']
sb_df['diff_tgt'] = sb_df['tgt_1_slot'] - sb_df['tgt_2_slot']
# how many slashings occurred for pairs of attestations with this particular combination of distances?
sb_df['size'] = sb_df.groupby(['diff_src', 'diff_tgt'])['src_1_slot'].transform("size")
sb_df['nslashed_by_dist'] = sb_df.groupby(['diff_src', 'diff_tgt'])['n_slashed_validators'].transform("sum")
# (USED LATER) number of validators making similar attestations and distribution of attestation pairs by number of validators making similar attestations.
sb_df['n_attestations_1'] = sb_df['f_attestation_1_indices'].apply(len)
sb_df['n_attestations_2'] = sb_df['f_attestation_2_indices'].apply(len)
sb_df['nslashed_by_natts'] = sb_df.groupby(['n_attestations_1', 'n_attestations_2'])['n_slashed_validators'].transform("sum")
# separate roughtime and normal for plotting
df_roughtime = sb_df.query("f_inclusion_slot > 100000 & f_inclusion_slot < 150000")
df_normal = sb_df.query("f_inclusion_slot < 100000 | f_inclusion_slot > 150000")
# total number of slashed validators per (diff_src, diff_tgt) combination, for plotting
plot_df_roughtime = df_roughtime.groupby(['diff_src', 'diff_tgt'], as_index = False).agg({"n_slashed_validators":"sum"})
plot_df_normal = df_normal.groupby(['diff_src', 'diff_tgt'], as_index = False).agg({"n_slashed_validators":"sum"})
py$sb_df %>%
dplyr::select(c("diff_src", "diff_tgt", "nslashed_by_dist", "n_attestations_1", "n_attestations_2", "nslashed_by_natts")) %>%
head(20) %>%
DT::datatable(options = list(dom = 'tp'))
fig = go.Figure(
[go.Scatter(x=plot_df_normal['diff_src'], y=plot_df_normal['diff_tgt'], mode = 'markers',
marker = dict(size=np.log2(plot_df_normal['n_slashed_validators'])/np.log2(1.5)+10),
hovertemplate =
'<b>diff src</b>: %{x}'+
'<br><b>diff_tgt</b>: %{y}<br>'+
'<b>Number of slashed pairs: %{text}</b>',
text = plot_df_normal['n_slashed_validators'],
name = "Normal Operation",
showlegend=True),
go.Scatter(x=plot_df_roughtime['diff_src'], y=plot_df_roughtime['diff_tgt'], mode = 'markers',
marker = dict(size=np.log2(plot_df_roughtime['n_slashed_validators'])/np.log2(1.5)+10),
hovertemplate =
'<b>diff src</b>: %{x}'+
'<br><b>diff_tgt</b>: %{y}<br>'+
'<b>Number of slashed pairs: %{text}</b>',
text = plot_df_roughtime['n_slashed_validators'],
name = "Roughtime",
showlegend=True)],
go.Layout(
xaxis_title = "Slot distance between sources",
yaxis_title = "Slot distance between targets",
title = {"text":"Source and target slot distances for pairs of attestations resulting in one or more slashings",
"y":0.85, "xanchor":"center", "x":0.5},
legend={"orientation":"h", "yanchor":"top", "y":0.99, "x":0.005}
)
)
pio.write_html(fig, file='./html_plots/slashing_distances.html', auto_open=False)
The plot shows that the roughtime slashings occurred exclusively with pairs of attestations for which the source was the same but the target differed. Again, click the legend to toggle the roughtime data on/off. Also of note: for the ‘normal operation’ slashing pairs, the target slots differ by roughly multiples of 32 (the number of slots in an epoch), while the source slots differ more irregularly. I only have a rough guess as to why this is.
According to the data, there is a single slashing corresponding to a ‘surround vote’: the single point in the top-left quadrant.
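One quick way to check the multiples-of-32 pattern (again my own check, not part of the original analysis) is to look at the target-slot differences modulo 32:
# If target-slot differences cluster at multiples of 32, residue 0 should dominate.
print(df_normal['diff_tgt'].abs().mod(32).value_counts().head())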
Finally, I’ll plot where the slashings lie in terms of group size: given a pair of attestations resulting in a slashing, how many validators made the first attestation, and how many made the second? The data is again separated into roughtime and normal operation, and I’ll also show how many slashings resulted from each particular combination of validator counts.
fig = go.Figure(
[
go.Scatter(x=df_normal['n_attestations_1'], y=df_normal['n_attestations_2'],
mode='markers',
marker=dict(
color=np.log2(df_normal['nslashed_by_natts']),
line=dict(color='black', width=0.3),
colorbar=dict(title="# slashings <br> (log2, normal)", len = 0.25, y=0.75, ticktext=None)
),
text=df_normal['nslashed_by_natts'],
name = "Normal Operation"),
go.Scatter(x=df_roughtime['n_attestations_1'], y=df_roughtime['n_attestations_2'],
mode='markers',
marker=dict(
color=np.log2(df_roughtime['nslashed_by_natts']),
symbol="cross",
line=dict(color='black', width=0.3),
colorbar=dict(title="# slashings <br> (log2, roughtime)", len = 0.25),
colorscale='magma'
),
text=df_roughtime['nslashed_by_natts'],
name = "Roughtime")
],
go.Layout(
autosize=False,
width=900,
height=900,
xaxis_title="Number of validators making attestation 1",
yaxis_title="Number of validators making attestation 2",
title = {"text":"For a pair of attestations resulting in a slashing, <br>how many other validators made the same attestation as the first and/or second attestation?", "y":0.92, "xanchor":"left", "x":0.1}
)
)
pio.write_html(fig, file='./html_plots/voting_group_sizes.html', auto_open=False)
The most common occurrence of a slashing is a validator making two orphan attestations (ones no one else voted for; the bottom-left point). During the roughtime event, we can see that the number of validators making each attestation is more balanced. In normal operation, the most common case is a validator making one orphan attestation in addition to another attestation that other validators also made.
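As a rough check of that reading (my own, not in the original), we can count the slashed validators for pairs where both attestations were orphans:
# Slashings where both attestations were made by a single validator
# (the bottom-left point of the plot).
orphan_pairs = df_normal.query("n_attestations_1 == 1 and n_attestations_2 == 1")
print(orphan_pairs['n_slashed_validators'].sum())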
Thanks for reading, I hope this has been informative and/or interesting. I am no expert, so please let me know of any mistakes and read the linked articles to hear it from people who are.
Plotly Technologies Inc. 2015. Collaborative Data Science. Montreal, QC: Plotly Technologies Inc. https://plot.ly.
Ushey, Kevin, JJ Allaire, and Yuan Tang. 2020. Reticulate: Interface to ’Python’. https://CRAN.R-project.org/package=reticulate.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Chapman and Hall. https://bookdown.org/yihui/rmarkdown.