Unclear Columns:
target_overs: At first glance every row seems to have the same number
of target overs, 20, so the column looks constant. After reading the
documentation, I found that some matches were played for only 10 overs
because of rain. In those cases the target overs are reduced from 20
to 10.
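One way to check whether target_overs is really constant is to tabulate it. The sketch below uses a small made-up data frame (the name `matches_demo` and all of its values are hypothetical) in place of the real data, so the counts are only illustrative.

```r
# Toy stand-in for the real match data; ids and values are made up.
matches_demo <- data.frame(
  id           = 1:6,
  target_overs = c(20, 20, 20, 10, 20, NA)
)

# Count each distinct value, keeping NA visible instead of silently dropping it.
overs_counts <- table(matches_demo$target_overs, useNA = "ifany")
print(overs_counts)

# Rows whose target differs from the standard 20 overs (NA handled explicitly).
is_shortened <- !is.na(matches_demo$target_overs) &
  matches_demo$target_overs != 20
shortened <- matches_demo[is_shortened, ]
print(shortened)
```

Running the same two steps on the real data frame would show at once whether any match deviates from 20 overs.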
date: The column gives only the date of each match; the time of play is
not specified in the data itself. Without documentation, it’s unclear
whether the matches were played in the afternoon or the evening.
city: The city column contains both Indian and international city
names. It is not clear which matches were played overseas and which
were played in domestic conditions.
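If an explicit overseas flag is wanted, it can be derived from a small city-to-country lookup. This is only a sketch: the lookup `city_country` and the vector `city_demo` are hypothetical, and a real lookup would have to cover every city in the data.

```r
# Hypothetical lookup table; extend it to cover every city in the real data.
city_country <- c(
  "Mumbai"    = "India",
  "Chennai"   = "India",
  "Cape Town" = "South Africa",
  "Abu Dhabi" = "United Arab Emirates"
)

# Toy city column standing in for the real one.
city_demo <- c("Mumbai", "Cape Town", "Abu Dhabi", "Chennai", "Mumbai")

# Look up each city's country; unname() drops the names carried over
# from the lookup vector.
country  <- unname(city_country[city_demo])
overseas <- country != "India"

print(data.frame(city = city_demo, country = country, overseas = overseas))
```

A city missing from the lookup comes back as NA, which makes gaps in the lookup easy to spot.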
Reasoning for Encoding:
The way these data elements are encoded was probably influenced by
factors such as standard practice, compatibility with existing systems,
or ease of data entry. For example, recording the time of each match as
well would make the data larger. Similarly, a user can work out the
country from the name of the city, so leaving out a separate country
column reduces data redundancy.
Consequences of Not Reading Documentation:
Without consulting the documentation, misinterpretations are likely.
Eliminator and Elimination Final are the same match type, but we would
treat them as different match types. Similarly, because some teams
appear under more than one name, the number of teams appears to be
around 25 when there are actually 20.
Element Unclear After Documentation:
result_margin: Some values are NA. It is unclear whether those matches
were stopped or ended in a tie. Even after referring to the
documentation this is still unclear, because it gives no reason for the
missing values. They may be due to unavailable data or to rule changes
over time.
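One way to narrow this down is to check whether a missing result_margin goes together with a missing winner. The sketch below uses a made-up data frame (`margin_demo`, with hypothetical team names and values), so it shows only the shape of the check, not real results.

```r
# Toy stand-in for the real data; team names and values are made up.
margin_demo <- data.frame(
  winner        = c("Team A", "Team B", NA, "Team A", "Team C"),
  result_margin = c(12, NA, NA, 5, 100)
)

# How many margins are missing?
n_missing <- sum(is.na(margin_demo$result_margin))

# Cross-tabulate: does a missing margin coincide with a missing winner?
missing_table <- table(
  margin_missing = is.na(margin_demo$result_margin),
  winner_missing = is.na(margin_demo$winner)
)

print(n_missing)
print(missing_table)
```

A missing margin with a recorded winner would suggest a tie that was settled some other way, while a missing margin with no winner would suggest a match that was stopped. This is an interpretation, not something the documentation confirms.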
match_type: Different names are used for the same type of match, for
example Eliminator and Elimination Final. The naming convention has
changed over the years, and we don’t know the reason for this. The same
has happened with team names: Kings XI Punjab and Punjab Kings, for
example, are the same team.
Visualization:
Let’s plot the result margin for each winner, without merging the
duplicated team names:
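library(ggplot2)

# Scatter plot of result margin for each winning team; `data` is the match
# data loaded earlier in the notebook.
ggplot(data, aes(x = winner, y = result_margin, color = winner)) +
  geom_point(size = 2) +
  labs(title = "Scatter Plot of winner by result margin",
       x = "winner", y = "result margin") +
  theme_minimal() +
  # Flag that some teams appear under more than one name
  annotate("text", x = 17, y = 100,
           label = "Teams are repeated",
           color = "red",
           size = 2.5) +
  # Rotate the team names so they do not overlap
  theme(axis.text.x = element_text(angle = 90, hjust = 1))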
Warning: Removed 19 rows containing missing values or values outside the scale range (`geom_point()`).

Here we can see that there are 5 teams with similar names that are
treated as different teams. This creates anomalies in the analysis and
makes calculations and interpretations difficult. For example, if we
combine the two Punjab teams, Punjab might beat Rajasthan in terms of
result margin. The same plot can therefore be read in conflicting
ways.
Significant Risks and Mitigation:
The major risk is an incorrect judgement based on the computations. The
main purpose of the data, giving the user accurate information, is
undermined by this issue. If a decision is made while treating the same
team as different teams, we can run into the problem shown in the
visualization above. For example, someone might place a bet believing
that Rajasthan has won more matches with a result margin of 100 than
Punjab, when in reality Punjab has won more such matches than
Rajasthan. To mitigate this, we can preprocess the data by combining
teams from the same city into one team. But first we have to confirm
that both names refer to a team from the same city and under the same
management.
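The merge can be sketched with an explicit alias map rather than a match on city, since that keeps each merge a deliberate, checked decision. Only the Punjab pair below comes from the text; the names `team_alias` and `winner_demo` and the toy values are hypothetical.

```r
# Hypothetical alias map: old name -> current name. Only the Punjab pair is
# taken from the text; check any further pairs against the documentation.
team_alias <- c("Kings XI Punjab" = "Punjab Kings")

# Toy winner column standing in for the real one.
winner_demo <- c("Kings XI Punjab", "Punjab Kings", "Rajasthan Royals",
                 "Kings XI Punjab", NA)

# Replace names found in the alias map; leave everything else
# (including NA) unchanged.
is_alias <- winner_demo %in% names(team_alias)
winner_clean <- winner_demo
winner_clean[is_alias] <- unname(team_alias[winner_demo[is_alias]])

print(table(winner_clean, useNA = "ifany"))
```

The same pattern would apply to match_type, for example mapping Elimination Final to Eliminator, once the documentation confirms that the two labels mean the same thing.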
Conclusion:
Understanding the nuances of data documentation is crucial for
accurate analysis and interpretation. Ambiguities in column names,
values, or formats can lead to misinterpretations and flawed
conclusions. By critically examining the data and referencing
documentation, analysts can ensure more reliable insights and minimize
risks associated with data ambiguity. Further investigation may be
needed to clarify unclear elements and improve the overall quality of
analysis.