Error and Bias in Data Science

Author

Shanteé Enitencio

Introduction

In today’s data-driven world, the ability to collect, analyze, and visualize data is essential for understanding the complexity of the world in which we live. Effective data visualization maximizes the value of data; however, bad actors and the misuse of data can have serious and lasting repercussions for society. It is therefore imperative that ethical standards be applied to ensure a better life for us all.

The Effects of Good and Bad Data

Compiling, organizing, and using data to interpret the world around us matters to everyone. From understanding the effectiveness of medicine, to protecting democracy and the economy, to sending rockets into space, data analysis, and with it data visualization, is essential. Good data visualization makes us more efficient and presents the world in a digestible format; however, not every visualization achieves this. The world is riddled with poor data presentation that spurs misinformation, wastes resources, and erodes trust in the providers of the information. This highlights how much the value of data-driven insights depends on good data.

(Figure 1)

This figure shows total deaths from COVID-19 compared with gun violence victims over time.

It violates several general rules of data visualization:

1. Manipulation of scales (McNutt et al., 2020)

2. Biases in interpretation (McNutt et al., 2020)

3. Distortion and confusion caused by missing axis labels and two unscaled axes (Lo et al., 2022)

4. Garbage in: the dataset measures two distinct variables over time, with too few observations to support any clear conclusion

5. Dual axis
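
One way to avoid the dual-axis problem in items 3 and 5 is to rebase both series to a common starting point so that they can share a single labeled axis. The following is a minimal Python sketch of that idea, not the method used to build the figures in this article. The function name `index_to_baseline` and the sample numbers are hypothetical and chosen only for illustration.

```python
def index_to_baseline(values):
    """Express each value as a percentage of the first value in the series."""
    if not values:
        raise ValueError("series must contain at least one observation")
    baseline = values[0]
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute an index")
    return [100.0 * v / baseline for v in values]


# Hypothetical toy numbers, NOT taken from the datasets cited in this article.
series_a = [200, 300, 500]  # a large-magnitude series
series_b = [4, 5, 6]        # a small-magnitude series

# After rebasing, both series start at 100 and can share one axis.
indexed_a = index_to_baseline(series_a)  # [100.0, 150.0, 250.0]
indexed_b = index_to_baseline(series_b)  # [100.0, 125.0, 150.0]
```

Plotting the indexed series on one axis shows relative growth honestly, although it hides absolute magnitudes, so the baseline values should still be stated in the caption.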

(Figure 2)

This figure violates several general rules of data visualization:

1. Biases in interpretation (McNutt et al., 2020)

2. Distortion and confusion (Lo et al., 2022)

3. Garbage in: the dataset measures total deaths in the US and China on 24 dates, yet the chart claims that the days with the most deaths happened at school (Lo et al., 2022)

When bias or error enters a data visualization, it distorts the truth and misleads those who rely on the information presented. For businesses specifically, this can be disastrous: misplaced confidence can lead to decisions that misallocate resources, costing billions of dollars and hundreds of productive hours. It can also damage a company’s reputation and erode the trust of stakeholders who invested their money and time in a company they believed was strong, only to learn later that it was not. Good data is fundamental to growth in every sector, and all data scientists must hold themselves accountable for the accuracy of their information and its presentation.

(Figure 3.1)

Different/fixed charts

(Figure 3.2)

Different/fixed charts

The Christian Obligation

The same holds for Christian data scientists, who are called to be the earthly representation of Jesus Christ. Christians have a moral and spiritual obligation to reflect the light of God in everything we do, regardless of the job or position we hold. We are commanded to distinguish ourselves by our words and deeds and not to conform to earthly standards. If Christian data scientists adopt the same deceptive strategies that their non-Christian counterparts may use, then we do not reflect the sacrificial suffering of Christ, and we become like everyone else. Matthew 5:13-16 calls us to be the salt and light of the world and emphasizes that salt without flavor and a light that is hidden are of no use, because they become like everything else. It is our duty to live by his commands, as in 1 Peter 1:15-16, where it is written, “Be holy, because I am holy.” As Christians, we cannot claim to follow Christ if we do not represent him in our actions, especially when those actions affect others. Lastly, the Bible calls us to be just in everything we do and not to be a stumbling block for any of our brethren. This means that we should not lie, cheat, or steal to gain favor or money. All data scientists, especially Christian ones, should strive to ensure that the data they use and the outcomes they predict are unbiased, accurate, and representative of the underlying facts.

(Figure 4)

This figure violates several general rules of data visualization:

1. Manipulation of scales (McNutt et al., 2020)

2. Biases in interpretation (McNutt et al., 2020)

3. Selective data and a color palette chosen to make airports seem more dangerous than they actually are and to make school shootings seem less severe (Lo et al., 2022)
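
Manipulation of scales can take several forms. One common form, assumed here purely for illustration because the text does not state how the scale in this figure was altered, is a truncated axis that does not start at zero. The Python sketch below compares how large one bar looks relative to another with and without truncation. The function name `visual_ratio` and the numbers are hypothetical.

```python
def visual_ratio(value_a, value_b, axis_min=0.0):
    """Ratio of the drawn bar heights when the axis starts at axis_min."""
    height_a = value_a - axis_min
    height_b = value_b - axis_min
    if height_a <= 0 or height_b <= 0:
        raise ValueError("both values must lie above the axis minimum")
    return height_a / height_b


# Hypothetical toy values, NOT taken from the datasets cited in this article.
honest = visual_ratio(110, 100)                  # axis starts at 0
truncated = visual_ratio(110, 100, axis_min=90)  # axis starts at 90
```

With the axis starting at zero, the first bar is drawn 1.1 times as tall as the second, which matches the 10% difference in the values. With the axis starting at 90, the same bar is drawn twice as tall, which overstates the difference.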

(Figure 5)

Different/fixed charts

Conclusion

In conclusion, effective data visualization is not simply about conveying information; it is about fostering understanding so that people can make more informed decisions in their everyday lives. Because the world is more data-driven than ever, whether in business, academia, health, or the public sector, it is the data scientist’s ethical obligation to be the salt and the light and to contribute to the betterment of society at large. Data visualization is the art and science of telling a story with data, because a story without data is not believable, and data without a story is unusable.

  

Sources Used

Camba, J. D., Company, P., & Byrd, V. L. (2022). Identifying deception as a critical component of visualization literacy. IEEE Computer Graphics and Applications, 42(1), 116-122. https://doi.org/10.1109/MCG.2021.3132004

Kulbe, E. (2023). Shooting (1982-2023) - Cleaned. Kaggle. Retrieved from www.kaggle.com/datasets/eimadevyni/shooting-1982-2023-cleaned.

Lo, L. Y. H., Gupta, A., Shigyo, K., Wu, A., Bertini, E., & Qu, H. (2022, June). Misinformed by visualization: What do we learn from misinformative visualizations? In Computer Graphics Forum (Vol. 41, No. 3, pp. 515-525).

McNutt, A., Kindlmann, G., & Correll, M. (2020, April). Surfacing visualization mirages. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1-16).

Nguyen, V. T., Jung, K., & Gupta, V. (2021). Examining data visualization pitfalls in scientific publications. Visual Computing for Industry, Biomedicine, and Art, 4(1), 27. https://doi.org/10.1186/s42492-021-00092-y

Our World in Data. (2024). COVID-19 Data [Data file]. Retrieved from https://github.com/owid/covid-19-data/blob/master/public/data/cases_deaths/total_deaths.csv