It is important to consider which laws and regulations might be relevant, what these laws are designed to protect or accomplish, and what the impact may be of not taking them into account.
It should be clear who is accountable for minimising the harm that the project could do. Accountability includes ensuring that the project team proactively identifies potential stakeholders and evaluates harms, such as disproportionate effects that may arise from applying a model.
Communicating to others what data science can and cannot do
Helping non-technical individuals understand the ethical, professional and technical issues relevant to a project
Leaders ensuring that their organisation as a whole understands the ethical principles and policies surrounding data science
While the need for anonymity is not new to the computing field, the approach to ensuring anonymity must be re-examined given the emergence of advanced data-linking techniques. In short, consideration should be given to how privacy will be maintained through the transmission, storage and merging of the data.
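One way to make the merging risk concrete is to count how many records share the same combination of indirectly identifying fields before and after a merge. The following is a minimal sketch of such a check (often described as measuring the k in k-anonymity), not a complete privacy assessment. The records, field names and the `smallest_group_size` helper are all hypothetical and exist only for illustration.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for the given quasi-identifier fields.

    A result of 1 means at least one record is unique on those fields
    and could be singled out by anyone who knows them.
    """
    if not records:
        return 0
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Hypothetical records: names removed, but indirect identifiers remain.
records = [
    {"postcode": "AB1", "birth_year": 1980, "sex": "F"},
    {"postcode": "AB1", "birth_year": 1980, "sex": "F"},
    {"postcode": "AB1", "birth_year": 1975, "sex": "M"},
    {"postcode": "CD2", "birth_year": 1990, "sex": "F"},
    {"postcode": "CD2", "birth_year": 1990, "sex": "F"},
]

# On postcode alone, every record hides among at least one other record.
k_before = smallest_group_size(records, ["postcode"])

# Once more fields are linked in, one record becomes unique.
k_after = smallest_group_size(records, ["postcode", "birth_year", "sex"])

print(k_before, k_after)
```

In this toy data the smallest group shrinks once birth year and sex are linked in, which illustrates how merging can erode anonymity even when no names are stored.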
Being able to access and collect data does not mean that it is ethical to use that data. Hence, care must be taken to understand who owns the data, what their rights and expectations are, and whether the data is being used in the way that the person (or entity) who contributed it intended.
Considering simpler models and documenting the performance of different models tested
Disclosing the reasons for any improvements in accuracy if one model is favoured over another
Machine learning models can be built using data that is biased, and a model trained on such data may learn that bias. For example, machine learning algorithms have been shown to inherit racial and gender biases.
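A simple first check for this kind of inherited bias is to compare how often a model produces a favourable outcome for each group. The sketch below assumes binary predictions (1 for favourable) and a single group label per record. The data and the `selection_rates` helper are hypothetical, and a low ratio is only a prompt for further review, not proof of unfairness.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Return the share of favourable (1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for prediction, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += prediction
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical model outputs and group labels, for illustration only.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(predictions, groups)

# Ratio of the lowest to the highest selection rate; values well below 1
# suggest the model treats the groups differently and needs a closer look.
ratio = min(rates.values()) / max(rates.values())

print(rates, round(ratio, 2))
```

Selection rate is only one of several possible fairness measures. Which measure is appropriate depends on the application and on the harms identified for each stakeholder group.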
Fully understand error, bias and risks associated with models applied to environmental problems
Evaluate the energy cost of storing and processing large volumes of data, and consider different options such as cloud computing, aggregating data and regularly reviewing the need for data
Share appropriate code, diagnostics and results with colleagues/customers to ensure they sufficiently understand the models.
Encouraging challenge from others involved in the project
Publish results publicly where possible
Seeking feedback from peers, academia or subject matter experts
Most predictive models are statistical in nature. They provide no guarantees; rather, they tell us about areas where an increased probability of an outcome might guide us to act differently. With this in mind, the data science project manager should ensure that the decisions made as a result of a data science project reflect the scale, accuracy and precision of the data that was used in creating the model.
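To illustrate why the scale of the data matters, the sketch below computes a Wilson score interval for a model's measured accuracy. This is one common way to express uncertainty in a proportion, chosen here for illustration rather than taken from the guidance above. The evaluation counts are hypothetical, and `z=1.96` assumes an approximate 95% confidence level.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the (low, high) Wilson score interval for a proportion.

    successes: number of correct predictions
    n: total number of predictions evaluated
    z: normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denominator = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denominator
    half_width = (
        z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denominator
    )
    return centre - half_width, centre + half_width


# Two hypothetical evaluations with the same 90% measured accuracy.
small_low, small_high = wilson_interval(45, 50)
large_low, large_high = wilson_interval(900, 1000)

print(f"n=50:   {small_low:.3f} to {small_high:.3f}")
print(f"n=1000: {large_low:.3f} to {large_high:.3f}")
```

Both evaluations report the same headline accuracy, but the interval from the smaller test set is much wider, so a decision based on it should be correspondingly more cautious.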