Ethics to Me:

In my opinion, ethics in research involves several levels of consideration: research study participants, other stakeholders (not including research participants), and the researchers themselves, with their own ethical principles and approach to research. Research with digital data includes all of these considerations as well as unique ones that accompany what are called the big three of digital data: volume, variety, and velocity. A special consideration is how digital data is collected. Known as unobtrusive or covert research, digital data collection can be carried out without any knowledge on the part of “participants”. In these cases the minimum standard should be comprehensive informed consent. All research should include respect for participants’ privacy and, as Marie Wallace mentions in her TED Talk, transparency between researchers and participants. Research involving digital data can make a researcher feel removed from the actual participants. Digital data can also come from sources where participants freely share information with varying ideas of what they consider private. Researchers should also consider the gaps in digital literacy among participants: some digital data is produced by individuals who do not know who has access to it or what protections, if any, are in place. Although digital data does not involve face-to-face interaction between researcher and participant, the data should still be seen as the property of its owner. Researchers should ensure that less than minimal harm occurs, and should perhaps consider even greater transparency and a stronger focus on individual privacy.

Digital research and the IRB:

The IRB tries to ensure that research is carried out in a manner that protects the participants in research studies from harm. The IRB has strict rules and guidelines for research proposals in non-digital spaces, but the dynamic nature of digital spaces and the lack of clear definitions and regulation for them make IRB involvement challenging. Digital technologies evolve seemingly endlessly, and as they evolve, both how the technologies operate and how people interact with them change as well. IRBs have measures in place to protect participants’ digital data through best management practices, but keeping up with how technology evolves and how users interact with it presents challenges. As Veltri (2019) notes, “A further complication is given by the fact that these are fast moving targets, meaning that their options and settings are quickly evolving”. In my opinion, it is not only up to the IRB to determine the safety and security of participants’ data. It is also up to the researchers themselves, the PI and research team, to be good data stewards who rely on their own ethical principles and commitment to participant protection. Researchers can also treat IRB review as one step in a series of checks that ensure subjects are protected in digital spaces.

Human subjects and Big Data:

Protecting human subjects when the research includes big data can be complicated. Some methods of accessing and extracting big data, such as unobtrusive methods, may give little to no consideration to participants’ privacy and protection, while others do. Research that clearly defines and implements protocols to protect participants and/or their data helps, and some research may include pseudonymization, which adds a further layer of participant protection. Even in cases where digital data is considered public, it is still up to researchers to protect subjects, so the question is less “can we protect participants?” and more “how do we ensure protection of subjects through the IRB and other research processes?” As for a threshold at which no subjects would be affected, that would depend on the research methodology used. Even in exempt studies, the IRB and the research team know that there is potential for some level of harm, so the goal is to reduce the opportunity for risk to less than minimal.
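As one illustration of the pseudonymization mentioned above, here is a minimal Python sketch. It is not a method taken from the readings or from any IRB guideline. It assumes usernames are replaced with a keyed hash (HMAC-SHA256), so the same user always maps to the same pseudonym while the original handle is not stored with the data. The function name `pseudonymize`, the `user_` prefix, the sample posts, and the key are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(username: str, secret_key: bytes, length: int = 12) -> str:
    """Return a stable pseudonym for a username using a keyed hash.

    The key must be kept separate from the research data; anyone who
    holds it can recompute the mapping for a known username.
    """
    digest = hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:length]


# Hypothetical key and sample data, for illustration only.
key = b"replace-with-a-randomly-generated-secret"
posts = [
    {"author": "forum_handle_1", "text": "first post"},
    {"author": "forum_handle_2", "text": "a reply"},
    {"author": "forum_handle_1", "text": "another post"},
]

# Replace each author with a pseudonym; posts by the same author stay linked.
protected = [
    {"author": pseudonymize(post["author"], key), "text": post["text"]}
    for post in posts
]
```

A sketch like this only hides the handle. Text quoted verbatim from a public site can still be found through a search, so pseudonymization reduces risk rather than removing it.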

Protecting my subjects:

Protecting research subjects should be a key goal of all researchers. Researchers can protect their subjects by having a comprehensive protection plan:

  1. Ensure that the IRB uses proper review techniques for digital research
  2. Rely on peers to review your research proposal
  3. Carefully review prior research and its strategies for protecting subjects (as well as common pitfalls that leave subjects unprotected)
  4. Make sure all researchers on the team meet or exceed ethical standards throughout the research process
  5. Implement the research with the highest ethical standards
  6. As the research progresses, identify any additional ethical considerations and always maintain a commitment to protecting subjects


Additional Resource: (another short video discussing Big Data and Privacy)

University of St. Thomas Minnesota. (2014, November). Ethical insights: Big data and privacy, navigating benefits, risks, and ethical boundaries. Retrieved from


Veltri, G. A. (2020). Chapter 2, Unobtrusive vs Obtrusive Methods. Digital Social Research (p. 38). Medford, MA: Polity Press.


6 comments on “The Ethics of Big Data”

  • I like how you mentioned that it is still up to researchers to determine if their study is ethical and to have a commitment to participant protection. I feel like that is the biggest issue with ethics. The IRB can only do so much.

    I also like the video you shared on Ethical Insights. It gave a better look into the risks and boundaries that weren’t really covered in the readings.

    • Thanks for your comments. I like to include other resources that are short and help with the content we are reviewing. I found the video interesting, and its points on transparency and protection of participants seemed very applicable. I enjoyed reading your post as well. The common themes across all the blogs are the importance of transparency and the fact that the public nature of digital spaces can make it much more difficult to ensure protection of participants.

  • You make a great point addressing varying degrees of digital literacy. Participants operate under different precepts and might not understand what others take as obvious.

    • Thanks for your comments. I think a significant part of protecting research participants is transparency: developing trust and empowering participants to know about the research being conducted, and in some cases even to provide insight into how they feel about the research being done.

  • Hi Beni,
    Thank you for these insights. In my own research, this has been the most difficult aspect to negotiate. On the one hand, the site I use is a public forum where anyone on the web can read a post. On the other hand, despite the publicness of these posts, I doubt any one user would imagine that someone is systematically looking at their words and trying to analyze their meaning. Usually, users feel that their words will fade into obscurity. In the end I opted to pseudonymize the pseudonyms of the users, but it doesn’t solve the ethical problem.

    • Dr. Longo,
      Thanks for the feedback. Yes, I was thinking the same thing: most users likely do not post thinking that the post will be used for data analysis and/or traced back to them personally. It is interesting to read about and consider the ethics of digital spaces. It has become a whole new area of study, and what I find most interesting is how some people act so differently in digital spaces than in the rest of their lives. For some it is as though an alter ego appears.
