Web & Social Analytics

Data privacy and ethical considerations in web and social media analytics

Social media and e-commerce usage have exploded in the past decade, and so have the monitoring of and intrusion into people's lives. Most people have experienced privacy violations on the web. According to a 2017 Statista survey of people aged 18 to 29, 20% missed possible career opportunities, 48% ended up buying something they had not intended to, and 56% had links, comments, or posts viewed by unintended people. People join online communities to interact with others who have similar interests. Such forums and social media are meant for sharing information and views, but users also unwittingly share their attitudes and behaviours.

These platforms are data reservoirs on a wide range of subjects, from political views and buying behaviour to racism, misogyny, and stances on climate change. In the pre-web era, such information had to be collected through interviews, observation, and questionnaires. Even then, there was no guarantee that the answers reflected the individual's actual beliefs, as respondents could give answers that they perceived as the majority view and, therefore, "safe." Much of the huge volume of data found on web platforms is naturally occurring, original, and varied and is, therefore, of special interest to businesses and researchers.

Web and social media analytics aim to monitor, collect, and analyse data to generate inferences and insights and to track patterns and trends. Ethical challenges in web and social analytics exist at both the individual and the organisational level. With advances in AI-based analytics, the ability to reveal patterns and insights from the web and social media data pool is progressing faster than legal and ethical guidelines are evolving. Users do not fully know what data is being collected, who is collecting it, and why. Most people grant permission to use their personal data without being aware of the far-reaching consequences. Individual-level issues include privacy breaches, re-identification of data, profiling, data mining, and anonymity. At the organisational level, problems arise from poor data quality, data sharing between organisations, biased insights, and erroneous sampling.

Privacy breaches are not only due to data hacking; they can also result from the use of data for purposes other than those the users consented to, leakage of information by employees, or improper data handling. Re-identification is done by matching anonymous data with information available in the public domain. For example, if health information given anonymously is linked back to an individual, it could lead to discrimination, defamation, or even security risks. Profiling using web and social media analytics could cause members of a certain race, gender, or social and economic status to be treated unfairly. The cloak of anonymity provided by social media allows fraudsters to get away with much wrongdoing. From fake product feedback to cons to vicious trolling, anonymity is an enabler of many social media ills.
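The re-identification risk described above can be illustrated with a small sketch. It assumes a hypothetical "anonymised" health data set that still carries quasi-identifiers (postcode, birth year, gender) and a hypothetical list of public profiles. All records, field names, and function names here are invented for illustration and do not come from any real data set.

```python
from collections import defaultdict

# Hypothetical "anonymised" records: names removed, quasi-identifiers kept.
anonymised_health = [
    {"zip": "02139", "birth_year": 1985, "gender": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1990, "gender": "M", "diagnosis": "diabetes"},
    {"zip": "94110", "birth_year": 1985, "gender": "F", "diagnosis": "migraine"},
]

# Hypothetical information available in the public domain.
public_profiles = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1985, "gender": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1990, "gender": "M"},
    {"name": "Carol Example", "zip": "60601", "birth_year": 1985, "gender": "F"},
]

# Assumed set of fields shared by both data sets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def quasi_key(record):
    """Build a comparable key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymised, public):
    """Return (name, diagnosis) pairs where exactly one public profile matches."""
    index = defaultdict(list)
    for profile in public:
        index[quasi_key(profile)].append(profile["name"])

    matches = []
    for record in anonymised:
        names = index.get(quasi_key(record), [])
        if len(names) == 1:  # a unique match links the record to one person
            matches.append((names[0], record["diagnosis"]))
    return matches


for name, diagnosis in reidentify(anonymised_health, public_profiles):
    print(f"{name} -> {diagnosis}")
```

In this toy data, two of the three "anonymous" records link to a single named person, which is why removing names alone is generally not considered sufficient anonymisation.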

According to the ethics guidelines set by the Association of Internet Researchers (AoIR), there must be defined processes for reflecting on ethical considerations. Researchers need to conduct an ethical analysis, the first step of which is to state the project's objective, use, and expected value. The risks emanating from the project should then be weighed against the findings of that initial step. AI research utilises large datasets, and in such scenarios, getting informed consent from all data subjects is not feasible. Even if it were attempted, the consent could not be considered informed, as understanding the complex AI models would be beyond most subjects. Digital data collection methods like automated scraping and APIs have endangered privacy and autonomy. Researchers must therefore review the use of the data and the risks associated with processing it at every step, and should take active measures to minimise any potential harm to the subjects. Names may not be collected, but other personally identifiable information (PII) such as location, IP address, or email address might be. If so, it should be stored in a secure environment and deleted from data sets as soon as possible. When using ML models for internet research, researchers must address dataset accountability, the impact of bias in model training, and the effect of normalisation during data cleaning.
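One way to act on the PII advice above is to minimise each record before it is stored. The sketch below is an illustration under stated assumptions, not a method prescribed by the AoIR guidelines: the field names (`email`, `ip`, `lat`, `lon`, `text`), the key, and the helper functions are hypothetical. It replaces the email address with a keyed hash, truncates the IP address to a network prefix, and coarsens the location.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical secret; in practice it would be kept outside the data set.
SECRET_KEY = b"replace-with-a-key-kept-outside-the-dataset"


def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), shortened for readability."""
    digest = hmac.new(key, value.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def truncate_ip(ip: str) -> str:
    """Keep only the network part of an address (/24 for IPv4, /48 for IPv6)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def minimise(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed or coarsened."""
    return {
        "user_ref": pseudonymise(record["email"]),  # raw email is not kept
        "ip_prefix": truncate_ip(record["ip"]),
        "lat": round(record["lat"], 1),  # one decimal place: roughly city-level
        "lon": round(record["lon"], 1),
        "text": record["text"],
    }


raw = {
    "email": "Jane@Example.com",
    "ip": "203.0.113.77",
    "lat": 51.5074,
    "lon": -0.1278,
    "text": "hello",
}
print(minimise(raw))
```

Keyed hashing is pseudonymisation rather than anonymisation: anyone holding the key can recompute the mapping, and the remaining fields can still act as quasi-identifiers, so the secure storage and early deletion recommended above still apply.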

Ethical challenges will continue to multiply as technology progresses and will outpace the development of frameworks and guidelines to safeguard those at risk. Privacy is recognised as a human right under the United Nations' Universal Declaration of Human Rights. Its fundamental significance is summed up by this Gary Kovacs quote: "Privacy is not an option, and it shouldn't be the price we accept for just getting on the internet."
