Annotation Services

Quality control in annotation tools

The quality of data we feed into a machine learning (ML) algorithm is crucial for successful and accurate business predictions. The source of data and its labelling determine its quality in an ML pipeline.

For example, a dataset for an ML model in healthcare designed to detect tumours must have sufficient images to learn from. If only 20% of the images show malignant tumours and the rest show non-malignant tumours, the model will see too few malignant examples during training and will struggle to detect them reliably once live.

This article will explain the measures annotators must take to ensure quality control in annotation.

Types of data quality problems in annotation

Annotation teams must factor in inherent biases and labelling errors while creating the dataset. This is why labelled text, images, or video must pass through multiple labellers/annotators. Two common types of data quality problems are –

Data drift

Over time, a slow change in annotation labels and data features can cause data drift. It may result in higher error rates for rule-based systems and ML models. To avoid data drift, periodic annotation review is necessary, which can be a slow process.
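A periodic review can be partly automated by comparing the label mix of a recent annotation batch against an older reference batch. The sketch below (hypothetical function names; the 0.1 threshold is an assumption, not a standard) uses total variation distance between the two label distributions to flag batches for review.

```python
from collections import Counter

def label_distribution(labels):
    """Return each label's relative frequency in a batch."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def drift_score(reference_labels, recent_labels):
    """Total variation distance between two label distributions
    (0 = identical mix, 1 = completely disjoint)."""
    ref = label_distribution(reference_labels)
    rec = label_distribution(recent_labels)
    all_labels = set(ref) | set(rec)
    return 0.5 * sum(abs(ref.get(l, 0) - rec.get(l, 0)) for l in all_labels)

# Flag batches whose label mix has shifted beyond a review threshold.
reference = ["benign"] * 80 + ["malignant"] * 20
recent = ["benign"] * 60 + ["malignant"] * 40
needs_review = drift_score(reference, recent) > 0.1
```

Running such a check on every new batch makes the slow manual review targeted: only batches whose score crosses the threshold need a human look.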

Anomalies

While data drift is a gradual change in data, anomalies are sudden deviations caused by external events. For example, the Covid-19 pandemic introduced anomalies into otherwise stable healthcare industry datasets.

It is important to have procedures to detect anomalies and switch from automated to manual annotation if necessary. Compared to data drift, anomalies are easier to detect.
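Because anomalies are sudden, a simple statistical trigger is often enough to detect them. The sketch below (an illustrative z-score check, not a prescribed method) flags a day whose label volume deviates sharply from recent history, which could prompt a switch to manual annotation.

```python
import statistics

def is_anomaly(history, latest, z_threshold=3.0):
    """Flag `latest` if it deviates from the historical mean by more
    than z_threshold standard deviations."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > z_threshold

# Daily volume of a given label in a healthcare dataset (hypothetical numbers).
daily_counts = [101, 98, 104, 97, 100, 102, 99]
normal_day = is_anomaly(daily_counts, 103)   # within normal variation
spike_day = is_anomaly(daily_counts, 400)    # sudden spike: review manually
```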

Quality control in data annotation

To ensure that the dataset they feed into the ML model is of high quality, consistency, and integrity, annotators can consider adopting these techniques –

Ensure good quality communication

The team that annotates data is not the same as the one that creates algorithms. It is common to have distributed teams. The data scientists could have different education and experience compared to annotators.
Having standard protocols for communication between data scientists and annotators is important. You also need a robust feedback structure that can help improve the accuracy of data annotation.

Set the gold standard

A gold standard is a set of perfectly annotated data that acts as a template for what the annotation should look like. This dataset is a reference point for all annotators and reviewers, irrespective of their skill level and experience.
The gold standard dataset could be an initial tutorial or distributed across annotation stages. It sets a benchmark for annotators’ performance even if the labelling instructions change.
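Benchmarking against the gold standard can be as simple as scoring each annotator's labels against the reference labels. A minimal sketch, assuming small in-memory label lists and a hypothetical 90% onboarding threshold:

```python
def gold_standard_accuracy(annotator_labels, gold_labels):
    """Fraction of items where the annotator matches the gold-standard label."""
    assert len(annotator_labels) == len(gold_labels)
    matches = sum(a == g for a, g in zip(annotator_labels, gold_labels))
    return matches / len(gold_labels)

gold = ["cat", "dog", "dog", "cat", "bird"]
candidate = ["cat", "dog", "cat", "cat", "bird"]
score = gold_standard_accuracy(candidate, gold)  # 4 of 5 correct
qualified = score >= 0.9  # hypothetical pass mark for this annotator
```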

Have a consensus algorithm

A consensus algorithm is an agreement on a single data point among multiple team members. You can either automate this process or assign multiple reviewers per data point. The latter is common for open-source data.

This method relies on the principle that collective decision-making is more accurate than an individual's judgment.
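A common automated form of consensus is a majority vote per data point, with ties escalated for manual review. The sketch below is one such scheme (the function name and the escalate-on-tie behaviour are illustrative assumptions):

```python
from collections import Counter

def consensus_label(labels, min_agreement=0.5):
    """Return the majority label if agreement exceeds min_agreement,
    else None (escalate the data point to a senior reviewer)."""
    top_label, top_count = Counter(labels).most_common(1)[0]
    if top_count / len(labels) > min_agreement:
        return top_label
    return None

agreed = consensus_label(["tumour", "tumour", "no tumour"])  # majority wins
tied = consensus_label(["tumour", "no tumour"])              # needs review
```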

Use scientific tests to determine annotator accuracy

Scientific methods take a statistical approach to assessing data accuracy. These tests use proven formulas to measure how consistently different annotators perform. Common tests include Cronbach's Alpha, Fleiss' Kappa, Pairwise F1, and Krippendorff's Alpha.

Each of these tests measures the labelling consistency of individual annotators, giving a holistic view of the quality, consistency, and reliability of the data.
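As a concrete taste of this family of measures, here is Cohen's kappa, the two-annotator special case related to the Fleiss' Kappa test named above. It corrects raw agreement for the agreement expected by chance (1 = perfect, 0 = chance level); the example labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Probability the two annotators agree purely by chance.
    expected = sum(freq_a[l] * freq_b.get(l, 0) for l in freq_a) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)

labels_a = ["pos", "pos", "neg", "neg", "pos"]
labels_b = ["pos", "neg", "neg", "neg", "pos"]
kappa = cohens_kappa(labels_a, labels_b)
```

Values well below 1 signal that the annotators' raw agreement is not much better than chance, which is a cue to revisit the labelling instructions.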

Perform random sampling

This method randomly selects labelled data from a larger pool and checks it for errors. Annotators often compare the sample against the gold standard and the consensus result. Random samples are a good indicator of the areas most prone to labelling errors.
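The sampling audit can be sketched as below, assuming a small in-memory list of (item, label) pairs and a gold-label lookup for the audited items (hypothetical names throughout):

```python
import random

def audit_sample(labelled_data, gold_lookup, sample_size, seed=0):
    """Randomly audit labelled items against known gold labels;
    return the observed error rate in the sample."""
    rng = random.Random(seed)
    sample = rng.sample(labelled_data, sample_size)
    errors = sum(1 for item_id, label in sample if gold_lookup[item_id] != label)
    return errors / sample_size

labelled = [("img1", "cat"), ("img2", "dog"), ("img3", "dog"), ("img4", "cat")]
gold = {"img1": "cat", "img2": "dog", "img3": "cat", "img4": "cat"}
error_rate = audit_sample(labelled, gold, sample_size=4)  # img3 is mislabelled
```

A rising error rate in the sampled audits points to the parts of the pipeline where labelling errors cluster.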

Categorise the annotators based on skill

Data always goes through multiple annotators of different skill levels. The team of annotators is divided into sub-teams with different weightage as per the expertise of its members. Passing a dataset through different levels of annotators is beneficial when you need a specific skill, or there is a high variance in data labelling.

While generic annotators will have a lower weightage, those with specific expertise will have a higher weightage and may finally review the dataset.
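Skill-based weightage can be expressed as a weighted vote, where an expert's label counts for more than a generic annotator's. A minimal sketch; the specific weights (1.0 for generic annotators, 3.0 for domain experts) are illustrative assumptions:

```python
from collections import defaultdict

def weighted_consensus(votes):
    """votes: list of (label, weight) pairs; the label with the
    highest total weight wins."""
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    return max(totals, key=totals.get)

# Two generic annotators disagree with one domain expert;
# the expert's higher weightage prevails.
votes = [("benign", 1.0), ("benign", 1.0), ("malignant", 3.0)]
final_label = weighted_consensus(votes)
```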


How can Infosys BPM help?

Infosys BPM helps businesses by delivering high-quality datasets at scale through a rigorous, iterative labelling process.
With industry-specific expertise, the annotators ensure your ML models deliver quality output. Some of the areas of expertise are –

Read more about the data annotation outsourcing services at Infosys BPM.