Annotation Services

Text Annotation - A Comprehensive Overview

AI models learn from the vast amounts of data that machine learning systems ingest. But how do we make machines understand raw data, including images, videos, and text?

While humans interpret a phrase through its cultural and idiomatic context, a machine may not. For example, in the phrase ‘You cracked the puzzle’, a human knows that ‘cracked’ means solved, but a machine may read it as ‘broke’. This ambiguity is what makes text annotation challenging.

An accurate AI model depends on well-annotated text for machine learning. With the adoption of AI models, the demand for annotation has increased, and the data annotation market is projected to grow at a CAGR of 12.1% between 2023 and 2030.

Let’s explore the types of text annotation services and use cases.


Types of text annotation services

Without human intervention, AI models lack the depth and native fluency with which humans use a language. Human annotators build high-quality training data for the AI model using these text annotation methods –

Entity annotation

This process assigns predefined labels to the entities within a text based on their semantic meaning. Machine learning models use this annotated text to learn the underlying meaning of the text. This method locates, extracts, and tags entities within the text in one of these ways –

  1. Named entity recognition (NER) – This method labels key information within a text, such as people, locations, characters, dates, and objects, using distinct categories. For example, 2024 and 2030 are labelled as ‘Dates’, text annotation as a ‘Service’, and Infosys BPM as an ‘Organisation’.

  2. Coreference resolution (relationship annotation) – This method links two or more different expressions that refer to the same entity. For example, in the sentences below, ‘Infosys BPM’ and ‘The organisation’ refer to the same entity.

    “Infosys BPM builds high-quality training data for AI models. The organisation leverages humanware and automation at scale for training and evaluation.”

  3. Part-of-speech tagging – This method of text annotation parses sentences and identifies the grammatical role of each word, such as noun, adjective, subject, object, verb, pronoun, or preposition. For example, in the sentences below, the word ‘train’ is a noun in the first sentence and a verb in the second.

    “He took several train journeys through India over the years. The conversations with locals helped him train himself in the local language.”

  4. Key phrase tagging – This method locates and labels keywords and key phrases in the text. It is useful when a model ingests a large volume of text and needs to identify the main topics without parsing the whole text.
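
To make the methods above concrete, here is a minimal sketch of how such annotations could be stored as labelled character spans. It uses only the Python standard library; the `Span` class, the `find_span` helper, and the label names are assumptions made for illustration and do not come from any particular annotation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labelled stretch of text (hypothetical record format)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def find_span(text: str, phrase: str, label: str, begin: int = 0) -> Span:
    """Wrap the first occurrence of phrase at or after begin in a Span."""
    start = text.index(phrase, begin)  # raises ValueError if phrase is absent
    return Span(start, start + len(phrase), label)


text = (
    "Infosys BPM builds high-quality training data for AI models. "
    "The organisation leverages humanware and automation at scale "
    "for training and evaluation."
)

# Named entity recognition: one labelled span per entity mention.
entities = [find_span(text, "Infosys BPM", "Organisation")]

# Coreference resolution: a cluster groups mentions of the same entity.
coref_cluster = [
    find_span(text, "Infosys BPM", "mention"),
    find_span(text, "The organisation", "mention"),
]

# Part-of-speech tagging: the same word can carry different tags.
pos_text = (
    "He took several train journeys through India over the years. "
    "The conversations with locals helped him train himself in the "
    "local language."
)
first = find_span(pos_text, "train", "NOUN")
second = find_span(pos_text, "train", "VERB", begin=first.end)
pos_annotations = [first, second]

for span in coref_cluster:
    print(span.label, repr(text[span.start:span.end]))
```

Storing offsets rather than copies of the words keeps every annotation tied to an exact position, which matters when the same word, such as ‘train’, appears more than once with different labels.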

Entity linking

Entity linking, also called named entity linking (NEL), maps words in a text to entities in a knowledge base, typically built from open-domain sources such as Wikipedia. While entity annotation locates and labels entities within a text, entity linking connects these labelled entities to the corresponding entries in that larger body of knowledge.

For example, in the sentence below, entity linking helps the AI model understand that ‘Paris’ refers to the city and not the celebrity ‘Paris Hilton’.

“Last week, we went for a holiday to Paris, the capital of France.”
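
The ‘Paris’ example can be sketched in a few lines of Python. This is a toy illustration only: the two-entry `KNOWLEDGE_BASE`, its keyword sets, and the `link_entity` function are hypothetical, and keyword overlap stands in for the far richer disambiguation that real entity linking systems use.

```python
from typing import Optional

# Hypothetical mini knowledge base: entry name -> descriptive keywords.
KNOWLEDGE_BASE = {
    "Paris (city)": {"capital", "france", "city", "holiday", "seine"},
    "Paris Hilton (person)": {"celebrity", "hotel", "heiress", "television"},
}


def link_entity(mention: str, sentence: str) -> Optional[str]:
    """Return the entry whose keywords overlap most with the sentence."""
    # Lower-case the words and strip surrounding punctuation.
    words = {w.strip(".,!?\"'").lower() for w in sentence.split()}
    # Candidates are entries whose name starts with the mention.
    candidates = [
        name for name in KNOWLEDGE_BASE
        if name.lower().startswith(mention.lower())
    ]
    if not candidates:
        return None  # the mention is not in the knowledge base
    return max(candidates, key=lambda name: len(KNOWLEDGE_BASE[name] & words))


sentence = "Last week, we went for a holiday to Paris, the capital of France."
print(link_entity("Paris", sentence))
```

Here the words ‘holiday’, ‘capital’, and ‘France’ match the city entry and nothing matches the celebrity entry, so the mention is linked to the city.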


Text classification

Text classification assigns a single label to a sentence or a larger chunk of text. Specialised forms of text classification include document classification, sentiment annotation, product categorisation, email classification, and toxicity classification. For example, the text chunk below can be labelled ‘news’.

“According to Ted Decker, CEO of Home Depot, the quarter showed subdued results. Compared to the time during the pandemic, customers wanted fewer do-it-yourself projects. The spring season started late, and there was softness in certain larger discretionary projects.”
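
A classification annotation is simply a piece of text paired with one label from an agreed set. The sketch below shows one possible record format, serialised as a line of JSON; the label set, the `make_record` helper, and the choice of JSON are assumptions for illustration, not a prescribed format.

```python
import json

# Hypothetical label set agreed on before annotation starts.
ALLOWED_LABELS = {"news", "review", "email", "other"}


def make_record(text: str, label: str) -> dict:
    """Build a single-label record, rejecting labels outside the set."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return {"text": text, "label": label}


record = make_record(
    "According to Ted Decker, CEO of Home Depot, "
    "the quarter showed subdued results.",
    "news",
)

# One JSON object per line is a common way to store training examples.
print(json.dumps(record))
```

Validating the label at creation time catches typos such as ‘nwes’ before they reach the training data.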


Sentiment annotation

This method determines and labels the emotion or opinion behind a body of text. This can be difficult, especially when the writer uses sarcasm or rhetorical devices. In such cases, the annotator picks the label that best represents the emotion, and models trained on this labelled data learn to classify similar text as positive, negative, or neutral. For example, the sentences below express a mix of joy, nostalgia, and sadness.

“Helping the old lady with groceries filled me with joy. She reminded me of my mother, whom I miss a lot.”
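
One way to record such a mixed example is to store the fine-grained emotions the annotator chose and derive a coarse polarity from them. The sketch below assumes a hypothetical emotion-to-polarity mapping and an extra ‘mixed’ outcome for texts whose emotions disagree; real annotation guidelines define their own labels and rules.

```python
from typing import List

# Hypothetical mapping from fine-grained emotions to coarse polarity.
POLARITY = {
    "joy": "positive",
    "nostalgia": "neutral",
    "sadness": "negative",
}


def overall_polarity(emotions: List[str]) -> str:
    """Return the shared polarity, or 'mixed' when the emotions disagree."""
    if not emotions:
        raise ValueError("at least one emotion label is required")
    polarities = {POLARITY[emotion] for emotion in emotions}
    return polarities.pop() if len(polarities) == 1 else "mixed"


annotation = {
    "text": (
        "Helping the old lady with groceries filled me with joy. "
        "She reminded me of my mother, whom I miss a lot."
    ),
    "emotions": ["joy", "nostalgia", "sadness"],  # chosen by the annotator
}
annotation["polarity"] = overall_polarity(annotation["emotions"])
print(annotation["polarity"])
```

Keeping the original emotion labels alongside the derived polarity means the coarse label can be recomputed later if the mapping changes.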


Text annotation use cases

While text annotation services impact almost all industries, here are some of the prominent use cases –


Healthcare

Text annotation services help extract data from clinical trial records, classify medical documents, detect medical conditions, analyse patient records, and process claims.


Insurance

Insurance companies leverage text annotation services to extract contextual data from contracts, evaluate risk, recognise the parties involved, and identify dubious or fraudulent claims.


Banking

Text annotation services help identify fraud and money laundering, extract and manage contract data, and determine loan rates and credit scores.


Logistics

Text annotation services in logistics can help extract names, amounts, order numbers, and more from invoices.


How can Infosys BPM help in text annotation services?

Infosys BPM uses a human-in-the-loop model for text annotation services. The agile model combines human intelligence and automation to produce high-quality training datasets at scale, refining and improving AI models while saving time and resources.
Learn more about text annotation services at Infosys BPM.