NLP labelling: The types of data in NLP

Why must humans learn machine language to communicate with machines? Why can’t machines learn our languages? After all, we created machines! This rather simple (some may say simplistic) question can unearth a whole host of responses. Computer scientists have been pondering it since the time of Alan Turing, whose famous Turing Test remains a reference point for judging the language capabilities of AI systems.

Linguist Noam Chomsky’s work in the 1950s was invaluable in the development of many early NLP algorithms and models, and his work continues to shape the field to this day. In essence, NLP (Natural Language Processing) is a part of Artificial Intelligence (AI) and computational linguistics that enables computers to understand, interpret, and generate human language. That is, it helps machines get closer to understanding human languages.

Data annotation is a critical part of NLP: we label or tag text data with machine-readable information that tells NLP algorithms how to process the tagged data. Clear and concise annotations give machine models better-quality training inputs, and they also help us scale NLP operations.

For example, let’s take a data set that we are using to train a model on positive and negative sentiment. Consider:

Original text: "I absolutely loved the movie! The acting was fantastic and the story was engaging."

Annotation: Positive sentiment

Original text: "The movie was terrible. The acting was wooden and the story was predictable."

Annotation: Negative sentiment

The annotation helps the machine understand the sentiment being expressed in the text. Such input annotations can help us train a model to recognise language patterns that express positive or negative sentiment.
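To make the idea concrete, here is a minimal sketch of how the two labelled reviews above could be stored and used. It is a toy word-matching approach rather than a real sentiment model, and the label strings and the function names (`tokenize`, `train`, `predict`) are assumptions made for illustration.

```python
PUNCTUATION = ".,!?\"'"

# The two annotated reviews from the text; the label strings are an assumed convention.
labelled_reviews = [
    ("I absolutely loved the movie! The acting was fantastic and the story was engaging.", "positive"),
    ("The movie was terrible. The acting was wooden and the story was predictable.", "negative"),
]

def tokenize(text):
    """Lowercase the text and strip punctuation from each word."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def train(examples):
    """For each label, keep the words that appear under that label only."""
    vocab = {}
    for text, label in examples:
        vocab.setdefault(label, set()).update(tokenize(text))
    distinctive = {}
    for label, words in vocab.items():
        other_words = set()
        for other_label, other in vocab.items():
            if other_label != label:
                other_words |= other
        distinctive[label] = words - other_words
    return distinctive

def predict(model, text):
    """Pick the label whose distinctive words match the text most often.
    Ties (including no matches at all) fall back to the first label seen."""
    tokens = tokenize(text)
    scores = {label: sum(t in words for t in tokens) for label, words in model.items()}
    return max(scores, key=scores.get)

model = train(labelled_reviews)
print(predict(model, "The acting was fantastic"))
```

With only two training examples the sketch can do little more than recall words it has already seen, which is exactly why large, consistently annotated data sets matter.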

As one might expect, there are many types of annotations employed by NLP scientists and software engineers. Here are some popular ones:

Part of speech (POS) tagging:

Consider the sentence “Sarla sat on the chair”.

We would annotate it as “Sarla/NNP sat/VBD on/IN the/DT chair/NN”

Here NNP marks a proper noun, VBD a verb in the past tense, IN a preposition, DT a determiner and NN a common noun.
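As a small illustration, the word/TAG notation can be split into (word, tag) pairs that a program can work with. This is only a sketch: the function name `parse_tagged` is hypothetical, and the tag names follow the Penn Treebank convention, in which NNP marks a proper noun.

```python
def parse_tagged(sentence):
    """Split a 'word/TAG word/TAG ...' string into (word, tag) pairs."""
    pairs = []
    for item in sentence.split():
        # rpartition splits on the last slash, so words containing '/' survive
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs

tagged = parse_tagged("Sarla/NNP sat/VBD on/IN the/DT chair/NN")

# Collect every token annotated as a noun (common or proper)
nouns = [word for word, tag in tagged if tag.startswith("NN")]
print(tagged)
print(nouns)
```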

Named entity recognition (NER):

For example, in the sentence "I ordered a dozen roses from Ferns N Petals", the phrase "Ferns N Petals" could be tagged as an ORGANIZATION.
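One common way to store such a tag is as a character span plus a label. The sketch below assumes that representation; the function name `annotate_entity` and the dictionary keys are hypothetical, not taken from any particular annotation tool.

```python
def annotate_entity(text, mention, label):
    """Return a span annotation for the first occurrence of `mention` in `text`."""
    start = text.find(mention)
    if start == -1:
        raise ValueError(f"{mention!r} does not occur in the text")
    return {"start": start, "end": start + len(mention), "label": label}

text = "I ordered a dozen roses from Ferns N Petals."
entity = annotate_entity(text, "Ferns N Petals", "ORGANIZATION")

# The offsets point back into the original text, so the text itself stays unchanged
print(entity["label"], "->", text[entity["start"]:entity["end"]])
```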

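Sentiment analysis:

Each text is labelled with the sentiment it expresses, as in the positive and negative movie reviews shown earlier.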

Text classification:

Each text is assigned one or more labels from a predefined set of categories, such as its topic.

Event extraction:

Consider the sentence: "Apple announced that it has acquired a startup."

To extract the event described in this sentence, we would identify the relevant entities and attributes and link them to the event. The resulting event might look something like this:

Event: Acquisition
Trigger: acquired
Agent: Apple
Object: a startup

In this example, the event is an acquisition, which is triggered by the verb "acquired" (the verb "announced" only reports it). The agent performing the event is "Apple", and the object being acquired is "a startup".
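An event annotation like this maps naturally onto a small record type. The following is a sketch under assumptions: the class name `EventAnnotation`, its field names and the helper `spans_in_text` are hypothetical, and the trigger is taken to be "acquired", the word that expresses the acquisition itself.

```python
from dataclasses import dataclass, asdict

@dataclass
class EventAnnotation:
    event_type: str  # category of the event, e.g. "Acquisition"
    trigger: str     # word in the sentence that signals the event
    agent: str       # entity performing the event
    obj: str         # entity the event is performed on

def spans_in_text(event, sentence):
    """Sanity check: the trigger and both arguments must occur in the sentence."""
    return all(span in sentence for span in (event.trigger, event.agent, event.obj))

sentence = "Apple announced that it has acquired a startup."
event = EventAnnotation("Acquisition", "acquired", "Apple", "a startup")

print(asdict(event))
print(spans_in_text(event, sentence))
```

A check like `spans_in_text` is a cheap way to catch annotations that drift away from the text they describe.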

Coreference resolution:

Consider the following sentences:

A. Mary went to the store to buy some apples. She paid for them with cash.

B. John saw Mary at the store. He said hello to her.

In these sentences, the name ‘Mary’ and the pronouns ‘She’ and ‘her’ are different mentions of the same entity, while ‘He’ refers to John. Coreference resolution identifies which mentions refer to the same entity. By resolving the pronoun ‘She’ in sentence A to Mary, and the pronouns ‘He’ and ‘her’ in sentence B to John and Mary, the result might look like this:

A. Mary went to the store to buy some apples. Mary paid for them with cash.

B. John saw Mary at the store. John said hello to Mary.
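Coreference annotations are often stored as clusters of mentions that refer to the same entity. The sketch below assumes a simple token-index representation for sentence B, with the first mention in each cluster treated as the name to substitute; the function name `resolve` and the hand-written clusters are illustrative only.

```python
def resolve(tokens, clusters):
    """Replace every later mention in a cluster with the cluster's first mention."""
    resolved = list(tokens)
    for cluster in clusters:
        head = tokens[cluster[0]]
        for index in cluster[1:]:
            resolved[index] = head
    return resolved

# Sentence B, pre-tokenized so that punctuation is its own token
tokens = "John saw Mary at the store . He said hello to her .".split()

# Hand-annotated clusters of token indices: [John, He] and [Mary, her]
clusters = [[0, 7], [2, 11]]

print(" ".join(resolve(tokens, clusters)))
```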
