Getting Started With NLP: TextBlob

5 min read · Apr 7, 2022

Introduction

Natural Language Processing (NLP) is attracting growing attention because of the increasing number of applications such as chatbots and machine translation. In some ways, the entire revolution of intelligent machines is based on the ability to understand and interact with humans.

I have been exploring NLP for some time now. My journey started with the NLTK library in Python, which was the recommended library to get started with at the time. NLTK is an excellent library for education and research, but it becomes heavy and tedious even for simple tasks.

Later, I was introduced to TextBlob, which is built on the shoulders of NLTK and Pattern. Its big advantage is that it is easy to learn while still offering features such as sentiment analysis, POS tagging, and noun phrase extraction. It has now become my go-to library for performing NLP tasks.

If it is your first step in NLP, TextBlob is the perfect library for you to get hands-on with. The best way to go through this article is to follow along with the code and perform the tasks yourself. So let’s get started!

TextBlob is a Python library that offers a simple API for performing basic NLP tasks.

A good thing about TextBlob objects is that they behave much like Python strings, so you can transform and play with them the same way. For example, a TextBlob supports slicing, upper(), find(), and concatenation with +, just as a str does. Don’t worry about the syntax for now; the point is only to give you an intuition for how closely related TextBlob is to Python strings.

Setup:

Installing TextBlob is simple: open the Anaconda Prompt (or a terminal if you are on macOS or Ubuntu) and enter the following command:

pip install -U textblob

This will install TextBlob. For the uninitiated: practical work in Natural Language Processing typically uses large bodies of linguistic data, or corpora. To download the necessary corpora, run the following command:

python -m textblob.download_corpora
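Before moving on, it can help to confirm that the package is visible to the interpreter you are actually running. The snippet below is a minimal sketch that uses only the standard library; is_installed is a hypothetical helper name introduced here for illustration, not part of TextBlob.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the current interpreter."""
    # find_spec returns None when a top-level package cannot be located.
    return importlib.util.find_spec(package_name) is not None


# Hypothetical check; the hint simply repeats the install command from above.
if is_installed("textblob"):
    print("textblob is available")
else:
    print("textblob not found; try: pip install -U textblob")
```

If the second message is printed even after installing, the install probably went to a different Python environment than the one running the script.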

1. Noun Phrase Extraction

Instead of extracting individual words, we can extract the noun phrases directly from a textblob. Noun phrase extraction is particularly useful when you want to analyze the “who” in a sentence. Let’s see an example below.

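from textblob import TextBlob

blob = TextBlob("Analytics Vidhya is a great platform to learn data science.")
# noun_phrases yields each extracted noun phrase as a lowercase string
for phrase in blob.noun_phrases:
    print(phrase)
>> analytics vidhya
great platform
data science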

As we can see, the results are not perfectly accurate, which is to be expected from an automated extractor.

2. Part-of-speech Tagging

Part-of-speech tagging, or grammatical tagging, marks each word in a text according to its definition and context. In simple words, it tells whether a word is a noun, an adjective, a verb, etc. It can be seen as a more complete version of noun phrase extraction, where we want to find all the parts of speech in a sentence.

Let’s check the tags of our textblob.

# blob.tags yields (word, tag) pairs
for word, tag in blob.tags:
    print(word, tag)
>> Analytics NNS
Vidhya NNP
is VBZ
a DT
great JJ
platform NN
to TO
learn VB
data NNS
science NN

Here, NN represents a noun, DT represents a determiner, and so on. You can check the full list of Penn Treebank tags to know more.
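To make the tags in the output above easier to read, here is a small sketch that maps them to plain-English descriptions. It assumes the tags follow the Penn Treebank convention and covers only the handful of tags that appear in this example; TAG_DESCRIPTIONS and describe are hypothetical names, and the (word, tag) pairs are copied from the output above rather than computed.

```python
# A hand-picked subset of Penn Treebank tags, not the full tag set.
TAG_DESCRIPTIONS = {
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "VB": "verb, base form",
    "VBZ": "verb, 3rd person singular present",
    "DT": "determiner",
    "JJ": "adjective",
    "TO": "to",
}


def describe(tag):
    """Look up a tag; tags outside the subset are reported as unknown."""
    return TAG_DESCRIPTIONS.get(tag, "unknown tag")


# (word, tag) pairs copied from the tagging output shown above.
tagged = [
    ("Analytics", "NNS"), ("Vidhya", "NNP"), ("is", "VBZ"), ("a", "DT"),
    ("great", "JJ"), ("platform", "NN"), ("to", "TO"), ("learn", "VB"),
    ("data", "NNS"), ("science", "NN"),
]

for word, tag in tagged:
    print(f"{word:<10} {tag:<4} {describe(tag)}")
```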

3. Words Inflection and Lemmatization

Inflection is a process of word formation in which characters are added to the base form of a word to express grammatical meaning. Word inflection in TextBlob is very simple: the words tokenized from a textblob can easily be changed into their singular or plural forms.

# A two-sentence blob, kept under its own name so that blob still refers to the single sentence above
blob_two = TextBlob("Analytics Vidhya is a great platform to learn data science. \n It helps community through blogs, hackathons, discussions,etc.")
print(blob_two.sentences[1].words[1])
print(blob_two.sentences[1].words[1].singularize())
>> helps
help

TextBlob also offers a built-in class called Word. We just need to create a Word object and then call a method on it, as shown below.

from textblob import Word
w = Word('Platform')
w.pluralize()
>>'Platforms'

We can also use the tags to inflect only words of a particular type, as shown below.

## using tags
for word, pos in blob.tags:
    if pos == 'NN':
        print(word.pluralize())
>> platforms
sciences
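To make the idea of inflection concrete, here is a deliberately naive sketch of rule-based pluralization in plain Python. It is not how TextBlob implements pluralize(); naive_pluralize is a hypothetical helper that only shows what "adding characters to the base form" can look like, and it handles irregular nouns wrongly.

```python
VOWELS = "aeiou"


def naive_pluralize(word):
    """Very rough English pluralization based on three suffix rules."""
    lower = word.lower()
    # Words ending in a sibilant sound take "es" (box -> boxes).
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    # Consonant + "y" becomes "ies" (community -> communities).
    if lower.endswith("y") and len(lower) > 1 and lower[-2] not in VOWELS:
        return word[:-1] + "ies"
    # Everything else simply takes "s" (platform -> platforms).
    return word + "s"


for noun in ["platform", "science", "community", "box"]:
    print(noun, "->", naive_pluralize(noun))
```

A real inflector needs many more rules and an exception list, which is exactly the work a library takes off your hands.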

Words can be lemmatized using the lemmatize function.

## lemmatization
w = Word('running')
w.lemmatize("v") ## v here represents verb
>> 'run'

4. N-grams

A sequence of multiple words taken together is called an N-gram. N-grams (N > 1) are generally more informative than single words and can be used as features for language modelling. N-grams can be easily accessed in TextBlob using the ngrams function, which returns a list of WordList objects, each holding n successive words.

for ngram in blob.ngrams(2):
    print(ngram)
>> ['Analytics', 'Vidhya']
['Vidhya', 'is']
['is', 'a']
['a', 'great']
['great', 'platform']
['platform', 'to']
['to', 'learn']
['learn', 'data']
['data', 'science']
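For intuition about what is happening under the hood, the same sliding-window idea can be sketched in plain Python. This is an illustration only, not TextBlob's implementation: the ngrams function below is a hypothetical stand-in, and it uses naive whitespace splitting instead of TextBlob's tokenizer.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a list of lists."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n across the word list.
    return [words[i:i + n] for i in range(len(words) - n + 1)]


sentence = "Analytics Vidhya is a great platform to learn data science"
words = sentence.split()  # naive whitespace tokenization

bigrams = ngrams(words, 2)
for gram in bigrams:
    print(gram)
```

A sentence with k words yields k - n + 1 n-grams, so the ten words here give nine bigrams, and a text shorter than n gives an empty list.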

5. Sentiment Analysis

Sentiment analysis is basically the process of determining the attitude or the emotion of the writer, i.e., whether it is positive or negative or neutral.

The sentiment property of a textblob returns a named tuple with two fields: polarity and subjectivity.

Polarity is a float in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement. Subjective sentences generally refer to personal opinion, emotion, or judgment, whereas objective sentences refer to factual information. Subjectivity is also a float, in the range [0, 1], where 0 is very objective and 1 is very subjective.

Let’s check the sentiment of our blob.

print(blob)
print(blob.sentiment)
>> Analytics Vidhya is a great platform to learn data science.
Sentiment(polarity=0.8, subjectivity=0.75)

We can see that the polarity is 0.8, which means that the statement is positive, and the subjectivity of 0.75 indicates that it is mostly opinion rather than factual information.
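If you want labels rather than raw numbers, the two scores can be bucketed with a few comparisons. The sketch below is one possible approach under stated assumptions: describe_sentiment is a hypothetical helper, the neutral_band of 0.1 and the 0.5 subjectivity cut-off are arbitrary choices rather than anything TextBlob defines, and the Sentiment named tuple is a stand-in with the same two field names as the result shown above.

```python
from collections import namedtuple

# Stand-in with the same two field names as the result shown above.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])


def describe_sentiment(sentiment, neutral_band=0.1):
    """Turn the two scores into rough labels; both thresholds are arbitrary."""
    if sentiment.polarity > neutral_band:
        tone = "positive"
    elif sentiment.polarity < -neutral_band:
        tone = "negative"
    else:
        tone = "neutral"
    kind = "mostly opinion" if sentiment.subjectivity >= 0.5 else "mostly factual"
    return tone, kind


# Scores copied from the example output above.
print(describe_sentiment(Sentiment(polarity=0.8, subjectivity=0.75)))
```

The right thresholds depend on your data, so treat these values as a starting point to tune.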

Pros:

  1. It is built on the shoulders of NLTK and Pattern, which makes it simple for beginners by providing an intuitive interface to NLTK.
  2. It provides language translation and detection powered by Google Translate (not provided by spaCy), although these methods were deprecated in later TextBlob releases.

Cons:

  1. It is a little slower than spaCy but faster than NLTK (speed: spaCy > TextBlob > NLTK).
  2. It does not provide features like dependency parsing or word vectors, which are provided by spaCy.

I hope that you had a fun time learning about this library. TextBlob provides a very easy interface for beginners to learn basic NLP tasks.

I would recommend that every beginner start with this library and then, for more advanced work, learn spaCy as well. We will still be using TextBlob for initial prototyping in almost every NLP project.

You can find the full code for this article in my GitHub repository.

Also, did you find this article helpful? Please share your opinions and thoughts in the comments section below.


Written by _Khussshal_

React and MERN stack Dev, also trying my luck as React Native Android Developer
