Can Computers Understand Human Languages?
When a human baby starts learning to respond to adults, it begins with a few sounds and then progresses to using words, understanding their meaning, building a vocabulary, resolving ambiguities, reading sentiment and emotions, using grammar to form structured sentences, and finally conversing based on subjective factors and the other people involved. A computer, however, cannot do all of this, yet.
But we can make the computer, a machine that runs on code, process human languages, understand them to some degree, and emulate some human traits. We can do this by programming the machine to analyse patterns in old conversations and literary works, and to mathematically process any new human interaction based on what it has learned.
Computers and languages
Let’s take a computer chat bot, for example, and let’s not throw in any math for now. The task is to make the machine respond to a human’s text-based input. If you type “Hello”, the program should respond with “Hello”. Sounds simple, and it can be done with a single if statement.
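As a minimal sketch in Python (the function name `respond` and the fallback reply are illustrative assumptions, not part of any particular bot), the single-greeting case could look like this:

```python
def respond(user_input: str) -> str:
    """Reply to exactly one known greeting; everything else gets a fallback."""
    if user_input == "Hello":
        return "Hello"
    # Hypothetical fallback for any input the program was not written for.
    return "Sorry, I don't understand."


print(respond("Hello"))
print(respond("Hi there"))
```

Every new greeting would need another branch, which is exactly the problem with this approach.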
But what about other inputs? Say, a hundred different variations of greetings. Writing a hundred if statements is neither efficient nor sensible. Instead, you could write a simple program in Python that takes in a sentence and lets you store a response for it. When a user runs it, it executes an SQL query and fetches any stored response for their input.
But when you ‘train’ the program on “Hello, Tiel!” with the response “Hello there, amigo!”, every “Hello, Tiel!” will always get “Hello there, amigo!” back. For all the program cares, it just needs to run a SELECT query:
SELECT user_voice, response FROM response_memory WHERE user_voice = 'Hello, Tiel!';
We are not even mimicking a conversation here; we are merely looking up stored text.
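The store-and-fetch program described above could be sketched as follows. This is not the original script: it assumes Python's built-in sqlite3 module and an in-memory database, reuses the table and column names from the query (response_memory, user_voice, response), and the function names `train` and `reply` are made up for illustration.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind (an assumption;
# a real bot would more likely use a file on disk).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE response_memory (user_voice TEXT PRIMARY KEY, response TEXT)"
)


def train(user_voice: str, response: str) -> None:
    """Store (or overwrite) the reply for one exact input sentence."""
    conn.execute(
        "INSERT OR REPLACE INTO response_memory (user_voice, response) VALUES (?, ?)",
        (user_voice, response),
    )
    conn.commit()


def reply(user_voice: str):
    """Fetch the stored reply for an exact match, or None if nothing is stored."""
    row = conn.execute(
        "SELECT response FROM response_memory WHERE user_voice = ?",
        (user_voice,),
    ).fetchone()
    return row[0] if row else None


train("Hello, Tiel!", "Hello there, amigo!")
print(reply("Hello, Tiel!"))  # the stored reply
print(reply("Hello, Tiel"))   # None: even a missing "!" breaks the match
```

The second call shows the limitation: the lookup matches whole strings only, so the slightest variation in the input finds nothing.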
Natural Language Processing
But what if we make the computer process some level of grammar and vocabulary so that it behaves more like a human? Instead of just treating the input as a string, what if we make it classify the words in a sentence into their appropriate parts of speech? The technical process of analysing and algorithmically processing a natural human language is known as Natural Language Processing (NLP).
With NLP in play, we can empower our script to ‘learn’ from real-world datasets such as text corpora (huge collections of texts that are already tagged and classified) and draw out patterns using a decision tree or a statistical model to figure out an optimal response when a new input is given.
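To make the idea of learning from a tagged corpus concrete, here is a minimal sketch of one very simple statistical model: a most-frequent-tag (unigram) part-of-speech tagger. The three-sentence corpus, the tag names, and the function names are all assumptions made up for illustration; real NLP libraries train far richer models on far larger corpora.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus (an assumption for illustration only).
TAGGED_CORPUS = [
    [("Alice", "NOUN"), ("is", "VERB"), ("happy", "ADJ")],
    [("the", "DET"), ("happy", "ADJ"), ("bird", "NOUN"), ("sings", "VERB")],
    [("the", "DET"), ("bird", "NOUN"), ("is", "VERB"), ("small", "ADJ")],
]


def learn_tags(corpus):
    """Count how often each word carries each tag, then keep the most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(sentence, model, default="UNKNOWN"):
    """Tag each whitespace-separated word; unseen words get the default tag."""
    return [(w, model.get(w.lower(), default)) for w in sentence.split()]


model = learn_tags(TAGGED_CORPUS)
print(tag("Alice is small", model))
print(tag("the crayon", model))
```

The sentence "Alice is small" never appears in the corpus, yet every word in it gets a tag because each word was seen somewhere during training. A word the model has never seen, such as "crayon", falls back to the default tag.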
Now this feels much more efficient and ‘natural’ than just fetching stored responses. Let’s get emotional for a minute now.
While natural language processing is all about objectively structuring a human input to derive some meaning out of it, sentiment analysis is one of the processes that associates a qualitative or subjective feature with the input. In a broader sense, it is more like training your code to understand if the user is happy like Hello Kitty or mad like the Hulk.
Most commonly, sentiment analysis on trivial and smaller datasets is done by picking out certain words in a sentence that contribute to a certain emotion, and then using the information obtained from training to compare and quantify the sentiment of the input.
For example, say we do sentiment analysis on the sentence ‘Alice is happy that she got a new crayon box.’ A simple analysis script would use a model like bag-of-words: split the entire sentence into a set of words and compare it with the set (bag) of learned words. It would then score the sentence, marking it as subjectively positive with a positive number. The polarising emotions, positive and negative, are thus quantified under the term ‘polarity’.
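The bag-of-words scoring described above can be sketched in a few lines of Python. The word weights in `LEARNED_WORDS` are hypothetical stand-ins for what training would produce, and the function names are illustrative; this is a toy, not how any particular library computes polarity.

```python
import re

# Hypothetical "learned" bag of words with polarity weights; in practice
# these would come from training on labelled text.
LEARNED_WORDS = {"happy": 1.0, "new": 0.5, "sad": -1.0, "broken": -0.5}


def bag_of_words(sentence: str) -> set:
    """Lower-case the sentence and split it into a set of words (order is dropped)."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def polarity(sentence: str) -> float:
    """Sum the weights of the words that also appear in the learned bag."""
    return sum(LEARNED_WORDS.get(word, 0.0) for word in bag_of_words(sentence))


print(polarity("Alice is happy that she got a new crayon box."))  # 1.5
print(polarity("Alice is sad that her crayon box is broken."))    # -1.5
```

A positive score marks the sentence as positive and a negative score as negative. Words outside the learned bag contribute nothing, which is why the quality of the training data matters so much.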
The following notebook should help you understand what sentiment analysis is all about even better.
Now that you have a grip on the basics of NLP and what you can do with it, you can expand the above script or write one yourself to deal with a much larger dataset, perhaps analysing a list of tweets or your own IM texts, and visualize the data with matplotlib. To get started, refer to the TextBlob documentation. With user-generated content amassing on the internet, it is the right time to process all that sweet data and do all sorts of fun things with it.
This post was first published on April 8, 2014.