Natural Language Processing Madan Kartheesan Technical Leader Object Automation Software Solutions
NLP • NLP is a part of computer science and artificial intelligence which deals with human languages.
Data Science • 1) Programming • 2) Maths and Statistics • 3) Communication
NLP Libraries for NLP in Python—pandas, sklearn, re, nltk, gensim, TextBlob EDA—Corpus, document-term matrix, word counts Use cases—sentiment analysis, topic modeling, text generation
• Natural Language (English Language, Tamil Language) • Processing (How a computer carries out instructions) • How to deal with text data?
What is natural language processing? •Natural language processing strives to build machines that understand and respond to text or voice data—and respond with text or speech of their own—in much the same way humans do.
NLP combines computational linguistics—rule-based modeling of human language—with statistical, machine learning, and deep learning models. Together, these technologies enable computers to process human language in the form of text or voice data and to ‘understand’ its full meaning, complete with the speaker or writer’s intent and sentiment.
NLP tasks •Speech recognition, also called speech-to-text, is the task of reliably converting voice data into text data. Speech recognition is required for any application that follows voice commands or answers spoken questions. What makes speech recognition especially challenging is the way people talk—quickly, slurring words together, with varying emphasis and intonation, in different accents, and often using incorrect grammar.
Part of speech tagging • Part of speech tagging, also called grammatical tagging, is the process of determining the part of speech of a particular word or piece of text based on its use and context. Part of speech tagging identifies ‘make’ as a verb in ‘I can make a paper plane,’ and as a noun in ‘What make of car do you own?’
Word sense disambiguation Word sense disambiguation is the selection of the meaning of a word with multiple meanings through a process of semantic analysis that determines the meaning that makes the most sense in the given context. For example, word sense disambiguation helps distinguish the meaning of the verb 'make' in ‘make the grade’ (achieve) vs. ‘make a bet’ (place). Likewise, 'bear' names an animal in ‘I saw a bear’ but means to endure in ‘I cannot bear the pain.’
Named entity recognition Named entity recognition, or NER, identifies words or phrases as useful entities. NER identifies ‘Kentucky’ as a location or ‘Fred’ as a man's name. For example, in ‘Madan lives in Chennai,’ Madan is a person and Chennai is a location.
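The entity examples above can be sketched with a simple lookup table. This is a minimal illustration that assumes a hand-made gazetteer (the `GAZETTEER` dictionary and the `tag_entities` function are hypothetical names invented here); practical NER systems usually rely on trained statistical models rather than a fixed word list.

```python
import re

# Hypothetical gazetteer: a tiny hand-made lookup table, not a trained model.
GAZETTEER = {
    "Kentucky": "LOCATION",
    "Chennai": "LOCATION",
    "Fred": "PERSON",
    "Madan": "PERSON",
}

def tag_entities(text):
    """Return (token, label) pairs for the tokens found in the gazetteer."""
    tokens = re.findall(r"\w+", text)
    return [(tok, GAZETTEER[tok]) for tok in tokens if tok in GAZETTEER]

entities = tag_entities("Madan lives in Chennai.")
print(entities)  # [('Madan', 'PERSON'), ('Chennai', 'LOCATION')]
```

A lookup like this cannot label names it has never seen, which is one reason context-aware models are generally preferred.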
Sentiment analysis • Sentiment analysis attempts to extract subjective qualities—attitudes, emotions, sarcasm, confusion, suspicion—from text.
Natural language generation • Natural language generation is sometimes described as the opposite of speech recognition or speech-to-text; it's the task of putting structured information into human language.
What is natural language understanding? • Natural language understanding is a subset of natural language processing, which uses syntactic and semantic analysis of text and speech to determine the meaning of a sentence. Syntax refers to the grammatical structure of a sentence, while semantics alludes to its intended meaning. NLU also establishes a relevant ontology: a data structure which specifies the relationships between words and phrases. While humans naturally do this in conversation, the combination of these analyses is required for a machine to understand the intended meaning of different texts. • Our ability to distinguish between homonyms and homophones illustrates the nuances of language well. For example, let’s take the following two sentences:
Alice is swimming against the current. The current version of the report is in the folder. In the first sentence, the word ‘current’ is a noun. The verb that precedes it, ‘swimming’, provides additional context to the reader, allowing us to conclude that we are referring to the flow of water in the ocean. The second sentence also uses the word ‘current’, but as an adjective. The noun it describes, ‘version’, denotes multiple iterations of a report, enabling us to determine that we are referring to the most up-to-date status of a file.
What is natural language generation? •Natural language generation is another subset of natural language processing. While natural language understanding focuses on computer reading comprehension, natural language generation enables computers to write. NLG is the process of producing a human language text response based on some data input. This text can also be converted into a speech format through text-to-speech services.
What is natural language processing? • Natural language processing, which evolved from computational linguistics, uses methods from various disciplines, such as computer science, artificial intelligence, linguistics, and data science, to enable computers to understand human language in both written and verbal forms. While computational linguistics focuses more on aspects of language itself, natural language processing emphasizes the use of machine learning and deep learning techniques to complete tasks, like language translation or question answering. Natural language processing works by taking unstructured data and converting it into a structured data format. It does this through the identification of named entities (a process called named entity recognition) and identification of word patterns, using methods like tokenization, stemming, and lemmatization, which examine the root forms of words. For example, the suffix -ed on a word, like called, indicates past tense, but it has the same base infinitive (to call) as the present participle calling.
Basic steps in NLP 1) TOKENISATION 2) STEMMING 3) LEMMATIZATION 4) STOPWORDS
Data cleaning: "He is eating an apple."
• Tokenisation: {"He", "is", "eating", "an", "apple"}
• Stemming and lemmatization: "eating" to "eat"
• Stopwords: removes "an" from the sentence
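The cleaning steps on this slide can be sketched in plain Python. This is a rough illustration under stated assumptions: the `STOPWORDS` set is a hypothetical three-word list kept small to mirror the slide, and `naive_stem` is a crude suffix stripper invented here, not the stemmer or lemmatizer from nltk.

```python
import re

# Hypothetical stopword list for illustration; real lists are much longer.
STOPWORDS = {"a", "an", "the"}

def tokenize(sentence):
    """Split a sentence into word tokens, dropping punctuation."""
    return re.findall(r"\w+", sentence)

def naive_stem(word):
    """Strip a few common suffixes. A crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def clean(sentence):
    """Tokenize, stem, then drop stopwords."""
    stems = [naive_stem(tok) for tok in tokenize(sentence)]
    return [s for s in stems if s.lower() not in STOPWORDS]

tokens = tokenize("He is eating an apple.")
cleaned = clean("He is eating an apple.")
print(tokens)   # ['He', 'is', 'eating', 'an', 'apple']
print(cleaned)  # ['He', 'is', 'eat', 'apple']
```

The same helper maps both "called" and "calling" to "call". Fuller stopword lists typically also contain words such as "he" and "is", so a real pipeline would likely keep fewer tokens than this sketch does.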
Document Term Matrix
corpus = ['This is the first document.', 'This document is the second document.', 'And this is the third one.', 'Is this the first document?']
Vocabulary: ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
Counts (one row per document, one column per vocabulary term):
[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]
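The matrix on this slide can be reproduced with the standard library alone. The sketch below assumes a simple tokenization rule (lowercase, words of two or more characters); the names `tokenize`, `counts`, `vocabulary`, and `matrix` are chosen here for illustration. In practice a library such as sklearn, listed among the Python libraries earlier, would typically do this work.

```python
import re
from collections import Counter

corpus = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

def tokenize(text):
    """Lowercase the text and keep words of two or more characters."""
    return re.findall(r"\b\w\w+\b", text.lower())

# One word-count table per document.
counts = [Counter(tokenize(doc)) for doc in corpus]

# The vocabulary is every distinct term, sorted alphabetically.
vocabulary = sorted(set().union(*counts))

# Each row holds one document's count for every vocabulary term.
matrix = [[c[term] for term in vocabulary] for c in counts]

print(vocabulary)
for row in matrix:
    print(row)
```

Each row is a document and each column is a term, so the 2 in the second row records that "document" appears twice in the second sentence.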
Vector to sequence models
Sequence to vector models • Input: "I love to eat" • Output: [0.10] bad, [0.90] good (positive sentiment)
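A sequence to vector model takes a word sequence of any length and returns a fixed-size vector, here a [bad, good] pair of probabilities. The sketch below is a toy stand-in, not a trained model: the `POSITIVE` and `NEGATIVE` word sets are hypothetical, and the scores come from simple word counts passed through a softmax, so the numbers differ from the 0.10 and 0.90 shown on the slide.

```python
import math

# Hypothetical word lists, invented for this sketch.
POSITIVE = {"love", "good", "great"}
NEGATIVE = {"hate", "bad", "awful"}

def sentiment_vector(sentence):
    """Map a word sequence of any length to a fixed-size [bad, good] vector."""
    words = sentence.lower().split()
    bad_score = sum(w in NEGATIVE for w in words)
    good_score = sum(w in POSITIVE for w in words)
    # Softmax turns the two scores into probabilities that sum to 1.
    exps = [math.exp(bad_score), math.exp(good_score)]
    total = sum(exps)
    return [e / total for e in exps]

bad, good = sentiment_vector("I love to eat")
print(f"[{bad:.2f}] bad [{good:.2f}] good")  # [0.27] bad [0.73] good
```

Whatever the input length, the output always has two entries, which is the defining property of the sequence to vector setting.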
Sequence to sequence models •Language translation

Natural Language Processing from Object Automation
