Tutorial on NLP with Python
I gave a tutorial at the Mid-west Big Data Summer School 2018, held at Iowa State University, USA. It is titled “Natural Language Processing with Python” and this post is just a quick summary of what I did, what I wanted to do etc.
My original proposal for the talk looked something like this:
“Abstract: Unstructured text data is everywhere, in the form of web pages, social media posts, news articles, blogs, and a lot of other documents. Natural Language Processing (NLP) deals with developing methods to solve various language processing problems involving these large quantities of text data. Some examples of real-world problems where NLP is useful are: email spam classification, machine translation, question answering/search, automatically tagging content, identifying fake news etc. In this tutorial, we will learn about working with textual data using the state of the art methods in NLP and Python. I will demonstrate the use of nltk and gensim libraries for processing text, and scikit-learn and keras libraries for building predictive models with the processed textual data. I will take text classification as my primary use case, and demonstrate how these tools enable us to build text classification models and use them in our own applications.”
However, as I started preparing, my ideas on what to keep kept changing. The following issues were in my mind:
- I have only 1.5 hours, it is not as if it is a full course or even a half-a-day tutorial. So, it would be difficult to squeeze in practicals.
- School is attended by people of diverse backgrounds - not necessarily engineers alone. I should keep it as relevant as possible.
- I want the audience to take some thoughts home, I have been in such tutorials where I had no clue what is happening. I don’t want to be such a speaker.
So, with those thoughts in mind, here are the final topics I decided upon:
- Overview of where NLP is used in some applications (a couple of videos)
- Different kinds of computational problems we should solve to be able to achieve language understanding
- Introduction to different NLP tasks with code examples.
For the third part, I chose the following tasks:
- Sentence and word segmentation
- Pattern extraction
- POS tagging
- Entity extraction, noun chunk extraction
- Parsing
- Language generation
- Text Classification
I used python libraries: nltk, spacy, textgenrnn, tensorflow, keras, and sklearn in this process. All the relevant slides and code examples are hosted on github.
Considering that there was some discussion after the talk, and there were people who wanted to talk to me afterwards, I think it went on well. However, what I could have done better is this:
- Putting up information about required libraries/installations etc a few days earlier (I was moving between countries during this time - so I can excuse myself!!)
- Requesting for 2-2.5 hours instead of 1.5, and leaving some time open for open exercise sessions.
Overall, I think coming up with a practical introduction to NLP for a 1.5 hour session is kind of challenging, but I am satisified with what I managed for a first time. I hope someone else who wants to do a similar tutorial in future will find this useful.