Teaching Notes - Text Classification

This post is a continuation of my previous posts on teaching a 100-level undergraduate course called Language and Computers. As mentioned earlier, it is a very diverse class, and I use this textbook: Language and Computers by Markus Dickinson, Chris Brew and Detmar Meurers.

The 5th topic of the course is text classification. Like others who taught similar courses elsewhere (Indiana, UT-Austin), I took a short detour into cryptography and language before going into text classification. I followed roughly the same pattern as those two courses for that topic, primarily because I wasn’t sure what to discuss. Now that I have a clearer picture, maybe I can come up with more examples to use. We spoke about the Caesar cipher and the Vigenère cipher, and about how translation can be thought of as a form of decryption.
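
Since we spent time on the Caesar cipher, here is a minimal Python sketch of it; the shift value and the example string are my own arbitrary choices, not from the course materials. Decryption is just the reverse shift.

```python
def caesar(text, shift):
    """Shift each letter by `shift` positions, wrapping around the alphabet."""
    result = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)  # leave spaces and punctuation untouched
    return ''.join(result)

print(caesar("Hello, world!", 3))   # -> "Khoor, zruog!"
print(caesar("Khoor, zruog!", -3))  # decryption: shift back by the same amount
```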

For classification, I spent about 3-3.5 classes (50-minute sessions) and worked through the following questions:

  • What is text classification? What are some of the applications?
  • What is challenging about doing such a thing automatically?
  • What do we need to do text classification? What are the different steps involved?
  • What is needed at each step? How do we get that information?
  • How do we evaluate text classification?

I took two example problems - spam classification, and predicting whether the content of a webpage is appropriate or inappropriate for children - to illustrate the idea of annotated training data, using bag-of-words features as well as problem-specific features, “training” with Naive Bayes (and a quick overview of nearest neighbors and neural networks) after a quick probability primer, and different ways of evaluating text classification.
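
To make that pipeline concrete, here is a minimal sketch of a bag-of-words Naive Bayes spam classifier using scikit-learn; the toy messages and labels are made up for illustration and are not from the datasets we used in class.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up "annotated training data": texts paired with labels.
train_texts = [
    "WIN a FREE prize now!!! Click here",
    "Congratulations, you won a free lottery ticket",
    "Are we still meeting for lunch tomorrow?",
    "Can you send me the notes from class?",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words: each message becomes a vector of word counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# "Training" with Naive Bayes amounts to estimating, per class,
# a prior and per-word probabilities from these counts.
clf = MultinomialNB()
clf.fit(X_train, train_labels)

test = ["free prize if you click now"]
print(clf.predict(vectorizer.transform(test)))  # likely ['spam']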

For exercises during the class, I used the following:

  • Examples of spam vs. non-spam SMS messages from a public SMS spam dataset, and emails from the Enron spam dataset, shown with and without the labels, to do “human” classification - to get a feel for the problem, and to illustrate why it can be harder for SMS than for email, since an SMS gives us a much smaller amount of evidence to work with.
  • Zoink! problem from NACLO 2015, to think about how to solve text classification provided you have some annotated data.
  • Questions showing different confusion matrices for different classification problems, asking which classifier is better. What I tried to show is that the one with higher accuracy need not always be the best one, especially if you want a specific class to have as few errors as possible (see the sketch after this list) - I think they understood.
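
To illustrate that last point, here is a small sketch with two invented confusion matrices for a spam classifier: A has higher overall accuracy, but B catches far more of the spam. All numbers are made up for illustration.

```python
import numpy as np

def accuracy(cm):
    return np.trace(cm) / cm.sum()

def recall(cm, cls):
    """Fraction of true members of class `cls` that were predicted correctly."""
    return cm[cls, cls] / cm[cls].sum()

# Rows = true class (0 = ham, 1 = spam), columns = predicted class.
A = np.array([[95, 0],   # classifier A: never flags ham, but
              [ 4, 1]])  # misses 4 of the 5 spam messages
B = np.array([[90, 5],   # classifier B: a few false alarms, but
              [ 1, 4]])  # catches 4 of the 5 spam messages

print(accuracy(A), recall(A, 1))  # 0.96, spam recall 0.20
print(accuracy(B), recall(B, 1))  # 0.94, spam recall 0.80
```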

Overall, I think this went well. The fact that I had dealt with these topics in other classes earlier (400 and 500 level) helped, I guess. I am looking for more real-world examples of text classification. Spam classification is the easiest example to use, and I used a couple of others - but maybe more examples involving well-known real products, of the kind of Google Translate or Apple Siri, would be useful. Perhaps we will have more examples by next year!

We just started with “Dialog Systems”, which is the next topic. For it, I am relying on Jurafsky and Martin’s 3rd edition draft chapter, along with the textbook.

Written on October 29, 2017