Teaching Notes - Text Classification

This post is a continuation of my previous posts on teaching a 100-level undergraduate course called Language and Computers. As mentioned earlier, it is a very diverse class, and I use this textbook: Language and Computers by Markus Dickinson, Chris Brew and Detmar Meurers.

The 5th topic of the course is text classification. Like others who taught similar courses elsewhere (Indiana, UT-Austin), I took a short detour into cryptography and language before going into text classification. I followed roughly the same pattern as those two courses for that topic, primarily because I wasn’t sure what to discuss. Now that I have a clearer picture, maybe I can come up with more examples to use. We spoke about the Caesar cipher and the Vigenère cipher, and about how translation can be thought of as a form of decryption.
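
Since we spent time on the Caesar cipher, here is a minimal Python sketch of it; the shift value and the example string are my own arbitrary choices, not from the course materials. Decryption is just the reverse shift.

```python
def caesar(text, shift):
    """Shift each letter by `shift` positions, wrapping around the alphabet."""
    result = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)  # leave spaces and punctuation untouched
    return ''.join(result)

print(caesar("Hello, world!", 3))   # -> "Khoor, zruog!"
print(caesar("Khoor, zruog!", -3))  # decryption: shift back by the same amount
```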

For classification, I spent about 3-3.5 classes (50-minute sessions) and worked through the following questions:

  • What is text classification? What are some of the applications?
  • What is challenging about doing such a thing automatically?
  • What do we need to do text classification? What are the different steps involved?
  • What is needed at each step? How do we get that information?
  • How do we evaluate text classification?

I took two example problems - spam classification, and predicting whether the content of a webpage is appropriate or inappropriate for children - to illustrate the idea of annotated training data, using bag-of-words features as well as problem-specific features, “training” with Naive Bayes (and a quick overview of nearest neighbors and neural networks) after a quick probability primer, and different ways of evaluating text classification.
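
To make that pipeline concrete, here is a minimal sketch of a bag-of-words Naive Bayes spam classifier using scikit-learn; the toy messages and labels are made up for illustration and are not from the datasets we used in class.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up "annotated training data": texts paired with labels.
train_texts = [
    "WIN a FREE prize now!!! Click here",
    "Congratulations, you won a free lottery ticket",
    "Are we still meeting for lunch tomorrow?",
    "Can you send me the notes from class?",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words: each message becomes a vector of word counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# "Training" with Naive Bayes amounts to estimating, per class,
# a prior and per-word probabilities from these counts.
clf = MultinomialNB()
clf.fit(X_train, train_labels)

test = ["free prize if you click now"]
print(clf.predict(vectorizer.transform(test)))  # likely ['spam']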

For exercises during the class, I used the following:

  • Examples of spam vs. non-spam SMS messages from a public SMS spam dataset, and emails from the Enron spam dataset, shown with and without the labels, to do “human” classification - to get a feel for the problem, and to illustrate why it can be harder for SMS than for email, since an SMS gives us a much smaller amount of evidence to work with.
  • Zoink! problem from NACLO 2015, to think about how to solve text classification provided you have some annotated data.
  • Questions showing different confusion matrices for different classification problems, asking which classifier is better. What I tried to show is that the one with higher accuracy need not always be the best one, especially if you want a specific class to have as few errors as possible (see the sketch after this list) - I think they understood.
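
To illustrate that last point, here is a small sketch with two invented confusion matrices for a spam classifier: A has higher overall accuracy, but B catches far more of the spam. All numbers are made up for illustration.

```python
import numpy as np

def accuracy(cm):
    return np.trace(cm) / cm.sum()

def recall(cm, cls):
    """Fraction of true members of class `cls` that were predicted correctly."""
    return cm[cls, cls] / cm[cls].sum()

# Rows = true class (0 = ham, 1 = spam), columns = predicted class.
A = np.array([[95, 0],   # classifier A: never flags ham, but
              [ 4, 1]])  # misses 4 of the 5 spam messages
B = np.array([[90, 5],   # classifier B: a few false alarms, but
              [ 1, 4]])  # catches 4 of the 5 spam messages

print(accuracy(A), recall(A, 1))  # 0.96, spam recall 0.20
print(accuracy(B), recall(B, 1))  # 0.94, spam recall 0.80
```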

Overall, I think this went well. The fact that I had dealt with these topics in other classes earlier (400 and 500 level) helped, I guess. I am looking for more real-world examples of text classification. Spam classification is the easiest example to use, and I used a couple of others - but maybe more examples involving well-known real products, of the kind of Google Translate or Apple Siri, would be useful. Perhaps we will have more examples by next year!

We just started with “Dialog Systems”, which is the next topic. For it, I am relying on Jurafsky and Martin’s 3rd edition draft chapter, along with the textbook.

Written on October 29, 2017