Teaching NLP for Economists - Some reflections

I taught an online guest course at the Munich Graduate School of Economics last month (October 2020), called “NLP for Economists”. The last real course I taught was in Spring 2018, and my professional experiences since then substantially influenced my views on what to include when teaching NLP to a non-CS audience. This post is a reflection on my course, along with some comments from student feedback.

Challenges in preparing this course: A course such as this is new. I don’t know of any other NLP course with a specific focus on economics. Why such a course? is not my question - there is clearly a need. Why is a traditional NLP course insufficient? What is missing in that approach? What makes this course different from a regular NLP course? Here are some challenges with designing this course, in my opinion:

How much can you pack into 2 weeks of classes? (6-7 meetings in total i.e., every alternate weekday for two weeks) was my first challenge. Given the background of my students (graduate students in economics, some with a little bit of programming skills, some without), the course had to be introductory. Yet, there should be some actionable content they can use in their own research projects soon (or at least effectively collaborate with a more technical person with NLP/ML background).
Another challenge comes from the fact that I am teaching to economists. Many off the shelf NLP solutions may not be a good fit for texts from that discipline, as they are typically trained on newspapers, tweets etc. Further, there are all sorts of challenging NLP problems, but not all of them are commonly found in existing research at the intersection of NLP and Economics (as seen in economics journals, not NLP publications). So, any content I choose should reflect that reality.
A third challenge is - where do you get your data? how do you extract text from various formats? While this is a completely ignored problem in a typical NLP course I can think of, it is a major bottleneck both in industry use cases, as well as for anyone wanting to work in a non-standard NLP setup. So, including some discussion on this is essential.
A final challenge is to introduce methods that are relatively easier to understand without a lot of technical background. I saw two advantages to this:
- Data hungry NLP methods (e.g., deep learning) may not suit the situation, especially considering that they may not have that sort of datasets. So, they should know about alternative approaches as well (I felt regular expressions, rule based approaches are useful here)
- Simpler solutions (if they work) are more manageable in terms of computational and financial resources.

So, considering all these, I divided my syllabus into three main topics (details on the course website):

Overview (NLP overview, NLP in economics overview, Python basics)
Python and Text (working with various formats of text, pre-processing text, different kinds of text vectorization from bag of words to embeddings)
NLP/ML methods (regular expressions, corpus collection/analysis methods, text classification, topic modeling, information extraction, summarization)

For the last one, I noticed in my readings that text classification and topic modeling are two most common methods seen in economics papers dealing with textual analysis. I included information extraction to introduce what else can possibly be done (key phrase extraction, named entity recognition etc), and text summarization was very brief, just because there are some available libraries. It seemed to me from assignment submissions that students saw some value in knowing about KPE/NER for their own research.

Apart from these, we had group discussion sessions where students formed into groups, chose an economics paper involving NLP methods (mostly from the list supplied on the website) and presented about it. I don’t like to give too challenging assignments in courses with non-CS students, since that may deter them from the content altogether. So, I focused on letting them figure out how to use some existing NLP tools, and in addition, write their observations on where they work and where they fail. They also had to write a short term paper reflecting on some economics research problem that can benefit from what they learnt.

What I liked: I just loved the insights and criticisms (of NLP) that came from students during our meetings, in the group discussions, in their assignments and term paper. Some of these gave me a fresh perspective on NLP. The many perspectives on PDF parsing enriched my own experience with that. I would have appreciated more interaction/more questions, but I felt this is my best class so far!

What I missed: The course was supposed to be taught in person. Situations made it an online course. I realized that as an instructor, I rely a lot on in person interactions, moving around in the classroom, asking questions, etc. I terribly missed all these. Especially, the pre-recording of lectures left me in some sort of depressed mood for a full week. Recording them, alternatively looking at the slides and staring into blankness, is the worst part of the course for me. However, some students felt this format worked better for them as they could pause, replay, and resume when they want (even though the audio/video quality was poor from my end).

What can be done better in the course:: After teaching once, and seeing the student feedback, here are my thoughts on how to modify this next time (and perhaps spending an extra class) -

Have more guided discussion sessions rather than keeping them completely open. My inexperience with online classes showed here, and this was an important point mentioned in student feedback too
Focus on the connection between Economics research and NLP methods more carefully. My own inexperience with the way economics papers are written made this part challenging for me, but this has been another main point in the student feedback.
Talk more on how to combine NLP based feature representations with the primary source of data for the respective domains (e.g., tables of numeric and categorical data unrelated to text itself)
Introduce some specific examples of crowd sourcing/automatic data labeling/active learning approaches to address scenarios where there is no annotated text data to start with.
Look for more resources/examples on how to meaningfully continue learning and applying NLP for their own disciplinary research, without trying to become a CS researcher - I think there is a real gap in this area in terms of available resources. Existing ones I find are either too basic or too advanced - no middle ground.

A bigger challenge for NLP researchers: A question that repeatedly came up in our discussions, and in my thoughts while reading papers as well as student submissions was: how can NLP meaningfully contribute to economics research, apart from being a fancy new method? How can NLP based measures relate to some concrete economics concept? How can one draw some causal inference based on what NLP tells us? - these were some of the questions some students (and me) grappled with, especially during the group discussions phase. I found a few economics papers that attempt to take correlational evidence from NLP further and relate them to actual economic events (some examples in Gentzkow et.al., (2019)), but none from NLP side. I think this is a recurring problem whenever one wants to apply NLP methods beyond CS related disciplines (including bioinformatics etc), and especially more with social sciences. NLP community should seriously consider this (along with self reflection, which was a theme in ACL this year) and ponder on what can be done to increase its impact in all these new-er application areas.

Overall, while the virtual format left me initially dull and depressed, considering that I am teaching after a while, and that this is my first online teaching experience, I felt it is a pretty good start. There were many issues on the family end as well during the same 2 week period (including two ER visits). Yet, the general student feedback has been positive, both in the anonymous form as well as in the personal emails sent later. So, I decided not to be too self-critical after publishing this post, and I look forward to refine this or build a similar course later in future, if possible.

Written on November 11, 2020