Teaching NLP for Economists (again) - Some reflections
I taught an online guest course at the Munich Graduate School of Economics in October 2022, called “NLP for Economists”. I had taught it before, in October 2020, and wrote a post sharing my reflections back then. This post is a reflection on the recent course, although it is coming several months late (I just realized today, 17 Feb 2023, that I did not commit the post!)
Like before, this was an online, two-week crash course, with six 3-hour sessions and a few shorter one-to-one student meetings. Some of the students knew some programming (Python, R, Stata, etc.) and some had even taken a machine learning course just before mine. But there were also students who had done neither (in a class of about 15 students). So, I asked myself the following questions before starting, which are similar to the ones I started with before:
- How much can you pack into 2 weeks of classes while accommodating the needs of all students?
- What is relevant for economists, who probably want something they can readily apply, rather than learning about the open research questions?
- How can I give them actionable knowledge, without putting them off with deep technical details?
In the previous iteration of the course, I focused on getting the basics right and did not venture into the deep learning space at all. My reasoning at that time was that traditional feature engineering would make more intuitive sense to students, and that it is easier to write some code and get things working with it. However, a lot changed in between, and I felt I had to introduce concepts such as large language models, transfer learning, fine-tuning, etc. Thus, my syllabus was framed largely based on my thoughts on the above-mentioned issues.
Structure of the course: I divided my syllabus into six main topics (details on the course website), and each topic had a 3-hour lecture session followed by some resources for further study (including code exercises in Colab notebooks to try out):
- Introduction (NLP overview, NLP in economics overview)
- Python overview
- NLP/ML methods (regular expressions, corpus collection/analysis methods, text classification, topic modeling, information extraction, summarization)
- Diving deeper: Text Classification and Topic Modeling
- NLP without annotated data: overview of methods when we don’t have large datasets
- NLP and Economics: Selected readings
The last topic was primarily student presentations. Students chose an economics paper involving NLP methods (mostly from the list supplied on the website) and presented it. They also had to write a short term paper reflecting on some economics research problem that could benefit from what they learnt. About half of the class also had short one-on-one meetings to discuss their specific research problems and the role of NLP in them. Additionally, one of the alumni from the previous iteration of the course, who is now a PhD student, spent some time with us during a session, sharing how they are using NLP in their current research.
What I liked:
- I liked the interactions with students. They seemed to be much more aware of the possibilities with NLP than before, and always had good questions during class, and sometimes afterwards too.
- I did not use Colab notebooks last time, but now I see their value as classroom tools (I still don’t use them for my research, as most of it is done on remote machines within the office network). So, I will perhaps continue using them when I teach next, but I would recommend that students learn to set up a virtual environment, install libraries locally, etc., where possible.
- I found it really cool that an alumnus of the previous iteration of the course volunteered to give a small talk in this course. It was valuable in terms of contextualizing the relevance of the course for the current students.
What I missed:
- I would have loved to do some examples with actual economics datasets, especially in a multimodal setup combining tabular data with text representations (a minimal sketch of what I have in mind follows this list). I gave some pointers, but really did not have the time to cover it all in the short course.
- Perhaps it would also be useful to talk briefly about interpretable methods for NLP and tabular data.
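For what it is worth, the kind of example I would like to build next time looks something like the minimal sketch below: a bag-of-words representation of a text column combined with a couple of numeric columns in one scikit-learn pipeline. The column names and the toy data are entirely made up; this only illustrates the plumbing, not a serious economics application.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy dataset: a text field plus two tabular features (all values are made up)
df = pd.DataFrame({
    "report_text": ["growth outlook remains weak", "strong export demand",
                    "unemployment rising sharply", "stable prices and wages"],
    "gdp_growth": [0.2, 2.1, -0.5, 1.0],
    "inflation": [6.5, 3.2, 7.1, 2.0],
    "label": [0, 1, 0, 1],
})

# TF-IDF features for the text column, scaling for the numeric columns
features = ColumnTransformer([
    ("text", TfidfVectorizer(), "report_text"),
    ("tabular", StandardScaler(), ["gdp_growth", "inflation"]),
])

model = Pipeline([("features", features), ("clf", LogisticRegression())])
model.fit(df[["report_text", "gdp_growth", "inflation"]], df["label"])
print(model.predict(df[["report_text", "gdp_growth", "inflation"]]))
```

Even a toy pipeline like this would give students something concrete to adapt to their own text-plus-tabular datasets.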
What can be done better in the course:
- I should have covered regex properly, in an interesting manner, instead of just listing the syntax. I am yet to figure out an interesting, yet compact, way of introducing regex in about 20 minutes or so. I strongly think it is an important tool in any NLP enthusiast’s arsenal, even in this age of deep learning, LLMs, ChatGPT and so on. There are still some use cases that are way easier, simpler, and leaner to solve with regex knowledge (see the small example after this list).
- I should probably leave the Python basics as an optional lecture for those who need it. Some students already knew this material and mentioned that it was not necessary.
- I should try to understand the economics literature that uses NLP more closely and get a better picture than the peripheral understanding I have right now.
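As a placeholder for that future regex segment, here is the kind of tiny, self-contained example I have in mind (the sentence and the patterns are made up for illustration): pulling percentages and years out of text, which is much leaner to do with a regex than with any learned model.

```python
import re

text = "Inflation rose to 8.6% in June 2022, up from 7.9% earlier in the year."

# Percentages: one or more digits, an optional decimal part, followed by '%'
percentages = re.findall(r"\d+(?:\.\d+)?%", text)

# Four-digit years in a plausible range (19xx or 20xx)
years = re.findall(r"\b(?:19|20)\d{2}\b", text)

print(percentages)  # ['8.6%', '7.9%']
print(years)        # ['2022']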
Some challenges:
- In the last iteration, a question that repeatedly came up was: how can NLP meaningfully contribute to economics research, apart from being a fancy new method? This time, we did not run into that question as often, as the students seemed to be more aware of the use of NLP in their research. However, the issue of drawing causal inferences from NLP-based work in economics remained.
- Another issue that came up was this: I referred to Hugging Face and the models hosted there a couple of times, and one of the students asked how a newbie should choose the right model from the thousands of models hosted there. This is indeed a challenge, and sometimes it is challenging even for experienced people. I am yet to think of a good answer beyond obvious things like looking at the model card/paper/published results/number of people using it, etc. (one small heuristic is sketched after this list).
- I felt we need more examples of combining text and tabular data in NLP research, to be able to show and discuss them in classrooms like this, where that sort of multimodal setup is the norm and not the exception.
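On the “which model do I pick” question, the only semi-concrete heuristic I could offer was to filter by task and look at usage. Something like the snippet below, using the huggingface_hub library, lists the most-downloaded text-classification models, which at least narrows things down before one starts reading model cards. (The exact argument names may differ slightly across library versions.)

```python
from huggingface_hub import list_models

# List a handful of text-classification models, most-downloaded first.
# (Argument names reflect the huggingface_hub API as I understand it
# and may change across versions.)
models = list_models(filter="text-classification", sort="downloads",
                     direction=-1, limit=5)

for m in models:
    print(m.modelId)
```

This does not replace reading the model card or the accompanying paper, but it is a reasonable starting point for a newcomer.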
Overall, I felt much more comfortable with the virtual format than in the previous iteration, and the sessions were more interactive than before. I think the fact that I taught two short online courses and did a few online tutorials in the meanwhile helped. General student feedback has been positive, both in the anonymous form and in the personal emails sent later. If I teach the course again, of course, I have things to improve upon. But overall, I am satisfied with the way it went this time.