Quick Notes - Day 3
All previous posts in this series here
What Happens To BERT Embeddings During Fine-tuning?
Authors: Amil Merchant, Elahe Rahimtoroghi, Ellie Pavlick, Ian Tenney
Published At: (seems non-peer-reviewed as of now) arXiv url
Several papers in the past two to three years have probed pre-trained language models. However, while using such models and fine-tuning them for specific tasks is now common practice in NLP, there is hardly any research on how the models change during fine-tuning. This paper addresses that question using BERT, focusing on three tasks - dependency parsing, natural language inference, and reading comprehension. Their main conclusions are that the linguistic properties the original model learns largely survive fine-tuning, that fine-tuning primarily affects the top few layers of BERT, and that the changes are most prominent for in-domain examples, while representations of out-of-domain examples stay closer to those of the original pre-trained model.
I like the topic of the paper, the questions chosen, and the systematic exploration. I would have loved to see a few more comments on potential extensions of this kind of work, and some speculation on whether the findings are BERT-specific or more general, though.
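To make the setup a bit more concrete, here is a minimal sketch of how one might compare layer-wise representations of a sentence before and after fine-tuning. This is not the authors' exact protocol, just an illustrative comparison using HuggingFace transformers; the fine-tuned checkpoint path is a placeholder you would replace with your own model.

```python
# Rough sketch (not the paper's exact method): compare per-layer sentence
# representations of pre-trained BERT vs. a fine-tuned checkpoint.
import torch
from transformers import AutoTokenizer, AutoModel

pretrained = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
finetuned = AutoModel.from_pretrained("path/to/your-finetuned-bert",  # placeholder path
                                      output_hidden_states=True)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

sentence = "The cat sat on the mat."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    h_pre = pretrained(**inputs).hidden_states   # tuple: embedding layer + 12 transformer layers
    h_fin = finetuned(**inputs).hidden_states

# Mean-pool tokens per layer and measure cosine similarity between the two models.
# If fine-tuning mostly changes the top layers, similarity should drop near the end.
for layer, (a, b) in enumerate(zip(h_pre, h_fin)):
    sim = torch.nn.functional.cosine_similarity(a.mean(dim=1), b.mean(dim=1)).item()
    print(f"layer {layer:2d}: cosine similarity = {sim:.3f}")
```

Running something like this on in-domain versus out-of-domain sentences is one cheap way to eyeball the paper's claim that out-of-domain representations stay closer to the original pre-trained model.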
Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
Authors: Emily M. Bender, Alexander Koller
Published At: ACL 2020 url
Continuing on the topic of what pre-trained language models learn, there are many claims about such models learning "meaning" (i.e., something beyond form). These casual claims in research papers sometimes result in the work being hyped up in the media (or even within the research community itself). This paper argues that models trained only on form have no a priori way of learning meaning. Starting from a theoretical perspective on what meaning is and why it cannot be "learnt" when there is no training signal for it, the paper summarizes what the human language acquisition literature says about how humans learn meaning, and finally wraps up the discussion with arguments from distributional semantics (which is frequently mentioned in the context of training these models) about what "meaning" means in this context.
This paper is very different from regular NLP papers, and I think it is a much-needed one for understanding whether current approaches take us toward the holy grail of NLP - "general linguistic intelligence". What I liked most about the paper is the section "Some possible counter arguments". I hope this paper triggers someone to write an NLP version of the kind of posts seen in Piekniewski's blog.
End of story for today!