Quick Notes - Day 15

All previous posts in this series here

Sanskrit segmentation revisited
Authors: Sriram Krishnan, Amba Kulkarni
Published at: ArXiv (may not be peer reviewed) url

This paper briefly reviews existing segmentation approaches for Sanskrit text and suggests modifications to one of them, proposing a probability function to prioritize and rank all possible segmentation solutions. The authors then show that these modifications improve segmentation accuracy (e.g., for sandhi splitting).
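To make the general idea concrete, here is a minimal toy sketch of ranking candidate segmentations by a probability score. This is not the paper's actual scoring function: the lexicon, the made-up "words", and the unigram-product score are all my own illustrative assumptions, and real sandhi splitting also has to undo sound changes at word boundaries, which this sketch ignores.

```python
# Toy unigram probabilities for a few hypothetical "words"
# (values are made up purely for illustration).
UNIGRAM_P = {
    "rama": 0.05,
    "laya": 0.02,
    "ramala": 0.001,
    "ya": 0.01,
}

def segmentations(s):
    """Enumerate all ways to split s into words from the toy lexicon."""
    if not s:
        yield []
        return
    for i in range(1, len(s) + 1):
        word = s[:i]
        if word in UNIGRAM_P:
            for rest in segmentations(s[i:]):
                yield [word] + rest

def score(seg):
    """Score a segmentation as the product of its words' unigram probabilities."""
    p = 1.0
    for w in seg:
        p *= UNIGRAM_P[w]
    return p

# Rank all candidate splits of a string, best first.
candidates = sorted(segmentations("ramalaya"), key=score, reverse=True)
for seg in candidates:
    print(seg, score(seg))
```

Here the string "ramalaya" has two possible splits under the toy lexicon, and the scoring function decides which one to prefer; the paper's contribution is, roughly, a better-motivated version of that ranking step.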

It is always good to see work on under-served languages, especially Sanskrit. However, I felt this paper assumes the reader is already familiar with a lot of background. Hence, I did not quite understand what the “phases of segmentation” are, for example, as I was thinking of segmentation more in terms of morphological analysis. Still, work such as this is also relevant for other languages with similar phenomena, and I hope it continues in the future.

The Unstoppable Rise of Computational Linguistics in Deep Learning
Author: James Henderson
Published at: ACL 2020 url

This paper gives a historical overview of the use of neural network architectures for different NLP/NLU tasks and discusses what kinds of language representations can be “learned” from data by different neural networks. The author concludes that successful deep learning architectures still rely on some hand-coded aspects, such as linguistic resources. The paper then identifies some challenges and potential directions ahead for deep learning in terms of understanding human language.

It is an interesting paper, and also different from typical NLP papers one comes across. It requires one to think beyond modeling, comparing SOTA numbers, etc., and focuses on more abstract questions. I will have to return to it in the future to fully understand the arguments.

Written on May 14, 2020