Quick Notes - Day 19

All previous posts in this series here
(Days 17 and 18 are included in one post with 4 papers)

Theme: LREC 2020 papers giving a general overview of European projects.

The Competitiveness Analysis of the European Language Technology Market
Authors: Andrejs Vasiljevs et al.
Published at: LREC 2020 url

This paper studies the global competitiveness of European NLP technologies in three areas (speech technology, cross-lingual search, and machine translation) in comparison with North America and Asia. The comparison covers seven dimensions: research, innovations, investments, market dominance, industry, infrastructure, and open data. In research, Asia seems to be producing more and more output. North America seems to dominate innovations. Asia dominates investments for search, whereas North America dominates for the other two areas. North America has more market dominance in all three areas, although Europe is ahead in enterprise search (Elasticsearch). North America also seems to dominate industry and infrastructure. For open data, they conclude that Europe dominates for MT, whereas North America dominates the other two areas.

I don’t really know what I was looking for in reading this analysis. I was just curious about this kind of market analysis and read through it. It is interesting to see how much more is coming from Asia in certain aspects and tasks. One day I would also like to see other regions, such as Africa and Australia, in this comparison.

Language Data Sharing in European Public Services – Overcoming Obstacles and Creating Sustainable Data Sharing Infrastructures
Authors: Lilli Smal et al.
Published at: LREC 2020 url

This paper studies language data sharing practices in the public sectors of EU countries and the obstacles they face, and gives some recommendations to overcome these obstacles. They say that the main challenges are the undervaluing of language data, not having set standards or procedures to manage such data, a lack of digital skills and not moving towards computer-aided translation tools, limited access to outsourced translations (e.g., not maintaining translation memories), and legal issues related to IP of translations. They make recommendations to address these issues at the policy level as well as at the institutional and process level.

I found this information interesting. The eventual value of any resource, such as a dataset or a piece of software, lies in its adoption by some community. The public sector is a large community which needs various forms of NLP technologies. However, we don’t usually get to know what they use or how much they know about these resources. So I enjoyed skimming through this paper to understand these issues.

Language Technology Programme for Icelandic 2019-2023
Authors: Anna Nikulasdottir et al.
Published at: LREC 2020 url

This paper describes the five-year plan for the development of Icelandic language technologies in five areas: language resources, speech recognition, speech synthesis, machine translation, and spell and grammar checking. They describe what the goals are, who the main participants are, what the sub-tasks within each of these five areas are, etc.

I kind of like this idea of a national plan for language technology. I don’t know how a country like India implements this: perhaps each language has its own plan, or many of them are covered together in one. It would be good to know. What I found surprising in this paper is that they call it a five-year plan, but there is no timeline anywhere for any of these tasks!

Written on May 18, 2020