Neural networks for the treatment of flexible natural languages (Q3056361)

Modernu IKT risinājumu izveide nav iedomājama bez valodu tehnoloģijām, kuras nodrošina risinājumu pieejamību lietotāja dzimtajā valodā. META-NET veiktais pētījums (Rehm&Uszkoreit, 2013) norāda uz būtiskām atšķirībām valodas tehnoloģiju attīstības līmenī. Pasaules lielajām valodām tiek piešķirts vērtējums “no laba līdz izcilam”, bet valodām ar nelielu runātāju skaitu - “no vidēja līdz zemam”. Šīs plaisas mazināšana ir liels izaicinājums, mūsdienu valodu tehnoloģijās izmantotajām metodēm nepieciešami lieli datu apjumi, kuru īpaši trūkst mazajām fleksīvām valodām ar bagātu morfoloģiju. Viena no jaunākajām, daudzsološākajām tendencēm IKT risinājumu veidošanā ir neironu tīklu tehnoloģiju (NT) izmantošana. NT tiek veiksmīgi izmantoti tādām valodām kā angļu, franču, spāņu un ķīniešu valodām, taču dziļi pētījumi nelielām valodām ar nepietiekamiem valodas resursiem, piemēram, baltu, somu-ugru un slāvu valodām vēl nav veikti. Nepieciešamība mazināt plaisu valodu tehnoloģiju attīstībā un perspektīvā NT tehnoloģijas izmantošana paver lielu iespēju projekta komandai Rīgā, Latvijā. Šis projekts un tā sagaidāmie rezultāti Latvijai ļaus ieņemt vadošo pozīciju valodu tehnoloģiju izpētē un inovatīvu risinājumu radīšanā. Latvija jau šobrīd ir sasniegusi augstu līmeni šajā jomā, to nodrošinājusi cieša akadēmiskās vides un komerciālo struktūru sadarbība. 
Latvija līdzdarbojas pētniecībā ar citām Eiropas nācijām un ir piemērota valodu tehnoloģiju izpētes un inovāciju reģionālā centra izveidei.Projektā tiks pētīti, analizēti un novērtēti (gan ar automātiskām metodēm, gan manuāli) dažādi NT algoritmi, lai atrastu piemērotākos risinājumus vairākiem nozīmīgiem valodas tehnoloģiju uzdevumiem:- rakstītā teksta apstrādē – tekstu sintaktiskai analīzei, gramatiskuma novērtēšanai un gramatisko kļūdu labošanai (šos komponentus varēs izmantot arī automatizētas tulkošanas, runātās valodas un cilvēka-datora saziņas problēmu risināšanā); - runas tehnoloģijās – efektīvā runas atpazīšanā un runas sintēzē, jaunu datu ieguvē un sagatavošanā, ja pieejamais datu apjoms ir neliels;- automatizētā tulkošanā – labākā dažādu teksta vienību tulkojumu izveidē; automatizētā vārdnīcu izveidē;- cilvēka-datora saziņas modelēšanā – dabiskās valodas sapratnes, dialoga pārvaldības un dabiskās valodas ģenerēšanas uzdevumos. Tiks pētīti arī dialoga modelēšanas, zināšanu izguves un pārvaldības kognitīvie aspekti un to modelēšana neironu tīklos. Tiks pētītas dažādas NT arhitektūras, tostarp dziļie rekursīvie (angliski: deep recurrent), dziļie vienvirziena (angliski: feed-forward) NT, un efektīvi NT apmācības algoritmi, lai risinātu datu skrajuma, morfoloģiskās daudzveidības un resursu nepietiekamības problēmas. Tiks veikta kognitīvo procesu – valodas, vizuālās uztveres, semantiskās apstrādes – empīriska (korelatīva, eksperimentāla) izpēte. Iegūtās zināšanas tiks izmantotas inovatīvu fleksīvo valodu apstrādes metožu radīšanā. Pētījuma teorētiskie rezultāti tiks apkopoti vismaz 15 Web of Science vai SCOPUS datubāzēs iekļauto žurnālu vai konferenču rakstu krājumu publikācijās.Pētījuma teorētiskie rezultāti tiks pārbaudīti laboratoriskos prototipos (TRL 4), kā arī apvienoti prototipā cilvēka-datora saziņai (TRL 5). 
Veiksmīgākie risinājumi tiks publicēti kā atvērtā koda risinājumi.Plānoto pētījumu rezultātiem ir liels komercializācijas potenciāls, tāpēc pēc projekta beigām turpināsies to attīstīšana un stabilizēšana, lai izveidotu inovatīvus programmproduktus Latvijas un Eiropas tirgum.Projekts ir ar saimniecisko darbību saistīts projekts, tas tiks īstenots sadarbībā starp komersantu (Tilde) un pētniecības institūciju (Latvijas Universitāte) un ilgs no 2016. gada 1. novembra līdz 2019. gada 31. oktobrim. Projekta kopējas izmaksas ir 690 672,13 EUR, t.sk., ERAF atbalsts 484 384,04 EUR. Projekta pētniecības kategorija - eksperimentālā izstrāde un rūpnieciskie pētījumi. Projekts pēc Frascati rokasgrāmatas atbilst IKT zinātnes nozarei “1.2 Computer and information sciences” (Latvian)

0 references

The development of modern ICT solutions is inconceivable without language technologies, which make those solutions available in the user’s native language. The META-NET study (Rehm & Uszkoreit, 2013) points to significant differences in the level of development of language technologies. The world’s major languages are rated “good to excellent”, while languages with a small number of speakers are rated “medium to low”. Narrowing this gap is a major challenge, because the methods used in modern language technologies require large volumes of data, which are in particularly short supply for small inflected languages with rich morphology. One of the newest and most promising trends in building ICT solutions is the use of neural network (NT) technologies. NTs are used successfully for languages such as English, French, Spanish and Chinese, but in-depth studies have not yet been carried out for smaller languages with insufficient language resources, such as the Baltic, Finno-Ugric and Slavic languages. The need to narrow the gap in language technology development, together with the promise of NT technology, opens up a great opportunity for the project team in Riga, Latvia. This project and its expected results will allow Latvia to take a leading position in language technology research and in the creation of innovative solutions. Latvia has already reached a high level in this field, thanks to close cooperation between academia and commercial organisations. 
Latvia collaborates in research with other European nations and is well suited to host a regional centre for language technology research and innovation. The project will study, analyse and evaluate (both with automatic methods and manually) various NT algorithms in order to find the most suitable solutions for several important language technology tasks: in written text processing, the syntactic analysis of texts, grammaticality assessment and the correction of grammatical errors (these components can also be used to solve problems in machine translation, spoken language and human-computer communication); in speech technologies, efficient speech recognition and speech synthesis, and the acquisition and preparation of new data when the amount of available data is small; in machine translation, the production of better translations of various text units and the automated creation of dictionaries; in human-computer communication modelling, the tasks of natural language understanding, dialogue management and natural language generation. The cognitive aspects of dialogue modelling, knowledge retrieval and knowledge management, and their modelling in neural networks, will also be studied. Various NT architectures will be studied, including deep recurrent and deep feed-forward NTs, along with efficient NT training algorithms, in order to address the problems of data sparsity, morphological richness and resource scarcity. An empirical (correlational, experimental) study of cognitive processes (language, visual perception, semantic processing) will be carried out. The knowledge gained will be used to create innovative methods for processing inflected languages. The theoretical results of the research will be summarised in at least 15 publications in journals or conference proceedings indexed in the Web of Science or SCOPUS databases. The theoretical results will be validated in laboratory prototypes (TRL 4) and also combined in a prototype for human-computer communication (TRL 5). 
The most successful solutions will be published as open-source solutions. The results of the planned research have great commercialisation potential, so after the end of the project their development and stabilisation will continue in order to create innovative software products for the Latvian and European markets. The project is related to economic activity; it will be implemented in cooperation between a commercial company (Tilde) and a research institution (University of Latvia) and will run from 1 November 2016 to 31 October 2019. The total cost of the project is EUR 690,672.13, including ERDF support of EUR 484,384.04. The research category of the project is experimental development and industrial research. According to the Frascati Manual, the project falls under the ICT field of science “1.2 Computer and information sciences” (English)

point in time

15 July 2021

0 references

Le développement de solutions TIC modernes est inconcevable sans les technologies linguistiques qui rendent les solutions disponibles dans la langue maternelle de l’utilisateur. L’étude méta-NET (Rehm&Uszkoreit, 2013) fait état de différences significatives dans le niveau de développement des technologies linguistiques. Les principales langues du monde sont classées «de bonne à excellente» et «moyenne à faible» pour les langues avec un petit nombre de locuteurs. Combler cette lacune est un défi majeur, les méthodes utilisées dans les technologies du langage moderne exigent de grandes quantités de données, qui font particulièrement défaut dans les petites langues flexibles avec une morphologie riche. L’une des tendances les plus récentes et les plus prometteuses en matière de solutions TIC est l’utilisation des technologies de réseau neuronal (NT). Le NT est utilisé avec succès pour des langues telles que l’anglais, le français, l’espagnol et le chinois, mais des recherches approfondies sur les petites langues dont les ressources linguistiques sont insuffisantes, telles que le blanc, le finnois et le slave, n’ont pas encore été menées. La nécessité de combler l’écart dans le développement des technologies linguistiques et l’utilisation future de la technologie NT ouvre une excellente occasion à l’équipe de projet à Riga, en Lettonie. Ce projet et les résultats escomptés permettront à la Lettonie de prendre une position de premier plan dans la recherche des technologies linguistiques et la création de solutions innovantes. La Lettonie a déjà atteint un niveau élevé dans ce domaine, grâce à une coopération étroite entre les universités et les organismes commerciaux. La Lettonie participe à la recherche avec d’autres pays européens et convient à la création d’un centre régional de recherche et d’innovation dans le domaine des technologies linguistiques. 
Le projet explorera, analysera et évaluera divers algorithmes NT (à la fois automatiques et manuellement) afin de trouver les solutions les plus appropriées à un certain nombre de défis importants en matière de technologie linguistique:- traitement de texte écrit — analyse syntaxique des textes, évaluation grammaticale et correction des erreurs grammaticales (ces composants peuvent également être utilisés pour résoudre les problèmes de traduction automatisée, de langue orale et de communication entre l’homme et l’ordinateur); — dans les technologies de la parole — reconnaissance vocale efficace et synthèse vocale, l’acquisition et la production de nouvelles données lorsque la quantité de données disponibles est faible;- dans la traduction automatisée, la meilleure traduction de différentes unités de texte; création automatisée de vocabulaire;- modélisation de la communication homme-ordinateur — compréhension du langage naturel, gestion du dialogue et tâches de génération de langage naturel. Les aspects cognitifs de la modélisation du dialogue, de l’extraction et de la gestion des connaissances ainsi que de leur modélisation dans les réseaux neuronaux seront également étudiés. Diverses architectures NT, y compris les architectures profondes récursives, seront explorées (en anglais: profonde récurrente), profonde à sens unique (en anglais: en amont) NT, et des algorithmes de formation NT efficaces pour traiter de la numérisation des données, de la diversité morphologique et de la rareté des ressources. Les processus cognitifs — langage, perception visuelle, traitement sémantique — empiriques (correlatifs, expérimentaux) seront étudiés. Les connaissances acquises seront utilisées pour créer des méthodes de traitement linguistique flexibles et innovantes. Les résultats théoriques de l’étude seront recueillis dans au moins 15 publications de revues ou d’articles de conférence inclus dans le Web of Science ou les bases de données SCOPUS. 
Les résultats théoriques de l’étude seront testés dans des prototypes de laboratoire (TRL 4), ainsi que combinés pour le prototypage de la communication homme-ordinateur (TRL 5). Les solutions les plus réussies seront publiées comme des solutions open source. Les résultats de la recherche planifiée ont un grand potentiel de commercialisation, donc après la fin du projet, ils continueront à être développés et stabilisés afin de créer des produits logiciels innovants pour le marché letton et européen. Le projet est un projet lié à l’activité économique, il sera mis en œuvre en coopération entre le commerçant (Tilde) et l’institution de recherche (Université de Lettonie) et durera du 1er novembre 2016 au 31 octobre 2019. Le coût total du projet s’élève à 690 672,13 EUR, y compris le soutien du FEDER 484 384,04 EUR, catégorie de recherche de projet — développement expérimental et recherche industrielle. Le projet du Manuel Frascati correspond au secteur des TIC «1.2 Informatique et sciences de l’information» (French)

point in time

25 November 2021

0 references

Die Entwicklung moderner IKT-Lösungen ist ohne Sprachtechnologien unvorstellbar, die Lösungen in der Muttersprache des Nutzers zur Verfügung stellen. Die Meta-NET-Studie (Rehm&Uszkoreit, 2013) deutet auf erhebliche Unterschiede im Niveau der Sprachtechnologieentwicklung hin. Die wichtigsten Sprachen der Welt werden als „von gut bis hervorragend“ und „mittel bis niedrig“ für Sprachen mit einer kleinen Anzahl von Sprechern bewertet. Diese Lücke zu überbrücken ist eine große Herausforderung, denn die Methoden der modernen Sprachtechnologien erfordern große Datenmengen, die in den kleinen, flexiblen Sprachen mit reicher Morphologie besonders fehlen. Einer der neuesten und vielversprechendsten Trends bei IKT-Lösungen ist der Einsatz neuronaler Netzwerktechnologien (NT). NT wird erfolgreich für Sprachen wie Englisch, Französisch, Spanisch und Chinesisch verwendet, aber es wurde noch keine gründliche Erforschung kleiner Sprachen mit unzureichenden Sprachressourcen wie weißer, finnischer und slawischer Sprache durchgeführt. Die Notwendigkeit, die Lücke bei der Entwicklung von Sprachtechnologien und dem künftigen Einsatz von NT-Technologien zu überbrücken, eröffnet dem Projektteam in Riga (Lettland) eine große Chance. Dieses Projekt und die erwarteten Ergebnisse werden es Lettland ermöglichen, eine führende Position bei der Erforschung von Sprachtechnologien und der Entwicklung innovativer Lösungen zu übernehmen. Lettland hat in diesem Bereich bereits ein hohes Niveau erreicht, dank der engen Zusammenarbeit zwischen Wissenschaft und Wirtschaft. Lettland beteiligt sich an der Forschung mit anderen europäischen Nationen und eignet sich für die Schaffung eines regionalen Zentrums für Sprachtechnologieforschung und -innovation. 
Das Projekt wird verschiedene NT-Algorithmen (sowohl automatische Methoden als auch manuell) untersuchen, analysieren und bewerten, um die am besten geeigneten Lösungen für eine Reihe wichtiger sprachlicher technologischer Herausforderungen zu finden:- schriftliche Textverarbeitung – syntaktische Analyse von Texten, grammatikalische Auswertung und Korrektur von grammatikalischen Fehlern (diese Komponenten können auch zur Lösung von Problemen der automatisierten Übersetzung, gesprochener Sprache und Mensch-Computer-Kommunikation verwendet werden); — in Sprachtechnologien – effektive Spracherkennung und Sprachsynthese, die Erfassung und Erstellung neuer Daten, bei denen die verfügbare Datenmenge gering ist;- in automatisierter Übersetzung die beste Übersetzung verschiedener Texteinheiten; automatisierte Vokabular-Erstellung;- Human-Computer-Kommunikationsmodellierung – natürliches Sprachverständnis, Dialogmanagement und natürliche Sprachgenerierung. Auch kognitive Aspekte der Dialogmodellierung, des Wissensabrufs und des Managements und deren Modellierung in neuronalen Netzwerken werden untersucht. Verschiedene NT-Architekturen, darunter tiefe rekursive, werden erkundet (auf Englisch: Tiefwiederholung), tiefe Einbahn (auf Englisch: NT und effektive NT-Trainingsalgorithmen zur Datenabtastung, morphologischer Vielfalt und Ressourcenknappheit. Kognitive Prozesse – Sprache, visuelle Wahrnehmung, semantische Verarbeitung – empirisch (korrelativ, experimentell) werden untersucht. Das erworbene Wissen wird genutzt, um innovative flexible Methoden der Sprachverarbeitung zu schaffen. Die theoretischen Ergebnisse der Studie werden in mindestens 15 Publikationen von Zeitschriften oder Tagungsartikeln in Web of Science oder SCOPUS-Datenbanken gesammelt. Die theoretischen Ergebnisse der Studie werden in Laborprototypen (TRL 4) sowie kombiniert zur Prototypierung der Mensch-Computer-Kommunikation (TRL 5) getestet. 
Die erfolgreichsten Lösungen werden als Open-Source-Lösungen veröffentlicht. Die Ergebnisse der geplanten Forschung haben ein großes Potenzial für die Kommerzialisierung, so dass sie nach Ende des Projekts weiter entwickelt und stabilisiert werden, um innovative Softwareprodukte für den lettischen und europäischen Markt zu schaffen. Das Projekt ist ein Projekt im Zusammenhang mit der Wirtschaftstätigkeit, es wird in Zusammenarbeit zwischen dem Händler (Tilde) und Forschungsinstitution (Universität Lettland) umgesetzt und wird vom 1. November 2016 bis zum 31. Oktober 2019 laufen. Die Gesamtkosten des Projekts belaufen sich auf 690 672,13 EUR, einschließlich EFRE-Unterstützung 484 384,04 EUR Projektforschung – experimentelle Entwicklung und industrielle Forschung. Das Frascati Handbook-Projekt entspricht dem IKT-Wissenschaftssektor „1.2 Informatik und Informatik“ (German)

point in time

28 November 2021

0 references

location (string)

Vienības gatve 75, Rīga, LV-1004

0 references

Raiņa bulvāris 19, Rīga, LV-1050

0 references

Identifiers

Latvian Kohesio ID

1.1.1.1/16/A/215

0 references
