Neural networks for the treatment of flexible natural languages (Q3056361)



The development of modern ICT solutions is inconceivable without language technologies that make these solutions available in the user’s native language. The META-NET study (Rehm & Uszkoreit, 2013) points to significant differences in the level of development of language technologies across languages. The world’s major languages are rated “good to excellent”, while languages with a small number of speakers are rated “moderate to low”. Bridging this gap is a major challenge: the methods used in modern language technologies require large volumes of data, which are particularly scarce for small inflected languages with rich morphology. One of the newest and most promising trends in building ICT solutions is the use of neural network technologies (NT). NTs are used successfully for languages such as English, French, Spanish and Chinese, but in-depth studies have not yet been carried out for small languages with insufficient language resources, such as the Baltic, Finno-Ugric and Slavic languages. The need to narrow the gap in language technology development, together with the promising use of NT technology, opens a great opportunity for the project team in Riga, Latvia. This project and its expected results will allow Latvia to take a leading position in language technology research and in the creation of innovative solutions. Latvia has already reached a high level in this field, thanks to close cooperation between academia and commercial organisations.
Latvia participates in research with other European nations and is well suited to host a regional centre for language technology research and innovation. The project will explore, analyse and evaluate (both automatically and manually) various NT algorithms in order to find the most suitable solutions for several important language technology tasks:
- written text processing: syntactic analysis of texts, grammaticality assessment and grammatical error correction (these components will also be usable for problems in machine translation, spoken language and human-computer interaction);
- speech technologies: efficient speech recognition and speech synthesis, and the acquisition and preparation of new data when only a small amount of data is available;
- machine translation: producing better translations of various text units; automated dictionary creation;
- human-computer interaction modelling: natural language understanding, dialogue management and natural language generation tasks.
Cognitive aspects of dialogue modelling and of knowledge retrieval and management, and their modelling in neural networks, will also be studied. Various NT architectures will be investigated, including deep recurrent and deep feed-forward NTs, together with efficient NT training algorithms, to address the problems of data sparsity, morphological diversity and resource scarcity. An empirical (correlational and experimental) study of cognitive processes, namely language, visual perception and semantic processing, will be carried out. The knowledge gained will be used to create innovative methods for processing inflected languages. The theoretical results of the study will be published in at least 15 journal articles or conference papers indexed in the Web of Science or SCOPUS databases. These results will be validated in laboratory prototypes (TRL 4) and also integrated into a prototype for human-computer interaction (TRL 5).
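The description contrasts deep recurrent and deep feed-forward neural network architectures. As a rough illustration only (not the project’s models), the pure-Python sketch below shows the forward pass of each: a feed-forward network passes one fixed-size vector through stacked layers and keeps no state, while a deep recurrent network carries a hidden state across a token sequence, layer by layer. The function names and the toy weights in the usage note are hypothetical, and training is omitted entirely.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


def feed_forward(x: Vector, layers: List[Matrix]) -> Vector:
    """Deep feed-forward net: information flows strictly from input to
    output, each layer computing tanh(W h). Nothing is remembered between calls."""
    h = x
    for w in layers:
        h = [math.tanh(z) for z in matvec(w, h)]
    return h


def recurrent_layer(seq: List[Vector], w_in: Matrix, w_rec: Matrix) -> List[Vector]:
    """One Elman-style recurrent layer: h_t = tanh(W_in x_t + W_rec h_{t-1}).
    Returns the hidden state at every time step."""
    h = [0.0] * len(w_rec)
    states = []
    for x in seq:
        h = [math.tanh(z) for z in add(matvec(w_in, x), matvec(w_rec, h))]
        states.append(h)
    return states


def deep_recurrent(seq: List[Vector], layers: List[Tuple[Matrix, Matrix]]) -> Vector:
    """Deep recurrent net: each layer reads the hidden-state sequence of the
    layer below. Returns the top layer's final hidden state."""
    for w_in, w_rec in layers:
        seq = recurrent_layer(seq, w_in, w_rec)
    return seq[-1]
```

With toy weights such as an identity `w_in` and `w_rec = [[0.5, 0.0], [0.0, 0.5]]`, feeding the two-token sequence `[[1.0, 0.0], [0.0, 1.0]]` and its reverse to `deep_recurrent` gives different outputs, because the recurrent state makes the result depend on token order; `feed_forward` has no such notion of a sequence.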
The most successful solutions will be released as open source. The planned research results have great commercialisation potential, so after the project ends their development and stabilisation will continue in order to create innovative software products for the Latvian and European markets. The project is linked to economic activity and will be implemented in cooperation between a commercial company (Tilde) and a research institution (University of Latvia); it will run from November 1, 2016 to October 31, 2019. The total cost of the project is EUR 690,672.13, including ERDF support of EUR 484,384.04. Research category of the project: experimental development and industrial research. According to the Frascati Manual, the project belongs to the ICT field of science “1.2 Computer and information sciences”.
