State-of-the-art statistical MT employed in more or less interactive settings generally lacks dynamic adaptation capabilities that allow it to learn from the user’s feedback. A human translator using MT in a CAT tool, on the other hand, would naturally want terminology and style to be used consistently throughout the text and in a way similar to his/her own, and would expect that an error, once corrected, does not occur again in the following text segments. In addition, such adaptations should happen in real time.
MateCat will provide methods for the automatic self-correction of MT that make use of the user’s implicit feedback. The segments of text that have already been post-edited by the user will be analysed and compared with the corresponding automatic translations produced by the MT system, in order to spot the errors together with their corrections, as well as the portions accepted by the translator. The MT models will be modified accordingly by penalizing the errors and reinforcing the corrections and the accepted portions, or, more drastically, by removing the source of the errors. Although the ad-hoc transformations could be similar to those for project adaptation (see above), the goal here is to make them very precise and consistent with the actual translator. Through this on-line adaptation, which is performed in real time and sentence by sentence, MT should translate the following segments more and more consistently with the previous ones from the point of view of the translator’s lexical and stylistic preferences.
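The comparison between an automatic translation and its post-edited version can be illustrated with a minimal sketch. It is not the MateCat method: the class name FeedbackScores, the reward and penalty values, and the choice of keying scores on target-side phrases only (a real system would more plausibly score source-target phrase pairs) are all assumptions made for illustration. The token alignment uses the standard Python difflib module.

```python
from collections import defaultdict
from difflib import SequenceMatcher


class FeedbackScores:
    """Toy score table keyed by target-side phrase (hypothetical structure)."""

    def __init__(self, reward=1.0, penalty=1.0):
        self.reward = reward      # assumed bonus for accepted or corrected text
        self.penalty = penalty    # assumed malus for text the translator changed
        self.scores = defaultdict(float)

    def update(self, mt_output, post_edit):
        """Compare one MT hypothesis with its post-edited version."""
        mt = mt_output.split()
        pe = post_edit.split()
        matcher = SequenceMatcher(a=mt, b=pe, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            mt_phrase = " ".join(mt[i1:i2])
            pe_phrase = " ".join(pe[j1:j2])
            if tag == "equal":
                # Portion accepted by the translator: reinforce it.
                self.scores[mt_phrase] += self.reward
            else:
                # Portion changed by the translator: penalize what the MT
                # produced and reinforce what the translator wrote instead.
                if mt_phrase:
                    self.scores[mt_phrase] -= self.penalty
                if pe_phrase:
                    self.scores[pe_phrase] += self.reward


scores = FeedbackScores()
scores.update("press the knob to start", "press the button to start")
print(dict(scores.scores))
```

Because the update touches only the phrases of a single segment, its cost grows with the segment length rather than with the size of the model, which is the property a sentence-by-sentence adaptation needs.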
MateCat will also focus on providing suggestions by MT which are consistent with respect not only to the already edited segments but also to the whole document. This context information will be embedded in the statistical models and will enable better disambiguation, for instance, between lexical alternatives. The context-based models will combine information about recurring terms and expressions extracted during the document analysis with the corresponding chosen and confirmed translations as soon as they become available. In particular, translation constraints related to inter-sentence and intra-sentence anaphoric expressions, to syntactic concordances, and to lexical coherence will be taken into account by means of specific statistical models.
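One simple way to picture how confirmed translations of recurring terms could help disambiguate between lexical alternatives is a document-level cache, sketched below. This is only an illustration under stated assumptions: the class DocumentTermCache, the bonus weight, and the additive scoring rule are hypothetical, and the sketch covers lexical coherence only, not anaphora or syntactic concordance.

```python
from collections import Counter, defaultdict


class DocumentTermCache:
    """Remembers which translation the user confirmed for each source term."""

    def __init__(self, bonus=0.5):
        self.bonus = bonus                      # assumed weight of document context
        self.confirmed = defaultdict(Counter)   # source term -> target term counts

    def confirm(self, source_term, target_term):
        """Record a translation chosen and confirmed by the translator."""
        self.confirmed[source_term][target_term] += 1

    def choose(self, source_term, candidates):
        """Pick a translation; candidates maps target term -> base model score."""
        seen = self.confirmed.get(source_term, Counter())
        total = sum(seen.values())

        def score(item):
            target, base = item
            # Share of earlier confirmations in this document that used `target`.
            share = seen[target] / total if total else 0.0
            return base + self.bonus * share

        return max(candidates.items(), key=score)[0]


cache = DocumentTermCache()
options = {"banca": 0.6, "riva": 0.4}   # made-up scores for the English "bank"
print(cache.choose("bank", options))    # no context yet: base scores decide
cache.confirm("bank", "riva")
print(cache.choose("bank", options))    # the confirmed choice is now preferred
```

The additive bonus keeps the base model in control when the document offers no evidence and lets confirmed choices take over as they accumulate.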
The core components of traditional MT systems, that is, the translation and the language models, are generally static: they never change after an initial training phase. This makes them unsuitable for a dynamic environment like the one that MateCat is designing for translators. In order to model the dynamic changes described in the two previous tasks, MateCat will develop innovative data structures that can be rapidly and effectively updated as soon as a new translation is supplied by the user, together with innovative, efficient algorithms for performing this adaptation so that the whole process takes place in real time and is transparent to the translator. Moreover, efficiency will be improved by taking advantage of multithreading on a single CPU, as well as distributed computing facilities running on private clusters or computer clouds.
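To make the contrast with a static model concrete, the following sketch shows a language model whose counts can be updated each time a new translation arrives. It is an assumption-laden toy, not the data structure MateCat will develop: the class IncrementalBigramLM, the bigram order, and the add-one smoothing are chosen only for brevity, and the lock merely hints at how concurrent threads could share one model.

```python
import threading
from collections import Counter


class IncrementalBigramLM:
    """Bigram model whose counts grow as confirmed translations arrive."""

    def __init__(self):
        self._lock = threading.Lock()   # guards updates from concurrent threads
        self.unigrams = Counter()       # counts of history tokens
        self.bigrams = Counter()        # counts of (previous, current) pairs
        self.vocab = set()

    def update(self, sentence):
        """Add one confirmed translation; cost is linear in its length."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        with self._lock:
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.unigrams[prev] += 1
                self.bigrams[(prev, cur)] += 1

    def prob(self, prev, cur):
        """Add-one smoothed estimate of P(cur | prev)."""
        with self._lock:
            vocab_size = max(len(self.vocab), 1)
            return (self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + vocab_size)


lm = IncrementalBigramLM()
lm.update("press the knob")
before = lm.prob("the", "button")
lm.update("press the button")           # the translator's confirmed segment
after = lm.prob("the", "button")
print(before, after)
```

No retraining step is involved: each update modifies only the counts touched by the new segment, so the model is ready for the next segment immediately.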