Analysis Report

During creation of the project, when all files have been uploaded and all information has been correctly set on the MateCat home page, the Analyze button will be displayed below the box containing all the files. Click on the Analyze button to start the analysis.

The analysis page will open.

A progress bar shows how far the analysis of the files you uploaded for translation has advanced. Once the analysis is completed, the volume analysis page displays the Analysis report. It contains information about the number of words (Payable, Total, New, Repeated) for the entire project and for each file/job.

You can download a TXT file containing the details of MateCat's volume analysis. A link to download the report will appear on the analysis page as shown below.

[Screenshot: analysis page with the link to download the report]


Assume you obtain the following analysis result:

[Screenshot: example analysis result]

Payable words are marked in green and are calculated by multiplying the word count for each match type by its payable percentage, then adding up the results:

(12*0.6) + (265*0.3) + (18*0.8) = 7.2 + 79.5 + 14.4 = 101.1 ≈ 101

In detail, we have the indication of:

Payable Words

The payable word count is the sum, across all match types, of the word count for each match type multiplied by that type's payable rate.
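As a rough illustration of this calculation, the Python sketch below multiplies each match type's word count by a payable rate and sums the results. The match-type labels are hypothetical placeholders; the figures simply reuse the numbers from the worked example above and are not MateCat's actual configuration.

```python
def payable_words(counts, rates):
    """Sum the word count of each match type scaled by its payable rate."""
    return sum(words * rates[match_type] for match_type, words in counts.items())


# Hypothetical match-type labels; the numbers mirror the worked example above.
counts = {"type_a": 12, "type_b": 265, "type_c": 18}
rates = {"type_a": 0.6, "type_b": 0.3, "type_c": 0.8}

total = payable_words(counts, rates)
print(round(total, 1))  # 101.1
print(round(total))     # 101
```

Rounding is applied only when displaying the result, so the weighted contributions of the individual match types are not rounded one by one.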

Total Words

The total word count without leveraging any content from translation memory matches, repetitions or machine translation. This is similar to what Microsoft Word would give as a word count in a .doc or .docx file.

New Words

All words found in segments that:

  • have no fuzzy or exact match in the private TM key and/or in the public TM;
  • are not repeated in the project;
  • do not have a suggestion from a machine translation.

Repetitions

The number of words in identical segments that occur more than once throughout the project.

For example, imagine that we find the following segments in our translation:

  • My house is blue.
  • My house is blue.
  • My house is red.

Segments 1 and 2 are identical, so they count as 4 repeated words.

Segment 3 is counted as an internal match.
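The repetition count in this example can be reproduced with a minimal Python sketch. It assumes that only the second and later occurrences of a segment count as repetitions (which gives the 4 words above), that segments are compared as exact strings, and that words are separated by whitespace. The function name is hypothetical, and MateCat's own tokenisation may differ.

```python
from collections import Counter


def repeated_words(segments):
    """Count the words in segments that repeat an earlier, identical segment."""
    seen = Counter()
    total = 0
    for segment in segments:
        if seen[segment] > 0:
            # Assumption: second and later occurrences count as repetitions.
            total += len(segment.split())
        seen[segment] += 1
    return total


segments = ["My house is blue.", "My house is blue.", "My house is red."]
print(repeated_words(segments))  # 4
```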

Internal Matches

Internal matches are similar segments found in the document you are translating. For example, imagine that we find the following segments in our translation:

  • My house is blue.
  • My house is red.

MateCat recognises that segment 2 is similar to segment 1 (3 words out of 4 are identical) so the following would occur during translation:

  • You translate segment 1.
  • The translation memory is updated with this translation.
  • You open segment 2 for translation.
  • A search is performed in the TM for segment 2 and a 75% fuzzy match is found.

MateCat therefore counts four new words for segment 1 but, because a 75% fuzzy match has been found in the TM, the four words in segment 2 are counted as Internal Matches (in terms of weighted words, these are counted as 2.4, or 60% of 4).
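The steps above can be sketched in Python as follows. This is only an approximation: it uses the word-level ratio from the standard library's difflib as a stand-in for MateCat's fuzzy-matching algorithm, which is not described here, and the 0.75 threshold and function names are assumptions chosen for illustration. The 60% rate is the one quoted above.

```python
from difflib import SequenceMatcher


def word_similarity(a, b):
    """Word-level similarity ratio between two segments, between 0 and 1."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def internal_matches(segments, threshold=0.75):
    """Return (segment, score) pairs for segments similar to an earlier one."""
    translated = []  # stands in for the TM as it fills up during translation
    matches = []
    for segment in segments:
        best = max((word_similarity(segment, prev) for prev in translated), default=0.0)
        if threshold <= best < 1.0:  # similar, but not an exact repetition
            matches.append((segment, best))
        translated.append(segment)
    return matches


INTERNAL_MATCH_RATE = 0.6  # the 60% weighting quoted in the text above

for segment, score in internal_matches(["My house is blue.", "My house is red."]):
    words = len(segment.split())
    print(segment, f"{score:.0%}", round(words * INTERNAL_MATCH_RATE, 1))
# My house is red. 75% 2.4
```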

Partial TM

In this case, the similarity between the document to be translated and any correspondences found in the translation memory (fuzzy matches) is calculated.

For example, imagine that we have the following segments in an EN>FR translation:

  • My house is blue.
  • My house is red.

and that the TM contains this:

Source                Target
My house is blue.     Ma maison est bleue.

For segment 1 (My house is blue), there will be a 100% match in the translation memory. For segment 2 (My house is red), there will be a 75% fuzzy match: segment 2 and the segment found in the translation memory differ only in the colour of the house (blue/red, bleue/rouge), so in 1 word out of 4.

100% TM

This is a 100% match between a segment in the source language found in the document to translate and an identical sentence found in the source language in the private translation memory.

100% Public TM

This is a 100% match between a segment in the source language found in the document to translate and an identical sentence found in the source language in the public translation memory.

100% TM in Context

This is more than a 100% match.

If you have a 100% context match (the corresponding label is “101%”), this means that both of the following conditions are met:

  • the segment in the document has a 100% match in the translation memory;
  • the segment in the document and the segment in the translation memory are both preceded by the same segment.

For example, if you have to translate the following segments in the same order:

  • My house is blue.
  • My house is red.

and you have a 100% match in the translation memory for segment 2, you will receive a 100% context match (CM) if segment 2 is also preceded by segment 1 in the translation memory.

This means that you can be certain that the 100% match is correct according to the context of the document you are translating. 

Strictly speaking, this does not mean that, in the list of segments in the translation memory, the matching segment is physically preceded by the same segment as the one in the document. Rather, context information is stored in the translation memory as metadata, so each stored segment carries the following information (not visible to users in the translation memory contents):

  • the source segment;
  • the corresponding translation;
  • the segment preceding the segment itself.
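To make the idea of context metadata more concrete, here is a minimal Python sketch of a TM entry holding the three pieces of information listed above, together with a lookup that distinguishes an in-context match from a plain exact match. The class, field, and function names are hypothetical, the French translation is illustrative, and MateCat's internal storage format is not described in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TMEntry:
    source: str              # the source segment
    target: str              # the corresponding translation
    previous: Optional[str]  # the source segment that preceded it when stored


def match_label(segment, previous, tm):
    """Return "101%" for an in-context exact match, "100%" for an exact match, else None."""
    exact = [entry for entry in tm if entry.source == segment]
    if not exact:
        return None
    if any(entry.previous == previous for entry in exact):
        return "101%"
    return "100%"


tm = [TMEntry("My house is red.", "Ma maison est rouge.", previous="My house is blue.")]

print(match_label("My house is red.", "My house is blue.", tm))  # 101%
print(match_label("My house is red.", "My garden is big.", tm))  # 100%
```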

 

When a segment has an in-context exact match, the target segment has a green bar to its right. A lock icon also appears on the left; it enables translators to modify that segment.

Find out more on this topic in the specific section of the FAQ.