Term Extraction
We offer an automated terminology extraction service that uses a statistical method to produce bilingual term pairs from your TMX files. The quality of the extracted pairs depends on the number of entries in the TMX file: the more entries, the better the results.
As a result, you receive a TBX or CSV file containing the extracted terms. We can also deliver other formats if you prefer.
Our automatic term extraction tool creates term proposals, each with a quality score attached. A score of 1.0 indicates a very high probability that the two terms in a pair are real translations of each other.
We extract not only single-word terms but also multi-word phrases.
Our terminology extraction is very fast: under normal conditions we can deliver the extracted terminology within one day. If necessary, we can clean the extracted terms against known terms from your terminology management systems. This gives you a fast and simple way to get the most out of your translations. You speed up your terminology work and free yourself from routine work and the time-consuming search for terminology in your translations.
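The page does not say how extracted terms are cleaned against known terms. One plausible reading is that pairs already present in your term base are dropped, so that only new proposals remain for review. The following Python sketch illustrates that reading only; the function names `normalize` and `remove_known_pairs` and the sample data are hypothetical and not part of any Araya tool.

```python
def normalize(term):
    """Collapse whitespace and ignore case so trivial variants compare equal."""
    return " ".join(term.split()).casefold()


def remove_known_pairs(extracted, known):
    """Return the extracted (source, target) pairs that are not already known.

    Assumption: a pair counts as known only if both the source and the
    target term match an entry in the existing term base.
    """
    known_keys = {(normalize(src), normalize(tgt)) for src, tgt in known}
    return [
        (src, tgt)
        for src, tgt in extracted
        if (normalize(src), normalize(tgt)) not in known_keys
    ]


# Illustrative data: three extracted pairs, one of which is already known.
extracted = [
    ("Datei", "file"),
    ("Diplomarbeit", "thesis"),
    ("Ausgangssituation", "Initial situation"),
]
known = [("datei", "File")]

new_pairs = remove_known_pairs(extracted, known)
print(new_pairs)
```

Matching on the normalized pair rather than on the source term alone keeps proposals that suggest a different translation for a term you already have, which is often what a terminologist wants to see.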
Araya Bilingual Term Extraction Tool
New: You can now download a test version of the Araya Bilingual Term Extraction Tool. The main restriction compared to the full version is that only 20 term pairs will be extracted.
With the Araya bilingual term extraction tool you can run your own bilingual term extraction in a cost-effective way.
Using an easy-to-use graphical user interface, you can extract bilingual terminology from a TMX file and edit the term pairs found.
Where can you use the tool?
- Dictionary production - Generating new bilingual dictionary entries
- Terminology creation and checking - Is terminology used consistently in your TMX files?
- Terminology Mining ("Data Mining") - Analyse the contents of your translations
- and much more...
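As an illustration of the consistency-checking use case, the Python sketch below lists translation units in which a source term occurs but its expected translation does not. It is a minimal example built on assumptions: the sample TMX is simplified (a real header carries more attributes), the function name `find_inconsistent_units` is hypothetical, and the plain substring matching ignores inflection and word boundaries. It does not show how the Araya tool works internally.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified sample TMX for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="de"/>
  <body>
    <tu>
      <tuv xml:lang="de"><seg>Die Datei wurde gespeichert.</seg></tuv>
      <tuv xml:lang="en"><seg>The file was saved.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="de"><seg>Bitte die Datei öffnen.</seg></tuv>
      <tuv xml:lang="en"><seg>Please open the document.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def find_inconsistent_units(tmx_text, src_lang, tgt_lang, src_term, tgt_term):
    """Return (source, target) segments where src_term appears without tgt_term."""
    root = ET.fromstring(tmx_text)
    hits = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                segments[lang.lower()] = "".join(seg.itertext())
        source = segments.get(src_lang, "")
        target = segments.get(tgt_lang, "")
        if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
            hits.append((source, target))
    return hits


hits = find_inconsistent_units(SAMPLE_TMX, "de", "en", "Datei", "file")
for source, target in hits:
    print(source, "->", target)
```

In the sample, the second unit is reported because "Datei" is translated as "document" rather than "file".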
Watch video on bilingual term extraction!
You will find more information about this tool here:
Supported input formats
- TMX Files
- All openTMS data sources (XLIFF, ...)
Prices
See Prices.
Example of extracted translations
nr;score;status;term1.LangCode;term1.wordGroup;term1.wordGroupLen;term1.wFreq;term2.LangCode;term2.wordGroup;term2.wordGroupLen;term2.wFreq
0;1.0;approved;de;AG;1;7;en;AG;1;7
1;1.0;approved;de;ASP;1;5;en;ASP;1;5
2;1.0;approved;de;Ausgangssituation;1;10;en;Initial situation;2;10
3;1.0;approved;de;Circle;1;17;en;Circle;1;17
4;1.0;approved;de;Claudia;1;14;en;Claudia;1;14
5;1.0;approved;de;Crystal;1;4;en;Crystal;1;4
6;1.0;approved;de;Datei;1;4;en;file;1;4
7;1.0;approved;de;Diplomarbeit;1;4;en;thesis;1;4
8;1.0;unapproved;de;Donau;1;9;en;Donau;1;9
9;1.0;approved;de;Downloads;1;15;en;Downloads;1;15
10;1.0;approved;de;Dr;1;36;en;Dr;1;36
...
118;0.9;unapproved;de;Presse Tourismus;2;10;en;Press Tourism;2;8
119;0.9;unapproved;de;Relationship Management;2;5;en;Relationship Management;2;4
120;0.9;unapproved;de;Technologie GmbH;2;10;en;Technologie GmbH;2;8
121;0.8888888888888888;unapproved;de;etc;1;8;en;etc;1;10
122;0.8846153846153846;unapproved;de;info;1;10;en;info;1;13
123;0.8823529411764706;unapproved;de;Email;1;17;en;Email;1;13
124;0.875;unapproved;de;Projekte Tourismus;2;16;en;Projects Tourism;2;12
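The export above is a semicolon-separated file with a header row, so it can be read with standard CSV tooling. The following Python sketch loads a few rows copied from the example and keeps only pairs at or above a score threshold, optionally restricted to one status. The function names `load_term_pairs` and `select_pairs` and the threshold value are illustrative assumptions, not part of the tool.

```python
import csv
import io

# A few rows copied from the example export above.
SAMPLE = """nr;score;status;term1.LangCode;term1.wordGroup;term1.wordGroupLen;term1.wFreq;term2.LangCode;term2.wordGroup;term2.wordGroupLen;term2.wFreq
2;1.0;approved;de;Ausgangssituation;1;10;en;Initial situation;2;10
6;1.0;approved;de;Datei;1;4;en;file;1;4
8;1.0;unapproved;de;Donau;1;9;en;Donau;1;9
121;0.8888888888888888;unapproved;de;etc;1;8;en;etc;1;10
"""


def load_term_pairs(text):
    """Parse the semicolon-separated export into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    pairs = []
    for row in reader:
        row["score"] = float(row["score"])  # the score is stored as text
        pairs.append(row)
    return pairs


def select_pairs(pairs, min_score=0.9, status=None):
    """Keep pairs scoring at least min_score, optionally with a given status."""
    return [
        p for p in pairs
        if p["score"] >= min_score and (status is None or p["status"] == status)
    ]


pairs = load_term_pairs(SAMPLE)
for p in select_pairs(pairs, min_score=0.9, status="approved"):
    print(p["term1.wordGroup"], "->", p["term2.wordGroup"])
```

To work with a real export, pass the contents of the CSV file to `load_term_pairs` instead of `SAMPLE`.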
Tests using the Europarl Corpus
Bilingual terminology extraction has been tested, and continues to be tested, using the Europarl Corpus.
Tests using the Multilingual Translation Memory of the Acquis Communautaire: DGT-TM
Bilingual terminology extraction has been tested, and continues to be tested, using the Multilingual Translation Memory of the Acquis Communautaire: DGT-TM.
If you are interested in our terminology services, please send an email to info@heartsome.de