Merging and Splitting Segments in #OmegaT without editing segmentation rules.

One of the complaints about OmegaT is that you can’t split or merge segments without editing the project’s or the global segmentation rules. There were a few attempts to address the issue, but they required a third-party utility that would edit segmentation.conf. One of the most recent attempts was Dimitry Prihodko’s Merge utility. If I understood it right, Dimitry asked Yu Tang to rework his thingy, and Yu Tang came up with a Groovy script that did all the merging using only OmegaT internals. It wasn’t limited to any OS or dependent on other tools (so much for hard Pascal coding, Dimitry). The only minor issue was that the script couldn’t be used to split segments. And that’s what I’ve added and what I’m sharing here.

#ISO 9:1995 #Transliteration in #OmegaT

This short announcement might be of some interest to those OmegaT users who work with Cyrillic text. Below you’ll find a script that transliterates the current target or selection according to the ISO 9 transliteration standard (one of the very few reversible Cyrillic translit systems). The script is a tiny adaptation of the one discussed in the article Translit для JavaScript.

All you need to do is copy it into your scripts folder and run it whenever you need something transliterated (it can be run multiple times; each run toggles the text between Cyrillic and Latin). If the text can’t be transliterated, the script will leave it unchanged.
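
To make the toggling idea concrete, here is a minimal sketch that is separate from the actual script. The six-letter table and the names TABLE, makeReplacer, invert and toggle are made up for illustration, and it handles lowercase text only. It shows why a reversible table lets the same command go back and forth, and why text with nothing to transliterate comes back unchanged.

```javascript
// Illustrative mini-table only; the real script carries the full ISO 9 table.
var TABLE = { "ж": "zh", "ч": "ch", "ш": "sh", "у": "u", "к": "k", "а": "a" };

// Build a replacer that tries longer keys first, so "zh" is matched as a unit.
function makeReplacer(map) {
  var keys = Object.keys(map).sort(function (a, b) { return b.length - a.length; });
  var re = new RegExp(keys.join("|"), "g");
  return function (text) {
    return text.replace(re, function (m) { return map[m]; });
  };
}

// Swap keys and values to get the reverse (Latin to Cyrillic) table.
function invert(map) {
  var out = {};
  Object.keys(map).forEach(function (k) { out[map[k]] = k; });
  return out;
}

var toLatin = makeReplacer(TABLE);
var toCyrillic = makeReplacer(invert(TABLE));

// Cyrillic input goes to Latin; anything else is treated as Latin input.
function toggle(text) {
  return /[\u0400-\u04ff]/.test(text) ? toLatin(text) : toCyrillic(text);
}

console.log(toggle("жук"));  // zhuk
console.log(toggle("zhuk")); // жук
console.log(toggle("123"));  // 123 (nothing to transliterate)
```

Running toggle twice on the same text gets you back where you started, which is the whole point of a reversible system.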

Here’s the link to the script: http://pastebin.com/npXEthmc (download).

//:name=Utils - Translit :description=Transliterate current target or selection
/*******************************************************************************
* @Name   : "translit(a, b)"                         // Name
* @Params :   str  - string to transliterate         // Call parameters
              typ  - [123456]
                   system A = 1-diacritics
                   system B =(2-Belarus;3-Bulgaria;4-Macedonia;5-Russia;6-Ukraine)
                   A negative typ means reverse transliteration
* @Descrp : Forward and reverse transliteration      // Description
            per ISO 9, ISO 9:1995, or GOST 7.79-2000, systems A and B
* @ExtURL : ru.wikipedia.org/wiki/ISO_9              // External URL
* #Guid   : {E7088033-479F-47EF-A573-BBF3520F493C}   // GUID
* @Exampl : "example()"                              // Usage example
* GPL applies. No warranties XGuest[11.02.2015/03:44:01] translit [ver.1.0.1]
*******************************************************************************/
var dia = false;
//var loc = java.util.Locale.getDefault().getLanguage();
var prop = project.getProjectProperties();
var ste = editor.currentEntry;
if (editor.selectedText){
	var target = editor.selectedText;
	}else{
	var target = editor.getCurrentTranslation();
	}

var tlcode = prop.getTargetLanguage().getLanguageCode();
var suplang = ["BE", "BG", "MK", "RU", "UK"];

if ((/[\u0400-\u04ff]+/ig).test(target)){
	// indexOf returns -1 when the code is not in the list, so compare with >= 0
	transcode = suplang.indexOf(tlcode) >= 0 ? suplang.indexOf(tlcode) + 2 : 0 ;
	transcode = dia ? 1 : transcode ;
	}else{
	transcode = suplang.indexOf(tlcode) >= 0 ? -(suplang.indexOf(tlcode) + 2) : 0 ;
	transcode = dia ? -1 : transcode ;
	}

exports = function (str, typ) {
 var func = function (typ) {
 /* Function Expression
  * Helper function.
  *
  * This is the part that needed tidying up.
  *
  * Checks the direction of transliteration.
  * Pre-processes the string (GOST rules).
  * Returns an array of 2 functions:
  *  one that builds the transliteration tables,
  *  and one that post-processes the string (GOST rules).
  *
  * @param  {Number} typ
  * @return {Array}
  */
  var abs = Math.abs(typ);             // Absolute value of the transliteration code
  if(typ === abs) {                    // Forward transliteration (Cyrillic to Latin)
   // Transliteration rules (from GOST).
   // "i`" only before consonants in Old Russian and Old Bulgarian
   //  str = str.replace(/(i(?=.[^аеиоуъ\s]+))/ig, "$1`");
   str = str.replace(/(\u0456(?=.[^\u0430\u0435\u0438\u043E\u0443\u044A\s]+))/ig, "$1`");
   return [                            // Return the array of functions
    function (col, row) {              // Build the table and the RegExp
     var chr;                          // Character
     if(chr = col[0] || col[abs]) {    // If there is a character
      trantab[row] = chr;              // Add it to the mapping object
      regarr.push(row);                // Add it to the RegExp array
     }
    },
    // post-processing function
    function (str) {                   // str - the string being transliterated
    // Transliteration rules (from GOST).
    return str.replace(/i``/ig, "i`"). // "i`" only before consonants in Old Russian and Old Bulgarian
    replace(/((c)z)(?=[ieyj])/ig, "$2");// "cz" into the character "c"
    }];
  } else {                             // Reverse transliteration (Latin to Cyrillic)
   str = str.replace(/(c)(?=[ieyj])/ig, "$1z"); // The "cz" combination rule
   return [                            // Return the array of functions
    function (col, row) {              // Build the table and the RegExp
     var chr;                          // Character
     if(chr = col[0] || col[abs]) {    // If there is a character
      trantab[chr] = row;              // Add it to the mapping object
      regarr.push(chr);                // Add it to the RegExp array
     }
    },
   // post-processing function
   function (str) {return str;}];      // nop - empty function
  }
 }(typ);
 var iso9 = {                          // Object describing the standard
   // Key - Cyrillic letter
   //   0 - common to all systems
   //   1 - diacritics         4 - MK|MKD - Macedonia
   //   2 - BY|BLR - Belarus   5 - RU|RUS - Russia
   //   3 - BG|BGR - Bulgaria  6 - UA|UKR - Ukraine
   /*-Key---------0-,-------1--,---2-,---3-,---4-,----5-,---6-*/
 "\u0449": [   "", "\u015D",   "","sth",   "", "shh","shh"], // "щ"
 "\u044F": [   "", "\u00E2", "ya", "ya",   "",  "ya", "ya"], // "я"
 "\u0454": [   "", "\u00EA",   "",   "",   "",    "", "ye"], // "є"
 "\u0463": [   "", "\u011B",   "", "ye",   "",  "ye",   ""], //  yat
 "\u0456": [   "", "\u00EC",  "i", "i`",   "",  "i`",  "i"], // "і" iota
 "\u0457": [   "", "\u00EF",   "",   "",   "",    "", "yi"], // "ї"
 "\u0451": [   "", "\u00EB", "yo",   "",   "",  "yo",   ""], // "ё"
 "\u044E": [   "", "\u00FB", "yu", "yu",   "",  "yu", "yu"], // "ю"
 "\u0436": [ "zh", "\u017E"],                                // "ж"
 "\u0447": [ "ch", "\u010D"],                                // "ч"
 "\u0448": [ "sh", "\u0161"],                                // "ш"
 "\u0473": [   "","f\u0300",   "", "fh",   "",  "fh",   ""], //  fita
 "\u045F": [   "","d\u0302",   "",   "", "dh",    "",   ""], // "џ"
 "\u0491": [   "","g\u0300",   "",   "",   "",    "", "g`"], // "ґ"
 "\u0453": [   "", "\u01F5",   "",   "", "g`",    "",   ""], // "ѓ"
 "\u0455": [   "", "\u1E91",   "",   "", "z`",    "",   ""], // "ѕ"
 "\u045C": [   "", "\u1E31",   "",   "", "k`",    "",   ""], // "ќ"
 "\u0459": [   "","l\u0302",   "",   "", "l`",    "",   ""], // "љ"
 "\u045A": [   "","n\u0302",   "",   "", "n`",    "",   ""], // "њ"
 "\u044D": [   "", "\u00E8", "e`",   "",   "",  "e`",   ""], // "э"
 "\u044A": [   "", "\u02BA",   "", "a`",   "",  "``",   ""], // "ъ"
 "\u044B": [   "",      "y", "y`",   "",   "",  "y`",   ""], // "ы"
 "\u045E": [   "", "\u01D4", "u`",   "",   "",    "",   ""], // "ў"
 "\u046B": [   "", "\u01CE",   "", "o`",   "",    "",   ""], //  yus
 "\u0475": [   "", "\u1EF3",   "", "yh",   "",  "yh",   ""], //  izhitsa
 "\u0446": [ "cz",      "c"],                                // "ц"
 "\u0430": [  "a"],                                          // "а"
 "\u0431": [  "b"],                                          // "б"
 "\u0432": [  "v"],                                          // "в"
 "\u0433": [  "g"],                                          // "г"
 "\u0434": [  "d"],                                          // "д"
 "\u0435": [  "e"],                                          // "е"
 "\u0437": [  "z"],                                          // "з"
 "\u0438": [   "",      "i",   "",  "i",  "i",   "i", "y`"], // "и"
 "\u0439": [   "",      "j",  "j",  "j",   "",   "j",  "j"], // "й"
 "\u043A": [  "k"],                                          // "к"
 "\u043B": [  "l"],                                          // "л"
 "\u043C": [  "m"],                                          // "м"
 "\u043D": [  "n"],                                          // "н"
 "\u043E": [  "o"],                                          // "о"
 "\u043F": [  "p"],                                          // "п"
 "\u0440": [  "r"],                                          // "р"
 "\u0441": [  "s"],                                          // "с"
 "\u0442": [  "t"],                                          // "т"
 "\u0443": [  "u"],                                          // "у"
 "\u0444": [  "f"],                                          // "ф"
 "\u0445": [  "x",      "h"],                                // "х"
 "\u044C": [   "", "\u02B9",  "`",  "`",   "",   "`",  "`"], // "ь"
 "\u0458": [   "","j\u030C",   "",   "",  "j",    "",   ""], // "ј"
 "\u2019": [  "'", "\u02BC"],                                // "’"
 "\u2116": [  "#"]                                           // "№"
  }, regarr = [], trantab = {};
 for(var row in iso9) {func[0](iso9[row], row);} // Build the table and the RegExp array
 return func[1](                       // post-process the string (rules etc.)
  str.replace(                         // Transliteration
  new RegExp(regarr.join("|"), "gi"),  // Build the RegExp from the array
  function (R) {                       // RegExp callback function
   if(                                 // Handle the match with respect to case
    R.toLowerCase() === R) {
    return trantab[R];
   } else {
    return trantab[R.toLowerCase()].toUpperCase();
   }
  }));
};


if (! target){
	console.println("Target is empty");
	} else {
	var newtarget = exports(target, transcode)
	if (newtarget == target){
		console.println("Could not transliterate");
		}else{
		if (editor.selectedText){
			editor.insertText(newtarget);
			}else{
			editor.replaceEditText(newtarget);
			}
		console.clear();
		console.println(target + "\n↓\n" + newtarget);
		}
	}

Changing ‘false’ to ‘true’ in line 16 of the script (var dia = false;) will make the script use diacritics for transliteration.
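
For those curious how the direction and the system get picked, the setup part of the script boils down to something like the sketch below. The names pickTransCode and SUPPORTED are hypothetical (the script does this inline), and the explicit idx >= 0 check is my own way of writing the lookup. A positive code means Cyrillic to Latin, a negative one means Latin to Cyrillic, 1 is system A (diacritics), and 2 to 6 are the system B variants.

```javascript
// Hypothetical helper mirroring the script's setup section.
// Order matters: index 0..4 maps to system B codes 2..6.
var SUPPORTED = ["BE", "BG", "MK", "RU", "UK"];

function pickTransCode(text, langCode, useDiacritics) {
  var idx = SUPPORTED.indexOf(langCode);
  // System A (1) when diacritics are requested; otherwise the language's
  // system B code; otherwise 0 (only the letters common to all systems).
  var code = useDiacritics ? 1 : (idx >= 0 ? idx + 2 : 0);
  // Cyrillic text is transliterated forward, anything else in reverse.
  var isCyrillic = /[\u0400-\u04ff]/.test(text);
  return isCyrillic ? code : -code;
}

console.log(pickTransCode("текст", "RU", false)); // 5
console.log(pickTransCode("tekst", "RU", false)); // -5
console.log(pickTransCode("tekst", "UK", true));  // -1
```

The target language comes from the project properties, so a Russian project gets the Russian variant of system B without any extra setup.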

Major update to #OmegaT QA Script

Some time ago my monkey approach to programming led me to create a GUI for the QA rules checking script. That was fun, and the result was sometimes even usable, but since I don’t really know how to program, I got stuck developing it. OK, a rule or two got added now and then, but that doesn’t really count. Then, all of a sudden, the spellcheck script in OmegaT was drastically improved, and that meant I could mimic some new ideas. That’s exactly what I did, and here’s the new “QA – Check Rules” script:

Image


Installing and using #OmegaT scripts (Reblog/Translation)

I had been meaning to write a short article about OmegaT script basics for a long time but never found the time. This mishap got fixed without me, and out of gratitude I’m posting my translation of Gli script di OmegaT by LanguageLane.

“Filtered” Note Export in #OmegaT

This script is a variation of the one published before that exports all notes in the current project. The only difference is that this one lets you select which notes get exported based on the first line of the note. The resulting HTML table will consist of four columns: Source, Target, Filtered Notes (adjustable heading name), and Reply.
Say you want to export only the notes that start with <query>, because you’ve been using this word (<query>) to mark your questions to the client. To do so, go to line 14 and specify which mark-word was used. Note: the mark-word used to filter notes must be at the very beginning of the very first line of the note; otherwise the note will be ignored. In line 15 you can specify the column heading.
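
The filtering rule itself is simple enough to show in a few lines. This is only a sketch with made-up data: the entries array and the filterNotes function are hypothetical stand-ins, since the real script reads the notes from the open project.

```javascript
// Hypothetical in-memory entries; the real script takes them from the project.
var entries = [
  { source: "Hello", target: "Привет", note: "<query> Formal or informal?\nSee context." },
  { source: "World", target: "Мир", note: "Checked against the glossary." },
  { source: "Bye", target: "Пока", note: "See above.\n<query> not on the first line" }
];

// Keep only the entries whose note starts with the mark-word.
function filterNotes(list, mark) {
  return list.filter(function (e) {
    return typeof e.note === "string" && e.note.indexOf(mark) === 0;
  });
}

var kept = filterNotes(entries, "<query>");
console.log(kept.length);    // 1
console.log(kept[0].source); // Hello
```

The third entry is dropped because its mark-word sits on the second line of the note, which is exactly the case the note above warns about.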

All project notes


Only filtered notes



Export #OmegaT Project Notes

Here’s a new script that lets you export OmegaT project notes to an HTML table. It may help you discuss different translation issues with the client/editor/your spiritual guru, or review your own translation if you use notes for yourself.

When the script is invoked, it will create a file named PROJECTNAME_notes.html in /script_output subfolder of the current project root (the subfolder will be created if it doesn’t exist, and PROJECTNAME is the actual name, of course).
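
As a rough illustration of the naming scheme, the output path can be derived as in the sketch below. The function notesOutputPath is hypothetical, and it assumes that PROJECTNAME is simply the name of the project’s root folder.

```javascript
// Hypothetical helper: derive the export path from a project root folder.
// Assumption: the project name is the last segment of the root path.
function notesOutputPath(projectRoot) {
  // Drop trailing slashes or backslashes, then take the last path segment.
  var root = projectRoot.replace(/[\\\/]+$/, "");
  var name = root.split(/[\\\/]/).pop();
  return root + "/script_output/" + name + "_notes.html";
}

console.log(notesOutputPath("/home/user/MyProject/"));
// /home/user/MyProject/script_output/MyProject_notes.html
```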

Exported notes screenshot

XLIFF to TMX

One of the recent scripts published here let OmegaT users who wanted their project worked on in a different CAT tool export the whole OmegaT project to an XLIFF file. To get the completed work back into OmegaT, one had to run Okapi Rainbow to convert XLIFF to TMX, possibly using the Rainbow settings file created by the script.

In this post I’ll share how to convert those OmegaT-created XLIFF files, finished (or partly finished) in Trados/MemoQ/Deja Vu/WhatNotCAT, back to a TMX that can be used in OmegaT (all tags preserved, of course; that was the whole point), right from within OmegaT, without running Rainbow manually.

Locked OmegaT Team Project (SVN)

Situation

One of the translators involved in a team translation project that uses the very cool OmegaT Team project feature gets a message about the project being locked.


Problem

The translator’s connectivity is rather limited, and re-downloading the whole project would be very undesirable. There’s also no SVN software installed on her computer (other than OmegaT itself). Otherwise the quickest solution would be either getting OmegaT to download the project anew into an empty folder or running “svn cleanup” on the project’s folder with one of the available SVN tools. Bad luck this time.

Solution

Since OmegaT can act as an SVN client itself (that’s what it does when a Team project is loaded), and there’s cool scripting functionality, why not just use OmegaT to do the cleanup on any folder of our choice? So, here’s the script (the heading is also a link):

  • svn_cleanup_selected.groovy

    /*
     * Perform SVN cleanup on any local SVN repository
     *
     * @author	Yu Tang
     * @author	Kos Ivatsov
     * @date	2014-01-17
     * @version	0.2
     */
    
    import javax.swing.JFileChooser
    import org.omegat.core.team.SVNRemoteRepository
    import org.tmatesoft.svn.core.wc.*
    
    def folder
    
    if (project.isProjectLoaded()) {
    	def prop = project.getProjectProperties()
    	folder = new File(prop.getProjectRoot())
    	}else{
    	JFileChooser fc = new JFileChooser(
    		dialogTitle: "Choose SVN repository to perform cleanup",
    		fileSelectionMode: JFileChooser.DIRECTORIES_ONLY, 
    		multiSelectionEnabled: false
    		)
    	if(fc.showOpenDialog() != JFileChooser.APPROVE_OPTION) {
    		console.println "Canceled"
    		return
    		}
    	folder = new File(fc.getSelectedFile().toString())
    	}
    
    if (SVNRemoteRepository.isSVNDirectory(folder)) { 
    	def clientManager = SVNClientManager.newInstance()
    	clientManager.getWCClient().doCleanup(folder)
    	console.println("Cleanup done!!")
    	}
    return
    

    If that pesky message pops up, it should usually suffice to fire up the script with no project loaded, browse to the problematic project folder, and wait for the “Cleanup done!!” message in the Scripting console (if a project is open, the script cleans up that project instead, provided it is a Team project). After that, the project should open without problems.

    After the fix there might be a problem with sync, and OmegaT may throw a message like this:

    sync problem

    but that is usually easily fixed with Ctrl+S (Save) or F5 (Reload).

I wish you all a good and stable Internet connection, responsive and responsible team members, and a glitch-free workflow.

But as of now,

Good luck

UPDATE:

The script is now bundled with OmegaT, so you’ll get it along with the program. No need to download it from the links posted here.