Stripping Tags Everywhere, Groovy Way

Every once in a while you have to deal with a match that has wrong tags. Hopefully, pretty soon OmegaT will be smart enough to deal with such matches for you, making it possible to insert a wrongly tagged match in such a way that you wouldn’t have to fix tags — they’ll get fixed on their own. But while we are not there yet, a practical workaround is to use the match tag-free and to insert proper tags wherever needed (OmegaT 3 lets you insert them one by one, and a new default shortcut for that is Ctrl+T).
In this post I share 5 groovy scripts to strip tags in different situations (headings link to pastebin.com, files can be downloaded from there):

  • Replacing target with match
    /*
     * #Purpose: Replace current target with tag-free match 
     * #Details: http: // wp.me/p3fHEs-4W
     * 
     * @author   Kos Ivantsov
     * @date     2013-06-26
     * @version  0.1
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    import org.omegat.core.Core;
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Replace with Match (no tags)'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    def match = Core.getMatcher()
    def near = match.getActiveMatch()
    if (near != null) {
      def matchtranslation = "$near.translation"
      matchtranslation = matchtranslation.replaceAll(/<\/?[a-z]+[0-9]* ?\/?>/, '')
      editor.replaceEditText(matchtranslation);
    }
    
  • Inserting match
    /*
     * #Purpose: Insert tag-free match into current target 
     * #Details: http: // wp.me/p3fHEs-4W
     * 
     * @author   Kos Ivantsov
     * @date     2013-06-26
     * @version  0.1
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    import org.omegat.core.Core;
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Insert Match (no tags)'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    def match = Core.getMatcher()
    def near = match.getActiveMatch()
    if (near != null) {
      def matchtranslation = "$near.translation"
      matchtranslation = matchtranslation.replaceAll(/<\/?[a-z]+[0-9]* ?\/?>/, '')
      editor.insertText(matchtranslation)
    }
    
  • Replacing target with source
    /*
     * #Purpose: Replace current target with tag-free source 
     * #Details: http: // wp.me/p3fHEs-4W
     * 
     * @author   Kos Ivantsov
     * @date     2013-06-26
     * @version  0.1
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Replace with Source (no tags)'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    def stext = editor.currentEntry.getSrcText().replaceAll(/<\/?[a-z]+[0-9]* ?\/?>/, '')
    editor.replaceEditText(stext)
    
  • Inserting source
    /*
     * #Purpose: Insert tag-free source into current target 
     * #Details: http: // wp.me/p3fHEs-4W
     * 
     * @author   Kos Ivantsov
     * @date     2013-06-26
     * @version  0.1
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Insert source (no tags)'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    def stext = editor.currentEntry.getSrcText().replaceAll(/<\/?[a-z]+[0-9]* ?\/?>/, '')
    editor.insertText(stext)
    
  • Stripping tags in target
    /*
     * #Purpose: Remove tags in the current target 
     * #Details: http: // wp.me/p3fHEs-4W
     * 
     * @author   Kos Ivantsov
     * @date     2013-06-26
     * @version  0.1
     */
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Strip tags in current segment'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    target = editor.getCurrentTranslation()
    if (target != null) {
    target = target.replaceAll(/<\/?[a-z]+[0-9]* ?\/?>/, '')
    }
    editor.replaceEditText(target)
    

There are plenty of other ways to remove tags in OmegaT, some of them even posted as my recipes, but the beauty of using groovy is that scripts can be run from withing OmegaT, with its own keyboard shortcut, without needing to assign an OS shortcut to an external script/application.
As usual, inspiration for the scripts was an idea shared at OmegaT Yahoo! Group


Good luck!

Writing Auxilary Text Files from OmegaT

Update: Please, download scripts from the dedicated SF.net project page where they are maintained. Scripts at the links below might be obsolete (though most likely still working).

Here I’d like to share two Groovy scripts that don’t help with anything at hand in OmegaT, but write out external text files that can often be helpful in producing better quality translation.

The first script writes selected text to a file along with some context information. This can be helpful if you need to produce a list of unknown/unclear term that need to be discussed with the client, or things to be double-checked, studied, rewritten etc.

  • write_selection2list.groovy
    /*
     * #Purpose: Write selection to a file to create a list of terms
     * #Files:   Writes 'terms_list.txt' in the current project's root
     *     the file contains selection text, segment number, segment text
     *     and filename of the selection, if selection is in the current segment,
     *     or just the text of selection and the filename, if selection
     *     is outside the current segment.
     * #Note:    When invoked without selection, it opens the file
     *     in the default text editor
     * #Details: http : / / wp.me/p3fHEs-4L
     *
     * @author   Kos Ivantsov
     * @based on scripts by Yu Tang
     * @date     2013-06-25
     * @version  0.2
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Selection to List'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    // get segment #, source filename and the whole current segment
    def srcfile = editor.currentFile
    def ste = editor.currentEntry
    cur_text = ste.getSrcText()
    cur_seg = ste.entryNum()
    
    // define list file
    
    def folder = prop.projectRoot
    def fileloc = folder+'/terms_list.txt'
    list_file = new File(fileloc)
    
    // create file if it doesn't exist
    if (! list_file.exists()) {
    	list_file.write(&quot;&quot;,'UTF-8')
    	}
    
    /* 
     * command to open the file if there's no active selection
     * if a custom (not OS default) text editor should be used,
     * it needs to be defined in the next line (edit as needed and uncomment)
     */
    
    // def textEditor = /path to your editor/
    def command
    switch (osType) {
      case [OsType.WIN64, OsType.WIN32]:
        command = &quot;cmd /c start \&quot;\&quot; \&quot;$list_file\&quot;&quot;  // default
        try { command = textEditor instanceof List ? [*textEditor, list_file] : &quot;\&quot;$textEditor\&quot; \&quot;$list_file\&quot;&quot; } catch (ignore) {}
        break
      case [OsType.MAC64, OsType.MAC32]:
        command = ['open', list_file]  // default
        try { command = textEditor instanceof List ? [*textEditor, list_file] : ['open', '-a', textEditor, list_file] } catch (ignore) {}
        break
      default:  // for Linux or others
        command = ['xdg-open', list_file] // default
        try { command = textEditor instanceof List ? [*textEditor, list_file] : [textEditor, list_file] } catch (ignore) {}
        break
    }
    
    def sel_txt = editor.selectedText
    if (sel_txt) {
    	list_file.append &quot;${'='*10}\n $sel_txt\n&quot;,'UTF-8'
    	if (cur_text =~ sel_txt) {
    		list_file.append &quot;${'-'*5}\n\
    filename: $srcfile\n\
    segment: $cur_seg\n\
    segment text: $cur_text \n\n&quot;,'UTF-8'
    	}else{
    		list_file.append &quot;${'-'*5}\n\
    filename: $srcfile\n\
    ***Selection outside of current segment***\n&quot;,'UTF-8'
    	}
    	console.println &quot;\&quot;$sel_txt\&quot; written to $list_file&quot;	
    } else {
    console.println &quot;[No selection]&quot;
    console.println &quot;***Opening the file in text editor***&quot;
    console.println &quot;Command: $command&quot;
    command.execute()
    return // exit
    }
    

    The list is created in the current OmegaT project folder, file is named terms_list.txt. When the script is invoked with no selection, this file is opened in the default text editor — so that you can easily view or edit the file. When it’s invoked with some text selected in the Editor pane, the selection gets written to the file along with some context info depending on whether selection is inside or outside of the current segment.
    I’d like to write wider context, but I don’t know how to get text from previous and next segment without actually going there. Any help is welcome and appreciated, as usual.

The second script writes unique untranslated segments from the complete project into a text file named untranslated.txt. This files is located in the project’s root folder, and is rewritten each time the script is invoked. Such file can be used for a number of purposes, including producing TMX with MT.

  • write_untranslated2file.groovy
    /*
     * #Purpose: Write all unique untranslated segments to a file
     * #Files:   Writes 'untranslated.txt' in the current project's root
     * #Details: http : / / wp.me/p3fHEs-4L
     *
     * @author   Kos Ivantsov
     * @based on scripts by Yu Tang
     * @date     2013-06-25
     * @version  0.2
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Untranslated to File'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    def folder = prop.projectRoot
    def fileloc = folder+'/untranslated.txt'
    writefile = new File(fileloc)
    
    writefile.write(&quot;&quot;, 'UTF-8')
    def count = 0
    project.projectFiles
    .each {
    //console.println &quot;\n${it.filePath}&quot;
    it.entries
    .findAll {!project.getTranslationInfo(it).isTranslated()}
    .each {count++; writefile.append &quot;${it.srcText}\n&quot;,'UTF-8'}
    }
    
    console.println &quot;\nUntranslated segments found: $count&quot;
    count = 0 
    def lines = writefile.readLines()
    uniqline = lines.unique()
    writefile.write(&quot;&quot;,'UTF-8')
    uniqline.each {
    writefile.append &quot;$it\n&quot;,'UTF8';
    }
    console.println &quot;Unique untranslated segments written to file:  $uniqline.size&quot;
    

If you have ideas how to improve these, feel free to share.


UPDATE:

Here’s another script that writes all source segments to a file


But as of now,
Good luck!

Substitute Template For Each Project

Update: Please, download scripts from the dedicated SF.net project page where they are maintained. Scripts at the links below might be obsolete (though most likely still working).

Here I have a script that reads a tab-separated file (any number of tabs between items), each line of which contains the patterns to be found in the first position, and what it should be replaced with in the second. This file MUST be named subst_template.txt (well, it can be changed in the script, so maybe such a loud “must” isn’t really needed). The first pair should start on the first line, no empty lines between the pairs, and after the final pair there should be exactly one empty line. Below you’ll find an example of such file.
The file ought to be placed in OmegaT project’s root. That is made intentionally so that one can have a unique set of substitute patterns for each project. For example, I had an English to Ukrainian Christian project where names of the Bible books needed to be translated using one particular Ukrainian Bible version (Khomenko Bible), while for another project they needed to be taken from another version (Ohiyenko Bible). While English abbreviations remained the same, Ukrainian needed to be quite different (for instance, “Jn.” was “Йо.” in one, and “Ів.” in the other). So having a separate substitute pattern file in each projects I could use just one script to get Bible references with proper abbreviations in each of them. Continue reading

Simple QA with OmegaT GUI scripts

Here I want to share several groovy scripts that bring up a window showing a clickable segment # button to skip to the respective segment, and source and target segments that meet certain criteria. I’m not a scripting master, so I share them in hope that they can be helpful to some and improved by many (reverse the pronouns if needed).
Here’s but one screenshot to give you an idea what this is all about.
Window showing a simple QA check

Following is a list of scripts with links (click on titles) to download them, and short descriptions of what each one does.

Update: Please, download scripts from the dedicated SF.net project page where they are maintained. Scripts at the links below might be obsolete (though most likely still working).

  • Check QA rules script
  • This one is a simple QA check. Originally it was included in OmegaT scripts bundle, but I expanded it to show a window with clickable buttons. Currently checks for leading and trailing spaces, double words, and unproportionally longer or shorter target segments. If anyone knows how to add more rules to check, please share.
    Update: No need to download this script from here, it’s now included in the OmegaT bundle.

  • Show Source = Target
  • This one was originally included in the bundle as well, and I just added the GUI part. It brings up a window that lets you navigate through segments where target is the same as source.

  • Show Untranslated
  • This script brings up a window with all the untranslated segments, so you get only segment # buttons and source segments. Beside that, in the scripting console (lower part of the right side of the Scripting window) you get the count of all and unique untranslated segments. I don’t see a big practical value of this script, but a friend of mine and a fellow OmegaT user felt really blessed to get it. This one originally isn’t mine either, I used Yu Tang’s idea that he shared on OmegaT Yahoo! Group.