Batch Search and Replace and Selective Pretranslation in OmegaT

Update: Most of the post remains true, but make sure you download these scripts from the SF.net repository.

In this post I want to share three scripts that do an extended search and replace in an OmegaT project. The search and replace templates for each script are specified in external plain text files located in the project’s root folder, so the scripts can be used without modification for different projects with different sets of search and replace patterns: the user only needs to edit those plain text files. On top of text modification, the scripts can do simple math on what they find, which lets the user set up a per-project unit converter.
Each script should be accompanied by its own external file located in a subfolder named .ini in the project’s root (details under each script further on). The format of these files is the same for all three:


  • Only one empty line in the file — the very last one
  • Each line consists of three blocks:
    1. Search pattern (regex aware)
    2. Tab
    3. Replace pattern

So, if you need to replace “Владимир Владимирович” (taking into consideration the different cases of Russian nouns) with “the President of Russian Federation”, here’s what you need to specify in the substitution file (the search and replace patterns are separated by a single Tab):
Владимир\p{L}?+\sВладимирович\p{L}?+	the President of Russian Federation
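To make the file format more tangible, here is a minimal sketch of how one such line could be split at the Tab and applied to a piece of text. It is not taken from the actual scripts: the variable names and the sample sentence are made up for illustration, and the real scripts may read and apply the patterns differently.

```groovy
// One line of a substitution file: search pattern, Tab, replace pattern.
// (Hypothetical in-memory copy; the real scripts read it from a file.)
def line = 'Владимир\\p{L}?+\\sВладимирович\\p{L}?+\tthe President of Russian Federation'

// Split at the first Tab only: left part is the regex, right part the replacement.
def (search, replace) = line.split('\t', 2)

// Sample text with the name in the genitive case (made up for this sketch).
def sample = 'Указ Владимира Владимировича опубликован.'
def result = sample.replaceAll(search, replace)

assert result == 'Указ the President of Russian Federation опубликован.'
```

The pattern allows one optional extra letter after each word, which is why the genitive endings in the sample are swallowed by the match.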


Stripping Tags Everywhere, Groovy Way

Every once in a while you have to deal with a match that has wrong tags. Hopefully, pretty soon OmegaT will be smart enough to deal with such matches for you, making it possible to insert a wrongly tagged match in such a way that you wouldn’t have to fix tags — they’ll get fixed on their own. But while we are not there yet, a practical workaround is to use the match tag-free and to insert proper tags wherever needed (OmegaT 3 lets you insert them one by one, and a new default shortcut for that is Ctrl+T).
In this post I share five Groovy scripts to strip tags in different situations (headings link to pastebin.com, files can be downloaded from there):

  • Replacing target with match
    /*
     * #Purpose: Replace current target with tag-free match 
     * #Details: http://wp.me/p3fHEs-4W
     * 
     * @author   Kos Ivantsov
     * @date     2013-06-26
     * @version  0.1
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    import org.omegat.core.Core;
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Replace with Match (no tags)'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    def match = Core.getMatcher()
    def near = match.getActiveMatch()
    if (near != null) {
      def matchtranslation = "$near.translation"
      matchtranslation = matchtranslation.replaceAll(/<\/?[a-z]+[0-9]* ?\/?>/, '')
      editor.replaceEditText(matchtranslation);
    }
    
  • Inserting match
    /*
     * #Purpose: Insert tag-free match into current target 
     * #Details: http://wp.me/p3fHEs-4W
     * 
     * @author   Kos Ivantsov
     * @date     2013-06-26
     * @version  0.1
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    import org.omegat.core.Core;
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Insert Match (no tags)'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    def match = Core.getMatcher()
    def near = match.getActiveMatch()
    if (near != null) {
      def matchtranslation = "$near.translation"
      matchtranslation = matchtranslation.replaceAll(/<\/?[a-z]+[0-9]* ?\/?>/, '')
      editor.insertText(matchtranslation)
    }
    
  • Replacing target with source
    /*
     * #Purpose: Replace current target with tag-free source 
     * #Details: http://wp.me/p3fHEs-4W
     * 
     * @author   Kos Ivantsov
     * @date     2013-06-26
     * @version  0.1
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Replace with Source (no tags)'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    def stext = editor.currentEntry.getSrcText().replaceAll(/<\/?[a-z]+[0-9]* ?\/?>/, '')
    editor.replaceEditText(stext)
    
  • Inserting source
    /*
     * #Purpose: Insert tag-free source into current target 
     * #Details: http://wp.me/p3fHEs-4W
     * 
     * @author   Kos Ivantsov
     * @date     2013-06-26
     * @version  0.1
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Insert source (no tags)'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    def stext = editor.currentEntry.getSrcText().replaceAll(/<\/?[a-z]+[0-9]* ?\/?>/, '')
    editor.insertText(stext)
    
  • Stripping tags in target
    /*
     * #Purpose: Remove tags in the current target 
     * #Details: http://wp.me/p3fHEs-4W
     * 
     * @author   Kos Ivantsov
     * @date     2013-06-26
     * @version  0.1
     */
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Strip tags in current segment'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    def target = editor.getCurrentTranslation()
    // only touch the segment if there is a translation to clean up
    if (target != null) {
      target = target.replaceAll(/<\/?[a-z]+[0-9]* ?\/?>/, '')
      editor.replaceEditText(target)
    }
    

There are plenty of other ways to remove tags in OmegaT, some of them even posted as my recipes, but the beauty of using Groovy is that the scripts can be run from within OmegaT, each with its own keyboard shortcut, without needing to assign an OS shortcut to an external script or application.
As usual, the inspiration for the scripts was an idea shared in the OmegaT Yahoo! Group.
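All five scripts rely on the same regular expression, so it may help to see what it does on its own, outside OmegaT. The snippet below is only an illustration: the sample strings are invented, and it uses plain strings instead of the OmegaT editor.

```groovy
// The same pattern the scripts use: an optional slash, lowercase letters,
// optional digits, an optional space and an optional closing slash.
def tagPattern = /<\/?[a-z]+[0-9]* ?\/?>/

// Paired and standalone OmegaT-style tags are removed.
def tagged = 'Click <f0>Save</f0> to continue.<br1/>'
def stripped = tagged.replaceAll(tagPattern, '')
assert stripped == 'Click Save to continue.'

// Anything that doesn't fit the pattern, such as uppercase tags, stays as it is.
def untouched = 'Press <B>Enter</B>'.replaceAll(tagPattern, '')
assert untouched == 'Press <B>Enter</B>'
```

If a project uses tags of a different shape, the pattern would need to be adjusted in each script.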


Good luck!

Substitute Template For Each Project

Update: Please download the scripts from the dedicated SF.net project page where they are maintained. Scripts at the links below might be obsolete (though most likely still working).

Here I have a script that reads a tab-separated file (any number of tabs between items). Each line contains the pattern to be found in the first position and its replacement in the second. This file MUST be named subst_template.txt (well, the name can be changed in the script, so maybe such a loud “must” isn’t really needed). The first pair should start on the first line, with no empty lines between the pairs, and after the final pair there should be exactly one empty line. Below you’ll find an example of such a file.
The file should be placed in the OmegaT project’s root. This is intentional, so that each project can have its own set of substitution patterns. For example, I had an English to Ukrainian Christian project where the names of the Bible books needed to be translated using one particular Ukrainian Bible version (Khomenko Bible), while for another project they needed to be taken from another version (Ohiyenko Bible). While the English abbreviations remained the same, the Ukrainian ones needed to be quite different (for instance, “Jn.” was “Йо.” in one, and “Ів.” in the other). So, with a separate substitute pattern file in each project, I could use just one script to get Bible references with proper abbreviations in each of them.
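The per-project idea can be sketched roughly like this. The code is not the actual script: it uses two temporary folders as stand-ins for project roots, and the helper names (makeProject, loadPairs, applyPairs) are invented for the illustration. Only the file name and the “Jn.” abbreviations come from the description above.

```groovy
import java.nio.file.Files

// Create a stand-in "project root" holding a one-line subst_template.txt.
def makeProject = { String abbreviation ->
    def root = Files.createTempDirectory('omegat-demo').toFile()
    // Search pattern, two Tabs (any number is allowed), replacement, final newline.
    new File(root, 'subst_template.txt').setText("Jn\\.\t\t${abbreviation}\n", 'UTF-8')
    root
}

// Read the search/replace pairs from the file in the given folder.
def loadPairs = { File root ->
    def pairs = [:]
    new File(root, 'subst_template.txt').eachLine('UTF-8') { line ->
        def parts = line.split(/\t+/, 2)   // split at the first run of Tabs
        if (parts.size() == 2) pairs[parts[0]] = parts[1]
    }
    pairs
}

// Apply every pair to the text, in file order.
def applyPairs = { String text, Map pairs ->
    pairs.inject(text) { acc, entry -> acc.replaceAll(entry.key, entry.value) }
}

def projectA = makeProject('Йо.')
def projectB = makeProject('Ів.')

// The same code produces different abbreviations depending on the project.
assert applyPairs('Jn. 3:16', loadPairs(projectA)) == 'Йо. 3:16'
assert applyPairs('Jn. 3:16', loadPairs(projectB)) == 'Ів. 3:16'
```

Since the search side is treated as a regular expression in this sketch, the dot in “Jn.” is escaped in the file.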