Convert OmegaT project to XLIFF for other CAT tools

I’m back with another little script that might be pretty handy for those who need to work on the same material in different CAT tools, or for translation agencies who use OmegaT as their main CAT application but farm out the work to translators using their CAT tools of choice. As a matter of fact, the script was requested by translation agency Velior for this very reason.
When the script is invoked, it writes out a file named PROJECTNAME.xlf (where PROJECTNAME is the actual name of the project, not this loudly yelled word, of course) to the script_output subfolder of the current project. It exports both translated and untranslated segments: translated segments get the “final” state in the resulting XLF file, while untranslated segments have the source copied to the target and get the “needs-translation” state. OmegaT segmentation and tags are preserved. Tags are enveloped in <ph id="x"> and </ph>, so that other CAT tools treat them as tags.
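To make the export rules above a bit more concrete, here is a minimal standalone sketch of how one segment could be turned into an XLIFF trans-unit. It is not the actual script: the helper names (escapeXml, wrapTags, transUnit), the regex used to recognise OmegaT tags, and the sequential numbering of the ph ids are all assumptions made for illustration.

```groovy
// Hypothetical helpers; none of these names come from the original script.

// Escape the characters that are special in XML text content.
String escapeXml(String s) {
    s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
}

// Wrap OmegaT-style tags such as <f0>, </f0> or <br1/> in <ph> elements
// and escape everything else. The tag regex is an assumption.
String wrapTags(String segment) {
    def out = new StringBuilder()
    int last = 0
    int id = 0
    def m = segment =~ '</?[a-zA-Z]+[0-9]+/?>'
    while (m.find()) {
        out << escapeXml(segment.substring(last, m.start()))
        out << '<ph id="' << (++id) << '">' << escapeXml(m.group()) << '</ph>'
        last = m.end()
    }
    out << escapeXml(segment.substring(last))
    return out.toString()
}

// Build one <trans-unit>: translated segments get state "final"; untranslated
// ones get the source copied to the target and state "needs-translation".
String transUnit(int id, String source, String translation) {
    boolean translated = translation != null
    String target = translated ? translation : source
    String state = translated ? 'final' : 'needs-translation'
    return """<trans-unit id="${id}">
  <source>${wrapTags(source)}</source>
  <target state="${state}">${wrapTags(target)}</target>
</trans-unit>"""
}

// An untranslated segment: pass null as the translation.
println transUnit(1, 'Click <f0>OK</f0>', null)
```

In a real export these units would still need to be wrapped in the usual xliff, file and body elements.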

Batch Search and Replace and Selective Pretranslation in OmegaT

Update: Most of the post remains true, but make sure you download these scripts from the SF.net repository.

In this post I want to share three scripts that can do an extended search and replace in an OmegaT project. Search and replace templates for each script are specified in external plain text files located in the project’s root folder, so the scripts can be used without any modification for different projects with different sets of search and replace patterns; the user only needs to edit those plain text files. On top of text modification, the scripts can do simple math on what they find, which lets the user set up a per-project unit converter.
Each script should be accompanied by its own external file located in a subfolder named .ini in the project’s root (details under each script further on). The format of these files is the same for all three:


  • Only one empty line in the file — the very last one
  • Each line consists of three blocks:
    1. Search pattern (regex aware)
    2. Tab
    3. Replace pattern

So, if you need to replace “Владимир Владимирович” (taking into consideration different cases of Russian nouns) with “the President of Russian Federation”, here’s what you need to specify in the substitution file:
Владимир\p{L}?+\sВладимирович\p{L}?+	the President of Russian Federation
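As an illustration of the file format described above, here is a minimal standalone sketch that parses tab-separated search/replace lines and applies them to a piece of text. It is not the code of the actual scripts; the helper names parseRules and applyRules and the sample sentence are assumptions.

```groovy
import java.util.regex.Pattern

// Hypothetical helper: turn "search<TAB>replace" lines into [Pattern, String] pairs.
// Lines without a tab (such as the final empty line) are skipped.
List parseRules(String text) {
    return text.readLines()
        .findAll { it.contains('\t') }
        .collect { line ->
            def parts = line.split('\t', 2)
            [Pattern.compile(parts[0]), parts[1]]
        }
}

// Hypothetical helper: apply every rule, in file order, to one segment.
String applyRules(String segment, List rules) {
    return rules.inject(segment) { acc, rule -> rule[0].matcher(acc).replaceAll(rule[1]) }
}

// The substitution line from the example, written as a Groovy string literal.
def rules = parseRules('Владимир\\p{L}?+\\sВладимирович\\p{L}?+\tthe President of Russian Federation\n')
println applyRules('Слова Владимира Владимировича', rules)
```

Because the replacement goes through Java's replaceAll, group references such as $1 in the replace pattern would be expanded in this sketch.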

Writing Auxiliary Text Files from OmegaT

Update: Please, download scripts from the dedicated SF.net project page where they are maintained. Scripts at the links below might be obsolete (though most likely still working).

Here I’d like to share two Groovy scripts that don’t help with anything at hand in OmegaT, but write out external text files that can often be helpful in producing a better-quality translation.

The first script writes selected text to a file along with some context information. This can be helpful if you need to produce a list of unknown or unclear terms that need to be discussed with the client, or of things to be double-checked, studied, rewritten etc.

  • write_selection2list.groovy
    /*
     * #Purpose: Write selection to a file to create a list of terms
     * #Files:   Writes 'terms_list.txt' in the current project's root
     *     the file contains selection text, segment number, segment text
     *     and filename of the selection, if selection is in the current segment,
     *     or just the text of selection and the filename, if selection
     *     is outside the current segment.
     * #Note:    When invoked without selection, it opens the file
     *     in the default text editor
     * #Details: http://wp.me/p3fHEs-4L
     *
     * @author   Kos Ivantsov
     * @based on scripts by Yu Tang
     * @date     2013-06-25
     * @version  0.2
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Selection to List'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    // get segment #, source filename and the whole current segment
    def srcfile = editor.currentFile
    def ste = editor.currentEntry
    cur_text = ste.getSrcText()
    cur_seg = ste.entryNum()
    
    // define list file
    
    def folder = prop.projectRoot
    def fileloc = folder+'/terms_list.txt'
    list_file = new File(fileloc)
    
    // create file if it doesn't exist
    if (! list_file.exists()) {
      list_file.write("", 'UTF-8')
    }
    
    /* 
     * command to open the file if there's no active selection
     * if a custom (not OS default) text editor should be used,
     * it needs to be defined in the next line (edit as needed and uncomment)
     */
    
    // def textEditor = /path to your editor/
    def command
    switch (osType) {
      case [OsType.WIN64, OsType.WIN32]:
        command = "cmd /c start \"\" \"$list_file\""  // default
        try { command = textEditor instanceof List ? [*textEditor, list_file] : "\"$textEditor\" \"$list_file\"" } catch (ignore) {}
        break
      case [OsType.MAC64, OsType.MAC32]:
        command = ['open', list_file]  // default
        try { command = textEditor instanceof List ? [*textEditor, list_file] : ['open', '-a', textEditor, list_file] } catch (ignore) {}
        break
      default:  // for Linux or others
        command = ['xdg-open', list_file] // default
        try { command = textEditor instanceof List ? [*textEditor, list_file] : [textEditor, list_file] } catch (ignore) {}
        break
    }
    
    def sel_txt = editor.selectedText
    if (sel_txt) {
      list_file.append "${'='*10}\n $sel_txt\n", 'UTF-8'
      // plain substring test: the selection may contain regex metacharacters
      if (cur_text.contains(sel_txt)) {
        list_file.append "${'-'*5}\n\
    filename: $srcfile\n\
    segment: $cur_seg\n\
    segment text: $cur_text \n\n", 'UTF-8'
      } else {
        list_file.append "${'-'*5}\n\
    filename: $srcfile\n\
    ***Selection outside of current segment***\n", 'UTF-8'
      }
      console.println "\"$sel_txt\" written to $list_file"
    } else {
      console.println "[No selection]"
      console.println "***Opening the file in text editor***"
      console.println "Command: $command"
      command.execute()
      return // exit
    }
    

    The list is written to a file named terms_list.txt in the current OmegaT project folder. When the script is invoked with no selection, this file is opened in the default text editor, so that you can easily view or edit it. When it’s invoked with some text selected in the Editor pane, the selection is written to the file along with some context info, which depends on whether the selection is inside or outside the current segment.
    I’d like to write wider context, but I don’t know how to get the text of the previous and next segments without actually going there. Any help is welcome and appreciated, as usual.
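One possible direction for the wider context, sketched here on plain lists so that it runs outside OmegaT: if all segments are available as an ordered list and the segment number is 1-based, the neighbours can be picked by index. The helper name contextAround is hypothetical, and the idea that the list could come from project.allEntries is an assumption I have not tested inside OmegaT.

```groovy
// Hypothetical helper: given all segment texts in order and a 1-based segment
// number, return the previous, current and next texts (null at the edges).
Map contextAround(List segments, int entryNum) {
    int i = entryNum - 1
    return [
        previous: (i > 0 ? segments[i - 1] : null),
        current : segments[i],
        next    : (i < segments.size() - 1 ? segments[i + 1] : null)
    ]
}

// Inside OmegaT the list might be built with something like
//   project.allEntries.collect { it.srcText }
// and the number taken from editor.currentEntry.entryNum() (assumption).
def ctx = contextAround(['One.', 'Two.', 'Three.'], 2)
println "previous: ${ctx.previous} | next: ${ctx.next}"
```

Note that with a project-wide list the neighbouring segment may belong to a different source file, so a real script would probably want to check that too.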

The second script writes unique untranslated segments from the whole project into a text file named untranslated.txt. This file is located in the project’s root folder and is rewritten each time the script is invoked. Such a file can be used for a number of purposes, including producing a TMX with MT.

  • write_untranslated2file.groovy
    /*
     * #Purpose: Write all unique untranslated segments to a file
     * #Files:   Writes 'untranslated.txt' in the current project's root
     * #Details: http://wp.me/p3fHEs-4L
     *
     * @author   Kos Ivantsov
     * @based on scripts by Yu Tang
     * @date     2013-06-25
     * @version  0.2
     */
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
      final def title = 'Untranslated to File'
      final def msg   = 'Please try again after you open a project.'
      showMessageDialog null, msg, title, INFORMATION_MESSAGE
      return
    }
    
    def folder = prop.projectRoot
    def fileloc = folder+'/untranslated.txt'
    writefile = new File(fileloc)
    
    writefile.write("", 'UTF-8')
    def count = 0
    project.projectFiles.each {
      //console.println "\n${it.filePath}"
      it.entries
        .findAll { !project.getTranslationInfo(it).isTranslated() }
        .each { count++; writefile.append "${it.srcText}\n", 'UTF-8' }
    }
    
    console.println "\nUntranslated segments found: $count"
    // read the file back (as UTF-8) and keep only the first occurrence of each line
    def lines = writefile.readLines('UTF-8')
    uniqline = lines.unique()
    writefile.write("", 'UTF-8')
    uniqline.each {
      writefile.append "$it\n", 'UTF-8'
    }
    // size() is called explicitly: "$uniqline.size" would look up a 'size' property on each element
    console.println "Unique untranslated segments written to file:  ${uniqline.size()}"
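As one idea for simplifying the de-duplication step, the unique segments could be collected in memory and written once, instead of appending, reading back and rewriting. This is only a sketch on sample data; the helper name uniqueInOrder and the sample strings are made up for illustration.

```groovy
// Hypothetical helper: keep the first occurrence of each segment, in original order.
List uniqueInOrder(List segments) {
    return new ArrayList(new LinkedHashSet(segments))
}

def untranslated = ['Open file', 'Save', 'Open file', 'Close']
def unique = uniqueInOrder(untranslated)
println "Untranslated segments found: ${untranslated.size()}"
println "Unique untranslated segments: ${unique.size()}"

// A single write could then replace the append/read-back/rewrite cycle, e.g.
//   new File(folder, 'untranslated.txt').write(unique.join('\n') + '\n', 'UTF-8')
```

Working on the list rather than on the file's lines would also keep a segment that contains a line break from being split into several entries.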
    

If you have ideas on how to improve these, feel free to share.


UPDATE:

Here’s another script that writes all source segments to a file


But as of now,
Good luck!