Export #OmegaT Project to Excel

This post is about a script that exports an OmegaT project to an XLS document, with a separate worksheet for each source file. Continue reading

“Filtered” Note Export in #OmegaT

This script is a variation of the one published before that exports all notes in the current project. The only difference is that this one lets you select which notes get exported, based on the first line of the note. The resulting HTML table will consist of four columns: Source, Target, Filtered Notes (adjustable heading name), and Reply.
Say you want to export only the notes that start with <query>, because you’ve been using this word (<query>) to mark your questions to the client. To do so, go to line 14 of the script and specify which mark-word was used. Note: The mark-word used to filter notes must be at the very beginning of the very first line of the note, otherwise the note will be ignored. In line 15 you can specify the column heading.
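
The filtering rule can be illustrated with a small standalone sketch. It is not the published script: the sample segments and the names segments, markWord and filtered are made up for the illustration, and no OmegaT objects are involved.

```groovy
// Hypothetical sample data: source, target and note text of four segments
def segments = [
    [source: 'Hello', target: 'Bonjour',   note: '<query> Formal or informal?\nSee page 2.'],
    [source: 'World', target: 'Monde',     note: 'Checked against the glossary.'],
    [source: 'Bye',   target: 'Au revoir', note: 'FYI\n<query> not on the first line'],
    [source: 'Yes',   target: 'Oui',       note: null]
]

def markWord = '<query>'   // assumed mark-word, as in the example above

// Keep a segment only if its note begins with the mark-word,
// i.e. the mark-word opens the very first line of the note
def filtered = segments.findAll { seg ->
    seg.note && seg.note.startsWith(markWord)
}

// Print source, target and the first line of each matching note
filtered.each { println "${it.source} | ${it.target} | ${it.note.readLines()[0]}" }
```

Only the first sample segment passes: the second has no mark-word, the third has it on the second line, and the fourth has no note at all.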

All project notes

Only filtered notes

Continue reading

Export OmegaT Project to HTML table

Here’s a script that lets you export your whole OmegaT project into an HTML file with one or more tables, one for each source file. The left column will have source segments, and the right one will be either blank if the segment isn’t translated, or populated with the translation (or with an empty-translation marker, if the translation was set to be empty). Each table will have the source file name as its heading. The script was requested and kindly sponsored by Roman Mironov at Translation Agency Velior. As usual, the heading of the listing below is a link to pastebin.com, where you can download this script. Continue reading
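
To give an idea of the layout described above, here is a rough standalone sketch that builds one two-column table per file. It is not the sponsored script: the sample data, the esc helper and the use of null for an untranslated segment are assumptions made for the illustration.

```groovy
// Escape the characters that are special in HTML text
def esc = { String s ->
    s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
}

// Hypothetical sample: file name -> list of [source, translation] pairs;
// null stands for a segment that isn't translated yet
def files = [
    'intro.txt': [['Hello', 'Bonjour'], ['A < B', null]]
]

def html = new StringBuilder('<html><body>\n')
files.each { name, rows ->
    html << "<table border=\"1\">\n"
    // the source file name serves as the table heading
    html << "  <tr><th colspan=\"2\">${esc(name)}</th></tr>\n"
    rows.each { row ->
        def src = row[0]
        def tgt = row[1]
        // left cell: source; right cell: translation, or blank if untranslated
        html << "  <tr><td>${esc(src)}</td><td>${tgt == null ? '' : esc(tgt)}</td></tr>\n"
    }
    html << "</table>\n"
}
html << '</body></html>\n'
println html
```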

Export TMX for selected files

OmegaT exports TMX for the files currently in the project every time translated documents are created. It writes three TMX files in the root of the project. But what if you need a translation memory file that contains translation units of only one or several files, not all that are currently present in the project? One solution is to temporarily move the unneeded files out of the source folder, reload the project and then create translated documents. But that is somewhat awkward and time-consuming.
Here’s a Groovy script that lets you select one or several files located in the same subfolder of the current project’s /source. Once they are selected, the script writes selected_files_[date_time].tmx in the project root. This TMX file contains TUs only for the selected files.

  • write_sel_files2TMX.groovy
    /*
     * Purpose:	Export source and translation segments of user selected 
     *	files into TMX-file
     * #Files:	Writes 'selected_files_<date_time>.tmx' in the current project's root
     * #File format:	TMX v.1.4
     * #Details:	http://wp.me/p3fHEs-6g
     *
     * @author  Kos Ivantsov
     * @date    2013-08-12
     * @version 0.3
     */
    
    import javax.swing.JFileChooser
    import org.omegat.util.StaticUtils
    import org.omegat.util.TMXReader
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    def prop = project.projectProperties
    if (!prop) {
    	final def title = 'Export TMX from selected files'
    	final def msg   = 'Please try again after you open a project.'
    	showMessageDialog null, msg, title, INFORMATION_MESSAGE
    	return
    }
    
    def curtime = new Date().format("MMM-dd-yyyy_HH.mm")
    srcroot = new File(prop.getSourceRoot())
    def fileloc = prop.projectRoot+'selected_files_'+curtime+'.tmx'
    exportfile = new File(fileloc)
    def sourceroot = prop.getSourceRoot().toString() as String
    
    JFileChooser fc = new JFileChooser(
    	currentDirectory: srcroot,
    	dialogTitle: "Choose files to export",
    	fileSelectionMode: JFileChooser.FILES_ONLY, 
    	//the file filter must show also directories, in order to be able to look into them
    	multiSelectionEnabled: true)
    
    if (fc.showOpenDialog(null) != JFileChooser.APPROVE_OPTION) {
    	console.println "Canceled"
    	return
    }
    
    if (!(fc.selectedFiles =~ sourceroot.replaceAll(/\\+/, '\\\\\\\\'))) {
    		console.println "Selection outside of ${prop.getSourceRoot()} folder"
    		final def title = 'Wrong file(s) selected'
    		final def msg   = "Files must be in ${prop.getSourceRoot()} folder."
    		showMessageDialog null, msg, title, INFORMATION_MESSAGE
    		return
    	}
    
    if (prop.isSentenceSegmentingEnabled())
    	segmenting = TMXReader.SEG_SENTENCE
    	else
    	segmenting = TMXReader.SEG_PARAGRAPH
    
    def sourceLocale = prop.getSourceLanguage().toString()
    def targetLocale = prop.getTargetLanguage().toString()
    
    exportfile.write("", 'UTF-8')
    exportfile.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n", 'UTF-8')
    exportfile.append("<!DOCTYPE tmx SYSTEM \"tmx14.dtd\">\n", 'UTF-8')
    exportfile.append("<tmx version=\"1.4\">\n", 'UTF-8')
    exportfile.append(" <header\n", 'UTF-8')
    exportfile.append("  creationtool=\"OmegaTScripting\"\n", 'UTF-8')
    exportfile.append("  segtype=\"" + segmenting + "\"\n", 'UTF-8')
    exportfile.append("  o-tmf=\"OmegaT TMX\"\n", 'UTF-8')
    exportfile.append("  adminlang=\"EN-US\"\n", 'UTF-8')
    exportfile.append("  srclang=\"" + sourceLocale + "\"\n", 'UTF-8')
    exportfile.append("  datatype=\"plaintext\"\n", 'UTF-8')
    exportfile.append(" >\n", 'UTF-8')
    fc.selectedFiles.each{
    	fl = "${it.toString()}" - "$sourceroot"
    	exportfile.append("  <prop type=\"Filename\">" + fl + "</prop>\n", 'UTF-8')
    }
    exportfile.append(" </header>\n", 'UTF-8')
    exportfile.append("  <body>\n", 'UTF-8')
    
    def count = 0
    fc.selectedFiles.each { selected ->
    	// path of the selected file relative to the source folder
    	def fl = "${selected.toString()}" - "$sourceroot"
    	// TMX dates end in 'Z' (UTC), so format them in UTC rather than in local time
    	def utc = TimeZone.getTimeZone('UTC')
    	def tmxdate = "yyyyMMdd'T'HHmmss'Z'"
    	project.projectFiles.each { file ->
    		if ( "${file.filePath}" != "$fl" ) {
    			println "Skipping to the next file"
    		} else {
    			file.entries.each { ste ->
    				def info = project.getTranslationInfo(ste)
    				def changeId = info.changer
    				def changeDate = info.changeDate
    				def creationId = info.creator
    				def creationDate = info.creationDate
    				def alt = 'unknown'
    				// only translated segments make it into the TMX
    				if (info.isTranslated()) {
    					def source = StaticUtils.makeValidXML(ste.srcText)
    					def target = StaticUtils.makeValidXML(info.translation)
    					exportfile.append("    <tu>\n", 'UTF-8')
    					exportfile.append("      <tuv xml:lang=\"" + sourceLocale + "\">\n", 'UTF-8')
    					exportfile.append("        <seg>" + "$source" + "</seg>\n", 'UTF-8')
    					exportfile.append("      </tuv>\n", 'UTF-8')
    					exportfile.append("      <tuv xml:lang=\"" + targetLocale + "\"", 'UTF-8')
    					exportfile.append(" changeid=\"${changeId ?: alt }\"", 'UTF-8')
    					exportfile.append(" changedate=\"${ changeDate > 0 ? new Date(changeDate).format(tmxdate, utc) : alt }\"", 'UTF-8')
    					exportfile.append(" creationid=\"${creationId ?: alt }\"", 'UTF-8')
    					exportfile.append(" creationdate=\"${ creationDate > 0 ? new Date(creationDate).format(tmxdate, utc) : alt }\"", 'UTF-8')
    					exportfile.append(">\n", 'UTF-8')
    					exportfile.append("        <seg>" + "$target" + "</seg>\n", 'UTF-8')
    					exportfile.append("      </tuv>\n", 'UTF-8')
    					exportfile.append("    </tu>\n", 'UTF-8')
    					count++
    				}
    			}
    		}
    	}
    }
    exportfile.append("  </body>\n", 'UTF-8')
    exportfile.append("</tmx>", 'UTF-8')
    
    final def title = 'TMX file written'
    final def msg   = "$count TUs written to " + exportfile.toString()
    console.println msg
    showMessageDialog null, msg, title, INFORMATION_MESSAGE
    return
    

    The TMX file is rewritten each time the script is invoked.
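
For readers who want to try the TU layout outside OmegaT, here is a minimal standalone sketch of how one translation unit with a UTC timestamp could be rendered. The helper names xmlEscape, tmxDate and makeTu are hypothetical, and xmlEscape is only a simplified stand-in for the StaticUtils.makeValidXML call used in the script.

```groovy
import java.text.SimpleDateFormat

// Simplified XML escaping for segment text and attribute values
String xmlEscape(String s) {
    s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;')
}

// TMX dates have the form YYYYMMDDThhmmssZ, where the trailing Z means UTC
String tmxDate(long millis) {
    def fmt = new SimpleDateFormat("yyyyMMdd'T'HHmmss'Z'")
    fmt.timeZone = TimeZone.getTimeZone('UTC')
    fmt.format(new Date(millis))
}

// Render one translation unit; 'unknown' is used when the author or date is missing
String makeTu(String srcLang, String tgtLang, String source, String target,
              String changer, long changed) {
    def who  = xmlEscape(changer ?: 'unknown')
    def date = changed > 0 ? tmxDate(changed) : 'unknown'
    """    <tu>
      <tuv xml:lang="${srcLang}">
        <seg>${xmlEscape(source)}</seg>
      </tuv>
      <tuv xml:lang="${tgtLang}" changeid="${who}" changedate="${date}">
        <seg>${xmlEscape(target)}</seg>
      </tuv>
    </tu>
""".toString()
}

print makeTu('EN-US', 'FR-FR', 'Cats & dogs', 'Chats & chiens', null, 0L)
```

Setting the formatter's time zone explicitly matters here: without it the time is rendered in the local zone, while the literal Z at the end still claims UTC.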

A big thank-you goes to Roman Mironov and Velior Translation Agency for the idea and comprehensive support.
Suggestions and comments are always welcome.
But as of now,
Good luck!

Write all source segments to a file

Update: Please download the scripts from the dedicated SF.net project page where they are maintained. Scripts at the links below might be obsolete (though most likely still working).

Since we started to make OmegaT write stuff to files, let’s try to dump all source segments to one file. I’m pretty sure one can find some use for it.

  • write_source2file.groovy
    /*
     * #Purpose:	Write all source segments to a file
     * #Files:	Writes 'allsource.txt' to 'script_output' in the current project's root
     * 
     * @author:	Kos Ivantsov
     * @date:	2013-07-16
     * @version:	0.2
     */
    
    /* change "includefilenames" to anything but 'yes' (with quotes)
     * if you don't need filenames to be included in the file */
    
    def includefilenames = 'no'
    def includerepetitions = 'no'
    
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    // abort if a project is not opened yet
    def prop = project.projectProperties
    if (!prop) {
    	final def title = 'Source to File'
    	final def msg   = 'Please try again after you open a project.'
    	showMessageDialog null, msg, title, INFORMATION_MESSAGE
    	return
    }
    
    def folder = prop.projectRoot+'/script_output'
    def fileloc = folder+'/allsource.txt'
    writefile = new File(fileloc)
    if (! (new File(folder)).exists()) {
    	(new File(folder)).mkdir()
    	}
    
    writefile.write("", 'UTF-8')
    def count = 0
    def uniqline
    
    if (includefilenames == 'yes') {
    	files = project.projectFiles;
    	for (i in 0 ..< files.size())
    	{
    		fi = files[i];
    		marker = "+${'='*fi.filePath.size()}+\n"
    		writefile.append("$marker|$fi.filePath|\n$marker", 'UTF-8')
    		for (j in 0 ..< fi.entries.size())
    		{
    			ste = fi.entries[j];
    			source = ste.getSrcText();
    			writefile.append(source + "\n", 'UTF-8')
    			count++;
    		}
    	}
    } else {
    	project.allEntries.each { ste ->
    		source = ste.getSrcText();
    		writefile.append(source + "\n", 'UTF-8')
    		count++
    	}
    	console.println "$count segments found in all files"
    	if (includerepetitions != 'yes') {
    		count = 0
    		uniqline = writefile.readLines('UTF-8').unique()
    		//console.println uniqline
    		writefile.write("", 'UTF-8')
    		uniqline.each {
    			writefile.append("$it\n\n", 'UTF-8')
    			count++
    		}
    	}
    }
    
    console.println count + " segments written to " + writefile
    final def title = 'Source to File'
    final def msg   = count + " segments" + "\n" + "written to \n" + writefile
    showMessageDialog null, msg, title, INFORMATION_MESSAGE
    return
    

    Once the script is invoked, it’ll create a file named “allsource.txt” in the “script_output” subfolder of the current project’s root folder, where each segment will be on a new line. It’ll contain all the segments, even the ones that are already translated. The script can either just dump all segments into the file in the order they appear in OmegaT, without indicating what files they belong to, or write out each filename in a box like this
    +====+
    |file|
    +====+

    followed by all the segments that belong to this file, then the next filename and its segments, and so on. This behavior is controlled by line 13. When it says def includefilenames = 'yes', you’ll get filenames written to allsource.txt. If you don’t want the filenames, change 'yes' to anything else or even leave it empty, making sure you keep the quotes: it can say def includefilenames = 'no, thanks' or even def includefilenames = '', but not def includefilenames = no (no quotes in the last example).
    The way the filenames get marked is defined in lines 44 and 45.
    If filenames are not included, you can choose whether to include repetitions (line 14): 'yes' means “yes”, anything else, even 'yep', means “no”.
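
The two options described above can be tried in isolation with a short standalone sketch. The sample segments and the names boxed and noRepetitions are invented for the illustration, and the sketch works on an in-memory list instead of re-reading the written file as the script does.

```groovy
// Draw a box whose border is exactly as wide as the file name
def boxed = { String name ->
    def border = "+${'=' * name.size()}+"
    "${border}\n|${name}|\n${border}\n".toString()
}

// Hypothetical sample segments with one repetition
def segments = ['Open the file.', 'Save the file.', 'Open the file.']

// unique(false) returns a new list without repetitions
// and leaves the original list untouched
def noRepetitions = segments.unique(false)

print boxed('file')
noRepetitions.each { println it }
```

The border grows with the name because the '=' character is simply repeated as many times as the name has characters.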

Suggestions, enhancements, bug reports, donations, postcards, invitations to a cup of coffee, feature requests, interesting translation projects with a good pay etc. are always welcome. Criticism isn’t, but will be accepted too.


But as of now,
Good luck

File Renamer (Bash from within OmegaT)

Situation

You have a client who loves to give his files very descriptive names. That’s understandable as mostly the files you get to translate from him are lessons, lectures, howtos, manuals and so on. It makes sense to distribute them digitally with localized filenames.

Problem

What you need is a way to translate filenames in OmegaT thus keeping consistency with the contents of the translated files and past/future files from the same client.
Continue reading