This post is about a script that exports an OmegaT project to an XLS document with a separate worksheet for each source file.
Tag Archives: Project Files
“Filtered” Note Export in #OmegaT
This script is a variation of the previously published one that exports all notes in the current project. The only difference is that this one lets you select which notes get exported, based on the first line of the note. The resulting HTML table consists of four columns: Source, Target, Filtered Notes (adjustable heading name), and Reply.
Say you want to export only the notes that start with <query>, because you’ve been using this word (<query>) to mark your questions to the client. To do so, go to line 14 of the script and specify which mark-word was used. Note: the mark-word must appear at the very beginning of the very first line of the note; otherwise the note will be ignored. In line 15 you can specify the column heading.
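The filtering rule itself is easy to try outside OmegaT. Here is a minimal sketch in plain Groovy; the sample data and the names markWord and matchesMark are made up for illustration and are not taken from the actual script, which reads the notes from the open project.

```groovy
// Hypothetical stand-in for the mark-word setting of the script
def markWord = '<query>'

// Hypothetical sample data: one map per segment (not read from OmegaT)
def segments = [
    [source: 'Hello', target: 'Bonjour', note: '<query> Formal or informal?\nSee page 3.'],
    [source: 'World', target: 'Monde',   note: 'Checked with glossary.'],
    [source: 'Bye',   target: 'Salut',   note: 'First line.\n<query> not on the first line'],
    [source: 'Yes',   target: 'Oui',     note: null],
]

// A note qualifies only if its very first line begins with the mark-word,
// which is the same as the whole note text beginning with it
def matchesMark = { String note, String mark ->
    note != null && note.startsWith(mark)
}

def filtered = segments.findAll { matchesMark(it.note, markWord) }
filtered.each { println "${it.source} | ${it.target} | ${it.note}" }
```

Only the first segment gets through: the third one has the mark-word on its second line, so it is skipped, just as the note above warns.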
Export OmegaT Project to HTML table
Here’s a script that lets you export your whole OmegaT project into an HTML file with one or more tables, one for each source file. The left column holds the source segments; the right column is blank if the segment isn’t translated, or holds the translation (or an empty-translation marker, if the translation was set to be empty). Each table has the source file name as its heading. The script was requested and kindly sponsored by Roman Mironov at Translation Agency Velior. As usual, the heading of the listing below is a link to pastebin.com, where you can download this script.
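To give an idea of the output layout, here is a rough sketch in plain Groovy that builds one table per file from sample data. It is not the sponsored script: the esc helper, the projectData map and the exact HTML markup are assumptions made for this illustration, and the handling of translations that were deliberately set to empty is left out.

```groovy
// Hypothetical helper: escape the characters that are special in HTML
def esc = { String s ->
    s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
}

// Hypothetical sample data: file name -> list of [source, translation] pairs;
// null stands for "not translated yet"
def projectData = [
    'intro.txt': [['Hello', 'Bonjour'], ['World', null]],
    'help.txt' : [['Press <Enter>', 'Appuyez sur <Entrée>']],
]

def html = new StringBuilder('<html><body>\n')
projectData.each { fileName, rows ->
    // one table per source file, with the file name as the heading
    html << "<table border=\"1\">\n<tr><th colspan=\"2\">${esc(fileName)}</th></tr>\n"
    rows.each { row ->
        def (src, tgt) = row
        // untranslated segments get an empty right-hand cell
        html << "<tr><td>${esc(src)}</td><td>${tgt == null ? '' : esc(tgt)}</td></tr>\n"
    }
    html << '</table>\n'
}
html << '</body></html>\n'
println html
```

Escaping matters here because OmegaT segments often contain tags in angle brackets, which a browser would otherwise swallow.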
Export TMX for selected files
OmegaT exports TMX files for the current project files every time translated documents are created. It writes three TMX files to the project root. But what if you need a translation memory that contains the translation units of only one or several files, not of all the files currently in the project? One solution is to temporarily move the unneeded files out of the source folder, reload the project, and then create translated documents. But that is somewhat awkward and time-consuming.
Here’s a Groovy script that lets you select one or several files located in the same subfolder of the current project’s /source. Once they are selected, the script writes selected_files_[date_time].tmx to the project root. This TMX file contains TUs only for the selected files.
- write_sel_files2TMX.groovy
/*
 * Purpose: Export source and translation segments of user selected
 *          files into TMX-file
 * #Files: Writes 'selected_files_<date_time>.tmx' in the current project's root
 * #File format: TMX v.1.4
 * #Details: http://wp.me/p3fHEs-6g
 *
 * @author  Kos Ivantsov
 * @date    2013-08-12
 * @version 0.3
 */

import javax.swing.JFileChooser
import org.omegat.util.StaticUtils
import org.omegat.util.TMXReader
import static javax.swing.JOptionPane.*
import static org.omegat.util.Platform.*

// abort if a project is not opened yet
def prop = project.projectProperties
if (!prop) {
    final def title = 'Export TMX from selected files'
    final def msg   = 'Please try again after you open a project.'
    showMessageDialog null, msg, title, INFORMATION_MESSAGE
    return
}

// the timestamp in the file name has one-minute resolution
def curtime = new Date().format("MMM-dd-yyyy_HH.mm")
def srcroot = new File(prop.getSourceRoot())
def fileloc = prop.projectRoot + 'selected_files_' + curtime + '.tmx'
def exportfile = new File(fileloc)
def sourceroot = prop.getSourceRoot().toString()

JFileChooser fc = new JFileChooser(
    currentDirectory: srcroot,
    dialogTitle: "Choose files to export",
    fileSelectionMode: JFileChooser.FILES_ONLY,
    // the file filter must show also directories, in order to be able to look into them
    multiSelectionEnabled: true)

if (fc.showOpenDialog(null) != JFileChooser.APPROVE_OPTION) {
    console.println "Canceled"
    return
}

// every selected file has to be inside the project's source folder
if (!(fc.selectedFiles.every { it.toString().startsWith(sourceroot) })) {
    console.println "Selection outside of ${prop.getSourceRoot()} folder"
    final def title = 'Wrong file(s) selected'
    final def msg   = "Files must be in ${prop.getSourceRoot()} folder."
    showMessageDialog null, msg, title, INFORMATION_MESSAGE
    return
}

def segmenting = prop.isSentenceSegmentingEnabled() ?
    TMXReader.SEG_SENTENCE : TMXReader.SEG_PARAGRAPH

def sourceLocale = prop.getSourceLanguage().toString()
def targetLocale = prop.getTargetLanguage().toString()

// TMX dates carry a 'Z' suffix, so format them in UTC rather than local time
def tmxDate = { long millis ->
    new Date(millis).format("yyyyMMdd'T'HHmmss'Z'", TimeZone.getTimeZone('UTC'))
}

// write the TMX header (the DOCTYPE matches the declared version 1.4)
exportfile.write("", 'UTF-8')
exportfile.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n", 'UTF-8')
exportfile.append("<!DOCTYPE tmx SYSTEM \"tmx14.dtd\">\n", 'UTF-8')
exportfile.append("<tmx version=\"1.4\">\n", 'UTF-8')
exportfile.append(" <header\n", 'UTF-8')
exportfile.append(" creationtool=\"OmegaTScripting\"\n", 'UTF-8')
exportfile.append(" segtype=\"" + segmenting + "\"\n", 'UTF-8')
exportfile.append(" o-tmf=\"OmegaT TMX\"\n", 'UTF-8')
exportfile.append(" adminlang=\"EN-US\"\n", 'UTF-8')
exportfile.append(" srclang=\"" + sourceLocale + "\"\n", 'UTF-8')
exportfile.append(" datatype=\"plaintext\"\n", 'UTF-8')
exportfile.append(" >\n", 'UTF-8')
// list the selected files (paths relative to the source folder) in the header
fc.selectedFiles.each { selected ->
    def fl = StaticUtils.makeValidXML(selected.toString() - sourceroot)
    exportfile.append(" <prop type=\"Filename\">" + fl + "</prop>\n", 'UTF-8')
}
exportfile.append(" </header>\n", 'UTF-8')
exportfile.append(" <body>\n", 'UTF-8')

def count = 0
fc.selectedFiles.each { selected ->
    def fl = selected.toString() - sourceroot
    project.projectFiles.each { fileInfo ->
        // skip project files other than the selected one
        if (fileInfo.filePath.toString() != fl) {
            return
        }
        fileInfo.entries.each { ste ->
            def info = project.getTranslationInfo(ste)
            def changeId = info.changer
            def changeDate = info.changeDate
            def creationId = info.creator
            def creationDate = info.creationDate
            def alt = 'unknown'
            // only translated segments become translation units
            if (info.isTranslated()) {
                def source = StaticUtils.makeValidXML(ste.srcText)
                def target = StaticUtils.makeValidXML(info.translation)
                exportfile.append(" <tu>\n", 'UTF-8')
                exportfile.append(" <tuv xml:lang=\"" + sourceLocale + "\">\n", 'UTF-8')
                exportfile.append(" <seg>" + "$source" + "</seg>\n", 'UTF-8')
                exportfile.append(" </tuv>\n", 'UTF-8')
                exportfile.append(" <tuv xml:lang=\"" + targetLocale + "\"", 'UTF-8')
                exportfile.append(" changeid=\"${changeId ?: alt}\"", 'UTF-8')
                exportfile.append(" changedate=\"${changeDate > 0 ? tmxDate(changeDate) : alt}\"", 'UTF-8')
                exportfile.append(" creationid=\"${creationId ?: alt}\"", 'UTF-8')
                exportfile.append(" creationdate=\"${creationDate > 0 ? tmxDate(creationDate) : alt}\"", 'UTF-8')
                exportfile.append(">\n", 'UTF-8')
                exportfile.append(" <seg>" + "$target" + "</seg>\n", 'UTF-8')
                exportfile.append(" </tuv>\n", 'UTF-8')
                exportfile.append(" </tu>\n", 'UTF-8')
                count++
            }
        }
    }
}
exportfile.append(" </body>\n", 'UTF-8')
exportfile.append("</tmx>", 'UTF-8')

final def title = 'TMX file written'
final def msg   = "$count TUs written to " + exportfile.toString()
console.println msg
showMessageDialog null, msg, title, INFORMATION_MESSAGE
return
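A TMX file is written each time the script is invoked. Since the timestamp in the filename has one-minute resolution, invoking the script twice within the same minute overwrites the earlier file.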
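The output name is built from the current date and time with the pattern MMM-dd-yyyy_HH.mm, so it only changes once a minute. A small sketch of that naming in plain Groovy follows; the nameFor closure is a hypothetical helper, and the locale and time zone are pinned here only to keep the sketch reproducible (the script itself uses the system defaults).

```groovy
// Same pattern as in the script; Locale.ENGLISH and UTC are assumptions
// made for this sketch so that the result does not depend on the machine
def fmt = new java.text.SimpleDateFormat('MMM-dd-yyyy_HH.mm', Locale.ENGLISH)
fmt.timeZone = TimeZone.getTimeZone('UTC')

// Hypothetical helper that mimics how the script names its output file
def nameFor = { Date d -> 'selected_files_' + fmt.format(d) + '.tmx' }

def first  = new Date(0L)      // 1970-01-01 00:00:00 UTC
def second = new Date(45000L)  // 45 seconds later, still the same minute
def third  = new Date(60000L)  // one minute later

println nameFor(first)
println nameFor(second)
println nameFor(third)
```

The first two runs map to the same name, so the second would replace the first; the third gets a file of its own.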
A big thank-you goes to Roman Mironov and Velior Translation Agency for the idea and comprehensive support.
Suggestions and comments are always welcome.
But as of now,
Good luck!
Write all source segments to a file
Update: Please download the scripts from the dedicated SF.net project page, where they are maintained. The scripts at the links below might be obsolete (though most likely still working).
Since we started to make OmegaT write stuff to files, let’s try to dump all source segments to one file. I’m pretty sure one can find some use for it.
/*
 * #Purpose: Write all source segments to a file
 * #Files:   Writes 'allsource.txt' in 'script_output' under the current project's root
 *
 * @author:  Kos Ivantsov
 * @date:    2013-07-16
 * @version: 0.2
 */

/* change "includefilenames" to anything but 'yes' (with quotes)
 * if you don't need filenames to be included in the file */

def includefilenames = 'no'
def includerepetitions = 'no'

import static javax.swing.JOptionPane.*
import static org.omegat.util.Platform.*

// abort if a project is not opened yet
def prop = project.projectProperties
if (!prop) {
    final def title = 'Source to File'
    final def msg   = 'Please try again after you open a project.'
    showMessageDialog null, msg, title, INFORMATION_MESSAGE
    return
}

def folder = prop.projectRoot+'/script_output'
def fileloc = folder+'/allsource.txt'
writefile = new File(fileloc)
if (! (new File(folder)).exists()) {
    (new File(folder)).mkdir()
}

writefile.write("", 'UTF-8')
def count = 0
def uniqline

if (includefilenames == 'yes') {
    files = project.projectFiles;
    for (i in 0 ..< files.size())
    {
        fi = files[i];
        marker = "+${'='*fi.filePath.size()}+\n"
        writefile.append("$marker|$fi.filePath|\n$marker", 'UTF-8')
        for (j in 0 ..< fi.entries.size())
        {
            ste = fi.entries[j];
            source = ste.getSrcText();
            writefile.append(source + "\n", 'UTF-8')
            count++;
        }
    }
} else {
    project.allEntries.each { ste ->
        source = ste.getSrcText();
        writefile.append(source + "\n", 'UTF-8')
        count++
    }
    console.println "$count segments found in all files"
    if (includerepetitions != 'yes') {
        count = 0
        // read the file back as UTF-8 and drop the repeated lines
        uniqline = writefile.readLines('UTF-8').unique()
        //console.println uniqline
        writefile.write("", 'UTF-8')
        uniqline.each {
            writefile.append("$it\n\n", 'UTF-8')
            count++
        }
    }
}

console.println count + " segments written to " + writefile
final def title = 'Source to File'
final def msg   = count + " segments" + "\n" + "written to \n" + writefile
showMessageDialog null, msg, title, INFORMATION_MESSAGE
return
Once the script is invoked, it creates a file named “allsource.txt” in the script_output subfolder of the current project’s root folder, with each segment on a new line. The file contains all the segments, including the ones that are already translated; whether repetitions are kept depends on a setting described further down. The script can either just dump all the segments in the order they appear in OmegaT, without indicating what files they belong to, or write out a filename in a box like this
+====+
|file|
+====+
followed by all the segments that belong to that file, then the next filename and its segments, and so on. This behavior is controlled by line 13. When it says def includefilenames = 'yes', you’ll get filenames written to allsource.txt. If you don’t want the filenames, change 'yes' to anything else or even leave it empty, making sure you keep the quotes: it can say def includefilenames = 'no, thanks' or even def includefilenames = '', but not def includefilenames = no (no quotes in the last example).
The way the filenames get marked is defined in lines 44 and 45.
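If you want to change the look of the box, it helps to see what those two lines produce. Here is the same expression wrapped in a small closure so it can be tried without OmegaT; the name boxed and the sample paths are hypothetical.

```groovy
// Hypothetical helper around the marker expression used by the script
def boxed = { String filePath ->
    // a row of '=' as wide as the file path, with '+' at both corners
    def marker = "+${'=' * filePath.size()}+\n"
    "$marker|$filePath|\n$marker".toString()
}

print boxed('file')
print boxed('docs/readme.txt')
```

The width of the box follows the length of the path, so longer paths simply get a longer row of equals signs.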
If filenames are not included, you can choose whether to include repetitions (line 14): 'yes' means “yes”; anything else, even 'yep', means “no”.
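Both switches boil down to a plain string comparison with 'yes'. The sketch below, in plain Groovy with made-up sample segments, shows that comparison, an order-preserving way to drop repetitions, and what happens when the quotes are left out. It works on a list in memory, while the script itself re-reads the lines of the output file, so take it as an illustration of the idea only.

```groovy
// Hypothetical setting, as it could appear on line 14 of the script
def includerepetitions = 'yep'

// Only the exact string 'yes' switches the option on
def keepRepetitions = (includerepetitions == 'yes')

// Hypothetical sample segments with repetitions
def segments = ['Open', 'Save', 'Open', 'Close', 'Save']

// unique(false) returns a new list and leaves the original untouched;
// the first occurrence of every segment keeps its place
def output = keepRepetitions ? segments : segments.unique(false)
println output

// Without quotes, Groovy looks for a variable called "no" and gives up
def unquotedFails = false
try {
    new GroovyShell().evaluate('def includefilenames = no')
} catch (MissingPropertyException e) {
    unquotedFails = true
    println "Unquoted value rejected: ${e.message}"
}
```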
Suggestions, enhancements, bug reports, donations, postcards, invitations to a cup of coffee, feature requests, interesting translation projects with a good pay etc. are always welcome. Criticism isn’t, but will be accepted too.
But as of now,
Good luck
File Renamer (Bash from within OmegaT)
Situation
You have a client who loves to give his files very descriptive names. That’s understandable as mostly the files you get to translate from him are lessons, lectures, howtos, manuals and so on. It makes sense to distribute them digitally with localized filenames.
Problem
What you need is a way to translate filenames in OmegaT, thus keeping them consistent with the contents of the translated files and with past and future files from the same client.