Voice Input in Translation Work (#Linux + Chrome + #OmegaT), Take 1

I was always rather skeptical about using dictation software in my translation work. But recently I read a success story where a person started to use Dragon NaturallySpeaking, and it boosted his productivity by an ungodly high percentage. Though it didn’t shake the deep skepticism of a die-hard Linux fanatic whose main target language isn’t supported by the major dictation software vendors, it doesn’t hurt to fool around and try a few things, does it?

As it turns out, one can save quite a few keystrokes by speaking into the cloud, and it can even be done on Linux in OmegaT. Google’s speech recognition supports my target language, several Chromium/Chrome apps and extensions kindly try to make written words out of my utterances, and then it’s up to me how I put it all together to be able to dictate instead of typing.

My working recipe is based on using SpeechPad – new voice notebook for voice input. This little thing can be installed as a Chrome app and can work in the background, putting the recognized pieces into the clipboard. To enable that, one needs to tick “Restart on errors” and “Transfer to clipboard”. It’s best to register with this application to be able to add new languages not listed by default (limited to what Google supports), add terms to the custom replacement list (to enable punctuation by voice for some languages, for instance), and do other things. It’s all done in the user’s profile (called “User data” on the main page). When SpeechPad is fired up and listening in the background, you can switch to the app where you need to type (OmegaT in my case), dictate a logical chunk and press Ctrl+V. Some of the repeated mistakes in the text can be fixed with replace_with_template.groovy (see here for details on how to use the script). Alternatively, pasting and fixing can be done in one go with the OmegaT script insert_modify_clipboard.groovy (the above link with details still applies, but the substitution template should be named .ini/clipboard_substitution.ini).
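The idea of a replacement list for recurring recognition mistakes can also be tried outside OmegaT. Below is a minimal Bash sketch of it. The function name `fix_dictation` and the rules format (one `wrong<TAB>right` pair per line) are my assumptions for illustration only; this is not the format the OmegaT scripts mentioned above expect.

```bash
#!/bin/bash
# Hypothetical helper, not one of the OmegaT scripts mentioned above.
# Applies literal replacements to the text read from stdin.
# Assumed rules format: one "wrong<TAB>right" pair per line.
fix_dictation() {
    local rules_file="$1" text wrong right
    text=$(cat)
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline
    while IFS=$'\t' read -r wrong right || [ -n "$wrong" ]; do
        [ -z "$wrong" ] && continue
        # Quoting both sides makes the match and the replacement literal
        text=${text//"$wrong"/"$right"}
    done < "$rules_file"
    printf '%s\n' "$text"
}
```

Assuming xclip is installed, something like `xclip -selection clipboard -o | fix_dictation rules.tsv | xclip -selection clipboard -i` would clean up the clipboard contents before pasting.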

I’ve noticed that in Ukrainian the speech gets recognized much better when I chant it (and that’s where my passion for Byzantine rite liturgical chanting comes in real handy, although one of my buddies said that Rammstein-style singing provides similar results). With all of it I did manage to get a productivity boost (and unplanned chanting practice). I’d be happy to hear suggestions on how to improve this recipe or change the ingredients to be able to type less and produce more.


But as of now,
Good luck

Customizing #OmegaT Kaptain Launcher (GNU/Linux)

OmegaT for GNU/Linux comes with a nifty launcher that gives you a comprehensive GUI to most of the startup parameters without needing to do anything on the command line. Still, I’m not sure that many Linux users use this script. I think one of the reasons is that the script doesn’t save your choices and you have to enter them at each run, which isn’t too bad if OmegaT was installed by the provided installation script (a rare case, as far as I can tell). On top of that, the launcher is written in a somewhat obscure scripting language that requires some familiarization if the defaults are to be edited. In this article I’ll show which parts of the code correspond to the respective GUI elements and what can be edited to customize this script.
Continue reading

Live OmegaT Statistics and Wages Calculator (Linux)

This post is to announce Dimitry Prihodko’s nice little program that shows live OmegaT statistics and calculates wages based on them. All of that can be done in a spreadsheet, of course, but Dimitry’s solution is faster both in that it doesn’t require any additional preparation of a spreadsheet and copying data from OmegaT’s project_stats.txt, and in the way it constantly updates data without any intervention on the user’s part.
OTStats window showing project statistics
Continue reading

lame GUI update to the new TMX export

This is an update to the previous post about exporting new translations to a TMX.
The script doesn’t have a GUI to select date and time or to specify whether it should work globally or on selected files. This update still doesn’t have that GUI, but provides for an external program to fill that gap. In this post I’m sharing the updated groovy script and a simple bash+zenity wizard-like script for Linux that acquires necessary data.
If no such external program/script exists, the groovy script continues to work as before without any extra fuss.
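Whether the Groovy script will find a usable helper can be checked from a terminal first. The sketch below mirrors the two conditions the script reports in the scripting console (program not found, program not executable). The function name `check_helper` and its return codes are hypothetical and not part of either script in this post.

```bash
#!/bin/bash
# Hypothetical pre-flight check for the external helper program.
# Prints the same two messages the Groovy script writes to the console.
check_helper() {
    local helper="$1"
    if [ ! -e "$helper" ]; then
        echo "The program is not found"
        return 2
    elif [ ! -x "$helper" ]; then
        echo "The program is not executable"
        return 13
    fi
    echo "OK"
}
```

For the path used as the example in this post, that would be `check_helper /home/user/.omegat/script/new2tmx_tweak`.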

  • write_new_trans2TMX_extGUI.groovy

    /*
     * Purpose:	 Export new translations completed after the specified 
     * 	 date (line 21) either for the entire project or for the
     * 	 selected files ("select_files" must be set to 'yes' — line 27)
     * 	 to TMX file
     * #Files:	 Writes 'translated_after_<date_time>.tmx'
     * 	 in the current project's root
     * #File format:	 TMX v.1.4
     * #Details:	http://wp.me/p3fHEs-6z
     *
     * @author  Kos Ivantsov
     * @date    2013-08-12
     * @version 0.3
     */
    
    /*
     * The date should be specified as "year-month-day HOURS:minutes"
     * If not specified or specified wrongly, the script will look for
     * translations that are newer than one day. 
     */ 
    def newdate = ''
    /*
     * Set "select_files" to 'yes' if you want to use file selector
     * to specify files for export. If anything else is specified, the script
     * will work with the complete project.
     */ 
    select_files = ''
    
    import javax.swing.JFileChooser
    import org.omegat.util.StaticUtils
    import org.omegat.util.TMXReader
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    def prop = project.projectProperties
    
    if (!prop) {
    	final def title = 'Export new translation'
    	final def msg   = 'Please try again after you open a project.'
    	showMessageDialog null, msg, title, INFORMATION_MESSAGE
    	return
    }
    
    /*
     * If you want to use an external date and time selector and a window to
     * ask whether you want to select individual files, specify the whole path
     * to that program/script. It should print out date in the proper format
     * on the first line of stdout, and "yes" or anything else on the second
     * line. 
     */
    def command = "/home/user/.omegat/script/new2tmx_tweak"
    try {
    	proc = command.execute()
    	proc.waitFor()
    	//console.println "${proc.in.text}"
    	def lines = "${proc.in.text}".readLines()
    	newdate = lines[0] ?: ''  // empty helper output falls back to the default date
    	select_files = lines[1]
    	}
    catch(java.io.IOException ex){
    if (ex.getMessage() =~ 'error=13'){
    	console.println "The program is not executable"
    	}
    if (ex.getMessage() =~ 'error=2'){
    	console.println "The program is not found"
    	}
    }
    
    try {
    	newdate = new Date().parse("yyyy-MM-dd HH:mm", newdate)
    	}
    	catch (java.text.ParseException e) {
    		newdate = new Date().minus(1)
    		final def title = 'Wrong date format'
    		final def msg   = """\
    The date has been specified in a wrong format.
    The script will work with entries changed within the last day,
    i.e. changed after $newdate\
    """
    		console.println msg
    		showMessageDialog null, msg, title, INFORMATION_MESSAGE
    		}
    
    // Format the threshold date directly for use in the file name
    namedate = newdate.format("MMM-dd-yyyy_HH.mm")
    
    def fileloc = prop.projectRoot+'translated_after_'+namedate+"${ (select_files == 'yes') ? "_select" : ''}"+'.tmx'
    exportfile = new File(fileloc)
    
    if (prop.isSentenceSegmentingEnabled())
    	segmenting = TMXReader.SEG_SENTENCE
    	else
    	segmenting = TMXReader.SEG_PARAGRAPH
    
    def sourceLocale = prop.getSourceLanguage().toString()
    def targetLocale = prop.getTargetLanguage().toString()
    
    exportfile.write("", 'UTF-8')
    exportfile.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n", 'UTF-8')
    exportfile.append("<!DOCTYPE tmx SYSTEM \"tmx14.dtd\">\n", 'UTF-8')
    exportfile.append("<tmx version=\"1.4\">\n", 'UTF-8')
    exportfile.append(" <header\n", 'UTF-8')
    exportfile.append("  creationtool=\"OmegaTScripting\"\n", 'UTF-8')
    exportfile.append("  segtype=\"" + segmenting + "\"\n", 'UTF-8')
    exportfile.append("  o-tmf=\"OmegaT TMX\"\n", 'UTF-8')
    exportfile.append("  adminlang=\"EN-US\"\n", 'UTF-8')
    exportfile.append("  srclang=\"" + sourceLocale + "\"\n", 'UTF-8')
    exportfile.append("  datatype=\"plaintext\"\n", 'UTF-8')
    exportfile.append(" >\n", 'UTF-8')
    
    def hitcount = 0
    
    if ((select_files == 'yes')) {
    	srcroot = new File(prop.getSourceRoot())
    	sourceroot = prop.getSourceRoot().toString() as String
    	JFileChooser fc = new JFileChooser(
    	currentDirectory: srcroot,
    	dialogTitle: "Choose files to export",
    	fileSelectionMode: JFileChooser.FILES_ONLY, 
    	//the file filter must show also directories, in order to be able to look into them
    	multiSelectionEnabled: true)
    
    	if(fc.showOpenDialog() != JFileChooser.APPROVE_OPTION) {
    	console.println "Canceled"
    	return
    	}
    
    	if (!(fc.selectedFiles =~ sourceroot.replaceAll(/\\+/, '\\\\\\\\'))) {
    		console.println "Selection outside of ${prop.getSourceRoot()} folder"
    		final def title = 'Wrong file(s) selected'
    		final def msg   = "Files must be in ${prop.getSourceRoot()} folder."
    		showMessageDialog null, msg, title, INFORMATION_MESSAGE
    		return
    	}
    
    	fc.selectedFiles.each {
    		fl = "${it.toString()}" - "$sourceroot"
    		exportfile.append("  <prop type=\"Filename\">" + fl + "</prop>\n", 'UTF-8')
    	}
    	exportfile.append(" </header>\n", 'UTF-8')
    	exportfile.append("  <body>\n", 'UTF-8')
    
    	fc.selectedFiles.each{
    		fl = "${it.toString()}" - "$sourceroot" 
    		files = project.projectFiles
    		files.each{
    			if ( "${it.filePath}" != "$fl" ) {
    			println "Skipping to the next file"
    			}else{
    		it.entries.each {
    		def info = project.getTranslationInfo(it)
    		def changeId = info.changer
    		def changeDate = info.changeDate
    		def creationId = info.creator
    		def creationDate = info.creationDate
    		def alt = 'unknown'
    		if (info.isTranslated()) {
    			if (newdate.before(new Date(changeDate))){
    				hitcount++
    				source = StaticUtils.makeValidXML(it.srcText)
    				target = StaticUtils.makeValidXML(info.translation)
    				exportfile.append("    <tu>\n", 'UTF-8')
    				exportfile.append("      <tuv xml:lang=\"" + sourceLocale + "\">\n", 'UTF-8')
    				exportfile.append("        <seg>" + "$source" + "</seg>\n", 'UTF-8')
    				exportfile.append("      </tuv>\n", 'UTF-8')
    				exportfile.append("      <tuv xml:lang=\"" + targetLocale + "\"", 'UTF-8')
    				exportfile.append(" changeid=\"${changeId ?: alt }\"", 'UTF-8')
    				exportfile.append(" changedate=\"${ changeDate > 0 ? new Date(changeDate).format("yyyyMMdd'T'HHmmss'Z'", TimeZone.getTimeZone('UTC')) : alt }\"", 'UTF-8')
    				exportfile.append(" creationid=\"${creationId ?: alt }\"", 'UTF-8')
    				exportfile.append(" creationdate=\"${ creationDate > 0 ? new Date(creationDate).format("yyyyMMdd'T'HHmmss'Z'", TimeZone.getTimeZone('UTC')) : alt }\"", 'UTF-8')
    				exportfile.append(">\n", 'UTF-8')
    				exportfile.append("        <seg>" + "$target" + "</seg>\n", 'UTF-8')
    				exportfile.append("      </tuv>\n", 'UTF-8')
    				exportfile.append("    </tu>\n", 'UTF-8')
    						}
    					}
    				}
    			}
    		}
    	}
    } else {
    	exportfile.append(" </header>\n", 'UTF-8')
    	exportfile.append("  <body>\n", 'UTF-8')
    	files = project.projectFiles
    		files.each {
    			it.entries.each {
    			def info = project.getTranslationInfo(it)
    			def changeId = info.changer
    			def changeDate = info.changeDate
    			def creationId = info.creator
    			def creationDate = info.creationDate
    			def alt = 'unknown'
    			if (info.isTranslated()) {
    				if (newdate.before(new Date(changeDate))){
    				hitcount++
    				source = StaticUtils.makeValidXML(it.srcText)
    				target = StaticUtils.makeValidXML(info.translation)
    				exportfile.append("    <tu>\n", 'UTF-8')
    				exportfile.append("      <tuv xml:lang=\"" + sourceLocale + "\">\n", 'UTF-8')
    				exportfile.append("        <seg>" + "$source" + "</seg>\n", 'UTF-8')
    				exportfile.append("      </tuv>\n", 'UTF-8')
    				exportfile.append("      <tuv xml:lang=\"" + targetLocale + "\"", 'UTF-8')
    				exportfile.append(" changeid=\"${changeId ?: alt }\"", 'UTF-8')
    				exportfile.append(" changedate=\"${ changeDate > 0 ? new Date(changeDate).format("yyyyMMdd'T'HHmmss'Z'", TimeZone.getTimeZone('UTC')) : alt }\"", 'UTF-8')
    				exportfile.append(" creationid=\"${creationId ?: alt }\"", 'UTF-8')
    				exportfile.append(" creationdate=\"${ creationDate > 0 ? new Date(creationDate).format("yyyyMMdd'T'HHmmss'Z'", TimeZone.getTimeZone('UTC')) : alt }\"", 'UTF-8')
    				exportfile.append(">\n", 'UTF-8')
    				exportfile.append("        <seg>" + "$target" + "</seg>\n", 'UTF-8')
    				exportfile.append("      </tuv>\n", 'UTF-8')
    				exportfile.append("    </tu>\n", 'UTF-8')
    					}
    				}
    			}
    		}
    }
    
    exportfile.append("  </body>\n", 'UTF-8')
    exportfile.append("</tmx>", 'UTF-8')
    
    final def title = 'TMX file written'
    final def msg   = "$hitcount TU's written to " + exportfile.toString()
    console.println msg
    showMessageDialog null, msg, title, INFORMATION_MESSAGE
    return
    

    The only difference compared to the previous version is lines 43-66. The external program is specified in line 50.

  • new2tmx_tweak

    #!/bin/bash
    DATE=$(zenity --calendar \
    --title "Select date" \
    --text="TU's newer than the selected date will be used for export" \
    --date-format=%F)
    
    HRS=({00..23})
    MINS=({00..59})
    
    HRS=$(zenity --entry --title "Select time" \
    --text="Hours" \
    --entry-text="${HRS[@]}" )
    
    MINS=$(zenity --entry --title "Select time" \
    --text="Minutes" \
    --entry-text="${MINS[@]}" )
    
    if zenity --question --title="Select Files" \
    --text="Do you want to select individual files for export?"; then
    FILESEL='yes'
    else
    FILESEL='no'
    fi
    
    echo "$DATE $HRS:$MINS"
    echo "$FILESEL"
    

    This script should be saved somewhere (in this example it’s /home/user/.omegat/script/new2tmx_tweak) and made executable (chmod +x /home/user/.omegat/script/new2tmx_tweak). Zenity should be installed for it to work. One can cook up a nicer GUI using other tools, of course, but this serves just as a quick example. I guess something similar can be done using AutoIt or AutoHotkey on MS Windows or AppleScript on OS X.
    If you happen to come up with your own date and time picker for this, feel free to share in comments or link to your solution.
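A picker doesn't have to be graphical at all. The Groovy script only reads two lines from the helper's output: the date as "yyyy-MM-dd HH:mm" on the first line and "yes" (or anything else) on the second. Here is a minimal non-interactive stand-in, assuming GNU date is available; the function name `print_export_settings` and its two optional arguments are made up for this sketch.

```bash
#!/bin/bash
# Hypothetical non-interactive replacement for the zenity wizard.
# Assumes GNU date (for the -d option).
# $1: how many days back the export should start (default: 1)
# $2: "yes" to pick individual files, anything else for the whole project
print_export_settings() {
    local days_back="${1:-1}" select_files="${2:-no}"
    # First line: midnight of the chosen day, in the format the Groovy script parses
    date -d "${days_back} days ago" '+%Y-%m-%d 00:00'
    # Second line: the file selection switch
    echo "$select_files"
}

print_export_settings "$@"
```

Saved in place of new2tmx_tweak and made executable, this would export everything changed since yesterday's midnight without asking any questions.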

But as of now,


Good luck!

File Renamer (Bash from within OmegaT)

Situation

You have a client who loves to give his files very descriptive names. That’s understandable as mostly the files you get to translate from him are lessons, lectures, howtos, manuals and so on. It makes sense to distribute them digitally with localized filenames.

Problem

What you need is a way to translate filenames in OmegaT thus keeping consistency with the contents of the translated files and past/future files from the same client.
Continue reading

OmegaT match insert/replace without tags

Situation

After having translated a complete user manual that you converted from PDF to ODT to be able to work on it in OmegaT, you receive another manual from the same client, but this time it’s a DOCX file. Great! You can start right away, without converting anything. That should be a piece of cake: half of the manual looks almost the same as the one you have just done.

Problem

After starting to work with it you find out that getting a lot of 95-97% matches would be really awesome, if it weren’t for all those nasty tags that are very different in the source and in the match. And there is no “Insert match without tags” menu item in OmegaT (yet).

Continue reading

Bash (Perl, Python, Tcl/Tk and what not) Scripting from within OmegaT

Situation

So, right now you’re using quite a few scripts while working in OmegaT. Some of them are the ones included in the Scripting Plugin, others are taken from the Internet, several of them you wrote yourself, and a couple are still cooking in your head, promising to be something that will save you a couple of hours of work every day in the future and, for now, hindering you from concentrating on what is at hand. To run them with a key shortcut you had to assign global key combinations, as from within OmegaT you can run only 5 custom scripts with a key combo, and those are not just any scripts but the ones that the Scripting Plugin can run.

Problem

Now, with many other scripts and actions used elsewhere for your work/leisure you’re running out of available key combinations, plus you get more and more questions like, “Dad, what you just did doesn’t work on my computer. Do I press it wrong or what’s the matter?” from your elementary-school-aged son.
What you want is an ability to run any script from within OmegaT, not just the ones that the Scripting Plugin can run, as you don’t want to be limited to Java-like languages, but you look for a way to use anything that you’re comfortable with. Besides, these custom scripts should be aware of your current OmegaT project’s variables and settings (like project folders, language pairs etc.) Then at least you’ll be able to say to your son, “Boy, you don’t use OmegaT yet. Let me better show you this combo that you can use on your computer.”
Continue reading

Dummy OmegaT tags

Situation

Now you’re working on a nice text with very few tags. The recipe with nice navigation between tags is quite good, but somewhat cumbersome for just two or three tags you have to insert here and there.

Problem

Minimizing the number of keystrokes to insert tags and staying mouse-free while doing that.
Continue reading

OmegaT tags one by one

Situation

When working with a tag-rich text in OmegaT where tags are really needed (for instance, with HTML files), you can either insert the source into the translation input field (Ctrl+Shift+I or Ctrl+Shift+R) and then overwrite the original with your translation between the tags, or put them all into the input field without any original text (Ctrl+Shift+T) and write your translation between the appropriate tags. That’s if you wanna use the keyboard only. With the mouse you can select, copy and paste them one by one, gritting your teeth over each one of them and hating the waste of time.

Problem

What you want is a possibility to select one tag at a time, insert it, select the next one, and all that without the mouse.
Continue reading