Insert Custom Tag-like Strings in OmegaT

This is a short article on how to semi-automatically insert context-dependent custom strings in OmegaT. To illustrate what I mean, here is an example from the kind of work I do pretty often.

When I translate Christian books, I often have to insert biblical references. They follow several patterns, most commonly Book chapter:verse, like 2 Cor 1:3. Sometimes the reference is given as a range, like 2 Pt 2:3-5 or 15:4-19:22; 21:12-13. Maybe there are people who enjoy typing digits, but I’m definitely not among them. So how do I insert them without actually typing them? One possibility is to use custom tags (Options → Tag Validation; Regular expression for custom tags), but that lets me describe only a small subset of all possible cases (perhaps due to my insufficient mastery of regex magic). There’s also a bigger problem with that approach: Western and Eastern Bibles use different versification, so English Ps 51:7 becomes Пс. 50:9 in Russian and Пс. 51:9 in Ukrainian, and tag validation would fail. The same goes for custom tags that need to be translated. Besides, custom tags are a global setting, whereas I’d like something more project-specific.

Of course, not everyone has to deal with that kind of stuff in their work. Still, there can be plenty of data that can be described by a regular expression, or even quoted as is, and then inserted verbatim or after a certain transformation. Good examples are measurements, time notations, names or even glossary items: anything that requires consistent use throughout the project.

So here I have a script that looks up such strings or snippets in the current segment and inserts them one by one, one per script invocation. The snippets are described in an external file. I assigned this script to Ctrl+Shift+F4 (to be in the center of the keyboard), and now, if the source segment contains one or more of the strings I defined, they are inserted one by one on each shortcut press. Skipping one is easy: insert it with the script and press Ctrl+Z. In the listing below you’ll find the script, which you can copy as text and save on your computer, or you can download it from pastebin.com; the heading in bold is the link.

  • insert_custom_strings.groovy

    /*
     * Purpose:	Insert custom string specified in external file, one by one
     * #Files:	Requires 'custom_strings.ini' in  '.ini' subfolder
     * 	in the current project's root
     * #File format:	Plain text, where *each* line is:
     *	[String in the source] [Tab] [String to insert];
     *	only the last line *must* be empty.
     * #Details:	http://wp.me/p3fHEs-86
     *
     * @author	Kos Ivantsov
     * @date	2013-10-15
     * @version	0.3
     */
    import org.omegat.util.StaticUtils
    import static javax.swing.JOptionPane.*
    import static org.omegat.util.Platform.*
    
    use_seg_subs = 'no'
    
    def prop = project.projectProperties
    if (!prop) {
    	final def title = 'Insert custom strings'
    	final def msg   = 'Please try again after you open a project.'
    	console.println msg
    	showMessageDialog null, msg, title, INFORMATION_MESSAGE
    	return
    }
    
    def srcfile = editor.currentFile
    def ste = editor.currentEntry
    def cur_text = ste.getSrcText()
    def cur_seg = ste.entryNum()
    def configdir = StaticUtils.getConfigDir()
    
    def folder = prop.projectRoot
    def fileloc = folder+'.ini/custom_strings.ini'
    def stringfile = new File(fileloc)
    def infofile
    infofile = configdir+'script/custom_strings.txt/'
    infofile = new File(infofile)
    srcexport = new File(configdir+'script/source.txt/')
    srcmod = srcexport.lastModified().toString()
    subst_file = new File(folder+'.ini/segment_substitution.ini')
    
    if (! stringfile.exists()) {
    	final def title = 'No file' 
    	final def msg   = """\
    File $stringfile
    that defines strings to be inserted doesn't exist\
    """
    	showMessageDialog null, msg, title, INFORMATION_MESSAGE
    	console.println msg
    	return
    	}
    
    if (! subst_file.exists()) {
    	use_seg_subs = 'no'
    	}
    
    def length = stringfile.readLines().size()
    def search_array = []
    def replace_array = []
    def count = 0
    
    while ( count < length ) {
    	ln = stringfile.readLines().get(count).tokenize('\t')
    	sr = ln[0]
    	rp = ln[1]
    	search_array.add(sr)
    	replace_array.add(rp)
    	count++
    	}
    
    def tag_index_array = []
    def cur_txt_array = []
    
    def range = 0..<search_array.size()
    
    for (i in range) {
    	cur_text.findAll(search_array[i]) {
    		if (it =~ "\\["){
    		it = it[0]}
    		tag_index_array.add("${cur_text.indexOf(it)}:$i")
    		cur_txt_array.add("${cur_text.indexOf(it)}<rem>$it")
    		}
    }
    
    // Sort both lists by the position of the match in the source segment (numerically)
    tag_index_array.sort{ (it =~ /^\d+/)[0] as int }
    cur_txt_array.sort{ (it =~ /^\d+/)[0] as int }
    for (i in 0..<tag_index_array.size()){
    	newitem = tag_index_array[i].replaceAll("\\d+:", "")
    	tag_index_array.putAt(i, newitem)
    	newword = cur_txt_array[i].replaceAll("\\d+<rem>", "")
    	cur_txt_array.putAt(i, newword)
    	}
    def seg_tag_count = tag_index_array.size()
    
    if (seg_tag_count == 0){
    	console.println "no insert_strings in current segment"
    	return
    	}
    
    def curtag_number
    def writeinfo(file, src, seg, num, mod) {
    	file.write(src+"\n",'UTF-8')
    	file.append(seg+"\n", 'UTF-8')
    	file.append(mod+"\n", 'UTF-8')
    	file.append(num+"\n", 'UTF-8')
    }
    
    if (!infofile.exists() || infofile.readLines().get(0) != "$srcfile" || infofile.readLines().get(1) != "$cur_seg" || infofile.readLines().get(2) != "$srcmod") {
    	writeinfo(infofile, srcfile, cur_seg, "0", srcmod)
    	curtag_number = infofile.readLines().get(3) as int
    	if (seg_tag_count > 1) {
    		writeinfo(infofile, srcfile, cur_seg, "1", srcmod)
    		}
    	} else {
    	curtag_number = infofile.readLines().get(3) as int
    	if (curtag_number < seg_tag_count-1){
    	writeinfo(infofile, srcfile, cur_seg, "${curtag_number+1}", srcmod)
    		} else
    		writeinfo(infofile, srcfile, cur_seg, "0", srcmod)
    }
    
    num = tag_index_array[curtag_number] as int
    text = cur_txt_array[curtag_number]
    ser = search_array[num]
    rep = (replace_array[num] =~ /\$(\d+)/ ).replaceAll( '\\${(it[$1] as String) }' )
    shell = new GroovyShell()
    eval = {statement, arg -> shell.setVariable 'it', arg; shell.evaluate '"' + statement + '"' }
    text = text.replaceAll(/null/, 'repl0')
    text = text.replaceAll(ser) { eval rep, it }
    text = text.replaceAll(/null/, '')
    text = text.replaceAll(/repl0/, 'null')
    
    if (use_seg_subs == 'yes'){
    	length = subst_file.readLines().size()
    	search_array = []
    	replace_array = []
    	count = 0
    	
    	while ( count < length ) {
    		ln = subst_file.readLines().get(count).tokenize('\t')
    		sr = ln[0]
    		rp = ln[1]
    		search_array.add(sr)
    		replace_array.add(rp)
    		count++
    		}
    		
    		range = 0..<search_array.size()
    		
    	for ( i in range) {
    		ser = search_array[i]
    		rep = (replace_array[i] =~ /\$(\d+)/ ).replaceAll( '\\${(it[$1] as String) }' )
    		shell = new GroovyShell()
    		eval = {statement, arg -> shell.setVariable 'it', arg; shell.evaluate '"' + statement + '"' }
    		text = text.replaceAll(/null/, 'repl0')
    		text = text.replaceAll(ser) { eval rep, it }
    		text = text.replaceAll(/null/, '')
    		text = text.replaceAll(/repl0/, 'null')
    		}
    	}
    
    editor.insertText(text)
    

    Custom strings are described in .ini/custom_strings.ini. It’s a plain text file where each line should follow this convention:
    [String in the source] [Tab] [String to insert],
    and only the last line must be empty.
    So if, for instance, anything inside brackets needs to be inserted together with the brackets, there will be a line that reads:

    (\((.*?)\))	$1
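
To see what such a line does, here is a minimal sketch that splits one rule on the tab and applies it to a sample segment. The sample sentence and the variable names are made up for the illustration. It also takes a shortcut: it lets plain replaceAll resolve $1, whereas the script above evaluates the replacement with GroovyShell.

```groovy
// A rule line as it would appear in custom_strings.ini:
// [search regex] [Tab] [string to insert]
def ruleLine = '(\\((.*?)\\))\t$1'

// Split on the tab, the same way the script does with tokenize('\t')
def (search, replacement) = ruleLine.tokenize('\t')

// Made-up source segment for this illustration
def source = 'He quotes the psalm (Ps 51:7) and moves on (see above).'

// Every match of the rule, turned into the text that would be inserted.
// Simplification: plain String.replaceAll resolves $1 here, whereas the
// script builds a Groovy expression and evaluates it with GroovyShell.
def snippets = source.findAll(search).collect { it.replaceAll(search, replacement) }

snippets.each { println it }
```

With this segment the rule finds two snippets, (Ps 51:7) and (see above), and the script would offer them in that order, one per shortcut press.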


    Inserting strings like 15:32 or 12:1-15, or 1:3, 12, 17 is achievable with this line:

     (\d+\:\d+(?:(?:\, |\-)\d+)*)	$1
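
Before putting a rule like this into the file, it may help to try the pattern on a few sample references. The sketch below does that with a chapter:verse pattern written for this illustration (a variant, not necessarily the exact line in your file); the sample segments are made up as well.

```groovy
// Pattern for chapter:verse references with optional ", n" or "-n" tails.
// Illustrative variant only; adjust it to whatever your own rule line says.
def versePattern = /\d+:\d+(?:(?:, |-)\d+)*/

// Made-up sample segments
def samples = [
    'as Paul writes in 2 Cor 1:3.',
    'see 2 Pt 2:3-5 for details',
    'compare 1:3, 12, 17 with 15:32',
]

// For each segment, list everything the pattern would pick up
def found = samples.collect { it.findAll(versePattern) }
found.each { println it }
```

If a reference you expect is missing from the output, or comes out cut short, the pattern needs another look before the script can insert it correctly.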


    Line 18 lets you enable conversion along with insertion. If this line reads use_seg_subs = 'yes', the script will also apply the substitution rules from the file segment_substitution.ini, in the .ini subfolder of the current project root, to the snippet before inserting it. That file and its own script were described in this post and this update. (That script has been updated since it was first published: it now looks for its instructions in the .ini subfolder, where my other scripts keep their instruction files, and it can also work with selected text. Check it out and update the script if you were using it before, or download it and start using it now.)
    To use the script described here, exporting segments to text files must be enabled (Options → Editing Behaviour; Export segments to text files). The script writes its temporary file in the same directory as the rest of the export files (the user doesn’t need it for anything, I’m just explaining for those who like to know what actually goes on), and it checks the modification time of the exported source file to determine whether the segment has changed since the last invocation.
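
For those curious about the bookkeeping, the idea can be sketched roughly as follows. This is a simplified illustration and not the script’s exact logic: the function name nextIndex, the two-line state file and the key values are all invented here. The key stands for what the script compares (source file, segment number, modification time of the export file).

```groovy
// Returns the index of the snippet to insert now and stores the one to use next time.
// 'stateFile', 'key' and 'total' are names invented for this sketch;
// 'total' is assumed to be at least 1 (the script exits earlier if nothing was found).
def nextIndex(File stateFile, String key, int total) {
    def lines = stateFile.exists() ? stateFile.readLines('UTF-8') : []
    // Same segment as last time? Continue from the stored index, otherwise start over.
    def current = (lines.size() == 2 && lines[0] == key) ? (lines[1] as int) : 0
    if (current >= total) current = 0
    def next = (current + 1) % total
    stateFile.write(key + '\n' + next + '\n', 'UTF-8')
    return current
}

def state = File.createTempFile('custom_strings', '.txt')
state.delete() // start without any stored state

// file | segment number | export file modification time (made-up values)
def key = 'chapter1.txt|42|1381825000000'

// Four shortcut presses in a segment with three snippets: the cycle wraps around
def picks = (1..4).collect { nextIndex(state, key, 3) }
println picks

// Moving to a different segment resets the cycle
def other = nextIndex(state, 'chapter1.txt|43|1381825999000', 3)
println other

state.delete()
```

Keeping the state in a small file rather than in memory matters because each shortcut press is a separate script run, so nothing else survives between invocations.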

Here’s a link for my fellow team members who are plowing through Scott Hahn’s commentaries with me. (Don’t forget to enable conversion in line 18.) Anyone else who translates Christian materials from English to Ukrainian using the Khomenko Bible is welcome to help themselves. Nobody? Oh, well…


If you find it helpful or have any ideas on how to improve it, I’m eager to hear them.
But as of now,


Good Luck
