OmegaT’s speed on big files

Situation

Suppose you are working on a huge file with over a thousand segments, where each segment is a complex sentence with numerous clauses and clarifying statements and thus takes three or four lines in the Editor pane. Using OmegaT for this work is a blessing in itself: you see only one segment at a time, so you don’t get lost easily and can concentrate on your work much better.

Problem

When this particular file is open, you notice some latency in OmegaT’s response to your keystrokes and mouse clicks. It’s not a huge problem, but given the size of each segment and the level of concentration the work demands, the slowdown is rather annoying.


Solution

A natural solution would be to split the big file into several smaller ones. While that sounds easy enough, here are several hints to make the whole process easier, smoother and, eventually, faster.

  • The big file doesn’t need to be deleted or altered in any way. It stays where it is, and new smaller files are added to the project.
  • The new smaller files should be named consecutively (with numbers, for instance) and stored in a separate sub-folder within the project’s source folder. This isn’t absolutely necessary, but consecutive names preserve the order of the text, and a separate sub-folder makes the files much simpler to locate and delete afterwards.
  • 500-600 big segments in a file don’t seem to slow OmegaT down, so that can serve as a guideline when splitting the files.
  • If it’s a markup file (HTML, XML, etc.), use a plain text editor and avoid WYSIWYG editors, so that the smaller files keep exactly the same tags as the big file. Just copy a big chunk of text, paste it into a new file created in your text editor, give the resulting file the same extension as the original, and store it in the sub-folder within the source folder where these temporary files are kept.
  • If it’s an ODT or DOCX file, use “trimming” rather than copying and pasting. To trim, copy the original to a new location, open the copy, scroll down to where the file should be split, and delete everything after that point. Then choose Menu → Save As, give the file a name and the proper extension, and click OK. Once it’s saved, press Ctrl+Z (Undo) to bring the deleted text back, and this time delete everything before that split point. Scroll down to the next split point, delete everything after it, and save under a new name. When saved, press Ctrl+Z, delete everything before the second split point, scroll down to the next split point… you get the picture.
  • Once the new files are produced and stored within the project’s source folder, reload the project (F5 in OmegaT), select the first of them in the Project Files window and start translating. All the segments may look somewhat grayish: that’s OK, because now every segment in the smaller file has a counterpart in the original file. To turn off this grayish display of non-unique segments, uncheck View → Mark Non-Unique Segments.
  • When you create the translated documents in OmegaT (Ctrl+D), the huge file will also be created in the target language, filled with the translations you entered in the smaller files, since each of its segments has an identical counterpart there.
  • Once everything in the temporary files is translated, you can delete them along with their sub-folder. Check the translated version of the original file once more to make sure there are no problems with tags.
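For a plain-text source file, the splitting and numbering steps above can be scripted. This is a minimal sketch, not a tool from this post: it assumes a UTF-8 text file in which blank lines separate paragraphs, and it treats a paragraph as a rough stand-in for a segment (OmegaT may actually split one paragraph into several sentence segments). The function name `split_into_parts`, the `_001`-style suffixes and the default of 500 paragraphs per part are illustrative choices. Markup and ODT/DOCX files still call for the manual methods described in the list.

```python
import re
from pathlib import Path


def split_into_parts(src: Path, out_dir: Path, paras_per_part: int = 500) -> list[Path]:
    """Split a plain-text file into consecutively numbered parts in out_dir.

    Assumption: blank lines separate paragraphs, and one paragraph is
    treated as a rough stand-in for one OmegaT segment.
    """
    text = src.read_text(encoding="utf-8")
    # Split on one or more blank lines; trim stray newlines and drop empty pieces.
    paragraphs = [p.strip("\n") for p in re.split(r"\n\s*\n", text)]
    paragraphs = [p for p in paragraphs if p.strip()]

    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    for start in range(0, len(paragraphs), paras_per_part):
        chunk = paragraphs[start:start + paras_per_part]
        number = start // paras_per_part + 1
        # Zero-padded numbers keep the parts in text order when sorted by name.
        part = out_dir / f"{src.stem}_{number:03d}{src.suffix}"
        part.write_text("\n\n".join(chunk) + "\n", encoding="utf-8")
        parts.append(part)
    return parts
```

For example, `split_into_parts(Path("source/big_file.txt"), Path("source/split"))` would write `big_file_001.txt`, `big_file_002.txt` and so on into a hypothetical `source/split` sub-folder, leaving the big file untouched.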

Here you can read more about non-unique segments in OmegaT: Auto-Propagation and Alternative Translations of Internal Repetitions in OmegaT

Another, less obvious, solution is to hide the little gray dots that represent whitespace (if you have them enabled): uncheck View → Mark Whitespace. OmegaT becomes considerably faster on big files when those dots are not shown. The LanguageTool plugin for OmegaT does a great job of showing you double spaces and other whitespace-related issues and typos; but then you always check for those at the QA stage, don’t you?
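If you hide the whitespace dots, a quick scripted pass over the translated text can complement the QA stage. This is a minimal sketch, assuming plain-text input; the three checks are illustrative, far less thorough than LanguageTool, and `whitespace_issues` is a hypothetical helper name.

```python
import re

# A few common whitespace slips; illustrative only, not an exhaustive list.
CHECKS = {
    "double space": re.compile(r"(?<=\S) {2,}(?=\S)"),
    # Only commas and periods: some languages (e.g. French) require a
    # space before marks such as ; : ! ?
    "space before comma or period": re.compile(r" +(?=[,.])"),
    "trailing whitespace": re.compile(r"[ \t]+$"),
}


def whitespace_issues(text: str) -> list[tuple[int, str]]:
    """Return (line number, check name) for every check a line fails."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in CHECKS.items():
            if pattern.search(line):
                issues.append((lineno, name))
    return issues
```

Each line is reported once per failed check, so a line with both a stray space before a comma and trailing whitespace shows up twice.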


Good luck

6 thoughts on “OmegaT’s speed on big files”

  1. Hello Kos

    I increased the memory assigned to OmegaT to get around this problem.

    The manual says:
    The options for the program start-up in this case will be read from the OmegaT.l4J.ini file, which resides in the same folder as the exe file and which you can edit to reflect your setup. The following example for the INI file reserves 1GB of memory, requests French as the user language and Canada as the country:
    # OmegaT.exe runtime configuration
    # To use a parameter, remove the ‘#’ before the ‘-’
    # Memory
    -Xmx1024M
    # Language
    -Duser.language=FR
    # Country
    -Duser.country=CA

    What do you think?

    • Yes, if your computer has enough memory, that’s a very worthwhile solution. Still, there may be files where even 1024M or more doesn’t make OmegaT much faster. And if I’m not mistaken, you can’t assign more than 2048M to a 32-bit Java program.
      I used to have a project where each file had over 10,000 segments (luckily, those were HTML files), and most of the segments were complete sentences 2-3 lines (20-30 words) long. That prompted me to look for a solution, which I eventually shared here.

  2. Pingback: Improving OmegaT’s speed on big files (2 tips) | 译行者
  3. Hi, Kos, do you know any quick way to split xliff files? I have to work on a large sdlxliff file (more than 1000 segments), and I have to give the sdlxliff back to my customer. I converted it to xlf with Rainbow and split it manually, but to split it into 3 or 4 parts I had to copy the header and the closing part of the xlf into every part file. Do you know if there is any tool to do this automatically?

    • Hi, Davide,
      If you’re working in OmegaT, you can load your big xliff and use my other script, which exports your whole project’s source (or only the untranslated segments) to a text file. That text file will have all the tags, and you can split it in any text editor. Put the split txt files in /source, temporarily remove the original xliff (move it out of the /source folder), reload the project and keep translating. You may want to test with a couple of segments first and then move the original xliff back (reloading the project) to see if it works. If it does, you can go on without the original xliff until you’re through, and then put your xliff back and remove the small txt files.
      Links to these two scripts: http://wp.me/p3fHEs-4L
      http://wp.me/p3fHEs-5S

      If you don’t mind, please report back how it worked. Thank you.
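Davide’s manual approach (copying the header and the closing part into every piece) can also be scripted. Below is a minimal sketch using Python’s standard `xml.etree.ElementTree`; it is not one of the scripts linked above. It assumes an XLIFF 1.2 file with a single `<file>` element whose `<body>` holds the `<trans-unit>` elements directly (no `<group>` nesting), and the names `split_xliff` and the `_01`-style file suffixes are hypothetical. Each part gets a full copy of the document minus the trans-units that belong to other parts. ElementTree may rename namespace prefixes other than the default XLIFF one (to `ns1:` and so on), so try the parts on a copy of the project first.

```python
import copy
import math
import xml.etree.ElementTree as ET
from pathlib import Path

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}


def split_xliff(src: Path, out_dir: Path, parts: int = 3) -> list[Path]:
    """Split an XLIFF 1.2 file into `parts` files that share the same header.

    Assumes one <file> element whose <body> holds <trans-unit> elements
    directly (no <group> nesting).
    """
    # Write the XLIFF namespace as the default one instead of "ns0:".
    ET.register_namespace("", XLIFF_NS)
    root = ET.parse(src).getroot()
    body = root.find("x:file/x:body", NS)
    if body is None:
        raise ValueError(f"{src}: no <file>/<body> element found")
    total = len(body.findall("x:trans-unit", NS))
    size = math.ceil(total / parts)

    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for k in range(parts):
        start, end = k * size, (k + 1) * size
        if start >= total:
            break  # fewer units than parts: stop early
        # Copy the whole document, header and file attributes included...
        part_root = copy.deepcopy(root)
        part_body = part_root.find("x:file/x:body", NS)
        # ...then drop the trans-units that belong to other parts.
        for i, unit in enumerate(part_body.findall("x:trans-unit", NS)):
            if not start <= i < end:
                part_body.remove(unit)
        path = out_dir / f"{src.stem}_{k + 1:02d}{src.suffix}"
        ET.ElementTree(part_root).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
    return written
```

The sketch only splits. Since the parts contain the same source segments as the original, translating them in the same OmegaT project should, as with the smaller files in the post, leave the translations in the project memory for the original file once it is back in /source.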
