Search
  • Jourik Ciesielski

The Excel headache: how memoQ and regular expressions save the day

If you work in the translation industry, you’ll undoubtedly know that Microsoft Excel is one of those file formats a lot of people make horrible abuse of. Prepping Excel documents for translation is very cumbersome in many cases and it usually involves a lot of cutting, copying, pasting and hiding. All this manual work requires time, a sharp focus, a good night’s rest and is obviously very expensive.


The aim of this article is to demonstrate how the prep work of a complex Excel document can be reduced to a couple of minutes if one manages to get the best out of his/her CAT tool (in this case memoQ) and make optimal use of regular expressions.


Source data


The source Excel data that I used for this use case is structured like this:

Cells that contain translatable content are placed underneath each other in groups of 3 (“SOURCE”). The corresponding translations need to end up in the 3 cells below the translatable cells, e.g. the translation of C2 needs to end up in C5, C3 in C6 and C4 in C7. Note that the source Excel document consists of 16 columns (A-P) and 277 rows (1-277) to process.


Pre-processing


memoQ has an amazing multilingual Excel filter that allows to define source and target columns unlimitedly, but this filter doesn’t work if you need to play around with rows, which is the case here. Alternatively you can select all the source content and paste it into a new sheet with the “Transpose” option:

The translation of A3 now needs to end up in D3, B3 in E3 and C3 in F3. In other words, rows have been converted to columns and vice versa, which allows to make use of memoQ’s multilingual Excel filter.


Filter configuration


A quick calculation shows that there are approx. 280 columns (A-JP) to define in the filter, which would obviously be too time-consuming. This is where regular expressions save the day. Let’s go through the process step by step.


Start building the filter configuration and define the meaning of the first couple of columns. Make sure to add enough columns as memoQ only displays column A-Z by default:

Close the filter and export it to *.mqres format (“Resource console” > “Filter configurations” > “Export”). Open the *.mqres file in a text editor (EditPad Pro, Notepad ++). This is what the configuration looks like:

Take the already configured columns as example to write an appropriate regex search & replace operation, in this case:


Search:


(<Column ColumnHeader=")([A-Z]+)(" Meaning=")NotImported(" ForColumn=")(" Lang=")(" ShowInPreview=")False(" MeaningRaw=")0(" />)(\r?\n *)<Column ColumnHeader="([A-Z]+)" Meaning="NotImported" ForColumn="" Lang="" ShowInPreview="False" MeaningRaw="0" />\r?\n *<Column ColumnHeader="([A-Z]+)" Meaning="NotImported" ForColumn="" Lang="" ShowInPreview="False" MeaningRaw="0" />\r?\n *<Column ColumnHeader="([A-Z]+)" Meaning="NotImported" ForColumn="" Lang="" ShowInPreview="False" MeaningRaw="0" />\r?\n *<Column ColumnHeader="([A-Z]+)" Meaning="NotImported" ForColumn="" Lang="" ShowInPreview="False" MeaningRaw="0" />\r?\n *<Column ColumnHeader="([A-Z]+)" Meaning="NotImported" ForColumn="" Lang="" ShowInPreview="False" MeaningRaw="0" />


Replace:


$1$2$3SourceText$4$5$6True$7four$8$9$1$10$3SourceText$4$5$6True$7four$8$9$1$11$3SourceText$4$5$6True$7four$8$9$1$12$3Translation$4$2$5ger-DE$6True$7five$8$9$1$13$3Translation$4$10$5ger-DE$6True$7five$8$9$1$14$3Translation$4$11$5ger-DE$6True$7five$8


Don’t forget to replace all occurrences of “four” and “five” with "4" and "5" after the search & replace. When the transformations are done, the meaning of every column will be defined correctly in the *.mqres file:

Save the *.mqres file and import it as a new filter configuration in memoQ (“Resource console” > “Filter configurations” > “Import new”). The meaning of every column will be defined correctly in the filter in memoQ itself as well:

The source file and the filter configuration are now ready to be used in the memoQ translation project.


Post-processing


If everything goes well, the target Excel document will look like this:

You can now select all the translated content and paste it back in the original sheet (and in its original range) with the “Transpose” option: