Genii Weblog

Redacting content with Midas regular expressions

Thu 18 Nov 2010, 11:20 AM

by Ben Langhinrichs
Sorry about the typo in the title earlier.

A customer asked whether it was possible to use our Midas Rich Text technology to search through rich text, inside tables, in different fonts and with different attributes, and replace a specified pattern with a string of X's.

It couldn't be much easier. The Midas Rich Text LSX and Midas Rich Text C++ API both support regular expressions (mostly consistent with Perl expressions) and contain a number of methods that use them. In this case, let's say the pattern is ORDnnnnnnnnnnnn, where the n's stand for any twelve digits. The code would simply be:

Call rtitem.ConnectBackend(doc.Handle, "Body", True)
Call rtitem.Everything.RegexReplace("ORD([0-9]{12})", "XXXXXXXXXXXXXXX")

but let's say that isn't specific enough, as it could turn CHORD1234567890123456 into CHXXXXXXXXXXXXXXX3456. Instead, let's say the string must be bounded on each side by whitespace or by the beginning/end of the text, so that "CHORD1234567890123456" would not match but "The order# is ORD123456789012" would. Simple enough, just change the regular expression as below:

Call rtitem.ConnectBackend(doc.Handle, "Body", True)
Call rtitem.Everything.RegexReplace("(^|\s)ORD([0-9]{12})(\s|$)", "$1XXXXXXXXXXXXXXX$3")

and that's all it takes.  We could make it easier, but we'd probably have to read your mind.
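If you want to try the two patterns outside of Notes first, here is a minimal sketch using Python's re module as a stand-in. It is not Midas code, and it assumes only that the Perl-style syntax carries over. The function names redact_loose and redact_bounded are made up for illustration, and the bounded pattern is written with plain \s for the whitespace class.

```python
import re

# Fifteen X's: the same length as "ORD" plus twelve digits.
MASK = "X" * 15

# First pattern from the post: matches anywhere, even inside a longer token.
LOOSE = re.compile(r"ORD([0-9]{12})")

# Bounded pattern: whitespace or the start/end of the text on each side.
BOUNDED = re.compile(r"(^|\s)ORD([0-9]{12})(\s|$)")


def redact_loose(text: str) -> str:
    """Replace every ORD + 12 digits, regardless of what surrounds it."""
    return LOOSE.sub(MASK, text)


def redact_bounded(text: str) -> str:
    """Replace ORD + 12 digits only when delimited by whitespace or text edges."""
    # Groups 1 and 3 put back the delimiters that the pattern consumed.
    return BOUNDED.sub(r"\g<1>" + MASK + r"\g<3>", text)


if __name__ == "__main__":
    for sample in ("CHORD1234567890123456", "The order# is ORD123456789012"):
        print(repr(sample), "->", repr(redact_loose(sample)), "/", repr(redact_bounded(sample)))
```

One thing this sketch shows: because the bounded pattern consumes the trailing whitespace, two order numbers separated by a single space leave the second one untouched in Python. Whether Midas behaves the same way is not something this sketch can confirm, so it is worth testing that case against your own documents.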

Copyright 2010 Genii Software Ltd.
