Ben Langhinrichs

Photograph of Ben Langhinrichs

E-mail address - Ben Langhinrichs







Recent posts

Wed 18 Sep 2019

Perils of PDF 5: Data Confusion



Mon 16 Sep 2019

About that email in Notes



Mon 9 Sep 2019

Perils of PDF 4: Missing and obscured data


November, 2019
SMTWTFS
     01 02
03 04 05 06 07 08 09
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30

Search the weblog





























Genii Weblog

Redacting content with Midas regular expressions

Thu 18 Nov 2010, 11:20 AM



by Ben Langhinrichs
Sorry about the typo in the title earlier.

A customer asked whether it was possible to use our Midas Rich Text technology to search through rich text, inside tables, in different fonts and with different attributes, and replace a specified pattern with a string of X's.

It couldn't be much easier. The Midas Rich Text LSX and Midas Rich Text C++ API both support regular expressions (mostly consistent with Perl expressions), and contain a number of methods to allow their use.  In this case, let's say the pattern was ORDnnnnnnnnnnnn where the n's may be any twelve digit number.  The code would simply be:

Call rtitem.ConnectBackend(doc.Handle, "Body", True)
Call rtitem.Everything.RegexReplace("ORD([0-9]{12})", "XXXXXXXXXXXXXXX")

but, let's say that isn't specific enough, as it could turn CHORD1234567890123456 into CHXXXXXXXXXXXXXXX3456, so let's say that the string must either be separated by whitespace or by the beginning/end of the text, so that "CHORD1234567890123456" would not match but "The order# is ORD123456789012" would.  Simple enough, just change the regular expression as below:

Call rtitem.ConnectBackend(doc.Handle, "Body", True)
Call rtitem.Everything.RegexReplace("(^|[\s*])ORD([0-9]{12})([\s*]|$)", "$1XXXXXXXXXXXXXXX$3")

and that's all it takes.  We could make it easier, but we'd probably have to read your mind.

Copyright © 2010 Genii Software Ltd.

What has been said:

No documents found