Finding Errors and Tools

Gearhead is chagrined, embarrassed and could kick himself for making a foolish mistake in last week's Gearhead column, as reader Steven Foust pointed out:

"You stated that egrep '[Qq][^u]' words.list would 'find all words that start with upper or lower case 'q' followed by anything that isn't a 'u'. Thus the regular expression will find Qantas, Iraqi and Iraq.'

Unfortunately, there are a few errors. If you try this, you'll find that although words.list contains the three words you gave, it does not match Iraq (as the end-of-line character does not match the [^u] pattern). Also, the regex given would find any word *containing* (Q or q) not followed by u - not words that start with Q or q. To find all words that start with upper or lower case 'q' followed by anything that isn't a 'u' would require egrep '^[Qq][^u]' words.list, which would match Qantas, but not Iraqi or Iraq. For further information, check out the 'Regular Expressions' section of the grep MAN page."

Our big mistake was in the "start with" claim, as Foust pointed out. He was also correct that Iraq is not matched: the term [^u] has to match an actual character, and nothing follows the final 'q'. It is our understanding that some versions of egrep strip the end-of-line character before matching, so that there's no character there at all.
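To see the difference between the two patterns, here is a minimal sketch that uses Python's re module as a stand-in for egrep. That substitution is our assumption: the two dialects differ in places, though they should agree on patterns this simple. The first three words come from the column; the word list itself and the function name matching_words are made up for illustration.

```python
import re

# Sample words: the first three are from the column, the others are illustrative.
WORDS = ["Qantas", "Iraqi", "Iraq", "queen", "Quebec"]


def matching_words(pattern, words):
    """Return the words in which the pattern matches anywhere.

    Each word is searched on its own, with no trailing newline,
    which mimics a line-by-line tool such as egrep.
    """
    regex = re.compile(pattern)
    return [word for word in words if regex.search(word)]


# Unanchored: any Q or q followed by a character other than u.
unanchored = matching_words(r"[Qq][^u]", WORDS)

# Anchored: the Q or q must be the first character of the line.
anchored = matching_words(r"^[Qq][^u]", WORDS)

print(unanchored)  # ['Qantas', 'Iraqi']
print(anchored)    # ['Qantas']
```

Iraq drops out of both lists because [^u] must consume a character and the word ends at the 'q'.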

Another reader, Edward Mills, noted:

"There are two implementations of regular expressions: the one used by everyone in the universe, and the one used by Allaire Corp. (in my use of Homesite).

Almost every time I need to use Allaire's implementation I'm forced to look up the idiosyncrasies on its Web site. The biggest difference is they treat the entire file as one long string, as opposed to everybody else (that I know of) treating each line separately."

Interesting stuff, but Gearhead knows there are more than two implementations of regular expressions. There's the original Unix grep, extended grep or egrep, awk, GNU Emacs, and . . . well, check out Table 6-1 on page 182 of the book we recommended last week: Mastering Regular Expressions by Jeffrey Friedl.

It all comes down to knowing the oddities of the regular expression processor you are using; otherwise you could waste a lot of time.
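As a rough illustration of why the "one long string" versus "line by line" distinction matters, the sketch below runs the same pattern both ways. It uses Python's re module, not Allaire's or egrep's engine, so treat it as an analogy under that assumption; the sample text and the function names are hypothetical.

```python
import re

# Hypothetical file contents, one word per line.
TEXT = "Iraq\nQantas\nIraqi\n"
PATTERN = re.compile(r"[Qq][^u]")


def per_line_hits(text):
    """Search each line separately, with the newline removed."""
    return [line for line in text.splitlines() if PATTERN.search(line)]


def whole_file_hits(text):
    """Search the whole text as one string and return the matched substrings."""
    return [match.group(0) for match in PATTERN.finditer(text)]


print(per_line_hits(TEXT))    # ['Qantas', 'Iraqi']
print(whole_file_hits(TEXT))  # ['q\n', 'Qa', 'qi']

# Anchors shift meaning too: by default ^ matches only at the start of the
# whole string, and only with re.MULTILINE does it match at each line start.
print(re.findall(r"^[Qq]", TEXT))                # []
print(re.findall(r"^[Qq]", TEXT, re.MULTILINE))  # ['Q']
```

In the whole-file case the newline after Iraq is just another character that isn't a 'u', so the word that a line-by-line tool rejects suddenly matches.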

How about a cool string-matching tool that can cope with different regex dialects? Check out Eluent Find in Eluent Tools (currently at Version 1.2) from Eluent (www.eluent.com/).

This package contains a collection of file and folder tools that run under Microsoft Windows 95/98/NT/2000. Included are Eluent Replace ("an interactive, multifile, multipattern search-and-replace program that uses Perl regular expressions and supports Perl scripting"), Eluent Attrib ("a program for changing file and folder attributes") and Eluent EOL ("a program for converting text files to and from DOS/Windows, Macintosh and Unix formats") as well as Eluent Find.

Find is a file-content search tool that "supports a variety of regular expression dialects."

Gearhead is rather impressed with the Find tool, and thinks system managers may well find it indispensable. Eluent offers Eluent Tools on a 30-day evaluation, after which the product loses some features. Eluent Tools costs US$30.

Find any mistakes? Tell us at gh@gibbs.com.
