[32] in Coldmud discussion meeting


Re: Regular expressions

daemon@ATHENA.MIT.EDU (Tue Nov 16 17:44:11 1993 )

From: joe@unipress.com
To: Greg Hudson <ghudson@MIT.EDU>
Cc: coldstuff@MIT.EDU
In-Reply-To: Your message of "Tue, 16 Nov 93 16:42:42 EST."
Date: Tue, 16 Nov 93 17:31:55 -0500

> First, I already provide two matching primitives, match_pattern() and
> match_template(), which should handle some of the cases.  (I'm
> interested in hearing about cases where these aren't useful.)  I don't
> want to provide a complicated tool which will be used for tasks for
> which simpler tools are more appropriate.

One of the larger purposes of C-- will be text processing.  Regular
expressions are incredibly useful for this.  Almost all UNIX text processing
tools make extensive use of them (not the best endorsement, but you get the
gist).  Emacs would be worthless without them.  Simpler, less powerful tools
and constructs just won't do.

Hell, I'd say a full-blown LALR parser and parser-generator wouldn't be too
much to ask, either.
> Second, regular expressions are slow compared to simpler matching
> primitives.  Precompiling them helps (I would probably implement this
> internally, with some form of caching, rather than add another data
> type), but don't solve the problem.

I feel so strongly about the utility of regular expressions that I would say
they deserve a "compiled pattern" data type of their own, actually.
Regardless of implementation, some means of reusing compiled regular
expressions is crucial.

> Third, regular expressions are complicated.  I can fairly easily link
> in, say, the gnu regexp library, but this means I have to worry about
> the GPL, the portability issues associated with gnu code, all the
> cruft that comes with general reusable code (fairly minimal in this
> case, granted), etc..

I'm fairly certain we can find a regular expression package that isn't GPL (or
even LGPL).

> Fourth, regular expressions are unreadable.  You can do wonderful
> things with them, but they're essentially a write-only language.

Do you mean "humans can't read them", or "programs can't easily decompile
them"?  Both are somewhat true.  Once one gets the knack of reading regular
expressions with a little practice (Emacs hacking helps), they aren't so bad.
The latter problem is perhaps best solved by storing the original text of the
regexp along with the compiled version.

I have any number of gripes about regular expression syntax as commonly
encountered on UNIX systems.  I've even designed a few alternative syntaxes.
I've yet to have the time to implement them, however.  The problem with
alternative syntaxes is that so many folks are used to the bent and strange
current syntax.  **shrug**  I can deal with the strange syntax in the absence
of anything better, just so long as I don't have to live without regular
expressions entirely.
You can expect anyone with the cleverness to understand a language on the
level of C-- to have the power to figure out how to use regular expressions.
Granted, just because somebody has the capability to learn C-- doesn't mean
they should be forced to learn other complicated things too, but since I
believe regular expressions are very useful ...

> Fifth, I'm kind of wary of totally in-server robots.  I think that
> decision mechanisms in robots should generally be done client-side
> (there are plenty of client tools for this).

In my experience, in-server robots are the way to go.  Client robots just
can't get at the information as expediently or easily through text-only
protocols.  Client robots also require implementation in a different
language than the system they are interfacing with, which deters
otherwise creative users.

> Sixth, regexps are not a backward-incompatible change, so they can be
> added to a post-1.0 release fairly easily.

Agreed.  They can wait, but not forever!