RE: escapes in quoted strings?
Erik Ostrom [SMTP:eostrom@research.att.com] wrote:
>>I'm not sure whether the intent is that values should be any possible
>>7-bit ASCII string or merely the 7-bit-printable/readable ASCII
>>string (i.e., ASCII 32-126 + TAB (maybe)) that MOO supports, and
>>perhaps you want to stay vague about that (*).
>
> Hey, you never got around to the footnote. Anyway, IMO the intent is
> ASCII 32-126 + TAB (maybe), although at some point we may want to
> break into 8-bit ASCII, Unicode, etc.
Basically, the footnote was going to say that you might want to be
thinking about 8-bit ASCII, Unicode, etc., and in any event should NOT
tie the standard to 7-bit US ASCII, which is becoming less relevant these
days. Then I skimmed the archives and saw that people had already
raised this.
> Our intent with MCP 1.0 was that we be able to use MOO's built-in word
> splitter to parse the opening line into message, auth key, keywords,
> and values. So strings should basically be as they are in MOO:
> 7-bit-printable, delimited with ", escapes for " and \, etc. These
> days that goal seems less important to me, but I haven't seen a good
> reason to abandon it. (See Jay's recent post on concept reuse.)
> Anyway, my point here isn't "we must slavishly follow MOO", but "we
> MEANT to slavishly follow MOO" in this respect. Any divergence is
> probably a bug in the spec.
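To make the "use MOO's word splitter" idea concrete, here is a minimal sketch of splitting an opening line into message name, auth key, and keyword/value pairs. It assumes only the rules quoted above: words are separated by spaces, a double-quoted string is one word, and inside quotes the only escapes are \" and \\ (anything else is rejected). `split_words` and `parse_message` are hypothetical names, treating only spaces as separators and keeping a backslash outside quotes literally are my assumptions, and MOO's actual builtin may differ in corner cases.

```python
# Hypothetical sketch of MOO-style word splitting for an MCP opening line.
# Assumptions: only spaces separate words; inside double quotes the only
# escapes are \" and \\; a backslash outside quotes is kept literally.

def split_words(line: str) -> list[str]:
    """Split a line into words, honoring double quotes and escapes."""
    words: list[str] = []
    buf: list[str] = []
    in_word = False      # True once the current word has started
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '\\':
                if i + 1 >= len(line) or line[i + 1] not in '"\\':
                    raise ValueError(f"unsupported escape at column {i}")
                buf.append(line[i + 1])
                i += 2
                continue
            if ch == '"':
                in_quotes = False
            else:
                buf.append(ch)
        elif ch == ' ':
            if in_word:                  # a space ends the current word
                words.append(''.join(buf))
                buf, in_word = [], False
        elif ch == '"':
            in_quotes = in_word = True   # "" still yields an empty word
        else:
            buf.append(ch)
            in_word = True
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted string")
    if in_word:
        words.append(''.join(buf))
    return words


def parse_message(line: str) -> tuple[str, str, dict[str, str]]:
    """Return (message name, auth key, {keyword: value}) for a '#$#' line."""
    if not line.startswith('#$#'):
        raise ValueError("not an out-of-band line")
    words = split_words(line[3:])
    if len(words) < 2:
        raise ValueError("missing message name or auth key")
    name, auth_key, rest = words[0], words[1], words[2:]
    if len(rest) % 2:
        raise ValueError("keyword without a value")
    args: dict[str, str] = {}
    for kw, value in zip(rest[0::2], rest[1::2]):
        if not kw.endswith(':'):
            raise ValueError(f"bad keyword {kw!r}")
        args[kw[:-1]] = value
    return name, auth_key, args
```

With these rules, `parse_message('#$#edit 1234 kwd: foo')` and `parse_message('#$#edit 1234 kwd: "foo"')` both return `('edit', '1234', {'kwd': 'foo'})`: quoting only matters when the value contains spaces, quotes, or backslashes.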
There's nothing wrong with slavishly following MOO in a particular
instance where MOO got something mostly Right... so long as there's
some agreement on the basic principles (e.g., "any one-line string
value must be representable as a quoted string") and an explicit
description of what MOO is doing, so that non-MOO implementations can
do the same.
While we're on the subject of semantics, I take it that a semantic
difference is expected (or at least allowed) between a single-line value
and the corresponding value sent as a multi-line value with only one line?
That is, are implementations allowed to translate
#$#... kwd: "foo"
and
#$#... kwd*: "" _data-tag: 666
#$#* 666 kwd: foo
#$#: 666
differently (e.g., the one case translates to a string value "foo" while
the other translates into a list value {"foo"})? The default answer is,
of course, "Yes", but if this is indeed your intent it may be good to be
explicit about this.
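Here is a minimal sketch of the "Yes" reading, where a single-line value becomes a string and a multi-line value becomes a list of lines even when it has only one line. It assumes the line shapes in the example above (a trailing `*` on the keyword, a `_data-tag` naming the tag, `#$#* <tag> <kwd>: <text>` continuation lines, and a `#$#: <tag>` end line) and that the opening line has already been split into keywords and values; `translate` is a hypothetical name, not part of any spec.

```python
from typing import Union

# A value is either a str (single-line) or a list of lines (multi-line).
Value = Union[str, list[str]]

def translate(opening: dict[str, str], continuation: list[str]) -> dict[str, Value]:
    """Build final values from an already-split opening line.

    `opening` maps keywords as they appeared (a trailing '*' marks a
    multi-line keyword) to their single-line values; `continuation`
    holds the '#$#* <tag> ...' lines and the closing '#$#: <tag>' line.
    """
    single: dict[str, str] = {}
    multi: dict[str, list[str]] = {}
    for kw, value in opening.items():
        if kw == '_data-tag':
            continue
        if kw.endswith('*'):
            multi[kw[:-1]] = []   # the "" placeholder on the opening line is dropped
        else:
            single[kw] = value
    tag = opening.get('_data-tag')
    if multi and tag is None:
        raise ValueError("multi-line keyword without _data-tag")
    prefix = f'#$#* {tag} '
    for line in continuation:
        if line == f'#$#: {tag}':
            break
        if not line.startswith(prefix):
            raise ValueError(f"unexpected line {line!r}")
        # Assumes exactly one space after the colon; the rest is taken verbatim.
        kw, sep, text = line[len(prefix):].partition(': ')
        if not sep or kw not in multi:
            raise ValueError(f"bad continuation line {line!r}")
        multi[kw].append(text)
    else:
        if multi:
            raise ValueError("missing '#$#:' end line")
    return {**single, **multi}
```

Under this sketch the first form yields `{'kwd': 'foo'}` and the second yields `{'kwd': ['foo']}`; an implementation that wanted the two to agree would have to unwrap one-element lists explicitly.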
...this, by the way, makes it all the more important (a) to be able to
escape any character in a quoted string, and (b) to leave room in the
protocol for whatever extensions you might later make to the character
set. That means specifying either that a '\' may be followed ONLY by one
of the defined escape characters (", \, etc.), or, equivalently, that a
'\' followed by an unexpected character may combine with arbitrarily
many following characters to produce something indeterminate. What you
DON'T want to do is say that '\x' translates to something like '\x' or
'x' in all other cases: people might come to rely on this, and you're
screwed if you later decide you want to use '\x' for an extension...
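A minimal sketch of the first option, where a backslash may be followed only by a defined escape character and anything else is an error. `ESCAPES` and `decode_quoted` are hypothetical names; the table holds only the two escapes mentioned in this thread, and the `'t'` entry in the usage note is an invented extension used purely for illustration.

```python
# Hypothetical escape table: only \" and \\ are defined. A later revision
# could add entries without changing the meaning of any string that a
# conforming older implementation accepted, because undefined escapes
# were always rejected rather than passed through.
ESCAPES = {'"': '"', '\\': '\\'}

def decode_quoted(body: str, escapes: dict[str, str] = ESCAPES) -> str:
    """Decode the text between the quotes of a quoted string.

    A backslash followed by anything not in `escapes` is an error rather
    than being passed through, so no sender can come to rely on an
    undefined escape meaning something.
    """
    out: list[str] = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '\\':
            if i + 1 == len(body):
                raise ValueError("trailing backslash")
            nxt = body[i + 1]
            if nxt not in escapes:
                raise ValueError(f"undefined escape \\{nxt} at offset {i}")
            out.append(escapes[nxt])
            i += 2
        elif ch == '"':
            raise ValueError(f"unescaped quote at offset {i}")
        else:
            out.append(ch)
            i += 1
    return ''.join(out)
```

For example, `decode_quoted('a\\xb')` raises `ValueError` today, so if a future revision defines a new escape (say `decode_quoted('a\\tb', {**ESCAPES, 't': '\t'})`), no existing sender can have been depending on the old behavior.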
Meanwhile, the alternative (for one-line multi-line values) is to
stipulate that a one-line multi-line value and a single-line string must
translate the same way, just as
#$#... kwd: foo
and
#$#... kwd: "foo"
must translate the same way according to the current spec. With escapes
available, there probably isn't any reason to do that.
[... Sorry to dwell so much on something so trivial, but when stuff like
this gets done wrong, it's very painful; I've been dealing far too much
recently with the Windows NT command shell, where the answers to questions
like this are basically, "Specification? Hahahahahahahahaha..."]