Re: wee-hooo, more revisions



This issue bugged me enough to look up
ftp://ds.internic.net/rfc/rfc1866.txt to see if my memory really *was*
failing....

> the double quote (") is one of the 4 characters (", <, >, &) deemed to have
> special meaning in HTML.

Those are the four characters in printable ASCII that have named 
(non-numeric) entities, yes.  (9.7.1).

> I'll grant that "s appearing outside of <>s tend
> to come through okay (at least in IE3 and Mosaic, they do; dunno about
> Netscape --- we're not allowed to use that around here :-), but I don't
> believe this is ever guaranteed by the spec.

You don't need to escape it outside of attribute values.  See the 
last table in section 3.2.1 for satisfactory evidence---that's the 
closest you're going to get without reading the DTD.

Inside double-quoted tag attribute values, you need to escape '"' 
somehow.  (See 3.2.4.)  Strictly speaking, you can use '"' bare 
inside a single-quoted attribute value:

  <IMG SRC=foo.gif ALT='Jay says, "Marca can BITE ME" "     " " "" '>

Of course, you'd then have to use the &#39; notation for single 
quotes.  All of this is pretty questionable from an SGML point of 
view; attributes aren't supposed to contain this kind of information, 
and entity processing in attributes is (to the best of my 
understanding) an extra non-SGML step required of HTML 
implementations.
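
To make the rules above concrete, here is a minimal Python sketch.  It 
is not taken from the RFC: the names escape_text and quote_attr are 
made up for illustration, and it assumes the reading given above 
(outside tags only &, < and > need escaping; inside an attribute value 
the delimiting quote character must be escaped as well).

```python
def escape_text(s):
    """Escape text for use outside tags; '"' is deliberately left bare."""
    # '&' must go first so the references added below are not re-escaped.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def quote_attr(s, delim='"'):
    """Return s as a quoted attribute value using the given delimiter.

    Hypothetical helper: only the quote character used as the delimiter
    is escaped; the other kind of quote is left bare.
    """
    if delim not in ('"', "'"):
        raise ValueError("delimiter must be a single or double quote")
    escaped = escape_text(s)
    if delim == '"':
        escaped = escaped.replace('"', "&quot;")
    else:
        escaped = escaped.replace("'", "&#39;")
    return delim + escaped + delim


if __name__ == "__main__":
    alt = 'Jay says, "hi"'
    print("<IMG SRC=foo.gif ALT=" + quote_attr(alt, "'") + ">")
    print("<IMG SRC=foo.gif ALT=" + quote_attr(alt, '"') + ">")
```

Escaping < and > inside the attribute value is a conservative choice 
in this sketch, not something the argument above depends on.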

This whole mess happened because this is Yet Another Way that IMG is 
seriously misdesigned.  If it were a container, we'd just throw the 
ALT text in the PCDATA section and we'd get the right thing.  Off the 
top of my head, there's no other element in HTML that represents text 
in an attribute or requires entity processing in an attribute---URLs 
already have a perfectly good quoting scheme.

> The other thing --- what motivated me to mention it at all --- is that the
> fontification for my html mode in Emacs gets seriously confused by singleton
> "s.  Maybe I just need a more sophisticated html mode, but I don't find it
> unreasonable for an html mode to expect that the special characters will
> only ever be used in the prescribed ways.

...except that this is and always has been perfectly legal.  The 
scapegoat here is Emacs's syntax tables, which don't quite capture 
the context required to get this right.
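
For what "context" means here, a rough Python sketch (nothing to do 
with how Emacs actually implements fontification; delimiter_quotes is 
a made-up name).  It assumes a much-simplified HTML with no comments 
or declarations, and treats a quote character as a string delimiter 
only when it appears between '<' and '>':

```python
def delimiter_quotes(html):
    """Return the indexes of quote characters that delimit attribute values.

    Simplifying assumptions: no comments, no declarations, and every '<'
    starts a tag.  Quotes outside tags are plain text and are ignored.
    """
    positions = []
    in_tag = False      # True between '<' and the matching '>'
    open_quote = None   # quote character of the value being read, if any
    for i, ch in enumerate(html):
        if open_quote is not None:
            # Inside a quoted value: only the same quote character closes it.
            if ch == open_quote:
                positions.append(i)
                open_quote = None
        elif in_tag:
            if ch in "\"'":
                positions.append(i)
                open_quote = ch
            elif ch == ">":
                in_tag = False
        elif ch == "<":
            in_tag = True
    return positions
```

A lone '"' in running text never shows up in the result, which is the 
behaviour a single flat table of "string quote" characters cannot 
express.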

Jay