B.2.2 Ampersands in URI attribute values


The URI that is constructed when a form is submitted may be used as an anchor-style link (e.g., the href attribute for the A element). Unfortunately, the use of the “&” character to separate form fields interacts with its use in SGML attribute values to delimit character entity references. For example, to use the URI “http://host/?x=1&y=2” as a linking URI, it must be written <A href="http://host/?x=1&#38;y=2"> or <A href="http://host/?x=1&amp;y=2">.
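The escaping described above can be sketched in a few lines of Python. This is only an illustration: the helper name `href_attribute` is hypothetical, and it relies on the standard library's `html.escape`, which emits the “&amp;” form rather than “&#38;”.

```python
import html


def href_attribute(uri):
    """Build an href attribute, escaping '&' and quotation marks in the URI."""
    # html.escape turns '&' into '&amp;' so that it cannot be mistaken
    # for the start of a character entity reference.
    return 'href="' + html.escape(uri, quote=True) + '"'


print(href_attribute("http://host/?x=1&y=2"))
# prints: href="http://host/?x=1&amp;y=2"
```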

We recommend that HTTP server implementors, and in particular CGI implementors, support the use of “;” in place of “&” to save authors the trouble of escaping “&” characters in this manner.
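On the server side, accepting “;” as a field separator might look like the following sketch. It assumes a Python-based handler and uses the `separator` parameter of `urllib.parse.parse_qs`, which is available in current Python releases; the function name `parse_form_query` is hypothetical.

```python
from urllib.parse import parse_qs


def parse_form_query(query):
    """Parse a query string whose fields are separated by ';' instead of '&'."""
    # parse_qs accepts exactly one separator, so a server that wants to
    # honour both forms would need to normalise the query string first.
    return parse_qs(query, separator=";")


print(parse_form_query("x=1;y=2"))
# prints: {'x': ['1'], 'y': ['2']}
```

With this convention an author can write <A href="http://host/?x=1;y=2"> without any escaping.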

B.3 SGML implementation notes

B.3.1 Line breaks

SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately following a start tag must be ignored, as must a line break immediately before an end tag. This applies to all HTML elements without exception.

The following two HTML examples must be rendered identically:

<P>Thomas is watching TV.</P>

<P>
Thomas is watching TV.
</P>

So must the following two examples:

<A>My favorite Website</A>

<A>
My favorite Website
</A>
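As a rough illustration of the line-break rule in this section, the following Python sketch drops one line break directly after a start tag and one directly before an end tag. It is a simplification that matches tags with regular expressions rather than a real SGML parser, and the function name `drop_ignored_line_breaks` is hypothetical.

```python
import re

# A start tag: "<", a letter, then anything up to the closing ">".
START_TAG_THEN_BREAK = re.compile(r"(<[A-Za-z][^<>]*>)\r?\n")
# An end tag: "</", a letter, then anything up to the closing ">".
BREAK_THEN_END_TAG = re.compile(r"\r?\n(</[A-Za-z][^<>]*>)")


def drop_ignored_line_breaks(markup):
    """Remove the line breaks that SGML says must be ignored."""
    markup = START_TAG_THEN_BREAK.sub(r"\1", markup)
    markup = BREAK_THEN_END_TAG.sub(r"\1", markup)
    return markup


print(drop_ignored_line_breaks("<P>\nThomas is watching TV.\n</P>"))
# prints: <P>Thomas is watching TV.</P>
```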

B.3.2 Specifying non-HTML data

Script and style data may appear as element content or attribute values. The following sections describe the boundary between HTML markup and foreign data.

Note. The DTD defines script and style data to be CDATA for both element content and attribute values. SGML rules do not allow character references in CDATA element content but do allow them in CDATA attribute values. Authors should pay particular attention when cutting and pasting script and style data between element content and attribute values.

This asymmetry also means that when transcoding from a richer to a poorer character encoding, the transcoder cannot simply replace unconvertible characters in script or style data with the corresponding numeric character references; it must parse the HTML document and know about each script and style language’s syntax in order to process the data correctly.

Element content

When script or style data is the content of an element (SCRIPT and STYLE), the data begins immediately after the element start tag and ends at the first ETAGO (“</”) delimiter followed by a name start character ([a-zA-Z]); note that this may not be the element’s end tag. Authors should therefore escape “</” within the content. Escape mechanisms are specific to each scripting or style sheet language.
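The termination rule can be sketched in Python as follows. Both function names are hypothetical. The escaping helper rewrites every “</” that precedes a letter as “<\/”, which is safe inside JavaScript string literals; it is assumed here that the sequence does not occur elsewhere in the script.

```python
import re

# ETAGO ("</") followed by an SGML name start character.
ETAGO_NAME_START = re.compile(r"</[A-Za-z]")


def script_data_end(data):
    """Return the offset at which script data ends, or -1 if it never does."""
    match = ETAGO_NAME_START.search(data)
    return match.start() if match else -1


def hide_etago_for_javascript(source):
    """Rewrite '</x' as '<\\/x' so the data no longer contains ETAGO + letter."""
    return ETAGO_NAME_START.sub(lambda m: "<\\/" + m.group(0)[2:], source)


broken = 'document.write ("<EM>This won\'t work</EM>")'
fixed = hide_etago_for_javascript(broken)
print(script_data_end(broken) == broken.index("</EM>"))  # prints: True
print(script_data_end(fixed))                            # prints: -1
```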

The following script data incorrectly contains a “</” sequence (as part of “</EM>”) before the SCRIPT end tag:

    <SCRIPT type="text/javascript">
      document.write ("<EM>This won't work</EM>")
    </SCRIPT>

In JavaScript, this code can be expressed legally by hiding the ETAGO delimiter before an SGML name start character:

    <SCRIPT type="text/javascript">
      document.write ("<EM>This will work<\/EM>")
    </SCRIPT>

In Tcl, one may accomplish this as follows:

    <SCRIPT type="text/tcl">
      document write "<EM>This will work<\/EM>"
    </SCRIPT>

In VBScript, the problem may be avoided with the Chr() function:

    "<EM>This will work<" & Chr(47) & "EM>"

Attribute values

When script or style data is the value of an attribute (either style or the intrinsic event attributes), authors should escape occurrences of the delimiting single or double quotation mark within the value according to the script or style language convention. Authors should also escape occurrences of “&” if the “&” is not meant to be the beginning of a character reference.

  • ‘"’ should be written as “&quot;” or “&#34;”
  • ‘&’ should be written as “&amp;” or “&#38;”

Thus, for example, one could write:

 <INPUT name="num" value="0"
 onchange="if (compare(this.value, &quot;help&quot;)) {gethelp()}">
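A minimal Python sketch of this escaping, assuming the attribute value is always delimited by double quotation marks. The helper name `event_attribute` is hypothetical. Note that `html.escape` also rewrites single quotation marks as the hexadecimal character reference “&#x27;”.

```python
import html


def event_attribute(name, script):
    """Build an intrinsic event attribute with '&' and quotes escaped."""
    return f'{name}="{html.escape(script, quote=True)}"'


print(event_attribute("onchange", 'if (compare(this.value, "help")) {gethelp()}'))
# prints: onchange="if (compare(this.value, &quot;help&quot;)) {gethelp()}"
```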

B.3.3 SGML features with limited support

SGML systems conforming to [ISO8879] are expected to recognize a number of features that aren’t widely supported by HTML user agents. We recommend that authors avoid using all of these features.

B.3.4 Boolean attributes

Authors should be aware that many user agents only recognize the minimized form of boolean attributes and not the full form.

For instance, authors may want to specify:

<OPTION selected>

instead of

<OPTION selected="selected">
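A generator that follows this advice might emit boolean attributes in minimized form, as in the Python sketch below. The function `start_tag` and its parameters are hypothetical and only illustrate the serialization choice.

```python
import html


def start_tag(name, attrs=None, booleans=()):
    """Serialize a start tag, writing boolean attributes in minimized form."""
    parts = [name]
    for key, value in (attrs or {}).items():
        parts.append(f'{key}="{html.escape(value, quote=True)}"')
    # Boolean attributes are written as the bare name, e.g. "selected".
    parts.extend(booleans)
    return "<" + " ".join(parts) + ">"


print(start_tag("OPTION", booleans=["selected"]))
# prints: <OPTION selected>
```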

B.3.5 Marked Sections

Marked sections play a role similar to the #ifdef construct recognized by C preprocessors.

 <![INCLUDE[
 <!-- this will be included -->
 ]]>

 <![IGNORE[
 <!-- this will be ignored -->
 ]]>

SGML also defines the use of marked sections for CDATA content, within which “<” is not treated as the start of a tag, e.g.,

 <![CDATA[
 <an> example of <sgml> markup that is
 not <painful> to write with < and such.
 ]]>

The telltale sign that a user agent doesn’t recognize a marked section is the appearance of “]]>”, which is seen when the user agent mistakenly uses the first “>” character as the end of the tag starting with “<![”.
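Since many user agents do not recognize marked sections, an author may wish to check a document for them before publishing. The Python sketch below is one hypothetical way to do so: it looks for the “<![” opener followed by a keyword and a second “[”, and does not attempt to parse SGML.

```python
import re

# "<![", optional whitespace, a status keyword, optional whitespace, "[".
MARKED_SECTION_OPEN = re.compile(r"<!\[\s*([A-Za-z]+)\s*\[")


def find_marked_sections(markup):
    """Return the status keywords of all marked sections found in markup."""
    return MARKED_SECTION_OPEN.findall(markup)


print(find_marked_sections("<![INCLUDE[ text ]]> <![CDATA[ <an> example ]]>"))
# prints: ['INCLUDE', 'CDATA']
```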

B.3.6 Processing Instructions

Processing instructions are a mechanism to capture platform-specific idioms. A processing instruction begins with “<?” and ends with “>”:

<?instruction >

For example:

<?style tt = font courier>
<?page break>
<?experiment> ... <?/experiment>

Authors should be aware that many user agents render processing instructions as part of the document’s text.
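A simple check for processing instructions can help authors avoid text leaking into the rendered page. The following Python sketch is a hypothetical lint helper that relies only on the “<?” and “>” delimiters described in this section.

```python
import re

# "<?" followed by everything up to the first ">".
PROCESSING_INSTRUCTION = re.compile(r"<\?([^>]*)>")


def find_processing_instructions(markup):
    """Return the content of each processing instruction found in markup."""
    return PROCESSING_INSTRUCTION.findall(markup)


print(find_processing_instructions("<?style tt = font courier>\n<?page break>"))
# prints: ['style tt = font courier', 'page break']
```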

B.3.7 Shorthand markup

Some SGML SHORTTAG constructs save typing but add no expressive capability to the SGML application. Although these constructs technically introduce no ambiguity, they reduce the robustness of documents, especially when the language is enhanced to include new elements. Thus, while SHORTTAG constructs of SGML related to attributes are widely used and implemented, those related to elements are not. Documents that use them are conforming SGML documents, but are unlikely to work with many existing HTML tools.

The SHORTTAG constructs in question are the following:

  • NET tags: <name/.../
  • closed Start Tag: <name1<name2>
  • Empty Start Tag: <>
  • Empty End Tag: </>