Document retrieved from a search engine cache. Original document address: http://oit.cmc.msu.ru/rfc/Classified/www/html/html-2.0/htmlsp20.txt
Date modified: Tue Mar 12 16:59:52 1996
Date indexed: Mon Oct 1 22:33:34 2012






HTML Working Group                                        T. Berners-Lee
INTERNET-DRAFT                                               D. Connolly
                                                                 MIT/W3C
Expires in six months                                        May 4, 1995


Hypertext Markup Language 2.0

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents at
any time. It is inappropriate to use Internet-Drafts as reference mate-
rial or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow Direc-
tories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au
(Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West
Coast).

Distribution of this document is unlimited. Please send comments
to the HTML working group (HTML-WG) of the Internet Engineering Task
Force (IETF) at html-wg@oclc.org. Discussions of the group are archived
at http://www.acl.lanl.gov/HTML_WG/archives.html.

Abstract

The Hypertext Markup Language (HTML) is a simple markup language
used to create hypertext documents that are platform independent. HTML
documents are SGML documents with generic semantics that are appropriate
for representing information from a wide range of domains. HTML markup
can represent hypertext news, mail, documentation, and hypermedia; menus
of options; database query results; simple structured documents with in-
lined graphics; and hypertext views of existing bodies of information.

HTML has been in use by the World Wide Web (WWW) global information
initiative since 1990. This specification roughly corresponds to the
capabilities of HTML in common use prior to June 1994. HTML is an appli-
cation of ISO Standard 8879:1986 Information Processing Text and Office
Systems; Standard Generalized Markup Language (SGML).





Berners-Lee, Connolly FORMFEED[Page 1]





INTERNET DRAFT May 1995


The "text/html; version=2.0" Internet Media Type (RFC 1590)
and MIME Content Type (RFC 1521) is defined by this specification.



















































1. Introduction

The HyperText Markup Language (HTML) is a simple data
format used to create hypertext documents that are portable from one
platform to another. HTML documents are SGML documents with generic
semantics that are appropriate for representing information from a wide
range of domains.



1.1. Scope

HTML has been in use by the World-Wide Web (WWW) global
information initiative since 1990. This specification corresponds to
the capabilities of HTML in common use prior to June 1994 and referred
to as ``HTML 2.0''.

HTML is an application of ISO Standard 8879:1986 Information Pro-
cessing Text and Office Systems; Standard Generalized Markup Language
(SGML). The HTML Document Type Definition (DTD) is a formal definition
of the HTML syntax in terms of SGML.

This specification also defines HTML as an Internet Media
Type[IMEDIA] and MIME Content Type[MIME] called `text/html', or
`text/html; version=2.0'. As such, it defines the semantics of the HTML
syntax and how that syntax should be interpreted by user agents.



1.2. Conformance

This specification governs the syntax of HTML documents and the
behaviour of HTML user agents.



Documents

A document is a conforming HTML document only if:


o It is a conforming SGML document.


o It conforms to the application conventions in this specification.
For example, the value of the `HREF' attribute of the `A' element
must conform to the URI syntax.


-----------
There are a number of syntactic idioms that are not supported or are
supported inconsistently in some historical user agent implementations.
These idioms are called out in notes like this throughout this
specification. HTML documents should not contain these idioms, at least
until such time as support for them is widely deployed.


The HTML DTD defines a standard HTML document type and several
variations, based on feature test entities:


HTML.Recommended
Certain features of the language are necessary for compatibility
with widespread usage, but they may compromise the structural
integrity of a document. This feature test entity enables a more
prescriptive document type definition that eliminates those
features.

For example, in order to preserve the structure of a document, an
editing user agent may translate HTML documents to the recommended
subset, or it may require that the documents be in the recommended
subset for import.


HTML.Deprecated
Certain features of the language are necessary for compatibility
with earlier versions of the specification, but they tend to be
used and implemented inconsistently, and their use is deprecated.
This feature test entity enables a document type definition that
eliminates these features.

Documents generated by translation software or editing software
should not contain these idioms.



User Agents

An HTML user agent conforms to this specification if:


o It parses the characters of an HTML document into data characters
and markup as per [SGML].


o It behaves identically for documents whose parsed token sequences
are identical.

For example, comments and the whitespace in tags disappear during
tokenization, and hence they do not influence the behaviour of
conforming user agents.





o It allows the user to traverse (or at least attempt to traverse,
resources permitting) all hyperlinks in an HTML document.


o It allows the user to express all form field values specified in an
HTML document and to (attempt to) submit the values as requests to
information services.
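
The token-sequence condition in the list above can be illustrated with a
small sketch. It assumes Python's standard `html.parser` module as a
stand-in tokenizer; that module is a lenient HTML parser, not the SGML
parser this specification refers to, and the names `TokenCollector` and
`tokenize` are hypothetical helpers introduced only for this illustration.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collects start tags, end tags, and data; comments are not recorded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases element names and drops white space in tags.
        self.tokens.append(("start", tag, tuple(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        self.tokens.append(("data", data))


def tokenize(text):
    """Return the token sequence for a piece of markup (illustrative only)."""
    collector = TokenCollector()
    collector.feed(text)
    collector.close()
    return collector.tokens


# Same tokens, different surface form: extra white space inside tags,
# different letter case in element names, and a comment.
with_noise = "<P >Some <EM\n>text</EM><!-- a comment --></P>"
without_noise = "<p>Some <em>text</em></p>"

same = tokenize(with_noise) == tokenize(without_noise)
print(same)
```

Under these assumptions, a user agent that works only from the token
sequence cannot tell the two inputs apart, which is the behaviour the
condition asks for.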



@@Levels?































-----------
In the interest of robustness and extensibility, there are a
number of widely deployed conventions for handling non-
conforming documents. See `Undeclared Markup Error Han-
dling' for details.





2. HTML as an Application of SGML

HTML is an application of ISO Standard 8879:1986 - Standard Gener-
alized Markup Language (SGML). SGML is a system for defining structured
document types and markup languages to represent instances of those doc-
ument types[SGML]. The public text -- DTD and SGML declaration -- of
the HTML document type definition are provided in `HTML Public Text'.

The term HTML refers to both the document type defined here and the
markup language for representing instances of this document type.



2.1. SGML Documents

An HTML document is an SGML document; that is, a set of entities,
including the document entity, which is the text entity in which
parsing begins. The first production of the SGML grammar separates
an SGML document into three parts: an SGML declaration, a prologue,
and an instance.

For the purposes of this specification, the prologue is a DTD.
This DTD describes another grammar: the start symbol is given in the
doctype declaration; the terminals are data characters and tags, and
the productions are determined by the element declarations. The
instance must conform to the DTD, that is, it must be in the language
defined by this grammar.

The SGML declaration determines the lexicon of the grammar. It
specifies the document character set, which determines a character
repertoire that contains all characters that occur in all text entities
in the document, and the character numbers associated with those charac-
ters.

The SGML declaration also specifies the syntax character set of the
document, and a few other parameters that bind the abstract syntax of
SGML to a concrete syntax. This concrete syntax determines how each
text entity is mapped to a sequence of terminals in the grammar of the
prologue.

For example, consider the following document:



<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">
<title>Parsing Example</title>
<p>Some text. <em>&#42;wow&#42;</em>








An HTML user agent should use the SGML declaration that is given in
`SGML Declaration for HTML'. It specifies ISO-8859-1 as the document
character set, so that the markup `&#42;' represents an asterisk
character.

The instance above is regarded as the following sequence of termi-
nals:


1. TITLE start tag

2. data characters: ``Parsing Example''

3. TITLE end tag

4. P start tag

5. data characters: ``Some text. ''

6. EM start tag

7. data characters: ``*wow*''

8. EM end tag

The start symbol of the DTD grammar is HTML, and the productions
are given in the public text identified by `-//IETF//DTD HTML 2.0//EN'
(`HTML DTD'). Hence the terminals above parse as:



HTML
 |
 \-HEAD,      BODY
    |          |
    \-TITLE    \-P
       |         |
       |         \-<P>,"Some text. ",EM
       |                             |
       |                             \-<EM>,"*wow*",</EM>
       \-<TITLE>,"Parsing Example",</TITLE>
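
The terminal sequence listed above can be reproduced with a short sketch.
It assumes Python's standard `html.parser` module in place of a real SGML
parser, and it assumes the example document consists of a doctype
declaration, a TITLE element, and a P element containing an EM element, as
the list of terminals implies. `TerminalLister` and `list_terminals` are
hypothetical names used only here.

```python
from html.parser import HTMLParser

# Assumed text of the example document, inferred from the terminal list.
DOCUMENT = (
    '<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">\n'
    "<title>Parsing Example</title>\n"
    "<p>Some text. <em>&#42;wow&#42;</em>\n"
)


class TerminalLister(HTMLParser):
    """Records tags and data strings in the order they are recognized."""

    def __init__(self):
        # convert_charrefs=True turns &#42; into '*' before handle_data runs.
        super().__init__(convert_charrefs=True)
        self.terminals = []

    def handle_starttag(self, tag, attrs):
        self.terminals.append(tag.upper() + " start tag")

    def handle_endtag(self, tag):
        self.terminals.append(tag.upper() + " end tag")

    def handle_data(self, data):
        # Simplification: line breaks between tags are skipped here; SGML
        # has its own rules for record boundaries.
        if data.strip():
            self.terminals.append("data characters: " + repr(data))


def list_terminals(text):
    lister = TerminalLister()
    lister.feed(text)
    lister.close()
    return lister.terminals


for number, terminal in enumerate(list_terminals(DOCUMENT), start=1):
    print(f"{number}. {terminal}")
```

The doctype declaration produces no terminal in this sketch; it only names
the DTD whose grammar the terminals are then parsed against.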












2.2. HTML Lexical Syntax

The syntax character set for all HTML documents is ISO-646-IRV.
A minimally conforming HTML user agent must support the SGML
declaration in `SGML Declaration for HTML', which specifies
ISO Latin 1 (@@full name) as the document character set; it may
support other SGML declarations, in particular, SGML declarations with
other document character sets.

A complete discussion of SGML parsing, e.g. the mapping of a
sequence of characters to a sequence of tags and data, is left to the
SGML standard[SGML]. This section is only a summary.



Data Characters

Any sequence of characters that does not constitute markup
(see ``Delimiter Recognition,'' section @@@ of [SGML]) is mapped
directly to a string of data characters. Some markup also maps to data
character strings. Numeric character references also map to
single-character strings, via the document character set. Each reference
to one of the general entities defined in the HTML DTD also maps to a
single-character string.

For example,



abc&lt;def => "abc","<","def"
abc&#60;def => "abc","<","def"



Note that the terminating semicolon is only necessary when the
character following the reference would otherwise be recognized as
markup:



abc &lt def => "abc ","<"," def"
abc &#60 def => "abc ","<"," def"



And note that an ampersand is only recognized as markup when it is
followed by a letter or number:



abc & lt def => "abc & lt def"
abc & 60 def => "abc & 60 def"
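
The mappings above can be checked with a brief sketch. It assumes Python's
standard `html.unescape` function, which follows HTML5 reference rules
rather than SGML's, but agrees with this section for these inputs. It
returns one joined string instead of the separate data character strings
shown above. `decode_references` is a hypothetical wrapper name.

```python
import html


def decode_references(text):
    """Replace entity and numeric character references with their characters."""
    return html.unescape(text)


examples = [
    "abc&lt;def",     # entity reference with terminating semicolon
    "abc&#60;def",    # numeric character reference with semicolon
    "abc &lt def",    # semicolon omitted; the following space ends the name
    "abc &#60 def",   # semicolon omitted; the following space ends the number
    "abc & lt def",   # '&' followed by a space is not markup
    "abc & 60 def",   # likewise
]

for source in examples:
    print(f"{source!r} => {decode_references(source)!r}")
```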







A useful technique for translating plain text to HTML is to replace
each '<', '&', and '>' by an entity reference or numeric character ref-
erence as follows:



           ENTITY      NUMERIC
CHARACTER  REFERENCE   CHAR REF   CHARACTER DESCRIPTION
&          &amp;       &#38;      Ampersand
<          &lt;        &#60;      Less than
>          &gt;        &#62;      Greater than
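
The plain-text translation described above can be sketched as follows.
This is a minimal illustration using the entity references from the table;
the function name `escape_text` is hypothetical.

```python
def escape_text(text):
    """Replace '&', '<', and '>' with entity references.

    The ampersand is handled first; otherwise the '&' introduced by the
    later replacements would itself be escaped a second time.
    """
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


print(escape_text("if a < b && b > c"))
```

Python's standard library offers `html.escape(text, quote=False)` for the
same three characters; the explicit version is shown only to make the
ordering visible.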



-----------
There are SGML mechanisms, CDATA and RCDATA, to allow most
`<', `>', and `&' characters to be entered without the use
of entity references. Because these features tend to be
used and implemented inconsistently, and because they conflict
with techniques for reducing HTML to 7 bit ASCII for
transport, they are not used in this version of the HTML
DTD.



Tags

Tags delimit elements such as headings, paragraphs, lists, character
highlighting and links. Most HTML elements are identified in a document
as a start tag, which gives the element name and attributes, followed
by the content, followed by the end tag. Start tags are delimited
by `<' and `>'; end tags are delimited by `</' and `>'. An example is:



<H1>This is a Heading</H1>



Some elements only have a start tag without an end tag. For example,
to create a line break, you use the `<BR>' tag. Additionally, the
end tags of some other elements, such as Paragraph (`P'), List Item
(`LI'), Definition Term (`DT'), and Definition Description (`DD')
elements, may be omitted.

The content of an element is a sequence of data character strings
and nested elements. Some elements, such as anchors, cannot be nested.
Anchors and character highlighting may be put inside other constructs.
See the HTML DTD, `HTML DTD' for full details.



Names

A name consists of a letter followed by up to 71 letters, digits,
periods, or hyphens. Element names are not case sensitive, but entity
names are. For example, `<BLOCKQUOTE>', `<BlockQuote>', and
`<blockquote>' are equivalent, whereas `&amp;' is different from `&AMP;'.

In a start tag, the element name must immediately follow the tag
open delimiter `<'.
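
The rule for names can be written as a pattern. This sketch assumes that
"letter" means an ASCII letter; the helper names `is_name` and
`same_element_name` are hypothetical.

```python
import re

# One letter, then at most 71 further letters, digits, periods, or hyphens,
# for a maximum total length of 72 characters.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.-]{0,71}")


def is_name(candidate):
    """Return True if the whole string is a valid name."""
    return NAME_PATTERN.fullmatch(candidate) is not None


def same_element_name(left, right):
    """Element names are compared without regard to case."""
    return left.upper() == right.upper()


print(is_name("BLOCKQUOTE"), is_name("1st"))
print(same_element_name("BLOCKQUOTE", "BlockQuote"))
# Entity names are case sensitive, so a plain comparison applies to them.
print("amp" == "AMP")
```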



Attributes

In a start tag, white space and attributes are allowed between the
element name and the closing delimiter. An attribute typically
consists of an attribute name, an equal sign, and a value, though
some attributes may be just a value. White space is allowed around the
equal sign.

The value of the attribute may be either:


o A string literal, delimited by single quotes or double quotes and
not containing any occurrences of the delimiting character.


o A name token (a sequence of letters, digits, periods, or hyphens)
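
The attribute syntax described above can be exercised with a short sketch.
It assumes Python's standard `html.parser` module, which is a lenient HTML
parser rather than an SGML parser; `StartTagReader` and `read_start_tags`
are hypothetical names, and the second input string is an illustrative
variation, not an example from this specification.

```python
from html.parser import HTMLParser


class StartTagReader(HTMLParser):
    """Records the element name and attribute list of each start tag."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes removed.
        self.start_tags.append((tag, attrs))


def read_start_tags(markup):
    reader = StartTagReader()
    reader.feed(markup)
    reader.close()
    return reader.start_tags


# A string literal delimited by single quotes.
print(read_start_tags("<img src='http://host/dir/file.gif'>"))

# White space around the equal sign, a double-quoted literal, and an
# unquoted name token as the value.
print(read_start_tags('<IMG SRC = "http://host/dir/file.gif" WIDTH=100>'))
```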

In this example, img is the element name, `src' is the attribute
name, and `http://host/dir/file.gif' is the attribute value:



<img src='http://host/dir/file.gif'>


-----------
The SGML declaration for HTML specifies SHORTTAG YES, which
means that there are other valid syntaxes for tags, such as
NET tags, `<EM/.../'; empty start tags, `<>'; and empty
end tags, `</>'. Until support for these idioms is widely
deployed, their use is strongly discouraged.
Some historical implementations consider any occurrence of
the `>' character to signal the end of a tag. For ompati-





A useful technique for computing an attribute value literal for a
given string is to replace each quote and space character by an entity
reference or numeric character reference as follows:



           ENTITY      NUMERIC
CHARACTER  REFERENCE   CHAR REF   CHARACTER DESCRIPTION
TAB                    &#9;       Tab
LF                     &#10;      Line Feed
CR                     &#13;      Carriage Return
SP                     &#32;      Space
"          &quot;      &#34;      Quotation mark
&          &amp;       &#38;      Ampersand



For example:



"First &quot;real&quot; example"




Note that the SGML declaration in section 13.3 limits the length of
an attribute value to 1024 characters.
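
The technique for computing an attribute value literal can be sketched as
follows. The replacements come from the table above. The function name
`attribute_value_literal` is hypothetical, and counting the 1024-character
limit against the unescaped value is an assumption of this sketch, not
something this section states.

```python
MAX_ATTRIBUTE_VALUE_LENGTH = 1024  # limit quoted in the note above

# Ampersand comes first so that the '&' in the other references is not
# escaped a second time.
REPLACEMENTS = [
    ("&", "&amp;"),
    ('"', "&quot;"),
    ("\t", "&#9;"),
    ("\n", "&#10;"),
    ("\r", "&#13;"),
    (" ", "&#32;"),
]


def attribute_value_literal(value):
    """Return a double-quoted literal for the given attribute value."""
    # Assumption: the limit is checked against the value before escaping.
    if len(value) > MAX_ATTRIBUTE_VALUE_LENGTH:
        raise ValueError("attribute value exceeds 1024 characters")
    for character, reference in REPLACEMENTS:
        value = value.replace(character, reference)
    return '"' + value + '"'


print(attribute_value_literal('First "real" example'))
```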

Attributes such as ISMAP and COMPACT may be written using a minimized
syntax. The markup: