Things That Don't Change -- SGML Basics

SGML and HTML documents are made up of elements (which are often loosely called tags) and content. Tags and elements are not quite the same thing. An element is the fundamental basis for a tag. For example, in a tag which marks up a hyperlink, such as

<A href="">Hyperlink Text</A>

the element is the letter A (which stands for 'Anchor'), the attribute is href (which stands for 'Hypertext REFerence'), the value of the attribute is whatever appears between the quotation marks, namely the address of the document that will load when the hyperlink is activated, and the tag is the element, its attributes and their values, contained within delimiters. The content of this tag is the words Hyperlink Text. Tag delimiters are often called "brackets": they are the < (open delimiter, or 'Less-Than') and > (end delimiter, or 'Greater-Than') symbols. Attribute values must be delimited by (contained within) either single ( ' ascii 039 ) or double ( " ascii 034 ) quotation marks*.

Most tags are used as "containers", in that the text to be rendered by the markup is contained between an open tag and an end tag. The end tag is recognizable by the forward slash, or "solidus" ( / ), which follows the open delimiter and precedes the element. End tags do not have attributes, and are not always required. The HTML 3.2 and HTML 4.0 Reference Specifications at the World Wide Web Consortium describe the precise usage of each element, including which elements require end tags.
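The parts of the hyperlink tag described above can be seen by handing it to a parser. What follows is only a minimal sketch, assuming Python and its standard html.parser module (neither is part of this tutorial); the class name TagAnatomy, the function describe and the address "page.html" are placeholders invented for the example. Note that this particular parser reports element names in lowercase.

```python
from html.parser import HTMLParser


class TagAnatomy(HTMLParser):
    """Record each open tag, piece of content and end tag in order."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # tag is the element name; attrs is a list of (attribute, value) pairs
        self.parts.append(("open tag", tag, attrs))

    def handle_data(self, data):
        # data is the content found between the open tag and the end tag
        self.parts.append(("content", data))

    def handle_endtag(self, tag):
        # end tags carry an element name but never any attributes
        self.parts.append(("end tag", tag))


def describe(markup):
    """Return the list of parts the parser found in the markup."""
    parser = TagAnatomy()
    parser.feed(markup)
    parser.close()
    return parser.parts


# "page.html" is a placeholder address used only for this example.
for part in describe('<A href="page.html">Hyperlink Text</A>'):
    print(part)
```

Run as written, this should list the open tag with its href attribute and value, then the content, then the end tag.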

In general, tags are case insensitive. A tag that says <BLOCKQUOTE> will do exactly the same thing as a tag that says <bLoCkQuOtE>, although if you choose the latter style you should plan on spending a lot more time typing. People often put tags in ALL UPPERCASE LETTERS to make them more visible; I generally make all my tags lowercase, to save time. Many HTML authoring tools have color-coding built in to identify clearly what in the document is a tag and what is not, and some even color-code different types of tags. An HTML authoring tool is not necessary to write successful web pages. They can be written in any basic text editor such as NotePad (for Windows), SimpleText (for Mac) or PICO (for Unix-like systems), or in more advanced programs such as MSWord or PageMaker that can output documents in a text-only (ASCII) format.
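Case insensitivity is easy to check with a parser. The sketch below assumes Python's standard html.parser module, which folds element names to lowercase before reporting them; NameCollector and tag_names are names made up for this illustration, not part of any HTML tool.

```python
from html.parser import HTMLParser


class NameCollector(HTMLParser):
    """Collect the element name of every open tag encountered."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # The parser has already converted the name to lowercase here.
        self.names.append(tag)


def tag_names(markup):
    """Return the element names of all open tags, in document order."""
    collector = NameCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


names = tag_names("<BLOCKQUOTE>one</BLOCKQUOTE><bLoCkQuOtE>two</bLoCkQuOtE>")
print(names)                  # both spellings yield the same element name
print(names[0] == names[1])
```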

Certain characters which can be typed from the keyboard have special meanings in HTML and their usage is reserved. These special characters include ampersands ( & ), tag delimiters ( < and > ), quotation marks ( " ) and the number or hash sign ( # also called an "octothorpe"). There are also some characters which cannot be typed directly from the keyboard, such as the copyright symbol ( © ), and the registered trademark symbol ( ® ). When using these characters in an HTML document, escape sequences or character entities are used. An escape sequence or character entity is delimited in an HTML document by an ampersand ( & - the open delimiter ) and a semicolon ( ; - the end delimiter ). An escape sequence is usually a set of alphabetic characters, and a character entity is usually the character's numeric code, either its ASCII code or (less commonly and currently unsupported) its UNICODE equivalent, preceded by the number or hash sign; both the escape sequence and the character entity are enclosed in delimiters. The escape sequence for the copyright symbol ( © ), for example, is &copy; and the character entity for the same symbol is &#169;. Character entities are available for codes up to 255: codes up to 127 are the standard ASCII characters, and the higher codes, such as 169 for the copyright symbol, come from the ISO 8859-1 (Latin-1) character set. They are extremely useful when authoring HTML documents about HTML. Character entities can be used to represent the entire text-content of a document if you're either very silly or very patient, but apart from that, such a use doesn't serve much purpose.
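To see the two notations side by side, here is a small sketch that assumes Python's standard html module (an assumption of this example, not something the tutorial relies on). The helper numeric_reference is a hypothetical name introduced only to show how a numeric character entity is built from a character's code.

```python
import html


def numeric_reference(character):
    """Build the numeric form for one character: ampersand, hash, code, semicolon."""
    return f"&#{ord(character)};"


# The named and numeric forms of the copyright symbol decode to the same character.
print(html.unescape("&copy;") == html.unescape("&#169;"))

# Building the numeric form from the character itself.
print(numeric_reference("\u00a9"))

# Escaping the reserved characters so that markup can be shown as text,
# which is what a page about HTML has to do.
print(html.escape('<A href="x">R&D</A>'))

# Representing an entire text as numeric entities: possible, but rarely useful.
encoded = "".join(numeric_reference(ch) for ch in "Hi")
print(encoded, html.unescape(encoded))
```

The escaped line is safe to place in the body of a page: a browser will display the tag as text instead of interpreting it.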

There are two ways of marking up content using basic HTML: Logical Styles and Physical Styles. A third way is also possible, the way that was used for the text formatting in this web site: Cascading Style Sheets. Note that a thorough understanding of basic HTML is necessary before attempting cascading style sheets.

Logical Styles define the style of the text in terms of its usage or meaning within the document. They do not define the appearance of the text; they define how the text functions in the page. Physical Styles, by contrast, define the way the text appears when it is displayed by the browser. Both logical and physical style tags are containers, meaning that they all have open and end forms.

Examples of Logical Styles are the standard Heading Styles, <H1> through <H6>; Unordered (<UL>) and Ordered (<OL>) Lists and their List Items (<LI>); and the Strong and Emphasis styles (<strong> and <em>). Other, less commonly used logical styles are <address>, <blockquote>, <samp>, <PRE>, and <cite>. SGML uses these elements to mark up sections of a document which might be used in research. For example, a person might use an automated tool to search an internet site for headings to make a table of contents, or for a listing of code examples used, or for references to the works of a certain author.
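The kind of automated tool mentioned above, one that searches a page for headings to make a table of contents, can be sketched in a few lines. This is only an illustration, assuming Python's standard html.parser module; HeadingCollector, table_of_contents and the sample markup are all invented for the example.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Gather (level, text) pairs for every heading in a document."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished headings, in document order
        self._level = None   # level of the heading currently open, if any
        self._text = []      # pieces of text seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        # Keep text only while a heading is open; nested tags such as
        # <EM> inside the heading still contribute their text.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def table_of_contents(markup):
    """Return the headings found in the markup as (level, text) pairs."""
    collector = HeadingCollector()
    collector.feed(markup)
    collector.close()
    return collector.headings


sample = "<H1>SGML Basics</H1><P>Intro text</P><H2>Logical <EM>Styles</EM></H2>"
for level, text in table_of_contents(sample):
    print("  " * (level - 1) + text)
```

Because the headings are marked up by what they are, not by how they look, the tool needs no knowledge of fonts or sizes to find them. The same approach would work for <cite> or <samp>.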

Browsers (also called User Agents, or UAs) are "rated" on their levels of conformance to the standards set by SGML and HTML. All browsers are required to be Level 0 Conformant, which means that they are capable of correctly rendering Heading Styles, Lists and Anchors (hyperlinks). The other levels of conformance and their respective functionalities are as follows: Level 1, capable of correctly rendering Images, Emphasis and Text Highlighting; Level 2, capable of correctly rendering Forms and Character Definitions (as defined by such mechanisms as the <FONT> tag); Level 3, capable of correctly rendering Tables, Figures, etc. (proposed as extensions to RFC 1866, the HTML 2.0 standard); and Level 4, capable of correctly rendering Mathematical formulæ, which are proposed as an extension to RFC 1866 but do not yet exist in practice.