What is the difference between HTML and XHTML?


When web standards such as XHTML and CSS started to become widely known, many misconceptions circulated. One of them was the idea that XHTML was fundamentally different from the good old HTML we already knew. For many beginners, XHTML meant CSS and W3C-valid code, while HTML meant table layouts and tag soup. That view is, of course, wrong.

XHTML 1.0 is a reformulation of HTML 4

XHTML 1.0 is a reformulation of HTML 4 in XML syntax. The only difference is therefore the syntax, which is stricter in XML (and thus in XHTML 1.0) than in HTML 4.

So, in XHTML:

  •     Every element that is opened must be closed, and so-called "empty" elements are written with a trailing slash (example: <br />).
  •     Element and attribute names are written in lowercase.
  •     Attribute values are enclosed in 'single quotes' or "double quotes".
  •     Every attribute must have a value (no minimized attributes such as checked, which must be written checked="checked").
  •     Elements must be properly nested
    (<strong><span>content</span></strong>, not
    <strong><span>content</strong></span>).
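Most of these rules are simply XML well-formedness rules, so any XML parser will enforce them. As a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module (the `is_well_formed` helper is hypothetical), here is a check that rejects unclosed, overlapping, minimized and unquoted markup. Note that it only checks well-formedness, not validity against a DTD: a tag written consistently in uppercase, such as `<STRONG>…</STRONG>`, would pass here even though XHTML forbids it; catching that is the validator's job.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if an (X)HTML fragment is well-formed XML.

    The fragment is wrapped in a dummy <div> so that several sibling
    elements (or bare text) still form a single root element.
    """
    try:
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False

samples = [
    'line one<br />line two',                     # empty element closed with a slash
    'line one<br>line two',                       # <br> never closed
    '<strong><span>content</span></strong>',      # properly nested
    '<strong><span>content</strong></span>',      # overlapping elements
    '<input type="checkbox" checked="checked" />',
    '<input type="checkbox" checked />',          # minimized attribute
    '<td width=50>cell</td>',                     # unquoted attribute value
]
for fragment in samples:
    print(is_well_formed(fragment), fragment)
```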

That's all! As you can see, there is no mention of tables, CSS, semantics or accessibility. These concerns already exist in HTML.

At the extreme, you can write valid HTML 4 code that takes full advantage of CSS, or build table-based tag soup under an XHTML 1.0 Strict doctype!

Why use one or the other?


To begin with, remember that using XHTML 1.0 is in no way mandatory. One can also question the right way to declare and serve XHTML. But in essence, HTML 4 and XHTML 1.0 are equivalent in terms of features (elements and attributes, main "grammatical" rules).

If their features are equivalent, why use XHTML 1.0 rather than HTML 4? An argument often put forward is that XHTML makes HTML easier to learn, for two reasons:
  •     because it is stricter, it avoids browsers misinterpreting the code (if the markup is ambiguous, what should the browser do?);
  •     because there are fewer valid ways to write the same thing (no tags alternating between lowercase and uppercase, no closing tags present one time and missing the next...), the syntax is easier to master.


The HTML4.01, XHTML1.0 and HTML5 DTDs: which doctype to choose?


If you have set out to bring your website into line with the W3C (X)HTML and CSS standards, or are about to, you cannot miss having to place, at the top of each of your pages, a strange piece of code like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

This is called a Document Type Declaration (DTD), also known as the "doctype". Here is an introduction to this notion, followed by a brief description of the 6 DTDs you will most often have to choose from. The choice of DTD is covered in two successive versions: a first, detailed one, and a second, very short one, if you want an immediate answer to the question: what do I put at the beginning of my web page?

Warning:

  •  These explanations apply solely to the goal of "making an (X)HTML document that will be treated as HTML", not as XML;
  • This approach meets most current needs: we will not cover the XHTML1.1 DTDs and beyond, which are much more complex to implement and currently serve very specific needs;
  • Being pragmatic, these explanations involve some simplifications;
  • If you want to write HTML5, look no further and use the simplified doctype: <!DOCTYPE html>

DTD: what is it? What is it for?

 The doctype and the Document Type Definition


The "doctype", often referred to as DTD (for Document Type Declaration), indicates which writing rules the code of an HTML or XHTML page obeys. These rules are actually contained in rather special documents, the Document Type Definitions (also abbreviated DTD), written in a language that may look a little barbaric and hosted on the W3C website (for the ones that interest us here). If you are curious and want to see an example, download this one. But be warned, you may be surprised: DTDs are meant to be read primarily by software, not by human beings.

Who uses the DTD?


Contrary to what one might think, current web browsers do not use it to understand the structure of an (X)HTML page: they "read" (X)HTML code without consulting the DTD to decipher it, relying only on the "rules" built into their own rendering engine.

On the other hand, the DTD is indispensable to (X)HTML validators such as the W3C's, which need it to know which rules the document is supposed to obey. Admittedly, it is possible to force validation without a DTD, using the extended interface of the W3C validator.

But a document that is valid yet has no DTD will cause a rendering problem in modern browsers, which we examine below: CSS rendering differs depending on whether a DTD is present or not, because of the "doctype switching" mechanism.

 Doctype switching: an unexpected but important detail for rendering


DTDs do still influence the behavior of modern browsers. They were not designed for this, but browser makers created this "trick" to distinguish:

  •  web pages coded the old way, i.e. with no regard for a standardized universal format, optimized for one particular browser and dependent on proprietary rendering modes;
  • and valid (X)HTML web pages, which in principle all browsers can interpret in the same way.

The browser will not process the HTML and CSS code of these two kinds of pages in the same way. Why? Because if it applied modern (X)HTML and CSS processing rules to old-fashioned pages, the result could be catastrophic:

  • some proprietary code specific to browser X might no longer be recognized in the same way by that browser, or emulated by other browsers. Every browser has to rely on its own error-handling mode to render, as consistently as possible, documents that do not follow sufficiently precise rules (the HTML specifications do not exhaustively define how such errors should be handled);
  • the way CSS styles are applied differs between standards-compliant and older implementations.

So it's a "simple" (sic) compatibility problem. Concretely, Internet Explorer for Windows since version 6.0, IE5 for Mac, Opera, Firefox, Safari, etc. simply sort DTDs into at least two categories, which can be summarized as:

  •     incomplete, incorrect or outdated DTDs, etc. (or no DTD at all): the page is presumed to be coded "the old way", and rendering is done in "quirks" mode, compatible with each browser's old implementations;
  •     complete, recent DTDs (shown below): the page is presumed to be coded according to the standard indicated by the DTD, and rendering is done in "strict" (standards) mode, conforming to the standards in force.
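The sorting described above can be sketched as a small classifier. This is a simplified illustration, not any browser's actual code: it follows a small subset of the rules later written down in the WHATWG HTML specification (whose list of "quirky" public identifiers is far longer), and it only understands the `<!DOCTYPE html>` and `PUBLIC` forms shown in this article. It also reflects a nuance the two-category summary glosses over: the transitional and frameset DTDs trigger an intermediate "almost standards" mode. The function and constant names are hypothetical.

```python
import re
from typing import Optional

# <!DOCTYPE name>, optionally followed by PUBLIC "public id" and an optional "system id".
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(\w+)(?:\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?)?\s*>',
    re.IGNORECASE,
)

# A few legacy public identifiers that force quirks mode (partial list).
QUIRKS_PREFIXES = (
    "-//ietf//dtd html 2.0//",
    "-//w3c//dtd html 3.2 final//",
    "-//w3c//dtd html 4.0 transitional//",
    "-//w3c//dtd html 4.0 frameset//",
)
HTML401_LOOSE = ("-//w3c//dtd html 4.01 transitional//", "-//w3c//dtd html 4.01 frameset//")
XHTML10_LOOSE = ("-//w3c//dtd xhtml 1.0 transitional//", "-//w3c//dtd xhtml 1.0 frameset//")

def rendering_mode(doctype: Optional[str]) -> str:
    """Simplified guess at the rendering mode chosen for a given doctype."""
    if not doctype:
        return "quirks"  # no doctype at all
    match = DOCTYPE_RE.fullmatch(doctype.strip())
    if match is None or match.group(1).lower() != "html":
        return "quirks"  # not understood by this sketch, or wrong root name
    public_id, system_id = match.group(2), match.group(3)
    if public_id is None:
        return "standards"  # <!DOCTYPE html>
    pid = public_id.lower()
    if pid.startswith(QUIRKS_PREFIXES):
        return "quirks"
    if pid.startswith(HTML401_LOOSE):
        # The URL (system identifier) decides between quirks and almost standards.
        return "almost standards" if system_id else "quirks"
    if pid.startswith(XHTML10_LOOSE):
        return "almost standards"
    return "standards"  # e.g. the HTML 4.01 and XHTML 1.0 Strict DTDs
```

Note how leaving out the URL of an HTML 4.01 Transitional doctype is enough to fall back into quirks mode: this is exactly the kind of "incomplete" DTD mentioned above.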

In practice, should doctype switching be taken into account?

No. That is to say:

  •     it can help you solve the rendering of old pages whose code you do not want to correct;
  •     but do not deliberately use a DTD that switches the browser into quirks mode for a new site, a new page, or content you are bringing up to standards.

Indeed, this mode is designed to render old pages that cannot be updated for obvious cost reasons. It was not designed for creating new ones, except in the very special case where you are, for example, totally dependent on Microsoft proprietary implementations: this concerns "key account" customers, which you are probably not. Unlike them, you most likely have the opportunity to review your code without the cost of standardization being prohibitive, or to adopt good new coding practices without the incompatibility between your new pages and the old ones becoming an unmanageable problem.

As long as we can produce valid code, we will always use correct, complete DTDs and the browsers' strict rendering mode, adapting the CSS presentation accordingly: rather than adapting the code or the DTD to a choice of CSS properties, we take the opposite approach, first choosing the structure and ensuring validity before taking care of presentation.

Concretely, if for example you find that your layout relies on Microsoft's non-standard CSS box model, do not play with the DTD to push Internet Explorer into quirks mode (where it applies this proprietary model); stick with one of the DTDs below and redo your presentation according to the CSS 2.1 box model. Your design will become more robust, and you will gain skills.
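To make the difference concrete, here is the arithmetic for a box declared with `width: 200px; padding: 10px; border: 5px`. A minimal sketch, assuming equal left and right padding and border and ignoring margins; the function name is hypothetical. In the CSS 2.1 model, `width` is the content width and padding and border are added to it; in Internet Explorer's quirks-mode model, `width` already includes them.

```python
from typing import Tuple

def box_widths(width: int, padding: int, border: int, ie_quirks: bool = False) -> Tuple[int, int]:
    """Return (content width, total width without margins) in px.

    Assumes the same padding and border on the left and right sides.
    """
    sides = 2 * (padding + border)
    if ie_quirks:
        # Microsoft's model: 'width' already includes padding and border.
        return max(0, width - sides), width
    # CSS 2.1 model: 'width' is the content width; padding and border are added.
    return width, width + sides

print(box_widths(200, 10, 5))                  # (200, 230)
print(box_widths(200, 10, 5, ie_quirks=True))  # (170, 200)
```

The same declaration therefore gives a box 30px wider in standards mode, which is why a layout tuned for the proprietary model should be recalculated rather than kept alive by a quirks-mode doctype.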

If it is impossible, for one reason or another, to produce valid code (mandatory presence of a proprietary element), and only in this case, do not put a DTD. For example, if you use a <marquee> element, do not add a DTD and then resort to some trick to hide your <marquee> from the validator. That DTD, and the purely formal validity it gives you, will be of no use.

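Lastly, the 6 DTDs listed below all make modern browsers render in their standards-compliant mode, even though they are not all called "strict". (Strictly speaking, the transitional and frameset variants trigger an "almost standards" mode, which differs from full standards mode only in how images inside table cells are sized.)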

Choose your DTD, detailed version

There are currently 6 DTDs for what we want to do here. No more, no less. In practice, two of them are enough to meet most needs. What sets them apart? Essentially:

  • Different HTML syntax rules: you will not write your markup the same way;
  •  A slightly different set of tags across the three categories transitional, strict and frameset, the main difference being the tags used only to create text presentation effects. These effects can be handled more simply and flexibly with CSS styles, especially since these are the easiest CSS properties for a beginner (text alignment, border appearance, italics, colors, etc.).

So it is up to you to choose according to the constraints you are willing to accept when writing your code, knowing that accepting a more constraining grammar (XHTML) will not make the task harder, quite the opposite: you will of course have to be more careful when writing your code or choosing your editing tool, but you will more than make up for this effort by producing code whose interpretation (and therefore rendering) is unambiguous.

Here are the main criteria for choosing between these different DTDs:

HTML5:

<!DOCTYPE html>

No mystery here: the new HTML5 doctype is lighter and simpler. It does not send browsers down the slippery path of quirks mode.

HTML4.01 transitional


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
   "http://www.w3.org/TR/html4/loose.dtd">

  •     the document is HTML: it cannot be treated as XML;
  •     no proprietary tags (<marquee>, <embed>, etc.);
  •     closing the dt, dd, p, li, thead, tfoot, colgroup, tr, th and td tags is optional (area and col are empty elements, with no closing tag at all);
  •     tags can be written in upper or lower case;
  •  for HTML attribute values, the quotation marks can be omitted if the value contains only letters (a-z and A-Z), digits (0-9), hyphens (-), periods (.), underscores (_) and colons (:);
  • attributes can be minimized: we write <OPTION selected> instead of <OPTION selected="selected">;
  • the following presentation elements and attributes are allowed:
        BASEFONT and FONT elements;
        CENTER, U, STRIKE and S elements;
        ALINK, BACKGROUND, BGCOLOR, LINK, VLINK and TEXT attributes of the BODY element;
        BGCOLOR, HEIGHT, NOWRAP and WIDTH attributes of table elements;
        BORDER, HSPACE and VSPACE attributes of images and objects;
        CLEAR attribute of BR line breaks; NOSHADE, SIZE and WIDTH attributes of HR separator lines;
        COMPACT and TYPE attributes of lists and list items, and START and VALUE attributes of numbered lists;
        WIDTH attribute of the PRE element;
  •     the target attribute of links is allowed;
  •     IFRAME elements are allowed (but not FRAMESET or FRAME).
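The quoting rule in this list can be expressed as a one-line check. This is a sketch based on the character set the HTML 4.01 specification allows in unquoted values (letters, digits, hyphens, periods, underscores and colons); the function name is hypothetical. Anything else, including an empty value, needs quotes.

```python
import re

# Characters HTML 4.01 allows in an unquoted attribute value:
# letters, digits, hyphens, periods, underscores and colons.
UNQUOTED_VALUE = re.compile(r"[A-Za-z0-9._:-]+")

def may_omit_quotes(value: str) -> bool:
    """True if HTML 4.01 allows writing this attribute value without quotes."""
    return UNQUOTED_VALUE.fullmatch(value) is not None

for value in ["left", "50", "my-class_1.2:x", "50%", "two words", "a,b", ""]:
    print(repr(value), may_omit_quotes(value))
```

Since XHTML requires quotes around every value, always quoting is the simplest habit anyway, and it works under both DTD families.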

Strict HTML4.01

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">

As in HTML4.01 transitional:
  •     the document is HTML: it cannot be treated as XML;
  •     no proprietary tags (<marquee>, <embed>, etc.);
  •     closing the dt, dd, p, li, thead, tfoot, colgroup, tr, th and td tags is optional (area and col are empty elements, with no closing tag at all);
  •     tags can be written in upper or lower case;
  •     for HTML attribute values, the quotation marks can be omitted if the value contains only letters (a-z and A-Z), digits (0-9), hyphens (-), periods (.), underscores (_) and colons (:);
  •     attributes can be minimized: we write <OPTION selected> instead of <OPTION selected="selected">.

But, unlike HTML4.01 transitional:
  •     the presentation elements and attributes listed for the transitional DTD are no longer allowed; they must be replaced by CSS styles;
  •     the target attribute of links is not allowed;
  •     IFRAME elements are not allowed (nor are FRAMESET and FRAME).

HTML4.01 frameset

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
   "http://www.w3.org/TR/html4/frameset.dtd">

The rules are the same as for HTML4.01 transitional, but the BODY element gives way to the FRAMESET element, which contains the FRAME elements (BODY only survives inside NOFRAMES).

XHTML1.0 transitional:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The document can be treated as HTML (which is what you will do in practice) but also as XML (which is more complicated, and which Internet Explorer cannot handle correctly in this case).

The syntax rules differ from those of HTML:

  •     proprietary tags are not allowed;
  •     all tags, without exception, must be closed;
  •     all tags and their attributes must be written in lowercase;
  •     attributes can no longer be minimized: you cannot write <option selected>, only <option selected="selected">;
  •     quotation marks are required around all attribute values.
But, just like in HTML4.01 transitional (apart from being written in lowercase):

  • the following presentation elements and attributes are allowed:
        * basefont and font elements;
        * center, u, strike and s elements;
        * alink, background, bgcolor, link, vlink and text attributes of the body element;
        * bgcolor, height, nowrap and width attributes of table elements;
        * border, hspace and vspace attributes of images and objects;
        * clear attribute of br line breaks; noshade, size and width attributes of hr separator lines;
        * compact and type attributes of lists and list items, and start and value attributes of numbered lists;
        * width attribute of the pre element;
  •     the target attribute of links is allowed;
  •     iframe elements are allowed (but not frameset or frame).

In short: the set of available tags is the same as in HTML4.01 transitional, but their syntax is stricter.
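Moving from HTML4.01 transitional to XHTML1.0 transitional is therefore largely a mechanical rewrite of the syntax. As an illustration, the sketch below relies on Python's standard `html.parser` module, which already lowercases tag and attribute names and reports minimized attributes with a value of `None`, to rewrite a single HTML 4.01 start tag in XHTML form. The helper names and the list of empty elements are assumptions made for the example; the sketch does not add missing closing tags for elements such as p or li, which would require building a full document tree.

```python
from html import escape
from html.parser import HTMLParser

# Empty elements, written with a trailing slash in XHTML (list assumed for this sketch).
EMPTY_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

class _StartTagCollector(HTMLParser):
    """Records start tags; html.parser lowercases tag and attribute names."""
    def __init__(self) -> None:
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))

def to_xhtml_start_tag(html_tag: str) -> str:
    """Rewrite one HTML 4.01 start tag using XHTML 1.0 syntax rules."""
    parser = _StartTagCollector()
    parser.feed(html_tag)
    parser.close()
    if not parser.tags:
        raise ValueError(f"no start tag found in {html_tag!r}")
    tag, attrs = parser.tags[0]
    parts = [tag]
    for name, value in attrs:
        if value is None:  # minimized attribute: selected -> selected="selected"
            value = name
        parts.append(f'{name}="{escape(value, quote=True)}"')  # always quoted
    inner = " ".join(parts)
    return f"<{inner} />" if tag in EMPTY_ELEMENTS else f"<{inner}>"

print(to_xhtml_start_tag("<OPTION selected VALUE=fr>"))
print(to_xhtml_start_tag("<INPUT TYPE=checkbox CHECKED>"))
```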

Strict XHTML1.0:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

As in XHTML1.0 transitional, the syntax rules are strict:

  •     the document can be treated as HTML (which is what you will do in practice) but also as XML (which is more complicated, and which Internet Explorer cannot handle correctly in this case);
  •     proprietary tags are not allowed;
  •     all tags, without exception, must be closed;
  •     all tags and their attributes must be written in lowercase;
  •     attributes can no longer be minimized: you cannot write <option selected>, only <option selected="selected">;
  •     quotation marks are required around all attribute values.

But, unlike XHTML1.0 transitional, and exactly as in HTML4.01 strict:

  • the presentation elements and attributes listed for the transitional DTD are no longer allowed; they must be replaced by CSS styles;
  •     the target attribute of links is not allowed;
  •     iframes are not allowed (nor are frameset and frame).

In short: the set of available tags is the same as in HTML4.01 strict, but their syntax is stricter.
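Moving to a Strict DTD mostly means hunting down presentational markup. Here is a rough sketch of such a check, again using Python's standard `html.parser`. It only knows the presentation elements from the lists above, iframe, and a handful of attributes that no element accepts under HTML4.01 or XHTML1.0 Strict, so it is no substitute for the W3C validator; the names are hypothetical.

```python
from html.parser import HTMLParser

# Elements dropped by the Strict DTDs (from the lists in this article).
STRICT_FORBIDDEN_ELEMENTS = {"basefont", "font", "center", "u", "strike", "s", "iframe"}
# Attributes that no element accepts under HTML 4.01 / XHTML 1.0 Strict (partial list).
STRICT_FORBIDDEN_ATTRIBUTES = {"target", "bgcolor", "background", "alink", "vlink",
                               "nowrap", "hspace", "vspace", "noshade", "compact"}

class StrictChecker(HTMLParser):
    """Collects presentational markup that a Strict DTD would reject."""
    def __init__(self) -> None:
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        line, _offset = self.getpos()
        if tag in STRICT_FORBIDDEN_ELEMENTS:
            self.problems.append(f"line {line}: <{tag}> element")
        for name, _value in attrs:
            if name in STRICT_FORBIDDEN_ATTRIBUTES:
                self.problems.append(f"line {line}: {name} attribute on <{tag}>")

def strict_problems(markup: str) -> list:
    checker = StrictChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems

page = '<p><font color="red">Hi</font> <a href="x.html" target="_blank">x</a></p>'
for problem in strict_problems(page):
    print(problem)
```

Every line it reports is a candidate for CSS (for font or center) or, for target and iframe, a sign that the transitional DTD may be the better fit.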

XHTML1.0 frameset


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

The rules are the same as for XHTML1.0 transitional, but the body element gives way to the frameset element, which contains the frame elements (body only survives inside noframes).

In short: the set of available tags is the same as in HTML4.01 frameset, but their syntax is stricter (you guessed it, right?).

Choose your DTD, version for readers in a hurry

Preferably use XHTML1.0 Strict: it is, a priori, the best suited to your needs, and the easiest to use and to learn.

If you use iframes or the target attribute, use XHTML1.0 transitional.

If you want to use frameset and frame, use the XHTML1.0 frameset DTD.
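These three rules of thumb fit in a few lines of code. A hypothetical helper, assuming the caller already knows which features the pages use; frames take precedence because the frameset DTD is the only one that allows them.

```python
def choose_doctype(uses_frames: bool = False, uses_iframe_or_target: bool = False) -> str:
    """Return an XHTML 1.0 doctype following the 'readers in a hurry' rules."""
    if uses_frames:  # frameset and frame elements
        flavor, dtd_file = "Frameset", "xhtml1-frameset.dtd"
    elif uses_iframe_or_target:  # iframe elements or target attributes
        flavor, dtd_file = "Transitional", "xhtml1-transitional.dtd"
    else:  # default recommendation
        flavor, dtd_file = "Strict", "xhtml1-strict.dtd"
    return (
        f'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 {flavor}//EN"\n'
        f'   "http://www.w3.org/TR/xhtml1/DTD/{dtd_file}">'
    )

print(choose_doctype(uses_iframe_or_target=True))
```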

A tool for creating (X)HTML skeletons

To help you design the skeletons of your (X)HTML pages according to the chosen doctype, Alsacréations has created a tool called Skeleton.

Try the Skeleton (X)HTML generator tool
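In the same spirit as Skeleton (whose actual output is not reproduced here), a minimal XHTML1.0 page skeleton can be generated as below. The function, its `flavor` parameter and the chosen head content (a Content-Type meta tag declaring UTF-8) are assumptions for illustration. The XML declaration (`<?xml ...?>`) is deliberately left out, since Internet Explorer 6 switches to quirks mode when anything precedes the doctype.

```python
import xml.etree.ElementTree as ET
from html import escape

# Public and system identifiers of two XHTML 1.0 DTDs.
DOCTYPES = {
    "strict": ("-//W3C//DTD XHTML 1.0 Strict//EN",
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
    "transitional": ("-//W3C//DTD XHTML 1.0 Transitional//EN",
                     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"),
}

def make_skeleton(title: str, flavor: str = "strict", lang: str = "en") -> str:
    """Return a minimal XHTML 1.0 page (hypothetical helper)."""
    public_id, system_id = DOCTYPES[flavor]
    return (
        f'<!DOCTYPE html PUBLIC "{public_id}"\n'
        f'   "{system_id}">\n'
        f'<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="{lang}" lang="{lang}">\n'
        "<head>\n"
        '  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n'
        f"  <title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        "  <p>Content goes here.</p>\n"
        "</body>\n"
        "</html>\n"
    )

page = make_skeleton("Home & news")
# Sanity check: everything after the doctype must be well-formed XML.
root = ET.fromstring(page[page.index("<html"):])
print(page)
```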

Conclusion

To conclude, let us highlight 4 points that are frequently misunderstood:
  •     XHTML1.0 does not separate content and presentation any more than HTML4.01: in both cases, it is the choice between strict and transitional that makes the difference;
  •     none of these DTDs guarantees accessibility a priori: XHTML1.0 is no more accessible than HTML4.01. It is how you use it that makes the difference;
  •     XHTML1.0 brings no "semantic" gain over HTML4.01, from which it takes the elements and almost all attributes. Again, what you do with it is what makes the difference;
  •     but XHTML1.0 is no harder to learn than HTML4.01; on the contrary, its rigorous syntax limits the risk of errors.





