Every Start-Tag Must Have an End-Tag

One of the problems with parsing HTML documents is that not every element requires a start-tag and an end-tag. Take the following example:

<p>Here is some text in an HTML paragraph. <br>

Here is some more text in the same paragraph.

<P>And here is some text in another HTML paragraph.</p>

Notice that the first <p> tag has no closing </p> tag. This is allowed in HTML, because most web browsers can figure out where the end of the paragraph should be. (In fact, years ago, this type of practice was even encouraged in some circles to reduce file size.) In this case, when the browser comes across the second <P> tag, it knows to end the first paragraph and begin a new paragraph. Then there's the <br> tag (line break), which by definition has no closing tag.

In addition, notice that the second start-tag, an uppercase <P>, is matched by a lowercase </p> end-tag. This is not a problem for web browsers, because HTML tag names are not case sensitive; but as you'll soon see, it would cause a problem for an XML parser.

All of this leniency makes HTML parsers harder to write. Developers must add code to handle each of these cases, which often makes the parsers larger and much harder to debug. What's more, the way in which files are parsed is not standardized: different browsers do it differently, leading to incompatibilities. That may not matter in this simple example, but when it comes to HTML tables, browser inconsistencies are a nightmare, and badly written HTML markup makes things much worse!
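To give a feel for the extra bookkeeping involved, here is a minimal sketch in Python built on the standard library's html.parser module. The class name LenientTracker, the helper trace, and the two tiny rule tables are hypothetical names made up for this illustration. They cover only the cases in the example above (an implied </p>, a <br> with no end-tag, and mixed-case tag names), which is far less than a real browser handles.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny rule tables; real HTML has many more cases.
VOID_ELEMENTS = {"br"}        # elements that never take an end-tag
AUTO_CLOSE = {"p": {"p"}}     # starting a <p> implicitly ends an open <p>


class LenientTracker(HTMLParser):
    """Records the start/end events a tolerant parser has to infer."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open element names
        self.events = []         # (kind, name) pairs, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased the name, so <P> arrives as "p".
        if self.open_elements and self.open_elements[-1] in AUTO_CLOSE.get(tag, ()):
            self.events.append(("implied-end", self.open_elements.pop()))
        self.events.append(("start", tag))
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_elements:
            # Close everything up to and including the matching element.
            while self.open_elements:
                name = self.open_elements.pop()
                self.events.append(("end", name))
                if name == tag:
                    break


def trace(markup):
    """Feed markup to a LenientTracker and return the recorded events."""
    tracker = LenientTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.events


sample = (
    "<p>Here is some text in an HTML paragraph. <br>\n"
    "Here is some more text in the same paragraph.\n"
    "<P>And here is some text in another HTML paragraph.</p>"
)
for event in trace(sample):
    print(event)
```

Run on the sample, the sketch reports an implied end for the first paragraph just before the second one starts. Every such rule is something the parser author has to know about and encode by hand.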

For now, just remember that in XML the end-tag is required, and its name has to exactly match the start-tag's name.
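A quick way to see this rule in action is to hand a few fragments to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for this illustration, and it simply reports whether the parser accepted the text.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the XML parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Missing end-tag: rejected.
print(is_well_formed("<p>Some text"))        # False
# Start-tag and end-tag differ only in case: also rejected.
print(is_well_formed("<P>Some text</p>"))    # False
# Start-tag and end-tag match exactly: accepted.
print(is_well_formed("<p>Some text</p>"))    # True
```

Unlike a browser reading HTML, the XML parser makes no attempt to guess what was meant; it stops at the first violation.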
