XML - Extensible Markup Language
The World Wide Web (WWW) as we know it consists of computing nodes
serving documents which contain information. These documents are stored
in several data formats, most often in the ``Hyper Text Markup
Language'' (HTML). Other data formats are proprietary and can only be
read by specific applications. Many of these data formats describe
presentation issues but most often do not deal with the meaning of the
document. [DAC01] points out that this leads to information
overload and poor content aggregation.
To overcome these problems, documents have to be structured and the
meaning of the contained data has to be added. One mechanism for doing
this is markup. [XML1] states ``Markup is a method of conveying
metadata (information about another dataset).'' All
SGML-based languages use so-called ``tags'' for
separating pieces of information into so-called ``elements''. These
tags add the needed metadata for describing the meaning of the
elements.
One widely adopted markup language is the ``Hyper Text Markup
Language'' (HTML), which is currently used for most of the documents
available on the Internet. HTML contains a fixed set of tags that are
used to add formatting and presentation logic to documents, but it
offers no way to define new tags, which would be required to add
metadata other than formatting information to documents.
At the end of the 1990s the World Wide Web Consortium designed an
extensible markup language called ``Extensible Markup Language'' (XML)
that combines the flexibility of SGML and the widespread acceptance of
HTML.
The basic structure of an XML document is best explained by a simple
example as given in Listing 1. The first line is the XML declaration,
which uses the syntax of a so-called ``processing instruction'' (PI)
and provides information to the XML parser. In this case the parser is
told that the document complies with the XML 1.0 standard. The third
line is a comment that is used for documentation purposes and is
ignored by the parser.
The rest of the document consists of various elements which are
arranged in a ``1:n'' parent/child structure. The first element - the
XML standard calls it the ``root element'' - opens with the tag
currency list and has an attribute which specifies its date. The
contents are multiple currency elements which also have
attributes and contain other elements. XML documents can be modeled
in a tree-like structure as shown in figure 9.
Figure 9: XML tree of the given example.
![XML tree of the given example](img10.png)
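The parent/child structure described above can be explored with a few lines of code. The following is a minimal sketch using Python's standard-library `xml.etree.ElementTree` module. The embedded document is not the original Listing 1: the element and attribute names (`currency_list`, `currency`, `iso`, `name`, `country`) and all data values are assumptions chosen to match the description in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document modeled on the description in the text.
# Tag names, attribute names and values are illustrative assumptions.
DOCUMENT = """<?xml version="1.0"?>

<!-- List of currencies (illustrative data only) -->
<currency_list date="2006-09-27">
  <currency iso="EUR">
    <name>Euro</name>
    <country>Austria</country>
  </currency>
  <currency iso="USD">
    <name>US Dollar</name>
    <country>United States</country>
  </currency>
</currency_list>
"""


def describe(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    attrs = " ".join(f'{key}="{value}"' for key, value in element.attrib.items())
    label = f"{element.tag} {attrs}".rstrip()
    lines = ["  " * depth + label]
    for child in element:  # iterating an element yields its direct children
        lines.extend(describe(child, depth + 1))
    return lines


# fromstring() parses the text and returns the root element.
root = ET.fromstring(DOCUMENT)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Each level of indentation in the printed output corresponds to one level of the tree. The comment in the document does not appear because the default parser discards comments.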
XML has the following key features:
- Extensible:
- An important property of XML is that tags
can be freely specified. This means that metadata of any kind can be
added to XML documents.
- Legible to humans / easy to create:
- XML documents are normally
created and processed by software (for example XML editors and
parsers), but there can be specific cases (e.g. debugging or testing)
where the documents have to be edited by hand. XML has the advantage
that it is legible to humans. Moreover, there are editors available
that support syntax highlighting and structuring of the document,
which makes the editing of XML documents very easy.
- Verification of syntax and semantics:
- XML documents can be
checked for syntactic and semantic correctness by the XML parser. If
the document is syntactically correct, it is called ``well formed'',
which means that it satisfies the syntax rules of the XML standard. To
check the semantics of a document, the user has to supply information
about the document's structure and its grammar to the XML parser,
which then uses it for validation; a document that passes this check
is called ``valid''. Two formal description languages are commonly
used for this task, namely the ``Document Type Definition'' (DTD),
which is part of the XML standard itself, and
``XML Schema'', which is explained in section 2.2.
- Namespaces:
- When combining structures with different
vocabularies into one document, naming conflicts can occur. XML solves
this problem with the use of so-called namespaces. [BIR01]
describes namespaces like this: ``An XML Namespace is simply a group
of names, usually with a related purpose or context, where the group
has a globally unique name (the ``namespace name''). This is often
ensured by using a domain name (from Internet DNS) as the first part
of the namespace name.'' The namespace concept is best described with
a simple example, given in listing 2. Here two
namespaces are defined which refer to two different vocabularies. Both
vocabularies contain the name ISO, but with a different meaning in
each. With the namespace prefixes cur: and cnt: the two names can
be distinguished.
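Two of the features listed above, the well-formedness check and namespaces, can be sketched with Python's standard-library `xml.etree.ElementTree` module. The document below is not the original listing 2: the namespace names under `http://example.org/` and the element contents are assumptions for illustration, and only the prefixes `cur:` and `cnt:` and the name ISO are taken from the text. Note that this module checks well-formedness only; it does not validate a document against a DTD or an XML Schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical namespace names and data, chosen only for illustration.
NAMESPACED = """<?xml version="1.0"?>
<cur:currency xmlns:cur="http://example.org/currency"
              xmlns:cnt="http://example.org/country">
  <cur:ISO>EUR</cur:ISO>
  <cnt:ISO>AT</cnt:ISO>
</cur:currency>
"""

# Prefix-to-namespace mapping used for lookups. The prefixes are local
# shorthand; the namespace names are what identify the vocabulary.
NS = {
    "cur": "http://example.org/currency",
    "cnt": "http://example.org/country",
}

root = ET.fromstring(NAMESPACED)

# Both elements have the local name ISO, but they live in different
# namespaces and can therefore be addressed separately.
currency_code = root.find("cur:ISO", NS).text
country_code = root.find("cnt:ISO", NS).text

print(is_well_formed(NAMESPACED))    # document with properly nested tags
print(is_well_formed("<a><b></a>"))  # <b> is never closed
print(currency_code, country_code)
```

Internally the parser stores each qualified name together with its namespace name, so the root element's tag is reported as `{http://example.org/currency}currency` rather than `cur:currency`.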
XML has gained broad acceptance and is used in various fields. Many
data formats, languages and protocols are XML-based, and it is expected
that XML will replace various other data formats. The protocols and
languages described in the next sections are all XML-based.
Hermann Himmelbauer
2006-09-27