XML (Extensible Markup Language) (Linktionary term)

Note: Many topics at this site are reduced versions of the text in "The Encyclopedia of Networking and Telecommunications." Search results will not be as extensive as a search of the book's CD-ROM.

XML is an extensible markup language derived from the earlier SGML (Standard Generalized Markup Language) standard. HTML was also derived from SGML, but it is limited to presenting graphical hyperlinked information on Web pages. XML allows developers to create more functional documents for exchanging structured information across the Internet. These documents may be Web pages viewed and manipulated by people, or business documents exchanged during automated computer-to-computer business transactions. XML is a W3C (World Wide Web Consortium) specification. XML 1.0 became a W3C Recommendation in February 1998.

XML creates a framework for documents in which data has meaning as defined by tags. In contrast, data in HTML documents is tagged only with a style. It is formatted, but nothing describes what a particular block of data is, so other applications cannot readily grab it and use it in a meaningful way.

As an analogy, compare watching the financial channel on TV to viewing financial Web pages on the Internet. The TV presents information that you can try to remember. In contrast, financial Web pages give you information that you can capture and save. If the data is on an HTML page, that's better than the TV, but not as good as XML. Because HTML does not describe what the information on a page is, you have to highlight the data, copy and paste it, then edit it to get it into tabular form. In some cases, it's easier to just retype the data! XML documents define data inside tagged elements, allowing you to extract the data directly into spreadsheets or databases.
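To make the spreadsheet point concrete, here is a minimal sketch in Python using only the standard library. The financial data and the element names (quotes, quote, symbol, price, volume) are invented for illustration and do not come from any real feed or schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical financial data; the element names are assumptions for this example.
QUOTES_XML = """\
<quotes>
  <quote>
    <symbol>ABC</symbol>
    <price>41.25</price>
    <volume>120000</volume>
  </quote>
  <quote>
    <symbol>XYZ</symbol>
    <price>17.50</price>
    <volume>98000</volume>
  </quote>
</quotes>
"""

FIELDS = ["symbol", "price", "volume"]


def quotes_to_rows(xml_text):
    """Return one dict per <quote>, keyed by child element name."""
    root = ET.fromstring(xml_text)
    return [
        {field: quote.findtext(field, default="") for field in FIELDS}
        for quote in root.findall("quote")
    ]


def rows_to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = quotes_to_rows(QUOTES_XML)
print(rows_to_csv(rows))
```

Because each value is looked up by its tag name, no copying, pasting, or manual cleanup is needed to get the data into rows and columns.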

In traditional data storage, records are defined by lines of text and fields are represented by the position of the data element in the line (and separated by a character such as a comma). Now, assume you open such a file with a text editor and try to figure out what all the data is without any descriptive information. It is not easy. With XML, data is defined by descriptive tags. For example, the following tag identifies a document author:

<creator>Tom Sheldon</creator>

Descriptive tags benefit searching, document classification, and so on. Search engines can easily identify the author as the name between these tags. The industry has worked to standardize the most common descriptors such as "title," "subject," "publisher," "date," and so on. Most are defined in the Dublin Core and other metadata standards. See "Metadata." In addition, entire industries (automotive, medical, construction, and so on) have worked to create their own standard sets of descriptors to define common elements within documents.
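The difference between positional fields and descriptive tags can be sketched in a few lines of Python. Only the creator element comes from the example above; the sample title, the date, the other element names, and the column order of the delimited record are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The same record in two forms. The title, date, column order, and every
# element name other than <creator> are assumptions made for this example.
DELIMITED_RECORD = "XML Basics,Tom Sheldon,2001"

TAGGED_RECORD = """\
<document>
  <title>XML Basics</title>
  <creator>Tom Sheldon</creator>
  <date>2001</date>
</document>
"""


def creator_from_delimited(line, creator_position=1):
    """Works only if the caller already knows which column holds the author."""
    return line.split(",")[creator_position]


def creator_from_xml(xml_text):
    """The tag name identifies the author; element order does not matter."""
    return ET.fromstring(xml_text).findtext("creator")


print(creator_from_delimited(DELIMITED_RECORD))
print(creator_from_xml(TAGGED_RECORD))
```

The delimited version depends on outside knowledge of the column layout, while the XML version carries that knowledge in the document itself, which is what lets a search engine or classifier find the author without a separate record description.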

By describing data in XML documents with tags, Web pages move from being just "human-readable" to being "computer-readable." Documents all over the Web will move from an information presentation paradigm to an information database paradigm. While people will be able to turn XML documents into useful data, the bigger picture is computer-to-computer information exchange and automated business transactions over Internet links.

In a multitiered client/server model, a user accesses a middle-tier presentation and business logic server, which itself accesses back-end databases. The middle-tier server retrieves data from the back-end system based on user requests, then converts the data to XML format and sends it to the client. Since all the data from the back-end server has been identified and tagged by the middle-tier server, the client knows how to identify the data as well. It can reformat the information, use it in other programs, or extract parts of the data as needed.
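The flow described above might look roughly like the following Python sketch. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the back-end system, and the orders table, its columns, and the function names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_backend():
    """Stand-in for the back-end database (in-memory SQLite, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "Acme", 250.0), (2, "Globex", 99.5)],
    )
    return conn


def middle_tier_response(conn, customer):
    """Middle tier: query the back end and tag each value with its column name."""
    cursor = conn.execute(
        "SELECT id, customer, total FROM orders WHERE customer = ?", (customer,)
    )
    columns = [col[0] for col in cursor.description]
    root = ET.Element("orders")
    for row in cursor:
        order = ET.SubElement(root, "order")
        for name, value in zip(columns, row):
            ET.SubElement(order, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def client_totals(xml_text):
    """Client: extract only the part of the data it needs, by tag name."""
    root = ET.fromstring(xml_text)
    return [float(order.findtext("total")) for order in root.findall("order")]


conn = build_backend()
response = middle_tier_response(conn, "Acme")
print(response)
print(client_totals(response))
```

The client never sees the database schema. It relies only on the tags the middle tier attached, so it can pull out the totals, reformat the order, or pass the document on to another program.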

Markup languages were used as early as the 1960s. Charles Goldfarb at IBM headed a team that created GML (Generalized Markup Language). This later expanded into SGML, which included more automation. In the 1990s, Tim Berners-Lee derived HTML from SGML when he created the technology for the Web. XML is a hybrid of concepts and ideas drawn from HTML and SGML, with an emphasis on universal data exchange.

One of the most concise descriptions of XML was made by Simon Phipps, IBM's evangelist for Java and XML, during a chat session (see Web address on the related entries page). He said that XML itself is nothing but the assertion "let's use tags to format data."

Phipps likes to describe the transition to XML as the "last gap" in defining a new world of information sharing. This has been achieved through the following progression of events:

TCP/IP has become the near-universal communications protocol for connecting information systems.

Browsers have become the common space into which solutions can be loaded.

Component technologies such as Java are now established as the standard for platform-neutral computing.

Data was the last gap. An open data-formatting specification was needed. XML is that specification.

More and more, XML is being selected as a solution for building large software projects. According to the W3C, opting for XML is a bit like choosing SQL for databases: you still have to build your own database and your own programs/procedures that manipulate it, but XML is license-free and there is a growing community of developers with tools and experience.

This topic will be expanded with a section describing the basics of XML, along with a list of XML specifications, initiatives, and developments. You'll also find a list of industry-defined schemas.