What HTML/XML Does
Since the internet exploded back in the late nineties, a ton of terms have entered the public mind: HTML, JavaScript, Java (which is entirely separate from JavaScript), CSS, XHTML, and XML, just to name a few. And that’s just in the area of web development. In my opinion, though, XML is the most fun idea of the bunch.
What is XML? To put it in deceptively simple terms: it’s labels for the internet. XML allows you to identify pieces of information and describe what they are and how a web browser should use them. The fun part, though, is how much more flexible it is than its predecessor, HTML.
HTML
HTML, or HyperText Markup Language, is one of the biggest reasons the internet is so useful today. Prior to HTML, when the internet was still just a government and university project, all you could do was pass entire documents back and forth. Straight text. No formatting, no bold, no font size. No nothing. Just. Text.
In the late eighties and early nineties, two guys, Tim Berners-Lee and Robert Cailliau, worked together on what would become a markup language used to describe what certain parts of a document were. It used tags: a word or letter inside two angle brackets, < and >. For example, the “h1” tag (<h1>) was used to describe a header section, while the “p” tag (<p>) was used to describe a paragraph. You could nest these tags as well, so you could have bold text inside a paragraph, a list inside a table cell, and so on. HTML included tags for quotes, tables, list items, and a bunch of other fun stuff.
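To make the nesting idea concrete, here is a minimal sketch in Python that reads a scrap of HTML and records each tag along with how deeply it is nested. The sample markup and the names OutlinePrinter and outline_of are made up for illustration; only the standard library's html.parser module is assumed.

```python
from html.parser import HTMLParser


class OutlinePrinter(HTMLParser):
    """Records (depth, tag) pairs so the nesting of tags is visible."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        # Note the tag at the current depth, then step one level deeper.
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        # A closing tag steps back out one level.
        self.depth -= 1


# Hypothetical sample page: a header, a paragraph with bold text, and a list.
SAMPLE = (
    "<h1>My Trip</h1>"
    "<p>It was <b>great</b>.</p>"
    "<ul><li>Paris</li><li>Rome</li></ul>"
)


def outline_of(html_text):
    parser = OutlinePrinter()
    parser.feed(html_text)
    parser.close()
    return parser.outline


if __name__ == "__main__":
    for depth, tag in outline_of(SAMPLE):
        print("  " * depth + tag)
```

Run as a script, it prints an indented outline of the tags. The indentation is the point: the b tag sits inside the p tag, and each li sits inside the ul.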
Even Berners-Lee and Cailliau probably did not expect their markup to be such a success. HTML has since become a benchmark that web designers still draw on for their projects.
The Problem
HTML changed the way the internet worked. To a limited extent, you could use some HTML properties to describe how things looked. You could use a table to organize the layout of a page, or change the color of text to distinguish it from other text. And best of all, the link tag (<a>) allowed you to link to other pages.
But it was still limited. If you wanted to make a calendar document, you didn’t have a <date> tag, or a <startTime> tag. If you were building YouTube, there was no <inaneComment> tag. It was all just paragraphs of text. HTML only describes data on the most basic level. Once again, you could send an entire document to someone across the world. But you couldn’t send just a piece, or use the information in your document to fuel some other application.
The Reason
Under the hood, HTML is pretty simple. It has been called a programming/scripting language, but it’s really not, because you don’t decide what the tags do. You just tell the computer what information is what. It decides what it’s going to do. An HTML document can point to a specific file called a Document Type Definition (.dtd). Most websites refer to one of the official HTML .dtd files. And all web browsers fall back on a default behavior in case none is specified. On any web page you can View Source and see just which .dtd a document is using. It’s usually named at the very top of the document.
This document is the real rulebook behind HTML. A .dtd is what says that an h1 tag exists, which attributes it can carry, and which tags can sit inside which. The default look (an h1 being bold and bigger than an h3, an unordered list getting bullet points while an ordered list gets numbers) comes from the browser’s built-in styling, and that appearance can be altered by CSS. But the set of tags, and how a document built from them may be structured, is fixed by the .dtd.
This document contains all the rules for all the HTML tags that exist. Nothing else is defined. Which is why you can’t have a <description> tag or a <facebookStatus> tag.
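The “nothing else is defined” idea can be sketched as a toy check in Python. This is not how a browser or a real .dtd validator works. It just treats a short, hypothetical list of tag names (KNOWN_TAGS, far smaller than the real HTML definition) as the rulebook and reports any tag that isn’t on it.

```python
from html.parser import HTMLParser

# Hypothetical, heavily shortened stand-in for the tags HTML defines.
KNOWN_TAGS = {
    "html", "head", "title", "body",
    "h1", "h3", "p", "a", "ul", "ol", "li", "table",
}


class UnknownTagFinder(HTMLParser):
    """Collects tag names that are not in KNOWN_TAGS, in order of appearance."""

    def __init__(self):
        super().__init__()
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us tag names already lowercased.
        if tag not in KNOWN_TAGS and tag not in self.unknown:
            self.unknown.append(tag)


def unknown_tags(html_text):
    finder = UnknownTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.unknown


if __name__ == "__main__":
    page = "<p>Hi</p><description>A post</description><facebookStatus>Bored</facebookStatus>"
    print(unknown_tags(page))
```

The made-up tags come back as unknown because nothing in the rulebook mentions them, which is exactly the wall you hit with plain HTML.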
The Solution
That’s where XML comes in. XML, or eXtensible Markup Language, is a technology that allows you to write your own tags, and then write a definitions file that determines how they are handled. And it’s this massive flexibility that makes XML so much fun.
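Here is a minimal sketch of what “write your own tags” looks like in practice, using Python’s standard xml.etree.ElementTree module. The event document and its tag names (event, name, date, startTime, description) are invented for this example, which is the whole point: XML lets you invent them.

```python
import xml.etree.ElementTree as ET

# A made-up calendar entry using tags that plain HTML never had.
EVENT_XML = """
<event>
  <name>Launch party</name>
  <date>2010-06-12</date>
  <startTime>19:30</startTime>
  <description>Bring snacks.</description>
</event>
"""


def read_event(xml_text):
    """Turns each child tag of <event> into a dictionary entry."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    event = read_event(EVENT_XML)
    print(event["name"], "starts at", event["startTime"], "on", event["date"])
```

Because every piece of information carries its own label, a program can pull out just the start time or just the description instead of wading through paragraphs of text.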
One of the best and certainly most popular examples is an RSS feed. RSS, or Really Simple Syndication, is just that. Entries on most blogs (and a lot of frequently updated websites) are automatically encoded into an RSS feed. Behind the scenes, RSS, which is one application of XML, places tags like <title>, <description>, <author> and <pubDate> around all the pieces of information in a blog entry. That information can then be passed from one website to another.
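As a rough sketch of the receiving end, here is how a program might pull entries out of a feed with Python’s standard xml.etree.ElementTree module. The feed below is invented for illustration. It follows the RSS 2.0 layout, where the body of an entry lives in a description tag and the date in a pubDate tag. The function name read_entries and the dictionary keys are my own choices.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed with two entries.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <author>me@example.com (Me)</author>
      <pubDate>Mon, 07 Jun 2010 09:00:00 GMT</pubDate>
      <description>Hello, world.</description>
    </item>
    <item>
      <title>Second post</title>
      <author>me@example.com (Me)</author>
      <pubDate>Tue, 08 Jun 2010 09:00:00 GMT</pubDate>
      <description>Still here.</description>
    </item>
  </channel>
</rss>"""


def read_entries(feed_text):
    """Returns one dictionary per <item> in the feed."""
    channel = ET.fromstring(feed_text).find("channel")
    entries = []
    for item in channel.findall("item"):
        entries.append({
            "title": item.findtext("title"),
            "author": item.findtext("author"),
            "date": item.findtext("pubDate"),
            "content": item.findtext("description"),
        })
    return entries


if __name__ == "__main__":
    for entry in read_entries(FEED):
        print(entry["date"], "-", entry["title"])
```

Any site that understands the same tag names can do the same thing with the feed, which is what makes syndication work.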
My website is ocentertainment.net. It’s blog-based, which means any new entries are written in a blog editor. Once I’m finished, it passes all the information to the main page, where you’ll find the title, the content, who wrote it, etc. But I also have my site set up to syndicate to my Facebook account, where the entry is posted as a note. The Facebook note application retrieves that information from my RSS feed, and is able, based on the RSS standard built in XML, to determine what part is supposed to be the title, the content, the... you get the idea.
Similarly, I have my Facebook events set up to syndicate to my Google Calendar. When I accept an invitation on Facebook, the name of the event, the start and end times, and the description all get sent to my Google account and appear right inside my pre-existing Google calendar. Without XML, Google would just see a bunch of lines of text and not have a clue what to do with it.
The Future
XML has already changed how we handle information a ton. From blogs to music metadata, from productivity applications like calendars, to internet forums, we can now describe to a computer what information is and how to use it.
It’s no surprise that the W3C (World Wide Web Consortium) has decided to standardize XHTML. XHTML combines the best of both worlds: the bare necessity of HTML, which, to be fair, changed the internet as it is and remains a thorough, robust backbone, combined with the flexibility that XML offers in describing complex pieces of information.
I’d like to say you’ll start to see this happen in the future, but truth be told, I’m late in the game. It’s already happening now. Websites utilizing XML-based technology are already revolutionizing what the internet is for. Combine that with AJAX concepts, which are a whole ‘nother article in themselves, and we’re looking at the worldwide merging of information and computing.
Another day, another tale, though.