A RESTful 3D web

Lately, I have come across many articles talking about opening the realm of accelerated 3D graphics to the web. While there has been many other initiatives to do such a thing in the past, it has gotten more serious lately since the big players have started to show interest in it. For example, the Khronos group (the consortium responsible for OpenGL, OpenAL, OpenGL | ES) has just recently launched a proposal to build a standardised JavaScript API for that purpose while Google just released O3D, its own JavaScript API for creating 3D applications on browsers. This is rather exciting as this seems like a definitive step towards moving away from proprietary applications to display rich graphics on the web, flash being the most ubiquitous.

However, judging by those press releases, I have a great concern over the direction a potential standard will be evolving towards. As it stands, it aims at implementing this new feature using JavaScript. While there is nothing wrong with using this language in general, using it to add a dimension to the Web does not go along with the philosophy the web is built on (REST) for reasons that will be detailed below. On top of that, there is already a standard that brings 3D to the Web in a RESTful way: X3D. For some reason, it is still in the dark as these lines are written and has not seen wide acceptance yet, but it is in my opinion the right way of doing things. Using JavaScript would relinquish X3D (or any perspective of a declarative way of describing 3D) to speciality applications because it is for now hard to work with. The fact is, people like to take the easy route, but in this case it will mean a lot more trouble down the line. To develop a bit more on this problem, this article will try to outline a few arguments for a RESTful implementation of the 3D web through a description language over an API based one and explain why a XML based solution is a good contender for an implementation.

Representations: guaranteed to work.

Mark-up languages currently in use on the web all share a simple fundamental goal: to describe the visual and semantic organization of information. HTML, for instance, describes the document tree or what relationship blocks of information (text) have with one another and what their respective purpose is with regards to the visual and semantic aspect of the data. The HTML specification also permits the description of visual features trough inline styling (b, font, h1, h2, etc.), but this usage will slowly disappear in order to give way to CSS. CSS, on the other hand, concerns itself mostly with visuals through describing both the styling and spatial representation of a document: nodes from the document tree can be moved around and styled as the designer sees fit. CSS and HTML are both different languages used for almost different purposes but they tackle two intersecting areas of the same problem space: pleasing and adapting to the human visual system.

The usage pattern of the two aforementioned languages in the context of the web fits perfectly with the REST mentality. Call an HTTP GET on a resource and it returns an HTML representation with embedded links to the CSS style sheets and scripts it uses. Then, upon reception of the documents, the browser from which the request originated will render this HTML + CSS representation and respond to user events according to the script. This request and render activity is at the core of the REST architecture and actually constitutes the bulk of the traffic on the web: get a representation and render it; representational state transfer. Trough transacting representations this way, the server cannot enforce any technical constraints with regards to what is done with the document once it is has been transferred to the requestor; the only exception being the version and type of the language. Hence, rendering representation is the client’s responsibility. The navigating can happen from a cell phone or by calling a wget on a Linux terminal; the concerned software will take care of transforming what it receives to the best of its ability. A representation is only a declaration issued by the resource on how it suggests it is best presented; if for some reason, the request originator cannot correctly render or understand the description language it just fetched, it remains possible to get a partial view and if all fails, the software can display the document itself, which happens to be human-readable. For example, with Windows computers whose ActiveX controls are disabled , web pages very often fail to display correctly and sometime are just plain unreadable. In this case, the user can just check the HTML source, from which he can infer the document layout but more importantly still get access to the information. Had the browser received a pile of vectors with several hundred lines of JavaScript code to render them instead, it is very likely that the individual could not have guessed it was actually rendered text or a teapot. This guaranteed level of service is not a feature of the Web itself but a consequence of the declarative nature of REST. Representations that are generated using scripting like JavaScript violate this principle because there is no way to know what it is without executing the script nor is there a way to tailor (to a certain extent) it for specific constraints like hardware, accessibility or internationalization; if the script fails, the user is left with nothing or very little to work with. The correct execution of scripts is their creators’ responsibility and there use as representation generators is therefore problematic because they cannot be validated and interpreted, not to mention the inherent security risks associated with their usage.

The declarative advantage.

Declarative architectures such as REST not only provide a consistent quality of service, but they also enable other entities to perform other operations than rendering the resources that compose them. A whole lot more information that has nothing to do with visuals can be inferred from the documents that describe the representation of those resources The semantic web, linking, microforms, search engines or mashups are very compelling examples of the declarative advantage. This type of interaction between resources is probably possible with scripted 3D, but not without a serious overhead in analysis and a very strict naming standard. Even there, the use of the aforementioned technologies would not integrate naturally with scripted 3D because they would have to remain within the declarative structure of the document.

The API tar pit.

JavaScript is quite different from HTML and CSS, because the way it acts on a representation has nothing to do with spatial representation: it adds interactivity. In a sense, JavaScript can be seen as the description of the interactive aspect of representation although it is not a declarative language. The programmatic nature of scripting makes JavaScript very versatile for certain tasks but it also makes matters a lot more complicated. The web would be a lot simpler without JavaScript, but it would also be completely static, just like in 1994. Scripting is a necessary evil, but it is nonetheless evil because it cannot be easily analysed and interpreted (not in the programming sense), either you do exactly what the script command says either you don’t. If the script wants to display a pop-up there is little you can do to stop it without interfering with the pages that make an honest use of this feature. Thankfully the language itself is textual and interpreted (in the programming sense), which makes it a very portable and powerful tool, but insofar as it remains true to its function: adding interactivity to representations. If it is used for any other purposes, we then run into the risk of negating the many advantages of the REST architectural style. It might not appear to be such a big deal, but if one looks at the way things are messed up and complicated in the application software world, they come to realize that using JavaScript as a full-fledged programming language is somewhat risky in the Web context, even if it remains on the last layer of a software stack (if it is not interacted with). If the 3D web is implemented using an API, then it will not be long until other APIs based on it start proliferating and what was originally a great idea will turn into an immense collection of multiply-versioned and incompatible APIs doing more or less the same thing. The browser is not supposed to be a runtime environment; it is a window on the Web whose only purpose is to act as an interpreter for humans navigating it. If we build JavaScript APIs to add 3D content to the web, we face the risk of turning it into a tar pit, even with standardization. Microsoft is notorious for not following standards; now imagine we include Nvidia and ATI in this business. 3D solutions vendors operate with different marketing techniques than in other fields; they and their customers are all about visuals, and vendors will not hesitate to break standards and to promote a new feature on their product. Naturally, that feature will only be available on hardware that supports it. The pace of the 3D market is just too quick for standardized APIs; vendors need a lot more flexibility, they need an extensible language.

XML.

A 3D environment is not that different from a webpage and can easily be described using XML. It involves many objects that all share relationships of dependence with one another; just like the document tree (the equivalent in 3D jargon is called the scene graph). Reality in fact, which 3D usually aims at approximating is no different and can be represented using a tree structure. Take for example a table with a teapot on it. If the table is moved around, the teapot will follow because its absolute position is dependant on the table’s position. The teapot’s location with regards to the table, its relative position did not change. This makes the teapot a child of the table. This example failed to account for physics for the sake of simplicity, but it shows XML based languages are perfectly fit for describing 3D spaces. As a matter of fact, the idea is not new and many languages exist for this purpose, like VRML, X3D and COLLADA just to name a few. Consequently, using such a document to convey the 3D representation of a resource stays true to the declarative nature of the Web. If a browser is not compatible with an API, it cannot just skip the unknown script lines; otherwise, the whole script will most likely fail. On the other hand, if a browser cannot interpret a tag on a 3D description document it can skips that node of the document tree without worrying whether or not it will compromise the rest of the rendering. The user will be presented with an approximate view of the representation that might very well be sufficient for what he wants to accomplish. There will be no need to specify many render paths for different hardware or rely on the JavaScript engine to do it, if a tag cannot be rendered, it is just skipped. Programmable shader pipelines are a nice technology, but they do not add very much to the functionality of a 3D environment; if a teapot is to be displayed, it does not need to be refractive for the user to figure out it is a transparent teapot. Put differently, no one should need a cutting edge GPU to see some polygons. With XML based languages, descriptions are naturally extensible so vendors are free to add their own tags without waiting for standard approval and without sacrificing the user-base that does not support this new feature; they still break the standard, but the consequences are not as grave. In the absence of 3D rendering capacities, XML always remain fairly readable and can be consulted directly, a 3D scene generated with JavaScript is, on the other hand, very difficult if not impossible to infer without execution of the script.
The advantages of using a XML language to describe spaces do not end there. If a developer wants to add physical properties to a set of objects, all he has to do is to insert the pertinent tags in the document tree describing the scene. With an API things are much more complicated. The same can apply for movement, which can also be considered an integral part of a representation. Displaying 3D this way is completely RESTful and it leaves JavaScript doing the job it does best: add user interactivity through modifying the DOM.
XML also provide a fair amount of interoperability out of the box; by mixing a spatial description language with other compatible languages, like XHTML, it becomes possible to blend many types of content together. As an example, a website could be developed to provide a small service where users can consult multiple web pages simultaneously using a cube like Linux’s Compiz or tiling like Mac OS X’s exposé. The different faces involved would contain XHTML IFrames, or for a more static display, the XHTML could be part of the document tree describing the scene as a child of the face displaying it.

Complexity.

A 3D description language is without a doubt much more complex than any other one that deals with a lesser number of dimensions. The X3D specification, for instance, is many pages long and makes a fair amount of assumptions over the reader’s proficiency with computer graphics concept, but it is nonetheless much easier to deal with than program; the syntax is self-explanatory and there is not need to deal with the complex resource management required to program efficient 3D. Many already know OpenGL and Direct3D and they surely use their present skills over learning a new description language. However, they are far from representative of the majority; for a newcomer, it is much easier learning a description language than an API. Plus, WYSIWYG tools can be developed to automate the generation of 3D, so anyone can with little effort create a 3D web page. Thanks to the ease of use of its core languages and the many authoring tools available, programmers are now far from being the main creators of content on the Web. Doing it with a JavaScript 3D API would be way too intimidating and would drive away the vast majority of users, making the 3D web inaccessible to most.

The bottom line.

Could the 3D web be implemented with an API? Certainly, computers provide us with infinite ways to do an infinite amount of things, but some ways are better than others. Since the inception of the Web, there has only been a handful of versions of its core components, and thanks to this consistency, 10 years old web browsers can probably still navigate it; the same cannot be said for a five year old GPU and current games. Programs are strict successions of operations and are not subject to interpretation; visualisation, on the other hand is everything but that. After all, we already use XML to describe 2D so why should it be different for 3D? The ease of use of the core languages of the Web has made the creation of content accessible to anyone; I would like to see the use and authoring of 3D become an integral part of it as well, not some obscure feature only gamers and the technical crowd can make use of.