A RESTful 3D web

Lately, I have come across many articles talking about opening the realm of accelerated 3D graphics to the web. While there have been other initiatives to do such a thing in the past, it has gotten more serious lately since the big players have started to show interest in it. For example, the Khronos Group (the consortium responsible for OpenGL, OpenAL and OpenGL ES) has just recently launched a proposal to build a standardised JavaScript API for that purpose, while Google just released O3D, its own JavaScript API for creating 3D applications in browsers. This is rather exciting, as it seems like a definitive step towards moving away from proprietary applications to display rich graphics on the web, Flash being the most ubiquitous of them.

However, judging by those press releases, I have a great concern over the direction a potential standard will evolve towards. As it stands, it aims at implementing this new feature using JavaScript. While there is nothing wrong with using this language in general, using it to add a dimension to the Web does not fit the philosophy the web is built on (REST), for reasons detailed below. On top of that, there is already a standard that brings 3D to the Web in a RESTful way: X3D. For some reason, it is still in the dark as these lines are written and has not seen wide acceptance yet, but it is in my opinion the right way of doing things. Using JavaScript would relegate X3D (or any prospect of a declarative way of describing 3D) to specialty applications, because it is for now hard to work with. The fact is, people like to take the easy route, but in this case it will mean a lot more trouble down the line. To develop a bit more on this problem, this article will try to outline a few arguments for a RESTful implementation of the 3D web through a description language rather than an API, and explain why an XML-based solution is a good contender for an implementation.

Representations: guaranteed to work.

Mark-up languages currently in use on the web all share a simple fundamental goal: to describe the visual and semantic organization of information. HTML, for instance, describes the document tree, or what relationship blocks of information (text) have with one another and what their respective purpose is with regard to the visual and semantic aspects of the data. The HTML specification also permits the description of visual features through inline styling (b, font, h1, h2, etc.), but this usage will slowly disappear in order to give way to CSS. CSS, on the other hand, concerns itself mostly with visuals, describing both the styling and spatial representation of a document: nodes from the document tree can be moved around and styled as the designer sees fit. CSS and HTML are different languages used for different purposes, but they tackle two intersecting areas of the same problem space: pleasing and adapting to the human visual system.

The usage pattern of the two aforementioned languages in the context of the web fits perfectly with the REST mentality. Call an HTTP GET on a resource and it returns an HTML representation with embedded links to the CSS style sheets and scripts it uses. Then, upon reception of the documents, the browser from which the request originated renders this HTML + CSS representation and responds to user events according to the script. This request-and-render activity is at the core of the REST architecture and actually constitutes the bulk of the traffic on the web: get a representation and render it; representational state transfer. Through transacting representations this way, the server cannot enforce any technical constraints on what is done with the document once it has been transferred to the requestor, the only exception being the version and type of the language. Hence, rendering the representation is the client’s responsibility. The navigation can happen from a cell phone or by calling wget in a Linux terminal; the software concerned will take care of transforming what it receives to the best of its ability.

A representation is only a declaration issued by the resource on how it suggests it is best presented; if for some reason the request originator cannot correctly render or understand the description language it just fetched, it remains possible to get a partial view, and if all fails, the software can display the document itself, which happens to be human-readable. For example, on Windows computers whose ActiveX controls are disabled, web pages very often fail to display correctly and are sometimes just plain unreadable. In this case, the user can just check the HTML source, from which he can infer the document layout but, more importantly, still get access to the information. Had the browser instead received a pile of vectors with several hundred lines of JavaScript code to render them, it is very likely that the individual could not have guessed it was actually rendered text or a teapot. This guaranteed level of service is not a feature of the Web itself but a consequence of the declarative nature of REST. Representations that are generated using scripting like JavaScript violate this principle because there is no way to know what they are without executing the script, nor is there a way to tailor them (to a certain extent) for specific constraints like hardware, accessibility or internationalization; if the script fails, the user is left with nothing or very little to work with. The correct execution of scripts is their creators’ responsibility, and their use as representation generators is therefore problematic because they cannot be validated and interpreted, not to mention the inherent security risks associated with their usage.
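
As a minimal sketch of this request-and-render cycle (the URL and markup below are made up for illustration), a browser asking for a resource and the representation it gets back might look like this:

GET /articles/teapot HTTP/1.1
Host: example.org
Accept: text/html

HTTP/1.1 200 OK
Content-Type: text/html

<html>
  <head><link rel="stylesheet" href="/style.css"></head>
  <body>
    <h1>The teapot</h1>
    <p>A short, human-readable description of the teapot.</p>
  </body>
</html>

Even if the client cannot apply the style sheet or run any script the page links to, the text of the representation stays legible; that is the fallback described above.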

The declarative advantage.

Declarative architectures such as REST not only provide a consistent quality of service, they also enable other entities to perform operations other than rendering on the resources that compose them. A whole lot of information that has nothing to do with visuals can be inferred from the documents that describe the representation of those resources. The semantic web, linking, microformats, search engines and mashups are very compelling examples of the declarative advantage. This type of interaction between resources is probably possible with scripted 3D, but not without a serious overhead in analysis and a very strict naming standard. Even then, the use of the aforementioned technologies would not integrate naturally with scripted 3D because they would have to remain within the declarative structure of the document.

The API tar pit.

JavaScript is quite different from HTML and CSS, because the way it acts on a representation has nothing to do with spatial representation: it adds interactivity. In a sense, JavaScript can be seen as the description of the interactive aspect of a representation, although it is not a declarative language. The programmatic nature of scripting makes JavaScript very versatile for certain tasks, but it also makes matters a lot more complicated. The web would be a lot simpler without JavaScript, but it would also be completely static, just like in 1994. Scripting is a necessary evil, but it is nonetheless evil because it cannot be easily analysed and interpreted (not in the programming sense): either you do exactly what the script commands or you don’t. If a script wants to display a pop-up, there is little you can do to stop it without interfering with the pages that make an honest use of this feature. Thankfully the language itself is textual and interpreted (in the programming sense), which makes it a very portable and powerful tool, but only insofar as it remains true to its function: adding interactivity to representations. If it is used for any other purpose, we run the risk of negating the many advantages of the REST architectural style.

It might not appear to be such a big deal, but if one looks at the way things are messed up and complicated in the application software world, one comes to realize that using JavaScript as a full-fledged programming language is somewhat risky in the Web context, even if it remains on the last layer of a software stack (if it is not interacted with). If the 3D web is implemented using an API, it will not be long until other APIs based on it start proliferating, and what was originally a great idea will turn into an immense collection of multiply-versioned and incompatible APIs doing more or less the same thing. The browser is not supposed to be a runtime environment; it is a window on the Web whose only purpose is to act as an interpreter for humans navigating it. If we build JavaScript APIs to add 3D content to the web, we face the risk of turning it into a tar pit, even with standardization. Microsoft is notorious for not following standards; now imagine we include Nvidia and ATI in this business. 3D solutions vendors operate with different marketing techniques than in other fields; they and their customers are all about visuals, and vendors will not hesitate to break standards to promote a new feature of their products. Naturally, that feature will only be available on hardware that supports it. The pace of the 3D market is just too quick for standardized APIs; vendors need a lot more flexibility, they need an extensible language.

XML.

A 3D environment is not that different from a webpage and can easily be described using XML. It involves many objects that all share relationships of dependence with one another, just like the document tree (the equivalent in 3D jargon is called the scene graph). Reality itself, which 3D usually aims at approximating, is no different and can be represented using a tree structure. Take for example a table with a teapot on it. If the table is moved around, the teapot will follow because its absolute position is dependent on the table’s position. The teapot’s location with regard to the table, its relative position, does not change. This makes the teapot a child of the table. This example leaves out physics for the sake of simplicity, but it shows XML-based languages are perfectly suited to describing 3D spaces. As a matter of fact, the idea is not new, and many languages exist for this purpose, like VRML, X3D and COLLADA, just to name a few.

Consequently, using such a document to convey the 3D representation of a resource stays true to the declarative nature of the Web. If a browser does not support part of an API, it cannot just skip the unknown script lines; the whole script will most likely fail. On the other hand, if a browser cannot interpret a tag in a 3D description document, it can skip that node of the document tree without worrying about whether it will compromise the rest of the rendering. The user will be presented with an approximate view of the representation that might very well be sufficient for what he wants to accomplish. There will be no need to specify many render paths for different hardware or to rely on the JavaScript engine to do it; if a tag cannot be rendered, it is just skipped. Programmable shader pipelines are a nice technology, but they do not add very much to the functionality of a 3D environment; if a teapot is to be displayed, it does not need to be refractive for the user to figure out it is a transparent teapot. Put differently, no one should need a cutting-edge GPU to see some polygons. With XML-based languages, descriptions are naturally extensible, so vendors are free to add their own tags without waiting for standard approval and without sacrificing the user base that does not support the new feature; they still break the standard, but the consequences are not as grave. In the absence of 3D rendering capabilities, XML always remains fairly readable and can be consulted directly; a 3D scene generated with JavaScript is, on the other hand, very difficult if not impossible to infer without executing the script.
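
To make the table-and-teapot example concrete, here is a hypothetical sketch in the spirit of X3D; the tag and attribute names are illustrative only and are not taken from the actual X3D specification:

<scene>
  <transform id="table" translation="0 0 0">
    <shape geometry="box" size="2 1 1"/>
    <transform id="teapot" translation="0 0.6 0">
      <shape geometry="teapot"/>
    </transform>
  </transform>
</scene>

Changing the table’s translation moves the whole subtree, teapot included, while the teapot’s own translation, its position relative to the table, is left untouched. A renderer that does not understand, say, the teapot geometry can simply drop that node and still draw the table.
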
The advantages of using an XML language to describe spaces do not end there. If a developer wants to add physical properties to a set of objects, all he has to do is insert the pertinent tags in the document tree describing the scene; with an API, things are much more complicated. The same applies to movement, which can also be considered an integral part of a representation. Displaying 3D this way is completely RESTful and it leaves JavaScript doing the job it does best: adding user interactivity through modifying the DOM.
XML also provides a fair amount of interoperability out of the box; by mixing a spatial description language with other compatible languages, like XHTML, it becomes possible to blend many types of content together. As an example, a website could be developed to provide a small service where users consult multiple web pages simultaneously using a cube like Linux’s Compiz or tiling like Mac OS X’s Exposé. The different faces involved would contain XHTML iframes, or, for a more static display, the XHTML could be part of the document tree describing the scene as a child of the face displaying it.

Complexity.

A 3D description language is without a doubt much more complex than one that deals with fewer dimensions. The X3D specification, for instance, is many pages long and makes a fair amount of assumptions about the reader’s proficiency with computer graphics concepts, but it is nonetheless much easier to deal with than programming; the syntax is self-explanatory and there is no need to deal with the complex resource management required to program efficient 3D. Many already know OpenGL and Direct3D and would surely rather use their present skills than learn a new description language. However, they are far from representative of the majority; for a newcomer, it is much easier to learn a description language than an API. Plus, WYSIWYG tools can be developed to automate the generation of 3D, so anyone can create a 3D web page with little effort. Thanks to the ease of use of its core languages and the many authoring tools available, programmers are now far from being the main creators of content on the Web. Doing it with a JavaScript 3D API would be way too intimidating and would drive away the vast majority of users, making the 3D web inaccessible to most.

The bottom line.

Could the 3D web be implemented with an API? Certainly; computers provide us with infinite ways to do an infinite number of things, but some ways are better than others. Since the inception of the Web, there has only been a handful of versions of its core components, and thanks to this consistency, ten-year-old web browsers can probably still navigate it; the same cannot be said of a five-year-old GPU and current games. Programs are strict successions of operations and are not subject to interpretation; visualisation, on the other hand, is anything but. After all, we already use XML to describe 2D, so why should it be different for 3D? The ease of use of the core languages of the Web has made the creation of content accessible to anyone; I would like to see the use and authoring of 3D become an integral part of it as well, not some obscure feature only gamers and the technical crowd can make use of.

Cool things do not happen by accident

No, they don’t; only the shitty ones do, because you rarely go out looking for bad things to happen to you.

If you want something cool to happen (I am not talking about a car, a promotion or fame here, but about creative ambitions), you have to set the optimal conditions. That will most likely not trigger it automatically, but at least you will up the chances of it happening by a fair amount. Be at the right places, talk with the right people, get yourself known by those that might be interested in what you do, but above anything, stay focused and devote your energies to it. You will most likely not succeed at first, but you will most often have a foot in the door. The biggest effort is removing yourself from that semi-comfortable materialistic life; the rest is easy because at that point, it starts being cool already.

If you want to dance professionally (not that I do), then staying in an office just for the sake of financial comfort is most likely not going to cut it. I do not advise quitting a well-paying job to pursue any dream (some are worth it though), but what I would do is retarget all your motivation and energy towards your ambition; in other words, quit the job emotionally. Stop caring about promotions, about fame, do things you like and for yourself, and stop worrying about money; you only need so much to be comfortable, as creativity is mostly free.

This guy found a job refurbishing an old particle accelerator. If you read his blog a bit, you will find that it did not happen by accident.

I just encounter too many people stating they have ambitions of being this or doing that. Then, when I ask them if they are doing anything to make it happen, they just reply that they are too tired, don’t want to risk too much over it, or are just too lazy. If you are too lazy then it is not an ambition; if you are too tired then you are not investing your time and energy in the right place; and if you are afraid of risk, the only thing you are actually putting at stake is money, which, when all things come to an end, is only worth something to your descendants.

Take it with a grain of salt; nothing is absolute and there are as many ways to achieve as many things as there are ways not to achieve anything. I just think some have a better chance of working than others.

The Elements framework

This is another project I have fun with while I get a bit of spare time from having no life. It has actually been in the works for quite a while, but I have only recently started the actual programming (in C++). It is called the Elements Framework. Despite it being just another tool for myself, a few friends have expressed interest in knowing a bit more about it. Since I cannot really formulate precisely what it is myself, I thought it would be a worthy exercise to write a short description of the framework. Now you might be asking yourself, how the hell can you not know what you have been working on for the past year? Well, the answer is: when I think, I do not necessarily use words; I use ideas, I use examples, I use chunks of code, I use drugs, but since I do not work with verbal constructs to begin with, explaining what I do is somewhat tricky because I have to compile a bunch of concepts into correctly formed sentences. That is something I can normally do just fine with everyday objects and concepts (I am not speech impaired), but when it concerns software architecture and its philosophy, it gets hard, especially if your interlocutor has no clue as to what HTTP really is.

The idea.

The Elements Framework has one simple goal: apply the philosophy of the web to traditional software. Sounds far-fetched? Let me explain. The web is to me one of, if not the, greatest systems created by man. It has not failed once since its inception, it has seen worldwide acceptance within only a few years, it is easy to use and it is actually shaping the future of mankind at a rate never seen before. In fact, it is so ubiquitous and important that it is called “The World Wide Web”. Do you know any other system that got popular enough to get the “World Wide” prefix? I don’t. But above all, the workings of the Web are blindingly simple; it is built upon a principle called REST and relies on only two (RESTful) protocols to function: HTTP and DNS (maybe WAKA soon…).

Before we move on, let me define one crucial thing: the Web and the Internet, even if the two terms are often interchanged and refer to the same entity in popular speech, are not the same systems. The Internet is TCP, UDP, IP, ARP, MAC and a bunch of other protocols; in other words, the Internet is everything below the application layer of the TCP/IP protocol suite. The Web is just HTTP and DNS. Although HTTP is most of the time carried over the Internet, it does not require it to function. HTTP only assumes an error-free transmission, and TCP is the provider of this service in the case of the Internet.

Now, what the Elements Framework will try to achieve is making software systems that run on it behave just like the Web, as well as being part of it. Let me elaborate:

  • Out of the box, you will be able to “browse” your software system just like you browse a website. In fact, that software system will look exactly like a website. This means that every component will provide its own HTML (or any other markup language) page for users to check and interact with. Through this, it will be possible to build rich user interfaces through the use of JavaScript, CSS and whatever other markup and styling languages. This also means that every web browser will be able to display those interfaces, be it on *nix, Windows, a cell phone, etc. True to the REST philosophy, the server provides a representation of the resource and the browser decides how to display it. Humans will be able to inspect systems and get a sense of how they work, just like they can with websites, since the architecture of the system will be an exact match of its apparent organization as seen from the web.
  • An Elements system and its components will be naturally distributable, just like the Web. So now, the keyboard, the microphone, the screen, the XML parser and the neural network can be anywhere there is connectivity. If one becomes unavailable, ask Google for another…
  • Interaction will happen solely through HTTP. Since the interface will be standard, it will be much easier to connect heterogeneous components together, just like mashups. Moreover, every component of an Elements system will be able to expose its interface to the rest of the world if it wants to.
  • The Elements framework wants the web to be a “Web of things” where every single piece of electronic equipment can be connected to it if needed. To achieve that, it will be made as lean as possible in order to function on the smallest microcontrollers. Imagine every electrical switch and outlet in your home being accessible through your browser. Every home would have its own web, shielded from the WWW of course.

How it works.

The basic construction block of an Elements framework is, as you have guessed, an Element. The active part of an Element is called a Resource. Simply put, a Resource is a lean HTTP server. It takes HTTP messages, checks the URL to see if the message is for itself, and forwards it to its children if it is directed to them. If the message is its own, it processes it. Take for example three Resources, res1, res2 and res3, with res2 and res3 being children of res1. If you want to direct a message to res2, the message’s address would be /res1/res2/. res1, just by receiving the message, would know it was the intended recipient (we trust DNS and TCP to get stuff where it is supposed to go), so it would look at the URL of the message for the next recipient: res2. It turns out res1 has a child named res2, so res1 would happily pass the message to res2, who, upon analysis, would figure out it is the final recipient because there is nobody after the last “/”. res2 would then process the content of the message and take appropriate action, like sending a response to the originator if the message was a request, or turning on a light fixture in your home.
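
Since the framework is not published yet, here is only a rough sketch of that routing logic in C++; every name in it (Resource, Message, dispatch, and so on) is hypothetical and merely illustrates the idea of walking the URL one segment at a time:

#include <iostream>
#include <map>
#include <string>

// A message reduced to the parts that matter for routing.
struct Message { std::string url; std::string body; };

class Resource {
public:
    explicit Resource(const std::string& n) : name(n) {}

    void addChild(Resource& child) { children[child.name] = &child; }

    // path is the portion of the URL this Resource still has to interpret,
    // e.g. "/res1/res2/" when the message first reaches res1.
    void dispatch(Message& msg, std::string path) {
        if (!path.empty() && path[0] == '/') path.erase(0, 1);   // drop leading '/'
        std::string::size_type slash = path.find('/');
        std::string segment = (slash == std::string::npos) ? path : path.substr(0, slash);
        std::string rest = (slash == std::string::npos) ? "" : path.substr(slash + 1);

        if (segment.empty() || segment == name) {
            // The segment names us (or nothing is left): consume it and carry on.
            if (rest.empty()) { process(msg); return; }
            dispatch(msg, rest);
            return;
        }
        auto it = children.find(segment);                        // next recipient?
        if (it != children.end()) it->second->dispatch(msg, rest);
        else std::cout << name << ": 404, no child named " << segment << "\n";
    }

    virtual void process(Message& msg) {                         // "I am the final recipient"
        std::cout << name << " handles \"" << msg.body << "\"\n";
    }

    const std::string name;
private:
    std::map<std::string, Resource*> children;
};

With three such Resources wired together as in the paragraph above, calling the root’s dispatch with /res1/res2/ ends with res2 processing the message.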

The other elemental building block of an Elements framework is the Authority. Authorities implement concurrency in the Elements framework and are responsible for a whole lot of other things (hence their authoritative nature). They also have everything a Resource has, because they inherit from it, so for this short description there is no need to explain more about Authorities.

As stated earlier, an Elements system is just like a website. By assembling a bunch of Resources together and by providing them with the amount of specialization required for them to have a bit of intelligence, we can actually build more complex systems that behave like websites. For example, let’s look at what the organisation of a calculator would look like (a sketch of how such a tree might be assembled follows the outline below). Mind you, this is rather simplistic, because programs as simple as calculators rarely get componentized.

  • Calc/
    • Parser.Calc/
    • Add.Calc/
      • /Left
      • /Right
      • /Result
    • Mul.Calc/
      • /Left
      • /Right
      • /Result
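
Reusing the hypothetical Resource class from the routing sketch above, and flattening the DNS-style names of the outline (Add.Calc, Mul.Calc) into plain path segments for simplicity, assembling that tree could look roughly like this; none of it is the framework’s real API:

// Hypothetical assembly of the calculator tree.
int main() {
    Resource calc("Calc"), parser("Parser"), add("Add"), mul("Mul");
    Resource left("Left"), right("Right"), result("Result");

    calc.addChild(parser);
    calc.addChild(add);
    calc.addChild(mul);
    add.addChild(left);
    add.addChild(right);
    add.addChild(result);

    Message msg{"/Calc/Add/Left", "3"};
    calc.dispatch(msg, msg.url);   // walks Calc -> Add -> Left, which processes "3"
    return 0;
}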

When the user directs his browser to http://Calc/ (with an HTTP GET request), he is presented with a webpage (the representation of http://Calc/) containing a simple form: a text box with a button that reads “=” on its right, and some instruction text (or images, publicity, whatever you could put on the WWW). By reading the instructions, the user promptly figures out how this calculator works and proceeds to do some math. He types a mathematical expression (5 + 7, 125 * 8, etc.) in the text box and then presses “=”, expecting the result to appear on the right of the button. Upon clicking “=”, quite a lot will happen in the Elements framework.

First, the user’s browser will send an HTTP POST request to http://Calc/. When the root Element (whose name is Calc) receives the request, he analyses the header and deduces that he is the intended recipient and that the request is actually from a user who needs to do some intense computations. Calc is somewhat lazy; his job is just to display the calculator, not to do the calculations. He then proceeds to build another HTTP POST request and sends the expression he received to http://Parser.Calc/.

Parser receives the request, figures out it is for him and then proceeds to decompose the expression. “3 + 6” is two operands and one operator, so Parser now packages two HTTP POST requests. He sends one containing 3 to http://Add.Calc/Left and one containing 6 to http://Add.Calc/Right, waits for the responses and then sends an HTTP GET to http://Add.Calc/Result/, which returns 9 as a response. Parser then packages the 9 in an HTTP response and sends it to Calc, who re-renders a representation of itself, but this time with the result of the computation.
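
Spelled out as raw HTTP, the middle of that exchange might look like the following; the exact headers and bodies are guesses for illustration, only the flow of messages is taken from the description above:

POST /Left HTTP/1.1
Host: Add.Calc
Content-Type: text/plain
Content-Length: 1

3

POST /Right HTTP/1.1
Host: Add.Calc
Content-Type: text/plain
Content-Length: 1

6

GET /Result/ HTTP/1.1
Host: Add.Calc

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 1

9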

Neat; it took a massive amount of HTTP message producing, passing and parsing for something that could have been done in a single assembly instruction, but it was just an example. However, the Resource Calc could have been instructed to use add.computations.co.nz (in New Zealand) as an adder, and it would have made absolutely no difference. Also, imagine that Parser is for some reason unavailable and the user of the calculator does not know where to find a working one. Other Parsers on the web, being web pages, can be indexed by search engines; all the user needs to do now is search the web for a working parser, instruct Calc to use it and he is up and running again. This, of course, could all have been automated in order to get one hell of a resilient calculator.

Should the user become curious about the inner workings of the calculator, he is always free to go check out the Resources that compose it with his browser; they are all HTTP servers after all, so they have to answer. An HTTP GET on http://Add.Calc/ could for example return a page that explains a bit about what this adding Resource does, displays an overview of its state (does it need an oil change anytime soon?) and gives links to Right, Left and Result. Or it could be that the calculator is in fact configurable, and at http://Mul.Calc you can find options to return different formats of numbers.

That concludes the example. If you browse a bit, you will realize that this kind of thing is already possible with current technology; it is in fact done every day through the use of Web APIs. Using the web in such a way is what Web 2.0 is all about and is only a natural consequence of RESTful architectures. If you want something more like this framework, the closest thing I have found is Java Servlets. I have actually never used this technology myself, but I imagine that Servlets will not run on a microcontroller with 1 kB of RAM; at least the Java virtual machine won’t. What I am trying to achieve is to make the leanest possible HTTP server so it can be used on any platform; the smallest computers could host only one simple Element while a modern-day computer could host tens of thousands. After that, developing more complex systems is just a matter of assembling Elements together. The URL topology you use to browse an Elements system is an exact representation of the architecture of the system. If the components of a system entertain a relationship that is somewhat similar to a tree, where some are responsible for others, then I think that system can be made to use the Elements framework.

So if you want a real-life running example of an “almost” Elements system, the Tree Framework (If you develop programs you are cool, but if you develop frameworks, you have attained a higher degree of coolness) on which this website runs is one.

The Elements Framework is still in the forge. When I get a working version, I will make it available as open source software. There are many other projects in the pipeline that will use the Elements framework when it is done. Below are a few of them; maybe they will give a more down-to-earth idea of what you can do if you apply REST to classic software making:

  • A home automation system, where every outlet, switch and thermostat is accessible from the web and controlled by an Elements system sitting on a server in my house.
  • A web-based oscilloscope and logic analyser.
  • A web content management system based on the Elements Framework.
  • An alarm system.

You can follow developments right here, as child posts of this project entry. Until then, go create something!

Impressions of v0.1a

I have been using Tree v0.1a, the initial release, for two good months now and I must say I am thoroughly satisfied with the quality of the product. It has not suffered a single failure and is performing up to my expectations.

I know I am not in a position to give constructive comments about my own creation, but I think I am well past the “This thing is the next Windows” point. Even if the framework is seriously alpha, I am not the only person who can use it (to a certain extent) and it has not failed miserably whenever I wanted to show it to someone.

Finally, some software of my own making that did not get the boot halfway through development and that everyone can use (everyone visiting my site, that is). Yes, defects and enhancements are piling up in Trac, but that’s a side effect of improvement.

Giving a meaning to port scans

I am a strong advocate of judgment being the best anti-virus, anti-trojan, anti-worm and anti-etc., but when it comes to protection against intrusion, well, judgment cannot be of any help, so I put my trust in firewalls. However, I have lately seen many of my 8-thousandish ports (8000, 8001, 8002), which I use for development HTTP servers, being taken by unknown programs.

In order to identify the culprits, I port-scanned all my interfaces (you can do that with nmap) to find the associated protocols, in the hope that this would give me hints on which processes were to blame. It turns out port scans just give you the name of the protocol that is registered with that port through IANA, which gives you no guarantee that the process bound to the port is using that particular protocol (I had momentarily forgotten that TCP and UDP do not care what they transport). For instance, 8000 is reserved for irdmi, which seems to be lost technology, as no one has a clue what the hell it is for.

The command that is actually needed to find which process owns which port under *nix OSes (besides netstat, whose output I find painful to read) is lsof (list open files):

sudo lsof -i | grep "port number"

or

sudo lsof -i | grep "protocol name"

If the port has an associated protocol, lsof will use the name of the protocol instead of the number. The output you get, when not piping to grep of course, is a list of every file (ports, sockets, ttys, RS-232 devices, etc. are considered files by *nix OSes) open on your system, with the number and name of the process that owns it.
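
If your build of lsof supports filtering on the port directly (most recent ones do), you can also skip grep altogether:

sudo lsof -i :8000

which lists only the processes bound to port 8000, whether they use TCP or UDP.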

So in the end, 8000 and 8002 belonged to Eclipse while 8001 was Camino’s.