Representations: guaranteed to work.
Mark-up languages currently in use on the web all share a simple fundamental goal: to describe the visual and semantic organization of information. HTML, for instance, describes the document tree: how blocks of information (text) relate to one another and what purpose each serves in the visual and semantic presentation of the data. The HTML specification also permits the description of visual features through inline styling (b, font, h1, h2, etc.), but this usage will slowly disappear in order to give way to CSS. CSS, on the other hand, concerns itself mostly with visuals by describing both the styling and the spatial representation of a document: nodes from the document tree can be moved around and styled as the designer sees fit. CSS and HTML are different languages used for largely different purposes, but they tackle two intersecting areas of the same problem space: pleasing and adapting to the human visual system.
The declarative advantage.
Declarative architectures such as REST not only provide a consistent quality of service, they also enable other entities to do more with the resources that compose them than simply render them. A great deal of information that has nothing to do with visuals can be inferred from the documents that describe the representation of those resources. The semantic web, linking, microformats, search engines and mashups are all compelling examples of the declarative advantage. This type of interaction between resources is probably possible with scripted 3D, but not without serious analysis overhead and a very strict naming standard. Even then, the aforementioned technologies would not integrate naturally with scripted 3D, because they would have to remain within the declarative structure of the document.
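To make the point concrete, here is a minimal sketch in Python of a program other than a renderer reading a declarative 3D document. The scene vocabulary (`scene`, `cube`, `face`, and the `href` attribute) is invented purely for illustration and belongs to no existing standard; the idea is only that a crawler can pull links out of the description without drawing a single polygon.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarative 3D scene. Element and attribute names are
# assumptions made up for this example, not part of any real format.
SCENE = """
<scene title="Gallery">
  <cube id="kiosk">
    <face side="front" href="http://example.org/news">News</face>
    <face side="back" href="http://example.org/about">About</face>
    <face side="top">Decoration only</face>
  </cube>
</scene>
""".strip()

def extract_links(xml_text):
    """Return (label, target) pairs for every element carrying an href."""
    root = ET.fromstring(xml_text)
    return [((el.text or "").strip(), el.get("href"))
            for el in root.iter()          # document order, no rendering
            if el.get("href") is not None]

links = extract_links(SCENE)
for label, target in links:
    print(label, "->", target)
```

Doing the same against a scripted scene would mean analysing the program itself, or relying on naming conventions the script author happened to follow.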
The API tar pit.
XML also provides a fair amount of interoperability out of the box: by mixing a spatial description language with other compatible languages, such as XHTML, it becomes possible to blend many types of content together. As an example, a website could offer a small service that lets users consult multiple web pages simultaneously, using a cube like Linux's Compiz or tiling like Mac OS X's Exposé. The faces involved would contain XHTML iframes or, for a more static display, the XHTML could be part of the document tree describing the scene, as a child of the face displaying it.
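The cube example could be sketched as follows, assuming a hypothetical scene namespace (`http://example.org/ns/scene`) mixed with the real XHTML namespace. Two faces embed iframes and one carries static XHTML as a child of the face. The Python below is only an illustration of how XML namespaces let a consumer tell the two vocabularies apart; none of the scene element names come from an actual specification.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"   # the real XHTML namespace
SCENE_NS = "http://example.org/ns/scene"    # hypothetical, for illustration

DOC = f"""<s:scene xmlns:s="{SCENE_NS}" xmlns:h="{XHTML_NS}">
  <s:cube>
    <s:face side="front"><h:iframe src="http://example.org/a"/></s:face>
    <s:face side="right"><h:iframe src="http://example.org/b"/></s:face>
    <s:face side="top"><h:p>Static <h:em>XHTML</h:em> content</h:p></s:face>
  </s:cube>
</s:scene>"""

def faces_and_content(xml_text):
    """Map each face's side to the XHTML element it embeds and its src, if any."""
    root = ET.fromstring(xml_text)
    result = {}
    for face in root.iter(f"{{{SCENE_NS}}}face"):
        child = face[0]                           # first embedded element
        namespace, local_name = child.tag[1:].split("}", 1)
        if namespace == XHTML_NS:                 # only report XHTML content
            result[face.get("side")] = (local_name, child.get("src"))
    return result

content = faces_and_content(DOC)
```

A 3D-aware browser would hand the XHTML subtrees to its existing HTML engine and lay the results out on the faces; a 2D-only consumer could ignore the scene namespace and still find the pages.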
The bottom line.
Could the 3D web be implemented with an API? Certainly. Computers provide us with infinite ways to do an infinite number of things, but some ways are better than others. Since the inception of the Web, there have been only a handful of versions of its core components, and thanks to this consistency, 10-year-old web browsers can probably still navigate it; the same cannot be said for a five-year-old GPU and current games. Programs are strict successions of operations and are not subject to interpretation; visualisation, on the other hand, is anything but that. After all, we already use XML to describe 2D, so why should it be different for 3D? The ease of use of the core languages of the Web has made the creation of content accessible to anyone; I would like to see the use and authoring of 3D become an integral part of it as well, not some obscure feature that only gamers and the technical crowd can make use of.