RFC: Potential XML.com article
- Please contribute comments on this article I am working on for xml.com.
Second Generation Web Services
In the early days of the Internet, it was common for enlightened
businesses to connect to the Internet merely by using SMTP, NNTP and FTP
clients and servers to deliver messages, text files, executables and
source code. The Internet became a more fundamental tool when businesses
started to integrate their corporate information (both public and
private) into the emerging Web framework. The Internet became popular
when it shifted from a focus on transactional protocols to a focus on
data objects and the links between them.
The technologies that characterize the early Web framework were
HTML/GIF/JPEG, HTTP and URLs. This combination of standardized formats,
a single application protocol and a single universal namespace was
incredibly powerful. Using these technologies, corporations integrated
their diverse online publishing systems into something much more
compelling than any one of them could have built.
Once organizations converged on common formats, the HTTP protocol and a
single addressing scheme, the Web became more than a set of Web sites.
It became the world's most diverse and powerful information system.
Organizations built links between their own information and other
people's. Amazing third-party applications also wove the information
together. Examples include Google, Yahoo, Babelfish and Robin Cover's
First generation Web Services are like first generation Internet
connections. They are not integrated with each other and are not
designed so that third parties can easily integrate them in a uniform
way. I posit that the next generation will be more like the integrated
Web that arose for online publishing and human/computer interactions. In
fact, I believe that second generation web services will actually build
much more heavily on the architecture that made the Web work. Look for
the holy trinity: standardized formats (XML vocabularies), a
standardized application protocol and a single URI namespace.
This next generation of Web Services will likely bear the name "REST"
Web Services. REST is the underlying architectural model of the current
Web. It stands for REpresentational State Transfer. Roy Fielding of
eBuilt invented the name in his PhD dissertation
(http://www.ebuilt.com/fielding/pubs/dissertation/top.htm). Recently, Mark
Baker of PlanetFred has been a leading advocate of this architecture.
features. It has many aspects and I would not claim to understand it in
detail. I'm going to focus on the aspects that are most interesting to
XML users and developers.
The Current Generation
SOAP was originally intended to be a cross-Internet form of DCOM or
CORBA. The name of an early SOAP-like technology was "WebBroker" -
Web-based object broker. It made perfect sense to model an
inter-application protocol on DCOM, CORBA, RMI etc. because they were
the current models for solving inter-application interoperability
These RPC protocols achieved only limited success before they were
ported to the Web. Some believe that the problem was merely that
Microsoft and the OMG supporters could not get along. I disagree. There
is a deeper issue. RPC models are great for closed-world problems. A
closed world problem is one where you know all of the users, you can
share a data model with them, and you can all communicate directly as to
your needs. Evolution is comparatively easy in such an environment: you
just tell everybody that the RPC API is going to change on such and such
a date and perhaps you have some changeover period to avoid downtime.
When you want to integrate a new system you do so by building a
On the other hand, when your user base is too large to communicate
coherently you need a different strategy. You need a pre-arranged
framework that allows for evolution on both the client and server sides.
You need to depend less on a shared, global understanding of the rights
and responsibilities of a participant. You need to put in hooks where
your users can innovate without contacting you. You need to leave in
explicit mechanisms for interoperating with systems that do not have the
same API. RPC protocols are traditionally poor at this kind of
evolution. Changing interfaces tends to be extremely difficult. I
believe that this is why no enterprise has ever successfully unified all
of their systems with an RPC protocol such as DCOM, CORBA or RMI.
Now we come to the crux of the problem: SOAP RPC is DCOM for the
There are many problems that can be solved with an RPC methodology. But
I believe that the biggest, hairiest problems will require a model that
allows for independent evolution of clients, servers and intermediaries.
It is therefore important for us to study the only distributed
applications in history to ever scale to the size of the Internet.
The archetypical scalable application
The two most massively scalable, radically interoperable, distributed
applications in the world today are the Web and email. What makes these
two so scalable and interoperable? For starters,
they both depend on standardized, extensible message formats (HTML and
MIME). They both depend on standardized, extensible application
protocols (HTTP and SMTP). But I believe that the most important thing
is that each has a global addressing scheme.
In the real estate world there is a joke that there are three things
that make a property valuable: location, location and location. The same
is true in the world of XML web services. Properly implemented, XML web
services allow you to assign addresses to data objects so that they may be
located for sharing or modification.
In particular, the web's central concept is a single unifying namespace
of URIs. URIs allow the dense web of links that make the Web worth
using. URIs identify resources. Resources are conceptual objects.
Representations of them are delivered across the web in HTTP messages.
These ideas are so simple and yet they are profoundly powerful and
demonstrably successful. URIs are extremely "loosely coupled". You can
pass a URI from one "system" to another using a piece of paper and OCR!
URIs are "late bound". They do not declare what can or should be done
with the information they reference. It is because they are so radically
"loose" and "late" that they scale to the level of the Web.
Unfortunately, most of us do not think of our web services in these
terms. Rather we think of them in terms of remote procedure calls
between endpoints that represent software components. This is CORBA/DCOM
thinking. Web thinking is organized around URIs for resources.
Claim: The next generation of web services will use individual data
objects as endpoints. Software component boundaries will be invisible
An Illustrative Example
UDDI is an example of a Web Service that could be made much, much more
robust as a second generation Web Service. I'm not discussing the
philosophical issues of UDDI's role in the web services world but the
very concrete issue of how to get information into and out of it. These
arguments will apply to most of the Web Services in existence, including
stock quote services, airplane reservations systems and so forth.
UDDI has a concept of a businessEntity representing a corporation.
Businesses are identified by UUIDs. The Web-centric way to do this would
have been to identify them by URIs. The simplest way to do this would be
to make a businessEntity an XML document addressable at a URI
like "http://www.uddi.org/businessEntity/ibm.com" or perhaps
"http://www.uddi.org/getbusinessEntity?ibm.com". The difference between
these two is subtle and does not have many technical implications so
let's not worry about it.
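
As a rough sketch of the first URI style, here is how a client might
derive the address of a businessEntity in Python. The www.uddi.org
layout and the idea of keying records by domain name are assumptions
carried over from the example above; neither is part of the real UDDI
specification, and business_entity_uri is a made-up helper name.

```python
from urllib.parse import quote

# Hypothetical base URI taken from the example above; UDDI does not define it.
BUSINESS_ENTITY_BASE = "http://www.uddi.org/businessEntity"


def business_entity_uri(business_name: str) -> str:
    """Return the URI at which a businessEntity document would live."""
    # Percent-encode the name so it is safe as a single path segment.
    return f"{BUSINESS_ENTITY_BASE}/{quote(business_name, safe='')}"


print(business_entity_uri("ibm.com"))
```

Anything that can hold a string (a database column, an email, a piece
of paper) can now hold a reference to the record.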
You can think of "http://www.uddi.org/businessEntity" as a directory
with files in it or a web service pulling data from a database. A
wonderful feature of the Web is that there is no way to tell which is
true just from looking at the URI. That is "loose coupling" in action!
Let's consider the implications of using HTTP-based URIs instead of
UUIDs for business entities:
* Anybody wanting to inspect that business entity would merely point
their (XML-aware!) browser at that URI and look at the businessEntity
* Anybody wanting to reference the businessEntity (in another web
service or a document) could just use the URL.
* Anybody wanting to incorporate the referenced information into another
XML document could use an XLink, XPointer or XInclude.
* Anybody wanting a permanent copy of the record could use a command
line tool like "wget" or do a "Save As" from the browser.
* Any XSLT stylesheet could fetch the resource dynamically to combine it
with others in a transformation.
* Access to the businessEntity could be controlled using standard HTTP
authentication and access control mechanisms
* Metadata could be associated with the businessEntity using RDF
* Any client-side application (whether browser-based or not) could fetch
the data without special SOAP libraries.
* Two business entities could represent their merger by using a standard
HTTP redirect from one businessEntity to another.
* Editing and analysis tools like Excel, XmetaL, Word and EMACS could
import XML from the URL directly using HTTP. They could write back to it
* UUIDs or other forms of location-independent addresses could still be
assigned as an extra level of abstraction as demonstrated at purl.org.
The current UDDI "API" has a method called get_businessDetail. Under an
address-centric model, that method would become entirely redundant and
could thus be removed from the API. UDDI has several get_ methods that
operate on data objects such as tModels and business services. These
data objects could all be represented by logical XML documents and the
methods could be removed. Note how we have substantially simplified the
user's access to UDDI information.
Business entities are not the only things in UDDI that should be
identified by URI-addressable resources rather than SOAP APIs. In fact
all of the data in a UDDI database could be represented this way.
Summary: Resources (data objects) are like children. They need to have
names if they are to participate in society.
Now let's consider the extensibility characteristics of the REST model
versus the original SOAP RPC model. Let's say that your company has a
private UDDI registry and mine does also. You and I are business
partners. We agree to share our customer databases. The customer
databases have pointers into our UDDI registries for referring to
If our registries have little or no overlap then it makes sense for you
to maintain yours and for me to maintain mine. Rather than replicating
between them (which has serious security and maintainability
implications) I would like to just add you to the access control lists
for some records and allow you to refer to them from your customer
database and I'll do the opposite from mine.
If the customer databases use UUIDs then they have no way of knowing
whether a particular UUID should be looked up in the local database, the
partner's database or even the public UDDI In The Sky. URIs are not just
globally unique but also typically embed enough information to allow
them to be de-referenced without further context. Using URIs instead of
UUIDs, new repositories can be integrated whenever we want. In fact, if
we use URIs, the customer database could refer just as easily to
businessEntity records sitting on somebody's hard disk as in a formal
UDDI registry. The database maintainer could choose whether to allow
that or not.
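
A minimal Python sketch of that difference, under the assumption that
the customer database stores references as plain strings. The host
names and the sample UUID are invented for illustration, and
registry_host is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlparse


def registry_host(reference: str) -> Optional[str]:
    """Return the host to contact for a reference, or None if it names none."""
    parsed = urlparse(reference)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # An HTTP URI says where to look without any further context.
        return parsed.netloc
    # A bare UUID is unique but gives no hint about which registry holds it.
    return None


references = [
    "http://registry.example.com/businessEntity/acme",      # my registry
    "http://partner.example.org/businessEntity/widgetco",   # your registry
    "123e4567-e89b-12d3-a456-426614174000",                 # opaque UUID
]
for ref in references:
    print(ref, "->", registry_host(ref))
```

The first two references can be followed immediately; the third needs
out-of-band knowledge about which registry to ask.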
Because the businessEntity documents are XML, it is relatively easy to
add elements, attributes or other namespaces. This makes the document
format extensible. It is also easy to extend the protocol by adding
specialized HTTP headers or even new HTTP methods.
Performance of web services will be an important issue. Any resource
representation retrieved from a GET-based URI can be cached. It can be
cached in a cache server in front of the server, in an intermediary
provided by an ISP, at a corporate firewall or on the client computer.
Caching is built into HTTP. SOAP get_businessDetail messages are not
cached by any existing technology.
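
The reason GET is so easy to cache is that the URI alone is the cache
key. The following Python sketch illustrates only that one idea: it is
a toy in-memory cache, not an HTTP cache, and it deliberately ignores
the expiry and validation headers that real caches honor. GetCache and
fake_origin are hypothetical names.

```python
from typing import Callable, Dict, List


class GetCache:
    """Toy cache keyed by URI, standing in for a real HTTP cache."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # called only on a cache miss
        self._store: Dict[str, bytes] = {}

    def get(self, uri: str) -> bytes:
        if uri not in self._store:
            self._store[uri] = self._fetch(uri)
        return self._store[uri]


origin_calls: List[str] = []


def fake_origin(uri: str) -> bytes:
    """Pretend origin server that records every request it receives."""
    origin_calls.append(uri)
    return b"<businessEntity/>"


cache = GetCache(fake_origin)
uri = "http://www.uddi.org/businessEntity/ibm.com"  # hypothetical URI
cache.get(uri)
cache.get(uri)
print(len(origin_calls))  # the origin was contacted once, not twice
```

A generic intermediary cannot do the same for an RPC call without
understanding the method and its parameters.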
As an optimization, the URI "http://www.uddi.org/businessEntity/ibm.com"
might be represented as a raw text file on a hard disk of an operating
system optimized towards serving files over HTTP. There is not and will
likely never be any server that can invoke SOAP methods as quickly as a
fast HTTP server can serve files from disk.
UDDI has other methods for working with businessEntities. One is
delete_business. HTTP already has a DELETE method. Therefore this method
would be redundant in the REST model. Instead of doing a UDDI
SOAP-RPC-specific delete you could do an HTTP delete. This would have
the benefit of being compatible with tools that know how to do HTTP
deletes like the Windows 2000 explorer and MacOS X finder. In theory,
businesses could delete portions of their own records (perhaps obsolete
branch plant addresses) by merely hitting the "delete" key.
Obviously authentication and access control are key. Microsoft should not
be able to delete their competitors (or at least should be forced to
delete them in the old fashioned way, by competing with them). HTTP
already has the authentication, authorization and encryption features
that UDDI's SOAP RPC protocol lacks. It already works.
UDDI has a save_business method. This is for uploading new businesses.
The HTTP equivalent is PUT or POST. A pleasant side effect of using HTTP
methods instead of a SOAP method is that you can do a POST from an HTML
form. So the web service can be used either from other programs or (with
a browser) by a human editor.
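
To make the mapping concrete, here is a sketch in Python that builds
(but does not send) the HTTP requests that would stand in for the UDDI
calls discussed so far. The URI is the hypothetical one used earlier,
the function names simply echo the UDDI method names, and the choice of
PUT for save_business is one of the two options mentioned above.

```python
from urllib.request import Request

# Hypothetical resource URI from the earlier example.
ENTITY_URI = "http://www.uddi.org/businessEntity/ibm.com"


def get_business_detail(uri: str) -> Request:
    """get_businessDetail becomes a plain GET on the resource."""
    return Request(uri, method="GET")


def delete_business(uri: str) -> Request:
    """delete_business becomes an HTTP DELETE on the resource."""
    return Request(uri, method="DELETE")


def save_business(uri: str, document: bytes) -> Request:
    """save_business becomes a PUT of the XML document to the resource."""
    return Request(
        uri,
        data=document,
        headers={"Content-Type": "application/xml"},
        method="PUT",
    )


for request in (
    get_business_detail(ENTITY_URI),
    delete_business(ENTITY_URI),
    save_business(ENTITY_URI, b"<businessEntity/>"),
):
    print(request.get_method(), request.full_url)
```

Passing any of these objects to urllib.request.urlopen would send it;
that step is left out because the server here is imaginary.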
UDDI has a find_business method. This is no different in principle than
the search features built into every website in the world and search
engine sites in particular. That would be a form of GET. On the URL
line, the service would take a series of search parameters and return an
XML document representing the matching businessEntities (either by
reference, as URLs, or by value, as XML elements).
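
A sketch of what that could look like from the client side, in Python.
The parameter names, the businessList wrapper element and the href
attribute are all invented for illustration; a real service would
define its own query parameters and XML vocabulary.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

# Hypothetical search endpoint, following the earlier example URIs.
SEARCH_BASE = "http://www.uddi.org/businessEntity"


def find_business_uri(**criteria: str) -> str:
    """Encode search criteria as a query string on a GET-able URI."""
    return f"{SEARCH_BASE}?{urlencode(criteria)}"


def entity_uris(result_xml: str) -> List[str]:
    """Extract the URIs of matching entities returned by reference."""
    root = ET.fromstring(result_xml)
    return [entity.get("href") for entity in root.findall("businessEntity")]


# A made-up result document returning matches by reference.
SAMPLE_RESULT = """
<businessList>
  <businessEntity href="http://www.uddi.org/businessEntity/ibm.com"/>
  <businessEntity href="http://www.uddi.org/businessEntity/example.com"/>
</businessList>
"""

print(find_business_uri(name="IBM", country="US"))
print(entity_uris(SAMPLE_RESULT))
```

Because the search itself is a URI, it can be bookmarked, linked to and
cached like any other resource.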
The Role of HTTP
You may notice a recurring theme. Everything that we want to do in this
Web Service is already supported in HTTP. The only things that we need
to innovate on are our URI structure and our XML schemas. Bingo! That
was the whole point of XML: to focus on data interchange instead of
Everything in UDDI can be represented in terms of HTTP operations on
resources. So HTTP isn't accidentally paired with URIs as one of the
central technologies of the Web. It is designed specifically as a major
part of the location-centric REST architecture.
Here's the radical idea: no matter what your problem, you can and should
think about it as a data resource manipulation problem rather than as an
API design problem. Think of your web server as this big information
repository: like a database. You are doing data manipulation operations
In UDDI I've chosen a web service that is ripe for an easy conversion to
REST philosophy but we can apply these principles to anything. What
about something like a purchase order submission? That seems more
transactional. Well purchase orders want to be named also! If you POST
or PUT a purchase order to a new URI then internal systems all over your
company can instantly refer to it no matter where they are. Using HTTP,
an arbitrary XSLT stylesheet or Perl script sitting on an employee's
desktop in the Beijing office can massage data from a purchase order
sitting on the accounting mainframe in Los Angeles. Accessing
HTTP-addressable resources is no more difficult than accessing files off
of the local file system, but it requires much less coordination than
standard file system sharing technologies.
What about a request for quote? RFQs want to be named! Once you give
them a name you can pass around the URL to your partners rather than the
text. Then your partners can build references to them using hyperlinks
from their documents and databases. Use access controls to keep out your
competitors. You can think about any business problem in this way.
Even web services with complicated work flows can be organized in a
URI-centric manner. Consider a system that creates airline reservations.
In a traditional HTML system there are a variety of pages representing
the different stages in the logical transaction. First you look up
appropriate flights. You get back a URI representing the set of
appropriate flights. Then you choose a flight. You get back a URI
representing your choice. Then you decide to commit. You get back a web
page that returns a reservation number. Ideally the URL for that page will
persist for a reasonable amount of time so that you can bookmark it.
An XML based web service could go through the exact same steps. Rather
than returning HTML forms at each step, the service would return XML
documents conforming to a standard airline industry vocabulary. Those
same XML documents could be used on a completely different airline
reservation site to drive exactly the same process.
Summary: Any business problem can be thought of as a data resource
manipulation problem and HTTP is a data resource manipulation protocol.
Metcalfe's Law Revisited
Metcalfe's law is that the value of a network is proportional to the
square of the number of people on the network, because each pair of
people can make a connection between them. One telephone is useless. One
billion phones cause a major telecommunications revolution - if they can
all access each other through a single global naming system.
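
The arithmetic behind the "square" is simply the number of distinct
pairs, n(n-1)/2, which grows roughly as n squared. A small Python
sketch (possible_connections is a name chosen here for illustration):

```python
def possible_connections(n: int) -> int:
    """Number of distinct pairs among n participants: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for participants in (1, 2, 10, 1_000_000_000):
    print(participants, possible_connections(participants))
```

One participant yields zero possible connections, which is the useless
lone telephone.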
Metcalfe's law also applies to data objects. Elements in UDDI can only
(with a few exceptions) refer to each other. They cannot refer to
objects elsewhere on the Web (for instance in other UDDI repositories).
Similarly, objects on the Web (for instance web pages) cannot refer to
the XML elements in the UDDI repository. A URL-centric solution would
unify these data domains as the phone number system unifies telephones.
Making your data universally addressable is not equivalent to making it
universally available! It is easy to hide objects by merely never
publishing their URIs. It is also easy to apply security policies to
objects. In fact, REST simplifies security greatly.
Under the SOAP RPC model, the objects that you work with are implicit
and their names are hidden in method parameters. Therefore you need to
invent a new security strategy for each and every web service. UDDI is
completely unlike .NET My Services which will likely be completely unlike
Liberty and so forth. Under REST, you can apply the four basic
permissions to each data object: GET permission, PUT permission, DELETE
permission and POST permission. You might also want to allow or disallow
GET/PUT/DELETE and POST on sub-resources. This model is exactly like the
one used for today's file systems! It is proven and it works. I know of
no security model that works in a similarly generic manner for remote
procedure call models.
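
As a sketch of how generic such a model can be, the following Python
fragment keeps one set of permitted methods per user and resource. It
is an illustration of the idea only, not how any particular web server
implements access control; ResourceAcl and the user names are
hypothetical, and inheritance by sub-resources is left out.

```python
from typing import Dict, Set, Tuple

HTTP_PERMISSIONS = {"GET", "PUT", "DELETE", "POST"}


class ResourceAcl:
    """Per-resource permissions expressed purely in terms of HTTP methods."""

    def __init__(self) -> None:
        self._grants: Dict[Tuple[str, str], Set[str]] = {}

    def grant(self, user: str, uri: str, *methods: str) -> None:
        unknown = set(methods) - HTTP_PERMISSIONS
        if unknown:
            raise ValueError(f"not one of the four permissions: {sorted(unknown)}")
        self._grants.setdefault((user, uri), set()).update(methods)

    def allows(self, user: str, uri: str, method: str) -> bool:
        # Anything not explicitly granted is denied.
        return method in self._grants.get((user, uri), set())


acl = ResourceAcl()
record = "http://www.uddi.org/businessEntity/ibm.com"  # hypothetical URI
acl.grant("ibm-admin", record, "GET", "PUT", "DELETE")
acl.grant("anyone", record, "GET")

print(acl.allows("anyone", record, "GET"))
print(acl.allows("anyone", record, "DELETE"))
```

The same four permissions apply unchanged to a purchase order, an RFQ
or any other resource, because the policy never needs to know what the
resource means.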
In fact, security is just one form of maintainability that is simplified
by REST. Any network administrator will tell you that every level of
networking causes its own headaches. Some days IP works but DNS doesn't
(DNS server down or DNS settings misconfigured). Some days IP/DNS works
but HTTP doesn't (firewall or proxy misconfigured). If you run a web
service protocol on top of HTTP it will add its own layer of
configuration and software headaches on top of the existing ones. It
cannot be more reliable than its foundational HTTP layer. It can only
add one more layer of unreliability.
Once you have your service working, it is possible to "test" REST web
services just by looking at them in a browser. It is possible to make
simple HTML forms to test POSTs. QA departments can easily pretend to be
multiple users by changing their HTTP credentials. Standard web tools
can monitor availability. In essence, testing REST services is often
easy if you already know how to test web sites. On the other hand, every
SOAP RPC service will have its own security model, its own addressing
model, an implicit data model and its own set of methods. Of these four
things, only the security model is even currently a candidate for
standardization. Testing such a system is much more challenging.
The Rest of the Story
This brief introduction can only whet your appetite for the theory and
practice of REST-based web services. In an upcoming article, I will:
* describe in more detail how any web service can be transformed into a
* show how the REST philosophy and the XML philosophy are highly
* show an example of a successful, public, widely used web service that
uses this model today.
* discuss the role of SOAP in these sorts of web services.
* discuss reliability, coordination, transactions, encryption, firewalls
If you would like to discuss these issues in the meantime, please
consider contributing to the rest-wiki
(http://internet.conveyor.com/RESTwiki/moin.cgi/FrontPage) and the REST
mailing list (http://groups.yahoo.com/group/rest-discuss/).