RFC: Potential XML.com article

  • Paul Prescod
    Message 1 of 1 , Jan 15, 2002
      Please contribute comments on this article I am working on for xml.com.
      Second Generation Web Services

      In the early days of the Internet, it was common for enlightened
      businesses to connect to the Internet merely by using SMTP, NNTP and FTP
      clients and servers to deliver messages, text files, executables and
      source code. The Internet became a more fundamental tool when businesses
      started to integrate their corporate information (both public and
      private) into the emerging Web framework. The Internet became popular
      when it shifted from a focus on transactional protocols to a focus on
      data objects and the links between them.

      The technologies that characterize the early Web framework were
      HTML/GIF/JPEG, HTTP and URLs. This combination of standardized formats,
      a single application protocol and a single universal namespace was
      incredibly powerful. Using these technologies, corporations integrated
      their diverse online publishing systems into something much more
      compelling than any one of them could have built.

      Once organizations converged on common formats, the HTTP protocol and a
      single addressing scheme, the Web became more than a set of Web sites.
      It became the world's most diverse and powerful information system.
      Organizations built links between their own information and other
      people's. Amazing third-party applications also wove the information
      together. Examples include Google, Yahoo, Babelfish and Robin Cover's
      XML citations.

      First generation Web Services are like first generation Internet
      connections. They are not integrated with each other and are not
      designed so that third parties can easily integrate them in a uniform
      way. I posit that the next generation will be more like the integrated
      Web that arose for online publishing and human/computer interactions. In
      fact, I believe that second generation web services will actually build
      much more heavily on the architecture that made the Web work. Look for
      the holy trinity: standardized formats (XML vocabularies), a
      standardized application protocol and a single URI namespace.

      This next generation of Web Services will likely bear the name "REST"
      Web Services. REST is the underlying architectural model of the current
      Web. It stands for REpresentational State Transfer. Roy Fielding of
      eBuilt invented the name in his PhD dissertation
      (http://www.ebuilt.com/fielding/pubs/dissertation/top.htm). Recently, Mark
      Baker of PlanetFred has been a leading advocate of this architecture.

      REST details why the Web has URIs, HTTP, HTML, JavaScript and many other
      features. It has many aspects and I would not claim to understand it in
      detail. I'm going to focus on the aspects that are most interesting to
      XML users and developers.

      The Current Generation

      SOAP was originally intended to be a cross-Internet form of DCOM or
      CORBA. The name of an early SOAP-like technology was "WebBroker" -
      Web-based object broker. It made perfect sense to model an
      inter-application protocol on DCOM, CORBA, RMI etc. because they were
      the current models for solving inter-application interoperability
      problems.

      These RPC protocols achieved only limited success before they were
      ported to the Web. Some believe that the problem was merely that
      Microsoft and the OMG supporters could not get along. I disagree. There
      is a deeper issue. RPC models are great for closed-world problems. A
      closed world problem is one where you know all of the users, you can
      share a data model with them, and you can all communicate directly as to
      your needs. Evolution is comparatively easy in such an environment: you
      just tell everybody that the RPC API is going to change on such and such
      a date and perhaps you have some changeover period to avoid downtime.
      When you want to integrate a new system you do so by building a
      point-to-point integration.

      On the other hand, when your user base is too large to communicate
      coherently you need a different strategy. You need a pre-arranged
      framework that allows for evolution on both the client and server sides.
      You need to depend less on a shared, global understanding of the rights
      and responsibilities of a participant. You need to put in hooks where
      your users can innovate without contacting you. You need to leave in
      explicit mechanisms for interoperating with systems that do not have the
      same API. RPC protocols are traditionally poor at this kind of
      evolution. Changing interfaces tends to be extremely difficult. I
      believe that this is why no enterprise has ever successfully unified all
      of their systems with an RPC protocol such as DCOM, CORBA or RMI.

      Now we come to the crux of the problem: SOAP RPC is DCOM for the
      Internet. There are many problems that can be solved with an RPC methodology. But
      I believe that the biggest, hairiest problems will require a model that
      allows for independent evolution of clients, servers and intermediaries.
      It is therefore important for us to study the only distributed
      applications in history to ever scale to the size of the Internet.

      The archetypical scalable application

      The two most massively scalable, radically interoperable, distributed
      applications in the world today are the Web and email. What makes
      these two so scalable and interoperable? For starters,
      they both depend on standardized, extensible message formats (HTML and
      MIME). They both depend on standardized, extensible application
      protocols (HTTP and SMTP). But I believe that the most important thing
      is that each has a global addressing scheme.

      In the real estate world there is a joke that there are three things
      that make a property valuable: location, location and location. The same
      is true in the world of XML web services. Properly implemented, XML web
      services allow you to assign addresses to data objects so that they may be
      located for sharing or modification.

      In particular, the web's central concept is a single unifying namespace
      of URIs. URIs allow the dense web of links that make the Web worth
      using. URIs identify resources. Resources are conceptual objects.
      Representations of them are delivered across the web in HTTP messages.
      These ideas are so simple and yet they are profoundly powerful and
      demonstrably successful. URIs are extremely "loosely coupled". You can
      pass a URI from one "system" to another using a piece of paper and OCR!
      URIs are "late bound". They do not declare what can or should be done
      with the information they reference. It is because they are so radically
      "loose" and "late" that they scale to the level of the Web.

      Unfortunately, most of us do not think of our web services in these
      terms. Rather we think of them in terms of remote procedure calls
      between endpoints that represent software components. This is CORBA/DCOM
      thinking. Web thinking is organized around URIs for resources.

      Claim: The next generation of web services will use individual data
      objects as endpoints. Software component boundaries will be invisible
      and irrelevant.

      An Illustrative Example

      UDDI is an example of a Web Service that could be made much, much more
      robust as a second generation Web Service. I'm not discussing the
      philosophical issues of UDDI's role in the web services world but the
      very concrete issue of how to get information into and out of it. These
      arguments will apply to most of the Web Services in existence, including
      stock quote services, airplane reservations systems and so forth.

      UDDI has a concept of a businessEntity representing a corporation.
      Businesses are identified by UUIDs. The Web-centric way to do this would
      have been to identify them by URIs. The simplest way to do this would be
      to make a businessEntity an XML document addressable at a URI
      like "http://www.uddi.org/businessEntity/ibm.com" or perhaps
      "http://www.uddi.org/getbusinessEntity?ibm.com". The difference between
      these two is subtle and does not have many technical implications so
      let's not worry about it.

      You can think of "http://www.uddi.org/businessEntity" as a directory
      with files in it or a web service pulling data from a database. A
      wonderful feature of the Web is that there is no way to tell which is
      true just from looking at the URI. That is "loose coupling" in action!

      Let's consider the implications of using HTTP-based URIs instead of
      UUIDs for business entities:

      * Anybody wanting to inspect that business entity would merely point
      their (XML-aware!) browser at that URI and look at the businessEntity.

      * Anybody wanting to reference the businessEntity (in another web
      service or a document) could just use the URL.

      * Anybody wanting to incorporate the referenced information into another
      XML document could use an XLink, XPointer or XInclude.

      * Anybody wanting a permanent copy of the record could use a command
      line tool like "wget" or do a "Save As" from the browser.

      * Any XSLT stylesheet could fetch the resource dynamically to combine it
      with others in a transformation.

      * Access to the businessEntity could be controlled using standard HTTP
      authentication and access control mechanisms.

      * Metadata could be associated with the businessEntity using RDF.

      * Any client-side application (whether browser-based or not) could fetch
      the data without special SOAP libraries.

      * Two business entities could represent their merger by using a standard
      HTTP redirect from one businessEntity to another.

      * Editing and analysis tools like Excel, XmetaL, Word and EMACS could
      import XML from the URL directly using HTTP. They could write back to it
      using WebDAV.

      * UUIDs or other forms of location-independent addresses could still be
      assigned as an extra level of abstraction as demonstrated at purl.org.
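
      To make the "no special SOAP libraries" point concrete, here is a
      minimal Python sketch using only the standard library. It assumes a
      businessEntity is a small XML document with a "name" child element;
      that element name, the sample record and the helper
      fetch_business_name are illustrative inventions, not the real UDDI
      schema. A temporary file stands in for the registry so the sketch
      runs offline.

```python
import tempfile
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical record; the element names are assumptions, not the UDDI schema.
SAMPLE_XML = "<businessEntity><name>IBM</name></businessEntity>"


def fetch_business_name(uri):
    """Dereference the URI and read one field from the XML representation."""
    with urllib.request.urlopen(uri) as response:
        root = ET.fromstring(response.read())
    return root.findtext("name")


def demo():
    """Store the record as a plain file and fetch it through a file: URI."""
    with tempfile.TemporaryDirectory() as tmp:
        record = Path(tmp) / "ibm.com.xml"
        record.write_text(SAMPLE_XML, encoding="utf-8")
        return fetch_business_name(record.as_uri())
```

      The caller only ever sees a URI. Passing an http: URI for a live
      registry to the same function would need no code changes.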

      The current UDDI "API" has a method called get_businessDetail. Under an
      address-centric model, that method would become entirely redundant and
      could thus be removed from the API. UDDI has several get_ methods that
      operate on data objects such as tModels and business services. These
      data objects could all be represented by logical XML documents and the
      methods could be removed. Note how we have substantially simplified the
      user's access to UDDI information.

      Business entities are not the only things in UDDI that should be
      identified by URI-addressable resources rather than SOAP APIs. In fact
      all of the data in a UDDI database could be represented this way.

      Summary: Resources (data objects) are like children. They need to have
      names if they are to participate in society.


      Extensibility

      Now let's consider the extensibility characteristics of the REST model
      versus the original SOAP RPC model. Let's say that your company has a
      private UDDI registry and mine does also. You and I are business
      partners. We agree to share our customer databases. The customer
      databases have pointers into our UDDI registries for referring to

      If our registries have little or no overlap then it makes sense for you
      to maintain yours and for me to maintain mine. Rather than replicating
      between them (which has serious security and maintainability
      implications) I would like to just add you to the access control lists
      for some records and allow you to refer to them from your customer
      database and I'll do the opposite from mine.

      If the customer databases use UUIDs then they have no way of knowing
      whether a particular UUID should be looked up in the local database, the
      partner's database or even the public UDDI In The Sky. URIs are not just
      globally unique but also typically embed enough information to allow
      them to be de-referenced without further context. Using URIs instead of
      UUIDs, new repositories can be integrated whenever we want. In fact, if
      we use URIs, the customer database could refer just as easily to
      businessEntity records sitting on somebody's hard disk as in a formal
      UDDI registry. The database maintainer could choose whether to allow
      that or not.
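
      The difference can be sketched in a few lines of Python. The
      function name where_to_look and the example identifiers are
      hypothetical; the point is only that an HTTP URI names the host
      that can resolve it, while a bare UUID carries no such hint.

```python
from urllib.parse import urlsplit


def where_to_look(identifier):
    """Return the host able to resolve the identifier, or None if it names no location."""
    parts = urlsplit(identifier)
    if parts.scheme in ("http", "https") and parts.netloc:
        return parts.netloc
    return None  # e.g. a bare UUID: unique, but silent about which registry holds it
```

      A customer database holding
      "http://registry.partner.example/businessEntity/acme" knows to ask
      registry.partner.example. One holding only a UUID has to be told
      which registries to try.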

      Because the businessEntity documents are XML, it is relatively easy to
      add elements, attributes or other namespaces. This makes the document
      format extensible. It is also easy to extend the protocol by adding
      specialized HTTP headers or even new HTTP methods.
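
      As a rough illustration of format extensibility, the Python sketch
      below adds an element in a separate namespace to a businessEntity.
      The element names, the namespace URI and the record itself are all
      made up for the example. A client written before the extension
      existed still finds what it needs.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "name" is the original vocabulary, "ext:creditRating"
# is an extension added later by one partner.
EXTENDED = """<businessEntity xmlns:ext="http://example.com/ns/extension">
  <name>Example Corp</name>
  <ext:creditRating>AAA</ext:creditRating>
</businessEntity>"""

EXT_NS = {"ext": "http://example.com/ns/extension"}


def read_name(xml_text):
    """An 'old' client: it only knows about the name element."""
    return ET.fromstring(xml_text).findtext("name")


def read_credit_rating(xml_text):
    """A 'new' client: it also understands the extension namespace."""
    return ET.fromstring(xml_text).findtext("ext:creditRating", namespaces=EXT_NS)
```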


      Performance

      Performance of web services will be an important issue. Any resource
      representation retrieved from a GET-based URI can be cached. It can be
      cached in a cache server in front of the origin server, in an intermediary
      provided by an ISP, at a corporate firewall or on the client computer.
      Caching is built into HTTP. SOAP get_businessDetail messages are not
      cached by any existing technology.
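
      Why a GET on a URI is so easy to cache can be shown with a toy
      Python cache. This is a deliberately simplified sketch: real HTTP
      caches honour headers such as Cache-Control and ETag, whereas the
      hypothetical GetCache class below just uses a fixed lifetime. The
      key observation is that the URI alone is a complete cache key.

```python
import time


class GetCache:
    """Toy cache for GET results, keyed by URI, with a fixed freshness lifetime."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch            # function: uri -> representation
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}             # uri -> (time stored, representation)

    def get(self, uri):
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]            # still fresh: the origin is not contacted
        body = self._fetch(uri)
        self._entries[uri] = (now, body)
        return body
```

      With an RPC call the cache would have to understand the method
      name and its parameters before it could decide whether two
      requests are "the same".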

      As an optimization, the URI "http://www.uddi.org/businessEntity/ibm.com"
      might be represented as a raw text file on a hard disk of an operating
      system optimized towards serving files over HTTP. There is not and will
      likely never be any server that can invoke SOAP methods as quickly as a
      fast HTTP server can serve files from disk.

      Other methods

      UDDI has other methods for working with businessEntities. One is
      delete_business. HTTP already has a DELETE method. Therefore this method
      would be redundant in the REST model. Instead of doing a UDDI
      SOAP-RPC-specific delete you could do an HTTP delete. This would have
      the benefit of being compatible with tools that know how to do HTTP
      deletes like the Windows 2000 explorer and MacOS X finder. In theory,
      businesses could delete portions of their own records (perhaps obsolete
      branch plant addresses) by merely hitting the "delete" key.

      Obviously authentication and access control are key. Microsoft should not
      be able to delete their competitors (or at least should be forced to
      delete them in the old fashioned way, by competing with them). HTTP
      already has the authentication, authorization and encryption features
      that UDDI's SOAP RPC protocol lacks. It already works.

      UDDI has a save_business method. This is for uploading new businesses.
      The HTTP equivalent is PUT or POST. A pleasant side effect of using HTTP
      methods instead of a SOAP method is that you can do a POST from an HTML
      form. So the web service can be used either from other programs or (with
      a browser) by a human editor.

      UDDI has a find_business method. This is no different in principle than
      the search features built into every website in the world and search
      engine sites in particular. That would be a form of GET. On the URL
      line, the service would take a series of search parameters and return an
      XML document representing the matching businessEntities (either by
      reference, as URLs, or by value, as XML elements).
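
      Taken together, the three UDDI methods discussed in this section
      map onto plain HTTP requests. The Python sketch below only
      constructs the requests with the standard urllib module and does
      not send them. The base URI is the example used earlier in this
      article; the "name" search parameter and the choice of PUT for
      saving are my assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://www.uddi.org/businessEntity"  # example URI layout from the article


def delete_business(name):
    """delete_business becomes an HTTP DELETE on the resource itself."""
    return Request(f"{BASE}/{name}", method="DELETE")


def save_business(name, xml_bytes):
    """save_business becomes a PUT of the XML representation to the resource's URI."""
    return Request(
        f"{BASE}/{name}",
        data=xml_bytes,
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


def find_business(**criteria):
    """find_business becomes a GET with the search parameters on the URL line."""
    return Request(f"{BASE}?{urlencode(criteria)}", method="GET")
```

      For example, find_business(name="ibm") targets
      "http://www.uddi.org/businessEntity?name=ibm", a URL that can be
      bookmarked, linked to and cached like any other.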

      The Role of HTTP

      You may notice a recurring theme. Everything that we want to do in this
      Web Service is already supported in HTTP. The only things that we need
      to innovate on are our URI structure and our XML schemas. Bingo! That
      was the whole point of XML: to focus on data interchange instead of
      software components!

      Everything in UDDI can be represented in terms of HTTP operations on
      resources. So HTTP isn't accidentally paired with URIs as one of the
      central technologies of the Web. It is designed specifically as a major
      part of the location-centric REST architecture.

      Here's the radical idea: no matter what your problem, you can and should
      think about it as a data resource manipulation problem rather than as an
      API design problem. Think of your web server as this big information
      repository: like a database. You are doing data manipulation operations
      on it.

      In UDDI I've chosen a web service that is ripe for an easy conversion to
      REST philosophy but we can apply these principles to anything. What
      about something like a purchase order submission? That seems more
      transactional. Well purchase orders want to be named also! If you POST
      or PUT a purchase order to a new URI then internal systems all over your
      company can instantly refer to it no matter where they are. Using HTTP,
      an arbitrary XSLT stylesheet or Perl script sitting on an employee's
      desktop in the Beijing office can massage data from a purchase order
      sitting on the accounting mainframe in Los Angeles. Accessing
      HTTP-addressable resources is no more difficult than accessing files off
      of the local file system, but it requires much less coordination than
      standard file system sharing technologies.

      What about a request for quote? RFQs want to be named! Once you give
      them a name you can pass around the URL to your partners rather than the
      text. Then your partners can build references to them using hyperlinks
      from their documents and databases. Use access controls to keep out your
      competitors. You can think about any business problem in this way.

      Even web services with complicated work flows can be organized in a
      URI-centric manner. Consider a system that creates airline reservations.
      In a traditional HTML system there are a variety of pages representing
      the different stages in the logical transaction. First you look up
      appropriate flights. You get back a URI representing the set of
      appropriate flights. Then you choose a flight. You get back a URI
      representing your choice. Then you decide to commit. You get back a web
      page that returns a reservation number. Ideally the URL for that page will
      persist for a reasonable amount of time so that you can bookmark it.

      An XML based web service could go through the exact same steps. Rather
      than returning HTML forms at each step, the service would return XML
      documents conforming to a standard airline industry vocabulary. Those
      same XML documents could be used on a completely different airline
      reservation site to drive exactly the same process.
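
      One step of such a workflow might look like this in Python. The
      XML vocabulary here (flights, flight, number, href) and the
      airline.example.com URIs are invented for illustration; no real
      airline industry vocabulary is implied. The client reads the
      document returned by the search step and picks out the URI for the
      next step.

```python
import xml.etree.ElementTree as ET

# Hypothetical response to the "look up appropriate flights" step.
FLIGHT_LIST = """<flights>
  <flight number="AC101" href="http://airline.example.com/choices/AC101"/>
  <flight number="AC205" href="http://airline.example.com/choices/AC205"/>
</flights>"""


def next_step_uri(flights_xml, wanted_number):
    """Return the URI representing the chosen flight, or None if it is not offered."""
    root = ET.fromstring(flights_xml)
    for flight in root.iter("flight"):
        if flight.get("number") == wanted_number:
            return flight.get("href")
    return None
```

      Because each stage hands back URIs rather than opaque session
      state, any stage can be bookmarked, passed to a colleague or
      resumed by a different program.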

      Summary: Any business problem can be thought of as a data resource
      manipulation problem and HTTP is a data resource manipulation protocol.

      Metcalfe's Law Revisited

      Metcalfe's law is that the value of a network is proportional to the
      square of the number of people on the network, because each pair of
      people can make a connection between them. One telephone is useless. One
      billion phones cause a major telecommunications revolution - if they can
      all access each other through a single global naming system.
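
      As rough arithmetic, the number of distinct pairs among n
      participants is n(n-1)/2, which grows with the square of n. A
      short Python sketch (the function name is mine):

```python
def possible_connections(n):
    """Number of distinct pairs that can be formed among n participants."""
    return n * (n - 1) // 2
```

      One phone gives zero possible connections, two phones give one,
      and a billion phones give roughly 5 x 10^17.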

      Metcalfe's law also applies to data objects. Elements in UDDI can only
      (with a few exceptions) refer to each other. They cannot refer to
      objects elsewhere on the Web (for instance in other UDDI repositories).
      Similarly, objects on the Web (for instance web pages) cannot refer to
      the XML elements in the UDDI repository. A URL-centric solution would
      unify these data domains as the phone number system unifies telephones.


      Security

      Making your data universally addressable is not equivalent to making it
      universally available! It is easy to hide objects by merely never
      publishing their URIs. It is also easy to apply security policies to
      objects. In fact, REST simplifies security greatly.

      Under the SOAP RPC model, the objects that you work with are implicit
      and their names are hidden in method parameters. Therefore you need to
      invent a new security strategy for each and every web service. UDDI is
      completely unlike .NET My Services which will likely be completely unlike
      Liberty and so forth. Under REST, you can apply the four basic
      permissions to each data object: GET permission, PUT permission, DELETE
      permission and POST permission. You might also want to allow or disallow
      GET/PUT/DELETE and POST on sub-resources. This model is exactly like the
      one used for today's file systems! It is proven and it works. I know of
      no security model that works in a similarly generic manner for remote
      procedure call models.
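
      A sketch of how generic such a check can be, in Python. The ACL
      table, the user names and the rule that a sub-resource inherits
      from its nearest ancestor are assumptions made for the example,
      not part of HTTP or UDDI. Note that the check needs nothing beyond
      the user, the HTTP method and the resource path.

```python
# Hypothetical access-control table: resource path -> user -> permitted methods.
# "*" stands for any user not listed explicitly.
ACL = {
    "/businessEntity/ibm.com": {
        "ibm-admin": {"GET", "PUT", "POST", "DELETE"},
        "*": {"GET"},
    },
}


def is_allowed(acl, user, method, path):
    """Check the resource, then each parent in turn; deny if no rule applies."""
    while True:
        rules = acl.get(path)
        if rules is not None:
            granted = rules.get(user, rules.get("*", set()))
            return method in granted
        parent = path.rsplit("/", 1)[0] or "/"
        if parent == path:
            return False  # reached the top without finding a rule
        path = parent
```

      The same function protects every resource on the server. Nothing
      in it knows what a businessEntity is.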


      Maintainability

      In fact, security is just one form of maintainability that is simplified
      by REST. Any network administrator will tell you that every level of
      networking causes its own headaches. Some days IP works but DNS doesn't
      (DNS server down or DNS settings misconfigured). Some days IP/DNS works
      but HTTP doesn't (firewall or proxy misconfigured). If you run a web
      service protocol on top of HTTP it will add its own layer of
      configuration and software headaches on top of the existing ones. It
      cannot be more reliable than its foundational HTTP layer. It can only
      add one more layer of unreliability.

      Once you have your service working, it is possible to "test" REST web
      services just by looking at them in a browser. It is possible to make
      simple HTML forms to test POSTs. QA departments can easily pretend to be
      multiple users by changing their HTTP credentials. Standard web tools
      can monitor availability. In essence, testing REST services is often
      easy if you already know how to test web sites. On the other hand, every
      SOAP RPC service will have its own security model, its own addressing
      model, an implicit data model and its own set of methods. Of these four
      things, only the security model is even currently a candidate for
      standardization. Testing such a system is much more challenging.

      The Rest of the Story

      This brief introduction can only whet your appetite for the theory and
      practice of REST-based web services. In an upcoming article, I will:

      * describe in more detail how any web service can be transformed into a
      URI-centric one.

      * show how the REST philosophy and the XML philosophy are highly

      * show an example of a successful, public, widely used web service that
      uses this model today.

      * discuss the role of SOAP in these sorts of web services.

      * discuss reliability, coordination, transactions, encryption and firewalls.

      If you would like to discuss these issues in the meantime, please
      consider contributing to the rest-wiki
      (http://internet.conveyor.com/RESTwiki/moin.cgi/FrontPage) and the REST
      mailing list (http://groups.yahoo.com/group/rest-discuss/).