Re: [soaplite] Any information on transaction loads, performance studies, etc?

  • Byrne Reese
    Message 1 of 3, Mar 21, 2004
      Wow - a lot of good questions... not necessarily SOAP::Lite related, but
      good nonetheless. Let me see if I can share some of the things I have
      learned over the past 4 years with Web services.

      Jim Lancaster wrote:

      > I realize how open-ended the questions that follow are going to be, but
      > I was wondering if there are any XML-RPC/SOAP performance studies
      > available?

      I am not aware of any SOAP-specific benchmark - it is really platform
      dependent. And the permutations are immense:

      * toolkit used (SOAP::Lite, Axis, BEA, .NET, etc)
      * platform used (Solaris, Linux 2.[246], Windows, etc)
      * VM used (from Perl, to JDK v1.[234], etc)
      * XML parser used...
      * XML parsing method used...
      * message size
      * number of concurrent threads posting/parsing messages...

      Oy, what a matrix. All of these things could have a substantial impact
      on performance.

      > I'm experimenting with the idea of building an XML-RPC/SOAP-based
      > performance monitoring/log consolidation system where remote agents
      > collect the data and transmit it back to a central server where it is
      > written to a database for analysis and reporting.

      Interesting - there is a real need for this. As you point out there are
      a number of ways to go about doing this. Here are some of the ways
      logging has been done in the WS space that I am aware of:

      * a logging Web service that acts as an intermediary or as an endpoint
      (WS-I uses something like this for the Base Profile implementations)
      * a proxy service to log requests (Flamenco does something like this
      using software installed at every end point, and Grand Central does this
      as a service - you address a message to an endpoint on their network and
      they do tons of logging for you)
      * a local process like syslog or something more proprietary that can
      handle logging requests, or log aggregation (the log can be "pollable"
      by MRTG etc - you discuss something like this in your original email)

      I personally think that a cool service to be provided is the first one
      above. SOAP messages can be sent to a log aggregator. I can log debug
      information, or log numerical information for number crunching and what
      not by the service itself. It exposes a WSDL, everyone authenticates to
      the service, and then can log into a web site to view logged
      information, along with statistics generated by the service. I could go
      on and on. Grand Central (the company I work for) will implement
      something like this in the coming months.

      One advantage of a proxy solution is the minimal additional overhead. If
      you want to log and generate stats based on a message payload, then you
      may need to buffer the message somehow, or resend the entire message to
      be logged. Of course if you instrument a logging WS then the logger can
      choose what to log and what not to - which is speedy and very handy.

      > With this background, my questions are these:
      > Is this a practical application for XML-RPC/SOAP?

      Absolutely. And it could be incredibly valuable. I like to think of it
      in terms of this problem/solution: providing a "Data Warehouse On
      Demand." Such an application could be utilized by anyone, has an obvious
      business model and application, etc. Go for it!!!

      > How many transactions per/sec(min)(hr)(day) can a server reasonably handle?

      This goes back to the crazy matrix above. Analysis done by Grand Central
      leads me to believe that the biggest bottlenecks will be:

      * network latency as affected by your platform (Linux 2.2 performs much
      differently than 2.4.16+ for example - in some remarkable ways)
      * different JVMs on different platforms perform remarkably differently
      * SAX vs. DOM parsing
      * Message Size - this could be one of the biggest bottlenecks, as I have
      yet to encounter an XML parser that performs exceptionally well with
      large messages - SOAP::Lite for example is horrible with large (>1MB)
      messages :(
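To make the SAX vs. DOM point concrete, here is a minimal sketch in Python rather than Perl (a swapped-in language, chosen only for illustration; it says nothing about how SOAP::Lite parses internally). The `<log>`/`<entry>` document shape and the function names are assumptions made up for this example. The DOM version builds the whole tree in memory before it can answer anything, while the SAX version only keeps a counter as events stream past, which is why message size tends to hurt DOM-style parsing more.

```python
import xml.dom.minidom
import xml.sax


def count_entries_dom(xml_bytes):
    """DOM style: parse the full document into a tree, then query it."""
    document = xml.dom.minidom.parseString(xml_bytes)
    return len(document.getElementsByTagName("entry"))


class EntryCounter(xml.sax.ContentHandler):
    """SAX style: react to start-tag events and keep only a running count."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "entry":
            self.count += 1


def count_entries_sax(xml_bytes):
    handler = EntryCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.count


# Hypothetical message: 1000 small entries inside one envelope-like root.
sample = b"<log>" + b"<entry>ok</entry>" * 1000 + b"</log>"
dom_count = count_entries_dom(sample)
sax_count = count_entries_sax(sample)
```

Both functions return the same count; the difference is in how much of the message has to sit in memory at once.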

      > How do CPU speed, # CPUs, memory, and disk speed affect performance?

      These can be dramatic. Just search for "XML Parsing Benchmarks" on Google.

      > Which is more important, i.e., has the greatest impact?

      Depends on the application really. And depends on the benchmark
      variables. Memory is key when parsing large messages, and if you have
      multiple concurrent threads, memory can quickly become a premium resource.

      > What impact does server OS have?

      Good question - I have never written an application exclusively for
      Windows - why would you really when performance is important? :)

      > Finally, am I completely off in the weeds?

      No. Quite the contrary.

      > Has anyone tried this before?
      > Are there any active projects attempting this?

      Yes. Xmethods has a logging Web service used exclusively for their Base
      Profile demo implementations. Grand Central logs information for you on
      your behalf and has a WS for querying it. Flamenco etc logs info for
      you. DataPower's XML appliance can do this kind of thing (but requires a
      hardware install)...

      However, there is no general-purpose logging service. Want a business
      partner? This could be incredibly valuable. Seriously. Talk to me, buddy
      - I want in.

      --------------------------

      Ok - I am done now. Anyone else want to chime in on this thread?

      Byrne
    • Jim Lancaster
      Message 2 of 3, Mar 22, 2004
        I realize how open-ended the questions that follow are going to be, but
        I was wondering if there are any XML-RPC/SOAP performance studies
        available?

        I'm experimenting with the idea of building an XML-RPC/SOAP-based
        performance monitoring/log consolidation system where remote agents
        collect the data and transmit it back to a central server where it is
        written to a database for analysis and reporting.

        The traditional solutions I've tried (MRTG, RRDTool, WhatsUp Gold, cacti
        to name a few) rely on "pull technology" polling methods where a server
        originates a poll, and "pulls" the data back to the server and writes it
        to a database. They require the polling server have direct network
        access to the remote device. If a device is on a remote network, then
        the polling server needs some sort of VPN to get to it. These solutions
        also suffer from a "single-enterprise" perspective in that they assume
        that all of the devices being managed are part of a single enterprise.
        There is no provision for non-performance data like company, location,
        etc. that would be needed by an MSP (like me).

        It would appear that using XML-RPC/SOAP might be a solution. My thought
        is to create an 'army' of agents to collect, validate and "push" the
        data back to a central server. Since virtually all of the devices I want
        to manage are remote, this would eliminate the need for VPNs to get to
        them. It would also be trivial to add things like device ID, company,
        location info in the XML payload, which would go a long way towards
        helping me organize the data for reporting purposes.
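As a rough illustration of the "push" payload described above, here is a minimal Python sketch (Python is a stand-in here; the agents in this thread would more likely be Perl with SOAP::Lite). The element names `report`, `deviceId`, `company`, `location`, and `metric`, and the sample values, are all hypothetical, and this builds only the XML body, not a SOAP envelope.

```python
import xml.etree.ElementTree as ET


def build_report(device_id, company, location, metrics):
    """Wrap one agent reading in XML, tagged with the metadata an MSP needs."""
    root = ET.Element("report")
    # Identification fields (hypothetical names) let the server group
    # readings by customer and site without a separate lookup.
    ET.SubElement(root, "deviceId").text = device_id
    ET.SubElement(root, "company").text = company
    ET.SubElement(root, "location").text = location
    samples = ET.SubElement(root, "metrics")
    for name, value in sorted(metrics.items()):
        ET.SubElement(samples, "metric", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Example values are invented for illustration.
payload = build_report("fw-01", "Example Co", "Dallas", {"disk_free_mb": 5120})
```

The agent would hand `payload` to whatever XML-RPC or SOAP client it uses; the point is only that the organizational metadata travels with every reading.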

        I realize all too well the volume of information that could easily be
        generated in such a scenario. I'm not looking to scale this thing to
        1000s of devices with 5-minute polling intervals. I want to gather data
        from several hundred devices and I expect to stretch out the polling
        interval to as much as an hour in some cases. (How fast does disk space
        change anyway?) In the case of log files, I'm not interested in
        firewall or proxy logs. I plan to concentrate on event logs, e-mail
        logs, and other lower-volume logs. I'm not necessarily looking for
        real-time performance; I'm more interested in historical performance.
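For a sense of scale, the request rate implied by these numbers can be estimated with simple arithmetic. The sketch below assumes 300 devices (one reading of "several hundred") and one message per device per poll; both figures are assumptions, not measurements.

```python
def requests_per_second(devices, interval_seconds, messages_per_poll=1):
    """Average arrival rate at the central server, ignoring bursts."""
    return devices * messages_per_poll / interval_seconds


# Assumed fleet size of 300 devices.
hourly_rate = requests_per_second(300, 3600)    # one report per device per hour
five_minute_rate = requests_per_second(300, 300)  # one report per device per 5 minutes
```

Under these assumptions the hourly schedule averages well under one request per second and the 5-minute schedule averages one per second. Real load would be burstier if the agents all report at the top of the hour, so staggering the schedules would matter more than the average suggests.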

        With this background, my questions are these: Is this a practical
        application for XML-RPC/SOAP? How many transactions
        per/sec(min)(hr)(day) can a server reasonably handle? How do CPU
        speed, # CPUs, memory, and disk speed affect performance? (Which is
        more important, i.e., has the greatest impact?) What impact does server
        OS have?

        Finally, am I completely off in the weeds? Has anyone tried this before?
        Are there any active projects attempting this? Please stop me before I
        waste my time or reinvent the wheel.

        Any/all input is welcome.

        Thanks,

        Jim
      • Jim Lancaster
        Message 3 of 3, Mar 23, 2004
          Byrne,

          Thanks for your detailed response!

          [snip]
          > I am not aware of any SOAP-specific benchmark - it is really
          > platform dependent. And the permutations are immense:
          >
          > * toolkit used (SOAP::Lite, Axis, BEA, .NET, etc)
          > * platform used (Solaris, Linux 2.[246], Windows, etc)
          > * VM used (from Perl, to JDK v1.[234], etc)
          > * XML parser used...
          > * XML parsing method used...
          > * message size
          > * number of concurrent threads posting/parsing messages...
          >
          > Oy, what a matrix. All of these things could have a
          > substantial impact on performance.

          I thought I would start out with Linux 2.4[6], Apache, Perl, FastCGI,
          MySQL on the backend (and SOAP::Lite). If my understanding is correct,
          it shouldn't matter what the servers or agents are written in as long as
          they 'speak SOAP'. My intention was to start writing both the agents and
          the server in Perl because that is what I'm most comfortable with.
          However, if the server starts to bog down, it could be ported to
          something else (Java?). My intention is also to keep the agent
          architecture open enough so that anyone could write one to their own
          needs, sharing common libraries for things like log access (or SNMP
          polling) and data transmittal and storage. Each agent would only need
          to be unique in what data it gathered and how to parse it.

          > > I'm experimenting with the idea of building an XML-RPC/SOAP-based
          > > performance monitoring/log consolidation system where remote agents
          > > collect the data and transmit it back to a central server where it is
          > > written to a database for analysis and reporting.
          >
          > Interesting - there is a real need for this. As you point out
          > there are a number of ways to go about doing this. Here are
          > some of the ways logging has been done in the WS space that I
          > am aware of:
          >
          > * a logging Web service that acts as an intermediary or as an
          > endpoint (WS-I uses something like this for the Base Profile
          > implementations)
          > * a proxy service to log requests (Flamenco does something
          > like this using software installed at every end point, and
          > Grand Central does this as a service - you address a message
          > to an endpoint on their network and they do tons of logging for you)
          > * a local process like syslog or something more proprietary
          > that can handle logging requests, or log aggregation (the log
          > can be "pollable"
          > by MRTG etc - you discuss something like this in your original email)

          Hmmm. I'm not familiar with Flamenco or Grand Central. Sounds like I
          need to do a little research. Have you seen/used Lire?
          (www.logreport.org) They have a *very* interesting approach to log
          consolidation. They use a concept called DLF (distilled log format) to
          'normalize' logs from different vendors into a common format from which
          Lire generates its reports. There is a DLF for web server logs, mail
          server logs, etc. This allows the developer community to focus on data
          analysis and report generation.

          > I personally think that a cool service to be provided is the
          > first one above. SOAP messages can be sent to a log
          > aggregator. I can log debug information, or log numerical
          > information for number crunching and what not by the service
          > itself...

          I think the agents should be designed in such a way to filter out
          'uninteresting events' and organize the data into DLF format *before*
          sending it on to the log aggregator. This would cut down on the traffic
          being sent to the server and offload the input validation from the
          server. All the server would have to do is post the data to a database.
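Here is a minimal sketch of that agent-side step, in Python for illustration only. It is not Lire's actual DLF schema: the input line layout (`timestamp LEVEL message`), the record field names, and the choice of which levels count as interesting are all assumptions made up for this example.

```python
import re

# Assumed input layout: "<timestamp> <LEVEL> <free-form message>".
LINE = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")

# Hypothetical policy: only these levels are worth sending upstream.
INTERESTING = {"WARN", "ERROR"}


def normalize(lines, source):
    """Drop unparseable or uninteresting lines; return uniform records."""
    records = []
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # validation happens on the agent, not the server
        if match.group("level") not in INTERESTING:
            continue  # filtered out before it costs any bandwidth
        records.append({
            "source": source,
            "time": match.group("ts"),
            "level": match.group("level"),
            "message": match.group("msg"),
        })
    return records


raw = [
    "2004-03-23T10:00:00 INFO heartbeat",
    "2004-03-23T10:00:05 ERROR disk /var 98% full",
    "garbage",
]
records = normalize(raw, "mail-01")
```

With this input only the ERROR line survives, so the server receives one already-validated record instead of three raw lines.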

          >...It exposes a WSDL, everyone authenticates to the
          > service, and then can log into a web site to view logged
          > information, along with statistics generated by the service.
          > I could go on and on. Grand Central (the company I work for)
          > will implement something like this in the coming months.

          Hmmm. I will definitely have to check out this Grand Central.

          > One advantage of a proxy solution is the minimal additional
          > overhead. If you want to log and generate stats based on a
          > message payload, then you may need to buffer the message
          > somehow, or resend the entire message to be logged. Of course
          > if you instrument a logging WS then the logger can choose
          > what to log and what not to - which is speedy and very handy.

          The proxy server sounds like a nice idea.

          > > With this background, my questions are these:
          > > Is this a practical application for XML-RPC/SOAP?
          >
          > Absolutely. And it could be incredibly valuable. I like to
          > think of it in terms of this problem/solution: providing a
          > "Data Warehouse On Demand." Such an application could be
          > utilized by anyone, has an obvious business model and
          > application, etc. Go for it!!!
          >
          > > How many transactions per/sec(min)(hr)(day) can a server reasonably handle?
          >
          > This goes back to the crazy matrix above. Analysis done by
          > Grand Central leads me to believe that the biggest
          > bottlenecks will be:
          >
          > * network latency as affected by your platform (Linux 2.2
          > performs much differently than 2.4.16+ for example - in some
          > remarkable ways)
          > * different JVMs on different platforms perform remarkably differently
          > * SAX vs. DOM parsing
          > * Message Size - this could be one of the biggest
          > bottlenecks, as I have yet to encounter an XML parser that
          > performs exceptionally well with large messages - SOAP::Lite
          > for example is horrible with large (>1MB) messages :(

          At this early stage, I can't imagine a message approaching anywhere near
          the 1MB size. Even if it did, the filtering mechanism, or the polling
          interval could be adjusted to cut the message size down.

          > > How do CPU speed, # CPUs, memory, and disk speed affect performance?
          >
          > These can be dramatic. Just search for "XML Parsing
          > Benchmarks" on Google.

          Ouch, I should have thought to check. ;) A very quick check reveals a
          wealth of information.
          (http://www.devsphere.com/xml/benchmark/index.html, e.g.) Thanks.

          > > Which is more important, i.e., has the greatest impact?
          >
          > Depends on the application really. And depends on the
          > benchmark variables. Memory is key when parsing large
          > messages, and if you have multiple concurrent threads, memory
          > can quickly become a premium resource.
          >
          > > What impact does server OS have?
          >
          > Good question - I have never written an application
          > exclusively for Windows - why would you really when
          > performance is important? :)

          Just asking the question...

          > > Finally, am I completely off in the weeds?
          >
          > No. Quite the contrary.
          >
          > > Has anyone tried this before?
          > > Are there any active projects attempting this?
          >
          > Yes. Xmethods has a logging Web service used exclusively for
          > their Base Profile demo implementations. Grand Central logs
          > information for you on your behalf and has a WS for querying
          > it. Flamenco etc logs info for you. DataPower's XML appliance
          > can do this kind of thing (but requires a hardware install)...

          More research. More research.

          > However, there is no general-purpose logging service. Want a
          > business partner? This could be incredibly valuable.
          > Seriously. Talk to me, buddy - I want in.

          Thanks for the input! I've been kicking this idea around for the last
          couple of months and the more I've learned, the more intrigued with the
          idea I've become. The only thing that would stop me is someone pouring
          a bucket of cold reality on me. Like:
          http://mail.iocaine.com/pipermail/loganalysis/2002-June/000085.html (I
          found this over on www.loganalysis.org.) Even so, I still think this is
          a workable idea.

          Jim