Thanks for your detailed response!
> I am not aware of any SOAP specific benchmark - it is really
> platform dependent. And the permutations are immense:
> * toolkit used (SOAP::Lite, Axis, BEA, .NET, etc)
> * platform used (Solaris, Linux 2., Windows, etc)
> * VM used (from Perl, to JDK v1., etc)
> * XML parser used...
> * XML parsing method used...
> * message size
> * number of concurrent threads posting/parsing messages...
> Oy, what a matrix. All of these things could have a
> substantial impact on performance.
I thought I would start out with Linux 2.4, Apache, Perl, FastCGI,
MySQL on the backend (and SOAP::Lite). If my understanding is correct,
it shouldn't matter what the servers or agents are written in as long as
they 'speak SOAP'. My intention was to start writing both the agents and
the server in Perl because that is what I'm most comfortable with.
However, if the server starts to bog down, it could be ported to
something else (Java?). My intention is also to keep the agent
architecture open enough so that anyone could write one to their own
needs, sharing common libraries for things like log access (or SNMP
polling) and data transmittal and storage. Each agent would only need
to be unique in what data it gathered and how to parse it.
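
A minimal sketch of that agent split, in Python for brevity rather than
the Perl and SOAP::Lite planned above. It uses XML-RPC from the standard
library as a stand-in for SOAP. The class names, the "log.submit" method
name, and the load-average example are all hypothetical; nothing here is
an existing library's API beyond xmlrpc.client itself.

```python
import xmlrpc.client
from abc import ABC, abstractmethod


class Agent(ABC):
    """Shared plumbing; a concrete agent supplies only gather() and parse()."""

    method_name = "log.submit"  # hypothetical remote method name

    @abstractmethod
    def gather(self):
        """Return an iterable of raw items (log lines, SNMP values, ...)."""

    @abstractmethod
    def parse(self, raw):
        """Turn one raw item into a dict of fields, or None to skip it."""

    def build_request(self):
        """Serialize all parsed records as one XML-RPC call body."""
        records = [r for r in map(self.parse, self.gather()) if r is not None]
        return xmlrpc.client.dumps((records,), methodname=self.method_name)


class LoadAverageAgent(Agent):
    """Example agent: parses lines in the style of /proc/loadavg."""

    def __init__(self, lines):
        self.lines = lines

    def gather(self):
        return self.lines

    def parse(self, raw):
        fields = raw.split()
        if len(fields) < 3:
            return None  # malformed line, skip it
        return {
            "load1": float(fields[0]),
            "load5": float(fields[1]),
            "load15": float(fields[2]),
        }


if __name__ == "__main__":
    agent = LoadAverageAgent(["0.42 0.30 0.25 1/123 4567"])
    print(agent.build_request())
```

The request body is plain XML, so a server written in any language that
speaks XML-RPC could decode it; only gather() and parse() change from
one agent to the next.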
> > I'm experimenting with the idea of building an XML-RPC/SOAP-based
> > performance monitoring/log consolidation system where remote agents
> > collect the data and transmit it back to a central server where it is
> > written to a database for analysis and reporting.
> Interesting - there is a real need for this. As you point out
> there are a number of ways to go about doing this. Here are
> some of the ways logging has been done in the WS space that I
> am aware of:
> * a logging Web service that acts as an intermediary or as an
> endpoint (WS-I uses something like this for the Base Profile)
> * a proxy service to log requests (Flamenco does something
> like this using software installed at every end point, and
> Grand Central does this as a service - you address a message
> to an endpoint on their network and they do tons of logging for you)
> * a local process like syslog or something more proprietary
> that can handle logging requests, or log aggregation (the log
> can be "pollable" by MRTG etc - you discuss something like this
> in your original email)
Hmmm. I'm not familiar with Flamenco or Grand Central. Sounds like I
need to do a little research. Have you seen/used Lire?
(www.logreport.org) They have a *very* interesting approach to log
consolidation. They use a concept called DLF (distilled log format) to
'normalize' logs from different vendors into a common format from which
Lire generates its reports. There is a DLF for web server logs, mail
server logs, etc. This allows the developer community to focus on data
analysis and report generation.
> I personally think that a cool service to be provided is the
> first one above. SOAP messages can be sent to a log
> aggregator. I can log debug information, or log numerical
> information for number crunching and what not by the service
I think the agents should be designed in such a way to filter out
'uninteresting events' and organize the data into DLF *before*
sending it on to the log aggregator. This would cut down on the traffic
being sent to the server and offload the input validation from the
server. All the server would have to do is post the data to a database.
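
As a rough illustration of filtering and normalizing on the agent side,
here is a sketch in Python that parses Apache Common Log Format lines
into flat records and keeps only error responses. The field names and
the "status >= 400 is interesting" rule are assumptions made up for the
example; this is not Lire's actual DLF schema.

```python
import re

# Apache Common Log Format: host ident user [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def normalize(line):
    """Parse one log line into a flat record, or None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # CLF writes "-" when no body was sent; store that as 0 bytes.
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec


def interesting(rec):
    """Assumed policy: only client and server errors are worth sending."""
    return rec["status"] >= 400


def distill(lines):
    """Normalize every line, drop unparseable and uninteresting ones."""
    records = (normalize(line) for line in lines)
    return [r for r in records if r is not None and interesting(r)]
```

With this split the server receives only well-formed records, so its
job reduces to inserting rows.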
>...It exposes a WSDL, everyone authenticates to the
> service, and then can log into a web site to view logged
> information, along with statistics generated by the service.
> I could go on and on. Grand Central (the company I work for)
> will implement something like this in the coming months.
Hmmm. I will definitely have to check out this Grand Central.
> One advantage of a proxy solution is the minimal additional
> overhead. If you want to log and generate stats based on a
> message payload, then you may need to buffer the message
> somehow, or resend the entire message to be logged. Of course
> if you instrument a logging WS then the logger can choose
> what to log and what not to - which is speedy and very handy.
The proxy server sounds like a nice idea.
> > With this background, my questions are these:
> > Is this a practical application for XML-RPC/SOAP?
> Absolutely. And it could be incredibly valuable. I like to
> think of it in terms of this problem/solution: providing a
> "Data Warehouse On Demand." Such an application could be
> utilized by anyone, has an obvious business model and
> application, etc. Go for it!!!
> > How many transactions per/sec(min)(hr)(day) can a server
> > reasonably handle?
> This goes back to the crazy matrix above. Analysis done by
> Grand Central leads me to believe that the biggest
> bottlenecks will be:
> * network latency as affected by your platform (linux 2.2
> performs much differently than 2.4.16+ for example - in some
> remarkable ways)
> * different JVMs on different platforms perform remarkably differently
> * SAX vs. DOM parsing
> * Message Size - this could be one of the biggest
> bottlenecks, as I have yet to encounter an XML parser that
> performs exceptionally well with large messages - SOAP::Lite
> for example is horrible with large (>1MB) messages :(
At this early stage, I can't imagine a message approaching anywhere near
the 1MB size. Even if it did, the filtering mechanism, or the polling
interval could be adjusted to cut the message size down.
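
One way to keep each message well under that size, sketched in Python:
split the outgoing records into batches against a byte budget. The
function names and the budget are hypothetical, and the size is only an
estimate based on the XML-RPC encoding of each record taken on its own,
which slightly overstates the real total.

```python
import xmlrpc.client


def record_size(record):
    """Approximate wire size in bytes of one record encoded as XML-RPC."""
    return len(xmlrpc.client.dumps((record,)).encode("utf-8"))


def batches(records, max_bytes):
    """Yield lists of records whose estimated size stays within max_bytes.

    A single record larger than max_bytes is still sent, alone in its
    own batch, rather than being dropped.
    """
    batch, used = [], 0
    for rec in records:
        size = record_size(rec)
        if batch and used + size > max_bytes:
            yield batch
            batch, used = [], 0
        batch.append(rec)
        used += size
    if batch:
        yield batch
```

Each batch would then go out as its own request, so a busy polling
interval produces several small messages instead of one large one.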
> > What impact do CPU speed, # CPUs, memory, disk speed
> > have on performance?
> These can be dramatic. Just search for "XML Parsing
> Benchmarks" on Google.
Ouch, I should have thought to check. ;) A very quick check reveals a
wealth of information.
> > Which is more important, i.e., has the greatest impact?
> Depends on the application really. And depends on the
> benchmark variables. Memory is key when parsing large
> messages, and if you have multiple concurrent threads, memory
> can quickly become a premium resource.
> > What impact does server OS have?
> Good question - I have never written an application
> exclusively for Windows - why would you really when
> performance is important? :)
Just asking the question...
> > Finally, am I completely off in the weeds?
> No. Quite the contrary.
> > Has anyone tried this before?
> > Are there any active projects attempting this?
> Yes. Xmethods has a logging Web service used exclusively for
> their Base Profile demo implementations. Grand Central logs
> information for you on your behalf and has a WS for querying
> it. Flamenco etc logs info for you. DataPower's XML appliance
> can do this kind of thing (but requires a hardware install)...
More research. More research.
> However, there is no general purpose logging service. Want a
> business partner - this could be incredibly valuable.
> Seriously. Talk to me buddy - I want in.
Thanks for the input! I've been kicking this idea around for the last
couple of months and the more I've learned, the more intrigued with the
idea I've become. What I need to stop me is for someone to pour a bucket
of cold reality on me. Like:
found this over on www.loganalysis.org.) Even so, I still think this is
a workable idea.