Message 1 of 6, Apr 27, 2012

Hi Michele,

Thanks for the continued input, it's useful.

Yes, we need to include some information that clarifies how to build a separate tool (as a job jar) using Bixo.

On my list, hopefully this weekend.

Regards,

-- Ken

On Apr 19, 2012, at 2:18am, Michele Costabile wrote:
Hello all. Today I started with a freshly extracted bixo 0.7.1 archive download.
runs with no problems
bin/bixo bixo.examples.webmining.WebMiningTool -agentname frooga -workingdir $HOME/Desktop/webmining
did not run: it failed to find the seed URLs file (which lives in the directory examples/src/main/resources/). I think two lines about how to fix this would be welcome. I tried moving the working directory to various places, trying to match what Eclipse does, but this did not solve the problem.
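One possible explanation, offered only as an assumption since the thread does not say how WebMiningTool loads its seed URLs: files under src/main/resources are usually placed on the classpath by an IDE, so a lookup that works inside Eclipse can fail from the command line no matter what the working directory is. The sketch below is not Bixo code. The class SeedUrlsLocator and the file name seed-urls.txt are hypothetical, and it only reports whether a file is visible on the classpath, in a plain directory, or not at all.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SeedUrlsLocator {

    // Reports where a file can be found: on the classpath (where an IDE
    // typically puts the contents of src/main/resources), in a plain
    // directory on disk, or nowhere.
    static String locate(String fileName, Path dir) {
        if (SeedUrlsLocator.class.getResource("/" + fileName) != null) {
            return "classpath";
        }
        if (Files.exists(dir.resolve(fileName))) {
            return "filesystem";
        }
        return "missing";
    }

    public static void main(String[] args) {
        // Hypothetical file name; the directory is the one named in the message.
        String fileName = args.length > 0 ? args[0] : "seed-urls.txt";
        Path dir = Paths.get("examples/src/main/resources");
        System.out.println(fileName + ": " + locate(fileName, dir));
    }
}
```

Running this once from Eclipse and once from a shell in the extracted archive would show whether the two environments see the file differently.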
top level ant eclipse
examples ant eclipse
makes a working project, and WebMiningTool runs smoothly. So I think the most effective way to start a web mining project might be to clone this project and start hacking, leaving the rest of Bixo aside among the external dependencies. What is still needed is a bit of explanation about how to create a standalone jar for running from the command line, or a Hadoop job jar for deploying on a cluster.
I have seen http://openbixo.org/documentation/running-bixo-in-ec2/, but I am not clear about how I would include my own version of webmining in the job. Also, I am not sure where my output would go in the cloud. This part could be described a little better. By the way, I have done a lot of technical writing (not very much in English, but I have published for fifteen years in Italian) and I might help, if need be. I might also consider writing a book, with some help from the committers.