Looking for Classified Documents
- We are testing some automatic classification algorithms and would like
to know whether there is a testbed or benchmark we could use, with a good
number of documents associated with a taxonomy, that could be easily
M. Luiza Campos
Department of Computer Science
Federal University of Rio de Janeiro
Some friends of mine used documents from the US Securities and Exchange Commission (SEC). Every US public company has to file documents with the SEC, such as the annual 10-K report (the quarterly report is the 10-Q), and these filings contain a brief description of the company and an SIC industry code. They used that data to train and test several SIC classifiers and compare the results. See http://citeseer.ist.psu.edu/dolin99practical.html for a writeup.
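If you want to build labeled data this way, here is a rough sketch of pulling the SIC code out of a filing so it can serve as the class label. It is only an illustration, not what my friends actually did: it assumes the filing text carries a header line of the form "STANDARD INDUSTRIAL CLASSIFICATION: NAME [CODE]", and the function name extract_sic and the sample header (including "EXAMPLE CORP") are made up for the example. Check the format against the filings you download.

```python
import re
from typing import Optional, Tuple

# Assumed header line format (verify against real filings), e.g.:
#   STANDARD INDUSTRIAL CLASSIFICATION:  SERVICES-PREPACKAGED SOFTWARE [7372]
SIC_LINE = re.compile(
    r"STANDARD INDUSTRIAL CLASSIFICATION:\s*(?P<name>.*?)\s*\[(?P<code>\d{4})\]"
)


def extract_sic(filing_text: str) -> Optional[Tuple[str, str]]:
    """Return (sic_code, industry_name), or None if no SIC line is found."""
    match = SIC_LINE.search(filing_text)
    if match is None:
        return None
    return match.group("code"), match.group("name")


# Hypothetical header fragment, invented for illustration only.
sample_header = (
    "COMPANY CONFORMED NAME: EXAMPLE CORP\n"
    "STANDARD INDUSTRIAL CLASSIFICATION:  SERVICES-PREPACKAGED SOFTWARE [7372]\n"
)

label = extract_sic(sample_header)
```

Each filing's company description paired with the extracted code then gives one (document, label) example for training or testing a classifier.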