Another idea for the wiki diamond project: use Bayesian filters to automatically suggest how to classify new entries, given a set of existing documents that have already been classified. This would presumably be more reliable than generic auto-classification techniques, since those schemes usually don't have any specific knowledge about your particular set of documents.
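As a rough illustration of the idea, here is a minimal naive Bayes sketch in Python (standard library only). The class name, the tokenizer, and the add-one smoothing are my own assumptions for the example, not a worked-out design for the wiki; it just shows how already-classified entries could drive a ranked suggestion for a new one.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSuggester:
    """Hypothetical helper: multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of entries
        self.vocab = set()                       # every word seen in training

    def train(self, text, category):
        """Record one already-classified entry."""
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def suggest(self, text):
        """Return the known categories, most likely first."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            # Work in log space so long entries do not underflow.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            scores[category] = score
        return sorted(scores, key=scores.get, reverse=True)
```

You would call train() once for each existing entry with its category, then show the first item or two from suggest() as the proposed classification for a new entry, leaving the final choice to the author.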
Or perhaps a clustering algorithm would be more useful -- so that after I've written, say, five entries on the wiki diamond project, it suggests creating a category for them all.
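A sketch of what the clustering variant might look like, again in Python. It is a simple greedy grouping by word overlap (Jaccard similarity), not any particular published clustering algorithm, and the function names, stopword list, threshold, and minimum group size are all assumptions chosen for illustration.

```python
import re

# A deliberately tiny stopword list, just for the example.
STOPWORDS = {"the", "a", "an", "and", "of", "for", "on", "to", "in"}


def keywords(text):
    """Return the set of non-stopword words in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    """Overlap between two word sets, from 0.0 (none) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_new_categories(entries, threshold=0.3, min_size=5):
    """Group uncategorized entries and return the groups worth a category.

    entries maps a title to its text. Each entry joins the existing group
    containing its most similar member, provided that similarity reaches
    the threshold; otherwise it starts a new group. Groups with at least
    min_size entries are returned as lists of titles.
    """
    clusters = []  # each cluster is a list of (title, word set) pairs
    for title, text in entries.items():
        words = keywords(text)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = max(jaccard(words, other) for _, other in cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([(title, words)])
        else:
            best.append((title, words))
    return [[t for t, _ in c] for c in clusters if len(c) >= min_size]
```

With min_size=5 this matches the "five entries" case above: nothing is suggested until a fifth similar entry shows up. The result depends on the order the entries are processed in, which is probably acceptable for a suggestion feature but would matter for anything stricter.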
Once you have Bayesian filters, it might be interesting to occasionally rerun them on older documents, in case the categorization taxonomy has changed enough that the old documents should be recategorized. For example, the old documents may have been classified simply under "programming languages" even though you've since split that category into "Java", "Perl", etc.
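The rerun itself could be as simple as the following Python sketch. It assumes some classifier is available as a function that takes an entry's text and returns a category name; both find_recategorization_candidates and the toy keyword_suggest stand-in are hypothetical names made up for this example.

```python
def find_recategorization_candidates(documents, suggest):
    """List old entries whose stored category no longer matches.

    documents maps a title to a (text, current_category) pair.
    suggest is any callable that takes text and returns a category.
    Returns (title, current_category, suggested_category) tuples.
    """
    candidates = []
    for title, (text, current) in documents.items():
        suggested = suggest(text)
        if suggested != current:
            candidates.append((title, current, suggested))
    return candidates


def keyword_suggest(text):
    """Toy stand-in for a real classifier, using the newer, finer taxonomy."""
    lowered = text.lower()
    if "perl" in lowered:
        return "Perl"
    if "java" in lowered:
        return "Java"
    return "programming languages"


old_documents = {
    "Regex tricks": ("Perl one-liners for log files", "programming languages"),
    "Collections": ("Notes on Java collections", "programming languages"),
    "Syntax musings": ("Why significant whitespace?", "programming languages"),
}

for title, current, suggested in find_recategorization_candidates(
        old_documents, keyword_suggest):
    print(f"{title}: {current} -> {suggested}")
```

Reporting the mismatches rather than moving entries automatically keeps the human in charge of the taxonomy, which seems safer given that the filter is only making suggestions.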
Posted on August 20, 2003 06:31 PM