java - Multi-Label Document Classification
I have a database in which I store data based on the following three fields: id, text, {labels}. Note that each text has been assigned more than one label / tag / class. I want to build a model (Weka / RapidMiner / Mahout) that is able to recommend / classify a bunch of labels / tags / classes for a given text.
I have heard about SVM and the naive Bayes classifier, but I am not sure whether they support multi-label classification or not. Anything that guides me in the right direction is more than welcome!
The basic multilabel classification method is one-vs.-the-rest (OvR), also called binary relevance (BR). The basic idea is that you take an off-the-shelf binary classifier, such as naive Bayes or an SVM, then create K instances of it to solve K independent classification problems. In Python-like pseudocode:
    for each class k:
        learner = SVM(settings)  # for example
        # True where sample x carries label k, False otherwise
        labels = [k in classes_of(x) for x in samples]
        learner.learn(samples, labels)
Then at prediction time, you run each of the binary classifiers on a sample and collect the labels for which they predict positive.
(Both training and prediction can be done in parallel, since the problems are assumed to be independent. See Wikipedia for links to two Java packages for multi-label classification.)
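To make the pseudocode concrete, here is a minimal runnable sketch of binary relevance in plain Python. It is only an illustration under some assumptions: the `Perceptron` class is a toy stand-in for whatever binary learner you actually use (SVM, naive Bayes, ...), the whitespace tokenizer is deliberately naive, and the names `train_binary_relevance` and `predict_labels` and the four example texts are made up for this example.

```python
class Perceptron:
    """Toy binary classifier over bag-of-words tokens (stand-in for SVM / naive Bayes)."""

    def __init__(self, epochs=10):
        self.epochs = epochs
        self.weights = {}
        self.bias = 0.0

    def _score(self, tokens):
        return self.bias + sum(self.weights.get(t, 0.0) for t in tokens)

    def learn(self, samples, labels):
        # Classic perceptron rule: adjust weights only on misclassified samples.
        for _ in range(self.epochs):
            for tokens, is_positive in zip(samples, labels):
                if (self._score(tokens) > 0) != is_positive:
                    step = 1.0 if is_positive else -1.0
                    self.bias += step
                    for t in tokens:
                        self.weights[t] = self.weights.get(t, 0.0) + step

    def predict(self, tokens):
        return self._score(tokens) > 0


def train_binary_relevance(texts, label_sets):
    """Train one independent binary classifier per label."""
    samples = [text.lower().split() for text in texts]
    models = {}
    for k in sorted(set().union(*label_sets)):
        # Binary target: does this sample carry label k?
        targets = [k in labels for labels in label_sets]
        learner = Perceptron()
        learner.learn(samples, targets)
        models[k] = learner
    return models


def predict_labels(models, text):
    """Run every binary classifier and collect the labels predicted positive."""
    tokens = text.lower().split()
    return {k for k, learner in models.items() if learner.predict(tokens)}


# Hypothetical training data: each text has one or more labels.
texts = [
    "java thread pool executor",
    "python thread lock",
    "java string format",
    "python list comprehension",
]
label_sets = [
    {"java", "concurrency"},
    {"python", "concurrency"},
    {"java"},
    {"python"},
]

models = train_binary_relevance(texts, label_sets)
print(sorted(predict_labels(models, "java thread")))  # ['concurrency', 'java']
```

Since each pass through the loop in `train_binary_relevance` touches only its own learner, the per-label training calls could be handed to a worker pool without further changes.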