machine learning - How to use a different dataset with scikit-learn and NLTK?
I am trying to apply the built-in naive Bayes classifiers of scikit-learn and NLTK to some raw data I have. The data is a set of tab-separated rows, each containing a label, a paragraph, and some other attributes. I am interested in classifying the paragraphs, so I need to convert the data into a format the built-in scikit-learn/NLTK classifiers accept. I want to apply Gaussian, Bernoulli and multinomial naive Bayes to the paragraphs.
Question 1: For scikit-learn, the example given imports the iris data. I checked the iris data, and it contains precalculated numeric values. How can I convert my data into such a format and call the Gaussian classifier directly? Is there a standard way of doing so?
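For question 1, a common approach is to turn each paragraph into a row of numbers (for example word counts) so the data looks like the numeric iris matrix. The sketch below assumes the label is in the first tab-separated column and the paragraph in the second; the sample rows and the load_rows helper are hypothetical. It uses scikit-learn's CountVectorizer and the three naive Bayes variants. MultinomialNB and BernoulliNB accept the sparse count matrix directly, while GaussianNB needs a dense array.

```python
import csv
import io

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical sample data: label <TAB> paragraph <TAB> other attribute.
RAW = (
    "sports\tThe team won the match in extra time\tx\n"
    "sports\tThe striker scored twice in the match\ty\n"
    "politics\tThe senate passed the budget bill\tx\n"
    "politics\tThe minister announced a new bill\ty\n"
)

def load_rows(fh):
    """Hypothetical helper: return (labels, paragraphs), assuming label is
    column 0 and paragraph is column 1; other columns are ignored."""
    labels, paragraphs = [], []
    for row in csv.reader(fh, delimiter="\t"):
        labels.append(row[0])
        paragraphs.append(row[1])
    return labels, paragraphs

labels, paragraphs = load_rows(io.StringIO(RAW))

# Turn each paragraph into a row of word counts (a sparse document-term matrix).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(paragraphs)

multinomial = MultinomialNB().fit(X, labels)      # models the raw word counts
bernoulli = BernoulliNB().fit(X, labels)          # binarizes counts into present/absent
gaussian = GaussianNB().fit(X.toarray(), labels)  # requires a dense array

# New text must go through the same fitted vectorizer.
new = vectorizer.transform(["The striker scored in the match"])
print("multinomial:", multinomial.predict(new)[0])
print("bernoulli:  ", bernoulli.predict(new)[0])
print("gaussian:   ", gaussian.predict(new.toarray())[0])
```

For real files, pass an open file handle (opened with newline="") to load_rows instead of the StringIO object. Word counts are usually a better fit for the multinomial and Bernoulli models than for the Gaussian one.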
Question 2: For NLTK, what should the input to the NaiveBayesClassifier.classify function be? A dict with boolean values? How can it be made multinomial or Gaussian?
Regarding question 2:
nltk.NaiveBayesClassifier.classify expects what is called a 'featureset'. A featureset is a dictionary with feature names as keys and feature values as values, e.g. {'word1': True, 'word2': True, 'word3': False}.
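A minimal sketch of building such featuresets from paragraphs and training NLTK's classifier on them. The training pairs and the featureset helper are hypothetical; the helper uses plain whitespace splitting (an assumption, chosen so no NLTK tokenizer data has to be downloaded) and marks each word the paragraph contains with True.

```python
from nltk.classify import NaiveBayesClassifier

# Hypothetical training data: (paragraph, label) pairs.
train_data = [
    ("The team won the match in extra time", "sports"),
    ("The striker scored twice in the match", "sports"),
    ("The senate passed the budget bill", "politics"),
    ("The minister announced a new bill", "politics"),
]

def featureset(paragraph):
    """Hypothetical helper: map a paragraph to {'contains(word)': True}
    for every lower-cased, whitespace-separated word it contains."""
    return {"contains(%s)" % w: True for w in paragraph.lower().split()}

# NaiveBayesClassifier.train takes a list of (featureset, label) tuples.
train_set = [(featureset(text), label) for text, label in train_data]
classifier = NaiveBayesClassifier.train(train_set)

# classify takes a single featureset and returns a label.
print(classifier.classify(featureset("The striker scored in the match")))
```

Words that are absent from a paragraph are simply left out of its dict here; NLTK treats a missing feature as having the value None during training.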
NLTK's naive Bayes classifier cannot be used with a multinomial approach. However, you can install scikit-learn and use the nltk.classify.scikitlearn wrapper module to deploy scikit-learn's multinomial classifier.
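A sketch of the wrapper approach, assuming scikit-learn is installed. SklearnClassifier takes NLTK-style (featureset, label) pairs, converts the dicts internally with scikit-learn's DictVectorizer, and exposes NLTK's train/classify interface. The training data and the word-count helper are hypothetical; counts are used instead of booleans so the multinomial model has frequencies to work with. GaussianNB does not accept sparse input, so its wrapper is created with sparse=False.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical training data: (paragraph, label) pairs.
train_data = [
    ("The team won the match in extra time", "sports"),
    ("The striker scored twice in the match", "sports"),
    ("The senate passed the budget bill", "politics"),
    ("The minister announced a new bill", "politics"),
]

def count_features(paragraph):
    """Hypothetical helper: map a paragraph to {word: count} using
    lower-cased whitespace tokens."""
    counts = {}
    for w in paragraph.lower().split():
        counts[w] = counts.get(w, 0) + 1
    return counts

train_set = [(count_features(text), label) for text, label in train_data]

# train() fits the wrapped estimator and returns the wrapper itself.
multinomial = SklearnClassifier(MultinomialNB()).train(train_set)
bernoulli = SklearnClassifier(BernoulliNB()).train(train_set)
gaussian = SklearnClassifier(GaussianNB(), sparse=False).train(train_set)

test = count_features("The striker scored in the match")
for name, clf in [("multinomial", multinomial), ("bernoulli", bernoulli), ("gaussian", gaussian)]:
    print(name, clf.classify(test))
```

The featureset format stays the same as for NLTK's own classifier, so switching models only means changing the estimator passed to SklearnClassifier.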