machine learning - How to use different dataset for scikit and NLTK?

- May 15, 2014

I am trying to use the built-in naive Bayes classifiers of scikit-learn and NLTK on raw data that I have. The data is a set of tab-separated rows, each having a label, a paragraph, and some other attributes. I am interested in classifying the paragraphs. I need to convert this data into a format suitable for the built-in classifiers of scikit-learn/NLTK. I want to apply Gaussian, Bernoulli, and multinomial naive Bayes to the paragraphs.

Question 1: For scikit-learn, the example given imports the iris data. I checked the iris data, and it has precalculated values from the data set. How can I convert my data into such a format and directly call the Gaussian function? Is there a standard way of doing so?
Question 2: For NLTK, what should be the input to the NaiveBayesClassifier.classify function? Is it a dict with boolean values? How can it be made multinomial or Gaussian?

Regarding question 2:

nltk.NaiveBayesClassifier.classify expects a so-called 'featureset'. A featureset is a dictionary with feature names as keys and feature values as values, e.g. {'word1': True, 'word2': True, 'word3': False}. NLTK's naive Bayes classifier cannot be used with a multinomial approach. However, you can install scikit-learn and use the nltk.classify.scikitlearn wrapper module to deploy scikit-learn's multinomial classifier.
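As a rough illustration of the featureset format and the wrapper described above, here is a minimal sketch. The sample rows, the column order (label first, paragraph second), and the helper names to_featureset and load_labeled_featuresets are assumptions made up for this example, not part of the original data. The training step only runs if both nltk and scikit-learn are installed.

```python
import csv
import io

# Hypothetical sample rows: label <TAB> paragraph <TAB> other attribute.
RAW = (
    "sports\tthe team won the match\t2014\n"
    "sports\tthe player scored a goal\t2014\n"
    "politics\tthe senate passed the bill\t2014\n"
    "politics\tthe president signed the bill\t2014\n"
)


def to_featureset(paragraph):
    """Map a paragraph to an NLTK-style featureset: {feature_name: value}."""
    return {word: True for word in paragraph.lower().split()}


def load_labeled_featuresets(text):
    """Parse tab-separated rows into (featureset, label) pairs."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(to_featureset(row[1]), row[0]) for row in reader if len(row) >= 2]


train_set = load_labeled_featuresets(RAW)

# The classifiers are optional here: skip training if the libraries are missing.
try:
    from nltk.classify import NaiveBayesClassifier
    from nltk.classify.scikitlearn import SklearnClassifier
    from sklearn.naive_bayes import MultinomialNB
except ImportError:
    NaiveBayesClassifier = SklearnClassifier = MultinomialNB = None

if NaiveBayesClassifier is not None:
    # NLTK's own naive Bayes classifier, trained on (featureset, label) pairs.
    nb = NaiveBayesClassifier.train(train_set)
    # scikit-learn's multinomial naive Bayes behind NLTK's wrapper.
    wrapped = SklearnClassifier(MultinomialNB()).train(train_set)
    sample = to_featureset("the team scored")
    print(nb.classify(sample), wrapped.classify(sample))
```

Both classifiers take the same list of (featureset, label) pairs, so switching between them does not require reshaping the data. For a multinomial model it may make more sense to store word counts as the feature values instead of True; that is a design choice the answer above does not specify.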
