nlp - How to represent dependency triplets in .arff file format?
I'm developing a classifier for text categorisation using the Weka Java libraries. I've extracted a number of features using Stanford's CoreNLP package, including a dependency parse of the text, which returns a string of the form "(rel, head, mod)".
I want to use the returned dependency triplets as features for classification, but I cannot figure out how to represent them in the ARFF file. Basically, I'm stumped: for each instance there is an arbitrary number of dependency triplets, so I can't define them all explicitly in the attributes, for example:
    @attribute entitycount numeric
    @attribute deptriple_1 string
    @attribute deptriple_2 string
    .
    .
    @attribute deptriple_n string
Is there a particular way to go about this? I've spent the better part of a day searching and have not found one yet.
Thanks a lot for reading.
Extracted from the Weka wiki:
    import weka.core.Attribute;
    import weka.core.FastVector;
    import weka.core.Instance;
    import weka.core.Instances;

    /**
     * Generates a little ARFF file with different attribute types.
     * Uses the Weka 3.6 API (FastVector, concrete Instance class); in
     * Weka 3.7+ Instance is an interface and DenseInstance replaces it.
     *
     * @author FracPete
     */
    public class SO_Test {
      public static void main(String[] args) throws Exception {
        FastVector atts;
        FastVector attsRel;
        FastVector attVals;
        FastVector attValsRel;
        Instances  data;
        Instances  dataRel;
        double[]   vals;
        double[]   valsRel;
        int        i;

        // 1. set up attributes
        atts = new FastVector();
        // - numeric
        atts.addElement(new Attribute("att1"));
        // - nominal
        attVals = new FastVector();
        for (i = 0; i < 5; i++)
          attVals.addElement("val" + (i+1));
        atts.addElement(new Attribute("att2", attVals));
        // - string
        atts.addElement(new Attribute("att3", (FastVector) null));
        // - date ("MM" is month; lowercase "mm" would mean minutes)
        atts.addElement(new Attribute("att4", "yyyy-MM-dd"));
        // - relational
        attsRel = new FastVector();
        // -- numeric
        attsRel.addElement(new Attribute("att5.1"));
        // -- nominal
        attValsRel = new FastVector();
        for (i = 0; i < 5; i++)
          attValsRel.addElement("val5." + (i+1));
        attsRel.addElement(new Attribute("att5.2", attValsRel));
        dataRel = new Instances("att5", attsRel, 0);
        atts.addElement(new Attribute("att5", dataRel, 0));

        // 2. create Instances object
        data = new Instances("MyRelation", atts, 0);

        // 3. fill with data
        // first instance
        vals = new double[data.numAttributes()];
        // - numeric
        vals[0] = Math.PI;
        // - nominal
        vals[1] = attVals.indexOf("val3");
        // - string
        vals[2] = data.attribute(2).addStringValue("This is a string!");
        // - date
        vals[3] = data.attribute(3).parseDate("2001-11-09");
        // - relational
        dataRel = new Instances(data.attribute(4).relation(), 0);
        // -- first instance
        valsRel = new double[2];
        valsRel[0] = Math.PI + 1;
        valsRel[1] = attValsRel.indexOf("val5.3");
        dataRel.add(new Instance(1.0, valsRel));
        // -- second instance
        valsRel = new double[2];
        valsRel[0] = Math.PI + 2;
        valsRel[1] = attValsRel.indexOf("val5.2");
        dataRel.add(new Instance(1.0, valsRel));
        vals[4] = data.attribute(4).addRelation(dataRel);
        // add
        data.add(new Instance(1.0, vals));

        // second instance
        vals = new double[data.numAttributes()];  // important: needs a NEW array!
        // - numeric
        vals[0] = Math.E;
        // - nominal
        vals[1] = attVals.indexOf("val1");
        // - string
        vals[2] = data.attribute(2).addStringValue("And another one!");
        // - date
        vals[3] = data.attribute(3).parseDate("2000-12-01");
        // - relational
        dataRel = new Instances(data.attribute(4).relation(), 0);
        // -- first instance
        valsRel = new double[2];
        valsRel[0] = Math.E + 1;
        valsRel[1] = attValsRel.indexOf("val5.4");
        dataRel.add(new Instance(1.0, valsRel));
        // -- second instance
        valsRel = new double[2];
        valsRel[0] = Math.E + 2;
        valsRel[1] = attValsRel.indexOf("val5.1");
        dataRel.add(new Instance(1.0, valsRel));
        vals[4] = data.attribute(4).addRelation(dataRel);
        // add
        data.add(new Instance(1.0, vals));

        // 4. output data
        System.out.println(data);
      }
    }
Your problem in particular calls for a "relational" attribute. The code segment above shows how to deal with such a relational attribute.
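To make that concrete for dependency triplets, here is a minimal sketch of what the resulting ARFF text could look like. It uses only the Java standard library and builds the text by hand, so it is not the Weka API. The names `DepTripleArff`, `Triple`, `deptriples` and `documents` are made up for illustration, and the sketch assumes that no token contains commas, quotes, backslashes or whitespace (real parser output would need proper quoting and escaping).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: writes a bag of dependency triplets as one relational ARFF attribute. */
class DepTripleArff {

    /** One (rel, head, mod) triplet; a hypothetical holder type. */
    static final class Triple {
        final String rel, head, mod;
        Triple(String rel, String head, String mod) {
            this.rel = rel;
            this.head = head;
            this.mod = mod;
        }
    }

    /** ARFF header with one numeric and one relational attribute. */
    static String header() {
        return String.join("\n",
            "@relation documents",
            "",
            "@attribute entitycount numeric",
            "@attribute deptriples relational",
            "  @attribute rel string",
            "  @attribute head string",
            "  @attribute mod string",
            "@end deptriples",
            "",
            "@data");
    }

    /**
     * Encodes any number of triplets as a single quoted value.
     * Assumption: tokens need no quoting or escaping.
     */
    static String relationalValue(List<Triple> triples) {
        List<String> rows = new ArrayList<>();
        for (Triple t : triples) {
            rows.add(t.rel + "," + t.head + "," + t.mod);
        }
        // Inner rows are separated by a literal backslash followed by 'n'.
        return "\"" + String.join("\\n", rows) + "\"";
    }

    /** One @data row: the numeric feature, then the bag of triplets. */
    static String dataRow(int entityCount, List<Triple> triples) {
        return entityCount + "," + relationalValue(triples);
    }

    public static void main(String[] args) {
        List<Triple> triples = new ArrayList<>();
        triples.add(new Triple("nsubj", "likes", "John"));
        triples.add(new Triple("dobj", "likes", "apples"));
        System.out.println(header());
        System.out.println(dataRow(2, triples));
    }
}
```

Each instance then carries a single `deptriples` value holding as many inner rows as it has triplets, so the header no longer needs to know `n` in advance. Keep in mind that many Weka classifiers do not accept relational attributes directly, so a multi-instance learner or a conversion step may still be needed.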