Stdin piping from Python to Java: UTF-8 encoding error
I am trying to pipe Unicode characters from Python to Java.
Python code:
import subprocess

thai = u"ฉันจะกลับบ้านในคืนนี้"
command = "java -jar tokenizer.jar " + thai
p = subprocess.Popen(command, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
I plan to pass them to Java via args[].
The tokenizer's results are different when it is run in Java like this:
public static void main(String[] args) {
    String thai = "ฉันจะกลับบ้านในคืนนี้";
    ThaiAnalyzer ana = new ThaiAnalyzer();
    ana.analyze(thai);
}
vs
public static void main(String[] args) {
    String thai;
    thai = args[0]; // "ฉันจะกลับบ้านในคืนนี้" (this string should be passed from Python)
    ThaiAnalyzer ana = new ThaiAnalyzer();
    ana.analyze(args[0]);
}
I believe this is an encoding issue.
Pardon the short Java code; I do not have the real code with me.
To give an example of what I am trying to do: if I pipe a string from Python to Java to tokenize it, say
"hi i am going home"
I might end up with
"hi", "i", "am", "going", "home"
if I use the former method (the hard-coded string),
while the latter method (the string passed in through args[0]) might yield
"hi i", "am", "going home"
My question is about this difference in the output. I am only using English to illustrate the problem.
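For reference, here is a minimal Python 3 sketch of one way to rule out command-line argument encoding: send the text to the child process over stdin as explicit UTF-8 bytes instead of appending it to the command string. The helper name pipe_utf8 and the ECHO_CHILD stand-in are hypothetical and only there so the sketch runs without the jar. It assumes the Java side would be changed to read System.in as UTF-8 (for example through an InputStreamReader with an explicit charset) rather than args[0], which is not how the code in the question currently works.

```python
import subprocess
import sys


def pipe_utf8(command, text):
    """Run command, send text to its stdin as UTF-8 bytes, return decoded stdout."""
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() writes the bytes, closes stdin, and waits for the child to exit.
    out, _err = proc.communicate(text.encode("utf-8"))
    return out.decode("utf-8")


# Hypothetical stand-in child that echoes its stdin back byte for byte.
# For the real tokenizer this would presumably become something like
# ["java", "-Dfile.encoding=UTF-8", "-jar", "tokenizer.jar"] (assumption).
ECHO_CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

if __name__ == "__main__":
    thai = "ฉันจะกลับบ้านในคืนนี้"
    # Prints True when the text survives the round trip unchanged.
    print(pipe_utf8(ECHO_CHILD, thai) == thai)
```

If the round trip is clean with the echo child but the tokenizer output still differs, the decoding on the Java side is the more likely place to look.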