Stdin piping from Python to Java: UTF-8 encoding error


I am trying to pipe Unicode characters from Python to Java.

Python code:

import subprocess

thai = u"ฉันจะกลับบ้านในคืนนี้"
command = "java -jar tokenizer.jar " + thai
p = subprocess.Popen(command, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE)

I plan to pass them to Java via args[].

The results of the tokenizer are different when it is run in Java like this:

public static void main(String[] args) {
    String thai = "ฉันจะกลับบ้านในคืนนี้";
    ThaiAnalyzer ana = new ThaiAnalyzer();
    ana.analyze(thai);
}

vs

public static void main(String[] args) {
    String thai;
    thai = args[0]; // "ฉันจะกลับบ้านในคืนนี้" (this string should be passed from Python)
    ThaiAnalyzer ana = new ThaiAnalyzer();
    ana.analyze(args[0]);
}

I believe this is an encoding issue.
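One way to test that suspicion is to take the command line out of the picture and send the text over stdin as explicit UTF-8 bytes. The following is only a sketch under assumptions: it assumes Python 3, `pipe_utf8` and `ECHO_CMD` are hypothetical names, and the child process here is a Python stand-in that echoes its input, not the real tokenizer. With the real jar, the Java side would have to read `System.in` as UTF-8 (for example through an `InputStreamReader` with `StandardCharsets.UTF_8`) instead of using args[0].

```python
import subprocess
import sys


def pipe_utf8(cmd, text):
    """Send text to a child process on stdin as UTF-8; return its stdout as text."""
    proc = subprocess.Popen(
        cmd,  # argument list, so no shell quoting or string concatenation
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() writes the bytes, closes stdin, and waits for the child.
    out, err = proc.communicate(text.encode("utf-8"))
    if proc.returncode != 0:
        raise RuntimeError(err.decode("utf-8", errors="replace"))
    return out.decode("utf-8")


# Stand-in child (assumption): echoes the raw stdin bytes back unchanged.
ECHO_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

if __name__ == "__main__":
    thai = u"ฉันจะกลับบ้านในคืนนี้"
    # True means the bytes survived the round trip through the pipe.
    print(pipe_utf8(ECHO_CMD, thai) == thai)
```

To try it against the tokenizer, `ECHO_CMD` would be replaced by something like `["java", "-jar", "tokenizer.jar"]`, again assuming the Java program reads its input from stdin.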

Pardon the short Java code; I do not have the code with me.

What I am trying to do, for example: if I pipe from Python to Java to tokenize the string

"hi i am going home"

I might end up with

"hi", "i", "am", "going", "home"  

if I use the former method,

and the latter method might yield

"hi i", "am", "going home"  

My question is about the difference in the output results. I am using English to illustrate the problem.
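To check whether the two methods really receive the same string, it may help to dump the code points and UTF-8 bytes on the Python side and compare them with what Java sees. This is a small diagnostic sketch, assuming Python 3; `code_points` and `utf8_hex` are hypothetical helper names, not part of any library.

```python
def code_points(text):
    """Return the Unicode code points of text as 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]


def utf8_hex(text):
    """Return the UTF-8 encoding of text as a lowercase hex string."""
    return text.encode("utf-8").hex()


if __name__ == "__main__":
    thai = u"ฉันจะกลับบ้านในคืนนี้"
    # ASCII-only output, so it prints safely on any console encoding.
    print(code_points(thai))
    print(utf8_hex(thai))
```

If the Java program prints the code points of args[0] in the same format and the lists differ, the string was altered on the way in, which would point to encoding rather than to the tokenizer itself.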

