stdin piping from Python to Java: UTF-8 encoding error


I am trying to pipe Unicode characters from Python to Java.

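Python code:

```python
import subprocess

thai = u"ฉันจะกลับบ้านในคืนนี้"
# Pass the command as a list so the Thai text reaches Java as its own argv entry
command = ["java", "-jar", "tokenizer.jar", thai]
p = subprocess.Popen(command, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
```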

I plan to pass the text to Java as a command-line argument and read it from args[].
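As an alternative to command-line arguments, the text can be written to the child's standard input as explicit UTF-8 bytes, which may sidestep the platform-dependent conversion that argv goes through. This is a minimal sketch assuming Python 3. The child process here is a small Python program used as a hypothetical stand-in for java -jar tokenizer.jar; for this approach to help, the real jar would have to read System.in as UTF-8 (for example via new InputStreamReader(System.in, StandardCharsets.UTF_8)) instead of reading args[0]. The helper name send_utf8 is illustrative only.

```python
import subprocess
import sys

thai = u"ฉันจะกลับบ้านในคืนนี้"

# Hypothetical stand-in for ["java", "-jar", "tokenizer.jar"]: a child process
# that reads raw bytes from stdin, decodes them as UTF-8 and echoes the text
# back, re-encoded as UTF-8.
child = [
    sys.executable,
    "-c",
    "import sys; text = sys.stdin.buffer.read().decode('utf-8'); "
    "sys.stdout.buffer.write(text.encode('utf-8'))",
]


def send_utf8(cmd, text):
    """Write text to the child's stdin as UTF-8 and decode its stdout as UTF-8."""
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate(text.encode("utf-8"))
    if p.returncode != 0:
        raise RuntimeError(err.decode("utf-8", errors="replace"))
    return out.decode("utf-8")


result = send_utf8(child, thai)
print(result == thai)  # True: the text survives the round trip
```

With the real jar, cmd would be ["java", "-jar", "tokenizer.jar"]; the point of the design is that both ends name the charset explicitly rather than relying on defaults.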

The tokenizer's results differ when I run it in Java like this:

```java
public static void main(String[] args) {
    String thai = "ฉันจะกลับบ้านในคืนนี้";
    ThaiAnalyzer ana = new ThaiAnalyzer();
    ana.analyze(thai);
}
```

vs

```java
public static void main(String[] args) {
    String thai;
    thai = args[0]; // "ฉันจะกลับบ้านในคืนนี้" (this string should be passed from Python)
    ThaiAnalyzer ana = new ThaiAnalyzer();
    ana.analyze(thai);
}
```
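One way to check whether args[0] arrives intact is to compare code points on both sides. The hypothetical helper below prints each character of the Python string as U+XXXX. On the Java side, printing Integer.toHexString(c) for each char c of args[0] gives values to compare against; Thai characters all lie in the Basic Multilingual Plane, so each one is a single Java char. Any mismatch suggests the string was altered on the way.

```python
thai = u"ฉันจะกลับบ้านในคืนนี้"


def code_points(text):
    """Return each character as a U+XXXX string for side-by-side comparison."""
    return ["U+%04X" % ord(ch) for ch in text]


# The first entry is U+0E09 (THAI CHARACTER CHO CHING)
print(" ".join(code_points(thai)))
```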

I believe it is an encoding issue.
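To illustrate what an encoding mismatch does to this input, the sketch below encodes the Thai string as UTF-8 and then decodes the bytes as ISO-8859-1 (Latin-1), standing in for a receiver that assumes the wrong charset. Latin-1 is only an example chosen because it can decode any byte; the charset actually used on the Java side depends on the platform. Every Thai character takes three bytes in UTF-8, so the mis-decoded string is three times as long and contains no Thai characters at all.

```python
thai = u"ฉันจะกลับบ้านในคืนนี้"

# Bytes a UTF-8 sender produces: each Thai code point becomes 3 bytes.
utf8_bytes = thai.encode("utf-8")

# What a receiver sees if it wrongly assumes a single-byte charset.
mangled = utf8_bytes.decode("latin-1")

print(len(mangled) == 3 * len(thai))       # True: one character per byte
print(mangled == thai)                     # False: the text was changed
print(utf8_bytes.decode("utf-8") == thai)  # True when both sides agree on UTF-8
```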

Pardon the short Java code; I do not have the full code with me.

What I am trying to do, for example: if I pipe from Python to Java to tokenize the string

"hi i am going home"

I might end up with

"hi", "i", "am", "going", "home"

if I use the former method (the hard-coded string),

and the latter method (args[0]) might yield

"hi i", "am", "going home"

My question is about this difference in the output. I am using English only to illustrate the problem.
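A dictionary-based tokenizer can only find words whose characters match exactly, which is one plausible reason the two runs segment differently. The sketch below uses greedy longest-match segmentation with a tiny hypothetical dictionary; it is not necessarily how tokenizer.jar or ThaiAnalyzer works. On the correct string it finds seven words; on the same text mis-decoded as Latin-1, nothing matches and every character becomes its own token.

```python
# Hypothetical mini-dictionary; a real Thai tokenizer ships a much larger one.
DICTIONARY = {u"ฉัน", u"จะ", u"กลับ", u"บ้าน", u"ใน", u"คืน", u"นี้"}


def max_match(text, dictionary=DICTIONARY):
    """Greedy longest-match segmentation; unknown characters become 1-char tokens."""
    longest = max(len(w) for w in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for j in range(min(len(text), i + longest), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens


thai = u"ฉันจะกลับบ้านในคืนนี้"
mangled = thai.encode("utf-8").decode("latin-1")

print(max_match(thai))          # seven dictionary words
print(len(max_match(mangled)))  # one token per character: no word matches
```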

