python - Reading in TSV with unescaped character in Pandas
I have a TSV file where each line is a word token and a POS tag, separated by tabs.
    the	det
    boy	noun
    said	verb
    "	punct
    hi	intj
    mum	noun
    "	punct
This will be used as the basis for a POS tagger later on. The problem is that whenever pandas encounters the quotes, it returns this:
       word                                  tag
    0  the                                   det
    1  boy                                   noun
    2  said                                  verb
    3  \tpunct\r\nhi\tintj\r\nmum\tnoun\r\n  punct
I have tried to explicitly define the quotes as an escape character, but that didn't work. The only other thing I can think of is to escape them in the TSV files directly, but since I have many of them, and they have been generated for me by an external source, that would be tedious and time-consuming.
Has anyone encountered this before and found a solution?
You can tell pandas to ignore quoting when reading the file. In this case pandas uses the same configuration options as the builtin csv module, so you have to pass the QUOTE_NONE constant from the csv module:
    import csv
    import pandas

    pandas.read_table(fn, quoting=csv.QUOTE_NONE, names=('word', 'tag'))
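To see the effect without touching the real files, here is a minimal sketch that parses an in-memory copy of the sample from the question. The `sample` string and the variable names are made up for illustration, and it uses `pandas.read_csv` with `sep="\t"` rather than `read_table`, which should behave the same way here.

```python
import csv
import io

import pandas as pd

# Hypothetical in-memory stand-in for one of the TSV files in the question.
sample = (
    'the\tdet\n'
    'boy\tnoun\n'
    'said\tverb\n'
    '"\tpunct\n'
    'hi\tintj\n'
    'mum\tnoun\n'
    '"\tpunct\n'
)

# Default quoting: everything between the two quote characters
# is read as a single quoted field, so rows get merged.
default_df = pd.read_csv(io.StringIO(sample), sep="\t", names=["word", "tag"])

# QUOTE_NONE: quote characters are treated as ordinary data,
# so every line becomes its own row.
fixed_df = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    names=["word", "tag"],
    quoting=csv.QUOTE_NONE,
)

print(default_df)
print(fixed_df)
```

With `quoting=csv.QUOTE_NONE` the frame should have one row per input line, with the `"` tokens kept as plain words tagged `punct`.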