python - Reading in TSV with unescaped character in Pandas


I have a TSV file where each line is a word token and its POS tag, separated by tabs.

    the	det
    boy	noun
    said	verb
    "	punct
    hi	intj
    mum	noun
    "	punct

This will be used as the basis for a POS tagger later on. The problem is that whenever pandas encounters the quotes, it returns this:

                                       word    tag
    0                                   the    det
    1                                   boy   noun
    2                                  said   verb
    3  \tpunct\r\nhi\tintj\r\nmum\tnoun\r\n  punct

I have tried to explicitly define the quotes as an escape character, but that didn't work. The only other thing I can think of is to escape them in the TSV files directly, but since I have many of them, and they have been generated for me by an external source, that would be tedious and time-consuming.

Has anyone encountered this before and found a solution?

You can tell pandas to ignore quoting when reading the file. In this case pandas uses the same configuration options as the builtin csv module, so you have to pass the QUOTE_NONE constant from the csv module:

    import csv
    import pandas

    # fn is the path to the TSV file; QUOTE_NONE makes " an ordinary character
    pandas.read_table(fn, quoting=csv.QUOTE_NONE, names=('word', 'tag'))
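
To try this without touching the real files, here is a minimal sketch that feeds the sample from the question through an in-memory buffer. The `SAMPLE` string and the `read_tagged_tokens` helper are made up for illustration (they are not part of pandas), and the sample uses plain `\n` line endings rather than the `\r\n` seen in the question's output.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for one of the TSV files (word<TAB>tag per line).
SAMPLE = (
    'the\tdet\n'
    'boy\tnoun\n'
    'said\tverb\n'
    '"\tpunct\n'
    'hi\tintj\n'
    'mum\tnoun\n'
    '"\tpunct\n'
)


def read_tagged_tokens(source):
    """Read a word/tag TSV, treating quote characters as ordinary text.

    `source` may be a file path or any file-like object.
    """
    return pd.read_table(
        source,
        quoting=csv.QUOTE_NONE,  # do not interpret " as a field delimiter
        names=["word", "tag"],   # the file has no header row
    )


df = read_tagged_tokens(io.StringIO(SAMPLE))
print(df)
```

With quoting disabled, each line of the input should come back as its own row, including the two lines whose token is a bare quote character. The same helper can be called with a file path in place of the `StringIO` buffer.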
