Recode bytes which cannot be decoded in UTF-8 in Python
I am reading in txt files, and there is 1 byte causing me issues. My code:

    with open(input_filename_and_director, 'rb') as f:
        r = unicodecsv.reader(f, delimiter="|")

results in the error message:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 26: invalid continuation byte

Is there any way to specify how I want these bytes handled (i.e. read the byte in as a character)?
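Depending upon what you want, try using

    unicodecsv.reader(f, delimiter="|", errors='replace')

or

    unicodecsv.reader(f, delimiter="|", errors='ignore')

unicodecsv passes the errors parameter through to the unicode decoding step, so 'replace' substitutes a placeholder character for each undecodable byte and 'ignore' drops it. See unicode or here for more information.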
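To see what the two error handlers do, here is a minimal sketch using only the built-in bytes.decode, which accepts the same errors values. The sample row is made up for illustration (it is not from the original file): 0xc3 starts a two-byte UTF-8 sequence, but the "|" after it is not a valid continuation byte, which is the same kind of failure as in the question. The latin-1 line is an extra assumption on my part about what "read the byte in as a character" could mean.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample row: 0xc3 opens a two-byte UTF-8 sequence, but the
# next byte ("|", 0x7c) is not a continuation byte, so strict decoding fails.
raw = b"caf\xc3|menu"

# Default (strict) handling raises the error seen in the question.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' swaps the bad byte for U+FFFD (the replacement character).
replaced = raw.decode("utf-8", "replace")

# errors='ignore' silently drops the bad byte.
ignored = raw.decode("utf-8", "ignore")

# Assumption: if every byte should simply become one character, latin-1
# maps each byte value to the code point with the same number and never fails.
as_latin1 = raw.decode("latin-1")
```

With 'replace' you can still find the damaged spots afterwards by searching for U+FFFD; with 'ignore' the information that something was lost is gone, so it is best kept for data where the stray byte really does not matter.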