Python - "Unicode error" when reading a file
This is my first post on here, so I hope it isn't in the wrong topic or something, but I've run into an unusual problem with a Python app I'm writing.
Basically, I'm trying to read a text file and insert part of it into a tkinter Text widget. The text file contains the usual "\n" line breaks, but when I run the code I get a bizarre error that I haven't been able to cook up a workaround for:
(BTW, sorry for the lousy set-up here... I'm not sure how to work the new code-entering system; it seems to "play by its own rules" and have its own syntax, so I just copied/pasted the error below.)
    Exception in Tkinter callback
    Traceback (most recent call last):
      File "c:\python33\lib\idlelib\run.py", line 107, in main
        seq, request = rpc.request_queue.get(block=True, timeout=0.05)
      File "c:\python33\lib\queue.py", line 175, in get
        raise Empty
    queue.Empty

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "c:\python33\lib\tkinter\__init__.py", line 1442, in __call__
        return self.func(*args)
      File "c:\users\owner\desktop\python projects\the ultimate joke book.py", line 89, in search
        results.create()
      File "c:\users\owner\desktop\python projects\the ultimate joke book.py", line 31, in create
        joke = linecache.getline('jokes/jokelist.txt',x)
      File "c:\python33\lib\linecache.py", line 15, in getline
        lines = getlines(filename, module_globals)
      File "c:\python33\lib\linecache.py", line 41, in getlines
        return updatecache(filename, module_globals)
      File "c:\python33\lib\linecache.py", line 127, in updatecache
        lines = fp.readlines()
      File "c:\python33\lib\codecs.py", line 300, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 627: invalid start byte
So the function that caused the problem, "linecache.getline" (used in a for loop), works when there is no "\" in the text, but for whatever reason it doesn't like the "\" and starts spittin' errors. :/
So tonight I've spent an hour on the "docs" (http://docs.python.org/3/howto/unicode.html), reading about the history and basic concept of Unicode. It was loaded with assumed knowledge, and while it was informative and helpful on a concept-only level, it didn't seem to offer much in terms of practical information and potential solutions.
The only solution I can come up with to defeat this annoying little bug is to use "/n" instead and programmatically split the strings into an array (or "list", as they seem to be called in Python), then use a loop to break them into more than one line... but that sounds like a lot of unnecessary steps if there is a common workaround in existence. I'd appreciate any insights on how to solve this particularly mysterious problem.
Thanks.
The data that the UTF-8 decoder has been given is not UTF-8. That's why you get the error. You need to post the code that fails and some examples of the data for us to explain what is happening.
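If you want to check this yourself before posting data, here is a rough sketch of one way to locate the byte the decoder chokes on. The helper name `find_undecodable_byte` and the sample string are made up for illustration; for the real file you would read the bytes with `open('jokes/jokelist.txt', 'rb')` instead of using the sample.

```python
def find_undecodable_byte(raw, encoding="utf-8"):
    """Return (position, byte value) of the first byte that fails to
    decode with `encoding`, or None if the whole input decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the offset of the offending byte in `raw`
        return err.start, raw[err.start]
    return None


# Hypothetical sample: Spanish text saved as cp1252 rather than UTF-8.
sample = "¿Qué?".encode("cp1252")
print(find_undecodable_byte(sample))

# For the real file (path taken from the traceback), something like:
#     with open('jokes/jokelist.txt', 'rb') as f:
#         print(find_undecodable_byte(f.read()))
```

Bytes 0x80 through 0xBF are only valid in UTF-8 as continuation bytes, which is why a lone 0xbf is reported as an "invalid start byte".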
The character in question is "¿" in Latin-1 and cp1252. Perhaps it's Spanish text written on a Windows machine? In that case, specify the encoding when opening the file.
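As far as I know, linecache.getline has no encoding parameter (it is meant for Python source files and assumes UTF-8 unless the file declares otherwise), so one option is to read the line yourself with open(). This is only a minimal sketch: `get_line` is a hypothetical stand-in, cp1252 is a guess based on the 0xbf byte, and the demo writes its own temporary file instead of touching jokes/jokelist.txt.

```python
import os
import tempfile


def get_line(path, lineno, encoding="cp1252"):
    """Return line `lineno` (1-based) of a text file, or '' if the file
    has fewer lines, roughly mirroring what linecache.getline returns."""
    with open(path, encoding=encoding) as f:
        for current, line in enumerate(f, start=1):
            if current == lineno:
                return line
    return ""


# Demo with a throwaway file encoded as cp1252 (assumed encoding).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("first joke\n¿Por qué?\n".encode("cp1252"))
    demo_path = tmp.name

second_line = get_line(demo_path, 2)   # decodes 0xbf as "¿"
missing = get_line(demo_path, 99)      # past the end of the file
os.remove(demo_path)
print(repr(second_line), repr(missing))
```

In the app this would replace the linecache call with something like get_line('jokes/jokelist.txt', x). If the file turns out to be in some other encoding, change the `encoding` argument to match.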