unicode - Python pycassa encoding issue
I am having an encoding issue when inserting data into Cassandra using pycassa. The field is named 'text', and its content is a tweet, which can contain non-ASCII characters. I tried encoding the text field with encode('utf-8'), which shows the value being converted from 'unicode' to 'str', but the insert still fails. The exact errors are:
- 'ascii' codec can't encode character u'\xbf' in position 0: ordinal not in range(128)
- 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128)
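For context, this is the message Python produces whenever a unicode string containing a non-ASCII character is encoded with the ASCII codec, which Python 2 typically does implicitly when a unicode object reaches code that expects bytes. A minimal reproduction, assuming a made-up tweet text (the variable names are illustrative and not from the original code):

```python
# -*- coding: utf-8 -*-
# Hypothetical tweet text containing the two characters from the errors:
# u'\xbf' (inverted question mark) and u'\u2026' (ellipsis).
text = u"\xbfqu\xe9 tal\u2026"

# Encoding with the ASCII codec fails on the first non-ASCII character.
try:
    text.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Encoding explicitly as UTF-8 succeeds and yields a byte string.
encoded = text.encode("utf-8")
```

Here `error.start` is the position reported in the message, so it points at the first offending character in the text.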
Edit 1: The field that is failing in Cassandra has no default validator type defined. Could that be the problem? What does Cassandra store the value as if the type is not specified?
Edit 2: This answers Edit 1. I noticed something: the field that is failing does not have a default type defined and, per the docs, Cassandra will try to store it as hex byte arrays (BytesType), while I am trying to insert a UTF-8 encoded string. Could that be the problem?
Traceback:
Traceback (most recent call last):
  File "/opt/socialflow/prod/api-reporting/api-reporting/cassfh/app/c.py", line 40, in send
    mutator.send(self, *a, **kw)
  File "/usr/local/lib/python2.6/dist-packages/pycassa/batch.py", line 126, in send
    allow_retries=self.allow_retries)
  File "/usr/local/lib/python2.6/dist-packages/pycassa/pool.py", line 124, in new_f
    result = f(self, *args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/pycassa/cassandra/cassandra.py", line 1005, in batch_mutate
    self.send_batch_mutate(mutation_map, consistency_level)
  File "/usr/local/lib/python2.6/dist-packages/pycassa/cassandra/cassandra.py", line 1013, in send_batch_mutate
    args.write(self._oprot)
  File "/usr/local/lib/python2.6/dist-packages/pycassa/cassandra/cassandra.py", line 5200, in write
    oprot.trans.write(fastbinary.encode_binary(self, (self.__class__, self.thrift_spec)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbf' in position 0: ordinal not in range(128)
[2013-05-20 21:31:14,450] root critical:
This issue has been fixed. Here is what the issue was:
- The encoding issue existed in a couple of column families that share the same field, the tweet text, which can contain non-ASCII characters.
- I use the pycassa mutator to batch requests across multiple column families.
- I had fixed the encoding issue for 2 column families, but it still failed for the remaining 3 CFs.
- So the whole batch insertion failed because of a single failing mutation in the pycassa batch.
- I recommend 3 thorough reads of the Python pycassa documentation and the Cassandra data model.
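One way to avoid missing a column family is to push every row through the same encoding step before it is queued. The sketch below is only an illustration of that idea: the helper `encode_text_values`, the column family names, and the sample rows are all hypothetical, and it deliberately leaves out the pycassa calls themselves.

```python
# -*- coding: utf-8 -*-
def encode_text_values(columns, encoding="utf-8"):
    """Return a copy of `columns` with every unicode value encoded to bytes.

    Hypothetical helper, not part of pycassa. Non-text values are left as-is.
    """
    text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3
    return dict(
        (name, value.encode(encoding) if isinstance(value, text_type) else value)
        for name, value in columns.items()
    )

# Made-up rows destined for different column families in one batch.
pending_rows = {
    "tweets_by_user": {"text": u"\xbfqu\xe9 tal\u2026", "retweets": 3},
    "tweets_by_day": {"text": u"caf\xe9"},
}

# Apply the same encoding to every column family, not just some of them.
encoded_rows = dict(
    (cf_name, encode_text_values(columns))
    for cf_name, columns in pending_rows.items()
)
```

With pycassa, the encoded dict would presumably be what gets passed as the columns of each insert queued on the mutator, so that no column family in the batch receives raw unicode.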
Hope that's all.