Python upgrade from 2.7 to 3.7 : unicode error

Question

I am updating my Python code from 2.7 to 3.7. I run a pipeline on Google Dataflow that reads data from a BigQuery view, transforms it, and writes the result to another BigQuery table.

However, after upgrading, the following code, which calls unicode, fails with NameError: name 'unicode' is not defined

    bq_source = beam.io.BigQuerySource(query=query, use_standard_sql=True)
    records = (
        pipeline
        | 'Read %s From BQ' % v.get('name') >> beam.io.Read(bq_source)
        | 'BQ Create KV %s' % count >> beam.Map(
            lambda row: (
                row['value'].encode("utf-8"),
                {unicode(key).encode("utf-8"): unicode(value).encode("utf-8")
                 for key, value in row.items()},
            ))
        | 'BQ Group By Key %s' % count >> beam.GroupByKey()
        | 'BQ Calculate %s Score' % v.get('name') >> beam.ParDo(
            ProcessDataDoFn(), filter_id=v.get('filter_id'), date=date)
    )

The same code runs fine under Python 2.7.
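The NameError can be reproduced outside of Beam. Python 3 has no separate unicode type, so any reference to that name fails. This is a minimal sketch; the sample row is hypothetical and only stands in for a BigQuery row.

```python
# Hypothetical stand-in for a BigQuery row (real rows come from the pipeline).
row = {"value": "abc", "name": "example"}

message = None
try:
    # Python 2 style conversion: the name `unicode` does not exist in Python 3.
    converted = {unicode(k): unicode(v) for k, v in row.items()}  # noqa: F821
except NameError as exc:
    message = str(exc)

print(message)  # name 'unicode' is not defined
```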

I then read that Python 3 replaced unicode with str, so I replaced unicode with str in my code. Now the data from BigQuery is no longer read correctly, which later results in a KeyError:

    bq_source = beam.io.BigQuerySource(query=query, use_standard_sql=True)
    records = (
        pipeline
        | 'Read %s From BQ' % v.get('name') >> beam.io.Read(bq_source)
        | 'BQ Create KV %s' % count >> beam.Map(
            lambda row: (
                row['value'].encode("utf-8"),
                {str(key).encode("utf-8"): str(value).encode("utf-8")
                 for key, value in row.items()},
            ))
        | 'BQ Group By Key %s' % count >> beam.GroupByKey()
        | 'BQ Calculate %s Score' % v.get('name') >> beam.ParDo(ProcessDataDoFn(),
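The KeyError can also be reproduced without Beam. In Python 3, str(key).encode("utf-8") returns bytes, so the dictionary built in the Map step ends up with bytes keys, and a later lookup with a str key misses. This sketch assumes ProcessDataDoFn (not shown in the question) looks fields up by str keys; the row and the field 'name' are hypothetical.

```python
# Hypothetical BigQuery row, represented as a plain dict of text values.
row = {"value": "abc", "name": "example"}

# What the str(...).encode("utf-8") version builds in Python 3: bytes keys and values.
encoded = {str(k).encode("utf-8"): str(v).encode("utf-8") for k, v in row.items()}

print(list(encoded))      # [b'value', b'name']
print("name" in encoded)  # False, because 'name' (str) != b'name' (bytes)

try:
    encoded["name"]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'name'
```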

EDIT 1:

Updated code without encoding (this works now):

    bq_source = beam.io.BigQuerySource(query=query, use_standard_sql=True)
    records = (
        pipeline
        | 'Read %s From BQ' % v.get('name') >> beam.io.Read(bq_source)
        # No .encode(): in Python 3 the row fields are already str.
        | 'BQ Create KV %s' % count >> beam.Map(
            lambda row: (row['value'], {key: value for key, value in row.items()}))
        | 'BQ Group By Key %s' % count >> beam.GroupByKey()
        | 'BQ Calculate %s Score' % v.get('name') >> beam.ParDo(
            ProcessDataDoFn(), filter_id=v.get('filter_id'), date=date)
    )
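The working Map step can be pulled out into a named function so it can be checked without running Dataflow. This is a sketch, assuming the BigQuery fields arrive as str; the function name to_kv and the sample rows are hypothetical. dict(row) is used as a shorter equivalent of the comprehension {key: value for key, value in row.items()}.

```python
def to_kv(row):
    """Key a row by its 'value' field and keep all fields unchanged.

    Same logic as the working lambda: fields are assumed to already be
    str in Python 3, so nothing is encoded to bytes.
    """
    return row["value"], dict(row)


# Hypothetical rows standing in for BigQuery output.
rows = [
    {"value": "abc", "name": "first"},
    {"value": "xyz", "name": "second"},
]

pairs = [to_kv(r) for r in rows]
print(pairs[0])  # ('abc', {'value': 'abc', 'name': 'first'})

# Keys stay str, so lookups with str keys work downstream.
print(pairs[0][1]["name"])  # first
```

In the pipeline this would be passed as beam.Map(to_kv) in place of the lambda; a named function behaves the same but is easier to unit-test on its own.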

What is the type of the keys before you call str on them? – snakecharmerb, commented Sep 25, 2020 at 12:15
Have you tried not encoding? In Python 3, encoding turns it into bytes, whereas in Python 2 it just makes it a str. I'd post a more detailed answer, but I don't have Google BigQuery to test on. – Kenny Ostrom, commented Sep 25, 2020 at 13:55
Put your update as your own answer so you can mark this as resolved. – beroe, commented Sep 25, 2020 at 14:10

Kenny Ostrom · Accepted Answer · 2020-09-25 14:13:05Z

    s = 'hello'
    u = u'hello'
    b = u.encode('utf-8')
    print(type(s), type(u), type(b))

In Python 3.8:

    <class 'str'> <class 'str'> <class 'bytes'>

In Python 2.7:

    (<type 'str'>, <type 'unicode'>, <type 'str'>)

The intent of that conversion is clearly to turn unicode into str, which is no longer a relevant concern in Python 3. In Python 3 the same call produces bytes instead, and bytes keys are not interchangeable with str keys (b'name' != 'name'), so later lookups by str key fail with a KeyError. Simply do not encode, and use str(key), or just key if you already know it is a str.
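If you cannot be sure every key or value is already text, a small normalizing helper is one option. This is a hypothetical helper, not part of Beam or the original answer, and it assumes any bytes are UTF-8. It decodes bytes rather than calling str() on them, because in Python 3 str(b"abc") returns the representation "b'abc'", not "abc".

```python
def as_text(obj):
    """Return obj as a Python 3 str.

    bytes are decoded (UTF-8 is an assumption); anything else goes
    through str(), which leaves existing str values unchanged.
    """
    if isinstance(obj, bytes):
        return obj.decode("utf-8")
    return str(obj)


print(as_text(b"caf\xc3\xa9"))  # café
print(as_text("key"))           # key
print(as_text(42))              # 42
print(str(b"abc"))              # b'abc'  (the pitfall the helper avoids)
```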
