I am updating my Python code from 2.7 to 3.7. I am running a pipeline on Google Dataflow that reads data from a BigQuery view, transforms it, and then writes it back to another BigQuery table.

However, after the update, the code that uses `unicode` fails with: NameError: name 'unicode' is not defined

```python
bq_source = beam.io.BigQuerySource(query=query, use_standard_sql=True)
records = (
    pipeline
    | 'Read %s From BQ' % v.get('name') >> beam.io.Read(bq_source)
    | 'BQ Create KV %s' % count >> beam.Map(
        lambda row: (row['value'].encode("utf-8"),
                     {unicode(key).encode("utf-8"): unicode(value).encode("utf-8")
                      for key, value in row.items()}))
    | 'BQ Group By Key %s' % count >> beam.GroupByKey()
    | 'BQ Calculate %s Score' % v.get('name') >> beam.ParDo(
        ProcessDataDoFn(), filter_id=v.get('filter_id'), date=date)
)
```

The same code runs fine under Python 2.7.

I then read that in Python 3 `unicode` became `str`, so I replaced `unicode` with `str`. With that change, the data from BigQuery is not read correctly, which later results in a KeyError:

```python
bq_source = beam.io.BigQuerySource(query=query, use_standard_sql=True)
records = (
    pipeline
    | 'Read %s From BQ' % v.get('name') >> beam.io.Read(bq_source)
    | 'BQ Create KV %s' % count >> beam.Map(
        lambda row: (row['value'].encode("utf-8"),
                     {str(key).encode("utf-8"): str(value).encode("utf-8")
                      for key, value in row.items()}))
    | 'BQ Group By Key %s' % count >> beam.GroupByKey()
    | 'BQ Calculate %s Score' % v.get('name') >> beam.ParDo(
        ProcessDataDoFn(), filter_id=v.get('filter_id'), date=date)
)
```
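The KeyError can be reproduced without Dataflow. This is a minimal sketch. A plain dict with hypothetical column names stands in for one BigQuery row, and the code assumes that `ProcessDataDoFn` (not shown in the question) later looks fields up by their `str` column names. Under Python 3, the `str(...).encode("utf-8")` mapping produces a dict whose keys are all `bytes`:

```python
# A plain dict stands in for one BigQuery row (hypothetical column names).
row = {"value": "abc", "score": 7}

# Python 3 behaviour of the mapping above: str(...).encode() yields bytes.
key, record = (row["value"].encode("utf-8"),
               {str(k).encode("utf-8"): str(v).encode("utf-8") for k, v in row.items()})

print(key)     # b'abc'
print(record)  # {b'value': b'abc', b'score': b'7'}

try:
    record["value"]          # assumed downstream lookup by a str column name...
except KeyError as exc:
    print("KeyError:", exc)  # ...fails, because the dict only has bytes keys
```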

EDIT 1:

Updated code without encoding (works now):

```python
bq_source = beam.io.BigQuerySource(query=query, use_standard_sql=True)
records = (
    pipeline
    | 'Read %s From BQ' % v.get('name') >> beam.io.Read(bq_source)
    | 'BQ Create KV %s' % count >> beam.Map(
        lambda row: (row['value'], {key: value for key, value in row.items()}))
    | 'BQ Group By Key %s' % count >> beam.GroupByKey()
    | 'BQ Calculate %s Score' % v.get('name') >> beam.ParDo(
        ProcessDataDoFn(), filter_id=v.get('filter_id'), date=date)
)
```
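The working mapping can also be checked locally. This is a minimal sketch. `to_kv` is a hypothetical name for the lambda in the edit, the rows are made-up dicts standing in for BigQuery results, and a `defaultdict` is a local stand-in for `beam.GroupByKey()`, not Beam itself:

```python
from collections import defaultdict


def to_kv(row):
    """Same logic as the working lambda: key by row['value'], keep str keys."""
    return row["value"], {key: value for key, value in row.items()}


# Hypothetical rows standing in for the BigQuery query results.
rows = [
    {"value": "a", "score": 1},
    {"value": "b", "score": 2},
    {"value": "a", "score": 3},
]

# Local stand-in for beam.GroupByKey(): collect the records for each key.
grouped = defaultdict(list)
for k, record in map(to_kv, rows):
    grouped[k].append(record)

print(dict(grouped))
# {'a': [{'value': 'a', 'score': 1}, {'value': 'a', 'score': 3}], 'b': [{'value': 'b', 'score': 2}]}
```

The dict comprehension is just a shallow copy of the row, so `dict(row)` would behave the same way. Because nothing is encoded, keys stay `str` and lookups by column name keep working.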
  • What is the type of the keys before you call str on them? Commented Sep 25, 2020 at 12:15
  • Its type is int. @snakecharmerb Commented Sep 25, 2020 at 13:16
  • Have you tried not encoding? In Python 3 that forces it to bytes, whereas in Python 2 it just makes it str. I'd post a more detailed answer, but I don't have Google BigQuery to test on. Commented Sep 25, 2020 at 13:55
  • Yes, it works now. Commented Sep 25, 2020 at 13:59
  • Put your update as your own answer so you can mark this as resolved. Commented Sep 25, 2020 at 14:10

1 Answer

```python
s = 'hello'
u = u'hello'
b = u.encode('utf-8')
print(type(s), type(u), type(b))
```

In Python 3.8:

<class 'str'> <class 'str'> <class 'bytes'>

In Python 2.7:

(<type 'str'>, <type 'unicode'>, <type 'str'>)

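The intent of that conversion was to turn `unicode` into `str`, which is no longer a relevant concern in Python 3, where `str` is already Unicode text. In Python 3, `.encode('utf-8')` instead turns the value into `bytes`, which is not compatible: `bytes` keys never match `str` keys (`b'value' != 'value'`), so a later lookup by column name raises a KeyError. Simply do not encode. Use `str(key)`, or just `key` if you already know it's a string.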
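The comments mention that some keys are `int`. If some values could also arrive as `bytes` (for example, data written by the old Python 2 code), a small helper can normalize everything to `str` instead of encoding. This is a sketch under that assumption. `to_text` is a hypothetical helper, not part of Beam, and UTF-8 is assumed for any `bytes`:

```python
def to_text(x, encoding="utf-8"):
    """Return str for str, bytes, or any other value such as int."""
    if isinstance(x, bytes):
        return x.decode(encoding)  # bytes (assumed UTF-8) -> str
    return str(x)                  # str is unchanged; int etc. are formatted


# Hypothetical row mixing a str key, an int key, and a bytes value.
row = {"value": "abc", 42: b"xyz"}
record = {to_text(k): to_text(v) for k, v in row.items()}
print(record)  # {'value': 'abc', '42': 'xyz'}
```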
