I am updating my Python code from 2.7 to 3.7. I am trying to run a pipeline on Google Dataflow that reads data from a BigQuery view, transforms it, and writes the result back to another BigQuery table.
However, after the update, the code that uses unicode fails with: NameError: name 'unicode' is not defined
    bq_source = beam.io.BigQuerySource(query=query, use_standard_sql=True)
    records = (pipeline
               | 'Read %s From BQ' % v.get('name') >> beam.io.Read(bq_source)
               | 'BQ Create KV %s' % count >> beam.Map(lambda row: (
                   row['value'].encode("utf-8"),
                   {unicode(key).encode("utf-8"): unicode(value).encode("utf-8")
                    for key, value in row.items()}))
               | 'BQ Group By Key %s' % count >> beam.GroupByKey()
               | 'BQ Calculate %s Score' % v.get('name') >> beam.ParDo(
                   ProcessDataDoFn(), filter_id=v.get('filter_id'), date=date)
               )

If I run the same code as above in Python 2.7, it runs fine.
After some time I read that unicode in Python 2 corresponds to str in Python 3, so I replaced unicode with str. With that change, the data from BigQuery is not being read as expected, which results in a KeyError later:
    bq_source = beam.io.BigQuerySource(query=query, use_standard_sql=True)
    records = (pipeline
               | 'Read %s From BQ' % v.get('name') >> beam.io.Read(bq_source)
               | 'BQ Create KV %s' % count >> beam.Map(lambda row: (
                   row['value'].encode("utf-8"),
                   {str(key).encode("utf-8"): str(value).encode("utf-8")
                    for key, value in row.items()}))
               | 'BQ Group By Key %s' % count >> beam.GroupByKey()
               | 'BQ Calculate %s Score' % v.get('name') >> beam.ParDo(
                   ProcessDataDoFn(),

EDIT 1:
Updated code without the encoding. This works now:
    bq_source = beam.io.BigQuerySource(query=query, use_standard_sql=True)
    records = (pipeline
               | 'Read %s From BQ' % v.get('name') >> beam.io.Read(bq_source)
               | 'BQ Create KV %s' % count >> beam.Map(lambda row: (
                   row['value'],
                   {key: value for key, value in row.items()}))
               | 'BQ Group By Key %s' % count >> beam.GroupByKey()
               | 'BQ Calculate %s Score' % v.get('name') >> beam.ParDo(
                   ProcessDataDoFn(), filter_id=v.get('filter_id'), date=date)
               )
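
A likely explanation for the KeyError, sketched outside of Beam: in Python 3, str(x).encode("utf-8") returns a bytes object, and a bytes key never compares equal to a str key. The row below is a made-up stand-in for one BigQuery record, and to_kv is a hypothetical helper name. The sketch also assumes that ProcessDataDoFn (not shown in the question) looks fields up by plain string keys.

```python
# Hypothetical row, shaped like the per-record dict read from BigQuery.
row = {"value": "abc", "score": 1.5}

# Python 2 habit carried over to Python 3: encode every key and value.
encoded = {str(k).encode("utf-8"): str(v).encode("utf-8")
           for k, v in row.items()}

print(list(encoded))        # the keys are now bytes objects
print("value" in encoded)   # the str key no longer matches
print(b"value" in encoded)  # only the bytes key matches


def to_kv(row):
    """Key the row by its 'value' field and leave every field as-is."""
    return row["value"], dict(row)


key, fields = to_kv(row)
print(key, fields)
```

Under those assumptions, a lookup such as element['value'] on the encoded dict raises KeyError, while the unencoded version keeps the str keys intact. A named function like to_kv could stand in for the lambda passed to beam.Map.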