I just ran a large batch transform job on AWS SageMaker and ended up with a 10 GB output file in the following format:
{"vectors": [[1024 items], ..., [1024 items]]}{"vectors": [[1024 items], ..., [1024 items]]}{"vectors": [[1024 items], ..., [1024 items]]}...{"vectors": [[1024 items], ..., [1024 items]]} In total there are about 44,000 JSON entries, each with 10 lists of 1,024 items each. How can I extract the JSONs from this file, one by one, to have them available for post processing?
Ideal pseudocode:

```
for json_object in file:
    do_stuff(json_object)
```

I've tried the snippet below, but my kernel dies each time:
```python
with open(path, "r") as f:
    for line in f:
        do_stuff(line)
```
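I suspect the kernel dies because the objects are concatenated with no newlines between them, so the very first `line` is effectively the entire 10 GB file. One idea I'm considering is parsing the stream incrementally with `json.JSONDecoder.raw_decode`, reading the file in fixed-size chunks so only one buffer is in memory at a time. A minimal sketch, assuming the objects really are back to back with no separators (`iter_json_objects` and `chunk_size` are names I've made up; `do_stuff` is my placeholder):

```python
import json

def iter_json_objects(path, chunk_size=1024 * 1024):
    """Yield JSON objects from a file of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    buffer = ""
    with open(path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buffer += chunk
            while buffer:
                try:
                    # raw_decode parses one document from the start of the
                    # buffer and returns (object, index of first unread char)
                    obj, end = decoder.raw_decode(buffer)
                except json.JSONDecodeError:
                    # Buffer ends mid-object; read another chunk and retry
                    break
                yield obj
                buffer = buffer[end:].lstrip()

for obj in iter_json_objects(path):
    do_stuff(obj["vectors"])  # 10 lists of 1,024 items each
```

Memory should then stay bounded by one chunk plus one parsed object rather than the whole file, but I'd like to know whether this is a sensible way to handle concatenated JSON like this.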