In my script, I check the first two characters of each text file. If they are "[{", I treat the file as JSON and run the rest of the code on it.

However, I have to open the file twice using the line with open(f, 'r', encoding='utf-8', errors='ignore') as infile:, which is duplicated. Is there a better way to write this code?

    import glob
    import json

    result = []
    for f in glob.glob("D:/xxxxx/*.txt"):
        print("file_name: ", f)
        with open(f, 'r', encoding='utf-8', errors='ignore') as infile:
            first_two_char = infile.read(2)
            print(str(first_two_char))
            if first_two_char == "[{":
                # Second open of the same file: this is the duplication in question.
                with open(f, 'r', encoding='utf-8', errors='ignore') as infile:
                    json_file = json.load(infile, strict=False)
                    print(len(json_file))
                    result.append(json_file)  # append the JSON content to the list
    print(len(result))
  • 2
    I suppose you could always use seek to reset the cursor rather than reopening the file. Commented Aug 17, 2020 at 16:04
  • 2
    Your approach is wrong. Instead of making sure if it's JSON and reading, just TRY reading it as JSON and if it doesn't work, do nothing... Commented Aug 17, 2020 at 16:10
  • @Tomerikoo Thanks a lot! Yes, you are right. I have changed my code accordingly. It looks better and works well. Thanks again. Commented Aug 17, 2020 at 16:59
  • @AnthonyLabarre Thank you! You really answered my question. Next time I come across this issue, I will try seek. Commented Aug 17, 2020 at 17:01

1 Answer

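You could call infile.seek(0) to move the file pointer back to the start of the file. In general, seeking to an arbitrary offset doesn't work reliably with files opened as text because there is an intermediate cache for bytes-to-string decoding; only offsets previously returned by tell() are valid. But seek(0) and seeking to the end of the file do work.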

    import glob
    import json

    result = []
    for f in glob.glob("D:/xxxxx/*.txt"):
        print("file_name: ", f)
        with open(f, 'r', encoding='utf-8', errors='ignore') as infile:
            first_two_char = infile.read(2)
            print(str(first_two_char))
            if first_two_char == "[{":
                infile.seek(0)  # rewind so json.load reads from the beginning
                json_file = json.load(infile, strict=False)
                print(len(json_file))
                result.append(json_file)  # append the JSON content to the list
    print(len(result))
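To try the rewind in isolation, here is a minimal sketch that wraps the check in a helper and runs it against two throwaway files. The names load_if_json_list and demo, and the sample file contents, are assumptions made up for illustration; they are not part of the original script.

```python
import json
import os
import tempfile


def load_if_json_list(path):
    """Return the parsed JSON if the file starts with "[{", else None.

    Hypothetical helper for illustration only.
    """
    with open(path, 'r', encoding='utf-8', errors='ignore') as infile:
        if infile.read(2) != "[{":
            return None
        infile.seek(0)  # rewind so json.load sees the whole file
        return json.load(infile, strict=False)


def demo():
    """Write two sample files to a temporary directory and load each one."""
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.txt")
        other = os.path.join(tmp, "other.txt")
        with open(good, 'w', encoding='utf-8') as out:
            out.write('[{"a": 1}, {"a": 2}]')
        with open(other, 'w', encoding='utf-8') as out:
            out.write('plain text')
        return load_if_json_list(good), load_if_json_list(other)
```

Calling demo() returns the parsed list for the first file and None for the second, with each file opened only once.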

But really, just attempting the conversion and catching the error is a better way to go. What if the first two characters happened to be "[{" but the rest of the file was not valid JSON?

    import glob
    import json

    result = []
    for f in glob.glob("D:/xxxxx/*.txt"):
        print("file_name: ", f)
        with open(f, 'r', encoding='utf-8', errors='ignore') as infile:
            try:
                result.append(json.load(infile))
            except json.decoder.JSONDecodeError:
                pass  # not valid JSON, skip this file
    print(len(result))
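The "bad luck" case can be reproduced with a small sketch like the one below. It assumes a hypothetical collect_json helper and three made-up sample files, one of which starts with "[{" but is truncated. It keeps strict=False from the question, which is a choice made here rather than something the answer prescribes.

```python
import glob
import json
import os
import tempfile


def collect_json(pattern):
    """Parse every file matching pattern; skip files that are not valid JSON.

    Hypothetical helper for illustration only.
    """
    result = []
    for path in sorted(glob.glob(pattern)):
        with open(path, 'r', encoding='utf-8', errors='ignore') as infile:
            try:
                result.append(json.load(infile, strict=False))
            except json.JSONDecodeError:
                continue  # not JSON: ignore the file and move on
    return result


def demo():
    """Create sample files in a temporary directory and collect the JSON ones."""
    with tempfile.TemporaryDirectory() as tmp:
        samples = {
            "a_valid.txt": '[{"id": 1}]',
            "b_truncated.txt": '[{"id": 2',  # starts with "[{" but is not valid JSON
            "c_plain.txt": 'hello',
        }
        for name, text in samples.items():
            with open(os.path.join(tmp, name), 'w', encoding='utf-8') as out:
                out.write(text)
        return collect_json(os.path.join(tmp, "*.txt"))
```

Only the valid file ends up in the result. The truncated file would have passed the two-character check and then raised inside json.load. One difference to keep in mind: this approach accepts any valid JSON document, such as a bare number or a single object, not just a list of objects, so add an isinstance check on the parsed value if only lists are wanted.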

1 Comment

Thank you! It works.
