2

I'm trying to scrape various user data from a JSON API, and then store those values in a MySQL database I've set up, like so:

    response = requests.get("URL")
    json_obj = json.loads(response.text)

    timer = json_obj['timestamp']
    jobposition = json_obj['job']['position']

    query = "INSERT INTO users (timer, jobposition) VALUES (%s, %s)"
    values = (timer, jobposition)
    cursor = db.cursor()
    cursor.execute(query, values)
    db.commit()

The code seems to run fine for the most part, but some users don't have the attributes I'm trying to scrape in the JSON, which leads to errors.

How can I just store a zero value, or other default value, in the database when data is missing from the JSON?


3 Answers

3

You can use the dictionary's get() method for that, as follows:

timer = json_obj.get('timestamp', 0) 

Here 0 is the default value: if there is no 'timestamp' key, get() returns 0 instead of raising a KeyError. For the job position, which is nested one level deeper, you can do:

jobposition = json_obj['job'].get('position', 0) if 'job' in json_obj else 0 
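
If it helps, the two lookups above can be wrapped in one small function. This is only a sketch: the name extract_user_fields and the default parameter are made up for illustration, and it assumes 'job' is either a dict, missing, or null in the JSON.

```python
def extract_user_fields(json_obj, default=0):
    """Return (timer, jobposition), using `default` for anything missing.

    Hypothetical helper, not part of any library.
    """
    timer = json_obj.get('timestamp', default)
    # `or {}` covers both a missing 'job' key and a 'job' that is null (None).
    job = json_obj.get('job') or {}
    jobposition = job.get('position', default)
    return timer, jobposition
```

The returned tuple can be passed straight to cursor.execute(query, values) from the question.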

3 Comments

Damn, I've been so close to this solution so many times, but apparently never managed to land the syntax. It seems I'm not all the way there, though. On a closer look, some of the JSON attributes don't exist for all users, which can be handled the way you suggested. But for a missing job position, this is the actual JSON: "job": { "position": "None", "company_id": 0, "company_name": "None" } So, as you can see, the attribute exists but is reported as "None", which I guess is a string rather than a null value?
You are right: when the value is surrounded by double quotes, it is a string. If the JSON contained a bare null instead, json.loads would turn it into Python's None.
It seems I had actually misunderstood which attribute was causing the problem. The "None" strings were handled just fine; the problem was other attributes not existing, which your code fixed. Thanks!
0

Try this:

    try:
        jobposition = json_obj['job']['position']
    except (KeyError, TypeError):
        # 'job' or 'position' is missing, or 'job' is not a dict
        jobposition = 0
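
If you need this for several attributes, the same try/except idea can be pulled into a helper so you don't repeat it for every field. A minimal sketch, where get_nested is a made-up name and not a library function:

```python
def get_nested(data, *keys, default=0):
    """Walk `data` through `keys`; return `default` if any step fails.

    Hypothetical helper for illustration only.
    """
    current = data
    try:
        for key in keys:
            current = current[key]
    except (KeyError, IndexError, TypeError):
        # KeyError: key missing; TypeError: value is None or not subscriptable
        return default
    return current
```

Usage would look like jobposition = get_nested(json_obj, 'job', 'position'). Note that a value that is present but null still comes back as None, since only failed lookups fall back to the default.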


0

You can more clearly declare the data schema using dataclasses:

    from dataclasses import dataclass

    from validated_dc import ValidatedDC


    @dataclass
    class Job(ValidatedDC):
        name: str
        position: int = 0


    @dataclass
    class Workers(ValidatedDC):
        timer: int
        job: Job


    input_data = {
        'timer': 123,
        'job': {'name': 'driver'}
    }

    workers = Workers(**input_data)

    assert workers.job.position == 0

https://github.com/EvgeniyBurdin/validated_dc
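
If you would rather not add a dependency, roughly the same defaults can be declared with the standard library alone. This is a sketch under assumptions: Worker and from_dict are illustrative names, and unlike validated_dc it performs no type validation.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str = ''
    position: int = 0


@dataclass
class Worker:
    timer: int = 0
    job: Job = field(default_factory=Job)

    @classmethod
    def from_dict(cls, data):
        # Hypothetical constructor: copy known keys, fall back to defaults.
        job_data = data.get('job') or {}
        job = Job(
            name=job_data.get('name', ''),
            position=job_data.get('position', 0),
        )
        return cls(timer=data.get('timer', 0), job=job)


worker = Worker.from_dict({'timer': 123, 'job': {'name': 'driver'}})
```

Here worker.job.position falls back to the default declared on Job because the input has no 'position' key.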

