I'm trying to parse a directory containing a collection of XML files from RSS feeds. I have similar code for another directory that works fine, so I can't figure out the problem. I want to return the items so I can write them to a CSV file. The error I'm getting is:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0 

Here is the site I've collected RSS feeds from: https://www.ba.no/service/rss

It worked fine for: https://www.nrk.no/toppsaker.rss and https://www.vg.no/rss/feed/?limit=10&format=rss&categories=&keywords=

Here is the function for this RSS:

    import os
    import xml.etree.ElementTree as ET
    import csv

    def baitem():
        basepath = "../data_copy/bergens_avisen"
        table = []
        for fname in os.listdir(basepath):
            if fname != "last_feed.xml":
                files = ET.parse(os.path.join(basepath, fname))
                root = files.getroot()
                items = root.find("channel").findall("item")
                #print(items)
                for item in items:
                    date = item.find("pubDate").text
                    title = item.find("title").text
                    description = item.find("description").text
                    link = item.find("link").text
                    table.append((date, title, description, link))
        return table

I tested with print(items) and it returns all the objects. Could the problem be in how the XML files are written?

1 Answer

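I asked a friend, who suggested wrapping the parsing in a try/except statement so that the failing file name gets printed. That revealed a .DS_Store file in the directory. This is a hidden file that macOS creates in folders, and it is not XML, so ET.parse fails on it right at line 1, column 0. I'm providing the solution for those who might experience the same problem in the future.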

    def baitem():
        basepath = "../data_copy/bergens_avisen"
        table = []
        for fname in os.listdir(basepath):
            try:
                if fname != "last_feed.xml" and fname != ".DS_Store":
                    files = ET.parse(os.path.join(basepath, fname))
                    root = files.getroot()
                    items = root.find("channel").findall("item")
                    for item in items:
                        date = item.find("pubDate").text
                        title = item.find("title").text
                        description = item.find("description").text
                        link = item.find("link").text
                        table.append((date, title, description, link))
            except Exception as e:
                print(fname, e)
        return table
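
For anyone who would rather not swallow every exception, here is a rough sketch of a narrower variant. It catches only ET.ParseError, so non-XML files such as .DS_Store get reported while unrelated bugs still raise. The function name parse_feed_dir, the skip parameter and the (rows, failures) return shape are my own assumptions for illustration and are not part of the code above.

```python
import os
import xml.etree.ElementTree as ET


def parse_feed_dir(basepath, skip=("last_feed.xml",)):
    """Return (rows, failures) for the feed files found in basepath.

    Hypothetical helper: the name, the skip parameter and the return
    shape are assumptions, not something from the original post.
    """
    rows, failures = [], []
    for fname in sorted(os.listdir(basepath)):
        path = os.path.join(basepath, fname)
        # Ignore explicitly skipped names and anything that is not a file.
        if fname in skip or not os.path.isfile(path):
            continue
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError as err:
            # Files that are not well-formed XML (e.g. .DS_Store) land here.
            failures.append((fname, str(err)))
            continue
        # "channel/item" matches <item> elements directly under <channel>.
        for item in root.iterfind("channel/item"):
            rows.append((
                item.findtext("pubDate"),
                item.findtext("title"),
                item.findtext("description"),
                item.findtext("link"),
            ))
    return rows, failures
```

Calling rows, failures = parse_feed_dir("../data_copy/bergens_avisen") would then give the table in rows and the offending file names in failures. Note that findtext returns None for a missing child instead of raising, which differs slightly from item.find(...).text in the original.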

1 Comment

Instead of checking for DS_Store, I just made sure the filename contained ".xml", but this was my issue, so thanks!
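
A minimal sketch of that extension check, in case it helps someone: the helper name xml_files and its exclude parameter are made up for illustration. It tests that the name ends with ".xml" (case-insensitively), which is a bit stricter than just containing ".xml".

```python
import os


def xml_files(basepath, exclude=("last_feed.xml",)):
    """Return sorted paths of the .xml files in basepath.

    Hypothetical helper: the name and the exclude parameter are assumptions.
    """
    return sorted(
        os.path.join(basepath, fname)
        for fname in os.listdir(basepath)
        # Keep only names ending in .xml, so .DS_Store and similar are dropped.
        if fname.lower().endswith(".xml") and fname not in exclude
    )
```

The loop in baitem could then iterate over xml_files(basepath) and pass each path straight to ET.parse.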
