Hi, this may look like a repost, but it is not. I recently posted a similar question, and this is a separate issue that follows on from that problem. As a result of the previous question (LXML unable to retrieve webpage with error "failed to load HTTP resource"), I am now able to read and print the article if the link is on the first line of the file. However, once I try to do it multiple times, I get an error (http://tinypic.com/r/2rr2mau/8).
    import lxml.html
    from urlparse import urljoin  # Python 2 location of urljoin

    def fetch_article_content_cna(i):
        BASE_URL = "http://channelnewsasia.com"
        f = open('cnaurl2.txt')
        line = f.readlines()
        print line[i]
        url = urljoin(BASE_URL, line[i])
        t = lxml.html.parse(url)
        #print t.find(".//title").text
        content = '\n'.join(t.xpath('.//div[@class="news_detail"]/div/p/text()'))
        return content

cnaurl2.txt:

    /news/world/tripoli-fire-rages-as/1287826.html
    /news/asiapacific/korea-ferry-survivors/1287508.html
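One thing that may be worth ruling out (a guess, not a confirmed diagnosis): `readlines()` keeps the trailing newline on each line, so the string passed to `urljoin` can end in "\n". The sketch below strips each line before joining it to the base URL. `build_article_urls` is a hypothetical helper name used only for illustration, and the sample list stands in for the contents of cnaurl2.txt.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, as used in the question

BASE_URL = "http://channelnewsasia.com"


def build_article_urls(lines, base_url=BASE_URL):
    """Return absolute URLs for the non-empty lines, with whitespace stripped."""
    urls = []
    for raw in lines:
        path = raw.strip()  # drop the trailing "\n" that readlines() keeps
        if path:  # skip blank lines
            urls.append(urljoin(base_url, path))
    return urls


# Lines as readlines() would return them for cnaurl2.txt
sample = [
    "/news/world/tripoli-fire-rages-as/1287826.html\n",
    "/news/asiapacific/korea-ferry-survivors/1287508.html\n",
]
for url in build_article_urls(sample):
    print(repr(url))
```

Each resulting URL could then be passed to `lxml.html.parse(url)` in the same way as in the function above.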