This may look like a repost, but it is not. I recently posted a similar question; this is a separate issue that follows from it. As described in the previous question (LXML unable to retrieve webpage with error "failed to load HTTP resource"), I can now read and print the article when its link is on the first line of the file. However, when I try to fetch more than one article, I get an error (http://tinypic.com/r/2rr2mau/8).

import lxml.html
from urlparse import urljoin  # Python 2; needed for urljoin() below

def fetch_article_content_cna(i):
    BASE_URL = "http://channelnewsasia.com"
    f = open('cnaurl2.txt')
    line = f.readlines()  # one relative path per line
    print line[i]
    url = urljoin(BASE_URL, line[i])
    t = lxml.html.parse(url)
    #print t.find(".//title").text
    content = '\n'.join(t.xpath('.//div[@class="news_detail"]/div/p/text()'))
    return content

cnaurl2.txt

/news/world/tripoli-fire-rages-as/1287826.html
/news/asiapacific/korea-ferry-survivors/1287508.html
  • Please reduce your program to the smallest complete program that demonstrates your error. Copy-paste that small program into your question. See stackoverflow.com/help/mcve for more info. Commented Jul 31, 2014 at 2:28
  • Also, please don't post link to image sites. Copy-paste the text of the error messages into your question. Commented Jul 31, 2014 at 2:31

1 Answer

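readlines() keeps the trailing newline on every line it returns, so the path you pass to urljoin probably still ends in a newline. Try stripping it first: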

url = urljoin(BASE_URL, line[i].strip()) 
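
To make the effect of strip() visible without touching the network, here is a minimal sketch. It assumes the two paths from cnaurl2.txt shown in the question; build_urls is a hypothetical helper name, not part of lxml or the original code, and the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, as used in the question

BASE_URL = "http://channelnewsasia.com"


def build_urls(lines):
    """Hypothetical helper: turn raw lines from the file into absolute URLs.

    Each line is stripped of surrounding whitespace (including the trailing
    newline that readlines() leaves in place), and blank lines are skipped.
    """
    urls = []
    for raw in lines:
        path = raw.strip()
        if path:
            urls.append(urljoin(BASE_URL, path))
    return urls


# These strings mimic what f.readlines() returns for cnaurl2.txt.
lines = [
    "/news/world/tripoli-fire-rages-as/1287826.html\n",
    "/news/asiapacific/korea-ferry-survivors/1287508.html\n",
]

print(repr(lines[0]))  # the repr shows the trailing newline still attached
urls = build_urls(lines)
for url in urls:
    print(url)
```

Each resulting URL can then be passed to lxml.html.parse(url) exactly as in the question. Printing repr(line[i]) instead of line[i] in the original function is a quick way to check whether stray whitespace is what is breaking the request.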