LXML unable to retrieve webpage using link from file

Question

This may look like a repost, but it is not. I recently posted a similar question, and this is a separate issue related to that problem. As seen in the previous question (LXML unable to retrieve webpage with error "failed to load HTTP resource"), I am now able to read and print the article if the link is on the first line of the file. However, once I try to do it more than once, it fails with an error (http://tinypic.com/r/2rr2mau/8).

    import lxml.html
    from urlparse import urljoin  # Python 2 location of urljoin

    def fetch_article_content_cna(i):
        BASE_URL = "http://channelnewsasia.com"
        f = open('cnaurl2.txt')
        line = f.readlines()
        print line[i]
        url = urljoin(BASE_URL, line[i])
        t = lxml.html.parse(url)
        #print t.find(".//title").text
        content = '\n'.join(t.xpath('.//div[@class="news_detail"]/div/p/text()'))
        return content

cnaurl2.txt

    /news/world/tripoli-fire-rages-as/1287826.html
    /news/asiapacific/korea-ferry-survivors/1287508.html

Please reduce your program to the smallest complete program that demonstrates your error. Copy-paste that small program into your question. See stackoverflow.com/help/mcve for more info.
– Robᵩ, Commented Jul 31, 2014 at 2:28
Also, please don't post links to image sites. Copy-paste the text of the error messages into your question.
– Robᵩ, Commented Jul 31, 2014 at 2:31

Robᵩ · Accepted Answer · 2014-07-31 02:31:37Z

Try:

url = urljoin(BASE_URL, line[i].strip())
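The reason this is likely to help: readlines() keeps the line terminator, so each entry read from cnaurl2.txt normally ends in "\n", and that newline ends up in the URL handed to lxml.html.parse. Below is a minimal sketch of the idea, not the asker's actual program. It is written for Python 3 (the question uses Python 2, where urljoin lives in urlparse), it uses io.StringIO as a stand-in for the real file, and read_article_urls is a hypothetical helper name introduced only for illustration.

```python
import io
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

BASE_URL = "http://channelnewsasia.com"

# Stand-in for open('cnaurl2.txt'); the two paths are copied from the question.
SAMPLE_FILE = (
    "/news/world/tripoli-fire-rages-as/1287826.html\n"
    "/news/asiapacific/korea-ferry-survivors/1287508.html\n"
)


def read_article_urls(fileobj, base_url=BASE_URL):
    """Return one absolute URL per non-empty line of fileobj (hypothetical helper)."""
    urls = []
    for raw_line in fileobj:
        path = raw_line.strip()  # drop the trailing newline and stray spaces
        if not path:
            continue  # ignore blank lines, e.g. at the end of the file
        urls.append(urljoin(base_url, path))
    return urls


# readlines() keeps the newline, which is what the original code passed along.
raw_lines = io.StringIO(SAMPLE_FILE).readlines()
print(repr(raw_lines[0]))

# After stripping, every URL is clean and can be passed to lxml.html.parse.
urls = read_article_urls(io.StringIO(SAMPLE_FILE))
for url in urls:
    print(repr(url))
```

Each cleaned URL could then be passed to lxml.html.parse(url) as in the question. Skipping blank lines is an extra precaution added here, not part of the original answer.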
