I use the code below to read tables from websites. With the first example everything works as expected. With the second example (the commented-out `URL` and `TABLENR` variables), I only get the first column. I can't find the reason for it. Can somebody help here?
A simple way to produce nicer-looking output for the tables would also be welcome.
```python
import urllib2
import pprint
from bs4 import BeautifulSoup

# First example: works as expected
URL = 'http://www.proplanta.de/Markt-und-Preis/MATIF-Raps/'
TABLENR = 36
# Second example: only the first column comes back
#URL = 'http://www1.chineseshipping.com.cn/en/indices/ccfinew.jsp'
#TABLENR = 4

# Send a browser-like User-Agent header with the request
req = urllib2.Request(URL, headers={'User-Agent': "My Browser"})
con = urllib2.urlopen(req)
html = con.read()

soup = BeautifulSoup(html)
tables = soup.find_all('table')

# Collect the text of every <td> cell, row by row, from the chosen table
data = []
rows = tables[TABLENR].find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])  # Get rid of empty values

pprint.pprint(data)
```
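For the nicer-output part, one option is to print `data` as aligned text columns instead of using `pprint`. The sketch below assumes `data` is a list of rows where each row is a list of strings, as the loop above produces. `format_table` is a hypothetical helper name, not part of BeautifulSoup, and the function uses only plain Python, so it should run under both Python 2 and Python 3.

```python
def format_table(rows, sep=" | "):
    """Render a list of row lists (like `data` above) as aligned text columns.

    Hypothetical helper, not part of BeautifulSoup. Empty rows (for example
    header rows that contain only <th> cells, so find_all('td') returns
    nothing) are skipped, and short rows are padded with empty strings so
    every row has the same number of columns.
    """
    rows = [list(r) for r in rows if r]  # drop empty rows
    if not rows:
        return ""
    ncols = max(len(r) for r in rows)
    padded = [r + [""] * (ncols - len(r)) for r in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(r[i]) for r in padded) for i in range(ncols)]
    lines = [
        sep.join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in padded
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample rows standing in for the scraped `data` list.
    sample = [["Name", "Value"], ["a", "1"], ["longer name", "22"]]
    print(format_table(sample))
```

In the script above, `print(format_table(data))` could replace the `pprint.pprint(data)` call. Note that because the scraping loop filters out empty cells, values from a row with blanks can shift left into the wrong column; keeping empty cells (dropping the `if ele` filter) keeps the columns lined up.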