Skip to main content
Fix HTMLParseError typos per Dave Knight's NB
Source Link
Ponkadoodle
  • 6k
  • 5
  • 43
  • 63

Instead of the HTMLParser module, check out htmllib. It has a similar interface, but does more of the work for you. (It is pretty ancient, so it's not much help in terms of getting rid of javascript and css. You could make a derived class, but and add methods with names like start_script and end_style (see the python docs for details), but it's hard to do this reliably for malformed html.) Anyway, here's something simple that prints the plain text to the console

from htmllib import HTMLParser, HTMLErrorHTMLParseError from formatter import AbstractFormatter, DumbWriter p = HTMLParser(AbstractFormatter(DumbWriter())) try: p.feed('hello<br>there'); p.close() #calling close is not usually needed, but let's play it safe except HTMLParserErrorHTMLParseError: print ':(' #the html is badly malformed (or you found a bug) 

Instead of the HTMLParser module, check out htmllib. It has a similar interface, but does more of the work for you. (It is pretty ancient, so it's not much help in terms of getting rid of javascript and css. You could make a derived class, but and add methods with names like start_script and end_style (see the python docs for details), but it's hard to do this reliably for malformed html.) Anyway, here's something simple that prints the plain text to the console

from htmllib import HTMLParser, HTMLError from formatter import AbstractFormatter, DumbWriter p = HTMLParser(AbstractFormatter(DumbWriter())) try: p.feed('hello<br>there'); p.close() #calling close is not usually needed, but let's play it safe except HTMLParserError: print ':(' #the html is badly malformed (or you found a bug) 

Instead of the HTMLParser module, check out htmllib. It has a similar interface, but does more of the work for you. (It is pretty ancient, so it's not much help in terms of getting rid of javascript and css. You could make a derived class, but and add methods with names like start_script and end_style (see the python docs for details), but it's hard to do this reliably for malformed html.) Anyway, here's something simple that prints the plain text to the console

from htmllib import HTMLParser, HTMLParseError from formatter import AbstractFormatter, DumbWriter p = HTMLParser(AbstractFormatter(DumbWriter())) try: p.feed('hello<br>there'); p.close() #calling close is not usually needed, but let's play it safe except HTMLParseError: print ':(' #the html is badly malformed (or you found a bug) 
added 20 characters in body
Source Link
Martijn Pieters
  • 1.1m
  • 326
  • 4.2k
  • 3.4k

Instead of the HTMLParser module, check out htmllib. It has a similar interface, but does more of the work for you. (It is pretty ancient, so it's not much help in terms of getting rid of javascript and css. You could make a derived class, but and add methods with names like start_script and end_style (see the python docs for details), but it's hard to do this reliably for malformed html.) Anyway, here's something simple that prints the plain text to the console

from htmllib import HTMLParser, HTMLError from formatter import AbstractFormatter, DumbWriter p = HTMLParser(AbstractFormatter(DumbWriter())) try: p.feed('hello
there'); p.close() #calling close is not usually needed, but let's play it safe except HTMLParserError: print ':(' #the html is badly malformed (or you found a bug)

from htmllib import HTMLParser, HTMLError from formatter import AbstractFormatter, DumbWriter p = HTMLParser(AbstractFormatter(DumbWriter())) try: p.feed('hello<br>there'); p.close() #calling close is not usually needed, but let's play it safe except HTMLParserError: print ':(' #the html is badly malformed (or you found a bug) 

Instead of the HTMLParser module, check out htmllib. It has a similar interface, but does more of the work for you. (It is pretty ancient, so it's not much help in terms of getting rid of javascript and css. You could make a derived class, but and add methods with names like start_script and end_style (see the python docs for details), but it's hard to do this reliably for malformed html.) Anyway, here's something simple that prints the plain text to the console

from htmllib import HTMLParser, HTMLError from formatter import AbstractFormatter, DumbWriter p = HTMLParser(AbstractFormatter(DumbWriter())) try: p.feed('hello
there'); p.close() #calling close is not usually needed, but let's play it safe except HTMLParserError: print ':(' #the html is badly malformed (or you found a bug)

Instead of the HTMLParser module, check out htmllib. It has a similar interface, but does more of the work for you. (It is pretty ancient, so it's not much help in terms of getting rid of javascript and css. You could make a derived class, but and add methods with names like start_script and end_style (see the python docs for details), but it's hard to do this reliably for malformed html.) Anyway, here's something simple that prints the plain text to the console

from htmllib import HTMLParser, HTMLError from formatter import AbstractFormatter, DumbWriter p = HTMLParser(AbstractFormatter(DumbWriter())) try: p.feed('hello<br>there'); p.close() #calling close is not usually needed, but let's play it safe except HTMLParserError: print ':(' #the html is badly malformed (or you found a bug) 
Source Link
Mark
  • 41
  • 1

Instead of the HTMLParser module, check out htmllib. It has a similar interface, but does more of the work for you. (It is pretty ancient, so it's not much help in terms of getting rid of javascript and css. You could make a derived class, but and add methods with names like start_script and end_style (see the python docs for details), but it's hard to do this reliably for malformed html.) Anyway, here's something simple that prints the plain text to the console

from htmllib import HTMLParser, HTMLError from formatter import AbstractFormatter, DumbWriter p = HTMLParser(AbstractFormatter(DumbWriter())) try: p.feed('hello
there'); p.close() #calling close is not usually needed, but let's play it safe except HTMLParserError: print ':(' #the html is badly malformed (or you found a bug)