wget without HTML tags

Question

Is there a way to get the body of an HTML page without the HTML tags?

curl and wget return the response, but it still contains the HTML tags. We could strip the tags with sed and awk, but I am looking for an existing tool that does this without them.

lynx is an option, but it does not come pre-installed.

Thanks!

Duplicate of How to get text of a page using wget without html
– hornetbzz, Commented Dec 16, 2018 at 3:04

Paul Dixon · Accepted Answer · 2013-09-27 16:37:50Z

Why the aversion to installing an appropriate tool?

As an alternative to lynx, try w3m, e.g.

w3m -dump http://google.com

shan Over a year ago

I don't have an aversion to installing a tool. I just need to know whether an existing tool can do this before installing another package.

Community · Accepted Answer · 2017-05-23 11:50:41Z

Converting HTML to plain text in PHP for e-mail lists a few tools, as does How can I Convert HTML to Text in C#?. However, if lynx -dump does what you want, then that may be the best tool to install.
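
If Python 3 happens to be on the machine already, something along the same lines can be sketched with only its standard library, so nothing extra needs installing. This is a rough illustration rather than a replacement for lynx or w3m: the names TextExtractor and html_to_text are made up for this example, and it does no layout at all (no tables, no link lists). It only collects the text nodes and skips script and style contents.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []          # text fragments in document order
        self._skip_depth = 0     # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style.
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.parts.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

The HTML string passed to html_to_text could be a response already saved with wget or curl. For anything beyond simple pages, lynx -dump or w3m -dump will likely give more readable output.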
