How do I get the HTML source from the page?

Question

Is there a way to access the page's HTML source code using JavaScript?

I know that I can use document.body.innerHTML but it contains only the code inside the body. I want to get all the page source code including head and body tags with their content, and, if it's possible, also the html tag and the doctype. Is it possible?

Possible duplicate of How to get the entire document HTML as a string?
– wesinat0r, Commented Oct 14, 2019 at 0:24

gunr2171 · Accepted Answer · 2015-07-21 14:57:27Z

51

Use

document.documentElement.outerHTML

or

document.documentElement.innerHTML
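Neither property includes the doctype that the question asks about. A minimal sketch of one way to prepend it, assuming the standard `document.doctype` node with its `name`, `publicId` and `systemId` properties; the helper names `doctypeToString` and `getFullPageHtml` are made up for illustration. Like `outerHTML` itself, this serializes the current DOM rather than the bytes the server originally sent.

```javascript
// Rebuild a doctype declaration from a DocumentType-like node.
// Returns an empty string when the document has no doctype.
function doctypeToString(doctype) {
  if (!doctype) return '';
  let text = '<!DOCTYPE ' + doctype.name;
  if (doctype.publicId) {
    text += ' PUBLIC "' + doctype.publicId + '"';
  } else if (doctype.systemId) {
    text += ' SYSTEM';
  }
  if (doctype.systemId) {
    text += ' "' + doctype.systemId + '"';
  }
  return text + '>';
}

// Hypothetical helper: doctype (if any) followed by the serialized <html> element.
function getFullPageHtml(doc) {
  const doctypeText = doctypeToString(doc.doctype);
  const prefix = doctypeText ? doctypeText + '\n' : '';
  return prefix + doc.documentElement.outerHTML;
}
```

In a browser this would be called as `getFullPageHtml(document)`.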

edited Jul 21, 2015 at 14:57

gunr2171

17.7k26 gold badges68 silver badges102 bronze badges

answered Sep 2, 2009 at 13:07

Eldar Djafarov

25k2 gold badges36 silver badges27 bronze badges


6 Comments

Crescent Fresh Over a year ago

@mck89: no browser but IE will have outerHTML.

scunliffe Over a year ago

Be aware that the source you get with Firefox/most browsers is the "true" source you served up. In IE you will get the "live" HTML of the page including any changes the user has made to forms, any new DOM content etc. In IE it will also be the mixed case invalid tag soup that IE provides when requesting the .innerHTML of elements.

Liam Over a year ago

In case anyone else is still looking into this, the situation has changed somewhat. @Crescent Fresh was correct 2 years ago; however, more recent versions of Chrome and Safari also implement HTMLElement.outerHTML, though at the time of writing, Firefox does not.

Kip Over a year ago

@LiamNewmarch 2 years after your comment, which was 2 years after the initial post, and it seems that now Firefox also implements outerHTML. :)

Lothar Over a year ago

This is the current state of the DOM not the source code.


Paul S. · Accepted Answer · 2013-07-03 14:40:25Z

This can be done in a one-liner using XMLSerializer.

var generatedSource = new XMLSerializer().serializeToString(document);

This gives a string such as:

<!DOCTYPE html><html><head> <title>html - javascript page source code - Stack Overflow</title> ...

Unfortunately you will get garbage if the document content has any character that requires escaping in XML. Also you will not get the real original string but something slightly different (e.g. including an XML schema link).

Paul Dixon · Accepted Answer · 2009-09-02 13:08:31Z

11

One way to do this would be to re-request the page using XMLHttpRequest, then you'll get the entire page verbatim from the web server.
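A rough sketch of what that re-request could look like, using only the standard `open`, `send`, `onload`, `onerror`, `status` and `responseText` members of XMLHttpRequest. The function name `requestPageSource` and the optional `XhrCtor` parameter (there so the function can be exercised outside a browser) are assumptions for illustration, not part of any API.

```javascript
// Re-request a URL and hand the raw response text to a callback.
// XhrCtor is a hypothetical injection point; browsers fall back to the global.
function requestPageSource(url, onSource, onError, XhrCtor) {
  const Xhr = XhrCtor || XMLHttpRequest;
  const xhr = new Xhr();
  xhr.open('GET', url);
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      // responseText holds the body as sent by the server.
      onSource(xhr.responseText);
    } else {
      onError(new Error('Request failed with status ' + xhr.status));
    }
  };
  xhr.onerror = function () {
    onError(new Error('Network error while requesting ' + url));
  };
  xhr.send();
}
```

In a page it might be used as `requestPageSource(location.href, function (source) { console.log(source); }, function (err) { console.error(err); });`.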

answered Sep 2, 2009 at 13:08

Paul Dixon

302k54 gold badges315 silver badges349 bronze badges

1 Comment

mindplay.dk Over a year ago

Note that servers do not necessarily respond in exactly the same way to two individual requests.

benz · Accepted Answer · 2022-10-27 10:42:38Z

3

For IE you can also use:

document.all[0].outerHTML

edited Oct 27, 2022 at 10:42

benz

4551 gold badge7 silver badges27 bronze badges

answered Sep 2, 2009 at 13:23

DmitryK

5,5921 gold badge25 silver badges32 bronze badges

1 Comment

benz Over a year ago

Surprised this isn't marked as the answer. This works perfectly! The only thing is it only gets static HTML (doesn't retrieve anything javascript-related).

czerny · Accepted Answer · 2018-04-24 16:06:22Z

1

Provided that

the true HTML source code is wanted (not a serialization of the current DOM)
and that the page was loaded using the GET method,

the page source can be re-downloaded:

fetch(document.location.href)
  .then(response => response.text())
  .then(pageSource => /* ... */)
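A slightly fuller sketch of the same idea with basic error handling. The names `buildSourceRequestInit` and `refetchPageSource`, the optional `fetchImpl` parameter, and the particular `Accept` value are assumptions for illustration; the `Accept` header is only a guess at what a browser navigation would send, and the server may still answer differently the second time.

```javascript
// Options for the re-download request. The Accept value is an assumed
// approximation of a browser's navigation request, not an exact copy.
function buildSourceRequestInit() {
  return {
    method: 'GET',
    headers: { Accept: 'text/html,application/xhtml+xml' },
    credentials: 'same-origin'
  };
}

// Re-download a page and resolve with its body as text.
// fetchImpl is a hypothetical injection point; browsers use the global fetch.
async function refetchPageSource(url, fetchImpl) {
  const doFetch = fetchImpl || fetch;
  const response = await doFetch(url, buildSourceRequestInit());
  if (!response.ok) {
    throw new Error('Re-download failed with status ' + response.status);
  }
  return response.text();
}
```

In a page it might be used as `refetchPageSource(document.location.href).then(pageSource => console.log(pageSource));`.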

edited Apr 24, 2018 at 16:06

answered Jun 24, 2017 at 23:15

czerny

16.9k15 gold badges75 silver badges103 bronze badges

5 Comments

Szczepan Hołyszewski Over a year ago

That is unreliable because there is no guarantee that the server will serve the same content next time.

dwb Over a year ago

@SzczepanHołyszewski Given that the REST protocol is defined as stateless, as long as you send the same headers in the ajax request as the browser did, then I would be confident the server would send the same response.

Szczepan Hołyszewski Over a year ago

@dantechguy What are you talking about? There is nothing in the OP about REST. Whether an endpoint is a REST one depends on the server. The fetch API is typically used by client-side JS to talk to REST endpoints, but using the fetch API on a non-REST endpoint doesn't magically turn it into a REST one. But even if we talk REST, statelessness is irrelevant. Two identical REST GET requests can return different data if the resource was actually modified between the requests, or your permission to access the resource was revoked, or for a number of other reasons.

mindplay.dk Over a year ago

You can make this a bit more reliable by at least adding an Accept header similar to that of the browser. But yeah, this approach is not generally reliable.

Wim den Herder Over a year ago

This worked for me! this youtube url has timedtext (transcription) in 'view page source' and could only retrieve this by fetching the url again. youtube.com/watch?v=LA-LMRFhzaw&ab_channel=jordifieke
