39

Is there a way to access the page HTML source code using JavaScript?

I know that I can use document.body.innerHTML, but it contains only the markup inside the body. I want to get all of the page source, including the head and body tags with their content and, if possible, the html tag and the doctype as well. Is this possible?

1

5 Answers

51

Use

document.documentElement.outerHTML 

or

document.documentElement.innerHTML 
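
Neither property includes the doctype, which the question also asks for. The following is a minimal sketch of one way to prepend it, assuming the standard `document.doctype` node and its `name`, `publicId`, and `systemId` properties. The helper names `doctypeToString` and `serializeDocument` are hypothetical, and the result reflects the live DOM rather than the bytes the server sent.

```javascript
// Build a "<!DOCTYPE ...>" string from a DocumentType node ("" if there is none).
function doctypeToString(doctype) {
  if (!doctype) return "";
  let text = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    text += ' PUBLIC "' + doctype.publicId + '"';
  } else if (doctype.systemId) {
    text += " SYSTEM";
  }
  if (doctype.systemId) {
    text += ' "' + doctype.systemId + '"';
  }
  return text + ">";
}

// Combine the doctype (when present) with the serialized <html> element.
function serializeDocument(doc) {
  const doctype = doctypeToString(doc.doctype);
  return (doctype ? doctype + "\n" : "") + doc.documentElement.outerHTML;
}

// Only runs in a browser, where a global `document` exists.
if (typeof document !== "undefined") {
  console.log(serializeDocument(document));
}
```

Passing the document as a parameter keeps the helper usable on other documents too, for example one belonging to a same-origin iframe.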

6 Comments

@mck89: no browser but IE will have outerHTML.
Be aware that the source you get with Firefox/most browsers is the "true" source you served up. In IE you will get the "live" HTML of the page including any changes the user has made to forms, any new DOM content etc. In IE it will also be the mixed case invalid tag soup that IE provides when requesting the .innerHTML of elements.
In case anyone else is still looking into this, the situation has changed somewhat. @Crescent Fresh was correct 2 years ago; however, more recent versions of Chrome and Safari also implement HTMLElement.outerHTML, though at the time of writing Firefox does not.
@LiamNewmarch 2 years after your comment, which was 2 years after the initial post, and it seems that now Firefox also implements outerHTML. :)
This gives the current state of the DOM, not the original source code.
19

This can be done in a one-liner using XMLSerializer.

var generatedSource = new XMLSerializer().serializeToString(document); 

This returns a string such as:

<!DOCTYPE html><html><head> <title>html - javascript page source code - Stack Overflow</title> ... 

1 Comment

Unfortunately you will get garbage if the document content has any character that requires escaping in XML. Also you will not get the real original string but something slightly different (e.g. including an XML schema link).
11

One way to do this would be to re-request the page using XMLHttpRequest; you'll then get the entire page verbatim from the web server.
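
A minimal sketch of that approach, assuming the page was loaded with a plain GET. The function name `requestPageSource` and its `XhrCtor` parameter are hypothetical; the parameter exists only so the function can be exercised outside a browser.

```javascript
// Re-request a URL and pass the raw response text to a Node-style callback.
// `XhrCtor` is a hypothetical injection point; it defaults to the browser's XMLHttpRequest.
function requestPageSource(url, onDone, XhrCtor) {
  const Ctor = XhrCtor || XMLHttpRequest;
  const xhr = new Ctor();
  xhr.open("GET", url, true);
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      onDone(null, xhr.responseText);
    } else {
      onDone(new Error("Unexpected HTTP status " + xhr.status), null);
    }
  };
  xhr.onerror = function () {
    onDone(new Error("Network error while requesting " + url), null);
  };
  xhr.send();
}

// Only runs in a browser, where XMLHttpRequest and location exist.
if (typeof XMLHttpRequest !== "undefined" && typeof location !== "undefined") {
  requestPageSource(location.href, function (error, pageSource) {
    if (error) {
      console.error(error);
    } else {
      console.log(pageSource);
    }
  });
}
```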

1 Comment

Note that servers do not necessarily respond in exactly the same way to two individual requests.
3

For IE you can also use:

document.all[0].outerHTML 

1 Comment

Surprised this isn't marked as the answer. This works perfectly! The only thing is it only gets static HTML (doesn't retrieve anything javascript-related).
1

Provided that

  • the true HTML source code is wanted (not a serialization of the current DOM)
  • and that the page was loaded using GET method,

the page source can be re-downloaded:

fetch(document.location.href)
  .then(response => response.text())
  .then(pageSource => { /* ... */ });
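
A slightly more defensive sketch of the same idea, which checks the HTTP status and sends a browser-like Accept header. The `HTML_ACCEPT` value is an illustrative assumption modelled on what browsers commonly send for page navigations, and the names `downloadPageSource` and `fetchImpl` are hypothetical. None of this guarantees that the server returns the same content as the original page load.

```javascript
// Illustrative Accept value (an assumption), modelled on typical browser navigation requests.
const HTML_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

// `fetchImpl` is a hypothetical injection point; it defaults to the global fetch.
async function downloadPageSource(url, fetchImpl) {
  const doFetch = fetchImpl || fetch;
  const response = await doFetch(url, {
    method: "GET",
    headers: { Accept: HTML_ACCEPT }
  });
  if (!response.ok) {
    throw new Error("Unexpected HTTP status " + response.status);
  }
  return response.text();
}

// Only runs in a browser, where a global `document` exists.
if (typeof document !== "undefined") {
  downloadPageSource(document.location.href)
    .then(pageSource => console.log(pageSource))
    .catch(error => console.error(error));
}
```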

5 Comments

That is unreliable because there is no guarantee that the server will serve the same content next time.
@SzczepanHołyszewski Given that the REST protocol is defined as stateless, as long as you send the same headers in the ajax request as the browser did, then I would be confident the server would send the same response.
@dantechguy What are you talking about? There is nothing in the OP about REST. Whether an endpoint is a REST one depends on the server. The fetch API is typically used by client-side JS to talk to REST endpoints, but using the fetch API on a non-REST endpoint doesn't magically turn it into a REST one. But even if we talk REST, statelessness is irrelevant. Two identical REST GET requests can return different data if the resource was actually modified between the requests, or your permission to access the resource was revoked, or for a number of other reasons.
You can make this a bit more reliable by at least adding an Accept header similar to that of the browser. But yeah, this approach is not generally reliable.
This worked for me! This YouTube URL has timedtext (transcription) in 'view page source', and I could only retrieve it by fetching the URL again. youtube.com/watch?v=LA-LMRFhzaw&ab_channel=jordifieke
