Unless you want to use or write a JavaScript parser, which is only fun for a very limited set of individuals, I suggest taking advantage of the thriving headless Chrome community. Grabbing JS variables with Puppeteer is straightforward after a bit of boilerplate Node code. It's also shockingly (but not "blazingly") fast.
Before running the code:
- Have Node.js and npm working on your machine
- Install `jq` for parsing JSON in the shell. It is available in most package managers, so `brew install jq`, `sudo apt install jq`, etc. should work
- Install Puppeteer in whichever directory these scripts are going to live in with `npm i puppeteer`
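Once those are in place, an optional sanity check looks something like this; each command should print a version or a short confirmation:

```sh
# Confirm node, jq, and the local Puppeteer install are all reachable
node --version
jq --version
node -e "require('puppeteer'); console.log('puppeteer ok')"
```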
A file like this is all you need to get started with Puppeteer. I added comments to the key areas.
```javascript
#!/usr/bin/env node
const puppeteer = require('puppeteer')

;(async () => {
  const browser = await puppeteer.launch()
  // Replace the line above with this statement for a fun show
  // const browser = await puppeteer.launch({
  //   headless: false,
  //   devtools: true,
  // })
  const page = await browser.newPage()

  // Arbitrarily choosing SO for the demo, replace with your website
  await page.goto('https://stackoverflow.com/')
  // Or use an argument:
  // const uri = process.argv[2]
  // await page.goto(uri)

  const retrievedData = await page.evaluate(() => {
    // This block has the page context, which is almost identical to being in the console
    // except for some of the console's supplementary APIs.

    // Get the URL host name and path separately
    const { origin, pathname } = window.location

    // Get the title in a silly way, for demonstration purposes only
    const title = document.querySelector('title').textContent

    // More interesting - save data from the `StackExchange` object from `window`
    const { options: soOptions } = window.StackExchange

    // Return an object literal with data for the shell script
    return {
      origin,
      pathname,
      title,
      soOptions,
    }
  })

  // Convert the object from the browser eval to JSON to parse with jq later
  const retrievedJSON = JSON.stringify(retrievedData, null, 4)

  // console.log writes to stdout in node
  console.log(retrievedJSON)

  await browser.close()
})()
```
Note the shebang at the top, which tells the shell to run the file with node.
If we make this file executable and run it:
```sh
chmod +x get-so-data.js
./get-so-data.js
```
We have a CLI utility that will provide a JSON string of data from the JavaScript global execution context of the website. Here are some small generic shell examples.
```sh
#!/bin/sh

# Confirm that jq understands the result (should pretty print with ANSI colors):
./get-so-data.js | jq
# {
#   Many data...
# }

# Check if user is logged in (the user is our node script in a sandboxed browser, so no):
./get-so-data.js | jq '.soOptions.user.isRegistered'
# null

# Tell the time, according to StackExchange's server clock (linux only):
./get-so-data.js | jq '.soOptions.serverTime' | date -d $(echo -n '@' && cat --)
# Fri 11 Sep 2020 04:37:02 PM PDT

# Open a subset of the JSON payload returned by Puppeteer in the default editor:
./get-so-data.js | jq '.soOptions' | $EDITOR --
# or VS Code specifically
./get-so-data.js | jq '.soOptions' | code --

# ...
```
As long as the JavaScript side of the equation returns enough information to construct a file path, you can open files in your editor based on what JavaScript sees in the browser.
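For instance, here is a minimal sketch. It assumes your project keeps one source file per route under a `src/pages` directory and that the page's `pathname` maps straight onto it; both the layout and the `.js` extension are assumptions for illustration, not something the Puppeteer script provides.

```sh
#!/bin/sh
# Hypothetical mapping: route "/questions" -> "src/pages/questions.js".
# The "src/pages" layout and ".js" extension are assumptions for illustration.
route=$(./get-so-data.js | jq -r '.pathname')
file="src/pages${route%/}.js"

if [ -f "$file" ]; then
  $EDITOR "$file"
else
  echo "No file found for route $route"
fi
```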
The shell date example takes about 1.5 seconds on a three-year-old Chromebook from within a Linux (Beta) container on 25 Mbps public Wi-Fi. Your mileage will vary depending on the performance of the site you're debugging and the steps in the script.
```
$ time ./get-so-data.js | jq '.soOptions.serverTime' | date -d $(echo -ne '@' && cat --)
Fri 11 Sep 2020 04:43:24 PM PDT

real    0m1.515s
user    0m0.945s
sys     0m0.383s

$ time ./get-so-data.js | jq '.soOptions.serverTime' | date -d $(echo -ne '@' && cat --)
Fri 11 Sep 2020 04:43:30 PM PDT

real    0m1.755s
user    0m0.999s
sys     0m0.433s

$ time ./get-so-data.js | jq '.soOptions.serverTime' | date -d $(echo -ne '@' && cat --)
Fri 11 Sep 2020 04:43:33 PM PDT

real    0m1.422s
user    0m0.802s
sys     0m0.361s
```
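Since every invocation launches a fresh headless browser, each query pays that roughly 1.5 second cost. If you plan to run several jq queries against the same page, one option is to capture the JSON once and reuse it; a small sketch:

```sh
#!/bin/sh
# Fetch once, query many times - avoids relaunching headless Chrome per query
json=$(./get-so-data.js)

echo "$json" | jq '.origin'
echo "$json" | jq '.soOptions.user.isRegistered'
echo "$json" | jq '.soOptions.serverTime'
```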
Resources