In general, to get Open Graph protocol (OGP) data for a given web page, one would need to retrieve the actual HTML, and then extract the meta tags from it.
However, this has two problems:
1. Instead of getting only the meta tags, the whole page needs to be downloaded.
2. If the website denies bots, it becomes impossible to get OGP data.
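For reference, the baseline approach described above could be sketched roughly as follows. This is only a minimal illustration using Python's standard library; the names `OGPParser`, `extract_ogp`, `fetch_ogp`, and the User-Agent string are hypothetical placeholders. It still downloads the entire page, which is exactly the first problem.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class OGPParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.og.setdefault(prop, content)


def extract_ogp(html):
    """Return a dict of og:* properties found in an HTML string."""
    parser = OGPParser()
    parser.feed(html)
    return parser.og


def fetch_ogp(url, user_agent="example-ogp-bot/0.1"):
    """Download the full page and extract its OGP tags.

    The User-Agent value is a placeholder (assumption), not a recommendation.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_ogp(html)
```

Calling `fetch_ogp(url)` on a page that serves a bot challenge or a "please enable JS" placeholder would simply return whatever `og:` tags that placeholder contains, often none.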
Are there APIs that would retrieve just the OGP data?
If not, how exactly am I expected to get OGP data in a responsible way, without risking being treated as a web scraper and without needlessly wasting the server's bandwidth?
Update (2025/09/11):
Dynamic sites don't deny access, but they return content meant for non-JS users, which includes messages such as "Please enable JS and disable any ad blocker".
The actual cases that I encounter are:
I'm running a trend analysis bot on Bluesky, which shows trending words and news article links, with thumbnails if they exist: https://bsky.app/profile/did:plc:wwqlk2n45es2ywkwrf4dwsr2/lists/3kob6kalezl2a
However, some sites have a bot-blocking system, such as Cloudflare's anti-AI scraper protection. Example: https://www.wsj.com/us-news/law/epstein-birthday-book-congress-9d79ab34 (if you've already visited the site before, the confirmation screen will not show)