Timeline for Can I get Open Graph Protocol data without behaving as a web scraper?
Current License: CC BY-SA 4.0
11 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| Sep 11 at 14:03 | comment | added | JimmyJames | @Lamron There are various ways that a website can do this but the general idea is that a script on the page will run on your host and interact with the server to authorize your client. For example, it might retrieve a cookie which the server can verify was issued for your client. | |
| Sep 11 at 3:13 | comment | added | Doc Brown | @Lamron: this is definitely worth asking on Stack Overflow. | |
| Sep 10 at 19:24 | comment | added | Lamron | I think this is not a code problem. Some sites refuse to show their content if the request is sent by a program rather than a human, regardless of the language or code the program is written in. | |
| Sep 10 at 19:10 | comment | added | Doc Brown | @Lamron: that's a new question about your specific code. Try asking it on Stack Overflow (but make the code part of your question, don't just link to an external site). | |
| Sep 10 at 19:02 | comment | added | Lamron | I found that this doesn't work, for example: gist.github.com/lamrongol/c7079bd8b5057aecf1ba916f4c9150a0 . The site returns "Please enable JS and disable any ad blocker" even when only a small amount of data is requested. | |
| Sep 10 at 16:39 | history | edited | Doc Brown | CC BY-SA 4.0 | added 179 characters in body |
| Sep 10 at 9:46 | comment | added | freakish | @Lamron of course. When you download a page, you always download it chunk by chunk. The data arrives at your computer in network packets, which you can analyze and parse on the fly. In fact, browsers do this all the time: they don't wait until the entire page is downloaded, they try to parse it, run scripts (and maybe even render it) on the fly. You can do something similar. How exactly to do that depends on the language and framework you are using. | |
| Sep 9 at 21:13 | comment | added | Lamron | And is it possible to "read the page sequentially"? Not a file, but a website? | |
| Sep 9 at 21:10 | comment | added | Lamron | This bot accesses news sites only when posting; the rest of the time it analyzes Bluesky posts and doesn't access news sites. | |
| Sep 9 at 20:42 | history | edited | Doc Brown | CC BY-SA 4.0 | added 105 characters in body |
| Sep 9 at 15:46 | history | answered | Doc Brown | CC BY-SA 4.0 |
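
The "read the page sequentially" idea from the comments could look roughly like the Python sketch below. It is only an illustration, not code from the answer or from the linked gist: the names `OpenGraphParser` and `read_open_graph`, the UTF-8 assumption, and the `max_bytes` cap are all hypothetical choices. The function takes any iterable of byte chunks, feeds them to the standard library's incremental HTML parser, and stops as soon as the `<head>` section is over, so the rest of the page never has to be read.

```python
import codecs
from html.parser import HTMLParser
from typing import Dict, Iterable


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> tags and notes when <head> is over."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.og: Dict[str, str] = {}
        self.head_done = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr_map = dict(attrs)
            prop = attr_map.get("property") or ""
            content = attr_map.get("content")
            if prop.startswith("og:") and content is not None:
                # Keep the first value seen for each property.
                self.og.setdefault(prop, content)
        elif tag == "body":
            # Some pages omit </head>; a <body> tag also ends the head.
            self.head_done = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.head_done = True


def read_open_graph(chunks: Iterable[bytes],
                    max_bytes: int = 256 * 1024) -> Dict[str, str]:
    """Parse chunks as they arrive and stop once <head> has been read.

    Assumes the page is UTF-8 (an assumption of this sketch); undecodable
    bytes are replaced instead of raising. max_bytes is a safety cap.
    """
    parser = OpenGraphParser()
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    seen = 0
    for chunk in chunks:
        seen += len(chunk)
        # The parser buffers incomplete tags until the next chunk arrives.
        parser.feed(decoder.decode(chunk))
        if parser.head_done or seen >= max_bytes:
            break
    return parser.og
```

With a real HTTP response, the chunks could come from something like `iter(lambda: resp.read(4096), b"")` on the object returned by `urllib.request.urlopen`. As the later comments point out, this does not help with sites that only serve their content after a JavaScript check.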