51,743 questions
0 votes
1 answer
31 views
BeautifulSoup - Extracting content blocks after specific subheadings within a larger section, ignoring document introduction
I am scraping the Dead by Daylight Fandom wiki (specifically TOME pages, e.g., https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening) to extract memory logs. The goal is to extract the Memory ...
-2 votes
0 answers
51 views
Running the Firefox browser using launch_persistent_context in Playwright for Python [closed]
from playwright.sync_api import sync_playwright profile_path = r"C:\Users\kdutt\AppData\Roaming\Mozilla\Firefox\Profiles\p283dicx.default-release" firefox_path = r"C:\Program Files\...
3 votes
1 answer
40 views
Nodriver does not raise an exception if an element is not found?
I am trying to search for elements on a webpage and have used various methods, including text and XPath. It seems that the timeout option does not work the way I expected, and no exception is raised ...
0 votes
2 answers
201 views
Beautiful Soup: children are clearly inside, but I can't get them
From the structure below, I only want the value of the href attribute. But rec_block returns the h5 element without its children, so basically <h5 class="series">Recommendations</h5>. <...
0 votes
0 answers
46 views
UPS fuel surcharge history extracting [closed]
I previously extracted the US fuel surcharge history using this JSON endpoint: https://www.ups.com/assets/resources/fuel-surcharge/us.json But, it stopped updating data after 9/22/2025. How can I ...
0 votes
0 answers
77 views
URL Targeted web crawler [closed]
I am trying to build a bit of code that takes a specific Tumblr page, scans sequentially by post number, and checks whether each page exists. If it does, it will print that full URL to ...
2 votes
0 answers
99 views
How to stop/kill achieved Scrapy spider instance within RStudio
I'm making a tutorial on how to scrape with Scrapy. For that, I use Quarto/RStudio and the website https://quotes.toscrape.com/. For pedagogic purposes, I need to run a first crawl on the first page, ...
Advice
0 votes
4 replies
43 views
How to fetch a real-time news data feed
I want to know how I can get a live news data feed (Indian) with no latency, or minimal latency (30-40 s). I tried some RSS feeds, but they all deliver the data with some latency, so what I ...
0 votes
0 answers
48 views
Camoufox browser window remains visible in WSL even when `headless` is set to `virtual`
When headless is set to "virtual", the Camoufox browser window still appears on the screen in ...
1 vote
0 answers
86 views
Invoke-WebRequest URL encoding
I want to retrieve content from a web page. I tried the above method, but the error still occurs when the query string contains Chinese characters. code $json = Get-Content -Encoding utf8 -Path "./...
-4 votes
2 answers
75 views
How can I get BBFC ratings in python? [closed]
I am trying to write code that gives me BBFC film ratings. I am using Selenium to do this but would be happy with any solution that works reliably. After a lot of work I finally came up with this code: #...
0 votes
1 answer
211 views
Fetching data from https://www.sofascore.com/
This is my Python code, run on Ubuntu, that tries to fetch and extract data from https://www.sofascore.com/. I created this test code before using it on an E2 device in my plugin # python3 -m venv venv # source venv/...
0 votes
1 answer
72 views
Using HTTPKerberosAuth with a JavaScript-enabled web scraper
I'm working on integration tests for a web application that's running in a Docker container within our GitLab CI/CD pipeline. The application is a frontend that requires Kerberos/SPNEGO authentication ...
0 votes
1 answer
65 views
Handling status 202 in Scrapy
I'm quite new to web scraping, and in particular to using Scrapy's spiders, pipelines... Some of my spider's requests get a 202 status in the response, hence the page content is not available yet ...
-1 votes
1 answer
47 views
How to loop an Apps Script / Cheerio web scraper over multiple URLs? [closed]
I have this Apps Script / Cheerio function that successfully scrapes the data I want from the url. The site only displays 25 entries at this url. I can find additional entries on subsequent pages (by ...