@ring00 commented Nov 27, 2025

This PR updates the PyTorch scraper for documentation versions 2.8 and 2.9, addressing changes in the theme and HTML structure.

The PyTorch 2.8 and 2.9 docs use a new Sphinx theme, which is currently somewhat unfriendly to scrapers. This commit mainly addresses truncated entries in the breadcrumb navigation (e.g. https://docs.pytorch.org/docs/2.9/name_inference.html, https://docs.pytorch.org/docs/2.9/config_mod.html) by extracting the text of the page heading instead.

The extracted doc structure differs slightly from that of older PyTorch docs, because truncation sometimes happens in the middle of a navigation path (e.g. https://docs.pytorch.org/docs/2.9/torch.compiler_aot_inductor_debugging_guide.html).

Key changes:

  • Identifies the main content area correctly in newer version docs.
  • Supports the new breadcrumb navigation structure.
  • Restores truncated entry names in newer docs using the full page header, maintaining consistent naming conventions.
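
The heading-based fallback described above can be illustrated with a minimal sketch. This is not the scraper code from this PR (DevDocs scrapers are written in Ruby); it is a standalone Python illustration under stated assumptions: breadcrumb items are `<li>` elements with a `breadcrumb-item` class, truncated items end in `...` or `…`, and the heading's permalink is an `<a class="headerlink">`. The helper name `entry_name` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: a truncated breadcrumb item ends with one of these markers.
ELLIPSES = ("...", "\u2026")


class PageNameParser(HTMLParser):
    """Collects the first <h1> text and the breadcrumb item texts of a page."""

    def __init__(self):
        super().__init__()
        self.heading_parts = []
        self.breadcrumbs = []
        self._in_h1 = False
        self._h1_done = False
        self._in_crumb = False
        self._in_headerlink = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and not self._h1_done:
            self._in_h1 = True
        elif tag == "a" and "headerlink" in classes:
            # Permalink anchor inside the heading; its text is not part of the title.
            self._in_headerlink = True
        elif tag == "li" and "breadcrumb-item" in classes:
            self._in_crumb = True
            self.breadcrumbs.append("")

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._h1_done = True
        elif tag == "a":
            self._in_headerlink = False
        elif tag == "li":
            self._in_crumb = False

    def handle_data(self, data):
        if self._in_headerlink:
            return
        if self._in_h1:
            self.heading_parts.append(data)
        elif self._in_crumb:
            self.breadcrumbs[-1] += data


def entry_name(html):
    """Return the last breadcrumb item, or the page heading if that item is truncated."""
    parser = PageNameParser()
    parser.feed(html)
    heading = " ".join("".join(parser.heading_parts).split())
    crumbs = [" ".join(c.split()) for c in parser.breadcrumbs if c.strip()]
    last = crumbs[-1] if crumbs else ""
    if not last or last.endswith(ELLIPSES):
        return heading or last
    return last


# Hypothetical sample markup, not copied from the PyTorch docs.
sample = (
    '<ul><li class="breadcrumb-item"><a href="#">Home</a></li>'
    '<li class="breadcrumb-item active">Some long page na...</li></ul>'
    '<h1>Some long page name<a class="headerlink" href="#x">#</a></h1>'
)
print(entry_name(sample))
```

The sketch only restores the last item of the path. Truncation in the middle of a navigation path cannot be repaired from the page heading alone, which matches the structural difference noted above.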

If you're updating existing documentation to its latest version, please ensure that you have:

  • Updated the versions and releases in the scraper file
  • Ensured the license is up-to-date
  • Ensured the icons and the SOURCE file in public/icons/your_scraper_name/ are up-to-date if the documentation has a custom icon
  • Ensured self.links contains up-to-date urls if self.links is defined
  • Tested the changes locally to ensure:
    • The scraper still works without errors
    • The scraped documentation still looks consistent with the rest of DevDocs
    • The categorization of entries is still good
This commit updates the PyTorch scraper for documentation versions 2.8 and 2.9, addressing changes in the theme and HTML structure.

Key changes:
- Identifies the main content area correctly in newer version docs.
- Supports the new breadcrumb navigation structure.
- Restores truncated entry names in newer docs using the full page title, maintaining consistent naming conventions.

@ring00 marked this pull request as ready for review November 27, 2025 08:05
@ring00 requested a review from a team as a code owner November 27, 2025 08:05