commoncrawl/web-languages-code
Commit History
Commits on Jul 11, 2025
feat: check-in logo image
thunderpoot
committed
83abb3b
Commits on Jul 10, 2025
feat: count the number of languages with links
Greg Lindahl
committed
074f769
fix: missing requirement
Greg Lindahl
committed
3920a17
Commits on May 20, 2025
feat: add European language data and progs
thunderpoot
committed
33d40d8
Commits on May 8, 2025
feat(link extractor): add option to filter links
sebastian-nagel
committed
a1340a2
Commits on May 7, 2025
fix: normalize extracted URLs
sebastian-nagel
committed
45620e4
Commits on Jan 11, 2025
Merge pull request #3 from commoncrawl/tool-extract-links
wumpus
authored
a4bf3d2
Add tool to extract links from all Markdown files to be injected
sebastian-nagel
committed
296b2f0
Commits on Dec 5, 2024
Revert "fix: improve filename sanitisation, ignore dataset directory, and skip ancient languages"
thunderpoot
committed
3c6c797
fix: improve filename sanitisation, ignore dataset directory, and skip ancient languages
thunderpoot
committed
3cc89c0
Commits on Nov 25, 2024
fix(template): spelling correction
wumpus
authored
09ed957
fix(template): spelling correction
thunderpoot
committed
7496b7d
Commits on Oct 25, 2024
feat: generate links to Wikipedias, fix scripts
Greg Lindahl
committed
b1028a5
Commits on Oct 18, 2024
feat: generate links to Wikipedia language pages
Greg Lindahl
committed
6adb884
Commits on Oct 5, 2024
fix: make urls in README relative to fit into 500kb github limit
Greg Lindahl
committed
35f9388
fix: tood many numbers to remember
Greg Lindahl
committed
91ed6c9
fix: comments, title, spacing in title
Greg Lindahl
committed
51d3ff6
Commits on Oct 4, 2024
bug: fix directories in READMEs
Greg Lindahl
committed
5626a8d
bug: obfuscate email addr
Greg Lindahl
committed
d44953f
bug: make sure files end with a linefeed
Greg Lindahl
committed
8c35342
feat: initial files
Greg Lindahl
committed
08fdf2b