A powerful command-line tool for organizing and exploring academic research using connected knowledge graphs. Built with Python, Neo4j, and the Semantic Scholar API, this toolkit helps researchers discover, analyze, and visualize relationships between papers, authors, and research concepts.
- Keyword Search: Find papers by research topics and keywords
- Author Search: Discover papers by specific researchers
- Paper ID Lookup: Direct access to papers via Semantic Scholar IDs
- Field-specific Search: Filter results by academic disciplines
- Automatic Graph Building: Create connected graphs of papers, authors, and venues
- Citation Networks: Map citation and reference relationships between papers
- Author Networks: Track collaborations and research connections
- Venue Analysis: Organize papers by publication venues
- Project Tagging: Organize research into custom project categories
- Keyword Extraction: Automatically extract and link research keywords
- Abstract Analysis: Process paper abstracts for semantic connections
- Batch Operations: Refresh and update entire research collections
- Citation Analysis: Track how papers reference each other
- Author Collaborations: Visualize research partnerships
- Topic Clustering: Group related research by keywords and concepts
- Venue Networks: Understand publication patterns across journals/conferences
- Python 3.8+
- Neo4j Database (Community or Enterprise Edition)
- Semantic Scholar API Key (optional, but recommended for higher rate limits)
- Download Neo4j Desktop
- Install and create a new database
- Set username: neo4j and password: neo4j (or your preferred credentials)
- Start the database (default URL: bolt://localhost:7687)

    # Ubuntu/Debian
    wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo apt-key add -
    echo 'deb https://debian.neo4j.com stable 4.4' | sudo tee /etc/apt/sources.list.d/neo4j.list
    sudo apt update
    sudo apt install neo4j

    # macOS with Homebrew
    brew install neo4j

    # Start Neo4j service
    sudo systemctl start neo4j
    # or
    neo4j start

    # Docker
    docker run \
        --name neo4j-research \
        -p 7474:7474 -p 7687:7687 \
        -d \
        -v $HOME/neo4j/data:/data \
        -v $HOME/neo4j/logs:/logs \
        -v $HOME/neo4j/import:/var/lib/neo4j/import \
        -v $HOME/neo4j/plugins:/plugins \
        --env NEO4J_AUTH=neo4j/neo4j \
        neo4j:latest

- Clone the repository:

      git clone https://github.com/schladt/Researchers-Toolkit.git
      cd Researchers-Toolkit

- Create a virtual environment (recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

Create a .env file in the project root or set environment variables:

    # Required: Neo4j Database Connection
    export NEO4J_URL=bolt://localhost:7687
    export NEO4J_USER=neo4j
    export NEO4J_PASSWORD=neo4j

    # Optional: Semantic Scholar API Key (for higher rate limits)
    export S2_API_KEY=your_semantic_scholar_api_key_here

- Visit Semantic Scholar API
- Sign up for a free account
- Generate an API key
- Add it to your environment variables
Note: The tool works without an API key but with lower rate limits.
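
As an illustration of how these variables might be consumed, here is a minimal sketch. It is not the toolkit's actual loader: `load_settings` is a hypothetical helper, and it assumes all three Neo4j variables are mandatory. The `x-api-key` header is the one the Semantic Scholar API uses for keys. If you keep the values in a .env file, calling `load_dotenv()` from python-dotenv beforehand copies them into `os.environ`.

```python
import os

# Variables marked "Required" in the configuration above.
REQUIRED = ("NEO4J_URL", "NEO4J_USER", "NEO4J_PASSWORD")


def load_settings(env=None):
    """Collect connection settings from a mapping (defaults to os.environ).

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    # The API key is optional; without it, requests go out unauthenticated
    # and are subject to the lower rate limit.
    api_key = env.get("S2_API_KEY")
    settings["headers"] = {"x-api-key": api_key} if api_key else {}
    return settings
```

Passing a plain dictionary instead of relying on `os.environ` makes the function easy to exercise without touching your shell environment.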

    python rtk.py

- Start with a Search:
  - Choose option 1 for keyword search
  - Choose option 2 for author search
  - Choose option 3 for direct paper lookup
- Add Papers to Graph:
  - Use 'g' to add individual papers
  - Use 'a' to add papers with all citations and references (slower but comprehensive)
  - Use 'gk' or 'ak' to include keyword extraction
- Set Project Tags:
  - Use option 5 to set project tags for organizing your research
- Build Citation Networks:
  - Use option 4 to refresh references for all papers in your database
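
The three search options above correspond roughly to three kinds of Semantic Scholar Graph API requests. The sketch below only builds the request URLs and sends nothing. The helper names and the field selection are assumptions made for illustration; this README does not show how rtk.py issues its requests.

```python
from urllib.parse import quote, urlencode

BASE = "https://api.semanticscholar.org/graph/v1"
# Assumed field selection; adjust to the metadata you need.
PAPER_FIELDS = "title,abstract,year,citationCount,authors,venue"


def keyword_search_url(query, limit=10):
    """URL for a keyword search (menu option 1)."""
    params = urlencode({"query": query, "fields": PAPER_FIELDS, "limit": limit})
    return f"{BASE}/paper/search?{params}"


def author_search_url(name, limit=10):
    """URL for an author search (menu option 2)."""
    params = urlencode({"query": name, "limit": limit})
    return f"{BASE}/author/search?{params}"


def paper_lookup_url(paper_id, with_references=False):
    """URL for a direct paper lookup (menu option 3).

    with_references=True also requests the IDs of cited and citing papers,
    similar in spirit to the slower 'a' command.
    """
    fields = PAPER_FIELDS
    if with_references:
        fields += ",references.paperId,citations.paperId"
    params = urlencode({"fields": fields})
    return f"{BASE}/paper/{quote(paper_id, safe=':/')}?{params}"
```

Each URL can be fetched with `requests.get(url, headers=...)`, passing the API key header if you have one.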

    >>> 1                                  # Search by keyword
    >>> machine learning transformers      # Enter search term
    >>> 0                                  # Select first paper
    >>> gk                                 # Add paper with keywords to graph
    >>> 5                                  # Set project tags
    >>> deep learning, nlp, transformers   # Set tags
    >>> 4                                  # Refresh all references (builds citation network)

The tool creates a rich knowledge graph with the following node types and relationships:
- Paper: Research papers with metadata (title, abstract, year, citation count, etc.)
- Author: Researchers and their information
- Venue: Journals, conferences, and publication venues
- Keyword: Extracted terms from abstracts and titles
- Tag: Custom project organization tags
- REFERENCES: Paper A references Paper B
- AUTHORED_BY: Paper written by Author
- PUBLISHED_IN: Paper published in Venue
- HAS_KEYWORD: Paper contains Keyword
- TAGGED: Entity belongs to project Tag
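
To make the schema above concrete, the following sketch turns one paper record into Cypher statements. The node labels and relationship types come from the list above; the property names (`paperId`, `authorId`, `name`), the relationship directions, and the `paper_statements` helper itself are assumptions, not the toolkit's actual queries.

```python
def paper_statements(paper):
    """Return (cypher, params) pairs that upsert one paper and its links.

    Hypothetical helper: property names and directions are assumed.
    Each statement stands alone, so an empty author or reference list
    simply results in no rows for that statement.
    """
    pid = paper["paperId"]
    statements = [
        (
            "MERGE (p:Paper {paperId: $id}) "
            "SET p.title = $title, p.year = $year",
            {"id": pid, "title": paper.get("title"), "year": paper.get("year")},
        ),
        (
            "MATCH (p:Paper {paperId: $id}) "
            "UNWIND $authors AS a "
            "MERGE (au:Author {authorId: a.authorId}) "
            "SET au.name = a.name "
            "MERGE (p)-[:AUTHORED_BY]->(au)",
            {"id": pid, "authors": paper.get("authors", [])},
        ),
        (
            "MATCH (p:Paper {paperId: $id}) "
            "UNWIND $refs AS ref "
            "MERGE (r:Paper {paperId: ref}) "
            "MERGE (p)-[:REFERENCES]->(r)",
            {"id": pid, "refs": paper.get("references", [])},
        ),
    ]
    venue = paper.get("venue")
    if venue:  # skip papers with no venue rather than creating an empty node
        statements.append(
            (
                "MATCH (p:Paper {paperId: $id}) "
                "MERGE (v:Venue {name: $venue}) "
                "MERGE (p)-[:PUBLISHED_IN]->(v)",
                {"id": pid, "venue": venue},
            )
        )
    return statements
```

With the neo4j Python driver, each pair could be executed as `session.run(cypher, params)`. Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes or relationships.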
- Refresh all paper references automatically
- Extract keywords from existing papers
- Update citation counts and metadata
- Automatic tokenization and lemmatization
- Stop word removal
- Keyword extraction from abstracts and titles
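
The text-processing steps above can be approximated with the standard library alone. This is a simplified stand-in, not the toolkit's nltk pipeline: the stop word list is a tiny hand-picked sample, lemmatization is omitted (nltk's `WordNetLemmatizer` could be applied to each token), and `extract_keywords` is a hypothetical name.

```python
import re
from collections import Counter

# Small illustrative stop word list; nltk ships a much fuller one.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "for", "in", "is",
    "of", "on", "the", "to", "we", "with",
}


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent non-stop-word tokens in text."""
    # Tokenize: lowercase words of two or more letters, hyphens allowed inside.
    tokens = re.findall(r"[a-z][a-z\-]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]
```

Ranking by raw frequency is the simplest possible scoring; a real pipeline would likely merge inflected forms through lemmatization before counting.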
- Built-in rate limiting for API calls
- Exponential backoff for error handling
- Concurrent processing for efficiency
- Primary: Semantic Scholar API - Comprehensive academic paper database
- Coverage: 200+ million papers across computer science, biomedical sciences, and more
- Data: Abstracts, citations, author information, venue details, and paper metrics
- neo4j: Graph database driver
- requests: HTTP library for API calls
- nltk: Natural language processing
- prompt-toolkit: Interactive command-line interface
- python-dotenv: Environment variable management
- tqdm: Progress bars
- ratelimit: API rate limiting
- backoff: Retry logic with exponential backoff
See requirements.txt for complete dependency list with versions.
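
For readers unfamiliar with the retry pattern that the backoff dependency provides, here is a hand-rolled, standard-library sketch of exponential backoff. It is not the `backoff` package's API and not the toolkit's code; `call_with_backoff` and its parameters are hypothetical.

```python
import random
import time


def call_with_backoff(func, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry on failure with exponentially growing waits.

    The wait before retry k (counting from 0) is base_delay * 2**k plus up
    to 10% random jitter. The sleep function is injectable for testing.
    """
    for attempt in range(max_tries):
        try:
            return func()
        except Exception:  # real code would catch only transient errors
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
```

Doubling the delay gives a rate-limited API progressively more room to recover, and the jitter keeps concurrent workers from retrying in lockstep.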
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.
- Neo4j Connection Error:
  - Ensure Neo4j is running
  - Check connection URL and credentials
  - Verify firewall settings for port 7687
- API Rate Limiting:
  - Get a Semantic Scholar API key for higher limits
  - The tool includes automatic retry logic
- Memory Issues with Large Graphs:
  - Use project tags to organize research
  - Consider processing papers in smaller batches
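
Two of the checks above are easy to script. The sketch below tests whether the Bolt port accepts TCP connections and splits a list of paper IDs into smaller batches. All three helper names are hypothetical and not part of the toolkit. A successful port check only shows that something is listening; it does not validate credentials.

```python
import socket
from urllib.parse import urlparse


def bolt_endpoint(url):
    """Split a bolt:// URL into (host, port), defaulting to port 7687."""
    parsed = urlparse(url)
    return parsed.hostname, parsed.port or 7687


def port_is_open(url, timeout=2.0):
    """Return True if a TCP connection to the Bolt port succeeds."""
    try:
        with socket.create_connection(bolt_endpoint(url), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


def in_batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

For example, `port_is_open("bolt://localhost:7687")` helps tell a stopped or firewalled database apart from a credentials problem, and looping over `in_batches(paper_ids, 50)` limits how many papers are processed at a time.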
For issues and questions:
- Open an issue on GitHub
- Check existing issues for solutions
- Refer to Neo4j and Semantic Scholar documentation
- Advanced graph analytics and visualization
- Export functionality (GraphML, CSV, etc.)
- Local database search and filtering
- Paper recommendation system
- Integration with reference managers
- Web interface for graph exploration
Author: Mike Schladt (2025)
Repository: https://github.com/schladt/Researchers-Toolkit