https://github.com/arcadia-science/agent-literature-review
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to biorxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Arcadia-Science
- License: other
- Language: Python
- Default Branch: main
- Size: 463 KB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
AI Laboratory for Scientific Research
A virtual laboratory of AI agents designed to facilitate collaborative scientific research discussions and planning. This project is a very rough prototype.
Note on Paper Access: The bioRxiv search functionality is limited and sometimes unstable due to API constraints. To address this limitation, a new `/read_folder` command has been added that allows agents to read and discuss papers from a local folder. This provides a more reliable alternative when online searches don't return the desired results.
Overview
This project creates a terminal-based application that simulates a laboratory environment with multiple AI agents, each with different specialties. The lab provides a space for:
- Collaborative discussions among AI agents and users
- Analysis of scientific papers and documents
- Research planning and ideation
- Cross-disciplinary knowledge synthesis
- Formalized multi-round discussions on focused topics
Features
- Multi-Agent Environment: Create a customizable team of AI expert agents with different specialties
- Direct Messaging: Address specific agents using @mention syntax
- Research Analysis: Analysis of scientific papers and research topics
- Enhanced Collaboration: Agents directly engage with each other and follow up on questions
- Visual Conversation Mapping: Track inter-agent communication patterns
- Structured Discussions: Run formal multi-round discussions on specific topics
- Research Integration: Connect to arXiv and bioRxiv for paper summaries
- Highly Customizable: Configure agents, specialties, and personalities via YAML
Installation
- Clone this repository
- Install dependencies:
```bash
pip install -r requirements.txt
```
Configuration
Before running the application, set up your configuration file:
Copy the example configuration file:

```bash
cp config.yaml.example config.yaml
```

Edit the `config.yaml` file:
- Add your Anthropic API key
- Customize your user name
- Modify the AI agents and their specialties if desired
To get an Anthropic API key:
1. Go to https://console.anthropic.com/
2. Sign up or log in to your account
3. Navigate to the API Keys section
4. Create a new API key
5. Copy the key and paste it into your config.yaml file
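A minimal `config.yaml` might look like the sketch below. The field names (`anthropic_api_key`, `user_name`, `agents`) are illustrative assumptions, not the actual schema; consult `config.yaml.example` in the repository for the real field names.

```yaml
# Illustrative sketch only -- field names are assumptions, not the actual schema.
anthropic_api_key: "sk-ant-..."   # from https://console.anthropic.com/
user_name: "Erin"
agents:
  - name: Lea
    specialty: "molecular biology, genetics, and biochemical pathways"
  - name: Emmy
    specialty: "statistical analysis and complex data modeling"
```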
Usage
Run the application with:
```bash
python -m src.main --config config.yaml
```
Interaction Features
- Regular Discussion: Type messages normally to engage all agents
- Direct Messaging: Use `@AgentName:` at the beginning of a line to address specific agents
- Structured Discussions: Type `/discuss <topic>` to initiate a focused multi-round discussion
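As an illustration of how the `@AgentName:` routing described above could work, here is a hypothetical sketch; it is not the project's actual implementation.

```python
import re

# A line beginning with "@AgentName:" is routed only to that agent;
# anything else is broadcast to the whole lab.
MENTION_RE = re.compile(r"^@(?P<name>\w+):\s*(?P<body>.*)", re.DOTALL)

def route_message(line, agent_names):
    """Return (targets, body) for a user or agent message."""
    m = MENTION_RE.match(line)
    if m and m.group("name") in agent_names:
        return [m.group("name")], m.group("body")
    return list(agent_names), line

agents = ["Lea", "Emmy", "Marie", "Ada", "Cassandra"]
targets, body = route_message("@Emmy: what did the paper conclude?", agents)
# targets == ["Emmy"], body == "what did the paper conclude?"
```

Unknown names fall through to broadcast, so a typo in an @mention still reaches the whole lab rather than being silently dropped.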
Example Interaction
```
=== Welcome to the AI Laboratory, Erin! ===
Your AI research team:
  Lea (Biologist): Expert in molecular biology, genetics, and biochemical pathways.
  Emmy (Mathematician): Specialist in statistical analysis and complex data modeling.
  Marie (Physicist): Expert in quantum mechanics and simulation.
  Ada (Computer Scientist): Specialist in machine learning and algorithms.
  Cassandra (PI): As the lab director, guides discussions and integrates perspectives.
=== Instructions ===
• Type your message and press Enter to start the discussion
• To address specific agents directly, use '@AgentName:' at the start of a line
• Type '/discuss <topic>' to initiate a focused multi-round discussion
=== Beginning of Discussion ===
Erin: /search protein language models --type biorxiv --results 3
🔍 Searching biorxiv for: protein language models
✅ Search Results (3 found)
Pre-trained protein language model for codon optimization https://www.biorxiv.org/content/10.1101/2024.12.12.628267v2 Published: 2024-12-12
Protein Language Model Identifies Disordered, Conserved Motifs Driving Phase Separation https://www.biorxiv.org/content/10.1101/2024.12.12.628175v1 Published: 2024-12-12
ProDualNet: Dual-Target Protein Sequence Design Method Based on Protein Language Model and Structure Model https://www.biorxiv.org/content/10.1101/2025.02.28.640919v1 Published: 2025-02-28
Type 'continue' to have agents download, read the full text, and discuss these papers in depth.
Erin: continue
🧠 Starting in-depth discussion of search results...
🔍 Agent roles assigned:
  Reader agents: Emmy, Lea
  Questioner agents: Marie, Ada, Cassandra
📚 2 agents are reviewing 3 papers while 3 agents will ask questions...
─── Emmy ───
Alright, so I've just finished reading through a couple of really interesting papers that I think you all might want to hear about.

The first one is "ProDualNet: Dual-Target Protein Sequence Design Method Based on Protein Language Model and Structure Model" by Liu Cheng and colleagues. It's a fresh preprint from late February this year.

The second paper is "Protein Language Model Identifies Disordered, Conserved Motifs Driving Phase Separation" by Yumeng Zhang and team, which came out in December.

Let's start with the ProDualNet paper. What caught my eye here is how they're combining protein language models with structure models for protein design. They've developed this method that can generate sequences that satisfy both sequence-based and structure-based constraints simultaneously. It's pretty clever - they're using a transformer-based language model for the sequence part and AlphaFold2 for the structure predictions.

─── Lea ───
Alright, so I've just finished reading this paper titled "Pre-trained protein language model for codon optimization" by Pathak and Lin. It's a pretty interesting piece that dives into using language models for codon optimization in protein synthesis.

The main thing that caught my attention is how they're applying techniques from natural language processing to biological sequences. They've developed this model called PALM-CO that's pre-trained on a massive dataset of protein sequences and then fine-tuned for codon optimization tasks.

What's cool is that they're not just looking at individual codons in isolation, but considering the broader sequence context. Their model takes into account things like codon pair bias and GC content, which can have a big impact on protein expression levels.

─── Marie [To: Emmy] ───
@Emmy: I'm intrigued by the ProDualNet paper's approach to balancing sequence and structure constraints. Have they discussed how sensitive their results are to the weighting between these two objectives? I'm wondering if there's a risk of overfitting to one aspect at the expense of the other, especially given the computational complexity of structural predictions.

─── Emmy [To: Marie] ───
@Marie: Great question about the ProDualNet paper! They actually do address some of your concerns, though not exhaustively. The authors used a weighted sum of sequence and structure losses, with weights of 1 and 0.5 respectively. They found this balance worked well, but you're right to wonder about sensitivity - they don't report extensive experiments varying these weights.

On overfitting, they employed early stopping based on validation loss to mitigate this risk. However, they don't deeply explore potential tradeoffs between sequence and structure optimization. It's a valid concern that optimizing too heavily for one aspect could come at the expense of the other.

─── Cassandra ───
Thanks everyone for the great discussion on these fascinating papers. I think we've covered a lot of ground here.

To recap, we looked at "ProDualNet: Dual-Target Protein Sequence Design Method Based on Protein Language Model and Structure Model", "Protein Language Model Identifies Disordered, Conserved Motifs Driving Phase Separation", and "Pre-trained protein language model for codon optimization".

A few key insights emerged:
1. The ProDualNet paper's approach of using weighted losses to balance sequence and structure constraints is intriguing, but as @Emmy pointed out, we should be cautious about how robust this 1:0.5 weighting is across diverse protein families.
2. The PALM-CO model for codon optimization considers broader sequence context, which @Lea highlighted could have interesting applications in protein engineering. However, its applicability to non-natural amino acids remains an open question.
```
Architecture and Design
The codebase follows a modular structure:
- src/: Main source code
  - agents/: Contains agent-related code
    - base.py: Base agent implementation with conversation capabilities
    - lab.py: Laboratory implementation managing multi-agent interactions
  - config/: Configuration handling
  - utils/: Utility functions including paper retrieval
  - main.py: Application entry point
This project was developed with the assistance of Claude Code.
Comparison with AgentLaboratory
This project is inspired by AgentLaboratory but takes a different approach:
- Role Structure: Uses peer-based specialists rather than hierarchical roles
- Conversation Flow: Focuses on natural multi-agent discussions rather than sequential research phases
- Communication Style: Implements @mention system for direct inter-agent communication
- Paper Search: Retrieves and analyzes papers from arXiv and bioRxiv
See docs/AGENT_COLLABORATION.md for a detailed comparison.
Requirements
- Python 3.8+
- Anthropic API key
- Dependencies:
- anthropic>=0.15.0
- PyYAML>=6.0
- PyPDF2>=3.0.0 (for PDF document processing)
- requests>=2.25.1
- feedparser>=6.0.0 (for research APIs)
Advanced Usage
Structured Discussions
Use the /discuss command to initiate a structured, multi-round discussion:
/discuss How might we develop more energy-efficient quantum computing architectures?
This starts a focused discussion where agents:
1. Provide initial perspectives
2. Build on each other's ideas
3. Work toward concrete outcomes
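The round structure could look roughly like the sketch below; `ask` is a stub standing in for a real Anthropic API call, and all names here are hypothetical rather than the project's actual code.

```python
def run_discussion(topic, agents, rounds=3, ask=None):
    """Run a structured multi-round discussion and return the transcript."""
    # `ask(agent, prompt)` stands in for a real model call; default is a stub.
    ask = ask or (lambda agent, prompt: f"{agent}'s thoughts on: {prompt}")
    transcript = []
    # Round 1: each agent gives an initial perspective on the topic.
    for agent in agents:
        transcript.append((agent, ask(agent, topic)))
    # Later rounds: each agent builds on everything said in earlier rounds.
    for _ in range(rounds - 1):
        context = "\n".join(f"{a}: {t}" for a, t in transcript)
        for agent in agents:
            transcript.append((agent, ask(agent, context)))
    return transcript
```

One design choice worth noting: here the context snapshot is taken once per round, so agents within a round respond to the previous round rather than to each other mid-round; the real flow may differ.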
Research Integration
Ask questions that require research:
What are the latest developments in large language model few-shot learning?
Agents will integrate information from arXiv and bioRxiv when appropriate.
Paper Search and Reading
The system provides two ways to find and discuss scientific papers:
1. Online Search via /search
/search protein folding --type arxiv --results 5
/search cancer immunotherapy --type biorxiv --months 6 --results 3
Important Note on bioRxiv Search:
The bioRxiv API does not provide full search functionality. Instead, it can only:
- List papers published within a specified time range (using the --months parameter)
- Return the most recent papers that include certain terms

Results are deduplicated to show only the latest version of each paper.
This means bioRxiv searches may not be as precise as arXiv searches. For specific papers or detailed searches, using the /read_folder command with local files may be more effective.
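Because the API only lists papers by date, any keyword matching has to happen client-side. The sketch below uses bioRxiv's public `details` endpoint; the `filter_by_terms` helper is an illustrative assumption, not the project's actual code.

```python
def fetch_recent(start_date, end_date, cursor=0):
    """List bioRxiv papers in a date range via the public details endpoint."""
    import requests  # listed in requirements.txt
    url = f"https://api.biorxiv.org/details/biorxiv/{start_date}/{end_date}/{cursor}"
    return requests.get(url, timeout=30).json().get("collection", [])

def filter_by_terms(papers, terms):
    """Keep papers whose title or abstract mentions every search term."""
    terms = [t.lower() for t in terms]
    return [
        p for p in papers
        if all(t in (p.get("title", "") + " " + p.get("abstract", "")).lower()
               for t in terms)
    ]

# Example (network call commented out to keep the sketch self-contained):
# papers = fetch_recent("2025-01-01", "2025-03-01")
# hits = filter_by_terms(papers, ["protein", "language model"])
```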
2. Local Paper Reading via /read_folder
For cases where you have specific papers you want to discuss or when online search doesn't yield the needed results, you can use local PDF files:
/read_folder ~/papers/quantum_computing
This command will:
1. Scan the specified folder for PDF files
2. Display a list of found papers (based on filenames)
3. After typing 'continue', have agents read and discuss these papers
The /read_folder command provides a reliable alternative to online searches, especially when:
- You have specific papers you want the agents to analyze
- Online searches don't return the exact papers needed
- You need to discuss papers that might not be available through arXiv or bioRxiv
Both workflows follow the same pattern - after search or folder selection, type 'continue' to have agents read and discuss the papers.
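The scan-and-list steps could be sketched as follows; these helpers are hypothetical, not the project's actual code, and full-text extraction would presumably use PyPDF2 (a listed dependency).

```python
from pathlib import Path

def list_papers(folder):
    """Step 1: scan the folder for PDF files, sorted by filename."""
    return sorted(Path(folder).expanduser().glob("*.pdf"))

def paper_titles(folder):
    """Step 2: derive display titles from filenames (underscores -> spaces)."""
    return [p.stem.replace("_", " ") for p in list_papers(folder)]

# Step 3 (agents reading the PDFs after 'continue') would extract text,
# e.g. with PyPDF2's PdfReader, before handing it to the agents.
```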
Owner
- Name: Arcadia Science
- Login: Arcadia-Science
- Kind: organization
- Location: United States of America
- Website: https://www.arcadiascience.com/
- Twitter: ArcadiaScience
- Repositories: 16
- Profile: https://github.com/Arcadia-Science
GitHub Events
Total
- Watch event: 10
- Push event: 2
- Public event: 1
- Fork event: 1
Last Year
- Watch event: 10
- Push event: 2
- Public event: 1
- Fork event: 1
Dependencies
- PyPDF2 >=3.0.0
- PyYAML >=6.0
- anthropic >=0.15.0
- arxiv >=1.4.7
- beautifulsoup4 >=4.12.0
- feedparser >=6.0.0
- matplotlib >=3.5.0
- numpy >=1.20.0
- pandas >=2.0.0
- requests >=2.25.1
- tiktoken >=0.5.0