Paper-Agent
Paper-Agent helps researchers automatically generate detailed survey reports on a specific topic. It cuts down the many hours otherwise spent reading and analyzing academic papers to understand a field. Researchers, academics, and students can use it to get a comprehensive overview of a subject quickly. Unlike simple summarization tools, Paper-Agent uses a team of automated agents to find, read, analyze, and synthesize papers, producing in-depth reports that highlight trends and key findings. The aim is to shorten the research process considerably and to surface cross-paper patterns that would take a single reader far longer to assemble by hand.
<h1 align="center">Paper-Agent: Intelligent Academic Survey Report Generation System</h1>

<p align="center">
  Languages: English · <a href="./docs/README_cn.md">简体中文</a>
</p>

## 📖 Introduction

**Paper-Agent** is an automated survey report generation system for researchers. It targets two pain points of academic literature research: reading papers takes a long time, and reading them quickly tends to produce shallow analysis. Rather than a simple literature summarization tool, it is a domain research assistant that covers the full "retrieval → reading → analysis → synthesis → report" process and produces in-depth, insightful domain survey reports.

## 📸 Project Preview

<details>
<summary>Click to enlarge screenshots</summary>
<div align="center">

| Screenshot 1 | Screenshot 2 | Screenshot 3 |
|-------|-------|-------|
| <img width="400" src="https://github.com/user-attachments/assets/b3617fee-ab47-4aac-9be7-0cb543fd706a" /> | <img width="400" src="https://github.com/user-attachments/assets/a27882fb-3bd8-4f44-b18f-8161bb0d44a6" /> | <img width="400" src="https://github.com/user-attachments/assets/18f2f0bc-6d2c-4b5f-a2b9-a87d16fcd6be" /> |
| Screenshot 4 | Screenshot 5 | Screenshot 6 |
| <img width="400" src="https://github.com/user-attachments/assets/21e5dc93-1c8b-46e3-b33c-f359d94cf2db" /> | <img width="400" src="https://github.com/user-attachments/assets/1e21162d-e083-40bc-93de-08302f28b08b" /> | <img width="400" src="https://github.com/user-attachments/assets/77738e3d-7d80-4d8c-9ea4-61c45e3db5d6" /> |

</div>
</details>

## ✨ Core Features

- 🤖 **Multi-Agent Collaboration Architecture**: Built on the AutoGen framework; retrieval, reading, analysis, writing, and other agents collaborate automatically on complex tasks
- 📚 **Intelligent Literature Retrieval**: Converts natural-language queries into precise search conditions, supports manual review of those conditions, and retrieves relevant academic papers from arXiv
- 🔍 **Structured Information Extraction**: Reads each paper and extracts the core problem, technical approach, experimental results, datasets, limitations, and other key information into a standardized JSON structure
- 🧠 **In-Depth Domain Analysis**: Identifies research trends and emerging topics through a three-stage process of cluster analysis, deep analysis, and global analysis
- ✍️ **Domain Survey Report Generation**: Integrates the analysis results into a logically organized, academically structured report, with Markdown output
- 🔄 **Real-time Streaming Output**: Pushes task progress to the frontend in real time via SSE (Server-Sent Events)
- ⚡ **Parallel Processing Optimization**: Reads papers, analyzes clusters, and writes chapters in parallel to shorten end-to-end processing time
- 🔧 **Modular Design**: Decoupled functional modules with a LangGraph-based workflow, easy to extend and maintain
- 💾 **Vector Database Support**: Stores extracted paper information in ChromaDB to support retrieval-augmented writing
- 👥 **User Interaction Review**: Adds manual review at key steps so that query conditions and generated content meet expectations

## System Architecture

**This section is a brief overview. For details on the system architecture, node implementation, and agent collaboration, see [design.md](design.md).**

Paper-Agent uses a modular design: it builds a complete workflow on LangGraph around six core nodes.

### Core Nodes

1. **search_agent_node (Paper Search Node)**
   - Uses an LLM to convert the user's natural-language requirements into structured query conditions
   - Lets the user review those conditions manually through a user proxy (`userProxyAgent`)
   - Calls `PaperSearcher` to retrieve relevant papers from arXiv
   - Supported query conditions: `querys`, `start_date`, `end_date`
2. **reading_agent_node (Paper Reading Node)**
   - Processes multiple papers in parallel, extracting the core information from each
   - Extracts the fields of a predefined model: core problem, key methods, datasets, evaluation metrics, main results, limitations, and contributions
   - Stores the extracted results in the vector database for later retrieval augmentation
3. **analyse_agent_node (Paper Analysis Node)**
   - **PaperClusterAgent**: Clusters papers using embedding vectors and the KMeans algorithm, determining the cluster count automatically
   - **DeepAnalyseAgent**: Analyzes each cluster in depth, covering technical approaches, method comparisons, application domains, and more
   - **GlobalanalyseAgent**: Combines all cluster analysis results into a global analysis report with six major modules
4. **writing_agent_node (Writing Node)**
   - **writing_director_node**: Generates the report outline from the user's requirements and the global analysis, then splits it into writing sub-tasks
   - **parallel_writing_node**: Executes all writing sub-tasks in parallel, using multi-agent collaboration to write each chapter
   - Supports retrieval-augmented writing and quality review
5. **report_agent_node (Report Generation Node)**
   - Combines all written chapters into a complete survey report
   - Outputs Markdown, automatically adding transitional sentences between chapters
   - Streams generation progress in real time

### Sub-Agent Architecture

**Writing Module Sub-Agents**

- **writing_agent**: Writes the chapter content for a sub-task
- **retrieval_agent**: Retrieves relevant content from the vector database to supplement the writing material
- **review_agent**: Reviews the quality of the written content and outputs "APPROVE" to end the sub-task once the content passes

**Analysis Module Sub-Agents**

- **PaperClusterAgent**: Clusters papers and generates topic descriptions and keywords for each cluster
- **DeepAnalyseAgent**: Analyzes a single cluster in depth
- **GlobalanalyseAgent**: Performs the global analysis and generates the six-module report

### Workflow Architecture

- **Orchestrator Module**
  - Builds the complete workflow on LangGraph
  - Coordinates the orderly execution of nodes
  - Manages global state and error handling
  - Pushes task progress to the frontend in real time via SSE
- **State Management**
  - Uses a `State` object to manage global state
  - Implements frontend-backend communication through queues
  - Supports real-time state pushes

## Workflow

The system builds a complete workflow on LangGraph and generates survey reports through six core nodes:

### Complete Process

1. **Input Query**: The user provides a research topic or question
2. **Paper Retrieval**: The system generates query conditions automatically, supports manual review, and retrieves relevant papers from arXiv
3. **Paper Reading**: Processes multiple papers in parallel, extracting and structuring their core information
4. **In-Depth Analysis**:
   - Cluster Analysis: Groups papers by topic
   - Deep Analysis: Analyzes each cluster in depth, including technical approaches and method comparisons
   - Global Analysis: Combines all cluster results into a six-module report
5. **Content Generation**:
   - Generate Outline: Produces the report outline from the user's requirements and the global analysis
   - Task Splitting: Parses the outline into writing sub-tasks that can run in parallel
   - Parallel Writing: Uses multi-agent collaboration to write the chapters
6. **Report Integration**: Combines all chapters into a complete survey report in Markdown

### Key Features

- **Real-time Streaming Output**: Pushes task progress to the frontend in real time via SSE
- **Parallel Processing Optimization**: Parallel paper reading, parallel cluster analysis, and parallel chapter writing
- **User Interaction Review**: Adds manual review at key steps
- **Retrieval [truncated…]
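The search node turns a request into the query conditions `querys`, `start_date`, and `end_date`. How `PaperSearcher` maps them onto arXiv is not documented here. The following is a minimal sketch under stated assumptions: `build_arxiv_query` and `build_arxiv_url` are hypothetical helpers. Phrases are OR-ed together, which is our choice rather than the project's. The `submittedDate:[YYYYMMDDTTTT TO YYYYMMDDTTTT]` range follows the arXiv API user manual and is worth re-checking against the current manual.

```python
from datetime import date
from urllib.parse import urlencode

# Public arXiv API endpoint; the helpers below are hypothetical, not PaperSearcher.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query(querys: list[str], start_date: date, end_date: date) -> str:
    """Combine keyword phrases and a submission-date window into one search_query."""
    # Assumption: each phrase is searched in all fields and phrases are OR-ed.
    terms = " OR ".join(f'all:"{q}"' for q in querys)
    # arXiv date ranges use YYYYMMDDTTTT timestamps (GMT, 24-hour clock).
    window = f"submittedDate:[{start_date:%Y%m%d}0000 TO {end_date:%Y%m%d}2359]"
    return f"({terms}) AND {window}"


def build_arxiv_url(querys: list[str], start_date: date, end_date: date,
                    max_results: int = 50) -> str:
    """Build the full request URL; urlencode handles quoting of spaces and brackets."""
    params = {
        "search_query": build_arxiv_query(querys, start_date, end_date),
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"
```

Keeping query construction separate from the HTTP call makes the conditions easy to display to the user for manual review before any request is sent.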
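The reading node extracts a fixed set of fields from each paper into a standardized JSON structure. One way to enforce such a structure is to validate the model's JSON reply against a schema before storing it. This is a sketch only: `PaperInfo`, `parse_paper_info`, and `to_document` are hypothetical names, and the field names paraphrase the README's list rather than the project's actual model.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class PaperInfo:
    """Hypothetical extraction schema mirroring the fields listed in the README."""
    core_problem: str
    key_methods: list[str]
    datasets: list[str]
    evaluation_metrics: list[str]
    main_results: str
    limitations: str
    contributions: list[str]


def parse_paper_info(raw: str) -> PaperInfo:
    """Validate an LLM's JSON reply and build a PaperInfo from it."""
    data = json.loads(raw)
    expected = {f.name for f in fields(PaperInfo)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Drop any extra keys the model may have added.
    return PaperInfo(**{k: data[k] for k in expected})


def to_document(info: PaperInfo) -> str:
    """Serialize the record, e.g. as the text of a vector-store document."""
    return json.dumps(asdict(info), ensure_ascii=False)
```

Rejecting incomplete replies early means malformed extractions are retried instead of silently polluting the vector database.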
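Parallel paper reading can be expressed with `asyncio`, capping how many LLM requests are in flight at once. The sketch below makes a few assumptions. `read_paper` is a hypothetical stand-in that sleeps instead of calling a model. The concurrency limit is our own addition, since the README does not say how parallelism is bounded.

```python
import asyncio


async def read_paper(paper_id: str) -> dict:
    """Hypothetical stand-in for the LLM call that extracts one paper's information."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"paper_id": paper_id, "core_problem": f"problem of {paper_id}"}


async def read_all(paper_ids: list[str], max_concurrency: int = 5) -> list[dict]:
    """Read papers concurrently, allowing at most max_concurrency calls at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(pid: str) -> dict:
        async with semaphore:
            return await read_paper(pid)

    # gather keeps results in input order; return_exceptions=True returns
    # failures as values instead of raising, so one bad paper does not
    # abort the whole batch.
    results = await asyncio.gather(*(guarded(p) for p in paper_ids),
                                   return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]
```

Usage: `asyncio.run(read_all(ids, max_concurrency=3))`. The semaphore is what keeps a large batch from exceeding an API provider's rate limits.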
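PaperClusterAgent is described as running KMeans on embeddings and choosing the cluster count automatically, but the selection rule is not stated. The sketch below assumes one common rule: try several values of k and keep the one with the highest mean silhouette score. It uses deterministic farthest-first seeding and toy 2-D vectors in place of real embeddings. Everything is pure Python so it runs without dependencies. A production version would more likely use scikit-learn's `KMeans` and `silhouette_score`.

```python
import math

Vector = list[float]


def farthest_first(points: list[Vector], k: int) -> list[Vector]:
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: list[Vector], k: int, iters: int = 100) -> list[int]:
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels


def silhouette(points: list[Vector], labels: list[int]) -> float:
    """Mean silhouette coefficient in [-1, 1]; higher means better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        def mean_dist(c: int) -> float:
            ds = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            return sum(ds) / len(ds) if ds else 0.0
        if labels.count(labels[i]) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_dist(labels[i])                                   # cohesion
        b = min(mean_dist(c) for c in clusters if c != labels[i])  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points: list[Vector], k_min: int = 2, k_max: int = 8) -> tuple[int, list[int]]:
    """Try each k and keep the clustering with the highest silhouette score."""
    best_score, best_k, best_labels = -2.0, k_min, [0] * len(points)
    for k in range(k_min, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if score > best_score:
            best_score, best_k, best_labels = score, k, labels
    return best_k, best_labels
```

Silhouette is only one option. An elbow heuristic or letting the LLM pick from candidate clusterings would slot into `choose_k` the same way.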
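writing_director_node parses the generated outline into writing sub-tasks. The outline's format is not specified, so this sketch assumes Markdown in a particular shape. Each `## ` heading is a chapter and each `- ` bullet under it is a key point for that chapter. `WritingTask` and `split_outline` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class WritingTask:
    """Hypothetical sub-task: one chapter title plus the points it should cover."""
    title: str
    points: list[str] = field(default_factory=list)


def split_outline(outline: str) -> list[WritingTask]:
    """Turn a Markdown outline into one writing sub-task per '## ' section."""
    tasks: list[WritingTask] = []
    for line in outline.splitlines():
        line = line.strip()
        # Only level-2 headings start a task; '#' and '###' lines do not match.
        if m := re.match(r"^##\s+(.+)$", line):
            tasks.append(WritingTask(title=m.group(1)))
        elif line.startswith("- ") and tasks:
            tasks[-1].points.append(line[2:].strip())
    return tasks
```

Because each `WritingTask` is independent, the resulting list can be handed straight to a parallel executor, which is what makes chapter-level parallel writing possible.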
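In the writing module, review_agent ends a sub-task by replying "APPROVE". The control flow of that write/review loop can be sketched with plain functions standing in for the LLM agents. `write_section`, the `Writer`/`Reviewer` signatures, and the round cap are assumptions for illustration, not the project's code.

```python
from typing import Callable

# Hypothetical stand-ins: in Paper-Agent these roles are LLM agents
# (writing_agent, review_agent); plain functions keep the control flow visible.
Writer = Callable[[str, str], str]  # (task, reviewer feedback) -> draft
Reviewer = Callable[[str], str]     # draft -> "APPROVE" or revision notes


def write_section(task: str, writer: Writer, reviewer: Reviewer,
                  max_rounds: int = 3) -> tuple[str, int]:
    """Alternate drafting and review until the reviewer replies APPROVE.

    Returns the final draft and the number of rounds used."""
    draft, feedback = "", ""
    for round_no in range(1, max_rounds + 1):
        draft = writer(task, feedback)
        verdict = reviewer(draft)
        if verdict.strip() == "APPROVE":
            return draft, round_no
        feedback = verdict  # pass the critique into the next draft
    # Cap the loop so a reviewer that never approves cannot stall the workflow.
    return draft, max_rounds
```

The round cap is a design choice worth keeping in any real implementation: LLM reviewers occasionally never emit the approval token.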
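The orchestrator runs nodes in order, keeps a shared state, and pushes progress to the frontend through a queue and SSE. The stdlib sketch below shows that pattern. It is not LangGraph's API; in the project, LangGraph (for example its `StateGraph`) would play the role of `run_steps`. `run_steps`, `sse_stream`, and `format_sse` are hypothetical names, and the web-framework wiring is omitted.

```python
import json
import queue
from typing import Callable, Optional

# Hypothetical node type: takes the shared state, returns the keys it updates.
Step = tuple[str, Callable[[dict], dict]]


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Encode one message in the text/event-stream (SSE) wire format."""
    lines = [f"event: {event}"] if event else []
    # Multi-line payloads must be split across several "data:" lines.
    lines += [f"data: {line}" for line in json.dumps(data, ensure_ascii=False).splitlines()]
    return "\n".join(lines) + "\n\n"  # a blank line terminates the message


def run_steps(steps: list[Step], state: dict, progress: queue.Queue) -> dict:
    """Run nodes in order, merging each node's output into the shared state and
    reporting progress through a queue that the HTTP layer can drain."""
    name = ""
    try:
        for name, fn in steps:
            progress.put({"node": name, "status": "running"})
            state.update(fn(state))
            progress.put({"node": name, "status": "done"})
    except Exception as exc:
        # Report the failing node instead of crashing the stream.
        progress.put({"node": name, "status": "error", "detail": str(exc)})
    finally:
        progress.put(None)  # sentinel: no more events
    return state


def sse_stream(progress: queue.Queue):
    """Yield SSE messages until the sentinel arrives."""
    while (item := progress.get()) is not None:
        yield format_sse(item, event="progress")
```

In a server, `run_steps` would typically run in a worker thread while `sse_stream` feeds a response with content type `text/event-stream`. The queue decouples the workflow's pace from the client's.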