Paper-Agent

provenance:github:Tswoen/Paper-Agent

WHAT THIS AGENT DOES

Paper-Agent helps researchers automatically generate detailed survey reports on a specific topic. It addresses the many hours otherwise spent reading and analyzing academic papers to understand a field, and is aimed at researchers, academics, and students who need a quick but comprehensive overview of a subject. Unlike simple summarization tools, Paper-Agent uses a team of cooperating agents to retrieve, read, analyze, and synthesize papers, producing in-depth reports that highlight research trends and emerging topics. The aim is to speed up literature research substantially and to provide more insightful analysis than a researcher could easily produce alone.


README

<!-- <h1 align="center">LLM-Based Survey Report Generation System Built on Multi-Agents and Workflows</h1> -->
<h1 align="center">Paper-Agent: Intelligent Academic Survey Report Generation System</h1>

<p align="center">
  Languages: 
  English ·
  <a href="./docs/README_cn.md">简体中文</a>
</p>

[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/)

## 📖 Introduction

**Paper-Agent** is an automated survey report generation system for researchers. It targets two common pain points of academic literature research: it takes a lot of time, and the resulting analysis is often shallow. Rather than a simple literature summarization tool, it is an intelligent domain research assistant that covers the full retrieval → reading → analysis → synthesis → report pipeline and produces in-depth, insightful domain survey reports.

## 📸 Project Preview

| Screenshot 1 | Screenshot 2 | Screenshot 3 |
|-------|-------|-------|
| <img width="400" src="https://github.com/user-attachments/assets/b3617fee-ab47-4aac-9be7-0cb543fd706a" /> | <img width="400" src="https://github.com/user-attachments/assets/a27882fb-3bd8-4f44-b18f-8161bb0d44a6" /> | <img width="400" src="https://github.com/user-attachments/assets/18f2f0bc-6d2c-4b5f-a2b9-a87d16fcd6be" /> |
| Screenshot 4 | Screenshot 5 | Screenshot 6 |
| <img width="400" src="https://github.com/user-attachments/assets/21e5dc93-1c8b-46e3-b33c-f359d94cf2db" /> | <img width="400" src="https://github.com/user-attachments/assets/1e21162d-e083-40bc-93de-08302f28b08b" /> | <img width="400" src="https://github.com/user-attachments/assets/77738e3d-7d80-4d8c-9ea4-61c45e3db5d6" /> |

## ✨ Core Features

- 🤖 **Multi-Agent Collaboration Architecture**: Built on the AutoGen framework; retrieval, reading, analysis, writing, and other specialized agents collaborate automatically to complete complex tasks
- 📚 **Intelligent Literature Retrieval**: Converts natural-language queries into precise search conditions, which the user can review, and retrieves relevant academic papers from arXiv
- 🔍 **Structured Information Extraction**: Reads each paper and extracts the core problem, technical approach, experimental results, datasets, limitations, and other key information into a standardized JSON structure
- 🧠 **In-Depth Domain Analysis**: Identifies research trends and emerging topics through a three-stage process: cluster analysis, deep analysis, and global analysis
- ✍️ **Domain Survey Report Generation**: Integrates the analysis results into a logically structured academic report, output in Markdown
- 🔄 **Real-time Streaming Output**: Pushes task progress to the frontend in real time via SSE (Server-Sent Events)
- ⚡ **Parallel Processing Optimization**: Reads papers, analyzes clusters, and writes chapters in parallel to reduce end-to-end processing time
- 🔧 **Modular Design**: Decoupled functional modules with a LangGraph-based workflow, making the system easy to extend and maintain
- 💾 **Vector Database Support**: Uses ChromaDB to store extracted paper information, supporting retrieval-augmented writing
- 👥 **User Interaction Review**: Introduces manual review at key steps to ensure query conditions and generated content meet expectations
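The structured extraction feature above can be illustrated with a small schema-and-validation step. This is a minimal sketch, assuming the reading agent's LLM replies with a JSON object; the `PaperInfo` class, its field names, and `parse_paper_info` are hypothetical and simply mirror the categories listed for the reading node, not Paper-Agent's actual data model.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class PaperInfo:
    """Hypothetical schema mirroring the fields the reading node extracts."""
    core_problem: str
    key_methods: list[str]
    datasets: list[str]
    evaluation_metrics: list[str]
    main_results: str
    limitations: str
    contributions: list[str]


def parse_paper_info(raw: str) -> PaperInfo:
    """Parse an LLM's JSON reply and check that every schema field is present.

    Only presence is checked here; field types are not validated.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(PaperInfo)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Extra keys the model may have added are silently dropped.
    return PaperInfo(**{name: data[name] for name in expected})
```

Rejecting incomplete replies early makes it easy to re-prompt the model before a half-filled record reaches the vector database; a production system might use a validation library instead of this hand-rolled check.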

## System Architecture

**Here is a brief introduction. For more detailed information about system architecture, node implementation, and agent collaboration, please refer to the [design.md](design.md) document.**

Paper-Agent has a modular design: its end-to-end workflow is built with LangGraph and runs through the core nodes below.

### Core Nodes

1. **search_agent_node (Paper Search Node)**
   - Uses an LLM to convert the user's natural-language requirements into structured query conditions
   - Lets the user review the generated conditions through a user proxy agent (userProxyAgent)
   - Calls PaperSearcher to retrieve relevant papers from arXiv
   - Supported query conditions: `querys`, `start_date`, `end_date`

2. **reading_agent_node (Paper Reading Node)**
   - Processes multiple papers in parallel, extracting core information from each paper
   - Extracts the fields defined by a predefined data model: core problem, key methods, datasets, evaluation metrics, main results, limitations, and contributions
   - Stores the extracted results in the vector database for later retrieval-augmented writing

3. **analyse_agent_node (Paper Analysis Node)**
   - **PaperClusterAgent**: Clusters papers using their embedding vectors and the KMeans algorithm, determining the number of clusters automatically
   - **DeepAnalyseAgent**: Performs in-depth analysis on each cluster, including technical approaches, method comparisons, application domains, etc.
   - **GlobalanalyseAgent**: Summarizes all cluster analysis results to generate a global analysis report containing six major modules

4. **writing_agent_node (Writing Node)**
   - **writing_director_node**: Generates a report outline from the user requirements and the global analysis, then splits it into writing sub-tasks
   - **parallel_writing_node**: Executes all writing sub-tasks in parallel, using multi-agent collaboration to complete chapter writing
   - Supports retrieval-augmented writing and quality review

5. **report_agent_node (Report Generation Node)**
   - Combines all written chapters into a complete survey report
   - Outputs Markdown, automatically adding transitional sentences
   - Streams generation progress to the frontend in real time
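The search node's query conditions (`querys`, `start_date`, `end_date`) can be illustrated with a small data structure. The sketch below is hypothetical: it assumes the LLM returns these three fields as JSON with ISO dates, and it applies the date window locally to already-retrieved results rather than showing how PaperSearcher actually queries arXiv.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class QueryConditions:
    """Hypothetical container for the search node's structured conditions."""
    querys: list[str]  # keyword queries (field name kept as in the README)
    start_date: date
    end_date: date

    @classmethod
    def from_json(cls, raw: str) -> "QueryConditions":
        """Build conditions from the LLM's JSON reply (ISO dates assumed)."""
        data = json.loads(raw)
        start = date.fromisoformat(data["start_date"])
        end = date.fromisoformat(data["end_date"])
        if start > end:
            raise ValueError("start_date must not be after end_date")
        return cls(querys=list(data["querys"]), start_date=start, end_date=end)


def within_window(published: list[tuple[str, date]], cond: QueryConditions) -> list[str]:
    """Keep the IDs of papers whose publication date falls in the window (inclusive)."""
    return [pid for pid, day in published if cond.start_date <= day <= cond.end_date]
```

Keeping the conditions as plain JSON is what makes the manual review step practical: the user proxy can show them to the user, accept edits, and re-parse them before any search runs.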

### Sub-Agent Architecture

**Writing Module Sub-Agents**
- **writing_agent**: Responsible for writing chapter content based on sub-tasks
- **retrieval_agent**: Retrieves relevant content from vector database to supplement writing materials
- **review_agent**: Reviews the quality of the written content and outputs "APPROVE" to end the sub-task once the content is approved
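The writer/reviewer interplay above can be sketched as a simple loop: the writer drafts, the reviewer either returns feedback or replies "APPROVE", and the sub-task ends on approval. This is a framework-free illustration with stand-in callables; `run_subtask`, `write`, `review`, and the `max_rounds` cap are hypothetical, the retrieval step is omitted, and Paper-Agent itself runs these agents through AutoGen, so its actual termination logic may differ.

```python
from collections.abc import Callable


def run_subtask(
    task: str,
    write: Callable[[str, str], str],  # hypothetical writer: (task, feedback) -> draft
    review: Callable[[str], str],      # hypothetical reviewer: draft -> verdict
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Alternate writer and reviewer until the reviewer replies exactly "APPROVE".

    Returns the final draft and the number of rounds used. The round cap
    prevents an endless loop if the reviewer never approves.
    """
    feedback = ""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = write(task, feedback)
        verdict = review(draft)
        if verdict.strip() == "APPROVE":
            return draft, round_no
        feedback = verdict  # pass the critique into the next writing round
    return draft, max_rounds
```

Matching the whole reply against "APPROVE", instead of searching for the word anywhere, avoids ending a sub-task on feedback such as "I cannot APPROVE this yet".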

**Analysis Module Sub-Agents**
- **PaperClusterAgent**: Paper cluster analysis; generates topic descriptions and keywords
- **DeepAnalyseAgent**: In-depth analysis of single clusters
- **GlobalanalyseAgent**: Global analysis; generates the six-module report
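PaperClusterAgent is described as clustering paper embeddings with KMeans and choosing the cluster count automatically. The README does not say how the count is chosen, so the sketch below assumes silhouette-score selection, one common approach. It uses plain Python lists in place of real embedding vectors so it runs without extra dependencies, and seeds KMeans deterministically (farthest-first) so results are reproducible. `kmeans`, `silhouette`, and `choose_k` are hypothetical helpers.

```python
import math

Vector = list[float]


def kmeans(points: list[Vector], k: int, iters: int = 100) -> list[int]:
    """Lloyd's k-means with deterministic farthest-first seeding; one label per point."""
    centers = [points[0][:]]
    while len(centers) < k:
        # Next seed: the point whose nearest existing center is farthest away.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(far[:])
    labels: list[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: list[Vector], labels: list[int]) -> float:
    """Mean silhouette coefficient in [-1, 1]; higher means better-separated clusters."""
    present = set(labels)
    if len(present) < 2:
        return -1.0  # undefined for a single cluster; treat as worst case
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette is 0 by convention
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in present
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points: list[Vector], k_min: int = 2, k_max: int = 8) -> int:
    """Run k-means for each candidate k and keep the k with the best silhouette."""
    best_k, best_score = k_min, -math.inf
    for k in range(k_min, min(k_max, len(points) - 1) + 1):
        score = silhouette(points, kmeans(points, k))
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

For three well-separated groups of 2-D points, `choose_k` picks 3. With real paper embeddings, a library implementation (for example scikit-learn's `KMeans` and `silhouette_score`) would be the practical choice; the hand-written version only shows the selection logic.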

### Workflow Architecture

- **Orchestrator Module**
  - Builds complete workflow based on LangGraph
  - Coordinates orderly execution of nodes
  - Manages global state and error handling
  - Pushes task progress to frontend in real-time via SSE

- **State Management**
  - Uses State to manage global state
  - Implements frontend-backend communication through queues
  - Supports real-time state push
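The queue-plus-SSE arrangement described above can be sketched with the standard library alone: workflow nodes put progress messages on an `asyncio.Queue`, and the HTTP handler drains the queue and serializes each message as a Server-Sent Events frame (`event:` and `data:` lines terminated by a blank line). The message shape, the `None` end-of-stream sentinel, the event names, and the function names are all hypothetical; this README does not show the backend's web framework or event format.

```python
import asyncio
import json
from collections.abc import AsyncIterator


def sse_frame(event: str, payload: dict) -> str:
    """Serialize one message as an SSE frame: named event, JSON data, blank line."""
    # Compact json.dumps output contains no newlines, so one data: line suffices.
    return f"event: {event}\ndata: {json.dumps(payload, ensure_ascii=False)}\n\n"


async def stream_progress(queue: asyncio.Queue) -> AsyncIterator[str]:
    """Yield SSE frames until the workflow puts the None sentinel on the queue."""
    while True:
        msg = await queue.get()
        if msg is None:
            yield sse_frame("done", {})
            break
        yield sse_frame("progress", msg)


async def demo() -> list[str]:
    """Simulate three finished workflow steps followed by end of stream."""
    queue: asyncio.Queue = asyncio.Queue()
    for step in ("search", "reading", "analysis"):
        await queue.put({"step": step, "status": "finished"})
    await queue.put(None)  # tell the stream the workflow is over
    return [frame async for frame in stream_progress(queue)]
```

In a real server, `stream_progress` would be handed to the framework's streaming response with the `text/event-stream` content type, while the workflow writes to the same queue from its nodes.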

## Workflow

The LangGraph workflow turns a research question into a finished survey report in the following steps:

### Complete Process

1. **Input Query**: User provides research topic or question
2. **Paper Retrieval**: The system generates query conditions, lets the user review them, and retrieves relevant papers from arXiv
3. **Paper Reading**: Processes multiple papers in parallel, extracts and structures core information
4. **In-Depth Analysis**:
   - Cluster Analysis: Groups papers by topic
   - Deep Analysis: Performs in-depth analysis on each cluster including technical approaches, method comparisons, etc.
   - Global Analysis: Aggregates all cluster results into the six-module global report
5. **Content Generation**:
   - Generate Outline: Generates report outline based on user requirements and global analysis
   - Task Splitting: Parses outline into parallel executable writing sub-tasks
   - Parallel Writing: Uses multi-agent collaboration to complete chapter writing
6. **Report Integration**: Combines all chapters into a complete survey report in Markdown format
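The process above is a linear pipeline whose reading and writing stages fan out in parallel. The dependency-free sketch below assumes each stage is an async function over a shared state dict; `read_paper`, `reading_stage`, `run_pipeline`, and the state keys are hypothetical placeholders, not the project's LangGraph nodes. It shows how `asyncio.gather` gives parallel per-paper reading while preserving input order, with a semaphore capping concurrent calls.

```python
import asyncio
from collections.abc import Awaitable, Callable

Stage = Callable[[dict], Awaitable[dict]]  # hypothetical stage signature


async def read_paper(paper_id: str) -> dict:
    """Placeholder for the LLM call that extracts structured info from one paper."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"id": paper_id}


async def reading_stage(state: dict, max_concurrency: int = 4) -> dict:
    """Read every paper concurrently, with a semaphore capping in-flight calls."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(paper_id: str) -> dict:
        async with sem:
            return await read_paper(paper_id)

    # gather() returns results in input order, regardless of completion order.
    state["extracted"] = await asyncio.gather(*(guarded(p) for p in state["papers"]))
    return state


async def run_pipeline(state: dict, stages: list[Stage]) -> dict:
    """Run the stages in order, threading the shared state through each one."""
    for stage in stages:
        state = await stage(state)
    return state
```

The concurrency cap is a design choice worth keeping in any real version: LLM APIs usually enforce rate limits, so unbounded fan-out over dozens of papers tends to trigger throttling errors.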

### Key Features

- **Real-time Streaming Output**: Based on SSE technology, pushes task progress to frontend in real-time
- **Parallel Processing Optimization**: Parallel paper reading, parallel cluster analysis, parallel chapter writing
- **User Interaction Review**: Introduces manual review at key steps
- **Retrieval

[truncated…]

PUBLIC HISTORY

First discovered: Mar 21, 2026

IDENTITY

inferred

Identity inferred from code signals. No PROVENANCE.yml found.


METADATA

platform: github

first seen: Sep 1, 2025

last updated: Mar 20, 2026

last crawled: 4 days ago

version: not specified
