https://github.com/0xalanyin/zhihu-spider
Science Score: 13.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low (1.6%)
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: 0xAlanYin
- Language: JavaScript
- Default Branch: main
- Size: 91.8 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme
README.md
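Zhihu Hot Questions Collection System
This is a simple system for collecting trending questions on Zhihu. It periodically scrapes the top 20 questions from the Zhihu hot list, stores them in a local SQLite database, and provides a simple front-end interface for viewing the data.
Features
- Uses Playwright to simulate a browser environment, with support for logged-in sessions
- Scrapes the top 20 questions from the Zhihu hot list on a schedule
- Deduplicates data before saving it to the SQLite database
- Provides a simple front-end interface for viewing the data
- Configurable cookies, fetch count, fetch interval, and other parameters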
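The "top 20, deduplicated" behavior listed above can be sketched as a small pure function. This is only an illustration under stated assumptions, not the repository's actual code: the name `selectNewQuestions`, the `id` and `title` fields, and the in-memory `Set` of known IDs are all hypothetical. In a SQLite-backed version, the same effect is commonly achieved with a `UNIQUE` constraint combined with `INSERT OR IGNORE`.

```javascript
// Hypothetical helper: keep the first `limit` hot-list entries and drop
// any whose id is already stored. Field names are assumptions.
function selectNewQuestions(fetched, knownIds, limit = 20) {
  const seen = new Set(knownIds); // copy so the caller's set is not mutated
  const fresh = [];
  for (const question of fetched.slice(0, limit)) {
    if (seen.has(question.id)) continue; // already stored, or repeated in this batch
    seen.add(question.id);
    fresh.push(question);
  }
  return fresh;
}

// Usage example with made-up data.
const stored = new Set(["q1"]);
const batch = [
  { id: "q1", title: "Already saved" },
  { id: "q2", title: "New question" },
  { id: "q2", title: "New question (duplicate in batch)" },
  { id: "q3", title: "Another new question" },
];
console.log(selectNewQuestions(batch, stored).map((q) => q.id)); // [ 'q2', 'q3' ]
```

Filtering in memory before writing keeps the database layer simple, but a uniqueness constraint in the table is still the safer guard if several crawler runs can overlap.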
Installation
- Install Node.js (v16 or later recommended)
- Clone this repository
- Install dependencies
bash
npm install
- Install the Playwright browser
bash
npx playwright install chromium
Usage
Start the system
bash
npm start
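This starts the back-end API service and the front-end development server.
- The back-end API service runs at http://localhost:3000
- The front-end interface runs at http://localhost:5173
Test the crawler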
bash
node src/crawler/test.js
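Configuration
After the system starts, the following settings can be changed on the "System Configuration" tab of the front-end interface:
- cookies: Zhihu login cookies
- fetchCount: number of questions fetched per run
- fetchInterval: interval between fetches (milliseconds)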
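As an illustration of how these three settings might be validated before use, here is a minimal sketch. The function `normalizeConfig` and the `DEFAULTS` object are hypothetical and not taken from the repository. The default `fetchCount` of 20 mirrors the README's "top 20"; the one-hour default interval and the 1000 ms lower bound are assumptions chosen only for the example.

```javascript
// Assumed defaults: 20 matches the README's "top 20"; the one-hour
// interval is a placeholder, not a value taken from the repository.
const DEFAULTS = { cookies: "", fetchCount: 20, fetchInterval: 60 * 60 * 1000 };

// Hypothetical validator: merges user input over the defaults and
// rejects values that would make the crawler misbehave.
function normalizeConfig(input = {}) {
  const config = { ...DEFAULTS, ...input };
  if (typeof config.cookies !== "string") {
    throw new TypeError("cookies must be a string");
  }
  if (!Number.isInteger(config.fetchCount) || config.fetchCount < 1) {
    throw new RangeError("fetchCount must be a positive integer");
  }
  if (!Number.isInteger(config.fetchInterval) || config.fetchInterval < 1000) {
    throw new RangeError("fetchInterval must be at least 1000 ms");
  }
  return config;
}

// Usage example: override only the interval (30 minutes in milliseconds).
console.log(normalizeConfig({ fetchInterval: 30 * 60 * 1000 }));
```

Because `fetchInterval` is expressed in milliseconds, a lower bound of some kind helps catch a value entered in seconds by mistake.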
Directory Structure
├── README.md
├── data/ # Data files
│ └── zhihu.db # SQLite database file
├── src/
│ ├── api/ # API endpoints
│ ├── config/ # Configuration files
│ ├── crawler/ # Crawler module
│ ├── db/ # Database module
│ ├── frontend/ # Front-end interface
│ └── index.js # Entry point
└── package.json
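Notes
- The system relies on cookies to log in to Zhihu, and the cookies must be updated manually
- Crawling must comply with Zhihu's robots.txt rules
- The system is intended for learning and research purposes only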
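Since the login cookies are copied in by hand, one plausible step is converting a raw `Cookie` header string into the object form that Playwright's `context.addCookies()` accepts (`name`, `value`, `domain`, `path`). The sketch below is an assumption about how that could look, not the repository's implementation: `parseCookieHeader` is a hypothetical name, the `.zhihu.com` domain is assumed, and the cookie names in the example are made up.

```javascript
// Hypothetical helper: turn "a=1; b=2" into cookie objects in the shape
// Playwright's context.addCookies() expects. The domain is an assumption.
function parseCookieHeader(header, domain = ".zhihu.com") {
  return header
    .split(";")
    .map((pair) => pair.trim())
    .filter((pair) => pair.includes("="))
    .map((pair) => {
      const index = pair.indexOf("="); // values may themselves contain "="
      return {
        name: pair.slice(0, index).trim(),
        value: pair.slice(index + 1).trim(),
        domain,
        path: "/",
      };
    })
    .filter((cookie) => cookie.name !== "");
}

// Usage example with made-up cookie names and values.
const cookies = parseCookieHeader("session=abc123; token=x=y; ");
console.log(cookies.map((c) => `${c.name}=${c.value}`)); // [ 'session=abc123', 'token=x=y' ]
// With Playwright, the result could then be passed to context.addCookies(cookies).
```

Splitting on the first `=` only matters because some cookie values contain `=` themselves, for example base64 padding.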
Owner
- Login: 0xAlanYin
- Kind: user
- Repositories: 1
- Profile: https://github.com/0xAlanYin
Evolving.....
GitHub Events
Total
- Push event: 1
- Create event: 2
Last Year
- Push event: 1
- Create event: 2
Dependencies
package-lock.json
npm
- 574 dependencies
package.json
npm
- @vitejs/plugin-react ^4.2.1 development
- jest ^29.7.0 development
- nodemon ^3.0.2 development
- vite ^5.0.11 development
- antd ^5.13.0
- cors ^2.8.5
- express ^4.18.2
- node-cron ^3.0.3
- playwright ^1.40.0
- react ^18.2.0
- react-dom ^18.2.0
- sqlite3 ^5.1.6