Crawls the list of company announcements, and the text of each announcement, from the Eastmoney (东方财富) website.
3. Run the project: run the main.py script with Python, or switch to the project root directory and enter the command
scrapy crawl notices
4. Result: the crawled data is written to the database, and a log folder is created under the project path.
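Step 3 can also be scripted. The following is a minimal sketch, not the project's own main.py: the helper names `crawl_command` and `run_crawl` are hypothetical, and it assumes the `scrapy` executable is on the PATH and that the project root is the directory containing scrapy.cfg.

```python
import subprocess
from pathlib import Path


def crawl_command(spider_name="notices"):
    """Return the argv list equivalent to `scrapy crawl <spider_name>`."""
    return ["scrapy", "crawl", spider_name]


def run_crawl(project_root=".", spider_name="notices"):
    """Run the spider from the project root (hypothetical helper)."""
    root = Path(project_root).resolve()
    # Scrapy locates the project through scrapy.cfg, so fail early if it is missing.
    if not (root / "scrapy.cfg").is_file():
        raise FileNotFoundError(f"scrapy.cfg not found in {root}")
    # check=True raises CalledProcessError if scrapy exits with a non-zero status.
    return subprocess.run(crawl_command(spider_name), cwd=root, check=True)
```

Calling `run_crawl("/path/to/project")` has the same effect as running `scrapy crawl notices` from that directory.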
Methods of the spiderNotices.text_mongo.TextMongo object:
get_notices_single: fetches the data for a single stock
from spiderNotices.text_mongo import TextMongo
# Fetch a single stock (optionally limited to a date range)
result = TextMongo().get_notices_single('000001.SZ', '2010-01-01', '2012-12-31')
result = TextMongo().get_notices_single('000001.SZ')
# Fetch several stocks
result = TextMongo().get_notices(['000001.SZ', '000002.SZ'])
# Iterate over the stocks stored in the database
result = TextMongo().get_notices_stk()
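The README does not show how these lookups are implemented. As an illustration only, a date-bounded lookup like the first call above could be expressed as a MongoDB filter document. In this sketch the function `build_notice_filter` and the field names `code` and `ann_date` are assumptions, not the project's actual schema.

```python
def build_notice_filter(code, start_date=None, end_date=None):
    """Build a MongoDB filter for one stock code and an optional date range.

    Field names `code` and `ann_date` are hypothetical. Dates are assumed to be
    stored as ISO 'YYYY-MM-DD' strings, which sort correctly as plain strings.
    """
    query = {"code": code}
    date_range = {}
    if start_date:
        date_range["$gte"] = start_date  # on or after the start date
    if end_date:
        date_range["$lte"] = end_date  # on or before the end date
    if date_range:
        query["ann_date"] = date_range
    return query
```

A filter built this way could be passed to a pymongo `collection.find(...)` call. Omitting both dates matches every stored announcement for the stock, which mirrors the two-argument and one-argument calls shown above.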
SeleniumMiddleware
RandomUserAgent
ProxyIpMiddleware
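Of the middlewares listed above, a random User-Agent middleware is usually the simplest to picture. The following is a minimal sketch of the general technique, not the project's RandomUserAgent code: the class name `RandomUserAgentSketch` and the `USER_AGENTS` list are hypothetical. It relies only on the Scrapy downloader middleware convention that `process_request(request, spider)` returning `None` lets the request continue.

```python
import random

# Hypothetical pool of User-Agent strings; a real project would load its own list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


class RandomUserAgentSketch:
    """Downloader-middleware-style class that sets a random User-Agent per request."""

    def __init__(self, user_agents=USER_AGENTS):
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        # Overwrite the header on every outgoing request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```

In Scrapy, a class like this is enabled by adding its import path to the `DOWNLOADER_MIDDLEWARES` setting with a priority number.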
[ ] Text extraction from PDFs and OCR for text in images
[ ] Crawling market news