Fixes errors in Chinese-language websites mirrored with the Teleport software. http://blog.yoqi.me/?p=4081
liuyuqi 889cc9ceb3 Update 'README.md' | 6 years ago | |
---|---|---|
.vscode | 6 years ago | |
src | 6 years ago | |
.classpath | 6 years ago | |
.gitignore | 6 years ago | |
.project | 6 years ago | |
LICENSE | 7 years ago | |
README.md | 6 years ago | |
covert.php | 7 years ago | |
js_convert.py | 6 years ago | |
pom.xml | 6 years ago |
Fixes errors in Chinese-language websites mirrored with the Teleport software.
Find | Replace |
---|---|
/\*tpa=.*\*/ | |
\btppabs="h[^"]*" or tppabs="h[^"]*" | |
href="javascript:if\(confirm\('htt[^"]*" | href=www.xxx.com |
href=" *javascript:if\(confirm\('(htt[^"\s]*).*?" | href="$1" |
href="javascript:if\(confirm\([^"]*" | href="" |
CSS files: | |
tpa=http://[^\s]*\.gif | |
/\*tpa.*?\*/ |
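
The rules in the table can also be applied from a script instead of an editor's find-and-replace dialog. The following is a minimal sketch, not part of this repository: the names `RULES` and `clean` are hypothetical, only three of the rows are included, and the `tppabs` pattern is extended with a leading `\s*` (an assumption) so the space before the attribute is removed too. Python writes the capture-group reference as `\1` where the table uses `$1`.

```python
import re

# (pattern, replacement) pairs mirroring rows of the table above.
# An empty replacement deletes the match.
RULES = [
    # /*tpa=...*/ comments appended by Teleport (non-greedy form from the CSS rows)
    (re.compile(r"/\*tpa.*?\*/"), ""),
    # tppabs="http..." attributes; the leading \s* is an addition to eat the space
    (re.compile(r'\s*\btppabs="h[^"]*"'), ""),
    # confirm() wrappers: keep only the first URL inside confirm('...')
    (re.compile(r'href=" *javascript:if\(confirm\(\'(htt[^"\s]*).*?"'), r'href="\1"'),
]


def clean(text: str) -> str:
    """Apply every rule in order and return the cleaned text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Calling `clean()` on the contents of each mirrored HTML or CSS file and writing the result back would reproduce the manual replacements.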
If Chinese text comes out garbled, use this tool:
http://others.yoqi.me/convert.php
Because an ordinary regular expression cannot run a further regex match inside the text it has already matched, js_convert.py was added. It batch-rewrites links in the project's files such as href="javascript:if(confirm(%27http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=31011502004838 \n\nThis file was not retrieved by Teleport Pro, because it is addressed on a domain or path outside the boundaries set for its Starting Address. \n\nDo you want to open it from the server?%27))window.location=%27http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=31011502004838%27" to: href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=31011502004838"
As a result, the Java project for batch regex replacement no longer needs further development.
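
The two-step matching described above (find the wrapped href, then match again inside it) can be sketched with a replacement function passed to `re.sub`. This is an illustration under stated assumptions, not the actual contents of js_convert.py: the names `HREF_RE`, `TARGET_RE`, `rewrite_href`, and `convert` are hypothetical, and the sketch assumes the real target is always the URL assigned to `window.location`.

```python
import re

# Outer match: any href whose value is a Teleport confirm() wrapper.
HREF_RE = re.compile(r'href="\s*(javascript:if\(confirm\([^"]*)"')
# Inner match, run on the captured value: the URL assigned to window.location.
TARGET_RE = re.compile(r"window\.location='([^']*)'")


def rewrite_href(match: re.Match) -> str:
    # Teleport sometimes writes the quotes as %27; decode only that escape
    # so other percent-escapes in the target URL stay untouched.
    value = match.group(1).replace("%27", "'")
    target = TARGET_RE.search(value)
    if target is None:
        return match.group(0)  # leave unrecognized links as they are
    return f'href="{target.group(1)}"'


def convert(html: str) -> str:
    """Replace every confirm() wrapper with a plain link to its target."""
    return HREF_RE.sub(rewrite_href, html)
```

Because the inner search runs in ordinary Python code, it handles both the `'` and the `%27` form of the wrapper with one outer pattern.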