Re: [Advice wanted] How can I use a script/program to download music files from the 静雅思听 (Justing) website?
Posted: 2014-02-18 15:40
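Use Greasemonkey. That would actually work.
I usually do this by scraping, which needs a complicated JS add-on module. That is hard to set up, and the module is no longer maintained.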
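nae6taiyie0T wrote: @highwind
highwind wrote: Wow, I don't even know how to thank you. I'm only using this to learn how to write scripts anyway, so if someone can give me pointers, all the better.
nae6taiyie0T wrote: I just tried it, and batch downloading does work. I downloaded 6 mp3 files at the same time with no problem.
highwind wrote: I'd like to go step by step: first learn to download one file, then see whether I can download a batch, because every category gets updates every day.
I really like this site; I just don't know whether downloading this way does the site any harm.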
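If you are worried about the load on the site, you could start the downloads at night before going to bed: after midnight the connection is faster and the site gets very little traffic.
I can write a download program for you. Thinking about it just now, it could have these features:
* Automatic categorization.
* Automatic conversion of the mp3 tags to UTF-8; they are GBK by default, which shows up as garbled text.
* Multithreaded downloading.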
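The tag conversion in the list above could be sketched roughly as follows. This assumes the tag text was read as Latin-1 while its bytes are really GBK, which is one common way such garbling arises; `fix_gbk_mojibake` is a hypothetical helper name, and reading or writing the actual mp3 tag would need an ID3 library that is not shown here.

```python
def fix_gbk_mojibake(text):
    """Re-decode text that was read as Latin-1 but is really GBK."""
    try:
        # Recover the original bytes, then decode them with the right codec.
        return text.encode('latin-1').decode('gbk')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Already proper Unicode, or not GBK at all: leave it unchanged.
        return text


# Simulate a garbled tag: GBK bytes wrongly decoded as Latin-1.
garbled = '静雅思听'.encode('gbk').decode('latin-1')
print(fix_gbk_mojibake(garbled) == '静雅思听')
```

Text that is already correct passes through untouched, so the helper can be applied to every tag without checking first.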
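There is really nothing to be done about the garbled text except converting it myself. I did send the site a suggestion about it, but they have not replied yet;
Categorization is not a big deal, really. For now I sort by article (just creating folders by hand), though I would be happy to learn how to do it automatically;
Multithreading sounds good, but each file is small and a single thread finishes quickly anyway. Would it also let me download several files at once? (still to be verified);
Also, registered users seem to earn a few points for every download. I wonder whether the site can tell when we download with a script?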
It is written. Contact me if you need it. The download speed is decent, and it uses 3 download threads by default.
My gtalk is [email protected]; email works too.
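For readers wondering what "3 download threads" amounts to, here is a minimal sketch using the standard library's thread pool. It is not the code in justing.py; `download_all` and `fake_fetch` are hypothetical names, and the example.com URLs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, workers=3):
    """Run fetch(url) for every URL on a pool of `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for a real download: only returns the file name.
    return url.rsplit('/', 1)[-1]


names = download_all(
    ['http://example.com/a.mp3', 'http://example.com/b.mp3'],
    fake_fetch,
)
print(names)
```

A real `fetch` would open the URL and write the response to disk; keeping the worker count small also limits the load on the site.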
==========
17:33 update
I uploaded the program to GitHub, here: https://github.com/LiuLang/monkey-video ... er/justing
You only need to download the justing.py file and install the required python3 dependencies; then you can run it.
Code:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/lxml/cssselect.py", line 16, in <module>
    external_cssselect = __import__('cssselect')
ImportError: No module named 'cssselect'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "justing.py", line 22, in <module>
    from lxml.cssselect import CSSSelector as CSS
  File "/usr/lib/python3/dist-packages/lxml/cssselect.py", line 18, in <module>
    raise ImportError('cssselect seems not to be installed. '
ImportError: cssselect seems not to be installed. See http://packages.python.org/cssselect/
Code:
Requirement already up-to-date: cssselect in /usr/local/lib/python2.7/dist-packages
Cleaning up...
Code:
    from urllib.error import URLError
ImportError: No module named error
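This ImportError is what Python 2 reports, since `urllib.error` only exists in Python 3. A script can fail with a clearer message by checking the interpreter version before the Python 3 imports. The sketch below is only an illustration; `require_python3` is a hypothetical helper, not something justing.py is known to contain.

```python
import sys


def require_python3(version_info=None):
    """Exit with a clear message when not running under Python 3."""
    if version_info is None:
        version_info = sys.version_info
    if version_info[0] < 3:
        raise SystemExit('This script needs Python 3; run it with python3.')


require_python3()
# Safe to import now: urllib.error is a Python 3 module.
from urllib.error import URLError
```

The matching package install then has to target the same interpreter, for example through that interpreter's own pip.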
Code:
Downloading/unpacking cssselect
Downloading cssselect-0.9.1.tar.gz
Running setup.py egg_info for package cssselect
no previously-included directories found matching 'docs/_build'
Installing collected packages: cssselect
Running setup.py install for cssselect
no previously-included directories found matching 'docs/_build'
Successfully installed cssselect
Cleaning up...
Code:
Requirement already satisfied (use --upgrade to upgrade): cssselect in /usr/local/lib/python3.3/dist-packages
Cleaning up...
Code:
background:url('/stage/img2/bgdetaildeploydate.png') no-repeat
Code:
CSS("style[background='*bgdetaildeploydate*']") ???
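On the selector question: CSS has no wildcard inside a quoted value, but it does have a substring attribute selector, so `[style*="bgdetaildeploydate"]` matches any element whose inline `style` attribute contains that text. That only helps if the background is set in an inline `style` attribute and not in a stylesheet, which the snippet above does not tell us. The sketch below does the same substring check with the standard library only; `StyleFinder` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class StyleFinder(HTMLParser):
    """Collect start tags whose inline style contains a given substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        style = attributes.get('style') or ''
        if self.needle in style:
            self.matches.append((tag, attributes))


# Made-up sample page, not the real site markup.
sample = (
    '<div id="date" style="background:url(\'/stage/img2/'
    'bgdetaildeploydate.png\') no-repeat">date text</div>'
    '<div id="other" style="color:red">other</div>'
)
finder = StyleFinder('bgdetaildeploydate')
finder.feed(sample)
print([(tag, attributes.get('id')) for tag, attributes in finder.matches])
```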
Code:
#!/usr/bin/env python3
from urllib import request
from bs4 import BeautifulSoup as BS

# Fetch the front page and parse it
JustingHtml = request.urlopen('http://www.justing.com.cn/index.jsp').read()
JustingSoup = BS(JustingHtml, 'html.parser')
# Title links inside the #tabs-3 panel
JustingSoupFind = JustingSoup.select('#tabs-3 > p > a[target="_blank"]')
# Print the first 18 titles (fewer if the page lists fewer)
for a in JustingSoupFind[:18]:
    print(a.text)
Code:
#!/usr/bin/env python3
import urllib.request
from bs4 import BeautifulSoup as BS

NUM = 18  # how many titles to take from the front page
JUSTING_URL = 'http://www.justing.com.cn/index.jsp'

def getHtml(url):
    """Fetch the page and return its raw bytes."""
    return urllib.request.urlopen(url).read()

def getMP3(html):
    """Return the <a> elements that carry the audio titles."""
    Soup = BS(html, 'html.parser')
    return Soup.select('#tabs-3 > p > a[target="_blank"]')

def editMP3(mp3):
    """Strip the non-breaking space (U+00A0) around each title."""
    # The first version sliced off the last character instead;
    # strip('\xa0') seems to work as well.
    return [a.text.strip('\xa0') for a in mp3[:NUM]]

def main():
    # Do the network work here, not at import time
    titles = editMP3(getMP3(getHtml(JUSTING_URL)))
    for title in titles:
        print('http://dl.justing.com.cn/page/{0}.mp3'.format(title))

if __name__ == "__main__":
    main()
Code:
UnicodeEncodeError: 'ascii' codec can't encode characters in position
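This error typically appears when Chinese text is printed while the output encoding is ASCII, for example under a C/POSIX locale. One way around it is to write the lines with an explicit UTF-8 encoding and not rely on the locale default. The sketch below assumes that is the cause here; `write_lines` is a hypothetical helper and the URL is a made-up example.

```python
import io


def write_lines(lines, binary_stream):
    """Write lines as UTF-8, whatever the locale's default encoding is."""
    text = io.TextIOWrapper(binary_stream, encoding='utf-8', newline='\n')
    for line in lines:
        text.write(line + '\n')
    text.flush()
    text.detach()  # leave the underlying stream open for the caller


buf = io.BytesIO()
write_lines(['http://dl.justing.com.cn/page/示例.mp3'], buf)
print(len(buf.getvalue()))
```

In a real script the stream could be `sys.stdout.buffer`, or a file opened in binary mode that is later passed to `wget -i`.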
Code:
wget -c --restrict-file-names=nocontrol -i URLLIST_file
Code:
from lxml import html
from lxml.cssselect import CSSSelector as CSS

url = 'http://www.justing.com.cn/index.jsp'
tree = html.parse(url)
sel = CSS('div#tabs-3 > p > a[target="_blank"]')
elem = sel(tree)
elem[index].text  # index: position of the entry you want
Code:
from urllib import parse
parse.quote(filename)  # filename: the file name to escape
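Putting `parse.quote` together with the download URL pattern used earlier in the thread might look like this. It is a sketch: `mp3_url` is a hypothetical helper, and whether the server expects UTF-8 percent-encoding (and not, say, GBK) is an assumption.

```python
from urllib import parse


def mp3_url(title):
    """Build a download URL with the title percent-encoded as UTF-8."""
    # safe='' also escapes '/', so a title cannot add extra path segments.
    quoted = parse.quote(title, safe='', encoding='utf-8')
    return 'http://dl.justing.com.cn/page/{0}.mp3'.format(quoted)


print(mp3_url('中文'))
# prints http://dl.justing.com.cn/page/%E4%B8%AD%E6%96%87.mp3
```

The resulting URLs are plain ASCII, so they can be printed or written to a URL list without running into encoding errors.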