精易论坛
Title: Scraping Maoyan Movies
Author: ideologism
Time: 2019-1-6 12:27
import json
import re
from multiprocessing import Pool

import requests
from requests.exceptions import RequestException


def get_url(url):
    # Fetch one page; return its HTML text, or None on a non-200 status or request error.
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


def parse_page(html):
    # Each <dd> block holds one movie: rank, title, poster URL, cast, release time.
    res = re.findall(
        r'<dd>.*?"board-index.*?board-index.*?">(\d*)</i>.*?title="(.*?)".*?data-src="(.*?)".*?<p.*?"star">(.*?)</p>.*?"releasetime">(.*?)</p>.*?</dd>',
        html, re.S)
    # print(res)
    for result in res:
        yield {
            'index': result[0],
            'title': result[1],
            'url': result[2],
            # [3:] and [5:] skip the label at the start of each field
            # (presumably "主演:" and "上映时间:").
            'name': result[3].strip()[3:],
            'time': result[4].strip()[5:],
        }


def with_open(result):
    # Append one record as a JSON line; the with block closes the file itself.
    with open('爬猫影电影网top100.txt', 'a', encoding='utf8') as f:
        f.write(json.dumps(result, ensure_ascii=False) + '\n')


def main(i):
    url = 'https://maoyan.com/board/4?offset=' + str(i)
    html = get_url(url)
    # print(html)
    if html is None:
        # The request failed, so there is nothing to parse for this page.
        return
    for result in parse_page(html):
        with_open(result)


if __name__ == '__main__':
    # Offsets 0, 10, ..., 90 cover the 10 pages of the top 100 board.
    pool = Pool()
    pool.map(main, [i * 10 for i in range(10)])
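To see what the regex in parse_page pulls out without hitting the site, here is a minimal sketch that runs the same pattern against a hand-written fragment. SAMPLE_HTML is an assumption made up for illustration (it is not copied from maoyan.com), and the "主演:" / "上映时间:" labels in it are only a guess at what the [3:] and [5:] slices are meant to strip.

```python
import re

# Same pattern as in the post, compiled once and split across lines for readability.
PATTERN = re.compile(
    r'<dd>.*?"board-index.*?board-index.*?">(\d*)</i>.*?title="(.*?)"'
    r'.*?data-src="(.*?)".*?<p.*?"star">(.*?)</p>'
    r'.*?"releasetime">(.*?)</p>.*?</dd>',
    re.S,
)

# Hypothetical fragment written by hand to fit the pattern (an assumption,
# not the real page markup).
SAMPLE_HTML = '''
<dd>
  <i class="board-index board-index-1">1</i>
  <a href="/films/0" title="Example Film" class="image-link">
    <img data-src="https://example.com/poster.jpg" alt="Example Film" />
  </a>
  <p class="star">
    主演:Actor A,Actor B
  </p>
  <p class="releasetime">上映时间:1993-01-01</p>
</dd>
'''


def parse_page(html):
    # Yield one dict per <dd> entry matched by the pattern.
    for index, title, url, star, release in PATTERN.findall(html):
        yield {
            'index': index,
            'title': title,
            'url': url,
            'name': star.strip()[3:],     # skip the 3-character label
            'time': release.strip()[5:],  # skip the 5-character label
        }


records = list(parse_page(SAMPLE_HTML))
print(records)
```

If the real page markup changes, the pattern simply matches nothing and records comes back empty, so checking len(records) per page is a cheap way to notice a broken scrape.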
Author: ideologism
Time: 2019-1-6 12:29
This scrapes the Maoyan top 100 movies. It isn't done very well.
Author: ideologism
Time: 2019-1-6 12:29
I hope the experts here will offer some valuable suggestions.
Author: 犹豫的流星
Time: 2019-3-7 08:48
Nice, looks good.