精易论坛

Title: Just started learning Python: fetching the forum thread list

Author: 978069486    Time: 2019-12-31 22:03

# coding=utf-8

import requests
from lxml import etree

host = "https://125.confly.eu.org/"


def get_demo(f):
    """Walk the forum-98 list pages and write each thread's title and URL to f."""
    i = 0
    # i is incremented before use, so this requests pages 1 through 870
    while i <= 869:
        i += 1
        print(i)

        url = "https://125.confly.eu.org/forum-98-" + str(i) + ".html"
        print("=" * 40)
        print(url)
        header = {
            "Host": "125.confly.eu.org",
            "Connection": "keep-alive",
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3741.400 QQBrowser/10.5.3863.400",
            "X-Requested-With": "XMLHttpRequest",
            "Accept": "*/*",
            "Referer": "https://125.confly.eu.org/forum-98-1.html",
            # The cookies= argument of requests expects name/value pairs,
            # so a raw cookie string belongs in the Cookie header
            "Cookie": "replace with your own cookies",
        }
        req = requests.get(url, headers=header)

        # Decode as GBK, silently dropping bytes that cannot be decoded
        text = req.content.decode('gbk', "ignore")
        html = etree.HTML(text)
        # a[2]: the second <a> directly under each row's <th>
        result = html.xpath("//*[@id='threadlisttableid']/tbody/tr/th/a[2]")

        for a in result:
            str1 = a.xpath("text()")[0]
            href = a.xpath("@href")[0]
            if href.find("html") != -1:
                try:
                    f.write(str1 + "----" + host + href + "\n")
                    print(str1, host + href)
                except UnicodeError:
                    # ascii() escapes the characters the console could not print
                    print("Encoding error", ascii(str1))


if __name__ == '__main__':
    with open('demo.txt', mode='w', encoding='utf-8') as f:
        get_demo(f)
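
The script above builds each link as `host + href`, which only works while every `href` is relative. A minimal sketch of an alternative, assuming nothing beyond the standard library: `urllib.parse.urljoin` resolves relative links and leaves absolute ones untouched. `page_url` and `thread_line` are hypothetical helper names, and `thread-1-1-1.html` is a made-up href used only for illustration; none of them appear in the original post.

```python
from urllib.parse import urljoin

HOST = "https://125.confly.eu.org/"


def page_url(page):
    """Return the list-page URL for one page number of forum 98."""
    return urljoin(HOST, "forum-98-{}.html".format(page))


def thread_line(title, href):
    """Format one output line in the script's 'title----url' layout."""
    # urljoin keeps an absolute href as-is and resolves a relative one against HOST
    return "{}----{}".format(title, urljoin(HOST, href))


if __name__ == "__main__":
    print(page_url(1))
    # "thread-1-1-1.html" is an illustrative href, not taken from the forum
    print(thread_line("demo", "thread-1-1-1.html"))
```

With helpers like these, the loop body could call `page_url(i)` instead of concatenating strings by hand, and the write step could call `thread_line(str1, href)`.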



Author: 神女软件定制    Time: 2019-12-31 22:05
I'd suggest giving up.
Author: 凌云啊    Time: 2019-12-31 23:54
Python really is handy for writing this kind of thing.
Author: xuxuand    Time: 2020-2-24 12:18
666666666666



