Why do you have so many photos of pretty girls? Because I learned Python scraping well! ❤️

This tutorial walks through scraping an image-gallery site with Python, using two libraries: requests and BeautifulSoup.
The page loads the next batch of content dynamically only when you scroll (or mouse-wheel) to the bottom, so we need to watch carefully what changes in the Network tab of the browser's developer tools as the page loads. Observing it, each time a new set of galleries loads in, a numbered page request shows up; clicking it reveals the request details, as shown in the screenshot below. The listing pages therefore follow a simple URL pattern:

url_pattern = "https://www.mmkk.me/category/weimei/{}/"
for i in range(1, 11):
    url = url_pattern.format(i)
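Since the pattern is just a format string, the page URLs can be generated and sanity-checked offline, before any request goes out. A quick sketch using the pattern above:

```python
# Pagination pattern observed in the browser's Network tab
url_pattern = "https://www.mmkk.me/category/weimei/{}/"

# Build the URLs for the first ten listing pages without touching the network
urls = [url_pattern.format(i) for i in range(1, 11)]

print(urls[0])   # → https://www.mmkk.me/category/weimei/1/
print(urls[-1])  # → https://www.mmkk.me/category/weimei/10/
```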
The responses are ordinary HTML pages, which we parse with BeautifulSoup. First we need the address of each gallery; use the browser's developer tools to locate the gallery links:
url = url_pattern.format(1)
response = requests.get(url=url, headers=headers)
# Set the response encoding before reading the text
response.encoding = 'utf-8'
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
results = soup.find_all('a', attrs={"class": "item-link"})
# Print the extracted gallery links to check them
for j in results:
    print(j.attrs['href'])
    # Grab each gallery's title so the galleries can be stored separately
    path_name = j.get_text().strip()
Printing the gallery links we extracted:
https://www.mmkk.me/weimei/5537.html
https://www.mmkk.me/weimei/5438.html
https://www.mmkk.me/weimei/5991.html
https://www.mmkk.me/weimei/6278.html
https://www.mmkk.me/weimei/5827.html
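The `find_all` selection can be tried out offline against a small HTML fragment. The markup below is a simplified stand-in I wrote for illustration, not the site's actual source:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the gallery listing markup (assumed structure)
html = '''
<div class="item"><a class="item-link" href="https://www.mmkk.me/weimei/5537.html">Gallery one</a></div>
<div class="item"><a class="item-link" href="https://www.mmkk.me/weimei/5438.html">Gallery two</a></div>
<div class="item"><a class="other" href="https://example.com/">Not a gallery</a></div>
'''

soup = BeautifulSoup(html, 'html.parser')
# Only anchors with class "item-link" are treated as gallery links
links = [a.attrs['href'] for a in soup.find_all('a', attrs={"class": "item-link"})]
print(links)
```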
With the gallery addresses in hand, we can request each gallery page and parse out the address of every image in it (the method is the same as for the gallery links):
url_imgs = results[0].attrs['href']
response_imgs = requests.get(url=url_imgs, headers=headers)
# Set the response encoding
response_imgs.encoding = 'utf-8'
response_imgs.raise_for_status()
soup_imgs = BeautifulSoup(response_imgs.text, 'html.parser')
results_imgs = soup_imgs.find_all('div',attrs={"data-fancybox":"gallery"})
# Print the extracted image addresses
for k in range(len(results_imgs)):
    print(results_imgs[k].attrs['data-src'])
Printing the extracted image address links:
https://imgs.mmkk.me/wmnv/img/20190625081910-5d11d8fe5422b.png
https://imgs.mmkk.me/wmnv/img/20190625081910-5d11d8feae474.png
https://imgs.mmkk.me/wmnv/img/20190625081911-5d11d8ff282b1.png
...
# Create a directory for each gallery (exist_ok avoids an error if it already exists)
os.makedirs(path_name, exist_ok=True)
file_name = path_name + '_' + str(k + 1) + '.png'
file_name = os.path.join(path_name, file_name)
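Gallery titles come straight from the page text, so they may contain characters that are illegal in directory names on some filesystems (Windows in particular). A small sanitizing helper, my own addition rather than part of the original script, keeps the directory creation safe:

```python
import re

def safe_dir_name(title: str) -> str:
    """Make a page title safe to use as a directory name."""
    # Strip surrounding whitespace, then replace \ / : * ? " < > | with '_'
    return re.sub(r'[\\/:*?"<>|]', '_', title.strip())

print(safe_dir_name('  cute: girls/2019?  '))  # → cute_ girls_2019_
```

In the scraper, `path_name = safe_dir_name(j.get_text())` would replace the bare `.strip()` call.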
Then write the downloaded image content to a file:
# Take the first image as an example
r = requests.get(results_imgs[0].attrs['data-src'], headers=headers)
if r.status_code == 200:
    with open(file_name, 'wb') as f:
        f.write(r.content)
Putting all the pieces together, the complete script (with random delays between requests so we don't hammer the server):

import time
import requests
from bs4 import BeautifulSoup
import os
import random
url_pattern = "https://www.mmkk.me/category/weimei/{}/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36 Edg/92.0.902.62',
    'Connection': 'keep-alive'
}
# Scrape the first 5 pages
for i in range(1, 6):
    time.sleep(10)
    url = url_pattern.format(i)
    response = requests.get(url=url, headers=headers)
    # Set the response encoding
    response.encoding = 'utf-8'
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # Gallery links
    results = soup.find_all('a', attrs={"class": "item-link"})
    # Loop over all gallery links
    for j in results:
        time.sleep(random.randint(8, 13))
        url_imgs = j.attrs['href']
        # Gallery title
        path_name = j.get_text().strip()
        # Create the directory the gallery's images are saved to
        os.makedirs(path_name, exist_ok=True)
        response_imgs = requests.get(url=url_imgs, headers=headers)
        # Set the response encoding
        response_imgs.encoding = 'utf-8'
        response_imgs.raise_for_status()
        soup_imgs = BeautifulSoup(response_imgs.text, 'html.parser')
        # Image links
        results_imgs = soup_imgs.find_all('div', attrs={"data-fancybox": "gallery"})
        # Loop over all image links
        for k in range(len(results_imgs)):
            img = results_imgs[k].attrs['data-src']
            file_name = path_name + '_' + str(k + 1) + '.png'
            file_name = os.path.join(path_name, file_name)
            # Skip images that were already downloaded
            if not os.path.exists(file_name):
                time.sleep(random.randint(3, 8))
                r = requests.get(img, headers=headers)
                if r.status_code == 200:
                    with open(file_name, 'wb') as f:
                        f.write(r.content)
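On a flaky connection, individual image requests can fail mid-run. One way to harden the download loop, a sketch of my own rather than part of the original tutorial, is a small retry helper with exponential backoff:

```python
import time

def fetch_with_retry(fetch, retries=3, base_delay=1.0):
    """Call fetch(); on exception, wait base_delay * 2**attempt and try again."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Demo with a hypothetical flaky source; in the real script the callable
# would be: lambda: requests.get(img, headers=headers)
calls = {'n': 0}
def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError("transient failure")
    return "image bytes"

result = fetch_with_retry(flaky, retries=4, base_delay=0.01)
print(result, calls['n'])  # → image bytes 3
```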