
Programming Q&A. Published: 2022-06-02. Site: 大佬教程 code.js-code.com


How do I fix "HTTP Error 403: ModSecurity Action" when downloading images scraped with Selenium using wget?


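I am trying to scrape images from a website. I managed to get the image links, but when I download the images with wget I keep getting HTTP Error 403: ModSecurity Action.

Here is my code: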

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import wget
import time
import os
import urllib.request

driver = webdriver.Chrome('Chromedriver-path')
url = ("https://www.ancuong.com/vi/san-pham/san-pham-chinh/van-mfc--cac-loai-van-phu-melamine/melamine-phu-tren-mdf-melamine-mdf/page-woodgrain.HTML")
driver.get(url)

n = 0
while n <= 1500:
    driver.execute_script("window.scrollTo(0,{})".format(n))
    n+=200
    time.sleep(0.1)

images = WebDriverWait(driver,60).until(
            EC.visibility_of_all_elements_located((By.CLASS_NAME,'load-done'))
        )

imglinks = []
for image in images:
  imglink = image.get_attribute('src')
  imglinks.append(imglink)

print(imglinks)
time.sleep(1)
driver.quit()

path = os.getcwd()
path = os.path.join(path,"an-cuong-images")
os.mkdir(path)
counter = 0
for imglink in imglinks:
    save_as = os.path.join(path,"an-cuong-plywood" + str(counter) + '.jpg')
    wget.download(imglink,save_as)
    counter += 1

The error I get is:

  File "D:\Jobs\dream\scrape info\scrape image- python\selenium_crawling.py", line 43, in <module>
    wget.download(imglink,save_as)
  File "C:\Users\My Lap\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\python39\site-packages\wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 239, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 523, in open
    response = meth(req, response)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 632, in http_response
    response = self.parent.error(
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 561, in error
    return self._call_chain(*args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: ModSecurity Action

How can I fix this? Thanks in advance for your help!

Solution

ModSecurity is an open-source, cross-platform web application firewall (WAF) module. https://modsecurity.org/about.html

So whenever you see 403 (ModSecurity Action), it means the ModSecurity firewall has blocked the request. Common causes are:

Malicious payload injection
A URL posted as a parameter
JavaScript attribute violations
Any other cross-site scripting (XSS) attempt
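
Before changing the download logic, it can help to confirm that a failure really is a firewall block and not an ordinary 403. The following is a minimal sketch, assuming the server reports the block through the status text exactly as in the traceback above ("ModSecurity Action"); `is_modsecurity_block` and `fetch_or_explain` are hypothetical helper names, not part of wget or urllib.

```python
import urllib.error
import urllib.request


def is_modsecurity_block(err: urllib.error.HTTPError) -> bool:
    """Return True when an HTTPError looks like a ModSecurity rejection."""
    # err.code is the numeric status; err.reason is the status text from the server.
    # Assumption: the firewall announces itself in that status text.
    return err.code == 403 and "modsecurity" in str(err.reason).lower()


def fetch_or_explain(url: str) -> bytes:
    """Fetch url and raise a clearer error if the firewall rejected the request."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if is_modsecurity_block(err):
            raise RuntimeError("Blocked by ModSecurity: " + url) from err
        raise  # any other HTTP error is passed through unchanged
```

Calling `fetch_or_explain(imglink)` on one of the collected links would then separate a firewall block from, say, a missing file.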

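Here you are passing a custom-generated URL as a parameter (safe from your point of view, because you created it). In the rulebook, that is a classic example of payload injection. To get around it, try a different approach that does not pass the URL as a parameter.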
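
The answer does not name a specific alternative. One option that is often tried, sketched here, replaces `wget.download` with the standard library's `urllib.request` and sends browser-like headers with each image request. This rests on an assumption the answer does not confirm: that the firewall rule reacts to how the request presents itself, not to the image URL as such. The helper names, the User-Agent string, and the use of a `Referer` header are all illustrative choices.

```python
import os
import urllib.request

# Assumption: any current desktop browser User-Agent string could be used here.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_image_request(img_url, page_url):
    """Build a request carrying browser-like headers (hypothetical helper)."""
    return urllib.request.Request(
        img_url,
        headers={"User-Agent": BROWSER_USER_AGENT, "Referer": page_url},
    )


def download_images(img_links, page_url, folder):
    """Save each image URL into folder and return the list of saved paths."""
    os.makedirs(folder, exist_ok=True)  # unlike os.mkdir, tolerates an existing folder
    saved = []
    for counter, img_link in enumerate(img_links):
        save_as = os.path.join(folder, "an-cuong-plywood" + str(counter) + ".jpg")
        request = build_image_request(img_link, page_url)
        with urllib.request.urlopen(request, timeout=30) as response:
            data = response.read()
        with open(save_as, "wb") as out:
            out.write(data)
        saved.append(save_as)
    return saved
```

In the question's script this would be called as `download_images(imglinks, url, path)` after `driver.quit()`, in place of the `os.mkdir` call and the `wget.download` loop. If the server still answers 403, the rule is matching something else and this sketch will not help.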
