Selenium scraping: HTTP Error 403: ModSecurity Action when using wget
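I am trying to scrape images from a website. I managed to get the image links, but when I download the images with wget I keep getting HTTP Error 403: ModSecurity Action.
Here is my code: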
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import wget
import time
import os
import urllib.request

driver = webdriver.Chrome('Chromedriver-path')
url = ("https://www.ancuong.com/vi/san-pham/san-pham-chinh/van-mfc--cac-loai-van-phu-melamine/melamine-phu-tren-mdf-melamine-mdf/page-woodgrain.HTML")
driver.get(url)

# Scroll down in steps so the lazily loaded images get fetched
n = 0
while n <= 1500:
    driver.execute_script("window.scrollTo(0,{})".format(n))
    n += 200
    time.sleep(0.1)

# Wait until the loaded images are visible
images = WebDriverWait(driver, 60).until(
    EC.visibility_of_all_elements_located((By.CLASS_NAME, 'load-done'))
)

# Collect the src attribute of every image
imglinks = []
for image in images:
    imglink = image.get_attribute('src')
    imglinks.append(imglink)
print(imglinks)
time.sleep(1)
driver.quit()

# Download every image into a new folder
path = os.getcwd()
path = os.path.join(path, "an-cuong-images")
os.mkdir(path)
counter = 0
for imglink in imglinks:
    save_as = os.path.join(path, "an-cuong-plywood" + str(counter) + '.jpg')
    wget.download(imglink, save_as)
    counter += 1
The error I get is:
  File "D:\Jobs\dream\scrape info\scrape image- python\selenium_crawling.py", line 43, in <module>
    wget.download(imglink, save_as)
  File "C:\Users\My Lap\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\python39\site-packages\wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 239, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 523, in open
    response = meth(req, response)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 632, in http_response
    response = self.parent.error(
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 561, in error
    return self._call_chain(*args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: ModSecurity Action
How can I solve this? Thanks in advance for your help!
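Whenever you see 403 (ModSecurity Action), it means the ModSecurity firewall has blocked the request, typically because the request matched one of its rules.
Here, you are passing a custom-generated URL as an argument (safe from your point of view, since you built it yourself), but to the firewall's rule set this looks like a classic case of payload injection. To get around it, try a different download approach that does not pass the URL as a parameter.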
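As one possible "different approach", here is a minimal sketch that replaces wget.download with urllib.request and sends browser-like headers. The traceback shows that the wget package fetches through urllib's urlretrieve, which by default identifies itself as Python-urllib, and some ModSecurity rule sets flag scripting clients like that. Whether that is the rule this particular site triggers is an assumption, so this may or may not clear the 403. The names BROWSER_UA, build_request, download_image and download_all are hypothetical helpers, and the User-Agent string is only illustrative.

```python
import os
import urllib.error
import urllib.request

# Assumption: a browser-like User-Agent; the exact string is illustrative.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, referer=None, user_agent=BROWSER_UA):
    """Return a urllib Request that carries browser-like headers."""
    headers = {"User-Agent": user_agent}
    if referer:
        # Some sites also check which page the image was requested from.
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def download_image(url, save_as, referer=None):
    """Save url to save_as; return True on success, False on an HTTP error."""
    try:
        request = build_request(url, referer)
        with urllib.request.urlopen(request, timeout=30) as response:
            data = response.read()
    except urllib.error.HTTPError as err:
        # A firewall block would still surface here as 403.
        print(f"{url}: HTTP {err.code} {err.reason}")
        return False
    with open(save_as, "wb") as out:
        out.write(data)
    return True


def download_all(imglinks, folder, referer=None):
    """Download every link into folder; return the paths that were saved."""
    os.makedirs(folder, exist_ok=True)  # unlike os.mkdir, tolerates reruns
    saved = []
    for counter, imglink in enumerate(imglinks):
        save_as = os.path.join(folder, f"an-cuong-plywood{counter}.jpg")
        if download_image(imglink, save_as, referer):
            saved.append(save_as)
    return saved
```

In the script from the question, the final loop could then become download_all(imglinks, path, referer=url). If the server still answers 403, the block is probably tied to something else, such as cookies or request rate. In that case, fetching the images through the Selenium browser session itself would be the next thing to try.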