
Programming Q&A. Published: 2022-06-01. Site: 大佬教程 (code.js-code.com)


How do I create Scrapy spiders that inherit functions from a base spider class in another file?


I'm trying to create smaller spiders that inherit functionality from another spider.

Here is the code for the main spider:

import scrapy


class main(scrapy.Spider):
    # Shared base spider; each site-specific spider supplies its own name and start URLs
    allowed_domains = ['domain.com']

    def parse_product_list(self, response):
        urls = response.xpath("//*[contains(@class,'col-xs-12 col-sm-6 col-lg-3')]/a/@href").getall()
        for url in urls:
            url = "https:" + url  # assumes the product links are protocol-relative ("//...")
            yield scrapy.Request(url, self.parse_product_view, dont_filter=True)
        # Next Page: follow it back into this listing parser (the class defines no parse()),
        # using response.follow so a relative href is resolved against the page URL
        next_page = response.css(".pull-left .pagination a:contains('Next')::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse_product_list, dont_filter=True)

    def parse_product_view(self, response):
        # get(default="") avoids an AttributeError on .replace() when a selector matches nothing
        productSku = response.css("#products_model::text").get(default="").replace("Model #", "")
        productname = response.css("#products_name::text").get(default="").replace(";", '') + "- " + productSku
        productDescription = response.css("#productDescription").get(default="").replace(";", '')

        # MSRP Price
        price_text = response.css("#products_price::text").get(default="")
        if "$" in price_text:
            productPrice = price_text.replace("$", "")
        else:
            productPrice = response.css("#products_price s::text").get(default="").replace("$", "")

        # Special Price (empty string when there is no sale price)
        specialPrice = response.css("#products_price .productSpecialPrice::text").get(default="").replace("Sale:$", "")

        # Product Image
        image = response.css("img.img-responsive.img-fill.product-details-big-thumb::attr(src)").get()
        gallery = response.xpath("//*[@class='product-details-thumb-float']/a/@href").getall()
        if image:
            gallery.insert(0, image)
        gallery = '; '.join(gallery).replace("wm.php/", "")

        # Extra Attribute: initialise first so the merge below also works when the page has none
        productAttributes = {}
        if response.xpath(".//*[@class='product-notes-feature']/li").getall():
            keys = response.xpath(".//*[@class='product-notes-feature']/li/@data-product-feature").getall()
            values = response.xpath(".//*[@class='product-notes-feature']/li/text()").getall()
            for key, value in zip(keys, values):
                productAttributes[key] = value.replace(key + ": ", "")

        export_essential = {
            "sku": productSku,
            "label": productSku,
            "name": productname,
            "description": productDescription,
            "price": productPrice,
            "special_price": specialPrice,
            "image": image,
            "media_gallery": gallery,
        }

        # Copy so that export_essential itself is not modified by the update
        export_all = dict(export_essential)
        export_all.update(productAttributes)

        yield export_all

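Right now I copy and paste the entire base spider file and then change the name and URLs. That isn't an efficient approach: whenever the base spider changes, I have to go back and update every one of the smaller spiders.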

Thanks

Solution

I'm not sure exactly what you're asking, but if you change the first line to:

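class Spider(scrapy.Spider):
    def __init__(self, url, *args, **kwargs):
        super().__init__(*args, **kwargs)  # keep Scrapy's own spider initialisation
        self.allowed_domains = [url]

then you can save this file as spider.py and, in another file, write:

from spider import Spider

# Scrapy requires every spider to have a name; pass one here since the class does not define it
spider1 = Spider('domain.com', name='spider1')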

Is that what you're after?


