Programming Q&A   Published: 2022-06-01   Site: 大佬教程 (code.js-code.com)

How do I fix this? I am inserting scraped data into MySQL, but it raises the following error

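I wrote a Python web-scraping script with Selenium to scrape profiles from the ResearchGate website, and I want to store the results in a MySQL database. I ran into this error and have been trying for a month without finding a way to fix it. Here is the code.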

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import ElementClickInterceptedException
from selenium.common.exceptions import NoSuchElementException
import time
import sys
import mysql.connector


mydb = mysql.connector.connect(
  host="localhost",user="your username",password="your password",db='your database'
)

cur = mydb.cursor()


#create table
cur.execute("""DROP TABLE IF EXISTS Data""")

cur.execute(''' CREATE TABLE IF NOT EXISTS Data
               (ID INT NOT NULL PRIMARY KEY AUTO_INCREMENT,name VARCHAR(20),Institution VARCHAR(255),Department VARCHAR(255),Citations INTEGER,Recommendation INTEGER,Total_Reads INTEGER,Total_research_interest DECIMAL(7,1),Research_items INTEGER,Projects INTEGER,Questions INTEGER,Answers INTEGER,scores INT,Followers INTEGER,Followings INTEGER
               )''')


login_url = 'https://www.researchgate.net/login'
base_url = "https://www.researchgate.net/institution/Islamia_College_Peshawar/department/Department_of_Computer_Science/members"
chrome_driver_path = '/home/danish-khan/scrapers/researchgate/chromedriver'

chrome_options = Options()
#chrome_options.add_argument('--headless')

webdriver = webdriver.Chrome(
  executable_path=chrome_driver_path,options=chrome_options
)

# default login credential and search query
username = 'your username'
password = 'your password'
search_query = "Islamia college Peshawar"
results = []

with webdriver as driver:
    # Set timeout time
    wait = WebDriverWait(driver,10)

    # retrieve url in headless browser
    driver.get(login_url)

    driver.find_element_by_id("input-login").send_keys(username)
    driver.find_element_by_id("input-password").send_keys(password)
    driver.find_element_by_class_name("nova-c-button__label").find_element(By.XPATH,"./..").click()
    time.sleep(5)

    driver.get(base_url)

    time.sleep(10)
    #driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    #names = driver.find_elements_by_css_selector('.display-name')
    name = driver.find_elements_by_xpath('//ul[@class="list people-list-m"]/li//a[@class="display-name"]')
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    print(len(name))
    name_selector = '.nova-e-text--color-grey-900'
    #selector = '.display-name'
    selector = '//ul[@class="list people-list-m"]/li//a[@class="display-name"]'
    #for i in range(0,1):
    for i in range(0,len(name)):
        #driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")

        #time.sleep(5)
        links = WebDriverWait(driver,70).until(
            EC.presence_of_all_elements_located((By.XPATH,selector))
        )

        links[i].click()
        #name_e = WebDriverWait(driver,20).until(
        #    EC.presence_of_element_located((By.CSS_SELECTOR,name_selector))
        #)
        time.sleep(5)
        name = driver.find_element_by_css_selector('.nova-e-text--color-grey-900').text
        try:
            Institution = driver.find_element_by_css_selector('.nova-v-institution-item__title .nova-e-link--theme-bare').text
        except:
            Institution = ''

        try:
            Department = driver.find_element_by_css_selector('.nova-v-institution-item__info-section-list-item .nova-e-link--theme-bare').text
        except:
            Department = ''

        try:
            Citations = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(2) .nova-e-text--size-xl').text
            Citations = int(Citations.replace(",",""))
        except:
            Citations = ''

        try:
            Recommendation = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(3) .nova-e-text--size-xl').text
        except:
            Recommendation = ''

        try:
            Total_Reads = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(4) .nova-e-text--size-xl').text
            Total_Reads = int(Total_Reads.replace(",",""))
        except:
            Total_Reads = ''

        try:
            Total_research_interest = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(1) .nova-e-text--size-xl').text
            Total_research_interest = (Total_research_interest.replace(",",""))
        except:
            Total_research_interest = ''

        try:
            Research_items = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(1) .nova-e-text--color-inherit').text
            Research_items = Research_items
        except:
            Research_items = ''

        try:
            Projects = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(2) .nova-e-text--color-inherit').text
            Projects = Projects
        except:
            Projects = ''

        try:
            Questions = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(3) .nova-e-text--size-xl').text
            Questions = Questions
        except:
            Questions = ''

        try:
            Answers = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(4) .nova-e-text--size-xl').text
            Answers = Answers
        except:
            Answers = ''

        try:
            scores = driver.find_element_by_css_selector('.profile-header-details-meta-items .nova-e-list__item:nth-child(1)').text
            scores = scores
        except:
            scores = 0

        #scores = scores[0]

        try:
            Followings = driver.find_element_by_xpath(xpath = "//*[contains(text(),'Following')]").text.strip('Following').strip('( )')
        except:
            Followings = 0
            print('No  Followers')

        try:
            Followers = driver.find_element_by_xpath(xpath = "//*[contains(text(),'Followers')]").text.strip('Followers').strip('( )')
        except:
            Followers = 0
            print('No  Followers')

        print(scores)
        print(Citations)
        print(Recommendation)
        print(Total_Reads)
        print(Total_research_interest)
        print()

        time.sleep(5)
        driver.back()

        time.sleep(5)
        #driver.close()

        time.sleep(10)

        cur.execute('INSERT INTO Data(name,Institution,Department,Citations,Recommendation,Total_Reads,Total_research_interest,Research_items,Projects,Questions,Answers,scores,Followers,Followings) VALUES("%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s")' % (name,Institution,Department,Citations,Recommendation,Total_Reads,Total_research_interest,Research_items,Projects,Questions,Answers,scores,Followers,Followings))
        #driver.close()
        mydb.commit()
        print('complete.')


mydb.close()
time.sleep(10)

driver.close()

Below is the output and the error from my attempt. I do not understand why this error occurs.

20
17.92
301
28
5406
230.7

complete.
11.54
79
66
3356
92.1

complete.

5
2
392
9.9

Traceback (most recent call last):
  File "/home/danish-khan/scrapers/scrpers/lib/python3.8/site-packages/mysql/connector/connection_cext.py", line 507, in cmd_query
    self._cmysql.query(query,
_mysql_connector.MySQLInterfaceError: Incorrect integer value: '' for column 'scores' at row 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "resgt5.py", line 208, in <module>
    cur.execute('INSERT INTO Data(name,Followings ) )
  File "/home/danish-khan/scrapers/scrpers/lib/python3.8/site-packages/mysql/connector/cursor_cext.py", line 274, in execute
    result = self._cnx.cmd_query(stmt,raw=self._raw,
  File "/home/danish-khan/scrapers/scrpers/lib/python3.8/site-packages/mysql/connector/connection_cext.py", line 511, in cmd_query
    raise errors.get_mysql_exception(exc.errno,msg=exc.msg,
mysql.connector.errors.DatabaseError: 1366 (HY000): Incorrect integer value: '' for column 'scores' at row 1
 

Solution

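Whatever you are assigning to scores in scores = driver.find_element_by_css_selector(...).text is not an integer, i.e. the text value of that element is not an integer. Print it out and you will see what it actually is.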

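MySQL converts text to int automatically, so your code would work if the value in scores looked like an integer. The traceback shows that here it is the empty string '', which MySQL rejects for an INT column when strict SQL mode is enabled.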
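
A minimal sketch of the conversion this implies, assuming the scraped text is either a whole number with optional thousands separators or something unusable. The helper name to_int_or_none is hypothetical and not part of the original script; returning None for unusable text is an assumption, chosen so that the database can store NULL instead of receiving an empty string.

```python
def to_int_or_none(text):
    """Convert scraped text such as '5,406' to an int, or None if it is not a whole number."""
    if text is None:
        return None
    # Drop thousands separators and surrounding whitespace before parsing.
    cleaned = str(text).replace(",", "").strip()
    try:
        return int(cleaned)
    except ValueError:
        # Empty strings and non-integer text (for example '17.92') end up here.
        return None


print(to_int_or_none("5,406"))  # 5406
print(to_int_or_none(""))       # None
print(to_int_or_none("17.92"))  # None
```

Note that a decimal value such as the 17.92 printed in the output above is not a whole number, so this helper returns None for it. Keeping such values would need a DECIMAL column and a decimal conversion instead.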

Personally, if it is an integer I would use VALUES(%d)' % (scores), which may require converting scores to an integer beforehand, so scores = int(driver.find_element_by_css_selector(...).text)
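
As an alternative to building the SQL text with Python's % operator, the values can be passed separately as the second argument of cursor.execute, which mysql-connector-python supports with %s placeholders; a Python None is then sent as SQL NULL. The sketch below only builds the statement and the parameter tuple, so it runs without a database. The function build_insert and the sample row are hypothetical, and only three of the question's columns are used for brevity.

```python
def build_insert(table, row):
    """Return (sql, params) for a parameterized INSERT; row maps column name -> value."""
    columns = list(row)
    # One %s placeholder per column; the driver fills them in, not Python's % operator.
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO {} ({}) VALUES ({})".format(table, ", ".join(columns), placeholders)
    return sql, tuple(row.values())


# Hypothetical sample row: scores is None because no usable value was scraped.
row = {"name": "Example Person", "Citations": 301, "scores": None}
sql, params = build_insert("Data", row)
print(sql)
print(params)
# With an open cursor this would be run as: cur.execute(sql, params)
```

The table and column names are still interpolated into the string, so they should come from the script itself and never from scraped input. Only the values go through the placeholders.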

Your DROP TABLE does not seem to be working either, since it appears to think there is already a table named DATA -> DATA(Name,recommendation,treads,scores )

,
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import ElementClickInterceptedException
import time
import sys
import mysql.connector


mydb = mysql.connector.connect(
  host="localhost",user="your username",password="your password",db='reseachgate_profiles'
)

cur = mydb.cursor()


#create table
cur.execute("""DROP TABLE IF EXISTS Data""")

cur.execute(''' CREATE TABLE IF NOT EXISTS Data
               (Id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,name VARCHAR(20),Institution VARCHAR(255),Department VARCHAR(255),Citations INTEGER,Recommendation INTEGER,Total_Reads INTEGER,Total_research_interest DECIMAL(7,1),Research_items INTEGER,Projects INTEGER,Questions INTEGER,Answers INTEGER,scores INT,Followers INTEGER,Followings INTEGER
               )''')


login_url = 'https://www.researchgate.net/login'
base_url = "https://www.researchgate.net/institution/Islamia_College_Peshawar/department/Department_of_Computer_Science/members"
chrome_driver_path = '/home/danish-khan/scrapers/researchgate/chromedriver'

chrome_options = Options()
#chrome_options.add_argument('--headless')

webdriver = webdriver.Chrome(
  executable_path=chrome_driver_path,options=chrome_options
)

# default login credential and search query
username = 'your username'
password = 'your password'
search_query = "Islamia college Peshawar"
results = []

with webdriver as driver:
    # Set timeout time
    wait = WebDriverWait(driver,10)

    # retrieve url in headless browser
    driver.get(login_url)

    driver.find_element_by_id("input-login").send_keys(username)
    driver.find_element_by_id("input-password").send_keys(password)
    driver.find_element_by_class_name("nova-c-button__label").find_element(By.XPATH,"./..").click()
    time.sleep(5)

    driver.get(base_url)

    time.sleep(10)
    #driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    #names = driver.find_elements_by_css_selector('.display-name')
    name = driver.find_elements_by_xpath('//ul[@class="list people-list-m"]/li//a[@class="display-name"]')
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    print(len(name))
    name_selector = '.nova-e-text--color-grey-900'
    #selector = '.display-name'
    selector = '//ul[@class="list people-list-m"]/li//a[@class="display-name"]'
    #for i in range(0,1):
    for i in range(1,len(name)):
        #driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")

        #time.sleep(5)
        links = WebDriverWait(driver,70).until(
            EC.presence_of_all_elements_located((By.XPATH,selector))
        )

        links[i].click()
        #name_e = WebDriverWait(driver,20).until(
        #    EC.presence_of_element_located((By.CSS_SELECTOR,name_selector))
        #)
        time.sleep(5)
        Name = driver.find_element_by_css_selector('.nova-e-text--size-xl.nova-e-text--color-grey-900').text
        try:
            Institution = driver.find_element_by_css_selector('.nova-v-institution-item__title .nova-e-link--theme-bare').text
        except:
            Institution = 'NA'

        try:
            Department = driver.find_element_by_css_selector('.nova-v-institution-item__info-section-list-item .nova-e-link--theme-bare').text
        except:
            Department = 'NA'

        try:
            Citations = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(2) .nova-e-text--size-xl').text
        except:
            Citations = ''

        try:
            Recommendation = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(3) .nova-e-text--size-xl').text
        except:
            Recommendation = ''

        try:
            Total_Reads = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(4) .nova-e-text--size-xl').text
            Total_Reads = int(Total_Reads.replace(",",""))
        except:
            Total_Reads = ''

        try:
            Total_research_interest = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(1) .nova-e-text--size-xl').text
            Total_research_interest = Total_research_interest[0]
        except:
            Total_research_interest = ''

        try:
            Research_items = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(1) .nova-e-text--color-inherit').text
            Research_items = Research_items[0]
        except:
            Research_items = ''

        try:
            Projects = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(2) .nova-e-text--color-inherit').text
            Projects = Projects[0]
        except:
            Projects = ''

        try:
            Questions = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(3) .nova-e-text--size-xl').text
            Questions = Questions[0]
        except:
            Questions = ''

        try:
            Answers = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(4) .nova-e-text--size-xl').text
            Answers = Answers[0]
        except:
            Answers = ''

        try:
            scores = driver.find_element_by_css_selector('.profile-header-details-meta-items .nova-e-list__item:nth-child(1)').text
        except:
            scores = 0

        #Scores = Scores[0]

        try:
            Followers = driver.find_element_by_xpath('//div[@class="nova-o-stack nova-o-stack--gutter-xxl nova-o-stack--spacing-xxs nova-o-stack--show-divider"]/div[2]/div/div/div/div/div/b').text[11:13]
        except:
            Followers = ''
            print('No  Followers')

        Followings = driver.find_element_by_xpath('//div[@class="nova-e-text nova-e-text--size-m nova-e-text--family-sans-serif nova-e-text--spacing-none nova-e-text--color-inherit"]/b').text[11:13]
        '''
        try:
           Followers = Followers

        except:
           Followers = ''
        '''
        try:
            Followings = Followings
        except:
            Followings = 'NA'

        print(scores)

        time.sleep(5)
        driver.back()

        time.sleep(5)
        #driver.close()

        time.sleep(10)

        cur.execute('INSERT INTO Data(Name,Institution,Department,Citations,Recommendation,Total_Reads,Total_research_interest,Research_items,Projects,Questions,Answers,scores,Followers,Followings) VALUES("%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s")' % (Name,Institution,Department,Citations,Recommendation,Total_Reads,Total_research_interest,Research_items,Projects,Questions,Answers,scores,Followers,Followings))
        #driver.close()
        mydb.commit()
        print('complete.')


mydb.close()
time.sleep(10)

driver.close()

Tags: mysql