I'm inserting scraped data into MySQL, but it raises the following error
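I created a Python web-scraping script with Selenium to scrape profiles from the ResearchGate website, and I want to store the results in a MySQL database. But I ran into this error, have been trying for a month, and still can't find a way to fix it. Here is the code.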
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import ElementClickInterceptedException
from selenium.common.exceptions import NoSuchElementException
import time
import sys
import mysql.connector

mydb = mysql.connector.connect(
    host="localhost", user="your username", password="your password", db='your database'
)
cur = mydb.cursor()
# create table
cur.execute("""DROP TABLE IF EXISTS Data""")
cur.execute('''CREATE TABLE IF NOT EXISTS Data
    (ID INT NOT NULL PRIMARY KEY AUTO_INCREMENT, name VARCHAR(20), Institution VARCHAR(255), Department VARCHAR(255), Citations INTEGER, Recommendation INTEGER, Total_Reads INTEGER, Total_research_interest DECIMAL(7,1), Research_items INTEGER, Projects INTEGER, Questions INTEGER, Answers INTEGER, scores INT, Followers INTEGER, Followings INTEGER
)''')
login_url = 'https://www.researchgate.net/login'
base_url = "https://www.researchgate.net/institution/Islamia_College_Peshawar/department/Department_of_Computer_Science/members"
chrome_driver_path = '/home/danish-khan/scrapers/researchgate/chromedriver'
chrome_options = Options()
#chrome_options.add_argument('--headless')
webdriver = webdriver.Chrome(
    executable_path=chrome_driver_path, options=chrome_options
)
# default login credential and search query
username = 'your username'
password = 'your password'
search_query = "Islamia college Peshawar"
results = []
with webdriver as driver:
    # Set timeout time
    wait = WebDriverWait(driver, 10)
    # retrieve url in headless browser
    driver.get(login_url)
    driver.find_element_by_id("input-login").send_keys(username)
    driver.find_element_by_id("input-password").send_keys(password)
    driver.find_element_by_class_name("nova-c-button__label").find_element(By.XPATH, "./..").click()
    time.sleep(5)
    driver.get(base_url)
    time.sleep(10)
    #driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    #names = driver.find_elements_by_css_selector('.display-name')
    Name = driver.find_elements_by_xpath('//ul[@class="list people-list-m"]/li//a[@class="display-name"]')
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    print(len(Name))
    name_selector = '.nova-e-text--color-grey-900'
    #selector = '.display-name'
    selector = '//ul[@class="list people-list-m"]/li//a[@class="display-name"]'
    #for i in range(0,1):
    for i in range(0, len(Name)):
        #driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
        #time.sleep(5)
        links = WebDriverWait(driver, 70).until(
            EC.presence_of_all_elements_located((By.XPATH, selector))
        )
        links[i].click()
        # name_e = WebDriverWait(driver,20).until(
        #     EC.presence_of_element_located((By.CSS_SELECTOR,name_selector))
        # )
        time.sleep(5)
        name = driver.find_element_by_css_selector('.nova-e-text--color-grey-900').text
        try:
            Institution = driver.find_element_by_css_selector('.nova-v-institution-item__title .nova-e-link--theme-bare').text
        except:
            Institution = ''
        try:
            Department = driver.find_element_by_css_selector('.nova-v-institution-item__info-section-list-item .nova-e-link--theme-bare').text
        except:
            Department = ''
        try:
            Citations = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(2) .nova-e-text--size-xl').text
            Citations = int(Citations.replace(",", ""))
        except:
            Citations = ''
        try:
            Recommendation = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(3) .nova-e-text--size-xl').text
        except:
            Recommendation = ''
        try:
            Total_Reads = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(4) .nova-e-text--size-xl').text
            Total_Reads = int(Total_Reads.replace(",", ""))
        except:
            Total_Reads = ''
        try:
            Total_research_interest = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(1) .nova-e-text--size-xl').text
            Total_research_interest = Total_research_interest.replace(",", "")
        except:
            Total_research_interest = ''
        try:
            Research_items = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(1) .nova-e-text--color-inherit').text
            Research_items = Research_items
        except:
            Research_items = ''
        try:
            Projects = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(2) .nova-e-text--color-inherit').text
            Projects = Projects
        except:
            Projects = ''
        try:
            Questions = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(3) .nova-e-text--size-xl').text
            Questions = Questions
        except:
            Questions = ''
        try:
            Answers = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(4) .nova-e-text--size-xl').text
            Answers = Answers
        except:
            Answers = ''
        try:
            scores = driver.find_element_by_css_selector('.profile-header-details-meta-items .nova-e-list__item:nth-child(1)').text
            scores = scores
        except:
            scores = 0
        #scores = scores[0]
        try:
            Followings = driver.find_element_by_xpath(xpath="//*[contains(text(),'Following')]").text.strip('Following').strip('( )')
        except:
            Followings = 0
            print('No Followers')
        try:
            Followers = driver.find_element_by_xpath(xpath="//*[contains(text(),'Followers')]").text.strip('Followers').strip('( )')
        except:
            Followers = 0
            print('No Followers')
        print(scores)
        print(Citations)
        print(Recommendation)
        print(Total_Reads)
        print(Total_research_interest)
        print()
        time.sleep(5)
        driver.back()
        time.sleep(5)
        #driver.close()
        time.sleep(10)
        cur.execute('INSERT INTO Data(name,Institution,Department,Citations,Recommendation,Total_Reads,Total_research_interest,Research_items,Projects,Questions,Answers,scores,Followers,Followings) VALUES("%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s")' % (name, Institution, Department, Citations, Recommendation, Total_Reads, Total_research_interest, Research_items, Projects, Questions, Answers, scores, Followers, Followings))
        #driver.close()
        mydb.commit()
        print('complete.')
    mydb.close()
    time.sleep(10)
    driver.close()
```
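The `INSERT` above wraps every placeholder in quotes and fills it with Python's `%` operator, so whatever text was scraped is pasted straight into the SQL. A minimal sketch of the usual alternative, a parameterized query, assuming the garbled column names in the original listing are `Total_Reads` and `Total_research_interest`; `COLUMNS` and `build_insert` are hypothetical names, not part of mysql-connector. The driver binds the values itself through unquoted `%s` markers, escaping strings and sending `None` as `NULL`.

```python
# Column order matches the CREATE TABLE statement in the question
# (assuming the garbled names are Total_Reads and Total_research_interest).
COLUMNS = (
    "name", "Institution", "Department", "Citations", "Recommendation",
    "Total_Reads", "Total_research_interest", "Research_items", "Projects",
    "Questions", "Answers", "scores", "Followers", "Followings",
)


def build_insert(table, columns):
    """Build an INSERT with one unquoted %s marker per column.

    Only trusted identifiers (table and column names) are formatted into
    the string; the row values are bound later by the database driver.
    """
    column_list = ", ".join(columns)
    markers = ", ".join(["%s"] * len(columns))
    return "INSERT INTO {} ({}) VALUES ({})".format(table, column_list, markers)


INSERT_SQL = build_insert("Data", COLUMNS)
print(INSERT_SQL)

# Inside the scraping loop, pass the values separately instead of using %:
# row = (name, Institution, Department, Citations, Recommendation,
#        Total_Reads, Total_research_interest, Research_items, Projects,
#        Questions, Answers, scores, Followers, Followings)
# cur.execute(INSERT_SQL, row)
# mydb.commit()
```

Table and column names cannot be bound as parameters, which is why they are formatted in from a fixed tuple while the scraped values are not.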
Below is the output, ending in the error. I have tried several things but don't understand why this error occurs.
```
20
17.92
301
28
5406
230.7
complete.
11.54
79
66
3356
92.1
complete.
5
2
392
9.9
Traceback (most recent call last):
  File "/home/danish-khan/scrapers/scrpers/lib/python3.8/site-packages/mysql/connector/connection_cext.py", line 507, in cmd_query
    self._cmysql.query(query,
_mysql_connector.MySQLInterfaceError: Incorrect integer value: '' for column 'scores' at row 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "resgt5.py", line 208, in <module>
    cur.execute('INSERT INTO Data(name,Followings ) )
  File "/home/danish-khan/scrapers/scrpers/lib/python3.8/site-packages/mysql/connector/cursor_cext.py", line 274, in execute
    result = self._cnx.cmd_query(stmt,raw=self._raw,
  File "/home/danish-khan/scrapers/scrpers/lib/python3.8/site-packages/mysql/connector/connection_cext.py", line 511, in cmd_query
    raise errors.get_mysql_exception(exc.errno,msg=exc.msg,
mysql.connector.errors.DatabaseError: 1366 (HY000): Incorrect integer value: '' for column 'scores' at row 1
```
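A small reproduction of why MySQL reports `''`, using hypothetical values for one profile: an empty `scores` and `Citations = 5`. (The third profile prints only four numbers before the traceback, which suggests its `scores` text was empty, consistent with the error message.) Python's `%` operator splices the values into the SQL text, so MySQL receives `""` for the `INT` column `scores`; with strict SQL mode, which is enabled by default in MySQL 5.7 and later, that is rejected with error 1366.

```python
# Hypothetical values for one profile: scores came back as empty text.
scores = ""
Citations = 5

# The question's approach: Python's % operator splices the values into the SQL text
# before MySQL ever sees the statement.
sql = 'INSERT INTO Data(scores,Citations) VALUES("%s","%s")' % (scores, Citations)
print(sql)  # INSERT INTO Data(scores,Citations) VALUES("","5")
```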
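Whatever you assign to `scores` in `scores = driver.find_element_by_css_selector(...).text` is not an integer, i.e. the element's text value is not an integer. Print it out and you'll see what it actually is. If the value in `scores` is an `int`, your code will work.

Personally, if it is an integer, I would use `VALUES(%d)' % (scores)`, which may require you to convert `scores` to an integer beforehand, so `scores = int(driver.find_element_by_css_selector...`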
Your `DROP TABLE` also doesn't seem to work, since it seems to think there is already a table named `DATA`: `DATA(Name,recommendation,treads,scores )`
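A minimal sketch of the drop-then-create step the script intends, so each run starts from an empty `Data` table. It uses Python's built-in `sqlite3` in memory purely as a stand-in so it runs without a MySQL server: the shortened schema is hypothetical, SQLite's `INTEGER PRIMARY KEY` replaces MySQL's `AUTO_INCREMENT`, and SQLite uses `?` rather than `%s` as its parameter marker.

```python
import sqlite3

# Same order as the question: drop the old table first, then create it.
# Shortened, hypothetical schema; SQLite stands in for MySQL here.
RESET_STATEMENTS = (
    "DROP TABLE IF EXISTS Data",
    "CREATE TABLE IF NOT EXISTS Data (ID INTEGER PRIMARY KEY, name VARCHAR(20), scores INT)",
)


def reset_table(cur):
    """Drop and recreate the Data table so old rows do not survive a rerun."""
    for statement in RESET_STATEMENTS:
        cur.execute(statement)


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
reset_table(cur)
cur.execute("INSERT INTO Data (name, scores) VALUES (?, ?)", ("example", None))
reset_table(cur)  # a second run drops the table, including that row
row_count = cur.execute("SELECT COUNT(*) FROM Data").fetchone()[0]
print(row_count)  # 0
conn.close()
```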
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import ElementClickInterceptedException
import time
import sys
import mysql.connector

mydb = mysql.connector.connect(
    host="localhost", user="your username", password="your password", db='reseachgate_profiles'
)
cur = mydb.cursor()
# create table
cur.execute("""DROP TABLE IF EXISTS Data""")
cur.execute('''CREATE TABLE IF NOT EXISTS Data
    (Id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, name VARCHAR(20), Institution VARCHAR(255), Department VARCHAR(255), Citations INTEGER, Recommendation INTEGER, Total_Reads INTEGER, Total_research_interest DECIMAL(7,1), Research_items INTEGER, Projects INTEGER, Questions INTEGER, Answers INTEGER, scores INT, Followers INTEGER, Followings INTEGER
)''')
login_url = 'https://www.researchgate.net/login'
base_url = "https://www.researchgate.net/institution/Islamia_College_Peshawar/department/Department_of_Computer_Science/members"
chrome_driver_path = '/home/danish-khan/scrapers/researchgate/chromedriver'
chrome_options = Options()
#chrome_options.add_argument('--headless')
webdriver = webdriver.Chrome(
    executable_path=chrome_driver_path, options=chrome_options
)
# default login credential and search query
username = 'your username'
password = 'your password'
search_query = "Islamia college Peshawar"
results = []
with webdriver as driver:
    # Set timeout time
    wait = WebDriverWait(driver, 10)
    # retrieve url in headless browser
    driver.get(login_url)
    driver.find_element_by_id("input-login").send_keys(username)
    driver.find_element_by_id("input-password").send_keys(password)
    driver.find_element_by_class_name("nova-c-button__label").find_element(By.XPATH, "./..").click()
    time.sleep(5)
    driver.get(base_url)
    time.sleep(10)
    #driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    #names = driver.find_elements_by_css_selector('.display-name')
    Name = driver.find_elements_by_xpath('//ul[@class="list people-list-m"]/li//a[@class="display-name"]')
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    print(len(Name))
    name_selector = '.nova-e-text--color-grey-900'
    #selector = '.display-name'
    selector = '//ul[@class="list people-list-m"]/li//a[@class="display-name"]'
    #for i in range(0,1):
    for i in range(1, len(Name)):
        #driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
        #time.sleep(5)
        links = WebDriverWait(driver, 70).until(
            EC.presence_of_all_elements_located((By.XPATH, selector))
        )
        links[i].click()
        # name_e = WebDriverWait(driver,20).until(
        #     EC.presence_of_element_located((By.CSS_SELECTOR,name_selector))
        # )
        time.sleep(5)
        Name = driver.find_element_by_css_selector('.nova-e-text--size-xl.nova-e-text--color-grey-900').text
        try:
            Institution = driver.find_element_by_css_selector('.nova-v-institution-item__title .nova-e-link--theme-bare').text
        except:
            Institution = 'NA'
        try:
            Department = driver.find_element_by_css_selector('.nova-v-institution-item__info-section-list-item .nova-e-link--theme-bare').text
        except:
            Department = 'NA'
        try:
            Citations = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(2) .nova-e-text--size-xl').text
        except:
            Citations = ''
        try:
            Recommendation = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(3) .nova-e-text--size-xl').text
        except:
            Recommendation = ''
        try:
            Total_Reads = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(4) .nova-e-text--size-xl').text
            Total_Reads = int(Total_Reads.replace(",", ""))
        except:
            Total_Reads = ''
        try:
            Total_research_interest = driver.find_element_by_css_selector('.application-box-layout__item--m:nth-child(1) .nova-e-text--size-xl').text,
            Total_research_interest = Total_research_interest[0]
        except:
            Total_research_interest = ''
        try:
            Research_items = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(1) .nova-e-text--color-inherit').text,
            Research_items = Research_items[0]
        except:
            Research_items = ''
        try:
            Projects = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(2) .nova-e-text--color-inherit').text
            Projects = Projects[0]
        except:
            Projects = ''
        try:
            Questions = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(3) .nova-e-text--size-xl').text,
            Questions = Questions[0]
        except:
            Questions = ''
        try:
            Answers = driver.find_element_by_css_selector('.application-box-layout__item--xs:nth-child(4) .nova-e-text--size-xl').text,
            Answers = Answers[0]
        except:
            Answers = ''
        try:
            scores = driver.find_element_by_css_selector('.profile-header-details-meta-items .nova-e-list__item:nth-child(1)').text
        except:
            scores = 0
        #Scores = Scores[0]
        try:
            Followers = driver.find_element_by_xpath('//div[@class="nova-o-stack nova-o-stack--gutter-xxl nova-o-stack--spacing-xxs nova-o-stack--show-divider"]/div[2]/div/div/div/div/div/b').text[11:13]
        except:
            Followers = ''
            print('No Followers')
        Followings = driver.find_element_by_xpath('//div[@class="nova-e-text nova-e-text--size-m nova-e-text--family-sans-serif nova-e-text--spacing-none nova-e-text--color-inherit"]/b').text[11:13]
        '''
        try:
            Followers = Followers
        except:
            Followers = ''
        '''
        try:
            Followings = Followings
        except:
            Followings = 'NA'
        print(scores)
        time.sleep(5)
        driver.back()
        time.sleep(5)
        #driver.close()
        time.sleep(10)
        cur.execute('INSERT INTO Data(Name,Institution,Department,Citations,Recommendation,Total_Reads,Total_research_interest,Research_items,Projects,Questions,Answers,scores,Followers,Followings) VALUES("%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s","%s")' % (Name, Institution, Department, Citations, Recommendation, Total_Reads, Total_research_interest, Research_items, Projects, Questions, Answers, scores, Followers, Followings))
        #driver.close()
        mydb.commit()
        print('complete.')
    mydb.close()
    time.sleep(10)
    driver.close()
```
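This version reads the follower counts with `.text[11:13]`, which only works when the number has exactly two digits and starts at character 11. A minimal sketch of a more tolerant parser; `extract_count` is a hypothetical helper, and the sample strings assume the element text looks like `Followers (66)`, which the earlier `.strip('Followers').strip('( )')` version suggests but the post does not confirm.

```python
import re


def extract_count(text):
    """Return the first whole number in text such as 'Followers (1,024)', or None.

    Unlike text[11:13], this does not depend on the label length or on the
    count having exactly two digits; thousands separators are removed.
    """
    match = re.search(r"\d[\d,]*", text or "")
    if match is None:
        return None
    return int(match.group().replace(",", ""))


print(extract_count("Followers (66)"))     # 66
print(extract_count("Following (1,024)"))  # 1024
print(extract_count("Followers"))          # None
```

Returning `None` rather than `''` when no number is found keeps the value compatible with an `INTEGER` column once it is passed as a query parameter.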