PHP - Check whether a URL is valid
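I am checking URLs and returning "valid" if the URL's status code is 200 and "invalid" if it is 404.
The URL is a link that redirects to some page (URL), and I need to check the status of that page (URL) to decide, based on its status code, whether it is valid.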
<?php
// Original URL, used to obtain the redirected URL
$url = 'https://www.shareaSALE.com/m-PR.cfm?merchantID=83483&userID=1860618&productID=916465625';
// Initialize a cURL session.
$ch = curl_init();
// Set the URL to request.
curl_setopt($ch, CURLOPT_URL, $url);
// Capture the output (do NOT print it).
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Follow Location redirects.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
// Get the effective URL after any redirects.
$redirectedUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
// Close the handle.
curl_close($ch);
echo "Original URL: " . $url . "<br/><br/>";
echo "Redirected URL: " . $redirectedUrl . "<br/>";
function is_url_valid($url) {
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    // Send a HEAD request; no body is downloaded.
    curl_setopt($handle, CURLOPT_NOBODY, true);
    curl_exec($handle);
    $httpCode = intval(curl_getinfo($handle, CURLINFO_HTTP_CODE));
    curl_close($handle);
    if ($httpCode == 200) {
        return 'valid link';
    }
    else {
        return 'invalid link';
    }
}
// Check the status of the redirected URL.
echo "<br/>" . is_url_valid($redirectedUrl) . "<br/>";
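As you can see, the link above has status 400, yet the script still reports "valid". I am using the code above; any ideas or corrections to make it work as expected? It seems the site has more than one redirect URL and the script checks only one of them, which is why it reports valid. Any idea how to fix it?
This is the link I am checking.
Problem -
For example, if I open this link https://www.shareaSALE.com/m-PR.cfm?merchantID=66802&userID=1860618&productID=1186005518 in the browser it ends up on a "404", but in the script output it is "200".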
Solutions
I use the get_headers() function for this. If I find a 2xx status in the array, the URL is OK.
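function urlExists($url) {
    // get_headers() returns false on failure; @ suppresses the warning.
    $headers = @get_headers($url);
    if ($headers === false) return false;
    // True if any status line in the response headers carries a 2xx code.
    return preg_grep('~^HTTP/\d+\.\d+\s+2\d{2}~', $headers) ? true : false;
}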
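When a URL redirects, get_headers() returns the status lines of every hop in one array, so a check for "any 2xx" and a check for "the last status is 2xx" can differ. The following is a minimal sketch of the stricter variant, working on an array shaped like the one get_headers() returns. The function names finalStatusCode and finalStatusIs2xx are hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: returns the status code of the LAST status line
// found in a get_headers()-style array, or null if there is none.
function finalStatusCode(array $headers): ?int {
    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found".
        if (is_string($line) && preg_match('~^HTTP/\d+(?:\.\d+)?\s+(\d{3})~i', $line, $m)) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code;
}

// Hypothetical helper: true only if the final response was a 2xx.
function finalStatusIs2xx(array $headers): bool {
    $code = finalStatusCode($headers);
    return $code !== null && $code >= 200 && $code < 300;
}
```

With a live URL this could be called as finalStatusIs2xx(@get_headers($url) ?: []), which treats a failed request as "not OK".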
,
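Here is my take on this problem. Basically, the key points are:
CURLOPT_FOLLOWLOCATION will do all the work for you; in the end, the HTTP response code you get is the one from the final request when one or more redirects occur.
You are using CURLOPT_NOBODY, so the request uses the HEAD method and returns no body. For that reason, CURLOPT_RETURNTRANSFER is useless.
...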
<?php
$linksToCheck = [
    'https://click.linksynergy.com/link?id=GsILx6E5APM&offerid=547531.5112&type=15&murl=https%3A%2F%2Fwww.peopletree.co.uk%2Fwomen%2Fdresses%2FAnna-checked-dress',
    'https://click.linksynergy.com/link?id=GsILx6E5APM&offerid=330522.2335&type=15&murl=https%3A%2F%2Fwww.wearethought.com%2Fagnetha-black-floral-print-bamboo-dress-midnight-navy%2F%2392%3D1390%26142%3D198',
    'https://click.linksynergy.com/link?id=GsILx6E5APM&offerid=330522.752&type=15&murl=https%3A%2F%2Fwww.wearethought.com%2Fbernice-floral-tunic-dress%2F%2392%3D1273%26142%3D198',
    'https://click.linksynergy.com/link?id=GsILx6E5APM&offerid=330522.6863&type=15&murl=https%3A%2F%2Fwww.wearethought.com%2Fjosefa-smock-shift-dress-in-midnight-navy-hemp%2F%2392%3D1390%26142%3D208',
    'https://www.shareaSALE.com/m-PR.cfm?merchantID=16570&userID=1860618&productID=546729471',
    'https://www.shareaSALE.com/m-PR.cfm?merchantID=53661&userID=1860618&productID=680698793',
    'https://www.shareaSALE.com/m-PR.cfm?merchantID=66802&userID=1860618&productID=1186005518',
    'https://www.shareaSALE.com/m-PR.cfm?merchantID=83483&userID=1860618&productID=916465625',
];
function isValidUrl($url) {
    echo "Original URL: " . $url . "<br/>\n";
    $handle = curl_init($url);
    // Follow any redirection.
    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
    // Use a HEAD request and do not return a body.
    curl_setopt($handle, CURLOPT_NOBODY, true);
    // Execute the request.
    curl_exec($handle);
    // Get the effective URL.
    $effectiveUrl = curl_getinfo($handle, CURLINFO_EFFECTIVE_URL);
    echo "Effective URL: " . $effectiveUrl . "<br/><br/>";
    // Status code of the final response after all redirects.
    $httpResponseCode = (int) curl_getinfo($handle, CURLINFO_HTTP_CODE);
    // Close this request.
    curl_close($handle);
    if ($httpResponseCode == 200) {
        return '✅';
    }
    else {
        return '❌';
    }
}
foreach ($linksToCheck as $linkToCheck) {
    echo PHP_EOL . "Result: " . isValidUrl($linkToCheck) . PHP_EOL . PHP_EOL;
}
,
Note: we use CURLOPT_NOBODY only to check the connection, rather than fetching the whole body.
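$url = "Your URL";
$curl = curl_init($url);
// HEAD request: check the connection without downloading the body.
curl_setopt($curl, CURLOPT_NOBODY, true);
$result = curl_exec($curl);
if ($result !== false)
{
    $statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    if ($statusCode == 404)
    {
        echo "URL does not exist";
    }
    else
    {
        echo "URL exists";
    }
}
else
{
    // The request itself failed (DNS error, timeout, refused connection, ...).
    echo "URL does not exist";
}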
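The snippet above treats every status other than 404 as "exists", which would also accept a 400 or a 500. As a rough sketch of a stricter rule, the status code can be mapped to a verdict in one place. The function name classifyStatus and the three labels are assumptions made for illustration; the only cURL-specific fact relied on is that curl_getinfo() reports 0 as the HTTP code when no response was received.

```php
<?php
// Hypothetical helper: turn an HTTP status code (as returned by
// curl_getinfo($handle, CURLINFO_HTTP_CODE)) into a verdict.
function classifyStatus(int $code): string {
    if ($code === 0) {
        // cURL reports 0 when no HTTP response was received at all.
        return 'unreachable';
    }
    if ($code >= 200 && $code < 300) {
        return 'valid';
    }
    // Everything else (3xx left unfollowed, 4xx, 5xx) counts as invalid here.
    return 'invalid';
}
```

Whether 3xx responses should count as invalid depends on whether CURLOPT_FOLLOWLOCATION is enabled; with it on, a final 3xx usually means the redirect chain could not be completed.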
,
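The code below works fine, but when I put the URLs into an array and test the same functionality, it does not give correct results. Any idea why? Also, if anybody wants to update the answer to make it dynamic (that is, check several URLs at once when given an array of URLs), please do.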
<?php
// URL to check
$url = 'https://www.shareaSALE.com/m-PR.cfm?merchantID=66802&userID=1860618&productID=1186005518';
$ch = curl_init(); // Initialize a cURL session.
curl_setopt($ch, CURLOPT_URL, $url); // Set the URL to request.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Capture the output (do NOT print it).
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Follow Location redirects.
$html = curl_exec($ch);
$redirectedUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // Effective URL after redirects.
curl_close($ch); // Close the handle.
// Look for a further (JavaScript) redirect target in the redirected page.
$get_final_url = get_final_url($redirectedUrl);
if ($get_final_url) {
    echo is_url_valid($get_final_url);
} else {
    echo $redirectedUrl ? is_url_valid($redirectedUrl) : is_url_valid($url);
}
function is_url_valid($url) {
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    // Send a HEAD request; no body is downloaded.
    curl_setopt($handle, CURLOPT_NOBODY, true);
    curl_exec($handle);
    $httpCode = intval(curl_getinfo($handle, CURLINFO_HTTP_CODE));
    curl_close($handle);
    echo $httpCode;
    if ($httpCode == 200) {
        return '<b> Valid link </b>';
    }
    else {
        return '<b> Invalid link </b>';
    }
}
function get_final_url($url) {
    $ch = curl_init();
    if (!$ch) {
        return false;
    }
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    // Return the response as a string; preg_match() below needs it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $ret = curl_exec($ch);
    $info = curl_getinfo($ch);
    curl_close($ch);
    if (empty($ret) || empty($info['http_code'])) {
        return false;
    }
    // Look for a redirect target of the form https:...') in the response.
    if (!preg_match('#(https:.*?)\'\)#', $ret, $match)) {
        return false;
    }
    // Undo the JavaScript escaping (\/ becomes /) and drop stray whitespace.
    return trim(stripslashes($match[1]));
}
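For the request to check an array of URLs in one go, one possible shape is a loop that maps each URL to a boolean. This is only a sketch under assumptions: the names fetchStatusWithCurl and checkUrls are hypothetical, "valid" is taken to mean a final status of exactly 200 as in the code above, and the status fetcher is passed in as a callable so the loop can be exercised without network access.

```php
<?php
// Hypothetical default fetcher: HEAD request that follows redirects and
// returns the final HTTP status code (0 if no response was received).
function fetchStatusWithCurl(string $url): int {
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($handle, CURLOPT_NOBODY, true);
    curl_setopt($handle, CURLOPT_TIMEOUT, 30);
    curl_exec($handle);
    return (int) curl_getinfo($handle, CURLINFO_HTTP_CODE);
}

// Hypothetical helper: returns [url => true|false] for every URL given.
// $fetchStatus can be swapped for a stub, or for a fetcher that also
// follows JavaScript redirects.
function checkUrls(array $urls, ?callable $fetchStatus = null): array {
    $fetchStatus = $fetchStatus ?? 'fetchStatusWithCurl';
    $results = [];
    foreach ($urls as $url) {
        $results[$url] = ($fetchStatus($url) === 200);
    }
    return $results;
}
```

Because every URL gets its own handle and its own result entry, one failing link cannot affect the verdict for the others. The requests still run one after another; running them in parallel would need curl_multi_* and is not shown here.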
,
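See, the problem here is that you want to follow JavaScript redirects.
The URL you are complaining about, https://www.shareaSALE.com/m-PR.cfm?merchantID=66802&userID=1860618&productID=1186005518
does redirect to a URL that responds with HTTP 200 OK,
and that page contains this JavaScript: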
<script LANGUAGE="JavaScript1.2">
window.location.replace('https:\/\/www.tenthousandvillages.com\/bicycle-statue?sscid=71k5_4yt9r ')
</script>
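So your browser, which understands JavaScript, follows the JavaScript redirect, and the target of that JS redirect is a 404 page. Unfortunately, there is no good way to do this from PHP; your best bet is probably a headless web browser such as PhantomJS, Puppeteer, Selenium, or something similar.
Still, you can search for the JavaScript redirect with a regex and hope for the best, e.g.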
<?php
function is_url_valid(string $url): bool {
    if (0 !== strncasecmp($url, "http", strlen("http"))) {
        // file:///etc/passwd and stuff like that aren't considered valid urls right?
        return false;
    }
    $ch = curl_init();
    if (!curl_setopt_array($ch, array(
        CURLOPT_URL => $url,
        CURLOPT_FOLLOWLOCATION => 1,
        CURLOPT_RETURNTRANSFER => 1
    ))) {
        // best guess: the url is so malformed that even CURLOPT_URL didn't accept it.
        return false;
    }
    $resp = curl_exec($ch);
    if (false === $resp) {
        return false;
    }
    if (curl_getinfo($ch, CURLINFO_RESPONSE_CODE) != 200) {
        // only http 200 OK is accepted
        return false;
    }
    // attempt to detect javascript redirects... sigh
    // window.location.replace('https:\/\/www.tenthousandvillages.com\/bicycle-statue?sscid=71k5_4yt9r ')
    $rex = '/location\.replace\s*\(\s*(?<redirect>(?:\'|\")[\s\S]*?(?:\'|\"))/';
    if (!preg_match($rex, $resp, $matches)) {
        // no javascript redirects detected..
        return true;
    } else {
        // javascript redirect detected..
        $url = trim($matches["redirect"]);
        // javascript allows both ' and " for strings, but json only allows " for strings
        $url = str_replace("'", '"', $url);
        // we extracted it from javascript, so it needs decoding. Strictly speaking it needs
        // javascript decoding, but json decoding is probably sufficient, and we only have a
        // json decoder nearby. Throws JsonException if the literal is not valid JSON.
        $url = json_decode($url, true, 512, JSON_THROW_ON_ERROR);
        curl_close($ch);
        // the literal may carry stray whitespace, as in the example above
        return is_url_valid(trim($url));
    }
}
var_dump(
    is_url_valid('https://www.shareaSALE.com/m-PR.cfm?merchantID=66802&userID=1860618&productID=1186005518'),
    is_url_valid('http://example.org'),
    is_url_valid('http://example12k34jr43r5ehjegeesfmwefdc.org'),
);
But to put it mildly, this is a dodgy, hacky solution.
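The redirect detection can also be pulled out into a small function that works on a string only, which makes it possible to try the regex without any network request. The sketch below is an illustration under assumptions, not the answer's own code: the name extractJsRedirect is hypothetical, only location.replace('...') calls with a plain string literal are recognized, and redirects built with string concatenation or location.href assignments are not covered.

```php
<?php
// Hypothetical helper: returns the target of the first
// location.replace('...') call found in $html, or null if there is none.
function extractJsRedirect(string $html): ?string {
    // Group 1 captures the quote character, group 2 the string contents.
    $pattern = '/location\.replace\s*\(\s*([\'"])(.*?)\1/s';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    $raw = $m[2];
    // Try to decode JavaScript escapes such as \/ by treating the contents
    // as a JSON string; fall back to stripslashes() if that fails.
    $decoded = json_decode('"' . $raw . '"');
    if (!is_string($decoded)) {
        $decoded = stripslashes($raw);
    }
    return trim($decoded);
}
```

A caller could feed the body of the 200 response into extractJsRedirect() and, when it returns a URL, check that URL's status in turn, ideally with a limit on how many hops are followed so that two pages redirecting to each other cannot loop forever.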