Programming Q&A   Published: 2022-06-02   Site: 大佬教程 (code.js-code.com)
How do I check in PHP whether a URL is valid?

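I'm checking URLs and want to return "valid" if the URL's status code is 200 and "invalid" if it is 404.

Each URL is a link that redirects to another page (URL), and I need to check the status of that final page to decide, based on its status code, whether it is valid.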

<?php

// URL whose redirect target we want to find
$url = 'https://www.shareaSALE.com/m-PR.cfm?merchantID=83483&userID=1860618&productID=916465625';

// Initialize a cURL session.
$ch = curl_init();

// Set the URL to fetch.
curl_setopt($ch, CURLOPT_URL, $url);

// Return the response as a string instead of printing it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Follow HTTP Location: redirects.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);

// The effective URL is the last URL reached after following redirects.
$redirectedUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

// Close the handle.
curl_close($ch);
echo "Original URL:   " . $url . "<br/><br/>";
echo "Redirected URL: " . $redirectedUrl . "<br/>";

function is_url_valid($url) {
  $handle = curl_init($url);
  curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($handle, CURLOPT_NOBODY, true);
  curl_exec($handle);

  $httpCode = intval(curl_getinfo($handle, CURLINFO_HTTP_CODE));
  curl_close($handle);

  if ($httpCode == 200) {
    return 'valid link';
  }
  else {
    return 'invalid link';
  }
}

echo "<br/>" . is_url_valid($redirectedUrl) . "<br/>";

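As you can see, the link above has status 400 but the script still shows "valid". That is the code I'm using; any ideas or corrections to make it work as expected? It seems the site goes through more than one redirect URL while the script checks only one, which is why it shows valid. Any idea how to fix it?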

Here are the links I'm checking:

  • https://click.linksynergy.com/link?id=GsILx6E5APM&offerid=547531.5112&type=15&murl=https%3A%2F%2Fwww.peopletree.co.uk%2Fwomen%2Fdresses%2FAnna-checked-dress
  • https://click.linksynergy.com/link?id=GsILx6E5APM&offerid=330522.2335&type=15&murl=https%3A%2F%2Fwww.wearethought.com%2Fagnetha-black-floral-print-bamboo-dress-midnight-navy%2F%2392%3D1390%26142%3D198
  • https://click.linksynergy.com/link?id=GsILx6E5APM&offerid=330522.752&type=15&murl=https%3A%2F%2Fwww.wearethought.com%2Fbernice-floral-tunic-dress%2F%2392%3D1273%26142%3D198
  • https://click.linksynergy.com/link?id=GsILx6E5APM&offerid=330522.6863&type=15&murl=https%3A%2F%2Fwww.wearethought.com%2Fjosefa-smock-shift-dress-in-midnight-navy-hemp%2F%2392%3D1390%26142%3D208
  • https://www.shareaSALE.com/m-PR.cfm?merchantID=16570&userID=1860618&productID=546729471
  • https://www.shareaSALE.com/m-PR.cfm?merchantID=53661&userID=1860618&productID=680698793
  • https://www.shareaSALE.com/m-PR.cfm?merchantID=66802&userID=1860618&productID=1186005518
  • https://www.shareaSALE.com/m-PR.cfm?merchantID=83483&userID=1860618&productID=916465625

Problem:

For example, if I open https://www.shareaSALE.com/m-PR.cfm?merchantID=66802&userID=1860618&productID=1186005518 in the browser, it ends up on a 404 page, but the script outputs 200.

Solutions

I use the get_headers() function for this. If I find a 2xx status in the returned array, the URL is OK.

function urlExists($url){
  $headers = @get_headers($url);
  if($headers === false) return false;
  // Status lines look like "HTTP/1.1 200 OK"; there is one per response in the redirect chain.
  return preg_grep('~^HTTP/\d+\.\d+\s+2\d{2}~',$headers) ? true : false;
}
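With its default stream context, get_headers() follows Location: redirects and returns the status line of every response in the chain, in order (for example `HTTP/1.1 302 Found` followed later by `HTTP/1.1 404 Not Found`). Matching any 2xx line asks "did some hop succeed?" rather than "did the final response succeed?", and it throws away which status the final hop actually returned. A minimal sketch that reads only the last status line; the names `final_status_code()` and `url_exists_final()` are hypothetical, and like urlExists() it still cannot see JavaScript redirects.

```php
<?php
// Hypothetical helper: status code of the LAST response in a get_headers()
// result (the redirect chain is listed in order), or null if none is found.
function final_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $key => $line) {
        // Status lines have integer keys and look like "HTTP/1.1 200 OK".
        if (is_int($key) && preg_match('~^HTTP/\d+(?:\.\d+)?\s+(\d{3})~', $line, $m)) {
            $code = (int) $m[1]; // keep overwriting so the last hop wins
        }
    }
    return $code;
}

// Hypothetical wrapper: true only when the final response is 2xx.
function url_exists_final(string $url): bool
{
    $headers = @get_headers($url);
    if ($headers === false) {
        return false;
    }
    $code = final_status_code($headers);
    return $code !== null && $code >= 200 && $code < 300;
}
```

Skipping non-integer keys lets the helper also accept the associative form of get_headers(), where status lines keep integer keys and named headers are keyed by name.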
----

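Here's my take on this problem. Basically, the key points are:

  1. You don't need to make multiple requests. CURLOPT_FOLLOWLOCATION does all the work for you, and when one or more redirects occur, the HTTP response code you get at the end comes from the final request.
  2. Because you're using CURLOPT_NOBODY, the request uses the HEAD method and returns no body, so CURLOPT_RETURNTRANSFER is pointless.
  3. I took the liberty of using my own coding style (no offense).
  4. Since I ran the code from a PhpStorm scratch file, I added some PHP_EOL line breaks to format the output. Feel free to remove them.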

...

<?php

$linksToCheck = [
    'https://click.linksynergy.com/link?id=GsILx6E5APM&offerid=547531.5112&type=15&murl=https%3A%2F%2Fwww.peopletree.co.uk%2Fwomen%2Fdresses%2FAnna-checked-dress',
    'https://click.linksynergy.com/link?id=GsILx6E5APM&offerid=330522.2335&type=15&murl=https%3A%2F%2Fwww.wearethought.com%2Fagnetha-black-floral-print-bamboo-dress-midnight-navy%2F%2392%3D1390%26142%3D198',
    'https://click.linksynergy.com/link?id=GsILx6E5APM&offerid=330522.752&type=15&murl=https%3A%2F%2Fwww.wearethought.com%2Fbernice-floral-tunic-dress%2F%2392%3D1273%26142%3D198',
    'https://click.linksynergy.com/link?id=GsILx6E5APM&offerid=330522.6863&type=15&murl=https%3A%2F%2Fwww.wearethought.com%2Fjosefa-smock-shift-dress-in-midnight-navy-hemp%2F%2392%3D1390%26142%3D208',
    'https://www.shareaSALE.com/m-PR.cfm?merchantID=16570&userID=1860618&productID=546729471',
    'https://www.shareaSALE.com/m-PR.cfm?merchantID=53661&userID=1860618&productID=680698793',
    'https://www.shareaSALE.com/m-PR.cfm?merchantID=66802&userID=1860618&productID=1186005518',
    'https://www.shareaSALE.com/m-PR.cfm?merchantID=83483&userID=1860618&productID=916465625',
];

function isValidUrl($url) {
    echo "Original URL:   " . $url . "<br/>\n";

    $handle = curl_init($url);

    // Follow any redirection.
    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);

    // Use a HEAD request and do not return a body.
    curl_setopt($handle, CURLOPT_NOBODY, true);

    // Execute the request.
    curl_exec($handle);

    // Get the effective URL (the last one reached after redirects).
    $effectiveUrl = curl_getinfo($handle, CURLINFO_EFFECTIVE_URL);
    echo "Effective URL:   " . $effectiveUrl . "<br/><br/>";

    $httpResponseCode = (int) curl_getinfo($handle, CURLINFO_HTTP_CODE);

    // Close this request.
    curl_close($handle);

    if ($httpResponseCode == 200) {
        return '✅';
    }
    else {
        return '❌';
    }
}

foreach ($linksToCheck as $linkToCheck) {
    echo PHP_EOL . "Result: " . isValidUrl($linkToCheck) . PHP_EOL . PHP_EOL;
}
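The points in this answer (follow redirects inside a single request, and send a HEAD request via CURLOPT_NOBODY) combine well with a redirect cap and timeouts, so that a redirect loop or a hanging server cannot stall the loop over the links. A minimal sketch, assuming PHP 7.1+ with the cURL extension; `check_url_once()`, `is_success_code()` and the limits (5 redirects, 10 seconds) are illustrative choices, not part of the original answer. CURLINFO_REDIRECT_COUNT reports how many HTTP redirects were followed. Some servers answer HEAD differently from GET (for instance with 405 Method Not Allowed); for those, removing CURLOPT_NOBODY turns the check into a GET.

```php
<?php
// Sketch: one cURL request that follows redirects and reports what happened.
// Hypothetical function name; the limits below are illustrative assumptions.
function check_url_once(string $url): array
{
    $handle = curl_init($url);
    curl_setopt_array($handle, [
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP Location: redirects
        CURLOPT_MAXREDIRS      => 5,    // give up on redirect loops
        CURLOPT_NOBODY         => true, // HEAD request, no body
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $ok = curl_exec($handle) !== false;

    $result = [
        'code'          => (int) curl_getinfo($handle, CURLINFO_RESPONSE_CODE),
        'effective_url' => curl_getinfo($handle, CURLINFO_EFFECTIVE_URL),
        'redirects'     => (int) curl_getinfo($handle, CURLINFO_REDIRECT_COUNT),
        'error'         => $ok ? null : curl_error($handle),
    ];
    curl_close($handle);
    return $result;
}

// Accept the whole 2xx range (an assumption; the answer above accepts 200 only).
function is_success_code(int $code): bool
{
    return $code >= 200 && $code < 300;
}
```

A code of 0 in the result means no HTTP response arrived at all (DNS failure, refused connection, timeout); the 'error' field then carries cURL's message.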
----

Note: we use CURLOPT_NOBODY only to check the connection, without fetching the whole body.

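$url = "Your URL";
$curl = curl_init($url);
// HEAD request: only the status code matters, not the body.
curl_setopt($curl, CURLOPT_NOBODY, true);
$result = curl_exec($curl);
if ($result !== false)
{
    $statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    if ($statusCode == 404)
    {
        echo "URL does not exist";
    }
    else
    {
        echo "URL exists";
    }
}
else
{
    // The request itself failed (DNS error, refused connection, timeout...).
    echo "URL does not exist";
}
curl_close($curl);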
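The snippet above prints "URL exists" for every status except 404, so a 403, a 500, or an unfollowed 302 (it does not set CURLOPT_FOLLOWLOCATION) all count as existing. A small sketch of a stricter mapping from the cURL outcome to a verdict; `describe_result()` and its labels are hypothetical, and treating 410 Gone like 404 is an assumption.

```php
<?php
// Hypothetical helper: turn a cURL outcome into a human-readable verdict.
// $transportOk is false when curl_exec() returned false (DNS failure,
// refused connection, timeout...), in which case there is no HTTP status.
function describe_result(bool $transportOk, int $statusCode): string
{
    if (!$transportOk || $statusCode === 0) {
        return 'no HTTP response';
    }
    if ($statusCode >= 200 && $statusCode < 300) {
        return 'exists';
    }
    if ($statusCode >= 300 && $statusCode < 400) {
        return 'redirect (not followed)';
    }
    if ($statusCode === 404 || $statusCode === 410) {
        return 'not found';
    }
    return "error ($statusCode)";
}
```

In the snippet above it would be called as `describe_result($result !== false, (int) curl_getinfo($curl, CURLINFO_HTTP_CODE))`.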
----

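The code below works fine, but when I put the URLs in an array and test the same function, it doesn't give correct results. Any idea why? Also, it would help if anybody could update the answer to make it dynamic, i.e. check several URLs at once when given an array of URLs.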

<?php

// URL to check
$url = 'https://www.shareaSALE.com/m-PR.cfm?merchantID=66802&userID=1860618&productID=1186005518';

$ch = curl_init(); // Initialize a cURL session.
curl_setopt($ch, CURLOPT_URL, $url); // Set the URL to fetch.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return output as a string (do NOT print!)
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Follow HTTP Location: redirects
$html = curl_exec($ch);
$redirectedUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // Last URL reached after HTTP redirects
curl_close($ch); // Close handle

$get_final_url = get_final_url($redirectedUrl);
if ($get_final_url) {
    echo is_url_valid($get_final_url);
} else {
    echo $redirectedUrl ? is_url_valid($redirectedUrl) : is_url_valid($url);
}

function is_url_valid($url) {
  $handle = curl_init($url);
  curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($handle, CURLOPT_NOBODY, true);
  curl_exec($handle);

  $httpCode = intval(curl_getinfo($handle, CURLINFO_HTTP_CODE));
  curl_close($handle);
  echo $httpCode;
  if ($httpCode == 200) {
    return '<b> Valid link </b>';
  }
  else {
    return '<b> Invalid link </b>';
  }
}

// Fetches $url and returns the target of a JavaScript
// location.replace('https:...') redirect found in the page, or false.
function get_final_url($url) {
    $ch = curl_init();
    if (!$ch) {
        return false;
    }
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // needed so curl_exec() returns the page
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $ret = curl_exec($ch);
    $info = curl_getinfo($ch);
    curl_close($ch);

    if (empty($ret) || empty($info['http_code'])) {
        return false;
    }
    // Look for a JavaScript redirect such as location.replace('https:\/\/...')
    if (!preg_match('#(https:.*?)\'\)#', $ret, $match)) {
        return false;
    }
    return trim(stripslashes($match[1]));
}
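For the request above to check several URLs at once, the cURL multi interface can run the HEAD requests concurrently instead of one after another. A sketch under these assumptions: the function name `check_urls_parallel()` and the redirect and timeout limits are illustrative, and it reports only the final HTTP status after Location: redirects, so JavaScript redirects are still invisible to it.

```php
<?php
// Sketch: check many URLs in parallel with curl_multi.
// Returns [url => final HTTP status code]; 0 means no HTTP response.
function check_urls_parallel(array $urls): array
{
    $urls = array_unique($urls); // one handle per distinct URL
    if ($urls === []) {
        return [];
    }
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $h = curl_init($url);
        curl_setopt_array($h, [
            CURLOPT_FOLLOWLOCATION => true, // follow HTTP Location: redirects
            CURLOPT_MAXREDIRS      => 5,
            CURLOPT_NOBODY         => true, // HEAD request
            CURLOPT_RETURNTRANSFER => true, // never print anything
            CURLOPT_TIMEOUT        => 15,
        ]);
        curl_multi_add_handle($multi, $h);
        $handles[$url] = $h;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0); // wait up to 1 s for activity
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $url => $h) {
        $results[$url] = (int) curl_getinfo($h, CURLINFO_RESPONSE_CODE);
        curl_multi_remove_handle($multi, $h);
        curl_close($h);
    }
    curl_multi_close($multi);
    return $results;
}
```

The returned array is keyed by URL, so each entry can go through the same 200-or-not test as before, for example `foreach (check_urls_parallel($urls) as $u => $code) { echo $u, ': ', $code === 200 ? 'valid' : 'invalid', PHP_EOL; }`.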
----

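Look, the problem here is that you want to follow JavaScript redirects. The URL you're complaining about, https://www.shareaSALE.com/m-PR.cfm?merchantID=66802&userID=1860618&productID=1186005518, really does redirect to a URL that responds with HTTP 200 OK, and that page contains this JavaScript: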

<script LANGUAGE="JavaScript1.2">
                window.location.replace('https:\/\/www.tenthousandvillages.com\/bicycle-statue?sscid=71k5_4yt9r ')
                </script>

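So your browser, which understands JavaScript, follows the JavaScript redirect, and the target of that redirect is a 404 page. Unfortunately, there is no good way to do this from PHP; your best bet is probably a headless web browser such as PhantomJS, Puppeteer, Selenium, or something similar.

Still, you can search for JavaScript redirects with a regex and hope for the best, for example: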

<?php
function is_url_valid(string $url): bool {
    if (0 !== strncasecmp($url, "http", strlen("http"))) {
        // file:///etc/passwd and stuff like that aren't considered valid urls, right?
        return false;
    }
    $ch = curl_init();
    if (!curl_setopt_array($ch, array(
        CURLOPT_URL => $url, CURLOPT_FOLLOWLOCATION => 1, CURLOPT_RETURNTRANSFER => 1
    ))) {
        // best guess: the url is so malformed that even CURLOPT_URL didn't accept it.
        return false;
    }
    $resp = curl_exec($ch);
    if (false === $resp) {
        return false;
    }
    if (curl_getinfo($ch, CURLINFO_RESPONSE_CODE) != 200) {
        // only HTTP 200 OK is accepted
        return false;
    }
    // attempt to detect javascript redirects... sigh
    // window.location.replace('https:\/\/www.tenthousandvillages.com\/bicycle-statue?sscid=71k5_4yt9r ')
    $rex = '/location\.replace\s*\(\s*(?<redirect>(?:\'|\")[\s\S]*?(?:\'|\"))/';
    if (!preg_match($rex, $resp, $matches)) {
        // no javascript redirects detected..
        return true;
    } else {
        // javascript redirect detected..
        $url = trim($matches["redirect"]);
        // javascript allows both ' and " for strings, but json only allows " for strings
        $url = str_replace("'", '"', $url);
        // we extracted it from javascript, so it needs json decoding (strictly speaking it needs
        // javascript decoding, but json decoding is probably sufficient, and we only have a json decoder nearby)
        $url = json_decode($url, true, 512, JSON_THROW_ON_ERROR);
        curl_close($ch);
        return is_url_valid(trim($url)); // trim: the extracted URL may carry stray whitespace
    }
}
var_dump(
    is_url_valid('https://www.shareaSALE.com/m-PR.cfm?merchantID=66802&userID=1860618&productID=1186005518'),
    is_url_valid('http://example.org'),
    is_url_valid('http://example12k34jr43r5ehjegeesfmwefdc.org'),
);

But, to put it mildly, this is a dodgy, hacky solution.
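The recursive is_url_valid() above has no limit on how many JavaScript redirects it follows, so a page that redirects to itself would never terminate. A sketch of the same heuristic with a hop limit and a visited-URL check, assuming a hypothetical `$fetch` callable that returns `[statusCode, body]` for a URL, so the redirect logic can be exercised without network access. The names `extract_js_redirect()` and `is_url_valid_limited()` are hypothetical; the patterns cover `location.replace(...)`, `location.assign(...)`, `location.href = ...` and `location = ...`, which is still only a regex guess at what the JavaScript does, relative targets are not resolved, and `stripslashes()` only approximates JavaScript string unescaping.

```php
<?php
// Hypothetical helper: find a JavaScript redirect target in an HTML body.
// Heuristic only; relative targets are returned as-is.
function extract_js_redirect(string $html): ?string
{
    $patterns = [
        '~location\.(?:replace|assign)\s*\(\s*([\'"])(.*?)\1~i',
        '~location(?:\.href)?\s*=\s*([\'"])(.*?)\1~i',
    ];
    foreach ($patterns as $rex) {
        if (preg_match($rex, $html, $m)) {
            // stripslashes() undoes simple JS escapes such as \/ and \' (approximation).
            return trim(stripslashes($m[2]));
        }
    }
    return null;
}

// Hypothetical: follow JavaScript redirects at most $maxHops times.
// $fetch(string $url) must return [int $status, string $body] for the page
// reached after ordinary HTTP redirects.
function is_url_valid_limited(string $url, callable $fetch, int $maxHops = 5): bool
{
    $seen = [];
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        if (isset($seen[$url])) {
            return false; // redirects back to a URL already visited
        }
        $seen[$url] = true;

        [$status, $body] = $fetch($url);
        if ($status !== 200) {
            return false; // only HTTP 200 OK counts, as in the answer above
        }
        $next = extract_js_redirect($body);
        if ($next === null) {
            return true; // no JavaScript redirect: this is the final page
        }
        $url = $next;
    }
    return false; // gave up after too many JavaScript redirects
}

// One possible real $fetch built on cURL (the timeout value is an assumption).
$curlFetch = function (string $url): array {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 15,
    ]);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);
    return [$status, is_string($body) ? $body : ''];
};
```

With `$curlFetch`, checking the ShareASale link would fetch its page, pick up the `location.replace()` target, and return false if that target answers 404, assuming the page still behaves as described above.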

This content was collected from the web for learning and reference; copyright belongs to the original authors.