Programming Q&A. Published: 2022-06-01. Site: 大佬教程 code.js-code.com

How do I recover the http link from a <link/> tag?

I am trying to recover web links from an RSS page. I am using Python 3, requests, and BeautifulSoup4 on Windows 10. My code is as follows:

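import requests
from bs4 import BeautifulSoup

RSS = "http://www.example.com/xml/RSS/all.xml"
myheaders = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0'}
# Fetch the feed (5 s connect timeout, 10 s read timeout)
sourcePage = requests.get(RSS, headers=myheaders, timeout=(5, 10))
sourceText = sourcePage.text
# Parse the response with Python's built-in HTML parser
soup = BeautifulSoup(sourceText, 'html.parser')
Articles = soup.find_all('item')
# Print the title, link and publication date tags of every item
for i in Articles:
    title = i.title
    link = i.link
    Pub = i.pubdate
    print('title: ', title)
    print('link: ', link)
    print('Pub: ', Pub)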

This prints the following:

title:  <title>There is some text here</title>
link:  <link/>
Pub:  <pubdate>Sat, 06 Feb 2021 10:22:41 +0000</pubdate>

A single item in Articles has the following form:

<item>
<link/>https://www.example.com/news/2021/2/6/blahblah
                <title>Some Title TEXT here</title>
<description><![CDATA[Some text here&#039; and here.]]></description>
<pubdate>Sat, 06 Feb 2021 11:58:23 +0000</pubdate>
<category>News</category>
<guid ispermalink="false">https://www.example.com/?t=1234567</guid>
</item>

The problem is with

<link/>

because it is not captured in the proper form, that is

<link>...</link>

When I open the same link (the RSS above) in a browser (Firefox), the link tag is displayed correctly:

<item>
<link>
https://www.example.com/blah/blah
</link>
<title>
Some Title TEXT here.
</title>
<description>
Some description here.
</description>
<pubDate>Sun, 07 Feb 2021 08:03:48 +0000</pubDate>
<category>News</category>
<guid isPermaLink="false">https://www.example.com/?t=123456</guid>
</item>

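I guess the problem is that I am using html.parser on an XML page. If I need to use an XML parser instead, could you advise which one to use with Python 3? The code will run on a Raspberry Pi, but I am developing on Windows 10.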

Thanks in advance for your solution!

Solution

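html.parser treats <link> as an HTML void element (one that cannot have content), so the <link></link> tag is converted to <link/> and the URL ends up as the text node immediately after it. You therefore need to use .next_sibling to get the link you need. The code looks like this: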

...
for i in Articles:
    title = i.title
    # The URL is the text node that directly follows the empty <link/> tag
    Link = i.link.next_sibling
    Pub = i.pubdate
    print('title: ', title)
    print('Link: ', Link)
    print('Pub: ', Pub)
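
For anyone who wants to try the .next_sibling approach without hitting a live feed, here is a minimal self-contained sketch. The SAMPLE_FEED string and the extract_items helper are made up for illustration (modelled on the item shown in the question), so treat it as one possible way to wrap the idea, not as the only one. The text node after <link/> usually carries trailing whitespace, which is why it is stripped.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample feed, modelled on the item shown in the question.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item>
<link>https://www.example.com/news/2021/2/6/blahblah</link>
<title>Some Title TEXT here</title>
<pubDate>Sat, 06 Feb 2021 11:58:23 +0000</pubDate>
</item>
</channel></rss>"""


def extract_items(feed_text):
    """Return title, link and pubdate strings for every <item> in the feed."""
    soup = BeautifulSoup(feed_text, "html.parser")
    rows = []
    for item in soup.find_all("item"):
        # html.parser emits <link/> and leaves the URL as the following text node.
        sibling = item.link.next_sibling
        link = sibling.strip() if isinstance(sibling, NavigableString) else ""
        rows.append({
            "title": item.title.get_text(strip=True),
            "link": link,
            # html.parser lowercases tag names, so <pubDate> is reached as .pubdate.
            "pubdate": item.pubdate.get_text(strip=True),
        })
    return rows


if __name__ == "__main__":
    for row in extract_items(SAMPLE_FEED):
        print(row)
```

With a real feed you would pass sourcePage.text from the question's code to extract_items instead of SAMPLE_FEED.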

In addition, if you only want the title and Pub without the tags, use .text (for example, i.title.text).
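
The question also asks which XML parser could be used. One option that needs no extra installation (which may matter on a Raspberry Pi) is the standard library's xml.etree.ElementTree. The sketch below is only an illustration under assumptions: SAMPLE_RSS and parse_items are hypothetical names, and the sample assumes the feed is well-formed RSS shaped like the item in the question. BeautifulSoup can also parse XML via BeautifulSoup(text, "xml"), but that mode requires the lxml package to be installed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, modelled on the item shown in the question.
SAMPLE_RSS = """<rss version="2.0">
<channel>
<item>
<link>https://www.example.com/news/2021/2/6/blahblah</link>
<title>Some Title TEXT here</title>
<pubDate>Sat, 06 Feb 2021 11:58:23 +0000</pubDate>
</item>
</channel>
</rss>"""


def parse_items(feed):
    """Return title, link and pubDate text for every <item> element."""
    root = ET.fromstring(feed)
    rows = []
    for item in root.iter("item"):
        rows.append({
            # An XML parser keeps the URL inside <link>, so no sibling lookup is needed.
            "link": (item.findtext("link") or "").strip(),
            "title": (item.findtext("title") or "").strip(),
            # XML tag names are case-sensitive: pubDate, not pubdate.
            "pubDate": (item.findtext("pubDate") or "").strip(),
        })
    return rows


if __name__ == "__main__":
    for row in parse_items(SAMPLE_RSS):
        print(row)
```

With a real feed, passing sourcePage.content (bytes) to parse_items should let the parser honour the encoding declared in the XML itself.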