python – lxml xpath returns an empty list

<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" class="pc chrome win psc_dir-ltr psc_form-xlarge" dir="ltr" lang="en">
<title>Some Title</title>
</html>

If I run:

from lxml import etree
html = etree.parse('text.txt')
result = html.xpath('//title')
print(result)

I get an empty list.
I suspect it has something to do with the namespace, but I can't figure out how to fix it.

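Best answer
Try building the tree with the HTML parser. etree.parse uses the XML parser, which honors the xmlns declaration, so the title element ends up in the XHTML namespace and the unprefixed '//title' does not match it.
Also note that html.fromstring expects the markup itself rather than a file name, so if text.txt is a file, you need to read it first.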

with open('text.txt', 'r', encoding='utf8') as f:
    text_html = f.read()

Then build the tree like this:

from lxml import etree, html

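def build_lxml_tree(_html):
    # The HTML parser does not place elements in the XHTML namespace
    tree = html.fromstring(_html)
    # Wrap the root element in an ElementTree to get a document-level tree
    tree = etree.ElementTree(tree)
    return tree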

tree = build_lxml_tree(text_html)
result = tree.xpath('//title')
print(result)
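
If you would rather keep the XML parser, another option is to bind the namespace to a prefix in the XPath call. The following is a minimal sketch, not part of the original answer: the document is inlined as a string (with the class attribute dropped for brevity) instead of being read from text.txt, and the prefix `x` and the constant `XHTML_NS` are arbitrary names chosen for illustration.

```python
from lxml import etree

# Namespace URI declared by the xmlns attribute on <html>
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Inline copy of the sample document (assumption: same structure as text.txt)
doc = (
    '<!DOCTYPE html>'
    '<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en">'
    '<title>Some Title</title>'
    '</html>'
)

# The XML parser keeps the namespace on every element
root = etree.fromstring(doc)

# An unprefixed name only matches elements in no namespace
plain = root.xpath('//title')

# Mapping a prefix to the XHTML namespace makes the match succeed
namespaced = root.xpath('//x:title', namespaces={'x': XHTML_NS})
titles = [el.text for el in namespaced]

print(plain)
print(titles)
```

The prefix only has to agree between the expression and the `namespaces` mapping; it does not need to appear in the document. Which approach fits better probably depends on whether the input is guaranteed to be well-formed XHTML: the XML parser rejects malformed markup, while the HTML parser is lenient.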
