I'm trying to parse an XML document with multiple namespaces using lxml, and I'm stuck on getting the findall() method to return anything.
My XML:
<measurementrecords xmlns="http://www.company.com/common/rsp/2012/07" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.company.com/common/rsp/2012/07 rsp_ews_v1.6.xsd">
  <historyrecords>
    <valueitemid>100_0000100004_3788_resource-0.customid_wsx data precip type</valueitemid>
    <list>
      <historyrecord>
        <value>60</value>
        <state>valid</state>
        <timestamp>2016-04-20T12:40:00Z</timestamp>
      </historyrecord>
    </list>
  </historyrecords>
  <historyrecords>
</measurementrecords>
My code:
from lxml import etree
from pprint import pprint

rspxmlfile = '/home/user/desktop/100_0000100004_3788_20160420144011263_records.xml'

with open(rspxmlfile, 'rt') as f:
    tree = etree.parse(f)

root = tree.getroot()

for node in tree.findall('measurementrecords', root.nsmap):
    print node
    print "parameter = ", node.text
Gives:
ValueError: empty namespace prefix is not supported in ElementPath
Some experiments I tried after reading this:
>>> root.nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', None: 'http://www.company.com/common/rsp/2012/07'}
>>> nsmap['foo'] = nsmap[None]
>>> nsmap.pop(None)
'http://www.company.com/common/rsp/2012/07'
>>> nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 'foo': 'http://www.company.com/common/rsp/2012/07'}
>>> tree.xpath("//measurementrecords", namespaces=nsmap)
[]
>>> tree.xpath('/foo:measurementrecords', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}measurementrecords at 0x6ffffda5290>]
>>> tree.xpath('/foo:measurementrecords/historyrecords', namespaces=nsmap)
[]
But that didn't seem to help.
So, more experiments:
>>> tree.findall('//{http://www.company.com/common/rsp/2012/07}measurementrecords')
[]
>>> print root
<Element {http://www.company.com/common/rsp/2012/07}measurementrecords at 0x6ffffda5290>
>>> print tree
<lxml.etree._ElementTree object at 0x6ffffda5368>
>>> for node in tree.iter():
...     print node
...
<Element {http://www.company.com/common/rsp/2012/07}measurementrecords at 0x6ffffda5290>
<Element {http://www.company.com/common/rsp/2012/07}historyrecords at 0x6ffffda5cf8>
<Element {http://www.company.com/common/rsp/2012/07}valueitemid at 0x6ffffda5f38>
...etc...
>>> tree.findall("//historyrecords", namespaces=nsmap)
[]
>>> tree.findall("//foo:measurementrecords/historyrecords", namespaces=nsmap)
[]
I'm stumped. I have no idea what's wrong.
If I start with this:
>>> tree = etree.parse(open('data.xml'))
>>> root = tree.getroot()
This will fail to find any elements...
>>> root.findall('{http://www.company.com/common/rsp/2012/07}measurementrecords')
[]
...but that's because root is itself the measurementrecords element; it does not contain any measurementrecords elements. On the other hand, the following works just fine:
>>> root.findall('{http://www.company.com/common/rsp/2012/07}historyrecords')
[<Element {http://www.company.com/common/rsp/2012/07}historyrecords at 0x7fccd0332ef0>]
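The root-versus-children distinction can be reproduced with a small, self-contained sketch. The XML string here is a trimmed-down stand-in for the document in the question (the value `abc` is made up), and the import falls back to the standard library's xml.etree.ElementTree, which uses the same find/findall path syntax, in case lxml is not installed:

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same find/findall paths

NS = "http://www.company.com/common/rsp/2012/07"

# Trimmed-down stand-in for the question's document; 'abc' is a placeholder.
XML = (
    '<measurementrecords xmlns="%s">'
    "<historyrecords><valueitemid>abc</valueitemid></historyrecords>"
    "</measurementrecords>" % NS
)

root = etree.fromstring(XML)

# The root *is* the measurementrecords element ...
root_tag = root.tag
# ... so searching its children for another measurementrecords finds nothing.
same_name = root.findall("{%s}measurementrecords" % NS)
# Direct children are matched by their namespace-qualified name.
children = root.findall("{%s}historyrecords" % NS)
# A leading './/' searches all descendants, at any depth.
deep = root.findall(".//{%s}valueitemid" % NS)

print(root_tag)
print(len(same_name), len(children), len(deep))
```

The `.//` form is probably what the `//...` attempts in the question were reaching for: it is relative to the element the search starts from.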
Using the xpath method, you can do this:
>>> nsmap = {'a': 'http://www.company.com/common/rsp/2012/07',
...          'b': 'http://www.w3.org/2001/XMLSchema-instance'}
>>> root.xpath('//a:historyrecords', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}historyrecords at 0x7fccd0332ef0>]
So:
- The findall and find methods work with the {...namespace...}elementname syntax.
- The xpath method requires namespace prefixes (ns:elementname), which it looks up in the provided namespaces map. The prefix doesn't have to match the prefix used in the original document, but the namespace URL must match.
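As a rough illustration of the point about prefixes and URLs: find and findall also take a prefix map through their namespaces argument, so the same lookup can be written either way. In this sketch the prefix `rsp`, the placeholder value `abc`, and the `http://example.com/other` URL are all invented for the example; the import falls back to the standard library's ElementTree, which shares this part of the API:

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same find/findall API

NS = "http://www.company.com/common/rsp/2012/07"

# Trimmed-down stand-in for the question's document; 'abc' is a placeholder.
XML = (
    '<measurementrecords xmlns="%s">'
    "<historyrecords><valueitemid>abc</valueitemid></historyrecords>"
    "</measurementrecords>" % NS
)
root = etree.fromstring(XML)

# 'rsp' is a made-up prefix; the document itself only uses a default namespace.
nsmap = {"rsp": NS}

# The same element, addressed with {namespace}name and with prefix:name.
by_clark = root.find("{%s}historyrecords/{%s}valueitemid" % (NS, NS))
by_prefix = root.find("rsp:historyrecords/rsp:valueitemid", namespaces=nsmap)

# A prefix bound to a different URL matches nothing: the URL is what counts.
wrong_url = root.findall(
    "rsp:historyrecords", namespaces={"rsp": "http://example.com/other"}
)

# An unprefixed name means "no namespace", so it does not match either.
bare = root.findall("historyrecords")

print(by_clark.text, by_prefix.text, wrong_url, bare)
```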
So this works:
>>> root.find('{http://www.company.com/common/rsp/2012/07}historyrecords/{http://www.company.com/common/rsp/2012/07}valueitemid')
<Element {http://www.company.com/common/rsp/2012/07}valueitemid at 0x7fccd0332a70>
Or this works:
>>> root.xpath('/a:measurementrecords/a:historyrecords/a:valueitemid', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}valueitemid at 0x7fccd0330830>]