xpath - Python lxml findall with multiple namespaces


I'm trying to parse an XML document with multiple namespaces using lxml, and I'm stuck on getting the findall() method to return something.

My XML:

<measurementrecords xmlns="http://www.company.com/common/rsp/2012/07"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xsi:schemaLocation="http://www.company.com/common/rsp/2012/07 rsp_ews_v1.6.xsd">
    <historyrecords>
        <valueitemid>100_0000100004_3788_resource-0.customid_wsx data precip type</valueitemid>
        <list>
            <historyrecord>
                <value>60</value>
                <state>valid</state>
                <timestamp>2016-04-20T12:40:00Z</timestamp>
            </historyrecord>
        </list>
    </historyrecords>
</measurementrecords>

My code:

from lxml import etree
from pprint import pprint

rspxmlfile = '/home/user/desktop/100_0000100004_3788_20160420144011263_records.xml'

with open(rspxmlfile, 'rt') as f:
    tree = etree.parse(f)

root = tree.getroot()

for node in tree.findall('measurementrecords', root.nsmap):
    print node
    print "parameter = ", node.text

This gives:

ValueError: empty namespace prefix not supported in ElementPath

Some experiments I've tried after reading this:

>>> root.nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', None: 'http://www.company.com/common/rsp/2012/07'}
>>> nsmap = root.nsmap
>>> nsmap['foo'] = nsmap[None]
>>> nsmap.pop(None)
'http://www.company.com/common/rsp/2012/07'
>>> nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 'foo': 'http://www.company.com/common/rsp/2012/07'}
>>> tree.xpath("//measurementrecords", namespaces=nsmap)
[]
>>> tree.xpath('/foo:measurementrecords', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}measurementrecords at 0x6ffffda5290>]
>>> tree.xpath('/foo:measurementrecords/historyrecords', namespaces=nsmap)
[]

But that didn't seem to help.

So, more experiments:

>>> tree.findall('//{http://www.company.com/common/rsp/2012/07}measurementrecords')
[]
>>> print root
<Element {http://www.company.com/common/rsp/2012/07}measurementrecords at 0x6ffffda5290>
>>> print tree
<lxml.etree._ElementTree object at 0x6ffffda5368>
>>> for node in tree.iter():
...     print node
...
<Element {http://www.company.com/common/rsp/2012/07}measurementrecords at 0x6ffffda5290>
<Element {http://www.company.com/common/rsp/2012/07}historyrecords at 0x6ffffda5cf8>
<Element {http://www.company.com/common/rsp/2012/07}valueitemid at 0x6ffffda5f38>
...etc...
>>> tree.findall("//historyrecords", namespaces=nsmap)
[]
>>> tree.findall("//foo:measurementrecords/historyrecords", namespaces=nsmap)
[]

I'm stumped. I have no idea what's wrong.

If I start with this:

>>> tree = etree.parse(open('data.xml'))
>>> root = tree.getroot()

This will fail to find any elements...

>>> root.findall('{http://www.company.com/common/rsp/2012/07}measurementrecords')
[]

...but that's because root is the measurementrecords element; it does not contain any measurementrecords elements. On the other hand, the following works just fine:
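To see why searching for the root's own name comes back empty, it can help to inspect the root directly. The following is a minimal sketch in Python 3 syntax, assuming a trimmed-down inline document rather than the real file; `etree.QName` is used only to split the qualified tag for display.

```python
from lxml import etree

# Assumption: a cut-down stand-in for the document in the question.
root = etree.fromstring(
    b'<measurementrecords xmlns="http://www.company.com/common/rsp/2012/07">'
    b"<historyrecords/></measurementrecords>"
)

# The root's own tag already carries the namespace in {uri}localname form.
print(root.tag)

# QName splits a qualified tag into its namespace URI and local name.
qname = etree.QName(root)
print(qname.namespace, qname.localname)

# findall searches below the element it is called on, so the children are
# candidates for a match but the element itself is not.
children = [etree.QName(child).localname for child in root]
print(children)
```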

>>> root.findall('{http://www.company.com/common/rsp/2012/07}historyrecords')
[<Element {http://www.company.com/common/rsp/2012/07}historyrecords at 0x7fccd0332ef0>]

Using the xpath method, you can do this:

>>> nsmap={'a': 'http://www.company.com/common/rsp/2012/07',
... 'b': 'http://www.w3.org/2001/XMLSchema-instance'}
>>> root.xpath('//a:historyrecords', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}historyrecords at 0x7fccd0332ef0>]

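So:

  • The findall and find methods accept the {...namespace...}elementname syntax. They also accept prefixed names together with a namespaces map, but passing root.nsmap fails in the question because that map holds the default namespace under the key None.
  • The xpath method requires namespace prefixes (ns:elementname), which it looks up in the provided namespaces map. The prefix doesn't have to match the prefix used in the original document, but the namespace URL must match.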
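The two points above can be put side by side in a short sketch. It is written in Python 3 syntax and assumes a trimmed-down inline document in place of the real file; the `example-id` value and the prefix `a` are placeholders chosen for illustration.

```python
from lxml import etree

# Assumption: a cut-down stand-in for the document in the question.
XML = b"""<measurementrecords xmlns="http://www.company.com/common/rsp/2012/07">
  <historyrecords>
    <valueitemid>example-id</valueitemid>
  </historyrecords>
</measurementrecords>"""

NS = "http://www.company.com/common/rsp/2012/07"

root = etree.fromstring(XML)

# find/findall: spell out the namespace in {uri}localname form.
by_findall = root.findall("{%s}historyrecords" % NS)

# xpath: bind the same URI to any prefix and use prefix:localname.
by_xpath = root.xpath("//a:historyrecords", namespaces={"a": NS})

# An unprefixed name in XPath means "no namespace", so nothing matches here.
unqualified = root.xpath("//historyrecords")

print(len(by_findall), len(by_xpath), len(unqualified))
```

Both lookups reach the same element; only the way the namespace is spelled differs.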

So this works:

>>> root.find('{http://www.company.com/common/rsp/2012/07}historyrecords/{http://www.company.com/common/rsp/2012/07}valueitemid')
<Element {http://www.company.com/common/rsp/2012/07}valueitemid at 0x7fccd0332a70>

Or this works:

>>> root.xpath('/a:measurementrecords/a:historyrecords/a:valueitemid',namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}valueitemid at 0x7fccd0330830>]
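If you would rather not type the namespace map by hand, one option is to derive it from root.nsmap and move the default namespace to a prefix of your choosing. This is only a sketch in Python 3 syntax: `prefixed_nsmap` is a hypothetical helper, not part of lxml, and the inline document is a cut-down stand-in for the real file.

```python
from lxml import etree

# Assumption: a cut-down stand-in for the document in the question.
root = etree.fromstring(
    b'<measurementrecords xmlns="http://www.company.com/common/rsp/2012/07" '
    b'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    b"<historyrecords><valueitemid>example-id</valueitemid></historyrecords>"
    b"</measurementrecords>"
)


def prefixed_nsmap(element, default_prefix="a"):
    """Hypothetical helper: copy element.nsmap, renaming the None key to a prefix."""
    return {
        (default_prefix if prefix is None else prefix): uri
        for prefix, uri in element.nsmap.items()
    }


nsmap = prefixed_nsmap(root)

# Every step of the path needs the prefix.
hits = root.xpath(
    "/a:measurementrecords/a:historyrecords/a:valueitemid", namespaces=nsmap
)

# A bare step is looked up in no namespace, which is why the question's
# '/foo:measurementrecords/historyrecords' attempt returned an empty list.
misses = root.xpath("/a:measurementrecords/historyrecords", namespaces=nsmap)

print([el.text for el in hits], misses)
```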
