Parsing XML with lxml in Django – Multiple Namespaces and XPath

A few days ago I was trying to figure out how to parse XML that uses multiple namespaces, and to pull information out of it with XPath, inside a Django view. I came across lxml, which I think is really good for this.

You don't have to csrf_exempt this view, as it is GET-based and therefore not subject to CSRF protection; I am only doing it for consistency with the rest of my code. I am using the Primo Web Services brief search here as an example, but you may not be able to open this URL yourself as it is access-restricted. Also, this may not be the best way to do it, so if you can think of improvements, please let me know.

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt
import simplejson as json
import urllib
from lxml import etree


@csrf_exempt
def brief_search(request):
    errors = []
    if request.method == 'GET':
        searchTerms = request.GET.get('query')
        bulkSize = request.GET.get('pageSize')
        indx = request.GET.get('start')
        if indx:
            # Primo's indx parameter is 1-based; the client sends a 0-based offset
            indx = int(indx) + 1
            # Default (unprefixed) namespace of the Primo bibliographic records
            DEFAULT_NS = 'http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib'
            # URL-escape the user-supplied terms (they may contain spaces etc.)
            query = 'any,contains,' + urllib.quote(searchTerms)
            url = ('http://solo.bodleian.ox.ac.uk/PrimoWebServices/xservice/search/brief'
                   '?institution=OX&onCampus=false&dym=false&highlight=false&lang=eng'
                   '&query=' + query + '&indx=' + str(indx) + '&bulkSize=' + bulkSize)
            content = urllib.urlopen(url)
            xml = etree.parse(content)
            # XPath needs a prefix for every namespace, including the default one,
            # so the unprefixed record namespace is bound to 'def' here
            nsmap = {'sear': 'http://www.exlibrisgroup.com/xsd/jaguar/search',
                     'def': DEFAULT_NS}
            docset = xml.getroot().xpath(
                '//sear:SEGMENTS/sear:JAGROOT/sear:RESULT/sear:DOCSET',
                namespaces=nsmap)
            totalhits = docset[0].get("TOTALHITS")
            docs = xml.getroot().xpath(
                '//sear:SEGMENTS/sear:JAGROOT/sear:RESULT/sear:DOCSET'
                '/sear:DOC/def:PrimoNMBib/def:record',
                namespaces=nsmap)
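The snippet stops before the view returns anything. Here is a minimal sketch of one way to send the results back, assuming you just want the hit count and the raw record XML as JSON; the docs_to_json_response helper is made up for illustration, and etree.tostring keeps it independent of the inner structure of a Primo record.

import simplejson as json
from django.http import HttpResponse
from lxml import etree


def docs_to_json_response(totalhits, docs):
    # Hypothetical helper: wrap the hit count and the matched record elements
    # (serialised back to XML strings) in a JSON response.
    payload = {
        'totalhits': totalhits,
        'records': [etree.tostring(doc, encoding='unicode') for doc in docs],
    }
    return HttpResponse(json.dumps(payload), content_type='application/json')

Inside brief_search this would be called as return docs_to_json_response(totalhits, docs) at the end of the if indx: branch.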
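The namespace handling is the part that is easy to get wrong, so here is a minimal, self-contained sketch of just that technique on an invented two-namespace document (the namespace URIs and element names below are made up, not real Primo output). lxml's XPath has no notion of a default namespace: the unprefixed namespace has to be bound to some prefix in the namespaces dictionary ('rec' below, 'def' in the view above) and that prefix used in the expression.

from lxml import etree

# Invented two-namespace document, loosely modelled on the response above:
# a prefixed 'search' namespace wrapping records in a default namespace.
XML = """<s:RESULT xmlns:s="http://example.com/search"
                   xmlns="http://example.com/record">
  <s:DOCSET TOTALHITS="2">
    <s:DOC><record><title>First title</title></record></s:DOC>
    <s:DOC><record><title>Second title</title></record></s:DOC>
  </s:DOCSET>
</s:RESULT>"""

root = etree.fromstring(XML)

# Every namespace used in the XPath expression needs an entry in this mapping,
# including the document's default namespace, which is bound to 'rec' here.
ns = {'sear': 'http://example.com/search',
      'rec': 'http://example.com/record'}

docset = root.xpath('//sear:DOCSET', namespaces=ns)
print(docset[0].get('TOTALHITS'))   # prints: 2

for title in root.xpath('//sear:DOC/rec:record/rec:title/text()', namespaces=ns):
    print(title)                    # prints: First title, Second title

Note that the prefixes in the dictionary ('sear', 'rec') do not have to match the prefixes used in the document itself ('s' and the default); only the namespace URIs have to match.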
