Parsing XML with lxml in Django – Multiple Namespaces and XPath

A few days ago, I was trying to figure out how to parse XML with multiple namespaces and extract information using XPath in a Django view. I came across lxml, which I think is really good for this. You don't have to csrf_exempt this view, as it is GET-based and thus safe; I am doing it only for consistency with the rest of my code. I am using the Primo Web Services brief search here as an example, but you may not be able to open this URL, as it is protected. Also, this may not be the best way to do this, so if you can think of improvements, please let me know. (Note that the namespace URIs and the base request URL appear as empty strings below; they have been stripped from the snippet.)

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt
import simplejson as json
import urllib
from lxml import etree


@csrf_exempt
def brief_search(request):
    errors = []
    if request.method == 'GET':
        searchTerms = request.GET.get('query')
        bulkSize = request.GET.get('pageSize')
        indx = request.GET.get('start')
        if indx:
            indx = int(indx) + 1
            DEFAULT_NS = ''
            query = 'any,contains,' + searchTerms
            url = '' + query + '&indx=' + str(indx) + '&bulkSize=' + bulkSize
            content = urllib.urlopen(url)
            xml = etree.parse(content)
            docset = xml.getroot().xpath(
                '//sear:SEGMENTS/sear:JAGROOT/sear:RESULT/sear:DOCSET',
                namespaces={'sear': '', 'def': DEFAULT_NS})
            totalhits = docset[0].get("TOTALHITS")
            docs = xml.getroot().xpath(
                '//sear:SEGMENTS/sear:JAGROOT/sear:RESULT/sear:DOCSET/sear:DOC/def:PrimoNMBib/def:record',
                namespaces={'sear': '', 'def': DEFAULT_NS})
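To make the technique reproducible without access to the Primo service, here is a minimal, self-contained sketch of the same idea: querying a document that mixes a prefixed namespace and a default namespace with lxml XPath. The namespace URIs and element names below are invented for illustration; Primo's real ones differ.

```python
from lxml import etree

# A toy document with two namespaces: a prefixed one ("s") and a
# default (unprefixed) one. Both URIs are made up for this example.
XML = b"""<s:SEGMENTS xmlns:s="http://example.com/search"
                      xmlns="http://example.com/record">
  <s:RESULT>
    <s:DOCSET TOTALHITS="2">
      <s:DOC><record><title>First</title></record></s:DOC>
      <s:DOC><record><title>Second</title></record></s:DOC>
    </s:DOCSET>
  </s:RESULT>
</s:SEGMENTS>"""

# XPath cannot address the default namespace directly; it must be
# bound to an explicit prefix in the namespaces mapping.
NSMAP = {'sear': 'http://example.com/search',
         'def': 'http://example.com/record'}

root = etree.fromstring(XML)
docset = root.xpath('//sear:RESULT/sear:DOCSET', namespaces=NSMAP)
totalhits = docset[0].get('TOTALHITS')
titles = root.xpath('//sear:DOC/def:record/def:title/text()',
                    namespaces=NSMAP)

print(totalhits)  # "2"
print(titles)     # ['First', 'Second']
```

The key point is that the prefixes in the XPath expression ('sear', 'def') do not have to match the prefixes in the document; only the URIs in the mapping matter.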


Configuring AWStats for Response Time Parameters

Recently I was looking at various web log analysers that I could use with Apache to generate standard and custom statistics. The package that seemed most promising was AWStats. AWStats does a great job, especially with standard statistics, and it is also very easy to install. However, if you want to see performance-related attributes in AWStats, you need to use its "Extra Sections". I was interested in the response time parameter for a query in Apache. As a result, my Apache LogFormat looks something like:

LogFormat "%h %u %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T %D" 443_combined

Here %T represents the response time in seconds (largely useless, as most requests complete in under a second) and %D represents the response time in microseconds. 443_combined is the nickname you give to your LogFormat string, which you then use when defining log files with CustomLog. You can call it anything. The equivalent AWStats log format is:

LogFormat="%host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot %other %extra1"

And to show the %extra1 parameter on the graphical page, you use the Extra Sections:

ExtraSectionName1="Response Time (in microseconds)"
ExtraSectionCodeFilter1="200 304"
ExtraSectionCondition1="URL,/"
ExtraSectionFirstColumnTitle1="Response Time"
ExtraSectionFirstColumnValues1="extra1,([0-9]*)$"
ExtraSectionFirstColumnFormat1="%s"
ExtraSectionStatTypes1=P
ExtraSectionAddAverageRow1=0
ExtraSectionAddSumRow1=0
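As a quick sanity check of the log format, here is a small sketch (not part of AWStats) that pulls the %T and %D fields out of a line written with the LogFormat above. The sample log line is invented for illustration.

```python
import re

# An invented log line in the 443_combined format: the last two
# space-separated fields are %T (seconds) and %D (microseconds).
LINE = ('127.0.0.1 frank - [10/Oct/2023:13:55:36 +0000] '
        '"GET /index.html HTTP/1.1" 200 2326 '
        '"http://example.com/" "Mozilla/5.0" 0 4523')

# Anchor on the end of the line, matching the same ([0-9]*)$ idea
# used by ExtraSectionFirstColumnValues1 above.
match = re.search(r'(\d+) (\d+)$', LINE)
seconds, microseconds = int(match.group(1)), int(match.group(2))

print(seconds, microseconds)  # 0 4523
```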


Primo Enrichment Plugin for Nielsen Data

Primo, the Ex Libris resource discovery platform, provides an architecture for writing plug-ins on top of it. One of these is the Enrichment plug-in. I have recently written one, which enhances Oxford's resource discovery platform, SOLO (based on Primo). The plug-in searches Nielsen data against every record in Primo and enriches the record (more precisely, the record's PNX) with a table of contents, a short description and a long description (whichever is available). The enriched data is both displayable and searchable. The Nielsen data is indexed in the Apache Solr search server, and requests for the data are made through a web service call from within the plug-in. More details about the plug-in, along with source code and installation instructions, can be found here.
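The Solr lookup described above can be sketched roughly as follows. The host, core name, and field name here are assumptions for illustration, not the actual configuration used by the plug-in.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

def build_solr_query(base_url, isbn):
    """Build a Solr select URL looking up enrichment data by ISBN.

    The 'isbn' field name and single-row request are hypothetical;
    the real plug-in's schema may differ.
    """
    params = {'q': 'isbn:%s' % isbn, 'wt': 'json', 'rows': 1}
    # Sort for a deterministic parameter order in the URL.
    return base_url + '/select?' + urlencode(sorted(params.items()))

url = build_solr_query('http://localhost:8983/solr/nielsen',
                       '9780140328721')
print(url)
```

The plug-in would then fetch this URL and merge the returned descriptions into the record's PNX.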

Bash script to read data from a file into an array

I was asked to write a small shell script that reads the difference of two files (extracting file names from it) and passes the extracted file names to another shell script. It is not too complicated, but I am posting it for other people's benefit. Here is a quick explanation. Line 5 takes the difference of file1.txt and file2.txt, cuts characters 3 to 13 from each line of the output, and writes the result to a file. Line 7 opens temporarylist.txt as file descriptor 3 for reading. Line 9 runs a loop until there is nothing further to read from file descriptor 3. Line 11 passes the value read to a different shell script (named othershellscript in this case). Line 15 closes file descriptor 3.

#!/bin/bash

cd /home/masud

diff file1.txt file2.txt | cut -c3-13 > temporarylist.txt

exec 3< "temporarylist.txt" || exit 1

while read i <&3; do

    scripts/othershellscript "$i"

done

exec 3<&-
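For comparison, here is an approximate Python equivalent of the script above. It is not a literal translation: it uses a set difference rather than diff(1), and the script path is taken from the example above, so treat it as a sketch.

```python
import subprocess

def differing_names(file1, file2):
    """Return the lines that differ between the two files, keeping
    characters 3 through 13 of each (mimicking cut -c3-13, which is
    1-indexed and inclusive)."""
    with open(file1) as a, open(file2) as b:
        lines1 = a.read().splitlines()
        lines2 = b.read().splitlines()
    changed = [l for l in lines1 if l not in set(lines2)]
    changed += [l for l in lines2 if l not in set(lines1)]
    return [l[2:13] for l in changed]

def run_other_script(file1, file2, script='scripts/othershellscript'):
    # Hand each extracted name to the other script, one call per name.
    for name in differing_names(file1, file2):
        subprocess.call([script, name])
```

Note that diff prefixes its output lines with "< " or "> ", which is why the shell version's cut -c3-13 starts at character 3; the slice above mirrors that offset.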
