Breaking large XML documents into chunks to speed processing

Category: Python - Miscellaneous

One of the few problems with using Python to process XML is speed: once the XML becomes somewhat large (more than about 1 MB), processing gets noticeably slower, and the slowdown grows with the size of the XML. One way to increase the processing speed is to break the XML into chunks by tag name. This is especially handy if you are interested in only one part of the XML, or only in certain elements that recur throughout it.

Here is a function that I came up with to handle this problem; I call it "tinyDom". It uses the SAX reader from PyXML, although it could easily be changed to use minidom, etc.

The input parameters are the XML as a string, the tag name that you want to build the DOM around, and an optional position to start at within the XML. It returns a DOM tree and the character position at which it stopped.

Date: 17 May, 2012
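
The recipe's own listing is not reproduced on this page. As a rough sketch of the interface described above, and not the author's code, the following uses the standard library's minidom instead of the PyXML reader. The name `tiny_dom`, the regular expressions, and the `(None, -1)` return value for "nothing found" are assumptions made for illustration. It also assumes that elements with the chosen tag name are not nested inside each other, that attribute values contain no `>` character, and that the chunk needs no namespace declarations from an enclosing element.

```python
import re
from xml.dom import minidom


def tiny_dom(xml_text, tag, start=0):
    """Parse the first <tag>...</tag> element found at or after `start`.

    Returns (dom, end_pos), or (None, -1) when no further element exists.
    Hypothetical helper: assumes `tag` elements are never nested in each other.
    """
    # Match the tag name exactly, so "item" does not also match "items".
    open_re = re.compile(r"<%s(?=[\s/>])[^>]*>" % re.escape(tag))
    opening = open_re.search(xml_text, start)
    if opening is None:
        return None, -1
    if opening.group().endswith("/>"):
        # Self-closing element: the opening tag is the whole chunk.
        end = opening.end()
    else:
        close_re = re.compile(r"</%s\s*>" % re.escape(tag))
        closing = close_re.search(xml_text, opening.end())
        if closing is None:
            return None, -1
        end = closing.end()
    # Build a DOM for this small slice only, not for the whole document.
    return minidom.parseString(xml_text[opening.start():end]), end


if __name__ == "__main__":
    doc = "<list><item id='1'>a</item><item id='2'>b</item></list>"
    pos = 0
    while True:
        dom, pos = tiny_dom(doc, "item", pos)
        if dom is None:
            break
        print(dom.documentElement.getAttribute("id"))
```

Feeding the returned position back in as the start of the next call walks through the document one chunk at a time, so only one small DOM tree needs to exist at any moment.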


Xml

Homepage: http://code.activestate.com/recipes/84515-breaking-large-xml-documents-into-chunks-to-speed-/?in=lang-python

Developer: Mike Hostetler

License: Python License

Operating System: Windows


Related Scripts Download

Indite is a plugin for HtmlArea.

Developer: troels
License: GNU Lesser General Public License (LGPL)
Operating System: ie5.5+, mozilla


AxPoint generates slideshows in PDF format from a simple XML description format.

Developer: Matt Sergeant
License: GNU General Public License (GPL)
Operating System: All


Kumera is an Open Source Content Management System written in Perl and using XML for data storage, designed for small to medium web sites.

Developer: http://www.cyber4.org/ku...
License: Freeware
Operating System: Unix, Linux


Hippo CMS is an open source information centered content management system.

Developer: Tjeerd Brenninkmeijer
License: Apache Software License
Operating System: Unix, Windows


The Data Generator is a free, GNU-licensed, open source script written in JavaScript, PHP and MySQL that lets you quickly generate large volumes of custom data in a variety of formats for use in testing software and populating databases.

Developer: Benjamin Keen
License: GNU General Public License (GPL)
Operating System: All


SYDI is a project aimed at helping system administrators document their networks.

Developer: Patrick Ogenstad
License: BSD License
Operating System: Windows


This small tool helps you convert your MySQL database layout into XML.

Developer: PhpToys
License: GNU General Public License (GPL)
Operating System: All


This describes possible ways of using user-defined class instances as dictionary keys.

Developer: Andreas Kostyrka
License: Python License
Operating System: Windows


CORBA has a reputation for being hard to use, but it is really very easy, especially if you use Python.

Developer: Duncan Grisby
License: Python License
Operating System: Windows