Breaking large XML documents into chunks to speed processing

Category: Python - Miscellaneous

One of the few problems with using Python to process XML is speed: once a document becomes somewhat large (more than about 1 MB), processing slows down sharply as the size of the XML increases. One way to increase the processing speed is to break the XML into chunks by tag name. This is especially handy if you are only interested in one part of the XML, or in the content between certain elements throughout the XML.

Here is a function that I came up with to handle this problem -- I call it "tinyDom". It uses the SAX reader from PyXML, although it could easily be changed to use minidom, etc.

The input parameters are the XML as a string, the tag name that you want to build the DOM around, and an optional position to start at within the XML. It returns a DOM tree and the character position that it stopped at.

Date: 17 May, 2012
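
The recipe's own listing is not reproduced on this page, so the following is only a minimal sketch of a function with the interface described above, not the author's code. It assumes the standard-library minidom parser instead of the PyXML SAX reader, and the name `tiny_dom` and the `(None, pos)` return value for "nothing found" are illustrative choices. It also assumes that elements with the chosen tag name are not nested inside each other, that no attribute value contains a literal ">", and that each chunk parses on its own (no namespace prefixes or entities declared elsewhere in the document).

```python
import re
from xml.dom import minidom


def tiny_dom(xml_text, tag, pos=0):
    """Build a DOM for the next <tag> element found at or after pos.

    Returns (dom, end_pos), where end_pos is the character position just
    past the element, or (None, pos) if no further element is found.
    """
    # Opening tag: "<tag" followed by whitespace, "/" or ">", so that a
    # search for "entry" does not also match "<entries>".
    opening = re.compile(r"<%s(?=[\s/>])" % re.escape(tag))
    match = opening.search(xml_text, pos)
    if match is None:
        return None, pos
    start = match.start()

    # End of the opening tag (assumption: no ">" inside attribute values).
    tag_end = xml_text.find(">", start)
    if tag_end == -1:
        return None, pos

    if xml_text[tag_end - 1] == "/":
        # Self-closing element such as <entry/>.
        end = tag_end + 1
    else:
        # Assumption: same-named elements are not nested, so the first
        # closing tag belongs to this element.
        closing = "</%s>" % tag
        close_at = xml_text.find(closing, tag_end)
        if close_at == -1:
            return None, pos
        end = close_at + len(closing)

    # Only the small slice is parsed, never the whole document.
    return minidom.parseString(xml_text[start:end]), end


# Walk through a document one chunk at a time.
doc = "<log><entry id='1'>a</entry><entry id='2'>b</entry></log>"
pos = 0
ids = []
while True:
    dom, pos = tiny_dom(doc, "entry", pos)
    if dom is None:
        break
    ids.append(dom.documentElement.getAttribute("id"))
print(ids)
```

Feeding the returned position back in as the starting position is what lets a caller step through a large document while only ever building a DOM for one small element at a time.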


Xml

Homepage: http://code.activestate.com/recipes/84515-breaking-large-xml-documents-into-chunks-to-speed-/?in=lang-python

Developer: Mike Hostetler

License: Python License

Operating System: Windows


Related Scripts Download

Hippo CMS is an open source information centered content management system.

Developer: Tjeerd Brenninkmeijer
License: Apache Software License
Operating System: Unix, Windows


SYDI is a project aimed to help system administrators to document their network.

Developer: Patrick Ogenstad
License: BSD License
Operating System: Windows


A modern template engine for PHP 5 with XML-based template language, declarative programming support and an API similar to those used in frameworks.

Developer: Tomasz Jędrzejewski
License: BSD License (revised)
Operating System: Platform-independent


An easy-to-use JavaScript shopping cart for e-commerce.

Developer: Gregory
License: Freeware
Operating System: UNIX, Windows all


XJIG is an image gallery that works locally via the file protocol as well as via HTTP on a web server.

Developer: Christian Schramm
License: GNU General Public License (GPL)
Operating System: Linux, Windows (all)


This small tool helps you to convert your MySQL database layout into XML.

Developer: PhpToys
License: GNU General Public License (GPL)
Operating System: ALL


Conglomerate is a project to create a complete structured information authoring, management, archival, revision control and transformation system.

Developer: The Conglomerate Team
License: GNU General Public License (GPL)
Operating System: Windows / Unix


This describes possible ways of using user-defined class instances as dictionary keys.

Developer: Andreas Kostyrka
License: Python License
Operating System: Windows


CORBA has a reputation for being hard to use, but it is really very easy, especially if you use Python.

Developer: Duncan Grisby
License: Python License
Operating System: Windows