Shuffle Merge Files

Category: Python - Miscellaneous

This script shuffle-merges files: it interlaces many small text files into one large text file while preserving the order of the lines within each small file.

In a scientific simulation process, it is not uncommon to need to combine multiple source files into one final file, which becomes the input for another stage of the process, while preserving the order of the lines within each source file. For example, when simulating the way checked messages arrive at a centralized component (e.g., a central database server in a distributed banking service), the final file needs to combine all source files in a random way (e.g., the messages arrived at their own pace, disturbed by the transfer over the Internet), while preserving the order between lines of the same source file (e.g., the receiving end of the messaging service ensured that messages from the same client arrived in a fixed order).

This recipe solves this problem under the following assumptions:
o the source files are named "Prefix_X", with the same Prefix, and X being the 0-based integer index of the file (e.g., 0, 1, ..., n-1 for n source files)
o the shuffle-merge (output) file is "sm-Prefix"

The script prints a verification message every 10K lines parsed, and seems to run in under 10 s on an input set of 1M lines.

Possible improvements [est.difficulty: trivial

Date: 11 February, 2012
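
The recipe's own source is not reproduced on this page, so the following is only a minimal sketch of the shuffle-merge described above, not the author's implementation. The function name `shuffle_merge` and its parameters (`n_files`, `directory`, `seed`, `report_every`) are hypothetical, and the choice to pick uniformly among the source files that still have lines left is an assumption; the original may weight the choice differently.

```python
import os
import random


def shuffle_merge(prefix, n_files, directory=".", seed=None, report_every=10_000):
    """Interleave Prefix_0 .. Prefix_{n-1} into sm-Prefix.

    Lines from different files are mixed at random, but lines from the
    same file keep their original relative order.
    """
    rng = random.Random(seed)  # a seed makes a run reproducible (assumption)
    paths = [os.path.join(directory, f"{prefix}_{i}") for i in range(n_files)]
    out_path = os.path.join(directory, f"sm-{prefix}")

    sources = [open(p, "r", encoding="utf-8") for p in paths]
    written = 0
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            active = list(sources)  # files that may still have unread lines
            while active:
                idx = rng.randrange(len(active))
                line = active[idx].readline()
                if not line:
                    # This source is exhausted: drop it in O(1) by
                    # overwriting it with the last entry.
                    active[idx] = active[-1]
                    active.pop()
                    continue
                if not line.endswith("\n"):
                    # Last line of a file without a trailing newline:
                    # terminate it so it does not fuse with the next line.
                    line += "\n"
                out.write(line)
                written += 1
                if report_every and written % report_every == 0:
                    print(f"{written} lines written")
    finally:
        for f in sources:
            f.close()
    return out_path
```

For example, `shuffle_merge("msgs", 3)` would read `msgs_0`, `msgs_1`, and `msgs_2` from the current directory and write `sm-msgs`. Because each file is read sequentially with `readline()`, per-file order is preserved by construction, and only the choice of which file to read next is random.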


Merge Files - Shuffle Merge - Text To File - Merge - Files - Text

Homepage: http://code.activestate.com/

Developer: code.activestate.com

License: Freeware

Operating System: All


Related Scripts Download

Meld is a visual diff and merge tool.

Developer: ftp.gnome.org
License: Freeware
Operating System: All


This script helps you merge sorted iterables, preserving ordering, without consuming iterables (and computing time) unnecessarily.

Developer: code.activestate.com
License: Freeware
Operating System: All


The usual approach to merging is to loop through both sequences taking the smallest from each until they are both exhausted.

Developer: code.activestate.com
License: Freeware
Operating System: All


This code creates real mixed-in classes: it actually merges one class into another (CPython-specific), taking care of name mangling, some complications with __slots__, and everything else.

Developer: code.activestate.com
License: Freeware
Operating System: All


This script merges multiple sorted inputs into a single sorted output.

Developer: code.activestate.com
License: Freeware
Operating System: All


The ObjectMerger class dynamically merges two given objects, making one a subclass of the other.

Developer: code.activestate.com
License: Freeware
Operating System: All


PDF Split and Merge is an easy-to-use tool to merge and split PDF documents.

Developer: SourceForge.net
License: Freeware
Operating System: All


KTorrent is a BitTorrent program for KDE.

Developer: ktorrent.org
License: Freeware
Operating System: All


This program is a file management program as well as a compilation of useful tools for web administration.

Developer: SourceForge.net
License: Freeware
Operating System: All