A Python script to test download mirrors

Category: Python - Miscellaneous

The concept of the script is straightforward: read the mirrors page from Red Hat's web site, make a list of all the mirrors, test how long it takes to download from each, and present a sorted list of the results.

The first task, reading and parsing the Red Hat mirrors list, is handled with the urllib and HTMLParser modules, respectively. I chose HTMLParser over the more comprehensive parser in sgmllib because it's a bit less work to override the default parser for simple tasks. After the parser sees the content comment in the HTML source, it starts recording any tags whose URL has a scheme (in other words, absolute URLs); it stops recording after it sees the end of the content comment. Currently, it happens that there aren't any absolute URLs on the mirror page outside the content block, but I didn't want to rely on that fact.

To test the bandwidth of each mirror site, I simply time how long it takes to download the index page of the mirror. This is not a perfect test, but it gives reasonably good results without depending on knowledge of the site structure.

The bandwidth test demonstrates a few important paradigms when dealing with multithreading, either in Python or other languages:

1. Let the underlying libraries do as much work as possible.
2. Isolate your threads from the rest of the program.

The main thread creates a work queue of URLs to be tested and a result queue for retrieving results, then starts a number of threads to do the work and waits for those threads to exit. Because the Queue class is a thread-safe container, no two threads will ever get the same work unit, and the storing of results by multiple threads will never leave the queue in a bad state.

Initially, each worker thread downloaded the mirror index page directly, but this caused the process to run for a long time (over three minutes) when some sites were heavily loaded.
To avoid this, I defined a maximum time to attempt downloading, and made each worker thread spawn a new daemon thread to do the download. The worker thread can use Thread.join() to wait on the subthread with a timeout; timeouts are counted as failures. Note that I pass an empty list to the subthread to collect the results. Threads in Python don't have a convenient way to return a status code to the caller; by passing a mutable object like a list, the subthread can append a value to the list to indicate a result. When the join() on the subthread completes, the worker thread can tell that the download timed out if the list it passed in is still empty.

The worker threads put the results for each URL into a results queue. For successful tests, they put a tuple of the URL and the time it took to download; for unsuccessful tests, they put a tuple of the URL and a string describing the type of failure. When the main thread has detected that all worker threads have exited, it separates successes from failures, sorts the two lists, and prints them in aligned columns.

Note that the script could be written without the second-level threads. Using them helps isolate the failure-prone download from the more reliable worker thread pool, at the cost of a few more ephemeral threads, and provides a good demonstration of how and when to use daemon threads to keep a script from hanging indefinitely at shutdown.

This script is useful for telling which mirrors are most heavily loaded, but it has shortcomings. Some HTTP-based mirrors are actually redirects to FTP mirrors, and some seem to apply different bandwidth throttles to index pages and ISO downloads. Additionally, the script can't tell which of the mirrors actually have up-to-date files; this can't easily be fixed without knowledge of each mirror site, since mirror sites differ in their directory structure. But this at least gives the would-be upgrader an idea of where to look.

Date: 07 February, 2012
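The timeout technique described in the article, in which a worker joins a daemon subthread and then inspects a shared list, might look roughly like the following in Python 3. This is a sketch under stated assumptions rather than the recipe's own code: download_into, timed_download, fetch and max_seconds are hypothetical names, and fetch again stands in for the real download function.

```python
import threading
import time


def download_into(url, fetch, outcome):
    """Runs in the subthread: append exactly one result to the shared list."""
    start = time.perf_counter()
    try:
        fetch(url)  # hypothetical download function supplied by the caller
    except Exception as exc:
        outcome.append("error: %s" % exc)
    else:
        outcome.append(time.perf_counter() - start)


def timed_download(url, fetch, max_seconds):
    """Return (url, seconds), (url, error text), or (url, 'timeout')."""
    outcome = []  # mutable object the subthread uses to report back
    sub = threading.Thread(
        target=download_into, args=(url, fetch, outcome), daemon=True
    )
    sub.start()
    sub.join(max_seconds)  # give up waiting after max_seconds
    if not outcome:
        # The subthread has not reported yet, so count this as a failure.
        # Because it is a daemon thread, it cannot block interpreter exit.
        return (url, "timeout")
    return (url, outcome[0])
```

A worker would call timed_download for each URL it takes from the work queue and put the returned tuple on the results queue. The abandoned subthread keeps running in the background until its download finishes or the program exits, which is the cost of the few extra ephemeral threads the article mentions.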



Homepage: http://code.activestate.com/

Developer: code.activestate.com

License: Freeware

Operating System: All

