Swish-e

Category: C & C++ - Database

Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. Swish-e is ideally suited for collections of a million documents or smaller. Using the GNOME libxml2 parser and a collection of filters, Swish-e can index plain text, e-mail, PDF, HTML, XML, Microsoft Word / PowerPoint / Excel and just about any file that can be converted to XML or HTML text. Swish-e is also often used to supplement databases like the MySQL DBMS for very fast full-text searching.

Features:
- Quickly index a large number of documents in different formats including text, HTML, and XML.
- Use "filters" to index other types of files such as PDF, gzip, or PostScript.
- Includes a web spider for indexing remote documents over HTTP. Follows Robots Exclusion Rules (including META tags).
- Can use an external program to supply documents to Swish-e, such as an advanced spider for your web server or a program to read and format records from a relational database.
- Document "properties" (some subset of the source document, usually defined as META or XML elements) may be stored in the index and returned with search results.
- Document summaries can be returned with each search.
- Word stemming, soundex, metaphone, and double-metaphone indexing for "fuzzy" searching.
- Phrase searching and wildcard searching.
- Limit searches to HTML links.
- Use powerful regular expressions to select documents for indexing or exclusion.
- Easily limit searches to parts or all of your web site.
- Results can be sorted by relevance or by any number of properties in ascending or descending order.
- Limit searches to parts of documents such as certain HTML tags (META, TITLE, comments, etc.) or to XML elements.
- Can report structural errors in your XML and HTML documents.
- Index file is portable between platforms.
- A Swish-e library is provided to allow embedding Swish-e into your applications for very fast searching.
- A Perl module is available that provides a standard API for accessing Swish-e.
- Includes an example search script with context summaries and search term and phrase highlighting.
- Can be used with popular Perl templating systems.
- Swish-e is fast.

Date: 26 January, 2012
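The feature list notes that an external program can supply documents to Swish-e, for example one that reads and formats records from a relational database. The sketch below shows roughly what such a feeder might look like in Python. It assumes the header-based record layout used by Swish-e's "prog" input method (Path-Name, Content-Length, Last-Mtime and an optional Document-Type header, then a blank line, then the document body); check the Swish-e documentation for the exact header names and parser types before relying on it. The function name `format_record`, the paths, and the sample rows are hypothetical.

```python
import sys
import time


def format_record(path: str, content: bytes, mtime: int, doc_type: str = "") -> bytes:
    """Build one record: header lines, a blank line, then the raw document bytes."""
    headers = [
        f"Path-Name: {path}",
        f"Content-Length: {len(content)}",  # length in bytes, not characters
        f"Last-Mtime: {mtime}",
    ]
    if doc_type:
        # Assumed parser hint; the value is passed through unchanged.
        headers.append(f"Document-Type: {doc_type}")
    head = "\n".join(headers) + "\n\n"
    return head.encode("utf-8") + content


def main() -> None:
    # Hypothetical rows standing in for records read from a relational database.
    rows = [
        ("/records/1", "<html><head><title>First</title></head><body>Hello</body></html>"),
        ("/records/2", "<html><head><title>Second</title></head><body>World</body></html>"),
    ]
    now = int(time.time())
    for path, html in rows:
        record = format_record(path, html.encode("utf-8"), now, "HTML*")
        sys.stdout.buffer.write(record)
    sys.stdout.buffer.flush()


if __name__ == "__main__":
    main()
```

Saved as an executable script, a feeder like this would be named as the input program when running the indexer in its external-program mode, which reads the records from the script's standard output. Computing Content-Length from the encoded bytes rather than the character count matters for non-ASCII text, since the indexer needs to know exactly how many bytes belong to each document.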


Web Pages Indexing - Indexing Tool - Documents Indexing - Web - Pages - Documents

Homepage: http://swish-e.org/

Developer: swish-e.org

License: Freeware

Operating System: All


Related Scripts Download

mediaCat-GTK is a cross-platform GUI database frontend designed to allow you to index and search your mp3, dvd, and cd collections.

Developer: SourceForge.net
License: Freeware
Operating System: All


MyCDCatalog reads a CD-ROM volume (ISO9660), extracts its information, traverses the file system tree, and stores information about each file and directory found.

Developer: SourceForge.net
License: Freeware
Operating System: All


phpIndex is a file index generating script for the PHP-enabled web server.

Developer: SourceForge.net
License: Freeware
Operating System: All


The purpose of this software is primarily to create a Book Catalog using barcode data from the freely available CueCat(tm) bar code reader.

Developer: SourceForge.net
License: Freeware
Operating System: All


This is a PHP script that is used to parse the DMOZ RDF data dump files automatically.

Developer: SourceForge.net
License: Freeware
Operating System: All


RFind is a small application that indexes the filenames in a given directory and lets you quickly search this index with regular expressions.

Developer: martin.ankerl.com
License: Freeware
Operating System: All


MediaBank is a Perl/MySQL-based *NIX indexing system with a Perl or PHP frontend.

Developer: SourceForge.net
License: Freeware
Operating System: All


MSCBlob (Binary Large Object) is an auxiliary component for storing and transmitting data blocks.

Developer: miraplacid.com
License: Freeware
Operating System: All


WizSQLiteAdmin is a PHP script to manage SQLite databases.

Developer: SourceForge.net
License: Freeware
Operating System: All