searchdb

Category: PHP - Search Engines

searchdb is an ASP.NET search engine written in VB.NET. It combines a web crawler, an indexer and a site search engine. The program stores the crawled pages and extracted words in a database and displays results in a way similar to popular internet search engines. It can index static web pages and also dynamic pages, which are normally generated from a database and have URLs of the form 'default.asp?name=value'. Formats such as Adobe PDF, Microsoft Word and Macromedia Flash are not supported.

The engine is intended for small to medium-sized web sites. For example, one web site with about 50 searchable pages produces an Access database file of about 600 KB and can be searched in less than 0.5 seconds. Another web site with just over 1,000 searchable pages produces an Access database file of about 13 MB and takes up to about 0.8 seconds to search.

Features

- Crawls and indexes static and dynamic web pages.
- Can crawl multiple sites.
- Stores the crawled URLs and associated words in database tables.
- The word indexer extracts the title, meta data, alt text and visible text from each web page.
- Common words are excluded by both the word indexer and the search engine.
- Search results are ordered by word hits, in a way similar to popular internet search engines.
- Works with either Microsoft Access or SQL Server databases.
- Set up via password-protected management displays.

The Crawler

The web crawler starts from a given page and extracts its list of URL links. It then spiders each link, extracting further links. Eventually all pages for the domain are listed in the database. As each URL is found, the words on the page are extracted, including meta tag keywords, meta tag descriptions and image alt text, and stored in the database together with the number of occurrences of each word. All words of more than one character are indexed except those defined in the exclude word list.
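The crawl-and-index loop described above can be sketched as a breadth-first traversal. This is a minimal Python sketch, not searchdb's VB.NET code. It assumes a caller-supplied fetch(url) function that returns page HTML or None, a hypothetical EXCLUDE_WORDS set, and lower-casing of words; it follows only links on the start page's host, and strips punctuation the way the engine does (asp.net becomes aspnet). Only the standard library is used.

```python
import re
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical exclude list; searchdb's real list is configured by the site owner.
EXCLUDE_WORDS = {"in", "the", "them", "they", "and", "of"}


def normalise(text):
    """Lower-case (an assumption), strip punctuation, drop 1-char and excluded words."""
    cleaned = re.sub(r"[^\w\s]", "", text.lower())  # 'asp.net' -> 'aspnet'
    return [w for w in cleaned.split() if len(w) > 1 and w not in EXCLUDE_WORDS]


class PageParser(HTMLParser):
    """Collects links plus meta keywords/description, alt text and visible text.

    Title text arrives through handle_data like any other visible text.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip = 0  # depth inside <script>/<style>, whose content is not indexed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("alt"):
            self.text.append(attrs["alt"])
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.text.append(attrs.get("content") or "")
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


def crawl(start_url, fetch):
    """Breadth-first crawl of one host; returns {url: Counter(word -> occurrences)}."""
    domain = urlparse(start_url).netloc
    queue, seen, index = deque([start_url]), {start_url}, {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()  # flush any buffered trailing text
        index[url] = Counter(normalise(" ".join(parser.text)))
        for href in parser.links:
            # Relative links such as 'default.asp?name=value' keep their query string.
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return index
```

For a quick test, a dict of HTML strings can stand in for the web: crawl("http://example.com/", pages.get). A real crawler would pass a function that performs the HTTP request and would write the counts to the database instead of returning them.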
Punctuation marks are also removed, so a word such as asp.net is stored as aspnet in the database. Because the same parsing is applied on both the search side and the indexing side, a search for asp.net still returns the correct results.

The current version does not obey the noindex and nofollow robots meta tag directives that may appear in the head of a web page. To exclude certain areas of your site, enter the directory names into the list of directories to be excluded; all files in those directories and in any subdirectories will then not be indexed.

As each page is indexed, its file size is stored in the database. This lets you re-index only those pages whose file size has changed rather than re-indexing the complete site.

The Search Engine

To search the site, enter one or more words into a text box. Words of one character are ignored, as are common noise words such as 'them' and 'they'.

Ranking is based on the word counts within the pages. For example, a search for 'cycling in Scotland' runs an SQL GROUP BY query on 'cycling' and 'Scotland', sorted by word count; the word 'in' is dropped because it is an exclude word. A page that contains the words cycling and Scotland several times has a higher word count, and hence higher relevance, and appears nearer the top of the search results.

A search usually takes less than 0.5 seconds. Search time does grow as the number of web pages increases, but not significantly, because most of the processing is done at indexing time and the search itself is an efficient SQL query.

Management Displays

A set of password-protected web pages is provided for setting up and configuring the system.

Requirements: web server with the Microsoft .NET Framework installed

Date: 04 May, 2012
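The storage, file-size check and GROUP BY search described above can be sketched with Python's built-in sqlite3 module standing in for Access or SQL Server. This is an illustration under assumptions, not searchdb's schema: the two tables (pages, words), the function names and the EXCLUDE_WORDS set are hypothetical, and the query ranks any page containing at least one search word, since the description does not say whether all words must be present.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical exclude list; the real one is maintained through the management displays.
EXCLUDE_WORDS = {"in", "the", "them", "they", "and", "of"}


def normalise(text):
    """Same parsing on both sides: lower-case, strip punctuation, drop 1-char/excluded words."""
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    return [w for w in cleaned.split() if len(w) > 1 and w not in EXCLUDE_WORDS]


def create_schema(conn):
    # Hypothetical layout: one row per page, one row per (page, word) with its count.
    conn.executescript("""
        CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE, size INTEGER);
        CREATE TABLE words (page_id INTEGER REFERENCES pages(id), word TEXT, occurrences INTEGER);
        CREATE INDEX idx_words_word ON words(word);
    """)


def needs_reindex(conn, url, size):
    """True for pages never indexed or whose stored file size has changed."""
    row = conn.execute("SELECT size FROM pages WHERE url = ?", (url,)).fetchone()
    return row is None or row[0] != size


def store_page(conn, url, text, size):
    """Replace any previous entry for url with fresh word counts and file size."""
    row = conn.execute("SELECT id FROM pages WHERE url = ?", (url,)).fetchone()
    if row:
        conn.execute("DELETE FROM words WHERE page_id = ?", row)
        conn.execute("DELETE FROM pages WHERE id = ?", row)
    page_id = conn.execute(
        "INSERT INTO pages (url, size) VALUES (?, ?)", (url, size)
    ).lastrowid
    conn.executemany(
        "INSERT INTO words (page_id, word, occurrences) VALUES (?, ?, ?)",
        [(page_id, w, n) for w, n in Counter(normalise(text)).items()],
    )


def search(conn, query):
    """Rank pages by total occurrences of the query words (GROUP BY page)."""
    terms = sorted(set(normalise(query)))
    if not terms:
        return []
    placeholders = ", ".join(["?"] * len(terms))
    sql = (
        "SELECT p.url, SUM(w.occurrences) AS hits "
        "FROM words w JOIN pages p ON p.id = w.page_id "
        f"WHERE w.word IN ({placeholders}) "
        "GROUP BY p.url ORDER BY hits DESC, p.url"
    )
    return conn.execute(sql, terms).fetchall()
```

For instance, if page a contains "Cycling in Scotland. Scotland by bike." and page b contains "Cycling holidays", a search for "cycling in Scotland" ranks a first with 3 hits and b second with 1. The index on words(word) is what keeps the search cheap as the site grows, because the counting work has already been done at indexing time.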



Homepage: http://www.webconcerns.co.uk/

Developer: webconcerns.co.uk

License: Freeware

Operating System: All


Related Scripts

Relax multi-platform compatible.

Developer: Matti Tukiainen
License: Freeware
Operating System: Unix/Linux/Windows


This script supplies all the search engines you could ever want on one page.

Developer: Steve S
License: GNU General Public License (GPL)
Operating System: Not Available


IntraSITE Search is a new, free utility capable of searching static websites on servers where no database has been set up.

Developer: Jim Skorzeny
License: GNU General Public License (GPL)
Operating System: Not Available


Javascript object to index and search against any text content.

Developer: Martin Drapeau
License: MIT/X Consortium License
Operating System: Not Available


This little invisible applet lets you see how long visitors keep your page open in their browsers.

Developer: Horace A.
License: Freeware
Operating System: All


AnyPortal Site Manager is a powerful, Unix-compatible application with a one-page footprint.

Developer: http://www.alief.com/any...
License: Freeware
Operating System: Unix


Running Gauge is a multi-platform applet gauge that displays real-time data. The demo shows server activity level and current visitors on the site; any CGI or other web-accessible data source can be used to feed it real-time input.

Developer: RF
License: Freeware
Operating System: All


phpOpen is a free PHP script that grabs the contents of the Open Directory dynamically and formats them to make your own version of the Open Directory.

Developer: SourceForge.net
License: Freeware
Operating System: All


This script provides over 70 popular and unusual internet search engines in a drop-down box.

Developer: javascriptsource.com
License: Freeware
Operating System: All