Web Crawler On Client Machine


This Web crawler application builds on the above-mentioned modules and uses ideas from previous crawlers. It is developed in C++ and runs on the Windows XP operating system. It makes use of the Windows API, the Graphics Device Interface, and ActiveX controls; for database connectivity it uses the ODBC interface. The proposed web crawler uses breadth-first search crawling to follow links and is deployed on a client machine.

Storing the web page information in a database

After the downloader retrieves the web page information from the internet, the information is stored in a database. The database maintains and indexes the web page information so that it can be searched for any keyword, as in a search engine.
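As a rough illustration of this storage step, the sketch below records a downloaded page and builds a simple keyword index over its contents. This is only a stand-in under stated assumptions: the actual application persists pages through an ODBC-connected database rather than in-memory containers, and the `PageRecord`, `KeywordIndex`, and `storePage` names are ours, not from the paper.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical record for one downloaded page; the real system
// would persist something like this into a relational table via ODBC.
struct PageRecord {
    std::string url;
    std::string content;  // extracted text of the page
};

// In-memory stand-in for the database's index: keyword -> set of URLs.
using KeywordIndex = std::map<std::string, std::set<std::string>>;

// Lowercase helper so indexing is case-insensitive.
std::string toLower(std::string s) {
    for (char &c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// "Store" a page: register every word of its content in the index,
// mimicking how the database indexes pages for later keyword search.
void storePage(KeywordIndex &index, const PageRecord &page) {
    std::istringstream words(page.content);
    std::string word;
    while (words >> word)
        index[toLower(word)].insert(page.url);
}
```

With this shape, every word of a stored page leads back to the URLs that contained it, which is the property the keyword search described below relies on.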

The user enters a URL, for example http://rediffmail.com, in the browser created within the application. Once the start button is pressed, an automated browsing process is initiated. The HTML contents of the rediffmail.com home page are given to the parser. The parser puts the page into a suitable format, as described above, and the URLs found in the HTML page are extracted and stored in the frontier. URLs are then picked from the frontier and each URL is assigned to a downloader; the status of each downloader, whether busy or free, can be queried. After a page is downloaded, it is added to the database and the particular downloader is set as free (i.e. released).

Keyword search

A search keyword is taken from the user as input; the keyword search module searches for the keyword in the database and returns the indexing result to the user. A simple browser is designed to allow the user to browse the pages directly from the application, instead of using a browser outside of the system.
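A minimal sketch of the keyword search module, assuming the page information has already been indexed as a keyword-to-URL map; in the real application this lookup would be a query sent through ODBC to the database, and the `KeywordIndex` and `searchKeyword` names here are illustrative, not from the paper.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Assumed index shape: keyword -> set of URLs whose pages contain it.
using KeywordIndex = std::map<std::string, std::set<std::string>>;

// Lowercase helper so the search is case-insensitive.
std::string toLower(std::string s) {
    for (char &c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Keyword search module: look up the user's keyword in the index and
// return every URL whose stored page contained it.
std::vector<std::string> searchKeyword(const KeywordIndex &index,
                                       const std::string &keyword) {
    std::vector<std::string> results;
    auto it = index.find(toLower(keyword));
    if (it != index.end())
        results.assign(it->second.begin(), it->second.end());
    return results;  // empty when the keyword was never indexed
}
```

The returned URL list is what the application's built-in browser would present to the user as the search result.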

The user can choose to stop the search process at any time once the desired results have been found.