Web Crawler On Client Machine
Storing the web page information in a database

After the downloader retrieves the web page information from the internet, the information is stored in a database. The database maintains the web page information and is used to index the web pages, so that it can be searched for any keyword, as in a search engine.

Keyword search

A search keyword is taken from the user as input. The keyword search module looks the keyword up in the database and returns the indexed results to the user. A simple browser is built into the application so that the user can browse the pages directly, instead of using a browser outside the system.

IMPLEMENTATION

This web crawler application builds on the modules described above and uses ideas from previous crawlers. It is developed in C++ and runs on the Windows XP operating system. It makes use of the Windows API, the Graphics Device Interface, and ActiveX controls. For database connectivity we use the ODBC interface. The proposed web crawler uses breadth-first search to crawl the links.

The proposed web crawler is deployed on a client machine. The user enters a URL, for example http://rediffmail.com, in the built-in browser. Once the start button is pressed, an automated browsing process is initiated. The HTML contents of the rediffmail.com homepage are given to the parser. The parser puts them in a suitable format, as described above, and the URLs found in the HTML page are listed and stored in the frontier. The URLs are then picked up from the frontier, and each URL is assigned to a downloader. The status of each downloader, busy or free, can be queried. After a page is downloaded, it is added to the database and the downloader that fetched it is set as free (i.e. released). The user can stop the search process at any time once the desired results are found.
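
The crawl loop described above can be illustrated with a minimal C++ sketch. It is not the application's actual code: the names FakeWeb, Page, Downloader, crawl and search are hypothetical, the "web" is an in-memory map rather than real HTTP downloads, a std::map stands in for the ODBC-backed database, and the maxPages limit stands in for the user pressing stop. The sketch only shows how a FIFO frontier yields breadth-first order, how a downloader is marked busy and then released, and how stored pages can be searched by keyword.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the web: URL -> page text and outgoing links.
struct Page {
    std::string text;
    std::vector<std::string> links;
};
using FakeWeb = std::map<std::string, Page>;

// Hypothetical downloader slot carrying the busy/free status.
struct Downloader {
    bool busy = false;
};

// Breadth-first crawl starting at `seed`. `store` stands in for the database.
// Returns the URLs in the order in which their pages were stored.
std::vector<std::string> crawl(const FakeWeb& web, const std::string& seed,
                               std::size_t numDownloaders, std::size_t maxPages,
                               std::map<std::string, std::string>& store) {
    assert(numDownloaders > 0);
    std::vector<Downloader> pool(numDownloaders);
    std::queue<std::string> frontier;  // FIFO queue gives breadth-first order
    std::set<std::string> seen;        // URLs already placed in the frontier
    std::vector<std::string> order;

    frontier.push(seed);
    seen.insert(seed);

    while (!frontier.empty() && order.size() < maxPages) {
        // Find a free downloader; in this sequential sketch slot 0 is always free.
        std::size_t slot = 0;
        while (slot < pool.size() && pool[slot].busy) ++slot;
        if (slot == pool.size()) break;
        pool[slot].busy = true;

        const std::string url = frontier.front();
        frontier.pop();

        // "Download" the page by looking it up in the in-memory web.
        auto it = web.find(url);
        if (it != web.end()) {
            store[url] = it->second.text;  // add the page to the "database"
            order.push_back(url);
            // Parser step: put every link not seen before into the frontier.
            for (const std::string& link : it->second.links) {
                if (seen.insert(link).second) frontier.push(link);
            }
        }
        pool[slot].busy = false;  // release the downloader
    }
    return order;
}

// Keyword search: URLs whose stored text contains `keyword`.
std::vector<std::string> search(const std::map<std::string, std::string>& store,
                                const std::string& keyword) {
    std::vector<std::string> hits;
    for (const auto& entry : store) {
        if (entry.second.find(keyword) != std::string::npos) {
            hits.push_back(entry.first);
        }
    }
    return hits;
}
```

To try it, one would fill a FakeWeb with a few pages, call crawl with a seed URL, a downloader count and a page limit, and then call search on the resulting store. A real implementation would run the downloaders concurrently and replace the substring scan with an indexed database query; the sequential loop is used here only to keep the control flow easy to follow.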