Perl simple search engine

1. The program

#!C:/Perl/bin/perl.exe -w
use strict;
use File::Find;
use vars qw($DEBUGGING $basedir $baseurl @files $title $title_url $search_url
            @blocked $emulate_matts_code $style $charset $E $count $page);
use CGI qw(:standard escapeHTML);

$search_url = 'http://localhost:8033/PERL/index.html';
$charset    = 'utf-8';    # assumed value; the original never set $charset
$count      = 0;

my $query = param("query");
$query = '' unless defined $query;
my $safe_query = escapeHTML($query);    # never echo raw form input into HTML

# ---------------------------
# Print the HTTP header and the top of the results page.
sub start_of_html {
    print "Content-Type: text/html; charset=$charset\n\n";
    print <<END_HTML;
<html>
<head>
<title>The results of the search</title>
</head>
<body>
<center><table width="500" border="0" cellpadding="0" cellspacing="0" bgcolor="#c5e6ef">
<tr>
<td height="19" valign="top" bgcolor="#00CCCC">
<font size="2" face="Verdana, Arial, Helvetica, sans-serif" color="#ffffff">
<h3><b>Your searched words:</b> $safe_query</h3></font></td>
</tr>
</table></center>
<BR>
<center><table width="75%" border="0" cellpadding="8" cellspacing="0">
<tr>
<td width="341" height="23" valign="top" bgcolor="#9b72cf">
<font face="Verdana, Arial, Helvetica, sans-serif" color="#990000" size="3"><b>
<font face="Geneva, Arial, Helvetica, sans-serif" color="#ffffff">
Results of Search: </font></b></font>
</td>
</tr>
</table></center>
<center><table width="75%" border="0" cellpadding="8" cellspacing="0">
<tr>
<td valign="top" height="99" bgcolor="#f1f1fd">
END_HTML
}

# ---------------------------
# Called by find() once for every file and directory below the start directory.
sub subroutine1 {
    return if ($_ =~ /^\./);            # skip names that start with a dot
    return unless ($_ =~ /\.html/i);    # keep only HTML files
    stat $File::Find::name;
    return if -d;                       # skip directories
    return unless -r;                   # skip unreadable files

    # find() has changed into the file's directory, so open the file by its name in $_.
    open(FILE, '<', $_) or return;
    local $/;                           # read the whole file, not just its first line
    my $string = <FILE>;
    close(FILE);
    return unless ($string =~ /\Q$query\E/i);

    my $page = $_;                      # fall back to the file name
    if ($string =~ /<title>(.*?)<\/title>/is) {
        $page = $1;
    }
    $count++;
    print qq{<li><a href="$File::Find::name">$page</a></li>\n};
}

# ---------------------------
# Report an empty result for a non-empty query.
sub count {
    if ($count == 0 && $query ne '') {
        print "<b>There are no matching files.</b>";
    }
}

# ---------------------------
# Close the results table and print the link back to the search form.
sub end_of_html {
    print <<END_HTML;
</td>
</tr>
</table></center>
<BR>
<BR>
<center><table width="75%" border="0" cellpadding="8" cellspacing="0" bgcolor="#c5e6ef">
<tr>
<td>
<a href="$search_url">Back to home page</a>
</td>
</tr>
</table></center>
</body>
</html>
END_HTML
}

# ---------------------------
start_of_html();
if ($query ne '') {
    find(\&subroutine1, './');
} else {
    print "<b>No words have been entered.</b>";
}
count();
end_of_html();

2. The related steps:

1. Query to send

The script starts with three lines:

    #!perl -w

tells the system where the Perl interpreter is located; the "-w" option turns on warnings.

    use strict;

loads the "strict" pragma.

    use CGI qw(:standard);

loads the CGI module with its standard function-oriented interface.

In a separate file, emgile.html for instance, write the following form:

    <form action="" method="post">
    <input type="text" name="query" size="50" />
    <input type="submit" />
    </form>

When the submit button is pressed, the form sends the field named "query" by the POST method to the URL given in the action attribute (left empty here).

To retrieve the form data, that is "query", we use the "param" function of the CGI module:

    my $query = param("query");

Finally, we want to print this value to the browser. Before we do that, however, we need to print a content-type header so that the browser displays the output as HTML. To use the CGI module's content-type header, we write:

    print header();

The CGI module also provides functions that print the opening and closing HTML tags:

    print start_html();
    print end_html();
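The steps above can be sketched as a small stand-alone script. This is only an illustration, not the author's code: read_query() is a hypothetical helper name, and the CGI object is built from a literal query string so that the example also runs from the command line. It assumes the CGI module is installed (it is no longer bundled with recent Perl releases).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Hypothetical helper: return the "query" form field, or '' when it is absent.
sub read_query {
    my ($cgi) = @_;
    my $query = $cgi->param('query');
    return defined $query ? $query : '';
}

# In a real CGI script, CGI->new() with no argument reads the submitted form.
# Here the form data is supplied as a string, for demonstration only.
my $cgi   = CGI->new('query=perl');
my $query = read_query($cgi);

print $cgi->header(-type => 'text/html');    # the content-type header comes first
print "<p>You searched for: ", $cgi->escapeHTML($query), "</p>\n";
```

Escaping the value with escapeHTML() before printing it keeps characters such as < and & in the query from being interpreted as markup.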

2. Finding the files

We will go through the directories and open the files of the right type by using the File::Find module, which exports a function find(). File::Find is loaded with "use", in the same way as CGI.

The function find() takes two arguments:
- a subroutine: the instructions to apply to each file, and
- a starting directory: find() visits each file in this directory and its sub-directories.

For each file it visits, find() gives us information such as the file name and its path.

The subroutine can be written inline:

    find( sub {
        # code of the subroutine goes here ...
    }, 'starting directory');

We can also pass a reference to a named subroutine:

    find(\&a_routine, 'starting directory');

and describe the subroutine separately:

    sub a_routine {
        # code of the subroutine goes here ...
    }

Note the backslash: \&a_routine passes the subroutine to find(), whereas &a_routine would call it immediately.

Let's fill in our subroutine:

    find( sub {
        return if ($_ =~ /^\./);
        return unless ($_ =~ /\.html/i);
    }, 'starting directory');

In this code, we use the function "return" (do nothing and move on to the next file) to skip any file whose name starts with a dot "." and any file that does not have an html extension. "return unless (condition)" is equivalent to "return if (NOT condition)".

The symbols used are:
- $_ is the default variable: the current file name within the current directory
- =~ is the "match" operator
- ^ (the caret) means "begins with"
- \. is a literal dot
- i makes the match case-insensitive
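The filtering described above can be tried on its own with the following minimal sketch. The helper name html_files() is hypothetical, and the pattern is anchored with $ so that only names ending in .html are kept, which is slightly stricter than the pattern used in the text.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the paths of all entries below $dir whose
# names end in .html, skipping any name that starts with a dot.
sub html_files {
    my ($dir) = @_;
    my @found;
    find(
        sub {
            return if /^\./;             # skip dot names (including "." itself)
            return unless /\.html$/i;    # keep names ending in .html, any case
            push @found, $File::Find::name;
        },
        $dir
    );
    my @sorted = sort @found;
    return @sorted;
}

# Usage: perl list_html.pl some/directory
if (@ARGV) {
    print "$_\n" for html_files($ARGV[0]);
}
```

Returning from the subroutine only skips the current entry; find() still descends into every sub-directory.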

3. Testing the files

The variable $File::Find::name gives the full file name (the path and the name of the file), while the name of the file alone is stored in the default variable $_.

We now test the files. We first call the function "stat()", then apply two file tests:

    stat $File::Find::name;
    return if -d;        # skip the entry if it is a directory
    return unless -r;    # skip the entry unless it is readable

Written without an argument, -d and -r test the file named in $_.
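The two tests can be wrapped in a small function and tried outside find(). This is a sketch only; the name is_searchable() is hypothetical, and the path is passed explicitly instead of being taken from $_.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $path is an existing, readable non-directory.
sub is_searchable {
    my ($path) = @_;
    return 0 if -d $path;          # directories are never searched
    my $readable = -r $path;       # false for unreadable or missing files
    return $readable ? 1 : 0;
}

# Usage: perl check.pl file1 file2 ...
for my $path (@ARGV) {
    printf "%-30s %s\n", $path, is_searchable($path) ? 'search' : 'skip';
}
```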

4. Searching the files

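To know if a file contains the terms we are searching for, we open the file and put its contents into a string. Because find() changes into each directory it visits, we open the file by its name in $_, and we undefine $/ so that the whole file is read rather than only its first line:

    open(FILE, '<', $_) or return;
    local $/;                # read the whole file at once
    my $string = <FILE>;     # this variable holds the contents of our file
    close(FILE);

Next, let's check if the file contains our query, that is, if $query is within $string:

    return unless ($string =~ /\Q$query\E/i);

\Q and \E make any special characters in $query match literally, and i makes the match case-insensitive.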
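As a stand-alone illustration of this step, the sketch below reads a whole file and tests it against a literal, case-insensitive query. The helper name file_contains() is hypothetical; it uses a lexical file handle instead of the bareword FILE, which is a common modern idiom rather than the method used in the program above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the file at $path contains $query,
# compared literally and without regard to case.
sub file_contains {
    my ($path, $query) = @_;
    return 0 if !defined $query || $query eq '';    # nothing to search for
    open(my $fh, '<', $path) or return 0;
    my $string = do { local $/; <$fh> };    # with $/ undefined, one read returns the whole file
    close($fh);
    return 0 unless defined $string;
    return $string =~ /\Q$query\E/i ? 1 : 0;
}

# Usage: perl contains.pl file.html "search words"
if (@ARGV == 2) {
    my ($path, $query) = @ARGV;
    print file_contains($path, $query) ? "match\n" : "no match\n";
}
```

Thanks to \Q...\E, a query such as "C++" is searched for as plain text instead of being parsed as a regular expression.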

5. Displaying the results

We copy the default variable $_ (the file name) into a new variable $page, then check whether the file has a title:

    my $page = $_;
    if ($string =~ /<title>(.*?)<\/title>/is) {
        $page = $1;
    }

If the match succeeds, the text between the title tags is captured in the variable $1 and assigned to $page; otherwise $page keeps the file name. The s modifier lets the title span several lines.

Finally, we print the result as a link:

    print "<li><a href=\"$File::Find::name\">$page</a></li>\n";
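The title extraction and the output line can be separated into two small functions, which makes them easy to try without a web server. This is a sketch under assumptions: the names page_title() and result_item() are hypothetical, and the path ./docs/intro.html is only an illustrative value.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the first <title> element in $html,
# or $fallback (for example the file name) when there is none.
sub page_title {
    my ($html, $fallback) = @_;
    if ($html =~ /<title>(.*?)<\/title>/is) {
        return $1;
    }
    return $fallback;
}

# Hypothetical helper: build one result line as an HTML list item.
sub result_item {
    my ($path, $title) = @_;
    return qq{<li><a href="$path">$title</a></li>\n};
}

my $html = "<html><head><TITLE>Intro</TITLE></head></html>";
print result_item('./docs/intro.html', page_title($html, 'intro.html'));
```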

