Comparisons in Perl

This tutorial shows how to use simple comparisons and regular expressions in Perl. Almost everything we put in an ‘if’ expression compares a variable with a value, or a variable with another variable. In Perl, we have to keep our numeric comparisons separate from our text comparisons, because Perl can’t know whether we want a numeric or a text comparison unless we tell it. Depending on context, ‘1.0’ could be just the number, or it could be the text string ‘1’, ‘.’, and ‘0’: as numbers, ‘1.0’ and ‘1’ are equal, but as text they are not.
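To see the difference, here is a minimal sketch that compares the strings ‘1.0’ and ‘1’ both ways, using the numeric ‘==’ and the text ‘eq’ operators from the table in the next section. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $left  = "1.0";
my $right = "1";

my $same_as_numbers = ($left == $right);   # true: both read as the number 1
my $same_as_text    = ($left eq $right);   # false: the characters differ

print "As numbers, '$left' and '$right' are equal.\n"   if $same_as_numbers;
print "As text, '$left' and '$right' are different.\n" if !$same_as_text;
```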

Simple Comparisons
There are four major comparison operators: equals, does not equal, greater than, and less than. There are also the combinations, ‘less than or equal to’ and ‘greater than or equal to’.

Comparison                 Numeric   Text
equals                     ==        eq
does not equal             !=        ne
greater than               >         gt
less than                  <         lt
greater than or equal to   >=        ge
less than or equal to      <=        le
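The ordering operators disagree in the same way. Text comparison goes character by character, so ‘10’ sorts before ‘9’ because ‘1’ comes before ‘9’. A small sketch, with made-up values:

```perl
use strict;
use warnings;

my $x = "10";
my $y = "9";

my $bigger_as_number = ($x > $y);    # true: ten is more than nine
my $bigger_as_text   = ($x gt $y);   # false: "1" sorts before "9"

print "As numbers, $x is greater than $y.\n"   if $bigger_as_number;
print "As text, $x is not greater than $y.\n"  if !$bigger_as_text;
```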

Here’s a program you can use to test various combinations:

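use strict;
use warnings;   # non-numeric input will make Perl complain during the numeric tests

# Run as: perl compare.pl LEFT RIGHT   (any script name works)
die "Usage: $0 left right\n" unless @ARGV == 2;

my $LS = $ARGV[0];   # left side of each comparison
my $RS = $ARGV[1];   # right side of each comparison
my $T = "True";
my $F = "False";
print "Left side is $LS, Right side is $RS.\n";
print "Comparison\t\tNumeric\tText\n";
# Each line shows the numeric result, then the text result.
print "equals\t\t\t", $LS == $RS ? $T : $F, "\t", $LS eq $RS ? $T : $F, "\n";
print "not equal\t\t", $LS != $RS ? $T : $F, "\t", $LS ne $RS ? $T : $F, "\n";
print "greater than\t\t", $LS > $RS ? $T : $F, "\t", $LS gt $RS ? $T : $F, "\n";
print "less than\t\t", $LS < $RS ? $T : $F, "\t", $LS lt $RS ? $T : $F, "\n";
print "greater than or equal\t", $LS >= $RS ? $T : $F, "\t", $LS ge $RS ? $T : $F, "\n";
print "less than or equal\t", $LS <= $RS ? $T : $F, "\t", $LS le $RS ? $T : $F, "\n";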

You can use any of these in ‘if’ or ‘while’ statements just as you have been using the ‘eq’ and ‘==’ in those statements.

You’ll notice yet another form of the ‘if’ idea used heavily in this program: the conditional operator. The expression comparison ? value one : value two returns value one if the comparison is true, and value two if the comparison is not true. It is very similar to using:

if ($LS == $RS) {
   print $T;
} else {
   print $F;
}

The big difference is that you can use the ‘? :’ form inside other expressions, as we did above.
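For example, a ‘? :’ can sit in the middle of a string being built. This sketch uses a hypothetical helper, describe_count, to choose between ‘apple’ and ‘apples’:

```perl
use strict;
use warnings;

# Hypothetical helper: "1 apple" but "3 apples".
sub describe_count {
   my ($count) = @_;
   # The ?: expression sits inside a larger expression (string concatenation).
   # The parentheses matter: "." binds more tightly than "==" and "?:".
   return "$count " . ($count == 1 ? "apple" : "apples");
}

print describe_count(1), "\n";
print describe_count(3), "\n";
```

An if/else could not be dropped into the middle of that return statement; the ‘? :’ form can.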

AND and OR
Just as you can combine arithmetic using parentheses and mathematical operators, you can combine comparisons using parentheses and the ‘logical’ operators ‘and’ and ‘or’. It works just like real life, where you might ask a friend to do something for you if this happens or that happens, but not if some other thing happens. The symbol for ‘and’ in Perl is ‘&&’, the symbol for ‘or’ is ‘||’, and the symbol for ‘not’ is ‘!’. Let’s look at some examples.

if (($a > $b) && ($a > $c)) {
   print "$a is pretty big.\n";
}

If the variable ‘a’ is greater than both b and c, print ‘a is pretty big’.

if ((($a > $b) && ($a < $c)) || (($a < $b) && ($a > $c))) {
   print "$a is between $b and $c.\n";
}

If the variable ‘a’ is greater than b and less than c, or if the variable ‘a’ is less than b and greater than c, print ‘a is between b and c’.

if (!(($a gt $b) || ($a > $b))) {
   print "$a is not greater than $b as text or number.\n";
}

The only way this one comes out ‘true’ and prints is if a is not greater than b as text, and a is not greater than b as a number.

Combining expressions like this can get very complicated very quickly. If you start to use this form, keep careful track of your parentheses! There have to be as many ‘close’ parentheses as there are ‘open’ parentheses.
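One way to keep a long condition manageable is to give it a name. This sketch wraps the ‘between’ test in a hypothetical subroutine, is_between. It uses $x, $y and $z instead of $a, $b and $c, because $a and $b have a special meaning in Perl’s sort.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if $x lies strictly between $y and $z,
# whichever of $y and $z is larger; false (0) otherwise.
sub is_between {
   my ($x, $y, $z) = @_;
   return ((($x > $y) && ($x < $z)) || (($x < $y) && ($x > $z))) ? 1 : 0;
}

foreach my $case ([5, 1, 10], [5, 10, 1], [5, 5, 10], [20, 1, 10]) {
   my ($x, $y, $z) = @$case;
   print "$x is between $y and $z.\n" if is_between($x, $y, $z);
}
```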

Regular Expressions
The real power of Perl comes from its extensive use of regular expressions. Regular expressions let us look for text ‘strings’ within larger text documents or in lines of text. They are usually written between two forward slashes.

Comparisons
The simplest form of a regular expression is just to put the text you’re looking for between two slashes. We’ve already done that to look for text in the lines of files:

if (/amy/i) {
   print "Amy sighted!\n";
}

The ‘/i’ at the end, recall, means that case doesn’t matter. We’ll find Amy in the line of text whether her name is given as ‘Amy’, ‘amy’, or ‘AMY’. If we only wanted to find ‘Amy’, we would look for ‘/Amy/’ and leave off the ‘i’. The useful part about this expression is that we’ll find ‘amy’ (or ‘Amy’) in the text no matter where it occurs.

This format you’ve already seen. It looks at the current line, assuming the current line is in ‘$_’. You can apply this comparison to any scalar variable, however, with the ‘=~’ operator:

if ($Name =~ /amy/i) {
   print "Amy sighted!\n";
}

And if you want the equivalent of ‘not equal to’, use ‘!~’:

if ($Name !~ /jerry/i) {
   print "No Jerry seen.\n";
}
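Here are both operators at work on a short list. The names are made up for illustration, and the counters are only there so the results are easy to check.

```perl
use strict;
use warnings;

# Made-up sample names.
my @names = ("Amy Smith", "JERRY Jones", "amy lee", "Pat Kim");
my $amy_count      = 0;
my $no_jerry_count = 0;

foreach my $Name (@names) {
   if ($Name =~ /amy/i) {          # "amy" somewhere, in any case
      print "$Name: Amy sighted!\n";
      $amy_count++;
   }
   if ($Name !~ /jerry/i) {        # "jerry" nowhere, in any case
      print "$Name: No Jerry seen.\n";
      $no_jerry_count++;
   }
}
```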

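This is useful, but it is only the tip of the iceberg of the power of regular expressions. You can also use special characters, a bit like the wildcards that the Unix shell uses, to loosen your comparison. They don’t mean quite the same things as shell wildcards, though: in a regular expression, ‘*’ repeats the character before it rather than matching any text.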

Wildcard          Meaning
^                 The beginning of the line.
$                 The end of the line.
.                 Any character (except a newline).
*                 The preceding character can occur zero or more times.
+                 The preceding character can occur one or more times: it must occur at least once.
?                 The preceding character can occur once, or zero times.
[characterlist] or [character1-character2]
                  This character can be any of the characters between the brackets, or any character in the range from character1 to character2. Or, if you start the list of characters with the caret (^), the character can be any character except the ones between the brackets.
(string1|string2) Either string1 or string2 must appear here.

These are the most commonly used wildcards. There are more, and if you need something that isn’t listed here, look in the camel book (Programming Perl) or in Perl’s own perlre documentation.
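A handy way to learn the wildcards is to try each one on a string it should match and a string it should not. This sketch does that for every row of the table; the sample strings are made up, and qr/.../ simply stores a regular expression in a variable.

```perl
use strict;
use warnings;

# Each entry: a pattern, a string it should match, and a string it should not.
my @examples = (
   [ qr/^The/,     "The end",   "Not the end"   ],   # ^   beginning of line
   [ qr/end$/,     "The end",   "end of story"  ],   # $   end of line
   [ qr/c.t/,      "cut",       "ct"            ],   # .   any one character
   [ qr/ab*c/,     "ac",        "adc"           ],   # *   zero or more b's
   [ qr/ab+c/,     "abbc",      "ac"            ],   # +   one or more b's
   [ qr/colou?r/,  "color",     "colouur"       ],   # ?   an optional u
   [ qr/^[0-9]/,   "7 dwarves", "seven dwarves" ],   # [ ] one of a list or range
   [ qr/^[^0-9]/,  "seven",     "7 seven"       ],   # [^] anything but these
   [ qr/gr(a|e)y/, "grey",      "groy"          ],   # (|) either alternative
);

my $failures = 0;
foreach my $example (@examples) {
   my ($pattern, $yes, $no) = @$example;
   my $ok = ($yes =~ $pattern) && ($no !~ $pattern);
   print "matches '$yes' but not '$no': ", ($ok ? "ok" : "unexpected"), "\n";
   $failures++ unless $ok;
}
```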

This example checks to see if the line begins with a number of any size, a period, and then a space. It will match ‘1. Oranges’ or ‘253. Apples’ but it will not match ‘There are 42 apples.’ or ‘1 apple fell.’

if ($Sentence =~ /^[0-9]+\. /) {
   print "This is probably part of a numeric list.\n";
}

Wait! Isn’t a period a special character? Under normal circumstances, a period means any character, not just a period. Because of that, we had to add a ‘backslash’ in front of the period. The backslash tells Perl that we really mean just a period. We don’t want the special meaning of period. You can use a backslash with any of the special characters above, so that you can look for question marks or plus signs as well.
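To see why the backslash matters, this sketch runs the example lines through the pattern with and without it. The two subroutine names are hypothetical, and ‘1) Oranges’ is an extra made-up line that only the unescaped version accepts, because its ‘.’ happily matches the ‘)’.

```perl
use strict;
use warnings;

# Hypothetical helpers: the list-finder pattern with and without the backslash.
sub with_backslash    { return $_[0] =~ /^[0-9]+\. / ? 1 : 0; }
sub without_backslash { return $_[0] =~ /^[0-9]+. /  ? 1 : 0; }

foreach my $line ("1. Oranges", "253. Apples", "There are 42 apples.",
                  "1 apple fell.", "1) Oranges") {
   printf "%-22s with \\.: %-3s  with .: %s\n", "'$line'",
      (with_backslash($line)    ? "yes" : "no"),
      (without_backslash($line) ? "yes" : "no");
}
```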

Now, suppose we were looking for ‘Jerry’, but we don’t know if the name will be spelled ‘Jerry’, ‘Gerry’, ‘Jerold’, ‘Jerrold’, or ‘Gerald’. We’ll use the following regular expression:

if ($Name =~ /[JG]er(ry|r?[oa]ld)/) {
   print "Jerry sighted!\n";
}

First, we put ‘J’ and ‘G’ in brackets, because the name might begin with either of those. Second, we put ‘(ry|r?[oa]ld)’ because the name might be either the diminutive or the proper name. We use ‘r?’ to allow a possible second ‘r’, but no more than one extra ‘r’. And we use [oa] to catch either ‘old’ or ‘ald’ endings. This will match the combinations ‘Jerry’, ‘Gerry’, ‘Jerold’, ‘Jerald’, ‘Jerrold’, ‘Jerrald’, ‘Gerold’, ‘Gerald’, ‘Gerrold’, ‘Gerrald’. Some of those are a bit nonsensical, but this should catch the usual spellings of the name ‘Gerald’. Keep in mind that the match is case-sensitive and is not anchored with ‘^’ or ‘$’, so it will also find these names inside longer words.
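You can check a claim like that by feeding the pattern the names it should match, plus a few near misses. The near misses below are made up for illustration: one ‘r’ too few, one too many, a different name, and a lowercase spelling.

```perl
use strict;
use warnings;

# Names the pattern should match, as listed above.
my @expected = qw(Jerry Gerry Jerold Jerald Jerrold Jerrald
                  Gerold Gerald Gerrold Gerrald);
# Made-up near misses that it should not match.
my @near_misses = qw(Jery Jerrrold Harold jerry);

my $problems = 0;
foreach my $Name (@expected) {
   if ($Name !~ /[JG]er(ry|r?[oa]ld)/) {
      print "Missed $Name\n";
      $problems++;
   }
}
foreach my $Name (@near_misses) {
   if ($Name =~ /[JG]er(ry|r?[oa]ld)/) {
      print "Wrongly matched $Name\n";
      $problems++;
   }
}
print "No surprises.\n" if !$problems;
```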

Memory
Suppose we wanted to take our ‘list finder’ and convert it to an HTML list style. We need to convert the number-period-space to a <li>, and keep the text after the number. We can use parentheses to tell Perl to remember that part of what it found. It remembers it in the variable $1. If you use a second set of parentheses, that part is remembered in $2, and so on: the remembered parts are numbered in the order their opening parentheses appear.

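use strict;
use warnings;

my $InList = 0;   # true while we are inside an <ol> ... </ol> block

print "<html><body>\n";
while (<>) {
   if (/^[\t ]*[0-9]+\. (.*)$/) {
      # a numbered line: start the list if needed, keep the text after "N. "
      if (!$InList) {
         print "<ol>\n";
         $InList = 1;
      }
      print "<li>$1\n";
   } elsif (/^ *$/) {
      # a blank line: a paragraph break, unless we are inside a list
      print "<p>\n" if !$InList;
   } else {
      # any other line: close an open list, then print the line as-is
      if ($InList) {
         print "</ol>\n";
         $InList = 0;
      }
      print;
   }
}
print "</ol>\n" if $InList;   # the file may end while a list is still open
print "</body></html>\n";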

You have to use $1 before your next successful regular expression match, or it will be overwritten. Usually, it is a good idea to copy $1 into another variable ($Text2Remember = $1) immediately after the regular expression you got it from.
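Here is a small sketch of that advice, using a made-up line and the list pattern from the program above with an extra pair of parentheses around the number. It also shows a shortcut: a match used in list context returns the remembered parts directly.

```perl
use strict;
use warnings;

my $line = "253. Apples\n";   # made-up input line

# Copy $1 and $2 into named variables right after the successful match.
my ($Number, $Text) = ("", "");
if ($line =~ /^[\t ]*([0-9]+)\. (.*)$/) {
   ($Number, $Text) = ($1, $2);
}

# A later successful match replaces $1; our copies are unaffected.
if ("Pears" =~ /(P)/) {
   print "\$1 is now '$1', but we kept '$Number' and '$Text'.\n";
}

# Shortcut: in list context, a match returns the remembered parts directly.
my ($Number2, $Text2) = $line =~ /^[\t ]*([0-9]+)\. (.*)$/;
```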

Modifications
You can use regular expressions to modify the string of text you’re matching against. Sound a little complicated? Don’t worry, it is! But if you start simple and work your way up, you’ll get the hang of it. Let’s modify our “text2html” program above so that it handles emphasized text. People often use asterisks to emphasize words in plain text. In HTML, we would surround those words with “<em>…</em>”. To further complicate matters, there might be more than one emphasized word or phrase per line!

$NewLine = $_;
$NewLine =~ s/\*([^*]+)\*/<em>$1<\/em>/g;   # $1 is the remembered text (older code often writes \1)

Okay, what’s going on here? First, we put an ‘s’ before our first slash. This stands for ‘substitute’, and it means we’re going to do a little substituting, or modifying, of the variable we’re looking at. In this case, ‘$NewLine’.

Next, we have to backslash the asterisk. Otherwise, it would have its special meaning (any number of the preceding character), which we don’t want! We want to look specifically for asterisks.

Then, we use an opening parenthesis to tell Perl we want to remember this next part. In the square brackets, we put a caret, to indicate that the next character can be any character but an asterisk (inside square brackets the asterisk has no special meaning, so it needs no backslash there). We use the + to indicate that there has to be at least one of these non-asterisk characters, and maybe more. We close the parentheses: this is the end of the word or phrase we want to remember, because the next character we look for is another backslashed asterisk. Thus, what we’re looking for is any string of characters between two asterisks.

Then, we have a forward slash and more junk! This is where we tell Perl how to modify our $NewLine. Whatever the left half of the substitution matched gets replaced with the right half. In this case, we replace it with “<em>”, the text we remembered, and “</em>”. On the right half, the remembered text can be written as \1 or $1; both mean the part inside the first parentheses, though Perl prefers $1 (with warnings turned on, it will tell you ‘\1 better written as $1’). So Perl takes all text between two asterisks and surrounds that text with the HTML emphasis code instead of the asterisks.

We have to backslash the ‘/’ in ‘</em>’. Otherwise, Perl would think we were closing the second half of the regular expression, and we aren’t quite done yet.
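Perl also lets you pick other delimiters for a substitution, which avoids the backslash entirely. This sketch runs the same substitution with slashes and with braces on a line from the sample file below and checks that they agree.

```perl
use strict;
use warnings;

my $NewLine = "And *here* is *another line*.\n";   # a line from the sample file

# Slashes as delimiters: the "/" in "</em>" needs a backslash.
my $WithSlashes = $NewLine;
$WithSlashes =~ s/\*([^*]+)\*/<em>$1<\/em>/g;

# Braces as delimiters: "/" is now an ordinary character.
my $WithBraces = $NewLine;
$WithBraces =~ s{\*([^*]+)\*}{<em>$1</em>}g;

print $WithBraces;
print "Both forms give the same result.\n" if $WithBraces eq $WithSlashes;
```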

Finally, we end our regular expression with a ‘g’. This stands for ‘global’, and it means that Perl will continue this substitution as long as it continues to find phrases between asterisks.
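The effect of ‘g’ is easiest to see side by side. This sketch applies the substitution to another line from the sample file, once with ‘g’ and once without.

```perl
use strict;
use warnings;

my $Line = "And *this* is *the end*.";   # a line from the sample file

my $Everywhere = $Line;
$Everywhere =~ s/\*([^*]+)\*/<em>$1<\/em>/g;   # with g: every *...* pair

my $FirstOnly = $Line;
$FirstOnly =~ s/\*([^*]+)\*/<em>$1<\/em>/;     # without g: the first pair only

print "$Everywhere\n";
print "$FirstOnly\n";
```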

Here’s our new program:

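use strict;
use warnings;

my $InList  = 0;    # true while we are inside an <ol> ... </ol> block
my $NewLine = "";   # the line we are about to print, if any

print "<html><body>\n";
while (<>) {
   if (/^[\t ]*[0-9]+\. (.*)$/) {
      # a numbered line: start the list if needed, keep the text after "N. "
      if (!$InList) {
         print "<ol>\n";
         $InList = 1;
      }
      $NewLine = "<li>$1\n";
   } elsif (/^ *$/) {
      # a blank line: a paragraph break, unless we are inside a list
      print "<p>\n" if !$InList;
   } else {
      # any other line: close an open list, then keep the line as-is
      if ($InList) {
         print "</ol>\n";
         $InList = 0;
      }
      $NewLine = $_;
   }
   if ($NewLine) {
      # turn every *text* into <em>text</em>
      $NewLine =~ s/\*([^*]+)\*/<em>$1<\/em>/g;
      print $NewLine;
      $NewLine = "";
   }
}
print "</ol>\n" if $InList;   # the file may end while a list is still open
print "</body></html>\n";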

Try this out on the following file:

This is a test of *Text2HTML*.
   1. Oranges
   2. Apples
   3. Pears
   4. Kumquats
And *here* is *another line*.
And maybe another list.
1. Rhododendrons
2. Roses
3. Tulips
4. Artichokes
5. Violets
6. Hyacinths
7. Dandelions
8. Apple Blossoms
9. Pine cones
10. Maple leaves
11. Oak leaves.
And *this* is *the end*.

So there you have it. Play around with regular expressions a lot, so you get used to them. They are the main reason Perl is so popular for working on the web and for modifying text files.

About this Tutorial
This tutorial is written by Jerry Stratton and is published under the GNU Free Documentation License.

Copyright © 2000 - 2006 Code Beach
 