Comparisons in Perl
This tutorial shows how to use simple comparisons and regular expressions in
Perl.
Almost everything we put in an ‘if’ expression is going to be comparing a
variable with a value, or a variable with another variable. In Perl, we have
to keep our numeric comparisons separate from our text comparisons. The main
reason for this is that Perl can’t know whether we want it to do a numeric or
a text comparison unless we tell it what we want. Depending on context, ‘1.0’
could be just the number one, or it could be the text string made up of the
characters ‘1’, ‘.’, and ‘0’.
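To make the ‘1.0’ case concrete, here is a minimal sketch. The variable names are made up for this illustration; the two operators are the ones this tutorial covers next.

```perl
use strict;
use warnings;

# Two scalars that hold the same number but different text.
my $left  = '1.0';
my $right = '1';

# '==' converts both sides to numbers first, so 1.0 and 1 are equal.
my $same_number = ($left == $right) ? 'True' : 'False';

# 'eq' compares character by character, so '1.0' and '1' differ.
my $same_text = ($left eq $right) ? 'True' : 'False';

print "Numeric: $same_number\n";   # Numeric: True
print "Text: $same_text\n";        # Text: False
```

The same pair of values gives two different answers, which is why Perl needs us to pick the operator.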
Simple Comparisons
There are four major comparison operators: equals, does not equal, greater
than, and less than. There are also the combinations, ‘less than or
equal to’ and ‘greater than or equal to’.
| Comparison | Numeric | Text |
| equals | == | eq |
| does not equal | != | ne |
| greater than | > | gt |
| less than | < | lt |
| greater than or equal to | >= | ge |
| less than or equal to | <= | le |
Here’s a program you can use to test various combinations:
# Take the two values to compare from the command line.
$LS = $ARGV[0];
$RS = $ARGV[1];
$T = "True";
$F = "False";
print "Left side is $LS, Right side is $RS.\n";
print "Comparison\t\tNumeric\tText\n";
# Each row prints the numeric operator's result, then the text operator's.
print "equals\t\t\t", $LS == $RS ? $T : $F, "\t", $LS eq $RS ? $T : $F, "\n";
print "not equal\t\t", $LS != $RS ? $T : $F, "\t", $LS ne $RS ? $T : $F, "\n";
print "greater than\t\t", $LS > $RS ? $T : $F, "\t", $LS gt $RS ? $T : $F, "\n";
print "less than\t\t", $LS < $RS ? $T : $F, "\t", $LS lt $RS ? $T : $F, "\n";
print "greater than or equal\t", $LS >= $RS ? $T : $F, "\t", $LS ge $RS ? $T : $F, "\n";
print "less than or equal\t", $LS <= $RS ? $T : $F, "\t", $LS le $RS ? $T : $F, "\n";
You can use any of these in ‘if’ or ‘while’ statements
just as you have been using the ‘eq’ and ‘==’ in those
statements.
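The numeric and text columns can disagree on ordering, not just on equality. The sketch below uses a hypothetical helper, ‘compare_both’, which is not part of the tutorial’s program, to show the classic case of 10 and 9.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether the left side is greater than the
# right side, first as numbers and then as text.
sub compare_both {
    my ($ls, $rs) = @_;
    my $numeric = $ls > $rs  ? 'greater' : 'not greater';
    my $text    = $ls gt $rs ? 'greater' : 'not greater';
    return ($numeric, $text);
}

my ($numeric, $text) = compare_both(10, 9);
print "10 vs 9 as numbers: $numeric\n";   # greater
print "10 vs 9 as text: $text\n";         # not greater: '1' sorts before '9'
```

As text, ‘10’ is compared one character at a time, and ‘1’ comes before ‘9’, so the text comparison says 10 is not greater than 9.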
You’ll notice yet another form of the ‘if’ statement used heavily in this
program: the conditional operator. The expression
‘comparison ? value one : value two’ returns value one if the comparison is
true, and value two if the comparison is not true. It is very similar to using:
if (comparison) {
    print value one;
} else {
    print value two;
}
The big difference is that you can
use the ‘? :’ form inside other
expressions, as we did above.
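Here is a small sketch of the ‘? :’ form sitting inside a larger expression. The helper name ‘describe_apples’ is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: pick singular or plural wording with '? :'.
sub describe_apples {
    my ($count) = @_;
    # Each '? :' expression sits inside the string concatenation.
    return "There " . ($count == 1 ? "is" : "are") . " $count apple"
         . ($count == 1 ? "" : "s") . ".";
}

print describe_apples(1), "\n";    # There is 1 apple.
print describe_apples(42), "\n";   # There are 42 apples.
```

Doing the same with ‘if’ and ‘else’ would take a separate block for each choice.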
AND and OR
Just as you can combine arithmetic using parentheses and mathematical operators,
you can also combine comparisons using parentheses and the ‘and’, ‘or’, and
‘not’ ‘logical’ operators. It works just like real life, where you might ask
a friend to do something for you if this happens or that happens, but not if
some other thing happens. The symbol for ‘and’ in Perl is ‘&&’. The symbol
for ‘or’ in Perl is ‘||’. And the symbol for ‘not’ is ‘!’.
Let’s look at some examples.
Example:
if (($a > $b) && ($a > $c)) {
    print "$a is pretty big.\n";
}
Meaning: if the variable ‘a’ is greater than both b and c, print ‘a is pretty
big’.
Example:
if ((($a > $b) && ($a < $c)) || (($a < $b) && ($a > $c))) {
    print "$a is between $b and $c.\n";
}
Meaning: if the variable ‘a’ is greater than b and less than c, or if the
variable ‘a’ is less than b and greater than c, print ‘a is between b and c’.
Example:
if (!(($a gt $b) || ($a > $b))) {
    print "$a is not greater than $b as text or number.\n";
}
Meaning: the only way this one comes out ‘true’ and prints is if a is not
greater than b as text, and a is not greater than b as numbers.
Combining expressions
like this can get very complicated very quickly. If you start to use this form,
keep
careful track of your parentheses! There have
to be as many ‘close’ parentheses as there are ‘open’ parentheses.
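One way to keep the parentheses manageable is to wrap a combined comparison in a subroutine. This is only a sketch; ‘is_between’ and its parameter names are made up for this example, and the test inside it is the ‘between’ comparison from the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $value lies strictly between the two bounds,
# whichever order the bounds are given in.
sub is_between {
    my ($value, $first, $second) = @_;
    return (($value > $first) && ($value < $second))
        || (($value < $first) && ($value > $second));
}

print "5 is between 1 and 10.\n"     if is_between(5, 1, 10);
print "5 is between 10 and 1.\n"     if is_between(5, 10, 1);
print "10 is not between 1 and 5.\n" if !is_between(10, 1, 5);
```

Every open parenthesis in the ‘return’ line has its matching close, and the ‘!’ in the last line turns a false answer into a true one.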
Regular Expressions
The real power of Perl comes from its extensive use of regular expressions.
Regular expressions allow us to look for text ‘strings’ within
larger text documents or in lines of text. Regular expressions occur between
two forward slashes.
Comparisons
The simplest form of a regular expression is just to put the text you’re
looking for between two slashes. We’ve already done that to look for
text in the lines of files:
if (/amy/i) {
    print "Amy sighted!\n";
}
The ‘/i’ at the end, recall, means that case doesn’t matter.
We’ll find Amy in the line of text whether her name is given as ‘Amy’, ‘amy’,
or ‘AMY’. If we only wanted to find ‘Amy’, we would
look for ‘/Amy/’ and leave off the ‘i’. The useful
part about this expression is that we’ll find ‘amy’ (or ‘Amy’)
in the text no matter where it occurs.
This format you’ve already seen. It looks at the current line, assuming
the current line is in ‘$_’. You can apply this comparison to any
scalar variable, however, with the ‘=~’ operator:
if ($Name =~ /amy/i) {
    print "Amy sighted!\n";
}
And if you want the equivalent of ‘not equal to’, use ‘!~’:
if ($Name !~ /jerry/i) {
    print "No Jerry seen.\n";
}
This is useful, but it is only the
tip of the iceberg of the power of regular expressions. You can also use ‘wildcards’,
like the wild cards that the Unix shell uses, to fudge your comparison.
| Wildcard | Meaning |
| ^ | The beginning of the line. |
| $ | The end of the line. |
| . | Any character. |
| * | Any number of the preceding character: the preceding character can occur zero or more times. |
| + | Any number of the preceding character: it must occur at least once. |
| ? | The preceding character can occur once, or zero times. |
| [characterlist] or [character1-character2] | This character can be any of the characters between the brackets. Or, if you start the list of characters with the caret (^), the character can be any character except the ones between the brackets. |
| (string1|string2) | Either string1 or string2 must appear here. |
These are the most
commonly used wildcards. There are more, and if you need something that isn’t
listed here, look in the camel book.
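To see each wildcard at work, the sketch below pairs one pattern with one string that it matches. The sample strings are invented for this illustration.

```perl
use strict;
use warnings;

# Each entry: a pattern, a string it should match, and a note on the wildcard.
my @examples = (
    [ qr/^Amy/,        'Amy sighted',   '^ anchors the match at the start'     ],
    [ qr/sighted$/,    'Amy sighted',   '$ anchors the match at the end'       ],
    [ qr/A.y/,         'Amy',           '. stands for any one character'       ],
    [ qr/Je*ry/,       'Jry',           '* allows zero occurrences of e'       ],
    [ qr/Jer+y/,       'Jerrry',        '+ needs at least one r'               ],
    [ qr/Jerr?y/,      'Jery',          '? makes the second r optional'        ],
    [ qr/[JG]erry/,    'Gerry',         '[JG] accepts either listed character' ],
    [ qr/[^0-9]/,      'abc',           '[^0-9] accepts any non-digit'         ],
    [ qr/(Amy|Jerry)/, 'Jerry sighted', '(a|b) accepts either alternative'     ],
);

for my $example (@examples) {
    my ($pattern, $string, $note) = @$example;
    my $result = ($string =~ $pattern) ? 'matches' : 'does not match';
    print "'$string' $result ($note)\n";
}
```

Changing a sample string, for example ‘Jry’ against the ‘+’ pattern, is a quick way to watch a match turn into a miss.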
This example checks to see if the
line begins with a number of any size, a period, and then a space. It will
match ‘1. Oranges’ or ‘253.
Apples’ but it will not match ‘There are 42 apples.’ or ‘1
apple fell.’
if ($Sentence =~ /^[0-9]+\. /) {
    print "This is probably part of a numeric list.\n";
}
Wait! Isn’t a period a special character? Under normal circumstances,
a period means any character, not just a period. Because of that, we had to
add a ‘backslash’ in front of the period. The backslash tells Perl
that we really mean just a period. We don’t want the special meaning
of period. You can use a backslash with any of the special characters above,
so that you can look for question marks or plus signs as well.
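Here is a short sketch of backslashing other special characters. It also uses Perl’s built-in ‘quotemeta’ function, which this tutorial does not cover, to add the backslashes when the text to find is stored in a variable. The sample strings are made up.

```perl
use strict;
use warnings;

my $question = 'Is Amy here?';
my $sum      = '2+2';

# A backslash removes the special meaning of '?' and '+'.
my $ends_with_question = ($question =~ /\?$/) ? 1 : 0;       # literal ? at the end
my $has_plus           = ($sum =~ /[0-9]\+[0-9]/) ? 1 : 0;   # digit, literal +, digit

# quotemeta adds the backslashes for us: '1.0' becomes '1\.0'.
my $needle  = '1.0';
my $escaped = quotemeta($needle);

my $literal_hit  = ('version 1.0' =~ /$escaped/) ? 1 : 0;   # 1
my $wildcard_hit = ('version 1x0' =~ /$needle/)  ? 1 : 0;   # 1: the bare . matched x
my $escaped_miss = ('version 1x0' =~ /$escaped/) ? 1 : 0;   # 0

print "question: $ends_with_question, plus: $has_plus\n";
print "literal: $literal_hit, wildcard: $wildcard_hit, escaped: $escaped_miss\n";
```

Without the backslash, the period in ‘1.0’ happily matches the ‘x’ in ‘1x0’, which is rarely what we want.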
Now, suppose we were looking for ‘Jerry’, but we don’t know
if the name will be spelled ‘Jerry’, ‘Gerry’, ‘Jerold’, ‘Jerrold’,
or ‘Gerald’. We’ll use the following regular expression:
if ($Name =~ /[JG]er(ry|r?[oa]ld)/) {
    print "Jerry sighted!\n";
}
First, we put ‘J’ and ‘G’ in brackets, because the
name might begin with either of those. Second, we put ‘(ry|r?[oa]ld)’ because
the name might be either the diminutive or the proper name. We use ‘r?’ to
put a possible second ‘r’ in there--but no more than one extra ‘r’.
And we use [oa] to catch either ‘old’ or ‘ald’ endings.
This will match the combinations ‘Jerry’, ‘Gerry’, ‘Jerold’, ‘Jerald’, ‘Jerrold’, ‘Jerrald’, ‘Gerold’, ‘Gerald’, ‘Gerrold’, ‘Gerrald’.
Some of those are a bit nonsensical, but this should catch all variations on
the name ‘Gerald’.
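A quick way to check a pattern like this is to run it over a list of names. In this sketch the helper ‘is_jerry’ and the extra names (‘Jeremy’, ‘Gerard’, ‘Jerrrold’) are invented for the test; the pattern itself is the one above.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from the text.
sub is_jerry {
    my ($name) = @_;
    return $name =~ /[JG]er(ry|r?[oa]ld)/ ? 1 : 0;
}

my @names = qw(Jerry Gerry Jerold Jerrold Gerald Jeremy Gerard Jerrrold);
for my $name (@names) {
    print "$name: ", (is_jerry($name) ? "match" : "no match"), "\n";
}

# The pattern is not anchored, so it also finds the name inside longer text.
print "Found in a sentence.\n" if is_jerry('Tom and Jerry');
```

The first five names match and the last three do not. ‘Jerrrold’ fails because ‘r?’ allows only one extra ‘r’.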
Memory
Suppose we wanted to take our ‘list finder’ and convert it to an
HTML list style. We need to convert the number-period-space to a <li>,
and keep the text after the number. We can use parentheses to tell Perl
to remember the part of the match they enclose. It remembers it in the
variable $1. If you use two sets of parentheses, the second item is
remembered in $2, and so on, numbered by the order of the opening
parentheses. $1 through $9 cover most needs, but Perl 5 does not stop at
nine: a tenth set of parentheses is remembered in $10.
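print "<html><body>\n";
while (<>) {
    if (/^[\t ]*[0-9]+\. (.*)$/) {
        # A numbered line: open the list if needed, then print the item.
        if (!$InList) {
            print "<ol>\n";
            $InList = 1;
        }
        print "<li>$1\n";
    } elsif (/^ *$/) {
        # A blank line becomes a paragraph break outside of lists.
        print "<p>\n" if !$InList;
    } else {
        # Ordinary text ends any open list.
        if ($InList) {
            print "</ol>\n";
            $InList = 0;
        }
        print;
    }
}
# Close a list that runs to the end of the input.
print "</ol>\n" if $InList;
print "</body></html>\n";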
You have to remember to use $1 before your next successful regular expression
match, or it will be overwritten. Usually, it is a good idea to put $1 into
another variable ($Text2Remember=$1) immediately following the regular
expression you got it from.
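The sketch below shows two sets of parentheses and the habit of copying the remembered parts right away. The variable names and sample lines are made up for this example.

```perl
use strict;
use warnings;

my $line = '12. Kumquats';
my ($number, $text);

# Two sets of parentheses: the number goes to $1, the text to $2.
if ($line =~ /^([0-9]+)\. (.*)$/) {
    ($number, $text) = ($1, $2);   # copy them right away
}

# A later successful match overwrites $1, but our copies are safe.
if ('253. Apples' =~ /^([0-9]+)\. /) {
    print "Now \$1 is $1\n";                   # Now $1 is 253
}
print "Saved: item $number is $text\n";       # Saved: item 12 is Kumquats
```

Had we printed $1 at the end instead of $number, we would have gotten 253 rather than 12.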
Modifications
You can use regular expressions to modify the string of text you’re using
the regular expression on. Sound a little complicated? Don’t worry, it
is! But if you start simple and work your way up, you’ll get the hang
of it. Let’s modify our “text2html” program above so that
it handles emphasized text. Many times, people will use asterisks to emphasize
words in text. In HTML, we would surround those words with “<em>...</em>”.
To further complicate matters, there might be more than one emphasized word
or phrase per line!
$NewLine = $_;
$NewLine =~ s/\*([^*]+)\*/<em>\1<\/em>/g;
Okay, what’s going on here? First, we put an ‘s’ before
our first slash. This stands for ‘substitute’, and it means we’re
going to do a little substituting, or modifying, of the variable we’re
looking at. In this case, ‘$NewLine’.
Next, we have to ‘backslash’ the asterisk. Otherwise, we would
get its special meaning of any number of the preceding character, which we
don’t want! We want to look specifically for asterisks.
Then, we use the open parenthesis
to tell Perl we want to remember this next part. In the square brackets,
we put a caret and an asterisk, to indicate that the next character
can be any character but an asterisk. (Inside square brackets the asterisk
has no special meaning, so it needs no backslash there.) We use the + to
indicate that there has to be at least one of these non-asterisk characters
and maybe more. We close the parentheses--this is the end of the word or
phrase we want to remember, because the next character we look for is another
backslashed asterisk. Thus, what we’re looking for is any string of
characters between two asterisks.
Then, we have a forward slash and
more junk! This is where we tell Perl how to modify our $NewLine. Whatever
we found on the left half of the regular expression
gets replaced with the right half of the regular expression. In this case,
we replace it with “<em>”, a \1, and “</em>”.
The \1 is just like our $1: it’s the item we asked Perl to remember.
(Perl accepts \1 in the replacement half for historical reasons; $1 works
there too and is the preferred spelling, and ‘use warnings’ will suggest it.)
So, what’s going to happen is that Perl is going to take all text between
two asterisks and surround that text with the HTML emphasis code instead.
We have to backslash the ‘/’ in ‘</em>’. Otherwise,
Perl would think we were closing the second half of the regular expression,
and we aren’t quite done yet.
Finally, we end our regular expression
with a ‘g’. This stands
for ‘global’, and it means that Perl will replace every phrase it finds
between asterisks on the line. Without the ‘g’, it would stop after the
first one.
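To see what the ‘g’ buys us, this sketch runs the same substitution with and without it. The helper ‘emphasize’ is hypothetical, and it writes $1 in the replacement, which behaves the same as the \1 used above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn *emphasis* markers into <em> tags, either for
# every match on the line (global) or for the first match only.
sub emphasize {
    my ($line, $global) = @_;
    if ($global) {
        $line =~ s/\*([^*]+)\*/<em>$1<\/em>/g;
    } else {
        $line =~ s/\*([^*]+)\*/<em>$1<\/em>/;
    }
    return $line;
}

my $sample = 'And *here* is *another line*.';
print emphasize($sample, 0), "\n";   # And <em>here</em> is *another line*.
print emphasize($sample, 1), "\n";   # And <em>here</em> is <em>another line</em>.
```

Because the helper works on its own copy of the line, $sample itself is left unchanged.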
Here’s our new program:
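print "<html><body>\n";
while (<>) {
    if (/^[\t ]*[0-9]+\. (.*)$/) {
        # A numbered line: open the list if needed, then queue the item.
        if (!$InList) {
            print "<ol>\n";
            $InList = 1;
        }
        $NewLine = "<li>$1\n";
    } elsif (/^ *$/) {
        # A blank line becomes a paragraph break outside of lists.
        print "<p>\n" if !$InList;
    } else {
        # Ordinary text ends any open list.
        if ($InList) {
            print "</ol>\n";
            $InList = 0;
        }
        $NewLine = $_;
    }
    # Convert *emphasis* in whatever line we queued, then print it.
    if ($NewLine) {
        $NewLine =~ s/\*([^*]+)\*/<em>\1<\/em>/g;
        print $NewLine;
        $NewLine = "";
    }
}
# Close a list that runs to the end of the input.
print "</ol>\n" if $InList;
print "</body></html>\n";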
Try this out on the following file:
This is a test of *Text2HTML*.
1. Oranges
2. Apples
3. Pears
4. Kumquats
And *here* is *another line*.
And maybe another list.
1. Rhododendrons
2. Roses
3. Tulips
4. Artichokes
5. Violets
6. Hyacinths
7. Dandelions
8. Apple Blossoms
9. Pine cones
10. Maple leaves
11. Oak leaves.
And *this* is *the end*.
So there you have it. Play
around with regular expressions a lot, so you get used to them. They are
the main reason Perl is so popular for working on the
web and for modifying text files.
About this Tutorial
This tutorial is written by Jerry Stratton and is published
under the GNU Free Documentation License.