Regular Expressions

Regular expressions are used for pattern matching in a given string. The match syntax follows a few conventions, and there are also substitution and translation syntaxes. The match form is:

@matched_parts = $value =~ m/regular_expression/ims;

In a boolean test, the operation is “true” if a match is made.

Note here the following parts:

  • The @matched_parts array receives the substrings of $value that were selected for return by wrapping parts of the regular expression in parentheses (). These are sometimes also called back-references and can be accessed with the special variables $1, $2, through $n.
  • The $value is the string being matched
  • The =~ operator binds the value to the regular expression to perform the operation
  • The /regular_expression/ is the coding discussed next to match the string or part of the string
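  • “ims” are separate modifiers that tell the processor other things.
    • i = ignore case.
    • s = treat as a single line: "." also matches the newline (\n). Without it, "." matches any character except a newline.
    • m = treat as multiple lines: ^ and $ also match at the start and end of each line within the string, not only at the start and end of the whole string.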
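
The parts listed above can be tried out with a short script. This is only a sketch: the sample string and the variable names other than @matched_parts and $value are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: two lines in one string.
my $value = "Name: Alice\nAge: 30\n";

# List context: each parenthesized group becomes one element.
# i ignores case; s lets .*? run across the embedded newline.
my @matched_parts = $value =~ m/name:\s*(\w+).*?age:\s*(\d+)/is;
my ($name, $age) = ($1, $2);   # same substrings via $1 and $2
print "captured: @matched_parts\n";

# Without s, "." stops at the newline, so the same pattern fails
# and the returned list is empty.
my @no_s = $value =~ m/name:\s*(\w+).*?age:\s*(\d+)/i;
print "without s: ", scalar(@no_s), " captures\n";

# Boolean context: m lets ^ match at the start of the second line.
my $with_m    = ($value =~ m/^age/im) ? 1 : 0;
my $without_m = ($value =~ m/^age/i)  ? 1 : 0;
print "with m: $with_m, without m: $without_m\n";
```

Note that $1 and $2 are only meaningful right after a successful match, so it is safest to test the result of the match before reading them.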

$value =~ s/regex/string/gis;

This is the substitution operation. It matches part of a string and replaces it with another string. Back-references (parenthesized substrings) can be inserted into the replacement with $1, $2, through $n. Inside the pattern itself they are written \1, \2, etc. (\1 also works in the replacement, but $1 is preferred there).

  • $value is the string being modified
  • The =~ operator binds the value to the expression
  • The ‘s’ denotes the substitution operation
  • /regex/ is the regular expression to match the string or substring
  • /string/ is its replacement value, and can include back-references explained above.
  • Modifiers here are:
    • g = global: replace every match, not only the first
    • i = ignore case
    • s = treat as a single line ("." also matches the newline)
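
A brief sketch of the substitution form, showing the effect of the g modifier and of back-references. The sample strings and variable names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $value = "Cat sat. cat ran. CAT hid.";

# g replaces every match, i ignores case; $1 in the replacement
# re-inserts whatever the first group captured.
(my $all = $value) =~ s/(cat)/<$1>/gi;
print "$all\n";     # <Cat> sat. <cat> ran. <CAT> hid.

# Without g only the first match is replaced.
(my $first = $value) =~ s/(cat)/<$1>/i;
print "$first\n";   # <Cat> sat. cat ran. CAT hid.

# \1 refers back to a group from inside the pattern itself:
# here it finds a doubled word and keeps a single copy.
my $dup = "this is is a test";
my $count = ($dup =~ s/\b(\w+)\s+\1\b/$1/g);
print "$dup ($count replaced)\n";   # this is a test (1 replaced)
```

The (my $copy = $original) =~ s/.../.../ form is used so that the original string is left untouched; the substitution itself returns the number of replacements made.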

Matching Basics

$v = "abcdefff123123 xyz";
$v =~ /fff/; # matches a substring
 
$v =~ /f{3}/; # matches exactly 3 of the previous char or group
$v =~ /(123){1,2}/; # matches 1 through 2 occurrences of 123
 
$v =~ /def+/; # matches de followed by 1 or more f's
$v =~ /def*/; # matches de followed by 0 or more f's
$v =~ /def+?/; # matches de followed by 1 or more f's, but as few as possible (non-greedy)
 
$v =~ /^abc/; # matches at beginning of string
$v =~ /xyz$/; # matches at end of string
 
$v =~ /./; # matches any character (except a newline, unless the s modifier is used)
$v =~ /^abc.*xyz$/; # starts with abc, optional middle (zero or more chars), ends with xyz
$v =~ /\./; # matches a period (escape)
 
$v =~ /[acb]+/; # matches one or more characters from the set a, c, b
$v =~ /[^trq]/; # matches a single character that is not t, r, or q
$v =~ /\w+\d+\s+\S+/; # matches at least one "\w" word char [A-Za-z0-9_], then at least one "\d" digit [0-9], then at least one "\s" white-space char [ \t\n\r\f], then at least one "\S" non-white-space char.
 
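$v =~ /(abc|qwe)/; # matches either abc or qwe. The matched string is returned in the match array.
$v =~ /(?:abc|qwe)/; # same, no string returned
$v =~ /^.*(\d+)/; # greedy .* takes as much as it can, so only the last digit ("3") is returned.
$v =~ /^\D*(\d+)/; # matches and returns the first set of digits in the string ("123123").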
 
$v =~ /^(.+)(f+)/; # .+ is greedy: it takes as many chars as possible while still leaving f+ at least one f. The split returned is ("abcdeff","f").
$v =~ /^(.+?)(f+)/; # .+? is not greedy: it takes as few chars as possible, so f+ takes all the f's. The split returned is ("abcde","fff").
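
The greedy and non-greedy cases are easy to mix up, so here is a small script that prints what each pattern actually captures. It uses the same sample string as above; the other variable names are made up for illustration.

```perl
use strict;
use warnings;

my $v = "abcdefff123123 xyz";   # same sample string as above

# Greedy: .+ grabs as much as it can and leaves f+ a single "f".
my @greedy = $v =~ /^(.+)(f+)/;
print "greedy: @greedy\n";       # abcdeff f

# Non-greedy: .+? stops as early as it can, so f+ gets all three.
my @lazy = $v =~ /^(.+?)(f+)/;
print "lazy: @lazy\n";           # abcde fff

# The greedy .* leaves only the final digit for (\d+).
my ($last_digit) = $v =~ /^.*(\d+)/;
# \D* (non-digits only) cannot run past the first digit.
my ($first_run) = $v =~ /^\D*(\d+)/;
print "last digit: $last_digit, first run: $first_run\n";
```

Writing the capture as my ($x) = ... puts the match in list context, so the variable receives the captured text rather than a true/false result.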
 
programming/perl/regex.txt · Last modified: 2005/07/18 17:15 by allen