Regular expression

sequence of characters that forms a search pattern

A regular expression (abbreviated regexp or regex) is a way to describe sets of character strings using syntactic rules.[1] Many programming languages use or support regular expressions. A regular expression is processed by a special program or by part of a programming language. This program either generates a parser that can match text against the expression, or it does the matching itself. A simple use case is finding all words or phrases in a text that match a certain pattern. In the simplest case, the pattern might just be a word. In more complex cases, there might be rules saying that the word needs to start with an uppercase letter, or that only certain letters are allowed.

Stephen Cole Kleene introduced the concept in 1951.

A regular expression processor reads a regular expression written according to the grammar of a given formal language, and then uses it to examine a text string.
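As a small illustration of a program processing a pattern and then examining a text, the sketch below uses Python's standard `re` module to find words that start with an uppercase letter. The sample sentence and the variable names are made up for this example, and the pattern only handles plain ASCII letters.

```python
import re

# Sample text, invented for this illustration.
text = "Alice met Bob in Paris last spring."

# One uppercase letter followed by one or more lowercase letters,
# delimited by word boundaries (\b). ASCII letters only.
capitalized = re.compile(r"\b[A-Z][a-z]+\b")

# findall returns every non-overlapping match, left to right.
words = capitalized.findall(text)
print(words)  # ['Alice', 'Bob', 'Paris']
```

Here `re.compile` plays the role of the processor that turns the expression into a matcher, and `findall` applies that matcher to the text.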

A few examples of what can be matched with regular expressions:

  • The sequence of characters "car" appearing consecutively in any context, such as in "car", "cartoon", or "bicarbonate"
  • The sequence of characters "car" occurring in that order with other characters between them, such as in "Icelander" or "chandler"
  • The word "car" when it appears as an isolated word
  • The word "car" when preceded by the word "blue" or "red"
  • The word "car" when not preceded by the word "motor"
  • A dollar sign immediately followed by one or more digits, and then optionally a period and exactly two more digits (for example, "$10" or "$245.99"). This does not match "$ 5", because of the space between the dollar sign and the digit, nor "€25", because there is no dollar sign.
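The examples in the list above could be written in many ways. The following is one possible sketch in Python's `re` syntax. The dictionary keys and the helper `matches` are hypothetical names chosen for this illustration, and the patterns assume lowercase words separated by single spaces.

```python
import re

# One illustrative pattern per example in the list (names are hypothetical).
PATTERNS = {
    # "car" anywhere, as in "cartoon" or "bicarbonate"
    "substring": r"car",
    # c, a, r in that order, with any characters in between
    "in_order": r"c.*a.*r",
    # "car" as an isolated word (\b is a word boundary)
    "whole_word": r"\bcar\b",
    # "car" preceded by the word "blue" or "red"
    "after_colour": r"\b(?:blue|red) car\b",
    # "car" not preceded by the word "motor" (negative lookbehind)
    "not_after_motor": r"(?<!motor )\bcar\b",
    # dollar sign, one or more digits, optional period and two digits
    "dollar_amount": r"\$\d+(?:\.\d{2})?",
}


def matches(name, text):
    """Return True if the named pattern occurs anywhere in text."""
    return re.search(PATTERNS[name], text) is not None
```

For instance, `matches("in_order", "chandler")` is true, while `matches("dollar_amount", "$ 5")` is false because of the space after the dollar sign.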

Regular expressions can be much more complex than these examples. Many regular expression languages also support "wildcard" characters. A more complex use case might be to validate a date: some months have 31 days, others have 30, and February has 28, or 29 in leap years.
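A minimal sketch of such a date check in Python follows. It assumes dates are written as YYYY-MM-DD, which the article does not specify. The regular expression checks the shape and the number of days per month, and a small piece of ordinary code handles the leap-year rule, since that rule is awkward to express in a pattern. The names `DATE_RE`, `is_leap_year` and `is_valid_date` are made up for this example.

```python
import re

# Assumed format: YYYY-MM-DD with ASCII digits only.
DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})-
    (?:
        (?:0[13578]|1[02])-(?:0[1-9]|[12]\d|3[01])   # months with 31 days
      | (?:0[469]|11)-(?:0[1-9]|[12]\d|30)           # months with 30 days
      | 02-(?:0[1-9]|[12]\d)                         # February: 01 to 29
    )
    """,
    re.VERBOSE | re.ASCII,
)


def is_leap_year(year):
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_valid_date(text):
    """Check a YYYY-MM-DD string, including February 29 in leap years."""
    match = DATE_RE.fullmatch(text)
    if match is None:
        return False
    # The pattern accepts February 29 for every year; reject it here
    # unless the year really is a leap year.
    if text[5:] == "02-29":
        return is_leap_year(int(match.group("year")))
    return True
```

With this sketch, `is_valid_date("2020-02-29")` is true, while `is_valid_date("2021-02-29")` and `is_valid_date("2021-04-31")` are false.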

References

  1. "re — Regular expression operations — Python 3.8.3 documentation". docs.python.org. Retrieved 2020-05-17.