Regex in Salesforce

08 Jun

(Pardon the dust. 🙂 This post is still a work in progress!)

Regex is a confusing language to me. But even though I find it confusing, I recognize its power to verify input. Here are my (ongoing) notes on Regex.


What is Regex?

Regex is a sequence of characters that defines a string pattern. Huh? Let’s break that down:

  • Regex is short for “regular expression”
  • A “regular expression” is a sequence of characters
  • That sequence of characters may look more like hieroglyphics than anything else
  • Those hieroglyphics define a string pattern
  • A string pattern describes what a string of characters should look like, such as the format of a username or an email address

So Regex is hieroglyphics that define a string pattern. 😀


Regex Names, Engines, and Flavors

Other Names

Regex is also known as Regexp.

Engines & Flavors

Regex is processed by software called an Engine. And to complicate things, there are lots of different engines, each with its own unique way of interpreting the regular expression. The syntax and behavior of a particular engine is called a regular expression flavor. What this means is that if you’re using Regex in Java you’ll be using one flavor, and if you’re using Regex in Python you’ll be using another flavor.

What Flavor Does Salesforce Use?

Salesforce uses the Java flavor (see the documentation for the java.util.regex.Pattern class).


Where Is Regex Used In Salesforce

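Salesforce has a REGEX function that is available in validation rule formulas and in Visual Flow formulas. The REGEX function is not available in formula fields or in Visualforce. Boo. (Apex does not use the REGEX function; there, regular expressions are handled by the Pattern and Matcher classes.)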


The Salesforce Regex Function

The Salesforce Regex Function looks like this:

REGEX(the_string_to_check, the_regular_expression)


Using REGEX In Validation Rules

Validation rules look for errors. If an error is found (the validation rule evaluates to True), an error message is displayed and the user must correct their input before the record can be saved.

This means that when you’re writing REGEX that matches what you want in the field you’ll need to negate the final result like this:

NOT( REGEX(Zip__c, “[0-9]{5}(-[0-9]{4})?”) )
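
Since Salesforce uses the Java flavor, the logic of this rule can be tried out in plain Java. This is only a sketch, not Salesforce code: the class and method names are made up, and it assumes REGEX returns true only when the whole field value matches the expression (which is how Java's matches() behaves).

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the validation rule above; not a Salesforce API.
final class ZipRuleSketch {
    // Five digits, optionally followed by a hyphen and four more digits.
    private static final Pattern ZIP = Pattern.compile("[0-9]{5}(-[0-9]{4})?");

    // Mirrors NOT(REGEX(...)): true means "block the save and show the error".
    static boolean ruleFires(String zip) {
        return !ZIP.matcher(zip).matches();
    }

    public static void main(String[] args) {
        System.out.println(ruleFires("95610"));      // false: valid, record saves
        System.out.println(ruleFires("84328-4484")); // false: valid, record saves
        System.out.println(ruleFires("9561"));       // true: too short, error shown
    }
}
```

The regex describes the good value, and the negation turns "does not look good" into the error condition.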


Getting To The Nitty-Gritty Of Understanding Regex

Okay, so how do you start to understand Regex gobbledygook? You take it just one little bite at a time!

So here we go!

How characters are represented in Regex

Other than special characters, what you see is what the Regex formula is looking for. In this example the formula is looking for the letter “u” and only the letter “u”. If anything else is present the formula will evaluate to false:

REGEX(String, “u”)

A set of characters, or Character Class, is a group of characters between brackets. Character classes allow for several different characters to be present at a given location in the string. In this example the formula is looking for a single character, the letter “a”, the letter “f”, or the digit “2”:

REGEX(String, “[af2]”)

To include all the characters between two given characters you can use a dash. In this example the formula is looking for a single character between “t” and “x” or between “R” and “Y”:

REGEX(String, “[t-xR-Y]”)

Negation

To exclude characters you put a caret (^) at the beginning of the character class. In this example the formula is looking for a single character other than “g”, “e”, or “n”:

REGEX(String, “[^gen]”)

Note: the caret is a special character that has a dual meaning. When it is used within a character class it means negation. When it is used outside of a character class it signifies the beginning of the string (see further down in the post for this).
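
To see these character class examples in action, here is a small Java sketch. The class name and the matches helper are made up for illustration, and the sketch assumes (as above) that the whole string has to match, which is what Pattern.matches does.

```java
import java.util.regex.Pattern;

// Illustrative only: tries the literal and character class examples in plain Java.
final class CharClassSketch {
    // Pattern.matches requires the entire input to match the expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("u", "u"));        // true
        System.out.println(matches("u", "up"));       // false: an extra character is present
        System.out.println(matches("[af2]", "f"));    // true
        System.out.println(matches("[t-xR-Y]", "S")); // true: S is between R and Y
        System.out.println(matches("[t-xR-Y]", "s")); // false: lowercase s comes before t
        System.out.println(matches("[^gen]", "x"));   // true
        System.out.println(matches("[^gen]", "e"));   // false: e is excluded
    }
}
```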


Grouping

Characters can be grouped together by including them between parentheses. In this example the string “bouncy ” is grouped and the entire function matches the string “red bouncy ball”:

REGEX(String, “red (bouncy )ball”)

 


Wildcard Characters and Repeating Pattern Quantifiers

When you need to match any character or a given token zero or more times you can use wildcard characters and repeating patterns:

  • . (period) – Match any character except a line terminator
  • ? (question mark) – Match the preceding token 0 or 1 times
  • * (asterisk) – Match the preceding token 0 or more times
  • + (plus sign) – Match the preceding token 1 or more times
  • {n} – Match the preceding token n times
  • {n,} – Match the preceding token n or more times
    • Note: {0,} is equivalent to * and {1,} is equivalent to +
  • {n,m} – Match the preceding token at least n times and no more than m times
    • Note: {0,1} is equivalent to ?

Examples:

  • REGEX(String, “pe.k”) will match “peak”, “peek”, “pe2k”, etc.
  • REGEX(String, “goat(ee)?”) will match “goat” and “goatee”
  • REGEX(String, “red (bouncy )?ball”) will match “red ball” and “red bouncy ball”
  • REGEX(String, “tiger9*”) will match “tiger”, “tiger9”, “tiger99”, “tiger999”, etc.
  • REGEX(String, “(lily)+”) will match “lily”, “lilylily”, “lilylilylily”, etc.
  • REGEX(String, “(boo){2}”) will match “booboo”
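  • REGEX(String, “7{2}”) will match “77” only; to match “77”, “777”, “7777”, etc. use “7{2,}”
  • REGEX(String, “(xo){3,4}”) will match “xoxoxo” and “xoxoxoxo” (without the parentheses, “xo{3,4}” repeats only the “o” and matches “xooo” and “xoooo”)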
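
Quantifiers are easy to get subtly wrong, so it helps to check them in code. The following Java sketch (made-up class and helper names, whole-string matching assumed) runs a handful of quantifier patterns. Note that a quantifier applies only to the token right before it, so grouping with parentheses matters.

```java
import java.util.regex.Pattern;

// Illustrative only: checks wildcard and quantifier behavior in plain Java.
final class QuantifierSketch {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("pe.k", "pe2k"));        // true: the dot matches any character
        System.out.println(matches("goat(ee)?", "goatee")); // true
        System.out.println(matches("goat(ee)?", "goate"));  // false: the group is all or nothing
        System.out.println(matches("tiger9*", "tiger"));    // true: zero 9s is fine
        System.out.println(matches("(lily)+", "lilylily")); // true
        System.out.println(matches("(boo){2}", "booboo"));  // true
        System.out.println(matches("7{2}", "777"));         // false: exactly two 7s required
        System.out.println(matches("7{2,}", "777"));        // true: two or more 7s
        System.out.println(matches("(xo){3,4}", "xoxoxo")); // true: the group repeats
        System.out.println(matches("xo{3,4}", "xooo"));     // true: only the o repeats
    }
}
```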

“Greedy,” “Lazy,” and “Possessive” Quantifiers

When an engine parses a string with quantifiers (things that multiply characters) there are ways to help the engine be more efficient. This is where “greedy,” “lazy,” and “possessive” quantifiers come in. While it’s mostly not necessary to know the difference, it is helpful when reading someone else’s Regex to know what these things mean. See the Regex Quantifiers site listed in the resources at the end of this post for an excellent explanation of quantifiers.

Greedy Quantifiers

The special characters ?, *, +, {n}, {n,}, and {n,m} are called “greedy quantifiers” because they match as many characters as possible.


Lazy Quantifiers

Quantifiers combined with the question mark create what are called “Lazy Quantifiers.”

  • ?? – once or not at all
  • *? – zero or more times
  • +? – one or more times
  • {n}? – exactly n times
  • {n,}? – at least n times
  • {n,m}? – at least n times and up to m times

At first blush it appears that there isn’t any difference. The difference lies in how the engine parses the string to look for a match. With greedy quantifiers the engine will try to match as many instances of the quantified token as possible. With lazy quantifiers the engine will try to match as few as needed.
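
The difference is easiest to see when searching inside a longer string rather than testing the whole string. Here is a minimal Java sketch (the class and helper names are made up) that uses Matcher.find() to grab the first match of a greedy and a lazy version of the same expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: compares a greedy and a lazy quantifier on the same input.
final class GreedyLazySketch {
    // Returns the first substring the regex finds in the input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>bold</b>";
        System.out.println(firstMatch("<.+>", input));  // <b>bold</b> (greedy: as much as possible)
        System.out.println(firstMatch("<.+?>", input)); // <b> (lazy: as little as needed)
    }
}
```

When the whole string has to match, as assumed for the REGEX function, greedy and lazy versions of an expression should give the same true or false answer; what changes is how the engine gets there.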


Possessive Quantifiers

Quantifiers combined with the plus sign create what are called “Possessive Quantifiers.”

  • ?+ – once or not at all
  • *+ – zero or more times
  • ++ – one or more times
  • {n}+ – exactly n times
  • {n,}+ – at least n times
  • {n,m}+ – at least n times and up to m times

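These quantifiers are helpful when you want a match to fail quickly if it doesn’t follow a particular pattern. A possessive quantifier matches as much as it can, like a greedy one, but it never gives characters back so that the rest of the expression can match. Using possessive quantifiers can increase the efficiency of your Regex function.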
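
Here is a small Java sketch of that "never gives anything back" behavior. The class and helper names are made up for illustration, and the patterns are toy examples chosen to make the effect visible.

```java
import java.util.regex.Pattern;

// Illustrative only: shows how a possessive quantifier refuses to backtrack.
final class PossessiveSketch {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy: \d+ first takes "123", then gives back the "3" so the final \d can match.
        System.out.println(matches("\\d+\\d", "123"));  // true
        // Possessive: \d++ takes "123" and keeps it, so the final \d has nothing left.
        System.out.println(matches("\\d++\\d", "123")); // false
        // Possessive works fine when nothing has to be given back.
        System.out.println(matches("\\d++x", "123x"));  // true
    }
}
```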


Special Characters and Metacharacters

Special characters are characters that have a special meaning to the regex engine. Metacharacters are special characters used within a character class.

Special Characters

These are the special characters in Regex:

  • \ (backslash) – is combined with another character to mean something else
  • ^ (caret) – 1) denotes the beginning of a line or 2) negates a character class
  • $ (dollar sign) – denotes the end of a line
  • . (dot or period) – represents any single character except a line terminator
  • | (pipe or vertical bar) – is used as an “OR” operator
  • ? (question mark) – multiplies a token 0 or 1 time
  • * (asterisk) – multiplies a token 0 or more times
  • + (plus sign) – multiplies a token 1 or more times
  • - (hyphen) – used inside a character class to indicate a sequence (range) of characters
  • () (parentheses) – enclose character groups
  • [ (open bracket) – opens a character class
  • { (open curly brace) – used when multiplying a token a specified number of times

Metacharacters

Any of the special characters except these, the metacharacters, can be used within a character class (a group of characters within brackets) without escaping them. Metacharacters have special meaning within a character class depending on where they are used. If used in their special places, they have special meaning and you would need to escape them in order to test for the actual character. If used in their not-so-special places, they do not need to be escaped.

These are the metacharacters in Regex:

  • ] (closing bracket)
    • it is special (it closes the character class) anywhere except in the first position of the character class or right after the negating caret
    • Example: “[]xyz]” tests for a closing bracket or the letters “x”, “y”, or “z”
      • This can also be written “[xyz\]]”
    • Example: “[^]867]” tests for anything that is not a closing bracket or the digits “8”, “6”, or “7”
    • Example: “[tip\]]” tests for a closing bracket or the letters “t”, “i”, or “p”
      • This can also be written “[]tip]”
  • \ (backslash)
    • the backslash must always be escaped within the character class if it’s a character you’re testing for
    • Example: “[-+$*\\]” tests for -, +, $, *, and \ (the hyphen is placed first because “+-$” would be read as a range)
  • ^ (caret)
    • its special place is right after the opening bracket. Placed anywhere else it is just a character
    • Example: “[^pex]” – Here the caret means “not”. This would test for anything except the characters “p”, “e”, and “x”
    • Example: “[This^Too]” – Here the caret is nothing more than a character
  • - (hyphen)
    • The hyphen is just a character in three places:
      • right after the opening bracket
      • right before the closing bracket
      • right after the negating caret
    • used anywhere else it is special: it indicates a range between the characters on either side of it
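
The placement rules for these characters can be checked with a short Java sketch. The class and helper names are made up, and keep in mind that Java string literals need every backslash doubled, so the regex “[-+$*\\]” is typed as "[-+$*\\\\]" in Java source.

```java
import java.util.regex.Pattern;

// Illustrative only: checks how ], ^, - and \ behave inside a character class.
final class MetacharSketch {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[]xyz]", "]"));       // true: ] in first position is a literal
        System.out.println(matches("[^]867]", "]"));      // false: ] is one of the excluded characters
        System.out.println(matches("[^]867]", "a"));      // true
        System.out.println(matches("[This^Too]", "^"));   // true: ^ is a literal when it is not first
        System.out.println(matches("[-+$*\\\\]", "-"));   // true: leading hyphen is a literal
        System.out.println(matches("[-+$*\\\\]", "\\"));  // true: escaped backslash
        System.out.println(matches("[a-c]", "b"));        // true: hyphen between characters is a range
        System.out.println(matches("[ac-]", "b"));        // false: trailing hyphen is a literal
        System.out.println(matches("[ac-]", "-"));        // true
    }
}
```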

Escape Character

The escape character is the backslash (\) character. This means that to represent any special character in a string you would need to put the backslash in front of it. Because the backslash is also a special character in Salesforce strings, each backslash is written twice in a formula, like this:

  • \\\\ – the backslash character
  • \\^ – the caret character
  • \\$ – the dollar sign character
  • \\. – the period character
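
Java string literals have the same doubled-backslash requirement, so a plain Java sketch is a handy way to see what escaping does. The class and helper names below are made up for illustration.

```java
import java.util.regex.Pattern;

// Illustrative only: escaped versus unescaped special characters.
final class EscapeSketch {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("3.50", "3x50"));     // true: an unescaped dot matches any character
        System.out.println(matches("3\\.50", "3x50"));   // false: an escaped dot matches only a period
        System.out.println(matches("3\\.50", "3.50"));   // true
        System.out.println(matches("\\$5", "$5"));       // true: literal dollar sign
        System.out.println(matches("\\^", "^"));         // true: literal caret
        System.out.println(matches("a\\\\b", "a\\b"));   // true: literal backslash between a and b
    }
}
```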

Backslash – Character Combinations

Certain characters combined with a backslash represent certain non-alphanumeric characters:

  • \\t – tab character
  • \\n – new line (linefeed) character (this is a line terminator character)
  • \\r – carriage return character (this is a line terminator character)
  • \\f – form feed character
  • \\a – alert (bell) character
  • \\e – escape character
  • \\cx – the control character corresponding to “x” (ctrl-B would be “\\cB”)

Predefined Character Classes

The following are predefined character classes you can use:

  • . (period) – matches any character except a line terminator
  • \\d – matches any digit (shorthand for “[0-9]”)
  • \\D – matches any non-digit (shorthand for “[^0-9]”)
  • \\s – a white space character (space, tab, new line, form feed, carriage return, and \\x0B)
  • \\w – a “word character” (shorthand for “[a-zA-Z_0-9]”)
  • \\W – a non-word character (shorthand for “[^a-zA-Z_0-9]” or “[^\\w]”)
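
A few quick checks of the predefined classes, again as a plain Java sketch with made-up names and whole-string matching assumed:

```java
import java.util.regex.Pattern;

// Illustrative only: tries the predefined character classes.
final class PredefinedClassSketch {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d{3}", "042"));     // true: three digits
        System.out.println(matches("\\D", "7"));          // false: 7 is a digit
        System.out.println(matches("\\w+", "user_42"));   // true: letters, digits, underscore
        System.out.println(matches("\\w+", "user-42"));   // false: hyphen is not a word character
        System.out.println(matches("\\W", "-"));          // true
        System.out.println(matches("\\s", "\t"));         // true: tab is white space
        System.out.println(matches("a\\sb", "a b"));      // true: space is white space
    }
}
```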

Boundaries

Boundaries are things like the beginning or end of a line or a word. These are the boundary characters you can use:

  • ^ – the beginning of a line
  • $ – the end of a line
  • \\b – a word boundary
  • \\B – a non-word boundary
  • \\A – the beginning of input
  • \\G – the end of the previous match
  • \\Z – the end of the input except for a final terminator
  • \\z – the end of the input
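
Boundaries match positions rather than characters, which is easier to see when searching inside a string. This Java sketch (made-up class and helper names) uses Matcher.find(), which looks for the pattern anywhere in the input instead of requiring the whole input to match.

```java
import java.util.regex.Pattern;

// Illustrative only: boundary matchers tested with find() (search anywhere in the input).
final class BoundarySketch {
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("\\bcat\\b", "the cat sat"));  // true: cat stands alone as a word
        System.out.println(found("\\bcat\\b", "concatenate"));  // false: cat is inside a word
        System.out.println(found("\\Bcat\\B", "concatenate"));  // true: no word boundary on either side
        System.out.println(found("^the", "the cat"));           // true: at the beginning
        System.out.println(found("\\Acat", "the cat"));         // false: input does not start with cat
        System.out.println(found("cat\\Z", "the cat\n"));       // true: final line terminator is ignored
        System.out.println(found("cat\\z", "the cat\n"));       // false: the newline is the real end
    }
}
```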

Some Sample Regex Formulas

  • California Drivers License
    • REGEX(Drivers_License__c, “([A-Z]\\d{7})?”)
    • Example matches:
      • C7768934
      • nothing (the ? means 0 or 1 item)
    • Checks for a capital letter followed by 7 digits
  • Credit card number
    • REGEX(Credit_Card__c, “(((\\d{4}-){3}\\d{4})|\\d{16})?”)
    • Example matches:
      • 1234-1234-1234-1234
      • 1234123412341234
      • nothing (a blank field)
  • Email address
    • email regex
  • Number between 100 and 99999
    • \\b[1-9][0-9]{2,4}\\b
  • US Phone Number
    • REGEX(Phone, “((\\([2-9]\\d{2}\\) ?[2-9]\\d{2}-\\d{4})|(([2-9][0-9]{2}-){2}\\d{4})|(([2-9][0-9]{2}\\.){2}\\d{4})|([2-9]\\d{2}){2}\\d{4})?”)
    • Example matches:
      • (223)456-7890
      • (223) 456-7890
      • 223-456-7890
      • 223.456.7890
      • 2234567890
      • nothing
    • Interesting note: the area code and “exchange” code (the first three digits after the area code) cannot start with 0 or 1 (these numbers are reserved for special purposes). All the other digits can be any number from 0 to 9.
  • Social Security Number
    • REGEX(SSN__c, “((\\d{3}-\\d{2}-\\d{4})|\\d{9})?”)
    • Example matches:
      • 123-45-6789
      • 123456789
      • nothing
  • US Zip Code
    • REGEX(BillingPostalCode, “\\d{5}(-\\d{4})?”)
    • Example matches:
      • 95610
      • 84328-4484
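
If you want to test expressions like these outside of Salesforce, a plain Java harness works as a rough stand-in. The sketch below is not Salesforce code: the class, constant, and helper names are made up, and it assumes REGEX behaves like Java's whole-string matches(). The expressions are typed with doubled backslashes, just as they are in the formulas.

```java
import java.util.regex.Pattern;

// Illustrative only: runs some of the sample expressions in plain Java.
final class SamplePatternsSketch {
    static final String LICENSE = "([A-Z]\\d{7})?";
    static final String SSN = "((\\d{3}-\\d{2}-\\d{4})|\\d{9})?";
    static final String ZIP = "\\d{5}(-\\d{4})?";
    // Four alternatives: (223) 456-7890, 223-456-7890, 223.456.7890, 2234567890.
    static final String PHONE =
        "((\\([2-9]\\d{2}\\) ?[2-9]\\d{2}-\\d{4})"
        + "|(([2-9][0-9]{2}-){2}\\d{4})"
        + "|(([2-9][0-9]{2}\\.){2}\\d{4})"   // \\. is a literal period
        + "|([2-9]\\d{2}){2}\\d{4})?";

    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches(LICENSE, "C7768934"));     // true
        System.out.println(matches(LICENSE, ""));             // true: the trailing ? allows a blank
        System.out.println(matches(SSN, "123-45-6789"));      // true
        System.out.println(matches(ZIP, "84328-4484"));       // true
        System.out.println(matches(ZIP, ""));                 // false: no trailing ? on this one
        System.out.println(matches(PHONE, "(223) 456-7890")); // true
        System.out.println(matches(PHONE, "223.456.7890"));   // true
        System.out.println(matches(PHONE, "123-456-7890"));   // false: area code starts with 1
    }
}
```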

Some Regex Resources

  • Regular-Expressions.info – This site is a good place to learn more about Regex. Their tutorial jumps around a little bit, but for the most part is laid out fairly linearly.
  • java.util.regex.Pattern class documentation – This is the Regex syntax that Salesforce uses, with the one exception that the backslash character must be escaped with a backslash since it’s a special character in Salesforce. This means that instead of writing \d to indicate a digit from 0 to 9 you would need to write \\d (the first backslash tells Salesforce to use the second one for just what it is, a backslash character and not a special character).
  • Regex101.com – This site is great for testing your Regex. You enter the expression in the top space and in the Test String space you place some strings. The window to the right of these spaces will tell you how the expression worked.
  • Regexr.com – This is another site for testing your Regex.
  • RegexLib.com – Browse for useful expressions and test your own expressions.
  • Regex Quantifiers – This site is an excellent tutorial on quantifiers and much, much more.
Posted on June 8, 2017 in Formulas, Visual Flow

 
