Regular Expressions

It's been a common complaint of mine that Regular Expressions are black-box magic, as useful as they are incomprehensible. As part of my #100DaysOfCode challenge, I spent the last three days following the Free Code Camp Crash Course in RegEx for Beginners, linked below.

What follows are my rough notes that detail what characters in a RegEx mean, how they can be used, and some examples. Note: These notes are very sloppy and this is not a complete guide to RegEx. Instead, this is a very basic introduction.

At the end of the day, a Regular Expression is just a way to validate and/or manipulate a string. That said, the syntax for it is incredibly opaque. I'm hoping this guide can be a resource that helps me start to understand RegEx better.

Glossary

  • Regular Expression: defines a search pattern that can be used to search for things in a string. You create Patterns to help you do matching within the string.
  • Pattern: A set of rules and conditions to help you define what you are searching for within a string. This can include letters, numbers, operators (like an "or" operator), ranges, etc.
  • Flag: A modifier to a Pattern that can customize the returned search results. For example, you can ignore casing or return all matches within a string, rather than just the first match.
  • Wildcard: matches any single character except a line break. Uses the period character (.).
  • Negated Character Sets: a way of modifying your Pattern that tells the Regular Expression which characters not to match. Uses a caret (^) as the first character inside square brackets, as in [^aeiou].
  • Plus Character (+): matches the preceding character one or more times.
  • Asterisk Character (*): matches the preceding character zero or more times.
  • Question Mark Character (?): makes the preceding character optional, so it may appear zero times or once.
  • Greedy Match: finds the longest possible part of a string that fits the pattern and returns that.
  • Lazy Match: finds the shortest possible part of a string that fits the pattern and returns that. Put a ? after a quantifier (for example *?) to use it.
  • Shorthand Character Class: special patterns used to match a wide range of things. For example, \w matches the entire alphabet (lower- and upper-case), 0-9, and underscores.
  • Quantity Specifiers: tell the regex how many instances of a character set to match, either an exact number or a combination of lower/upper bounds. Uses curly brackets: {lower,upper} or {number}.
  • Positive Lookaheads: return a match only if it is followed by a specified character (think: a Q with a U). /q(?=u)/
  • Negative Lookaheads: return a match only if it is not followed by a specified character (think: a Q without a U). /q(?!u)/
  • Capture Group: () groups a substring that we are searching for. The captured text can be reused for repetition (\1) and in replacements ($1).

Basic RegEx example:

const websiteName = "Code and Tacos";
let regex = /Taco/; // This is the pattern

JavaScript has multiple ways to use RegEx. One of them is the test method. It applies the pattern to the given string and returns true if the pattern matches and false if it does not.

const websiteName = "Code and Tacos";
const regex = /Taco/;
console.log(regex.test(websiteName));
// expected result: true

A pattern like /Taco/ matches text literally and is case-sensitive. Strict matching like that can only get us so far. To tap into more powerful and flexible RegEx capabilities, we need some special operators and characters. Take the Or operator (|), for example:

const websiteName = "Code and Tacos";
const regex = /Burrito|Nachos|Taco|Queso/;
console.log(regex.test(websiteName));
// expected result: true

The above RegEx searches for any of the words in the pattern (Burrito, Nachos, Taco, or Queso). It returns true if at least one of them is found.

But how can we ignore the case (upper- or lower-) of the letters we're testing? To do so, we need to use a Flag. Let's take a look at the i flag, which ignores letter case:

const websiteName = "Code and Tacos";
const regex = /tacos/i; // the `i` after the final slash is the flag
console.log(regex.test(websiteName));
// expected result: true

Extracting Matches

All we have done so far is return true or false based on tests run by our RegEx. That is really useful for conditional logic based on the strings you're passing around, such as a ternary operator: if the string contains "Tacos" do one thing, and if it doesn't, do a different thing. Looking at you, Codewars.
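Here is a minimal sketch of that kind of branching, using test() inside a ternary. The function name tacoCheck and its messages are made up for illustration.

```javascript
// Hypothetical helper: branch on whether the text mentions tacos (any case).
function tacoCheck(text) {
  return /tacos/i.test(text) ? "Taco time!" : "No tacos here.";
}

console.log(tacoCheck("Code and Tacos"));    // "Taco time!"
console.log(tacoCheck("Code and Burritos")); // "No tacos here."
```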

But what if we want to extract the string we've matched so that we can perform some kind of work on that specific portion of the string?

const websiteName = "Code and Tacos";
const regex = /tacos/i;
const result = websiteName.match(regex);
console.log(result);
// expected result: [ 'Tacos', index: 9, input: 'Code and Tacos', groups: undefined ]

We can also use other flags and combine several of them. For example, the g flag extracts all matches of a pattern and returns them in an array. Here it is combined with the i flag:

const websiteName = "Code and Tacos and Tacos and TACOS";
const regex = /tacos/ig;
const result = websiteName.match(regex);
console.log(result);
// expected result: [ 'Tacos', 'Tacos', 'TACOS' ]

In RegEx, a period (.) is a wildcard that matches any single character except a line break:

const websiteName = "Code and Tacox and Tacopp and TACOS";
const regex = /taco./ig;
const result = websiteName.match(regex);
console.log(result);
// expected result: [ 'Tacox', 'Tacop', 'TACOS' ]

You can also match based on a pre-defined group of characters, called a character set, which is written in square brackets. In the following example, the RegEx matches any three-letter sequence that starts with 'b', ends with 'g', and has 'a', 'i', 'o', or 'u' in between. Notice that the word 'bang' is not in the resulting match, because it has two letters between the 'b' and the 'g'.

const sentence = "A big bug bit me and it made me bang my bog, so I put it in a bag.";
const bgRegex = /b[aiou]g/ig;
const result = sentence.match(bgRegex);
console.log(result);
// expected result: [ 'big', 'bug', 'bog', 'bag' ]

You can also use a range of letters. For example:

let alphabetRegex = /[a-z]/ig; // matches any letter from a to z (the i flag makes it match uppercase letters too)
let elemenopee = /[l-p]/ig; // matches only l, m, n, o, or p (in either case, because of the i flag)
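To see what the i flag does to a range, here is a small sketch that runs the same ranges against a made-up phrase, once without the flag and once with it:

```javascript
// Compare a lowercase range with and without the i flag.
const phrase = "Hola Mundo";
const lowerOnly = phrase.match(/[a-z]/g); // lowercase letters only
const anyCase = phrase.match(/[a-z]/gi);  // letters in either case
const lmnop = phrase.match(/[l-p]/gi);    // only l, m, n, o, p (either case)

console.log(lowerOnly.join("")); // "olaundo"
console.log(anyCase.join(""));   // "HolaMundo"
console.log(lmnop.join(""));     // "olMno"
```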

Let's match a range of numbers as well as a range of letters:

const quote = "Blueberry 3.14592653s are delicious";
const myRegex = /[2-6h-s]/ig; // will match numbers 2 through 6 and letters h through s;
const result = quote.match(myRegex);
console.log(result);
// expected result: [ 'l', 'r', 'r', '3', '4', '5', '2', '6', '5', '3', 's', 'r', 'l', 'i', 'i', 'o', 's' ]

So what if you want to specify characters you do not want to match? For that, use a Negated Character Set by putting a caret (^) right after the opening square bracket. The following example matches everything except numbers and vowels:

const quote = "3 blind mice";
const myRegex = /[^0-9aeiou]/ig; // matches anything that is NOT a digit or a vowel
const result = quote.match(myRegex);
console.log(result);
// expected result: [ ' ', 'b', 'l', 'n', 'd', ' ', 'm', 'c' ]

Outside of square brackets, the caret (^) means something different: it anchors the match to the beginning of the string.

const rickyAndCal = "Cal and Ricky go to the store";
const myRegex = /^Cal/;
console.log(myRegex.test(rickyAndCal));
// expected result: true

The Dollar Sign character ($) lets you match the end of the string:

const rickyAndCal = "Cal and Ricky go to the store";
const myRegex = /ore$/;
console.log(myRegex.test(rickyAndCal));
// expected result: true
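You can use both anchors together. The pattern then has to describe the whole string, not just part of it. The sketch below uses that to check that a string contains only digits. The + in it means "one or more" and is covered next.

```javascript
// ^ and $ together: the entire string must be one or more digits.
const digitsOnly = /^[0-9]+$/;

console.log(digitsOnly.test("12345"));  // true
console.log(digitsOnly.test("123a5"));  // false: "a" is not a digit
console.log(digitsOnly.test("x12345")); // false: does not start with a digit
```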

The Plus Character (+) lets you match characters that appear one or more times.

const quote = "Mississipspi";
const myRegex = /s+/g; // matches every run of one or more "s" characters
const result = quote.match(myRegex);
console.log(result);
// expected result: [ 'ss', 'ss', 's' ]
// NOTE: writing the pattern as /[s+]/g returns [ 's', 's', 's', 's', 's' ] instead. Inside square brackets, + is just a literal "+" character, so that version matches a single "s" (or "+") at a time.

The Asterisk Character (*) lets you match characters that appear zero or more times.

const quote1 = "Goooooooooooal";
const quote2 = "Get over here";
const quote3 = "Over the moon";
const myregex = /go*/ig;

const result1 = quote1.match(myregex);
console.log(result1);
// expected result: [ 'Gooooooooooo' ]

const result2 = quote2.match(myregex);
console.log(result2);
// expected result: [ 'G' ]

const result3 = quote3.match(myregex);
console.log(result3);
// expected result: null

const chewieQuote = "Aaaaaaaaaaaaargh";
const chewieRegex = /Aa*/;
const chewieResult = chewieQuote.match(chewieRegex);
console.log(chewieResult);
// expected result (first element of the returned array): "Aaaaaaaaaaaaa"

Greedy Matching finds the longest possible part of a string that fits the pattern and returns that.

Greedy Matching example:

const boat = "titanic";
const regex1 = /t[a-z]*i/;
const boatResult = boat.match(regex1);
// expected result: "titani";
console.log(boatResult);

const text = "<h1>Winter is coming</h1>";
const myRegex = /<.*>/;
const winterfellResult = text.match(myRegex);
// expected result: "<h1>Winter is coming</h1>";
console.log(winterfellResult);

Lazy Matching finds the shortest possible part of a string that fits the pattern and returns that. To make a quantifier lazy, put a question mark (?) right after it, as in *? or +?.

Lazy Matching example:

// Lazy Matching:
const boat = "titanic";
const regex1 = /t[a-z]*?i/;
const boatResult = boat.match(regex1);
console.log(boatResult);
// expected result: "ti";

const text = "<h1>Winter is coming</h1>";
const myRegex = /<.*?>/;
const winterfellResult = text.match(myRegex);
console.log(winterfellResult);
// expected result: "<h1>";

Note: quantifiers such as *, +, and {lower,upper} are greedy by default.
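The same switch works on +. Here is a small sketch comparing the greedy + with the lazy +? on a made-up string:

```javascript
// The same pattern, greedy (+) versus lazy (+?).
const scream = "aaaa";
const greedy = scream.match(/a+/)[0]; // takes as many "a"s as it can
const lazy = scream.match(/a+?/)[0];  // stops as soon as one "a" satisfies +

console.log(greedy); // "aaaa"
console.log(lazy);   // "a"
```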

Shorthand Character Classes are special patterns used to match a wide range of things. For example, \w matches the entire alphabet (lower- and upper-case), 0-9, and underscores, the same as [A-Za-z0-9_]:

const quote = "The 5 boxing wizards jump quickly.";
const myRegex = /\w/g;
const result = quote.match(myRegex);
const length = result.length; // number of word characters: the string's 34 characters minus its five spaces and the period
console.log(result);
// expected result: [ 'T', 'h', 'e', '5', 'b', 'o', 'x', 'i', 'n', 'g', 'w', 'i', 'z', 'a', 'r', 'd', 's', 'j', 'u', 'm', 'p', 'q', 'u', 'i', 'c', 'k', 'l', 'y' ]
console.log(length);
// expected result: 28

To get the opposite, use an uppercase \W, which matches anything except letters, numbers, and underscores.

const quote = "The 5 boxing wizards jump quickly.";
const myRegex = /\W/g;
const result = quote.match(myRegex);
console.log(result);
// expected result: [ ' ', ' ', ' ', ' ', ' ', '.' ]
const length = result.length; // number of characters that are not letters, numbers, or underscores
console.log(length);
// expected result: 6

\d matches any digit (0-9), the same as [0-9].

\D matches any character that is not a digit, the same as [^0-9].
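Here is a small sketch of both classes on a made-up order string. \d picks out the digits, and \D picks out everything else, spaces and punctuation included:

```javascript
// \d collects the digits; \D collects every non-digit character.
const order = "2 tacos, 1 burrito";
const digits = order.match(/\d/g);
const nonDigits = order.match(/\D/g).join("");

console.log(digits);    // [ '2', '1' ]
console.log(nonDigits); // " tacos,  burrito" (two spaces where the "1" was removed)
```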

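Exercise:

  1. If there are numbers, they must be at the end
  2. Letters can be lowercase and uppercase
  3. At least two characters long. Two-letter names cannot have numbers.

const username = "JackOfAllTrades2";
// ^ start of string, [A-Za-z]{2,} two or more letters, \d* zero or more digits, $ end of string.
// No g flag: we check the whole string once, and with g, test() remembers where it stopped between calls.
const usernameCheck = /^[A-Za-z]{2,}\d*$/;
const result = usernameCheck.test(username);
console.log(result);
// expected result: true

const matched = username.match(usernameCheck);
console.log(matched);
// expected result: [ 'JackOfAllTrades2', index: 0, input: 'JackOfAllTrades2', groups: undefined ]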
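To see where the pattern draws the line, here is a quick sketch that runs it against a few made-up usernames. Note that "A12" fails: the pattern requires at least two letters before any digits, which is a bit stricter than rule 3 read literally.

```javascript
// Run the exercise's pattern against some sample usernames (the samples are made up).
const usernamePattern = /^[A-Za-z]{2,}\d*$/;
const samples = ["JackOfAllTrades2", "Jo", "J2", "2Jack", "Jack2Trades", "A12"];

for (const name of samples) {
  console.log(name, usernamePattern.test(name));
}
// JackOfAllTrades2 true
// Jo true
// J2 false           (only one letter before the digit)
// 2Jack false        (starts with a digit)
// Jack2Trades false  (digit in the middle)
// A12 false          (only one letter before the digits)
```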

\s matches any whitespace character (spaces, tabs, and line breaks):

const phrase = "The rain in Spain falls mainly in the plains";
const myRegex = /\s/g;
const result = phrase.match(myRegex);
console.log(result);
// expected result: [ ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ' ]

\S matches any character that is not whitespace.
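Here is a small sketch of \S on part of the sentence above. Every character except the spaces matches, so joining the matches squeezes the words together:

```javascript
// \S matches every character except whitespace.
const phrase = "The rain in Spain";
const nonSpace = phrase.match(/\S/g);

console.log(nonSpace.length);   // 14
console.log(nonSpace.join("")); // "TheraininSpain"
```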

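What if we want to specify the upper and lower number of matches we're looking for? As you saw in the exercise above, we can use Quantity Specifiers, which use curly brackets: {lower,upper}. You can leave off the upper bound, as in {lower,}, to mean "at least lower". You can't leave off the lower bound, though; write {0,upper} instead.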

const ohString = "Ohhhhhhh no";
const ohRegex = /Oh{2,8} no/; // Looks for at least two, at most eight h's.
const ohhhhhhhhhRegex = /Oh{8,}/; // Looks for at least eight h's.
const result = ohRegex.test(ohString);
console.log(result);
// expected result: true
const resulthhhhhhhh = ohhhhhhhhhRegex.test(ohString);
console.log(resulthhhhhhhh);
// expected result: false
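To set only an upper limit, start the range at zero. Here is a small sketch with made-up strings that allows at most three h's:

```javascript
// {0,3} allows zero to three "h" characters; ^ and $ stop extra h's from sneaking in.
const shortOh = /^Oh{0,3}!$/;

console.log(shortOh.test("O!"));     // true (zero h's)
console.log(shortOh.test("Ohhh!"));  // true (three h's)
console.log(shortOh.test("Ohhhh!")); // false (four h's is too many)
```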

We can also use Quantity Specifiers to specify the exact number of matches.

const screaming = "Timmmmber";
const checking = /Tim{4}ber/;
const result = checking.test(screaming);
console.log(result);
// expected result: true

We can use a question mark (?) to make the preceding character optional.

const word = "favorite";
const wourd = "favourite";
const wosrd = "favosrite";
const myRegex = /favou?rite/;
const result = myRegex.test(word);
console.log(result);
// expected result: true
const resultu = myRegex.test(wourd);
console.log(resultu);
// expected result: true
const results = myRegex.test(wosrd);
console.log(results);
// expected result: false

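Lookaheads check what comes next in the string without including it in the match. A Positive Lookahead (?=...) only lets the match succeed if the given pattern follows. A Negative Lookahead (?!...) only lets it succeed if the pattern does not follow (for example: a q without a u):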

const quit = "quit";
const noquit = "qit";
const quRegex = /q(?=u)/;
const qRegex = /q(?!u)/;

const quResult = quRegex.test(quit);
console.log(quResult);
// expected result: true
const quReturn = quit.match(quRegex);
console.log(quReturn);
// expected result: "q"

const qResult = qRegex.test(noquit);
console.log(qResult);
// expected result: true
const qReturn = noquit.match(qRegex);
console.log(qReturn);
// expected result: "q"
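The "without including it" part is easy to miss. Here is a small sketch showing that the lookahead checks the "u" but does not consume it, so the next part of the pattern still starts at the "u":

```javascript
// A lookahead checks the next character but does not consume it.
const word = "quit";

console.log(word.match(/q(?=u)/)[0]);  // "q": the "u" is checked, not included
console.log(word.match(/q(?=u)u/)[0]); // "qu": the "u" still has to be matched explicitly
console.log(/q(?=u)i/.test(word));     // false: after the lookahead, the next character is still "u"
```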

Exercise:

// 1. Password should be at least 5 characters long
// 2. Password should include at least two consecutive digits

const sampleword = "astronaut22";
const pwRegex = /(?=\w{5,})(?=\D*\d{2,})/; // both lookaheads must succeed at the same position
const result = pwRegex.test(sampleword);
console.log(result);
// expected result: true
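Here is a quick sketch that runs the same pattern against a few made-up passwords. The last one shows that scattered digits are not enough; two of them have to sit next to each other.

```javascript
// Try the exercise's pattern on some sample passwords (the samples are made up).
const passwordPattern = /(?=\w{5,})(?=\D*\d{2,})/;
const candidates = ["astronaut22", "abc12", "abc1", "astronaut", "a1b2c3d4"];

for (const pw of candidates) {
  console.log(pw, passwordPattern.test(pw));
}
// astronaut22 true
// abc12 true
// abc1 false       (only 4 characters)
// astronaut false  (no digits)
// a1b2c3d4 false   (digits, but never two in a row)
```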

We can use a Capture Group, (), to group a substring that we are searching for. We can then refer back to whatever it captured with \1 to match a repetition. For example:

const repeat = "regex regex";
const myRegex = /(\w+)\s\1/; // looks for a group of any letters, numbers, or underscores, followed by a space, then a repetition of that same group of letters, numbers or underscores.
const result = myRegex.test(repeat);
console.log(result);
// expected result: true

const repeatNum = "42 42 42";
const numRegex = /^(\d+)\s\1\s\1$/; // the same number three times, separated by whitespace, and nothing else
const numResult = numRegex.test(repeatNum);
console.log(numResult);
// expected result: true
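Besides repetition, match() (without the g flag) hands back whatever each group captured, at index 1, 2, and so on. Here is a small sketch reusing the "regex regex" string:

```javascript
// match() without g returns the full match at [0] and each capture group after it.
const repeat = "regex regex";
const groupMatch = repeat.match(/(\w+)\s\1/);

console.log(groupMatch[0]); // "regex regex": the full match
console.log(groupMatch[1]); // "regex": the text captured by (\w+)
```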

You can search and replace text in a string using the .replace() method.

// Example 1:
const wrongText = "The sky is silver.";
const silverRegex = /silver/;
const correctText = wrongText.replace(silverRegex, "blue");
console.log(correctText);
// expected result: "The sky is blue.";

// Example 2: $1 and $2 refer to the first and second capture groups
console.log("Code Camp".replace(/(\w+)\s(\w+)/, '$2 $1'));
// expected result: "Camp Code"
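One thing to watch with .replace(): without the g flag it only changes the first match. Here is a small sketch with a made-up string:

```javascript
// Without g, replace() changes only the first match; with g, it changes every match.
const order = "taco taco taco";
const once = order.replace(/taco/, "burrito");
const everywhere = order.replace(/taco/g, "burrito");

console.log(once);       // "burrito taco taco"
console.log(everywhere); // "burrito burrito burrito"
```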

Exercise: Remove the whitespace from the beginning and end of this text.

const hello = "    Hello, World!    ";
const wsRegex = /^\s+|\s+$/g; // leading whitespace OR trailing whitespace; g removes both
const result = hello.replace(wsRegex, '');
console.log(result);
// expected result: "Hello, World!"
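For this particular job, JavaScript strings also have a built-in trim() method. This sketch checks that the regex version and trim() agree on the exercise string:

```javascript
// Compare the regex-based trim with String.prototype.trim().
const hello = "    Hello, World!    ";
const viaRegex = hello.replace(/^\s+|\s+$/g, "");
const viaTrim = hello.trim();

console.log(viaRegex === viaTrim); // true
console.log(viaTrim);              // "Hello, World!"
```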