You Should Learn Regex
Regular Expressions (Regex): One of the most powerful, widely applicable, and sometimes intimidating techniques in software engineering. From validating email addresses to performing complex code refactors, regular expressions have a wide range of uses and are an essential entry in any software engineer's toolbox.
What is a regular expression?
A regular expression (or regex, or regexp) is a way to describe complex search patterns using sequences of characters.
The complexity of the specialized regex syntax, however, can make these expressions somewhat inaccessible. For instance, here is a basic regex that describes any time in the 24-hour HH:MM format.
\b([01]?[0-9]|2[0-3]):([0-5]\d)\b
If this looks complex to you now, don't worry: by the time we finish the tutorial, understanding this expression will be trivial.
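To see the expression in action before unpacking its syntax, here is a minimal Python sketch that runs it over a sample sentence. The names TIME_24H and find_times and the sample text are illustrative assumptions, not part of the original examples.

```python
import re

# The 24-hour time pattern from the text, written as a raw string so the
# backslashes reach the regex engine unchanged.
TIME_24H = re.compile(r'\b([01]?[0-9]|2[0-3]):([0-5]\d)\b')

def find_times(text):
    """Return every (hour, minute) pair the pattern finds in text."""
    return TIME_24H.findall(text)

# Hypothetical sample input; "25:00" is not a valid 24-hour time.
sample = "Standup at 9:30, lunch at 12:05, deploy at 23:59, never at 25:00."
print(find_times(sample))  # [('9', '30'), ('12', '05'), ('23', '59')]
```

The two parenthesized groups capture the hour and the minute separately, which is why each match comes back as a pair.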
Learn once, write anywhere
Regular expressions can be used in virtually any programming language. A knowledge of regex is very useful for validating user input, interacting with the Unix shell, searching/refactoring code in your favorite text editor, performing database text searches, and lots more.
In this tutorial, I'll attempt to provide an approachable introduction to regex syntax and usage in a variety of scenarios, languages, and environments.
This web application is my favorite tool for building, testing, and debugging regular expressions. I highly recommend that you use it to test out the expressions that we'll cover in this tutorial.
The source code for the examples in this tutorial can be found in the GitHub repository here - https://github.com/triestpa/You-Should-Learn-Regex
0 - Match Any Number Line
We'll start with a very simple example - match any line that contains only digits.
^[0-9]+$
Let's walk through this piece-by-piece.
^ - Signifies the start of a line.
[0-9] - Matches any digit between 0 and 9.
+ - Matches one or more instances of the preceding expression.
$ - Signifies the end of the line.
We could rewrite this regex in pseudo-English as [start of line][one or more digits][end of line].
Pretty simple, right?
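As a quick check of that reading, the sketch below tests the expression against a few individual strings in Python. The helper name is_number_line and the candidate strings are assumptions chosen for illustration.

```python
import re

# [start of line][one or more digits][end of line]
NUMBER_LINE = re.compile(r'^[0-9]+$')

def is_number_line(line):
    """Return True when the whole line consists of one or more digits."""
    return NUMBER_LINE.search(line) is not None

# Hypothetical candidates: only the first is made up entirely of digits.
for candidate in ['1234', 'abcde', '12db2', '']:
    print(repr(candidate), is_number_line(candidate))
```

The empty string fails because + requires at least one digit, and '12db2' fails because the anchors force the digits to span the entire line.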
We could replace [0-9] with \d, which will do the same thing (match any digit).
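The following sketch compares the two spellings in Python. One caveat, specific to Python 3 string patterns rather than to regex in general: \d also matches non-ASCII decimal digits unless the re.ASCII flag is set. The names BRACKETS, SHORTHAND, SHORTHAND_ASCII, and agree are illustrative.

```python
import re

# Two spellings of the same "line of digits" pattern.
BRACKETS = re.compile(r'^[0-9]+$')
SHORTHAND = re.compile(r'^\d+$')
# With re.ASCII, \d is restricted to the characters 0-9.
SHORTHAND_ASCII = re.compile(r'^\d+$', re.ASCII)

def agree(line):
    """Return True when both spellings give the same verdict for line."""
    return bool(BRACKETS.search(line)) == bool(SHORTHAND.search(line))

samples = ['1234', 'abcde', '12db2', '5362', '1']
print(all(agree(s) for s in samples))  # True

# Arabic-Indic digits one, two, three: decimal digits, but not ASCII.
arabic = '\u0661\u0662\u0663'
print(bool(SHORTHAND.search(arabic)))        # True
print(bool(BRACKETS.search(arabic)))         # False
print(bool(SHORTHAND_ASCII.search(arabic)))  # False
```

For plain ASCII input the two forms behave identically, so the choice between them is mostly a matter of readability.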
The great thing about this expression (and regular expressions in general) is that it can be used, without much modification, in any programming language.
To demonstrate, we'll now quickly go through how to perform this simple regex search on a text file using 16 of the most popular programming languages.
We can use the following input file (test.txt) as an example.
1234
abcde
12db2
5362
1
Each script will read the test.txt file, search it using our regular expression, and print the result ('1234', '5362', '1') to the console.
Language Examples
0.0 - JavaScript / Node.js / TypeScript
const fs = require('fs')
const testFile = fs.readFileSync('test.txt', 'utf8')
const regex = /^([0-9]+)$/gm
let results = testFile.match(regex)
console.log(results)
0.1 - Python
import re
with open('test.txt', 'r') as f:
    test_string = f.read()
regex = re.compile(r'^([0-9]+)