7. Regular Expressions

7.1. script [regexp-01]
In the PHP course, we used the following code to illustrate PHP 7 regular expressions:
<?php
// strict type for function parameters
declare (strict_types=1);
// regular expressions in PHP
// extract the different fields from a string
// the pattern: a sequence of digits surrounded by any characters
// we only want to extract the sequence of digits
$pattern = "/(\d+)/";
// compare the string to the pattern
comparePatternToString($pattern, "xyz1234abcd");
comparePatternToString($pattern, "12 34");
comparePatternToString($pattern, "abcd");
// the pattern: a sequence of digits surrounded by any characters
// We want the sequence of digits as well as the fields that follow and precede it
$pattern = "/^(.*?)(\d+)(.*?)$/";
// we match the string against the pattern
comparePatternToString($pattern, "xyz1234abcd");
comparePatternToString($pattern, "12 34");
comparePatternToString($pattern, "abcd");
// the pattern - a date in dd/mm/yy format
$pattern = "/^\s*(\d\d)\/(\d\d)\/(\d\d)\s*$/";
comparePatternToString($pattern, "10/05/97");
comparePatternToString($pattern, " 04/04/01 ");
comparePatternToString($pattern, "5/1/01");
// the pattern - a decimal number
$pattern = "/^\s*([+|-]?)\s*(\d+\.\d*|\.\d+|\d+)\s*/";
comparePatternToString($pattern, "187.8");
comparePatternToString($pattern, "-0.6");
comparePatternToString($pattern, "4");
comparePatternToString($pattern, ".6");
comparePatternToString($pattern, "4.");
comparePatternToString($pattern, " + 4");
// end
exit;
// --------------------------------------------------------------------------
function comparePatternToString(string $pattern, string $string): void {
// compares the string $string to the pattern $pattern
// compare the string to the pattern
$fields = [];
$match = preg_match($pattern, $string, $fields);
// display results
print "\nResults($pattern,$string)\n";
if ($match) {
for ($i = 0; $i < count($fields); $i++) {
print "fields[$i]=$fields[$i]\n";
}
} else {
print "The string [$string] does not match the pattern [$pattern]\n";
}
}
We convert this code to JavaScript as follows:
'use strict';
/// Regular expressions in JavaScript
// extract the different fields from a string
// the pattern: a sequence of digits surrounded by any characters
// we only want to extract the sequence of digits
let pattern = /(\d+)/;
console.log("type of a regular expression:", typeof pattern);
// compare the string to the pattern
comparePatternToString(pattern, "xyz1234abcd");
comparePatternToString(pattern, "12 34");
comparePatternToString(pattern, "abcd");
// the pattern: a sequence of digits surrounded by any characters
// We want the sequence of digits as well as the fields that come before and after it
pattern = /^(.*?)(\d+)(.*?)$/;
// we match the string against the pattern
comparePatternToString(pattern, "xyz1234abcd");
comparePatternToString(pattern, "12 34");
comparePatternToString(pattern, "abcd");
// the pattern - a date in dd/mm/yy format
pattern = /^\s*(\d\d)\/(\d\d)\/(\d\d)\s*$/;
comparePatternToString(pattern, "10/05/97");
comparePatternToString(pattern, " 04/04/01 ");
comparePatternToString(pattern, "5/1/01");
// the pattern - a decimal number
pattern = /^\s*([+|-]?)\s*(\d+\.\d*|\.\d+|\d+)\s*$/;
comparePatternToString(pattern, "187.8");
comparePatternToString(pattern, "-0.6");
comparePatternToString(pattern, "4");
comparePatternToString(pattern, ".6");
comparePatternToString(pattern, "4.");
comparePatternToString(pattern, " + 4");
// --------------------------------------------------------------------------
function comparePatternToString(pattern, string) {
// compares the string [string] to the pattern [pattern]
console.log(`----------- string=${string}, pattern=${pattern}`)
// compare the string to the pattern
const result1 = pattern.exec(string);
console.log(`comparison with exec=`, result1);
// another way to do it
const result2 = string.match(pattern);
console.log(`comparison with match=`, result2);
}
Comments
- the PHP and JavaScript code are very similar;
- note that in JavaScript, a regular expression literal is not a string but an object: do not put quotes or apostrophes around it;
- in [comparePatternToString], [pattern.exec(string)] and [string.match(pattern)] are two ways to achieve the same result;
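The two remarks above can be checked with a short sketch. The variable names are chosen for illustration only, and the [g] flag is not used anywhere in [regexp-01]: it is added here as an assumption, to show the one case where [exec] and [match] stop returning the same thing.

```javascript
// a regex literal and a RegExp built from a string describe the same pattern
const literal = /(\d+)/;
const built = new RegExp("(\\d+)"); // backslashes must be doubled inside a string

console.log(typeof literal);                  // object
console.log(literal instanceof RegExp);       // true
console.log(literal.source === built.source); // true

// without the g flag, exec and match return equivalent arrays
const viaExec = literal.exec("12 34");
const viaMatch = "12 34".match(literal);
console.log(viaExec[0], viaMatch[0]);          // 12 12
console.log(viaExec.index === viaMatch.index); // true

// with the g flag (not used in the course script), match returns
// every match and no longer reports the groups
const allDigits = "12 34".match(/(\d+)/g);
console.log(allDigits); // [ '12', '34' ]
```

The quoted form is only needed when the pattern has to be assembled at run time; for a fixed pattern the literal form is simpler to read.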
Execution
[Running] C:\myprograms\laragon-lite\bin\nodejs\node-v10\node.exe -r esm "c:\Data\st-2019\dev\es6\javascript\regexp\regexp-01.js"
type of a regular expression: object
----------- string=xyz1234abcd, pattern=/(\d+)/
comparison with exec= [ '1234',
'1234',
index: 3,
input: 'xyz1234abcd',
groups: undefined ]
comparison with match= [ '1234',
'1234',
index: 3,
input: 'xyz1234abcd',
groups: undefined ]
----------- string=12 34, pattern=/(\d+)/
comparison with exec= [ '12', '12', index: 0, input: '12 34', groups: undefined ]
comparison with match= [ '12', '12', index: 0, input: '12 34', groups: undefined ]
----------- string=abcd, pattern=/(\d+)/
comparison with exec= null
comparison with match= null
----------- string=xyz1234abcd, pattern=/^(.*?)(\d+)(.*?)$/
comparison with exec= [ 'xyz1234abcd',
'xyz',
'1234',
'abcd',
index: 0,
input: 'xyz1234abcd',
groups: undefined ]
comparison with match= [ 'xyz1234abcd',
'xyz',
'1234',
'abcd',
index: 0,
input: 'xyz1234abcd',
groups: undefined ]
----------- string=12 34, pattern=/^(.*?)(\d+)(.*?)$/
comparison with exec= [ '12 34',
'',
'12',
' 34',
index: 0,
input: '12 34',
groups: undefined ]
comparison with match= [ '12 34',
'',
'12',
' 34',
index: 0,
input: '12 34',
groups: undefined ]
----------- string=abcd, pattern=/^(.*?)(\d+)(.*?)$/
comparison with exec= null
comparison with match= null
----------- string=10/05/97, pattern=/^\s*(\d\d)\/(\d\d)\/(\d\d)\s*$/
comparison with exec= [ '10/05/97',
'10',
'05',
'97',
index: 0,
input: '10/05/97',
groups: undefined ]
comparison with match= [ '10/05/97',
'10',
'05',
'97',
index: 0,
input: '10/05/97',
groups: undefined ]
----------- string= 04/04/01 , pattern=/^\s*(\d\d)\/(\d\d)\/(\d\d)\s*$/
comparison with exec= [ ' 04/04/01 ',
'04',
'04',
'01',
index: 0,
input: ' 04/04/01 ',
groups: undefined ]
comparison with match= [ ' 04/04/01 ',
'04',
'04',
'01',
index: 0,
input: ' 04/04/01 ',
groups: undefined ]
----------- string=5/1/01, pattern=/^\s*(\d\d)\/(\d\d)\/(\d\d)\s*$/
comparison with exec= null
comparison with match= null
----------- string=187.8, pattern=/^\s*([+|-]?)\s*(\d+\.\d*|\.\d+|\d+)\s*$/
comparison with exec= [ '187.8',
'',
'187.8',
index: 0,
input: '187.8',
groups: undefined ]
comparison with match= [ '187.8',
'',
'187.8',
index: 0,
input: '187.8',
groups: undefined ]
----------- string=-0.6, pattern=/^\s*([+|-]?)\s*(\d+\.\d*|\.\d+|\d+)\s*$/
comparison with exec= [ '-0.6', '-', '0.6', index: 0, input: '-0.6', groups: undefined ]
comparison with match= [ '-0.6', '-', '0.6', index: 0, input: '-0.6', groups: undefined ]
----------- string=4, pattern=/^\s*([+|-]?)\s*(\d+\.\d*|\.\d+|\d+)\s*$/
comparison with exec= [ '4', '', '4', index: 0, input: '4', groups: undefined ]
comparison with match= [ '4', '', '4', index: 0, input: '4', groups: undefined ]
----------- string=.6, pattern=/^\s*([+|-]?)\s*(\d+\.\d*|\.\d+|\d+)\s*$/
comparison with exec= [ '.6', '', '.6', index: 0, input: '.6', groups: undefined ]
comparison with match= [ '.6', '', '.6', index: 0, input: '.6', groups: undefined ]
----------- string=4., pattern=/^\s*([+|-]?)\s*(\d+\.\d*|\.\d+|\d+)\s*$/
comparison with exec= [ '4.', '', '4.', index: 0, input: '4.', groups: undefined ]
comparison with match= [ '4.', '', '4.', index: 0, input: '4.', groups: undefined ]
----------- string= + 4, pattern=/^\s*([+|-]?)\s*(\d+\.\d*|\.\d+|\d+)\s*$/
comparison with exec= [ ' + 4', '+', '4', index: 0, input: ' + 4', groups: undefined ]
comparison with match= [ ' + 4', '+', '4', index: 0, input: ' + 4', groups: undefined ]
For a pattern without the [g] flag, the [regexp.exec] and [string.match] methods return the same results:
- [null] if the string does not match the pattern;
- an array t if there is a match, with:
- t[0]: the substring matching the whole pattern;
- t[1]: the substring matching the first parenthesized group of the pattern;
- t[2]: the substring matching the second parenthesized group of the pattern;
- …
- t.index: the position in the string at which the match starts;
- t.input: the entire string in which the pattern was searched for;
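As an illustration of how the array described above might be read, here is a minimal sketch built on the date pattern of [regexp-01]. The [parseDate] helper and the shape of the object it returns are hypothetical and do not appear in the course scripts.

```javascript
// hypothetical helper: returns the three fields of a dd/mm/yy date, or null
function parseDate(text) {
  const t = /^\s*(\d\d)\/(\d\d)\/(\d\d)\s*$/.exec(text);
  if (t === null) {
    return null; // the string does not match the pattern
  }
  // t[0] is the whole match; t[1], t[2], t[3] are the parenthesized groups
  const [, day, month, year] = t;
  return { day, month, year, index: t.index, input: t.input };
}

console.log(parseDate(" 04/04/01 "));
console.log(parseDate("5/1/01")); // null
```

Testing the result against [null] before reading t[1] matters: indexing a null result would throw a TypeError.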
7.2. script [regexp-02]
Sometimes you don’t want to extract elements from the tested string, but only want to know if it matches the pattern:
'use strict';
/// regular expressions in JavaScript
// check whether a string matches a pattern
// the pattern: a sequence of digits surrounded by any characters
// we only want to know whether the string contains such a sequence
let pattern = /\d+/;
console.log("Type of a regular expression: ", typeof (pattern));
// compare the string to the pattern
comparePatternToString(pattern, "xyz1234abcd");
comparePatternToString(pattern, "12 34");
comparePatternToString(pattern, "abcd");
// the pattern: a sequence of digits surrounded by arbitrary characters
// we only want to know whether the whole string has this shape
pattern = /^.*?\d+.*?$/;
// we compare the string to the pattern
comparePatternToString(pattern, "xyz1234abcd");
comparePatternToString(pattern, "12 34");
comparePatternToString(pattern, "abcd");
// the pattern - a date in dd/mm/yy format
pattern = /^\s*\d\d\/\d\d\/\d\d\s*$/;
comparePatternToString(pattern, "10/05/97");
comparePatternToString(pattern, " 04/04/01 ");
comparePatternToString(pattern, "5/1/01");
// the pattern - a decimal number
pattern = /^\s*[+|-]?\s*\d+\.\d*|\.\d+|\d+\s*$/;
comparePatternToString(pattern, "187.8");
comparePatternToString(pattern, "-0.6");
comparePatternToString(pattern, "4");
comparePatternToString(pattern, ".6");
comparePatternToString(pattern, "4.");
comparePatternToString(pattern, " + 4");
// --------------------------------------------------------------------------
function comparePatternToString(pattern, string) {
// test
const matches = pattern.test(string);
// compare the string [string] to the pattern [pattern]
console.log(`----------- string=${string}, pattern=${pattern}, matches=${matches}`);
}
Comments
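- [regexp-02] reuses the code from [regexp-01] with the following differences:
- we do not want to extract elements from the tested string, so the capturing parentheses have been removed from the regular expressions. Note that in the decimal-number pattern the parentheses also grouped the alternatives: without them, the | operators split the whole expression, so ^ and $ no longer apply to every alternative. The six test strings still match, but a non-capturing group (?:…) would be needed to keep the original meaning;
- in [comparePatternToString], we use the [RegExp.test] method to determine whether a string matches a regular expression;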
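The effect of dropping the parentheses around the alternatives can be seen with [RegExp.test] on a string that is clearly not a decimal number. This is a sketch only: the [grouped] variant and the test string "abc.5xyz" are assumptions added for illustration and are not part of [regexp-02].

```javascript
// pattern as written in [regexp-02]: the | operators split the whole expression
const ungrouped = /^\s*[+|-]?\s*\d+\.\d*|\.\d+|\d+\s*$/;
// variant with a non-capturing group (an assumption, not the course's code)
const grouped = /^\s*[+|-]?\s*(?:\d+\.\d*|\.\d+|\d+)\s*$/;

for (const text of ["187.8", " + 4", "abc.5xyz"]) {
  console.log(`${text}: ungrouped=${ungrouped.test(text)}, grouped=${grouped.test(text)}`);
}
// 187.8: ungrouped=true, grouped=true
//  + 4: ungrouped=true, grouped=true
// abc.5xyz: ungrouped=true, grouped=false
```

In the ungrouped pattern, the middle alternative \.\d+ carries no anchor at all, so any string containing ".5" passes the test.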
The results of the execution are as follows:
[Running] C:\myprograms\laragon-lite\bin\nodejs\node-v10\node.exe -r esm "c:\Data\st-2019\dev\es6\cours\regexp\regexp-02.js"
Type of a regular expression:  object
----------- string=xyz1234abcd, pattern=/\d+/, matches=true
----------- string=12 34, pattern=/\d+/, matches=true
----------- string=abcd, pattern=/\d+/, matches=false
----------- string=xyz1234abcd, pattern=/^.*?\d+.*?$/, matches=true
----------- string=12 34, pattern=/^.*?\d+.*?$/, matches=true
----------- string=abcd, pattern=/^.*?\d+.*?$/, matches=false
----------- string=10/05/97, pattern=/^\s*\d\d\/\d\d\/\d\d\s*$/, matches=true
----------- string= 04/04/01 , pattern=/^\s*\d\d\/\d\d\/\d\d\s*$/, matches=true
----------- string=5/1/01, pattern=/^\s*\d\d\/\d\d\/\d\d\s*$/, matches=false
----------- string=187.8, pattern=/^\s*[+|-]?\s*\d+\.\d*|\.\d+|\d+\s*$/, matches=true
----------- string=-0.6, pattern=/^\s*[+|-]?\s*\d+\.\d*|\.\d+|\d+\s*$/, matches=true
----------- string=4, pattern=/^\s*[+|-]?\s*\d+\.\d*|\.\d+|\d+\s*$/, matches=true
----------- string=.6, pattern=/^\s*[+|-]?\s*\d+\.\d*|\.\d+|\d+\s*$/, matches=true
----------- string=4., pattern=/^\s*[+|-]?\s*\d+\.\d*|\.\d+|\d+\s*$/, matches=true
----------- string= + 4, pattern=/^\s*[+|-]?\s*\d+\.\d*|\.\d+|\d+\s*$/, matches=true
[Done] exited with code=0 in 0.269 seconds