HyperMath

Pattern Finding


StrFind(s, pattern [, init ])

or

string.find(s, pattern [, init ])

StrFind finds the first occurrence of a pattern in the given string.  If a match is found, a pair of values is returned giving the one-based positions of the start and end of the match.  If the pattern cannot be found, nil is returned.  The search is case sensitive.  When the pattern contains captures, the captured substrings are returned as well.

Option            Description
s                 The subject string to search in.
pattern           The pattern to look for in the subject string.
init (optional)   The position at which the search starts.  Defaults to the beginning of the string.

For example:

> print(StrFind("Hello HyperMath user", "HyperMath"))

7       15

> print(StrFind("Hello HyperMath user", "banana")) // won’t find it

nil

You can optionally specify where to start the search with a third argument.

> print(StrFind("Hello HyperMath user", "HyperMath", 1)) // start at first character

7       15

> print(StrFind("Hello HyperMath user", "HyperMath", 8))  // "HyperMath" starts at character 7, so a search starting at character 8 won’t find it

nil

> print(StrFind("talk and talk", "talk",4)) // the second instance

10       13

> print(StrFind("Hello HyperMath user", "e", -5))   // first "e" when the search starts 5 characters from the end

19      19
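
The end position of one match can be used to choose where the next search starts.  As a sketch, assuming the behavior described above, searching from character 1 finds the first "talk", and restarting at character 5 (one past the end of that match) finds the second:

> print(StrFind("talk and talk", "talk", 1)) // first instance

1       4

> print(StrFind("talk and talk", "talk", 5)) // restart just past the first match

10       13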

Character Classes

Searching for literal text alone is somewhat restrictive.  For example, to look for a five-letter word that begins with "f", use the character class %a:

> print(StrFind("Find the first", "f%a%a%a%a"))

10                14

In this example, only "first" was found, since it is a five-letter word that starts with a lowercase "f".  The search pattern used was f%a%a%a%a.  Here %a acts as a wildcard that matches any single letter.  Patterns contain sequences of normal characters and pattern formatters, which have special meanings.

A character class represents a set of characters.  There are several other character classes representing subsets of characters, such as numbers, letters, uppercase characters, lowercase characters and so on.  These sets have the format %X where X is a letter representing the class.  The following table lists all available character classes.

Class     Definition
.         all characters
%a        letters
%c        control characters
%d        digits
%l        lowercase letters
%p        punctuation characters
%s        space characters
%u        uppercase letters
%w        alphanumeric characters
%x        hexadecimal digits
%z        the character with representation 0

Here are some examples:

> print(StrFind("this is chapter 7", "%d") )  // %d finds the first digit

17        17

> print(StrFind("UPPER lower", "%l"))         // %l finds first lowercase character

7                7

> print(StrFind("UPPER lower", "%u"))         // %u finds first uppercase character

1                1

Just as you can look for strings in strings, you can look for sequences of characters from the character classes, and you can mix these with regular strings.  For example:

> print(StrFind("UPPERlower", "%u%l"))   // upper followed by lower

5                6

> print(StrFind("dimension of 5x3", "%d%a%d")) // digit letter digit

14        16
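
Character classes can also follow literal text.  A small sketch, reusing the string from the earlier example, that looks for the word "chapter" followed by a space and a digit:

> print(StrFind("this is chapter 7", "chapter %d")) // literal text, then a digit

9        17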

Sets

As well as the predefined character classes, of the form %X, you can also explicitly define your own sets.  These are represented as [set], where set is a list of characters for which to look.

> print(StrFind("HyperMath", "[abcde]")) // look for one of "abcde"

4                4

Use the hyphen, -, to denote a range of characters.

> print(StrFind("HyperMath", "[a-e]")) // equivalent to "abcde"

4                4

You can use character classes in sets:

> print(StrFind("one hr is 60 min", "[%dabc]"))

11        11

Sets can also be used in the form [^set], which finds any of the characters not in the set listed.  For example:

> print(StrFind("HyperMath", "[^Hm]") )        // we don't want H or m

2                2

> print(StrFind("one hr is 60 min", "[^%l%s]")) // not a lowercase or space

11        11
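
Sets and ranges can be placed one after another, just like character classes.  As an illustration (the string here is made up for the example), this looks for a digit followed by an uppercase letter from A to F:

> print(StrFind("room 2B", "[0-9][A-F]")) // a digit, then one of "ABCDEF"

6        7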

Repetition

Sometimes you want to look for patterns without knowing how many characters they contain.  Variable-length patterns are supported through the repetition characters *, + and -.

* looks for 0 or more repetitions of the previous pattern element.  For example:

> print(StrFind("room", "rm*")) // looks for ‘r’ followed by zero or more ‘m’, finds ‘r’ only

1                1

> print(StrFind("room", "ro*")) // will find "roo"

1                3

> print(StrFind("room", "ro*m")) // will find the entire string

1                4

> print(StrFind("room222", "room%d*")) // "room" optionally followed by digits, hence finds the entire string

1                7

+ looks for 1 or more repetitions of the previous pattern element.  For example:

> print(StrFind("room", 'rm+')) // won't be found as at least one ‘m’ needs to follow ‘r’

nil

> print(StrFind("room", "room%d+")) // "room" needs to be followed by at least one digit, hence no match

nil

> print(StrFind("room222", "room%d+")) // finds the entire string

1                7

> print(StrFind("room222A", "room%d+")) // same result as above

1                7

- looks for 0 or more repetitions of the previous pattern element.  It differs from * in that - looks for the shortest sequence of elements, whereas * looks for the longest sequence.  For example, this finds the first sequence enclosed in a pair of braces {}:

> print(StrFind("{this} and {that}", "{.-}"))

1                6

Using * instead of - finds the entire string, since the string begins with an opening brace and ends with a closing brace.
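
For comparison, a sketch of the same search with * in place of -, assuming the longest-match behavior described above:

> print(StrFind("{this} and {that}", "{.*}")) // extends to the last closing brace

1                17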

Captures

The capture mechanism allows for extracting the parts of the subject string that match parts of the pattern.  Specify a capture by writing the parts of the pattern that you want to capture between parentheses.  In addition to the position of the match, StrFind then also returns the extracted strings.  Suppose you want to extract the hour, minute, and seconds information from a string that records the time in the format hr:min:sec.

 > print(StrFind("02:33:29","(%d+):(%d+):(%d+)"))

 1        8        02        33        29

Since parentheses have special meanings, how do you extract something with parentheses in it?  The answer is to precede each parenthesis that is part of the text to match with the escape character %.

 > print(StrFind("Extract (this)", "(%(this%))"))

 9        14        (this)
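
Captures can be combined with character classes.  As a sketch, reusing the earlier dimension string and assuming captures are returned after the match positions as shown above, this extracts the two digits on either side of the letter:

> print(StrFind("dimension of 5x3", "(%d)%a(%d)")) // capture both digits

14        16        5        3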

FindStr(str, subject)

Similar to StrFind, but finds the start locations of all occurrences of a smaller string within a larger string.  The indices (one-based, as the example output shows) are returned in a matrix.  The match is for the exact string; an empty matrix is returned if there are no occurrences.

str       The string to find in the subject string.
subject   The subject string.

 

> s = FindStr("abc", 'ABCabcXYZabc'); print(s)

[Matrix] 2 x 1

             4

             10