pyregexp package#

Submodules#

pyregexp.engine module#

engine.py uml diagram — *UML of all pyregexp.engine classes.*#

Module containing the RegexEngine class.

The RegexEngine class implements a regular expression engine.

Example

Matching a regex with some test string:

from pyregexp.engine import RegexEngine

reng = RegexEngine()
result, consumed = reng.match(r"a+bx", "aabx")

class pyregexp.engine.RegexEngine[source]#

Bases: object

Regular Expressions Engine.

This class contains everything necessary to recognize regular expressions in a test string.

match(re: str, string: str, return_matches: bool = False, continue_after_match: bool = False, ignore_case: int = 0) → Union[Tuple[bool, int, List[Deque[pyregexp.match.Match]]], Tuple[bool, int]][source]#

Searches a regex in a test string.

Searches the passed regular expression in the passed test string and returns the result.

It is possible to customize both the returned value and the search method.

The ignore_case flag may cause unexpected results in the returned number of matched characters, and also in the returned matches, e.g. when the character ẞ is present in either the regex or the test string.
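The length caveat can be illustrated with Python's built-in string methods alone. The sketch below does not call pyregexp; it only shows that full case folding can change the number of characters in a string, which is one plausible reason the reported character count may look surprising. Which built-in each ignore_case mode relies on is not stated here, so treating lower() and casefold() as representative is an assumption.

```python
# Standard-library illustration only; no pyregexp call is made here.
text = "STRAẞE"  # contains U+1E9E, LATIN CAPITAL LETTER SHARP S

lowered = text.lower()     # one-to-one mapping: "ẞ" becomes "ß"
folded = text.casefold()   # full case folding: "ẞ" becomes "ss"

# The folded string is one character longer than the original.
print(len(text), len(lowered), len(folded))
```

Because the folded text is longer than the input, an index computed on the folded text does not necessarily point at the same character in the original string.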

Parameters

re (str) – the regular expression to search
string (str) – the test string
return_matches (bool) – if True, a data structure containing the matches (the whole match and the matched subgroups) is also returned (default is False)
continue_after_match (bool) – if True the engine continues matching until the whole input is consumed (default is False)
ignore_case (int) – when 0 the case is not ignored, when 1 a “soft” case ignoring is performed, when 2 casefolding is performed. (default is 0)

Returns

A tuple containing whether a match was found, the last matched character index, and, if return_matches is True, a list of deques of Match. Each deque holds the whole match in the first position and all the matched groups and subgroups in the subsequent positions.
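Since the tuple has two or three items depending on return_matches, calling code may want to normalize it. The following is a minimal sketch under stated assumptions: unpack_result and FakeMatch are hypothetical names, the sample tuples are hand-written rather than produced by the engine, and a plain tuple stands in for pyregexp.match.Match.

```python
from collections import deque
from typing import Deque, List, Tuple, Union

# Hypothetical stand-in for pyregexp.match.Match: a plain tuple of
# (group_id, start_idx, end_idx) is enough for this sketch.
FakeMatch = Tuple[int, int, int]

Result = Union[
    Tuple[bool, int],
    Tuple[bool, int, List[Deque[FakeMatch]]],
]


def unpack_result(result: Result):
    """Normalize both return shapes to (found, last_idx, matches)."""
    found, last_idx = result[0], result[1]
    # The third item exists only in the three-item form.
    matches = result[2] if len(result) == 3 else []
    return found, last_idx, matches


# Hand-written values shaped like the two documented return forms.
short_form = (True, 4)
long_form = (True, 4, [deque([(0, 0, 4)])])

print(unpack_result(short_form))
print(unpack_result(long_form))
```

Normalizing once at the call site keeps the rest of the code independent of the return_matches flag.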

pyregexp.lexer module#

lexer.py uml diagram — *UML of all pyregexp.lexer classes.*#

class pyregexp.lexer.Lexer[source]#

Bases: object

Lexer for the pyregexp library.

This class provides the method that scans a regular expression string and produces the corresponding tokens.

scan(re: str) → List[pyregexp.tokens.Token][source]#

Regular expressions scanner.

Scans the input regular expression and returns the list of recognized Tokens. Raises an Exception if the regular expression contains errors.

Parameters: re (str) – the regular expression to scan
Returns: the list of tokens recognized in the passed regex
Return type: List[Token]
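To make the scanning step concrete, here is a toy scanner that is not pyregexp's Lexer. The function toy_scan, the token kind names, and the (kind, text) pair representation are all assumptions made for illustration; the real token classes live in pyregexp.tokens.

```python
from typing import List, Tuple

# Hypothetical token kinds; pyregexp.tokens defines its own Token classes.
SPECIAL = {
    "*": "STAR", "+": "PLUS", "?": "QUESTION", "|": "OR",
    "(": "LPAR", ")": "RPAR", ".": "WILDCARD", "^": "START", "$": "END",
}


def toy_scan(re: str) -> List[Tuple[str, str]]:
    """Turn a regex string into (kind, text) pairs, one per token."""
    tokens = []
    i = 0
    while i < len(re):
        ch = re[i]
        if ch == "\\":
            # An escape needs a following character to be valid.
            if i + 1 >= len(re):
                raise Exception("dangling escape at end of regex")
            tokens.append(("CHAR", re[i + 1]))
            i += 2
            continue
        # Anything that is not a special symbol is a literal character.
        tokens.append((SPECIAL.get(ch, "CHAR"), ch))
        i += 1
    return tokens


print(toy_scan(r"a+b\."))
```

The escaped dot comes out as a literal CHAR token rather than a WILDCARD, and a trailing backslash raises an Exception, mirroring the error behavior described above.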

pyregexp.match module#

match.py uml diagram — *UML of all pyregexp.match classes.*#

class pyregexp.match.Match(group_id: int, start_idx: int, end_idx: int, string: str, name: str)[source]#

Bases: object

Contains the information of a match in a regular expression.

pyregexp.pyrser module#

pyrser.py uml diagram — *UML of all pyregexp.pyrser classes.*#

class pyregexp.pyrser.Pyrser[source]#

Bases: object

Regular Expression Parser.

Pyrser instances can parse regular expressions and return the corresponding AST.

parse(re: str) → pyregexp.re_ast.RE[source]#

Parses a regular expression.

Parses a regex and returns the corresponding AST. Raises an Exception if the regex contains errors.

Parameters: re (str) – a regular expression
Returns: the root node of the regular expression’s AST
Return type: RE
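As a rough picture of what parsing a regex into a tree involves, the sketch below implements a tiny recursive-descent parser for literals, alternation, and parentheses only. It is not Pyrser: toy_parse and the nested-tuple node shapes are hypothetical, and the real AST uses the classes in pyregexp.re_ast.

```python
def toy_parse(re: str):
    """Parse literals, '|' and parentheses into nested tuples."""
    pos = 0

    def parse_expr():
        nonlocal pos
        left = parse_seq()
        if pos < len(re) and re[pos] == "|":
            pos += 1
            # Alternation splits the regex into two branches.
            return ("or", left, parse_expr())
        return left

    def parse_seq():
        nonlocal pos
        items = []
        while pos < len(re) and re[pos] not in "|)":
            if re[pos] == "(":
                pos += 1
                inner = parse_expr()
                if pos >= len(re) or re[pos] != ")":
                    raise Exception("missing closing parenthesis")
                pos += 1
                items.append(("group", inner))
            else:
                items.append(("char", re[pos]))
                pos += 1
        return ("seq", items)

    tree = parse_expr()
    if pos != len(re):
        # Leftover input, for example an unmatched ')'.
        raise Exception(f"unexpected character at index {pos}")
    # A root node wraps the tree, loosely analogous to an AST entry point.
    return ("re", tree)


print(toy_parse("a|(b)"))
```

Malformed inputs such as "(a" or "a)" raise an Exception instead of returning a partial tree.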

pyregexp.re_ast module#

re_ast.py uml diagram — *UML of all pyregexp.re_ast classes.*#

class pyregexp.re_ast.ASTNode[source]#

Bases: object

AST nodes base class.

Abstract Syntax Tree classes hierarchy base class.

class pyregexp.re_ast.Element(match_ch: Optional[str] = None)[source]#

Bases: pyregexp.re_ast.LeafNode

AST Element.

Specialization of the LeafNode class. This class models the elements of a regex.

is_match(ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) → bool[source]#

Returns whether the passed inputs match the node.

For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.

Parameters

ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length

Returns

Whether the node matches the passed parameters.

Return type

bool
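The “a” versus “b” example above can be sketched with a small stand-in class. ToyElement is a hypothetical name, and the assumption that a literal element simply compares ch with its stored character is not confirmed by this page.

```python
from typing import Optional


class ToyElement:
    """Hypothetical stand-in for a leaf that matches one literal character."""

    def __init__(self, match_ch: Optional[str] = None):
        self.match_ch = match_ch

    def is_match(self, ch: Optional[str] = None,
                 str_i: int = 0, str_len: int = 0) -> bool:
        # Only ch matters for a literal; the position arguments are
        # accepted so every leaf can be called the same way.
        return ch is not None and ch == self.match_ch


node = ToyElement("a")
print(node.is_match("a"), node.is_match("b"))
```

Keeping str_i and str_len in the signature lets position-based leaves share one calling convention with character-based ones.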

class pyregexp.re_ast.EndElement[source]#

Bases: pyregexp.re_ast.LeafNode

AST EndElement.

Inherits from LeafNode and models the match-end-element behavior.

is_match(ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) → bool[source]#

Returns whether the passed inputs match the node.

For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.

Parameters

ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length

Returns

Whether the node matches the passed parameters.

Return type

bool

class pyregexp.re_ast.GroupNode(children: Deque[pyregexp.re_ast.ASTNode], capturing: bool = False, group_name: Optional[str] = None, group_id: int = -1)[source]#

Bases: pyregexp.re_ast.ASTNode

AST GroupNode.

Inherits from ASTNode and models the group in a regex.

is_capturing() → bool[source]#

Returns whether the GroupNode is capturing.

Returns: True if the group is capturing, False otherwise
Return type: bool

class pyregexp.re_ast.LeafNode[source]#

Bases: pyregexp.re_ast.ASTNode

AST class defining the leaf nodes.

Every leaf node inherits from this class.

is_match(ch: Optional[str] = None, str_i: Optional[int] = None, str_len: Optional[int] = None) → bool[source]#

Returns whether the passed inputs match the node.

For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.

Parameters

ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length

Returns

Whether the node matches the passed parameters.

Return type

bool

class pyregexp.re_ast.NotNode(child: pyregexp.re_ast.ASTNode)[source]#

Bases: pyregexp.re_ast.ASTNode

AST NotNode.

Inherits from ASTNode and models the not-node behavior.

class pyregexp.re_ast.OrNode(left: pyregexp.re_ast.ASTNode, right: pyregexp.re_ast.ASTNode)[source]#

Bases: pyregexp.re_ast.ASTNode

AST OrNode.

Inherits from ASTNode and models the or-nodes, that is, the nodes that divide the regex into two possible matching paths.
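The idea of two possible matching paths can be sketched with a tiny matcher over nested tuples. This is not how pyregexp walks its AST: toy_match and the tuple node shapes are hypothetical, and the sketch tries the right path only when the left one fails, without any further backtracking.

```python
from typing import Optional, Tuple


def toy_match(node: Tuple, text: str, i: int = 0) -> Optional[int]:
    """Return the index after the match, or None if node fails at i."""
    kind = node[0]
    if kind == "char":
        return i + 1 if i < len(text) and text[i] == node[1] else None
    if kind == "seq":
        for child in node[1]:
            i = toy_match(child, text, i)
            if i is None:
                return None
        return i
    if kind == "or":
        left_end = toy_match(node[1], text, i)
        # The right path is tried only when the left path fails.
        if left_end is not None:
            return left_end
        return toy_match(node[2], text, i)
    raise ValueError(f"unknown node kind: {kind}")


# Tree for the regex "ab|c": left path "ab", right path "c".
tree = ("or", ("seq", [("char", "a"), ("char", "b")]), ("char", "c"))
print(toy_match(tree, "ab"), toy_match(tree, "c"), toy_match(tree, "x"))
```

Both paths start from the same index i, which is what makes them alternatives rather than consecutive steps.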

class pyregexp.re_ast.RE(child: pyregexp.re_ast.ASTNode, capturing: bool = False, group_name: str = 'RegEx')[source]#

Bases: pyregexp.re_ast.ASTNode

Entry point of the AST.

This class acts as the entry point for a regular expression’s AST.

is_capturing() → bool[source]#

class pyregexp.re_ast.RangeElement(match_str: str, is_positive_logic: bool = True)[source]#

Bases: pyregexp.re_ast.LeafNode

AST RangeElement.

Specialization of the LeafNode class modeling the range-element behavior, that is, an element that can match more than one character.

is_match(ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) → bool[source]#

Returns whether the passed inputs match the node.

For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.

Parameters

ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length

Returns

Whether the node matches the passed parameters.

Return type

bool
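One way to picture a range element with positive and negative logic is the stand-in below. ToyRangeElement is a hypothetical class, and it assumes that match_str already holds every accepted character and that is_positive_logic=False inverts the test; neither detail is confirmed by this page.

```python
from typing import Optional


class ToyRangeElement:
    """Hypothetical stand-in for a leaf that matches a set of characters."""

    def __init__(self, match_str: str, is_positive_logic: bool = True):
        self.match_str = match_str
        self.is_positive_logic = is_positive_logic

    def is_match(self, ch: Optional[str] = None,
                 str_i: int = 0, str_len: int = 0) -> bool:
        if not ch:
            # No character to test, so nothing can match.
            return False
        inside = ch in self.match_str
        # Negative logic accepts exactly the characters outside the set.
        return inside if self.is_positive_logic else not inside


vowels = ToyRangeElement("aeiou")
not_vowels = ToyRangeElement("aeiou", is_positive_logic=False)
print(vowels.is_match("e"), vowels.is_match("x"), not_vowels.is_match("x"))
```

A single boolean flag lets one class cover both a character set and its complement.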

class pyregexp.re_ast.SpaceElement[source]#

Bases: pyregexp.re_ast.Element

AST SpaceElement.

Specialization of the Element class to model the match-space behavior.

is_match(ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) → bool[source]#

Returns whether the passed inputs match the node.

For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.

Parameters

ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length

Returns

Whether the node matches the passed parameters.

Return type

bool

class pyregexp.re_ast.StartElement[source]#

Bases: pyregexp.re_ast.LeafNode

AST StartElement.

Inherits from LeafNode and models the match-start-element behavior.

is_match(ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) → bool[source]#

Returns whether the passed inputs match the node.

For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.

Parameters

ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length

Returns

Whether the node matches the passed parameters.

Return type

bool
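The str_i and str_len parameters suggest that start and end elements decide on position rather than on the character. The sketch below shows one plausible reading: ToyStartElement and ToyEndElement are hypothetical classes, and the conditions str_i == 0 and str_i == str_len are assumptions, not behavior taken from the pyregexp source.

```python
from typing import Optional


class ToyStartElement:
    """Hypothetical stand-in: matches only at the start of the string."""

    def is_match(self, ch: Optional[str] = None,
                 str_i: int = 0, str_len: int = 0) -> bool:
        # Assumed rule: index 0 is the start of the test string.
        return str_i == 0


class ToyEndElement:
    """Hypothetical stand-in: matches only at the end of the string."""

    def is_match(self, ch: Optional[str] = None,
                 str_i: int = 0, str_len: int = 0) -> bool:
        # Assumed rule: an index equal to the length is past the last char.
        return str_i == str_len


start, end = ToyStartElement(), ToyEndElement()
print(start.is_match(str_i=0, str_len=4), end.is_match(str_i=4, str_len=4))
```

Neither class looks at ch, which is consistent with ch being optional in the documented signature.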

class pyregexp.re_ast.WildcardElement[source]#

Bases: pyregexp.re_ast.Element

AST WildcardElement.

Specialization of the Element class to model the wildcard behavior.

is_match(ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) → bool[source]#

Returns whether the passed inputs match the node.

For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.

Parameters