pyregexp package
Submodules
pyregexp.engine module
Module containing the RegexEngine class.
The RegexEngine class implements a regular expressions engine.
Example
Matching a regex with some test string:
from pyregexp.engine import RegexEngine

reng = RegexEngine()
result, consumed = reng.match(r"a+bx", "aabx")
- class pyregexp.engine.RegexEngine
Bases:
object
Regular Expressions Engine.
This class contains everything necessary to recognize regular expressions in a test string.
- match(re: str, string: str, return_matches: bool = False, continue_after_match: bool = False, ignore_case: int = 0) → Union[Tuple[bool, int, List[Deque[pyregexp.match.Match]]], Tuple[bool, int]]
Searches for a regex in a test string.
Searches for the passed regular expression in the passed test string and returns the result.
Both the returned value and the search method can be customized.
The ignore_case flag may cause unexpected results in the returned number of matched characters and in the returned matches, e.g. when the character ẞ is present in either the regex or the test string.
- Parameters
re (str) – the regular expression to search
string (str) – the test string
return_matches (bool) – if True, the result also includes a data structure containing the matches: the whole match and the matched subgroups (default is False)
continue_after_match (bool) – if True, the engine continues matching until the whole input is consumed (default is False)
ignore_case (int) – 0 does not ignore case, 1 performs a “soft” case-insensitive match, 2 performs casefolding (default is 0)
- Returns
A tuple containing whether a match was found, the last matched character index, and, if return_matches is True, a list of deques of Match. In each deque, the first position holds the whole match and the subsequent positions hold the matched groups and subgroups.
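The caveat about ignore_case and the character ẞ follows from the fact that case conversion can change the length of a string. The sketch below uses only Python's built-in str methods to show the effect; it does not call pyregexp, and how the engine maps ignore_case values 1 and 2 onto these conversions is an assumption.

```python
# Case conversion can change the number of characters, which is why
# offsets reported after case-insensitive matching may look surprising.
capital_sharp_s = "\u1e9e"  # the character ẞ

lowered = capital_sharp_s.lower()    # one character: ß
folded = capital_sharp_s.casefold()  # two characters: ss

print(len(capital_sharp_s), len(lowered), len(folded))  # 1 1 2

# An index computed on the casefolded text does not line up with the original.
original = "Stra\u1e9ee"
print(len(original), len(original.casefold()))  # 6 7
```

An index or character count obtained on the casefolded text therefore may not correspond to the same position in the original test string.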
pyregexp.lexer module
- class pyregexp.lexer.Lexer
Bases:
object
Lexer for the pyregexp library.
This class provides the method that scans a regular expression string and produces the corresponding tokens.
- scan(re: str) → List[pyregexp.tokens.Token]
Regular expressions scanner.
Scans the input regular expression and produces the list of recognized tokens. Raises an Exception if the regular expression contains errors.
- Parameters
re (str) – the regular expression to scan
- Returns
the list of tokens recognized in the passed regex
- Return type
List[Token]
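To make the scanning step concrete, here is a minimal, self-contained sketch of a regex scanner. It is not the pyregexp implementation: the Token dataclass, the SPECIAL table, and this scan function are hypothetical stand-ins that cover only a small subset of the syntax.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical stand-in for a token: a kind plus the source character."""
    kind: str
    char: str


# Characters with a special meaning in this sketch (an assumed subset).
SPECIAL = {
    "*": "ZeroOrMore",
    "+": "OneOrMore",
    "?": "ZeroOrOne",
    "|": "Or",
    ".": "Wildcard",
    "(": "LeftParenthesis",
    ")": "RightParenthesis",
    "$": "End",
}


def scan(re: str) -> List[Token]:
    """Turn a regex string into a flat list of tokens."""
    tokens: List[Token] = []
    i = 0
    while i < len(re):
        ch = re[i]
        if ch == "\\":
            # An escape turns the next character into a plain element.
            if i + 1 >= len(re):
                raise ValueError("dangling escape at end of regex")
            tokens.append(Token("Element", re[i + 1]))
            i += 2
            continue
        if ch == "^" and i == 0:
            # In this sketch '^' is a start anchor only at index 0.
            tokens.append(Token("Start", ch))
        else:
            tokens.append(Token(SPECIAL.get(ch, "Element"), ch))
        i += 1
    return tokens


kinds = [t.kind for t in scan(r"a+b\.")]
print(kinds)  # ['Element', 'OneOrMore', 'Element', 'Element']
```

The escaped dot is emitted as a plain element rather than a wildcard, which is the kind of distinction a scanner has to settle before parsing.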
pyregexp.match module
pyregexp.pyrser module
- class pyregexp.pyrser.Pyrser
Bases:
object
Regular Expression Parser.
Pyrser instances can parse regular expressions and return the corresponding AST.
- parse(re: str) → pyregexp.re_ast.RE
Parses a regular expression.
Parses a regex and returns the corresponding AST. Raises an Exception if the regex contains errors.
- Parameters
re (str) – a regular expression
- Returns
the root node of the regular expression’s AST
- Return type
RE
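As an illustration of how a regex string can be turned into a tree, the sketch below implements a small recursive-descent parser for alternation, concatenation, the quantifiers *, + and ?, and parentheses. It is a hypothetical example, not the Pyrser implementation: the MiniParser class and the tuple-based node shapes are assumptions made for this sketch.

```python
from typing import Any, List, Tuple

Node = Tuple[Any, ...]


class MiniParser:
    """Hypothetical recursive-descent parser for a tiny regex subset."""

    def __init__(self, re: str) -> None:
        self.re = re
        self.pos = 0

    def peek(self) -> str:
        # Empty string signals the end of the input.
        return self.re[self.pos] if self.pos < len(self.re) else ""

    def parse(self) -> Node:
        node = self.alternation()
        if self.pos != len(self.re):
            raise ValueError(f"unexpected {self.peek()!r} at index {self.pos}")
        return ("re", node)

    def alternation(self) -> Node:
        # alternation := concatenation ('|' concatenation)*
        left = self.concatenation()
        while self.peek() == "|":
            self.pos += 1
            left = ("or", left, self.concatenation())
        return left

    def concatenation(self) -> Node:
        # concatenation := repetition*
        items: List[Node] = []
        while self.peek() not in ("", "|", ")"):
            items.append(self.repetition())
        return ("group", tuple(items))

    def repetition(self) -> Node:
        # repetition := atom ('*' | '+' | '?')?
        node = self.atom()
        if self.peek() in ("*", "+", "?"):
            node = ("repeat", self.peek(), node)
            self.pos += 1
        return node

    def atom(self) -> Node:
        # atom := '(' alternation ')' | any other single character
        ch = self.peek()
        if ch == "(":
            self.pos += 1
            inner = self.alternation()
            if self.peek() != ")":
                raise ValueError("missing closing parenthesis")
            self.pos += 1
            return inner
        if ch in ("*", "+", "?"):
            raise ValueError(f"nothing to repeat at index {self.pos}")
        self.pos += 1
        return ("element", ch)


tree = MiniParser("a|b+").parse()
print(tree)
```

The root tuple plays the role of the AST entry point, and malformed input such as an unbalanced parenthesis raises an exception instead of returning a partial tree.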
pyregexp.re_ast module
- class pyregexp.re_ast.ASTNode
Bases:
object
AST nodes base class.
Base class of the Abstract Syntax Tree class hierarchy.
- class pyregexp.re_ast.Element(match_ch: Optional[str] = None)
Bases:
pyregexp.re_ast.LeafNode
AST Element.
Specialization of the LeafNode class. This class models the elements of a regex.
- is_match(ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) → bool
Returns whether the passed inputs match the node.
For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.
- Parameters
ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length
- Returns
represents whether there is a match between the node and the passed parameters or not.
- Return type
bool
- class pyregexp.re_ast.EndElement
Bases:
pyregexp.re_ast.LeafNode
AST EndElement.
Inherits from LeafNode and models the match-end-element behavior.
- is_match(ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) → bool
Returns whether the passed inputs match the node.
For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.
- Parameters
ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length
- Returns
represents whether there is a match between the node and the passed parameters or not.
- Return type
bool
- class pyregexp.re_ast.GroupNode(children: Deque[pyregexp.re_ast.ASTNode], capturing: bool = False, group_name: Optional[str] = None, group_id: int = -1)
Bases:
pyregexp.re_ast.ASTNode
AST GroupNode.
Inherits from ASTNode and models the group in a regex.
- class pyregexp.re_ast.LeafNode
Bases:
pyregexp.re_ast.ASTNode
AST class defining the leaf nodes.
Every leaf node inherits from this class.
- is_match(ch: Optional[str] = None, str_i: Optional[int] = None, str_len: Optional[int] = None) → bool
Returns whether the passed inputs match the node.
For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.
- Parameters
ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length
- Returns
represents whether there is a match between the node and the passed parameters or not.
- Return type
bool
- class pyregexp.re_ast.NotNode(child: pyregexp.re_ast.ASTNode)
Bases:
pyregexp.re_ast.ASTNode
AST NotNode.
Inherits from ASTNode and models the not-node behavior.
- class pyregexp.re_ast.OrNode(left: pyregexp.re_ast.ASTNode, right: pyregexp.re_ast.ASTNode)
Bases:
pyregexp.re_ast.ASTNode
AST OrNode.
Inherits from ASTNode and models the or-nodes, that is, the nodes that divide the regex into two possible matching paths.
- class pyregexp.re_ast.RE(child: pyregexp.re_ast.ASTNode, capturing: bool = False, group_name: str = 'RegEx')
Bases:
pyregexp.re_ast.ASTNode
Entry point of the AST.
This class acts as the entry point for a regular expression’s AST.
- class pyregexp.re_ast.RangeElement(match_str: str, is_positive_logic: bool = True)
Bases:
pyregexp.re_ast.LeafNode
AST RangeElement.
Specialization of the LeafNode class modeling the range-element behavior, that is, a node that can match more than one character.
- is_match(ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) → bool
Returns whether the passed inputs match the node.
For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.
- Parameters
ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length
- Returns
represents whether there is a match between the node and the passed parameters or not.
- Return type
bool
- class pyregexp.re_ast.SpaceElement
Bases:
pyregexp.re_ast.Element
AST SpaceElement.
Specialization of the Element class to model the match-space behavior.
- is_match(ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) → bool
Returns whether the passed inputs match the node.
For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.
- Parameters
ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length
- Returns
represents whether there is a match between the node and the passed parameters or not.
- Return type
bool
- class pyregexp.re_ast.StartElement
Bases:
pyregexp.re_ast.LeafNode
AST StartElement.
Inherits from LeafNode and models the match-start-element behavior.
- is_match(ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) → bool
Returns whether the passed inputs match the node.
For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.
- Parameters
ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length
- Returns
represents whether there is a match between the node and the passed parameters or not.
- Return type
bool
- class pyregexp.re_ast.WildcardElement
Bases:
pyregexp.re_ast.Element
AST WildcardElement.
Specialization of the Element class to model the wildcard behavior.
- is_match(ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) → bool
Returns whether the passed inputs match the node.
For example, if the node matches the character “a” and the passed ch is “b”, the method returns False; if the passed ch is “a”, it returns True.
- Parameters
ch (str) – the char you want to match
str_i (int) – the string index you are considering
str_len (int) – the test string length
- Returns
represents whether there is a match between the node and the passed parameters or not.
- Return type
bool
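The leaf-node classes above all share the same is_match signature but use its arguments differently. The sketch below shows one plausible way such a hierarchy could behave. Only the class names and signatures come from the documentation; every method body is an assumption, not the pyregexp implementation.

```python
from typing import Optional


class LeafNode:
    """Base class: by default nothing matches."""

    def is_match(self, ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) -> bool:
        return False


class Element(LeafNode):
    def __init__(self, match_ch: Optional[str] = None) -> None:
        self.match_ch = match_ch

    def is_match(self, ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) -> bool:
        # Matches exactly one specific character.
        return ch is not None and ch == self.match_ch


class WildcardElement(Element):
    def is_match(self, ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) -> bool:
        # Assumption: the wildcard accepts any character except a newline.
        return ch is not None and ch != "\n"


class RangeElement(LeafNode):
    def __init__(self, match_str: str, is_positive_logic: bool = True) -> None:
        self.match_str = match_str
        self.is_positive_logic = is_positive_logic

    def is_match(self, ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) -> bool:
        if ch is None:
            return False
        # Positive logic: ch must be in the set; negative logic: it must not be.
        return (ch in self.match_str) == self.is_positive_logic


class StartElement(LeafNode):
    def is_match(self, ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) -> bool:
        # Assumption: the start anchor depends only on the position.
        return str_i == 0


class EndElement(LeafNode):
    def is_match(self, ch: Optional[str] = None, str_i: int = 0, str_len: int = 0) -> bool:
        # Assumption: the end anchor matches once the index reaches the length.
        return str_i == str_len


print(Element("a").is_match("a"), Element("a").is_match("b"))  # True False
print(RangeElement("abc", is_positive_logic=False).is_match("z"))  # True
print(StartElement().is_match(str_i=0, str_len=3))  # True
print(EndElement().is_match(str_i=3, str_len=3))  # True
```

Sharing one signature lets a matching loop call is_match on any leaf without checking its concrete type: character-based nodes ignore the position arguments, while the anchors ignore ch.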
pyregexp.tokens module
- class pyregexp.tokens.Asterisk
Bases:
pyregexp.tokens.ZeroOrMore
Quantifier ‘zero or more’ token using character ‘*’.
- class pyregexp.tokens.Bracket
Bases:
pyregexp.tokens.Token
Brackets token.
- class pyregexp.tokens.Circumflex
Bases:
pyregexp.tokens.NotToken
Token of the negation using ‘^’.
- class pyregexp.tokens.Comma
Bases:
pyregexp.tokens.Token
Token of a comma.
- class pyregexp.tokens.CurlyBrace
Bases:
pyregexp.tokens.Token
Curly brace token.
- class pyregexp.tokens.Dash
Bases:
pyregexp.tokens.Token
Token of the dash ‘-’.
- class pyregexp.tokens.ElementToken(char: str)
Bases:
pyregexp.tokens.Token
Token for characters that are not associated with a special meaning.
- class pyregexp.tokens.End
Bases:
pyregexp.tokens.EndToken
Token using ‘$’ to match end.
- class pyregexp.tokens.EndToken(char: str)
Bases:
pyregexp.tokens.Token
Token of match end.
- class pyregexp.tokens.Escape
Bases:
pyregexp.tokens.Token
Token of the escape character.
- class pyregexp.tokens.LeftBracket
Bases:
pyregexp.tokens.Bracket
Left bracket token.
- class pyregexp.tokens.LeftCurlyBrace
Bases:
pyregexp.tokens.CurlyBrace
Left curly brace token.
- class pyregexp.tokens.LeftParenthesis
Bases:
pyregexp.tokens.Parenthesis
Left parenthesis token.
- class pyregexp.tokens.NotToken(char: str)
Bases:
pyregexp.tokens.Token
Token of the negation.
- class pyregexp.tokens.OneOrMore(char: str)
Bases:
pyregexp.tokens.Quantifier
Quantifier ‘one or more’ token.
- class pyregexp.tokens.OrToken(char: str)
Bases:
pyregexp.tokens.Token
Token of the or.
- class pyregexp.tokens.Parenthesis
Bases:
pyregexp.tokens.Token
Token of a parenthesis.
- class pyregexp.tokens.Plus
Bases:
pyregexp.tokens.OneOrMore
Quantifier ‘one or more’ token using character ‘+’.
- class pyregexp.tokens.Quantifier(char: str)
Bases:
pyregexp.tokens.Token
Quantifier token.
- class pyregexp.tokens.QuestionMark
Bases:
pyregexp.tokens.ZeroOrOne
Quantifier ‘zero or one’ token using character ‘?’.
- class pyregexp.tokens.RightBracket
Bases:
pyregexp.tokens.Bracket
Right bracket token.
- class pyregexp.tokens.RightCurlyBrace
Bases:
pyregexp.tokens.CurlyBrace
Right curly brace token.
- class pyregexp.tokens.RightParenthesis
Bases:
pyregexp.tokens.Parenthesis
Right parenthesis token.
- class pyregexp.tokens.SpaceToken(char: str)
Bases:
pyregexp.tokens.Token
Token of a space.
- class pyregexp.tokens.Start
Bases:
pyregexp.tokens.StartToken
Token using ‘^’ to match start.
- class pyregexp.tokens.StartToken(char: str)
Bases:
pyregexp.tokens.Token
Token of match start.
- class pyregexp.tokens.VerticalBar
Bases:
pyregexp.tokens.OrToken
Token of the or using ‘|’.
- class pyregexp.tokens.Wildcard
Bases:
pyregexp.tokens.WildcardToken
Token using ‘.’ as wildcard.
- class pyregexp.tokens.WildcardToken(char: str)
Bases:
pyregexp.tokens.Token
Token of a wildcard.
- class pyregexp.tokens.ZeroOrMore(char: str)
Bases:
pyregexp.tokens.Quantifier
Quantifier ‘zero or more’ token.
- class pyregexp.tokens.ZeroOrOne(char: str)
Bases:
pyregexp.tokens.Quantifier
Quantifier ‘zero or one’ token.