Package Torello.HTML
Class TagNode.AttrRegEx
- java.lang.Object
- Torello.HTML.TagNode.AttrRegEx
- Enclosing class:
- TagNode
public static final class TagNode.AttrRegEx extends java.lang.Object
Regular-Expressions that are used by both the parsing class HTMLPage and class TagNode for searching HTML tags for attributes and even data.
All instances of class TagNode are simply wrapped Java-String instance objects. Class TagNode has well over a dozen instance methods, but its internal data is nothing more than a String that contains the exact text of the HTML Element. Since Java Strings are always immutable, modifying the internal attributes of an element requires creating a new TagNode object.
Information about the individual attribute key-value pairs (for example: HREF="http://some.url.com") is retrieved by using standard Java regular-expressions to parse those pairs, and then returning the data via a standard Java Stream<String>, or another ubiquitous Java data structure. This inner, static class simply keeps together the regular-expressions used by class TagNode. Generally, it is not crucial to understand how the regular-expressions parse the attributes inside an HTML Element Tag, but if further understanding of this HTML package is needed, the expressions are all here for review. They are fully documented, and in some cases links to their use inside class TagNode are provided.
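The sketch below illustrates the approach described above: the raw text of an element is an ordinary String, and a regular-expression turns it into a Stream<String>. It is not TagNode's own implementation; the class AttributeKeysSketch and its method attributeKeys are hypothetical names, the pattern is a local copy of KEY_VALUE_REGEX (documented below) so the example runs without the Torello.HTML JAR, and Matcher.results() assumes Java 9 or later.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration: String + regex -> Stream<String>.
// KEY_VALUE is a local copy of AttrRegEx.KEY_VALUE_REGEX (an assumption made
// so that this compiles without the Torello.HTML JAR).
class AttributeKeysSketch
{
    static final Pattern KEY_VALUE = Pattern.compile(
        "(?:\\s+?(([\\w-]+?)=('[^']*?'|\"[^\"]*?\"|[^\"'>\\s]*)))",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL
    );

    // Streams the attribute names found in the raw text of one HTML element.
    static Stream<String> attributeKeys(String tagText)
    {
        return KEY_VALUE.matcher(tagText)
            .results()                  // Stream<MatchResult>, Java 9+
            .map(m -> m.group(2));      // group 2 holds the attribute key
    }

    public static void main(String[] args)
    {
        String tag = "<A HREF=\"http://some.url.com\" CLASS=nav>";

        // Prints [HREF, CLASS]
        System.out.println(attributeKeys(tag).collect(Collectors.toList()));
    }
}
```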
Hi-Lited Source-Code: Torello/HTML/TagNode.java
Field Summary

HTML Attribute Regular-Expressions
Modifier and Type          Field
static Pattern             ATTRIBUTE_KEY_REGEX
static Pattern             CSS_INLINE_STYLE_REGEX
static Pattern             DATA_ATTRIBUTE_REGEX
static Pattern             KEY_VALUE_REGEX
static Pattern             QUOTES_AND_VALUE_REGEX

Regular-Expressions as Predicate<String>
Modifier and Type          Field
static Predicate<String>   ATTRIBUTE_KEY_REGEX_PRED
static Predicate<String>   CSS_INLINE_STYLE_REGEX_PRED
static Predicate<String>   DATA_ATTRIBUTE_REGEX_PRED
static Predicate<String>   KEY_VALUE_REGEX_PRED
static Predicate<String>   QUOTES_AND_VALUE_REGEX_PRED
Field Detail
-
KEY_VALUE_REGEX
public static final java.util.regex.Pattern KEY_VALUE_REGEX
Understanding Regular-Expressions:
Knowledge and understanding of java.util.regex.* can be helpful for many of the search and update routines in this JAR library. However, neither the methods inside class TagNode nor any of the advanced Node-Search class routines require using Java's Regular-Expression package.
Attributes RegEx:
This is the regular-expression used to match inner-tag key-value pairs inside an already instantiated HTML Element. Examples of such pairs include SRC=SomeURL, CLASS=SomeClass, ID=SomeID, etc. This regular-expression will match 3 types of key-value pairs:

Regular-Expression Sub-Part    Explanation
'[^']*?'                       Single-Quote Match: matches a key-value pair whose value is surrounded by single-quotes.
\"[^\"]*?\"                    Double-Quote Match: matches a key-value pair whose value employs double-quotes.
[^\"'>\\s]*                    No Quotes Used: matches a key-value pair whose value doesn't use quotation marks. Note that such a value may not contain white-space, quotation marks, or the '>' character.
([\\w-]+?)=                    Attribute-Key: the "Attribute Name" (also called the "Inner-Tag") of the key-value pair.
\\s+?                          Mandatory Leading White-Space: when inner-tags are defined, their key-value pairs must be separated by at least one white-space character.
Match Groups:
The table below explains how each of the "Regular-Expression Match-Groups" is evaluated. To retrieve a sub-part of a match, use the java.util.regex.Matcher method group(int), where the integer parameter specifies a group number. A group is "created" by surrounding part of the Reg-Ex with opening and closing parentheses.

Match Group Number    Group Return String
matcher.group(1)      Returns the entire key-value pair (as a String), leaving out the leading white-space
matcher.group(2)      Returns the 'key' String of the key-value attribute
matcher.group(3)      Returns the 'value' String of the key-value attribute. Note that if there are surrounding quotes, they will be included in this return String.
Non-Capturing Group:
The first (outermost) pair of parentheses begins with the marker '?:'. This marker means:
- This is a Regular-Expression Match-Group
- This is a 'non-capturing' Group, and its contents cannot be retrieved by group number using the matcher's methods
- See Also:
TagNode.allAV(boolean, boolean)
- Code:
- Exact Field Declaration Expression:
public static final Pattern KEY_VALUE_REGEX = Pattern.compile(
    "(?:\\s+?" +                        // mandatory leading white-space
        "(([\\w-]+?)=(" +               // inner-tag name (a.k.a. 'key' or 'attribute-name')
            "'[^']*?'" + "|" +          // inner-tag value using single-quotes ... 'OR'
            "\"[^\"]*?\"" + "|" +       // inner-tag value using double-quotes ... 'OR'
            "[^\"'>\\s]*" +             // inner-tag value without quotes
    ")))",
    Pattern.CASE_INSENSITIVE | Pattern.DOTALL
);
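A minimal sketch of the three capture groups in action, one attribute per quoting style. It assumes a local copy of the expression declared above (so it runs without the Torello.HTML JAR); the class KeyValueGroupsDemo and its method firstPair are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the Pattern is a local copy of KEY_VALUE_REGEX.
class KeyValueGroupsDemo
{
    static final Pattern KEY_VALUE_REGEX = Pattern.compile(
        "(?:\\s+?" +
            "(([\\w-]+?)=(" +
                "'[^']*?'" + "|" +
                "\"[^\"]*?\"" + "|" +
                "[^\"'>\\s]*" +
        ")))",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL
    );

    // Returns { group(1), group(2), group(3) } for the first attribute, or null.
    static String[] firstPair(String tag)
    {
        Matcher m = KEY_VALUE_REGEX.matcher(tag);
        return m.find() ? new String[] { m.group(1), m.group(2), m.group(3) } : null;
    }

    public static void main(String[] args)
    {
        // Single-quoted, double-quoted, and unquoted values.
        Matcher m = KEY_VALUE_REGEX.matcher("<IMG SRC='a.png' ALT=\"A picture\" WIDTH=100>");

        while (m.find())
            System.out.println(
                "group(1)=" + m.group(1) +      // whole pair, leading space excluded
                "  group(2)=" + m.group(2) +    // key
                "  group(3)=" + m.group(3)      // value, quotes kept
            );
        // group(1)=SRC='a.png'  group(2)=SRC  group(3)='a.png'
        // group(1)=ALT="A picture"  group(2)=ALT  group(3)="A picture"
        // group(1)=WIDTH=100  group(2)=WIDTH  group(3)=100
    }
}
```

Because group(3) keeps any surrounding quotes, a caller that wants the bare value still has to strip them.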
-
KEY_VALUE_REGEX_PRED
public static final java.util.function.Predicate<java.lang.String> KEY_VALUE_REGEX_PRED
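A Predicate<String> Regular-Expression. It is created by Pattern.asPredicate(), so it tests whether the pattern can be found anywhere in the input String.
- See Also: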
KEY_VALUE_REGEX
- Code:
- Exact Field Declaration Expression:
public static final Predicate<String> KEY_VALUE_REGEX_PRED = KEY_VALUE_REGEX.asPredicate();
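Because Pattern.asPredicate() uses find semantics, the full text of a tag can be tested directly. The sketch below (hypothetical class name, local copy of KEY_VALUE_REGEX) contrasts this with Matcher.matches(), and also shows that key-only attributes, which have no '=', do not satisfy this predicate.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical demo; both fields mirror the AttrRegEx declarations above.
class KeyValuePredDemo
{
    static final Pattern KEY_VALUE_REGEX = Pattern.compile(
        "(?:\\s+?(([\\w-]+?)=('[^']*?'|\"[^\"]*?\"|[^\"'>\\s]*)))",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL
    );

    static final Predicate<String> KEY_VALUE_REGEX_PRED = KEY_VALUE_REGEX.asPredicate();

    public static void main(String[] args)
    {
        System.out.println(KEY_VALUE_REGEX_PRED.test("<A HREF='page.html'>"));  // true
        System.out.println(KEY_VALUE_REGEX_PRED.test("<BR>"));                  // false
        System.out.println(KEY_VALUE_REGEX_PRED.test("<INPUT DISABLED>"));      // false: no '='

        // matches() demands that the whole String fit the pattern, which a full tag never does.
        System.out.println(KEY_VALUE_REGEX.matcher("<A HREF='page.html'>").matches());  // false
    }
}
```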
-
QUOTES_AND_VALUE_REGEX
public static final java.util.regex.Pattern QUOTES_AND_VALUE_REGEX
Legacy Regular Expression:
This RegEx was originally used by the method TagNode.AV(String), but no longer is. It isn't being deprecated because it still shows how the HTML Tags in this class are stored.
Capture Group:
This Regular-Expression has a single pair of parentheses (and therefore only one Capture-Group). Notice that this group includes practically the entire RegEx: everything except the equals-sign, which must be the first character of the match.
- See Also:
TagNode.AV(String)
- Code:
- Exact Field Declaration Expression:
public static final Pattern QUOTES_AND_VALUE_REGEX = Pattern.compile(
    // Matches, for example: ='MyClass' or ="MyClass" or =MyClass
    "=(" +
        "\"[^\"]*?\"" + "|" +       // inner-tag value using double-quotes ... 'OR'
        "'[^']*?'" + "|" +          // inner-tag value using single-quotes ... 'OR'
        "[\\w-]+" +                 // inner-tag value without quotes
    ")",
    Pattern.DOTALL
);
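A short sketch of the single capture group, using a local copy of the expression above. The class QuotesAndValueDemo and the helper valueAfterEquals are hypothetical names; the inputs are the three examples listed in the declaration's comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo; the Pattern mirrors QUOTES_AND_VALUE_REGEX.
class QuotesAndValueDemo
{
    static final Pattern QUOTES_AND_VALUE_REGEX = Pattern.compile(
        "=(" +
            "\"[^\"]*?\"" + "|" +   // double-quoted value
            "'[^']*?'" + "|" +      // single-quoted value
            "[\\w-]+" +             // unquoted value
        ")",
        Pattern.DOTALL
    );

    // Returns group(1) of the first match: the value, with any quotes kept.
    static String valueAfterEquals(String text)
    {
        Matcher m = QUOTES_AND_VALUE_REGEX.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args)
    {
        System.out.println(valueAfterEquals("='MyClass'"));     // 'MyClass'
        System.out.println(valueAfterEquals("=\"MyClass\""));   // "MyClass"
        System.out.println(valueAfterEquals("=MyClass"));       // MyClass
    }
}
```

Note that the unquoted branch here is [\\w-]+, which is narrower than the unquoted branch of KEY_VALUE_REGEX: an unquoted value such as =http://x is cut off at the ':' and yields only http.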
-
QUOTES_AND_VALUE_REGEX_PRED
public static final java.util.function.Predicate<java.lang.String> QUOTES_AND_VALUE_REGEX_PRED
A Predicate<String> Regular-Expression.
- See Also:
QUOTES_AND_VALUE_REGEX
- Code:
- Exact Field Declaration Expression:
public static final Predicate<String> QUOTES_AND_VALUE_REGEX_PRED = QUOTES_AND_VALUE_REGEX.asPredicate();
-
ATTRIBUTE_KEY_REGEX
public static final java.util.regex.Pattern ATTRIBUTE_KEY_REGEX
This matches all valid attribute-keys (not values) of HTML Element key-value pairs. Because the expression is anchored with '^' and '$', the entire String must match.
- PART-1: [A-Za-z_]
  The first character must be a letter or the underscore.
- PART-2: [A-Za-z0-9_-]*
  All other characters must be alpha-numeric, the dash '-', or the underscore '_'.
- See Also:
InnerTagKeyException.check(String[]), TagNode.allKeyOnlyAttributes(boolean)
- Code:
- Exact Field Declaration Expression:
public static final Pattern ATTRIBUTE_KEY_REGEX = Pattern.compile("^[A-Za-z_][A-Za-z0-9_-]*$");
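The two parts above can be exercised with a few candidate names. This is a sketch with a local copy of the Pattern; the class AttributeKeyCheck and method isValidKey are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical validator built on a local copy of ATTRIBUTE_KEY_REGEX.
class AttributeKeyCheck
{
    static final Pattern ATTRIBUTE_KEY_REGEX = Pattern.compile("^[A-Za-z_][A-Za-z0-9_-]*$");

    static boolean isValidKey(String key)
    {
        return ATTRIBUTE_KEY_REGEX.matcher(key).matches();
    }

    public static void main(String[] args)
    {
        // "2col" and "-x" fail PART-1; the space in "on click" fails PART-2.
        String[] keys = { "href", "data-src", "_private", "aria-label2", "2col", "-x", "on click" };

        for (String k : keys)
            System.out.println(k + " -> " + isValidKey(k));
    }
}
```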
-
ATTRIBUTE_KEY_REGEX_PRED
public static final java.util.function.Predicate<java.lang.String> ATTRIBUTE_KEY_REGEX_PRED
A Predicate<String> Regular-Expression.
- See Also:
ATTRIBUTE_KEY_REGEX
- Code:
- Exact Field Declaration Expression:
public static final Predicate<String> ATTRIBUTE_KEY_REGEX_PRED = ATTRIBUTE_KEY_REGEX.asPredicate();
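Since ATTRIBUTE_KEY_REGEX is anchored with '^' and '$', the find-based predicate returned by asPredicate() effectively checks the whole String, which makes it convenient as a Stream filter. A sketch with hypothetical names and a local copy of both fields:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo; the fields mirror the AttrRegEx declarations above.
class AttributeKeyPredDemo
{
    static final Pattern ATTRIBUTE_KEY_REGEX = Pattern.compile("^[A-Za-z_][A-Za-z0-9_-]*$");
    static final Predicate<String> ATTRIBUTE_KEY_REGEX_PRED = ATTRIBUTE_KEY_REGEX.asPredicate();

    // Keeps only the candidate names that are valid attribute keys.
    static List<String> validKeys(List<String> candidates)
    {
        return candidates.stream()
            .filter(ATTRIBUTE_KEY_REGEX_PRED)
            .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        // Prints [id, data-x]
        System.out.println(validKeys(List.of("id", "9lives", "data-x", "a b")));
    }
}
```

One nuance: without the MULTILINE flag, '$' can also match just before a final line terminator, so this predicate is not strictly identical to matches(). On Java 11 or later, Pattern.asMatchPredicate() offers matches() semantics directly.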
-
DATA_ATTRIBUTE_REGEX
public static final java.util.regex.Pattern DATA_ATTRIBUTE_REGEX
This is used to match HTML "Data-Attribute" inner-tags. An HTML Data-Attribute is an attribute whose attribute-name (the key of the key-value pair) begins with the characters 'data-', as in data-*.
HTML Data-Attributes RegEx:
The table below briefly explains each element of this Regular-Expression for capturing HTML Data-Attributes / Inner-Tags.

Regular-Expression Sub-Part    Explanation
'[^']*?'                       Single-Quote Match: matches a key-value pair whose value is surrounded by single-quotes.
\"[^\"]*?\"                    Double-Quote Match: matches a key-value pair whose value employs double-quotes.
[^\"'>\\s]*                    No Quotes Used: matches a key-value pair whose value doesn't use quotation marks. Note that such a value may not contain white-space, quotation marks, or the '>' character.
([\\w-]+?)                     Attribute-Key: the "Attribute Name" (also called the "Inner-Tag") of the data-attribute key-value pair. Note that the characters 'data-' are required to match the attribute, but are not included in this capture group.
Match Groups:
The table below explains how each of the "Regular-Expression Match-Groups" is evaluated. To retrieve a sub-part of a match, use the java.util.regex.Matcher method group(int), where the integer parameter specifies a group number. A group is "created" by surrounding part of the Reg-Ex with opening and closing parentheses.

Match Group Number    Group Return String
matcher.group(1)      Returns the entire data-attribute key-value pair (as a String), including the 'data-' prefix but leaving out the leading white-space
matcher.group(2)      Returns the 'key' String of the data-attribute key-value pair. Note that the leading 'data-' prefix is not included in this group's return value, because it lies outside this group's capturing parentheses.
matcher.group(3)      Returns the 'value' String of the key-value attribute. Note that if there are surrounding quotes, they will be included in this return String.
Non-Capturing Group:
The first (outermost) pair of parentheses begins with the marker '?:'. This marker means:
- This is a Regular-Expression Match-Group
- This is a 'non-capturing' Group, and its contents cannot be retrieved by group number using the matcher's methods
- See Also:
TagNode.getDataAN(), TagNode.getDataAV()
- Code:
- Exact Field Declaration Expression:
public static final Pattern DATA_ATTRIBUTE_REGEX = Pattern.compile(
    // regex will match, for example: data-src="https://cdn.imgur.com/MyImage.jpg"
    "(?:\\s+?" +                        // mandatory leading white-space
        "(data-([\\w-]+?)=" +           // data inner-tag name
            "(" +
                "'[^']*?'" + "|" +      // inner-tag value using single-quotes ... 'OR'
                "\"[^\"]*?\"" + "|" +   // inner-tag value using double-quotes ... 'OR
                "[^\"'>\\s]*" +         // inner-tag value without quotes
    ")))",
    Pattern.CASE_INSENSITIVE | Pattern.DOTALL
);
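As an illustration of groups 2 and 3, the sketch below collects every data-attribute of a tag into a Map from name (without 'data-') to raw value. It is only a sketch with hypothetical names (DataAttributeDemo, dataAttributes) and a local copy of the Pattern; it is not how TagNode.getDataAN() or TagNode.getDataAV() are implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo; the Pattern mirrors DATA_ATTRIBUTE_REGEX.
class DataAttributeDemo
{
    static final Pattern DATA_ATTRIBUTE_REGEX = Pattern.compile(
        "(?:\\s+?" +
            "(data-([\\w-]+?)=" +
                "(" +
                    "'[^']*?'" + "|" +
                    "\"[^\"]*?\"" + "|" +
                    "[^\"'>\\s]*" +
        ")))",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL
    );

    // Maps group(2), the name without "data-", to group(3), the raw value (quotes kept).
    static Map<String, String> dataAttributes(String tagText)
    {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = DATA_ATTRIBUTE_REGEX.matcher(tagText);
        while (m.find()) out.put(m.group(2), m.group(3));
        return out;
    }

    public static void main(String[] args)
    {
        // CLASS is skipped; DATA-id matches thanks to CASE_INSENSITIVE.
        String tag = "<IMG CLASS=thumb data-src=\"https://cdn.imgur.com/MyImage.jpg\" DATA-id=42>";

        // Prints {src="https://cdn.imgur.com/MyImage.jpg", id=42}
        System.out.println(dataAttributes(tag));
    }
}
```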
-
DATA_ATTRIBUTE_REGEX_PRED
public static final java.util.function.Predicate<java.lang.String> DATA_ATTRIBUTE_REGEX_PRED
A Predicate<String> Regular-Expression.
- See Also:
DATA_ATTRIBUTE_REGEX
- Code:
- Exact Field Declaration Expression:
public static final Predicate<String> DATA_ATTRIBUTE_REGEX_PRED = DATA_ATTRIBUTE_REGEX.asPredicate();
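A possible use of this predicate is filtering a list of raw tag Strings down to those carrying at least one data-attribute. The sketch assumes local copies of both fields and a hypothetical helper, tagsWithDataAttributes.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo; the fields mirror the AttrRegEx declarations above.
class DataAttributePredDemo
{
    static final Pattern DATA_ATTRIBUTE_REGEX = Pattern.compile(
        "(?:\\s+?(data-([\\w-]+?)=('[^']*?'|\"[^\"]*?\"|[^\"'>\\s]*)))",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL
    );

    static final Predicate<String> DATA_ATTRIBUTE_REGEX_PRED = DATA_ATTRIBUTE_REGEX.asPredicate();

    // Keeps the tags that contain at least one data-* key-value pair.
    static List<String> tagsWithDataAttributes(List<String> tags)
    {
        return tags.stream().filter(DATA_ATTRIBUTE_REGEX_PRED).collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<String> tags = List.of("<DIV CLASS=box>", "<DIV data-role=page>", "<SPAN DATA-x='1'>");

        // Prints [<DIV data-role=page>, <SPAN DATA-x='1'>]
        System.out.println(tagsWithDataAttributes(tags));
    }
}
```

Because 'data-' must directly follow the mandatory white-space, an attribute such as metadata-x=1 does not satisfy the predicate.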
-
CSS_INLINE_STYLE_REGEX
public static final java.util.regex.Pattern CSS_INLINE_STYLE_REGEX
Inline-CSS RegEx:
This is a regular-expression Pattern that matches CSS Style definitions that are directly 'inlined' into HTML TagNode instances.

Regular-Expression Sub-Part      Explanation
-?[_a-zA-Z]+[_\\-a-zA-Z0-9]*     The standard CSS-Token definition. The CSS property-name may begin with a '-' (dash), followed by letters a..z, A..Z (or the underscore '_'). Afterwards, the name may only contain letters, numbers, dashes and/or the underscore '_'. (The field declaration below uses the slightly looser [_\\-a-zA-Z]+ for the leading run, which accepts a dash at any position in that run.)
:                                After the CSS property-name, the declaration must be followed by a colon (':'), and the value may then contain any characters except the semicolon (';').
;|$|[\\w]+$                      The CSS inline declaration must be terminated by a semicolon, or it may reach the end of the 'style' attribute-value, which in regular-expressions is marked by a dollar-sign '$'.
Match Groups:
The table below explains how each of the "Regular-Expression Match-Groups" is evaluated. To retrieve a sub-part of a match, use the java.util.regex.Matcher method group(int), where the integer parameter specifies a group number. A group is "created" by surrounding part of the Reg-Ex with opening and closing parentheses.

Match Group Number    Group Return String
matcher.group(1)      Returns the CSS Style Property Name, for instance 'font-weight' or 'border'
matcher.group(2)      Returns the CSS Style Property Value, for instance bold or 1px 1px 1px 1px (white-space around the value may be included in this group)
matcher.group(3)      Returns the text matched by the terminating sub-expression (;|$|[\\w]+$), usually the semicolon that separates property definitions.
- See Also:
TagNode.cssStyle()
- Code:
- Exact Field Declaration Expression:
public static final Pattern CSS_INLINE_STYLE_REGEX = Pattern.compile(
    // regex will match, for example: font-weight: bold;

    // CSS Style Property Name - Must begin with letter or underscore
    "([_\\-a-zA-Z]+" + "[_\\-a-zA-Z0-9]*)" +

    // The ":" symbol between property-name and property-value
    "\\s*?" + ":" + "\\s*?" +

    // CSS Style Property Value
    "([^;]+?\\s*)" +

    // text after the "Name : Value" definition
    "(;|$|[\\w]+$)"
);
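A sketch of turning a 'style' attribute-value into a property map with groups 1 and 2. The class InlineCssDemo and method parseStyle are hypothetical, the Pattern is a local copy of the declaration above, and values are trimmed because group(2) can carry surrounding white-space. This is not a description of how TagNode.cssStyle() works internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo; the Pattern mirrors CSS_INLINE_STYLE_REGEX.
class InlineCssDemo
{
    static final Pattern CSS_INLINE_STYLE_REGEX = Pattern.compile(
        "([_\\-a-zA-Z]+" + "[_\\-a-zA-Z0-9]*)" +   // group 1: property name
        "\\s*?" + ":" + "\\s*?" +                  // the ':' separator
        "([^;]+?\\s*)" +                           // group 2: property value
        "(;|$|[\\w]+$)"                            // group 3: terminator
    );

    // Maps each property name (group 1) to its trimmed value (group 2).
    static Map<String, String> parseStyle(String styleValue)
    {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = CSS_INLINE_STYLE_REGEX.matcher(styleValue);
        while (m.find()) out.put(m.group(1), m.group(2).trim());
        return out;
    }

    public static void main(String[] args)
    {
        // Prints {font-weight=bold, border=1px 1px 1px 1px}
        System.out.println(parseStyle("font-weight: bold; border: 1px 1px 1px 1px;"));
    }
}
```

Tracing the pattern by hand shows one quirk: for a final declaration without a trailing ';', such as "color: red", the [\\w]+$ alternative succeeds as soon as the reluctant value group has consumed a single space, so group(2) is just " " and group(3) is "red". The sketch is therefore best fed semicolon-terminated input.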
-
CSS_INLINE_STYLE_REGEX_PRED
public static final java.util.function.Predicate<java.lang.String> CSS_INLINE_STYLE_REGEX_PRED
A Predicate<String> Regular-Expression.
- See Also:
CSS_INLINE_STYLE_REGEX
- Code:
- Exact Field Declaration Expression:
public static final Predicate<String> CSS_INLINE_STYLE_REGEX_PRED = CSS_INLINE_STYLE_REGEX.asPredicate();