Package Torello.HTML

Class TagNode.AttrRegEx

  • Enclosing class:
    TagNode

    public static final class TagNode.AttrRegEx
    extends java.lang.Object
    Regular expressions used by both the parsing class HTMLPage and the class TagNode to search HTML tags for attributes and even data.

    Every instance of class TagNode is simply a wrapper around a Java String. TagNode has well over a dozen instance methods, but its internal data is nothing more than a String that contains the exact text of the HTML Element. Because Java Strings are always immutable, modifying the attributes of an element requires creating a new TagNode object.

    Information about the individual attribute key-value pairs (for example: HREF="http://some.url.com") is obtained by parsing the element text with standard Java regular expressions, and the data is returned via a standard Java Stream<String> or another common Java data structure.

    This static inner class simply keeps the regular expressions used by class TagNode together. It is generally not crucial to understand how these expressions parse the attributes inside an HTML Element Tag, but if a deeper understanding of this HTML package is needed, the expressions are all here for review. They are fully documented, and in some cases links to their use inside class TagNode are provided.


    • Field Detail

      • KEY_VALUE_REGEX

        public static final java.util.regex.Pattern KEY_VALUE_REGEX


        Understanding Regular-Expressions:
        Knowledge of java.util.regex.* can be helpful for many of the search and update routines in this JAR library. However, neither the methods inside class TagNode nor any of the advanced Node-Search class routines require the use of Java's Regular-Expression package.

        Attributes RegEx:
        This is the regular expression used to match inner-tag key-value pairs inside an already instantiated HTML Element. Examples of such pairs include SRC=SomeURL, CLASS=SomeClass and ID=SomeID. The value of a pair may take any of 3 forms: single-quoted, double-quoted, or unquoted. The sub-parts below are shown as they appear in the Java String literal, which is why quotes and back-slashes are escaped.

        Regular-Expression Sub-Part    Explanation
        '[^']*?'                       Single-Quote Match: Matches an attribute value surrounded by single-quotes.
        \"[^\"]*?\"                    Double-Quote Match: Matches an attribute value surrounded by double-quotes.
        [^\"'>\\s]*                    No Quotes Used: Matches an attribute value that doesn't use quotation marks. Note that white-space may not be used in the value-String.
        ([\\w-]+?)=                    Attribute-Key: This is the "Attribute Name", also called the "Inner-Tag", of the key-value pair.
        \\s+?                          Mandatory Leading White-Space: When inner-tags are defined, their key-value pairs must be preceded by at least one white-space character.

        Match Groups:
        The table below explains how each of the Regular-Expression Match-Groups is evaluated. To retrieve a sub-part of a match, use the java.util.regex.Matcher method group(int), where the integer parameter specifies the group number. A group is created by surrounding part of the RegEx with opening and closing parentheses.

        Match Group Number    Group Return String
        matcher.group(1)      Returns the entire key-value pair (as a String), leaving out the leading white-space
        matcher.group(2)      Returns the 'key' String of the key-value attribute
        matcher.group(3)      Returns the 'value' String of the key-value attribute. Note that if there are surrounding quotes, they will be included in this return String.

        Non-Capturing Group:
        The first set, or "opening pair", of parentheses begins with the marker '?:'. This marker means:

        • The parentheses still group part of the Regular-Expression
        • The group is 'non-capturing': it is not assigned a group number, and its contents cannot be retrieved by the matcher's group methods
        See Also:
        TagNode.allAV(boolean, boolean)
        Code:
        Exact Field Declaration Expression:
         public static final Pattern KEY_VALUE_REGEX = Pattern.compile(
                     "(?:\\s+?" +                    // mandatory leading white-space
                         "(([\\w-]+?)=(" +           // inner-tag name (a.k.a. 'key' or 'attribute-name')
                             "'[^']*?'"     + "|" +  // inner-tag value using single-quotes ... 'OR'
                             "\"[^\"]*?\""   + "|" + // inner-tag value using double-quotes ... 'OR'
                             "[^\"'>\\s]*"   +       // inner-tag value without quotes
                     ")))",
                     Pattern.CASE_INSENSITIVE | Pattern.DOTALL
                 );
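        The group numbering described above can be tried out with a short, self-contained sketch. The pattern is copied locally so that the example does not depend on the library jar; the class name KeyValueRegexDemo and the helpers attributes and stripQuotes are hypothetical names invented for this illustration, not part of class TagNode.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueRegexDemo
{
    // Local copy of the declaration above, so the sketch runs without the library jar.
    static final Pattern KEY_VALUE_REGEX = Pattern.compile(
        "(?:\\s+?" +
            "(([\\w-]+?)=(" +
                "'[^']*?'"    + "|" +
                "\"[^\"]*?\"" + "|" +
                "[^\"'>\\s]*" +
        ")))",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL
    );

    // Hypothetical helper: removes one pair of matching surrounding quotes, if present.
    static String stripQuotes(String value)
    {
        if (value.length() < 2) return value;
        char first = value.charAt(0);
        char last  = value.charAt(value.length() - 1);
        boolean quoted = (first == '\'' || first == '"') && (first == last);
        return quoted ? value.substring(1, value.length() - 1) : value;
    }

    // Hypothetical helper: maps each key (group 2) to its unquoted value (group 3),
    // preserving the order in which the attributes appear in the tag.
    static Map<String, String> attributes(String tag)
    {
        Map<String, String> ret = new LinkedHashMap<>();
        Matcher m = KEY_VALUE_REGEX.matcher(tag);
        while (m.find()) ret.put(m.group(2), stripQuotes(m.group(3)));
        return ret;
    }

    public static void main(String[] args)
    {
        String tag = "<a href=\"http://some.url.com\" class='link' id=main>";
        attributes(tag).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

        Because group(3) keeps any surrounding quotes, the sketch strips them in a separate step rather than changing the pattern itself.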
        
      • KEY_VALUE_REGEX_PRED

        public static final java.util.function.Predicate<java.lang.String> KEY_VALUE_REGEX_PRED
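        A Predicate<String> view of KEY_VALUE_REGEX, created by Pattern.asPredicate(). The predicate returns true when the pattern is found anywhere within the tested String.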
        See Also:
        KEY_VALUE_REGEX
        Code:
        Exact Field Declaration Expression:
         public static final Predicate<String> KEY_VALUE_REGEX_PRED =
                     KEY_VALUE_REGEX.asPredicate();
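        A predicate produced by Pattern.asPredicate() fits naturally into a Stream filter. The sketch below is one possible use, with the pattern copied locally; the class name KeyValuePredicateDemo and the helper withAttributes are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeyValuePredicateDemo
{
    // Local copy of KEY_VALUE_REGEX, so the sketch runs without the library jar.
    static final Pattern KEY_VALUE_REGEX = Pattern.compile(
        "(?:\\s+?" +
            "(([\\w-]+?)=(" +
                "'[^']*?'"    + "|" +
                "\"[^\"]*?\"" + "|" +
                "[^\"'>\\s]*" +
        ")))",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL
    );

    // Same construction as the field declaration above.
    static final Predicate<String> KEY_VALUE_REGEX_PRED = KEY_VALUE_REGEX.asPredicate();

    // Hypothetical helper: keeps only the tag Strings containing at least one key=value pair.
    static List<String> withAttributes(List<String> tags)
    {
        return tags.stream().filter(KEY_VALUE_REGEX_PRED).collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<String> tags = Arrays.asList("<div>", "<div class=x>", "</div>", "<input disabled>");
        System.out.println(withAttributes(tags));
    }
}
```

        Note that a key-only attribute such as 'disabled' does not satisfy the predicate, because the pattern requires an equals-sign after the key.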
        
      • QUOTES_AND_VALUE_REGEX

        public static final java.util.regex.Pattern QUOTES_AND_VALUE_REGEX
        Legacy Regular Expression:
        This RegEx was originally used by the method TagNode.AV(String), but no longer is. It has not been deprecated because it still serves the purpose of showing how the HTML Tags in this class are stored.

        Capture Group:
        This Regular-Expression has a single set of parentheses (and therefore only one Capture-Group). Notice that this group spans practically the entire RegEx: everything except the equals-sign located at the first character of the pattern String.
        See Also:
        TagNode.AV(String)
        Code:
        Exact Field Declaration Expression:
         public static final Pattern QUOTES_AND_VALUE_REGEX = Pattern.compile(
                     // Matches, for example:  ='MyClass'   or    ="MyClass"   or   =MyClass
                     "=(" + 
                         "\"[^\"]*?\""   + "|" + // inner-tag value using double-quotes ... 'OR'
                         "'[^']*?'"      + "|" + // inner-tag value using single-quotes ... 'OR'
                         "[\\w-]+"       +       // inner-tag value without quotes
                     ")",
                     Pattern.DOTALL
                 );
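        The single capture group can be inspected with the short sketch below. The pattern is copied locally, and the class name QuotesAndValueDemo and the helper firstValue are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotesAndValueDemo
{
    // Local copy of the declaration above, so the sketch runs without the library jar.
    static final Pattern QUOTES_AND_VALUE_REGEX = Pattern.compile(
        "=(" +
            "\"[^\"]*?\"" + "|" +   // double-quoted value
            "'[^']*?'"    + "|" +   // single-quoted value
            "[\\w-]+"     +         // unquoted value
        ")",
        Pattern.DOTALL
    );

    // Hypothetical helper: returns group 1 of the first match, or null when nothing matches.
    static String firstValue(String text)
    {
        Matcher m = QUOTES_AND_VALUE_REGEX.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args)
    {
        System.out.println(firstValue("class=\"MyClass\""));
        System.out.println(firstValue("class='MyClass'"));
        System.out.println(firstValue("class=MyClass"));
    }
}
```

        The unquoted alternative here is [\\w-]+, which is narrower than the unquoted alternative in KEY_VALUE_REGEX. An unquoted value such as http://x.com is therefore captured only up to the first character that is neither a word-character nor a dash.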
        
      • QUOTES_AND_VALUE_REGEX_PRED

        public static final java.util.function.Predicate<java.lang.String> QUOTES_AND_VALUE_REGEX_PRED
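        A Predicate<String> view of QUOTES_AND_VALUE_REGEX, created by Pattern.asPredicate(). The predicate returns true when the pattern is found anywhere within the tested String.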
        See Also:
        QUOTES_AND_VALUE_REGEX
        Code:
        Exact Field Declaration Expression:
         public static final Predicate<String> QUOTES_AND_VALUE_REGEX_PRED =
                     QUOTES_AND_VALUE_REGEX.asPredicate();
        
      • ATTRIBUTE_KEY_REGEX

        public static final java.util.regex.Pattern ATTRIBUTE_KEY_REGEX
        This matches all valid attribute-keys (not values) of HTML Element key-value pairs.

        • PART-1: [A-Za-z_] The first character must be a letter or the underscore.
        • PART-2: [A-Za-z0-9_-] All other characters must be alpha-numeric, the dash '-', or the underscore '_'.
        See Also:
        InnerTagKeyException.check(String[]), TagNode.allKeyOnlyAttributes(boolean)
        Code:
        Exact Field Declaration Expression:
         public static final Pattern ATTRIBUTE_KEY_REGEX = 
                     Pattern.compile("^[A-Za-z_][A-Za-z0-9_-]*$");
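        The two parts of this pattern can be checked against a few candidate keys, as in the sketch below. The pattern is copied locally; the class name AttributeKeyDemo and the helper isValidKey are hypothetical names used only for this illustration, and the sketch uses Matcher.matches() rather than any method of class TagNode.

```java
import java.util.regex.Pattern;

class AttributeKeyDemo
{
    // Local copy of the declaration above, so the sketch runs without the library jar.
    static final Pattern ATTRIBUTE_KEY_REGEX =
        Pattern.compile("^[A-Za-z_][A-Za-z0-9_-]*$");

    // Hypothetical helper: true when the whole String is a valid attribute-key.
    static boolean isValidKey(String key)
    {
        return ATTRIBUTE_KEY_REGEX.matcher(key).matches();
    }

    public static void main(String[] args)
    {
        String[] candidates = { "data-src", "_id", "h1", "1h", "-x", "on click" };

        for (String key : candidates)
            System.out.println("'" + key + "' valid: " + isValidKey(key));
    }
}
```

        In this run the keys 'data-src', '_id' and 'h1' are accepted, while '1h', '-x' and 'on click' are rejected: the first two fail PART-1, and the last contains a space, which PART-2 does not allow.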
        
      • ATTRIBUTE_KEY_REGEX_PRED

        public static final java.util.function.Predicate<java.lang.String> ATTRIBUTE_KEY_REGEX_PRED
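        A Predicate<String> view of ATTRIBUTE_KEY_REGEX, created by Pattern.asPredicate(). The predicate returns true when the pattern is found within the tested String.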
        See Also:
        ATTRIBUTE_KEY_REGEX
        Code:
        Exact Field Declaration Expression:
         public static final Predicate<String> ATTRIBUTE_KEY_REGEX_PRED =
                     ATTRIBUTE_KEY_REGEX.asPredicate();
        
      • DATA_ATTRIBUTE_REGEX

        public static final java.util.regex.Pattern DATA_ATTRIBUTE_REGEX
        This is used to match HTML "Data-Attribute" key-value pairs. An HTML Data-Attribute is an attribute whose attribute-name begins with the characters 'data-'.

        HTML Data-Attributes RegEx:
        The table included below briefly explains what each element of this Regular-Expression for capturing HTML Data-Attributes / Inner-Tags does. The sub-parts are shown as they appear in the Java String literal, which is why quotes and back-slashes are escaped.

        Regular-Expression Sub-Part    Explanation
        '[^']*?'                       Single-Quote Match: Matches an attribute value surrounded by single-quotes.
        \"[^\"]*?\"                    Double-Quote Match: Matches an attribute value surrounded by double-quotes.
        [^\"'>\\s]*                    No Quotes Used: Matches an attribute value that doesn't use quotation marks. Note that white-space may not be used in the value-String.
        ([\\w-]+?)                     Attribute-Key: This is the "Attribute Name", also called the "Inner-Tag", of the data-attribute key-value pair. Note that the characters 'data-' are required to match the attribute, but are not included in this capture group.

        Match Groups:
        The table below explains how each of the Regular-Expression Match-Groups is evaluated. To retrieve a sub-part of a match, use the java.util.regex.Matcher method group(int), where the integer parameter specifies the group number. A group is created by surrounding part of the RegEx with opening and closing parentheses.

        Match Group Number    Group Return String
        matcher.group(1)      Returns the entire data-attribute key-value pair (as a String), leaving out the leading white-space
        matcher.group(2)      Returns the 'key' String of the data-attribute key-value pair. Note that the initial substring 'data-' is not included in the attribute-name returned for this capture-group, because it lies outside of the capturing parentheses for this group.
        matcher.group(3)      Returns the 'value' String of the key-value attribute. Note that if there are surrounding quotes, they will be included in this return String.

        Non-Capturing Group:
        The first set, or "opening pair", of parentheses begins with the marker '?:'. This marker means:

        • The parentheses still group part of the Regular-Expression
        • The group is 'non-capturing': it is not assigned a group number, and its contents cannot be retrieved by the matcher's group methods
        See Also:
        TagNode.getDataAN(), TagNode.getDataAV()
        Code:
        Exact Field Declaration Expression:
         public static final Pattern DATA_ATTRIBUTE_REGEX = Pattern.compile(
                     // regex will match, for example:   data-src="https://cdn.imgur.com/MyImage.jpg"
                     "(?:\\s+?" +                            // mandatory leading white-space
                         "(data-([\\w-]+?)=" +               // data inner-tag name 
                             "(" +   "'[^']*?'"      + "|" + // inner-tag value using single-quotes ... 'OR'
                                     "\"[^\"]*?\""   + "|" + // inner-tag value using double-quotes ... 'OR'
                                     "[^\"'>\\s]*"   +       // inner-tag value without quotes
                         ")))",
                     Pattern.CASE_INSENSITIVE | Pattern.DOTALL  
                 );
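        The sketch below shows how the groups of this pattern could be used to collect only the data-attributes of a tag. The pattern is copied locally; the class name DataAttributeDemo and the helper dataAttributes are hypothetical names for this illustration and are not the implementation of TagNode.getDataAN() or TagNode.getDataAV().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DataAttributeDemo
{
    // Local copy of the declaration above, so the sketch runs without the library jar.
    static final Pattern DATA_ATTRIBUTE_REGEX = Pattern.compile(
        "(?:\\s+?" +
            "(data-([\\w-]+?)=" +
                "(" + "'[^']*?'"    + "|" +
                      "\"[^\"]*?\"" + "|" +
                      "[^\"'>\\s]*" +
        ")))",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL
    );

    // Hypothetical helper: maps each data-attribute name (group 2, without 'data-')
    // to its value (group 3) after removing one pair of surrounding quotes.
    static Map<String, String> dataAttributes(String tag)
    {
        Map<String, String> ret = new LinkedHashMap<>();
        Matcher m = DATA_ATTRIBUTE_REGEX.matcher(tag);

        while (m.find())
        {
            String value = m.group(3);
            boolean quoted = value.length() >= 2
                && (value.charAt(0) == '\'' || value.charAt(0) == '"')
                && value.charAt(0) == value.charAt(value.length() - 1);

            ret.put(m.group(2), quoted ? value.substring(1, value.length() - 1) : value);
        }

        return ret;
    }

    public static void main(String[] args)
    {
        String tag = "<img data-src=\"https://cdn.imgur.com/MyImage.jpg\" data-index=3 alt='x'>";
        dataAttributes(tag).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

        The ordinary attribute alt='x' is skipped, because its name does not begin with 'data-'.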
        
      • DATA_ATTRIBUTE_REGEX_PRED

        public static final java.util.function.Predicate<java.lang.String> DATA_ATTRIBUTE_REGEX_PRED
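        A Predicate<String> view of DATA_ATTRIBUTE_REGEX, created by Pattern.asPredicate(). The predicate returns true when the pattern is found anywhere within the tested String.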
        See Also:
        DATA_ATTRIBUTE_REGEX
        Code:
        Exact Field Declaration Expression:
         public static final Predicate<String> DATA_ATTRIBUTE_REGEX_PRED =
                     DATA_ATTRIBUTE_REGEX.asPredicate();
        
      • CSS_INLINE_STYLE_REGEX

        public static final java.util.regex.Pattern CSS_INLINE_STYLE_REGEX
        Inline-CSS RegEx:
        This is a regular expression Pattern that matches CSS Style Definitions that are directly 'inlined' into HTML TagNode instances. The sub-parts are shown as they appear in the Java String literal of the field declaration.

        Regular-Expression Sub-Part      Explanation
        [_\\-a-zA-Z]+[_\\-a-zA-Z0-9]*    CSS Property Name: The name must begin with one or more letters (a..z, A..Z), underscores '_' or dashes '-'. Afterwards, it may only contain the following: letters, numbers, dashes and/or the underscore '_'.
        \\s*?:\\s*?                      After the CSS property name, the declaration shall be followed by a colon (':'), optionally surrounded by white-space.
        [^;]+?\\s*                       CSS Property Value: May contain any text-characters, except the semi-colon character (';').
        ;|$|[\\w]+$                      Says that the CSS inline declaration should be terminated by a semicolon, or that it may also reach the end of the 'style' attribute-value. In regular-expressions, the end of the input is marked by a dollar-sign: '$'.

        Match Groups:
        The table below explains how each of the Regular-Expression Match-Groups is evaluated. To retrieve a sub-part of a match, use the java.util.regex.Matcher method group(int), where the integer parameter specifies the group number. A group is created by surrounding part of the RegEx with opening and closing parentheses.

        Match Group Number    Group Return String
        matcher.group(1)      Returns the CSS Style Property Name, for instance 'font-weight' or 'border'
        matcher.group(2)      Returns the CSS Style Property Value, for instance bold or 1px 1px 1px 1px. Any white-space that follows the colon is included at the start of this String.
        matcher.group(3)      Returns the text that terminates the definition: the semicolon, the empty String at the end of the input, or word-characters that run up to the end of the input.
        See Also:
        TagNode.cssStyle()
        Code:
        Exact Field Declaration Expression:
         public static final Pattern CSS_INLINE_STYLE_REGEX = Pattern.compile(
                         // regex will match, for example:  font-weight: bold;
        
                         // CSS Style Property Name - Must begin with letter, underscore or dash
                         "([_\\-a-zA-Z]+" + "[_\\-a-zA-Z0-9]*)" +
        
                         // The ":" symbol between property-name and property-value
                         "\\s*?" + ":" + "\\s*?" +
        
                         // CSS Style Property Value
                         "([^;]+?\\s*)" +
        
                         // text after the "Name : Value" definition    
                         "(;|$|[\\w]+$)"
                 );
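        The three groups can be seen at work in the sketch below, which splits a 'style' attribute-value into property names and values. The pattern is copied locally; the class name InlineCssDemo and the helper properties are hypothetical names for this illustration and are not the implementation of TagNode.cssStyle(). The sample input terminates every declaration with a semicolon; input whose last declaration lacks one is not covered by this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InlineCssDemo
{
    // Local copy of the declaration above, so the sketch runs without the library jar.
    static final Pattern CSS_INLINE_STYLE_REGEX = Pattern.compile(
        "([_\\-a-zA-Z]+" + "[_\\-a-zA-Z0-9]*)" +   // property name
        "\\s*?" + ":" + "\\s*?" +                  // colon separator
        "([^;]+?\\s*)" +                           // property value
        "(;|$|[\\w]+$)"                            // text after the definition
    );

    // Hypothetical helper: maps each property name (group 1) to its value (group 2).
    // The value is trimmed, since group 2 may carry the white-space after the colon.
    static Map<String, String> properties(String style)
    {
        Map<String, String> ret = new LinkedHashMap<>();
        Matcher m = CSS_INLINE_STYLE_REGEX.matcher(style);
        while (m.find()) ret.put(m.group(1), m.group(2).trim());
        return ret;
    }

    public static void main(String[] args)
    {
        String style = "font-weight: bold; border: 1px solid black;";
        properties(style).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

        Trimming is done by the caller here because the lazy quantifier on the white-space after the colon leaves that white-space inside group 2.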
        
      • CSS_INLINE_STYLE_REGEX_PRED

        public static final java.util.function.Predicate<java.lang.String> CSS_INLINE_STYLE_REGEX_PRED
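        A Predicate<String> view of CSS_INLINE_STYLE_REGEX, created by Pattern.asPredicate(). The predicate returns true when the pattern is found anywhere within the tested String.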
        See Also:
        CSS_INLINE_STYLE_REGEX
        Code:
        Exact Field Declaration Expression:
         public static final Predicate<String> CSS_INLINE_STYLE_REGEX_PRED =
                     CSS_INLINE_STYLE_REGEX.asPredicate();