Package Torello.HTML

Class Util


  • public class Util
    extends java.lang.Object
    A long list of utilities for searching, finding, extracting and removing HTML from Vectorized-HTML.

    This is a list of some of the common "helper routines" that I occasionally need. They are not in any particular order. Almost all of these routines are used internally, either in the NodeSearch search-loops and iterators, or in parts of package "Tools." The possibility to expand classes like this is probably "boundless"; however, keep in mind that classes like 'SubSection' and 'NodeIndex', along with its two sub-classes 'TagNodeIndex' and 'TextNodeIndex', make some of the short, for-loop-driven helper routines seem a little superfluous.

    The most complicated and error-prone pieces are the for-loops and iterators of the node-search package. With these solidly tested for over a year, the helper routines that build those for-loops are included in this class. Extending the utility and modification tools for vectorized-html pages might be the subject of future development work, but easily the most complicated parts, search and iterate, have been handled. The methods here might be useful, but deciding what belongs in a utility class is not a "precise science". Please remember that a method name ending in "OPT" (meaning optimized) only means that a couple of the exception checks have been left out. Those checks validate the for-loop criteria specified in the method signature, so they do not need to be repeated on each iteration of a node-search for-loop.



    Stateless Class:
    This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.

    • 1 Constructor(s), 1 declared private, zero-argument constructor
    • 36 Method(s), 36 declared static
    • 0 Field(s)


    • Method Detail

      • trimTextNodes

        public static int trimTextNodes​(java.util.Vector<HTMLNode> page,
                                        int sPos,
                                        int ePos,
                                        boolean deleteZeroLengthStrings)
        This will iterate through the Vector<HTMLNode> between 'sPos' and 'ePos', and invoke java.lang.String.trim() on each TextNode in that range. If this invocation results in a reduction of String.length(), then a new TextNode will be instantiated whose TextNode.str field is set to the result of the old_node.str.trim() operation.
        Parameters:
        deleteZeroLengthStrings - When this parameter is TRUE, any TextNode whose length is zero (before or after trim() is called) is removed from the Vector.
        Returns:
        The number of nodes that were trimmed or deleted. Each node that is trimmed or deleted increments a counter, and the final value of that counter is returned.
        Code:
        Exact Method Body:
         int                 counter = 0;
         IntStream.Builder   b       = deleteZeroLengthStrings ? IntStream.builder() : null;
         HTMLNode            n       = null;
         LV                  l       = new LV(page, sPos, ePos);
        
         for (int i=l.start; i < l.end; i++)
        
             if ((n = page.elementAt(i)).isTextNode())
             {
                 String  trimmed         = n.str.trim();
                 int     trimmedLength   = trimmed.length();
        
                 if ((trimmedLength == 0) && deleteZeroLengthStrings)
                     { b.add(i); counter++; }
        
                 else if (trimmedLength < n.str.length())
                     { page.setElementAt(new TextNode(trimmed), i); counter++; }
             }
        
         if (deleteZeroLengthStrings) Util.Remove.nodesOPT(page, b.build().toArray());
        
         return counter;
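
        The trimming pass above can be sketched without the library. What follows is a minimal illustration only, assuming a page is modelled as a Vector<String> in which any entry that does not start with '<' stands in for a TextNode. The class name TrimSketch, the method trimText and that stand-in convention are hypothetical and not part of this package. The sketch walks the range backwards and removes entries directly, whereas the method body above collects the indices and removes them in one call at the end.

```java
import java.util.Vector;

final class TrimSketch
{
    // Hypothetical stand-in: an entry counts as "text" unless it looks like a tag or comment.
    static boolean isText(String s) { return ! s.startsWith("<"); }

    // Trims text entries in [sPos, ePos); optionally removes entries that trim to empty.
    // Returns the number of entries that were trimmed or removed.
    static int trimText(Vector<String> page, int sPos, int ePos, boolean deleteEmpty)
    {
        int counter = 0;

        // Walk backwards so that a removal never shifts a position not yet visited.
        for (int i = ePos - 1; i >= sPos; i--)
        {
            String s = page.elementAt(i);
            if (! isText(s)) continue;

            String trimmed = s.trim();

            if (trimmed.isEmpty() && deleteEmpty)
                { page.removeElementAt(i); counter++; }

            else if (trimmed.length() < s.length())
                { page.setElementAt(trimmed, i); counter++; }
        }

        return counter;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>();
        page.add("<p>"); page.add("  Hello "); page.add("   "); page.add("</p>");

        int changed = trimText(page, 0, page.size(), true);
        System.out.println(changed + " " + page);
    }
}
```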
        
      • rangeToString

        public static java.lang.String rangeToString​
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos)
        
        The purpose of this method is to convert a portion of an HTML-Page, represented as a Vector of HTMLNode's, into a String. The two 'int' parameters in this method's signature define the sub-list of the page to be converted to a java.lang.String.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The Vector converted into a String.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        pageToString(Vector), rangeToString(Vector, DotPair)
        Code:
        Exact Method Body:
         StringBuilder   ret = new StringBuilder();
         LV              l   = new LV(html, sPos, ePos);
        
         for (int i=l.start; i < l.end; i++) ret.append(html.elementAt(i).str);
        
         return ret.toString();
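
        The bounds rules listed under Parameters and Throws above can be condensed into a short check. This is a sketch of the documented behavior only: the class RangeSketch and the method resolve are hypothetical names, and the library's own LV class may be implemented differently.

```java
final class RangeSketch
{
    // Returns { start, end } after applying the documented rules for 'sPos' and 'ePos'.
    // 'start' is inclusive, 'end' is exclusive.
    static int[] resolve(int size, int sPos, int ePos)
    {
        if ((sPos < 0) || (sPos >= size))
            throw new IndexOutOfBoundsException("sPos: " + sPos);

        if ((ePos == 0) || (ePos > size))
            throw new IndexOutOfBoundsException("ePos: " + ePos);

        // A negative 'ePos' is reset to the size of the Vector
        if (ePos < 0) ePos = size;

        // This check runs only after a negative 'ePos' has been reset
        if (sPos > ePos)
            throw new IndexOutOfBoundsException("sPos " + sPos + " > ePos " + ePos);

        return new int[] { sPos, ePos };
    }

    public static void main(String[] args)
    {
        int[] r = resolve(10, 2, -1);
        System.out.println(r[0] + " .. " + r[1]);
    }
}
```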
        
      • textNodesString

        public static java.lang.String textNodesString​
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos)
        
        This will return a String that is comprised of ONLY the TextNode's contained within the input Vector - and furthermore, only nodes that are situated between index int 'sPos' and index int 'ePos' in that Vector.

        The for-loop that iterates the input-Vector parameter will simply skip instances of 'TagNode' and 'CommentNode' when building the output return String.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        This will return a String that is comprised of the text-only elements in the web-page or sub-page. Only text between the requested Vector-indices is included.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        textNodesString(Vector, DotPair), textNodesString(Vector)
        Code:
        Exact Method Body:
         StringBuilder   sb  = new StringBuilder();
         LV              l   = new LV(html, sPos, ePos);
         HTMLNode        n;
        
         for (int i=l.start; i < l.end; i++)
             if ((n = html.elementAt(i)).isTextNode())
                 sb.append(n.str);
        
         return sb.toString();
        
      • escapeTextNodes

        public static int escapeTextNodes​(java.util.Vector<HTMLNode> html,
                                          int sPos,
                                          int ePos)
        Will call HTML.Escape.replaceAll on each TextNode in the range of sPos ... ePos
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of TextNode's that changed as a result of the Escape.replaceAll(n.str) loop.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        Escape.replaceAll(String)
        Code:
        Exact Method Body:
         LV          l       = new LV(html, sPos, ePos);
         HTMLNode    n       = null;
         String      s       = null;
         int         counter = 0;
        
         for (int i=l.start; i < l.end; i++)
        
             if ((n = html.elementAt(i)).isTextNode())
                 if (! (s = Escape.replace(n.str)).equals(n.str))
                 {
                     html.setElementAt(new TextNode(s), i);
                     counter++;
                 }
        
         return counter;
        
      • cloneRange

        public static java.util.Vector<HTMLNode> cloneRange​
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos)
        
        Copies (clones!) a sub-range of the HTML page, stores the results in a Vector, and returns it.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The "cloned" (copied) sub-range specified by 'sPos' and 'ePos'.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        cloneRange(Vector, DotPair)
        Code:
        Exact Method Body:
         LV                  l   = new LV(html, sPos, ePos);
         Vector<HTMLNode>    ret = new Vector<>(l.size());
        
         // Copy the range specified into the return vector
         //
         // HOW THIS WAS DONE BEFORE NOTICING Vector.subList
         //
         // for (int i = l.start; i < l.end; i++) ret.addElement(html.elementAt(i));
        
         ret.addAll(html.subList(l.start, l.end));
        
         return ret;
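
        Because the method body above relies on Vector.subList and addAll, the returned Vector is a new container that holds the same element references as the original range. The following sketch uses only java.util classes, with String elements standing in for nodes; the class CloneRangeSketch and the method copyRange are hypothetical names used for illustration. It shows that structural changes to the copy leave the source Vector untouched.

```java
import java.util.Vector;

final class CloneRangeSketch
{
    // Copies the elements in [start, end) into a new, independent Vector.
    // The elements themselves are shared, not duplicated.
    static <T> Vector<T> copyRange(Vector<? extends T> src, int start, int end)
    {
        Vector<T> ret = new Vector<>(end - start);
        ret.addAll(src.subList(start, end));
        return ret;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>();
        page.add("<b>"); page.add("bold"); page.add("</b>"); page.add("tail");

        Vector<String> copy = copyRange(page, 0, 3);
        copy.removeElementAt(1);    // Only the copy shrinks

        System.out.println(page.size() + " " + copy.size());
    }
}
```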
        
      • textStrLength

        public static int textStrLength​(java.util.Vector<? extends HTMLNode> html,
                                        int sPos,
                                        int ePos)
        This method will return the length of the strings contained by all/only instances of 'TextNode' among the nodes of the input HTML-Vector. This is identical to the behavior of the method with the same name, but includes starting and ending bounds on the html Vector: 'sPos' & 'ePos'.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The sum of the lengths of the text contained by text-nodes in the Vector between 'sPos' and 'ePos'.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        Code:
        Exact Method Body:
         HTMLNode    n;
         int         sum = 0;
         LV          l   = new LV(html, sPos, ePos);
        
         // Counts the length of each "String" in a "TextNode" between sPos and ePos
         for (int i=l.start; i < l.end; i++)
        
             if ((n = html.elementAt(i)).isTextNode())
                 sum += n.str.length();
        
         return sum;
        
      • compactTextNodes

        public static int compactTextNodes​(java.util.Vector<HTMLNode> html,
                                           int sPos,
                                           int ePos)
        Occasionally, when removing instances of TagNode from a vectorized-html page, instances of TextNode that were not adjacent in the Vector suddenly become adjacent. Although contiguous instances of TextNode cause no major problems from the Search Algorithm's perspective, it can be befuddling for programmers when text that appears in the output of Util.pageToString(html) cannot be found, because that text is split across multiple adjacent TextNode's.

        This method merely combines "Adjacent" instances of class TextNode in the Vector into single instances of class TextNode.
        Parameters:
        html - Any vectorized-html web-page. If this page contains any contiguously placed TextNode's, the extras will be eliminated, and the internal strings inside the nodes (TextNode.str) will be combined. This action will reduce the size of the actual html-Vector.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of nodes that were eliminated after being combined, or 0 if there were no text-nodes that were removed.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        HTMLNode.str, TextNode
        Code:
        Exact Method Body:
         LV      l           = new LV(html, sPos, ePos);
         boolean compacting  = false;
         int     firstPos    = -1;
         int     delta       = 0;
        
         for (int i=l.start; i < (l.end - delta); i++)
        
             if (html.elementAt(i).isTextNode())
             {
                 if (compacting) continue;   // Already in "Compacting Mode"
                 compacting  = true;         // Start "Compacting Mode" - this is a TextNode
                 firstPos    = i;
             }
        
             else if (compacting && (firstPos < (i-1)))  // Else - Must be a TagNode or CommentNode
             {
                 // Save compacted TextNode String's into this StringBuilder
                 StringBuilder compacted = new StringBuilder();
        
                 // Iterate all TextNodes that were adjacent, put them together into StringBuilder
                 for (int j=firstPos; j < i; j++) compacted.append(html.elementAt(j).str);
        
                 // Place this new "aggregate TextNode" at location of the first TextNode that
                 // was compacted into this StringBuilder
        
                 html.setElementAt(new TextNode(compacted.toString()), firstPos);
        
                 // Remove the rest of the positions in the Vector that had TextNode's.  These have
                 // all been put together into the "Aggregate TextNode" at position "firstPos"
        
                 Util.Remove.range(html, firstPos + 1, i);
        
                 // The change in the size of the Vector needs to be accounted for.
                 delta += (i - firstPos - 1);
        
                 // Change the loop-counter variable, too, since the size of the Vector has changed.
                 i = firstPos + 1;
        
                 // Since we just hit a CommentNode, or TagNode, exit "Compacting Mode."
                 compacting = false;
        
             }
        
             // NOTE: This, ALSO, MUST BE a TagNode or CommentNode (just like the previous
             //       if-else branch !)
             // TRICKY: Don't forget this 'else' !
        
             else compacting = false;
        
         // Added - Don't forget the case where the Vector ends with a series of TextNodes
         // TRICKY TOO! (Same as the HTML Parser... The ending or 'trailing' nodes must be parsed)
        
         int lastNodePos = html.size() - 1;
        
         if (html.elementAt(lastNodePos).isTextNode()) if (compacting && (firstPos < lastNodePos))
         {
             StringBuilder compacted = new StringBuilder();
        
             // Compact the TextNodes that were identified at the end of the Vector range.
             for (int j=firstPos; j <= lastNodePos; j++) compacted.append(html.elementAt(j).str);
        
             // Replace the group of TextNode's at the end of the Vector, with the single, aggregate
             html.setElementAt(new TextNode(compacted.toString()), firstPos);
             Util.Remove.range(html, firstPos + 1, lastNodePos + 1);
         }
        
         return delta;
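
        The idea of merging runs of adjacent TextNode's can be illustrated with a much smaller sketch. It assumes, purely for illustration, that a page is a Vector<String> in which any entry not starting with '<' stands in for a TextNode; the class CompactSketch and its methods are hypothetical names. The sketch handles the whole Vector rather than an 'sPos' / 'ePos' range, and it folds one neighbour at a time instead of using the StringBuilder approach shown in the method body above.

```java
import java.util.Vector;

final class CompactSketch
{
    // Hypothetical stand-in: an entry counts as "text" unless it looks like a tag or comment.
    static boolean isText(String s) { return ! s.startsWith("<"); }

    // Merges each run of adjacent text entries into one entry.
    // Returns the number of entries removed from the Vector.
    static int compact(Vector<String> page)
    {
        int removed = 0;
        int i       = 0;

        while (i < page.size() - 1)
        {
            if (isText(page.elementAt(i)) && isText(page.elementAt(i + 1)))
            {
                // Fold the right-hand neighbour into position i, then test position i again
                page.setElementAt(page.elementAt(i) + page.elementAt(i + 1), i);
                page.removeElementAt(i + 1);
                removed++;
            }

            else i++;
        }

        return removed;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>();
        page.add("<p>"); page.add("a"); page.add("b"); page.add("</p>"); page.add("c");

        System.out.println(compact(page) + " " + page);
    }
}
```

        Note that a run of text entries at the very end of the Vector is handled by the same loop here, which is the case the method body above treats separately after its main for-loop.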
        
      • strLength

        public static int strLength​(java.util.Vector<? extends HTMLNode> html,
                                    int sPos,
                                    int ePos)
        This method simply adds / sums the String-length of every HTMLNode.str field in the passed page-Vector. It only counts nodes between parameters sPos (inclusive) and ePos (exclusive).
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The total length - in characters - of the sub-page of HTML between 'sPos' and 'ePos'
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        strLength(Vector)
        Code:
        Exact Method Body:
         int ret = 0;
         LV  l   = new LV(html, sPos, ePos);
        
         for (int i=l.start; i < l.end; i++) ret += html.elementAt(i).str.length();
        
         return ret;
        
      • hashCode

        public static int hashCode​(java.util.Vector<? extends HTMLNode> html,
                                   int sPos,
                                   int ePos)
        Generates a hash-code for a vectorized html page-Vector.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        Returns the String.hashCode() of the partial HTML-page as if it were not being stored as a Vector, but rather as HTML inside of a Java-String.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        hashCode(Vector)
        Code:
        Exact Method Body:
         int h   = 0;
         LV  lv  = new LV(html, sPos, ePos);
        
         for (int j=lv.start; j < lv.end; j++)
         {
             String  s = html.elementAt(j).str;
             int     l = s.length();
        
             // This line has been copied from the jdk8/jdk8 "String.hashCode()" method.
             // The difference is that it iterates over the entire vector
        
             for (int i=0; i < l; i++) h = 31 * h + s.charAt(i);
         }
        
         return h;
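
        The claim in the Returns section, that the result matches String.hashCode() of the same HTML held in a single String, follows from the rolling form of the computation: carrying 'h' from one node's string into the next is the same as hashing their concatenation. A small check using only the JDK is sketched below; the class HashSketch and the method rollingHash are hypothetical names, and a List<String> stands in for the 'str' fields of the nodes.

```java
import java.util.Arrays;
import java.util.List;

final class HashSketch
{
    // Same rolling computation as String.hashCode(), carried across several pieces.
    static int rollingHash(List<String> pieces)
    {
        int h = 0;

        for (String s : pieces)
            for (int i = 0; i < s.length(); i++)
                h = 31 * h + s.charAt(i);

        return h;
    }

    public static void main(String[] args)
    {
        List<String> pieces = Arrays.asList("<p>", "Hello", "</p>");

        int fromPieces = rollingHash(pieces);
        int fromString = String.join("", pieces).hashCode();

        System.out.println(fromPieces == fromString);
    }
}
```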
        
      • getJSONScriptBlocks

        public static java.util.stream.Stream<java.lang.String> getJSONScriptBlocks​
                    (java.util.Vector<HTMLNode> html,
                     int sPos,
                     int ePos)
        
        This method shall search for any and all <SCRIPT TYPE="json"> JSON TEXT </SCRIPT> blocks present in a range of Vectorized HTML. The search simply looks for the token "JSON" in the TYPE attribute of each and every <SCRIPT> TagNode found in that range. The contents of such blocks are not validated, nor are they even guaranteed to be JSON data!
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        This will return a java.util.stream.Stream<String> of each of the JSON elements present in the specified range of the Vectorized HTML passed to parameter 'html'.

        Conversion-Target     Stream-Method Invocation
        String[]              Stream.toArray(String[]::new);
        List<String>          Stream.collect(Collectors.toList());
        Vector<String>        Stream.collect(Collectors.toCollection(Vector::new));
        TreeSet<String>       Stream.collect(Collectors.toCollection(TreeSet::new));
        Iterator<String>      Stream.iterator();
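
The conversions listed in the table can be sketched as follows. This is only an illustration: the class name 'StreamConversionSketch' and the method 'sample()' are hypothetical stand-ins for a call to this method, and the two JSON strings are made-up data. Since a Stream may be consumed only once, each conversion here starts from a fresh Stream.

```java
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;
import java.util.Vector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamConversionSketch
{
    // Hypothetical stand-in for the Stream<String> that this method returns.
    static Stream<String> sample()
    {
        Stream.Builder<String> b = Stream.builder();
        b.accept("{\"b\": 2}");
        b.accept("{\"a\": 1}");
        return b.build();
    }

    public static void main(String[] args)
    {
        // One conversion per row of the table
        String[]         arr  = sample().toArray(String[]::new);
        List<String>     list = sample().collect(Collectors.toList());
        Vector<String>   vec  = sample().collect(Collectors.toCollection(Vector::new));
        TreeSet<String>  set  = sample().collect(Collectors.toCollection(TreeSet::new));
        Iterator<String> iter = sample().iterator();

        System.out.println(arr.length + ", " + list.size() + ", " + vec.size());

        // A TreeSet re-orders the elements by natural String ordering,
        // the other targets keep the encounter order of the Stream.
        System.out.println(set.first());
        System.out.println(iter.next());
    }
}
```

Note that the TreeSet target both sorts and de-duplicates the Strings, which may or may not be wanted for JSON blocks.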
        See Also:
        StrTokCmpr.containsIgnoreCase(String, Predicate, String), rangeToString(Vector, int, int)
        Code:
        Exact Method Body:
         // Whenever building lists, it is usually easiest to use a Stream.Builder
         Stream.Builder<String> b = Stream.builder();
        
         // This Predicate simply tests that if the substring "json" (CASE INSENSITIVE) is found
         // in the TYPE attribute of a <SCRIPT TYPE=...> node, that the token-string is, indeed a
         // word - not a substring of some other word.  For instance: TYPE="json" would PASS, but
         // TYPE="rajsong" would FAIL - because the token string is surrounded by letters, rather
         // than by characters that are neither letters nor digits.
        
         final Predicate<String> tester = (String s) ->
             StrTokCmpr.containsIgnoreCase
                 (s, (Character c) -> ! Character.isLetterOrDigit(c), "json");
        
         // Find all <SCRIPT> node-blocks whose "TYPE" attribute abides by the tester
         // String-Predicate named above.
        
         Vector<DotPair> jsonDPList = InnerTagFindInclusive.all
             (html, sPos, ePos, "script", "type", tester);
        
         // Convert each of these DotPair element into a java.lang.String
         // Add the String to the Stream.Builder<String>
        
         for (DotPair jsonDP : jsonDPList)
             if (jsonDP.size() > 2)
                 b.accept(Util.rangeToString(html, jsonDP.start + 1, jsonDP.end));
        
         // Build the Stream, and return it.
         return b.build();
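
The token test described in the comments above can be approximated without the Torello classes. The sketch below is not the implementation of StrTokCmpr.containsIgnoreCase; it is a hypothetical helper ('containsToken') written under the assumption that a match counts only when the characters on both sides of it are neither letters nor digits, or when the match touches an end of the String.

```java
import java.util.Locale;

class JsonTokenSketch
{
    // Hypothetical helper, an assumption about how the token test behaves.
    // Returns true if 'token' occurs in 's' (ignoring case) as a whole word.
    static boolean containsToken(String s, String token)
    {
        if (token.isEmpty()) return false;

        String lowerS = s.toLowerCase(Locale.ROOT);
        String lowerT = token.toLowerCase(Locale.ROOT);

        int pos = lowerS.indexOf(lowerT);

        while (pos != -1)
        {
            int end = pos + lowerT.length();

            // The neighbor on each side must be a non-alphanumeric character,
            // unless the match sits at the very start or end of the String.
            boolean leftOK  = (pos == 0)
                || ! Character.isLetterOrDigit(lowerS.charAt(pos - 1));
            boolean rightOK = (end == lowerS.length())
                || ! Character.isLetterOrDigit(lowerS.charAt(end));

            if (leftOK && rightOK) return true;

            pos = lowerS.indexOf(lowerT, pos + 1);
        }

        return false;
    }

    public static void main(String[] args)
    {
        System.out.println(containsToken("json", "json"));                // true
        System.out.println(containsToken("application/ld+JSON", "json")); // true
        System.out.println(containsToken("rajsong", "json"));             // false
    }
}
```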
        
      • insertNodes

        public static void insertNodes​(java.util.Vector<HTMLNode> html,
                                       int pos,
                                       HTMLNode... nodes)
        Inserts nodes, and allows a 'varargs' parameter.
        Parameters:
        html - Any HTML Page
        pos - The position in the original Vector where the nodes shall be inserted.
        nodes - A list of nodes to insert.
        Code:
        Exact Method Body:
         Vector<HTMLNode> nodesVec = new Vector<>(nodes.length);
         for (HTMLNode node : nodes) nodesVec.addElement(node);
         html.addAll(pos, nodesVec);
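
The same 'varargs' insertion pattern can be tried out on a plain Vector<String>. This is a minimal sketch: the class 'InsertNodesSketch' and the method 'insertAll' are hypothetical, and Strings stand in for HTMLNode instances so that the example runs without the library.

```java
import java.util.Arrays;
import java.util.Vector;

class InsertNodesSketch
{
    // Hypothetical analogue of insertNodes, using String in place of HTMLNode.
    // Arrays.asList wraps the varargs array, so no intermediate Vector is needed.
    static void insertAll(Vector<String> v, int pos, String... items)
    {
        v.addAll(pos, Arrays.asList(items));
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("<p>", "</p>"));

        // Elements at and after index 1 are shifted to the right
        insertAll(page, 1, "Hello", "World");

        System.out.println(page); // [<p>, Hello, World, </p>]
    }
}
```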
        
      • replaceRange

        public static void replaceRange​(java.util.Vector<HTMLNode> page,
                                        int sPos,
                                        int ePos,
                                        java.util.Vector<HTMLNode> newNodes)
        Replaces any and all HTMLNode's located between the Vector locations 'sPos' (inclusive) and 'ePos' (exclusive). By exclusive, this means that the HTMLNode located at position 'ePos' will not be replaced, but the one at 'sPos' is replaced.

        The size of the Vector will change by newNodes.size() - (ePos - sPos). The contents situated between Vector location sPos and sPos + newNodes.size() will, indeed, be the contents of the 'newNodes' parameter.
        Parameters:
        page - Any Java HTML page, constructed of HTMLNode (TagNode & TextNode)
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        newNodes - Any Java HTML page-Vector of HTMLNode.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        pollRange(Vector, int, int), Util.Remove.range(Vector, int, int), replaceRange(Vector, DotPair, Vector)
        Code:
        Exact Method Body:
         // Torello.Java.LV
         LV l = new LV(sPos, ePos, page);
        
         int oldSize     = ePos - sPos;
         int newSize     = newNodes.size();
         int insertPos   = sPos;
         int i           = 0;
        
         while ((i < newSize) && (i < oldSize))
             page.setElementAt(newNodes.elementAt(i++), insertPos++);
        
        
         // *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
         // CASE ONE:
         // *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
        
         if (newSize == oldSize) return;
        
        
         // *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
         // CASE TWO:
         // *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
         //
         // The new Vector is SMALLER than the old sub-range
         // The rest of the nodes just need to be trashed
         //
         // OLD-WAY: (Before realizing what Vector.subList is actually doing)
         // Util.removeRange(page, insertPos, ePos);
        
         if (newSize < oldSize) page.subList(insertPos, ePos).clear();
        
        
         // *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
         // CASE THREE:
         // *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
         //
         // The new Vector is BIGGER than the old sub-range
         // There are still more nodes to insert.
        
         else page.addAll(ePos, newNodes.subList(i, newSize));
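
The three cases can be followed on a small Vector<String>. The sketch below is an illustration only, not the library method: 'ReplaceRangeSketch' is a hypothetical class, Strings stand in for HTMLNode, and a simple bounds check is used in place of the LV helper (so a negative 'ePos' is not supported here).

```java
import java.util.Arrays;
import java.util.Vector;

class ReplaceRangeSketch
{
    // Hypothetical analogue of replaceRange, using String in place of HTMLNode.
    static void replaceRange
        (Vector<String> page, int sPos, int ePos, Vector<String> newNodes)
    {
        // Simplified range check (an assumption, the original relies on class LV)
        if ((sPos < 0) || (ePos > page.size()) || (sPos > ePos))
            throw new IndexOutOfBoundsException("sPos=" + sPos + ", ePos=" + ePos);

        int oldSize   = ePos - sPos;
        int newSize   = newNodes.size();
        int insertPos = sPos;
        int i         = 0;

        // Overwrite in place for as long as both ranges have elements left
        while ((i < newSize) && (i < oldSize))
            page.set(insertPos++, newNodes.get(i++));

        // Fewer new nodes: drop the left-over old ones
        if (newSize < oldSize) page.subList(insertPos, ePos).clear();

        // More new nodes: insert the remainder where the old range ended
        else if (newSize > oldSize) page.addAll(ePos, newNodes.subList(i, newSize));
    }

    public static void main(String[] args)
    {
        Vector<String> page  = new Vector<>(Arrays.asList("a", "b", "c", "d", "e"));
        Vector<String> nodes = new Vector<>(Arrays.asList("X", "Y", "Z"));

        // Old range [1, 3) has 2 elements, so the size grows by 3 - 2 = 1
        replaceRange(page, 1, 3, nodes);

        System.out.println(page); // [a, X, Y, Z, d, e]
    }
}
```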
        
      • pollRange

        public static java.util.Vector<HTMLNode> pollRange
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos)
        
        Java's java.util.Vector class does not allow public access to the removeRange(start, end) function. It is listed as 'protected' in Java's Documentation about the class Vector. This method works around that, and performs the 'Poll' operation, where the nodes are first removed, stored, and then returned as a function result.

        Poll a Range:
        The nodes that are removed are placed in a separate return Vector, and returned as the result of this method.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        A complete list (Vector<HTMLNode>) of the nodes that were removed.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        Util.Remove.range(Vector, int, int), Util.Remove.range(Vector, DotPair), pollRange(Vector, DotPair)
        Code:
        Exact Method Body:
         // The original version of this method is preserved inside comments at the bottom of this
         // method.  Prior to seeing the Sun-Oracle Docs explaining that the return from the SubList
         // operation "mirrors changes" back to the original vector, the code in the comments is
         // how this method was accomplished.
        
         LV                          l       = new LV(html, sPos, ePos);
         Vector<HTMLNode>            ret     = new Vector<HTMLNode>(l.end - l.start);
         List<? extends HTMLNode>    list    = html.subList(l.start, l.end);
        
         // Copy the Nodes into the return Vector that the end-user receives
         ret.addAll(list);
        
         // Clear the nodes out of the original Vector.  The Sun-Oracle Docs 
         // state that the returned sub-list is "mirrored back into" the original
        
         list.clear();
        
         // Return the Vector to the user.  Note that the List<HTMLNode> CANNOT be returned,
         // because of its mirror-qualities, and because this method expects a vector.
        
         return ret;
        
         /*
         // BEFORE READING ABOUT Vector.subList(...), this is how this was accomplished:
         // NOTE: It isn't so clear how the List<HTMLNode> works - likely it doesn't actually
         //       create any new memory-allocated arrays, it is just an "overlay"
        
         // Copy the elements from the input vector into the return vector
         for (int i=l.start; i < l.end; i++) ret.add(html.elementAt(i));
        
         // Remove the range from the input vector (this is the meaning of 'poll')
         Util.removeRange(html, sPos, ePos);
        
         return ret;
         */
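
The "mirroring" behavior of subList that this method relies on can be observed directly with the standard library. The following is a minimal sketch on a Vector<String>; the class 'PollRangeSketch' is hypothetical, and it omits the range checking that the LV helper performs in the method above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

class PollRangeSketch
{
    // Hypothetical analogue of pollRange, using String in place of HTMLNode.
    static Vector<String> pollRange(Vector<String> v, int sPos, int ePos)
    {
        // subList returns a view backed by 'v', not an independent copy
        List<String> view = v.subList(sPos, ePos);

        // Copy the elements first, because the view is about to be emptied
        Vector<String> ret = new Vector<>(view);

        // Clearing the view removes the same range from the backing Vector
        view.clear();

        return ret;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("a", "b", "c", "d", "e"));

        Vector<String> polled = pollRange(page, 1, 3);

        System.out.println(polled); // [b, c]
        System.out.println(page);   // [a, d, e]
    }
}
```

Copying before clearing matters here: once the backing Vector is structurally modified through any route other than the view itself, the view should no longer be used.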
        
      • split

        public static java.util.Vector<HTMLNode> split
                    (java.util.Vector<? extends HTMLNode> html,
                     int pos)
        
        This removes every element from the Vector beginning at position 0, all the way to position 'pos' (exclusive). The elementAt(pos) remains in the original page input-Vector. This is the definition of 'exclusive'.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        pos - Any position within the range of the input Vector.
        Returns:
        The elements in the Vector from position: 0 ('zero') all the way to position: 'pos' (exclusive). These elements are removed from the input Vector.
        Code:
        Exact Method Body:
         return pollRange(html, 0, pos);