Package Torello.HTML
Class Util
- java.lang.Object
  - Torello.HTML.Util
public class Util extends java.lang.Object
A long list of utilities for searching, finding, extracting and removing HTML from Vectorized-HTML.
This is a list of some of the common "helper routines" that I occasionally need. They are not in any particular order. Almost all of these routines are used internally, either in the NodeSearch search-loops and iterators, or else in parts of package "Tools." The possibility to expand classes like this is probably "boundless". However, keep in mind that classes like public class 'SubSection', public class 'NodeIndex' and both of its sub-classes, public class 'TagNodeIndex' and 'TextNodeIndex', make some of the short, for-loop-driven helper-routines seem a little superfluous.
The most complicated and error-prone code is found in the for-loops and iterators of the node-search package. Those loops have been solidly tested for over a year, and the helper routines that build them are included in this class. More utility and modification tools for vectorized-html pages may be the subject of future development work, but the most complicated parts, search and iteration, have already been handled. The methods here might be useful, but deciding what makes a usable class, and what does not, is not a "precise science." Please remember that methods ending in "OPT" (meaning optimized) simply omit a few of the exception-throw checks: when a node-search for-loop's criteria are already specified (and validated) in the method-signature, those checks do not need to be repeated on each loop iteration.
Source Code: Torello/HTML/Util.java
File Size: 92,666 Bytes; Line Count: 2,160 ('\n' Characters Found)
Stateless Class: This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.
- 1 Constructor(s), 1 declared private, zero-argument constructor
- 36 Method(s), 36 declared static
- 0 Field(s)
Nested Class Summary
Nested Classes (Modifier and Type, Class):
- static class Util.Count
- static class Util.Inclusive
- static class Util.Remove
Method Summary
Convert Vectorized-HTML to a String
- static String pageToString(Vector<? extends HTMLNode> html)
- static String rangeToString(Vector<? extends HTMLNode> html, int sPos, int ePos)
- static String rangeToString(Vector<? extends HTMLNode> html, DotPair dp)
Compact Multiple, Contiguous TextNodes to one TextNode
- static int compactTextNodes(Vector<HTMLNode> html)
- static int compactTextNodes(Vector<HTMLNode> html, int sPos, int ePos)
- static int compactTextNodes(Vector<HTMLNode> html, DotPair dp)
Convert all TextNode's to a Single-String
- static String textNodesString(Vector<? extends HTMLNode> html)
- static String textNodesString(Vector<? extends HTMLNode> html, int sPos, int ePos)
- static String textNodesString(Vector<? extends HTMLNode> html, DotPair dp)
Invoke String.trim() on all TextNode instances
- static int trimTextNodes(Vector<HTMLNode> page, boolean deleteZeroLengthStrings)
- static int trimTextNodes(Vector<HTMLNode> page, int sPos, int ePos, boolean deleteZeroLengthStrings)
- static int trimTextNodes(Vector<HTMLNode> page, DotPair dp, boolean deleteZeroLengthStrings)
Replace 'escapable' Text, with HTML Escape-Strings
- static int escapeTextNodes(Vector<HTMLNode> html)
- static int escapeTextNodes(Vector<HTMLNode> html, int sPos, int ePos)
- static int escapeTextNodes(Vector<HTMLNode> html, DotPair dp)
Total String.length() for all HTMLNode.str
- static int strLength(Vector<? extends HTMLNode> html)
- static int strLength(Vector<? extends HTMLNode> html, int sPos, int ePos)
- static int strLength(Vector<? extends HTMLNode> html, DotPair dp)
Total String.length() for all TextNode.str
- static int textStrLength(Vector<? extends HTMLNode> html)
- static int textStrLength(Vector<? extends HTMLNode> html, int sPos, int ePos)
- static int textStrLength(Vector<? extends HTMLNode> html, DotPair dp)
Retrieve In-Line JSON Script
- static Stream<String> getJSONScriptBlocks(Vector<HTMLNode> html)
- static Stream<String> getJSONScriptBlocks(Vector<HTMLNode> html, int sPos, int ePos)
- static Stream<String> getJSONScriptBlocks(Vector<HTMLNode> html, DotPair dp)
java.util.Vector Improvements: Clone Elements
- static Vector<HTMLNode> clone(Vector<? extends HTMLNode> html)
- static Vector<HTMLNode> cloneRange(Vector<? extends HTMLNode> html, int sPos, int ePos)
- static Vector<HTMLNode> cloneRange(Vector<? extends HTMLNode> html, DotPair dp)
java.util.Vector Improvements: Insert Elements
- static void insertNodes(Vector<HTMLNode> html, int pos, HTMLNode... nodes)
java.util.Vector Improvements: Poll (Remove & Return) Elements
- static Vector<HTMLNode> pollRange(Vector<? extends HTMLNode> html, int sPos, int ePos)
- static Vector<HTMLNode> pollRange(Vector<? extends HTMLNode> html, DotPair dp)
java.util.Vector Improvements: Replace Elements
- static void replaceRange(Vector<HTMLNode> page, int sPos, int ePos, Vector<HTMLNode> newNodes)
- static void replaceRange(Vector<HTMLNode> page, DotPair range, Vector<HTMLNode> newNodes)
Hash Code
- static int hashCode(Vector<? extends HTMLNode> html)
- static int hashCode(Vector<? extends HTMLNode> html, int sPos, int ePos)
- static int hashCode(Vector<? extends HTMLNode> html, DotPair dp)
More Functions
- static Vector<HTMLNode> split(Vector<? extends HTMLNode> html, int pos)
Method Detail
-
trimTextNodes
public static int trimTextNodes(java.util.Vector<HTMLNode> page, boolean deleteZeroLengthStrings)
- Code:
- Exact Method Body:
return trimTextNodes(page, 0, -1, deleteZeroLengthStrings);
-
trimTextNodes
public static int trimTextNodes(java.util.Vector<HTMLNode> page, DotPair dp, boolean deleteZeroLengthStrings)
- Code:
- Exact Method Body:
return trimTextNodes(page, dp.start, dp.end + 1, deleteZeroLengthStrings);
-
trimTextNodes
public static int trimTextNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos, boolean deleteZeroLengthStrings)
This will iterate through the Vector<HTMLNode> between 'sPos' and 'ePos', and invoke java.lang.String.trim() on each TextNode in that range. If this invocation results in a reduction of String.length(), then a new TextNode will be instantiated whose TextNode.str field is set to the result of the old_node.str.trim() operation.
- Parameters:
sPos, ePos - The inclusive start and exclusive end of the range to process. A negative 'ePos' is reset to the size of the Vector.
deleteZeroLengthStrings - If a TextNode's length is zero (before or after trim() is called), and this parameter is true, that TextNode will be removed from the Vector.
- Returns:
Any node that is trimmed or deleted increments a counter; the final value of this counter is returned.
- Code:
- Exact Method Body:
int counter = 0;
IntStream.Builder b = deleteZeroLengthStrings ? IntStream.builder() : null;
HTMLNode n = null;
LV l = new LV(page, sPos, ePos);

for (int i=l.start; i < l.end; i++)
    if ((n = page.elementAt(i)).isTextNode())
    {
        String trimmed = n.str.trim();
        int trimmedLength = trimmed.length();

        if ((trimmedLength == 0) && deleteZeroLengthStrings)
            { b.add(i); counter++; }

        else if (trimmedLength < n.str.length())
            { page.setElementAt(new TextNode(trimmed), i); counter++; }
    }

if (deleteZeroLengthStrings) Util.Remove.nodesOPT(page, b.build().toArray());

return counter;
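For readers without the library at hand, the trim-and-delete pass above can be sketched with plain java.util types. This is a minimal sketch, not the Torello implementation: the nested Node record is a hypothetical stand-in for HTMLNode/TextNode, bounds checks are omitted, and the indices queued for deletion are removed from the highest index downward so that each removal does not shift the indices still waiting to be removed.

```java
import java.util.ArrayList;
import java.util.List;

class TrimSketch
{
    // Hypothetical stand-in for an HTML node: 'text' is true for text nodes only.
    record Node(String str, boolean text) { }

    // Trims every text node in [sPos, ePos); optionally deletes nodes that end up empty.
    // Returns how many nodes were trimmed or deleted.
    static int trimTextNodes(List<Node> page, int sPos, int ePos, boolean deleteEmpty)
    {
        if (ePos < 0) ePos = page.size();   // negative ePos means "to the end"

        int counter = 0;
        List<Integer> toDelete = new ArrayList<>();

        for (int i = sPos; i < ePos; i++)
        {
            Node n = page.get(i);
            if (! n.text()) continue;

            String trimmed = n.str().trim();

            if (trimmed.isEmpty() && deleteEmpty)
                { toDelete.add(i); counter++; }

            else if (trimmed.length() < n.str().length())
                { page.set(i, new Node(trimmed, true)); counter++; }
        }

        // Remove from the highest index down, so earlier removals don't shift later indices
        for (int k = toDelete.size() - 1; k >= 0; k--) page.remove((int) toDelete.get(k));

        return counter;
    }

    public static void main(String[] args)
    {
        List<Node> page = new ArrayList<>(List.of(
            new Node("<p>", false), new Node("  hi  ", true),
            new Node("   ", true),  new Node("</p>", false)));

        int changed = trimTextNodes(page, 0, -1, true);
        System.out.println(changed + " " + page.size());   // prints "2 3"
    }
}
```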
-
pageToString
public static java.lang.String pageToString (java.util.Vector<? extends HTMLNode> html)
- Code:
- Exact Method Body:
return rangeToString(html, 0, -1);
-
rangeToString
public static java.lang.String rangeToString (java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return rangeToString(html, dp.start, dp.end + 1);
-
rangeToString
public static java.lang.String rangeToString (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
The purpose of this method is to convert a portion of an HTML-Page, currently represented as a Vector of HTMLNode's, into a String. Two 'int' parameters are provided in this method's signature to define the sub-list of the page that is converted to a java.lang.String.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The Vector converted into a String.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater than or equal to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size() before this check is done.
- See Also:
pageToString(Vector), rangeToString(Vector, DotPair)
- Code:
- Exact Method Body:
StringBuilder ret = new StringBuilder();
LV l = new LV(html, sPos, ePos);

for (int i=l.start; i < l.end; i++) ret.append(html.elementAt(i).str);

return ret.toString();
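The range rules shared by the (sPos, ePos) methods can be restated as a small sketch. The normalize helper below is hypothetical (it is not the library's LV class); it follows the Throws list above, and rangeToString then simply concatenates each node's string. The DotPair overloads pass dp.end + 1 because a DotPair's end index is inclusive, whereas ePos is exclusive.

```java
import java.util.List;

class RangeSketch
{
    // Hypothetical re-statement of the documented bounds rules (not the library's LV class):
    // sPos is inclusive, ePos is exclusive, and a negative ePos means "up to size()".
    static int[] normalize(int size, int sPos, int ePos)
    {
        if (ePos < 0) ePos = size;

        if (sPos < 0 || sPos >= size)
            throw new IndexOutOfBoundsException("sPos out of range: " + sPos);

        if (ePos == 0 || ePos > size)
            throw new IndexOutOfBoundsException("ePos out of range: " + ePos);

        if (sPos > ePos)
            throw new IndexOutOfBoundsException("sPos (" + sPos + ") > ePos (" + ePos + ")");

        return new int[] { sPos, ePos };
    }

    // Concatenates the string of every node in [sPos, ePos), like rangeToString
    static String rangeToString(List<String> nodeStrs, int sPos, int ePos)
    {
        int[] r = normalize(nodeStrs.size(), sPos, ePos);
        StringBuilder sb = new StringBuilder();

        for (int i = r[0]; i < r[1]; i++) sb.append(nodeStrs.get(i));

        return sb.toString();
    }

    public static void main(String[] args)
    {
        List<String> page = List.of("<b>", "bold", "</b>", " tail");

        System.out.println(rangeToString(page, 0, -1));   // <b>bold</b> tail
        System.out.println(rangeToString(page, 1, 3));    // bold</b>
    }
}
```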
-
textNodesString
public static java.lang.String textNodesString (java.util.Vector<? extends HTMLNode> html)
- Code:
- Exact Method Body:
return textNodesString(html, 0, -1);
-
textNodesString
public static java.lang.String textNodesString (java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return textNodesString(html, dp.start, dp.end + 1);
-
textNodesString
public static java.lang.String textNodesString (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This will return a String comprised of ONLY the TextNode's contained within the input Vector, and furthermore, only the nodes situated between index int 'sPos' and index int 'ePos' in that Vector.
The for-loop that iterates the input Vector-parameter simply skips every instance of 'TagNode' and 'CommentNode' when building the output String.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
A String comprised of the text-only elements in the web-page or sub-page. Only text between the requested Vector-indices is included.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater than or equal to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size() before this check is done.
- See Also:
textNodesString(Vector, DotPair), textNodesString(Vector)
- Code:
- Exact Method Body:
StringBuilder sb = new StringBuilder();
LV l = new LV(html, sPos, ePos);
HTMLNode n;

for (int i=l.start; i < l.end; i++)
    if ((n = html.elementAt(i)).isTextNode())
        sb.append(n.str);

return sb.toString();
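The text-only filter amounts to skipping every non-text node while appending. In this sketch the Kind enum and Node record are hypothetical stand-ins for TagNode, TextNode and CommentNode, and the bounds checks are left out for brevity.

```java
import java.util.List;

class TextOnlySketch
{
    // Hypothetical node kinds standing in for TagNode, TextNode and CommentNode
    enum Kind { TAG, TEXT, COMMENT }

    record Node(Kind kind, String str) { }

    // Appends only TEXT nodes in [sPos, ePos); tags and comments are skipped
    static String textNodesString(List<Node> html, int sPos, int ePos)
    {
        if (ePos < 0) ePos = html.size();

        StringBuilder sb = new StringBuilder();

        for (int i = sPos; i < ePos; i++)
            if (html.get(i).kind() == Kind.TEXT) sb.append(html.get(i).str());

        return sb.toString();
    }

    public static void main(String[] args)
    {
        List<Node> page = List.of(
            new Node(Kind.TAG, "<p>"),         new Node(Kind.TEXT, "Hello, "),
            new Node(Kind.COMMENT, "<!-- x -->"), new Node(Kind.TEXT, "world"),
            new Node(Kind.TAG, "</p>"));

        System.out.println(textNodesString(page, 0, -1));   // Hello, world
    }
}
```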
-
escapeTextNodes
public static int escapeTextNodes(java.util.Vector<HTMLNode> html)
- Code:
- Exact Method Body:
return escapeTextNodes(html, 0, -1);
-
escapeTextNodes
public static int escapeTextNodes(java.util.Vector<HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return escapeTextNodes(html, dp.start, dp.end + 1);
-
escapeTextNodes
public static int escapeTextNodes(java.util.Vector<HTMLNode> html, int sPos, int ePos)
Calls HTML.Escape.replaceAll on each TextNode in the range 'sPos' ... 'ePos'. Each TextNode whose text changes is replaced by a new TextNode holding the escaped text.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
Note that this parameter is declared as Vector<HTMLNode>, not Vector<? extends HTMLNode>, so a 'sub-type' Vector such as Vector<TagNode> will not be accepted by the compiler.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The number of TextNode's that changed as a result of the Escape.replaceAll(n.str) loop.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater than or equal to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size() before this check is done.
- See Also:
Escape.replaceAll(String)
- Code:
- Exact Method Body:
LV l = new LV(html, sPos, ePos);
HTMLNode n = null;
String s = null;
int counter = 0;

for (int i=l.start; i < l.end; i++)
    if ((n = html.elementAt(i)).isTextNode())
        if (! (s = Escape.replace(n.str)).equals(n.str))
        {
            html.setElementAt(new TextNode(s), i);
            counter++;
        }

return counter;
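A minimal sketch of the escape-and-count loop follows. The escape helper is an assumption that only handles &, <, > and "; the library's Escape class may cover more characters, and Node is a hypothetical stand-in for TextNode. As in the method body, a node is replaced, and counted, only when escaping actually changes its text.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeSketch
{
    // Hypothetical stand-in for an HTML node: 'text' is true for text nodes only
    record Node(String str, boolean text) { }

    // Minimal stand-in for an HTML escaper: handles only &, <, > and ".
    // The library's Escape class may escape a larger character set.
    static String escape(String s)
    {
        StringBuilder sb = new StringBuilder(s.length());

        for (char c : s.toCharArray())
            switch (c)
            {
                case '&' -> sb.append("&amp;");
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                case '"' -> sb.append("&quot;");
                default  -> sb.append(c);
            }

        return sb.toString();
    }

    // Replaces each text node whose escaped form differs; returns the number replaced
    static int escapeTextNodes(List<Node> html)
    {
        int counter = 0;

        for (int i = 0; i < html.size(); i++)
        {
            Node n = html.get(i);
            if (! n.text()) continue;

            String s = escape(n.str());
            if (! s.equals(n.str())) { html.set(i, new Node(s, true)); counter++; }
        }

        return counter;
    }

    public static void main(String[] args)
    {
        List<Node> page = new ArrayList<>(List.of(
            new Node("<p>", false), new Node("a < b & c", true), new Node("plain", true)));

        System.out.println(escapeTextNodes(page) + " " + page.get(1).str());
        // prints "1 a &lt; b &amp; c"
    }
}
```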
-
clone
-
cloneRange
public static java.util.Vector<HTMLNode> cloneRange (java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return cloneRange(html, dp.start, dp.end + 1);
-
cloneRange
public static java.util.Vector<HTMLNode> cloneRange (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
Copies a sub-range of the HTML page into a new Vector, and returns it. Note that this is a shallow copy: the returned Vector is new, but it holds references to the same HTMLNode instances as the input Vector (the nodes themselves are not cloned).
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The copied sub-range specified by 'sPos' and 'ePos'.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater than or equal to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size() before this check is done.
- See Also:
cloneRange(Vector, DotPair)
- Code:
- Exact Method Body:
LV l = new LV(html, sPos, ePos);
Vector<HTMLNode> ret = new Vector<>(l.size());

// Copy the range specified into the return vector
//
// HOW THIS WAS DONE BEFORE NOTICING Vector.subList
//
// for (int i = l.start; i < l.end; i++) ret.addElement(html.elementAt(i));

ret.addAll(html.subList(l.start, l.end));

return ret;
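Because the copy is built with subList and addAll, the returned Vector is new but its elements are the same node objects. The sketch below, using a hypothetical mutable Node class, shows both halves of that: a structural change to the copy leaves the original alone, while a change made through a shared node is visible from both Vectors. With immutable node types the second effect would not arise.

```java
import java.util.List;
import java.util.Vector;

class CloneRangeSketch
{
    // Hypothetical mutable node, used only to show that the copy shares references
    static class Node
    {
        String str;
        Node(String s) { str = s; }
    }

    // Same approach as the method body above: copy [sPos, ePos) via subList + addAll
    static Vector<Node> cloneRange(Vector<Node> html, int sPos, int ePos)
    {
        Vector<Node> ret = new Vector<>(ePos - sPos);
        ret.addAll(html.subList(sPos, ePos));
        return ret;
    }

    public static void main(String[] args)
    {
        Vector<Node> page = new Vector<>(List.of(new Node("<a>"), new Node("x"), new Node("</a>")));
        Vector<Node> copy = cloneRange(page, 1, 3);

        copy.remove(0);               // structural change to the copy: page is unaffected
        page.get(2).str = "</A>";     // mutation of a shared node: visible through the copy

        System.out.println(page.size() + " " + copy.get(0).str);   // prints "3 </A>"
    }
}
```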
-
textStrLength
public static int textStrLength(java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return textStrLength(html, dp.start, dp.end + 1);
-
textStrLength
public static int textStrLength(java.util.Vector<? extends HTMLNode> html)
- Code:
- Exact Method Body:
return textStrLength(html, 0, -1);
-
textStrLength
public static int textStrLength(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method returns the total length of the strings contained by all (and only) the instances of 'TextNode' among the nodes of the input HTML-Vector. It behaves like textStrLength(Vector), but only counts the nodes between the bounds 'sPos' and 'ePos'.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The sum of the lengths of the text contained by the text-nodes in the Vector between 'sPos' and 'ePos'.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater than or equal to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size() before this check is done.
- Code:
- Exact Method Body:
HTMLNode n;
int sum = 0;
LV l = new LV(html, sPos, ePos);

// Counts the length of each "String" in a "TextNode" between sPos and ePos
for (int i=l.start; i < l.end; i++)
    if ((n = html.elementAt(i)).isTextNode())
        sum += n.str.length();

return sum;
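The summation, together with a DotPair-style overload, might be sketched as follows. Pair is a hypothetical stand-in for DotPair whose end index is treated as inclusive, which is why the overload forwards dp.end() + 1 as the exclusive ePos, mirroring the one-line DotPair bodies in this class.

```java
import java.util.List;

class TextLengthSketch
{
    // Hypothetical stand-in for an HTML node: 'text' is true for text nodes only
    record Node(String str, boolean text) { }

    // Hypothetical stand-in for DotPair: both 'start' and 'end' are inclusive indices
    record Pair(int start, int end) { }

    // Sums str.length() of text nodes only, over [sPos, ePos)
    static int textStrLength(List<Node> html, int sPos, int ePos)
    {
        int sum = 0;

        for (int i = sPos; i < ePos; i++)
            if (html.get(i).text()) sum += html.get(i).str().length();

        return sum;
    }

    // Overload mirroring the DotPair version: an inclusive end becomes an exclusive ePos
    static int textStrLength(List<Node> html, Pair dp)
    { return textStrLength(html, dp.start(), dp.end() + 1); }

    public static void main(String[] args)
    {
        List<Node> page = List.of(
            new Node("<td>", false), new Node("abc", true),
            new Node("</td>", false), new Node("de", true));

        System.out.println(textStrLength(page, 0, page.size()));   // 5
        System.out.println(textStrLength(page, new Pair(0, 2)));    // 3
    }
}
```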
-
compactTextNodes
public static int compactTextNodes(java.util.Vector<HTMLNode> html)
- Code:
- Exact Method Body:
return compactTextNodes(html, 0, html.size());
-
compactTextNodes
public static int compactTextNodes(java.util.Vector<HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return compactTextNodes(html, dp.start, dp.end + 1);
-
compactTextNodes
public static int compactTextNodes(java.util.Vector<HTMLNode> html, int sPos, int ePos)
Occasionally, when instances of TagNode are removed from a vectorized-html page, instances of TextNode which were not adjacent (neighbours) in the Vector suddenly become adjacent. Although contiguous instances of TextNode pose no major problems from the search algorithm's perspective, they can be befuddling for programmers: text that clearly appears in the output of Util.pageToString(html) may not be found by a search, because that text is split across multiple adjacent TextNodes.
This method merely combines adjacent instances of class TextNode in the Vector into single instances of class TextNode.
- Parameters:
html - Any vectorized-html web-page. If this page contains any contiguously placed TextNode's, the extras will be eliminated, and the strings inside those nodes (TextNode.str) will be combined. This action will reduce the size of the html-Vector.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The number of nodes that were eliminated after being combined, or 0 if no text-nodes were removed.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater than or equal to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size() before this check is done.
- See Also:
HTMLNode.str, TextNode
- Code:
- Exact Method Body:
LV l = new LV(html, sPos, ePos);
boolean compacting = false;
int firstPos = -1;
int delta = 0;

for (int i=l.start; i < (l.end - delta); i++)

    if (html.elementAt(i).isTextNode())
    {
        if (compacting) continue;   // Not in "Compacting Mode"

        compacting = true;          // Start "Compacting Mode" - this is a TextNode
        firstPos = i;
    }

    else if (compacting && (firstPos < (i-1)))  // Else - Must be a TagNode or CommentNode
    {
        // Save compacted TextNode String's into this StringBuilder
        StringBuilder compacted = new StringBuilder();

        // Iterate all TextNodes that were adjacent, put them together into StringBuilder
        for (int j=firstPos; j < i; j++) compacted.append(html.elementAt(j).str);

        // Place this new "aggregate TextNode" at location of the first TextNode that
        // was compacted into this StringBuilder
        html.setElementAt(new TextNode(compacted.toString()), firstPos);

        // Remove the rest of the positions in the Vector that had TextNode's. These have
        // all been put together into the "Aggregate TextNode" at position "firstPos"
        Util.Remove.range(html, firstPos + 1, i);

        // The change in the size of the Vector needs to be accounted for.
        delta += (i - firstPos - 1);

        // Change the loop-counter variable, too, since the size of the Vector has changed.
        i = firstPos + 1;

        // Since we just hit a CommentNode, or TagNode, exit "Compacting Mode."
        compacting = false;
    }

    // NOTE: This, ALSO, MUST BE a TagNode or CommentNode (just like the previous
    //       if-else branch !)
    // TRICKY: Don't forget this 'else' !
    else compacting = false;

// Added - Don't forget the case where the Vector ends with a series of TextNodes
// TRICKY TOO! (Same as the HTML Parser... The ending or 'trailing' nodes must be parsed
int lastNodePos = html.size() - 1;

if (html.elementAt(lastNodePos).isTextNode())
    if (compacting && (firstPos < lastNodePos))
    {
        StringBuilder compacted = new StringBuilder();

        // Compact the TextNodes that were identified at the end of the Vector range.
        for (int j=firstPos; j <= lastNodePos; j++) compacted.append(html.elementAt(j).str);

        // Replace the group of TextNode's at the end of the Vector, with the single, aggregate
        html.setElementAt(new TextNode(compacted.toString()), firstPos);
        Util.Remove.range(html, firstPos + 1, lastNodePos + 1);
    }

return delta;
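Another way to express the compaction is to locate each run of adjacent text nodes and merge it in a single step. This is an independent sketch, not the library's code: Node is hypothetical, the scan never crosses ePos, and the returned count also includes the nodes removed from a trailing run at the end of the range.

```java
import java.util.ArrayList;
import java.util.List;

class CompactSketch
{
    // Hypothetical stand-in for an HTML node: 'text' is true for text nodes only
    record Node(String str, boolean text) { }

    // Merges each run of adjacent text nodes inside [sPos, ePos) into one text node.
    // Returns the number of nodes removed.
    static int compactTextNodes(List<Node> html, int sPos, int ePos)
    {
        if (ePos < 0) ePos = html.size();

        int removed = 0;
        int i = sPos;

        // 'ePos - removed' is the end of the range after earlier merges shrank the list
        while (i < ePos - removed)
        {
            if (! html.get(i).text()) { i++; continue; }

            // Find the (exclusive) end of this run of text nodes, staying inside the range
            int j = i + 1;
            while (j < ePos - removed && html.get(j).text()) j++;

            if (j - i > 1)
            {
                StringBuilder sb = new StringBuilder();
                for (int k = i; k < j; k++) sb.append(html.get(k).str());

                html.set(i, new Node(sb.toString(), true));
                html.subList(i + 1, j).clear();   // drop the merged-away nodes
                removed += j - i - 1;
            }

            i++;
        }

        return removed;
    }

    public static void main(String[] args)
    {
        List<Node> page = new ArrayList<>(List.of(
            new Node("<p>", false), new Node("a", true), new Node("b", true),
            new Node("<br>", false), new Node("c", true), new Node("d", true),
            new Node("e", true)));

        int removed = compactTextNodes(page, 0, -1);
        System.out.println(removed + " " + page.size());   // prints "3 4"
    }
}
```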
-
strLength
-
strLength
-
strLength
public static int strLength(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method simply sums the String-length of every HTMLNode.str field in the passed page-Vector. It only counts nodes between parameters 'sPos' (inclusive) and 'ePos' (exclusive).
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The total length, in characters, of the sub-page of HTML between 'sPos' and 'ePos'.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater than or equal to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size() before this check is done.
- See Also:
strLength(Vector)
- Code:
- Exact Method Body:
int ret = 0;
LV l = new LV(html, sPos, ePos);

for (int i=l.start; i < l.end; i++) ret += html.elementAt(i).str.length();

return ret;
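Summing lengths node-by-node gives the same number as measuring the concatenated page, without building the String. A minimal sketch, with the page modeled (hypothetically) as a List<String> of node strings:

```java
import java.util.List;

class StrLengthSketch
{
    // Sum of node-string lengths over [sPos, ePos), without building the String
    static int strLength(List<String> nodeStrs, int sPos, int ePos)
    {
        int ret = 0;
        for (int i = sPos; i < ePos; i++) ret += nodeStrs.get(i).length();
        return ret;
    }

    public static void main(String[] args)
    {
        List<String> page = List.of("<p>", "Hi", "</p>");

        // Same value as String.join("", page).length(), but with no concatenation
        System.out.println(strLength(page, 0, page.size()));   // 9
    }
}
```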
-
hashCode
-
hashCode
-
hashCode
public static int hashCode(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
Generates a hash-code for a vectorized html page-Vector.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The String.hashCode() of the partial HTML-page, computed as if the page were stored not as a Vector, but as HTML inside a single Java String.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater than or equal to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size() before this check is done.
- See Also:
hashCode(Vector)
- Code:
- Exact Method Body:
int h = 0;
LV lv = new LV(html, sPos, ePos);

for (int j=lv.start; j < lv.end; j++)
{
    String s = html.elementAt(j).str;
    int l = s.length();

    // This line has been copied from the jdk8/jdk8 "String.hashCode()" method.
    // The difference is that it iterates over the entire vector
    for (int i=0; i < l; i++) h = 31 * h + s.charAt(i);
}

return h;
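The loop works because String.hashCode() is specified as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, so carrying h from one node into the next yields the hash of the concatenated text. The sketch below, over a hypothetical List<String> of node strings, checks that equivalence.

```java
import java.util.List;

class HashSketch
{
    // Continues String.hashCode()'s polynomial (h = 31*h + c) across node boundaries
    static int hashCode(List<String> nodeStrs, int sPos, int ePos)
    {
        int h = 0;

        for (int j = sPos; j < ePos; j++)
        {
            String s = nodeStrs.get(j);
            for (int i = 0; i < s.length(); i++) h = 31 * h + s.charAt(i);
        }

        return h;
    }

    public static void main(String[] args)
    {
        List<String> page = List.of("<div>", "text", "</div>");

        int viaNodes  = hashCode(page, 0, page.size());
        int viaString = String.join("", page).hashCode();

        System.out.println(viaNodes == viaString);   // true
    }
}
```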
-
getJSONScriptBlocks
public static java.util.stream.Stream<java.lang.String> getJSONScriptBlocks (java.util.Vector<HTMLNode> html)
- Code:
- Exact Method Body:
return getJSONScriptBlocks(html, 0, -1);
-
getJSONScriptBlocks
public static java.util.stream.Stream<java.lang.String> getJSONScriptBlocks (java.util.Vector<HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return getJSONScriptBlocks(html, dp.start, dp.end + 1);
-
getJSONScriptBlocks
public static java.util.stream.Stream<java.lang.String> getJSONScriptBlocks (java.util.Vector<HTMLNode> html, int sPos, int ePos)
This method shall search for any and all<SCRIPT TYPE="json">JSON TEXT</SCRIPT>block present in a range of Vectorized HTML. The search method shall simply look for the toke"JSON"in theTYPEattribute of each and every<SCRIPT> TagNodethat is found on the page. The validity of theJSONfound within such blocks is not checked for validity, nor is it even guaranteed to beJSONdata!- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the 'HTMLNode' at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- This will return a java.util.stream.Stream<String> of each of the JSON elements present in the specified range of the Vectorized HTML passed to parameter 'html'.

    Conversion-Target     Stream-Method Invocation
    String[]              Stream.toArray(String[]::new);
    List<String>          Stream.collect(Collectors.toList());
    Vector<String>        Stream.collect(Collectors.toCollection(Vector::new));
    TreeSet<String>       Stream.collect(Collectors.toCollection(TreeSet::new));
    Iterator<String>      Stream.iterator();

- See Also:
StrTokCmpr.containsIgnoreCase(String, Predicate, String), rangeToString(Vector, int, int)
- Code:
- Exact Method Body:
// Whenever building lists, it is usually easiest to use a Stream.Builder
Stream.Builder<String> b = Stream.builder();

// This Predicate simply tests that if the substring "json" (CASE INSENSITIVE) is found
// in the TYPE attribute of a <SCRIPT TYPE=...> node, that the token-string is, indeed a
// word - not a substring of some other word. For instance: TYPE="json" would PASS, but
// TYPE="rajsong" would FAIL - because the token string is not surrounded by white-space
final Predicate<String> tester = (String s) -> StrTokCmpr.containsIgnoreCase
    (s, (Character c) -> ! Character.isLetterOrDigit(c), "json");

// Find all <SCRIPT> node-blocks whose "TYPE" attribute abides by the tester
// String-Predicate named above.
Vector<DotPair> jsonDPList = InnerTagFindInclusive.all
    (html, sPos, ePos, "script", "type", tester);

// Convert each of these DotPair element into a java.lang.String
// Add the String to the Stream.Builder<String>
for (DotPair jsonDP : jsonDPList)
    if (jsonDP.size() > 2)
        b.accept(Util.rangeToString(html, jsonDP.start + 1, jsonDP.end));

// Build the Stream, and return it.
return b.build();
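The tester Predicate above relies on StrTokCmpr.containsIgnoreCase. Below is a minimal stand-alone sketch of the same whole-word, case-insensitive test using only the JDK, followed by a Stream.Builder collection and one of the conversions from the Returns table. The helper containsTokenIgnoreCase and the class name are hypothetical (not part of Torello), and the sketch assumes the start and end of the attribute value also count as word boundaries:

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of a whole-word, case-insensitive token test (JDK only).
final class JsonTypeTestSketch
{
    // True if 'token' occurs in 's' (ignoring case) with a non-letter/non-digit character,
    // or the edge of the String, on both sides of at least one occurrence.
    static boolean containsTokenIgnoreCase(String s, String token)
    {
        String hay    = s.toLowerCase(Locale.ROOT);
        String needle = token.toLowerCase(Locale.ROOT);
        int    from   = 0, pos;

        while ((pos = hay.indexOf(needle, from)) != -1)
        {
            int     end     = pos + needle.length();
            boolean leftOK  = (pos == 0) || ! Character.isLetterOrDigit(hay.charAt(pos - 1));
            boolean rightOK = (end == hay.length()) || ! Character.isLetterOrDigit(hay.charAt(end));

            if (leftOK && rightOK) return true;
            from = pos + 1;  // keep scanning for a later, properly-bounded occurrence
        }

        return false;
    }

    public static void main(String[] args)
    {
        List<String> types = List.of("application/ld+json", "JSON", "rajsong", "text/javascript");

        // Gather the passing TYPE values with a Stream.Builder, then convert to a List
        Stream.Builder<String> b = Stream.builder();
        for (String t : types) if (containsTokenIgnoreCase(t, "json")) b.accept(t);

        List<String> passed = b.build().collect(Collectors.toList());
        System.out.println(passed);  // [application/ld+json, JSON]
    }
}
```

Note that "application/ld+json" passes because '/' and '+' are neither letters nor digits. So the boundary rule is "not alphanumeric" rather than strictly white-space.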
-
insertNodes
public static void insertNodes(java.util.Vector<HTMLNode> html, int pos, HTMLNode... nodes)
Inserts nodes into an HTML page at the given position. The nodes are passed as a 'varargs' parameter.
- Parameters:
html - Any HTML page
pos - The position in the original Vector where the nodes shall be inserted.
nodes - A list of nodes to insert.
- Code:
- Exact Method Body:
Vector<HTMLNode> nodesVec = new Vector<>(nodes.length);

for (HTMLNode node : nodes) nodesVec.addElement(node);

html.addAll(pos, nodesVec);
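The body above copies the varargs array into a temporary Vector before calling addAll. A minimal generic sketch of the same operation, assuming any element type in place of HTMLNode (InsertNodesSketch is a hypothetical name), can hand the array to Vector.addAll(int, Collection) through Arrays.asList, which wraps the array without an extra copy:

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical sketch of a varargs insert into a Vector.
final class InsertNodesSketch
{
    @SafeVarargs
    static <T> void insertNodes(Vector<T> v, int pos, T... nodes)
    {
        // Shifts the element at 'pos' (and everything after it) to the right,
        // then places 'nodes' into the gap in their original order.
        v.addAll(pos, Arrays.asList(nodes));
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("<DIV>", "</DIV>"));

        insertNodes(page, 1, "<B>", "Hi", "</B>");
        System.out.println(page);  // [<DIV>, <B>, Hi, </B>, </DIV>]
    }
}
```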
-
replaceRange
public static void replaceRange(java.util.Vector<HTMLNode> page, DotPair range, java.util.Vector<HTMLNode> newNodes)
- Code:
- Exact Method Body:
replaceRange(page, range.start, range.end+1, newNodes);
-
replaceRange
public static void replaceRange(java.util.Vector<HTMLNode> page, int sPos, int ePos, java.util.Vector<HTMLNode> newNodes)
Replaces any and all HTMLNode's located between the Vector locations 'sPos' (inclusive) and 'ePos' (exclusive). Exclusive means that the HTMLNode located at position 'ePos' will not be replaced, but the one at 'sPos' is replaced.
The size of the Vector will change by newNodes.size() - (ePos - sPos). The contents situated between Vector locations sPos (inclusive) and sPos + newNodes.size() (exclusive) will, indeed, be the contents of the 'newNodes' parameter.
- Parameters:
page - Any Java HTML page, constructed of HTMLNode (TagNode & TextNode)
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the 'HTMLNode' at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
newNodes - Any Java HTML page-Vector of HTMLNode.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
pollRange(Vector, int, int), Util.Remove.range(Vector, int, int), replaceRange(Vector, DotPair, Vector)
- Code:
- Exact Method Body:
// Torello.Java.LV
LV l = new LV(sPos, ePos, page);

int oldSize     = ePos - sPos;
int newSize     = newNodes.size();
int insertPos   = sPos;
int i           = 0;

while ((i < newSize) && (i < oldSize))
    page.setElementAt(newNodes.elementAt(i++), insertPos++);


// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
// CASE ONE:
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

if (newSize == oldSize) return;


// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
// CASE TWO:
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
//
// The new Vector is SMALLER than the old sub-range
// The rest of the nodes just need to be trashed
//
// OLD-WAY: (Before realizing what Vector.subList is actually doing)
// Util.removeRange(page, insertPos, ePos);

if (newSize < oldSize) page.subList(insertPos, ePos).clear();


// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
// CASE THREE:
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
//
// The new Vector is BIGGER than the old sub-range
// There are still more nodes to insert.

else page.addAll(ePos, newNodes.subList(i, newSize));
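The three cases above can be exercised with a small generic sketch on a Vector<String>. It is an illustration under stated assumptions, not the library's code: the class name is hypothetical, and, unlike the body above (which computes oldSize from the raw ePos), this sketch first resolves a negative ePos to v.size() and then applies the documented Throws rules:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

// Hypothetical sketch of the overwrite / trim / grow logic of replaceRange(...).
final class ReplaceRangeSketch
{
    static <T> void replaceRange(Vector<T> v, int sPos, int ePos, List<? extends T> newNodes)
    {
        if (ePos < 0) ePos = v.size();  // documented negative-ePos rule

        // Documented Throws rules
        if (sPos < 0 || sPos >= v.size() || ePos == 0 || ePos > v.size() || sPos > ePos)
            throw new IndexOutOfBoundsException("sPos=" + sPos + ", ePos=" + ePos);

        int oldSize = ePos - sPos;
        int newSize = newNodes.size();
        int i       = 0;

        // Overwrite the slots that the old range and the new nodes have in common
        for (; i < newSize && i < oldSize; i++) v.set(sPos + i, newNodes.get(i));

        if (newSize < oldSize)          // CASE TWO: leftover old nodes are deleted
            v.subList(sPos + i, ePos).clear();
        else if (newSize > oldSize)     // CASE THREE: leftover new nodes go in at ePos
            v.addAll(ePos, newNodes.subList(i, newSize));
        // CASE ONE (equal sizes): nothing left to do
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("A", "B", "C", "D", "E"));

        replaceRange(page, 1, 4, List.of("X"));
        System.out.println(page);  // [A, X, E]

        replaceRange(page, 1, 2, List.of("P", "Q", "R"));
        System.out.println(page);  // [A, P, Q, R, E]
    }
}
```

In the first call, three nodes (indices 1 through 3) are replaced by one, so the size changes by 1 - (4 - 1) = -2; in the second, one node is replaced by three, a change of 3 - (2 - 1) = +2.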
-
pollRange
public static java.util.Vector<HTMLNode> pollRange (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
Java's java.util.Vector class does not allow public access to its removeRange(start, end) method; it is listed as 'protected' in Java's documentation for class Vector. This method works around that restriction and performs a 'Poll' operation: the nodes are removed from the original Vector, stored, and then returned as the method's result.
Poll a Range:
The nodes that are removed are placed in a separate return Vector, and returned as the result of this method.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the 'HTMLNode' at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- A complete list (Vector<HTMLNode>) of the nodes that were removed.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
Util.Remove.range(Vector, int, int), Util.Remove.range(Vector, DotPair), pollRange(Vector, DotPair)
- Code:
- Exact Method Body:
// The original version of this method is preserved inside comments at the bottom of this
// method. Prior to seeing the Sun-Oracle Docs explaining that the return from the SubList
// operation "mirrors changes" back to to the original vector, the code in the comments is
// how this method was accomplished.

LV l = new LV(html, sPos, ePos);
Vector<HTMLNode> ret = new Vector<HTMLNode>(l.end - l.start);
List<? extends HTMLNode> list = html.subList(l.start, l.end);

// Copy the Nodes into the return Vector that the end-user receives
ret.addAll(list);

// Clear the nodes out of the original Vector. The Sun-Oracle Docs
// state that the returned sub-list is "mirrored back into" the original
list.clear();

// Return the Vector to the user. Note that the List<HTMLNode> CANNOT be returned,
// because of it's mirror-qualities, and because this method expects a vector.
return ret;

/*
// BEFORE READING ABOUT Vector.subList(...), this is how this was accomplished:
// NOTE: It isn't so clear how the List<HTMLNode> works - likely it doesn't actually
// create any new memory-allocated arrays, it is just an "overlay"

// Copy the elements from the input vector into the return vector
for (int i=l.start; i < l.end; i++) ret.add(html.elementAt(i));

// Remove the range from the input vector (this is the meaning of 'poll')
Util.removeRange(html, sPos, ePos);

return ret;
*/
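Because subList(...) returns a live view backed by the original Vector, copying the view and then clearing it is enough to implement a poll. A minimal generic sketch, assuming any element type in place of HTMLNode (PollRangeSketch is a hypothetical name); only the negative-ePos rule is reproduced, and the remaining range checks are left to subList itself:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

// Hypothetical sketch of a subList-based "poll" of the range [sPos, ePos).
final class PollRangeSketch
{
    static <T> Vector<T> pollRange(Vector<T> v, int sPos, int ePos)
    {
        if (ePos < 0) ePos = v.size();       // documented negative-ePos rule

        List<T>   view = v.subList(sPos, ePos); // live view: writes go through to 'v'
        Vector<T> ret  = new Vector<>(view);    // copy the nodes out first...
        view.clear();                           // ...then remove them from 'v'

        return ret;
    }

    public static void main(String[] args)
    {
        Vector<String> page   = new Vector<>(Arrays.asList("a", "b", "c", "d"));
        Vector<String> polled = pollRange(page, 1, 3);

        System.out.println(polled);  // [b, c]
        System.out.println(page);    // [a, d]
    }
}
```

The copy must happen before clear(): once the view is cleared, the removed elements are no longer reachable through it.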
-
pollRange
-
split
public static java.util.Vector<HTMLNode> split (java.util.Vector<? extends HTMLNode> html, int pos)
This removes every element from the Vector beginning at position 0, all the way to position 'pos' (exclusive). The elementAt(pos) remains in the original page input-Vector; this is the definition of 'exclusive'.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
pos - Any position within the range of the input Vector.
- Returns:
- The elements of the Vector from position 0 ('zero') up to, but not including, position 'pos'. These elements have been removed from the input Vector.
- Code:
- Exact Method Body:
return pollRange(html, 0, pos);
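Since split(html, pos) is simply pollRange(html, 0, pos), a typical use is cutting a page just before a marker node. A short hypothetical sketch, assuming plain Strings stand in for HTMLNode; the class name SplitSketch is illustrative, and the documented range rules of pollRange (for example, the exception for an 'ePos' of zero) are not reproduced:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

// Hypothetical sketch: remove and return everything in [0, pos) from the front of a Vector.
final class SplitSketch
{
    static <T> Vector<T> split(Vector<T> v, int pos)
    {
        List<T>   head = v.subList(0, pos);  // live view of indices 0 .. pos-1
        Vector<T> ret  = new Vector<>(head); // copy the head out
        head.clear();                        // then remove it from 'v'
        return ret;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList
            ("<HTML>", "<HEAD>", "</HEAD>", "<BODY>", "Hi", "</BODY>"));

        // Split just before the first "<BODY>" node; that node stays in 'page'
        Vector<String> head = split(page, page.indexOf("<BODY>"));

        System.out.println(head);  // [<HTML>, <HEAD>, </HEAD>]
        System.out.println(page);  // [<BODY>, Hi, </BODY>]
    }
}
```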
-
-