Package Torello.HTML
Class Util
java.lang.Object
    Torello.HTML.Util
public class Util extends java.lang.Object
A long list of utilities for searching, finding, extracting and removing HTML from Vectorized-HTML.
This is a list of some of the common "helper routines" that I occasionally need. They are not in any particular order. Almost all of these routines are used internally, either in the NodeSearch search-loops and iterators, or in parts of package "Tools." The possibilities for expanding a class like this are probably "boundless"; however, keep in mind that classes like 'SubSection' and 'NodeIndex' (along with both of its sub-classes, 'TagNodeIndex' and 'TextNodeIndex') make some of the short, for-loop-driven helper routines seem a little spurious.
The most complicated and error-prone code consists of the for-loops and iterators of the node-search package. With these solidly tested for over a year, the helper routines that build those for-loops are included in this class. Extending the utility and modification tools for vectorized-html pages might be the subject of future development work, but easily the most complicated parts, searching and iterating, have been handled. The methods here might be useful, but deciding what belongs in a class like this is not a "precise science." Please remember that the methods ending in "OPT" (meaning optimized) simply omit a couple of the exception-throw checks. Those checks do not need to be repeated on each iteration of a node-search for-loop when the loop criteria are already specified in the method signature.
Hi-Lited Source-Code: Torello/HTML/Util.java
File Size: 88,298 Bytes
Line Count: 2,066 '\n' Characters Found
Stateless Class: This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.
- 1 Constructor(s), 1 declared private, zero-argument constructor
- 36 Method(s), 36 declared static
- 0 Field(s)
-
-
Nested Class Summary
Nested Classes
Modifier and Type         Class
static class              Util.Count
static class              Util.Inclusive
static class              Util.Remove
-
Method Summary
Convert Vectorized-HTML to a String
Modifier and Type         Method
static String             pageToString(Vector<? extends HTMLNode> html)
static String             rangeToString(Vector<? extends HTMLNode> html, int sPos, int ePos)
static String             rangeToString(Vector<? extends HTMLNode> html, DotPair dp)

Compact Multiple, Contiguous TextNodes to one TextNode
Modifier and Type         Method
static int                compactTextNodes(Vector<HTMLNode> html)
static int                compactTextNodes(Vector<HTMLNode> html, int sPos, int ePos)
static int                compactTextNodes(Vector<HTMLNode> html, DotPair dp)

Convert all TextNode's to a Single-String
Modifier and Type         Method
static String             textNodesString(Vector<? extends HTMLNode> html)
static String             textNodesString(Vector<? extends HTMLNode> html, int sPos, int ePos)
static String             textNodesString(Vector<? extends HTMLNode> html, DotPair dp)

Invoke String.trim() on all TextNode instances
Modifier and Type         Method
static int                trimTextNodes(Vector<HTMLNode> page, boolean deleteZeroLengthStrings)
static int                trimTextNodes(Vector<HTMLNode> page, int sPos, int ePos, boolean deleteZeroLengthStrings)
static int                trimTextNodes(Vector<HTMLNode> page, DotPair dp, boolean deleteZeroLengthStrings)

Replace 'escapable' Text, with HTML Escape-Strings
Modifier and Type         Method
static int                escapeTextNodes(Vector<HTMLNode> html)
static int                escapeTextNodes(Vector<HTMLNode> html, int sPos, int ePos)
static int                escapeTextNodes(Vector<HTMLNode> html, DotPair dp)

Total String.length() for all HTMLNode.str
Modifier and Type         Method
static int                strLength(Vector<? extends HTMLNode> html)
static int                strLength(Vector<? extends HTMLNode> html, int sPos, int ePos)
static int                strLength(Vector<? extends HTMLNode> html, DotPair dp)

Total String.length() for all TextNode.str
Modifier and Type         Method
static int                textStrLength(Vector<? extends HTMLNode> html)
static int                textStrLength(Vector<? extends HTMLNode> html, int sPos, int ePos)
static int                textStrLength(Vector<? extends HTMLNode> html, DotPair dp)

Retrieve In-Line JSON Script
Modifier and Type         Method
static Stream<String>     getJSONScriptBlocks(Vector<HTMLNode> html)
static Stream<String>     getJSONScriptBlocks(Vector<HTMLNode> html, int sPos, int ePos)
static Stream<String>     getJSONScriptBlocks(Vector<HTMLNode> html, DotPair dp)

java.util.Vector Improvements: Clone Elements
Modifier and Type         Method
static Vector<HTMLNode>   clone(Vector<? extends HTMLNode> html)
static Vector<HTMLNode>   cloneRange(Vector<? extends HTMLNode> html, int sPos, int ePos)
static Vector<HTMLNode>   cloneRange(Vector<? extends HTMLNode> html, DotPair dp)

java.util.Vector Improvements: Insert Elements
Modifier and Type         Method
static void               insertNodes(Vector<HTMLNode> html, int pos, HTMLNode... nodes)

java.util.Vector Improvements: Poll (Remove & Return) Elements
Modifier and Type         Method
static Vector<HTMLNode>   pollRange(Vector<? extends HTMLNode> html, int sPos, int ePos)
static Vector<HTMLNode>   pollRange(Vector<? extends HTMLNode> html, DotPair dp)

java.util.Vector Improvements: Replace Elements
Modifier and Type         Method
static void               replaceRange(Vector<HTMLNode> page, int sPos, int ePos, Vector<HTMLNode> newNodes)
static void               replaceRange(Vector<HTMLNode> page, DotPair range, Vector<HTMLNode> newNodes)

Hash Code
Modifier and Type         Method
static int                hashCode(Vector<? extends HTMLNode> html)
static int                hashCode(Vector<? extends HTMLNode> html, int sPos, int ePos)
static int                hashCode(Vector<? extends HTMLNode> html, DotPair dp)

More Functions
Modifier and Type         Method
static Vector<HTMLNode>   split(Vector<? extends HTMLNode> html, int pos)
-
-
-
Method Detail
-
trimTextNodes
public static int trimTextNodes(java.util.Vector<HTMLNode> page, boolean deleteZeroLengthStrings)
-
trimTextNodes
public static int trimTextNodes(java.util.Vector<HTMLNode> page, DotPair dp, boolean deleteZeroLengthStrings)
-
trimTextNodes
public static int trimTextNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos, boolean deleteZeroLengthStrings)
This will iterate through the Vector<HTMLNode> (or through the range between 'sPos' and 'ePos', when those are provided), and invoke java.lang.String.trim() on each TextNode found. If this invocation results in a reduction of String.length(), then a new TextNode will be instantiated whose TextNode.str field is set to the result of old_node.str.trim().
- Parameters:
deleteZeroLengthStrings - When this parameter is TRUE, any TextNode whose length is zero (before or after trim() is called) is removed from the Vector.
- Returns:
The number of nodes that were trimmed or deleted.
- Code:
- Exact Method Body:
int counter = 0;
IntStream.Builder b = deleteZeroLengthStrings ? IntStream.builder() : null;
HTMLNode n = null;
LV l = new LV(page, sPos, ePos);

for (int i=l.start; i < l.end; i++)
    if ((n = page.elementAt(i)).isTextNode())
    {
        String trimmed = n.str.trim();
        int trimmedLength = trimmed.length();

        if ((trimmedLength == 0) && deleteZeroLengthStrings)
            { b.add(i); counter++; }
        else if (trimmedLength < n.str.length())
            { page.setElementAt(new TextNode(trimmed), i); counter++; }
    }

if (deleteZeroLengthStrings) Util.Remove.nodesOPT(page, b.build().toArray());

return counter;
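The trimming and counting rules can be illustrated with a small, self-contained sketch. It does not use the Torello classes: plain String elements in a Vector stand in for TextNode instances, and the names TrimSketch and trimAll are hypothetical. It also walks the Vector backwards instead of collecting indices for a later bulk removal, which is a simplification and not the library's approach.

```java
import java.util.Vector;

public class TrimSketch
{
    // Hypothetical stand-in for trimTextNodes: every element plays the role of a TextNode's 'str'.
    // Returns the number of elements that were either trimmed or deleted.
    public static int trimAll(Vector<String> page, boolean deleteZeroLengthStrings)
    {
        int counter = 0;

        // Iterate backwards, so that a removal never shifts an index that is still to be visited
        for (int i = page.size() - 1; i >= 0; i--)
        {
            String original = page.elementAt(i);
            String trimmed  = original.trim();

            if (trimmed.isEmpty() && deleteZeroLengthStrings)
            {
                // Zero-length before or after trimming: remove the element
                page.removeElementAt(i);
                counter++;
            }
            else if (trimmed.length() < original.length())
            {
                // Trimming shortened the text: replace the element
                page.setElementAt(trimmed, i);
                counter++;
            }
        }

        return counter;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>();
        page.add("  Hello ");
        page.add("   ");
        page.add("World");

        int changed = trimAll(page, true);

        // One element was trimmed, one was deleted, and "World" was left alone
        System.out.println(changed + " " + page);
    }
}
```

When 'deleteZeroLengthStrings' is false, a whitespace-only element is still counted, because trimming it to the empty string reduces its length.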
-
pageToString
public static java.lang.String pageToString (java.util.Vector<? extends HTMLNode> html)
-
rangeToString
public static java.lang.String rangeToString (java.util.Vector<? extends HTMLNode> html, DotPair dp)
-
rangeToString
public static java.lang.String rangeToString (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
The purpose of this method is to convert a portion of the contents of an HTML-Page, currently represented as a Vector of HTMLNode's, into a String. Two 'int' parameters are provided in this method's signature to define the sub-list of the page to be converted to a java.lang.String.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page). The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter. These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The Vector converted into a String.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
  - If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
  - If 'ePos' is zero, or greater than the size of the Vector
  - If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
pageToString(Vector), rangeToString(Vector, DotPair)
- Code:
- Exact Method Body:
StringBuilder ret = new StringBuilder();
LV l = new LV(html, sPos, ePos);

for (int i=l.start; i < l.end; i++) ret.append(html.elementAt(i).str);

return ret.toString();
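The 'sPos' / 'ePos' convention (inclusive start, exclusive end, negative 'ePos' meaning "to the end") is shared by most methods in this class. The following is a minimal sketch of that convention, written only from the documented rules. The class RangeSketch and the method resolveEnd are hypothetical, a Vector of String stands in for a Vector of HTMLNode, and the real 'LV' helper class may behave differently in detail.

```java
import java.util.Vector;

public class RangeSketch
{
    // Resolves 'ePos' using the documented convention, and applies the documented
    // IndexOutOfBoundsException conditions.  This is an assumption about how the range
    // is validated, not a copy of the Torello 'LV' class.
    public static int resolveEnd(int size, int sPos, int ePos)
    {
        // A negative 'ePos' is first reset to the size of the Vector
        if (ePos < 0) ePos = size;

        if (sPos < 0 || sPos >= size)
            throw new IndexOutOfBoundsException("sPos out of range: " + sPos);

        if (ePos == 0 || ePos > size)
            throw new IndexOutOfBoundsException("ePos out of range: " + ePos);

        if (sPos > ePos)
            throw new IndexOutOfBoundsException("sPos is larger than ePos");

        return ePos;
    }

    // Concatenates the elements in [sPos, ePos), each element standing in for HTMLNode.str
    public static String rangeToString(Vector<String> html, int sPos, int ePos)
    {
        int end = resolveEnd(html.size(), sPos, ePos);
        StringBuilder sb = new StringBuilder();

        for (int i = sPos; i < end; i++) sb.append(html.elementAt(i));

        return sb.toString();
    }

    public static void main(String[] args)
    {
        Vector<String> html = new Vector<>();
        html.add("<P>");
        html.add("Hi");
        html.add("</P>");

        System.out.println(rangeToString(html, 0, 2));   // elements 0 and 1
        System.out.println(rangeToString(html, 1, -1));  // element 1 through the end
    }
}
```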
-
textNodesString
public static java.lang.String textNodesString (java.util.Vector<? extends HTMLNode> html)
-
textNodesString
public static java.lang.String textNodesString (java.util.Vector<? extends HTMLNode> html, DotPair dp)
-
textNodesString
public static java.lang.String textNodesString (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This will return a String that is comprised of ONLY the TextNode's contained within the input Vector, and furthermore, only nodes that are situated between index 'sPos' and index 'ePos' in that Vector. The for-loop that iterates the input Vector parameter will simply skip every instance of 'TagNode' and 'CommentNode' when building the returned String.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page). The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter. These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
A String that is comprised of the text-only elements in the web-page or sub-page. Only text between the requested Vector-indices is included.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
  - If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
  - If 'ePos' is zero, or greater than the size of the Vector
  - If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
textNodesString(Vector, DotPair), textNodesString(Vector)
- Code:
- Exact Method Body:
StringBuilder sb = new StringBuilder();
LV l = new LV(html, sPos, ePos);
HTMLNode n;

for (int i=l.start; i < l.end; i++)
    if ((n = html.elementAt(i)).isTextNode())
        sb.append(n.str);

return sb.toString();
-
escapeTextNodes
public static int escapeTextNodes(java.util.Vector<HTMLNode> html)
-
escapeTextNodes
public static int escapeTextNodes(java.util.Vector<HTMLNode> html, DotPair dp)
-
escapeTextNodes
public static int escapeTextNodes(java.util.Vector<HTMLNode> html, int sPos, int ePos)
Will call HTML.Escape.replaceAll on each TextNode in the range of sPos ... ePos
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page). Note that this parameter is declared as Vector<HTMLNode>, not with the Wild-Card Expression '? extends HTMLNode', so a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> cannot be passed to it directly.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The number of TextNode's that changed as a result of the Escape.replaceAll(n.str) loop.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
  - If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
  - If 'ePos' is zero, or greater than the size of the Vector
  - If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
Escape.replaceAll(String)
- Code:
- Exact Method Body:
LV l = new LV(html, sPos, ePos);
HTMLNode n = null;
String s = null;
int counter = 0;

for (int i=l.start; i < l.end; i++)
    if ((n = html.elementAt(i)).isTextNode())
        if (! (s = Escape.replace(n.str)).equals(n.str))
        {
            html.setElementAt(new TextNode(s), i);
            counter++;
        }

return counter;
-
cloneRange
public static java.util.Vector<HTMLNode> cloneRange (java.util.Vector<? extends HTMLNode> html, DotPair dp)
-
cloneRange
public static java.util.Vector<HTMLNode> cloneRange (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
Copies (clones!) a sub-range of the HTML page, stores the results in a Vector, and returns it.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page). The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter. These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The "cloned" (copied) sub-range specified by 'sPos' and 'ePos'.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
  - If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
  - If 'ePos' is zero, or greater than the size of the Vector
  - If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
cloneRange(Vector, DotPair)
- Code:
- Exact Method Body:
LV l = new LV(html, sPos, ePos);
Vector<HTMLNode> ret = new Vector<>(l.size());

// Copy the range specified into the return vector
//
// HOW THIS WAS DONE BEFORE NOTICING Vector.subList
//
// for (int i = l.start; i < l.end; i++) ret.addElement(html.elementAt(i));

ret.addAll(html.subList(l.start, l.end));

return ret;
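The subList-plus-addAll idiom used in the method body can be tried in isolation with standard-library types only. In the sketch below, the class CloneRangeSketch is hypothetical and a Vector of String stands in for a Vector of HTMLNode. The copy holds references to the same element objects, but it is a separate Vector, so structural changes to the copy do not affect the original.

```java
import java.util.Vector;

public class CloneRangeSketch
{
    // Copies the elements in [sPos, ePos) into a new, independent Vector.
    // The wild-card parameter accepts a Vector of any sub-type of T.
    public static <T> Vector<T> cloneRange(Vector<? extends T> v, int sPos, int ePos)
    {
        Vector<T> ret = new Vector<>(ePos - sPos);

        // subList returns a view of the original; addAll copies the references out of that view
        ret.addAll(v.subList(sPos, ePos));

        return ret;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>();
        page.add("a");
        page.add("b");
        page.add("c");
        page.add("d");

        Vector<String> copy = cloneRange(page, 1, 3);
        copy.clear();

        // Clearing the copy leaves the original untouched
        System.out.println(page.size());
    }
}
```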
-
textStrLength
public static int textStrLength(java.util.Vector<? extends HTMLNode> html, DotPair dp)
-
textStrLength
public static int textStrLength(java.util.Vector<? extends HTMLNode> html)
-
textStrLength
public static int textStrLength(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method will return the total length of the strings contained by all (and only) instances of 'TextNode' among the nodes of the input HTML-Vector. This is identical to the behavior of the method with the same name, but includes starting and ending bounds on the html Vector: 'sPos' and 'ePos'.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page). The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter. These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The sum of the lengths of the text contained by text-nodes in the Vector between 'sPos' and 'ePos'.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
  - If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
  - If 'ePos' is zero, or greater than the size of the Vector
  - If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- Code:
- Exact Method Body:
HTMLNode n;
int sum = 0;
LV l = new LV(html, sPos, ePos);

// Counts the length of each "String" in a "TextNode" between sPos and ePos
for (int i=l.start; i < l.end; i++)
    if ((n = html.elementAt(i)).isTextNode())
        sum += n.str.length();

return sum;
-
compactTextNodes
public static int compactTextNodes(java.util.Vector<HTMLNode> html)
-
compactTextNodes
public static int compactTextNodes(java.util.Vector<HTMLNode> html, DotPair dp)
-
compactTextNodes
public static int compactTextNodes(java.util.Vector<HTMLNode> html, int sPos, int ePos)
Occasionally, when removing instances of TagNode from a vectorized-html page, certain instances of TextNode which were not adjacent in the Vector suddenly become adjacent. Although contiguous instances of TextNode cause no major problems from the Search Algorithm's perspective, it can be befuddling for programmers to realize that text which appears in the output of Util.pageToString(html) is not being found, because that text is split across multiple adjacent TextNodes.
This method merely combines adjacent instances of class TextNode in the Vector into single instances of class TextNode.
- Parameters:
html - Any vectorized-html web-page. If this page contains any contiguously placed TextNode's, the extras will be eliminated, and the internal strings inside the nodes (TextNode.str) will be combined. This action will reduce the size of the actual html-Vector.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The number of nodes that were eliminated after being combined, or 0 if there were no text-nodes that were removed.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
  - If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
  - If 'ePos' is zero, or greater than the size of the Vector
  - If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
HTMLNode.str, TextNode
- Code:
- Exact Method Body:
LV l = new LV(html, sPos, ePos);
boolean compacting = false;
int firstPos = -1;
int delta = 0;

for (int i=l.start; i < (l.end - delta); i++)

    if (html.elementAt(i).isTextNode())
    {
        if (compacting) continue;

        // Not in "Compacting Mode"
        compacting = true;  // Start "Compacting Mode" - this is a TextNode
        firstPos = i;
    }

    else if (compacting && (firstPos < (i-1)))  // Else - Must be a TagNode or CommentNode
    {
        // Save compacted TextNode String's into this StringBuilder
        StringBuilder compacted = new StringBuilder();

        // Iterate all TextNodes that were adjacent, put them together into StringBuilder
        for (int j=firstPos; j < i; j++) compacted.append(html.elementAt(j).str);

        // Place this new "aggregate TextNode" at location of the first TextNode that
        // was compacted into this StringBuilder
        html.setElementAt(new TextNode(compacted.toString()), firstPos);

        // Remove the rest of the positions in the Vector that had TextNode's. These have
        // all been put together into the "Aggregate TextNode" at position "firstPos"
        Util.Remove.range(html, firstPos + 1, i);

        // The change in the size of the Vector needs to be accounted for.
        delta += (i - firstPos - 1);

        // Change the loop-counter variable, too, since the size of the Vector has changed.
        i = firstPos + 1;

        // Since we just hit a CommentNode, or TagNode, exit "Compacting Mode."
        compacting = false;
    }

    // NOTE: This, ALSO, MUST BE a TagNode or CommentNode (just like the previous
    //       if-else branch !)
    // TRICKY: Don't forget this 'else' !
    else compacting = false;

// Added - Don't forget the case where the Vector ends with a series of TextNodes
// TRICKY TOO! (Same as the HTML Parser... The ending or 'trailing' nodes must be parsed
int lastNodePos = html.size() - 1;

if (html.elementAt(lastNodePos).isTextNode())
    if (compacting && (firstPos < lastNodePos))
    {
        StringBuilder compacted = new StringBuilder();

        // Compact the TextNodes that were identified at the end of the Vector range.
        for (int j=firstPos; j <= lastNodePos; j++)
            compacted.append(html.elementAt(j).str);

        // Replace the group of TextNode's at the end of the Vector, with the single, aggregate
        html.setElementAt(new TextNode(compacted.toString()), firstPos);
        Util.Remove.range(html, firstPos + 1, lastNodePos + 1);
    }

return delta;
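The idea of merging runs of adjacent text nodes can be shown with a shorter, self-contained sketch. This is not the library's algorithm: it scans for the end of each run directly instead of tracking a "compacting mode" flag, it always processes the whole Vector, and the types CompactSketch and Node are hypothetical stand-ins for Util and HTMLNode.

```java
import java.util.Vector;

public class CompactSketch
{
    // Hypothetical stand-in for HTMLNode: 'text' distinguishes a TextNode from a TagNode
    public static final class Node
    {
        public final String str;
        public final boolean text;

        public Node(String str, boolean text) { this.str = str; this.text = text; }
    }

    public static Node text(String s) { return new Node(s, true); }
    public static Node tag(String s)  { return new Node(s, false); }

    // Merges every run of adjacent text nodes into a single text node, in place.
    // Returns the number of nodes that were eliminated.
    public static int compact(Vector<Node> html)
    {
        int removed = 0;
        int i = 0;

        while (i < html.size())
        {
            if (! html.elementAt(i).text) { i++; continue; }

            // Find the (exclusive) end of this run of adjacent text nodes
            int j = i + 1;
            while ((j < html.size()) && html.elementAt(j).text) j++;

            if ((j - i) > 1)
            {
                StringBuilder sb = new StringBuilder();
                for (int k = i; k < j; k++) sb.append(html.elementAt(k).str);

                // Put the aggregate node at the first position, then drop the rest of the run
                html.setElementAt(text(sb.toString()), i);
                html.subList(i + 1, j).clear();
                removed += (j - i - 1);
            }

            // The run now occupies exactly one position; move past it
            i++;
        }

        return removed;
    }

    public static void main(String[] args)
    {
        Vector<Node> html = new Vector<>();
        html.add(tag("<B>"));
        html.add(text("Hello"));
        html.add(text(" "));
        html.add(text("World"));
        html.add(tag("</B>"));

        int removed = compact(html);
        System.out.println(removed + " removed, merged text: " + html.elementAt(1).str);
    }
}
```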
-
strLength
public static int strLength(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method simply sums the String-length of every HTMLNode.str field in the passed page-Vector. It only counts nodes between parameters 'sPos' (inclusive) and 'ePos' (exclusive).
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page). The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter. These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The total length, in characters, of the sub-page of HTML between 'sPos' and 'ePos'.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
  - If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
  - If 'ePos' is zero, or greater than the size of the Vector
  - If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
strLength(Vector)
- Code:
- Exact Method Body:
int ret = 0;
LV l = new LV(html, sPos, ePos);

for (int i=l.start; i < l.end; i++) ret += html.elementAt(i).str.length();

return ret;
-
hashCode
public static int hashCode(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
Generates a hash-code for a vectorized html page-Vector.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page). The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter. These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The String.hashCode() of the partial HTML-page, as if it were not being stored as a Vector, but rather as HTML inside of a Java-String.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
  - If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
  - If 'ePos' is zero, or greater than the size of the Vector
  - If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
hashCode(Vector)
- Code:
- Exact Method Body:
int h = 0;
LV lv = new LV(html, sPos, ePos);

for (int j=lv.start; j < lv.end; j++)
{
    String s = html.elementAt(j).str;
    int l = s.length();

    // This line has been copied from the jdk8/jdk8 "String.hashCode()" method.
    // The difference is that it iterates over the entire vector
    for (int i=0; i < l; i++) h = 31 * h + s.charAt(i);
}

return h;
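The claim that this hash equals the String.hashCode() of the concatenated page can be checked with standard-library types alone. In the sketch below, the class HashSketch is hypothetical and each String in the Vector stands in for one node's 'str' field. Because the running value 'h' is carried across element boundaries, the result matches hashing the joined text, without ever building that joined String.

```java
import java.util.Vector;

public class HashSketch
{
    // Rolling hash over every character of every element, using the same
    // recurrence as java.lang.String.hashCode(): h = 31 * h + c
    public static int hashCode(Vector<String> pieces)
    {
        int h = 0;

        for (String s : pieces)
            for (int i = 0; i < s.length(); i++)
                h = 31 * h + s.charAt(i);

        return h;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>();
        page.add("<DIV>");
        page.add("Some text");
        page.add("</DIV>");

        int fromVector = hashCode(page);
        int fromString = String.join("", page).hashCode();

        // Both computations produce the same value
        System.out.println(fromVector == fromString);
    }
}
```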
-
getJSONScriptBlocks
public static java.util.stream.Stream<java.lang.String> getJSONScriptBlocks (java.util.Vector<HTMLNode> html)
-
getJSONScriptBlocks
public static java.util.stream.Stream<java.lang.String> getJSONScriptBlocks (java.util.Vector<HTMLNode> html, DotPair dp)
-
getJSONScriptBlocks
public static java.util.stream.Stream<java.lang.String> getJSONScriptBlocks (java.util.Vector<HTMLNode> html, int sPos, int ePos)
This method shall search for any and all<SCRIPT TYPE="json">
JSON TEXT</SCRIPT>
block present in a range of Vectorized HTML. The search method shall simply look for the toke"JSON"
in theTYPE
attribute of each and every<SCRIPT> TagNode
that is found on the page. The validity of theJSON
found within such blocks is not checked for validity, nor is it even guaranteed to beJSON
data!- Parameters:
html
- This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos
- This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos
- This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'exclusive', meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- This will return a java.util.stream.Stream<String> of each of the JSON elements present in the specified range of the Vectorized HTML passed to parameter 'html'.
Conversion-Target    Stream-Method Invocation
String[]             Stream.toArray(String[]::new);
List<String>         Stream.collect(Collectors.toList());
Vector<String>       Stream.collect(Collectors.toCollection(Vector::new));
TreeSet<String>      Stream.collect(Collectors.toCollection(TreeSet::new));
Iterator<String>     Stream.iterator();
- See Also:
StrTokCmpr.containsIgnoreCase(String, Predicate, String)
,rangeToString(Vector, int, int)
- Code:
- Exact Method Body:
// Whenever building lists, it is usually easiest to use a Stream.Builder
Stream.Builder<String> b = Stream.builder();
// This Predicate simply tests that if the substring "json" (CASE INSENSITIVE) is found
// in the TYPE attribute of a <SCRIPT TYPE=...> node, that the token-string is, indeed a
// word - not a substring of some other word. For instance: TYPE="json" would PASS, but
// TYPE="rajsong" would FAIL - because the token string is not delimited by
// non-alphanumeric characters
final Predicate<String> tester = (String s) ->
    StrTokCmpr.containsIgnoreCase
        (s, (Character c) -> ! Character.isLetterOrDigit(c), "json");
// Find all <SCRIPT> node-blocks whose "TYPE" attribute abides by the tester
// String-Predicate named above.
Vector<DotPair> jsonDPList = InnerTagFindInclusive.all
    (html, sPos, ePos, "script", "type", tester);
// Convert each of these DotPair elements into a java.lang.String
// Add the String to the Stream.Builder<String>
for (DotPair jsonDP : jsonDPList)
    if (jsonDP.size() > 2)
        b.accept(Util.rangeToString(html, jsonDP.start + 1, jsonDP.end));
// Build the Stream, and return it.
return b.build();
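The whole-token test and the Stream.Builder collection step can be illustrated with the JDK alone. This is only a sketch: `containsToken` and `matching` are hypothetical names, and the token check is an assumed approximation of what `StrTokCmpr.containsIgnoreCase` does with the delimiter predicate shown above, not the library's implementation.

```java
import java.util.Locale;
import java.util.Vector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class JsonTypeTokenSketch
{
    // Hypothetical helper (not the library's StrTokCmpr): true if 'token' occurs in 'src',
    // ignoring case, with no letter or digit immediately before or after it.
    static boolean containsToken(String src, String token)
    {
        if (token.isEmpty()) return false;
        String s = src.toLowerCase(Locale.ROOT);
        String t = token.toLowerCase(Locale.ROOT);
        int pos = s.indexOf(t);
        while (pos != -1)
        {
            int end = pos + t.length();
            boolean leftOK  = (pos == 0) || !Character.isLetterOrDigit(s.charAt(pos - 1));
            boolean rightOK = (end == s.length()) || !Character.isLetterOrDigit(s.charAt(end));
            if (leftOK && rightOK) return true;
            pos = s.indexOf(t, pos + 1);
        }
        return false;
    }

    // Collects the matching TYPE values with a Stream.Builder
    static Stream<String> matching(String... typeValues)
    {
        Stream.Builder<String> b = Stream.builder();
        for (String type : typeValues) if (containsToken(type, "json")) b.accept(type);
        return b.build();
    }

    public static void main(String[] args)
    {
        // One of the conversions listed in the table: Stream<String> to Vector<String>
        Vector<String> hits =
            matching("application/ld+json", "rajsong", "JSON", "text/javascript")
                .collect(Collectors.toCollection(Vector::new));
        System.out.println(hits);   // prints [application/ld+json, JSON]
    }
}
```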
-
insertNodes
public static void insertNodes(java.util.Vector<HTMLNode> html, int pos, HTMLNode... nodes)
Inserts nodes, and allows a 'varargs' parameter.
- Parameters:
html
- Any HTML Page
pos
- The position in the original Vector where the nodes shall be inserted.
nodes
- A list of nodes to insert.
- Code:
- Exact Method Body:
Vector<HTMLNode> nodesVec = new Vector<>(nodes.length);
for (HTMLNode node : nodes) nodesVec.addElement(node);
html.addAll(pos, nodesVec);
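The same varargs insertion can be tried out on an ordinary `Vector<String>`. In this sketch the generic helper `insertAll` is a hypothetical name, and wrapping the array with `Arrays.asList` is an alternative to copying into a temporary `Vector`, not what the method body above does.

```java
import java.util.Arrays;
import java.util.Vector;

class InsertSketch
{
    // Hypothetical generic analogue of insertNodes: inserts 'items' so that the first
    // one lands at index 'pos', shifting later elements to the right.
    @SafeVarargs
    static <T> void insertAll(Vector<T> v, int pos, T... items)
    { v.addAll(pos, Arrays.asList(items)); }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("<P>", "</P>"));
        insertAll(page, 1, "Hello", " ", "World");
        System.out.println(page);   // prints [<P>, Hello,  , World, </P>]
    }
}
```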
-
replaceRange
public static void replaceRange(java.util.Vector<HTMLNode> page, DotPair range, java.util.Vector<HTMLNode> newNodes)
-
replaceRange
public static void replaceRange(java.util.Vector<HTMLNode> page, int sPos, int ePos, java.util.Vector<HTMLNode> newNodes)
Replaces any and all HTMLNode's located between the Vector locations 'sPos' (inclusive) and 'ePos' (exclusive). By exclusive, this means that the HTMLNode located at position 'ePos' will not be replaced, but the one at 'sPos' is replaced.
The size of the Vector will change by newNodes.size() - (ePos - sPos). The contents situated between Vector location sPos and sPos + newNodes.size() will, indeed, be the contents of the 'newNodes' parameter.
- Parameters:
page
- Any Java HTML page, constructed of HTMLNode (TagNode & TextNode)
sPos
- This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos
- This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'exclusive', meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
newNodes
- Any Java HTML page-Vector of HTMLNode.
- Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
pollRange(Vector, int, int)
,Util.Remove.range(Vector, int, int)
,replaceRange(Vector, DotPair, Vector)
- Code:
- Exact Method Body:
// Torello.Java.LV
LV l = new LV(sPos, ePos, page);
int oldSize = ePos - sPos;
int newSize = newNodes.size();
int insertPos = sPos;
int i = 0;
while ((i < newSize) && (i < oldSize))
    page.setElementAt(newNodes.elementAt(i++), insertPos++);
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
// CASE ONE:
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
if (newSize == oldSize) return;
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
// CASE TWO:
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
//
// The new Vector is SMALLER than the old sub-range
// The rest of the nodes just need to be trashed
//
// OLD-WAY: (Before realizing what Vector.subList is actually doing)
// Util.removeRange(page, insertPos, ePos);
if (newSize < oldSize) page.subList(insertPos, ePos).clear();
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
// CASE THREE:
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
//
// The new Vector is BIGGER than the old sub-range
// There are still more nodes to insert.
else page.addAll(ePos, newNodes.subList(i, newSize));
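The net effect of a range replacement, including the change in size of `newNodes.size() - (ePos - sPos)`, can be checked on a plain `Vector<String>`. The sketch below is a simplified variant (clear the range, then insert) rather than the overwrite-in-place strategy of the method body above, and its bounds check is a looser, assumed version of the documented rules. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

class ReplaceRangeSketch
{
    // Simplified sketch: removes [sPos, ePos) and inserts 'newNodes' at sPos.
    // A negative ePos is treated as page.size(), mirroring the documented convention.
    static <T> void replaceRange(Vector<T> page, int sPos, int ePos, List<? extends T> newNodes)
    {
        if (ePos < 0) ePos = page.size();
        if ((sPos < 0) || (sPos > ePos) || (ePos > page.size()))
            throw new IndexOutOfBoundsException("sPos=" + sPos + ", ePos=" + ePos);

        page.subList(sPos, ePos).clear();   // changes to the sub-list view write through
        page.addAll(sPos, newNodes);
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("a", "b", "c", "d", "e"));
        int before = page.size();

        replaceRange(page, 1, 3, Arrays.asList("X", "Y", "Z"));

        System.out.println(page);                   // prints [a, X, Y, Z, d, e]
        System.out.println(page.size() - before);   // prints 1, which is 3 - (3 - 1)
    }
}
```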
-
pollRange
public static java.util.Vector<HTMLNode> pollRange (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
Java's java.util.Vector class does not allow public access to the removeRange(start, end) function. It is listed as 'protected' in Java's Documentation about the class Vector. This method works around that, and performs the 'Poll' operation, where the nodes are first removed, stored, and then returned as a function result.
Poll a Range:
The nodes that are removed are placed in a separate return Vector, and returned as the result of this method.
- Parameters:
html
- This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos
- This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos
- This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'exclusive', meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- A complete list (Vector<HTMLNode>) of the nodes that were removed.
- Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
Util.Remove.range(Vector, int, int)
,Util.Remove.range(Vector, DotPair)
,pollRange(Vector, DotPair)
- Code:
- Exact Method Body:
// The original version of this method is preserved inside comments at the bottom of this
// method. Prior to seeing the Sun-Oracle Docs explaining that the return from the SubList
// operation "mirrors changes" back to the original vector, the code in the comments is
// how this method was accomplished.
LV l = new LV(html, sPos, ePos);
Vector<HTMLNode> ret = new Vector<HTMLNode>(l.end - l.start);
List<? extends HTMLNode> list = html.subList(l.start, l.end);
// Copy the Nodes into the return Vector that the end-user receives
ret.addAll(list);
// Clear the nodes out of the original Vector. The Sun-Oracle Docs
// state that the returned sub-list is "mirrored back into" the original
list.clear();
// Return the Vector to the user. Note that the List<HTMLNode> CANNOT be returned,
// because of its mirror-qualities, and because this method expects a vector.
return ret;
/*
// BEFORE READING ABOUT Vector.subList(...), this is how this was accomplished:
// NOTE: It isn't so clear how the List<HTMLNode> works - likely it doesn't actually
//       create any new memory-allocated arrays, it is just an "overlay"
// Copy the elements from the input vector into the return vector
for (int i=l.start; i < l.end; i++) ret.add(html.elementAt(i));
// Remove the range from the input vector (this is the meaning of 'poll')
Util.removeRange(html, sPos, ePos);
return ret;
*/
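The copy-then-clear pattern relies on `Vector.subList` returning a view whose structural changes write through to the backing `Vector`. A minimal sketch with `String` elements follows; the generic helper and class name are hypothetical, and the bounds handling done by `LV` in the method body above is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

class PollRangeSketch
{
    // Removes the elements in [sPos, ePos) from 'v' and returns them in a new Vector.
    static <T> Vector<T> pollRange(Vector<? extends T> v, int sPos, int ePos)
    {
        List<? extends T> view = v.subList(sPos, ePos);   // a view, not a copy
        Vector<T> ret = new Vector<>(view);               // copy the elements out first
        view.clear();                                     // then remove them from 'v'
        return ret;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("a", "b", "c", "d", "e"));
        Vector<String> polled = pollRange(page, 1, 4);
        System.out.println(polled);   // prints [b, c, d]
        System.out.println(page);     // prints [a, e]
    }
}
```

The copy has to happen before `clear()`, since the view is empty afterwards.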
-
pollRange
-
split
public static java.util.Vector<HTMLNode> split (java.util.Vector<? extends HTMLNode> html, int pos)
This removes every element from the Vector beginning at position 0, all the way to position 'pos' (exclusive), and returns the removed elements. The elementAt(pos) remains in the original page input-Vector. This is the definition of 'exclusive'.
- Parameters:
html
- This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
pos
- Any position within the range of the input Vector.
- Returns:
- The elements in the Vector from position 0 ('zero') all the way to position 'pos' (exclusive).
- Code:
- Exact Method Body:
return pollRange(html, 0, pos);
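Since split is a poll of the range [0, pos), the element formerly at index 'pos' becomes element 0 of what remains. A short sketch on a `Vector<String>` makes this visible; the helper is a hypothetical stand-in that uses only the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

class SplitSketch
{
    // Removes and returns the elements in [0, pos); the element at 'pos' stays in 'v'.
    static <T> Vector<T> split(Vector<T> v, int pos)
    {
        List<T> head = v.subList(0, pos);
        Vector<T> ret = new Vector<>(head);
        head.clear();
        return ret;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("a", "b", "c", "d"));
        Vector<String> head = split(page, 2);
        System.out.println(head);   // prints [a, b]
        System.out.println(page);   // prints [c, d]
    }
}
```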
-
-