Package Torello.HTML
Class Util
- java.lang.Object
-
- Torello.HTML.Util
-
public class Util extends java.lang.Object
A long list of utilities for searching, finding, extracting and removing HTML from Vectorized-HTML.
This class collects common "helper routines" that are occasionally needed; they are listed in no particular order. Almost all of these routines are used internally, either in the NodeSearch search-loops and iterators, or in parts of the "Tools" package. A class like this could be extended almost without limit. Keep in mind, however, that classes such as 'SubSection' and 'NodeIndex', along with both of its sub-classes 'TagNodeIndex' and 'TextNodeIndex', make some of the short, for-loop-driven helper routines seem a little superfluous.
The most complicated and error-prone code is in the for-loops and iterators of the node-search package. Those have been solidly tested for over a year, and the helper routines that build those for-loops are included in this class. More utility and modification tools for vectorized-html pages may be the subject of future development work, but the most complicated parts, search and iterate, have already been handled. The methods here might be useful, but deciding which helpers belong in this class is not a "precise science". Please remember that methods ending in "OPT" (meaning optimized) simply omit a couple of the exception checks. When the for-loop criteria are specified in the method signature, those checks do not need to be repeated on each iteration of a node-search for-loop.
Hi-Lited Source-Code:- View Here: Torello/HTML/Util.java
Stateless Class: This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is very similar to the Java-Bean @Stateless Annotation.
- 1 Constructor(s), 1 declared private, zero-argument constructor
- 70 Method(s), 70 declared static
- 0 Field(s)
-
-
Nested Class Summary
Nested Classes Modifier and Type Class static class
Util.Inclusive
-
Method Summary
Convert Vectorized-HTML to a String Modifier and Type Method static String
pageToString(Vector<? extends HTMLNode> html)
static String
rangeToString(Vector<? extends HTMLNode> html, int sPos, int ePos)
static String
rangeToString(Vector<? extends HTMLNode> html, DotPair dp)
Compact Multiple, Contiguous TextNodes to one TextNode Modifier and Type Method static int
compactTextNodes(Vector<HTMLNode> html)
static int
compactTextNodes(Vector<HTMLNode> html, int sPos, int ePos)
static int
compactTextNodes(Vector<HTMLNode> html, DotPair dp)
Convert all TextNode's to a Single-String Modifier and Type Method static String
textNodesString(Vector<? extends HTMLNode> html)
static String
textNodesString(Vector<? extends HTMLNode> html, int sPos, int ePos)
static String
textNodesString(Vector<? extends HTMLNode> html, DotPair dp)
Invoke String.trim() on all TextNode instances Modifier and Type Method static int
trimTextNodes(Vector<HTMLNode> page, boolean deleteZeroLengthStrings)
static int
trimTextNodes(Vector<HTMLNode> page, int sPos, int ePos, boolean deleteZeroLengthStrings)
static int
trimTextNodes(Vector<HTMLNode> page, DotPair dp, boolean deleteZeroLengthStrings)
Count CommentNode instances Modifier and Type Method static int
countCommentNodes(Vector<HTMLNode> page)
static int
countCommentNodes(Vector<HTMLNode> page, int sPos, int ePos)
static int
countCommentNodes(Vector<HTMLNode> page, DotPair dp)
Count TagNode instances Modifier and Type Method static int
countTagNodes(Vector<HTMLNode> page)
static int
countTagNodes(Vector<HTMLNode> page, int sPos, int ePos)
static int
countTagNodes(Vector<HTMLNode> page, DotPair dp)
Count TextNode instances Modifier and Type Method static int
countTextNodes(Vector<HTMLNode> page)
static int
countTextNodes(Vector<HTMLNode> page, int sPos, int ePos)
static int
countTextNodes(Vector<HTMLNode> page, DotPair dp)
Remove all CommentNode instances Modifier and Type Method static int
removeAllCommentNodes(Vector<HTMLNode> page)
static int
removeAllCommentNodes(Vector<HTMLNode> page, int sPos, int ePos)
static int
removeAllCommentNodes(Vector<HTMLNode> page, DotPair dp)
Remove all TagNode instances Modifier and Type Method static int
removeAllTagNodes(Vector<HTMLNode> page)
static int
removeAllTagNodes(Vector<HTMLNode> page, int sPos, int ePos)
static int
removeAllTagNodes(Vector<HTMLNode> page, DotPair dp)
Remove all TextNode instances Modifier and Type Method static int
removeAllTextNodes(Vector<HTMLNode> page)
static int
removeAllTextNodes(Vector<HTMLNode> page, int sPos, int ePos)
static int
removeAllTextNodes(Vector<HTMLNode> page, DotPair dp)
Remove all Attributes from all TagNode instances Modifier and Type Method static int
removeAllInnerTags(Vector<? super TagNode> html, int sPos, int ePos)
static int
removeAllInnerTags(Vector<? super TagNode> html, DotPair dp)
static int
removeAllInnerTags(Vector<HTMLNode> html)
Count all New-Lines Modifier and Type Method static int
countNewLines(Vector<? extends HTMLNode> html)
static int
countNewLines(Vector<? extends HTMLNode> html, int sPos, int ePos)
static int
countNewLines(Vector<? extends HTMLNode> html, DotPair dp)
Replace 'escapable' Text, with HTML Escape-Strings Modifier and Type Method static int
escapeTextNodes(Vector<HTMLNode> html)
static int
escapeTextNodes(Vector<HTMLNode> html, int sPos, int ePos)
static int
escapeTextNodes(Vector<HTMLNode> html, DotPair dp)
Total String.length() for all HTMLNode.str Modifier and Type Method static int
strLength(Vector<? extends HTMLNode> html)
static int
strLength(Vector<? extends HTMLNode> html, int sPos, int ePos)
static int
strLength(Vector<? extends HTMLNode> html, DotPair dp)
Total String.length() for all TextNode.str Modifier and Type Method static int
textStrLength(Vector<? extends HTMLNode> html)
static int
textStrLength(Vector<? extends HTMLNode> html, int sPos, int ePos)
static int
textStrLength(Vector<? extends HTMLNode> html, DotPair dp)
Function for Removing 'Empty' Tags Modifier and Type Method static int
removeInclusiveEmpty(Vector<HTMLNode> page, int sPos, int ePos, String... htmlTags)
static int
removeInclusiveEmpty(Vector<HTMLNode> page, String... htmlTags)
static int
removeInclusiveEmpty(Vector<HTMLNode> page, DotPair dp, String... htmlTags)
Retrieve In-Line JSON Script Modifier and Type Method static Stream<String>
getJSONScriptBlocks(Vector<HTMLNode> html)
static Stream<String>
getJSONScriptBlocks(Vector<HTMLNode> html, int sPos, int ePos)
static Stream<String>
getJSONScriptBlocks(Vector<HTMLNode> html, DotPair dp)
Remove <SCRIPT> Node Blocks Modifier and Type Method static int
removeScriptNodeBlocks(Vector<? extends HTMLNode> html)
Remove <STYLE> Node Blocks Modifier and Type Method static int
removeStyleNodeBlocks(Vector<? extends HTMLNode> html)
java.util.Vector Improvements: Clone Elements Modifier and Type Method static Vector<HTMLNode>
clone(Vector<? extends HTMLNode> html)
static Vector<HTMLNode>
cloneRange(Vector<? extends HTMLNode> html, int sPos, int ePos)
static Vector<HTMLNode>
cloneRange(Vector<? extends HTMLNode> html, DotPair dp)
java.util.Vector Improvements: Insert Elements Modifier and Type Method static void
insertNodes(Vector<HTMLNode> html, int pos, HTMLNode... nodes)
java.util.Vector Improvements: Remove Elements Modifier and Type Method static <T extends HTMLNode> void
removeNodes(boolean preserveInputArray, Vector<T> page, int... nodeList)
static <T extends HTMLNode> void
removeNodesOPT(Vector<T> page, int... posArr)
static int
removeRange(Vector<? extends HTMLNode> html, DotPair dp)
static <T extends HTMLNode> int
removeRange(Vector<T> page, int sPos, int ePos)
java.util.Vector Improvements: Poll (Remove & Return) Elements Modifier and Type Method static Vector<HTMLNode>
pollRange(Vector<? extends HTMLNode> html, int sPos, int ePos)
static Vector<HTMLNode>
pollRange(Vector<? extends HTMLNode> html, DotPair dp)
java.util.Vector Improvements: Replace Elements Modifier and Type Method static void
replaceRange(Vector<HTMLNode> page, int sPos, int ePos, Vector<HTMLNode> newNodes)
static void
replaceRange(Vector<HTMLNode> page, DotPair range, Vector<HTMLNode> newNodes)
Hash Code Modifier and Type Method static int
hashCode(Vector<? extends HTMLNode> html)
static int
hashCode(Vector<? extends HTMLNode> html, int sPos, int ePos)
static int
hashCode(Vector<? extends HTMLNode> html, DotPair dp)
More Functions Modifier and Type Method static void
removeFirstLast(Vector<? extends HTMLNode> html)
static Vector<HTMLNode>
split(Vector<? extends HTMLNode> html, int pos)
-
-
-
Method Detail
-
trimTextNodes
public static int trimTextNodes(java.util.Vector<HTMLNode> page, boolean deleteZeroLengthStrings)
- Code:
- Exact Method Body:
return trimTextNodes(page, 0, -1, deleteZeroLengthStrings);
-
trimTextNodes
public static int trimTextNodes(java.util.Vector<HTMLNode> page, DotPair dp, boolean deleteZeroLengthStrings)
- Code:
- Exact Method Body:
return trimTextNodes(page, dp.start, dp.end + 1, deleteZeroLengthStrings);
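The DotPair overloads in this class all forward to the int-based versions as (dp.start, dp.end + 1), which suggests that a DotPair end index is inclusive while 'ePos' is exclusive. The sketch below illustrates that conversion with a hypothetical Pair class and a plain Vector<String>; neither is part of the library.

```java
import java.util.Arrays;
import java.util.Vector;

final class InclusivePairSketch
{
    // Hypothetical stand-in for DotPair: both 'start' and 'end' are inclusive.
    static final class Pair
    {
        final int start, end;
        Pair(int start, int end) { this.start = start; this.end = end; }
        int size() { return end - start + 1; }
    }

    // Range version: 'sPos' is inclusive, 'ePos' is exclusive.
    static String join(Vector<String> v, int sPos, int ePos)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = sPos; i < ePos; i++) sb.append(v.elementAt(i));
        return sb.toString();
    }

    // Pair version: the inclusive end becomes an exclusive end by adding one.
    static String join(Vector<String> v, Pair p)
    { return join(v, p.start, p.end + 1); }

    public static void main(String[] args)
    {
        Vector<String> v = new Vector<>(Arrays.asList("a", "b", "c", "d"));
        System.out.println(join(v, new Pair(1, 2)));
    }
}
```

Here a Pair of (1, 2) covers two elements, so the forwarded range is (1, 3).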
-
trimTextNodes
public static int trimTextNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos, boolean deleteZeroLengthStrings)
This will iterate through the requested range of the Vector<HTMLNode>, and invoke java.lang.String.trim() on each TextNode in that range. If this invocation results in a reduction of String.length(), then a new TextNode will be instantiated whose TextNode.str field is set to the result of the old_node.str.trim() operation.
- Parameters:
deleteZeroLengthStrings - If a TextNode's length is zero (before or after trim() is called) and this parameter is TRUE, that TextNode will be removed from the Vector.
- Returns:
- The number of nodes that were trimmed or deleted. Each such node increments a counter, and the counter's final value is returned.
- Code:
- Exact Method Body:
int                 counter = 0;
IntStream.Builder   b       = deleteZeroLengthStrings ? IntStream.builder() : null;
HTMLNode            n       = null;
LV                  l       = new LV(page, sPos, ePos);

for (int i=l.start; i < l.end; i++)

    if ((n = page.elementAt(i)).isTextNode())
    {
        String  trimmed         = n.str.trim();
        int     trimmedLength   = trimmed.length();

        if ((trimmedLength == 0) && deleteZeroLengthStrings)
            { b.add(i); counter++; }

        else if (trimmedLength < n.str.length())
            { page.setElementAt(new TextNode(trimmed), i); counter++; }
    }

if (deleteZeroLengthStrings) removeNodesOPT(page, b.build().toArray());

return counter;
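The two-pass idea in this method (replace trimmed nodes in place, but defer deletions until the loop has finished) can be tried out without the library. The following is a minimal sketch under stated assumptions: the Node class and trimText method are hypothetical stand-ins, and the back-to-front removal is one plausible way to apply the collected indices, not necessarily how removeNodesOPT works.

```java
import java.util.Vector;
import java.util.stream.IntStream;

final class TrimSketch
{
    // Hypothetical stand-in for the library's node types; 'isText' marks a text node.
    static final class Node
    {
        final String str;
        final boolean isText;
        Node(String str, boolean isText) { this.str = str; this.isText = isText; }
    }

    // Trims every text node and optionally deletes those that become empty.
    // Returns the number of nodes that were trimmed or deleted.
    static int trimText(Vector<Node> page, boolean deleteZeroLength)
    {
        int counter = 0;
        IntStream.Builder toDelete = IntStream.builder();

        for (int i = 0; i < page.size(); i++)
        {
            Node n = page.elementAt(i);
            if (! n.isText) continue;

            String trimmed = n.str.trim();

            if (trimmed.isEmpty() && deleteZeroLength)
            {
                toDelete.add(i);    // deletion is deferred until after the loop
                counter++;
            }
            else if (trimmed.length() < n.str.length())
            {
                page.setElementAt(new Node(trimmed, true), i);
                counter++;
            }
        }

        // Remove from the back, so that the earlier indices remain valid.
        int[] posArr = toDelete.build().toArray();
        for (int i = posArr.length - 1; i >= 0; i--) page.removeElementAt(posArr[i]);

        return counter;
    }

    public static void main(String[] args)
    {
        Vector<Node> page = new Vector<>();
        page.add(new Node("<p>", false));
        page.add(new Node("  hello ", true));
        page.add(new Node("   ", true));
        page.add(new Node("</p>", false));

        int changed = trimText(page, true);
        System.out.println(changed + " changed, " + page.size() + " nodes remain");
    }
}
```

Deferring the deletions keeps the loop index stable while the Vector is being scanned.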
-
removeInclusiveEmpty
public static int removeInclusiveEmpty(java.util.Vector<HTMLNode> page, java.lang.String... htmlTags)
- Code:
- Exact Method Body:
return removeInclusiveEmpty(page, 0, -1, htmlTags);
-
removeInclusiveEmpty
public static int removeInclusiveEmpty(java.util.Vector<HTMLNode> page, DotPair dp, java.lang.String... htmlTags)
- Code:
- Exact Method Body:
return removeInclusiveEmpty(page, dp.start, dp.end + 1, htmlTags);
-
removeInclusiveEmpty
public static int removeInclusiveEmpty(java.util.Vector<HTMLNode> page, int sPos, int ePos, java.lang.String... htmlTags)
This will do an "Inclusive Search" using the standard class TagNodeInclusiveIterator in the package NodeSearch. Then it will inspect the contents of the subsections. Any subsection that contains no HTMLNode instances between its opening and closing tags, or that contains only "blank-text" (white-space) between them, shall be removed.
IMPORTANT: The search logic shall perform multiple recursive iterations of itself. For instance, if the user requested that all empty HTML divider (<DIV>) elements be removed, and removing a set of dividers resulted in more empty ones (nested <DIV> elements), then an additional removal pass shall be made. This recursion shall continue until there are no empty HTML elements of the types listed by 'htmlTags'.
- Parameters:
page - Any vectorized-html page or sub-page.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
htmlTags - The list of inclusive (non-singleton) html elements to search for possibly being empty container tags.
- Returns:
- The number of HTMLNode's that were removed.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- Code:
- Exact Method Body:
DotPair         subList;
int             removed = 0;
HNLIInclusive   iter    = TagNodeInclusiveIterator.iter(page, htmlTags);
LV              l       = new LV(page, sPos, ePos);

iter.restrictCursor(l);

TOP:
while (iter.hasNext())

    // If there is only the opening & closing pair, with nothing in between,
    // then the pair must be removed because it is "Empty" (Inclusive Empty)
    if ((subList = iter.nextDotPair()).size() == 2)
    {
        iter.remove();
        ePos    -= subList.size();
        removed += subList.size();
    }

    else
    {
        // If there is any TagNode in between the start-end pair, then this is NOT EMPTY
        // In this case, skip to the next start-end opening-closing pair.
        for (int i=(subList.start + 1); i < subList.end; i++)
            if (! page.elementAt(i).isTextNode())
                continue TOP;

        // If there were only TextNode's between an opening-closing TagNode Pair....
        // **AND** those TextNode's are only white-space, then this also considered
        // Inclusively Empty. (Get all TextNode's, and if .trim() reduces the length()
        // to zero, then it was only white-space.
        if (Util.textNodesString(page, subList).trim().length() == 0)
        {
            iter.remove();
            ePos    -= subList.size();
            removed += subList.size();
        }
    }

// This process must be continued recursively, because if any inner, for instance,
// <DIV> ... </DIV> was removed, then the outer list must be re-checked...
if (removed > 0)    return removed + removeInclusiveEmpty(page, sPos, ePos, htmlTags);
else                return 0;
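Why the removal has to repeat is easiest to see on a toy input. The sketch below is an illustration only: it works on a list of plain String tokens instead of HTMLNode's, handles a single tag name, and uses a hypothetical removeEmpty helper rather than the library's TagNodeInclusiveIterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EmptyPairSketch
{
    // Removes every open/close pair of 'tag' that encloses nothing, or only
    // white-space tokens, then recurses until a pass removes nothing.
    // Returns the total number of tokens removed.
    static int removeEmpty(List<String> tokens, String tag)
    {
        String open = "<" + tag + ">", close = "</" + tag + ">";
        int removed = 0;
        int i = 0;

        while (i < tokens.size())
        {
            if (! tokens.get(i).equals(open)) { i++; continue; }

            // Skip any white-space-only tokens that follow the opening tag.
            int j = i + 1;
            while (j < tokens.size() && tokens.get(j).trim().isEmpty()) j++;

            if (j < tokens.size() && tokens.get(j).equals(close))
            {
                tokens.subList(i, j + 1).clear();   // drop open, blanks and close
                removed += (j - i + 1);
            }
            else i++;
        }

        // Removing an inner pair may have emptied an outer pair: check again.
        return (removed > 0) ? removed + removeEmpty(tokens, tag) : 0;
    }

    public static void main(String[] args)
    {
        List<String> tokens = new ArrayList<>(
            Arrays.asList("<div>", "<div>", " ", "</div>", "</div>", "text"));

        int removed = removeEmpty(tokens, "div");
        System.out.println(removed + " removed, remaining: " + tokens);
    }
}
```

In this input the outer pair only becomes empty after the inner pair is gone, so the first pass removes three tokens and the second pass removes the remaining two.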
-
pageToString
public static java.lang.String pageToString (java.util.Vector<? extends HTMLNode> html)
- Code:
- Exact Method Body:
return rangeToString(html, 0, -1);
-
rangeToString
public static java.lang.String rangeToString (java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return rangeToString(html, dp.start, dp.end + 1);
-
rangeToString
public static java.lang.String rangeToString (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
The purpose of this method/function is to convert a portion of the contents of an HTML-Page, currently being represented as a Vector of HTMLNode's, into a String. Two 'int' parameters are provided in this method's signature to define a sub-list of a page to be converted to a java.lang.String.
- Parameters:
html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is that this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The Vector converted into a String.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
pageToString(Vector), rangeToString(Vector, DotPair)
- Code:
- Exact Method Body:
StringBuilder   ret = new StringBuilder();
LV              l   = new LV(html, sPos, ePos);

for (int i=l.start; i < l.end; i++) ret.append(html.elementAt(i).str);

return ret.toString();
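The 'sPos' / 'ePos' rules documented for this method can be written down as a small range check. The sketch below is not the library's LV class; the resolve helper is hypothetical and simply encodes the rules as stated in the Parameters and Throws sections, using a Vector<String> in place of vectorized HTML.

```java
import java.util.Arrays;
import java.util.Vector;

final class RangeSketch
{
    // Returns { start, end } with 'end' exclusive, following the documented rules.
    static int[] resolve(Vector<?> v, int sPos, int ePos)
    {
        int size = v.size();

        // A negative 'ePos' is first reset to the size of the Vector.
        if (ePos < 0) ePos = size;

        if (sPos < 0 || sPos >= size)
            throw new IndexOutOfBoundsException("sPos: " + sPos);

        if (ePos == 0 || ePos > size)
            throw new IndexOutOfBoundsException("ePos: " + ePos);

        if (sPos > ePos)
            throw new IndexOutOfBoundsException("sPos " + sPos + " > ePos " + ePos);

        return new int[] { sPos, ePos };
    }

    // Concatenates the elements in [sPos, ePos).
    static String rangeToString(Vector<String> v, int sPos, int ePos)
    {
        int[] r = resolve(v, sPos, ePos);
        StringBuilder sb = new StringBuilder();
        for (int i = r[0]; i < r[1]; i++) sb.append(v.elementAt(i));
        return sb.toString();
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("<b>", "hi", "</b>"));

        System.out.println(rangeToString(page, 0, -1));     // whole page
        System.out.println(rangeToString(page, 1, 2));      // middle element only
    }
}
```

Passing (0, -1) is how the whole-page overloads in this class forward to the range versions.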
-
textNodesString
public static java.lang.String textNodesString (java.util.Vector<? extends HTMLNode> html)
- Code:
- Exact Method Body:
return textNodesString(html, 0, -1);
-
textNodesString
public static java.lang.String textNodesString (java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return textNodesString(html, dp.start, dp.end + 1);
-
textNodesString
public static java.lang.String textNodesString (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This will return a String that is comprised of ONLY the TextNode's contained within the input Vector, and furthermore, only nodes that are situated between index int 'sPos' and index int 'ePos' in that Vector. The for-loop that iterates the input-Vector parameter will simply skip any instance of 'TagNode' and 'CommentNode' when building the output return String.
- Parameters:
html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is that this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- This will return a String that is comprised of the text-only elements in the web-page or sub-page. Only text between the requested Vector-indices is included.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
textNodesString(Vector, DotPair), textNodesString(Vector)
- Code:
- Exact Method Body:
StringBuilder   sb  = new StringBuilder();
LV              l   = new LV(html, sPos, ePos);
HTMLNode        n;

for (int i=l.start; i < l.end; i++)
    if ((n = html.elementAt(i)).isTextNode())
        sb.append(n.str);

return sb.toString();
-
removeAllTextNodes
public static int removeAllTextNodes(java.util.Vector<HTMLNode> page)
- Code:
- Exact Method Body:
return removeAllTextNodes(page, 0, -1);
-
removeAllTextNodes
public static int removeAllTextNodes(java.util.Vector<HTMLNode> page, DotPair dp)
- Code:
- Exact Method Body:
return removeAllTextNodes(page, dp.start, dp.end + 1);
-
removeAllTextNodes
public static int removeAllTextNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos)
Takes a sub-section of an HTML Vector and removes all TextNode instances present.
- Parameters:
page - Any HTML page
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The number of HTML TextNode's that were removed
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
TextNode, removeNodesOPT(Vector, int[])
- Code:
- Exact Method Body:
IntStream.Builder   b = IntStream.builder();
LV                  l = new LV(page, sPos, ePos);

// Use Java-Streams to build the list of nodes that are valid text-nodes.
for (int i=l.start; i < l.end; i++)
    if (page.elementAt(i).isTextNode())
        b.add(i);

// Build the stream and convert it to an int[] (integer-array)
int[] posArr = b.build().toArray();

// The integer array is guaranteed to be sorted, and contain valid vector-indices.
removeNodesOPT(page, posArr);

return posArr.length;
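The three removeAll...Nodes methods share one pattern: collect matching indices in ascending order, then remove them in bulk. A minimal sketch of that pattern on a Vector<String> follows. The body of removeNodesOPT is not shown on this page, so the back-to-front removal used here is an assumption, and both helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.Vector;
import java.util.function.Predicate;
import java.util.stream.IntStream;

final class RemoveByIndexSketch
{
    // Removes the elements at the given positions. 'posArr' must be sorted in
    // ascending order and hold valid indices; neither is checked here.
    static <T> void removeSorted(Vector<T> v, int... posArr)
    {
        // Walk backwards, so that each removal leaves the remaining indices valid.
        for (int i = posArr.length - 1; i >= 0; i--) v.removeElementAt(posArr[i]);
    }

    // Removes every element that matches 'test' and returns how many were removed.
    static <T> int removeMatching(Vector<T> v, Predicate<? super T> test)
    {
        IntStream.Builder b = IntStream.builder();

        // Indices are added in loop order, so the resulting array is already sorted.
        for (int i = 0; i < v.size(); i++)
            if (test.test(v.elementAt(i)))
                b.add(i);

        int[] posArr = b.build().toArray();
        removeSorted(v, posArr);
        return posArr.length;
    }

    public static void main(String[] args)
    {
        Vector<String> page =
            new Vector<>(Arrays.asList("<p>", "Hello", "<br>", "World", "</p>"));

        // Treat anything that does not start with '<' as text.
        int removed = removeMatching(page, s -> ! s.startsWith("<"));
        System.out.println(removed + " removed, remaining: " + page);
    }
}
```

Collecting indices first avoids modifying the Vector while it is being scanned, which is the same reason the methods here build an int[] before calling removeNodesOPT.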
-
removeAllTagNodes
public static int removeAllTagNodes(java.util.Vector<HTMLNode> page)
- Code:
- Exact Method Body:
return removeAllTagNodes(page, 0, -1);
-
removeAllTagNodes
public static int removeAllTagNodes(java.util.Vector<HTMLNode> page, DotPair dp)
- Code:
- Exact Method Body:
return removeAllTagNodes(page, dp.start, dp.end + 1);
-
removeAllTagNodes
public static int removeAllTagNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos)
Takes a sub-section of an HTML Vector and removes all TagNode instances present.
- Parameters:
page - Any HTML page
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The number of HTML TagNode's that were removed
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
TagNode, removeNodesOPT(Vector, int[])
- Code:
- Exact Method Body:
IntStream.Builder   b = IntStream.builder();
LV                  l = new LV(page, sPos, ePos);

// Use Java-Streams to build the list of nodes that are valid tag-nodes.
for (int i=l.start; i < l.end; i++)
    if (page.elementAt(i).isTagNode())
        b.add(i);

// Build the stream and convert it to an int[] (integer-array)
int[] posArr = b.build().toArray();

// The integer array is guaranteed to be sorted, and contain valid vector-indices.
removeNodesOPT(page, posArr);

return posArr.length;
-
removeAllCommentNodes
public static int removeAllCommentNodes(java.util.Vector<HTMLNode> page)
- Code:
- Exact Method Body:
return removeAllCommentNodes(page, 0, -1);
-
removeAllCommentNodes
public static int removeAllCommentNodes(java.util.Vector<HTMLNode> page, DotPair dp)
- Code:
- Exact Method Body:
return removeAllCommentNodes(page, dp.start, dp.end + 1);
-
removeAllCommentNodes
public static int removeAllCommentNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos)
Takes a sub-section of an HTML Vector and removes all CommentNode instances present.
- Parameters:
page - Any HTML page
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The number of HTML CommentNode's that were removed
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
CommentNode, removeNodesOPT(Vector, int[])
- Code:
- Exact Method Body:
IntStream.Builder   b = IntStream.builder();
LV                  l = new LV(page, sPos, ePos);

// Use Java-Streams to build the list of nodes that are valid comment-nodes.
for (int i=l.start; i < l.end; i++)
    if (page.elementAt(i).isCommentNode())
        b.add(i);

// Build the stream and convert it to an int[] (integer-array)
int[] posArr = b.build().toArray();

// The integer array is guaranteed to be sorted, and contain valid vector-indices.
removeNodesOPT(page, posArr);

return posArr.length;
-
escapeTextNodes
public static int escapeTextNodes(java.util.Vector<HTMLNode> html)
- Code:
- Exact Method Body:
return escapeTextNodes(html, 0, -1);
-
escapeTextNodes
public static int escapeTextNodes(java.util.Vector<HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return escapeTextNodes(html, dp.start, dp.end + 1);
-
escapeTextNodes
public static int escapeTextNodes(java.util.Vector<HTMLNode> html, int sPos, int ePos)
Will call HTML.Escape.replaceAll on each TextNode in the range of sPos ... ePos
- Parameters:
html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is that this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The number of TextNode's that changed as a result of the Escape.replaceAll(n.str) loop.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
Escape.replaceAll(String)
- Code:
- Exact Method Body:
LV          l       = new LV(html, sPos, ePos);
HTMLNode    n       = null;
String      s       = null;
int         counter = 0;

for (int i=l.start; i < l.end; i++)

    if ((n = html.elementAt(i)).isTextNode())

        if (! (s = Escape.replace(n.str)).equals(n.str))
        {
            html.setElementAt(new TextNode(s), i);
            counter++;
        }

return counter;
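The counting rule used here (a node counts only if escaping actually changed its text) can be illustrated in isolation. In the sketch below, the escape method is a hypothetical stand-in for the library's Escape class that handles only the characters &, < and >, and the page is modeled as a Vector<String> holding text only.

```java
import java.util.Arrays;
import java.util.Vector;

final class EscapeSketch
{
    // Hypothetical stand-in for the library's Escape class: handles only &, <, >.
    // The ampersand is replaced first, so that later escapes are not re-escaped.
    static String escape(String s)
    { return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"); }

    // Escapes every element in place; returns how many elements actually changed.
    static int escapeAll(Vector<String> text)
    {
        int counter = 0;

        for (int i = 0; i < text.size(); i++)
        {
            String before = text.elementAt(i);
            String after  = escape(before);

            if (! after.equals(before))
            {
                text.setElementAt(after, i);
                counter++;
            }
        }

        return counter;
    }

    public static void main(String[] args)
    {
        Vector<String> text = new Vector<>(Arrays.asList("a < b", "plain", "R&D"));
        int changed = escapeAll(text);
        System.out.println(changed + " changed: " + text);
    }
}
```

Elements that contain nothing escapable are left untouched and are not counted.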
-
clone
public static java.util.Vector<HTMLNode> clone (java.util.Vector<? extends HTMLNode> html)
- Code:
- Exact Method Body:
return cloneRange(html, 0, -1);
-
cloneRange
public static java.util.Vector<HTMLNode> cloneRange (java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return cloneRange(html, dp.start, dp.end + 1);
-
cloneRange
public static java.util.Vector<HTMLNode> cloneRange (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
Copies (clones!) a sub-range of the HTML page, stores the results in aVector
, and returns it.- Parameters:
html
- This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card'? extends HTMLNode'
means is this method can receive aVector<TagNode>, Vector<TextNode>
or aVector<CommentNode>
, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the'NodeSearch'
package. The most common vector-type used isVector<HTMLNode>
.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The "cloned" (copied) sub-range specified by
'sPos'
and'ePos'.
- Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
cloneRange(Vector, DotPair)
- Code:
- Exact Method Body:
LV l = new LV(html, sPos, ePos);
Vector<HTMLNode> ret = new Vector<>(l.size());

// Copy the range specified into the return vector
//
// HOW THIS WAS DONE BEFORE NOTICING Vector.subList
//
// for (int i = l.start; i < l.end; i++) ret.addElement(html.elementAt(i));

ret.addAll(html.subList(l.start, l.end));

return ret;
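The range rules described above (inclusive 'sPos', exclusive 'ePos', negative 'ePos' reset to the Vector size) can be illustrated without the library. The following is only a sketch: the class CloneRangeSketch is hypothetical, it operates on a plain java.util.Vector, and it is assumed (not confirmed by the source) that the library's LV class performs equivalent checks.

```java
import java.util.Vector;

// Hypothetical stand-in, NOT the library's LV class. It only mirrors the
// sPos / ePos rules as they are documented for cloneRange.
class CloneRangeSketch
{
    static <T> Vector<T> cloneRange(Vector<? extends T> v, int sPos, int ePos)
    {
        // A negative ePos is first reset to the size of the Vector
        if (ePos < 0) ePos = v.size();

        // The three documented failure conditions
        if (sPos < 0 || sPos >= v.size())
            throw new IndexOutOfBoundsException("sPos: " + sPos);

        if (ePos == 0 || ePos > v.size())
            throw new IndexOutOfBoundsException("ePos: " + ePos);

        if (sPos > ePos)
            throw new IndexOutOfBoundsException("sPos is larger than ePos");

        // sPos is inclusive, ePos is exclusive: the same convention as subList
        return new Vector<>(v.subList(sPos, ePos));
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>();
        page.add("<p>");
        page.add("Hello");
        page.add("</p>");

        // Negative ePos means "through the end of the Vector"
        System.out.println(cloneRange(page, 1, -1)); // prints [Hello, </p>]
    }
}
```

As in the method body shown above, the copy is shallow: the returned Vector is new, but it holds references to the same elements as the input.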
-
removeAllInnerTags
public static int removeAllInnerTags(java.util.Vector<HTMLNode> html)
- Code:
- Exact Method Body:
return removeAllInnerTags(html, 0, -1);
-
removeAllInnerTags
public static int removeAllInnerTags (java.util.Vector<? super TagNode> html, DotPair dp)
- Code:
- Exact Method Body:
return removeAllInnerTags(html, dp.start, dp.end + 1);
-
removeAllInnerTags
public static int removeAllInnerTags (java.util.Vector<? super TagNode> html, int sPos, int ePos)
This method removes all inner-tags (all attributes) from every TagNode inside of an HTML page. It does this by replacing every TagNode in the Vector with the pre-instantiated, publicly-available TagNode which can be obtained by a call to HTMLTags.hasTag(token, TC).
NOTE: This method determines whether a fresh TagNode is to be inserted by measuring the length of the internal TagNode.str (a String) field. If TagNode.str.length() is not equal to the length of the HTML token TagNode.tok plus 2, then a fresh, pre-instantiated node is substituted. The '+2' figure comes from the additional characters '<' and '>' that start and end every HTML TagNode.
- Parameters:
html
- This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card'? extends HTMLNode'
means is this method can receive aVector<TagNode>, Vector<TextNode>
or aVector<CommentNode>
, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the'NodeSearch'
package. The most common vector-type used isVector<HTMLNode>
.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The number of
TagNode
elements that were replaced with zero-attribute HTML Element Tags. - Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
java.lang.ClassCastException
- If 'html' contains references that do not inherit HTMLNode.
- Code:
- Exact Method Body:
int ret = 0;
LV l = new LV(sPos, ePos, html);
TagNode tn;

for (int i = (l.end-1); i >= l.start; i--)
    if ((tn = ((HTMLNode) html.elementAt(i)).openTagPWA()) != null)
    {
        ret++;

        // HTMLTags.hasTag(tok, TC) gets an empty and pre-instantiated TagNode,
        // where TagNode.tok == 'tn.tok' and TagNode.isClosing = false
        html.setElementAt(HTMLTags.hasTag(tn.tok, TC.OpeningTags), i);
    }

return ret;
-
textStrLength
public static int textStrLength(java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return textStrLength(html, dp.start, dp.end + 1);
-
textStrLength
public static int textStrLength(java.util.Vector<? extends HTMLNode> html)
- Code:
- Exact Method Body:
return textStrLength(html, 0, -1);
-
textStrLength
public static int textStrLength(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method returns the combined length of the strings contained in the 'TextNode' instances (and only those instances) among the nodes of the input HTML-Vector. Its behavior is identical to that of the method of the same name that takes only the Vector, except that it accepts starting and ending bounds on the html Vector: 'sPos' & 'ePos'.
- Parameters:
html
- This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card'? extends HTMLNode'
means is this method can receive aVector<TagNode>, Vector<TextNode>
or aVector<CommentNode>
, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the'NodeSearch'
package. The most common vector-type used isVector<HTMLNode>
.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The sum of the lengths of the text contained by text-nodes in the
Vector
between'sPos'
and'ePos'
. - Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- Code:
- Exact Method Body:
HTMLNode n;
int sum = 0;
LV l = new LV(html, sPos, ePos);

// Counts the length of each "String" in a "TextNode" between sPos and ePos
for (int i=l.start; i < l.end; i++)
    if ((n = html.elementAt(i)).isTextNode())
        sum += n.str.length();

return sum;
-
compactTextNodes
public static int compactTextNodes(java.util.Vector<HTMLNode> html)
- Code:
- Exact Method Body:
return compactTextNodes(html, 0, html.size());
-
compactTextNodes
public static int compactTextNodes(java.util.Vector<HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return compactTextNodes(html, dp.start, dp.end + 1);
-
compactTextNodes
public static int compactTextNodes(java.util.Vector<HTMLNode> html, int sPos, int ePos)
Occasionally, when removing instances of TagNode from a vectorized-html page, instances of TextNode which were not adjacent neighbours in the Vector suddenly become adjacent. Although contiguous instances of TextNode cause no major problems from the Search Algorithm's perspective, it can be befuddling for programmers to discover that text which appears in the output of a call to Util.pageToString(html) is not being found, because that text is split among multiple adjacent TextNodes.
This method merely combines adjacent instances of class TextNode in the Vector into single instances of class TextNode.
- Parameters:
html
- Any vectorized-html web-page. If this page contains any contiguously placed TextNodes, the extras will be eliminated, and the internal strings inside the nodes (TextNode.str) will be combined. This action will reduce the size of the actual html-Vector
.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The number of nodes that were eliminated after being combined, or 0 if there were no text-nodes that were removed.
- Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
HTMLNode.str
,TextNode
- Code:
- Exact Method Body:
LV l = new LV(html, sPos, ePos);
boolean compacting = false;
int firstPos = -1;
int delta = 0;

for (int i=l.start; i < (l.end - delta); i++)

    if (html.elementAt(i).isTextNode()) // Is a TextNode
    {
        if (compacting) continue;

        // Not in "Compacting Mode"
        compacting = true;
        firstPos = i; // Start "Compacting Mode" - this is a TextNode
    }

    else if (compacting && (firstPos < (i-1))) // Else - Must be a TagNode or CommentNode
    {
        // Save compacted TextNode String's into this StringBuilder
        StringBuilder compacted = new StringBuilder();

        // Iterate all TextNodes that were adjacent, put them together into StringBuilder
        for (int j=firstPos; j < i; j++) compacted.append(html.elementAt(j).str);

        // Place this new "aggregate TextNode" at location of the first TextNode that
        // was compacted into this StringBuilder
        html.setElementAt(new TextNode(compacted.toString()), firstPos);

        // Remove the rest of the positions in the Vector that had TextNode's. These have
        // all been put together into the "Aggregate TextNode" at position "firstPos"
        Util.removeRange(html, firstPos + 1, i);

        // The change in the size of the Vector needs to be accounted for.
        delta += (i - firstPos - 1);

        // Change the loop-counter variable, too, since the size of the Vector has changed.
        i = firstPos + 1;

        // Since we just hit a CommentNode, or TagNode, exit "Compacting Mode."
        compacting = false;
    }

    // NOTE: This, ALSO, MUST BE a TagNode or CommentNode (just like the previous
    // if-else branch !)
    // TRICKY: Don't forget this 'else' !
    else compacting = false;

// Added - Don't forget the case where the Vector ends with a series of TextNodes
// TRICKY TOO! (Same as the HTML Parser... The ending or 'trailing' nodes must be parsed
int lastNodePos = html.size() - 1;

if (html.elementAt(lastNodePos).isTextNode())
    if (compacting && (firstPos < lastNodePos))
    {
        StringBuilder compacted = new StringBuilder();

        // Compact the TextNodes that were identified at the end of the Vector range.
        for (int j=firstPos; j <= lastNodePos; j++) compacted.append(html.elementAt(j).str);

        // Replace the group of TextNode's at the end of the Vector, with the single, aggregate
        html.setElementAt(new TextNode(compacted.toString()), firstPos);
        Util.removeRange(html, firstPos + 1, lastNodePos + 1);
    }

return delta;
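The idea of merging each run of adjacent text nodes into one node can also be shown in isolation. The sketch below is not the library's implementation: the Node class is a hypothetical stand-in for HTMLNode / TextNode, the method works on the whole list rather than an 'sPos' / 'ePos' range, and it uses a run-scanning loop instead of the "Compacting Mode" flag used above.

```java
import java.util.ArrayList;
import java.util.List;

class CompactSketch
{
    // Hypothetical minimal node: 'isText' marks a text node, 'str' is its content
    static final class Node
    {
        final boolean isText;
        final String str;

        Node(boolean isText, String str) { this.isText = isText; this.str = str; }
    }

    // Merges every run of adjacent text nodes into a single text node, in place.
    // Returns the number of nodes that were removed from the list.
    static int compact(List<Node> html)
    {
        int removed = 0;
        int i = 0;

        while (i < html.size())
        {
            if (!html.get(i).isText) { i++; continue; }

            // Find the exclusive end of this run of text nodes
            int j = i + 1;
            while (j < html.size() && html.get(j).isText) j++;

            if (j - i > 1)
            {
                StringBuilder sb = new StringBuilder();
                for (int k = i; k < j; k++) sb.append(html.get(k).str);

                // Put the aggregate node at the first position, drop the rest
                html.set(i, new Node(true, sb.toString()));
                html.subList(i + 1, j).clear();
                removed += (j - i - 1);
            }

            i++;
        }

        return removed;
    }

    public static void main(String[] args)
    {
        List<Node> page = new ArrayList<>();
        page.add(new Node(false, "<p>"));
        page.add(new Node(true, "Hello, "));
        page.add(new Node(true, "world"));
        page.add(new Node(false, "</p>"));

        System.out.println(compact(page) + " node(s) removed");
    }
}
```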
-
countNewLines
public static int countNewLines(java.util.Vector<? extends HTMLNode> html)
- Code:
- Exact Method Body:
return countNewLines(html, 0, -1);
-
countNewLines
public static int countNewLines(java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return countNewLines(html, dp.start, dp.end + 1);
-
countNewLines
public static int countNewLines(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
Counts the number of new-line symbols present in the partial HTML page. The count is the sum, over every HTMLNode.str in the range, of the occurrences of the standard new-line symbols \r\n, \r and \n, meaning that UNIX, MSFT, Apple, etc. forms of text-line rendering are all treated equally.
- Parameters:
html
- This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card'? extends HTMLNode'
means is this method can receive aVector<TagNode>, Vector<TextNode>
or aVector<CommentNode>
, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the'NodeSearch'
package. The most common vector-type used isVector<HTMLNode>
.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The number of new-line characters in all of the HTMLNode's that occur between vectorized-page positions 'sPos' and 'ePos'.
NOTE: The regular-expression used here 'NEWLINEP' is as follows:
private static final Pattern NEWLINEP = Pattern.compile("\\r\\n|\\r|\\n");
- Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
StringParse.NEWLINEP
- Code:
- Exact Method Body:
int newLineCount = 0;
LV l = new LV(html, sPos, ePos);

for (int i=l.start; i < l.end; i++)

    // Uses the Torello.Java.StringParse "New Line RegEx"
    for (   Matcher m = StringParse.NEWLINEP.matcher(html.elementAt(i).str);
            m.find();
            newLineCount++);

return newLineCount;
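The effect of the alternation order in the new-line expression can be checked with a small stand-alone sketch. The class NewLineCountSketch below is hypothetical and works on a list of plain strings rather than on HTMLNode instances; only the regular expression is taken from the text above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NewLineCountSketch
{
    // Same expression as quoted above. "\r\n" is listed first, so a Windows
    // line-ending is matched once instead of as "\r" followed by "\n".
    static final Pattern NEWLINEP = Pattern.compile("\\r\\n|\\r|\\n");

    // Sums the number of new-line matches over every string in the list
    static int countNewLines(List<String> strs)
    {
        int count = 0;

        for (String s : strs)
        {
            Matcher m = NEWLINEP.matcher(s);
            while (m.find()) count++;
        }

        return count;
    }

    public static void main(String[] args)
    {
        List<String> strs = Arrays.asList("line1\r\nline2", "a\nb\rc", "no breaks");
        System.out.println(countNewLines(strs)); // prints 3
    }
}
```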
-
countTextNodes
public static int countTextNodes(java.util.Vector<HTMLNode> page)
- Code:
- Exact Method Body:
return countTextNodes(page, 0, -1);
-
countTextNodes
public static int countTextNodes(java.util.Vector<HTMLNode> page, DotPair dp)
- Code:
- Exact Method Body:
return countTextNodes(page, dp.start, dp.end + 1);
-
countTextNodes
public static int countTextNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos)
Counts the number of TextNode's in a Vector<HTMLNode> between the demarcated array / Vector positions, 'sPos' and 'ePos'.
- Parameters:
page
- Any HTML page.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The number of
TextNode's
in theVector
between the demarcated indices. - Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- Code:
- Exact Method Body:
int counter = 0;
LV l = new LV(page, sPos, ePos);

// Iterates the entire page between sPos and ePos, incrementing the count for every
// instance of text-node.
for (int i=l.start; i < l.end; i++)
    if (page.elementAt(i).isTextNode())
        counter++;

return counter;
-
countCommentNodes
public static int countCommentNodes(java.util.Vector<HTMLNode> page)
- Code:
- Exact Method Body:
return countCommentNodes(page, 0, -1);
-
countCommentNodes
public static int countCommentNodes(java.util.Vector<HTMLNode> page, DotPair dp)
- Code:
- Exact Method Body:
return countCommentNodes(page, dp.start, dp.end + 1);
-
countCommentNodes
public static int countCommentNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos)
Counts the number of CommentNode's in a Vector<HTMLNode> between the demarcated array / Vector positions.
- Parameters:
page
- Any HTML page.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The number of
CommentNode's
in theVector
between the demarcated indices. - Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- Code:
- Exact Method Body:
int counter = 0;
LV l = new LV(page, sPos, ePos);

// Iterates the entire page between sPos and ePos, incrementing the count for every
// instance of comment-node.
for (int i=l.start; i < l.end; i++)
    if (page.elementAt(i).isCommentNode())
        counter++;

return counter;
-
countTagNodes
public static int countTagNodes(java.util.Vector<HTMLNode> page)
- Code:
- Exact Method Body:
return countTagNodes(page, 0, -1);
-
countTagNodes
public static int countTagNodes(java.util.Vector<HTMLNode> page, DotPair dp)
- Code:
- Exact Method Body:
return countTagNodes(page, dp.start, dp.end + 1);
-
countTagNodes
public static int countTagNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos)
Counts the number of TagNode's in a Vector<HTMLNode> between the demarcated array / Vector positions.
- Parameters:
page
- Any HTML page.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The number of TagNode's in the Vector between the demarcated indices.
- Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- Code:
- Exact Method Body:
int counter = 0;
LV l = new LV(page, sPos, ePos);

// Iterates the entire page between sPos and ePos, incrementing the count for every
// instance of TagNode.
for (int i=l.start; i < l.end; i++)
    if (page.elementAt(i).isTagNode())
        counter++;

return counter;
-
strLength
public static int strLength(java.util.Vector<? extends HTMLNode> html)
- Code:
- Exact Method Body:
return strLength(html, 0, -1);
-
strLength
public static int strLength(java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return strLength(html, dp.start, dp.end + 1);
-
strLength
public static int strLength(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method simply adds / sums the String-length of every HTMLNode.str field in the passed page-Vector. It only counts nodes between parameters sPos (inclusive) and ePos (exclusive).
- Parameters:
html
- This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card'? extends HTMLNode'
means is this method can receive aVector<TagNode>, Vector<TextNode>
or aVector<CommentNode>
, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the'NodeSearch'
package. The most common vector-type used isVector<HTMLNode>
.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The total length - in characters - of the sub-page of HTML between
'sPos'
and'ePos'
- Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
strLength(Vector)
- Code:
- Exact Method Body:
int ret = 0;
LV l = new LV(html, sPos, ePos);

for (int i=l.start; i < l.end; i++) ret += html.elementAt(i).str.length();

return ret;
-
hashCode
public static int hashCode(java.util.Vector<? extends HTMLNode> html)
- Code:
- Exact Method Body:
return hashCode(html, 0, -1);
-
hashCode
public static int hashCode(java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return hashCode(html, dp.start, dp.end + 1);
-
hashCode
public static int hashCode(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
Generates a hash-code for a vectorized html page-Vector.
- Parameters:
html
- This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card'? extends HTMLNode'
means is this method can receive aVector<TagNode>, Vector<TextNode>
or aVector<CommentNode>
, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the'NodeSearch'
package. The most common vector-type used isVector<HTMLNode>
.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The value that String.hashCode() would produce for the partial HTML-page if it were stored not as a Vector, but as HTML inside of a single Java-String.
- Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
hashCode(Vector)
- Code:
- Exact Method Body:
int h = 0;
LV lv = new LV(html, sPos, ePos);

for (int j=lv.start; j < lv.end; j++)
{
    String s = html.elementAt(j).str;
    int l = s.length();

    // This line has been copied from the jdk8/jdk8 "String.hashCode()" method.
    // The difference is that it iterates over the entire vector
    for (int i=0; i < l; i++) h = 31 * h + s.charAt(i);
}

return h;
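Because the running value 'h' is carried from one node to the next, the result equals the hash of the concatenated text. The following sketch illustrates this with plain strings; the class ChainedHashSketch is hypothetical and stands in for iterating over HTMLNode.str values.

```java
import java.util.Arrays;
import java.util.List;

class ChainedHashSketch
{
    // Applies the String.hashCode() recurrence (h = 31 * h + c) across all
    // parts, without ever building the concatenated String.
    static int hashCode(Iterable<String> parts)
    {
        int h = 0;

        for (String s : parts)
            for (int i = 0; i < s.length(); i++)
                h = 31 * h + s.charAt(i);

        return h;
    }

    public static void main(String[] args)
    {
        List<String> parts = Arrays.asList("<p>", "Hello", "</p>");

        // Both values are computed with the same recurrence over the same characters
        System.out.println(hashCode(parts) == "<p>Hello</p>".hashCode()); // prints true
    }
}
```

Avoiding the intermediate String saves an allocation proportional to the size of the page range.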
-
removeStyleNodeBlocks
public static int removeStyleNodeBlocks (java.util.Vector<? extends HTMLNode> html)
Removes all HTML 'style' Node blocks.
- Parameters:
html
- This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card'? extends HTMLNode'
means is this method can receive aVector<TagNode>, Vector<TextNode>
or aVector<CommentNode>
, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the'NodeSearch'
package. The most common vector-type used isVector<HTMLNode>
.- Returns:
- The number of
<STYLE>
-Node Blocks that were removed - Code:
- Exact Method Body:
int removeCount = 0;

while (TagNodeRemoveInclusive.first(html, "style") > 0) removeCount++;

return removeCount;
-
removeScriptNodeBlocks
public static int removeScriptNodeBlocks (java.util.Vector<? extends HTMLNode> html)
Removes all HTML 'script' Node blocks.
- Parameters:
html
- This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card'? extends HTMLNode'
means is this method can receive aVector<TagNode>, Vector<TextNode>
or aVector<CommentNode>
, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the'NodeSearch'
package. The most common vector-type used isVector<HTMLNode>
.- Returns:
- The number of
SCRIPT
-Node Blocks that were removed - Code:
- Exact Method Body:
int removeCount = 0;

while (TagNodeRemoveInclusive.first(html, "script") > 0) removeCount++;

return removeCount;
-
getJSONScriptBlocks
public static java.util.stream.Stream<java.lang.String> getJSONScriptBlocks (java.util.Vector<HTMLNode> html)
- Code:
- Exact Method Body:
return getJSONScriptBlocks(html, 0, -1);
-
getJSONScriptBlocks
public static java.util.stream.Stream<java.lang.String> getJSONScriptBlocks (java.util.Vector<HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return getJSONScriptBlocks(html, dp.start, dp.end + 1);
-
getJSONScriptBlocks
public static java.util.stream.Stream<java.lang.String> getJSONScriptBlocks (java.util.Vector<HTMLNode> html, int sPos, int ePos)
This method shall search for any and all <SCRIPT TYPE="json"> JSON TEXT </SCRIPT> blocks present in a range of Vectorized HTML. The search simply looks for the token "JSON" in the TYPE attribute of each and every <SCRIPT> TagNode that is found in the range. The JSON found within such blocks is not checked for validity, nor is it even guaranteed to be JSON data!
- Parameters:
html
- This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card'? extends HTMLNode'
means is this method can receive aVector<TagNode>, Vector<TextNode>
or aVector<CommentNode>
, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the'NodeSearch'
package. The most common vector-type used isVector<HTMLNode>
.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- This will return a
java.util.stream.Stream<String>
of each of theJSON
elements present in the specified range of the Vectorized HTML passed to parameter'html'
.
Conversion-Target     Stream-Method Invocation
String[]              Stream.toArray(String[]::new);
List<String>          Stream.collect(Collectors.toList());
Vector<String>        Stream.collect(Collectors.toCollection(Vector::new));
TreeSet<String>       Stream.collect(Collectors.toCollection(TreeSet::new));
Iterator<String>      Stream.iterator();
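The conversions listed above can be tried without the HTML classes at all. The sketch below is only an illustration: the class name StreamConversionSketch and the sample() helper are hypothetical stand-ins for the Stream<String> that this method returns, and the two JSON strings are made-up data. Only standard java.util.stream calls are used.

```java
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;
import java.util.Vector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamConversionSketch
{
    // Hypothetical stand-in for the Stream<String> returned by getJSONScriptBlocks.
    // A Stream may be consumed only once, so every conversion asks for a fresh one.
    static Stream<String> sample()
    { return Stream.of("{\"b\": 2}", "{\"a\": 1}"); }

    static String[] asArray()
    { return sample().toArray(String[]::new); }

    static List<String> asList()
    { return sample().collect(Collectors.toList()); }

    static Vector<String> asVector()
    { return sample().collect(Collectors.toCollection(Vector::new)); }

    // A TreeSet sorts its contents and drops duplicates
    static TreeSet<String> asTreeSet()
    { return sample().collect(Collectors.toCollection(TreeSet::new)); }

    static Iterator<String> asIterator()
    { return sample().iterator(); }

    public static void main(String[] args)
    {
        System.out.println(asList());
        System.out.println(asTreeSet());
    }
}
```

Because a Stream is single-use, code that needs the blocks more than once would normally collect them into one of these containers first.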
- See Also:
StrTokCmpr.containsIgnoreCase(String, Predicate, String)
,rangeToString(Vector, int, int)
- Code:
- Exact Method Body:
// Whenever building lists, it is usually easiest to use a Stream.Builder
Stream.Builder<String> b = Stream.builder();

// This Predicate simply tests that if the substring "json" (CASE INSENSITIVE) is found
// in the TYPE attribute of a <SCRIPT TYPE=...> node, that the token-string is, indeed a
// word - not a substring of some other word. For instance: TYPE="json" would PASS, but
// TYPE="rajsong" would FAIL - because the token string is not bounded by
// non-alphanumeric characters
final Predicate<String> tester = (String s) ->
    StrTokCmpr.containsIgnoreCase(s, (Character c) -> ! Character.isLetterOrDigit(c), "json");

// Find all <SCRIPT> node-blocks whose "TYPE" attribute abides by the tester String-predicate
// named above.
Vector<DotPair> jsonDPList = InnerTagFindInclusive.all
    (html, sPos, ePos, "script", "type", tester);

// Convert each of these DotPair element into a java.lang.String
// Add the String to the Stream.Builder<String>
for (DotPair jsonDP : jsonDPList)
    if (jsonDP.size() > 2)
        b.accept(Util.rangeToString(html, jsonDP.start + 1, jsonDP.end));

// Build the Stream, and return it.
return b.build();
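The whole-word test applied to the TYPE attribute can be illustrated in isolation. The sketch below is an assumption about the behavior the comments describe, not the actual StrTokCmpr implementation: containsToken and isJsonType are hypothetical names, and the rule used is that the token must be bounded on both sides by a delimiter character or by the ends of the string.

```java
import java.util.Locale;
import java.util.function.Predicate;

class TokenMatchSketch
{
    // Hypothetical stand-in for StrTokCmpr.containsIgnoreCase.  Returns true when
    // 'token' occurs in 's' (ignoring case) with a delimiter, or a string boundary,
    // immediately before it and immediately after it.
    static boolean containsToken(String s, Predicate<Character> isDelimiter, String token)
    {
        String src = s.toLowerCase(Locale.ROOT);
        String tok = token.toLowerCase(Locale.ROOT);
        int    pos = src.indexOf(tok);

        while (pos != -1)
        {
            int     after   = pos + tok.length();
            boolean leftOK  = (pos == 0) || isDelimiter.test(src.charAt(pos - 1));
            boolean rightOK = (after == src.length()) || isDelimiter.test(src.charAt(after));

            if (leftOK && rightOK) return true;

            // Keep looking: a later occurrence might be a proper token
            pos = src.indexOf(tok, pos + 1);
        }

        return false;
    }

    // Same delimiter rule as the method body: anything that is not a letter or digit
    static boolean isJsonType(String typeAttribute)
    { return containsToken(typeAttribute, c -> ! Character.isLetterOrDigit(c), "json"); }

    public static void main(String[] args)
    {
        System.out.println(isJsonType("application/ld+json"));
        System.out.println(isJsonType("rajsong"));
    }
}
```

Under this rule a value such as "application/ld+json" matches because '+' is not alphanumeric, while "rajsong" does not.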
-
insertNodes
public static void insertNodes(java.util.Vector<HTMLNode> html, int pos, HTMLNode... nodes)
Inserts nodes, and allows a 'varargs' parameter.- Parameters:
html
- Any HTML Pagepos
- The position in the originalVector
where the nodes shall be inserted.nodes
- A list of nodes to insert.- Code:
- Exact Method Body:
Vector<HTMLNode> nodesVec = new Vector<>(nodes.length);
for (HTMLNode node : nodes) nodesVec.addElement(node);
html.addAll(pos, nodesVec);
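A varargs insert of this kind can be sketched on a plain Vector<String>. The names InsertSketch and insertAll are hypothetical, and Arrays.asList is swapped in for the temporary Vector used by the method body; the resulting Vector contents should be the same.

```java
import java.util.Arrays;
import java.util.Vector;

class InsertSketch
{
    // Inserts every item, in order, so that the first one lands at index 'pos'.
    // Arrays.asList wraps the varargs array without copying it.
    @SafeVarargs
    static <T> void insertAll(Vector<T> v, int pos, T... items)
    { v.addAll(pos, Arrays.asList(items)); }

    public static void main(String[] args)
    {
        Vector<String> v = new Vector<>(Arrays.asList("a", "d"));
        insertAll(v, 1, "b", "c");
        System.out.println(v);
    }
}
```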
-
removeNodesOPT
public static <T extends HTMLNode> void removeNodesOPT (java.util.Vector<T> page, int... posArr)
OPT: Optimized
This method does the same thing asremoveNodes(boolean, Vector, int[])
but all error checking is skipped, and the input integer array is presumed to have been sorted. There are no guarantees about the behavior of this method if the input array'posArr'
is not sorted, least-to-greatest, or if there are duplicate or negative values in this array.
NOTE: If the var-args input integer-array parameter is empty, this method shall exit gracefully, and immediately.- Parameters:
page
- Any HTML-Page, usually ones generated byHTMLPage.getPageTokens(...)
, but these may be obtained or created in any fashion so necessary.posArr
- An array of integers which list/identify the nodes in the page to be removed. Because this implementation has been optimized, no error checking will be performed on this input. It is presumed to be sorted, least-to-greatest, and that all values in the array are valid-indices into the vectorized-html parameter'page'
- Code:
- Exact Method Body:
if (posArr.length == 0) return;

int endingInsertPos = page.size() - posArr.length;
int posArrIndex     = 0;
int insertPos       = posArr[0];
int retrievePos     = posArr[0];

// There is very little that can be documented about these two loops. Took 3 hours
// to figure out. Read the variables names for "best documentation"
while (insertPos < endingInsertPos)
{
    // This inner-loop is necessary for when the posArr has consecutive-elements that
    // are *ALSO* consecutive-pointers.
    //
    // For instance, this invocation:
    // Util.removeNodes(page, 4, 5, 6); ...
    // where 4, 5, and 6 are consecutive - the inner while-loop is required.
    //
    // For this invocation:
    // Util.removeNodes(page, 2, 4, 6);
    // the inner-loop is not entered.
    while ((posArrIndex < posArr.length) && (retrievePos == posArr[posArrIndex]))
    { retrievePos++; posArrIndex++; }

    page.setElementAt(page.elementAt(retrievePos++), insertPos++);
}

// Remove all remaining elements in the tail of the array.
page.setSize(page.size() - posArr.length);
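To see what the two loops do, the same compaction can be run on a small Vector<String>, where the result is easy to read. This is only a sketch: CompactionSketch, removeSorted and letters are hypothetical names, and, as with this method, the index list is assumed to be sorted, free of duplicates, and in range.

```java
import java.util.Arrays;
import java.util.Vector;

class CompactionSketch
{
    // Single left-to-right pass: surviving elements are copied down over the
    // removed slots, then the now-unused tail of the Vector is cut off.
    static <T> void removeSorted(Vector<T> v, int... posArr)
    {
        if (posArr.length == 0) return;

        int endingInsertPos = v.size() - posArr.length;
        int posArrIndex     = 0;
        int insertPos       = posArr[0];   // next slot to overwrite
        int retrievePos     = posArr[0];   // next element to inspect

        while (insertPos < endingInsertPos)
        {
            // Skip past every index that is scheduled for removal
            while ((posArrIndex < posArr.length) && (retrievePos == posArr[posArrIndex]))
            { retrievePos++; posArrIndex++; }

            v.setElementAt(v.elementAt(retrievePos++), insertPos++);
        }

        v.setSize(v.size() - posArr.length);
    }

    static Vector<String> letters()
    { return new Vector<>(Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h")); }

    public static void main(String[] args)
    {
        Vector<String> v = letters();
        removeSorted(v, 2, 4, 6);
        System.out.println(v);   // prints [a, b, d, f, h]
    }
}
```

Each surviving element is moved at most once, so the cost is linear in the size of the Vector regardless of how many indices are removed.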
-
removeNodes
public static <T extends HTMLNode> void removeNodes (boolean preserveInputArray, java.util.Vector<T> page, int... nodeList)
This method removes each HTMLNode from the passed-parameter 'page'
listed/identified by the input array'nodeList'
.
NOTE: If the var-args input integer-array parameter is empty, this method shall exit gracefully, and immediately.- Parameters:
preserveInputArray
- This is a convenience input parameter that allows a programmer to "preserve" the original input-parameter integer-array that is passed to this method. It could be argued this parameter is "superfluous" - however, keep in mind that the passed parameter'nodeList'
must be sorted before this method is able to function properly. A sort is performed within the body of this method. If the original order of the integer-array input-parameter must be preserved, it is possible to request that the sort operate on "a clone" of the input-parameter integer-array, instead of the original integer-array 'nodeList'
itself.page
- Any HTML-Page, usually ones generated byHTMLPage.getPageTokens(...)
, but these may be obtained or created in any fashion so necessary.nodeList
- An array of integers which list/identify the nodes in the page to be removed.- Throws:
java.lang.IllegalArgumentException
- If the'nodeList'
contains duplicate entries. Obviously, noHTMLNode
may be removed from theVector<HTMLNode>
more than once.java.lang.IndexOutOfBoundsException
- If the nodeList contains index-pointers / items that are not within the bounds of the passed HTML-PageVector
.- Code:
- Exact Method Body:
if (nodeList.length == 0) return;

// @Safe Var Args
int[] posArr = preserveInputArray ? nodeList.clone() : nodeList;
int   len    = posArr.length;

Arrays.sort(posArr);

// Check for duplicates in the nodeList, no HTMLNode may be removed twice!
for (int i=0; i < (len - 1); i++)
    if (posArr[i] == posArr[i+1]) throw new IllegalArgumentException(
        "The input array contains duplicate items, this is not allowed.\n" +
        "This is since each array-entry is intended to be a pointer/index for items to " +
        "be removed.\nNo item can possibly be removed twice.!"
    );

// Make sure all nodes are within the bounds of the original Vector. (no negative indexes,
// no indexes greater than the size of the Vector)
if ((posArr[0] < 0) || (posArr[len - 1] >= page.size()))
    throw new IndexOutOfBoundsException (
        "The input array contains entries which are not within the bounds of the " +
        "original-passed Vector.\nHTMLPage Vector has: " + page.size() + " elements.\n" +
        "Maximum element in the nodeList is [" + posArr[len - 1] + "], and the minimum " +
        "element is: [" + posArr[0] + "]"
    );

int endingInsertPos = page.size() - posArr.length;
int posArrIndex     = 0;
int insertPos       = posArr[0];
int retrievePos     = posArr[0];

// There is very little that can be documented about these two loops. Took 3 hours
// to figure out. Read the variables names for "best documentation"
while (insertPos < endingInsertPos)
{
    // This inner-loop is necessary for when the posArr has consecutive-elements that
    // are *ALSO* consecutive-pointers.
    //
    // For instance, this invocation:
    // Util.removeNodes(page, 4, 5, 6);
    // where 4, 5, and 6 are consecutive - the inner while-loop is required.
    //
    // For this invocation:
    // Util.removeNodes(page, 2, 4, 6);
    // the inner-loop is not entered.
    while ((posArrIndex < posArr.length) && (retrievePos == posArr[posArrIndex]))
    { retrievePos++; posArrIndex++; }

    page.setElementAt(page.elementAt(retrievePos++), insertPos++);
}

// Remove all remaining elements in the tail of the array.
page.setSize(page.size() - posArr.length);
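The checks performed before the removal loop (sort, duplicate test, bounds test) can be looked at separately. The sketch below is a simplified illustration with hypothetical names (IndexListCheckSketch, sortedAndChecked); it always works on a clone, which corresponds to passing 'true' for 'preserveInputArray', and its exception messages are shortened.

```java
import java.util.Arrays;

class IndexListCheckSketch
{
    // Returns a sorted copy of 'indices' after validating it against a Vector of
    // length 'size'.  The caller's array keeps its original order.
    static int[] sortedAndChecked(int[] indices, int size)
    {
        int[] posArr = indices.clone();
        Arrays.sort(posArr);

        // After sorting, any duplicate values must be adjacent
        for (int i = 0; i < posArr.length - 1; i++)
            if (posArr[i] == posArr[i + 1])
                throw new IllegalArgumentException("Duplicate index: " + posArr[i]);

        // After sorting, only the smallest and largest values need a bounds check
        if ((posArr.length > 0) && ((posArr[0] < 0) || (posArr[posArr.length - 1] >= size)))
            throw new IndexOutOfBoundsException("Index outside of [0, " + size + ")");

        return posArr;
    }

    public static void main(String[] args)
    {
        int[] original = { 6, 2, 4 };
        System.out.println(Arrays.toString(sortedAndChecked(original, 8)));
        System.out.println(Arrays.toString(original));
    }
}
```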
-
replaceRange
public static void replaceRange(java.util.Vector<HTMLNode> page, DotPair range, java.util.Vector<HTMLNode> newNodes)
- Code:
- Exact Method Body:
replaceRange(page, range.start, range.end+1, newNodes);
-
replaceRange
public static void replaceRange(java.util.Vector<HTMLNode> page, int sPos, int ePos, java.util.Vector<HTMLNode> newNodes)
Replaces any and all HTMLNode's located between the Vector locations 'sPos' (inclusive) and 'ePos' (exclusive). By exclusive, this means that the HTMLNode located at position 'ePos' will not be replaced, but the one at 'sPos' is replaced.
The size of the Vector will change by newNodes.size() - (ePos - sPos). The contents situated between Vector location sPos and sPos + newNodes.size() will, indeed, be the contents of the 'newNodes' parameter.
- Parameters:
page
- Any Java HTML page, constructed ofHTMLNode (TagNode & TextNode)
sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.newNodes
- Any Java HTML page-Vector
ofHTMLNode
.- Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
pollRange(Vector, int, int)
,removeRange(Vector, int, int)
,replaceRange(Vector, DotPair, Vector)
- Code:
- Exact Method Body:
// Torello.Java.LV
LV l = new LV(sPos, ePos, page);

int oldSize   = ePos - sPos;
int newSize   = newNodes.size();
int insertPos = sPos;
int i         = 0;

while ((i < newSize) && (i < oldSize))
    page.setElementAt(newNodes.elementAt(i++), insertPos++);

// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
// CASE ONE:
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
if (newSize == oldSize) return;

// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
// CASE TWO:
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
//
// The new Vector is SMALLER than the old sub-range
// The rest of the nodes just need to be trashed
//
// OLD-WAY: (Before realizing what Vector.subList is actually doing)
// Util.removeRange(page, insertPos, ePos);
if (newSize < oldSize) page.subList(insertPos, ePos).clear();

// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
// CASE THREE:
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
//
// The new Vector is BIGGER than the old sub-range
// There are still more nodes to insert.
else page.addAll(ePos, newNodes.subList(i, newSize));
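The three cases (same size, smaller, bigger) and the resulting change in Vector size can be checked on a plain Vector<String>. This is a sketch under stated assumptions: ReplaceRangeSketch and abcde are hypothetical names, no LV bounds checking is reproduced, and the replacement is taken as a java.util.List rather than a Vector.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

class ReplaceRangeSketch
{
    // Replaces v[sPos, ePos) with newItems.  Slots that both ranges share are
    // overwritten in place; only the difference is removed or inserted.
    static <T> void replaceRange(Vector<T> v, int sPos, int ePos, List<T> newItems)
    {
        int oldSize = ePos - sPos;
        int newSize = newItems.size();
        int shared  = Math.min(oldSize, newSize);

        for (int i = 0; i < shared; i++) v.set(sPos + i, newItems.get(i));

        // Replacement is shorter: drop the left-over old elements
        if (newSize < oldSize) v.subList(sPos + shared, ePos).clear();

        // Replacement is longer: insert the remaining new elements
        else if (newSize > oldSize) v.addAll(ePos, newItems.subList(shared, newSize));
    }

    static Vector<String> abcde()
    { return new Vector<>(Arrays.asList("a", "b", "c", "d", "e")); }

    public static void main(String[] args)
    {
        Vector<String> v = abcde();
        replaceRange(v, 1, 3, Arrays.asList("X", "Y", "Z"));
        System.out.println(v);   // prints [a, X, Y, Z, d, e]
    }
}
```

In the run above the Vector grows from 5 to 6 elements, which is newItems.size() - (ePos - sPos) = 3 - 2 = 1.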
-
removeRange
public static <T extends HTMLNode> int removeRange (java.util.Vector<T> page, int sPos, int ePos)
Java's java.util.Vector class does not allow public access to its removeRange(start, end) method, which is declared protected in the Vector class. This method provides exactly that operation, and nothing else.
- Parameters:
page
- Any Java HTML page, constructed ofHTMLNode (TagNode & TextNode)
sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- the number of nodes removed.
- Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
pollRange(Vector, int, int)
,removeRange(Vector, DotPair)
- Code:
- Exact Method Body:
// Torello.Java.LV
LV l = new LV(sPos, ePos, page);

// According to the Sun-Oracle Docs, the returned sublist "mirrors" the original vector,
// which means that when it is changed, so is the original vector.
page.subList(l.start, l.end).clear();

return l.size();

/*
// BEFORE DISCOVERING THE METHOD Vector.subList(start, end), this is how this worked.
// It seemed very inefficient before realizing what was actually happening.

// Shift the nodes in position Vector[l.end through page.size()] to vector-position
// Vector[l.start]
int end = page.size() - l.end - 1;

for (int i=0; i <= end; i++)
    page.setElementAt(page.elementAt(l.end + i), l.start + i);

// Number of nodes to remove
int numToRemove = l.end - l.start;

// Remove the tail - all nodes starting at:
// vector-position[page.size() - (l.end - l.start)]
page.setSize(page.size() - numToRemove);

return numToRemove;
*/
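The subList(...).clear() idiom used in the method body works on any java.util.Vector, because the sub-list is a view backed by the original. A minimal sketch follows, with hypothetical names (RemoveRangeSketch, abcde). It handles only the negative-'ePos' convention described in the parameter notes and leaves all other bounds checking to subList itself.

```java
import java.util.Arrays;
import java.util.Vector;

class RemoveRangeSketch
{
    // Removes v[sPos, ePos) and returns how many elements were removed.
    static <T> int removeRange(Vector<T> v, int sPos, int ePos)
    {
        // A negative ePos means "through the end of the Vector"
        if (ePos < 0) ePos = v.size();

        // The sub-list is a view: clearing it removes the range from 'v'
        v.subList(sPos, ePos).clear();

        return ePos - sPos;
    }

    static Vector<String> abcde()
    { return new Vector<>(Arrays.asList("a", "b", "c", "d", "e")); }

    public static void main(String[] args)
    {
        Vector<String> v = abcde();
        System.out.println(removeRange(v, 1, 3));   // prints 2
        System.out.println(v);                      // prints [a, d, e]
    }
}
```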
-
removeRange
public static int removeRange(java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return removeRange(html, dp.start, dp.end + 1);
-
pollRange
public static java.util.Vector<HTMLNode> pollRange (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
Java's java.util.Vector class does not allow public access to its removeRange(start, end) method, which is listed as 'protected' in Java's documentation for class Vector. This method goes one step further and performs a 'Poll' operation: the nodes are removed from the input Vector, placed in a separate Vector, and returned as the result of this method.
- Parameters:
html
- This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card'? extends HTMLNode'
means is this method can receive aVector<TagNode>, Vector<TextNode>
or aVector<CommentNode>
, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the'NodeSearch'
package. The most common vector-type used isVector<HTMLNode>
.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter. This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- A complete list (
Vector<HTMLNode>
) of the nodes that were removed. - Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
removeRange(Vector, int, int)
,removeRange(Vector, DotPair)
,pollRange(Vector, DotPair)
- Code:
- Exact Method Body:
// The original version of this method is preserved inside comments at the bottom of this
// method. Prior to seeing the Sun-Oracle Docs explaining that the return from the SubList
// operation "mirrors changes" back to the original vector, the code in the comments is
// how this method was accomplished.
LV l = new LV(html, sPos, ePos);
Vector<HTMLNode> ret = new Vector<HTMLNode>(l.end - l.start);
List<? extends HTMLNode> list = html.subList(l.start, l.end);

// Copy the Nodes into the return Vector that the end-user receives
ret.addAll(list);

// Clear the nodes out of the original Vector. The Sun-Oracle Docs
// state that the returned sub-list is "mirrored back into" the original
list.clear();

// Return the Vector to the user. Note that the List<HTMLNode> CANNOT be returned,
// because of its mirror-qualities, and because this method expects a vector.
return ret;

/*
// BEFORE READING ABOUT Vector.subList(...), this is how this was accomplished:
// NOTE: It isn't so clear how the List<HTMLNode> works - likely it doesn't actually
// create any new memory-allocated arrays, it is just an "overlay"

// Copy the elements from the input vector into the return vector
for (int i=l.start; i < l.end; i++) ret.add(html.elementAt(i));

// Remove the range from the input vector (this is the meaning of 'poll')
Util.removeRange(html, sPos, ePos);

return ret;
*/
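The copy-then-clear order matters here: the sub-list view must be copied before it is cleared, since clearing it empties the view as well. The sketch below shows the pattern on a Vector<String>; PollRangeSketch and abcde are hypothetical names, and no LV bounds checking is reproduced.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

class PollRangeSketch
{
    // Removes v[sPos, ePos) from 'v' and returns the removed elements.
    static <T> Vector<T> pollRange(Vector<T> v, int sPos, int ePos)
    {
        // A view onto the range, backed by 'v' (no elements are copied yet)
        List<T> view = v.subList(sPos, ePos);

        // Copy first: the Vector(Collection) constructor makes an independent copy
        Vector<T> ret = new Vector<>(view);

        // Then clear the view, which removes the range from 'v' itself
        view.clear();

        return ret;
    }

    static Vector<String> abcde()
    { return new Vector<>(Arrays.asList("a", "b", "c", "d", "e")); }

    public static void main(String[] args)
    {
        Vector<String> v = abcde();
        System.out.println(pollRange(v, 1, 3));   // prints [b, c]
        System.out.println(v);                    // prints [a, d, e]
    }
}
```

Calling pollRange(v, 0, pos) gives the behavior of a 'split': the head of the Vector is returned and the element at 'pos' becomes the new first element.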
-
pollRange
public static java.util.Vector<HTMLNode> pollRange (java.util.Vector<? extends HTMLNode> html, DotPair dp)
- Code:
- Exact Method Body:
return pollRange(html, dp.start, dp.end + 1);
-
split
public static java.util.Vector<HTMLNode> split (java.util.Vector<? extends HTMLNode> html, int pos)
This removes every element from the Vector beginning at position 0, all the way to position 'pos' (exclusive), and returns the removed elements. The element at elementAt(pos) remains in the original input Vector; this is what 'exclusive' means.
- Parameters:
html
- This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card'? extends HTMLNode'
means is this method can receive aVector<TagNode>, Vector<TextNode>
or aVector<CommentNode>
, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the'NodeSearch'
package. The most common vector-type used isVector<HTMLNode>
.pos
- Any position within the range of the inputVector
.- Returns:
- The elements in the
Vector
from position:0 ('zero')
all the way to position:'pos'
- Code:
- Exact Method Body:
return pollRange(html, 0, pos);
-
removeFirstLast
public static void removeFirstLast (java.util.Vector<? extends HTMLNode> html)
Removes the first and last element of a vectorized-HTML web-page, or sub-page. Generally, this could be used to remove the surrounding tags '<DIV>' ... '</DIV>', or something similar.
IMPORTANT: This method WILL NOT CHECK whether there are matching HTML open-and-close tags at the beginning and end of this sub-section. Generally, though, that is how this method may be used.
- Parameters:
html
- This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card'? extends HTMLNode'
means is this method can receive aVector<TagNode>, Vector<TextNode>
or aVector<CommentNode>
, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the'NodeSearch'
package. The most common vector-type used isVector<HTMLNode>
.- Throws:
java.lang.IllegalArgumentException
- If theVector
has fewer than two elements.- Code:
- Exact Method Body:
int size = html.size();

if (size < 2) throw new IllegalArgumentException(
    "You have requested that the first and last elements the input 'page' parameter (a vector) be removed. " +
    "However, the vector size is only [" + size + "], so this cannot be performed."
);

// NOTE: *** This removes elementAt(0) and elementAt(size-1)
// *** NOT ALL ELEMENTS BETWEEN 0 and (size-1)
Util.removeNodesOPT(html, 0, size-1);
-
-