Package Torello.HTML
Class Util.Remove
- java.lang.Object
-
- Torello.HTML.Util.Remove
-
- Enclosing class:
- Util
public static class Util.Remove extends java.lang.Object
Source Code: Torello/HTML/Util.java
File Size: 28,877 Bytes, Line Count: 647 '\n' Characters Found
Stateless Class: This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.
- 1 Constructor(s), 1 declared private, zero-argument constructor
- 22 Method(s), 22 declared static
- 0 Field(s)
-
-
Method Summary
Remove all CommentNode instances
- static int allCommentNodes(Vector<HTMLNode> page)
- static int allCommentNodes(Vector<HTMLNode> page, int sPos, int ePos)
- static int allCommentNodes(Vector<HTMLNode> page, DotPair dp)
Remove all TagNode instances
- static int allTagNodes(Vector<HTMLNode> page)
- static int allTagNodes(Vector<HTMLNode> page, int sPos, int ePos)
- static int allTagNodes(Vector<HTMLNode> page, DotPair dp)
Remove all TextNode instances
- static int allTextNodes(Vector<HTMLNode> page)
- static int allTextNodes(Vector<HTMLNode> page, int sPos, int ePos)
- static int allTextNodes(Vector<HTMLNode> page, DotPair dp)
Remove all Attributes from all TagNode instances
- static int allInnerTags(Vector<? super TagNode> html, int sPos, int ePos)
- static int allInnerTags(Vector<? super TagNode> html, DotPair dp)
- static int allInnerTags(Vector<HTMLNode> html)
Remove <SCRIPT> Node Blocks
- static int scriptNodeBlocks(Vector<? extends HTMLNode> html)
Remove <STYLE> Node Blocks
- static int styleNodeBlocks(Vector<? extends HTMLNode> html)
java.util.Vector Improvements: Remove a Sub-Range of Nodes
- static int range(Vector<? extends HTMLNode> html, DotPair dp)
- static <T extends HTMLNode> int range(Vector<T> page, int sPos, int ePos)
java.util.Vector Improvements: Remove Nodes by Vector-Index
- static <T extends HTMLNode> void nodes(boolean preserveInputArray, Vector<T> page, int... nodeList)
- static <T extends HTMLNode> void nodesOPT(Vector<T> page, int... posArr)
Function for Removing 'Empty' Tags
- static int inclusiveEmpty(Vector<HTMLNode> page, int sPos, int ePos, String... htmlTags)
- static int inclusiveEmpty(Vector<HTMLNode> page, String... htmlTags)
- static int inclusiveEmpty(Vector<HTMLNode> page, DotPair dp, String... htmlTags)
Miscellaneous Remove Operations
- static void firstLast(Vector<? extends HTMLNode> html)
-
-
-
Method Detail
-
allTextNodes
public static int allTextNodes(java.util.Vector<HTMLNode> page)
-
allTextNodes
public static int allTextNodes(java.util.Vector<HTMLNode> page, DotPair dp)
-
allTextNodes
public static int allTextNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos)
Takes a sub-section of an HTML Vector and removes all TextNode instances present.
- Parameters:
page - Any HTML page
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The number of HTML TextNode's that were removed
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
TextNode, nodesOPT(Vector, int[])
- Code:
- Exact Method Body:
IntStream.Builder b = IntStream.builder();
LV l = new LV(page, sPos, ePos);

// Use Java-Streams to build the list of nodes that are valid text-nodes.
for (int i=l.start; i < l.end; i++)
    if (page.elementAt(i).isTextNode())
        b.add(i);

// Build the stream and convert it to an int[] (integer-array)
int[] posArr = b.build().toArray();

// The integer array is guaranteed to be sorted, and contain valid vector-indices.
nodesOPT(page, posArr);

return posArr.length;
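The sPos / ePos rules documented above can be modelled in a few lines. This is a minimal sketch, not the library's LV class: the names RangeSketch and resolve are hypothetical, and the checks simply follow the 'Throws' list as it is written here.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical helper (not part of Torello.HTML) that mirrors the documented range rules.
final class RangeSketch
{
    // Returns {start, end}, where 'start' is inclusive and 'end' is exclusive.
    static int[] resolve(Vector<?> v, int sPos, int ePos)
    {
        int size = v.size();

        // A negative 'ePos' is first reset to the size of the Vector.
        if (ePos < 0) ePos = size;

        if ((sPos < 0) || (sPos >= size))
            throw new IndexOutOfBoundsException("sPos is out of range: " + sPos);

        if ((ePos == 0) || (ePos > size))
            throw new IndexOutOfBoundsException("ePos is out of range: " + ePos);

        if (sPos > ePos)
            throw new IndexOutOfBoundsException("sPos is larger than ePos");

        return new int[] { sPos, ePos };
    }

    public static void main(String[] args)
    {
        Vector<String> v = new Vector<>(Arrays.asList("a", "b", "c", "d"));

        // A negative ePos means "up to the end of the Vector"
        System.out.println(Arrays.toString(resolve(v, 1, -1)));
    }
}
```

With a four-element Vector, resolve(v, 1, -1) yields the range 1 (inclusive) to 4 (exclusive), while resolve(v, 3, 2) throws.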
-
allTagNodes
public static int allTagNodes(java.util.Vector<HTMLNode> page)
-
allTagNodes
public static int allTagNodes(java.util.Vector<HTMLNode> page, DotPair dp)
-
allTagNodes
public static int allTagNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos)
Takes a sub-section of an HTML Vector and removes all TagNode instances present.
- Parameters:
page - Any HTML page
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The number of HTML TagNode's that were removed
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
TagNode, nodesOPT(Vector, int[])
- Code:
- Exact Method Body:
IntStream.Builder b = IntStream.builder();
LV l = new LV(page, sPos, ePos);

// Use Java-Streams to build the list of nodes that are valid tag-nodes.
for (int i=l.start; i < l.end; i++)
    if (page.elementAt(i).isTagNode())
        b.add(i);

// Build the stream and convert it to an int[] (integer-array)
int[] posArr = b.build().toArray();

// The integer array is guaranteed to be sorted, and contain valid vector-indices.
nodesOPT(page, posArr);

return posArr.length;
-
allCommentNodes
public static int allCommentNodes(java.util.Vector<HTMLNode> page)
-
allCommentNodes
public static int allCommentNodes(java.util.Vector<HTMLNode> page, DotPair dp)
-
allCommentNodes
public static int allCommentNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos)
Takes a sub-section of an HTML Vector and removes all CommentNode instances present.
- Parameters:
page - Any HTML page
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The number of HTML CommentNode's that were removed
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
CommentNode, nodesOPT(Vector, int[])
- Code:
- Exact Method Body:
IntStream.Builder b = IntStream.builder();
LV l = new LV(page, sPos, ePos);

// Use Java-Streams to build the list of nodes that are valid comment-nodes.
for (int i=l.start; i < l.end; i++)
    if (page.elementAt(i).isCommentNode())
        b.add(i);

// Build the stream and convert it to an int[] (integer-array)
int[] posArr = b.build().toArray();

// The integer array is guaranteed to be sorted, and contain valid vector-indices.
nodesOPT(page, posArr);

return posArr.length;
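The three 'all...Nodes' method bodies share one pattern: walk the range in ascending order, collect the matching indices in an IntStream.Builder, and convert the result into an int[]. The sketch below isolates that pattern using plain String elements in place of HTMLNode. The class IndexCollectSketch and its method are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.Vector;
import java.util.function.Predicate;
import java.util.stream.IntStream;

// Hypothetical stand-alone illustration of the index-collection step.
final class IndexCollectSketch
{
    // Returns the indices in [start, end) whose element satisfies 'test'.
    // Because the loop runs upward, the returned array is sorted and duplicate-free.
    static <T> int[] matchingIndices(Vector<T> v, int start, int end, Predicate<? super T> test)
    {
        IntStream.Builder b = IntStream.builder();

        for (int i = start; i < end; i++)
            if (test.test(v.elementAt(i)))
                b.add(i);

        return b.build().toArray();
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(
            Arrays.asList("<p>", "hello", "<!-- c -->", "</p>", "<!-- d -->"));

        // Strings that look like comments stand in for CommentNode instances here.
        int[] posArr = matchingIndices(page, 0, page.size(), s -> s.startsWith("<!--"));

        System.out.println(Arrays.toString(posArr));
    }
}
```

A sorted, duplicate-free index array is exactly the input that nodesOPT presumes, which is why the method bodies above can skip the checks performed by nodes.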
-
allInnerTags
public static int allInnerTags(java.util.Vector<HTMLNode> html)
-
allInnerTags
public static int allInnerTags(java.util.Vector<? super TagNode> html, DotPair dp)
-
allInnerTags
public static int allInnerTags(java.util.Vector<? super TagNode> html, int sPos, int ePos)
This method removes all inner-tags (all attributes) from every TagNode inside of an HTML page. It does this by replacing every TagNode in the Vector with the pre-instantiated, publicly-available TagNode which can be obtained by a call to HTMLTags.hasTag(token, TC).
Replacing TagNode's:
This method determines whether a fresh TagNode is to be inserted by measuring the length of the internal HTMLNode.str field (a String field). If the length of TagNode.str is not equal to the length of the HTML token TagNode.tok plus 2, then the node is replaced with a fresh, pre-instantiated node.
The '+2' figure comes from the additional characters '<' and '>' that start and end every HTML TagNode.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The signature uses the wildcard expression '? super TagNode', so a Vector<TagNode> or a Vector<HTMLNode> will be accepted by this parameter. Vector<TagNode> instances are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The number of TagNode elements that were replaced with zero-attribute HTML Element Tags.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
java.lang.ClassCastException - If 'html' contains references that do not inherit HTMLNode.
- Code:
- Exact Method Body:
int ret = 0;
LV l = new LV(sPos, ePos, html);
TagNode tn;

for (int i = (l.end-1); i >= l.start; i--)

    if ((tn = ((HTMLNode) html.elementAt(i)).openTagPWA()) != null)
    {
        ret++;

        // HTMLTags.hasTag(tok, TC) gets an empty and pre-instantiated TagNode,
        // where TagNode.tok == 'tn.tok' and TagNode.isClosing = false
        html.setElementAt(HTMLTags.hasTag(tn.tok, TC.OpeningTags), i);
    }

return ret;
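The "token length plus 2" rule described under 'Replacing TagNode's' can be shown with plain strings. This is only an illustration of the arithmetic: InnerTagSketch, hasExtraContent and bare are hypothetical names, and the strings stand in for the TagNode.str and TagNode.tok fields.

```java
// Hypothetical illustration of the length comparison; not the library's implementation.
final class InnerTagSketch
{
    // True when the tag text is longer (or shorter) than '<' + token + '>',
    // which is the condition the description gives for replacing a node.
    static boolean hasExtraContent(String str, String tok)
    { return str.length() != (tok.length() + 2); }

    // The attribute-free opening tag for a token.
    static String bare(String tok)
    { return "<" + tok + ">"; }

    public static void main(String[] args)
    {
        // "<div>" has 5 characters, and "div".length() + 2 is also 5: nothing to strip.
        System.out.println(hasExtraContent("<div>", "div"));

        // An attribute makes the tag text longer than token-length plus 2.
        System.out.println(hasExtraContent("<div class='x'>", "div"));
    }
}
```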
-
styleNodeBlocks
public static int styleNodeBlocks (java.util.Vector<? extends HTMLNode> html)
Removes all HTML 'style' Node blocks.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
The number of <STYLE>-Node Blocks that were removed
- Exact Method Body:
int removeCount = 0;

while (TagNodeRemoveInclusive.first(html, "style") > 0) removeCount++;

return removeCount;
-
scriptNodeBlocks
public static int scriptNodeBlocks (java.util.Vector<? extends HTMLNode> html)
Removes all 'script' Node blocks.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
The number of <SCRIPT>-Node Blocks that were removed
- Exact Method Body:
int removeCount = 0;

while (TagNodeRemoveInclusive.first(html, "script") > 0) removeCount++;

return removeCount;
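Both styleNodeBlocks and scriptNodeBlocks repeat a "remove the first block" call until it reports that nothing was removed, counting one block per successful call. The sketch below imitates that loop over a Vector of plain strings. It assumes, based only on the '> 0' test in the method bodies, that the removal step returns a positive number when a block was removed. BlockRemovalSketch and its methods are hypothetical and far simpler than TagNodeRemoveInclusive.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical string-based stand-in for the "remove first block, repeat" loop.
final class BlockRemovalSketch
{
    static boolean isOpening(String node, String tag)
    { return node.equals("<" + tag + ">") || node.startsWith("<" + tag + " "); }

    // Removes the first opening-tag ... closing-tag block.
    // Returns the number of elements removed, or zero if no complete block was found.
    static int removeFirst(Vector<String> page, String tag)
    {
        String closing = "</" + tag + ">";

        for (int i = 0; i < page.size(); i++)
        {
            if (! isOpening(page.elementAt(i), tag)) continue;

            for (int j = i + 1; j < page.size(); j++)
            {
                if (page.elementAt(j).equals(closing))
                {
                    page.subList(i, j + 1).clear();
                    return j - i + 1;
                }
            }
        }

        return 0;
    }

    // Same loop shape as the method bodies above: count blocks, not elements.
    static int removeAll(Vector<String> page, String tag)
    {
        int removeCount = 0;

        while (removeFirst(page, tag) > 0) removeCount++;

        return removeCount;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<html>", "<script src='a.js'>", "</script>", "<p>", "hi", "</p>",
            "<script>", "x=1", "</script>", "</html>"));

        System.out.println(removeAll(page, "script") + " block(s) removed, page is now " + page);
    }
}
```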
-
range
public static <T extends HTMLNode> int range(java.util.Vector<T> page, int sPos, int ePos)
Java's java.util.Vector class does not allow public access to the removeRange(start, end) function; Java's documentation for the Vector class lists it as protected. This method removes the requested range of nodes, and nothing else.
- Parameters:
page - Any Java HTML page, constructed of HTMLNode (TagNode & TextNode)
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
The number of nodes removed.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
Util.pollRange(Vector, int, int), range(Vector, DotPair)
- Exact Method Body:
// Torello.Java.LV
LV l = new LV(sPos, ePos, page);

// According to the Sun-Oracle Docs, the returned sublist "mirrors" the original vector,
// which means that when it is changed, so is the original vector.
page.subList(l.start, l.end).clear();

return l.size();
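The subList(...).clear() idiom used in the method body is standard java.util behavior and can be tried without the library. In this sketch, RangeRemoveSketch and removeRange are hypothetical names, and the range arguments are taken as already validated.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical stand-alone illustration of removing a range through a sub-list view.
final class RangeRemoveSketch
{
    // Removes elements in [start, end) and returns how many were removed.
    static <T> int removeRange(Vector<T> v, int start, int end)
    {
        // subList returns a view backed by the Vector, so clearing the view
        // removes the same elements from the Vector itself.
        v.subList(start, end).clear();

        return end - start;
    }

    public static void main(String[] args)
    {
        Vector<String> v = new Vector<>(Arrays.asList("a", "b", "c", "d", "e"));

        int removed = removeRange(v, 1, 3);

        System.out.println(removed + " removed, remaining: " + v);
    }
}
```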
-
nodesOPT
public static <T extends HTMLNode> void nodesOPT(java.util.Vector<T> page, int... posArr)
OPT: Optimized
This method does the same thing as nodes(boolean, Vector, int[]), but all error checking is skipped, and the input integer array is presumed to have been sorted. There are no guarantees about the behavior of this method if the input array 'posArr' is not sorted, least-to-greatest, or if there are duplicate or negative values in this array.
Empty Var-Args:
If the var-args input integer-array parameter is empty, this method shall exit gracefully (and immediately).
- Parameters:
page - Any HTML-Page, usually ones generated by HTMLPage.getPageTokens, but these may be obtained or created in any fashion so necessary.
posArr - An array of integers which list/identify the nodes in the page to be removed. Because this implementation has been optimized, no error checking will be performed on this input. It is presumed to be sorted, least-to-greatest, and that all values in the array are valid-indices into the vectorized-html parameter 'page'
- Code:
- Exact Method Body:
if (posArr.length == 0) return;

int endingInsertPos = page.size() - posArr.length;
int posArrIndex     = 0;
int insertPos       = posArr[0];
int retrievePos     = posArr[0];

// There is very little that can be documented about these two loops. Took 3 hours
// to figure out. Read the variables names for "best documentation"
while (insertPos < endingInsertPos)
{
    // This inner-loop is necessary for when the posArr has consecutive-elements that
    // are *ALSO* consecutive-pointers.
    //
    // For instance, this invocation:
    // Util.removeNodes(page, 4, 5, 6); ...
    // where 4, 5, and 6 are consecutive - the inner while-loop is required.
    //
    // For this invocation:
    // Util.removeNodes(page, 2, 4, 6);
    // the inner-loop is not entered.
    while ((posArrIndex < posArr.length) && (retrievePos == posArr[posArrIndex]))
    { retrievePos++; posArrIndex++; }

    page.setElementAt(page.elementAt(retrievePos++), insertPos++);
}

// Remove all remaining elements in the tail of the array.
page.setSize(page.size() - posArr.length);
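The two loops above perform an in-place compaction: surviving elements are copied leftward over the removed slots, and the tail is then truncated. The sketch below is a stand-alone generic copy of the same loops that runs on any Vector, so the behavior can be traced on a small input. CompactSketch and removeSorted are hypothetical names, and like nodesOPT the method performs no validation.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical stand-alone copy of the compaction loops, for experimentation only.
final class CompactSketch
{
    // 'posArr' must be sorted ascending, duplicate-free, and hold valid indices.
    static <T> void removeSorted(Vector<T> page, int... posArr)
    {
        if (posArr.length == 0) return;

        int endingInsertPos = page.size() - posArr.length; // size after removal
        int posArrIndex     = 0;                           // next index to skip
        int insertPos       = posArr[0];                   // next slot to fill
        int retrievePos     = posArr[0];                   // next slot to read

        while (insertPos < endingInsertPos)
        {
            // Step the read position past every index that is marked for removal.
            while ((posArrIndex < posArr.length) && (retrievePos == posArr[posArrIndex]))
            { retrievePos++; posArrIndex++; }

            // Copy the next surviving element leftward.
            page.setElementAt(page.elementAt(retrievePos++), insertPos++);
        }

        // Drop the now-stale tail.
        page.setSize(endingInsertPos);
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("a", "b", "c", "d", "e", "f", "g"));

        // Removes "b", "c" and "f"
        removeSorted(page, 1, 2, 5);

        System.out.println(page);
    }
}
```

Each surviving element is moved at most once, so the whole removal is linear in the size of the Vector, whereas calling Vector.remove(int) once per index shifts the tail on every call.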
-
nodes
public static <T extends HTMLNode> void nodes(boolean preserveInputArray, java.util.Vector<T> page, int... nodeList)
This method removes each HTMLNode from the passed-parameter 'page' listed/identified by the input array 'nodeList'.
Empty Var-Args:
If the var-args input integer-array parameter is empty, this method shall exit gracefully (and immediately).
- Parameters:
preserveInputArray - This is a convenience input parameter that allows a programmer to "preserve" the original input-parameter integer-array that is passed to this method. It could be argued this parameter is "superfluous"; however, keep in mind that the passed parameter 'nodeList' must be sorted before this method is able to function properly. There is a sort that's performed within the body of this method. In case the original order of the integer-array input-parameter must be preserved, it is possible to request that the sort operate on "a clone" of the input-parameter integer-array, instead of the original integer-array 'nodeList' itself.
page - Any HTML-Page, usually ones generated by HTMLPage.getPageTokens(...), but these may be obtained or created in any fashion so necessary.
nodeList - An array of integers which list/identify the nodes in the page to be removed.
- Throws:
java.lang.IllegalArgumentException - If the 'nodeList' contains duplicate entries. Obviously, no HTMLNode may be removed from the Vector<HTMLNode> more than once.
java.lang.IndexOutOfBoundsException - If the 'nodeList' contains index-pointers / items that are not within the bounds of the passed HTML-Page Vector.
- Code:
- Exact Method Body:
if (nodeList.length == 0) return;

// @Safe Var Args
int[] posArr = preserveInputArray ? nodeList.clone() : nodeList;
int len = posArr.length;

Arrays.sort(posArr);

// Check for duplicates in the nodeList, no HTMLNode may be removed twice!
for (int i=0; i < (len - 1); i++)
    if (posArr[i] == posArr[i+1]) throw new IllegalArgumentException(
        "The input array contains duplicate items, this is not allowed.\n" +
        "This is since each array-entry is intended to be a pointer/index for items " +
        "to be removed.\nNo item can possibly be removed twice.!"
    );

// Make sure all nodes are within the bounds of the original Vector. (no negative
// indexes, no indexes greater than the size of the Vector)
if ((posArr[0] < 0) || (posArr[len - 1] >= page.size()))
    throw new IndexOutOfBoundsException (
        "The input array contains entries which are not within the bounds of the " +
        "original-passed Vector.\nHTMLPage Vector has: " + page.size() + " elements.\n" +
        "Maximum element in the nodeList is [" + posArr[len - 1] + "], and the " +
        "minimum element is: [" + posArr[0] + "]"
    );

int endingInsertPos = page.size() - posArr.length;
int posArrIndex     = 0;
int insertPos       = posArr[0];
int retrievePos     = posArr[0];

// There is very little that can be documented about these two loops. Took 3 hours
// to figure out. Read the variables names for "best documentation"
while (insertPos < endingInsertPos)
{
    // This inner-loop is necessary for when the posArr has consecutive-elements that
    // are *ALSO* consecutive-pointers.
    //
    // For instance, this invocation:
    // Util.removeNodes(page, 4, 5, 6);
    // where 4, 5, and 6 are consecutive - the inner while-loop is required.
    //
    // For this invocation:
    // Util.removeNodes(page, 2, 4, 6);
    // the inner-loop is not entered.
    while ((posArrIndex < posArr.length) && (retrievePos == posArr[posArrIndex]))
    { retrievePos++; posArrIndex++; }

    page.setElementAt(page.elementAt(retrievePos++), insertPos++);
}

// Remove all remaining elements in the tail of the array.
page.setSize(page.size() - posArr.length);
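The effect of 'preserveInputArray', together with the duplicate and bounds checks, can be seen in isolation. The following sketch reproduces only the validation half of the method. ValidateSketch and sortedAndChecked are hypothetical names, and the Vector is reduced to its size.

```java
import java.util.Arrays;

// Hypothetical stand-alone version of the sort-and-validate step.
final class ValidateSketch
{
    // Returns the sorted index array, or throws if it is not usable for removal.
    static int[] sortedAndChecked(boolean preserveInputArray, int pageSize, int... nodeList)
    {
        if (nodeList.length == 0) return nodeList;

        // Sorting a clone leaves the caller's array in its original order.
        int[] posArr = preserveInputArray ? nodeList.clone() : nodeList;

        Arrays.sort(posArr);

        // After sorting, any duplicates sit next to each other.
        for (int i = 0; i < (posArr.length - 1); i++)
            if (posArr[i] == posArr[i + 1])
                throw new IllegalArgumentException("Duplicate index: " + posArr[i]);

        // After sorting, only the smallest and largest entries need a bounds check.
        if ((posArr[0] < 0) || (posArr[posArr.length - 1] >= pageSize))
            throw new IndexOutOfBoundsException("Index outside [0, " + pageSize + ")");

        return posArr;
    }

    public static void main(String[] args)
    {
        int[] input = { 6, 2, 4 };

        int[] sorted = sortedAndChecked(true, 10, input);

        System.out.println(Arrays.toString(sorted) + " / caller's array: " + Arrays.toString(input));
    }
}
```

Passing 'false' instead sorts the caller's array in place, which saves one allocation when the original order does not matter.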
-
inclusiveEmpty
public static int inclusiveEmpty(java.util.Vector<HTMLNode> page, java.lang.String... htmlTags)
-
inclusiveEmpty
public static int inclusiveEmpty(java.util.Vector<HTMLNode> page, DotPair dp, java.lang.String... htmlTags)
-
inclusiveEmpty
public static int inclusiveEmpty(java.util.Vector<HTMLNode> page, int sPos, int ePos, java.lang.String... htmlTags)
This will do an "Inclusive Search" using the standard class TagNodeInclusiveIterator in the package NodeSearch. Then it will inspect the contents of the subsections. Any subsection that contains no HTMLNode instances between its opening and closing tags, or that contains only "blank-text" (white-space) between them, shall be removed.
Recursive Method:
The search logic performs multiple recursive iterations of itself. For instance, if the user requests that all empty HTML divider (<DIV>) elements be removed, and removing one set of dividers leaves further empty ones (nested <DIV> elements), then an additional removal pass is made. This recursion continues until no empty HTML elements of the types listed by 'htmlTags' remain.
- Parameters:
page - Any vectorized-html page or sub-page.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
htmlTags - The list of inclusive (non-singleton) html elements to search for possibly being empty container tags.
- Returns:
The number of HTMLNode's that were removed.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- Code:
- Exact Method Body:
DotPair       subList;
int           removed = 0;
HNLIInclusive iter    = TagNodeInclusiveIterator.iter(page, htmlTags);
LV            l       = new LV(page, sPos, ePos);

iter.restrictCursor(l);

TOP:
while (iter.hasNext())

    // If there is only the opening & closing pair, with nothing in between,
    // then the pair must be removed because it is "Empty" (Inclusive Empty)
    if ((subList = iter.nextDotPair()).size() == 2)
    {
        iter.remove();
        ePos -= subList.size();
        removed += subList.size();
    }

    else
    {
        // If there is any TagNode in between the start-end pair, then this is NOT
        // EMPTY. In this case, skip to the next start-end opening-closing pair.
        for (int i=(subList.start + 1); i < subList.end; i++)
            if (! page.elementAt(i).isTextNode())
                continue TOP;

        // If there were only TextNode's between an opening-closing TagNode Pair....
        // **AND** those TextNode's are only white-space, then this also considered
        // Inclusively Empty. (Get all TextNode's, and if .trim() reduces the length()
        // to zero, then it was only white-space.)
        if (Util.textNodesString(page, subList).trim().length() == 0)
        {
            iter.remove();
            ePos -= subList.size();
            removed += subList.size();
        }
    }

// This process must be continued recursively, because if any inner, for instance,
// <DIV> ... </DIV> was removed, then the outer list must be re-checked...
if (removed > 0) return removed + Remove.inclusiveEmpty(page, sPos, ePos, htmlTags);
else             return 0;
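The "remove, then re-check" behavior described under 'Recursive Method' can be imitated on a Vector of plain strings. This is a simplified sketch rather than the library's algorithm: EmptyPairSketch and removeEmpty are hypothetical, a loop takes the place of the recursion, only attribute-free opening tags such as "<div>" are recognized, and any string that trims to nothing counts as blank text.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical string-based illustration of repeated "empty pair" removal.
final class EmptyPairSketch
{
    // Removes every <tag> ... </tag> pair that holds nothing but blank text,
    // and repeats until a full pass removes nothing. Returns the element count removed.
    static int removeEmpty(Vector<String> page, String tag)
    {
        String open    = "<" + tag + ">";
        String close   = "</" + tag + ">";
        int    removed = 0;
        boolean changed = true;

        while (changed)
        {
            changed = false;

            for (int i = 0; i < page.size(); i++)
            {
                if (! page.elementAt(i).equals(open)) continue;

                // Skip over white-space-only text that follows the opening tag.
                int j = i + 1;
                while ((j < page.size()) && page.elementAt(j).trim().isEmpty()) j++;

                if ((j < page.size()) && page.elementAt(j).equals(close))
                {
                    page.subList(i, j + 1).clear();
                    removed += (j - i + 1);
                    changed = true;
                    i--; // re-examine whatever slid into position i
                }
            }
        }

        return removed;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<div>", "<div>", "  ", "</div>", "</div>", "<p>", "text", "</p>"));

        // The inner pair goes first; the outer pair is empty only after that.
        System.out.println(removeEmpty(page, "div") + " removed, page is now " + page);
    }
}
```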
-
firstLast
public static void firstLast(java.util.Vector<? extends HTMLNode> html)
Removes the first and last element of a vectorized-HTML web-page, or sub-page. Generally, this could be used to remove the surrounding tags '<DIV>' ... '</DIV>', or something similar.
This method WILL NOT CHECK whether there are matching HTML open-and-close tags at the beginning and end of this sub-section. Generally, though, that is how this method is intended to be used.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Throws:
java.lang.IllegalArgumentException - If the Vector has fewer than two elements.
- Code:
- Exact Method Body:
int size = html.size();

if (size < 2) throw new IllegalArgumentException(
    "You have requested that the first and last elements the input 'page' parameter " +
    "(a vector) be removed. However, the vector size is only [" + size + "], so " +
    "this cannot be performed."
);

// NOTE: *** This removes elementAt(0) and elementAt(size-1)
//       *** NOT ALL ELEMENTS BETWEEN 0 and (size-1)
Util.Remove.nodesOPT(html, 0, size-1);
-
-