Package Torello.HTML
Class Util.Remove
- java.lang.Object
  - Torello.HTML.Util.Remove
- Enclosing class: Util
public static class Util.Remove extends java.lang.Object
Source file: Torello/HTML/Util.java
File Size: 28,853 Bytes, Line Count: 644 ('\n' characters found)
Stateless Class: This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.
- 1 Constructor(s), 1 declared private, zero-argument constructor
- 22 Method(s), 22 declared static
- 0 Field(s)
-
-
Method Summary
Remove all CommentNode instances
    Modifier and Type    Method
    static int           allCommentNodes(Vector<HTMLNode> page)
    static int           allCommentNodes(Vector<HTMLNode> page, int sPos, int ePos)
    static int           allCommentNodes(Vector<HTMLNode> page, DotPair dp)
Remove all TagNode instances
    Modifier and Type    Method
    static int           allTagNodes(Vector<HTMLNode> page)
    static int           allTagNodes(Vector<HTMLNode> page, int sPos, int ePos)
    static int           allTagNodes(Vector<HTMLNode> page, DotPair dp)
Remove all TextNode instances
    Modifier and Type    Method
    static int           allTextNodes(Vector<HTMLNode> page)
    static int           allTextNodes(Vector<HTMLNode> page, int sPos, int ePos)
    static int           allTextNodes(Vector<HTMLNode> page, DotPair dp)
Remove all Attributes from all TagNode instances
    Modifier and Type    Method
    static int           allInnerTags(Vector<? super TagNode> html, int sPos, int ePos)
    static int           allInnerTags(Vector<? super TagNode> html, DotPair dp)
    static int           allInnerTags(Vector<HTMLNode> html)
Remove <SCRIPT> Node Blocks
    Modifier and Type    Method
    static int           scriptNodeBlocks(Vector<? extends HTMLNode> html)
Remove <STYLE> Node Blocks
    Modifier and Type    Method
    static int           styleNodeBlocks(Vector<? extends HTMLNode> html)
java.util.Vector Improvements: Remove a Sub-Range of Nodes
    Modifier and Type                    Method
    static int                           range(Vector<? extends HTMLNode> html, DotPair dp)
    static <T extends HTMLNode> int      range(Vector<T> page, int sPos, int ePos)
java.util.Vector Improvements: Remove Nodes by Vector-Index
    Modifier and Type                    Method
    static <T extends HTMLNode> void     nodes(boolean preserveInputArray, Vector<T> page, int... nodeList)
    static <T extends HTMLNode> void     nodesOPT(Vector<T> page, int... posArr)
Function for Removing 'Empty' Tags
    Modifier and Type    Method
    static int           inclusiveEmpty(Vector<HTMLNode> page, int sPos, int ePos, String... htmlTags)
    static int           inclusiveEmpty(Vector<HTMLNode> page, String... htmlTags)
    static int           inclusiveEmpty(Vector<HTMLNode> page, DotPair dp, String... htmlTags)
Miscellaneous Remove Operations
    Modifier and Type    Method
    static void          firstLast(Vector<? extends HTMLNode> html)
-
-
-
Method Detail
-
allTextNodes
public static int allTextNodes(java.util.Vector<HTMLNode> page)
- Code:
- Exact Method Body:
return allTextNodes(page, 0, -1);
-
allTextNodes
public static int allTextNodes(java.util.Vector<HTMLNode> page, DotPair dp)
- Code:
- Exact Method Body:
return allTextNodes(page, dp.start, dp.end + 1);
-
allTextNodes
public static int allTextNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos)
Takes a sub-section of an HTML Vector and removes every TextNode present.
- Parameters:
page - Any HTML page
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The number of HTML TextNode's that were removed
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
TextNode, nodesOPT(Vector, int[])
- Code:
- Exact Method Body:
    IntStream.Builder b = IntStream.builder();
    LV l = new LV(page, sPos, ePos);

    // Use Java-Streams to build the list of nodes that are valid text-nodes.
    for (int i=l.start; i < l.end; i++)
        if (page.elementAt(i).isTextNode())
            b.add(i);

    // Build the stream and convert it to an int[] (integer-array)
    int[] posArr = b.build().toArray();

    // The integer array is guaranteed to be sorted, and contain valid vector-indices.
    nodesOPT(page, posArr);

    return posArr.length;
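The index-collection step and the 'sPos' / 'ePos' rules above can be illustrated without the library. The following is only a sketch under stated assumptions: the page is modelled as a Vector<String>, a "text node" is assumed to be any string that does not start with '<', and the class and method names are hypothetical rather than part of Torello.HTML.

```java
import java.util.Arrays;
import java.util.Vector;
import java.util.stream.IntStream;

// Hypothetical illustration; not part of the Torello.HTML library.
final class IndexCollectSketch
{
    // Returns the indices in [sPos, ePos) whose element looks like plain text.
    static int[] matchingIndices(Vector<String> page, int sPos, int ePos)
    {
        // A negative ePos is reset to the size of the Vector, as the docs describe
        if (ePos < 0) ePos = page.size();

        // Assumed restatement of the three documented IndexOutOfBoundsException cases
        if (sPos < 0 || sPos >= page.size() || ePos == 0 || ePos > page.size() || sPos > ePos)
            throw new IndexOutOfBoundsException("sPos=" + sPos + ", ePos=" + ePos);

        IntStream.Builder b = IntStream.builder();

        // Indices are added in increasing order, so the resulting array is sorted
        for (int i = sPos; i < ePos; i++)
            if (!page.elementAt(i).startsWith("<")) b.add(i);

        return b.build().toArray();
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(
            Arrays.asList("<p>", "hello", "<b>", "world", "</b>", "</p>"));

        System.out.println(Arrays.toString(matchingIndices(page, 0, -1))); // prints [1, 3]
    }
}
```

Because the indices come out sorted and in-bounds, they satisfy the preconditions that nodesOPT documents for its 'posArr' parameter.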
-
allTagNodes
public static int allTagNodes(java.util.Vector<HTMLNode> page)
- Code:
- Exact Method Body:
return allTagNodes(page, 0, -1);
-
allTagNodes
public static int allTagNodes(java.util.Vector<HTMLNode> page, DotPair dp)
- Code:
- Exact Method Body:
return allTagNodes(page, dp.start, dp.end + 1);
-
allTagNodes
public static int allTagNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos)
Takes a sub-section of an HTML Vector and removes every TagNode present.
- Parameters:
page - Any HTML page
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The number of HTML TagNode's that were removed
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
TagNode, nodesOPT(Vector, int[])
- Code:
- Exact Method Body:
    IntStream.Builder b = IntStream.builder();
    LV l = new LV(page, sPos, ePos);

    // Use Java-Streams to build the list of nodes that are valid tag-nodes.
    for (int i=l.start; i < l.end; i++)
        if (page.elementAt(i).isTagNode())
            b.add(i);

    // Build the stream and convert it to an int[] (integer-array)
    int[] posArr = b.build().toArray();

    // The integer array is guaranteed to be sorted, and contain valid vector-indices.
    nodesOPT(page, posArr);

    return posArr.length;
-
allCommentNodes
public static int allCommentNodes(java.util.Vector<HTMLNode> page)
- Code:
- Exact Method Body:
return allCommentNodes(page, 0, -1);
-
allCommentNodes
public static int allCommentNodes(java.util.Vector<HTMLNode> page, DotPair dp)
- Code:
- Exact Method Body:
return allCommentNodes(page, dp.start, dp.end + 1);
-
allCommentNodes
public static int allCommentNodes(java.util.Vector<HTMLNode> page, int sPos, int ePos)
Takes a sub-section of an HTML Vector and removes every CommentNode present.
- Parameters:
page - Any HTML page
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The number of HTML CommentNode's that were removed
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
CommentNode, nodesOPT(Vector, int[])
- Code:
- Exact Method Body:
    IntStream.Builder b = IntStream.builder();
    LV l = new LV(page, sPos, ePos);

    // Use Java-Streams to build the list of nodes that are valid comment-nodes.
    for (int i=l.start; i < l.end; i++)
        if (page.elementAt(i).isCommentNode())
            b.add(i);

    // Build the stream and convert it to an int[] (integer-array)
    int[] posArr = b.build().toArray();

    // The integer array is guaranteed to be sorted, and contain valid vector-indices.
    nodesOPT(page, posArr);

    return posArr.length;
-
allInnerTags
public static int allInnerTags(java.util.Vector<HTMLNode> html)
- Code:
- Exact Method Body:
return allInnerTags(html, 0, -1);
-
allInnerTags
public static int allInnerTags(java.util.Vector<? super TagNode> html, DotPair dp)
- Code:
- Exact Method Body:
return allInnerTags(html, dp.start, dp.end + 1);
-
allInnerTags
public static int allInnerTags(java.util.Vector<? super TagNode> html, int sPos, int ePos)
This method removes all inner-tags (all attributes) from every TagNode inside of an HTML page. It does this by replacing every TagNode in the Vector with the pre-instantiated, publicly-available TagNode which can be obtained by a call to HTMLTags.hasTag(token, TC).
Replacing TagNode's:
This method determines whether a fresh TagNode is to be inserted by measuring the length of the internal HTMLNode.str field (a String field). If the length of TagNode.str is not equal to the length of the HTML token TagNode.tok plus 2, then the node is replaced with a fresh, pre-instantiated one.
The '+2' figure comes from the additional characters '<' and '>' that start and end every HTML TagNode.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? super TagNode' means that a Vector<TagNode>, Vector<HTMLNode> or Vector<Object> will all be accepted by this parameter, whereas a Vector<TextNode> or Vector<CommentNode> will not.
A Vector<TagNode> is often returned as a search result from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The number of TagNode elements that were replaced with zero-attribute HTML Element Tags.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
java.lang.ClassCastException - If 'html' contains references that do not inherit HTMLNode.
- Code:
- Exact Method Body:
    int ret = 0;
    LV l = new LV(sPos, ePos, html);
    TagNode tn;

    for (int i = (l.end-1); i >= l.start; i--)
        if ((tn = ((HTMLNode) html.elementAt(i)).openTagPWA()) != null)
        {
            ret++;

            // HTMLTags.hasTag(tok, TC) gets an empty and pre-instantiated TagNode,
            // where TagNode.tok == 'tn.tok' and TagNode.isClosing = false
            html.setElementAt(HTMLTags.hasTag(tn.tok, TC.OpeningTags), i);
        }

    return ret;
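The "token length plus 2" test described above can be shown on plain strings. This is a minimal sketch, assuming tags are stored as strings such as "<div class='a'>"; the class, the token() helper, and the string representation are all hypothetical stand-ins for TagNode.str and TagNode.tok, not the library's implementation.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical illustration; not part of the Torello.HTML library.
final class StripAttributesSketch
{
    // Extracts the element name from an opening tag, e.g. "<a href='x'>" yields "a"
    static String token(String tag)
    {
        int end = 1;
        while (end < tag.length() && Character.isLetterOrDigit(tag.charAt(end))) end++;
        return tag.substring(1, end).toLowerCase();
    }

    // Replaces every opening tag that carries extra characters with a bare "<tok>"
    static int stripAttributes(Vector<String> page)
    {
        int replaced = 0;

        for (int i = page.size() - 1; i >= 0; i--)
        {
            String s = page.elementAt(i);

            // Skip text, closing tags, and comments / doctype declarations
            if (!s.startsWith("<") || s.startsWith("</") || s.startsWith("<!")) continue;

            String tok = token(s);

            // '<' + tok + '>' has exactly tok.length() + 2 characters
            if (s.length() != tok.length() + 2)
            {
                page.setElementAt("<" + tok + ">", i);
                replaced++;
            }
        }

        return replaced;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(
            Arrays.asList("<div class='a'>", "text", "<b>", "</b>", "</div>"));

        System.out.println(stripAttributes(page) + " " + page);
    }
}
```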
-
styleNodeBlocks
public static int styleNodeBlocks (java.util.Vector<? extends HTMLNode> html)
Removes all HTML 'style' Node blocks.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The number of <STYLE>-Node Blocks that were removed
- Code:
- Exact Method Body:
    int removeCount = 0;

    while (TagNodeRemoveInclusive.first(html, "style") > 0) removeCount++;

    return removeCount;
-
scriptNodeBlocks
public static int scriptNodeBlocks (java.util.Vector<? extends HTMLNode> html)
Removes all 'script' Node blocks.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The number of <SCRIPT>-Node Blocks that were removed
- Code:
- Exact Method Body:
    int removeCount = 0;

    while (TagNodeRemoveInclusive.first(html, "script") > 0) removeCount++;

    return removeCount;
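Both styleNodeBlocks and scriptNodeBlocks use the same "remove the first block, repeat until nothing is found" loop. The sketch below imitates that loop on a Vector<String>. The removeFirstBlock helper is a hypothetical stand-in for TagNodeRemoveInclusive.first, whose real behavior is not shown on this page; it is assumed here to return the number of elements removed, or 0 when no block exists.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical illustration; not part of the Torello.HTML library.
final class RemoveBlocksSketch
{
    // True if 'node' is an opening tag for 'tag', with or without attributes
    static boolean isOpen(String node, String tag)
    {
        String open = "<" + tag;
        if (node.length() <= open.length()) return false;
        if (!node.toLowerCase().startsWith(open)) return false;
        char c = node.charAt(open.length());
        return c == '>' || Character.isWhitespace(c);
    }

    // Removes the first "<tag ...> ... </tag>" run; returns the number of elements removed
    static int removeFirstBlock(Vector<String> page, String tag)
    {
        String close = "</" + tag + ">";

        for (int start = 0; start < page.size(); start++)
        {
            if (!isOpen(page.elementAt(start), tag)) continue;

            for (int end = start + 1; end < page.size(); end++)
                if (page.elementAt(end).equalsIgnoreCase(close))
                {
                    page.subList(start, end + 1).clear();
                    return end - start + 1;
                }
        }

        return 0;
    }

    // Same loop shape as the method bodies above: count blocks, not nodes
    static int removeAllBlocks(Vector<String> page, String tag)
    {
        int removeCount = 0;
        while (removeFirstBlock(page, tag) > 0) removeCount++;
        return removeCount;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<html>", "<script src='a.js'>", "var x;", "</script>", "<p>", "hi", "</p>", "</html>"));

        System.out.println(removeAllBlocks(page, "script") + " " + page);
    }
}
```

Note that the returned value counts blocks, which matches the documented return value ("The number of ... Node Blocks that were removed"), not the number of individual nodes.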
-
range
public static <T extends HTMLNode> int range(java.util.Vector<T> page, int sPos, int ePos)
Java's java.util.Vector class does not expose removeRange(start, end) publicly; the method is declared protected. This method performs exactly that range removal, and nothing else.
- Parameters:
page - Any Java HTML page, constructed of HTMLNode (TagNode & TextNode)
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The number of nodes removed.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
Util.pollRange(Vector, int, int), range(Vector, DotPair)
- Code:
- Exact Method Body:
    // Torello.Java.LV
    LV l = new LV(sPos, ePos, page);

    // According to the Sun-Oracle Docs, the returned sublist "mirrors" the original vector,
    // which means that when it is changed, so is the original vector.
    page.subList(l.start, l.end).clear();

    return l.size();
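The subList(...).clear() idiom used in the method body is standard java.util behavior and can be tried on any Vector. The sketch below is a simplified, hypothetical version: it resets a negative 'ePos' to the Vector size as documented, but leaves all other bounds checking to subList itself rather than reproducing the LV class.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical illustration; not part of the Torello.HTML library.
final class RemoveRangeSketch
{
    // Removes the elements in [sPos, ePos) and returns how many were removed
    static <T> int removeRange(Vector<T> v, int sPos, int ePos)
    {
        if (ePos < 0) ePos = v.size();

        // The sub-list is a view backed by the Vector, so clearing it shrinks the Vector
        v.subList(sPos, ePos).clear();

        return ePos - sPos;
    }

    public static void main(String[] args)
    {
        Vector<String> v = new Vector<>(Arrays.asList("A", "B", "C", "D", "E"));

        System.out.println(removeRange(v, 1, 3) + " " + v); // prints 2 [A, D, E]
    }
}
```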
-
range
public static int range(java.util.Vector<? extends HTMLNode> html, DotPair dp)
-
nodesOPT
public static <T extends HTMLNode> void nodesOPT(java.util.Vector<T> page, int... posArr)
OPT: Optimized
This method does the same thing as nodes(boolean, Vector, int[]), but all error checking is skipped, and the input integer array is presumed to have been sorted. There are no guarantees about the behavior of this method if the input array 'posArr' is not sorted, least-to-greatest, or if there are duplicate or negative values in this array.
Empty Var-Args:
If the var-args input integer-array parameter is empty, this method shall exit gracefully (and immediately).
- Parameters:
page - Any HTML-Page, usually ones generated by HTMLPage.getPageTokens, but these may be obtained or created in any fashion so necessary.
posArr - An array of integers which list/identify the nodes in the page to be removed. Because this implementation has been optimized, no error checking will be performed on this input. It is presumed to be sorted, least-to-greatest, and that all values in the array are valid indices into the vectorized-html parameter 'page'.
- Code:
- Exact Method Body:
    if (posArr.length == 0) return;

    int endingInsertPos = page.size() - posArr.length;
    int posArrIndex     = 0;
    int insertPos       = posArr[0];
    int retrievePos     = posArr[0];

    // There is very little that can be documented about these two loops. Took 3 hours
    // to figure out. Read the variables names for "best documentation"
    while (insertPos < endingInsertPos)
    {
        // This inner-loop is necessary for when the posArr has consecutive-elements that
        // are *ALSO* consecutive-pointers.
        //
        // For instance, this invocation:
        // Util.removeNodes(page, 4, 5, 6); ...
        // where 4, 5, and 6 are consecutive - the inner while-loop is required.
        //
        // For this invocation:
        // Util.removeNodes(page, 2, 4, 6);
        // the inner-loop is not entered.
        while ((posArrIndex < posArr.length) && (retrievePos == posArr[posArrIndex]))
        { retrievePos++; posArrIndex++; }

        page.setElementAt(page.elementAt(retrievePos++), insertPos++);
    }

    // Remove all remaining elements in the tail of the array.
    page.setSize(page.size() - posArr.length);
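The two loops above perform a single-pass, in-place compaction: surviving elements are copied leftward over the slots being removed, and the tail is then truncated. The following sketch restates the same loops on a generic Vector so the effect can be observed directly. The class and method names are hypothetical, and, as with nodesOPT, the index array is assumed to be sorted, duplicate-free and in-bounds.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical illustration; not part of the Torello.HTML library.
final class CompactionSketch
{
    static <T> void removeSorted(Vector<T> page, int... posArr)
    {
        if (posArr.length == 0) return;

        int endingInsertPos = page.size() - posArr.length; // final size after removal
        int posArrIndex     = 0;                           // next index-to-remove to consider
        int insertPos       = posArr[0];                   // next slot to overwrite
        int retrievePos     = posArr[0];                   // next candidate element to keep

        while (insertPos < endingInsertPos)
        {
            // Skip every position that is scheduled for removal
            while ((posArrIndex < posArr.length) && (retrievePos == posArr[posArrIndex]))
            { retrievePos++; posArrIndex++; }

            // Shift the next surviving element left into the open slot
            page.setElementAt(page.elementAt(retrievePos++), insertPos++);
        }

        // Drop the now-redundant tail
        page.setSize(page.size() - posArr.length);
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("a", "b", "c", "d", "e", "f", "g"));

        removeSorted(page, 1, 2, 5);
        System.out.println(page); // prints [a, d, e, g]
    }
}
```

Each surviving element is moved at most once, which avoids the repeated shifting that calling Vector.remove(int) once per index would cause.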
-
nodes
public static <T extends HTMLNode> void nodes(boolean preserveInputArray, java.util.Vector<T> page, int... nodeList)
This method removes each HTMLNode from the passed-parameter 'page' listed/identified by the input array 'nodeList'.
Empty Var-Args:
If the var-args input integer-array parameter is empty, this method shall exit gracefully (and immediately).
- Parameters:
preserveInputArray - This is a convenience input parameter that allows a programmer to "preserve" the original input-parameter integer-array that is passed to this method. It could be argued this parameter is "superfluous"; however, keep in mind that the passed parameter 'nodeList' must be sorted before this method is able to function properly. There is a sort that's performed within the body of this method. In case the original order of the integer-array input-parameter must be preserved, it is possible to request that the sort operate on a clone of the input-parameter integer-array, instead of the original integer-array 'nodeList' itself.
page - Any HTML-Page, usually ones generated by HTMLPage.getPageTokens(...), but these may be obtained or created in any fashion so necessary.
nodeList - An array of integers which list/identify the nodes in the page to be removed.
- Throws:
java.lang.IllegalArgumentException - If the 'nodeList' contains duplicate entries. No HTMLNode may be removed from the Vector<HTMLNode> more than once.
java.lang.IndexOutOfBoundsException - If the 'nodeList' contains index-pointers / items that are not within the bounds of the passed HTML-Page Vector.
- Code:
- Exact Method Body:
    if (nodeList.length == 0) return;

    // @Safe Var Args
    int[]   posArr  = preserveInputArray ? nodeList.clone() : nodeList;
    int     len     = posArr.length;

    Arrays.sort(posArr);

    // Check for duplicates in the nodeList, no HTMLNode may be removed twice!
    for (int i=0; i < (len - 1); i++)
        if (posArr[i] == posArr[i+1]) throw new IllegalArgumentException(
            "The input array contains duplicate items, this is not allowed.\n" +
            "This is since each array-entry is intended to be a pointer/index for items " +
            "to be removed.\nNo item can possibly be removed twice.!"
        );

    // Make sure all nodes are within the bounds of the original Vector. (no negative
    // indexes, no indexes greater than the size of the Vector)
    if ((posArr[0] < 0) || (posArr[len - 1] >= page.size()))
        throw new IndexOutOfBoundsException (
            "The input array contains entries which are not within the bounds of the " +
            "original-passed Vector.\nHTMLPage Vector has: " + page.size() + " elements.\n" +
            "Maximum element in the nodeList is [" + posArr[len - 1] + "], and the " +
            "minimum element is: [" + posArr[0] + "]"
        );

    int endingInsertPos = page.size() - posArr.length;
    int posArrIndex     = 0;
    int insertPos       = posArr[0];
    int retrievePos     = posArr[0];

    // There is very little that can be documented about these two loops. Took 3 hours
    // to figure out. Read the variables names for "best documentation"
    while (insertPos < endingInsertPos)
    {
        // This inner-loop is necessary for when the posArr has consecutive-elements that
        // are *ALSO* consecutive-pointers.
        //
        // For instance, this invocation:
        // Util.removeNodes(page, 4, 5, 6);
        // where 4, 5, and 6 are consecutive - the inner while-loop is required.
        //
        // For this invocation:
        // Util.removeNodes(page, 2, 4, 6);
        // the inner-loop is not entered.
        while ((posArrIndex < posArr.length) && (retrievePos == posArr[posArrIndex]))
        { retrievePos++; posArrIndex++; }

        page.setElementAt(page.elementAt(retrievePos++), insertPos++);
    }

    // Remove all remaining elements in the tail of the array.
    page.setSize(page.size() - posArr.length);
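The effect of 'preserveInputArray' is easiest to see by isolating the validation step. The sketch below is a hypothetical helper (not a library method) that performs the same clone-or-sort, duplicate check, and bounds check against an assumed Vector size, then returns the sorted array.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of the Torello.HTML library.
final class NodeListValidationSketch
{
    static int[] validated(boolean preserveInputArray, int pageSize, int... nodeList)
    {
        if (nodeList.length == 0) return nodeList;

        // Sorting a clone leaves the caller's array in its original order
        int[] posArr = preserveInputArray ? nodeList.clone() : nodeList;
        Arrays.sort(posArr);

        // After sorting, duplicates are always adjacent
        for (int i = 0; i < posArr.length - 1; i++)
            if (posArr[i] == posArr[i + 1])
                throw new IllegalArgumentException("duplicate index: " + posArr[i]);

        // After sorting, only the smallest and largest entries need a bounds check
        if (posArr[0] < 0 || posArr[posArr.length - 1] >= pageSize)
            throw new IndexOutOfBoundsException("index outside [0, " + pageSize + ")");

        return posArr;
    }

    public static void main(String[] args)
    {
        int[] input = { 5, 1, 3 };

        int[] sorted = validated(true, 10, input);
        System.out.println(Arrays.toString(sorted) + " " + Arrays.toString(input));
        // prints [1, 3, 5] [5, 1, 3]
    }
}
```

Passing false instead sorts the caller's array in place, which saves the clone at the cost of changing the input's order.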
-
inclusiveEmpty
public static int inclusiveEmpty(java.util.Vector<HTMLNode> page, java.lang.String... htmlTags)
- Code:
- Exact Method Body:
return inclusiveEmpty(page, 0, -1, htmlTags);
-
inclusiveEmpty
public static int inclusiveEmpty(java.util.Vector<HTMLNode> page, DotPair dp, java.lang.String... htmlTags)
- Code:
- Exact Method Body:
return inclusiveEmpty(page, dp.start, dp.end + 1, htmlTags);
-
inclusiveEmpty
public static int inclusiveEmpty(java.util.Vector<HTMLNode> page, int sPos, int ePos, java.lang.String... htmlTags)
This will do an "Inclusive Search" using the standard class TagNodeInclusiveIterator in the package NodeSearch. Then it will inspect the contents of the subsections. Any subsections that do not contain any instances of HTMLNode in between them, or any subsections that only contain "blank-text" (white-space) between them, shall be removed.
Recursive Method:
The search logic shall perform multiple recursive iterations of itself. If, for instance, the user requested that all empty HTML divider (<DIV>) elements be removed, and removing a set of dividers resulted in more empty ones (nested <DIV> elements), then an additional removal shall be called. This recursion shall continue until there are no empty HTML elements of the types listed by 'htmlTags'.
- Parameters:
page - Any vectorized-html page or sub-page.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method. If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method. If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
htmlTags - The list of inclusive (non-singleton) html elements to search for possibly being empty container tags.
- Returns:
- The number of HTMLNode's that were removed.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- Code:
- Exact Method Body:
    DotPair         subList;
    int             removed = 0;
    HNLIInclusive   iter    = TagNodeInclusiveIterator.iter(page, htmlTags);
    LV              l       = new LV(page, sPos, ePos);

    iter.restrictCursor(l);

    TOP:
    while (iter.hasNext())

        // If there is only the opening & closing pair, with nothing in between,
        // then the pair must be removed because it is "Empty" (Inclusive Empty)
        if ((subList = iter.nextDotPair()).size() == 2)
        {
            iter.remove();
            ePos    -= subList.size();
            removed += subList.size();
        }

        else
        {
            // If there is any TagNode in between the start-end pair, then this is NOT
            // EMPTY. In this case, skip to the next start-end opening-closing pair.
            for (int i=(subList.start + 1); i < subList.end; i++)
                if (! page.elementAt(i).isTextNode())
                    continue TOP;

            // If there were only TextNode's between an opening-closing TagNode Pair....
            // **AND** those TextNode's are only white-space, then this also considered
            // Inclusively Empty. (Get all TextNode's, and if .trim() reduces the length()
            // to zero, then it was only white-space.)
            if (Util.textNodesString(page, subList).trim().length() == 0)
            {
                iter.remove();
                ePos    -= subList.size();
                removed += subList.size();
            }
        }

    // This process must be continued recursively, because if any inner, for instance,
    // <DIV> ... </DIV> was removed, then the outer list must be re-checked...
    if (removed > 0)    return removed + Remove.inclusiveEmpty(page, sPos, ePos, htmlTags);
    else                return 0;
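The reason for the recursion can be seen in a small, library-free sketch. It assumes a page stored as a Vector<String> and only recognises attribute-free opening tags such as "<div>"; the class and method are hypothetical and much simpler than the iterator-based body above.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical illustration; not part of the Torello.HTML library.
final class InclusiveEmptySketch
{
    // Removes "<tag>", optional white-space-only text, "</tag>" runs; returns nodes removed
    static int removeEmpty(Vector<String> page, String tag)
    {
        String open = "<" + tag + ">", close = "</" + tag + ">";
        int removed = 0;

        for (int i = 0; i < page.size(); i++)
        {
            if (!page.elementAt(i).equals(open)) continue;

            // Skip over white-space-only text that follows the opening tag
            int j = i + 1;
            while (j < page.size() && page.elementAt(j).trim().isEmpty()) j++;

            if (j < page.size() && page.elementAt(j).equals(close))
            {
                removed += j - i + 1;
                page.subList(i, j + 1).clear();
                i--; // re-examine whatever slid into position i
            }
        }

        // Removing an inner pair may have emptied its parent, so run another pass
        return (removed > 0) ? removed + removeEmpty(page, tag) : 0;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<div>", "<div>", " ", "</div>", "</div>", "<p>", "x", "</p>"));

        System.out.println(removeEmpty(page, "div") + " " + page); // prints 5 [<p>, x, </p>]
    }
}
```

In the example, the first pass removes only the inner pair (3 nodes); the outer pair becomes empty as a result and is removed by the second pass (2 nodes).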
-
firstLast
public static void firstLast(java.util.Vector<? extends HTMLNode> html)
Removes the first and last element of a vectorized-HTML web-page, or sub-page. Generally, this could be used to remove the surrounding tags '<DIV>' ... '</DIV>', or something similar.
This method WILL NOT CHECK whether there are matching HTML open-and-close tags at the beginning and end of this sub-section. Generally, though, that is how this method is intended to be used.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Throws:
java.lang.IllegalArgumentException - If the Vector has fewer than two elements.
- Code:
- Exact Method Body:
    int size = html.size();

    if (size < 2) throw new IllegalArgumentException(
        "You have requested that the first and last elements the input 'page' parameter " +
        "(a vector) be removed. However, the vector size is only [" + size + "], so " +
        "this cannot be performed."
    );

    // NOTE: *** This removes elementAt(0) and elementAt(size-1)
    //       *** NOT ALL ELEMENTS BETWEEN 0 and (size-1)
    Util.Remove.nodesOPT(html, 0, size-1);
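As the note in the body stresses, only two elements are removed, not the range between them. A minimal stand-alone sketch of the same behavior follows; it uses Vector.removeElementAt instead of nodesOPT, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical illustration; not part of the Torello.HTML library.
final class FirstLastSketch
{
    static <T> void removeFirstLast(Vector<T> v)
    {
        if (v.size() < 2)
            throw new IllegalArgumentException("Vector size is only [" + v.size() + "]");

        // Remove the last element first, so that index 0 is unaffected by the shift
        v.removeElementAt(v.size() - 1);
        v.removeElementAt(0);
    }

    public static void main(String[] args)
    {
        Vector<String> v = new Vector<>(Arrays.asList("<div>", "a", "b", "</div>"));

        removeFirstLast(v);
        System.out.println(v); // prints [a, b]
    }
}
```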
-
-