java.lang.Object
- Torello.HTML.Surrounding

```
public class Surrounding
extends java.lang.Object
```
Class for finding ancestor & parent nodes of any selected HTMLNode.

Substitute for the DOM-Tree concepts of 'parent' and 'ancestor'

Class 'Surrounding' is intended to stand in for the Java-Script DOM Tree (Document Object Model Tree) concepts known as "Parent" and "Ancestor". Thinking about documents as trees may make some parts of Java-Script a little easier to program. For a content-based page, however, it is more consistent with the intentions of the page's author to think of HTML as a list (of TextNode's, TagNode's, etc.). The name "HTML" means Hyper-Text Markup Language: text documents are simply "marked up" by HTML Elements, so using Java Vector's (instead of DOM Trees) is the guiding philosophy.

There are other HTML Parsers which build DOM Trees, and generally those parsers are quick to modify the HTML being parsed if any unmatched closing tags are found. Here, instead, the philosophy is that the HTML is presumed valid; if unmatched closing HTML Tags are present, an 'Inclusive' search will simply not produce the expected result. Since the vast majority of uses for this package involve scraping news & informational sites, all of which have automatically generated HTML, worrying about unclosed HTML is best left to the "Browser Pioneers" who write the rendering functions for web-pages, and the concept is mostly ignored here.
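
The list-based view described above can be illustrated with a minimal sketch. It does not use this package's classes: tokens are plain Strings, and the names `FlatHtmlSketch` and `findMatchingClose` are hypothetical, chosen only for illustration. It shows one way a closing tag might be located in a flat list by counting nesting depth, and how an unmatched tag merely yields "no result" rather than a modified document.

```java
import java.util.List;

// Hypothetical, simplified model: each token is a plain String, not an HTMLNode.
final class FlatHtmlSketch {
    private FlatHtmlSketch() { }

    // Returns the index of the closing tag that matches the opening tag at 'openPos',
    // or -1 when no matching close exists. The list itself is never modified.
    // Precondition (assumed): tokens.get(openPos) is the opening tag "<tag>".
    static int findMatchingClose(List<String> tokens, int openPos, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int depth = 0;

        for (int i = openPos; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals(open)) depth++;                          // one level deeper
            else if (t.equals(close) && --depth == 0) return i;   // back at the starting level
        }
        return -1; // unmatched: report "not found" and leave the HTML alone
    }

    public static void main(String[] args) {
        List<String> page = List.of("<div>", "<b>", "text", "</b>", "more", "</div>");
        List<String> broken = List.of("<div>", "<b>", "text", "</div>");

        System.out.println(findMatchingClose(page, 0, "div"));   // prints 5
        System.out.println(findMatchingClose(broken, 1, "b"));   // prints -1
    }
}
```

In this sketch the unclosed `<b>` in the second list is not repaired; the search simply reports that no enclosing pair exists.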

The following example demonstrates how to find the parent and ancestor nodes at a particular index. It parses one of the documentation pages found in the JavaDocs for this package, picks a particular TextNode instance, and then asks for all of the HTML Elements whose opening and closing tags "enclose" that TextNode.

Example:
```
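// JDK imports used below. The Torello classes (HTMLPage, TextNodeFind, Surrounding,
// DotPair, Debug, C, FileRW, etc.) must also be imported from their own packages.
import java.net.URL;
import java.util.Vector;

// Load the documentation html page into vectorized-html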
StringBuffer     sb      = new StringBuffer();
URL              url     = new URL("http://developer.torello.directory/JavaHTML/Version%201/1.4/javadoc/Torello/HTML/NodeSearch/CommentNodeCount.html");
Vector<HTMLNode> page    = HTMLPage.getPageTokens(url, false);

// Obtain a vector-index pointer to the text-node containing the indicated string:
// "a count of how many"
// This is a line of text from the JavaDoc HTML Page that was loaded above.
int pos = TextNodeFind.first(page, TextComparitor.CN_CI, "a count of how many");

// Print the output found above to a StringBuffer
sb.append("Text Node Found: [" + page.elementAt(pos) + "]\n");

// Find the first "ancestor node" or "parent node" of this TextNode
// Restrict the search to leave out: <LI>, <BODY> or <DIV>
DotPair dp = Surrounding.firstExcept(page, pos, "li", "body", "div");

// Print the output of this search to the StringBuffer / Log
sb.append("Index Found: " + pos + ", DotPair Found: " + dp.toString() + "\n");
sb.append(Debug.printJ(page, dp) + "\n");

// Now print all "ancestor nodes" (Surrounding nodes) - leave out <BODY>, <HTML> and <DIV>
// ancestors
Vector<DotPair> allDP = Surrounding.allExcept(page, pos, "body", "html", "div");

for (DotPair l : allDP)

    sb.append(
        C.BCYAN + 
        "************************************************************\n" +
        "************************************************************\n" + C.RESET +
        "Index Found: " + pos + ", DotPair Found: " + l.toString() + "\n" +
        "Starting Node: " + C.BRED + page.elementAt(l.start).str + C.RESET + "\n" +
        "Ending Node:" + C.BRED + page.elementAt(l.end).str + C.RESET + "\n"
    );

// Print the StringBuffer / Log to Standard-Out, and to the text-file "out.html"
String s = sb.toString();
System.out.println(s);

// NOTE: The above "Printing" uses the Shell.C class (which are UNIX Color-Codes)
//       This converts those color-codes to HTML <SPAN>...</SPAN> Elements
FileRW.writeFile(C.toHTML(s.replace("<", "&lt;").replace(">", "&gt;")), "out.html");
```
The above example would print these results to a UNIX terminal:

Text Node Found: [ returns a count of how many TextNode's were identified on the vectorized-page parameter
]
Index Found: 698, DotPair Found: [690, 705]
[<ol>][
][<li>][<b>][<code>][int][</code>][</b>][ returns a count of how many TextNode's were identified on the vectorized-page parameter
][<code>]['html'][</code>][ that contained text that matched the specified criteria][</li>][
][</ol>]
************************************************************
************************************************************
Index Found: 698, DotPair Found: [692, 703]
Starting Node: <li>
Ending Node:</li>
************************************************************
************************************************************
Index Found: 698, DotPair Found: [690, 705]
Starting Node: <ol>
Ending Node:</ol>
************************************************************
************************************************************
Index Found: 698, DotPair Found: [284, 817]
Starting Node: <ul class="blockList">
Ending Node:</ul>
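
The index pairs printed above can be read as inclusive ranges over the page Vector: the node at the first index is the opening tag and the node at the second index is its closing tag. The sketch below models that reading with a hypothetical `IndexPair` class. It is a stand-in for illustration only, not the library's `DotPair`, and the `contains` and `encloses` methods are assumptions about how such a range could be queried.

```java
// Hypothetical stand-in for an inclusive [start, end] index range.
final class IndexPair {
    final int start;
    final int end;

    IndexPair(int start, int end) {
        if (start > end) throw new IllegalArgumentException("start must not exceed end");
        this.start = start;
        this.end = end;
    }

    // True when 'index' lies within the range, endpoints included.
    boolean contains(int index) {
        return index >= start && index <= end;
    }

    // True when 'other' lies entirely within this range.
    boolean encloses(IndexPair other) {
        return start <= other.start && other.end <= end;
    }

    @Override
    public String toString() {
        return "[" + start + ", " + end + "]";
    }

    public static void main(String[] args) {
        // Values taken from the sample output: the <li> pair sits inside the <ol> pair.
        IndexPair li = new IndexPair(692, 703);
        IndexPair ol = new IndexPair(690, 705);

        System.out.println(ol + " contains 698: " + ol.contains(698));   // true
        System.out.println(ol + " encloses " + li + ": " + ol.encloses(li)); // true
    }
}
```

Read this way, the three pairs in the output are listed from the innermost ancestor outward, each range enclosing the one before it.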
Stateless Class:
This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.
- 1 Constructor(s), 1 declared private, zero-argument constructor
- 7 Method(s), 7 declared static
- 0 Field(s)
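
The "static-functional" shape described above (one private zero-argument constructor, only static methods, no fields) can be sketched as follows. The class `TagTextUtil` and its method are hypothetical and unrelated to the library's own code; the sketch only illustrates the pattern.

```java
// Hypothetical example of a stateless, non-instantiable utility class.
final class TagTextUtil {

    // Private zero-argument constructor: the class cannot be instantiated from outside.
    private TagTextUtil() {
        throw new UnsupportedOperationException("static-only class");
    }

    // A pure static function: the result depends only on the argument.
    static boolean isClosingTag(String token) {
        return token.startsWith("</") && token.endsWith(">");
    }

    public static void main(String[] args) {
        System.out.println(isClosingTag("</li>")); // true
        System.out.println(isClosingTag("<li>"));  // false
    }
}
```

Because such a class holds no state, its methods can be called from any thread without synchronization, provided the arguments themselves are not shared and mutated.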

Method Summary

First Ancestor

| Modifier and Type | Method |
| --- | --- |
| `static DotPair` | `first(Vector<? extends HTMLNode> html, int index, String... htmlTags)` |
| `static DotPair` | `firstExcept(Vector<? extends HTMLNode> html, int index, String... htmlTags)` |

All Ancestors

| Modifier and Type | Method |
| --- | --- |
| `static Vector<DotPair>` | `all(Vector<? extends HTMLNode> html, int index, String... htmlTags)` |
| `static Vector<DotPair>` | `allExcept(Vector<? extends HTMLNode> html, int index, String... htmlTags)` |

Protected, Internal Methods

| Modifier and Type | Method |
| --- | --- |
| `protected static Vector<DotPair>` | `ALL(Vector<? extends HTMLNode> html, int index, Torello.HTML.HTMLTagCounter tagCounter)` |
| `protected static DotPair` | `FIRST(Vector<? extends HTMLNode> html, int index, Torello.HTML.HTMLTagCounter tagCounter)` |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - first
    
    public static DotPair first(java.util.Vector<? extends HTMLNode> html, int index, java.lang.String... htmlTags)
    
    Returns the first matching ancestor node, along with its closing element, as a DotPair.
    
    Parameters:
    
    html - This may be any Vectorized-HTML Web-Page (or sub-page).
    
    The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
    
    These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
    
    index - This is the index of the node for whose "ancestors" we are searching (to use a Java-Script DOM Tree term).
    
    htmlTags - If this list is empty, we shall look for any ancestor node. Since this method returns the first match, if this list is left empty and the index-node is surrounded by even a bold "<B>...</B>", then that will be the DotPair result that is returned. If this list is non-empty, then only an ancestor node whose HTML Element Tag (usually referred to as "the Element") matches a tag from this list shall be returned.
    
    FOR INSTANCE: If "div", "p", and "a" were provided as values to this parameter - the search loop would skip over all ancestors that were not HTML divider, paragraph or anchor elements before selecting a result.
    
    Returns:
    
    This shall return the first sub-list, as a 'DotPair' (start & end index pair). If no matches are found, null is returned. This sub-list is nearly identical to the Java-Script DOM Tree concept of ancestor-node, though no trees are constructed by this method.
    
    Throws:
    
    java.lang.ArrayIndexOutOfBoundsException - If index is not within the bounds of the passed vectorized-html parameter 'html'
    
    HTMLTokException - If any of the tags passed are null, or not found in the table of class HTMLTags - specifically if they are not valid HTML Elements.
    
    See Also:
    
    FIRST(Vector, int, HTMLTagCounter), ARGCHECK.index(Vector, int)
    
    Code:
    
    Exact Method Body:
    
    return FIRST( html, ARGCHECK.index(html, index), new HTMLTagCounter(htmlTags, HTMLTagCounter.NORMAL, HTMLTagCounter.FIRST) );
  - firstExcept
    
    public static DotPair firstExcept (java.util.Vector<? extends HTMLNode> html, int index, java.lang.String... htmlTags)
    
    Returns the first ancestor node, along with its closing element, as a DotPair, subject to the input-parameter 'htmlTags'. In this case, the term 'except' means that any match whose HTML Token is among the list in parameter String... htmlTags will be skipped, and a "higher-level" ancestor will be returned instead.
    
    Parameters:
    
    html - This may be any Vectorized-HTML Web-Page (or sub-page).
    
    The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
    
    These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
    
    index - This is the index of the node for whose "ancestors" we are searching (to use a Java-Script DOM Tree term).
    
    htmlTags - When this list is non-empty (contains at least one token), the search loop will skip over ancestor nodes that are among the members of this var-args parameter list. If this method is invoked and this parameter is an empty list, then the search loop will return the first ancestor node identified.
    
    FOR INSTANCE: If "B" and "P" were passed as parameters to this method, then the search-loop would continue looking for higher-level ancestors until one was found that was not an HTML 'bold' or 'paragraph' element.
    
    Returns:
    
    This shall return the first sub-list, as a 'DotPair' (start & end index pair). If no matches are found, null is returned. This sub-list is nearly identical to the Java-Script DOM Tree concept of ancestor-node, though no trees are constructed by this method.
    
    Throws:
    
    java.lang.ArrayIndexOutOfBoundsException - If index is not within the bounds of the passed vectorized-html parameter 'html'
    
    HTMLTokException - If any of the tags passed are null, or not found in the table of class HTMLTags - specifically if they are not valid HTML Elements.
    
    See Also:
    
    FIRST(Vector, int, HTMLTagCounter), ARGCHECK.index(Vector, int)
    
    Code:
    
    Exact Method Body:
    
    return FIRST( html, ARGCHECK.index(html, index), new HTMLTagCounter(htmlTags, HTMLTagCounter.EXCEPT, HTMLTagCounter.FIRST) );
  - all
    
    public static java.util.Vector<DotPair> all (java.util.Vector<? extends HTMLNode> html, int index, java.lang.String... htmlTags)
    
    This will find all ancestors of a given index. If parameter String... htmlTags is empty, all HTML elements will be considered. If this parameter contains any elements, then only those elements shall be considered as a match in the ancestor hierarchy.
    
    Parameters:
    
    html - This may be any Vectorized-HTML Web-Page (or sub-page).
    
    The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
    
    These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
    
    index - This is the index of the node for whose "ancestors" we are searching (to use a Java-Script DOM Tree term).
    
    htmlTags - If this list is empty, we shall look for all ancestor nodes. If this list is non-empty, then only ancestor nodes whose HTML Element Tag (usually referred to as "the token") is a member of this varargs String parameter list shall be considered eligible for inclusion in the result of this method.
    
    FOR INSTANCE: If "DIV", "P", and "A" were listed - the search loop would skip over all ancestors that were not HTML divider, paragraph or anchor elements when collecting results.
    
    Returns:
    
    This shall return every sub-list, as a 'DotPair' (start & end index pair). If no matches are found, an empty Vector is returned. These sub-lists are nearly identical to the Java-Script DOM Tree concept of ancestor-nodes, though no trees are constructed by this method.
    
    Throws:
    
    java.lang.ArrayIndexOutOfBoundsException - If index is not within the bounds of the passed vectorized-html parameter 'html'
    
    HTMLTokException - If any of the tags passed are null, or not found in the table of class HTMLTags - specifically if they are not valid HTML Elements.
    
    See Also:
    
    ALL(Vector, int, HTMLTagCounter), ARGCHECK.index(Vector, int)
    
    Code:
    
    Exact Method Body:
    
    return ALL( html, ARGCHECK.index(html, index), new HTMLTagCounter(htmlTags, HTMLTagCounter.NORMAL, HTMLTagCounter.ALL) );
  - allExcept
    
    public static java.util.Vector<DotPair> allExcept (java.util.Vector<? extends HTMLNode> html, int index, java.lang.String... htmlTags)
    
    This will find all ancestors of a given index. If parameter String... htmlTags is empty, all HTML elements will be considered. If this parameter contains any elements, then those elements shall not be considered as a match in the ancestor hierarchy.
    
    Parameters:
    
    html - This may be any Vectorized-HTML Web-Page (or sub-page).
    
    The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
    
    These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
    
    index - This is the index of the node for whose "ancestors" we are searching (to use a Java-Script DOM Tree term).
    
    htmlTags - When this list is non-empty (contains at least one token), the search loop will skip over ancestor nodes that are among the members of this var-args parameter list. If this method is invoked and this parameter is an empty list, then the search loop will return all ancestor nodes of the index node.
    
    FOR INSTANCE: If "B" and "P" were passed as parameters to this method, then the search-loop, which saves all ancestor matches to its result-set, would skip over any HTML 'bold' or 'paragraph' DotPair's.
    
    Returns:
    
    This shall return every sub-list, as a 'DotPair' (start & end index pair). If no matches are found, an empty Vector is returned. These sub-lists are nearly identical to the Java-Script DOM Tree concept of ancestor-nodes, though no trees are constructed by this method.
    
    Throws:
    
    java.lang.ArrayIndexOutOfBoundsException - If index is not within the bounds of the passed vectorized-html parameter 'html'
    
    HTMLTokException - If any of the tags passed are null, or not found in the table of class HTMLTags - specifically if they are not valid HTML Elements.
    
    See Also:
    
    ALL(Vector, int, HTMLTagCounter), ARGCHECK.index(Vector, int)
    
    Code:
    
    Exact Method Body:
    
    return ALL( html, ARGCHECK.index(html, index), new HTMLTagCounter(htmlTags, HTMLTagCounter.EXCEPT, HTMLTagCounter.ALL) );
  - FIRST
    
    protected static DotPair FIRST(java.util.Vector<? extends HTMLNode> html, int index, Torello.HTML.HTMLTagCounter tagCounter)
    
    Finds the first ancestor ("surrounding") node pair.
    
    Parameters:
    
    html - This may be any Vectorized-HTML Web-Page (or sub-page).
    
    The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
    
    These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
    
    index - This is any index within the bounds of the 'html' parameter.
    
    tagCounter - Any internally used counter, to optimize the search routine.
    
    Returns:
    
    The matching ancestor node's start-and-end index as a 'DotPair'.
    
    See Also:
    
    TagNode, HTMLNode, DotPair, DotPair.isInside(int), Util.Inclusive.dotPairOPT(Vector, int, int)
    
    Code:
    
    Exact Method Body:
    
        int     size = html.size();
        TagNode tn;
        DotPair ret;

        for (int i=(index-1); (i >= 0) && (! tagCounter.allBanned()); i--)

            if (    ((tn = html.elementAt(i).openTag()) != null)
                &&  tagCounter.check(tn)
                &&  ((ret = Util.Inclusive.dotPairOPT(html, i, size)) != null)
                &&  ret.isInside(index)
                        // isInside(...) Should never fail, but
                        // This guarantees to prevent erroneous answers
            )

                // If there is a match, return that match, and exit immediately.
                return ret;

        return null;
  - ALL
    
    protected static java.util.Vector<DotPair> ALL (java.util.Vector<? extends HTMLNode> html, int index, Torello.HTML.HTMLTagCounter tagCounter)
    
    Finds all ancestor ("surrounding") node pairs.
    
    Parameters:
    
    html - This may be any Vectorized-HTML Web-Page (or sub-page).
    
    The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.
    
    These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
    
    index - This is any index within the bounds of the 'html' parameter.
    
    tagCounter - Any internally used counter, to optimize the search routine.
    
    Returns:
    
    All matching ancestor nodes' start-and-end index pairs inside a Vector<DotPair>
    
    See Also:
    
    TagNode, HTMLNode, DotPair, DotPair.isInside(int), Util.Inclusive.dotPairOPT(Vector, int, int)
    
    Code:
    
    Exact Method Body:
    
        HTMLNode        n;
        TagNode         tn;
        DotPair         dp;
        int             size = html.size();
        Vector<DotPair> ret  = new Vector<>();

        for (int i=(index-1); (i >= 0) && (! tagCounter.allBanned()); i--)

            if (    (n = html.elementAt(i)).isTagNode()
                &&  tagCounter.check(tn = (TagNode) n)
            )
            {
                if (    ((dp = Util.Inclusive.dotPairOPT(html, i, size)) != null)
                    &&  dp.isInside(index)
                )
                    // isInside(...) Should never fail, but
                    // This guarantees to prevent erroneous answers
                    ret.addElement(dp);

                else
                    // If finding a token match fails, just ignore that token from now on...
                    tagCounter.reportFailed(tn.tok);
            }

        return ret;
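
The backward search performed by the methods above, and the difference between the normal and 'except' tag filters, can be approximated with a self-contained sketch. It is not the library's implementation: tokens are plain Strings, results are `int[]{start, end}` pairs, and the names `SurroundingSketch`, `openTagName` and `matchingClose` are hypothetical. The `except` flag is an assumed simplification of the two filter modes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified model of an ancestor search over a flat token list.
final class SurroundingSketch {
    private SurroundingSketch() { }

    // Lower-case name of an opening tag ("<li>", "<ul class=x>"), or null for
    // closing tags, comments and text.
    static String openTagName(String token) {
        if (token.length() < 3 || token.charAt(0) != '<' || !token.endsWith(">")) return null;
        if (!Character.isLetter(token.charAt(1))) return null;
        int end = 1;
        while (end < token.length() && Character.isLetterOrDigit(token.charAt(end))) end++;
        return token.substring(1, end).toLowerCase(Locale.ROOT);
    }

    // Index of the close matching the opening tag at 'openPos', or -1 if unmatched.
    // Precondition: page.get(openPos) is an opening tag.
    static int matchingClose(List<String> page, int openPos) {
        String name = openTagName(page.get(openPos));
        int depth = 0;
        for (int i = openPos; i < page.size(); i++) {
            String t = page.get(i);
            if (name.equals(openTagName(t))) depth++;
            else if (t.equalsIgnoreCase("</" + name + ">") && --depth == 0) return i;
        }
        return -1;
    }

    // except == false: keep only the listed tags (an empty list keeps every tag).
    // except == true : skip the listed tags.
    static List<int[]> all(List<String> page, int index, boolean except, String... tags) {
        Set<String> listed = new HashSet<>();
        for (String t : tags) listed.add(t.toLowerCase(Locale.ROOT));

        List<int[]> result = new ArrayList<>();
        for (int i = index - 1; i >= 0; i--) {          // walk backwards from the node
            String name = openTagName(page.get(i));
            if (name == null) continue;                  // not an opening tag
            boolean eligible = except ? !listed.contains(name)
                                      : (listed.isEmpty() || listed.contains(name));
            if (!eligible) continue;
            int close = matchingClose(page, i);
            if (close > index) result.add(new int[] { i, close }); // pair encloses the node
        }
        return result;
    }

    // First (innermost) eligible ancestor pair, or null when there is none.
    static int[] first(List<String> page, int index, boolean except, String... tags) {
        List<int[]> found = all(page, index, except, tags);
        return found.isEmpty() ? null : found.get(0);
    }

    public static void main(String[] args) {
        List<String> page = List.of(
            "<ol>", "<li>", "<b>", "int", "</b>", " returns a count", "</li>", "</ol>");

        System.out.println(Arrays.toString(first(page, 5, false)));       // [1, 6]
        System.out.println(Arrays.toString(first(page, 5, true, "li")));  // [0, 7]
        System.out.println(all(page, 5, false).size());                   // 2
    }
}
```

Note that the `<b>...</b>` pair at indices 2 to 4 is skipped in every case: it opens before index 5 but also closes before it, so it does not surround the node.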
