Torello.HTML.NodeSearch

The classes in this package let a programmer "search" through web pages that have been downloaded and vectorized into a Java Vector<HTMLNode>.

The following key words are important to understand when deciding on an appropriate search class and search method:

InnerTag: This word means that the attributes inside an HTML TagNode element are used to search for and identify TagNode matches.
TagNode: This implies that only the HTML TagNode element's tag name (the TagNode.tok field), and whether it is an opening or closing tag, will be used to specify the search criteria. InnerTag's - a.k.a. 'attributes' - will not be used to specify the search.
TextNode: Use of this word (in a class name) means that TagNode elements are ignored completely; instead, the "text" inside an HTML page or sub-page is searched by means of 'TextNode' elements.
CommentNode: Use of this word (in a class name) means that the search-specifier ignores all TagNode and TextNode elements, and instead focuses on the contents of the HTML CommentNode's within an HTML page or sub-page.
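To make the four search kinds above concrete, here is a minimal sketch. It does not use the real Torello.HTML classes: SketchNode, SketchTag, SketchText, SketchComment and SearchKinds are hypothetical stand-ins, and each predicate only illustrates which part of a node a given kind of search inspects.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-ins for the library's HTMLNode hierarchy (NOT the real classes).
abstract class SketchNode { }

final class SketchTag extends SketchNode {
    final String tok;                 // tag name, analogous to TagNode.tok
    final boolean isClosing;          // true for a closing tag such as </div>
    final Map<String, String> attrs;  // inner-tags (attributes); keys assumed lower-case
    SketchTag(String tok, boolean isClosing, Map<String, String> attrs) {
        this.tok = tok; this.isClosing = isClosing; this.attrs = attrs;
    }
}

final class SketchText extends SketchNode {
    final String str;
    SketchText(String str) { this.str = str; }
}

final class SketchComment extends SketchNode {
    final String body;
    SketchComment(String body) { this.body = body; }
}

final class SearchKinds {
    // Returns the node as an opening tag, or null if it is anything else.
    // (For brevity this sketch only matches opening tags.)
    private static SketchTag openTag(SketchNode n) {
        if (!(n instanceof SketchTag)) return null;
        SketchTag t = (SketchTag) n;
        return t.isClosing ? null : t;
    }

    // "TagNode" search: only the tag name is examined; attributes are ignored.
    static Predicate<SketchNode> byTagName(String tok) {
        return n -> { SketchTag t = openTag(n); return t != null && t.tok.equalsIgnoreCase(tok); };
    }

    // "InnerTag" search: an attribute's name and value decide the match.
    static Predicate<SketchNode> byAttribute(String key, String value) {
        return n -> { SketchTag t = openTag(n); return t != null && value.equals(t.attrs.get(key)); };
    }

    // "TextNode" search: tags and comments are skipped; only text is tested.
    static Predicate<SketchNode> byText(Predicate<String> test) {
        return n -> (n instanceof SketchText) && test.test(((SketchText) n).str);
    }

    // "CommentNode" search: only the body of a comment is tested.
    static Predicate<SketchNode> byComment(Predicate<String> test) {
        return n -> (n instanceof SketchComment) && test.test(((SketchComment) n).body);
    }
}
```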

The following key words are also important; they explain some nuances of the HTML search methods, namely what each family of methods returns and whether it modifies the page:

Count: This implies that a count of the number of nodes that have matched a specified search criteria shall be computed. Methods in 'Count' classes will always return simple-integers that represent this count.
Find: This implies that integer-arrays, or simple-integers, are returned by the methods in any of the classes with the word 'Find' in the class name. These integers are intended to function as pointers into the underlying Java Vector<HTMLNode>.
Get: This implies that the HTMLNode's themselves (TagNode, TextNode, etc.) are returned by the methods in any of these classes. Integer-pointers (a.k.a. the integer-indices into the underlying Vector<HTMLNode>) are not returned.
Peek: This implies that BOTH the Vector-index AND the HTMLNode found at that index-location are SIMULTANEOUSLY returned by the methods in a class having the word 'Peek' in its name. This is where the small, supplementary data-classes 'TagNodeIndex', 'TextNodeIndex', etc. are used: they hold the return values of the 'Peek' methods.
Poll: This refers to the operation of BOTH removing a node from the vectorized-html web-page AND returning the node (or nodes) that were removed to the programmer as a return value. Remember, for all methods in classes that have the word 'Poll' in their name, once the method has finished the Vector<HTMLNode> will contain fewer elements (whenever at least one match was found).
Remove: This implies that neither nodes nor node-pointers are returned; the matching nodes are simply removed from the page, and an integer stating exactly how many nodes were removed is returned to the caller. Remember, after a 'Remove' operation the original vectorized-html will contain fewer elements (whenever at least one match was found).
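The six return-value conventions above can be summarized in one short, generic sketch. SixOps and NodeAtIndex are hypothetical names (NodeAtIndex merely plays the role that TagNodeIndex or TextNodeIndex play for 'Peek'); the real classes expose many more overloads and search criteria than a single Predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.function.Predicate;

// Hypothetical stand-in for TagNodeIndex / TextNodeIndex: a Vector index plus the node there.
final class NodeAtIndex<T> {
    final int index;
    final T node;
    NodeAtIndex(int index, T node) { this.index = index; this.node = node; }
}

final class SixOps {
    // Count: only the number of matches is returned.
    static <T> int count(Vector<T> page, Predicate<? super T> match) {
        int c = 0;
        for (T n : page) if (match.test(n)) c++;
        return c;
    }

    // Find: the Vector indices of the matches (the page is not modified).
    static <T> int[] find(Vector<T> page, Predicate<? super T> match) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < page.size(); i++) if (match.test(page.get(i))) idx.add(i);
        return idx.stream().mapToInt(Integer::intValue).toArray();
    }

    // Get: the matching nodes themselves, without their indices.
    static <T> Vector<T> get(Vector<T> page, Predicate<? super T> match) {
        Vector<T> out = new Vector<>();
        for (T n : page) if (match.test(n)) out.add(n);
        return out;
    }

    // Peek: both the index and the node, still without modifying the page.
    static <T> List<NodeAtIndex<T>> peek(Vector<T> page, Predicate<? super T> match) {
        List<NodeAtIndex<T>> out = new ArrayList<>();
        for (int i = 0; i < page.size(); i++)
            if (match.test(page.get(i))) out.add(new NodeAtIndex<>(i, page.get(i)));
        return out;
    }

    // Poll: remove the matches from the page AND return them.
    static <T> Vector<T> poll(Vector<T> page, Predicate<? super T> match) {
        Vector<T> out = get(page, match);
        page.removeIf(match);
        return out;
    }

    // Remove: remove the matches and return only how many were removed.
    static <T> int remove(Vector<T> page, Predicate<? super T> match) {
        int before = page.size();
        page.removeIf(match);
        return before - page.size();
    }
}
```

Only 'Poll' and 'Remove' shrink the input Vector; the other four leave it untouched.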

`Inclusive:` Similar to JavaScript `'.innerHTML'`

The key-word "inclusive" deserves some explanation here. "Inclusive" is quite similar to the JavaScript concept of '.innerHTML', a property available on most of the element nodes in a JavaScript DOM Tree. It is used to retrieve everything between an opening element ('<DIV ...>' for example) and its corresponding closing-element ('</DIV>').

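When a TagNode is searched using either an 'InnerTag-Search' (attribute key-value pair) or a simple TagNode-Search method, the opening-tag, the closing-tag and every HTMLNode between these two are returned by 'inclusive' methods. Because the opening and closing tags are themselves part of the result, 'inclusive' is, strictly speaking, closer to JavaScript's '.outerHTML' than to '.innerHTML'.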
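A minimal sketch of how an 'inclusive' match might be located, assuming well-formed HTML: for each matching opening tag, scan forward while tracking the nesting depth until the corresponding closing tag is reached. INode, Range (standing in for DotPair) and InclusiveSketch are hypothetical; the library's own algorithm, and its handling of unclosed tags, may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical minimal node: tok is the tag name, or null for a text node.
final class INode {
    final String tok; final boolean closing; final String text;
    INode(String tok, boolean closing, String text) { this.tok = tok; this.closing = closing; this.text = text; }
    static INode open(String tok)  { return new INode(tok, false, null); }
    static INode close(String tok) { return new INode(tok, true, null); }
    static INode txt(String s)     { return new INode(null, false, s); }
    boolean isOpen(String t)  { return tok != null && !closing && tok.equalsIgnoreCase(t); }
    boolean isClose(String t) { return tok != null && closing && tok.equalsIgnoreCase(t); }
}

// Hypothetical stand-in for DotPair: inclusive start and end Vector indices.
final class Range {
    final int start, end;
    Range(int start, int end) { this.start = start; this.end = end; }
}

final class InclusiveSketch {
    // For each opening tag named 'tok', find its matching closing tag by
    // counting nesting depth. Opening tags that are never closed are skipped.
    static List<Range> findInclusive(Vector<INode> page, String tok) {
        List<Range> out = new ArrayList<>();
        for (int i = 0; i < page.size(); i++) {
            if (!page.get(i).isOpen(tok)) continue;
            int depth = 0;
            for (int j = i; j < page.size(); j++) {
                INode n = page.get(j);
                if (n.isOpen(tok)) depth++;
                else if (n.isClose(tok) && --depth == 0) { out.add(new Range(i, j)); break; }
            }
        }
        return out;
    }

    // The 'inclusive' sublist: opening tag, closing tag and every node between them.
    static Vector<INode> sublist(Vector<INode> page, Range r) {
        return new Vector<>(page.subList(r.start, r.end + 1));
    }
}
```

For nested elements of the same name (a <div> inside a <div>) this returns two overlapping ranges; removing that kind of overlap is what the 'L1 Inclusive' classes are described as doing.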

Search Interfaces

Java Entity	Description
AVT	A functional-interface / lambda-target, and several `static`-builders for generating instances of them, which extends `java`
TextComparitor	A functional-interface / lambda-target with many pre-instantiated instances, which essentially serve as simplified wrappers for the many `String` search and comparison utilities provided in `Torello.Java`
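The following is a rough sketch of the idea behind a TextComparitor: a lambda-target that compares a node's text against an array of compare-strings, together with a few pre-built constants that wrap ordinary `String` methods. TextTest and its constants are hypothetical; the TextComparitor class itself defines its real instances and exact signature.

```java
import java.util.Arrays;
import java.util.function.BiPredicate;

// Hypothetical sketch, NOT the library's TextComparitor: tests a piece of text
// against one or more compare-strings.
@FunctionalInterface
interface TextTest extends BiPredicate<String, String[]> {

    // True if the text equals any of the compare-strings.
    TextTest EQUALS_ANY = (text, strs) -> Arrays.asList(strs).contains(text);

    // True if the text contains at least one of the compare-strings.
    TextTest CONTAINS_ANY = (text, strs) -> Arrays.stream(strs).anyMatch(text::contains);

    // True only if the text contains every one of the compare-strings.
    TextTest CONTAINS_ALL = (text, strs) -> Arrays.stream(strs).allMatch(text::contains);

    // Case-insensitive "starts with any of the compare-strings".
    TextTest STARTS_WITH_ANY_CI = (text, strs) -> Arrays.stream(strs)
        .anyMatch(s -> text.toLowerCase().startsWith(s.toLowerCase()));
}
```

Packaging the comparison as a constant means a search method only needs one extra parameter (the comparator) instead of a separate method per kind of `String` test.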

HTMLNode Iterators (HNLI)
Java Entity	Description
AbstractHNLI<E extends HTMLNode,F>	Abstract parent-class for both of the types of HTML-`Iterator's`

HNLI<E extends HTMLNode>	A Java Generic `Iterator`-Class for iterating `TagNode`, `TextNode` and `CommentNode` instances which match user-provided search-criteria
HNLIInclusive	Iterates 'Inclusive' `TagNode` sublist-matches, which would be similar to iterating the `'.innerHTML'` fields of elements in a JavaScript DOM-Tree

HNLI Builders
Java Entity	Description
CommentNodeIterator	`Static` methods for building and instantiating an `HNLI<CommentNode>` (which extends the basic iterator class) for iterating the comments inside of an HTML-`Vector`, with the help of a user-provided `String-Predicate`, Regular Expression or `TextComparitor` as a search-specifier
TagNodeIterator	`Static` methods for building and instantiating an `HNLI<TagNode>` (which extends the basic iterator class) for iterating the tags inside of an HTML-`Vector`, using explicitly provided match-specifications
TextNodeIterator	`Static` methods for building and instantiating an `HNLI<TextNode>` (which extends the basic iterator class) for iterating the text inside of an HTML-`Vector`, with the help of a user-provided `String-Predicate`, Regular Expression or `TextComparitor` as a search specifier
InnerTagIterator	`Static` methods for building and instantiating an `HNLI<TagNode>` (which extends the basic iterator class) for iterating the tags inside of an HTML-`Vector`, using match-criteria which specify attribute name & value requirements

InnerTagInclusiveIterator	`Static` methods for building an `HNLIInclusive` (which also extends the basic `Iterator`) for retrieving sub HTML-Vector's using match-criteria which specify HTML Tag Attribute name and value requirements
TagNodeInclusiveIterator	`Static` methods for building an `HNLIInclusive` (which also extends the basic `Iterator`) for retrieving sub HTML-Vector's via user-provided match-criteria that specify the HTML Tag-Name

Simple HTML Search Classes
Java Entity	Description
Elements	A simple, demonstrative set of functions for retrieving `HTMLNode's` from a web-page (a 'Workbook Class')
JS	A 'work-book' class included in the Java-HTML JAR-Library, mostly in order to demonstrate the similarities between searching a Java Script DOM-Tree and searching Vectorized-HTML for tags, text and comments

ARGCHECK	This class is used internally to do argument validity-checks, and guarantees consistent exception-message reporting

Count: Matches
Java Entity	Description
TagNodeCount	Counts the number of HTML-Tags inside of Vectorized-HTML which match a user-provided search based on the Tag's Name and whether the Tag is an Opening or Closing Tag
TextNodeCount	Counts instances of `TextNode` that match a search-criteria which may be specified using `String-Predicate's`, Regular-Expressions or a `TextComparitor`
CommentNodeCount	Counts HTML Comments which match a user-specified criteria, such as a `String-Predicate`, Regular-Expression or `TextComparitor`
InnerTagCount	Searches Vectorized-HTML for Tag-Matches using Inner-Tag (attribute) names & values as search-criteria, and returns the number of matches that were found

Find: Match-Indices
Java Entity	Description
TagNodeFind	Searches an HTML-Vector for HTML-Tag's that match a specified search-criteria based on Tag-Name and whether the Tag is an opening or closing tag, and returns the Vector-indices for those matches
TextNodeFind	Retrieves indices from Vectorized-HTML that index HTML-Text matching a search-criteria specified by `String-Predicate's`, Regular-Expressions, or a `TextComparitor`
CommentNodeFind	Identifies Vectorized-HTML indices having HTML Comments that match a user-provided criteria, specified using a `String-Predicate`, Regular-Expression or `TextComparitor`
InnerTagFind	Searches Vectorized-HTML for Tag-Matches by Inner-Tag (attribute) name & value, and returns the indices into the `Vector` that identify where those matches were found

Get: HTMLNode-Matches
Java Entity	Description
TagNodeGet	Searches an HTML-Vector for HTML-Tag matches based on Tag-Name and whether or not the Tag is an Opening-Tag, and then returns those matches as instances of `TagNode`
TextNodeGet	Retrieves HTML-Document Text from Vectorized-HTML, as instances of `TextNode`, using a search-criteria that may be specified with `String-Predicate's`, Regular-Expressions, or a `TextComparitor`
CommentNodeGet	Retrieves HTML Comments as `CommentNode`-instances, which match a search-criteria that may be specified using either a `String-Predicate`, Regular-Expression or `TextComparitor`
InnerTagGet	Searches Vectorized-HTML for Tag-Matches by Inner-Tag (attribute) name & value, and returns those matches as instances of `TagNode`

Peek: Retrieve Nodes & Indices
Java Entity	Description
TagNodePeek	Searches an HTML-`Vector` for HTML-Tag's that match a specified search-criteria based on Tag-Name and whether the Tag is an opening or closing tag, and returns both the `Vector`-index and the Tag itself (as instances of `TagNodeIndex`)
TextNodePeek	"Peeks" into Vectorized-HTML for text matching a search-criteria and returns the `Vector`-index where matches are found, and the `TextNode` at that `Vector`-location, as an instance of `TextNodeIndex`
CommentNodePeek	"Peeks" into Vectorized-HTML for Comments matching a search-criteria and returns the `Vector`-index where matches are found, and the `CommentNode` at that `Vector`-location, as an instance of `CommentNodeIndex`
InnerTagPeek	Searches Vectorized-HTML for Tag-Matches by Inner-Tag (attribute) names & values, and returns *both* the index-location *and* the tag as instances of `TagNodeIndex`

Poll: Extract & Return Matches
Java Entity	Description
TagNodePoll	Extracts and returns Tags from an HTML-`Vector` that match a user-specified criteria based on Tag-Name and whether the tag is an opening or closing tag
TextNodePoll	*Both* extracts (removes) *and* returns, from an HTML-`Vector`, instances of `TextNode` that match a user provided `String-Predicate`, Regular Expression or `TextComparitor`
CommentNodePoll	Searches for, and extracts, HTML Comments from Vectorized-HTML that match a user-specified search-criteria, and returns those extracted node or nodes
InnerTagPoll	Searches Vectorized-HTML for Tag-Matches by Inner-Tag (attribute) names & values, and removes and returns the `TagNode` when a match is found

Remove: HTMLNode-Matches
Java Entity	Description
TagNodeRemove	Removes all HTML-Tag's from Vectorized-HTML that match a search-criteria based on Tag-Name and whether the tag is an Opening-Tag or Closing-Tag, then returns a count of the number of tags that were removed
TextNodeRemove	Removes `TextNode` instances from Vectorized-HTML that match a search-criteria specified by a `String-Predicate`, Regular Expression or `TextComparitor`
CommentNodeRemove	Finds and removes HTML Comments from an HTML-`Vector` that match a user-provided search-criteria specified with a `String-Predicate`, Regular-Expression or `TextComparitor`
InnerTagRemove	Removes tags from Vectorized-HTML by matching them using a search-criteria that is specified by the names & values of a tag's attributes (Inner-Tags)

'.innerHTML', By Attribute
Java Entity	Description
InnerTagFindInclusive	Searches for `Vector`-indices of `TagNode` matches using exactly the same criteria offered by class `InnerTagFind`, but also retrieves the corresponding Closing-Tag indices from the `Vector`, and returns both as an instance of `DotPair` (a sublist-pointer)
InnerTagGetInclusive	Searches for `TagNode` matches using exactly the same criteria offered by class `InnerTagGet`, but also retrieves the corresponding Closing-Tag from the `Vector`, and returns a new HTML-`Vector` containing this sublist
InnerTagPeekInclusive	Searches for `TagNode` matches, using exactly the same criteria offered by class `InnerTagPeek`, but also retrieves the corresponding Closing-Tag from the `Vector`, and returns them, all nodes between them, *and* their corresponding `Vector`-index locations as a `SubSection`
InnerTagPollInclusive	Searches for `TagNode` matches, using exactly the same criteria offered by class `InnerTagPoll`, but also obtains the corresponding Closing-Tags from the input-`Vector` and, subsequently, *extracts* these sublists from the input-`Vector` and then *returns* the sublists as new instances of `Vector<HTMLNode>`
InnerTagRemoveInclusive	Finds `TagNode` matches, and removes them with exactly the same means as class `InnerTagRemove` but, additionally, finds the corresponding matching Closing-`TagNode` and continues by removing that node, as well as every node situated between the two

'.innerHTML', By Tag
Java Entity	Description
TagNodeFindInclusive	Searches for `Vector`-indices of `TagNode` matches using exactly the same criteria offered by class `TagNodeFind`, but also retrieves the corresponding Closing-Tag indices from the `Vector`, and returns both as an instance of `DotPair` (a sublist-pointer)
TagNodeGetInclusive	Searches for `TagNode` matches using exactly the same criteria offered by class `TagNodeGet`, but also retrieves the corresponding Closing-Tag from the `Vector`, and returns a new HTML-`Vector` containing this sublist
TagNodePeekInclusive	Searches for `TagNode` matches, using exactly the same criteria offered by class `TagNodePeek`, but also retrieves the corresponding Closing-Tag from the `Vector`, and returns them, all nodes between them, *and* their corresponding `Vector`-index locations as a `SubSection`
TagNodePollInclusive	Searches for `TagNode` matches, using exactly the same criteria offered by class `TagNodePoll`, but also obtains the corresponding Closing-Tags from the input-`Vector` and, subsequently, *extracts* these sublists from the input-`Vector` and then *returns* the sublists as new instances of `Vector<HTMLNode>`
TagNodeRemoveInclusive	Finds `TagNode` matches, and removes them with exactly the same means as class `TagNodeRemove` but, additionally, finds the corresponding matching Closing-`TagNode` and continues by removing that node, as well as every node situated between the two

L1 Inclusive
Java Entity	Description
TagNodePeekL1Inclusive	Retrieves matches using exactly the same logic as both the class `TagNodeGetL1Inclusive` and `TagNodeFindL1Inclusive`, however this class returns both the sub-list end-points (`DotPair`) and the nodes themselves (`Vector<HTMLNode>`) as an instance of `SubSection`
TagNodeGetL1Inclusive	Similar to `TagNodeGetInclusive`, this class searches Vectorized-HTML for `TagNode` matches, and returns the Opening-Tag, the Closing-Tag and all nodes between them as sub-lists (instances of `Vector<HTMLNode>`); *however,* any matches that overlap each-other are eliminated from the result-list
TagNodeFindL1Inclusive	Similar to `TagNodeFindInclusive`, this class searches Vectorized-HTML for `TagNode` matches, and returns the opening and closing Tag-Indices of each match (as an instance of `DotPair`); *however,* any matches that overlap each-other are eliminated from the returned result-list
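The overlap-elimination step described for the 'L1 Inclusive' classes can be sketched as follows, under two assumptions: the outermost (level-1) match is the one that is kept, and the matches come from well-formed HTML, so any two ranges are either nested or disjoint. Span (a stand-in for `DotPair`) and L1Filter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for DotPair: inclusive start and end Vector indices.
final class Span {
    final int start, end;
    Span(int start, int end) { this.start = start; this.end = end; }
}

final class L1Filter {
    // Keeps only outermost matches. Because ranges are assumed to be nested or
    // disjoint, a range that starts at or before the end of an already-kept
    // range must lie inside it, and is dropped.
    static List<Span> keepOutermost(List<Span> matches) {
        List<Span> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingInt((Span s) -> s.start));
        List<Span> kept = new ArrayList<>();
        int lastEnd = -1;
        for (Span s : sorted) {
            if (s.start > lastEnd) {   // outside every kept range: a top-level match
                kept.add(s);
                lastEnd = s.end;
            }                          // otherwise nested inside a kept range: dropped
        }
        return kept;
    }
}
```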

Exceptions
Java Entity	Description
CSSStrException	An exception used for problems generated by either HTML-Tag in-line CSS `STYLE='...'` attributes, or by explicitly declared CSS `<STYLE TYPE='text/css'>` style-blocks
CursorException	This exception is thrown when attempts to violate a `Cursor Boundary Window Contract` (possibly rendering the window invalid) are made inside any of the methods exported by the `AbstractHNLI, HNLI` or `HNLIInclusive` iterators
HTMLNotFoundException	Indicates that an HTML segment, or a single HTML-Tag, was not found at a particular location on an HTML Page-`Vector` where that piece of HTML was expected
InclusiveException	An Inclusive-Exception indicates that a user has tried to perform an "Inclusive Search" on an HTML Tag that cannot have sub-nodes or descendant-nodes
IteratorOutOfBoundsException	An exception thrown by the `HNLI` & `HNLIInclusive` iterators when, in the process of modifying or reading the contents of the most recent `Iterator`-Match, a location is specified that's out of the bounds of the `Vector` itself
SecondModificationException	If a second modification is attempted on an HTML-Iterator before a call to `next()`, `previous()`, `first()` or `last()` has been made, then the HTML-Iterators will throw the `SecondModificationException`
TCCompareStrException	`'Text-Comparitor Compare-String Exception'` is thrown by the argument marshalling and validity checking code when attempts are made to search HTML-`Vector's` using an invalid input-`String[]` parameter
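A hypothetical sketch of the "one modification per step" rule behind `SecondModificationException`. MatchIter and SecondModSketchException are illustrative stand-ins, not the HNLI API; the sketch only shows how an iterator can remember whether the current match has already been modified. The JDK's own `java.util.ListIterator` follows a related contract, throwing `IllegalStateException` when `remove()` is called twice without an intervening `next()` or `previous()`.

```java
import java.util.NoSuchElementException;
import java.util.Vector;
import java.util.function.Predicate;

// Hypothetical exception standing in for SecondModificationException.
class SecondModSketchException extends RuntimeException {
    SecondModSketchException(String msg) { super(msg); }
}

// Hypothetical forward-only iterator over the elements of a Vector that match
// a predicate. At most ONE modification (set or remove) is allowed per next().
final class MatchIter<T> {
    private final Vector<T> page;
    private final Predicate<? super T> match;
    private int cursor = -1;            // index of the most recent match
    private boolean canModify = false;  // becomes true only after next()

    MatchIter(Vector<T> page, Predicate<? super T> match) {
        this.page = page;
        this.match = match;
    }

    // Index of the next match after the cursor, or -1 if there is none.
    private int nextMatch() {
        for (int i = cursor + 1; i < page.size(); i++)
            if (match.test(page.get(i))) return i;
        return -1;
    }

    boolean hasNext() { return nextMatch() != -1; }

    T next() {
        int i = nextMatch();
        if (i == -1) throw new NoSuchElementException();
        cursor = i;
        canModify = true;
        return page.get(i);
    }

    void set(T replacement) {
        guard();
        page.set(cursor, replacement);
        canModify = false;
    }

    void remove() {
        guard();
        page.remove(cursor);            // remove(int): removes by index
        cursor--;                       // so the next search resumes at the shifted element
        canModify = false;
    }

    // In this sketch, a modification before the first next() also throws.
    private void guard() {
        if (!canModify) throw new SecondModSketchException("next() must be called before another modification");
    }
}
```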
