Package Torello.HTML.NodeSearch
The purpose of these classes is to allow a programmer to "search" through webpages that have
been downloaded and vectorized in Java.
The following key words are important to understand when deciding on an appropriate search class and search method:
- InnerTag: This word means that the attributes inside an HTML TagNode element are used to search for and identify TagNode matches.
- TagNode: This implies that only the HTML TagNode element's name (the TagNode.tok field) will be used to specify search criteria. InnerTags (a.k.a. 'attributes') will not be used to specify the search.
- TextNode: Use of this word (in a class name) means that TagNode elements will be ignored completely, and instead the "text" inside an HTML page or sub-page is searched by means of 'TextNode' elements.
- CommentNode: Use of this word (in a class name) means that the search-specifier will ignore all TagNode and TextNode elements, and instead focus on the contents of the HTML CommentNode's within an HTML page or sub-page.
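The four key words above differ only in which kind of node, and which part of it, a search inspects. The following is a minimal sketch of that distinction. It assumes a toy node model (ToyNode, ToyTag, ToyText and ToyComment are hypothetical names invented for this illustration) and does not use the library's own HTMLNode classes or method signatures.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical toy model for illustration; NOT the library's HTMLNode classes.
class SearchTargetSketch
{
    static abstract class ToyNode { }

    static final class ToyTag extends ToyNode
    {
        final String tok;                      // element name, e.g. "div"
        final Map<String, String> attributes;  // inner-tags, e.g. class="main"

        ToyTag(String tok, Map<String, String> attributes)
        { this.tok = tok; this.attributes = attributes; }
    }

    static final class ToyText extends ToyNode
    {
        final String str;
        ToyText(String str) { this.str = str; }
    }

    static final class ToyComment extends ToyNode
    {
        final String body;
        ToyComment(String body) { this.body = body; }
    }

    // 'TagNode' search: only the element name is consulted.
    static boolean tagNodeMatch(ToyNode n, String tok)
    { return (n instanceof ToyTag) && ((ToyTag) n).tok.equalsIgnoreCase(tok); }

    // 'InnerTag' search: an attribute key-value pair is consulted.
    static boolean innerTagMatch(ToyNode n, String key, String value)
    { return (n instanceof ToyTag) && value.equals(((ToyTag) n).attributes.get(key)); }

    // 'TextNode' search: tags and comments are ignored, only text is tested.
    static boolean textNodeMatch(ToyNode n, String needle)
    { return (n instanceof ToyText) && ((ToyText) n).str.contains(needle); }

    // 'CommentNode' search: tags and text are ignored, only comments are tested.
    static boolean commentNodeMatch(ToyNode n, String needle)
    { return (n instanceof ToyComment) && ((ToyComment) n).body.contains(needle); }

    // Models: <div class="main">Hello<!-- note --><br>  (closing tag omitted for brevity)
    static List<ToyNode> samplePage()
    {
        return Arrays.asList(
            new ToyTag("div", Collections.singletonMap("class", "main")),
            new ToyText("Hello"),
            new ToyComment(" note "),
            new ToyTag("br", Collections.<String, String>emptyMap())
        );
    }

    public static void main(String[] args)
    {
        for (ToyNode n : samplePage())
            System.out.println(
                n.getClass().getSimpleName()
                + " tag=" + tagNodeMatch(n, "div")
                + " attr=" + innerTagMatch(n, "class", "main")
                + " text=" + textNodeMatch(n, "Hell")
                + " comment=" + commentNodeMatch(n, "note")
            );
    }
}
```

Each predicate rejects every node type other than its own, which is the sense in which a 'TextNode' search "ignores" tags and a 'CommentNode' search ignores both tags and text.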
The following key words are also important, and explain some nuances of the HTML search methods:
- Count: This implies that a count of the number of nodes that match the specified search criteria is computed. Methods in 'Count' classes always return simple integers that represent this count.
- Find: This implies that integer-arrays, or simple integers, are returned by the methods in any of the classes with the word 'Find' in the class name. These integers are intended to function as pointers into the underlying Java Vector<HTMLNode>.
- Get: This implies that the HTMLNode's themselves (TagNode, TextNode, etc.) are returned by the methods in any of these classes. Integer-pointers (a.k.a. the integer-index into the underlying Vector<HTMLNode>) are not returned.
- Peek: This implies that BOTH the Vector-index AND the HTMLNode found at that index are returned together by the methods in a class having the word 'Peek' in its name. It is here that the (sort-of) 'simple' and 'extra' data-classes 'TagNodeIndex', 'TextNodeIndex', etc. are used. They are the return values of the 'Peek' methods.
- Poll: This refers to the operation of BOTH removing a node from the vectorized-html web-page AND returning the node (or nodes) that were removed back to the programmer as a return value. Remember, for all methods in classes that have the word 'Poll' in their name, after the method is finished the Vector<HTMLNode> will, indeed, contain fewer elements.
- Remove: This implies that neither nodes nor node-pointers are returned; the nodes are simply removed from the page. An integer-value telling the caller exactly how many nodes were removed is returned. Remember, after a 'remove' operation, the initial vectorized-html will contain fewer elements.
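The six key words above differ mainly in what is returned and in whether the Vector is modified. The following is a minimal sketch of those contracts. It assumes plain String's standing in for HTMLNode's, and the class VerbSketch, its IndexedNode helper and all method names are hypothetical; they are not the signatures of the NodeSearch classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Vector;
import java.util.function.Predicate;

// Hypothetical stand-in: String's play the role of HTMLNode's.
class VerbSketch
{
    // Peek-style result: an index and the node found at that index, together.
    static final class IndexedNode
    {
        final int index;
        final String node;
        IndexedNode(int index, String node) { this.index = index; this.node = node; }
    }

    // Count: returns how many nodes match; the page is untouched.
    static int count(Vector<String> page, Predicate<String> p)
    {
        int c = 0;
        for (String n : page) if (p.test(n)) c++;
        return c;
    }

    // Find: returns the Vector-indices of the matches.
    static int[] find(Vector<String> page, Predicate<String> p)
    {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < page.size(); i++) if (p.test(page.get(i))) hits.add(i);
        int[] ret = new int[hits.size()];
        for (int i = 0; i < ret.length; i++) ret[i] = hits.get(i);
        return ret;
    }

    // Get: returns the matching nodes themselves, without indices.
    static Vector<String> get(Vector<String> page, Predicate<String> p)
    {
        Vector<String> ret = new Vector<>();
        for (String n : page) if (p.test(n)) ret.add(n);
        return ret;
    }

    // Peek: returns both the index and the node for each match.
    static Vector<IndexedNode> peek(Vector<String> page, Predicate<String> p)
    {
        Vector<IndexedNode> ret = new Vector<>();
        for (int i = 0; i < page.size(); i++)
            if (p.test(page.get(i))) ret.add(new IndexedNode(i, page.get(i)));
        return ret;
    }

    // Poll: removes the matches from the page AND returns them.
    static Vector<String> poll(Vector<String> page, Predicate<String> p)
    {
        Vector<String> removed = get(page, p);
        page.removeIf(p);
        return removed;
    }

    // Remove: removes the matches and returns only how many were removed.
    static int remove(Vector<String> page, Predicate<String> p)
    {
        int before = page.size();
        page.removeIf(p);
        return before - page.size();
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("<p>", "one", "</p>", "<p>", "two", "</p>"));
        Predicate<String> openP = node -> node.equals("<p>");

        System.out.println("count  = " + count(page, openP));
        System.out.println("find   = " + Arrays.toString(find(page, openP)));
        System.out.println("get    = " + get(page, openP));
        System.out.println("remove = " + remove(page, openP) + ", page is now " + page);
    }
}
```

Only poll and remove shrink the page; count, find, get and peek leave it as it was.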
Inclusive: Similar to JavaScript '.innerHTML'
The key word "inclusive" deserves an explanation here. "Inclusive" is quite similar to the JavaScript concept of '.innerHTML'. This object-field is a field in most of the nodes within a JavaScript DOM Tree. It is used to retrieve everything between the opening element ('<DIV ..>' for example) and its corresponding closing element ('</DIV>').
When a TagNode is searched using either an 'InnerTag-Search' (attribute key-value pair) or a simple TagNode-Search method, the opening tag, the closing tag, and every HTMLNode between these two are returned by 'inclusive' methods.
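One way to picture an 'inclusive' match is as a scan forward from the opening tag that keeps a nesting depth until the corresponding closing tag is reached. The following is a minimal sketch under simplifying assumptions: String tokens stand in for HTMLNode's, tags carry no attributes, and the class InclusiveSketch and its methods are hypothetical names, not the library's implementation.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical sketch: String tokens such as "<div>" and "</div>" stand in for TagNode's.
class InclusiveSketch
{
    // Returns the index of the closing tag that matches the opening tag at 'openPos',
    // or -1 if the page ends before that closing tag is found.
    static int findClosing(Vector<String> page, int openPos, String tag)
    {
        String open  = "<" + tag + ">";
        String close = "</" + tag + ">";

        if (! page.get(openPos).equals(open))
            throw new IllegalArgumentException("No opening <" + tag + "> at index " + openPos);

        int depth = 0;
        for (int i = openPos; i < page.size(); i++)
        {
            String n = page.get(i);
            if (n.equals(open)) depth++;                            // nested opening tag
            else if (n.equals(close) && (--depth == 0)) return i;   // matching close
        }
        return -1;
    }

    // Returns the opening tag, the closing tag, and every node between them
    // as a new Vector, or null if no matching closing tag exists.
    static Vector<String> getInclusive(Vector<String> page, int openPos, String tag)
    {
        int closePos = findClosing(page, openPos, tag);
        if (closePos == -1) return null;
        return new Vector<>(page.subList(openPos, closePos + 1));
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<div>", "a", "<div>", "b", "</div>", "c", "</div>", "d"
        ));

        System.out.println(getInclusive(page, 0, "div"));
        System.out.println(getInclusive(page, 2, "div"));
    }
}
```

The depth counter is what keeps the outer match from stopping at the first closing tag it meets when elements of the same name are nested.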
Search Interfaces
Java Entity: Description
- AVT: A functional-interface / lambda-target, and several static-builders for generating instances of it, which extends java
- TextComparitor: A functional-interface / lambda-target with many pre-instantiated instances that essentially serve as simplified wrappers for the many String search and comparison utilities provided in Torello.Java
HTMLNode Iterators (HNLI)
Java Entity: Description
- AbstractHNLI<E extends HTMLNode,F>: Abstract parent-class for both of the types of HTML-Iterator's
- HNLI<E extends HTMLNode>
- HNLIInclusive: Iterates 'Inclusive' TagNode sublist-matches, which would be similar to iterating the '.innerHTML' fields of elements in a JavaScript DOM-Tree
HNLI Builders
Java Entity: Description
- CommentNodeIterator: Static methods for building and instantiating an HNLI<CommentNode> (which extends the basic iterator class) for iterating the comments inside of an HTML-Vector, with the help of a user-provided String-Predicate, Regular Expression or TextComparitor as a search-specifier
- TagNodeIterator
- TextNodeIterator: Static methods for building and instantiating an HNLI<TextNode> (which extends the basic iterator class) for iterating the text inside of an HTML-Vector, with the help of a user-provided String-Predicate, Regular Expression or TextComparitor as a search-specifier
- InnerTagIterator
- InnerTagInclusiveIterator: Static methods for building an HNLIInclusive (which also extends the basic Iterator) for retrieving sub HTML-Vector's using match-criteria which specify HTML Tag Attribute name and value requirements
- TagNodeInclusiveIterator: Static methods for building an HNLIInclusive (which also extends the basic Iterator) for retrieving sub HTML-Vector's via user-provided match-criteria that specify the HTML Tag-Name
Simple HTML Search Classes
Java Entity: Description
- Elements: A simple, demonstrative set of functions for retrieving HTMLNode's from a web-page (a 'Workbook Class')
- JS: A 'work-book' class included in the Java-HTML JAR-Library, mostly in order to demonstrate the similarities between searching a JavaScript DOM-Tree and searching Vectorized-HTML for tags, text and comments
- ARGCHECK: This class is used internally to do argument validity-checks, and guarantees consistent exception-message reporting
Count: Matches
Java Entity: Description
- TagNodeCount: Counts the number of HTML-Tags inside of Vectorized-HTML which match a user-provided search based on the Tag's Name and whether the Tag is an Opening or Closing Tag
- TextNodeCount: Counts instances of TextNode that match a search-criteria which may be specified using String-Predicate's, Regular-Expressions or a TextComparitor
- CommentNodeCount: Counts HTML Comments which match a user-specified criteria, such as a String-Predicate, Regular-Expression or TextComparitor
- InnerTagCount: Searches Vectorized-HTML for Tag-Matches using Inner-Tag (attribute) names & values as search-criteria, and returns the number of matches that were found
Find: Match-Indices
Java Entity: Description
- TagNodeFind: Searches an HTML-Vector for HTML-Tag's that match a specified search-criteria based on Tag-Name and whether the Tag is an opening or closing tag, and returns the Vector-indices for those matches
- TextNodeFind: Retrieves indices from Vectorized-HTML that index HTML-Text matching a search-criteria specified by String-Predicate's, Regular-Expressions, or a TextComparitor
- CommentNodeFind: Identifies Vectorized-HTML indices having HTML Comments that match a user-provided criteria, specified using a String-Predicate, Regular-Expression or TextComparitor
- InnerTagFind: Searches Vectorized-HTML for Tag-Matches by Inner-Tag (attribute) name & value, and returns the indices into the Vector that identify where those matches were found
Get: HTMLNode-Matches
Java Entity: Description
- TagNodeGet: Searches an HTML-Vector for HTML-Tag matches based on Tag-Name and whether or not the Tag is an Opening-Tag, and then returns those matches as instances of TagNode
- TextNodeGet: Retrieves HTML-Document Text from Vectorized-HTML, as instances of TextNode, using a search-criteria that may be specified with String-Predicate's, Regular-Expressions, or a TextComparitor
- CommentNodeGet: Retrieves HTML Comments as CommentNode-instances, which match a search-criteria that may be specified using either a String-Predicate, Regular-Expression or TextComparitor
- InnerTagGet: Searches Vectorized-HTML for Tag-Matches by Inner-Tag (attribute) name & value, and returns those matches as instances of TagNode
Peek: Retrieve Nodes & Indices
Java Entity: Description
- TagNodePeek: Searches an HTML-Vector for HTML-Tag's that match a specified search-criteria based on Tag-Name and whether the Tag is an opening or closing tag, and returns both the Vector-index and the Tag itself (as instances of TagNodeIndex)
- TextNodePeek: "Peeks" into Vectorized-HTML for text matching a search-criteria and returns the Vector-index where matches are found, and the TextNode at that Vector-location, as an instance of TextNodeIndex
- CommentNodePeek: "Peeks" into Vectorized-HTML for Comments matching a search-criteria and returns the Vector-index where matches are found, and the CommentNode at that Vector-location, as an instance of CommentNodeIndex
- InnerTagPeek: Searches Vectorized-HTML for Tag-Matches by Inner-Tag (attribute) names & values, and returns both the index-location and the tag as instances of TagNodeIndex
Poll: Extract & Return Matches
Java Entity: Description
- TagNodePoll: Extracts and returns Tags from an HTML-Vector that match a user-specified criteria based on Tag-Name and whether the tag is an opening or closing tag
- TextNodePoll: Both extracts (removes) and returns, from an HTML-Vector, instances of TextNode that match a user-provided String-Predicate, Regular Expression or TextComparitor
- CommentNodePoll: Searches for, and extracts, HTML Comments from Vectorized-HTML that match a user-specified search-criteria, and returns the extracted node or nodes
- InnerTagPoll: Searches Vectorized-HTML for Tag-Matches by Inner-Tag (attribute) names & values, and removes and returns the TagNode when a match is found
Remove: HTMLNode-Matches
Java Entity: Description
- TagNodeRemove: Removes all HTML-Tag's from Vectorized-HTML that match a search-criteria based on Tag-Name and whether the tag is an Opening-Tag or Closing-Tag, then returns a count of the number of tags that were removed
- TextNodeRemove: Removes TextNode instances from Vectorized-HTML that match a search-criteria specified by a String-Predicate, Regular Expression or TextComparitor
- CommentNodeRemove: Finds and removes HTML Comments from an HTML-Vector that match a user-provided search-criteria specified with a String-Predicate, Regular-Expression or TextComparitor
- InnerTagRemove: Removes tags from Vectorized-HTML by matching them using a search-criteria that is specified by the names & values of a tag's attributes (Inner-Tags)
'.innerHTML', By Attribute
Java Entity: Description
- InnerTagFindInclusive: Searches for Vector-indices of TagNode matches using exactly the same criteria offered by class InnerTagFind, but also retrieves the corresponding Closing-Tag indices from the Vector, and returns both as an instance of DotPair (a sublist-pointer)
- InnerTagGetInclusive: Searches for TagNode matches using exactly the same criteria offered by class InnerTagGet, but also retrieves the corresponding Closing-Tag from the Vector, and returns a new HTML-Vector containing this sublist
- InnerTagPeekInclusive: Searches for TagNode matches, using exactly the same criteria offered by class InnerTagPeek, but also retrieves the corresponding Closing-Tag from the Vector, and returns them, all nodes between them, and their corresponding Vector-index locations as a SubSection
- InnerTagPollInclusive: Searches for TagNode matches, using exactly the same criteria offered by class InnerTagPoll, but also obtains the corresponding Closing-Tags from the input-Vector and, subsequently, extracts these sublists from the input-Vector and then returns the sublists as new instances of Vector<HTMLNode>
- InnerTagRemoveInclusive: Finds TagNode matches, and removes them by exactly the same means as class InnerTagRemove but, additionally, finds the corresponding matching Closing-TagNode and continues by removing that node, as well as every node situated between the two
'.innerHTML', By Tag
Java Entity: Description
- TagNodeFindInclusive: Searches for Vector-indices of TagNode matches using exactly the same criteria offered by class TagNodeFind, but also retrieves the corresponding Closing-Tag indices from the Vector, and returns both as an instance of DotPair (a sublist-pointer)
- TagNodeGetInclusive: Searches for TagNode matches using exactly the same criteria offered by class TagNodeGet, but also retrieves the corresponding Closing-Tag from the Vector, and returns a new HTML-Vector containing this sublist
- TagNodePeekInclusive: Searches for TagNode matches, using exactly the same criteria offered by class TagNodePeek, but also retrieves the corresponding Closing-Tag from the Vector, and returns them, all nodes between them, and their corresponding Vector-index locations as a SubSection
- TagNodePollInclusive: Searches for TagNode matches, using exactly the same criteria offered by class TagNodePoll, but also obtains the corresponding Closing-Tags from the input-Vector and, subsequently, extracts these sublists from the input-Vector and then returns the sublists as new instances of Vector<HTMLNode>
- TagNodeRemoveInclusive: Finds TagNode matches, and removes them by exactly the same means as class TagNodeRemove but, additionally, finds the corresponding matching Closing-TagNode and continues by removing that node, as well as every node situated between the two
L1 Inclusive
Java Entity: Description
- TagNodePeekL1Inclusive: Retrieves matches using exactly the same logic as both the class TagNodeGetL1Inclusive and TagNodeFindL1Inclusive; however, this class returns both the sub-list end-points (DotPair) and the nodes themselves (Vector<HTMLNode>) as an instance of SubSection
- TagNodeGetL1Inclusive: Similar to TagNodeGetInclusive, this class searches Vectorized-HTML for TagNode matches, and returns the Opening-Tag, the Closing-Tag and all nodes between them as sub-lists (instances of Vector<HTMLNode>); however, any matches that overlap each other are eliminated from the result-list
- TagNodeFindL1Inclusive: Similar to TagNodeFindInclusive, this class searches Vectorized-HTML for TagNode matches, and returns the opening and closing Tag-Indices of each match (as an instance of DotPair); however, any matches that overlap each other are eliminated from the returned result-list
Exceptions
Java Entity: Description
- CSSStrException: An exception used for problems generated by either HTML-Tag in-line CSS STYLE='...' attributes, or by explicitly declared CSS <STYLE TYPE='text/css'> style-blocks
- CursorException: This exception is thrown when attempts to violate a Cursor Boundary Window Contract (possibly rendering the window invalid) are made inside any of the methods exported by the AbstractHNLI, HNLI or HNLIInclusive iterators
- HTMLNotFoundException: Indicates that an HTML segment, or a single HTML-Tag, was not found at a particular location on an HTML Page-Vector where that piece of HTML was expected
- InclusiveException: An Inclusive-Exception indicates that a user has tried to perform an "Inclusive Search" on an HTML Tag that cannot have sub-nodes or descendant-nodes
- IteratorOutOfBoundsException: An exception thrown by the HNLI & HNLIInclusive iterators when, in the process of modifying or reading the contents of the most recent Iterator-Match, a location is specified that's out of the bounds of the Vector itself
- SecondModificationException: If a second modification is attempted on an HTML-Iterator before a call to next(), previous(), first() or last() has been made, then the HTML-Iterators will throw the SecondModificationException
- TCCompareStrException: 'Text-Comparitor Compare-String Exception' is thrown by the argument marshalling and validity checking code when attempts are made to search HTML-Vector's using an invalid input-String[] parameter
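The rule behind SecondModificationException (one modification per iterator match, until the cursor moves again) resembles the contract of java.util.Iterator.remove(). The following is a minimal sketch of such a rule. It assumes a toy iterator over String's; the class OneModificationIterator is hypothetical, models only next() as the cursor-moving call, and throws the standard IllegalStateException as a stand-in rather than the library's own exception class.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;
import java.util.Vector;

// Hypothetical toy iterator; NOT the library's HNLI. It permits at most one
// modification (set or remove) of the current match per call to next().
class OneModificationIterator
{
    private final Vector<String> page;
    private int     cursor   = -1;     // index of the most recent match
    private boolean modified = false;  // has the current match been modified already?

    OneModificationIterator(Vector<String> page) { this.page = page; }

    boolean hasNext() { return (cursor + 1) < page.size(); }

    String next()
    {
        if (! hasNext()) throw new NoSuchElementException();
        modified = false;              // moving the cursor re-enables modification
        return page.get(++cursor);
    }

    // Replaces the most recent match.
    void set(String replacement)
    {
        checkModifiable();
        page.set(cursor, replacement);
        modified = true;
    }

    // Removes the most recent match; the cursor steps back so next() sees the following node.
    void remove()
    {
        checkModifiable();
        page.remove(cursor--);
        modified = true;
    }

    private void checkModifiable()
    {
        // Stand-in for SecondModificationException
        if (modified) throw new IllegalStateException("second modification before next()");
        if (cursor < 0) throw new IllegalStateException("next() has not been called");
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("a", "b", "c"));
        OneModificationIterator it = new OneModificationIterator(page);

        it.next();
        it.remove();                   // first modification: allowed

        try { it.set("x"); }           // second modification: rejected
        catch (IllegalStateException e) { System.out.println("rejected: " + e.getMessage()); }

        it.next();
        it.set("B");                   // allowed again after next()
        System.out.println(page);
    }
}
```

Rejecting the second modification keeps the iterator from acting on a match whose position in the Vector may no longer be valid after the first change.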