Package Torello.HTML.NodeSearch
Class TagNodeFindL1Inclusive
- java.lang.Object
  - Torello.HTML.NodeSearch.TagNodeFindL1Inclusive
public class TagNodeFindL1Inclusive extends java.lang.Object
TagNodeFindL1Inclusive
TagNode:
This implies that only HTML TagNode's will be used for searching. The field TagNode.tok is used as the search criterion. This public, final String field contains the name of the HTML element, for instance 'div', 'p', 'span', 'img', etc.
InnerTag's (a.k.a. 'attributes') are not part of the search.
Find:
This implies that integer values are returned by these methods. These integers are intended to serve as pointers into the underlying input Java Vector.
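To illustrate what "integer pointers" means, here is a hedged sketch (not the library's code) that models a vectorized page as a `Vector<String>` of raw node text and returns the positions of matching opening tags. The names `FindSketch`, `tok` and `findOpeningTags` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Vector;

// Hypothetical sketch: nodes are plain strings standing in for HTMLNode's.
final class FindSketch {
    private FindSketch() {}

    // Lower-cased element name of a tag such as "<P class=x>" or "</p>"; null for text nodes.
    static String tok(String node) {
        if (!node.startsWith("<")) return null;
        int start = node.startsWith("</") ? 2 : 1;
        int end = start;
        while (end < node.length() && Character.isLetterOrDigit(node.charAt(end))) end++;
        return node.substring(start, end).toLowerCase(Locale.ROOT);
    }

    // Returns the index of every opening tag named htmlTag; each int points back into html.
    static List<Integer> findOpeningTags(Vector<String> html, String htmlTag) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < html.size(); i++) {
            String node = html.get(i);
            if (!node.startsWith("</") && htmlTag.equalsIgnoreCase(tok(node))) positions.add(i);
        }
        return positions;
    }
}
```

Because only positions are returned, the caller reads the matched nodes through `html.get(pos)`; nothing is copied out of the Vector.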
L1:
The term 'L1' is simply short for Level-1, and it refers to how matches that occur inside, or 'within', the bounds of a previous match are handled. To skip over, or avoid, matches that occur inside of another, previously identified and returned, match, use an 'L1' search. If a container or "branch" node from an HTML Vector is wrapped inside another, the inner-container or "inner-branch" will not be included with the search results. This concept is similar, but not identical, to the DOM (Document Object Model) term "sibling" familiar from JavaScript.
IMPORTANT NOTE: The classes in this Java HTML JAR Library do not build DOM Trees.
Inclusive:
The word "Inclusive" indicates that all HTMLNode's between an opening and closing HTML tag are requested. The concept is extremely similar to the JavaScript property '.innerHTML' (strictly, closer to '.outerHTML', since the opening and closing tags themselves are part of the result), although in this (Java HTML) JAR Library, no DOM Trees are ever constructed. Each result spans from a matching opening TagNode element to its closing TagNode element, inclusive.
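One plausible way to locate the closing half of an inclusive match is to count nesting depth for tags with the same name. The sketch below again models the page as a `Vector<String>`; `InclusiveSketch` and `findClosing` are hypothetical names, not the library's API, and self-closing tags such as `<br />` are simply skipped.

```java
import java.util.Locale;
import java.util.Vector;

// Hypothetical sketch, not the library's code: pairs an opening tag with its closing tag.
final class InclusiveSketch {
    private InclusiveSketch() {}

    // Lower-cased element name of a tag node such as "<DIV>" or "</div>"; null for text nodes.
    static String tok(String node) {
        if (!node.startsWith("<")) return null;
        int start = node.startsWith("</") ? 2 : 1;
        int end = start;
        while (end < node.length() && Character.isLetterOrDigit(node.charAt(end))) end++;
        return node.substring(start, end).toLowerCase(Locale.ROOT);
    }

    // Index of the closing tag paired with the opening tag at openPos, or -1 if there is none.
    static int findClosing(Vector<String> html, int openPos) {
        String opener = html.get(openPos);
        String name = tok(opener);
        if (name == null || opener.startsWith("</") || opener.endsWith("/>")) return -1;
        int depth = 0;
        for (int i = openPos; i < html.size(); i++) {
            String node = html.get(i);
            // Text, other elements, and self-closing tags do not affect the depth count.
            if (node.endsWith("/>") || !name.equals(tok(node))) continue;
            depth += node.startsWith("</") ? -1 : 1;   // a nested same-name opener goes one deeper
            if (depth == 0) return i;                  // back at the starting level: this is the pair
        }
        return -1;
    }
}
```

The pair `(openPos, findClosing(html, openPos))`, both ends inclusive, corresponds to the start/end boundaries that the examples on this page report as a DotPair.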
Similar to TagNodeFindInclusive, this class searches Vectorized-HTML for TagNode matches and returns the opening and closing tag-indices of each match (as an instance of DotPair); however, any match that lies nested inside an earlier match is eliminated from the returned result-list.
The letters 'L1' are simply an acronym for "Level 1". When a "Level 1 Inclusive" Get or Find is requested, the user is asking only for matching HTML tags that (if this were a DOM-Tree implementation, which it is not!) sit at the same tree-depth relative to one another: specifically, only the outermost matches, those at the first level of nesting, are returned in the result set.
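Since HTML elements with the same name either nest or are disjoint, "eliminating overlapping matches" can be pictured as a filter over plain-inclusive results. This is a hedged illustration, not necessarily how `TagNodeFindL1Inclusive` is implemented; `Span` is a hypothetical stand-in for `DotPair` with inclusive ends, and the input is assumed to be sorted by `start`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DotPair: inclusive start and end indices into the page.
record Span(int start, int end) {}

// Hypothetical helper, not part of the library.
final class L1Filter {
    private L1Filter() {}

    // Keeps only outermost spans: a span starting before the last kept span ends lies inside it.
    static List<Span> dropNested(List<Span> inclusiveResults) {
        List<Span> kept = new ArrayList<>();
        int lastKeptEnd = -1;
        for (Span s : inclusiveResults) {
            if (s.start() > lastKeptEnd) {
                kept.add(s);
                lastKeptEnd = s.end();
            }
        }
        return kept;
    }
}
```

Given the spans (0,5), (2,4) and (6,8), this keeps (0,5) and (6,8), mirroring the three-versus-two DIV outcome in the example that follows.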
Example: Suppose an HTML page included the following TagNode's and TextNode's:

    <HTML>
    <HEAD><TITLE>Node Search Example</TITLE></HEAD>
    <BODY>
    <B>In this example, we will see the difference between:</B>
    <UL>
        <LI>An 'Inclusive Search', Some HTML list-text here!</LI>
        <LI>Versus an 'L1 Inclusive Search', More HTML list-text</LI>
    </UL>
    <BR /><HR><BR />
    <DIV>How are you doing today?<DIV>(Please provide an answer in the form below)</DIV></DIV>
    <DIV>If you have any questions or complaints, please let us know!</DIV>
    </BODY></HTML>

For the elements of the "Unordered List" (HTML <UL> tag), an "Inclusive Search" for "<LI>" Tag's and an "L1 Inclusive Search" for "<LI>" Tag's would produce the exact same result set. HOWEVER, an L1 Inclusive Search for HTML "<DIV>" Tag's would produce two sublists in the above HTML example, but a plain-old Inclusive Search for the same DIV would produce three sublists!
Example 1 (Inclusive-only, not L1) Results:

    // An ordinary "inclusive search", where the start-tag, end-tag, and everything between them are
    // returned as two array-boundary end-points (specifically, a "DotPair").
    Vector<DotPair> sublists = TagNodeFindInclusive.all(page, "li");

    // sublists would contain the following array/vector boundaries as dotted-pairs:
    // sublists.elementAt(0):
    //     HTMLNode 0: TagNode.str  = "<LI>";
    //     HTMLNode 1: TextNode.str = "An 'Inclusive Search', Some HTML list-text here!";
    //     HTMLNode 2: TagNode.str  = "</LI>";
    // sublists.elementAt(1):
    //     HTMLNode 0: TagNode.str  = "<LI>";
    //     HTMLNode 1: TextNode.str = "Versus an 'L1 Inclusive Search', More HTML list-text";
    //     HTMLNode 2: TagNode.str  = "</LI>";
Example 2 (Inclusive-only, not L1) Results:

    // Here, an ordinary "inclusive search" is performed. Again, the start-tag, end-tag, and everything
    // between them are returned as a DotPair (array start/end boundaries). Inner matches which are
    // not HTML tree-siblings will also be included.
    Vector<DotPair> sublists = TagNodeFindInclusive.all(page, "div");

    // sublists would contain the following array/vector boundaries as dotted-pairs:
    // sublists.elementAt(0):
    //     HTMLNode 0: TagNode.str  = "<DIV>";
    //     HTMLNode 1: TextNode.str = "How are you doing today?";
    //     HTMLNode 2: TagNode.str  = "<DIV>";
    //     HTMLNode 3: TextNode.str = "(Please provide an answer in the form below)";
    //     HTMLNode 4: TagNode.str  = "</DIV>";
    //     HTMLNode 5: TagNode.str  = "</DIV>";
    // sublists.elementAt(1): (*** Note that these HTMLNode's are also included in the previous result set)
    //     HTMLNode 0: TagNode.str  = "<DIV>";
    //     HTMLNode 1: TextNode.str = "(Please provide an answer in the form below)";
    //     HTMLNode 2: TagNode.str  = "</DIV>";
    // sublists.elementAt(2):
    //     HTMLNode 0: TagNode.str  = "<DIV>";
    //     HTMLNode 1: TextNode.str = "If you have any questions or complaints, please let us know!";
    //     HTMLNode 2: TagNode.str  = "</DIV>";
Example 3 (L1 Inclusive) Results:

    // Here, an "L1 inclusive (sibling) search" is performed. Again, the start-tag, end-tag, and
    // everything between them are returned as a DotPair (array start/end boundaries), but inner
    // matches which are not HTML tree-siblings will be ignored.
    Vector<DotPair> l1Sublists = TagNodeFindL1Inclusive.all(page, "div");

    // l1Sublists would contain the following array/vector boundaries as dotted-pairs:
    // l1Sublists.elementAt(0):
    //     HTMLNode 0: TagNode.str  = "<DIV>";
    //     HTMLNode 1: TextNode.str = "How are you doing today?";
    //     HTMLNode 2: TagNode.str  = "<DIV>";
    //     HTMLNode 3: TextNode.str = "(Please provide an answer in the form below)";
    //     HTMLNode 4: TagNode.str  = "</DIV>";
    //     HTMLNode 5: TagNode.str  = "</DIV>";
    // l1Sublists.elementAt(1):
    //     HTMLNode 0: TagNode.str  = "<DIV>";
    //     HTMLNode 1: TextNode.str = "If you have any questions or complaints, please let us know!";
    //     HTMLNode 2: TagNode.str  = "</DIV>";
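The difference between Examples 2 and 3 can be modeled with a single search loop whose only variation is how far the loop pointer moves after a match. This is a simplified sketch under stated assumptions (raw node strings in a `Vector<String>`, self-closing tags not modeled, hypothetical names `LoopSketch`, `search`, `findClosing` and `Span`); it is not the library's actual search-loop class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Vector;

// Hypothetical model of plain-inclusive vs. L1-inclusive searching.
final class LoopSketch {
    private LoopSketch() {}

    // Hypothetical stand-in for DotPair: inclusive start and end indices.
    record Span(int start, int end) {}

    // Lower-cased element name of a tag node; null for text nodes.
    static String tok(String node) {
        if (!node.startsWith("<")) return null;
        int start = node.startsWith("</") ? 2 : 1;
        int end = start;
        while (end < node.length() && Character.isLetterOrDigit(node.charAt(end))) end++;
        return node.substring(start, end).toLowerCase(Locale.ROOT);
    }

    // Matching closing tag for the opener at openPos (depth counting), or -1 if unclosed.
    static int findClosing(Vector<String> html, int openPos, String name) {
        int depth = 0;
        for (int i = openPos; i < html.size(); i++) {
            String node = html.get(i);
            if (!name.equals(tok(node))) continue;
            depth += node.startsWith("</") ? -1 : 1;
            if (depth == 0) return i;
        }
        return -1;
    }

    // l1 == false: plain inclusive search; l1 == true: L1 inclusive search.
    static List<Span> search(Vector<String> html, String htmlTag, boolean l1) {
        String name = htmlTag.toLowerCase(Locale.ROOT);
        List<Span> results = new ArrayList<>();
        int i = 0;
        while (i < html.size()) {
            String node = html.get(i);
            if (!node.startsWith("</") && name.equals(tok(node))) {
                int close = findClosing(html, i, name);
                if (close != -1) {
                    results.add(new Span(i, close));
                    if (l1) {
                        i = close + 1;   // L1: jump past the closing tag, skipping nested matches
                        continue;
                    }
                }
            }
            i++;                         // plain inclusive: step to the very next node
        }
        return results;
    }
}
```

Run over the three DIVs of the example page, the plain search yields three spans and the L1 search two; for the two LI elements, both variants agree.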
Another way to explain the "L1 Inclusive" (or "Level 1 Inclusive") specification: whenever a match is found, an L1 search advances the loop-pointer (iterator-pointer) through the Java Vector past the matching closing HTML tag, while a "plain old Inclusive" search advances it only to the very next HTMLNode. This means that in the DIV example above, the "<DIV> ... </DIV>" nested inside of another "<DIV> ... </DIV>" (sometimes called a "sub-div", or a DIV element with a tree-depth of two) would not be returned in the iterator or the vector!
Methods Available
Method      Explanation
all (...)   Obtain all integer-value node-pointer DotPair open-and-closing tag-pairs from the HTML Vector that meet the criteria.

Method Parameters
Parameter: Vector<? extends HTMLNode> html
This represents any vectorized HTML page, sub-page, or list of partial-elements.

Parameter: int sPos, int ePos
When these parameters are present, only HTMLNode's that are found between the specified Vector indices will be considered for matching with the search criteria.
NOTE: In every situation where the parameters int sPos, int ePos are used, parameter 'ePos' will accept a negative value, but parameter 'sPos' will not. When 'ePos' is passed a negative value, the internal LV ('Loop Variable Counter') will have its public final int end field set to the length of the vectorized-html page that was passed (html.size() of parameter Vector<HTMLNode> html).
EXCEPTIONS: An IndexOutOfBoundsException will be thrown if:
- sPos is negative, or if sPos is greater than or equal to the size of the input Vector
- ePos is zero, or greater than the size of the input Vector
- sPos is a larger integer than ePos
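The NOTE and EXCEPTIONS rules above can be restated as a small validation routine. This is a hypothetical sketch: `RangeCheck.effectiveEnd` is not a library method (the real checks live in the library's argument-checker class), and it treats `ePos` as an exclusive bound, which is consistent with a negative value standing for `html.size()`.

```java
// Hypothetical restatement of the sPos / ePos rules; not the library's implementation.
final class RangeCheck {
    private RangeCheck() {}

    // Validates sPos/ePos for a page of the given size and returns the exclusive end index.
    static int effectiveEnd(int size, int sPos, int ePos) {
        if (sPos < 0 || sPos >= size)
            throw new IndexOutOfBoundsException("sPos out of range: " + sPos);
        if (ePos == 0 || ePos > size)
            throw new IndexOutOfBoundsException("ePos out of range: " + ePos);
        int end = (ePos < 0) ? size : ePos;   // negative ePos: search to the end of the Vector
        if (sPos > end)
            throw new IndexOutOfBoundsException("sPos (" + sPos + ") is larger than ePos (" + end + ")");
        return end;
    }
}
```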
Parameter: String htmlTag
When this parameter is present, only HTMLNode's which are both instances of class TagNode *and* have a TagNode.tok field whose value is equal to this parameter 'htmlTag' will be returned as matches.
COMMON EXAMPLES: Some common examples of valid htmlTags are: a, div, img, table, tr, meta, as well as all other valid HTML element-tokens.
NOTE: This comparison is performed using a case-insensitive compare-method.
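A hedged sketch of how such a parameter might be normalized before searching: validate it, then lower-case it so later comparisons are case-insensitive. The tiny `KNOWN` set, the class name `TagArgCheck`, and the use of `IllegalArgumentException` are assumptions made to keep the example self-contained; the library itself recognizes every HTML element and throws its own `HTMLTokException`.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical argument check; not part of the library.
final class TagArgCheck {
    private TagArgCheck() {}

    // Deliberately small, hypothetical subset of valid element names.
    private static final Set<String> KNOWN =
        Set.of("a", "div", "img", "table", "tr", "meta", "li", "ul", "p", "span");

    // Returns the lower-cased tag name, or throws if it is not a recognized element.
    static String normalize(String htmlTag) {
        String lower = htmlTag.toLowerCase(Locale.ROOT);
        if (!KNOWN.contains(lower))
            throw new IllegalArgumentException("Not a valid HTML element: " + htmlTag);
        return lower;
    }
}
```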
EXCEPTIONS: If this parameter is not a valid HTML element, an HTMLTokException will be thrown.

Return Values:
Vector<DotPair>
This is a "list of sub-lists" or an "array of sub-arrays", used when multiple results (multiple sub-lists) need to be returned to the calling procedure. Such a Vector<DotPair> represents a list of sub-list pointers into the vectorized-page parameter 'html', with each DotPair holding the start and end positions of one matching opening/closing TagNode pair.
- A zero-length Vector<DotPair> means no matches were found on the page or sub-page. Zero-length vectors are returned from any method where the possibility existed for multiple matches being provided as a result-set.

Iterator<DotPair>
Returns, one at a time, DotPair index-pointers of sub-lists or sub-pages into the vectorized-HTML page parameter 'html'.
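The sketch below shows one way a caller might consume these results, assuming (as the examples on this page suggest) that both ends of a `DotPair` are inclusive. `Dots` is a hypothetical stand-in record, `UseResults` is a hypothetical helper, and the page is again modeled as a `Vector<String>`; only standard `java.util` calls are used.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Vector;

// Hypothetical stand-in for DotPair: both start and end are inclusive indices.
record Dots(int start, int end) {
    int size() { return end - start + 1; }
}

// Hypothetical helper, not part of the library.
final class UseResults {
    private UseResults() {}

    // Copies the nodes a Dots span points at out of the page (end + 1 because subList is exclusive).
    static List<String> copyOf(Vector<String> html, Dots dp) {
        return List.copyOf(html.subList(dp.start(), dp.end() + 1));
    }

    // Concatenates every matched sub-page, visiting the results one at a time through an Iterator.
    static String joinAll(Vector<String> html, Iterator<Dots> it) {
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) {
            for (String node : copyOf(html, it.next())) sb.append(node);
        }
        return sb.toString();
    }
}
```

An empty result list simply produces an empty string, matching the "zero-length means no matches" convention described above.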
Hi-Lited Source-Code:
This File's Source Code:
- View Here: Torello/HTML/NodeSearch/TagNodeFindL1Inclusive.java
File Size: 1,213 Bytes, Line Count: 30 '\n' Characters Found
Actual Search Loop Class:
- View Here: SearchLoops/L1Inclusive/TNFindL1Incl.java
File Size: 1,284 Bytes, Line Count: 40 '\n' Characters Found
Argument Checker Class:
- View Here: ARGCHECK.java
File Size: 17,862 Bytes, Line Count: 425 '\n' Characters Found
Stateless Class: This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-styled files, without any public constructors or non-static member fields. It is a concept very similar to the Enterprise JavaBeans @Stateless Annotation.
- 1 Constructor(s), 1 declared private, zero-argument constructor
- 2 Method(s), 2 declared static
- 0 Field(s)
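For readers unfamiliar with the pattern, here is a hypothetical class (not part of the library) shaped the way the counts above describe: one private zero-argument constructor, only static methods, and no fields.

```java
import java.util.List;

// Hypothetical static-functional class: no state, and it cannot be instantiated from outside.
final class TagCounter {
    // The single constructor is private and takes no arguments.
    private TagCounter() {}

    // A static method: everything it needs arrives through its parameters.
    static int count(List<String> tokens, String htmlTag) {
        int n = 0;
        for (String t : tokens) if (t.equalsIgnoreCase(htmlTag)) n++;
        return n;
    }
}
```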