Package Torello.HTML.NodeSearch
Class TagNodePeekL1Inclusive
- java.lang.Object
  - Torello.HTML.NodeSearch.TagNodePeekL1Inclusive
public class TagNodePeekL1Inclusive extends java.lang.Object
TagNodePeekL1Inclusive 🠞
TagNode: This implies that only HTML TagNode's will be used for searching. The TagNode.tok field is used as the search criterion. This public, final String field contains the name of the HTML Element - for instance, 'div', 'p', 'span', 'img', etc...
InnerTag's - (a.k.a. 'attributes') - are not part of the search.
Peek: This implies that BOTH the Vector-index / indices where a match occurred, AND the HTMLNode at that index are SIMULTANEOUSLY returned by these methods - using the data-type classes NodeIndex and SubSection.
L1: The term 'L1' is simply short for Level-1, and it refers to how matches that occur inside or 'within' the bounds of a previous match are handled. To skip over or avoid matches that occur inside of another, previously identified and returned, match - use an 'L1' search. If a container or "branch" node from an HTML Vector is wrapped inside another, the inner-container or "inner-branch" will not be included with the search results. This concept is similar, but not identical, to the Java-Script term "sibling" vis-a-vis DOM (Document Object Model) Trees.
IMPORTANT NOTE: The classes in this Java HTML JAR Library do not build DOM Trees.
Inclusive: The word "Inclusive" is used to indicate that all HTMLNode's between an opening and closing HTML-tag are requested. The concept is extremely similar to the Java-Script feature / "term" '.innerHTML', although in this (JavaHTML) JAR Library, no DOM Trees are ever constructed. This method will return all nodes from each matching opening TagNode element through its closing TagNode element.
Retrieves matches using exactly the same logic as both the class TagNodeGetL1Inclusive and TagNodeFindL1Inclusive, however this class returns both the sub-list end-points (DotPair) and the nodes themselves (Vector<HTMLNode>) as an instance of SubSection.
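The "Peek" idea of returning a location together with the nodes found there can be sketched in a few lines. This is only an illustration under stated assumptions: nodes are modeled as plain Strings rather than HTMLNode's, and the names PeekSketch, PeekedRange and peek are hypothetical, invented for this sketch. They are not the library's SubSection or DotPair classes, and whether the library copies the nodes or returns a view of them is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: PeekSketch, PeekedRange and peek(...) are hypothetical names,
// and each node is modeled as a plain String rather than an HTMLNode.
class PeekSketch {

    // Holds both the inclusive end-points of a match and the nodes inside it.
    static final class PeekedRange {
        final int start;            // index of the opening tag
        final int end;              // index of the closing tag (inclusive)
        final List<String> nodes;   // the nodes from start through end

        PeekedRange(int start, int end, List<String> nodes) {
            this.start = start;
            this.end = end;
            this.nodes = nodes;
        }
    }

    // Copies page[start..end] (assumption: a copy, not a view) and keeps the indices alongside.
    static PeekedRange peek(List<String> page, int start, int end) {
        return new PeekedRange(start, end, new ArrayList<>(page.subList(start, end + 1)));
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList(
            "<UL>", "<LI>", "Some HTML list-text here!", "</LI>", "</UL>");
        PeekedRange r = peek(page, 1, 3);
        System.out.println(r.start + ".." + r.end + " -> " + r.nodes);
    }
}
```

Keeping the indices next to the nodes means a caller can both read the matched content and later locate it again in the original list.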
The letters L1 are simply an abbreviation for "Level 1". When a "Level 1 Inclusive" Get or Find is needed, the user is requesting only matching HTML-Tags that (if this were a DOM-Tree implementation, which it is not!) sit at the same tree-depth. Specifically, only matches at a depth of 1-level in the tree will be returned in the result set.
AN EXAMPLE: If there were an HTML-Page that included the following TagNode's and TextNode's:

    <HTML>
    <HEAD><TITLE>Node Search Example</TITLE></HEAD>
    <BODY>
    <B>In this example, we will see the difference between:</B>
    <UL>
    <LI>An 'Inclusive Search', Some HTML list-text here!</LI>
    <LI>Versus an 'L1 Inclusive Search', More HTML list-text</LI>
    </UL>
    <BR /><HR><BR />
    <DIV>How are you doing today?<DIV>(Please provide an answer in the form below)</DIV></DIV>
    <DIV>If you have any questions or complaints, please let us know!</DIV>
    </BODY>
    </HTML>

For the elements of the "Unordered List" (HTML <UL> tag) - an "Inclusive Search" for "<LI>" Tag's and an "L1 Inclusive Search" for "<LI>" Tag's would produce the exact same result set. HOWEVER, an L1 Inclusive Search for HTML "<DIV>" Tag's would produce two sublists in the above HTML-Example, but a plain-old Inclusive Search for the same DIV would produce three sublists!
Example 1 (Inclusive-only, not L1) Results:

    // An ordinary "inclusive search" where the start-tag, end-tag - and everything between are returned
    // as two array-boundary end-points (specifically, a "DotPair").
    Vector<DotPair> sublists = TagNodeFindInclusive.all(page, "li");

    // sublists would contain the following array/vector boundaries as dotted-pairs:
    // sublists.elementAt(0):
    //      HTMLNode 0: TagNode.str = "<LI>";
    //      HTMLNode 1: TextNode.str = "An 'Inclusive Search', Some HTML list-text here!";
    //      HTMLNode 2: TagNode.str = "</LI>";
    // sublists.elementAt(1):
    //      HTMLNode 0: TagNode.str = "<LI>";
    //      HTMLNode 1: TextNode.str = "Versus an 'L1 Inclusive Search', More HTML list-text";
    //      HTMLNode 2: TagNode.str = "</LI>";
Example 2 (Inclusive-only, not L1) Results:

    // Here, an "inclusive search" is performed. Again, the start-tag, end-tag, and everything between them
    // are returned between the DotPair (array start/end boundaries).
    // Inner matches which are not HTML tree-siblings will also be included.
    Vector<DotPair> sublists = TagNodeFindInclusive.all(page, "div");

    // sublists would contain the following array/vector boundaries as dotted-pairs:
    // sublists.elementAt(0):
    //      HTMLNode 0: TagNode.str = "<DIV>";
    //      HTMLNode 1: TextNode.str = "How are you doing today?";
    //      HTMLNode 2: TagNode.str = "<DIV>";
    //      HTMLNode 3: TextNode.str = "(Please provide an answer in the form below)";
    //      HTMLNode 4: TagNode.str = "</DIV>";
    //      HTMLNode 5: TagNode.str = "</DIV>";
    // sublists.elementAt(1): (*** Note that these HTMLNode's are also included in the previous result set)
    //      HTMLNode 0: TagNode.str = "<DIV>";
    //      HTMLNode 1: TextNode.str = "(Please provide an answer in the form below)";
    //      HTMLNode 2: TagNode.str = "</DIV>";
    // sublists.elementAt(2):
    //      HTMLNode 0: TagNode.str = "<DIV>";
    //      HTMLNode 1: TextNode.str = "If you have any questions or complaints, please let us know!";
    //      HTMLNode 2: TagNode.str = "</DIV>";
Example 3 (L1 Inclusive) Results:

    // Here, an "L1 inclusive (sibling) search" is performed. Again, the start-tag, end-tag, and
    // everything between them are returned between the DotPair (array start/end boundaries), but inner
    // matches which are not HTML tree-siblings will be ignored.
    Vector<DotPair> l1Sublists = TagNodeFindL1Inclusive.all(page, "div");

    // l1Sublists would contain the following array/vector boundaries as dotted-pairs:
    // l1Sublists.elementAt(0):
    //      HTMLNode 0: TagNode.str = "<DIV>";
    //      HTMLNode 1: TextNode.str = "How are you doing today?";
    //      HTMLNode 2: TagNode.str = "<DIV>";
    //      HTMLNode 3: TextNode.str = "(Please provide an answer in the form below)";
    //      HTMLNode 4: TagNode.str = "</DIV>";
    //      HTMLNode 5: TagNode.str = "</DIV>";
    // l1Sublists.elementAt(1):
    //      HTMLNode 0: TagNode.str = "<DIV>";
    //      HTMLNode 1: TextNode.str = "If you have any questions or complaints, please let us know!";
    //      HTMLNode 2: TagNode.str = "</DIV>";
Another way to explain the "L1 Inclusive" or "Level 1 Inclusive" specification is that, whenever a match is found, the iterator-pointer that advances through the Java-Vector is moved to the end of the closing-version of the HTML-tag, while a "plain old Inclusive" search-specification advances the loop-pointer or iterator-pointer only to the very next HTMLNode. This means that in the DIV example above, the "<DIV> ... </DIV> inside of a <DIV> ... </DIV>" (sometimes called a "sub-div", or a DIV element with a tree-depth of two) would not be returned in the iterator or the vector!
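The difference in how the loop-pointer advances can be sketched with a small, self-contained program. This is not the library's search loop: nodes are modeled as plain Strings, tags are assumed to carry no attributes, and the names L1InclusiveSketch, findClose and search are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a simplified stand-in for the search loops, with every node
// modeled as a plain String and tags assumed to have no attributes.
class L1InclusiveSketch {

    // Returns the index of the closing tag that balances the opening tag at 'open', or -1.
    static int findClose(List<String> nodes, int open, String tag) {
        String openTag = "<" + tag + ">";
        String closeTag = "</" + tag + ">";
        int depth = 0;
        for (int i = open; i < nodes.size(); i++) {
            String n = nodes.get(i);
            if (n.equalsIgnoreCase(openTag)) depth++;
            else if (n.equalsIgnoreCase(closeTag) && --depth == 0) return i;
        }
        return -1;
    }

    // Returns { openIndex, closeIndex } pairs. When 'l1' is true, the loop jumps to
    // the closing tag after each match, so matches nested inside it are skipped.
    static List<int[]> search(List<String> nodes, String tag, boolean l1) {
        String openTag = "<" + tag + ">";
        List<int[]> ret = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            if (!nodes.get(i).equalsIgnoreCase(openTag)) continue;
            int close = findClose(nodes, i, tag);
            if (close < 0) continue;          // unbalanced opening tag: no sub-list
            ret.add(new int[] { i, close });
            if (l1) i = close;                // L1: resume after the closing tag
        }
        return ret;
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList(
            "<DIV>", "How are you doing today?",
            "<DIV>", "(Please provide an answer in the form below)", "</DIV>", "</DIV>",
            "<DIV>", "If you have any questions or complaints, please let us know!", "</DIV>");

        for (int[] p : search(page, "div", false))
            System.out.println("Inclusive:    " + p[0] + ".." + p[1]);
        for (int[] p : search(page, "div", true))
            System.out.println("L1 Inclusive: " + p[0] + ".." + p[1]);
    }
}
```

Run against the DIV portion of the example page, the plain search reports three ranges and the L1 search reports two, because the single line `if (l1) i = close;` moves the loop past the nested DIV.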
Methods Available
all (...): Obtain all sub-lists which do not have any overlap (the meaning of 'L1') from the vectorized-html webpage that meet the criteria.

Method Parameters

Vector<? extends HTMLNode> html: This represents any vectorized HTML page, sub-page, or list of partial-elements.

int sPos, int ePos: When these parameters are present, only HTMLNode's that are found between the specified Vector indices will be considered for matching with the search criteria.
NOTE: In every situation where the parameters int sPos, int ePos are used, parameter 'ePos' will accept a negative value, but parameter 'sPos' will not. When 'ePos' is passed a negative-value, the internal LV ('Loop Variable Counter') will have its public final int end; field set to the length of the vectorized-html page that was passed (html.size() of parameter Vector<HTMLNode> html).
EXCEPTIONS: An IndexOutOfBoundsException will be thrown if:
- sPos is negative, or if sPos is greater-than or equal-to the size of the input Vector
- ePos is zero, or greater than the size of the input Vector
- sPos is a larger integer than ePos
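The range rules above can be restated as a short check. The sketch below is not the library's ARGCHECK or LV code; RangeCheckSketch and resolve are hypothetical names, and the order in which the conditions are tested is an assumption made for this illustration.

```java
// Illustration only: RangeCheckSketch and resolve(...) are hypothetical names that
// restate the documented rules; they are not taken from the library's source.
class RangeCheckSketch {

    // Returns { start, end } after applying the documented rules for sPos and ePos.
    static int[] resolve(int size, int sPos, int ePos) {
        if (sPos < 0 || sPos >= size)
            throw new IndexOutOfBoundsException("sPos out of range: " + sPos);
        if (ePos == 0 || ePos > size)
            throw new IndexOutOfBoundsException("ePos out of range: " + ePos);

        // A negative ePos means "search to the end of the vector".
        int end = (ePos < 0) ? size : ePos;

        if (sPos > end)
            throw new IndexOutOfBoundsException("sPos " + sPos + " is larger than ePos " + end);

        return new int[] { sPos, end };
    }

    public static void main(String[] args) {
        int[] r = resolve(10, 2, -1);
        System.out.println(r[0] + ".." + r[1]);
    }
}
```

With a ten-node vector, `resolve(10, 2, -1)` yields a start of 2 and an end of 10, while calls such as `resolve(10, -1, 5)` or `resolve(10, 5, 3)` throw.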
String htmlTag: When this parameter is present, only HTMLNode's which are both instances of class TagNode *and* have a TagNode.tok field whose value is equal to this parameter 'htmlTag', will be returned as matches.
COMMON EXAMPLES: Some common examples of valid htmlTags are: a, div, img, table, tr, meta as well as all other valid HTML element-tokens.
NOTE: This comparison is performed using a case-insensitive compare-method.
EXCEPTIONS: If this parameter is not a valid HTML element, an HTMLTokException will be thrown.

Return Values:
- Vector<Vector<HTMLNode>>: This would be a "list of sub-lists" or an "array of sub-arrays", used when multiple results (multiple sub-lists) need to be returned to the calling procedure.
- A zero-length Vector<Vector<HTMLNode>> vector means no matches were found on the page or sub-page. Zero-length vectors are returned from any method where the possibility existed for multiple matches being provided as a result-set.
Hi-Lited Source-Code:
This File's Source Code:
- View Here: Torello/HTML/NodeSearch/TagNodePeekL1Inclusive.java
File Size: 1,240 Bytes Line Count: 28 '\n' Characters Found
Actual Search Loop Class:
- View Here: SearchLoops/L1Inclusive/TNPeekL1Incl.java
File Size: 1,339 Bytes Line Count: 40 '\n' Characters Found
Argument Checker Class:
- View Here: ARGCHECK.java
File Size: 17,874 Bytes Line Count: 425 '\n' Characters Found
Stateless Class: This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.
- 1 Constructor(s), 1 declared private, zero-argument constructor
- 2 Method(s), 2 declared static
- 0 Field(s)
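The shape summarized above (one private zero-argument constructor, only static methods, no fields) is a common Java idiom and can be sketched as follows. The class StaticFunctionalSketch and its count methods are hypothetical, and the way the shorter overload delegates to the range-limited one with (0, -1) is an assumption made for illustration, not something this page states about the library.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: a hypothetical stateless class with the same outline
// (private constructor, two static overloads, no fields). Nodes are plain Strings.
final class StaticFunctionalSketch {

    // Private zero-argument constructor: the class cannot be instantiated.
    private StaticFunctionalSketch() { }

    // Whole-list overload. Delegating with (0, -1) is an assumption of this sketch.
    static int count(List<String> nodes, String tag) {
        return count(nodes, 0, -1, tag);
    }

    // Range-limited overload: a negative ePos means "to the end of the list".
    static int count(List<String> nodes, int sPos, int ePos, String tag) {
        int end = (ePos < 0) ? nodes.size() : ePos;
        int n = 0;
        for (int i = sPos; i < end; i++)
            if (nodes.get(i).equalsIgnoreCase("<" + tag + ">")) n++;   // case-insensitive compare
        return n;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("<DIV>", "a", "</DIV>", "<div>", "b", "</div>");
        System.out.println(count(nodes, "div"));
    }
}
```

Because no state is held, every call depends only on its arguments, which is what makes the two overloads safe to expose as plain static functions.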
Method Summary
All Matches
Modifier and Type: static Vector<SubSection>
Method: all(Vector<? extends HTMLNode> html, String htmlTag)

All Matches, Range Limited
Modifier and Type: static Vector<SubSection>
Method: all(Vector<? extends HTMLNode> html, int sPos, int ePos, String htmlTag)