Package Torello.HTML.NodeSearch
Class Elements
- java.lang.Object
  - Torello.HTML.NodeSearch.Elements
public class Elements extends java.lang.Object
A simple, demonstrative set of functions for retrieving HTMLNode's from a web-page (a 'Workbook Class').
"Legacy Class," used early-on to help explain HTML Vectors
This was a preliminary demonstration-version of how to use the NodeSearch Package
The exact reason for including this class is not so obvious. Yes, it is useful to traverse HTML tables in Java. However, for the novice user who doesn't quite understand how the words "Find" and "Get" relate to HTMLNode Vectors, using these higher-level search functions might make things easier. If the word "TagNode" (which is, sort of, the opposite of a "TextNode") still doesn't make much sense, all a programmer really needs to do is download an HTML page into a page-Vector, so that it is in the format Vector<HTMLNode>, and then try searching for any of the commonly found HTML elements in that page.
The actual purpose of this class is to show how to use the classes in Node-Search with ease. There are only a few methods (about 10), and they show how the node-search operations are used: the code of each method body is reproduced in the method details of this Javadoc page. Think of this as a "work-book."
JavaScript:
// NOTE: Mostly, if you are familiar with JavaScript, this will make sense:
// JavaScript for obtaining the HTML-Content of a divider "<DIV>" element.
var html = document.getElementById("main-content").innerHTML;

// for example
var nodes = document.getElementsByClassName("article-footer");
The script, as above, will essentially translate to calls such as:
// Java-HTML Scrape Package means of doing the same thing (almost, but not identical)
Vector<HTMLNode> subPage = InnerTagGetInclusive.first(some_page, "id", TextComparitor.EQ_CI_TRM, "main-content");
Vector<TagNode> tn = InnerTagGet.all(some_page, "class", TextComparitor.C, "article-footer");
FIND & GET:
Node-search methods that use the term "Find" retrieve the node's integer-position inside the page-Vector, while methods that use the term "Get" return the node itself. There is no CSS-selector counterpart to this difference, primarily because JavaScript's Document Object Model (a.k.a. "the DOM-Tree") is, well, a tree!
This package uses array-like Java Vector's instead of Tree's. Java Vector's provide a great deal of simplicity when dealing with web-pages that have any readable text. Primarily, because humans generally think in terms of "sentences" rather than "trees," reading, parsing and even translating content is much easier this way.
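The "Find" versus "Get" distinction can be illustrated with a small, self-contained sketch. It does not use this package: page nodes are modeled as plain String's, and the class and method names (FindVersusGetSketch, findFirst, getFirst) are hypothetical stand-ins chosen only for this illustration.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical stand-in: nodes are plain Strings here, not HTMLNode's.
class FindVersusGetSketch
{
    // "Find": returns the integer position of the first matching node, or -1.
    static int findFirst(Vector<String> page, String node)
    {
        for (int i = 0; i < page.size(); i++)
            if (page.get(i).equalsIgnoreCase(node)) return i;

        return -1;
    }

    // "Get": returns the matching node itself, or null if there is none.
    static String getFirst(Vector<String> page, String node)
    {
        int pos = findFirst(page, node);
        return (pos == -1) ? null : page.get(pos);
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<html>", "<body>", "<p>", "Hello", "</p>", "</body>", "</html>"
        ));

        System.out.println(findFirst(page, "<P>"));  // prints 2
        System.out.println(getFirst(page, "<P>"));   // prints <p>
    }
}
```

Because the page is a flat list, a position is just an `int`, which is why a "Find" result can be passed around and reused as cheaply as the node itself.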
INCLUSIVE:
Node-search methods that use the term "Inclusive" retrieve the entire list of nodes (or integer node-pointers) between the opening and closing versions of the tag and attributes that you are searching for. They are roughly equivalent to JavaScript's "someElement.innerHTML".
If the term "Inclusive" is not present, only the opening TagNode itself, or the opening TagNode's index in the HTML page-Vector, will be returned.
If one calls DotPair dp = Elements.findTable(someHTMLPage); the DotPair variable that is returned from this function will delineate / demarcate the starting and ending positions within the Vector<HTMLNode> that constitute the first HTML 'Table' structure found on the web-page.
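As a rough illustration of what an "Inclusive" search has to do on a flat page list, here is a minimal sketch. It is not this package's implementation: nodes are plain String's, the returned `int[] {start, end}` pair is only a stand-in for a DotPair, treating both ends as inclusive is an assumption of this sketch, and the names InclusiveRangeSketch, findInclusive and getInclusive are hypothetical.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical stand-in for an "Inclusive" search over a flat list of nodes.
class InclusiveRangeSketch
{
    // Returns {start, end} (both inclusive in this sketch) of the first
    // <tag> ... </tag> range, or null if no complete range is found.
    static int[] findInclusive(Vector<String> page, String tag)
    {
        String open = "<" + tag + ">", close = "</" + tag + ">";
        int start = -1, depth = 0;

        for (int i = 0; i < page.size(); i++)
        {
            String node = page.get(i);

            if (node.equalsIgnoreCase(open))
            {
                if (depth == 0) start = i;  // remember the outermost opening tag
                depth++;                    // nested tags of the same kind
            }
            else if (node.equalsIgnoreCase(close) && depth > 0)
            {
                depth--;
                if (depth == 0) return new int[] { start, i };
            }
        }

        return null;
    }

    // "Get" flavor: copies the nodes in the range into a new Vector.
    static Vector<String> getInclusive(Vector<String> page, String tag)
    {
        int[] range = findInclusive(page, tag);
        return (range == null) ? null : new Vector<>(page.subList(range[0], range[1] + 1));
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<html>", "<table>", "<tr>", "<td>", "cell", "</td>", "</tr>", "</table>", "</html>"
        ));

        System.out.println(Arrays.toString(findInclusive(page, "table")));  // prints [1, 7]
        System.out.println(getInclusive(page, "table").size());             // prints 7
    }
}
```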
FOR EXAMPLE:
If one calls the following method:
Vector<HTMLNode> list = Elements.getOL(someHTMLPage);
The Vector that is returned will be the entire sub-set of HTMLNode's copied from the original page-Vector (variable 'someHTMLPage') that comprise the very first HTML <OL> ... </OL> (Ordered List) Element found on this page.
Hi-Lited Source-Code: Torello/HTML/NodeSearch/Elements.java
File Size: 39,130 Bytes Line Count: 846 '\n' Characters Found
Stateless Class: This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.
- 1 Constructor(s), 1 declared private, zero-argument constructor
- 35 Method(s), 35 declared static
- 0 Field(s)
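The "stateless" shape summarized above (one private zero-argument constructor, only static methods, no fields) can be sketched in plain Java as follows. The class name StaticFunctionalSketch and its helper method are hypothetical, and the @StaticFunctional annotation itself is deliberately not used, since its definition is not shown on this page.

```java
// Hypothetical example of a stateless, non-instantiable utility class.
final class StaticFunctionalSketch
{
    // Private, zero-argument constructor: code outside this class cannot instantiate it.
    private StaticFunctionalSketch() { }

    // Every method is static and depends only on its arguments (no fields).
    static boolean isClosingTag(String node)
    { return node.startsWith("</") && node.endsWith(">"); }

    public static void main(String[] args)
    {
        System.out.println(isClosingTag("</ul>"));  // prints true
        System.out.println(isClosingTag("<ul>"));   // prints false
    }
}
```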
-
-
Method Summary
Retrieve Title-String
Modifier and Type   Method
static String       titleString(Vector<? extends HTMLNode> html)
FIND: Retrieve Vector-Indices
Modifier and Type        Method
static Vector<DotPair>   findAllLI(Vector<? extends HTMLNode> list)
static Vector<DotPair>   findAllOption(Vector<? extends HTMLNode> selectList)
static DotPair           findBody(Vector<? extends HTMLNode> html)
static DotPair           findHead(Vector<? extends HTMLNode> html)
static int[]             findLink(Vector<? extends HTMLNode> html)
static int[]             findMeta(Vector<? extends HTMLNode> html)
static DotPair           findOL(Vector<? extends HTMLNode> html)
static DotPair           findOL(Vector<? extends HTMLNode> html, int sPos, int ePos)
static DotPair           findSelect(Vector<? extends HTMLNode> html)
static DotPair           findSelect(Vector<? extends HTMLNode> html, int sPos, int ePos)
static DotPair           findTable(Vector<? extends HTMLNode> html)
static DotPair           findTable(Vector<? extends HTMLNode> html, int sPos, int ePos)
static DotPair           findTitle(Vector<? extends HTMLNode> html)
static DotPair           findUL(Vector<? extends HTMLNode> html)
static DotPair           findUL(Vector<? extends HTMLNode> html, int sPos, int ePos)
GET: Retrieve HTMLNode's
Modifier and Type                 Method
static Vector<Vector<HTMLNode>>   getAllLI(Vector<? extends HTMLNode> list)
static Vector<Vector<HTMLNode>>   getAllOption(Vector<? extends HTMLNode> selectList)
static Vector<HTMLNode>           getBody(Vector<? extends HTMLNode> html)
static Vector<HTMLNode>           getHead(Vector<? extends HTMLNode> html)
static Vector<TagNode>            getLink(Vector<? extends HTMLNode> html)
static Vector<TagNode>            getMeta(Vector<? extends HTMLNode> html)
static Vector<HTMLNode>           getOL(Vector<? extends HTMLNode> html)
static Vector<HTMLNode>           getOL(Vector<? extends HTMLNode> html, int sPos, int ePos)
static Vector<HTMLNode>           getSelect(Vector<? extends HTMLNode> html)
static Vector<HTMLNode>           getSelect(Vector<? extends HTMLNode> html, int sPos, int ePos)
static Vector<HTMLNode>           getTable(Vector<? extends HTMLNode> html)
static Vector<HTMLNode>           getTable(Vector<? extends HTMLNode> html, int sPos, int ePos)
static Vector<HTMLNode>           getTitle(Vector<? extends HTMLNode> html)
static Vector<HTMLNode>           getUL(Vector<? extends HTMLNode> html)
static Vector<HTMLNode>           getUL(Vector<? extends HTMLNode> html, int sPos, int ePos)
Protected Exception-Check Methods
Modifier and Type         Method
protected static String   checkEndPoints(Vector<? extends HTMLNode> list, int sPos, int ePos, String... tokList)
protected static String   checkEndPoints(Vector<? extends HTMLNode> list, String... tokList)
protected static void     checkL1(Vector<? extends HTMLNode> list, int sPos, int ePos, Vector<DotPair> sublists)
protected static void     checkL1(Vector<? extends HTMLNode> list, Vector<DotPair> sublists)
-
-
-
Method Detail
-
findBody
public static DotPair findBody(java.util.Vector<? extends HTMLNode> html)
Retrieves the start and end points of the web-page body in the underlying HTML page-Vector. All nodes between <BODY> ... </BODY> will be included.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The start and end index pointers, as a DotPair, of the requested HTML sublist.
- See Also:
InnerTagFindInclusive
- Code:
- Exact Method Body:
return InnerTagFindInclusive.first(html, "body");
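The effect of the wild-card parameter type described above can be tried out with a small, self-contained sketch. The classes Node, Tag and Text below are hypothetical stand-ins for HTMLNode, TagNode and TextNode (they are not this package's classes), and totalLength is an illustrative method only. The point is that one method declared with `Vector<? extends Node>` accepts a Vector of any sub-type.

```java
import java.util.Vector;

// Hypothetical stand-ins for HTMLNode / TagNode / TextNode.
class WildcardSketch
{
    static class Node { final String str; Node(String s) { str = s; } }
    static class Tag  extends Node { Tag(String s)  { super(s); } }
    static class Text extends Node { Text(String s) { super(s); } }

    // Accepts Vector<Node>, Vector<Tag> or Vector<Text> alike.
    static int totalLength(Vector<? extends Node> html)
    {
        int total = 0;
        for (Node n : html) total += n.str.length();
        return total;
    }

    public static void main(String[] args)
    {
        Vector<Tag> tags = new Vector<>();
        tags.add(new Tag("<b>"));
        tags.add(new Tag("</b>"));

        Vector<Node> mixed = new Vector<>();
        mixed.add(new Tag("<b>"));
        mixed.add(new Text("hi"));
        mixed.add(new Tag("</b>"));

        System.out.println(totalLength(tags));   // prints 7
        System.out.println(totalLength(mixed));  // prints 9
    }
}
```

Had the parameter been declared as plain `Vector<Node>`, the call with `Vector<Tag>` would be rejected by the compiler, because Java generics are invariant.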
-
getBody
public static java.util.Vector<HTMLNode> getBody (java.util.Vector<? extends HTMLNode> html)
Gets the nodes of the web-page body. All nodes between <BODY> ... </BODY> will be included.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The requested HTML sublist, as a Vector.
- See Also:
InnerTagGetInclusive
- Code:
- Exact Method Body:
return InnerTagGetInclusive.first(html, "body");
-
findHead
public static DotPair findHead(java.util.Vector<? extends HTMLNode> html)
Retrieves the start and end points of the web-page header in the underlying HTML page-Vector. All nodes between <HEAD> ... </HEAD> will be included.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The start and end index pointers, as a DotPair, of the requested HTML sublist.
- See Also:
InnerTagFindInclusive
- Code:
- Exact Method Body:
return InnerTagFindInclusive.first(html, "head");
-
getHead
public static java.util.Vector<HTMLNode> getHead (java.util.Vector<? extends HTMLNode> html)
Gets the nodes of the web-page header. All nodes between <HEAD> ... </HEAD> will be included.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The requested HTML sublist, as a Vector.
- See Also:
InnerTagGetInclusive
- Code:
- Exact Method Body:
return InnerTagGetInclusive.first(html, "head");
-
findMeta
public static int[] findMeta(java.util.Vector<? extends HTMLNode> html)
Gets all <META NAME="..." CONTENT="..."> (or <META CHARSET="..."> and <META HTTP-EQUIV="...">) elements in a web-page header, returned via their positions in the page-Vector.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The requested HTML Elements, as an integer-array list of index-pointers to the underlying Vector.
- See Also:
TagNodeFind
- Code:
- Exact Method Body:
return TagNodeFind.all(html, TC.OpeningTags, "meta");
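To illustrate what an "all" search returning `int[]` index-pointers looks like on a flat page list, here is a minimal sketch. It is not this package's code: nodes are plain String's, a tag is matched by a simple case-insensitive prefix test (a simplification that would also match longer tag names starting with the same letters), and the names FindAllSketch, findAll and getAll are hypothetical.

```java
import java.util.Arrays;
import java.util.Vector;
import java.util.stream.IntStream;

// Hypothetical stand-in for a "find all" / "get all" pair over a flat node list.
class FindAllSketch
{
    // "Find all": positions of every node that starts with the given prefix (case-insensitive).
    static int[] findAll(Vector<String> page, String prefix)
    {
        return IntStream.range(0, page.size())
            .filter(i -> page.get(i).regionMatches(true, 0, prefix, 0, prefix.length()))
            .toArray();
    }

    // "Get all": the nodes at those positions, copied into a new Vector.
    static Vector<String> getAll(Vector<String> page, String prefix)
    {
        Vector<String> ret = new Vector<>();
        for (int i : findAll(page, prefix)) ret.add(page.get(i));
        return ret;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<head>", "<META charset=\"utf-8\">", "<title>", "T", "</title>",
            "<meta name=\"a\" content=\"b\">", "</head>"
        ));

        System.out.println(Arrays.toString(findAll(page, "<meta")));  // prints [1, 5]
        System.out.println(getAll(page, "<meta").size());             // prints 2
    }
}
```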
-
getMeta
public static java.util.Vector<TagNode> getMeta (java.util.Vector<? extends HTMLNode> html)
Gets all <META NAME="..." CONTENT="..."> (or <META CHARSET="..."> and <META HTTP-EQUIV="...">) elements in a web-page header.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The requested HTML Elements, as TagNode's, in a return Vector.
- See Also:
TagNodeGet
- Code:
- Exact Method Body:
return TagNodeGet.all(html, TC.OpeningTags, "meta");
-
findLink
public static int[] findLink(java.util.Vector<? extends HTMLNode> html)
Gets all <LINK REL="..." HREF="..."> elements in a web-page header, returned via their positions in the page-Vector.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The requested HTML Elements, as an integer-array list of index-pointers to the underlying Vector.
- See Also:
TagNodeFind
- Code:
- Exact Method Body:
return TagNodeFind.all(html, TC.OpeningTags, "link");
-
getLink
public static java.util.Vector<TagNode> getLink (java.util.Vector<? extends HTMLNode> html)
Gets all <LINK REL="..." HREF="..."> elements in a web-page header.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The requested HTML Elements, as TagNode's, in a return Vector.
- See Also:
TagNodeGet
- Code:
- Exact Method Body:
return TagNodeGet.all(html, TC.OpeningTags, "link");
-
findTitle
public static DotPair findTitle(java.util.Vector<? extends HTMLNode> html)
Returns the start and end positions in the page-Vector of the HTML <TITLE>...</TITLE> elements.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The start and end index pointers, as a DotPair, of the requested HTML sublist.
- See Also:
TagNodeFindInclusive
- Code:
- Exact Method Body:
return TagNodeFindInclusive.first(html, "title");
-
getTitle
public static java.util.Vector<HTMLNode> getTitle (java.util.Vector<? extends HTMLNode> html)
Returns the <TITLE>...</TITLE> elements sub-list from the HTML page-Vector.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The requested HTML sublist, as a Vector.
- See Also:
TagNodeGetInclusive
- Code:
- Exact Method Body:
return TagNodeGetInclusive.first(html, "title");
-
titleString
public static java.lang.String titleString (java.util.Vector<? extends HTMLNode> html)
Returns the String encapsulated by the HTML 'HEAD'-section's "<TITLE>...</TITLE>" element, if there is such an element. If there is no such element, null is returned. If there is a 'TITLE' element, but it contains the empty String (zero-length string), an empty String is returned.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
Retrieves the 'TITLE' of an HTML page by getting the String-text between the 'TITLE' elements.
- Returns:
- The title string
- Code:
- Exact Method Body:
Vector<HTMLNode> title = getTitle(html);
if (title == null) return null;
return Util.textNodesString(title);
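The three outcomes described for titleString (a title, an empty String, or null) can be reproduced with a small stand-alone sketch. It does not call getTitle or Util.textNodesString: nodes are plain String's, anything starting with '<' is treated as a tag, and the class TitleStringSketch is a hypothetical stand-in written only to show the null versus empty-String distinction.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical stand-in: returns the text between <title> and </title>.
class TitleStringSketch
{
    static String titleString(Vector<String> page)
    {
        int start = -1;

        for (int i = 0; i < page.size(); i++)
        {
            String node = page.get(i);

            if (node.equalsIgnoreCase("<title>")) start = i;

            else if (node.equalsIgnoreCase("</title>") && (start != -1))
            {
                // Concatenate only the text nodes between the two tags.
                StringBuilder sb = new StringBuilder();
                for (int j = start + 1; j < i; j++)
                    if (! page.get(j).startsWith("<")) sb.append(page.get(j));

                return sb.toString();  // may be "" when the title is empty
            }
        }

        return null;  // no complete <title> ... </title> element was found
    }

    public static void main(String[] args)
    {
        Vector<String> withTitle = new Vector<>(Arrays.asList(
            "<head>", "<title>", "My Page", "</title>", "</head>"));
        Vector<String> emptyTitle = new Vector<>(Arrays.asList(
            "<head>", "<title>", "</title>", "</head>"));
        Vector<String> noTitle = new Vector<>(Arrays.asList("<head>", "</head>"));

        System.out.println(titleString(withTitle));             // prints My Page
        System.out.println(titleString(emptyTitle).isEmpty());  // prints true
        System.out.println(titleString(noTitle));               // prints null
    }
}
```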
-
findTable
public static DotPair findTable(java.util.Vector<? extends HTMLNode> html)
This method will find the very first HTML 'TABLE' (<TABLE> <TH>...</TH> <TR> <TD>..</TD> ... </TR> ... </TABLE>) element set. This returns the Vector-position starting and ending boundaries (DotPair.start, DotPair.end) rather than pointer-references to the nodes. This is what the 'FIND' keyword usually means in this HTML-Scrape package.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The start and end index pointers, as a DotPair, of the requested HTML sublist.
- See Also:
TagNodeFindInclusive
- Code:
- Exact Method Body:
return TagNodeFindInclusive.first(html, "table");
-
findTable
public static DotPair findTable(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method will find the very first HTML 'TABLE' (<TABLE> <TH>...</TH> <TR> <TD>..</TD> ... </TR> ... </TABLE>) element set. This returns the Vector-position starting and ending boundaries (DotPair.start, DotPair.end) rather than pointer-references to the nodes. This is what the 'FIND' keyword usually means in this HTML-Scrape package.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The start and end index pointers, as a DotPair, of the requested HTML sublist.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
TagNodeFindInclusive
- Code:
- Exact Method Body:
return TagNodeFindInclusive.first(html, sPos, ePos, "table");
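The 'sPos' / 'ePos' rules listed under "Throws" above can be written out as a small validation routine. This is only a sketch of the documented rules, not the package's actual checking code: the class RangeCheckSketch and the method checkRange are hypothetical, and the order of the checks is an assumption made for this illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of the documented sPos / ePos rules.
class RangeCheckSketch
{
    // Returns the effective {sPos, ePos} pair, or throws IndexOutOfBoundsException.
    static int[] checkRange(int size, int sPos, int ePos)
    {
        if (sPos < 0 || sPos >= size) throw new IndexOutOfBoundsException(
            "sPos [" + sPos + "] is negative, or not less than the size [" + size + "]");

        // A negative ePos is first reset to the size of the Vector.
        if (ePos < 0) ePos = size;

        if (ePos == 0 || ePos > size) throw new IndexOutOfBoundsException(
            "ePos [" + ePos + "] is zero, or greater than the size [" + size + "]");

        if (sPos > ePos) throw new IndexOutOfBoundsException(
            "sPos [" + sPos + "] is larger than ePos [" + ePos + "]");

        return new int[] { sPos, ePos };
    }

    public static void main(String[] args)
    {
        System.out.println(Arrays.toString(checkRange(10, 2, -1)));  // prints [2, 10]
        System.out.println(Arrays.toString(checkRange(10, 0, 4)));   // prints [0, 4]
    }
}
```

With these rules, passing `-1` for 'ePos' is a convenient way to say "search to the end of the Vector" without calling `size()` at every call site.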
-
getTable
public static java.util.Vector<HTMLNode> getTable (java.util.Vector<? extends HTMLNode> html)
This method will get the very first HTML 'TABLE' (<TABLE> <TR> <TH>...</TH> </TR> <TR> <TD>..</TD> ... </TR> ... </TABLE>) element set. This returns a sub-Vector (an actual Vector<HTMLNode> object, not a Vector / array starting and ending indices pair). This is what the 'GET' keyword usually means in this HTML-Scrape package.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The requested HTML sublist, as a Vector.
- See Also:
TagNodeGetInclusive
- Code:
- Exact Method Body:
return TagNodeGetInclusive.first(html, "table");
-
getTable
public static java.util.Vector<HTMLNode> getTable (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method will get the very first HTML 'TABLE' (<TABLE> <TH>...</TH> <TR> <TD>..</TD> ... </TR> ... </TABLE>) element set. This returns a sub-Vector (an actual Vector<HTMLNode> object, not a Vector / array starting and ending indices pair). This is what the 'GET' keyword usually means in this HTML-Scrape package.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The requested HTML sublist, as a Vector.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
TagNodeGetInclusive
- Code:
- Exact Method Body:
return TagNodeGetInclusive.first(html, sPos, ePos, "table");
-
findSelect
public static DotPair findSelect (java.util.Vector<? extends HTMLNode> html)
This method will find the very first HTML 'SELECT-OPTION' element set (<SELECT> ... <OPTION> ... </OPTION> ... </SELECT>). This returns the Vector-position starting and ending boundaries (DotPair.start, DotPair.end) rather than pointer-references to the nodes. This is what the 'FIND' keyword usually means in this HTML-Scrape package.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The start and end index pointers, as a DotPair, of the requested HTML sublist.
- See Also:
TagNodeFindInclusive
- Code:
- Exact Method Body:
return TagNodeFindInclusive.first(html, "select");
-
findSelect
public static DotPair findSelect (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method will find the very first HTML 'SELECT-OPTION' element set (<SELECT> ... <OPTION> ... </OPTION> ... </SELECT>). This returns the Vector-position starting and ending boundaries (DotPair.start, DotPair.end) rather than pointer-references to the nodes. This is what the 'FIND' keyword usually means in this HTML-Scrape package.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The start and end index pointers, as a DotPair, of the requested HTML sublist.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
TagNodeFindInclusive
- Code:
- Exact Method Body:
return TagNodeFindInclusive.first(html, sPos, ePos, "select");
-
getSelect
public static java.util.Vector<HTMLNode> getSelect (java.util.Vector<? extends HTMLNode> html)
This method will get the very first HTML 'SELECT-OPTION' element set (<SELECT> ... <OPTION> ... </OPTION> ... </SELECT>). This returns a sub-Vector (an actual Vector<HTMLNode> object, not a Vector / array starting and ending indices pair). This is what the 'GET' keyword usually means in this HTML-Scrape package.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The requested HTML sublist, as a Vector.
- See Also:
TagNodeGetInclusive
- Code:
- Exact Method Body:
return TagNodeGetInclusive.first(html, "select");
-
getSelect
public static java.util.Vector<HTMLNode> getSelect (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method will get the very first HTML 'SELECT-OPTION' element set (<SELECT> ... <OPTION> ... </OPTION> ... </SELECT>). This returns a sub-Vector (an actual Vector<HTMLNode> object, not a Vector / array starting and ending indices pair). This is what the 'GET' keyword usually means in this HTML-Scrape package.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'inclusive', meaning that the HTMLNode at this Vector-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input Vector, an exception will be thrown.
ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.
This value is considered 'exclusive', meaning that the HTMLNode at this Vector-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
- Returns:
- The requested HTML sublist, as a Vector.
- Throws:
java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
- If 'sPos' is negative, or if 'sPos' is greater-than-or-equal-to the size of the Vector
- If 'ePos' is zero, or greater than the size of the Vector
- If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
- See Also:
TagNodeGetInclusive
- Code:
- Exact Method Body:
return TagNodeGetInclusive.first(html, sPos, ePos, "select");
-
findUL
public static DotPair findUL(java.util.Vector<? extends HTMLNode> html)
This method will find the very first HTML Un-Ordered List (<UL> ... <LI>...</LI> ... </UL>) element set. This returns the Vector-position starting and ending boundaries (DotPair.start, DotPair.end) rather than pointer-references to the nodes. This is what the 'FIND' keyword usually means in this HTML-Scrape package.
- Parameters:
html - This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
- Returns:
- The start and end index pointers, as a DotPair, of the requested HTML sublist.
- See Also:
TagNodeFindInclusive
- Code:
- Exact Method Body:
return TagNodeFindInclusive.first(html, "ul");
-
findUL
public static DotPair findUL(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method will find the very first HTML Un-Ordered List (<UL> ..<LI>...</LI> ... </UL>
) element set. This returns theVector
Position starting and ending boundariesDotPair.start, DotPair.end
rather than pointer-references to the nodes. This is what the'FIND'
keyword usually means in this HTML-Scrape package.- Parameters:
html
- This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression'? extends HTMLNode'
means that aVector<TagNode>, Vector<TextNode>
orVector<CommentNode>
will all be accepted by this paramter without causing an exception throw.
These 'sub-type' Vectors are often returned as search results from the classes in the'NodeSearch'
vpackage.sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter.
This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter.
This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of input theVector
-parameter, an exception will throw.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The start and end index pointers, as a
DotPair
, of the HTML requested HTML sublist. - Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:- If
'sPos'
is negative, or ifsPos
is greater-than-or-equal-to thesize
of theVector
- If
'ePos'
is zero, or greater than the size of theVector
- If the value of
'sPos'
is a larger integer than'ePos'
. If'ePos'
was negative, it is first reset toVector.size()
, before this check is done.
- See Also:
TagNodeFindInclusive
- Code:
- Exact Method Body:
return TagNodeFindInclusive.first(html, sPos, ePos, "ul");
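The range rules described for 'sPos' and 'ePos' can be sketched in isolation. The following is only a minimal, self-contained illustration and not the library's implementation: nodes are modeled as plain strings, the names `FindSketch` and `findFirst` are hypothetical, and the returned `int[]` stands in for a `DotPair` whose two ends are assumed to be inclusive.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, NOT the Torello API: each node is modeled as a plain String.
class FindSketch {

    /** Returns {start, end} (both inclusive) of the first <tag>...</tag> block, or null. */
    static int[] findFirst(List<String> page, int sPos, int ePos, String tag) {
        if (ePos < 0) ePos = page.size();   // a negative ePos is reset to the Vector size

        // The three documented failure cases, checked after ePos has been reset.
        if (sPos < 0 || sPos >= page.size()) throw new IndexOutOfBoundsException("sPos: " + sPos);
        if (ePos == 0 || ePos > page.size()) throw new IndexOutOfBoundsException("ePos: " + ePos);
        if (sPos > ePos) throw new IndexOutOfBoundsException("sPos > ePos");

        String open = "<" + tag + ">", close = "</" + tag + ">";
        int depth = 0, start = -1;

        for (int i = sPos; i < ePos; i++) {  // sPos is inclusive, ePos is exclusive
            String n = page.get(i);
            if (n.equals(open)) {
                if (depth++ == 0) start = i;                 // remember the outermost opener
            } else if (n.equals(close) && depth > 0 && --depth == 0) {
                return new int[] { start, i };               // matching closer found
            }
        }
        return null;                                         // no complete block in range
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList("<p>", "<ul>", "<li>", "a", "</li>", "</ul>", "</p>");
        System.out.println(Arrays.toString(findFirst(page, 0, -1, "ul")));  // prints [1, 5]
    }
}
```

Returning `null` when nothing is found is this sketch's own choice; the behavior of the real search classes in that case is not stated on this page.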
-
getUL
public static java.util.Vector<HTMLNode> getUL (java.util.Vector<? extends HTMLNode> html)
This method will find the very first HTML Un-Ordered List (<UL> ..<LI>...</LI> ... </UL>
) element set. This returns a sub-vector (an actual Vector<HTMLNode>
object, not a pair of Vector / array
starting and ending indices). This is what the 'GET'
keyword usually means in this HTML-Scrape package.- Parameters:
html
- This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression'? extends HTMLNode'
means that aVector<TagNode>, Vector<TextNode>
orVector<CommentNode>
will all be accepted by this parameter without causing an exception to be thrown.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch'
package.- Returns:
- The requested HTML sublist, as a
Vector
. - See Also:
TagNodeGetInclusive
- Code:
- Exact Method Body:
return TagNodeGetInclusive.first(html, "ul");
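The difference between 'FIND' (index boundaries) and 'GET' (a new sub-vector) can be shown with a small stand-alone sketch. It does not use the Torello classes: nodes are plain strings, `GetSketch` and its two methods are hypothetical names, and the inclusive end index is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

// Hypothetical stand-in, NOT the Torello API: each node is modeled as a plain String.
class GetSketch {

    /** 'FIND' style: {start, end} indices (inclusive) of the first <ul>...</ul>, or null. */
    static int[] findUL(List<String> page) {
        int start = page.indexOf("<ul>");
        if (start < 0) return null;

        int depth = 0;
        for (int i = start; i < page.size(); i++) {
            if (page.get(i).equals("<ul>")) depth++;                  // nested lists go deeper
            else if (page.get(i).equals("</ul>") && --depth == 0)     // closer of the first <ul>
                return new int[] { start, i };
        }
        return null;                                                  // list was never closed
    }

    /** 'GET' style: a new Vector holding the nodes that the index pair points at. */
    static Vector<String> getUL(List<String> page) {
        int[] dp = findUL(page);
        return (dp == null) ? null : new Vector<>(page.subList(dp[0], dp[1] + 1));
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList("<p>", "<ul>", "<li>", "a", "</li>", "</ul>", "</p>");
        System.out.println(Arrays.toString(findUL(page)));  // index boundaries only
        System.out.println(getUL(page));                    // a copy of the nodes themselves
    }
}
```

A 'FIND' result stays valid only while the page vector is not modified, whereas a 'GET' result is an independent copy; which one fits depends on whether the caller intends to edit the page in place.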
-
getUL
public static java.util.Vector<HTMLNode> getUL (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method will find the very first HTML Un-Ordered List (<UL> ..<LI>...</LI> ... </UL>
) element set. This returns a sub-vector (an actual Vector<HTMLNode>
object, not a pair of Vector / array
starting and ending indices). This is what the 'GET'
keyword usually means in this HTML-Scrape package.- Parameters:
html
- This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression'? extends HTMLNode'
means that aVector<TagNode>, Vector<TextNode>
orVector<CommentNode>
will all be accepted by this parameter without causing an exception to be thrown.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch'
package.
sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter.
This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter.
This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The requested HTML sublist, as a
Vector
. - Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:- If
'sPos'
is negative, or ifsPos
is greater-than-or-equal-to thesize
of theVector
- If
'ePos'
is zero, or greater than the size of theVector
- If the value of
'sPos'
is a larger integer than'ePos'
. If'ePos'
was negative, it is first reset toVector.size()
, before this check is done.
- See Also:
TagNodeGetInclusive
- Code:
- Exact Method Body:
return TagNodeGetInclusive.first(html, sPos, ePos, "ul");
-
findOL
public static DotPair findOL(java.util.Vector<? extends HTMLNode> html)
This method will find the very first HTML Ordered List (<OL> ..<LI>...</LI> ... </OL>
) element set. This returns the Vector
positions of the starting and ending boundaries (DotPair.start, DotPair.end)
rather than pointer-references to the nodes. This is what the 'FIND'
keyword usually means in this HTML-Scrape package.- Parameters:
html
- This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression'? extends HTMLNode'
means that aVector<TagNode>, Vector<TextNode>
orVector<CommentNode>
will all be accepted by this parameter without causing an exception to be thrown.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch'
package.- Returns:
- The start and end index pointers, as a
DotPair
, of the requested HTML sublist. - See Also:
TagNodeFindInclusive
- Code:
- Exact Method Body:
return TagNodeFindInclusive.first(html, "ol");
-
findOL
public static DotPair findOL(java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method will find the very first HTML Ordered List (<OL> ..<LI>...</LI> ... </OL>
) element set. This returns the Vector
positions of the starting and ending boundaries (DotPair.start, DotPair.end)
rather than pointer-references to the nodes. This is what the 'FIND'
keyword usually means in this HTML-Scrape package.- Parameters:
html
- This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression'? extends HTMLNode'
means that aVector<TagNode>, Vector<TextNode>
orVector<CommentNode>
will all be accepted by this parameter without causing an exception to be thrown.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch'
package.
sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter.
This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter.
This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The start and end index pointers, as a
DotPair
, of the requested HTML sublist. - Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:- If
'sPos'
is negative, or ifsPos
is greater-than-or-equal-to thesize
of theVector
- If
'ePos'
is zero, or greater than the size of theVector
- If the value of
'sPos'
is a larger integer than'ePos'
. If'ePos'
was negative, it is first reset toVector.size()
, before this check is done.
- See Also:
TagNodeFindInclusive
- Code:
- Exact Method Body:
return TagNodeFindInclusive.first(html, sPos, ePos, "ol");
-
getOL
public static java.util.Vector<HTMLNode> getOL (java.util.Vector<? extends HTMLNode> html)
This method will find the very first HTML Ordered List (<OL> ..<LI>...</LI> ... </OL>
) element set. This returns a sub-vector (an actual Vector<HTMLNode>
object, not a pair of Vector / array
starting and ending indices pair). This is what the'GET'
keyword usually means in this HTML-Scrape package.- Parameters:
html
- This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression'? extends HTMLNode'
means that aVector<TagNode>, Vector<TextNode>
orVector<CommentNode>
will all be accepted by this parameter without causing an exception to be thrown.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch'
package.- Returns:
- The requested HTML sublist, as a
Vector
. - See Also:
TagNodeGetInclusive
- Code:
- Exact Method Body:
return TagNodeGetInclusive.first(html, "ol");
-
getOL
public static java.util.Vector<HTMLNode> getOL (java.util.Vector<? extends HTMLNode> html, int sPos, int ePos)
This method will find the very first HTML Ordered List (<OL> ..<LI>...</LI> ... </OL>
) element set. This returns a sub-vector (an actual Vector<HTMLNode>
object, not a pair of Vector / array
starting and ending indices pair). This is what the'GET'
keyword usually means in this HTML-Scrape package.- Parameters:
html
- This may be any Vectorized-HTML Web-Page (or sub-page).
The Variable-Type Wild-Card Expression'? extends HTMLNode'
means that aVector<TagNode>, Vector<TextNode>
orVector<CommentNode>
will all be accepted by this parameter without causing an exception to be thrown.
These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch'
package.
sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter.
This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter.
This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Returns:
- The requested HTML sublist, as a
Vector
. - Throws:
java.lang.IndexOutOfBoundsException
- This exception shall be thrown if any of the following are true:- If
'sPos'
is negative, or ifsPos
is greater-than-or-equal-to thesize
of theVector
- If
'ePos'
is zero, or greater than the size of theVector
- If the value of
'sPos'
is a larger integer than'ePos'
. If'ePos'
was negative, it is first reset toVector.size()
, before this check is done.
- See Also:
TagNodeGetInclusive
- Code:
- Exact Method Body:
return TagNodeGetInclusive.first(html, sPos, ePos, "ol");
-
findAllOption
public static java.util.Vector<DotPair> findAllOption (java.util.Vector<? extends HTMLNode> selectList) throws MalformedHTMLException
This will use the "L1 Inclusive" concept defined in this HTML package to provide a list (returned using the type:java.util.Vector<DotPair>
) of each element that fits the<OPTION> ... </OPTION>
HTML "select-option element" structure.- Parameters:
selectList
- An HTML list of TagNode's
and TextNode's
that constitute a selection-option drop-down menu. This list cannot contain extraneous TagNode's
or TextNode's
, but rather, must begin and end with the open and close "select" HTML drop-down menu Tags.- Returns:
- A "list of lists" - specifically, a list of
Torello.HTML.DotPair
, each of which delineate a complete<OPTION> ... </OPTION>
sub-list that are present within this HTML "select" drop-down-menu structure. - Throws:
MalformedHTMLException
- This method in no way performs a complete evaluation of the HTML structure provided by the user in theVector<? extends HTMLNode> list
parameter that is passed. However rules that are related to the HTML elements "Select Option"<SELECT>...<OPTION> ... </OPTION> ... </SELECT>
are inspected.- If the passed list parameter does not start and end with the exact HTML
elements -
<SELECT>, </SELECT>
, then this exception is thrown. - If the passed list parameter contains "extraneous HTML tags" or "extraneous text"
in between the
<OPTION> ... </OPTION> or <SELECT> ... </SELECT>
list-start and list-end demarcated HTML TagNodes, then theTorello.HTML.MalformedHTMLException
will, again, be thrown
- See Also:
checkEndPoints(Vector, String[])
,checkL1(Vector, Vector)
,TagNodeFindL1Inclusive
- Code:
- Exact Method Body:
checkEndPoints(selectList, "select");
Vector<DotPair> ret = TagNodeFindL1Inclusive.all(selectList, "option");
checkL1(selectList, ret);
return ret;
-
getAllOption
public static java.util.Vector<java.util.Vector<HTMLNode>> getAllOption (java.util.Vector<? extends HTMLNode> selectList) throws MalformedHTMLException
This does the exact same thing as findAllOption(Vector)
, but the returned value is converted from "sublist endpoints" (a vector of start/end pairs) into a "List of Sub-Lists", which is specifically a list (java.util.Vector<>)
containing sub-lists (also: java.util.Vector<HTMLNode>
).
NOTE: All of the rules and conditions explained in the comments for methodfindAllOption(Vector)
apply to this method as well.- Parameters:
selectList
- An HTML list of TagNode's
and TextNode's
that constitute a selection-option drop-down menu. This list cannot contain extraneous TagNode's
or TextNode's
, but rather, must begin and end with the open and close "select" HTML drop-down menu Tags.- Returns:
- A "list of lists" - specifically, a list of
java.util.Vector<HTMLNode>
(sublists), each of which delineate a complete<OPTION> ... </OPTION>
sub-list that are present within this HTML "select" drop-down-menu structure. - Throws:
MalformedHTMLException
- This method in no way performs a complete evaluation of the HTML structure provided by the user in theVector<? extends HTMLNode> list
parameter that is passed. However rules that are related to the HTML elements "Select Option"<SELECT>...<OPTION> ... </OPTION> ... </SELECT>
are inspected.- If the passed list parameter does not start and end with the exact HTML
elements -
<SELECT>, </SELECT>
, then this exception is thrown. - If the passed list parameter contains "extraneous HTML tags" or "extraneous
text" in between the
<OPTION> ... </OPTION> or <SELECT> ... </SELECT>
list-start and list-end demarcated HTML TagNodes, then theTorello.HTML.MalformedHTMLException
will, again, be thrown
- See Also:
DPUtil.toVectors(Vector, Iterable)
- Code:
- Exact Method Body:
return DPUtil.toVectors(selectList, findAllOption(selectList));
-
findAllLI
public static java.util.Vector<DotPair> findAllLI (java.util.Vector<? extends HTMLNode> list) throws MalformedHTMLException
This will use the "L1 Inclusive" concept defined in this HTML package to provide a list (returned using the type:java.util.Vector<DotPair>
) of each element that fits the<LI> ... </LI>
HTML "list element" structure.- Parameters:
list
- An HTML list ofTagNode's
andTextNode's
that constitute an ordered or unordered list. This list cannot contain extraneousTagNode's
orTextNode's
, but rather, must begin and end with the open and close list Tags.- Returns:
- A "list of lists" - specifically, a list of
Torello.HTML.DotPair
, each of which delineate a complete<LI> ... </LI>
sub-list that are present within this HTML list structure. - Throws:
MalformedHTMLException
- This method in no way performs a complete evaluation of the HTML structure provided by the user in theVector<? extends HTMLNode> list
parameter that is passed. However rules that are related to the HTML elements "Ordered List"<OL>...</OL>
and "unordered list"<UL>...</UL>
are inspected.- If the passed list parameter does not start and end with the same HTML
elements - specifically
<OL>, <UL>
, then this exception is thrown. - If the passed list parameter contains "extraneous HTML tags" or "extraneous text"
in between the
<OL> or <UL> ... </OL> or </UL>
list-start and list-end demarcated HTML TagNodes, then theTorello.HTML.MalformedHTMLException
will, again, be thrown
- See Also:
checkEndPoints(Vector, String[])
,checkL1(Vector, Vector)
,TagNodeFindL1Inclusive
- Code:
- Exact Method Body:
checkEndPoints(list, "ol", "ul");
Vector<DotPair> ret = TagNodeFindL1Inclusive.all(list, "li");
checkL1(list, ret);
return ret;
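One way to picture the "L1 Inclusive" search is as a scan that records only the `<LI>` elements sitting directly inside the outer list, skipping any that belong to a nested list. The sketch below rests on that reading, which is an assumption rather than the package's definition. It is self-contained: nodes are plain strings, every tag is assumed to be properly closed, and `L1Sketch` / `findAllL1` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, NOT the Torello API: each node is modeled as a plain String.
class L1Sketch {

    /** Returns {start, end} (inclusive) for each <tag> that is a direct child of the outer element. */
    static List<int[]> findAllL1(List<String> list, String tag) {
        List<int[]> ret = new ArrayList<>();
        String open = "<" + tag + ">", close = "</" + tag + ">";
        int depth = 0, start = -1;

        for (int i = 0; i < list.size(); i++) {
            String n = list.get(i);
            if (n.startsWith("</")) {                        // any closing tag
                depth--;
                if (depth == 1 && n.equals(close) && start >= 0) {
                    ret.add(new int[] { start, i });         // a level-one element is complete
                    start = -1;
                }
            } else if (n.startsWith("<")) {                  // any opening tag
                if (depth == 1 && n.equals(open)) start = i; // opener directly under the outer tag
                depth++;
            }                                                // text nodes do not change the depth
        }
        return ret;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList(
            "<ul>", "<li>", "a", "<ul>", "<li>", "b", "</li>", "</ul>", "</li>",
            "<li>", "c", "</li>", "</ul>");
        for (int[] dp : findAllL1(list, "li")) System.out.println(Arrays.toString(dp));
    }
}
```

In the sample list the nested `<li>` holding "b" is not reported on its own; it is simply part of the first level-one item.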
-
getAllLI
public static java.util.Vector<java.util.Vector<HTMLNode>> getAllLI (java.util.Vector<? extends HTMLNode> list) throws MalformedHTMLException
This does the exact same thing as findAllLI(Vector)
, but the returned value is converted from "sublist endpoints" (a vector of start/end pairs) into a "List of Sub-Lists", which is specifically a list (java.util.Vector<>)
containing sub-lists (also: java.util.Vector<HTMLNode>
).
NOTE: All of the rules and conditions explained in the comments for methodfindAllLI(Vector)
apply to this method as well.- Parameters:
list
- An HTML list ofTagNode's
andTextNode's
that constitute an ordered or unordered list. This list cannot contain extraneousTagNode's
orTextNode's
, but rather, must begin and end with the open and close list Tags.- Returns:
- A "list of lists" - specifically, a list of
java.util.Vector<HTMLNode>
(sublists), each of which holds a complete <LI> ... </LI> sub-list that is present within this HTML list structure. - Throws:
MalformedHTMLException
- This method in no way performs a complete evaluation of the HTML structure provided by the user in theVector<? extends HTMLNode> list
parameter that is passed. However rules that are related to the HTML elements "Ordered List" (<OL>...</OL>
) and "unordered list" (<UL>...</UL>
) are inspected.- If the passed list parameter does not start and end with the same HTML
elements - specifically
<OL>, <UL>
, then this exception is thrown. - If the passed list parameter contains "extraneous HTML tags" or "extraneous text"
in between the
<OL> or <UL> ... </OL> or </UL>
list-start and list-end demarcated HTMLTagNode's
, then theTorello.HTML.MalformedHTMLException
will, again, be thrown.
- See Also:
DPUtil.toVectors(Vector, Iterable)
- Code:
- Exact Method Body:
return DPUtil.toVectors(list, findAllLI(list));
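The conversion from "sublist endpoints" to a "list of sub-lists" amounts to copying each index range into its own vector. The following sketch shows that idea only; it is not the code of `DPUtil.toVectors`. Nodes are plain strings, each `int[]` stands in for a `DotPair` with inclusive ends (an assumption), and `ToVectorsSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

// Hypothetical stand-in, NOT the Torello API: each node is modeled as a plain String.
class ToVectorsSketch {

    /** Copies every {start, end} range (both inclusive) of 'list' into its own Vector. */
    static Vector<Vector<String>> toVectors(List<String> list, List<int[]> pairs) {
        Vector<Vector<String>> ret = new Vector<>();
        for (int[] dp : pairs)
            ret.add(new Vector<>(list.subList(dp[0], dp[1] + 1)));  // +1: subList's end is exclusive
        return ret;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("<ul>", "<li>", "a", "</li>", "<li>", "b", "</li>", "</ul>");
        List<int[]> pairs = Arrays.asList(new int[] { 1, 3 }, new int[] { 4, 6 });
        System.out.println(toVectors(list, pairs));
    }
}
```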
-
checkEndPoints
protected static java.lang.String checkEndPoints (java.util.Vector<? extends HTMLNode> list, java.lang.String... tokList) throws MalformedHTMLException
This method is used to guarantee that the passed HTML Tag list meets precisely two conditions.- Condition 1: The
Vector<HTMLNode> list
parameter begins and ends with the exact same HTML Tag, (for instance:<H1> ... </H1>
, or perhaps<LI> ... </LI>
) - Condition 2: The HTML-Tag that is found at the start and end of this list is one
contained within the
'tokList'
variable-lengthString-array
parameter. (If the 'tokList'
parameter were a java.lang.String[] tokList = { "th", "tr" }
, then the passed "HTMLNode list" (Vector
) parameter would have to begin and end with either: <TH> ... </TH>
or with <TR> ... </TR>.)
Much of the Java code in this method is used to provide some explanatory Exception message information.- Parameters:
list
- This is supposed to be a typical "open" and "close" HTML TagNode structure. It may be anything including:<DIV ID="..."> ... </DIV>
, or<TABLE ...> ... </TABLE>
, or even<BODY> ... </BODY>
tokList
- This is expected to be the set of possible tokens with which this HTML list may begin and end.- Returns:
- If the passed list parameter passes both the conditions specified above, then the
token from the list of tokens that were provided is returned.
NOTE: If the list does not meet these conditions, aTorello.HTML.MalformedHTMLException
will be thrown with an explanatory exception-message (and, obviously, the method will not return anything!) - Throws:
MalformedHTMLException
- Some explanatory information is provided to the coder for what has failed with the input list.- Code:
- Exact Method Body:
return checkEndPoints(list, 0, list.size()-1, tokList);
-
checkEndPoints
protected static java.lang.String checkEndPoints (java.util.Vector<? extends HTMLNode> list, int sPos, int ePos, java.lang.String... tokList) throws MalformedHTMLException
This method, functionally, does the exact same thing as "checkEndPoints", but with the endpoints specified. It is kept with protected access since it might be unclear which endpoints are being checked. The exception-message strings are typed out once, in this method's body, and the overload without endpoints delegates to it rather than repeating them. Functionally, it does the same thing as checkEndPoints(Vector, String[])
- except it does not use list.elementAt(0)
or list.elementAt(list.size()-1)
as the starting and ending points.- Parameters:
sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter.
This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter.
This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.tokList
- The list of valid HTML Element names (tokens).- Throws:
MalformedHTMLException
- See Also:
checkEndPoints(Vector, String[])
- Code:
- Exact Method Body:
HTMLNode n = null;
String tok = null;

if ((n = list.elementAt(sPos)).isTagNode())
    tok = ((TagNode) n).tok;
else
    throw new MalformedHTMLException(
        "This list does not begin an HTML TagNode, but rather a: " +
        n.getClass().getName() + "\n" + n.str
    );

if (! (n = list.elementAt(ePos)).isTagNode())
    throw new MalformedHTMLException(
        "This list does not end with an HTML TagNode, but rather a : " +
        n.getClass().getName() + "\n" + n.str
    );

if (! ((TagNode) n).tok.equals(tok))
    throw new MalformedHTMLException(
        "This list does not begin and end with the same HTML TagNode:\n" +
        "[OpeningTag: " + tok + "]\t[ClosingTag: " + ((TagNode) n).tok + "]"
    );

for (String t : tokList) if (t.equals(tok)) return tok;

String expectedTokList = "";
for (String t: tokList) expectedTokList += " " + t;

throw new MalformedHTMLException(
    "The opening and closing HTML Tag tokens for this list are not members of the " +
    "tokList parameter set...\n" +
    "Expected HTML Tag List: " + expectedTokList + "\nFound Tag: " + tok
);
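The two conditions can be tried out with a small stand-alone sketch. It is not the library code: nodes are plain strings, `EndPointSketch` and `tokOf` are hypothetical names, and an unchecked `IllegalStateException` stands in for `MalformedHTMLException`. As in the body shown above, the node at index 'ePos' is itself inspected, so the caller passes the index of the closing tag.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, NOT the Torello API: each node is modeled as a plain String.
class EndPointSketch {

    /** Returns the tag name of "<ul>" or "</ul>" style nodes, or null for text nodes. */
    static String tokOf(String node) {
        if (!node.startsWith("<") || !node.endsWith(">")) return null;
        return node.substring(node.startsWith("</") ? 2 : 1, node.length() - 1);
    }

    /** Checks both end points and returns the shared tag name when it is listed in tokList. */
    static String checkEndPoints(List<String> list, int sPos, int ePos, String... tokList) {
        String first = tokOf(list.get(sPos));
        String last  = tokOf(list.get(ePos));   // ePos is read, so it is inclusive here

        if (first == null) throw new IllegalStateException("list does not begin with a tag");
        if (last == null)  throw new IllegalStateException("list does not end with a tag");

        // Condition 1: the same tag at both ends.
        if (!first.equals(last))
            throw new IllegalStateException("opening tag " + first + " != closing tag " + last);

        // Condition 2: that tag is one of the expected tokens.
        for (String t : tokList) if (t.equals(first)) return first;

        throw new IllegalStateException(
            "expected one of [" + String.join(" ", tokList) + "], found: " + first);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("<ul>", "<li>", "a", "</li>", "</ul>");
        System.out.println(checkEndPoints(list, 0, list.size() - 1, "ol", "ul"));  // prints ul
    }
}
```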
-
checkL1
protected static void checkL1(java.util.Vector<? extends HTMLNode> list, java.util.Vector<DotPair> sublists) throws MalformedHTMLException
This checks that the sublists demarcated by the Vector<DotPair> sublists
parameter are properly formatted HTML. It is easier to show an example of "proper HTML formatting" and "improper HTML formatting" here than to explain the rule in English.
PROPER HTML:
HTML Elements:
<UL>
<LI> This is a list element.</LI>
<LI> This is another list element.</LI>
<LI> This list element contains <B><I> extra-tags</I></B> like "bold", "italics", and even a <A HREF="http://Torello.Directory">link!</A></LI>
</UL>
IMPROPER HTML:
HTML Elements:
<UL>
This text should not be here, and constitutes "malformed HTML"
<LI> This LI element is just fine.</LI>
<A HREF="http://ChineseNewsBoard.com">This link</A> should be between LI elements
<LI> This LI element is also just fine!</LI>
</UL>
In the above two lists, the latter would generate a MalformedHTMLException- Throws:
MalformedHTMLException
- whenever improper HTML is presented to this function- Code:
- Exact Method Body:
checkL1(list, 0, list.size()-1, sublists);
-
checkL1
protected static void checkL1(java.util.Vector<? extends HTMLNode> list, int sPos, int ePos, java.util.Vector<DotPair> sublists) throws MalformedHTMLException
This method, functionally, does the exact same thing as "checkL1", but with the endpoints specified. It is kept with protected access since it might be unclear which endpoints are being checked. The exception-message String's
are typed out once, in this method's body, and the overload without endpoints delegates to it rather than repeating them. Functionally, it does the same thing as checkL1(Vector, Vector)
- except it does not use list.elementAt(0)
or list.elementAt(list.size()-1)
as the starting and ending points.- Parameters:
sPos
- This is the (integer)Vector
-index that sets a limit for the left-mostVector
-position to inspect/search inside the inputVector
-parameter.
This value is considered 'inclusive' meaning that theHTMLNode
at thisVector
-index will be visited by this method.
NOTE: If this value is negative, or larger than the length of the input-Vector
, an exception will be thrown.ePos
- This is the (integer)Vector
-index that sets a limit for the right-mostVector
-position to inspect/search inside the inputVector
-parameter.
This value is considered 'exclusive' meaning that the'HTMLNode'
at thisVector
-index will not be visited by this method.
NOTE: If this value is larger than the size of the input Vector
-parameter, an exception will be thrown.
ALSO: Passing a negative value to this parameter,'ePos'
, will cause its value to be reset to the size of the inputVector
-parameter.- Throws:
MalformedHTMLException
- See Also:
checkL1(Vector, Vector)
- Code:
- Exact Method Body:
int last = sPos;
int t = ePos - 1;
HTMLNode n = null;

for (DotPair sublist : sublists)

    if (sublist.start == (last+1)) last = sublist.end;

    else
    {
        if ((sublist.start < (last+1)) || (sublist.start >= t))
            throw new IllegalArgumentException(
                "The provided subLists parameter does not contain subLists that are in " +
                "order of the original list. The 'list of sublists' must contain " +
                "sublists that are in increasing sorted order.\n" +
                "Specifically, each sublist must contain start and end points that are " +
                "sequentially increasing. Also, they may not overlap."
            );
        else
        {
            for (int i=(last+1); i < sublist.start; i++)
                if ((n = list.elementAt(i)).isTagNode())
                    throw new MalformedHTMLException(
                        "There is a spurious HTML-Tag element at Vector position: " + i +
                        "\n=>\t" + n.str
                    );
                else if (n.isTextNode() && (n.str.trim().length() > 0))
                    throw new MalformedHTMLException(
                        "There is a spurious Text-Node element at Vector position: " + i +
                        "\n=>\t" + n.str
                    );
        }
    }
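The rule being enforced, that only white-space may sit between consecutive sublists, can be illustrated with a simplified stand-alone sketch. It is not a copy of the body above: nodes are plain strings, each `int[]` stands in for a `DotPair` with inclusive ends, `CheckL1Sketch` is a hypothetical name, and an unchecked `IllegalStateException` replaces `MalformedHTMLException`. The sketch also advances past every sublist unconditionally and omits the ordering check; both are simplifications of its own.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, NOT the Torello API: each node is modeled as a plain String.
class CheckL1Sketch {

    /** Throws if anything other than white-space text lies in the gaps before each sublist. */
    static void checkL1(List<String> list, int sPos, List<int[]> sublists) {
        int last = sPos;                                   // index of the outer opening tag

        for (int[] dp : sublists) {
            for (int i = last + 1; i < dp[0]; i++) {       // the gap before this sublist
                String n = list.get(i);
                if (n.startsWith("<"))
                    throw new IllegalStateException("spurious tag at " + i + ": " + n);
                if (!n.trim().isEmpty())
                    throw new IllegalStateException("spurious text at " + i + ": " + n);
            }
            last = dp[1];                                  // continue after this sublist
        }
    }

    public static void main(String[] args) {
        List<String> proper = Arrays.asList(
            "<ul>", " ", "<li>", "one", "</li>", "\n", "<li>", "two", "</li>", "</ul>");
        checkL1(proper, 0, Arrays.asList(new int[] { 2, 4 }, new int[] { 6, 8 }));
        System.out.println("proper list accepted");
    }
}
```

Tags and text inside a sublist, such as the bold, italic and link tags in the "proper" example, are never visited, because only the gaps between sublists are scanned.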
-
-