java.lang.Object
- Torello.HTML.Links

public class Links
extends java.lang.Object

Utilities for de-referencing 'partially-completed' URL's in a Web-Page Vector.

This utility class helps 'complete' URLs that have been scraped from web-pages and are 'relative' (partially completed). Relative URLs are common in HTML, because a page author does not need to spell out the entire directory and web-server DNS name in order to reference an image file or link that resides in the same directory as the web-page that contains it.

CONTENT-NOTE:
These scrape-package classes were initially developed for scraping news-content from the Chinese Government Web-Portal, and redirecting over-seas news-content to a simple translation service for people interested in reading about news from over-seas. This is particularly interesting for a country such as China, where a huge percentage of our economic GDP is based on products exported from factories in the Southern Region there to our strip-malls here in Dallas (and other places). Perhaps these URL examples may not seem relevant to a typical Internet-Programmer who is not presently studying languages, but they are staying here anyway.

Specifically: In addition to Java - Chinese, Spanish, German etc... are also interesting languages to study.

EXCEPTION SUPPRESSION:
Precisely half of these methods are designed to "sweep" an entire page of HTML. The methods that expect a Vector of anchors, images, or other links, and iterate over the entire HTML-Vector or page, will catch every exception of type MalformedURLException, and place null in the return-Vector position for that particular URL.

The value of this is, of course, that every link that can be resolved will be resolved, since a single bad link does not abort the sweep. Checking the return-Vector for null-values is necessary whenever it is important to know which pages contain broken links or image-sources. However, each method whose name ends with the suffix '_KE' ("Keep Exception") shall return a Vector that includes any thrown exception in the Java-HTML Tuple-Class Ret2<URL, MalformedURLException>.

This concept may seem 'unique,' but once this process is familiar - the value of not being forced to write try-catch blocks for every web-page URL-resolution-stage in your programs will hopefully become obvious.
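
The difference between the two families of methods can be sketched with nothing but the standard library. The class below is not part of this package: `KeepExceptionSketch`, `UrlOrError` (a hypothetical stand-in for Ret2<URL, MalformedURLException>) and the helper `parse` are assumptions made for illustration, and `java.net.URL` is used directly rather than the resolve logic of class Links.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class KeepExceptionSketch
{
    // Hypothetical stand-in for the library's Ret2<URL, MalformedURLException>
    static final class UrlOrError
    {
        final URL url;                      // null when resolution failed
        final MalformedURLException error;  // null when resolution succeeded

        UrlOrError(URL url, MalformedURLException error)
        { this.url = url; this.error = error; }
    }

    // Helper (an assumption of this sketch): builds a URL without a checked exception
    static URL parse(String s)
    {
        try { return new URL(s); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    // Suppressing variant: a link that cannot be resolved leaves null in its position
    static List<URL> resolveAll(List<String> links, URL sourcePage)
    {
        List<URL> ret = new ArrayList<>();

        for (String link : links)
        {
            try { ret.add(new URL(sourcePage, link)); }
            catch (MalformedURLException e) { ret.add(null); }
        }

        return ret;
    }

    // "Keep Exception" variant: the exception is stored next to the (null) URL
    static List<UrlOrError> resolveAllKE(List<String> links, URL sourcePage)
    {
        List<UrlOrError> ret = new ArrayList<>();

        for (String link : links)
        {
            try { ret.add(new UrlOrError(new URL(sourcePage, link), null)); }
            catch (MalformedURLException e) { ret.add(new UrlOrError(null, e)); }
        }

        return ret;
    }
}
```

In both variants the returned list has exactly one entry per input link, so positions in the result line up with positions in the input.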

EXAMPLES TABLE:
The following table attempts to explain the rules for evaluating relative / partial URL's, such as an HTML '<A ...>' (Anchor-Tag) 'HREF=...' URL, or an <IMG SRC="..."> URL. The column on the left portrays the type of TagNode-input containing a URL - which could be a partial URL - while the column on the right hopefully demystifies how such a URL would be "decoded" (de-referenced) from a partial to a complete Uniform Resource Locator.

HTML TagNode	sourceURL: `http://english.gov.CN/article/01-01-2018/index.html`
`<IMG SRC="http://english.gov.CN/article/01-01-2018/image12345.bmp">`	http://english.gov.CN/article/01-01-2018/image12345.bmp
`<IMG SRC="/article/01-01-2018/image12345.bmp">`	http://english.gov.CN/article/01-01-2018/image12345.bmp
`<IMG SRC="image12345.bmp">`	http://english.gov.CN/article/01-01-2018/image12345.bmp
`<IMG SRC="//some.other.url/a.bmp">`	http://some.other.url/a.bmp
`<A HREF="#sub-section">`	`null`
`<IMG SRC="../../pic2.bmp">`	http://english.gov.CN/pic2.bmp
`<A HREF="tel: (212) 555-6789">`	`null`
HTML TagNode	sourceURL: `http://english.gov.CN/article/12-31-2018/index.html`
`<IMG SRC="http://english.gov.CN/article/01-01-2018/image12345.png">`	http://english.gov.CN/article/01-01-2018/image12345.png
`<IMG SRC="/article/01-01-2018/image12345.png">`	http://english.gov.CN/article/01-01-2018/image12345.png
`<IMG SRC="image12345.png">`	http://english.gov.CN/article/12-31-2018/image12345.png
`<IMG SRC="//some.other.url/a.bmp">`	http://some.other.url/a.bmp
`<A HREF="#sub-section">`	`null`
`<IMG SRC="../pic3.bmp">`	http://english.gov.CN/article/pic3.bmp
`<A HREF="mailto: [email protected]">`	`null`
HTML TagNode	sourceURL: `http://SpanishNewsBoard.com/article/10-12-2018/index.html`
`<IMG SRC="http://english.gov.CN/article/01-01-2018/image12345.jpg">`	http://english.gov.CN/article/01-01-2018/image12345.jpg
`<IMG SRC="/article/01-01-2018/image12345.jpg">`	http://SpanishNewsBoard.com/article/01-01-2018/image12345.jpg
`<IMG SRC="image12345.jpg">`	http://SpanishNewsBoard.com/article/10-12-2018/image12345.jpg
`<IMG SRC="//some.other.url/a.bmp">`	http://some.other.url/a.bmp
`<A HREF="#sub-section">`	`null`
`<IMG SRC="../../../pic3.bmp">`	`null`
`<A HREF="javascript: alert('hello world');">`	`null`
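
Several rows of the table can be reproduced with the standard library alone. The sketch below uses `java.net.URI.resolve` rather than the methods of this class, so it only illustrates the general resolution rules; the class name `RelativeUrlSketch` and the host `example.com` are assumptions. Rows that this class maps to null (fragments, `tel:`, and so on) are deliberately left out, because `java.net.URI` treats them differently.

```java
import java.net.URI;

class RelativeUrlSketch
{
    // Resolves 'link' against the URL of the page on which the link was found
    static String resolve(String sourcePage, String link)
    { return URI.create(sourcePage).resolve(link).toString(); }

    public static void main(String[] args)
    {
        String page = "http://example.com/article/01-01-2018/index.html";

        String[] links =
        {
            "http://example.com/article/01-01-2018/image12345.bmp",  // already absolute
            "/article/01-01-2018/image12345.bmp",                    // relative to the host
            "image12345.bmp",                                        // relative to the directory
            "//some.other.url/a.bmp",                                // inherits only the protocol
            "../../pic2.bmp"                                         // climbs two directories
        };

        for (String link : links)
            System.out.println(link + "  ->  " + resolve(page, link));
    }
}
```

The first three links all resolve to the same image URL, which mirrors the first three rows of each table group above.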

The following example will find all HTML <A HREF="..."> (anchor-tags), and replace each HREF value it finds with an absolute URL.

Example:

// This fixes the body of a "web-page news-article" (or any web-site html, so to speak)
// It assures that (after scraping) any original Anchor URL's which contained "relative links"
// become "absolute links" - by completing the URL.

// The original web-site url
URL webSiteURL = new URL("https://some-web-site.com/News/Article-Numero-Uno.html");

// Here the HTML page is downloaded to a simple Java Vector. 
Vector<HTMLNode> page = HTMLPage.getPageTokens(webSiteURL, false);

// Any URL's which do not contain complete URI's - inclusive of a domain-name, directory,
// and file-name will be completed and inserted back into the page.

Links.resolveAllHREF(page, webSiteURL, SD.SingleQuotes, false);

COMMON SPECIAL CASES:
The following special cases are commonly found HREF-Attribute values that are not intended to point to HTML pages. Each of these values for an HTML Anchor-Tag HREF-Attribute will cause this class to return null and/or return an exception:

<A HREF="tel:<a-telephone-number>" ... >
<A HREF="javascript:<some-script-calls>" ... >
<A HREF="mailto:<an-email-address>" ... >
<A HREF="file:<file-for-download>" ... >
<A HREF="ftp:<ftp-file-transfer-protocol-address>" ... >
<A HREF="magnet:<bit-torrent-address>" ...>
<A HREF="data:<base64-encoded-image>" ... >
<A HREF="blob:<Binary-Large-Object>" ... >
<A HREF="#<this-page-subsection>" ... >

Any call to resolve an HTML Anchor element whose URL link begins with one of the above special-cases will return null. If the "Keep Exception" (_KE) version is requested, a Torello.Java.Ret2<URL, HREFException> will be returned instead, where the value of ret2.a is null, and the value of ret2.b is an instance of HREFException.
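
A prefix test of this kind might look like the sketch below. It is not the code of this class: the names `NonUrlHrefSketch`, `PREFIXES` and `isNonUrlHref` are hypothetical, and trimming the value and ignoring letter-case are assumptions of the sketch, not documented behavior.

```java
import java.util.Locale;

class NonUrlHrefSketch
{
    // The special-case prefixes listed above
    static final String[] PREFIXES =
    { "tel:", "javascript:", "mailto:", "file:", "ftp:", "magnet:", "data:", "blob:", "#" };

    // Returns true when 'href' begins with one of the special-case prefixes
    static boolean isNonUrlHref(String href)
    {
        // Assumption: surrounding white-space and letter-case are not significant
        String s = href.trim().toLowerCase(Locale.ROOT);

        for (String prefix : PREFIXES)
            if (s.startsWith(prefix)) return true;

        return false;
    }
}
```

A caller could run such a check before attempting resolution, and skip the link (or record an error for it) when the check returns true.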

See Also: ReplaceNodes, ReplaceFunction, HTMLPage, InnerTagFind, Ret2

Hi-Lited Source-Code:

Torello/HTML/Links.java

File Size: 61,318 Bytes Line Count: 1,451 '\n' Characters Found

Stateless Class:

This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.

1 Constructor(s), 1 declared private, zero-argument constructor
26 Method(s), 26 declared static
1 Field(s), 1 declared static, 1 declared final

Field Summary

Fields
Modifier and Type Field

protected static String[] _NON_URL_HREFS

Method Summary

Resolve URL's

Modifier and Type	Method
`static URL`	`resolve(String src, URL sourcePage)`
`static Vector<URL>`	`resolve(Vector<String> src, URL sourcePage)`

Resolve URL's, but Suppress Exceptions, and Keep Them
Modifier and Type	Method
`static Ret2<URL, MalformedURLException>`	`resolve_KE(String src, URL sourcePage)`
`static Vector<Ret2<URL, MalformedURLException>>`	`resolve_KE(Vector<String> src, URL sourcePage)`

Resolve HREF-Attribute URL's
Modifier and Type	Method
`static URL`	`resolveHREF(TagNode tnWithHREF, URL sourcePage)`
`static TagNode`	`resolveHREFAndUpdate(TagNode tnWithHREF, URL sourcePage)`
`static Vector<URL>`	`resolveHREFs(Iterable<TagNode> tnListWithHREF, URL sourcePage)`
`static Vector<URL>`	`resolveHREFs(Vector<? extends HTMLNode> html, int[] nodePosArr, URL sourcePage)`

Resolve SRC-Attribute URL's
Modifier and Type	Method
`static URL`	`resolveSRC(TagNode tnWithSRC, URL sourcePage)`
`static TagNode`	`resolveSRCAndUpdate(TagNode tnWithSRC, URL sourcePage)`
`static Vector<URL>`	`resolveSRCs(Iterable<TagNode> tnListWithSRC, URL sourcePage)`
`static Vector<URL>`	`resolveSRCs(Vector<? extends HTMLNode> html, int[] nodePosArr, URL sourcePage)`

Resolve all HREF URL's on an HTML-Page, and Update the Page-Vector
Modifier and Type	Method
`static Ret3<int[],int[],int[]>`	`resolveAllHREF(Vector<? super TagNode> html, int sPos, int ePos, URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)`
`static Ret3<int[],int[],int[]>`	`resolveAllHREF(Vector<? super TagNode> html, URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)`
`static Ret3<int[],int[],int[]>`	`resolveAllHREF(Vector<? super TagNode> html, DotPair dp, URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)`

Resolve all SRC URL's on an HTML-Page, and Update the Page-Vector
Modifier and Type	Method
`static Ret3<int[],int[],int[]>`	`resolveAllSRC(Vector<? super TagNode> html, int sPos, int ePos, URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)`
`static Ret3<int[],int[],int[]>`	`resolveAllSRC(Vector<? super TagNode> html, URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)`
`static Ret3<int[],int[],int[]>`	`resolveAllSRC(Vector<? super TagNode> html, DotPair dp, URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)`

Resolve HREF URL's, but Suppress Exceptions, and Keep Them
Modifier and Type	Method
`static Ret2<URL, MalformedURLException>`	`resolveHREF_KE(TagNode tnWithHREF, URL sourcePage)`
`static Vector<Ret2<URL, MalformedURLException>>`	`resolveHREFs_KE(Iterable<TagNode> tnListWithHREF, URL sourcePage)`
`static Vector<Ret2<URL, MalformedURLException>>`	`resolveHREFs_KE(Vector<? extends HTMLNode> html, int[] nodePosArr, URL sourcePage)`

Resolve SRC URL's, but Suppress Exceptions, and Keep Them
Modifier and Type	Method
`static Ret2<URL, MalformedURLException>`	`resolveSRC_KE(TagNode tnWithSRC, URL sourcePage)`
`static Vector<Ret2<URL, MalformedURLException>>`	`resolveSRCs_KE(Iterable<TagNode> tnListWithSRC, URL sourcePage)`
`static Vector<Ret2<URL, MalformedURLException>>`	`resolveSRCs_KE(Vector<? extends HTMLNode> html, int[] nodePosArr, URL sourcePage)`

More Methods
Modifier and Type	Method
`static URL`	`getBaseURL(Vector<? extends HTMLNode> page)`
`static String[]`	`NON_URL_HREFS()`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - _NON_URL_HREFS
    
    protected static final java.lang.String[] _NON_URL_HREFS
    
    List of documented "starter-strings" that are sometimes used in Anchor URL 'HREF=...' attributes.
    
    See Also:
    
    NON_URL_HREFS()
    
    Code:
    
    Exact Field Declaration Expression:
    
    protected static final String[] _NON_URL_HREFS = { "tel:", "magnet:", "javascript:", "mailto:", "ftp:", "file:", "data:", "blob:", "#" };
- Method Detail
  - NON_URL_HREFS
    
    public static java.lang.String[] NON_URL_HREFS()
    
    This small method returns the complete list of commonly found Anchor 'HREF' String's that do not actually constitute an HTML 'URL'. It returns a "clone" of an internally stored String[] Array, which ensures that the internal list of non-URL HTML Anchor-Tag 'HREF' prefixes cannot be changed by callers.
    
    Returns:
    
    A clone of the String-array '_NON_URL_HREFS'
    
    See Also:
    
    _NON_URL_HREFS
    
    Code:
    
    Exact Method Body:
    
    return _NON_URL_HREFS.clone();
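
The effect of returning a clone can be seen in a small stand-alone sketch. The class `DefensiveCopySketch` and its shortened prefix list are hypothetical and exist only to show the defensive-copy idea.

```java
class DefensiveCopySketch
{
    // Internal array that callers must never be able to modify
    private static final String[] PREFIXES = { "tel:", "mailto:", "#" };

    // Every call hands out a fresh copy of the internal array
    static String[] prefixes()
    { return PREFIXES.clone(); }

    public static void main(String[] args)
    {
        String[] copy = prefixes();
        copy[0] = "changed";                // modifies only the caller's copy

        System.out.println(prefixes()[0]);  // prints "tel:"
    }
}
```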
  - getBaseURL
    
    public static java.net.URL getBaseURL (java.util.Vector<? extends HTMLNode> page) throws MalformedHTMLException, java.net.MalformedURLException
    
    The methods in this class will not automatically extract any HTML <BASE HREF=URL> definitions that are found on this page. If the user wishes to dereference partial / relative URL definitions that exist on the input page, all the while respecting any <BASE HREF=URL> definitions found on the input page, then this method should be utilized.
    
    Parameters:
    
    page - This may be any HTML page or partial page. If this page has a valid HTML <BASE HREF=URL>, it will be extracted and returned as an instance of class URL.
    
    Returns:
    
    This shall return the HTML <BASE HREF="http://..."> element found available within the input-page parameter 'page'. If the page provided does not contain a BASE URL definition, then null shall be returned.
    
    NOTE: The HTML Specification clearly states that only one URL may be defined using the HTML Element <BASE>. Clearly, due to the browser wars, unspecified / non-deterministic behavior is possible if multiple definitions are provided. For the purposes of this class, if such a situation arises, an exception is thrown.
    
    Throws:
    
    MalformedHTMLException - If the HTML page provided contains multiple definitions of the element <BASE HREF=URL>, then this exception is thrown.
    
    java.net.MalformedURLException - If a <BASE HREF=URL> is found within the input page, but its URL is invalid, then this exception is thrown.
    
    See Also:
    
    TagNodeFind, Attributes.retrieve(Vector, int[], String)
    
    Code:
    
    Exact Method Body:
    
    int[] posArr = TagNodeFind.all(page, TC.OpeningTags, "base");

    if (posArr.length == 0) return null;

    // NOTE: The cast is all right because 'posArr' only points to TagNode's
    // Attributes expects to avoid processing Vector<TextNode>, and Vector<CommentNode>
    // Above, there will be nothing in the 'posArr' if either of those was passed.
    @SuppressWarnings("unchecked")
    String[] urls = Attributes.retrieve((Vector<HTMLNode>) page, posArr, "href");

    boolean found = false;
    String  ret   = null;

    for (String url : urls)
        if ((url != null) && (url.length() > 0))
            if (found)
                throw new MalformedHTMLException(
                    "The page you have provided has multiple <BASE HREF=URL> definitions. " +
                    "However, the HTML Specifications state that pages may provide just one " +
                    "definition. If you wish to proceed, retrieve the definitions manually " +
                    "using class TagNodeFind.all and Attributes.retrieve, as explained in " +
                    "the JavaDoc pages for this class."
                );
            else
            {
                found = true;
                ret = url;
            }

    return new URL(ret);
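
The "at most one <BASE HREF=URL>" rule described above can be sketched independently of the HTML classes. The version below works on a plain list of HREF strings and follows the documented contract (null when no definition exists). `SingleBaseSketch` and `singleBaseHref` are hypothetical names, and `IllegalStateException` merely stands in for MalformedHTMLException.

```java
import java.util.List;

class SingleBaseSketch
{
    // Returns the one non-empty HREF value, or null when there is none.
    // Throws when more than one non-empty value is present.
    static String singleBaseHref(List<String> baseHrefs)
    {
        String ret = null;

        for (String href : baseHrefs)
        {
            // <BASE> elements without a usable HREF value are ignored
            if (href == null || href.isEmpty()) continue;

            // A second definition makes the page ambiguous
            if (ret != null) throw new IllegalStateException
                ("The page has multiple <BASE HREF=URL> definitions.");

            ret = href;
        }

        return ret;
    }
}
```

A caller would typically pass the returned value as the source-page URL when it is non-null, and fall back to the URL the page was downloaded from otherwise.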
  - resolveAllSRC
    
    public static Ret3<int[],int[],int[]> resolveAllSRC (java.util.Vector<? super TagNode> html, java.net.URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)
    
    Convenience Method
    Invokes: resolveAllSRC(Vector, int, int, URL, SD, boolean)
  - resolveAllSRC
    
    public static Ret3<int[],int[],int[]> resolveAllSRC (java.util.Vector<? super TagNode> html, DotPair dp, java.net.URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)
    
    Convenience Method
    Accepts: DotPair.
    Invokes: resolveAllSRC(Vector, int, int, URL, SD, boolean)
  - resolveAllSRC
    
    public static Ret3<int[],int[],int[]> resolveAllSRC (java.util.Vector<? super TagNode> html, int sPos, int ePos, java.net.URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)
    
    This method shall resolve all partial URL addresses that are found within TagNode elements having 'SRC=...' attributes. Each instance of TagNode found in the input HTML Vector that has an 'SRC' attribute - if the 'URL' is only partially resolved - shall be updated and replaced with a new TagNode with a fully resolved URL.
    
    HTML's <BASE HREF=...>
    Users of the methods in this class which accept a complete (or partial) HTML Vector (using a parameter such as Vector<HTMLNode>) must take care to check whether the page provided has a definition for the HTML Element <BASE HREF=URL>.
    
    If the input page has such a definition, none of the methods in this class will actually heed it (at all), and therefore the user must manually invoke the method getBaseURL(Vector) in order to retrieve that URL, and then pass that result to input-parameter sourcePage.
    
    More recently, HTML-Pages are making less use of <BASE> HTML-Tag.
    
    Parameters:
    
    html - This may be any Vectorized-HTML Web-Page (or sub-page).
    
    The Variable-Type Wild-Card Expression '? super TagNode' means that a Vector<TagNode> or a Vector<HTMLNode> are both accepted by this parameter. They will not cause an exception throw.
    
    Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
    
    sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.
    
    This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.
    
    NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
    
    ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.
    
    This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.
    
    NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
    
    ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
    
    sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the HTML-Vector will be resolved.
    
    quote - A choice for the quotes to use. In most cases, URL attribute values do not contain quotation-marks. So likely either choice would work just fine, without exceptions.
    
    NOTE: null may be passed to this parameter, and if it is, the original quotation marks found in the TagNode's 'SRC' attribute will be reused. Passing null to this parameter should almost always be the easiest and safest choice.
    
    askForReturnArraysOrReturnNull - This (long-named) parameter is merely here to facilitate retrieving more information from this method - if necessary. When this parameter receives the following values:
    
    TRUE: Three integer int[] arrays will be returned as listed in the Returns: section of this method's documentation.
    
    FALSE: This method shall return null.
    
    Returns:
    
    If input parameter 'askForReturnArraysOrReturnNull' has been passed FALSE, this method shall return null. Otherwise, (if passed TRUE), then this method shall return an instance of 'Ret3<int[], int[], int[]>' - which is returning three separate integer-arrays about what was found, and what has occurred.
    
    Three arrays are returned as a result of this method's invocation. Keep in mind that though the information might be superfluous, discarding these arrays is easy. They are provided as a matter of convenience for cases where more detailed information is mandatory for ensuring that long lists of HTMLNode's were properly updated.
    
    Ret3.a (int[])
    
    The first int[] array shall contain a list of the index of every TagNode in the input-Vector parameter's range that contained a non-null HTML 'SRC' Attribute.
    
    Ret3.b (int[])
    
    The second int[] array will contain an index-list of the indices which contained TagNode's that were replaced by the internal-resolve logic.
    
    Ret3.c (int[])
    
    The third int[] array will contain an index-list of the indices which contained TagNode's whose 'SRC=...' attribute failed to be resolved by the internal-resolve logic, or caused a QuotesException to throw.
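
One use of the three arrays is to find the nodes that were found but left alone. Assuming, as the descriptions above suggest, that every index in Ret3.a which appears in neither Ret3.b nor Ret3.c belonged to a URL that was already complete, that set can be computed as in the sketch below. `Ret3ArraysSketch` and `untouched` are hypothetical names, and plain int[] arrays are used in place of a Ret3 instance.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class Ret3ArraysSketch
{
    // found    ~ Ret3.a : indices of TagNode's having the attribute
    // replaced ~ Ret3.b : indices whose TagNode was replaced
    // failed   ~ Ret3.c : indices whose URL could not be resolved
    static int[] untouched(int[] found, int[] replaced, int[] failed)
    {
        // Every index that was either replaced or reported as a failure
        Set<Integer> handled = IntStream
            .concat(Arrays.stream(replaced), Arrays.stream(failed))
            .boxed()
            .collect(Collectors.toSet());

        // What remains was found, but neither replaced nor reported as failed
        return Arrays.stream(found).filter(i -> !handled.contains(i)).toArray();
    }
}
```

For example, with found = {2, 5, 9, 14}, replaced = {5} and failed = {14}, the method returns {2, 9}.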
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
    
    If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
    
    If 'ePos' is zero, or greater than the size of the Vector
    
    If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
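
The range rules listed above can be written out as a small stand-alone check. This is only a sketch of the documented rules, not the validation code used by this package; `RangeCheckSketch` and `effectiveEnd` are hypothetical names.

```java
class RangeCheckSketch
{
    // Validates (sPos, ePos) against a Vector of the given size, and returns the
    // effective, exclusive, end position
    static int effectiveEnd(int size, int sPos, int ePos)
    {
        // A negative 'ePos' is first reset to the size of the Vector
        if (ePos < 0) ePos = size;

        if (sPos < 0 || sPos >= size) throw new IndexOutOfBoundsException
            ("sPos [" + sPos + "] is negative, or not less than the size [" + size + "]");

        if (ePos == 0 || ePos > size) throw new IndexOutOfBoundsException
            ("ePos [" + ePos + "] is zero, or greater than the size [" + size + "]");

        if (sPos > ePos) throw new IndexOutOfBoundsException
            ("sPos [" + sPos + "] is larger than ePos [" + ePos + "]");

        return ePos;
    }
}
```

With a Vector of size 10, the pair (0, -1) covers the whole Vector and yields an effective end of 10, while (10, -1) and (0, 11) are both rejected.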
    
    See Also:
    
    resolve(String, URL), TagNode.AV(String), TagNode.setAV(String, String, SD)
    
    Code:
    
    Exact Method Body:
    
    // Retrieve the Vector-location of any TagNode on the page that has
    // a "SRC=..." attribute. These are almost always HTML <IMG> elements.
    // NOTE: FIND Method's are "READ ONLY" - the Cast will make no difference at run-time.
    // The @SuppressWarnings is to overcome the cast of 'html'
    @SuppressWarnings("unchecked")
    int[] hasSrcPosArr = InnerTagFind.all((Vector<HTMLNode>) html, sPos, ePos, "src");

    // Java Stream's are convenient for keeping "Growing Lists" of return values.
    // This builder shall keep a list of all URL's that failed to update - for any reason
    // **UNLESS** the reason is that the URL was already a fully-resolved, non-partial URL
    IntStream.Builder failedUpdate = askForReturnArraysOrReturnNull ? IntStream.builder() : null;

    // This stream will keep a list of all URL's that were updated, and whose TagNode's
    // were replaced inside the input HTML Vector
    IntStream.Builder replaced = askForReturnArraysOrReturnNull ? IntStream.builder() : null;

    for (int pos : hasSrcPosArr)
    {
        // Get the node at the index
        TagNode tn = (TagNode) html.elementAt(pos);

        // 1) Retrieve the SRC Attribute
        // 2) if it is a partial-URL resolve it
        // 3) Convert to a String
        String oldURL = tn.AV("src");
        URL newURL = resolve(oldURL, sourcePage);

        // Some URL's cannot be resolved, if so, just skip this TagNode.
        // Log the index to the stream (if requested), and continue.
        if (newURL == null)
        {
            if (askForReturnArraysOrReturnNull) failedUpdate.accept(pos);
            continue;
        }

        // If the URL was already a fully-resolved-URL, continue - don't replace the TagNode;
        // No logging needed here, the URL was *already* resolved...
        if (oldURL.length() == newURL.toString().length()) continue;

        // Replace the SRC Attribute in the TagNode. This builds a new instance of TagNode
        // If there is an exception, log the index to the stream (if requested), and continue.
        try
            { tn = tn.setAV("src", newURL.toString(), quote); }
        catch (QuotesException qex)
        {
            if (askForReturnArraysOrReturnNull) failedUpdate.accept(pos);
            continue;
        }

        // Replace the index in the Vector containing the old TagNode with the new one.
        html.setElementAt(tn, pos);

        // The Vector-Index at this position had its old TagNode removed and replaced with a
        // new updated one. Log this to the stream-list so to allow the user to know.
        if (askForReturnArraysOrReturnNull) replaced.accept(pos);
    }

    return askForReturnArraysOrReturnNull
        ? new Ret3<int[], int[], int[]>
            (hasSrcPosArr, replaced.build().toArray(), failedUpdate.build().toArray())
        : null;
  - resolveAllHREF
    
    public static Ret3<int[],int[],int[]> resolveAllHREF (java.util.Vector<? super TagNode> html, java.net.URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)
    
    Convenience Method
    Invokes: resolveAllHREF(Vector, int, int, URL, SD, boolean)
  - resolveAllHREF
    
    public static Ret3<int[],int[],int[]> resolveAllHREF (java.util.Vector<? super TagNode> html, DotPair dp, java.net.URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)
    
    Convenience Method
    Accepts: DotPair.
    Invokes: resolveAllHREF(Vector, int, int, URL, SD, boolean)
  - resolveAllHREF
    
    public static Ret3<int[],int[],int[]> resolveAllHREF (java.util.Vector<? super TagNode> html, int sPos, int ePos, java.net.URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)
    
    This method shall resolve all partial URL addresses that are found within TagNode elements having 'HREF=...' attributes. Each instance of TagNode found in the input HTML Vector that has an 'HREF' attribute - if the 'URL' is only partially resolved - shall be updated and replaced with a new TagNode with a fully resolved URL.
    
    HTML's <BASE HREF=...>
    Users of the methods in this class which accept a complete (or partial) HTML Vector (using a parameter such as Vector<HTMLNode>) must take care to check whether the page provided has a definition for the HTML Element <BASE HREF=URL>.
    
    If the input page has such a definition, none of the methods in this class will actually heed it (at all), and therefore the user must manually invoke the method getBaseURL(Vector) in order to retrieve that URL, and then pass that result to input-parameter sourcePage.
    
    More recently, HTML-Pages are making less use of <BASE> HTML-Tag.
    
    Parameters:
    
    html - This may be any Vectorized-HTML Web-Page (or sub-page).
    
    The Variable-Type Wild-Card Expression '? super TagNode' means that a Vector<TagNode> or a Vector<HTMLNode> are both accepted by this parameter. They will not cause an exception throw.
    
    Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
    
    sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.
    
    This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.
    
    NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
    
    ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.
    
    This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.
    
    NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.
    
    ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
    
    sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the HTML-Vector will be resolved.
    
    quote - A choice for the quotes to use. In most cases, URL attribute values do not contain quotation-marks. So likely either choice would work just fine, without exceptions.
    
    NOTE: null may be passed to this parameter, and if it is, the original quotation marks found in the TagNode's 'HREF' attribute will be reused. Passing null to this parameter should almost always be the easiest and safest choice.
    
    askForReturnArraysOrReturnNull - This (long-named) parameter is merely here to facilitate retrieving more information from this method - if necessary. When this parameter receives the following values:
    
    TRUE: Three integer int[] arrays will be returned as listed in the Returns: section of this method's documentation.
    
    FALSE: This method shall return null.
    
    Returns:
    
    If input parameter 'askForReturnArraysOrReturnNull' has been passed FALSE, this method shall return null. Otherwise, (if passed TRUE), then this method shall return an instance of 'Ret3<int[], int[], int[]>' - which is returning three separate integer-arrays about what was found, and what has occurred.
    
    Three arrays are returned as a result of this method's invocation. Keep in mind that though the information might be superfluous, discarding these arrays is easy. They are provided as a matter of convenience for cases where more detailed information is mandatory for ensuring that long lists of HTMLNode's were properly updated.
    
    Ret3.a (int[])
    
    The first int[] array shall contain a list of the index of every TagNode in the input-Vector parameter's range that contained a non-null HTML 'HREF' Attribute.
    
    Ret3.b (int[])
    
    The second int[] array will contain an index-list of the indices which contained TagNode's that were replaced by the internal-resolve logic.
    
    Ret3.c (int[])
    
    The third int[] array will contain an index-list of the indices which contained TagNode's whose 'HREF=...' attribute failed to be resolved by the internal-resolve logic, or caused a QuotesException to throw.
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:
    
    If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
    
    If 'ePos' is zero, or greater than the size of the Vector
    
    If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
    
    See Also:
    
    resolve(String, URL), TagNode.AV(String), TagNode.setAV(String, String, SD)
    
    Code:
    
    Exact Method Body:
    
    // Retrieve the Vector-location of any TagNode on the page that has
    // a "HREF=..." attribute. These are almost always HTML <A> elements.
    // NOTE: FIND Method's are "READ ONLY" - the Cast will make no difference at run-time.
    // The @SuppressWarnings is to overcome the cast of 'html'
    @SuppressWarnings("unchecked")
    int[] hasHRefPosArr = InnerTagFind.all((Vector<HTMLNode>) html, sPos, ePos, "href");

    // Java Stream's are convenient for keeping "Growing Lists" of return values.
    // This builder shall keep a list of all URL's that failed to update - for any reason
    // **UNLESS** the reason is that the URL was already a fully-resolved, non-partial URL
    IntStream.Builder failedUpdate = askForReturnArraysOrReturnNull ? IntStream.builder() : null;

    // This stream will keep a list of all URL's that were updated, and whose TagNode's
    // were replaced inside the input HTML Vector
    IntStream.Builder replaced = askForReturnArraysOrReturnNull ? IntStream.builder() : null;

    for (int pos : hasHRefPosArr)
    {
        // Get the node at the index
        TagNode tn = (TagNode) html.elementAt(pos);

        // 1) Retrieve the HREF Attribute
        // 2) if it is a partial-URL resolve it
        // 3) Convert to a String
        String oldURL = tn.AV("HREF");
        URL newURL = resolve(oldURL, sourcePage);

        // Some URL's cannot be resolved, if so, just skip this TagNode.
        // Log the index to the stream (if requested), and continue.
        if (newURL == null)
        {
            if (askForReturnArraysOrReturnNull) failedUpdate.accept(pos);
            continue;
        }

        // If the URL was already a fully-resolved-URL, continue - don't replace the TagNode;
        // No logging needed here, the URL was *already* resolved...
        if (oldURL.length() == newURL.toString().length()) continue;

        // Replace the HREF Attribute in the TagNode. This builds a new instance of TagNode
        // If there is an exception, log the index to the stream (if requested), and continue.
        try
            { tn = tn.setAV("href", newURL.toString(), quote); }
        catch (QuotesException qex)
        {
            if (askForReturnArraysOrReturnNull) failedUpdate.accept(pos);
            continue;
        }

        // Replace the index in the Vector containing the old TagNode with the new one.
        html.setElementAt(tn, pos);

        // The Vector-Index at this position had its old TagNode removed and replaced with a
        // new updated one. Log this to the stream-list so to allow the user to know.
        if (askForReturnArraysOrReturnNull) replaced.accept(pos);
    }

    return askForReturnArraysOrReturnNull
        ? new Ret3<int[], int[], int[]>
            (hasHRefPosArr, replaced.build().toArray(), failedUpdate.build().toArray())
        : null;
  - resolveHREFAndUpdate
    
    public static TagNode resolveHREFAndUpdate(TagNode tnWithHREF, java.net.URL sourcePage)
    
    Convenience Method
    Invokes: resolveHREF(TagNode, URL).
    And-Then: TagNode.setAV(String, String, SD)
  - resolveHREF
    
    public static java.net.URL resolveHREF(TagNode tnWithHREF, java.net.URL sourcePage)
    
    This should be used for TagNode's that contain an 'HREF' inner-tag (attribute).
    
    Parameters:
    
    tnWithHREF - This may be any HTML Element that contains an 'HREF' attribute.
    
    NOTE: An HTML 'anchor' element (<A HREF=...>) will contain these. Often the URL's found here contain "relative" rather than "absolute" addresses.
    
    sourcePage - This is the source page URL from which the TagNode (possibly-relative) URL will be resolved.
    
    Returns:
    
    A complete-URL without any missing "presumed data" - such as host/domain or directory. Null is returned if attempting to build the URL generated a MalformedURLException.
    
    SPECIFICALLY: This method shall catch all MalformedURLException's.
    
    Throws:
    
    HREFException - If the TagNode passed to parameter 'tnWithHREF' does not actually contain an HREF attribute, then this exception shall throw.
    
    See Also:
    
    resolve(String, URL), TagNode.AV(String)
    
    Code:
    
    Exact Method Body:
    
    String href = tnWithHREF.AV("href");

    if (href == null) throw new HREFException(
        "The TagNode passed to parameter tnWithHREF does not actually contain an " +
        "HREF attribute."
    );

    return resolve(href, sourcePage);
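The contract described above has two distinct failure modes: a missing attribute throws, while a URL that cannot be built yields null. The following is a minimal sketch of that contract under stated assumptions. A Map<String, String> stands in for the TagNode's attributes, IllegalArgumentException stands in for HREFException, and the two-argument java.net.URL constructor stands in for the library's own resolve method. The class and method names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Map;

public class HrefContractSketch
{
    // Returns the resolved URL as a String, or null if the URL cannot be built.
    // Throws if the (hypothetical) attribute map has no "href" entry at all.
    public static String resolveHref(Map<String, String> attributes, String sourcePage)
    {
        String href = attributes.get("href");

        // Stand-in for HREFException: the caller passed a node with no HREF
        if (href == null) throw new IllegalArgumentException("no href attribute");

        try
            { return new URL(new URL(sourcePage), href).toString(); }
        catch (MalformedURLException e)
            { return null; }
    }

    public static void main(String[] args)
    {
        String page = "https://example.com/about/index.html";

        System.out.println(resolveHref(Map.of("href", "team.html"), page));
        System.out.println(resolveHref(Map.of("href", "nonsense://x"), page));
    }
}
```

Keeping the two cases separate lets a caller distinguish a programming mistake (wrong node passed) from bad data on the page.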
  - resolveSRCAndUpdate
    
    public static TagNode resolveSRCAndUpdate(TagNode tnWithSRC, java.net.URL sourcePage)
    
    Convenience Method
    Invokes: resolveSRC(TagNode, URL)
    And-Then: TagNode.setAV(String, String, SD)
  - resolveSRC
    
    public static java.net.URL resolveSRC(TagNode tnWithSRC, java.net.URL sourcePage)
    
    This should be used for TagNode's that contain a 'SRC' inner-tag (attribute).
    
    Parameters:
    
    tnWithSRC - This may be any HTML Element that contains a 'SRC' attribute.
    
    NOTE: An HTML 'image' element (<IMG SRC=...>) will contain these. Often the URL's found here contain "relative" rather than "absolute" addresses.
    
    sourcePage - This is the source page URL from which the TagNode (possibly-relative) URL will be resolved.
    
    Returns:
    
    A complete-URL without any missing "presumed data" - such as host/domain or directory. Null is returned if attempting to build the URL generated a MalformedURLException.
    
    SPECIFICALLY: This method shall catch all MalformedURLException's.
    
    Throws:
    
    SRCException - If the TagNode passed to parameter 'tnWithSRC' does not actually contain a SRC attribute, then this exception shall throw.
    
    See Also:
    
    resolve(String, URL), TagNode.AV(String)
    
    Code:
    
    Exact Method Body:
    
    String src = tnWithSRC.AV("src");

    if (src == null) throw new SRCException(
        "The TagNode passed to parameter tnWithSRC does not actually contain a " +
        "SRC attribute."
    );

    return resolve(src, sourcePage);
  - resolveHREFs
    
    public static java.util.Vector<java.net.URL> resolveHREFs (java.lang.Iterable<TagNode> tnListWithHREF, java.net.URL sourcePage)
    
    This should be used for lists of TagNode's, each of which contain an 'HREF' inner-tag (attribute).
    
    Parameters:
    
    tnListWithHREF - This may be any list of HTML Elements, each of which must be instances of class TagNode and all of which must have a 'HREF' attribute.
    
    sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the Iterable will be resolved.
    
    Returns:
    
    A list of URL's, each of which have been completed/resolved with the 'sourcePage' parameter. Any TagNode which generated an exception, will result in a null value in the Vector.
    
    SPECIFICALLY: If any element in tnListWithHREF does not contain an HREF inner-tag, the method does not throw; the corresponding position in the returned Vector holds null instead. The primary reason for returning 'null' rather than throwing an exception is that, when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.
    
    See Also:
    
    resolve(String, URL), TagNode.AV(String)
    
    Code:
    
    Exact Method Body:
    
    Vector<URL> ret = new Vector<>();

    for (TagNode tn : tnListWithHREF)
        ret.addElement(resolve(tn.AV("href"), sourcePage));

    return ret;
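Because a sweep of this kind leaves null in every position that could not be resolved, calling code usually needs a null-check pass over the result. The sketch below illustrates that pattern with the standard library only. The class NullSweepSketch and its methods are hypothetical, and the two-argument java.net.URL constructor is used as a stand-in for the library's resolve method.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Vector;

public class NullSweepSketch
{
    // Resolves every entry against 'sourcePage'. A missing (null) or
    // unresolvable entry leaves null at the same index of the result.
    public static Vector<URL> resolveAll(List<String> hrefs, String sourcePage)
    {
        Vector<URL> ret = new Vector<>();

        for (String href : hrefs)
        {
            URL url = null;

            try
                { if (href != null) url = new URL(new URL(sourcePage), href); }
            catch (MalformedURLException e)
                { /* suppressed on purpose: the position stays null */ }

            ret.addElement(url);
        }

        return ret;
    }

    // Counts the positions that were actually resolved
    public static long countResolved(Vector<URL> urls)
    { return urls.stream().filter(Objects::nonNull).count(); }

    public static void main(String[] args)
    {
        List<String> hrefs = Arrays.asList("a.html", null, "nonsense://x");
        Vector<URL> urls = resolveAll(hrefs, "https://example.com/docs/index.html");

        System.out.println(urls);
        System.out.println("resolved: " + countResolved(urls));
    }
}
```

The result Vector keeps the same length and ordering as the input, so an index in the output still identifies the node it came from.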
  - resolveSRCs
    
    public static java.util.Vector<java.net.URL> resolveSRCs (java.lang.Iterable<TagNode> tnListWithSRC, java.net.URL sourcePage)
    
    This should be used for lists of TagNode's, each of which contain a 'SRC' inner-tag (attribute).
    
    Parameters:
    
    tnListWithSRC - This may be any list of HTML Elements, each of which must be instances of class TagNode and all of which must have a 'SRC' attribute.
    
    sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the Iterable will be resolved.
    
    Returns:
    
    A list of URL's, each of which have been completed/resolved with the 'sourcePage' parameter. Any TagNode which generated an exception, will result in a null value in the Vector.
    
    SPECIFICALLY: If any element in tnListWithSRC does not contain a SRC inner-tag, the method does not throw; the corresponding position in the returned Vector holds null instead. The primary reason for returning 'null' rather than throwing an exception is that, when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.
    
    See Also:
    
    resolve(String, URL), TagNode.AV(String)
    
    Code:
    
    Exact Method Body:
    
    Vector<URL> ret = new Vector<>();

    for (TagNode tn : tnListWithSRC)
        ret.addElement(resolve(tn.AV("src"), sourcePage));

    return ret;
  - resolveHREFs
    
    public static java.util.Vector<java.net.URL> resolveHREFs (java.util.Vector<? extends HTMLNode> html, int[] nodePosArr, java.net.URL sourcePage)
    
    This will use a "pointer array" - an array containing indexes into the downloaded page to retrieve TagNode's. The TagNode's to which this pointer-array points - must each contain an HREF inner-tag with a URL, or a partial URL.
    
    HTML's <BASE HREF=...>
    When calling methods in this class which accept a complete (or partial) HTML Vector (using a parameter such as Vector<HTMLNode>), the user must take care to check whether the page provided has a definition for the HTML Element <BASE HREF=URL>.
    
    If the input page has such a definition, none of the methods in this class will heed it. The user must therefore manually invoke the method getBaseURL(Vector) to retrieve that URL, and then pass the result to input-parameter sourcePage.
    
    More recently, HTML-Pages are making less use of the <BASE> HTML-Tag.
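As a rough illustration of the decision described above, the following sketch picks the URL to pass as sourcePage. It is an assumption-laden stand-in and does not call getBaseURL(Vector): the class BaseHrefSketch, the method effectiveSourcePage, and the choice to fall back to the page's own URL when the BASE value is malformed are all hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class BaseHrefSketch
{
    // 'baseHref' is whatever value was found in a <BASE HREF=...> element,
    // or null when the page defines none.
    public static String effectiveSourcePage(String pageUrl, String baseHref)
    {
        // No <BASE> element: relative links resolve against the page itself
        if (baseHref == null) return pageUrl;

        try
            { return new URL(new URL(pageUrl), baseHref).toString(); }
        catch (MalformedURLException e)
            { return pageUrl; } // assumption: ignore an unusable BASE value
    }

    public static void main(String[] args)
    {
        String page = "https://example.com/a/b.html";

        System.out.println(effectiveSourcePage(page, null));
        System.out.println(effectiveSourcePage(page, "https://static.example.com/assets/"));
    }
}
```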
    
    Parameters:
    
    html - This may be any Vectorized-HTML Web-Page (or sub-page).
    
    The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing a compile-time type error.
    
    These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
    
    nodePosArr - An array of pointers into the page or sub-page. The pointers must reference TagNode's that contain HREF attributes. Integer-pointer Arrays are usually returned from the package 'NodeSearch' "Find" methods.
    
    Example:
    
    // Retrieve 'pointers' to all the '<A HREF=...>' TagNode's. The term 'pointer' refers to
    // integer-indices into the vectorized-html variable 'page'
    int[] anchorPosArr = TagNodeFind.all(page, TC.OpeningTags, "a");

    // Extract each HREF inner-tag, and construct a URL. Use the 'sourcePage' parameter
    // if the URL is only partially-resolved
    Vector<URL> urls = Links.resolveHREFs(page, anchorPosArr, mySourcePage);
    
    This example obtains a pointer-array (a.k.a. a "vector-index-array") to every HTML "<A ...>" element in the HTML page-Vector 'page', and then resolves any shortened URL's.
    
    sourcePage - This is the source page URL from whence the (possibly relative) TagNode URL's in the Vector are to be resolved.
    
    Returns:
    
    A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. Any TagNode whose URL could not be built will result in a null value in the Vector. However, if any position in the 'nodePosArr' parameter does not point to an opening TagNode, then a TagNodeExpectedException or an OpeningTagNodeExpectedException is thrown.
    
    SPECIFICALLY: If any of the TagNode's referenced by nodePosArr does not contain an HREF inner-tag, the method does not throw; the corresponding position in the returned Vector holds null instead. The primary reason for returning 'null' rather than throwing an exception is that, when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.
    
    Throws:
    
    java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'nodePosArr' are index-pointers that are out of range of Vector-parameter 'html', then java will, naturally, throw this exception.
    
    OpeningTagNodeExpectedException - When a Vector position-index holds an instance of TagNode, but that TagNode is one in which its isClosing-Field is set to TRUE, then this exception shall throw.
    
    When passing int[]-Array parameter 'nodePosArr', that array should contain a list of Vector-indices. The code checks that each of the locations in that array points to an Opening TagNode, and when one does not, this exception throws.
    
    TagNodeExpectedException - This exception shall throw if a Vector-index that must point to an instance of TagNode instead holds some other HTMLNode instance (either CommentNode or TextNode). If the integer-position array (int[] nodePosArr) contains an index pointing to something besides a TagNode, then this exception will be thrown.
    
    See Also:
    
    resolve(String, URL), TagNode.AV(String)
    
    Code:
    
    Exact Method Body:
    
    // Return Vector
    Vector<URL> ret = new Vector<>();

    for (int nodePos : nodePosArr)
    {
        HTMLNode n = html.elementAt(nodePos);

        // Must be an HTML TagNode
        if (! n.isTagNode()) throw new TagNodeExpectedException(nodePos);

        TagNode tn = (TagNode) n;

        // Must be an "Opening" HTML TagNode
        if (tn.isClosing) throw new OpeningTagNodeExpectedException(nodePos);

        // Resolve the 'HREF', save the URL
        ret.addElement(resolve(tn.AV("href"), sourcePage));
    }

    return ret;
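The validation order in the body above (node type first, then opening versus closing, then the attribute lookup) can be sketched with a simplified node model. Everything here is hypothetical and for illustration only: the nested class Node is not the library's HTMLNode hierarchy, and IllegalStateException stands in for both TagNodeExpectedException and OpeningTagNodeExpectedException.

```java
import java.util.List;
import java.util.Vector;

public class PointerArraySketch
{
    // Hypothetical, simplified node: a flag for "is a tag", a flag for
    // "is a closing tag", and an optional href value.
    public static final class Node
    {
        public final boolean isTag;
        public final boolean isClosing;
        public final String  href;

        public Node(boolean isTag, boolean isClosing, String href)
        { this.isTag = isTag; this.isClosing = isClosing; this.href = href; }
    }

    public static Vector<String> collectHrefs(List<Node> page, int[] nodePosArr)
    {
        Vector<String> ret = new Vector<>();

        for (int pos : nodePosArr)
        {
            // An out-of-range index fails here with IndexOutOfBoundsException
            Node n = page.get(pos);

            if (! n.isTag)   throw new IllegalStateException("tag expected at " + pos);
            if (n.isClosing) throw new IllegalStateException("opening tag expected at " + pos);

            // A missing href is not an error: null is stored at this position
            ret.addElement(n.href);
        }

        return ret;
    }

    public static void main(String[] args)
    {
        List<Node> page = List.of(
            new Node(true, false, "a.html"),   // opening tag with an href
            new Node(false, false, null),      // text or comment
            new Node(true, true, null),        // closing tag
            new Node(true, false, null)        // opening tag without an href
        );

        System.out.println(collectHrefs(page, new int[] { 0, 3 }));
    }
}
```

A wrong index is treated as a caller error and fails fast, while a missing attribute is treated as page data and only produces a null entry.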
  - resolveSRCs
    
    public static java.util.Vector<java.net.URL> resolveSRCs (java.util.Vector<? extends HTMLNode> html, int[] nodePosArr, java.net.URL sourcePage)
    
    This will use a "pointer array" - an array containing indexes into the downloaded page to retrieve TagNode's. The TagNode's to which this pointer-array points - must each contain a SRC inner-tag with a URL, or a partial URL.
    
    HTML's <BASE HREF=...>
    When calling methods in this class which accept a complete (or partial) HTML Vector (using a parameter such as Vector<HTMLNode>), the user must take care to check whether the page provided has a definition for the HTML Element <BASE HREF=URL>.
    
    If the input page has such a definition, none of the methods in this class will heed it. The user must therefore manually invoke the method getBaseURL(Vector) to retrieve that URL, and then pass the result to input-parameter sourcePage.
    
    More recently, HTML-Pages are making less use of the <BASE> HTML-Tag.
    
    Parameters:
    
    html - This may be any Vectorized-HTML Web-Page (or sub-page).
    
    The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing a compile-time type error.
    
    These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
    
    nodePosArr - An array of pointers into the page or sub-page. The pointers must reference TagNode's that contain SRC attributes. Integer-pointer Arrays are usually returned from the package 'NodeSearch' "Find" methods.
    
    Example:
    
    // Retrieve 'pointers' to all the '<IMG SRC=...>' TagNode's. The term 'pointer' refers to
    // integer-indices into the vectorized-html variable 'page'
    int[] picturePosArr = TagNodeFind.all(page, TC.OpeningTags, "img");

    // Extract each SRC inner-tag, and construct a URL. Use the 'sourcePage' parameter
    // if the URL is only partially-resolved
    Vector<URL> urls = Links.resolveSRCs(page, picturePosArr, mySourcePage);
    
    This example obtains a pointer-array (a.k.a. a "vector-index-array") to every HTML "<IMG ...>" element in the HTML page-Vector 'page', and then resolves any shortened image URL's.
    
    sourcePage - This is the source page URL from whence the (possibly relative) TagNode URL's in the Vector are to be resolved.
    
    Returns:
    
    A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. Any TagNode whose URL could not be built will result in a null value in the Vector. However, if any position in the 'nodePosArr' parameter does not point to an opening TagNode, then a TagNodeExpectedException or an OpeningTagNodeExpectedException is thrown.
    
    SPECIFICALLY: If any of the TagNode's referenced by nodePosArr does not contain a SRC inner-tag, the method does not throw; the corresponding position in the returned Vector holds null instead. The primary reason for returning 'null' rather than throwing an exception is that, when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.
    
    Throws:
    
    java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'nodePosArr' are index-pointers that are out of range of Vector-parameter 'html', then java will, naturally, throw this exception.
    
    OpeningTagNodeExpectedException - When a Vector position-index holds an instance of TagNode, but that TagNode is one in which its isClosing-Field is set to TRUE, then this exception shall throw.
    
    When passing int[]-Array parameter 'nodePosArr', that array should contain a list of Vector-indices. The code checks that each of the locations in that array points to an Opening TagNode, and when one does not, this exception throws.
    
    TagNodeExpectedException - This exception shall throw if a Vector-index that must point to an instance of TagNode instead holds some other HTMLNode instance (either CommentNode or TextNode). If the integer-position array (int[] nodePosArr) contains an index pointing to something besides a TagNode, then this exception will be thrown.
    
    See Also:
    
    resolve(String, URL), TagNode.AV(String)
    
    Code:
    
    Exact Method Body:
    
    // Return Vector
    Vector<URL> ret = new Vector<>();

    for (int nodePos : nodePosArr)
    {
        HTMLNode n = html.elementAt(nodePos);

        // Must be an HTML TagNode
        if (! n.isTagNode()) throw new TagNodeExpectedException(nodePos);

        TagNode tn = (TagNode) n;

        // Must be an "Opening" HTML TagNode
        if (tn.isClosing) throw new OpeningTagNodeExpectedException(nodePos);

        // Resolve the "SRC", save the URL
        ret.addElement(resolve(tn.AV("src"), sourcePage));
    }

    return ret;
  - resolve
    
    public static java.util.Vector<java.net.URL> resolve (java.util.Vector<java.lang.String> src, java.net.URL sourcePage)
    
    This will convert a list of simple java String's to a list/Vector of URL's, de-referencing any missing information using the 'sourcePage' parameter.
    
    Parameters:
    
    src - a list of strings - usually partially or totally completed Internet URL's
    
    sourcePage - This is the source page URL from which the String's (possibly-relative) URL's in the Vector will be resolved.
    
    Returns:
    
    A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. If there were any String's that were zero-length or null, then null is returned in the related Vector position. If any String causes a MalformedURLException, then that position in the Vector will be null.
    
    See Also:
    
    resolve(String, URL)
    
    Code:
    
    Exact Method Body:
    
    Vector<URL> ret = new Vector<>();

    for (String s : src) ret.addElement(resolve(s, sourcePage));

    return ret;
  - resolve
    
    public static java.net.URL resolve(java.lang.String src, java.net.URL sourcePage)
    
    This will convert a simple java String to a URL, de-referencing any missing information using the 'sourcePage' parameter.
    
    Parameters:
    
    src - Any java String, usually one which was scraped from an HTML-Page, and needs to be "completed."
    
    sourcePage - This is the source page URL from which the String (possibly-relative) URL will be resolved.
    
    Returns:
    
    A URL, which has been completed/resolved with the 'sourcePage' parameter. If parameter 'src' is null or zero-length, then this method will also return null. If a MalformedURLException is generated, null will also be returned.
    
    Code:
    
    Exact Method Body:
    
    if (sourcePage == null) throw new NullPointerException(
        "Though you may provide null to the partial-URL to dereference parameter, null " +
        "may not be passed to the Source-Page Parameter. The purpose of the 'resolve' " +
        "operation is to resolve partial-URLs against a source-page (root) URL. " +
        "Therefore this is not allowed."
    );

    if (src == null) return null;

    src = src.trim();

    if (src.length() == 0) return null;

    String srcLC = src.toLowerCase();

    if (StrCmpr.startsWithXOR(srcLC, _NON_URL_HREFS)) return null;

    if (srcLC.startsWith("http://") || srcLC.startsWith("https://"))
        try
            { return new URL(src); }
        catch (MalformedURLException e)
            { return null; }

    if (src.startsWith("//") && (src.charAt(3) != '/'))
        try
            { return new URL(sourcePage.getProtocol().toLowerCase() + ":" + src); }
        catch (MalformedURLException e)
            { return null; }

    if (src.startsWith("/"))
        try
        {
            return new URL(
                sourcePage.getProtocol().toLowerCase() + "://" +
                sourcePage.getHost().toLowerCase() + src
            );
        }
        catch (MalformedURLException e)
            { return null; }

    if (src.startsWith("../"))
    {
        String sourcePageStr = sourcePage.toString();
        short nLevels = 0;

        do
            { nLevels++; src = src.substring(3); }
        while (src.startsWith("../"));

        String directory = StringParse.dotDotParentDirectory(sourcePage.toString(), nLevels);

        try
            { return new URL(directory + src); }
        catch (Exception e)
            { return null; }
    }

    String root =
        sourcePage.getProtocol().toLowerCase() + "://" +
        sourcePage.getHost().toLowerCase();

    String path = sourcePage.getPath().trim();

    int pos = StringParse.findLastFrontSlashPos(path);

    if (pos == -1) throw new StringIndexOutOfBoundsException(
        "The URL you have provided: " + sourcePage.toString() + " does not have a '/' " +
        "front-slash character in it's path. Cannot proceed resolving relative-URL's " +
        "without this."
    );

    path = path.substring(0, pos + 1);

    try
        { return new URL(root + path + src); }
    catch (MalformedURLException e)
        { return null; }
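The body above distinguishes several shapes of input: an absolute URL, a protocol-relative URL ("//host/..."), a root-relative path ("/..."), a parent-directory path ("../...") and a plain same-directory name. To see what each shape is expected to resolve to, the sketch below runs the same kinds of input through the standard library's java.net.URI.resolve. This is a different mechanism from the string concatenation used above and is offered only as a point of comparison. The class name, method name and example URLs are hypothetical.

```java
import java.net.URI;

public class ResolveCasesSketch
{
    // Standard-library reference resolution; NOT the method body shown above
    public static String resolve(String src, String sourcePage)
    { return URI.create(sourcePage).resolve(src).toString(); }

    public static void main(String[] args)
    {
        String page = "https://example.com/news/2024/index.html";

        String[] inputs = {
            "https://other.org/a.png",        // already absolute
            "//cdn.example.com/img/a.png",    // protocol-relative
            "/img/a.png",                     // root-relative
            "../img/a.png",                   // one directory up
            "a.png"                           // same directory as the page
        };

        for (String src : inputs)
            System.out.println(src + "  ->  " + resolve(src, page));
    }
}
```

Unlike the method above, URI.create throws IllegalArgumentException on input it cannot parse rather than returning null, so this sketch is not a drop-in replacement.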
  - resolveHREF_KE
    
    public static Ret2<java.net.URL,java.net.MalformedURLException> resolveHREF_KE (TagNode tnWithHREF, java.net.URL sourcePage)
    
    This should be used for TagNode's that contain an 'HREF' inner-tag (attribute).
    
    KE: - Keep Exceptions
    If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).
    
    Within the Pair-Tuple Ret2<URL, MalformedURLException>, precisely one of the two references will be non-null. If the URL was properly resolved, then the URL field (field Ret2.a) will be non-null. Otherwise, the MalformedURLException field (field Ret2.b) will be non-null.
    
    Parameters:
    
    tnWithHREF - This may be any HTML Element that contains an 'HREF' attribute.
    
    NOTE: An HTML 'anchor' element (<A HREF=...>) will contain these. Often the URL's found here contain "relative" rather than "absolute" addresses.
    
    sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL will be resolved.
    
    Returns:
    
    A complete-URL without any missing "presumed data" - such as host/domain or directory - returned in Ret2.a. If the TagNode has no HREF attribute, an HREFException is thrown (see 'Throws' below). If the URL causes a MalformedURLException, that exception is returned in Ret2.b
    
    SPECIFICALLY: This method shall catch all MalformedURLException's.
    
    Ret2.a (URL)
    
    This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.
    
    Ret2.b (MalformedURLException)
    
    If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
    
    Throws:
    
    HREFException - If the TagNode passed to parameter 'tnWithHREF' does not actually contain an HREF attribute, then this exception shall throw.
    
    See Also:
    
    resolve_KE(String, URL), TagNode.AV(String), Ret2
    
    Code:
    
    Exact Method Body:
    
    String href = tnWithHREF.AV("href");

    if (href == null) throw new HREFException(
        "The TagNode passed to parameter tnWithHREF does not actually contain an " +
        "HREF attribute."
    );

    return resolve_KE(href, sourcePage);
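The "Keep Exceptions" idea, returning the exception beside the result instead of throwing it or discarding it, can be sketched with a small two-field holder. This is only an illustration under stated assumptions: the nested class Pair is a hypothetical stand-in for Ret2<URL, MalformedURLException>, and the two-argument java.net.URL constructor stands in for the library's resolve_KE method, whose body is not shown here.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class KeepExceptionSketch
{
    // Hypothetical stand-in for Ret2<URL, MalformedURLException>
    public static final class Pair
    {
        public final URL a;
        public final MalformedURLException b;

        public Pair(URL a, MalformedURLException b) { this.a = a; this.b = b; }
    }

    // Precisely one of the two fields of the returned Pair is non-null
    public static Pair resolveKeepingException(String src, String sourcePage)
    {
        try
            { return new Pair(new URL(new URL(sourcePage), src), null); }
        catch (MalformedURLException e)
            { return new Pair(null, e); }
    }

    public static void main(String[] args)
    {
        String page = "https://example.com/img/index.html";

        for (String src : new String[] { "logo.png", "nonsense://x" })
        {
            Pair p = resolveKeepingException(src, page);

            if (p.b != null) System.out.println(src + " failed: " + p.b.getMessage());
            else             System.out.println(src + " resolved: " + p.a);
        }
    }
}
```

Compared with the null-returning variants, this form lets a caller report why a particular link failed.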
  - resolveSRC_KE
    
    public static Ret2<java.net.URL,java.net.MalformedURLException> resolveSRC_KE (TagNode tnWithSRC, java.net.URL sourcePage)
    
    This should be used for TagNode's that contain a 'SRC' inner-tag (attribute).
    
    KE: - Keep Exceptions
    If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).
    
    Within the Pair-Tuple Ret2<URL, MalformedURLException>, precisely one of the two references will be non-null. If the URL was properly resolved, then the URL field (field Ret2.a) will be non-null. Otherwise, the MalformedURLException field (field Ret2.b) will be non-null.
    
    Parameters:
    
    tnWithSRC - This may be any HTML Element that contains a 'SRC' attribute.
    
    NOTE: An HTML 'image' element (<IMG SRC=...>) will contain these. Often the URL's found here contain "relative" rather than "absolute" addresses.
    
    sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL will be resolved.
    
    Returns:
    
    A complete-URL without any missing "presumed data" - such as host/domain or directory - returned in Ret2.a. If the TagNode has no SRC attribute, an SRCException is thrown (see 'Throws' below). If the URL causes a MalformedURLException, that exception is returned in Ret2.b
    
    SPECIFICALLY: This method shall catch all MalformedURLException's.
    
    Ret2.a (URL)
    
    This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.
    
    Ret2.b (MalformedURLException)
    
    If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
    
    Throws:
    
    SRCException - If the TagNode passed to parameter 'tnWithSRC' does not actually contain a SRC attribute, then this exception shall throw.
    
    See Also:
    
    resolve_KE(String, URL), TagNode.AV(String), Ret2
    
    Code:
    
    Exact Method Body:
    
    String src = tnWithSRC.AV("src");

    if (src == null) throw new SRCException(
        "The TagNode passed to parameter tnWithSRC does not actually contain a " +
        "SRC attribute."
    );

    return resolve_KE(src, sourcePage);
  - resolveHREFs_KE
    
    public static java.util.Vector<Ret2<java.net.URL,java.net.MalformedURLException>> resolveHREFs_KE (java.lang.Iterable<TagNode> tnListWithHREF, java.net.URL sourcePage)
    
    This should be used for lists of TagNode's, each of which contain an 'HREF' inner-tag (attribute).
    
    KE: - Keep Exceptions
    If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).
    
    Within the Pair-Tuple Ret2<URL, MalformedURLException>, precisely one of the two references will be non-null. If the URL was properly resolved, then the URL field (field Ret2.a) will be non-null. Otherwise, the MalformedURLException field (field Ret2.b) will be non-null.
    
    Parameters:
    
    tnListWithHREF - This may be any list of HTML Elements, each of which must be instances of class TagNode and all of which must have a 'HREF' attribute.
    
    sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the Iterable will be resolved.
    
    Returns:
    
    A list of Ret2 pairs, one for each TagNode, in which each URL has been completed/resolved with the 'sourcePage' parameter. If any TagNode has no HREF attribute, then null is returned in the related Vector position. If any TagNode causes a MalformedURLException, then that position in the Vector will contain the exception in Ret2.b
    
    SPECIFICALLY: If any element in tnListWithHREF does not contain an HREF inner-tag, the method does not throw; a null value is placed in the Vector instead. The primary reason for returning 'null' rather than throwing an exception is that, when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.
    
    Ret2.a (URL)
    
    This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.
    
    Ret2.b (MalformedURLException)
    
    If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
    
    See Also:
    
    resolve_KE(String, URL), TagNode.AV(String), Ret2
    
    Code:
    
    Exact Method Body:
    
    Vector<Ret2<URL, MalformedURLException>> ret = new Vector<>();

    for (TagNode tn : tnListWithHREF)
        ret.addElement(resolve_KE(tn.AV("href"), sourcePage));

    return ret;
  - resolveSRCs_KE
    
    public static java.util.Vector<Ret2<java.net.URL,java.net.MalformedURLException>> resolveSRCs_KE (java.lang.Iterable<TagNode> tnListWithSRC, java.net.URL sourcePage)
    
    This should be used for lists of TagNode's, each of which contain a 'SRC' inner-tag (attribute).
    
    KE: - Keep Exceptions
    If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).
    
    Within the Pair-Tuple Ret2<URL, MalformedURLException>, precisely one of the two references will be non-null. If the URL was properly resolved, then the URL field (field Ret2.a) will be non-null. Otherwise, the MalformedURLException field (field Ret2.b) will be non-null.
    
    Parameters:
    
    tnListWithSRC - This may be any list of HTML Elements, each of which must be instances of class TagNode and all of which must have a 'SRC' attribute.
    
    sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the Iterable will be resolved.
    
    Returns:
    
    A list of Ret2 pairs, one for each TagNode, in which each URL has been completed/resolved with the 'sourcePage' parameter. If any TagNode has no SRC attribute, then null is returned in the related Vector position. If any TagNode causes a MalformedURLException, then that position in the Vector will contain the exception in Ret2.b
    
    SPECIFICALLY: If any element in tnListWithSRC does not contain a SRC inner-tag, the method does not throw; a null value is placed in the Vector instead. The primary reason for returning 'null' rather than throwing an exception is that, when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.
    
    Ret2.a (URL)
    
    This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.
    
    Ret2.b (MalformedURLException)
    
    If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
    
    See Also:
    
    resolve_KE(String, URL), TagNode.AV(String), Ret2
    
    Code:
    
    Exact Method Body:
    
    Vector<Ret2<URL, MalformedURLException>> ret = new Vector<>();

    for (TagNode tn : tnListWithSRC)
        ret.addElement(resolve_KE(tn.AV("src"), sourcePage));

    return ret;
  - resolveHREFs_KE
    
    public static java.util.Vector<Ret2<java.net.URL,java.net.MalformedURLException>> resolveHREFs_KE (java.util.Vector<? extends HTMLNode> html, int[] nodePosArr, java.net.URL sourcePage)
    
    This will use a "pointer array" - an array containing indexes into the downloaded page to retrieve TagNode's. The TagNode to which this pointer-array points - must contain HREF inner-tags with URL's, or partial URL's.
    
    HTML's <BASE HREF=...>
    When calling methods in this class which accept a complete (or partial) HTML Vector (using a parameter such as Vector<HTMLNode>), the user must take care to check whether the page provided has a definition for the HTML Element <BASE HREF=URL>.
    
    If the input page has such a definition, none of the methods in this class will heed it. The user must therefore manually invoke the method getBaseURL(Vector) to retrieve that URL, and then pass the result to input-parameter sourcePage.
    
    More recently, HTML-Pages are making less use of the <BASE> HTML-Tag.
    
    KE: - Keep Exceptions
    If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).
    
    Within the Pair-Tuple Ret2<URL, MalformedURLException>, precisely one of the two references will be non-null. If the URL was properly resolved, then the URL field (field Ret2.a) will be non-null. Otherwise, the MalformedURLException field (field Ret2.b) will be non-null.
    
    Parameters:
    
    html - This may be any Vectorized-HTML Web-Page (or sub-page).
    
    The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing a compile-time type error.
    
    These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
    
    nodePosArr - An array of pointers into the page or sub-page. The pointers must reference TagNode's that contain HREF attributes. Integer-pointer Arrays are usually returned from the package 'NodeSearch' "Find" methods.
    
    Example:
    
    // Retrieve 'pointers' to all the '<A HREF=...>' TagNode's. The term 'pointer' refers to
    // integer-indices into the vectorized-html variable 'page'
    int[] anchorPosArr = TagNodeFind.all(page, TC.OpeningTags, "a");

    // Extract each HREF inner-tag, and construct a URL. Use the 'sourcePage' parameter if
    // the URL is only partially-resolved. If any URL's on the original-page are invalid, the
    // method shall not crash, but save the exception instead.
    Vector<Ret2<URL, MalformedURLException>> urlsWithEx =
        Links.resolveHREFs_KE(page, anchorPosArr, mySourcePage);

    // Print out any "failed" urls
    for (Ret2<URL, MalformedURLException> r : urlsWithEx)
        if (r.b != null)
            System.out.println("There was an exception: " + r.b.toString());
    
    This example obtains a pointer-array (a.k.a. a "vector-index-array") to every HTML "<A ...>" element in the HTML page-Vector 'page', resolves any shortened URL's, and prints the exception for each URL that could not be resolved.
    
    sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the Vector will be resolved.
    
    Returns:
    
    A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. If any TagNode has no HREF attribute, then null is returned in the related Vector position. If any TagNode causes a MalformedURLException, then that position in the Vector will contain the exception in Ret2.b

    SPECIFICALLY: If any of the TagNode's referenced by 'nodePosArr' do not contain an HREF inner-tag, then the corresponding Vector position will hold null. The primary reason for returning null rather than throwing an exception is that, when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.
    
    Ret2.a (URL)
    
    This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.
    
    Ret2.b (MalformedURLException)
    
    If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
    
    Throws:
    
    java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'nodePosArr' contain index-pointers that are out of range of Vector-parameter 'html', then Java will, naturally, throw this exception.

    OpeningTagNodeExpectedException - Thrown when a Vector position-index holds an instance of TagNode, but that TagNode's isClosing field is set to TRUE.

    The int[]-Array parameter 'nodePosArr' should contain a list of Vector-indices. The method checks that each location in that array points to an Opening TagNode, and throws this exception when one does not.

    TagNodeExpectedException - Thrown when a Vector-index in 'nodePosArr' must point to an instance of TagNode, but that index instead holds some other HTMLNode instance (either CommentNode or TextNode).
    
    See Also:
    
    resolve_KE(String, URL), TagNode.AV(String), Ret2
    
    Code:
    
    Exact Method Body:
    
    // Return Vector
    Vector<Ret2<URL, MalformedURLException>> ret = new Vector<>();

    for (int nodePos : nodePosArr)
    {
        HTMLNode n = html.elementAt(nodePos);

        // Must be an HTML TagNode
        if (! n.isTagNode()) throw new TagNodeExpectedException(nodePos);

        TagNode tn = (TagNode) n;

        // Must be an "Opening" HTML TagNode
        if (tn.isClosing) throw new OpeningTagNodeExpectedException(nodePos);

        // Resolve the "HREF", keep the URL
        ret.addElement(resolve_KE(tn.AV("href"), sourcePage));
    }

    return ret;
  - resolveSRCs_KE
    
    public static java.util.Vector<Ret2<java.net.URL,java.net.MalformedURLException>> resolveSRCs_KE (java.util.Vector<? extends HTMLNode> html, int[] nodePosArr, java.net.URL sourcePage)
    
    This will use a "pointer array" - an array containing indexes into the downloaded page - to retrieve TagNode's. The TagNode's to which this pointer-array points must contain SRC inner-tags with URL's, or partial URL's.
    
    HTML's <BASE HREF=...>
    When using methods in this class which accept a complete (or partial) HTML Vector (using a parameter such as Vector<HTMLNode>), take care to check whether the page provided has a definition for the HTML Element <BASE HREF=URL>.

    If the input page has such a definition, none of the methods in this class will heed it, and therefore the user must manually invoke the method getBaseURL(Vector) in order to retrieve that URL, and then pass that result to input-parameter sourcePage.

    More recently, HTML-Pages are making less use of the <BASE> HTML-Tag.
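
    The effect of a <BASE HREF=...> definition on resolution can be sketched with the JDK alone. This is only an illustration under stated assumptions: the class BaseHrefSketch, its method effectiveBase, and the regular expression are hypothetical and work on a raw HTML String, whereas this library's getBaseURL(Vector) works on a vectorized page. The example.com addresses are placeholders.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch - not part of the Links class.
final class BaseHrefSketch
{
    // Simplistic scan for <base ... href="..."> ; real pages deserve a real parser.
    private static final Pattern BASE = Pattern.compile(
        "<base\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the <BASE HREF> value (resolved against the page URL), or the
    // page URL itself when the page declares no <BASE HREF>.
    static URI effectiveBase(String rawHtml, URI pageUrl)
    {
        Matcher m = BASE.matcher(rawHtml);
        return m.find() ? pageUrl.resolve(m.group(1).trim()) : pageUrl;
    }

    public static void main(String[] args)
    {
        URI     page = URI.create("https://example.com/articles/2024/story.html");
        String  html =
            "<html><head><base href=\"https://cdn.example.com/assets/\"></head></html>";

        // Resolved against the page URL only (the <BASE> element is ignored)
        System.out.println(page.resolve("pic.jpg"));

        // Resolved against the declared <BASE HREF>
        System.out.println(effectiveBase(html, page).resolve("pic.jpg"));
    }
}
```

    The two printed URL's differ, which is why the result of getBaseURL(Vector) should be passed as 'sourcePage' whenever a page declares a <BASE> element.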
    
    KE: - Keep Exceptions
    If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).
    
    Within the Pair-Tuple Ret2<URL, MalformedURLException>, precisely one of the two references will be non-null. If the URL was properly resolved, then the URL field (field Ret2.a) will be non-null. Otherwise, the MalformedURLException field (field Ret2.b) will be non-null.
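
    The "Keep Exceptions" convention can be illustrated with a minimal, self-contained sketch. It rests on assumptions: the class Outcome is a hypothetical stand-in for Ret2<URL, MalformedURLException>, the method resolveKeep is invented for this illustration, and resolution is delegated to the JDK's java.net.URI.resolve rather than to this library's own logic.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical sketch - not part of the Links class.
final class KeepExceptionSketch
{
    // Stand-in for Ret2<URL, MalformedURLException>: exactly one field is non-null.
    static final class Outcome
    {
        final URL a;                    // non-null when resolution succeeded
        final MalformedURLException b;  // non-null when resolution failed

        Outcome(URL a, MalformedURLException b) { this.a = a; this.b = b; }
    }

    // Resolves 'src' against 'base', returning the exception rather than throwing it.
    // Note: URI.create may still throw IllegalArgumentException on invalid syntax.
    static Outcome resolveKeep(String src, String base)
    {
        try
            { return new Outcome(URI.create(base).resolve(src).toURL(), null); }

        catch (MalformedURLException e)
            { return new Outcome(null, e); }
    }

    public static void main(String[] args)
    {
        String base = "https://example.com/news/index.html";

        // The second entry uses a protocol that the JDK has no handler for
        for (String src : new String[] { "img/logo.png", "gopher-x://host/file" })
        {
            Outcome r = resolveKeep(src, base);

            if (r.a != null)    System.out.println("Resolved: " + r.a);
            else                System.out.println("Failed:   " + r.b.getMessage());
        }
    }
}
```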
    
    Parameters:
    
    html - This may be any Vectorized-HTML Web-Page (or sub-page).
    
    The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without a compile-time error.

    These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.

    nodePosArr - An array of pointers into the page or sub-page. The pointers must reference TagNode's that contain SRC attributes. Integer-pointer arrays are usually returned from the "Find" methods of the package 'NodeSearch'.
    
    Example:
    
    // Retrieve 'pointers' to all the '<IMG SRC=...>' TagNode's. The term 'pointer' refers to
    // integer-indices into the vectorized-html variable 'page'
    int[] picturePosArr = TagNodeFind.all(page, TC.OpeningTags, "img");

    // Extract each SRC inner-tag, and construct a URL. Use the 'sourcePage' parameter if
    // the URL is only partially-resolved. If any URL's on the original-page are invalid,
    // the method shall not crash, but save the exception instead.
    Vector<Ret2<URL, MalformedURLException>> urlsWithEx =
        Links.resolveSRCs_KE(page, picturePosArr, mySourcePage);

    // Print out any "failed" urls. A position is null when the TagNode had no SRC.
    for (Ret2<URL, MalformedURLException> r : urlsWithEx)
        if ((r != null) && (r.b != null))
            System.out.println("There was an exception: " + r.b.toString());
    
    The example above obtains a pointer-array (a.k.a. a "vector-index-array") to every HTML "<IMG ...>" element that is available in the HTML page-Vector parameter 'html', and then resolves any shortened URL's.
    
    sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the Vector will be resolved.
    
    Returns:
    
    A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. If any TagNode has no SRC attribute, then null is returned in the related Vector position. If any TagNode causes a MalformedURLException, then that position in the Vector will contain the exception in Ret2.b

    SPECIFICALLY: If any of the TagNode's referenced by 'nodePosArr' do not contain a SRC inner-tag, then the corresponding Vector position will hold null. The primary reason for returning null rather than throwing an exception is that, when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.
    
    Ret2.a (URL)
    
    This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.
    
    Ret2.b (MalformedURLException)
    
    If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
    
    Throws:
    
    java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'nodePosArr' contain index-pointers that are out of range of Vector-parameter 'html', then Java will, naturally, throw this exception.

    OpeningTagNodeExpectedException - Thrown when a Vector position-index holds an instance of TagNode, but that TagNode's isClosing field is set to TRUE.

    The int[]-Array parameter 'nodePosArr' should contain a list of Vector-indices. The method checks that each location in that array points to an Opening TagNode, and throws this exception when one does not.

    TagNodeExpectedException - Thrown when a Vector-index in 'nodePosArr' must point to an instance of TagNode, but that index instead holds some other HTMLNode instance (either CommentNode or TextNode).
    
    See Also:
    
    resolve_KE(String, URL), TagNode.AV(String), Ret2
    
    Code:
    
    Exact Method Body:
    
    // Return Vector
    Vector<Ret2<URL, MalformedURLException>> ret = new Vector<>();

    for (int nodePos : nodePosArr)
    {
        HTMLNode n = html.elementAt(nodePos);

        // Must be an HTML TagNode
        if (! n.isTagNode()) throw new TagNodeExpectedException(nodePos);

        TagNode tn = (TagNode) n;

        // Must be an "Opening" HTML TagNode
        if (tn.isClosing) throw new OpeningTagNodeExpectedException(nodePos);

        // Resolve "SRC" and keep URL's
        ret.addElement(resolve_KE(tn.AV("src"), sourcePage));
    }

    return ret;
  - resolve_KE
    
    public static java.util.Vector<Ret2<java.net.URL,java.net.MalformedURLException>> resolve_KE (java.util.Vector<java.lang.String> src, java.net.URL sourcePage)
    
    Resolve all URL's, represented as String's, inside of a Vector.
    
    KE: - Keep Exceptions
    If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).
    
    Within the Pair-Tuple Ret2<URL, MalformedURLException>, precisely one of the two references will be non-null. If the URL was properly resolved, then the URL field (field Ret2.a) will be non-null. Otherwise, the MalformedURLException field (field Ret2.b) will be non-null.
    
    Parameters:
    
    src - a list of String's - usually partially or totally completed Internet URL's
    
    sourcePage - This is the source page URL from which the String's (possibly-relative) URL's in the Vector will be resolved.
    
    Returns:
    
    A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. If any String is zero-length or null, then null is returned in the related Vector position. If any String causes a MalformedURLException, then that position in the Vector will contain the exception in Ret2.b
    
    Ret2.a (URL)
    
    This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.
    
    Ret2.b (MalformedURLException)
    
    If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
    
    See Also:
    
    resolve_KE(String, URL), Ret2
    
    Code:
    
    Exact Method Body:
    
    Vector<Ret2<URL, MalformedURLException>> ret = new Vector<>();

    for (String s : src) ret.addElement(resolve_KE(s, sourcePage));

    return ret;
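
    Because each input position maps to exactly one output position, a caller has three cases to handle per element: a null entry (no URL was present), a kept exception, or a resolved URL. The following is a rough sketch of that handling, not this library's code: the classes SweepSketch and Outcome, and the methods resolveOne and resolveAll, are hypothetical, and the JDK's java.net.URI.resolve is used in place of resolve_KE.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch - not part of the Links class.
final class SweepSketch
{
    // Stand-in for Ret2<URL, MalformedURLException>
    static final class Outcome
    {
        final URL a;
        final MalformedURLException b;

        Outcome(URL a, MalformedURLException b) { this.a = a; this.b = b; }
    }

    // Returns null for a null or blank String, otherwise a URL or a kept exception.
    static Outcome resolveOne(String src, URI base)
    {
        if ((src == null) || src.trim().isEmpty()) return null;

        try
            { return new Outcome(base.resolve(src.trim()).toURL(), null); }

        catch (MalformedURLException e)
            { return new Outcome(null, e); }
    }

    // The output list is parallel to the input list: same size, same order.
    static List<Outcome> resolveAll(List<String> srcs, URI base)
    {
        List<Outcome> ret = new ArrayList<>();
        for (String s : srcs) ret.add(resolveOne(s, base));
        return ret;
    }

    public static void main(String[] args)
    {
        URI             base    = URI.create("https://example.com/dir/index.html");
        List<String>    srcs    = Arrays.asList("a.html", "", null, "nosuch://x/y");
        List<Outcome>   results = resolveAll(srcs, base);

        for (int i = 0; i < results.size(); i++)
        {
            Outcome r = results.get(i);

            if (r == null)          System.out.println(i + ": no URL present");
            else if (r.b != null)   System.out.println(i + ": failed, " + r.b.getMessage());
            else                    System.out.println(i + ": " + r.a);
        }
    }
}
```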
  - resolve_KE
    
    public static Ret2<java.net.URL,java.net.MalformedURLException> resolve_KE (java.lang.String src, java.net.URL sourcePage)
    
    This will convert a simple java String to a URL, de-referencing any missing information using the 'sourcePage' parameter.
    
    KE: - Keep Exceptions
    If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).
    
    Within the Pair-Tuple Ret2<URL, MalformedURLException>, precisely one of the two references will be non-null. If the URL was properly resolved, then the URL field (field Ret2.a) will be non-null. Otherwise, the MalformedURLException field (field Ret2.b) will be non-null.
    
    Parameters:
    
    src - Any java String, usually one which was scraped from an HTML-Page, and needs to be "completed."
    
    sourcePage - This is the source page URL from which the String (possibly relative) URL will be resolved.
    
    Returns:
    
    A URL, which has been completed/resolved with the 'sourcePage' parameter. If parameter 'src' is null or zero-length, null will be returned. If a MalformedURLException is thrown, that will be included with the Ret2<> result.
    
    Ret2.a (URL)
    
    This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.
    
    Ret2.b (MalformedURLException)
    
    If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
    
    See Also:
    
    Ret2
    
    Code:
    
    Exact Method Body:
    
    if (sourcePage == null) throw new NullPointerException(
        "Though you may provide null to the partial-URL to dereference parameter, null " +
        "may not be passed to the Source-Page Parameter. The purpose of the 'resolve' " +
        "operation is to resolve partial-URLs against a source-page (root) URL. " +
        "Therefore this is not allowed."
    );

    if (src == null) return null;

    src = src.trim();

    if (src.length() == 0) return null;

    String srcLC = src.toLowerCase();

    if (StrCmpr.startsWithXOR
        (srcLC, "tel:", "javascript:", "mailto:", "magnet:", "file:", "ftp:", "#"))

        return new Ret2<URL, MalformedURLException>
            (null, new MalformedURLException(
                "InnerTag/Attribute begins with: " + src.substring(0, 1 + src.indexOf(":")) +
                ", so it is not a hyper-link."
            ));

    // Includes the first few characters of the URL - for reporting/convenience.
    // If this is an "image", the image-type & name will be included
    if (StrCmpr.startsWithXOR(srcLC, "data:", "blob:"))

        return new Ret2<URL, MalformedURLException>(null, new MalformedURLException(
            "InnerTag/Attribute begins with: " +
            ((src.length() > 25) ? src.substring(0, 25) : src) +
            ", not a URL."
        ));

    if (srcLC.startsWith("http://") || srcLC.startsWith("https://"))

        try
            { return new Ret2<URL, MalformedURLException>(new URL(src), null); }

        catch (MalformedURLException e)
            { return new Ret2<URL, MalformedURLException>(null, e); }

    if (src.startsWith("//") && (src.charAt(3) != '/'))

        try
        {
            return new Ret2<URL, MalformedURLException>
                (new URL( sourcePage.getProtocol().toLowerCase() + ":" + src), null);
        }

        catch (MalformedURLException e)
            { return new Ret2<URL, MalformedURLException>(null, e); }

    if (src.startsWith("/"))

        try
        {
            return new Ret2<URL, MalformedURLException>(new URL(
                sourcePage.getProtocol().toLowerCase() + "://" +
                sourcePage.getHost().toLowerCase() + src), null
            );
        }

        catch (MalformedURLException e)
            { return new Ret2<URL, MalformedURLException>(null, e); }

    if (src.startsWith("../"))
    {
        String  sourcePageStr   = sourcePage.toString();
        short   nLevels         = 0;

        do
            { nLevels++; src = src.substring(3); }
        while (src.startsWith("../"));

        String directory = StringParse.dotDotParentDirectory(sourcePage.toString(), nLevels);

        try
            { return new Ret2<URL, MalformedURLException>(new URL(directory + src), null); }

        catch (MalformedURLException e)
            { return new Ret2<URL, MalformedURLException>(null, e); }

        catch (Exception e)
        {
            return new Ret2<URL, MalformedURLException>
                (null, new MalformedURLException
                    (e.getClass().getCanonicalName() + ":" + e.getMessage())
                );
        }
    }

    String root =
        sourcePage.getProtocol().toLowerCase() + "://" + sourcePage.getHost().toLowerCase();

    String  path    = sourcePage.getPath().trim();
    int     pos     = StringParse.findLastFrontSlashPos(path);

    if (pos == -1) throw new StringIndexOutOfBoundsException(
        "The URL you have provided: " + sourcePage.toString() +
        " does not have a '/' front-slash character in it's path." +
        "Cannot proceed resolving relative-URL's without this."
    );

    path = path.substring(0, pos + 1);

    try
        { return new Ret2<URL, MalformedURLException>(new URL(root + path + src), null); }

    catch (MalformedURLException e)
        { return new Ret2<URL, MalformedURLException>(null, e); }
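
    The method body above distinguishes several forms of reference: absolute ("http://" or "https://"), protocol-relative ("//"), root-relative ("/"), parent-relative ("../") and plain relative. As a point of comparison only, the sketch below feeds one example of each form to the JDK's java.net.URI.resolve. The class RelativeFormsSketch and its method resolve are hypothetical, the addresses are placeholders, and the JDK's resolution rules are not guaranteed to match this method's results in every edge case.

```java
import java.net.URI;

// Hypothetical sketch - not part of the Links class.
final class RelativeFormsSketch
{
    // Resolves 'src' against 'base' using the JDK, returned as a String.
    static String resolve(String base, String src)
    { return URI.create(base).resolve(src).toString(); }

    public static void main(String[] args)
    {
        String base = "https://example.com/a/b/page.html";

        String[] srcs = {
            "https://other.org/x.png",      // absolute: used as-is
            "//cdn.example.com/x.png",      // protocol-relative: borrows "https"
            "/img/x.png",                   // root-relative: borrows protocol and host
            "../x.png",                     // parent-relative: one directory up
            "x.png"                         // plain relative: same directory as the page
        };

        for (String src : srcs) System.out.println(src + "  ->  " + resolve(base, src));
    }
}
```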
