Package Torello.HTML

Class Links


  • public class Links
    extends java.lang.Object
    Utilities for de-referencing 'partially-completed' URL's in a Web-Page Vector.

    This is a utility class that 'completes' URLs that have been scraped from web-pages and are 'relative' (partially completed). Relative URLs are common in HTML, because a page author does not need to spell out the entire directory and web-server DNS name in order to reference an image file or link that resides in the same directory as the web-page that contains it.

    NOTE ABOUT CONTENT: These scrape package classes were initially developed for scraping news-content from the Chinese Government Web-Portal, and redirecting over-seas news-content to a simple translation service for people interested in reading about news from over-seas. This is particularly interesting for a government such as China, where a huge percentage of our economic GDP is based on products exported from factories in the Southern Region there to our strip-malls here in Dallas (and other places). Perhaps these URL examples may not seem relevant to a typical Internet-Programmer who is not presently studying languages, but they are staying here anyway.

    EXCEPTION SUPPRESSION: Precisely half of these methods are designed to "sweep" an entire page of HTML. The methods that expect a vector of anchors, images, or other links, and iterate over the entire vector or page, will catch every Java MalformedURLException that is thrown and place null in the return-vector position for that particular URL. The value of this is that every link that can be resolved will be resolved. Checking the return vectors for null values is necessary when it is important to detect pages that contain broken links or image-sources. However, each method whose name ends with the suffix '_KE' shall return a vector that includes any thrown exception in the Java-HTML Tuple-Class Ret2<URL, MalformedURLException>. This concept may seem 'different,' but once the process is familiar, the value of not being forced to write try-catch blocks for every web-page URL-resolve stage will hopefully become obvious.
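
The two styles described above can be sketched with nothing but the JDK. This is a minimal illustration, not the library's implementation: the class name SuppressionSketch, the helper url(String) and the Pair class (a hypothetical stand-in for Ret2) are all assumptions made for this example, and resolution is delegated to the java.net.URL(URL, String) constructor.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Arrays;
import java.util.Vector;

final class SuppressionSketch
{
    // Hypothetical stand-in for the library's Ret2 tuple; NOT the real class.
    static final class Pair<A, B>
    {
        final A a;
        final B b;
        Pair(A a, B b) { this.a = a; this.b = b; }
    }

    // Null-on-failure style: one slot per input, null where resolution failed.
    static Vector<URL> resolveAll(Vector<String> srcs, URL sourcePage)
    {
        Vector<URL> ret = new Vector<>();
        for (String src : srcs)
        {
            try { ret.add(new URL(sourcePage, src)); }
            catch (MalformedURLException e) { ret.add(null); }
        }
        return ret;
    }

    // Keep-exception style: each slot holds either the URL or the exception.
    static Vector<Pair<URL, MalformedURLException>> resolveAllKE
        (Vector<String> srcs, URL sourcePage)
    {
        Vector<Pair<URL, MalformedURLException>> ret = new Vector<>();
        for (String src : srcs)
        {
            try { ret.add(new Pair<>(new URL(sourcePage, src), null)); }
            catch (MalformedURLException e) { ret.add(new Pair<>(null, e)); }
        }
        return ret;
    }

    // Helper so that callers of this sketch need no try-catch for the page URL.
    static URL url(String s)
    {
        try { return new URL(s); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    public static void main(String[] args)
    {
        URL page = url("http://english.gov.CN/article/01-01-2018/index.html");
        Vector<String> srcs =
            new Vector<>(Arrays.asList("image12345.bmp", "tel: (212) 555-6789"));

        for (URL u : resolveAll(srcs, page))
            System.out.println(u == null ? "unresolved" : u.toString());

        for (Pair<URL, MalformedURLException> p : resolveAllKE(srcs, page))
            System.out.println(p.a != null ? p.a.toString() : "failed: " + p.b.getMessage());
    }
}
```

In both variants the output vector stays index-aligned with the input, so a caller can map a null (or a stored exception) back to the link that produced it.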

    NOTE ABOUT TABLE: The following table attempts to explain the rules for evaluating relative / partial URL's, such as the URL in an HTML '<A ...>' (Anchor-Tag) 'HREF=...' attribute, or an <IMG SRC="..."> URL. The column on the left shows the type of TagNode input that contains a URL - which might be a partial URL - while the column on the right hopefully demystifies how such a URL would be "decoded" (de-referenced) from a partial to a complete Uniform Resource Locator.

    sourceURL: http://english.gov.CN/article/01-01-2018/index.html

    HTML TagNode                                                          Resolved URL
    <IMG SRC="http://english.gov.CN/article/01-01-2018/image12345.bmp">   http://english.gov.CN/article/01-01-2018/image12345.bmp
    <IMG SRC="/article/01-01-2018/image12345.bmp">                        http://english.gov.CN/article/01-01-2018/image12345.bmp
    <IMG SRC="image12345.bmp">                                            http://english.gov.CN/article/01-01-2018/image12345.bmp
    <IMG SRC="//some.other.url/a.bmp">                                    http://some.other.url/a.bmp
    <A HREF="#sub-section">                                               null
    <IMG SRC="../../pic2.bmp">                                            http://english.gov.CN/pic2.bmp
    <A HREF="tel: (212) 555-6789">                                        null

    sourceURL: http://english.gov.CN/article/12-31-2018/index.html

    HTML TagNode                                                          Resolved URL
    <IMG SRC="http://english.gov.CN/article/01-01-2018/image12345.png">   http://english.gov.CN/article/01-01-2018/image12345.png
    <IMG SRC="/article/01-01-2018/image12345.png">                        http://english.gov.CN/article/01-01-2018/image12345.png
    <IMG SRC="image12345.png">                                            http://english.gov.CN/article/12-31-2018/image12345.png
    <IMG SRC="//some.other.url/a.bmp">                                    http://some.other.url/a.bmp
    <A HREF="#sub-section">                                               null
    <IMG SRC="../pic3.bmp">                                               http://english.gov.CN/article/pic3.bmp
    <A HREF="mailto: sales@acme.com">                                     null

    sourceURL: http://SpanishNewsBoard.com/article/10-12-2018/index.html

    HTML TagNode                                                          Resolved URL
    <IMG SRC="http://english.gov.CN/article/01-01-2018/image12345.jpg">   http://english.gov.CN/article/01-01-2018/image12345.jpg
    <IMG SRC="/article/01-01-2018/image12345.jpg">                        http://SpanishNewsBoard.com/article/01-01-2018/image12345.jpg
    <IMG SRC="image12345.jpg">                                            http://SpanishNewsBoard.com/article/10-12-2018/image12345.jpg
    <IMG SRC="//some.other.url/a.bmp">                                    http://some.other.url/a.bmp
    <A HREF="#sub-section">                                               null
    <IMG SRC="../../../pic3.bmp">                                         null
    <A HREF="javascript: alert('hello world');">                          null
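
The resolvable rows of the table can be reproduced with the JDK's own java.net.URL(URL, String) constructor. The sketch below is only an approximation of what this class does, under the assumption that plain JDK resolution is a fair model for those rows; the class name RelativeUrlSketch and the method resolveOrNull are made up for the illustration. The rows that map to null depend on this class's own checks and are not reproduced here.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class RelativeUrlSketch
{
    static final String PAGE = "http://english.gov.CN/article/01-01-2018/index.html";

    // Resolves 'src' against 'sourcePage'; returns null if either one is malformed.
    static String resolveOrNull(String sourcePage, String src)
    {
        try { return new URL(new URL(sourcePage), src).toString(); }
        catch (MalformedURLException e) { return null; }
    }

    public static void main(String[] args)
    {
        String[] srcs =
        {
            "image12345.bmp",                       // same directory as the page
            "/article/01-01-2018/image12345.bmp",   // host-relative path
            "//some.other.url/a.bmp",               // protocol-relative URL
            "../../pic2.bmp"                        // climbs two directories
        };

        for (String src : srcs)
            System.out.println(src + " -> " + resolveOrNull(PAGE, src));
    }
}
```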




    Example Code: The following example will find all HTML <A HREF="..."> (anchor-tags), and replace each HREF value it finds with an absolute URL.

    Example:
    // This fixes the body of a "web-page news-article" (or any web-site html, so to speak)
    // It assures that (after scraping) any original Anchor URL's which contained "relative links"
    // become "absolute links" - by completing the URL.
    
    // The original web-site url
    URL webSiteURL = new URL("https://some-web-site.com/News/Article-Numero-Uno.html");
    
    // Here the HTML page is downloaded to a simple Java Vector. 
    Vector<HTMLNode> page = HTMLPage.getPageTokens(webSiteURL, false);
    
    // Any URL's which do not contain complete URI's - inclusive of a domain-name, directory,
    // and file-name will be completed and inserted back into the page.
    
    Links.resolveAllHREF(page, webSiteURL, SD.SingleQuotes, false);
    


    Common Special Cases: Some commonly found HREF attribute values are not intended to point to HTML pages. The following values for the HTML Anchor-Tag HREF Attribute will cause this class to return null and/or return an exception:

    • <A HREF="tel:<a-telephone-number>" ... >
    • <A HREF="javascript:<some-script-calls>" ... >
    • <A HREF="mailto:<an-email-address>" ... >
    • <A HREF="file:<file-for-download>" ... >
    • <A HREF="ftp:<ftp-file-transfer-protocol-address>" ... >
    • <A HREF="magnet:<bit-torrent-address>" ...>
    • <A HREF="data:<base64-encoded-image>" ... >
    • <A HREF="blob:<Binary-Large-Object>" ... >
    • <A HREF="#<this-page-subsection>" ... >

    Any call to resolve an HTML Anchor element whose URL link begins with one of the above special-cases will return null. If the "Keep Exception" (_KE) version is requested, a Torello.Java.Ret2<URL, HREFException> will be returned, where the value of ret2.a is null and the value of ret2.b is an instance of HREFException.
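
A prefix test of this kind might look like the following sketch. How this class actually consults its prefix list is not shown on this page, so the trimming and case-insensitive comparison below are assumptions, as are the names NonUrlHrefSketch and isNonUrlHref; the prefixes themselves are copied from the list above.

```java
import java.util.Locale;

final class NonUrlHrefSketch
{
    // Prefixes taken from the special-case list above.
    private static final String[] NON_URL_PREFIXES =
        { "tel:", "javascript:", "mailto:", "file:", "ftp:", "magnet:", "data:", "blob:", "#" };

    // True when 'href' begins with one of the non-URL prefixes.
    // ASSUMPTION: leading white-space and letter-case are ignored.
    static boolean isNonUrlHref(String href)
    {
        if (href == null) return false;

        String s = href.trim().toLowerCase(Locale.ROOT);

        for (String prefix : NON_URL_PREFIXES)
            if (s.startsWith(prefix)) return true;

        return false;
    }

    public static void main(String[] args)
    {
        System.out.println(isNonUrlHref("tel: (212) 555-6789"));  // true
        System.out.println(isNonUrlHref("#sub-section"));         // true
        System.out.println(isNonUrlHref("image12345.bmp"));       // false
    }
}
```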
    See Also:
    ReplaceNodes, ReplaceFunction, HTMLPage, InnerTagFind, Ret2


Stateless Class: This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is very similar to the Java-Bean @Stateless Annotation.
  • 1 Constructor(s), 1 declared private, zero-argument constructor
  • 26 Method(s), 26 declared static
  • 1 Field(s), 1 declared static, 1 declared final


    • Field Summary

      Fields 
      Modifier and Type Field
      protected static String[] _NON_URL_HREFS
    • Method Summary

       
      Resolve URL's
      Modifier and Type Method
      static URL resolve​(String src, URL sourcePage)
      static Vector<URL> resolve​(Vector<String> src, URL sourcePage)
       
      Resolve URL's, but Suppress Exceptions, and Keep Them
      Modifier and Type Method
      static Ret2<URL, MalformedURLException> resolve_KE(String src, URL sourcePage)
      static Vector<Ret2<URL, MalformedURLException>> resolve_KE(Vector<String> src, URL sourcePage)
       
      Resolve HREF-Attribute URL's
      Modifier and Type Method
      static URL resolveHREF​(TagNode tnWithHREF, URL sourcePage)
      static TagNode resolveHREFAndUpdate​(TagNode tnWithHREF, URL sourcePage)
      static Vector<URL> resolveHREFs​(Iterable<TagNode> tnListWithHREF, URL sourcePage)
      static Vector<URL> resolveHREFs​(Vector<? extends HTMLNode> html, int[] nodePosArr, URL sourcePage)
       
      Resolve SRC-Attribute URL's
      Modifier and Type Method
      static URL resolveSRC​(TagNode tnWithSRC, URL sourcePage)
      static TagNode resolveSRCAndUpdate​(TagNode tnWithSRC, URL sourcePage)
      static Vector<URL> resolveSRCs​(Iterable<TagNode> tnListWithSRC, URL sourcePage)
      static Vector<URL> resolveSRCs​(Vector<? extends HTMLNode> html, int[] nodePosArr, URL sourcePage)
       
      Resolve all HREF URL's on an HTML-Page, and Update the Page-Vector
      Modifier and Type Method
      static Ret3<int[],​int[],​int[]> resolveAllHREF​(Vector<? super TagNode> html, int sPos, int ePos, URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)
      static Ret3<int[],​int[],​int[]> resolveAllHREF​(Vector<? super TagNode> html, URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)
      static Ret3<int[],​int[],​int[]> resolveAllHREF​(Vector<? super TagNode> html, DotPair dp, URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)
       
      Resolve all SRC URL's on an HTML-Page, and Update the Page-Vector
      Modifier and Type Method
      static Ret3<int[],​int[],​int[]> resolveAllSRC​(Vector<? super TagNode> html, int sPos, int ePos, URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)
      static Ret3<int[],​int[],​int[]> resolveAllSRC​(Vector<? super TagNode> html, URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)
      static Ret3<int[],​int[],​int[]> resolveAllSRC​(Vector<? super TagNode> html, DotPair dp, URL sourcePage, SD quote, boolean askForReturnArraysOrReturnNull)
       
      Resolve HREF URL's, but Suppress Exceptions, and Keep Them
      Modifier and Type Method
      static Ret2<URL, MalformedURLException> resolveHREF_KE(TagNode tnWithHREF, URL sourcePage)
      static Vector<Ret2<URL, MalformedURLException>> resolveHREFs_KE(Iterable<TagNode> tnListWithHREF, URL sourcePage)
      static Vector<Ret2<URL, MalformedURLException>> resolveHREFs_KE(Vector<? extends HTMLNode> html, int[] nodePosArr, URL sourcePage)
       
      Resolve SRC URL's, but Suppress Exceptions, and Keep Them
      Modifier and Type Method
      static Ret2<URL, MalformedURLException> resolveSRC_KE(TagNode tnWithSRC, URL sourcePage)
      static Vector<Ret2<URL, MalformedURLException>> resolveSRCs_KE(Iterable<TagNode> tnListWithSRC, URL sourcePage)
      static Vector<Ret2<URL, MalformedURLException>> resolveSRCs_KE(Vector<? extends HTMLNode> html, int[] nodePosArr, URL sourcePage)
       
      More Methods
      Modifier and Type Method
      static URL getBaseURL​(Vector<? extends HTMLNode> page)
      static String[] NON_URL_HREFS()
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • _NON_URL_HREFS

        protected static final java.lang.String[] _NON_URL_HREFS
        List of documented "starter-strings" that are sometimes used in Anchor URL 'HREF=...' attributes.
        See Also:
        NON_URL_HREFS()
        Code:
        Exact Field Declaration Expression:
        protected static final String[] _NON_URL_HREFS =
                { "tel:", "magnet:", "javascript:", "mailto:", "ftp:", "file:", "data:", "blob:", "#" };
        
    • Method Detail

      • NON_URL_HREFS

        public static java.lang.String[] NON_URL_HREFS()
        This method returns the complete list of commonly found Anchor 'HREF' String's that do not actually constitute an HTML 'URL'. It returns a "clone" of an internally stored String[] Array, so that the internal list of HTML Anchor-Tag 'HREF' prefixes cannot be changed by the caller.
        Returns:
        A clone of the String-array '_NON_URL_HREFS'
        See Also:
        _NON_URL_HREFS
        Code:
        Exact Method Body:
         return _NON_URL_HREFS.clone();
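
The effect of returning a clone can be seen in a small stand-alone sketch. The class DefensiveCopySketch and its shortened prefix list are invented for the illustration; only the clone() idiom is taken from the method body above.

```java
import java.util.Arrays;

final class DefensiveCopySketch
{
    // Shortened, illustrative list; not the library's field.
    private static final String[] PREFIXES = { "tel:", "mailto:", "#" };

    // Hands out a copy, so callers cannot alter the internal array.
    static String[] prefixes() { return PREFIXES.clone(); }

    public static void main(String[] args)
    {
        String[] copy = prefixes();
        copy[0] = "changed";                                // only the copy is modified

        System.out.println(Arrays.toString(prefixes()));    // prints [tel:, mailto:, #]
    }
}
```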
        
      • getBaseURL

        public static java.net.URL getBaseURL​
                    (java.util.Vector<? extends HTMLNode> page)
                throws MalformedHTMLException,
                       java.net.MalformedURLException
        
        The methods in this class will not automatically extract any HTML <BASE HREF=URL> definitions that are found on this page. If the user wishes to dereference partial / relative URL definitions that exist on the input page, all the while respecting any <BASE HREF=URL> definitions found on the input page, then this method should be utilized.
        Parameters:
        page - This may be any HTML page or partial page. If this page has a valid HTML <BASE HREF=URL>, it will be extracted and returned as an instance of class URL.
        Returns:
        This shall return the URL defined by the HTML <BASE HREF="http://..."> element found within the input-page parameter 'page'. If the page provided does not contain a BASE URL definition, then null shall be returned.

        NOTE: The HTML Specification clearly states that only one URL may be defined using the HTML Element <BASE>. Clearly, due to the browser wars, unspecified / non-deterministic behavior is possible if multiple definitions are provided. For the purposes of this class, if such a situation arises, an exception is thrown.
        Throws:
        MalformedHTMLException - If the HTML page provided contains multiple definitions of the element <BASE HREF=URL>, then this exception will throw.
        java.net.MalformedURLException - If a <BASE HREF=URL> is found within the input page, but its URL is invalid, then this exception shall throw.
        See Also:
        TagNodeFind
        Code:
        Exact Method Body:
         int[] posArr = TagNodeFind.all(page, TC.OpeningTags, "base");
        
         if (posArr.length == 0) return null;
        
         // NOTE: The cast is safe because 'posArr' only points to TagNode's.
         // 'Attributes.retrieve' is not meant to process a Vector<TextNode> or Vector<CommentNode>,
         // but if either of those had been passed, 'posArr' would be empty and the method
         // would already have returned above.
         @SuppressWarnings("unchecked")
         String[]    urls    = Attributes.retrieve((Vector<HTMLNode>) page, posArr, "href");
        
         boolean     found   = false;
         String      ret     = null;
        
         for (String url : urls)
             if ((url != null) && (url.length() > 0))
                 if (found)
                     throw new MalformedHTMLException(
                         "The page you have provided has multiple <BASE HREF=URL> definitions.  " +
                         "However, the HTML Specifications state that pages may provide just one " +
                         "definition.  If you wish to proceed, retrieve the definitions manually " +
                         "using class TagNodeFind.all and Attributes.retrieve, as explained in " +
                         "the JavaDoc pages for this class."
                     );
                 else 
                 {
                     found = true;
                     ret = url;
                 }
        
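         // A <BASE> element without a usable HREF leaves 'ret' null; return null in that case,
         // as documented, rather than passing null to the URL constructor.
         return (ret == null) ? null : new URL(ret);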
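
The "at most one definition" rule from the Returns and Throws sections can be exercised in isolation with the sketch below. It operates on a plain list of HREF strings rather than on an HTML Vector; SingleBaseSketch and singleBaseHref are hypothetical names, and IllegalStateException is used only as a stand-in for MalformedHTMLException.

```java
import java.util.Arrays;
import java.util.List;

final class SingleBaseSketch
{
    // Returns the only non-empty HREF, or null when there is none.
    // Throws when more than one non-empty HREF is present.
    static String singleBaseHref(List<String> hrefs)
    {
        String ret = null;

        for (String href : hrefs)
        {
            if (href == null || href.isEmpty()) continue;

            // Stand-in for MalformedHTMLException (an assumption of this sketch)
            if (ret != null)
                throw new IllegalStateException("multiple <BASE HREF=URL> definitions");

            ret = href;
        }

        return ret;
    }

    public static void main(String[] args)
    {
        System.out.println(singleBaseHref(Arrays.asList("", "http://example.com/")));
        System.out.println(singleBaseHref(Arrays.asList((String) null, "")));
    }
}
```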
        
      • resolveAllSRC

        public static Ret3<int[],​int[],​int[]> resolveAllSRC​
                    (java.util.Vector<? super TagNode> html,
                     int sPos,
                     int ePos,
                     java.net.URL sourcePage,
                     SD quote,
                     boolean askForReturnArraysOrReturnNull)
        
        This method shall resolve all partial URL addresses that are found within TagNode elements having 'SRC=...' attributes. Each instance of TagNode found in the input HTML Vector that has an 'SRC' attribute - if the 'URL' is only partially resolved - shall be updated and replaced with a new TagNode with a fully resolved URL.

        <BASE HREF=URL> NOTE: When calling methods in this class which accept a complete (or partial) HTML Vector (using a parameter such as Vector<HTMLNode>), the user must take care to check whether the page provided has a definition for HTML Element <BASE HREF=URL>.

        If the input page has such a definition, none of the methods in this class will actually heed it (at all), and therefore the user must manually invoke the method getBaseURL(Vector) in order to retrieve that URL, and then pass that result to input-parameter sourcePage.

        ALSO: Few modern HTML pages make use of this definition.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will throw.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        sourcePage - This is the source page URL against which the TagNode's (possibly-relative) URL's in the HTML Vector will be resolved.
        quote - A choice for the quotes to use. In most cases, URL attribute values do not contain quotation-marks. So likely either choice would work just fine, without exceptions.

        NOTE: null may be passed to this parameter, and if it is, the original quotation marks found in the TagNode's 'SRC' attribute will be reused. Passing null to this parameter should almost always be the easiest and safest choice.
        askForReturnArraysOrReturnNull - This (long-named) parameter is merely here to facilitate retrieving more information from this method - if necessary. When this parameter receives the following values:

        • TRUE: Three integer int[] arrays will be returned as listed in the Returns: section of this method's documentation.
        • FALSE: This method shall return null.
        Returns:
        If input parameter 'askForReturnArraysOrReturnNull' has been passed FALSE, this method shall return null. Otherwise, (if passed TRUE), then this method shall return an instance of 'Ret3<int[], int[], int[]>' - which is returning three separate integer-arrays about what was found, and what has occurred.

        Three arrays are returned as a result of this method's invocation. Keep in mind that though the information might be superfluous, discarding these arrays is easy. They are provided as a matter of convenience for cases where more detailed information is needed to ensure that long lists of HTMLNode's were properly updated.

        1. Ret3.a (int[])

          The first int[] array shall contain a list of the index of every TagNode in the input-Vector parameter's range that contained a non-null HTML 'SRC' Attribute.

        2. Ret3.b (int[])

          The second int[] array will contain an index-list of the indices which contained TagNode's that were replaced by the internal-resolve logic.

        3. Ret3.c (int[])

          The third int[] array will contain an index-list of the indices which contained TagNode's whose 'SRC=...' attribute failed to be resolved by the internal-resolve logic, or caused a QuotesException to throw.
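
According to the method body shown further below, a TagNode whose URL was already fully resolved is neither replaced nor logged as failed. Under that reading, the indices of such nodes can be derived from the three arrays as sketched here; the class Ret3ArraysSketch and the method untouched are hypothetical helpers, not part of the library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class Ret3ArraysSketch
{
    // Indices listed in 'found' (Ret3.a) that appear in neither
    // 'replaced' (Ret3.b) nor 'failed' (Ret3.c).
    static int[] untouched(int[] found, int[] replaced, int[] failed)
    {
        Set<Integer> handled = new HashSet<>();

        for (int i : replaced) handled.add(i);
        for (int i : failed)   handled.add(i);

        return Arrays.stream(found).filter(i -> !handled.contains(i)).toArray();
    }

    public static void main(String[] args)
    {
        int[] a = { 2, 5, 9, 14 };   // nodes that had the attribute
        int[] b = { 5, 14 };         // nodes that were replaced
        int[] c = { 9 };             // nodes that failed to resolve

        System.out.println(Arrays.toString(untouched(a, b, c)));   // prints [2]
    }
}
```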
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
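
The three range rules listed above can be written out as a small check. This is a sketch of the documented rules only, not the library's actual validation code; RangeSketch and effectiveEnd are names invented for the example.

```java
final class RangeSketch
{
    // Returns the effective (exclusive) end position for a Vector of the given
    // size, or throws IndexOutOfBoundsException as described in the list above.
    static int effectiveEnd(int sPos, int ePos, int size)
    {
        if (sPos < 0 || sPos >= size)
            throw new IndexOutOfBoundsException("sPos = " + sPos + ", size = " + size);

        if (ePos == 0 || ePos > size)
            throw new IndexOutOfBoundsException("ePos = " + ePos + ", size = " + size);

        // A negative 'ePos' is first reset to the size of the Vector
        int end = (ePos < 0) ? size : ePos;

        if (sPos > end)
            throw new IndexOutOfBoundsException("sPos = " + sPos + " > ePos = " + end);

        return end;
    }

    public static void main(String[] args)
    {
        System.out.println(effectiveEnd(0, -1, 10));   // prints 10
        System.out.println(effectiveEnd(3, 7, 10));    // prints 7
    }
}
```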
        See Also:
        resolve(String, URL), TagNode.AV(String), TagNode.setAV(String, String, SD)
        Code:
        Exact Method Body:
         // Retrieve the Vector-location of any TagNode on the page that has
         // a "SRC=..." attribute.  These are almost always HTML <IMG> elements.
         // NOTE: FIND Method's are "READ ONLY" - the Cast will make no difference at run-time.
         //       The @SuppressWarnings is to overcome the cast of 'html'
         @SuppressWarnings("unchecked")
         int[] hasSrcPosArr = InnerTagFind.all((Vector<HTMLNode>) html, sPos, ePos, "src");
        
         // Java Stream's are convenient for keeping "Growing Lists" of return values.
         // This builder shall keep a list of all URL's that failed to update - for any reason
         // **UNLESS** the reason is that the URL was already a fully-resolved, non-partial URL,
         IntStream.Builder failedUpdate = askForReturnArraysOrReturnNull
             ? IntStream.builder() 
             : null;
        
         // This stream will keep a list of all URL's that were updated, and whose TagNode's
         // were replaced inside the input HTML Vector
         IntStream.Builder replaced = askForReturnArraysOrReturnNull
             ? IntStream.builder()
             : null;
        
         for (int pos : hasSrcPosArr)
         {
             // Get the node at the index
             TagNode tn = (TagNode) html.elementAt(pos);
        
             // 1) Retrieve the SRC Attribute
             // 2) if it is a partial-URL resolve it
             // 3) Convert to a String
             String  oldURL = tn.AV("src");
             URL     newURL = resolve(oldURL, sourcePage);
        
             // Some URL's cannot be resolved, if so, just skip this TagNode.
             // Log the index to the stream (if requested), and continue.
             if (newURL == null)
             { if (askForReturnArraysOrReturnNull) failedUpdate.accept(pos); continue; }
        
             // If the URL was already a fully-resolved-URL, continue - don't replace the TagNode;
             // No logging needed here, the URL was *already* resolved...
             if (oldURL.length() == newURL.toString().length()) continue;
        
             // Replace the SRC Attribute in the TagNode.  This builds a new instance of TagNode
             // If there is an exception, log the index to the stream (if requested), and continue.
             try
                 { tn = tn.setAV("src", newURL.toString(), quote); }
             catch (QuotesException qex)
                 { if (askForReturnArraysOrReturnNull) failedUpdate.accept(pos); continue; }
        
             // Replace the index in the Vector containing the old TagNode with the new one.
             html.setElementAt(tn , pos);
        
             // The Vector-Index at this position had its old TagNode removed and replaced with a
             // new updated one.  Log this to the stream-list so to allow the user to know.
             if (askForReturnArraysOrReturnNull) replaced.accept(pos);
         }
        
         return askForReturnArraysOrReturnNull
             ? new Ret3<int[], int[], int[]>
                 (hasSrcPosArr, replaced.build().toArray(), failedUpdate.build().toArray())
             : null;
        
      • resolveAllHREF

        public static Ret3<int[],​int[],​int[]> resolveAllHREF​
                    (java.util.Vector<? super TagNode> html,
                     int sPos,
                     int ePos,
                     java.net.URL sourcePage,
                     SD quote,
                     boolean askForReturnArraysOrReturnNull)
        
        This method shall resolve all partial URL addresses that are found within TagNode elements having 'HREF=...' attributes. Each instance of TagNode found in the input HTML Vector that has an 'HREF' attribute - if the 'URL' is only partially resolved - shall be updated and replaced with a new TagNode with a fully resolved URL.

        <BASE HREF=URL> NOTE: When calling methods in this class which accept a complete (or partial) HTML Vector (using a parameter such as Vector<HTMLNode>), the user must take care to check whether the page provided has a definition for HTML Element <BASE HREF=URL>.

        If the input page has such a definition, none of the methods in this class will actually heed it (at all), and therefore the user must manually invoke the method getBaseURL(Vector) in order to retrieve that URL, and then pass that result to input-parameter sourcePage.

        ALSO: Few modern HTML pages make use of this definition.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will throw.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        sourcePage - This is the source page URL against which the TagNode's (possibly-relative) URL's in the HTML Vector will be resolved.
        quote - A choice for the quotes to use. In most cases, URL attribute values do not contain quotation-marks. So likely either choice would work just fine, without exceptions.

        NOTE: null may be passed to this parameter, and if it is, the original quotation marks found in the TagNode's 'HREF' attribute will be reused. Passing null to this parameter should almost always be the easiest and safest choice.
        askForReturnArraysOrReturnNull - This (long-named) parameter is merely here to facilitate retrieving more information from this method - if necessary. When this parameter receives the following values:

        • TRUE: Three integer int[] arrays will be returned as listed in the Returns: section of this method's documentation.
        • FALSE: This method shall return null.
        Returns:
        If input parameter 'askForReturnArraysOrReturnNull' has been passed FALSE, this method shall return null. Otherwise, (if passed TRUE), then this method shall return an instance of 'Ret3<int[], int[], int[]>' - which is returning three separate integer-arrays about what was found, and what has occurred.

        Three arrays are returned as a result of this method's invocation. Keep in mind that though the information might be superfluous, discarding these arrays is easy. They are provided as a matter of convenience for cases where more detailed information is needed to ensure that long lists of HTMLNode's were properly updated.

        1. Ret3.a (int[])

          The first int[] array shall contain a list of the index of every TagNode in the input-Vector parameter's range that contained a non-null HTML 'HREF' Attribute.

        2. Ret3.b (int[])

          The second int[] array will contain an index-list of the indices which contained TagNode's that were replaced by the internal-resolve logic.

        3. Ret3.c (int[])

          The third int[] array will contain an index-list of the indices which contained TagNode's whose 'HREF=...' attribute failed to be resolved by the internal-resolve logic, or caused a QuotesException to throw.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        resolve(String, URL), TagNode.AV(String), TagNode.setAV(String, String, SD)
        Code:
        Exact Method Body:
         // Retrieve the Vector-location of any TagNode on the page that has
         // a "HREF=..." attribute.  These are almost always HTML <A> (anchor) elements.
         // NOTE: FIND Method's are "READ ONLY" - the Cast will make no difference at run-time.
         //       The @SuppressWarnings is to overcome the cast of 'html'
         @SuppressWarnings("unchecked")
         int[] hasHRefPosArr = InnerTagFind.all((Vector<HTMLNode>) html, sPos, ePos, "href");
        
         // Java Stream's are convenient for keeping "Growing Lists" of return values.
         // This builder shall keep a list of all URL's that failed to update - for any reason
         // **UNLESS** the reason is that the URL was already a fully-resolved, non-partial URL,
         IntStream.Builder failedUpdate = askForReturnArraysOrReturnNull
             ? IntStream.builder() 
             : null;
        
         // This stream will keep a list of all URL's that were updated, and whose TagNode's
         // were replaced inside the input HTML Vector
         IntStream.Builder replaced = askForReturnArraysOrReturnNull
             ? IntStream.builder()
             : null;
        
         for (int pos : hasHRefPosArr)
         {
             // Get the node at the index
             TagNode tn = (TagNode) html.elementAt(pos);
        
             // 1) Retrieve the HREF Attribute
             // 2) if it is a partial-URL resolve it
             // 3) Convert to a String
             String  oldURL = tn.AV("HREF");
             URL     newURL = resolve(oldURL, sourcePage);
        
             // Some URL's cannot be resolved, if so, just skip this TagNode.
             // Log the index to the stream (if requested), and continue.
             if (newURL == null)
             { if (askForReturnArraysOrReturnNull) failedUpdate.accept(pos); continue; }
        
             // If the URL was already a fully-resolved-URL, continue - don't replace the TagNode;
             // No logging needed here, the URL was *already* resolved...
             if (oldURL.length() == newURL.toString().length()) continue;
        
             // Replace the HREF Attribute in the TagNode.  This builds a new instance of TagNode
             // If there is an exception, log the index to the stream (if requested), and continue.
             try
                 { tn = tn.setAV("href", newURL.toString(), quote); }
             catch (QuotesException qex)
                 { if (askForReturnArraysOrReturnNull) failedUpdate.accept(pos); continue; }
        
             // Replace the index in the Vector containing the old TagNode with the new one.
             html.setElementAt(tn , pos);
        
         // The Vector-Index at this position had its old TagNode removed and replaced with a
         // new updated one.  Log this to the stream-list so that the user is informed.
             if (askForReturnArraysOrReturnNull) replaced.accept(pos);
         }
        
         return askForReturnArraysOrReturnNull
             ? new Ret3<int[], int[], int[]>
                 (hasHRefPosArr, replaced.build().toArray(), failedUpdate.build().toArray())
             : null;
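
The listing above records Vector positions in two IntStream.Builder instances and converts them to int[] arrays only at the end. The following is a minimal sketch of that bookkeeping pattern in isolation, using only the standard library. The class name IndexLogSketch and its "blank means skipped" rule are illustrative assumptions and are not part of the Links API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

class IndexLogSketch
{
    // Returns { updatedIndices, skippedIndices } for a list of strings.
    // Stand-in rule (an assumption): null or blank entries count as "skipped".
    static int[][] partition(List<String> items)
    {
        IntStream.Builder updated = IntStream.builder();
        IntStream.Builder skipped = IntStream.builder();

        for (int i = 0; i < items.size(); i++)
        {
            String s = items.get(i);

            if (s == null || s.trim().isEmpty())    skipped.accept(i);
            else                                    updated.accept(i);
        }

        // build() may be invoked once per builder; toArray() produces the final int[]
        return new int[][] { updated.build().toArray(), skipped.build().toArray() };
    }

    public static void main(String[] args)
    {
        int[][] r = partition(Arrays.asList("a.html", null, " ", "b.html"));

        System.out.println("updated: " + Arrays.toString(r[0]));
        System.out.println("skipped: " + Arrays.toString(r[1]));
    }
}
```

A builder is convenient here because the number of logged positions is not known until the loop has finished.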
        
      • resolveHREF

        public static java.net.URL resolveHREF​(TagNode tnWithHREF,
                                               java.net.URL sourcePage)
        This should be used for TagNode's that contain an 'HREF' inner-tag (attribute).
        Parameters:
        tnWithHREF - This may be any HTML Element that contains an 'HREF' attribute.

        NOTE: An HTML 'anchor' element (<A HREF=...>) will contain these. Often the URL's found here contain "relative" rather than "absolute" addresses.
        sourcePage - This is the source page URL from which the TagNode (possibly-relative) URL will be resolved.
        Returns:
        A complete-URL without any missing "presumed data" - such as host/domain or directory. Null is returned if attempting to build the URL generated a MalformedURLException.

        SPECIFICALLY: This method shall catch all MalformedURLException's.
        Throws:
        HREFException - If the TagNode passed to parameter 'tnWithHREF' does not actually contain an HREF attribute, then this exception shall throw.
        See Also:
        resolve(String, URL), TagNode.AV(String)
        Code:
        Exact Method Body:
         String href = tnWithHREF.AV("href");
        
         if (href == null) throw new HREFException(
             "The TagNode passed to parameter tnWithHREF does not actually contain an " +
             "HREF attribute."
         );
        
         return resolve(href, sourcePage);
        
      • resolveSRC

        public static java.net.URL resolveSRC​(TagNode tnWithSRC,
                                              java.net.URL sourcePage)
        This should be used for TagNode's that contain a 'SRC' inner-tag (attribute).
        Parameters:
        tnWithSRC - This may be any HTML Element that contains a 'SRC' attribute.

        NOTE: An HTML 'image' element (<IMG SRC=...>) will contain these. Often the URL's found here contain "relative" rather than "absolute" addresses.
        sourcePage - This is the source page URL from which the TagNode (possibly-relative) URL will be resolved.
        Returns:
        A complete-URL without any missing "presumed data" - such as host/domain or directory. Null is returned if attempting to build the URL generated a MalformedURLException.

        SPECIFICALLY: This method shall catch all MalformedURLException's.
        Throws:
        SRCException - If the TagNode passed to parameter 'tnWithSRC' does not actually contain a SRC attribute, then this exception shall throw.
        See Also:
        resolve(String, URL), TagNode.AV(String)
        Code:
        Exact Method Body:
         String src = tnWithSRC.AV("src");
        
         if (src == null) throw new SRCException(
             "The TagNode passed to parameter tnWithSRC does not actually contain a " +
             "SRC attribute."
         );
        
         return resolve(src, sourcePage);
        
      • resolveHREFs

        public static java.util.Vector<java.net.URL> resolveHREFs​
                    (java.lang.Iterable<TagNode> tnListWithHREF,
                     java.net.URL sourcePage)
        
        This should be used for lists of TagNode's, each of which contains an 'HREF' inner-tag (attribute).
        Parameters:
        tnListWithHREF - This may be any list of HTML Elements, each of which must be an instance of class TagNode and must have an 'HREF' attribute.
        sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the Iterable will be resolved.
        Returns:
        A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. Any TagNode which generated an exception will result in a null value in the Vector.

        SPECIFICALLY: If any element of tnListWithHREF does not contain an HREF inner-tag, that position in the returned Vector will also hold null. Null is returned, rather than an exception thrown, because when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.
        See Also:
        resolve(String, URL), TagNode.AV(String)
        Code:
        Exact Method Body:
         Vector<URL> ret = new Vector<>();
        
         for (TagNode tn : tnListWithHREF) ret.addElement(resolve(tn.AV("href"), sourcePage));
        
         return ret;
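
The null-on-failure convention described above can be sketched with the standard library alone. The example below is an assumption-laden illustration, not the library's code: it uses java.net.URI.resolve in place of Links.resolve, and the class name NullOnFailureSketch is hypothetical. The point it shows is that index i of the result always lines up with index i of the input, so a null marks exactly which link could not be resolved.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NullOnFailureSketch
{
    // Resolves every entry against 'base'.  Entries that are null, syntactically invalid,
    // or that use an unknown protocol are stored as null instead of aborting the sweep.
    static List<URL> resolveAll(List<String> hrefs, URI base)
    {
        List<URL> ret = new ArrayList<>();

        for (String href : hrefs)
        {
            if (href == null) { ret.add(null); continue; }

            try
                { ret.add(base.resolve(href.trim()).toURL()); }
            catch (MalformedURLException | IllegalArgumentException e)
                { ret.add(null); }
        }

        return ret;
    }

    public static void main(String[] args)
    {
        URI base = URI.create("https://example.com/news/index.html");

        List<URL> urls = resolveAll
            (Arrays.asList("story.html", "bogus://x/y", "/top.html", null), base);

        // Callers must check for null before using an entry
        for (int i = 0; i < urls.size(); i++)
            System.out.println(i + ": " + ((urls.get(i) == null) ? "unresolved" : urls.get(i)));
    }
}
```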
        
      • resolveSRCs

        public static java.util.Vector<java.net.URL> resolveSRCs​
                    (java.lang.Iterable<TagNode> tnListWithSRC,
                     java.net.URL sourcePage)
        
        This should be used for lists of TagNode's, each of which contains a 'SRC' inner-tag (attribute).
        Parameters:
        tnListWithSRC - This may be any list of HTML Elements, each of which must be an instance of class TagNode and must have a 'SRC' attribute.
        sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the Iterable will be resolved.
        Returns:
        A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. Any TagNode which generated an exception will result in a null value in the Vector.

        SPECIFICALLY: If any element of tnListWithSRC does not contain a SRC inner-tag, that position in the returned Vector will also hold null. Null is returned, rather than an exception thrown, because when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.
        See Also:
        resolve(String, URL), TagNode.AV(String)
        Code:
        Exact Method Body:
         Vector<URL> ret = new Vector<>();
        
         for (TagNode tn : tnListWithSRC) ret.addElement(resolve(tn.AV("src"), sourcePage));
        
         return ret;
        
      • resolveHREFs

        public static java.util.Vector<java.net.URL> resolveHREFs​
                    (java.util.Vector<? extends HTMLNode> html,
                     int[] nodePosArr,
                     java.net.URL sourcePage)
        
        This will use a "pointer array" - an array containing indexes into the downloaded page to retrieve TagNode's. The TagNode's to which this pointer-array points - must each contain an HREF inner-tag with a URL, or a partial URL.

        <BASE HREF=URL> NOTE: When calling methods in this class which accept a complete (or partial) HTML Vector (using a parameter such as Vector<HTMLNode>), the user must take care to check whether the page provided has a definition for HTML Element <BASE HREF=URL>.

        If the input page has such a definition, none of the methods in this class will actually heed it (at all), and therefore the user must manually invoke the method getBaseURL(Vector) in order to retrieve that URL, and then pass that result to input-parameter sourcePage.

        ALSO: Few modern HTML pages make use of this definition.
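
The rule in the note above amounts to choosing the right base before any link is resolved. The following is a minimal sketch of that choice using only java.net.URI. The class BaseHrefSketch and the method effectiveBase are hypothetical names, and the sketch assumes the caller has already extracted the <BASE HREF=...> value (or null when the page declares none).

```java
import java.net.URI;

class BaseHrefSketch
{
    // Chooses the URL against which relative links should be resolved.
    // 'baseHref' is the value of <BASE HREF=...>, or null when the page declares none.
    static URI effectiveBase(URI pageUrl, String baseHref)
    {
        if (baseHref == null || baseHref.trim().isEmpty()) return pageUrl;

        // A <BASE HREF> value may itself be relative, so resolve it against the page URL
        return pageUrl.resolve(baseHref.trim());
    }

    public static void main(String[] args)
    {
        URI page = URI.create("https://example.com/news/index.html");

        // With a declared base, relative links resolve against that base
        URI declared = effectiveBase(page, "https://cdn.example.com/assets/");
        System.out.println(declared.resolve("img/logo.png"));

        // Without one, they resolve against the page's own URL
        System.out.println(effectiveBase(page, null).resolve("img/logo.png"));
    }
}
```

With the methods in this class, the equivalent step is to pass the retrieved base URL as the sourcePage argument whenever the page declares one.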
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        nodePosArr - An array of pointers into the page or sub-page. The pointers must reference TagNode's that contain HREF attributes. Integer-pointer Arrays are usually returned from the package 'NodeSearch' "Find" methods.

        Example:
         // Retrieve 'pointers' to all the '<A HREF=...>' TagNode's.  The term 'pointer' refers to
         // integer-indices into the vectorized-html variable 'page'
         int[] anchorPosArr = TagNodeFind.all(page, TC.OpeningTags, "a");
         
         // Extract each HREF inner-tag, and construct a URL.  Use the 'sourcePage' parameter if
         // the URL is only partially-resolved
         Vector<URL> urls = Links.resolveHREFs(page, anchorPosArr, mySourcePage);
        

        which would obtain a pointer-array (a.k.a. a "vector-index-array") to every HTML "<A ...>" element that was available in the HTML page-Vector parameter 'html', and then resolve any shortened URL's.
        sourcePage - This is the source page URL from whence the (possibly relative) TagNode URL's in the Vector are to be resolved.
        Returns:
        A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. Any TagNode which generated a MalformedURLException will result in a null value in the Vector. However, if any of the positions in 'nodePosArr' does not hold an opening TagNode, then a TagNodeExpectedException or an OpeningTagNodeExpectedException shall throw.

        SPECIFICALLY: If any TagNode referenced by 'nodePosArr' does not contain an HREF inner-tag, that position in the returned Vector will also hold null. Null is returned, rather than an exception thrown, because when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'nodePosArr' are index-pointers that are out of range of Vector-parameter 'html', then java will, naturally, throw this exception.
        TagNodeExpectedException - This exception shall throw if a Vector-index in 'nodePosArr' must point to an instance of TagNode, but that index instead holds some other HTMLNode instance (either CommentNode or TextNode).
        OpeningTagNodeExpectedException - This exception shall throw when a Vector-index in 'nodePosArr' holds an instance of TagNode, but that TagNode has its boolean 'isClosing' field set to TRUE. Each location pointed to is expected to contain an "Opening HTML Element Tag".
        See Also:
        resolve(String, URL), TagNode.AV(String)
        Code:
        Exact Method Body:
         Vector<URL> ret = new Vector<>();                           // Return Vector
        
         for (int nodePos : nodePosArr)
         {
             HTMLNode n = html.elementAt(nodePos);
             if (! n.isTagNode())                                    // Must be an HTML TagNode
                 throw new TagNodeExpectedException(nodePos);
        
             TagNode tn = (TagNode) n;
             if (tn.isClosing)                                       // Must be an "Opening" HTML TagNode
                 throw new OpeningTagNodeExpectedException(nodePos);
        
             ret.addElement(resolve(tn.AV("href"), sourcePage));     // Resolve the 'HREF', save the URL
         }
        
         return ret;
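
The "pointer array" idea used above, an int[] of indices into a Vector with each referenced element validated before it is read, can be sketched without the library. The Node class below is a hypothetical stand-in and is NOT the library's HTMLNode or TagNode. IllegalArgumentException stands in for the two library-specific exceptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

class PointerArraySketch
{
    // Hypothetical stand-in for a parsed-page node
    static final class Node
    {
        final boolean isTag, isClosing;
        final String href;      // null when the node has no HREF attribute

        Node(boolean isTag, boolean isClosing, String href)
        { this.isTag = isTag;  this.isClosing = isClosing;  this.href = href; }
    }

    // Reads the HREF of each node addressed by 'posArr', rejecting text/comment nodes
    // and closing tags.  An out-of-range index surfaces from Vector.elementAt itself.
    static List<String> hrefsAt(Vector<Node> page, int[] posArr)
    {
        List<String> ret = new ArrayList<>();

        for (int pos : posArr)
        {
            Node n = page.elementAt(pos);

            if (! n.isTag)      throw new IllegalArgumentException("Not a tag at index " + pos);
            if (n.isClosing)    throw new IllegalArgumentException("Closing tag at index " + pos);

            ret.add(n.href);    // may be null when no HREF attribute is present
        }

        return ret;
    }

    public static void main(String[] args)
    {
        Vector<Node> page = new Vector<>();
        page.add(new Node(true, false, "a.html"));  // index 0: opening tag with HREF
        page.add(new Node(false, false, null));     // index 1: text node
        page.add(new Node(true, true, null));       // index 2: closing tag

        System.out.println(hrefsAt(page, new int[] { 0 }));
    }
}
```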
        
      • resolveSRCs

        public static java.util.Vector<java.net.URL> resolveSRCs​
                    (java.util.Vector<? extends HTMLNode> html,
                     int[] nodePosArr,
                     java.net.URL sourcePage)
        
        This will use a "pointer array" - an array containing indexes into the downloaded page to retrieve TagNode's. The TagNode's to which this pointer-array points - must each contain a SRC inner-tag with a URL, or a partial URL.

        <BASE HREF=URL> NOTE: When calling methods in this class which accept a complete (or partial) HTML Vector (using a parameter such as Vector<HTMLNode>), the user must take care to check whether the page provided has a definition for HTML Element <BASE HREF=URL>.

        If the input page has such a definition, none of the methods in this class will actually heed it (at all), and therefore the user must manually invoke the method getBaseURL(Vector) in order to retrieve that URL, and then pass that result to input-parameter sourcePage.

        ALSO: Few modern HTML pages make use of this definition.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        nodePosArr - An array of pointers into the page or sub-page. The pointers must reference TagNode's that contain SRC attributes. Integer-pointer Arrays are usually returned from the package 'NodeSearch' "Find" methods.

        Example:
         // Retrieve 'pointers' to all the '<IMG SRC=...>' TagNode's.  The term 'pointer' refers to
         // integer-indices into the vectorized-html variable 'page'
         int[] picturePosArr = TagNodeFind.all(page, TC.OpeningTags, "img");
         
         // Extract each SRC inner-tag, and construct a URL.  Use the 'sourcePage' parameter if
         // the URL is only partially-resolved
         Vector<URL> urls = Links.resolveSRCs(page, picturePosArr, mySourcePage);
        

        which would obtain a pointer-array (a.k.a. a "vector-index-array") to every HTML "<IMG ...>" element that was available in the HTML page-Vector parameter 'html', and then resolve any shortened image URL's.
        sourcePage - This is the source page URL from whence the (possibly relative) TagNode URL's in the Vector are to be resolved.
        Returns:
        A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. Any TagNode which generated a MalformedURLException will result in a null value in the Vector. However, if any of the positions in 'nodePosArr' does not hold an opening TagNode, then a TagNodeExpectedException or an OpeningTagNodeExpectedException shall throw.

        SPECIFICALLY: If any TagNode referenced by 'nodePosArr' does not contain a SRC inner-tag, that position in the returned Vector will also hold null. Null is returned, rather than an exception thrown, because when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'nodePosArr' are index-pointers that are out of range of Vector-parameter 'html', then java will, naturally, throw this exception.
        TagNodeExpectedException - This exception shall throw if a Vector-index in 'nodePosArr' must point to an instance of TagNode, but that index instead holds some other HTMLNode instance (either CommentNode or TextNode).
        OpeningTagNodeExpectedException - This exception shall throw when a Vector-index in 'nodePosArr' holds an instance of TagNode, but that TagNode has its boolean 'isClosing' field set to TRUE. Each location pointed to is expected to contain an "Opening HTML Element Tag".
        See Also:
        resolve(String, URL), TagNode.AV(String)
        Code:
        Exact Method Body:
         Vector<URL> ret = new Vector<>();                               // Return Vector
        
         for (int nodePos : nodePosArr)
         {
             HTMLNode n = html.elementAt(nodePos);
             if (! n.isTagNode())                                        // Must be an HTML TagNode
                 throw new TagNodeExpectedException(nodePos);
        
             TagNode tn = (TagNode) n;
             if (tn.isClosing)                                           // Must be an "Opening" HTML TagNode
                 throw new OpeningTagNodeExpectedException(nodePos);
        
             ret.addElement(resolve(tn.AV("src"), sourcePage));          // Resolve the "SRC", save the URL
         }
        
         return ret;
        
      • resolve

        public static java.util.Vector<java.net.URL> resolve​
                    (java.util.Vector<java.lang.String> src,
                     java.net.URL sourcePage)
        
        This will convert a list of simple java String's to a list/Vector of URL's, de-referencing any missing information using the 'sourcePage' parameter.
        Parameters:
        src - a list of strings - usually partially or totally completed Internet URL's
        sourcePage - This is the source page URL from which the String's (possibly-relative) URL's in the Vector will be resolved.
        Returns:
        A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. If there were any String's that were zero-length or null, then null is returned in the related Vector position. If any String causes a MalformedURLException, then that position in the Vector will be null.
        See Also:
        resolve(String, URL)
        Code:
        Exact Method Body:
         Vector<URL> ret = new Vector<>();
        
         for (String s : src) ret.addElement(resolve(s, sourcePage));
        
         return ret;
        
      • resolve

        public static java.net.URL resolve​(java.lang.String src,
                                           java.net.URL sourcePage)
        This will convert a simple java String to a URL, de-referencing any missing information using the 'sourcePage' parameter.
        Parameters:
        src - Any java String, usually one which was scraped from an HTML-Page, and needs to be "completed."
        sourcePage - This is the source page URL from which the String (possibly-relative) URL will be resolved.
        Returns:
        A URL, which has been completed/resolved with the 'sourcePage' parameter. If parameter 'src' is null or zero-length, then this method will also return null. If a MalformedURLException is generated, null will also be returned.
        Code:
        Exact Method Body:
         if (sourcePage == null) throw new NullPointerException(
             "Though you may provide null to the partial-URL to dereference parameter, null " +
             "may not be passed to the Source-Page Parameter.  The purpose of the 'resolve' " +
             "operation is to resolve partial-URLs against a source-page (root) URL. " +
             "Therefore this is not allowed."
         );
        
         if (src == null) return null;
        
         src = src.trim();
        
         if (src.length() == 0) return null;
        
         String srcLC = src.toLowerCase();
        
         if (StrCmpr.startsWithXOR(srcLC, _NON_URL_HREFS)) return null;
        
         if (srcLC.startsWith("http://") || srcLC.startsWith("https://"))
             try
                 { return new URL(src); }
             catch (MalformedURLException e)
                 { return null; }
        
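         // NOTE: charAt(3) is evaluated whenever 'src' begins with "//".  A value shorter than
         // four characters (such as "//a") will throw StringIndexOutOfBoundsException here.
         if (src.startsWith("//") && (src.charAt(3) != '/'))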
             try
                 { return new URL(sourcePage.getProtocol().toLowerCase() + ":" + src); }
             catch (MalformedURLException e)
                 { return null; }
                
         if (src.startsWith("/"))
             try
             { 
                 return new URL(
                     sourcePage.getProtocol().toLowerCase() + "://" +
                     sourcePage.getHost().toLowerCase() +
                     src
                 );
             }
             catch (MalformedURLException e)
                 { return null; }
         
         if (src.startsWith("../"))
         {
             String  sourcePageStr   = sourcePage.toString();
             short   nLevels         = 0;
        
             do      { nLevels++;  src = src.substring(3); }
             while   (src.startsWith("../"));
        
             String  directory = StringParse.dotDotParentDirectory(sourcePage.toString(), nLevels);
        
             try     { return new URL(directory + src); }
             catch   (Exception e) { return null; }
         }
        
         String  root    = sourcePage.getProtocol().toLowerCase() + "://" + 
                             sourcePage.getHost().toLowerCase();
         String  path    = sourcePage.getPath().trim();
         int     pos     = StringParse.findLastFrontSlashPos(path);
        
         if (pos == -1) throw new StringIndexOutOfBoundsException(
             "The URL you have provided: " + sourcePage.toString() + " does not have a '/' " +
             "front-slash character in it's path.  Cannot proceed resolving relative-URL's " +
             "without this."
         );
        
         path = path.substring(0, pos + 1);
        
         try     { return new URL(root + path + src); }
         catch   (MalformedURLException e) { return null; }
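
The method body above branches on the form of the link: already absolute, protocol-relative ("//host/..."), root-relative ("/..."), parent-relative ("../...") and plain relative. As a point of comparison only, the sketch below feeds one link of each form to the standard library's java.net.URI.resolve. It is not the algorithm used by this method, and results for edge cases may differ. The class name RelativeFormsSketch and the example.com addresses are illustrative assumptions.

```java
import java.net.URI;

class RelativeFormsSketch
{
    // Resolves 'ref' against 'base' using the standard library's RFC-style resolution
    static String resolve(String base, String ref)
    { return URI.create(base).resolve(ref).toString(); }

    public static void main(String[] args)
    {
        String base = "https://example.com/news/2020/index.html";

        // Already absolute: returned unchanged
        System.out.println(resolve(base, "http://other.org/z.html"));

        // Protocol-relative: scheme is taken from the base
        System.out.println(resolve(base, "//cdn.example.com/lib.js"));

        // Root-relative: scheme and host are taken from the base
        System.out.println(resolve(base, "/about.html"));

        // Parent-relative: one directory level up from the base's directory
        System.out.println(resolve(base, "../img/a.png"));

        // Plain relative: same directory as the base page
        System.out.println(resolve(base, "pic.jpg"));
    }
}
```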
        
      • resolveHREF_KE

        public static Ret2<java.net.URL,​java.net.MalformedURLException> resolveHREF_KE​
                    (TagNode tnWithHREF,
                     java.net.URL sourcePage)
        
        This should be used for TagNode's that contain an 'HREF' inner-tag (attribute).

        'KE' - Keep Exceptions: If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).

        Ret2<URL, MalformedURLException>: The URL result or an exception will be returned in the Java.Additional.Ret2<A, B> data-structure.
        Parameters:
        tnWithHREF - This may be any HTML Element that contains an 'HREF' attribute.

        NOTE: An HTML 'anchor' element (<A HREF=...>) will contain these. Often the URL's found here contain "relative" rather than "absolute" addresses.
        sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL will be resolved.
        Returns:
        A complete-URL without any missing "presumed data" - such as host/domain or directory. If the TagNode has no HREF attribute, an HREFException is thrown (null is not returned). If the TagNode causes a MalformedURLException, that is returned in Ret2.b

        SPECIFICALLY: This method shall catch all MalformedURLException's.

        • Ret2.a (URL)

          This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.

        • Ret2.b (MalformedURLException)

          If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
        Throws:
        HREFException - If the TagNode passed to parameter 'tnWithHREF' does not actually contain an HREF attribute, then this exception shall throw.
        See Also:
        resolve_KE(String, URL), TagNode.AV(String), Ret2
        Code:
        Exact Method Body:
         String href = tnWithHREF.AV("href");
        
         if (href == null) throw new HREFException(
             "The TagNode passed to parameter tnWithHREF does not actually contain an " +
             "HREF attribute."
         );
        
         return resolve_KE(href, sourcePage);
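
The "Keep Exceptions" idea can be illustrated with a small two-field holder that carries either the resolved URL or the exception that prevented it. This is a sketch under stated assumptions: the Result class is a hypothetical stand-in for Ret2<URL, MalformedURLException>, and java.net.URI.resolve stands in for the library's own resolution logic.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class KeepExceptionSketch
{
    // Hypothetical two-field result holder (a stand-in, not the library's Ret2)
    static final class Result
    {
        final URL url;                          // non-null on success
        final MalformedURLException error;      // non-null on failure

        Result(URL url, MalformedURLException error) { this.url = url;  this.error = error; }
    }

    // Returns the exception alongside the result instead of throwing it.
    // Syntax errors raised by URI.resolve (IllegalArgumentException) still propagate.
    static Result resolveKeep(String ref, URI base)
    {
        try
            { return new Result(base.resolve(ref).toURL(), null); }
        catch (MalformedURLException e)
            { return new Result(null, e); }
    }

    public static void main(String[] args)
    {
        URI base = URI.create("https://example.com/news/index.html");

        Result ok   = resolveKeep("a.html", base);
        Result bad  = resolveKeep("bogus://x/y", base);     // unknown protocol

        System.out.println(ok.url);
        System.out.println(bad.error.getMessage());
    }
}
```

Compared with returning null, the caller keeps the reason for each failure without writing a try-catch block at every call site.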
        
      • resolveSRC_KE

        public static Ret2<java.net.URL,​java.net.MalformedURLException> resolveSRC_KE​
                    (TagNode tnWithSRC,
                     java.net.URL sourcePage)
        
        This should be used for TagNode's that contain a 'SRC' inner-tag (attribute).

        'KE' - Keep Exceptions: If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).

        Ret2<URL, MalformedURLException>: The URL result or an exception will be returned in the Java.Additional.Ret2<A, B> data-structure.
        Parameters:
        tnWithSRC - This may be any HTML Element that contains a 'SRC' attribute.

        NOTE: An HTML 'image' element (<IMG SRC=...>) will contain these. Often the URL's found here contain "relative" rather than "absolute" addresses.
        sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL will be resolved.
        Returns:
        A complete-URL without any missing "presumed data" - such as host/domain or directory. If the TagNode has no SRC attribute, a SRCException is thrown (null is not returned). If the TagNode causes a MalformedURLException, that is returned in Ret2.b

        SPECIFICALLY: This method shall catch all MalformedURLException's.

        • Ret2.a (URL)

          This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.

        • Ret2.b (MalformedURLException)

          If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
        Throws:
        SRCException - If the TagNode passed to parameter 'tnWithSRC' does not actually contain a SRC attribute, then this exception shall throw.
        See Also:
        resolve_KE(String, URL), TagNode.AV(String), Ret2
        Code:
        Exact Method Body:
         String src = tnWithSRC.AV("src");
        
         if (src == null) throw new SRCException(
             "The TagNode passed to parameter tnWithSRC does not actually contain a " +
             "SRC attribute."
         );
        
         return resolve_KE(src, sourcePage);
        
      • resolveHREFs_KE

        public static java.util.Vector<Ret2<java.net.URL,​java.net.MalformedURLException>> resolveHREFs_KE​
                    (java.lang.Iterable<TagNode> tnListWithHREF,
                     java.net.URL sourcePage)
        
        This should be used for lists of TagNode's, each of which contain an 'HREF' inner-tag (attribute).

        'KE' - Keep Exceptions: If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).

        Ret2<URL, MalformedURLException>: The URL result or an exception will be returned in the Java.Additional.Ret2<A, B> data-structure.
        Parameters:
        tnListWithHREF - This may be any list of HTML Elements, each of which must be instances of class TagNode and all of which must have a 'HREF' attribute.
        sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the Iterable will be resolved.
        Returns:
        A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. If any TagNode has no HREF tag, then null is returned in the related Vector position. If any TagNode causes a MalformedURLException, then that position in the Vector will contain the exception in Ret2.b

        SPECIFICALLY: If any element of tnListWithHREF does not contain an HREF inner-tag, then the method will default, and also cause a null return value in the Vector. Null is returned, rather than an exception thrown, because when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.

        • Ret2.a (URL)

          This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.

        • Ret2.b (MalformedURLException)

          If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
        See Also:
        resolve_KE(String, URL), TagNode.AV(String), Ret2
        Code:
        Exact Method Body:
         Vector<Ret2<URL, MalformedURLException>> ret = new Vector<>();
        
         for (TagNode tn : tnListWithHREF) ret.addElement(resolve_KE(tn.AV("href"), sourcePage));
        
         return ret;
        
      • resolveSRCs_KE

        public static java.util.Vector<Ret2<java.net.URL,​java.net.MalformedURLException>> resolveSRCs_KE​
                    (java.lang.Iterable<TagNode> tnListWithSRC,
                     java.net.URL sourcePage)
        
        This should be used for lists of TagNode's, each of which contain a 'SRC' inner-tag (attribute).

        'KE' - Keep Exceptions: If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).

        Ret2<URL, MalformedURLException>: The URL result or an exception will be returned in the Java.Additional.Ret2<A, B> data-structure.
        Parameters:
        tnListWithSRC - This may be any list of HTML Elements, each of which must be instances of class TagNode and all of which must have a 'SRC' attribute.
        sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the Iterable will be resolved.
        Returns:
        A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. If any TagNode has no SRC tag, then null is returned in the related Vector position. If any TagNode causes a MalformedURLException, then that position in the Vector will contain the exception in Ret2.b

        SPECIFICALLY: If any element of tnListWithSRC does not contain a SRC inner-tag, then the method will default, and also cause a null return value in the Vector. Null is returned, rather than an exception thrown, because when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.

        • Ret2.a (URL)

          This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.

        • Ret2.b (MalformedURLException)

          If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
        See Also:
        resolve_KE(String, URL), TagNode.AV(String), Ret2
        Code:
        Exact Method Body:
         Vector<Ret2<URL, MalformedURLException>> ret = new Vector<>();
        
         for (TagNode tn : tnListWithSRC) ret.addElement(resolve_KE(tn.AV("src"), sourcePage));
        
         return ret;
        
      • resolveHREFs_KE

        public static java.util.Vector<Ret2<java.net.URL,​java.net.MalformedURLException>> resolveHREFs_KE​
                    (java.util.Vector<? extends HTMLNode> html,
                     int[] nodePosArr,
                     java.net.URL sourcePage)
        
        This will use a "pointer array" - an array containing indexes into the downloaded page to retrieve TagNode's. The TagNode's to which this pointer-array points - must each contain an HREF inner-tag with a URL, or a partial URL.

        <BASE HREF=URL> NOTE: When calling methods in this class which accept a complete (or partial) HTML Vector (using a parameter such as Vector<HTMLNode>), the user must take care to check whether the page provided has a definition for HTML Element <BASE HREF=URL>.

        If the input page has such a definition, none of the methods in this class will actually heed it (at all), and therefore the user must manually invoke the method getBaseURL(Vector) in order to retrieve that URL, and then pass that result to input-parameter sourcePage.

        ALSO: Few modern HTML pages make use of this definition.
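
        A minimal sketch of the manual step described in the note above, using only java.net.URI. The class BaseUrlSketch and the method effectiveBase are hypothetical names, and the 'baseHref' argument merely stands in for whatever value the page's <BASE HREF=URL> element supplies (getBaseURL(Vector) itself is not called here). The idea is simply to decide, before invoking a resolve method, which URL should be passed as 'sourcePage'.

```java
import java.net.URI;

class BaseUrlSketch
{
    // Hypothetical helper: choose the URL that relative links should be resolved against.
    // 'baseHref' stands in for the value of the page's <BASE HREF=...>, or null if absent.
    static String effectiveBase(String pageUrl, String baseHref)
    {
        URI page = URI.create(pageUrl);

        // No <BASE HREF> definition: the page's own URL is the base
        if (baseHref == null || baseHref.trim().isEmpty()) return page.toString();

        // A <BASE HREF> value may itself be relative, so resolve it against the page URL
        return page.resolve(baseHref.trim()).toString();
    }

    public static void main(String[] args)
    {
        String page = "http://example.com/news/today.html";

        System.out.println(effectiveBase(page, null));
        System.out.println(effectiveBase(page, "/static/"));
        System.out.println(effectiveBase(page, "http://cdn.example.com/assets/"));
    }
}
```

        Whatever this step yields is what would then be supplied as the 'sourcePage' argument.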

        'KE' - Keep Exceptions: If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).

        Ret2<URL, MalformedURLException>: The URL result or an exception will be returned in the Java.Additional.Ret2<A, B> data-structure.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. The variable-type wild-card '? extends HTMLNode' simply means that this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        nodePosArr - An array of pointers into the page or sub-page. The pointers must reference TagNode's that contain HREF attributes. Integer-pointer Arrays are usually returned by the "Find" methods of the package 'NodeSearch'.

        Example:
         // Retrieve 'pointers' to all the '<A HREF=...>' TagNode's.  The term 'pointer' refers to
         // integer-indices into the vectorized-html variable 'page'
         int[] anchorPosArr = TagNodeFind.all(page, TC.OpeningTags, "a");
         
         // Extract each HREF inner-tag, and construct a URL.  Use the 'sourcePage' parameter if
         // the URL is only partially-resolved.  If any URL's on the original-page are invalid, the
         // method shall not crash, but save the exception instead.
         Vector<Ret2<URL, MalformedURLException>> urlsWithEx = Links.resolveHREFs_KE(page, anchorPosArr, mySourcePage);
        
         // Print out any "failed" urls
         // NOTE: A Vector position holds null when the TagNode had no HREF inner-tag
         for (Ret2<URL, MalformedURLException> r : urlsWithEx)
             if ((r != null) && (r.b != null))
                 System.out.println("There was an exception: " + r.b.toString());
        

        The example above obtains a pointer-array (a.k.a. a "vector-index-array") to every HTML "<A ...>" element available in the HTML page-Vector parameter 'html', and then resolves any shortened URL's.
        sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the Vector will be resolved.
        Returns:
        A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. If any TagNode has no HREF inner-tag, then null is returned in the related Vector position. If any TagNode causes a MalformedURLException, then that position in the Vector will contain the exception in Ret2.b

        SPECIFICALLY: If any TagNode referenced by 'nodePosArr' does not contain an HREF inner-tag, the corresponding position in the returned Vector will hold null. The primary reason for returning 'null' rather than throwing an exception is that, when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.

        • Ret2.a (URL)

          This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.

        • Ret2.b (MalformedURLException)

          If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'nodePosArr' are index-pointers that are out of range of Vector-parameter 'html', then java will, naturally, throw this exception.
        TagNodeExpectedException - This exception shall throw if an index in 'nodePosArr' points to a Vector position that holds some other HTMLNode instance (either CommentNode or TextNode) rather than an instance of TagNode.
        OpeningTagNodeExpectedException - This exception shall throw if an index in 'nodePosArr' points to a TagNode whose boolean 'isClosing' field is set to TRUE. Every Vector position referenced by the array is expected to contain an "Opening HTML Element Tag".
        See Also:
        resolve_KE(String, URL), TagNode.AV(String), Ret2
        Code:
        Exact Method Body:
         Vector<Ret2<URL, MalformedURLException>> ret = new Vector<>();
                                                                     // Return Vector
        
         for (int nodePos : nodePosArr)
         {
             HTMLNode n = html.elementAt(nodePos);
             if (! n.isTagNode())                                    // Must be an HTML TagNode
                 throw new TagNodeExpectedException(nodePos);
        
             TagNode tn = (TagNode) n;
             if (tn.isClosing)                                       // Must be an "Opening" HTML TagNode
                 throw new OpeningTagNodeExpectedException(nodePos);
        
             ret.addElement(resolve_KE(tn.AV("href"), sourcePage));  // Resolve the "HREF", keep the URL
         }
        
         return ret;
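
        A hedged sketch of how a caller might walk a 'KE' result Vector, based on the Returns description above: each position is either null (no attribute to resolve), a result whose 'b' field holds a kept exception, or a result whose 'a' field holds the URL. The class Pair is a hypothetical stand-in for Ret2 (only the field names 'a' and 'b' are taken from this page), and tally and sample are illustrative names that are not part of the library.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class KeepExceptionsConsumerSketch
{
    // Hypothetical stand-in for Ret2<A, B>: two public fields named 'a' and 'b'
    static final class Pair<A, B>
    {
        final A a;
        final B b;
        Pair(A a, B b) { this.a = a; this.b = b; }
    }

    // Counts the three possible outcomes; returns { resolved, failed, skipped }
    static int[] tally(List<Pair<URL, MalformedURLException>> results)
    {
        int resolved = 0, failed = 0, skipped = 0;

        for (Pair<URL, MalformedURLException> r : results)
        {
            if (r == null)          skipped++;   // Attribute was missing or empty
            else if (r.b != null)   failed++;    // Exception was kept, not thrown
            else                    resolved++;  // r.a holds the resolved URL
        }

        return new int[] { resolved, failed, skipped };
    }

    // Builds a small hand-made result list with one entry of each kind
    static List<Pair<URL, MalformedURLException>> sample()
    {
        List<Pair<URL, MalformedURLException>> l = new ArrayList<>();

        try
        {
            URL ok = URI.create("http://example.com/index.html").toURL();
            l.add(new Pair<URL, MalformedURLException>(ok, null));
        }
        catch (MalformedURLException e) { throw new IllegalStateException(e); }

        l.add(new Pair<URL, MalformedURLException>(null, new MalformedURLException("not a link")));
        l.add(null);

        return l;
    }

    public static void main(String[] args)
    {
        int[] t = tally(sample());
        System.out.println("resolved=" + t[0] + ", failed=" + t[1] + ", skipped=" + t[2]);
    }
}
```

        Testing for a null position before reading 'b' matters here, since a TagNode without the expected inner-tag produces a null entry rather than an empty result.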
        
      • resolveSRCs_KE

        public static java.util.Vector<Ret2<java.net.URL, java.net.MalformedURLException>> resolveSRCs_KE
                    (java.util.Vector<? extends HTMLNode> html,
                     int[] nodePosArr,
                     java.net.URL sourcePage)
        
        This will use a "pointer array" (an array containing indexes into the downloaded page) to retrieve TagNode's. The TagNode's to which this pointer-array points must contain SRC inner-tags with URL's, or partial URL's.

        <BASE HREF=URL> NOTE: When calling methods in this class which accept a complete (or partial) HTML Vector (using a parameter such as Vector<HTMLNode>), the user must take care to check whether the page provided has a definition for HTML Element <BASE HREF=URL>.

        If the input page has such a definition, none of the methods in this class will actually heed it (at all), and therefore the user must manually invoke the method getBaseURL(Vector) in order to retrieve that URL, and then pass that result to input-parameter sourcePage.

        ALSO: Few modern HTML pages make use of this definition.

        'KE' - Keep Exceptions: If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).

        Ret2<URL, MalformedURLException>: The URL result or an exception will be returned in the Java.Additional.Ret2<A, B> data-structure.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. The variable-type wild-card '? extends HTMLNode' simply means that this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        nodePosArr - An array of pointers into the page or sub-page. The pointers must reference TagNode's that contain SRC attributes. Integer-pointer Arrays are usually returned by the "Find" methods of the package 'NodeSearch'.

        Example:
         // Retrieve 'pointers' to all the '<IMG SRC=...>' TagNode's.  The term 'pointer' refers to
         // integer-indices into the vectorized-html variable 'page'
         int[] picturePosArr = TagNodeFind.all(page, TC.OpeningTags, "img");
         
         // Extract each SRC inner-tag, and construct a URL.  Use the 'sourcePage' parameter if
         // the URL is only partially-resolved.  If any URL's on the original-page are invalid,
         // the method shall not crash, but save the exception instead.
         Vector<Ret2<URL, MalformedURLException>> urlsWithEx = Links.resolveSRCs_KE(page, picturePosArr, mySourcePage);
        
         // Print out any "failed" urls
         // NOTE: A Vector position holds null when the TagNode had no SRC inner-tag
         for (Ret2<URL, MalformedURLException> r : urlsWithEx)
             if ((r != null) && (r.b != null))
                 System.out.println("There was an exception: " + r.b.toString());
        

        The example above obtains a pointer-array (a.k.a. a "vector-index-array") to every HTML "<IMG ...>" element available in the HTML page-Vector parameter 'html', and then resolves any shortened URL's.
        sourcePage - This is the source page URL from which the TagNode's (possibly-relative) URL's in the Vector will be resolved.
        Returns:
        A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. If any TagNode has no SRC inner-tag, then null is returned in the related Vector position. If any TagNode causes a MalformedURLException, then that position in the Vector will contain the exception in Ret2.b

        SPECIFICALLY: If any TagNode referenced by 'nodePosArr' does not contain a SRC inner-tag, the corresponding position in the returned Vector will hold null. The primary reason for returning 'null' rather than throwing an exception is that, when large numbers of links from a web-page are being de-referenced, skipping over "broken URL's" makes for simpler coding.

        • Ret2.a (URL)

          This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.

        • Ret2.b (MalformedURLException)

          If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'nodePosArr' are index-pointers that are out of range of Vector-parameter 'html', then java will, naturally, throw this exception.
        TagNodeExpectedException - This exception shall throw if an index in 'nodePosArr' points to a Vector position that holds some other HTMLNode instance (either CommentNode or TextNode) rather than an instance of TagNode.
        OpeningTagNodeExpectedException - This exception shall throw if an index in 'nodePosArr' points to a TagNode whose boolean 'isClosing' field is set to TRUE. Every Vector position referenced by the array is expected to contain an "Opening HTML Element Tag".
        See Also:
        resolve_KE(String, URL), TagNode.AV(String), Ret2
        Code:
        Exact Method Body:
         Vector<Ret2<URL, MalformedURLException>> ret = new Vector<>();
                                                                     // Return Vector
        
         for (int nodePos : nodePosArr)
         {
             HTMLNode n = html.elementAt(nodePos);
             if (! n.isTagNode())                                    // Must be an HTML TagNode
                 throw new TagNodeExpectedException(nodePos);
        
             TagNode tn = (TagNode) n;
             if (tn.isClosing)                                       // Must be an "Opening" HTML TagNode
                 throw new OpeningTagNodeExpectedException(nodePos);
        
             ret.addElement(resolve_KE(tn.AV("src"), sourcePage));   // Resolve "SRC" and keep URL's
         }
        
         return ret;
        
      • resolve_KE

        public static java.util.Vector<Ret2<java.net.URL, java.net.MalformedURLException>> resolve_KE
                    (java.util.Vector<java.lang.String> src,
                     java.net.URL sourcePage)
        
        Resolve all URL's, represented as String's, inside of a Vector.

        'KE' - Keep Exceptions: If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).

        Ret2<URL, MalformedURLException>: The URL result or an exception will be returned in the Java.Additional.Ret2<A, B> data-structure.
        Parameters:
        src - a list of String's - usually partially or totally completed Internet URL's
        sourcePage - This is the source page URL from which the String's (possibly-relative) URL's in the Vector will be resolved.
        Returns:
        A list of URL's, each of which has been completed/resolved with the 'sourcePage' parameter. If any String is zero-length or null, then null is returned in the related Vector position. If any String causes a MalformedURLException, then that position in the Vector will contain the exception in Ret2.b

        • Ret2.a (URL)

          This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.

        • Ret2.b (MalformedURLException)

          If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
        See Also:
        resolve_KE(String, URL), Ret2
        Code:
        Exact Method Body:
         Vector<Ret2<URL, MalformedURLException>> ret = new Vector<>();
        
         for (String s : src) ret.addElement(resolve_KE(s, sourcePage));
        
         return ret;
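
        The "Keep Exceptions" idea behind this sweep can be sketched without the library. In the sketch below, the two-argument java.net.URL constructor stands in for the single-String resolve_KE method, so its resolution rules are those of the JDK and not necessarily those of this class (for instance, the JDK constructor would likely accept a "mailto:" value that resolve_KE reports as an exception). Pair, resolveAll and describe are hypothetical names used only for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

@SuppressWarnings("deprecation") // URL constructors are deprecated in recent JDK's, but still work
class KeepExceptionsBatchSketch
{
    // Hypothetical stand-in for Ret2<A, B>
    static final class Pair<A, B>
    {
        final A a;
        final B b;
        Pair(A a, B b) { this.a = a; this.b = b; }
    }

    // Resolve every String; a failure is stored in its own position instead of being thrown
    static List<Pair<URL, MalformedURLException>> resolveAll(List<String> specs, URL base)
    {
        List<Pair<URL, MalformedURLException>> ret = new ArrayList<>();

        for (String s : specs)
        {
            // Null or empty input: null in the related position
            if (s == null || s.trim().isEmpty()) { ret.add(null); continue; }

            try
                { ret.add(new Pair<URL, MalformedURLException>(new URL(base, s.trim()), null)); }
            catch (MalformedURLException e)
                { ret.add(new Pair<URL, MalformedURLException>(null, e)); }
        }

        return ret;
    }

    // Convenience wrapper: one short summary String per input position
    static List<String> describe(String baseUrl, List<String> specs)
    {
        URL base;
        try { base = new URL(baseUrl); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }

        List<String> out = new ArrayList<>();

        for (Pair<URL, MalformedURLException> r : resolveAll(specs, base))
            out.add((r == null) ? "SKIP" : (r.b != null) ? ("ERR " + r.b.getMessage()) : ("OK " + r.a));

        return out;
    }

    public static void main(String[] args)
    {
        List<String> specs = new ArrayList<>();
        specs.add("pic.png");   // relative, resolvable
        specs.add("");          // empty, skipped
        specs.add("foo:bar");   // protocol unknown to the JDK, kept as an exception

        for (String line : describe("http://example.com/a/page.html", specs))
            System.out.println(line);
    }
}
```

        Because each failure stays in its own position, the output list lines up index-for-index with the input list, which is what makes a page-wide sweep possible without a try-catch block around every URL.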
        
      • resolve_KE

        public static Ret2<java.net.URL, java.net.MalformedURLException> resolve_KE
                    (java.lang.String src,
                     java.net.URL sourcePage)
        
        This will convert a simple java String to a URL, de-referencing any missing information using the 'sourcePage' parameter.

        'KE' - Keep Exceptions: If this method generates a 'MalformedURLException' it will be returned along with the result (not thrown).

        Ret2<URL, MalformedURLException>: The URL result or an exception will be returned in the Java.Additional.Ret2<A, B> data-structure.
        Parameters:
        src - Any java String, usually one which was scraped from an HTML-Page, and needs to be "completed."
        sourcePage - This is the source page URL from which the String (possibly relative) URL will be resolved.
        Returns:
        A URL, which has been completed/resolved with the 'sourcePage' parameter. If parameter 'src' is null, or is zero-length after being trimmed, null will be returned in place of a Ret2 instance. If a MalformedURLException is thrown, that will be included with the Ret2<> result.

        • Ret2.a (URL)

          This shall contain the fully resolved URL - resolved using the parameter 'sourcePage' as the Base-URL.

        • Ret2.b (MalformedURLException)

          If there were any problems resolving the URL - such that an exception was thrown while producing the resolved-URL, the exception thrown will be caught and returned as a reference instead.
        See Also:
        Ret2
        Code:
        Exact Method Body:
         if (sourcePage == null) throw new NullPointerException(
             "Though you may provide null to the partial-URL to dereference parameter, null " +
             "may not be passed to the Source-Page Parameter.  The purpose of the 'resolve' " +
             "operation is to resolve partial-URLs against a source-page (root) URL. " +
             "Therefore this is not allowed."
         );
        
         if (src == null) return null;
        
         src = src.trim();
        
         if (src.length() == 0) return null;
        
         String srcLC = src.toLowerCase();
        
         if (StrCmpr.startsWithXOR
                 (srcLC, "tel:", "javascript:", "mailto:", "magnet:", "file:", "ftp:", "#"))
             return new Ret2<URL, MalformedURLException>
                 (null, new MalformedURLException(
                     "InnerTag/Attribute begins with: " + src.substring(0, 1 + src.indexOf(":")) +
                     ", so it is not a hyper-link."
                 ));
        
        
         // Includes the first few characters of the URL - for reporting/convenience. 
         // If this is an "image", the image-type & name will be included
         if (StrCmpr.startsWithXOR(srcLC, "data:", "blob:"))
             return new Ret2<URL, MalformedURLException>(null, new MalformedURLException(
                 "InnerTag/Attribute begins with: " +
                 ((src.length() > 25) ? src.substring(0, 25) : src) +
                 ", not a URL."
             ));
        
        
         if (srcLC.startsWith("http://") || srcLC.startsWith("https://"))
             try
                 { return new Ret2<URL, MalformedURLException>(new URL(src), null); }
             catch (MalformedURLException e)
                 { return new Ret2<URL, MalformedURLException>(null, e); }
        
        
         if (src.startsWith("//") && (src.charAt(3) != '/'))
             try
             { 
                 return new Ret2<URL, MalformedURLException>
                     (new URL(  sourcePage.getProtocol().toLowerCase() + ":" + src), null);
             }
             catch (MalformedURLException e)
                 { return new Ret2<URL, MalformedURLException>(null, e); }
        
        
         if (src.startsWith("/"))
             try
             {
                 return new Ret2<URL, MalformedURLException>(new URL(
                     sourcePage.getProtocol().toLowerCase() + "://" +
                     sourcePage.getHost().toLowerCase() +
                     src), null
                 );
             }
             catch (MalformedURLException e)
                 { return new Ret2<URL, MalformedURLException>(null, e); }
        
        
         if (src.startsWith("../"))
         {
             String  sourcePageStr   = sourcePage.toString();
             short   nLevels         = 0;
        
             do
                 { nLevels++;  src = src.substring(3); }
             while (src.startsWith("../"));
        
             String  directory = StringParse.dotDotParentDirectory(sourcePage.toString(), nLevels);
        
             try
                 { return new Ret2<URL, MalformedURLException>(new URL(directory + src), null); }
             catch (MalformedURLException e)
                 { return new Ret2<URL, MalformedURLException>(null, e); }
             catch (Exception e)
             { 
                 return new Ret2<URL, MalformedURLException>
                     (null,
                     new MalformedURLException(e.getClass().getCanonicalName() +
                     ":" + e.getMessage())
                     );
             }
         }
        
        
         String  root    = sourcePage.getProtocol().toLowerCase() + "://" + 
                             sourcePage.getHost().toLowerCase();
         String  path    = sourcePage.getPath().trim();
         int     pos     = StringParse.findLastFrontSlashPos(path);
        
         if (pos == -1) throw new StringIndexOutOfBoundsException(
             "The URL you have provided: " + sourcePage.toString() +
             " does not have a '/' front-slash character in it's path." +
             "Cannot proceed resolving relative-URL's without this."
         );
        
         path = path.substring(0, pos + 1);
        
         try
             { return new Ret2<URL, MalformedURLException>(new URL(root + path + src), null); }
         catch (MalformedURLException e)
             { return new Ret2<URL, MalformedURLException>(null, e); }
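
        The branches in the method body above correspond to four familiar shapes of relative reference: protocol-relative ("//host/..."), root-relative ("/..."), parent-relative ("../...") and plain relative. As a point of comparison only, the sketch below feeds one example of each shape to the standard java.net.URI.resolve method. This is not the library's implementation, and RelativeShapesSketch is a hypothetical class name. One visible difference: the method body above rebuilds the root from getProtocol() and getHost() only, so a non-default port in 'sourcePage' would apparently not carry over, whereas URI.resolve keeps it.

```java
import java.net.URI;

class RelativeShapesSketch
{
    // Resolve 'src' against 'sourcePage' using the JDK's standard URI resolution
    static String resolve(String sourcePage, String src)
    { return URI.create(sourcePage).resolve(src.trim()).toString(); }

    public static void main(String[] args)
    {
        String page = "http://example.com/news/2020/story.html";

        // Protocol-relative: the scheme is taken from the source page
        System.out.println(resolve(page, "//cdn.example.com/logo.png"));

        // Root-relative: scheme and host are taken from the source page
        System.out.println(resolve(page, "/img/logo.png"));

        // Parent-relative: one directory level above the source page's directory
        System.out.println(resolve(page, "../archive/old.html"));

        // Plain relative: same directory as the source page
        System.out.println(resolve(page, "photo.jpg"));

        // A non-default port in the source page is preserved by URI.resolve
        System.out.println(resolve("http://example.com:8080/a/b.html", "/c.png"));
    }
}
```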