Package Torello.HTML

Class Features


  • public class Features
    extends java.lang.Object
    Tools to retrieve and insert tags into the <HEAD> of a web-page.

    Replaceable Optimizations:
    When many updates are being performed, it is likely more efficient to use the Replaceable interface to make these page changes.

    The first thing to do is extract a SubSection instance (which is one of the classes that implements Replaceable). Retrieve a SubSection containing the HTML <HEAD> ... </HEAD> section.

    Once extracted, use this class 'Features' to operate on SubSection.html, performing all necessary Header-Tag modifications to the Web-Page's HEAD-Section.

    After all updates have been made, use the class ReplaceNodes to re-insert the previously extracted HEAD-Section back into the page, so that any/all node-shifts that have to occur only happen once! The example below demonstrates how this is done:

    Example:
    // Scrape any Page
    Vector<HTMLNode> page = HTMLPage.getPageTokens(new URL("http://some.url.com/page.html"), false);
    
    // IMPORTANT: By extracting a "SubSection", the next several lines which insert HTML into the
    //            header section **DO NOT** require shifting hundreds of HTML nodes forward to
    //            perform these inserts.  Only the nodes in the header are shifted forward during
    //            these insert operations.
    
    SubSection header = TagNodePeekInclusive.first(page, "HEAD");
    
    // Add Some HTML / CSS Header Elements
    Features.insertFavicon(header.html, "../../SiteLogo.png");
    Features.insertCSSLink(header.html, "../../MyCSSPage.css");
    Features.Meta.insertUTF8MetaTag(header.html);
    Features.Meta.insertKeyWords(header.html, "Java", "HTML", "Parse");
    
    // Re-insert the header back into the main page.  The node-shifting required for this
    // potentially very large Web-Page will only happen once!
    
    page = ReplaceNodes.r(page, header);
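
The cost argument behind this pattern can be illustrated without the library. The following is only a minimal sketch under stated assumptions: plain String elements stand in for HTMLNode instances, and the class SpliceSketch with its extract and splice helpers is hypothetical, not part of Torello.HTML. It shows inserts being applied to a small extracted copy, followed by a single re-insertion into the large Vector.

```java
import java.util.List;
import java.util.Vector;

// Hypothetical stand-in: plain Strings play the role of HTMLNode instances.
final class SpliceSketch
{
    // Copies the elements in [start, end) into a separate, small Vector.
    static Vector<String> extract(Vector<String> page, int start, int end)
    { return new Vector<>(page.subList(start, end)); }

    // Builds a new Vector in which the elements in [start, end) are replaced by 'section'.
    static Vector<String> splice(Vector<String> page, int start, int end, List<String> section)
    {
        Vector<String> result = new Vector<>(page.size() - (end - start) + section.size());
        result.addAll(page.subList(0, start));
        result.addAll(section);
        result.addAll(page.subList(end, page.size()));
        return result;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(
            List.of("<HTML>", "<HEAD>", "</HEAD>", "<BODY>", "</BODY>", "</HTML>"));

        // Work on the small header copy; inserts here shift only header elements.
        Vector<String> header = extract(page, 1, 3);
        header.add(1, "<LINK REL=stylesheet TYPE='text/css' HREF='a.css' />");
        header.add(1, "<LINK REL='icon' TYPE='image/png' HREF='logo.png' />");

        // One splice puts the edited header back into the page.
        page = splice(page, 1, 3, header);
        System.out.println(page);
    }
}
```

In this sketch the body elements are copied exactly once, inside splice, no matter how many tags were added to the header copy.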
    



    Stateless Class:
    This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.

    • 1 Constructor(s), 1 declared private, zero-argument constructor
    • 10 Method(s), 10 declared static
    • 7 Field(s), 7 declared static, 7 declared final


    • Field Detail

      • NO_HEADER_MESSAGE

        public static final java.lang.String NO_HEADER_MESSAGE
        Error Message that is used repeatedly.
        See Also:
        Constant Field Values
        Code:
        Exact Field Declaration Expression:
         public static final String NO_HEADER_MESSAGE =
                 "You are attempting to insert an HTML INSERT-STR, but such an element belongs in the " +
                 "page's header.  Unfortunately, the page or sub-page you have passed does not have a " +
                 "<HEAD>...</HEAD> sub-section.  Therefore, there is no place to insert the elements.";
        
      • favicon

        public static final java.lang.String favicon
        This String may be inserted in the HTML <HEAD> ... </HEAD> section to add a "logo-image" at the top-left corner of the Web-Browser's tab for the page when it loads. This logo is called a 'favicon'.
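
As a rough illustration of how the two placeholders in this field might be filled, the sketch below copies the template text from the field declaration and substitutes both values. The class FaviconTemplateSketch and its fill helper are hypothetical; the insertFavicon method body documented on this page concatenates the tag directly rather than calling anything like this.

```java
// Hypothetical helper; the template text is copied from the 'favicon' field declaration.
final class FaviconTemplateSketch
{
    static final String FAVICON =
        "<LINK REL='icon' TYPE='image/INSERT-IMAGE-TYPE-HERE' HREF='INSERT-URL-STRING-HERE' />";

    // Substitutes both placeholders; 'imageType' is an image sub-type such as "png".
    static String fill(String imageType, String url)
    {
        return FAVICON
            .replace("INSERT-IMAGE-TYPE-HERE", imageType)
            .replace("INSERT-URL-STRING-HERE", url);
    }

    public static void main(String[] args)
    {
        // prints: <LINK REL='icon' TYPE='image/png' HREF='../../SiteLogo.png' />
        System.out.println(fill("png", "../../SiteLogo.png"));
    }
}
```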
        See Also:
        insertFavicon(Vector, String), hasFavicon(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
         public static final String favicon =
                 "<LINK REL='icon' TYPE='image/INSERT-IMAGE-TYPE-HERE' HREF='INSERT-URL-STRING-HERE' />";
        
      • cssExternalSheet

        public static final java.lang.String cssExternalSheet
        This String may be inserted in the HTML <HEAD> ... </HEAD> section to add a Cascading Style Sheet (a '.css' file) to your page.

        The web-browser that ultimately loads the HTML that you are exporting will apply the style rules to all the HTML elements in your page that match their respective CSS-Selectors. In short, this field provides the String used to build / instantiate a new TagNode for an externally linked CSS-Page.
        See Also:
        insertCSSLink(Vector, String), getAllCSSLinks(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
         public static final String cssExternalSheet =
                 "<LINK REL=stylesheet TYPE='text/css' HREF='INSERT-URL-STRING-HERE' />";
        
      • javaScriptExternalPage

        public static final java.lang.String javaScriptExternalPage
        This String may be inserted in the HTML <HEAD> ... </HEAD> section to add an externally-linked Java-Script File ('.js' File) to your page.

        The Web-Browser will download this Java-Script page from the URL that you ultimately provide and (hopefully) load all your variable definitions and methods when the page loads.

        Closing </SCRIPT> Tag:
        Inserting an external Java-Script Page has one important difference vis-a-vis inserting an external CSS-Page. Inserting a link to a '.js' page requires both the opening <SCRIPT ..> and the closing </SCRIPT> Tags.

        This is expected and required even when (and especially when) there is no actual Java-Script code being placed on the '.html' page itself. Whether you are putting Java-Script code inside your HTML page, or just inserting a link to a '.js' File on your server, you must always create both the opening and the closing HTML <SCRIPT SRC='...'></SCRIPT> tags and insert them into your Vectorized-HTML Web-Page.

        In the brief example below, it should be clear that even though the SCRIPT-Tags do not enclose any Java-Script, both the open and the closed versions of the tag are placed into the HTML-File.

        HTML Elements:
         <!-- This is a short note about including the HTML SCRIPT element in your web-pages. -->
         <HTML>
         <HEAD>
         <!-- Version #1 Inserting a java-script 'variables & functions' external-page -->
         <SCRIPT TYPE='text/javascript' SRC='/script/javaScriptFiles/functions.js'>
         </SCRIPT>
         <!-- Right here (line above) we always need the closing Script-tag, even when there is no
              actual java-script present, and the methods/variables are going to be downloaded from
              the java-script file identified by the SRC="..." attribute! -->
        
         <SCRIPT TYPE='text/javascript'>
         var someVar1;
         var someVar2;
         
         function someFunction()
         { return;    }
         
         </SCRIPT> <!-- Either way, the closing-script tag is expected. -->
        
        See Also:
        insertExternalJavaScriptLink(Vector, String), getAllExternalJSLinks(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
         public static final String javaScriptExternalPage =
                 "<SCRIPT TYPE='text/javascript' SRC='INSERT-URL-STRING-HERE'>";
        
      • canonicalTag

        public static final java.lang.String canonicalTag
        If you have pages on your site that are almost identical, then you may need to inform search engines which one to prioritize. Or you might have syndicated content on your site which was republished elsewhere. You can do both of these things without incurring a duplicate content penalty – as long as you use a CANONICAL-Tag.

        Instead of confusing Google and missing your ranking on the SERP's, you are guiding the crawlers as to which URL counts as the “main” one. This places the emphasis on the right URL and prevents the others from cannibalizing your SEO.

        Use CANONICAL-Tags to avoid having problems with duplicate content that may affect your rankings.



        The content of this Documentation Page was copied from a page on the web-domain 'http://searchenginewatch.com'. It was lifted on May 24th, 2019.

        See link below, if still valid:
        https://searchenginewatch.com/2018/04/04/a-quick-and-easy-guide-to-meta-tags-in-seo/
        See Also:
        insertCanonicalURL(Vector, String), hasCanonicalURL(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
         public static final String canonicalTag = 
                 "<LINK REL=canonical HREF='INSERT-URL-STRING-HERE' />";
        
    • Method Detail

      • checkForSingleQuote

        protected static void checkForSingleQuote​(java.lang.String s)
        This method checks whether the String-Parameter 's' contains a Single-Quotations Punctuation-Mark anywhere inside that String. If so, a properly formatted exception is thrown. This is used as an internal Helper-Method.
        Parameters:
        s - This may be any Java String, but generally it is one used to insert into an HTML CONTENT-Attribute.
        Throws:
        QuotesException - If String-Parameter 's' contains any single-quotation marks.
        Code:
        Exact Method Body:
         int pos;
        
         if ((pos = s.indexOf("'")) != -1) throw new QuotesException(
             "The passed string-parameter may not contain a single-quote punctuation mark.  " +
             "Yours was: [" + s + "], and has a single-quotation mark at string-position " +
             "[" + pos + "]"
         );
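
The reason for this check can be read off the insert methods on this page: each one builds its tag with single-quoted attribute values (for instance HREF='...'), so a single quote inside the inserted String would end the attribute early. The sketch below reproduces the check in a self-contained form. It is an approximation under stated assumptions: IllegalArgumentException stands in for the library's QuotesException, and the isSafe helper is hypothetical.

```java
// Hypothetical stand-in: IllegalArgumentException replaces the library's QuotesException.
final class SingleQuoteCheckSketch
{
    // Throws if 's' contains a single-quotation mark, reporting its position.
    static void checkForSingleQuote(String s)
    {
        int pos = s.indexOf('\'');

        if (pos != -1) throw new IllegalArgumentException(
            "String [" + s + "] has a single-quotation mark at string-position [" + pos + "]"
        );
    }

    // Returns true when 's' can be placed inside a single-quoted attribute value.
    static boolean isSafe(String s)
    {
        try { checkForSingleQuote(s); return true; }
        catch (IllegalArgumentException e) { return false; }
    }

    public static void main(String[] args)
    {
        System.out.println(isSafe("../../MyCSSPage.css"));  // prints "true"
        System.out.println(isSafe("it's.css"));             // prints "false"
    }
}
```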
        
      • insertFavicon

        public static void insertFavicon​(java.util.Vector<HTMLNode> html,
                                         java.lang.String imageURLAsString)
        This inserts a favicon HTML link element into the right location so that a particular Web-Page will render a "browser icon image" into the top-left corner of the Web-Page's Browser-Tab.
        Parameters:
        html - Any Vectorized-HTML Web-Page, but it is important that this page contain an HTML <HEAD> ... </HEAD> section. If the passed Vectorized-HTML does not have a header, then this method will throw a NodeNotFoundException, because the favicon LINK-Tag must be inserted into a page's HEAD-Section.
        imageURLAsString - This is the String that will be copied into the String-Field favicon, and subsequently used to build a new TagNode instance, and inserted into the HTML Page's HTML HEAD-Section.
        Throws:
        NodeNotFoundException - Throws if there is no HTML HEAD-Section. Specifically, if parameter 'html' doesn't have a <HEAD> ... </HEAD> element where the insertion would have to be performed, then this exception will throw.
        QuotesException - If String-Parameter 'imageURLAsString' contains any single-quotation marks.
        See Also:
        favicon, checkForSingleQuote(String)
        Code:
        Exact Method Body:
         // Insert the Favicon <LINK ...> element into the <HEAD> section of the input html page.
         // <link rel='icon' type='image/INSERT-IMAGE-TYPE-HERE' href='INSERT-URL-STRING-HERE' />
        
         checkForSingleQuote(imageURLAsString);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "favicon <LINK> element"));
        
         String ext = IF.getGuess(imageURLAsString).extension;
        
         if (ext == null) throw new IllegalArgumentException(
             "The Image-Type of the 'imageURLAsString' parameter could not be determined.  " +
             "The method IF.getGuess(faviconURL) returned null.  Please provide a favicon with " +
             "standard image file-type.  This is required because the image-type is required " +
             "to be placed inside the HTML <LINK TYPE=... HREF=...> Element 'TYPE' Attribute."
         );
        
         // Build a new Favicon TagNode.
         TagNode faviconTN = new TagNode
             ("<LINK REL='icon' TYPE='image/" + ext + "' HREF='" + imageURLAsString + "' />");
        
         // Insert the Favicon into the page.  Put it at the top of the header, just after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, faviconTN, NEWLINE);
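
Because the insert position is always 'header.start + 1', each new element lands directly after the opening HEAD-Tag, ahead of anything inserted by an earlier call. The sketch below illustrates that ordering effect with plain Strings. It is not the library's code: the class HeadInsertSketch is hypothetical, String elements stand in for HTMLNode, and NoSuchElementException stands in for NodeNotFoundException.

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Vector;

// Hypothetical stand-in: Strings replace HTMLNode instances, and NoSuchElementException
// replaces the library's NodeNotFoundException.
final class HeadInsertSketch
{
    // Inserts: newline, tag, newline, directly after the first <HEAD> element.
    static void insertAfterHead(Vector<String> html, String tag)
    {
        int head = -1;

        for (int i = 0; i < html.size(); i++)
            if (html.get(i).equalsIgnoreCase("<HEAD>")) { head = i; break; }

        if (head == -1) throw new NoSuchElementException("no <HEAD> element found");

        // Same position as 'header.start + 1' in the method body above.
        html.addAll(head + 1, List.of("\n", tag, "\n"));
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(List.of("<HTML>", "<HEAD>", "</HEAD>", "</HTML>"));

        insertAfterHead(page, "<LINK REL='icon' TYPE='image/png' HREF='logo.png' />");
        insertAfterHead(page, "<LINK REL=stylesheet TYPE='text/css' HREF='a.css' />");

        // The stylesheet link, inserted second, now precedes the favicon link.
        page.forEach(System.out::print);
    }
}
```

If the relative order of header elements matters, for example with several style sheets, this suggests making the calls in reverse of the desired final order.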
        
      • hasFavicon

        public static java.lang.String hasFavicon​
                    (java.util.Vector<? extends HTMLNode> html)
        
        This method will search for an HTML <LINK REL="icon" ...> Tag, that is, a LINK-Tag whose REL-Attribute has the value 'icon'.

        When this method finds such a tag, it will return the value of that Tag's HREF-Attribute.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        Returns:
        This method will return the String-value of the HREF-Attribute found inside the LINK-Tag. If this page or sub-page does not have such a tag with an HREF-Attribute, then null is returned.

        NOTE: In the event that multiple copies of the HTML LINK-Tag are found, and more than one of these tags has a REL-Attribute with a value equal to "icon", then this method will simply return the HREF-Attribute value of the first of the 'favicon' tags that has one.

        An (albeit erroneous) page, with multiple favicon definitions, will not cause this method to throw an exception.
        See Also:
        InnerTagGet, favicon, TagNode.AV(String)
        Code:
        Exact Method Body:
         // InnerTagGet.all: Returns a vector of TagNode's that resemble: <LINK rel="icon" ...>
         //
         // EQ_CI_TRM: Check the 'rel' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison.
         //            Trim the 'rel' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
        
         Vector<TagNode> list = InnerTagGet.all
             (html, "LINK", "REL", TextComparitor.EQ_CI_TRM, "icon");
        
         // If there were no HTML "<LINK ...>" elements with REL='ICON' attributes, then
         // there was no favicon.
        
         if (list.size() == 0) return null;
        
         // Just in case there were multiple favicon <LINK ...> tags, just return the first
         // one found.  Inside of a <LINK REL="icon" HREF="..."> the 'HREF' Attribute contains
         // the Image-URL.  Use TagNode.AV("HREF") to retrieve that image url.
        
         String s;
         for (TagNode tn : list) if ((s = tn.AV("HREF")) != null) return s;
        
         // If for some reason, none of these <LINK REL='ICON' ...> elements had an "HREF" 
         // attribute, then just return null.
        
         return null;
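
The search rule described above (case-insensitive, trimmed comparison of the REL value, then the first available HREF) can be sketched with the standard library alone. This is only an approximation under stated assumptions: raw tag Strings stand in for TagNode instances, the class FaviconSearchSketch and its attr helper are hypothetical, and the regular expression is a simplification rather than a full HTML attribute parser.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in: raw tag Strings replace TagNode instances.
final class FaviconSearchSketch
{
    // Returns the value of attribute 'name' inside 'tag', or null if it is absent.
    static String attr(String tag, String name)
    {
        Matcher m = Pattern.compile(
            "\\s" + name + "\\s*=\\s*(?:'([^']*)'|\"([^\"]*)\"|([^\\s>]+))",
            Pattern.CASE_INSENSITIVE
        ).matcher(tag);

        if (!m.find()) return null;

        // Exactly one of the three alternatives (single-, double-, un-quoted) matched.
        for (int g = 1; g <= 3; g++) if (m.group(g) != null) return m.group(g);

        return null;
    }

    // Returns the HREF of the first <LINK REL=icon ...> tag that has one, else null.
    static String firstFaviconHref(List<String> tags)
    {
        for (String tag : tags)
        {
            if (!tag.regionMatches(true, 0, "<LINK", 0, 5)) continue;

            String rel = attr(tag, "REL");
            if (rel == null || !rel.trim().equalsIgnoreCase("icon")) continue;

            String href = attr(tag, "HREF");
            if (href != null) return href;
        }

        return null;
    }

    public static void main(String[] args)
    {
        List<String> tags = List.of(
            "<LINK REL=stylesheet TYPE='text/css' HREF='a.css' />",
            "<link rel=\"icon\" href=\"logo.png\">"
        );

        System.out.println(firstFaviconHref(tags));  // prints "logo.png"
    }
}
```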
        
      • insertCSSLink

        public static void insertCSSLink​
                    (java.util.Vector<HTMLNode> html,
                     java.lang.String externalCSSFileURLAsString)
        
        This inserts an HTML LINK-Tag into Web-Page parameter 'html' with the purpose of linking an externally-defined Cascading Style Sheet (also known as a CSS-Page) into that Page-Vector.
        Parameters:
        html - Any Vectorized-HTML Web-Page, but it is important that this page contain an HTML <HEAD> ... </HEAD> section. If the passed Vectorized-HTML does not have a header, then this method will throw a NodeNotFoundException, because the CSS LINK-Tag must be inserted into a page's HEAD-Section.
        externalCSSFileURLAsString - This is the String that will be copied into the String-Field cssExternalSheet, and subsequently used to build a new TagNode instance, and inserted into the HTML Page's HTML HEAD-Section.
        Throws:
        NodeNotFoundException - Throws if there is no HTML HEAD-Section. Specifically, if parameter 'html' doesn't have a <HEAD> ... </HEAD> element where the insertion would have to be performed, then this exception will throw.
        QuotesException - If String-Parameter 'externalCSSFileURLAsString' contains any single-quotation marks.
        See Also:
        cssExternalSheet, cssExternalSheetWithMediaAttribute, insertCSSLink(Vector, String, String), getAllCSSLinks(Vector), checkForSingleQuote(String), DotPair, TagNode
        Code:
        Exact Method Body:
         // Inserts an external CSS Link into the <HEAD> section of this html page vector
         // <link REL=stylesheet type='text/css' href='INSERT-URL-STRING-HERE' />
        
         checkForSingleQuote(externalCSSFileURLAsString);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException(
             NO_HEADER_MESSAGE.replace
                 ("INSERT-STR", "externally-linked CSS page <LINK> element")
         );
        
         TagNode cssTN = new TagNode
             ("<LINK REL=stylesheet TYPE='text/css' HREF='" + externalCSSFileURLAsString + "' />");
        
         // Insert the Style-Sheet link into the page.  Put it at the top of the header,
         // just after <HEAD>
        
         Util.insertNodes(html, header.start + 1, NEWLINE, cssTN, NEWLINE);
        
      • insertCSSLink

        public static void insertCSSLink​
                    (java.util.Vector<HTMLNode> html,
                     java.lang.String externalCSSFileURLAsString,
                     java.lang.String mediaInnerTagValue)
        
        This inserts a Cascading Style Sheet with the extra MEDIA-Attribute using an HTML LINK-Tag into the Vectorized-HTML Web-Page parameter 'html'.
        Parameters:
        html - Any Vectorized-HTML Web-Page, but it is important that this page contain an HTML <HEAD> ... </HEAD> section. If the passed Vectorized-HTML does not have a header, then this method will throw a NodeNotFoundException, because the CSS LINK-Tag must be inserted into a page's HEAD-Section.
        externalCSSFileURLAsString - This is the String that will be copied into the String-Field cssExternalSheet, and subsequently used to build a new TagNode instance, and inserted into the HTML Page's HTML HEAD-Section.
        mediaInnerTagValue - Externally linked CSS-Pages, which are included using the HTML LINK-Tag may explicitly request a MEDIA-Attribute be inserted into that Tag. That MEDIA-Attribute may take one of five values. In such a tag, the extra attribute specifies when the listed CSS-Rules are to be applied.

        Listed here are the most common values for the MEDIA-Attribute:
        Attribute Value    Intended CSS Meaning
        screen             Indicates for use on a computer screen
        projection         For projected presentations
        handheld           For handheld devices (typically with small screens)
        print              To style printed Web-Pages
        all                (default value) This is what most people choose. You can leave off the MEDIA-Attribute completely if you want your styles to be applied for all media types.
        Throws:
        NodeNotFoundException - Throws if there is no HTML HEAD-Section. Specifically, if parameter 'html' doesn't have a <HEAD> ... </HEAD> element where the insertion would have to be performed, then this exception will throw.
        QuotesException - If either of the String-Parameters 'externalCSSFileURLAsString' or 'mediaInnerTagValue' contains any single-quotation marks.
        See Also:
        cssExternalSheet, cssExternalSheetWithMediaAttribute, insertCSSLink(Vector, String), getAllCSSLinks(Vector), checkForSingleQuote(String), DotPair
        Code:
        Exact Method Body:
         // Inserts an external CSS Link (with 'media' attribute) into the <HEAD> section of
         // this html page vector 
         // <link REL=stylesheet type='text/css' href='INSERT-URL-STRING-HERE'
         //      media='INSERT-MEDIA-ATTRIBUTE-VALUE-HERE' />
        
         checkForSingleQuote(externalCSSFileURLAsString);
         checkForSingleQuote(mediaInnerTagValue);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "HEAD");
        
         if (header == null) throw new NodeNotFoundException(
             NO_HEADER_MESSAGE.replace
                 ("INSERT-STR", "externally-linked CSS Style-Sheet LINK-Tag")
         );
        
         // Build the TagNode
         TagNode cssTN   = new TagNode(
             "<LINK REL=stylesheet TYPE='text/css' HREF='" + externalCSSFileURLAsString + "' " +
             "MEDIA='" + mediaInnerTagValue + "' />"
         );
        
         // Insert the Style-Sheet link into the page.  Put it at the top of the header, just
         // after <HEAD>
        
         Util.insertNodes(html, header.start + 1, NEWLINE, cssTN, NEWLINE);
        
      • getAllCSSLinks

        public static java.util.Vector<TagNode> getAllCSSLinks
                    (java.util.Vector<? extends HTMLNode> html)
        
        This will retrieve all linked CSS-Pages from Vectorized-HTML Web-Page parameter 'html'.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        Returns:
        This will return the links as a Vector of TagNode's.
        See Also:
        insertCSSLink(Vector, String), insertCSSLink(Vector, String, String), InnerTagGet
        Code:
        Exact Method Body:
         // InnerTagGet.all: Returns a vector of TagNode's that resemble: 
         //                  <LINK rel="stylesheet" ...>
         //
         // EQ_CI_TRM: Check the 'rel' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison
         //            Trim the 'rel' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
        
         return InnerTagGet.all(html, "LINK", "REL", TextComparitor.EQ_CI_TRM, "stylesheet");
        
      • insertExternalJavaScriptLink

        public static void insertExternalJavaScriptLink​
                    (java.util.Vector<HTMLNode> html,
                     java.lang.String externalJSFileURLAsString)
        
        This inserts an HTML <SCRIPT SRC='...'> ... </SCRIPT> element pair into the proper location for linking an externally-defined Java-Script File (a '.js' File) into the Web-Page.
        Parameters:
        html - Any Vectorized-HTML Web-Page, but it is important that this page contain an HTML <HEAD> ... </HEAD> section. If the passed Vectorized-HTML does not have a header, then this method will throw a NodeNotFoundException, because the SCRIPT-Tags must be inserted into a page's HEAD-Section.
        externalJSFileURLAsString - This is the String that will be copied into the String-Field javaScriptExternalPage, and subsequently used to build a new TagNode instance, and inserted into the HTML Page's HTML HEAD-Section.
        Throws:
        NodeNotFoundException - Throws if there is no HTML HEAD-Section. Specifically, if parameter 'html' doesn't have a <HEAD> ... </HEAD> element where the insertion would have to be performed, then this exception will throw.
        QuotesException - If String-Parameter 'externalJSFileURLAsString' contains any single-quotation marks.
        See Also:
        javaScriptExternalPage, getAllExternalJSLinks(Vector), checkForSingleQuote(String), TagNode, TextNode, DotPair, HTMLTags.hasTag(String, TC)
        Code:
        Exact Method Body:
         // Builds an external Java-Script link, and inserts it into the header portion of
         // this html page.
         // <script type='text/javascript' src='INSERT-URL-STRING-HERE'>
        
         checkForSingleQuote(externalJSFileURLAsString);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "HEAD");
        
         if (header == null) throw new NodeNotFoundException(
             NO_HEADER_MESSAGE.replace(
                 "INSERT-STR", "externally-linked Java-Script <SCRIPT> ... </SCRIPT> elements")
         );
        
         // Build an HTML <SCRIPT ...> node, and a </SCRIPT> node.
         HTMLNode n = new TagNode
             ("<SCRIPT TYPE='text/javascript' SRC='" + externalJSFileURLAsString + "'>");
        
         HTMLNode closeN = HTMLTags.hasTag("script", TC.ClosingTags);
        
         // Insert the Java-Script link into the page.  Put it at the top of the header, just
         // after <HEAD>
        
         Util.insertNodes(html, header.start + 1, NEWLINE, n, closeN, NEWLINE);
        
      • getAllExternalJSLinks

        public static java.lang.String[] getAllExternalJSLinks​
                    (java.util.Vector<? extends HTMLNode> html)
        
        Inserting Java-Script directly onto an HTML-Page and including an external link to a '.js' File are extremely similar tasks. Either way, in both cases the construct is simply:

        <SCRIPT TYPE='text/javascript'> ... </SCRIPT>

        When the actual functions and methods are pasted into an HTML-Page directly, they are pasted into the String above where the ellipses '...' are. When a link is made to an external page from a directory on the same Web-Server - both the open and the close HTML SCRIPT-Tag's must be included.

        If just a link is being added, then the text-content of the SCRIPT-Tag should just be left blank or empty. Instead, the URL to the Java-Script Page is added as an HTML SRC-Attribute.

        This method will retrieve any and all 'SCRIPT' nodes that meet the following criteria:

        1. The Script Body must be empty, meaning there is no Java-Script between the opening and closing SCRIPT-Tags
        2. The HTML SRC-Attribute must contain a non-null, non-zero-length value
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        Returns:
        This will return the SRC-Attribute values (URL's) of externally linked Java-Script Pages as a String-Array.
        See Also:
        InnerTagGetInclusive, javaScriptExternalPage, insertExternalJavaScriptLink(Vector, String), TagNode, TextNode, TagNode.AV(String), HTMLNode.str
        Code:
        Exact Method Body:
         // InnerTagGetInclusive.all: Returns a vector of TagNode's that resemble:
         //                              <SCRIPT TYPE="javascript" ...>
         //
         // CN_CI: Check the 'TYPE' Attribute-Value using a Case-Insensitive, "Contains"
         //        String-Comparison
         //        'contains' rather than 'equals' testing is done because this value may be
         //        "javascript", but it may also be "text/javascript"
         //
         // Inclusive: This means that everything between the <SCRIPT type="javascript"> ... and
         //            the closing </SCRIPT> tag are returned in a vector of vectors.
        
         Vector<Vector<HTMLNode>> v = InnerTagGetInclusive.all
             (html, "SCRIPT", "TYPE", TextComparitor.CN_CI, "javascript");
        
         Stream.Builder<String> b = Stream.builder();
        
         TOP:
         for (Vector<HTMLNode> scriptSection : v)
         {
             String srcValue = null;
        
             for (HTMLNode n : scriptSection)
             {
                 if (n.isTagNode())
                     if ((srcValue = ((TagNode) n).AV("SRC")) != null)
                         break;
        
                 if (n.isTextNode())
                     if (n.str.trim().length() > 0)
                         break TOP;
             }
        
             b.add(srcValue);
         }
        
         return b.build().toArray(String[]::new);
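
The two criteria listed in the description (an empty script body and a non-empty SRC value) can be sketched against a raw HTML String using only the standard library. This is a simplified illustration, not the library's algorithm: the class ExternalScriptSketch is hypothetical, it works on text rather than on a Vector of HTMLNode, it does not filter on the TYPE-Attribute, and it only recognizes quoted SRC values. A SCRIPT element that fails either criterion is skipped and scanning continues with the next one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in that scans raw HTML text instead of a Vector of HTMLNode.
final class ExternalScriptSketch
{
    // Group 1: the attributes of the opening tag.  Group 2: the script body.
    private static final Pattern SCRIPT = Pattern.compile(
        "<script\\b([^>]*)>(.*?)</script\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches a single-quoted or double-quoted SRC attribute value.
    private static final Pattern SRC = Pattern.compile(
        "\\ssrc\\s*=\\s*(?:'([^']*)'|\"([^\"]*)\")", Pattern.CASE_INSENSITIVE);

    static List<String> externalScriptSources(String html)
    {
        List<String> result = new ArrayList<>();
        Matcher script = SCRIPT.matcher(html);

        while (script.find())
        {
            // Criterion 1: nothing but white-space between the opening and closing tags.
            if (!script.group(2).trim().isEmpty()) continue;

            // Criterion 2: a SRC attribute with a non-empty value.
            Matcher src = SRC.matcher(script.group(1));
            if (!src.find()) continue;

            String value = (src.group(1) != null) ? src.group(1) : src.group(2);
            if (!value.isEmpty()) result.add(value);
        }

        return result;
    }

    public static void main(String[] args)
    {
        String html =
            "<SCRIPT TYPE='text/javascript' SRC='/script/functions.js'>\n</SCRIPT>\n" +
            "<SCRIPT TYPE='text/javascript'>var someVar1;</SCRIPT>";

        System.out.println(externalScriptSources(html));  // prints "[/script/functions.js]"
    }
}
```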
        
      • insertCanonicalURL

        public static void insertCanonicalURL​(java.util.Vector<HTMLNode> html,
                                              java.lang.String canonicalURLAsStr)
        This method will insert a Canonical-URL into Vectorized-HTML parameter 'html'. The URL itself will be inserted into an HTML LINK-Tag as below:

        <LINK REL=canonical HREF='the_url'>

        Since HTML mandates that such elements be located in the 'HEAD' portion of an HTML-Page, if the Vectorized-HTML parameter 'html' does not have a 'HEAD' area, then this method shall throw a NodeNotFoundException.

        Note that this exception is an unchecked / runtime exception.
        Parameters:
        html - Any Vectorized-HTML Web-Page, but it is important that this page contain an HTML <HEAD> ... </HEAD> section. If the passed Vectorized-HTML does not have a header, then this method will throw a NodeNotFoundException, because the canonical LINK-Tag must be inserted into a page's HEAD-Section.
        canonicalURLAsStr - This is the String that will be copied into the String-Field canonicalTag, and subsequently used to build a new TagNode instance, and inserted into the HTML Page's HTML HEAD-Section.
        Throws:
        NodeNotFoundException - Throws if there is no HTML HEAD-Section. Specifically, if parameter 'html' doesn't have a <HEAD> ... </HEAD> element where the insertion would have to be performed, then this exception will throw.
        QuotesException - If String-Parameter 'canonicalURLAsStr' contains any single-quotation marks.
        See Also:
        canonicalTag, hasCanonicalURL(Vector), checkForSingleQuote(String), TagNode, DotPair
        Code:
        Exact Method Body:
         // Inserts a link element into the header of this page
         // <link REL=canonical href='INSERT-URL-STRING-HERE' />
        
         checkForSingleQuote(canonicalURLAsStr);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "HEAD");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "Canonical-url LINK-Tag"));
        
         // Builds the canonical <LINK ...> element
         TagNode linkTN  = new TagNode
             ("<LINK REL=canonical HREF='" + canonicalURLAsStr + "' />");
        
         // Insert the canonical-url into the page.  Put it at the top of the header, just
         // after <HEAD>
        
         Util.insertNodes(html, header.start + 1, NEWLINE, linkTN, NEWLINE);
        
      • hasCanonicalURL

        public static java.lang.String hasCanonicalURL​
                    (java.util.Vector<? extends HTMLNode> html)
                throws MalformedHTMLException
        
        This method will check whether a Vectorized-HTML Page has an HTML <LINK REL=canonical ...> Tag. This tag is used to inform Search-Engines whether or not this page defers to a "Canonical-URL".

        Canonical-Pages help Search-Engines index large web-sites by providing a root or Master-URL to which all sub-pages may point. Such URL's are often (but not always) like a "Table of Contents".

        The primary goal of having a canonical is to avoid forcing Search-Engines (and their users) to sift through and index every page of a large Web-Site, and to let them focus instead on either an introductory T.O.C. or a Title-Page.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter without causing an exception throw.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        Returns:
        This will return whatever text was placed inside the canonical-url HREF='some_url' attribute/value pair of the HTML link tag. If there is no HTML <LINK REL=canonical HREF='some_url'> tag, then this method will return null.
        Throws:
        MalformedHTMLException - This exception will be thrown if there are multiple html tags that match the LINK and REL=canonical search criteria. If an HTML element <link REL=canonical> is found, but that element does not have an href='...' attribute, or that attribute is of zero length, then this is a situation that will also force this exception to throw.
        See Also:
        InnerTagGet, canonicalTag, insertCanonicalURL(Vector, String), TagNode.AV(String)
        Code:
        Exact Method Body:
         // InnerTagGet.all: Returns a vector of TagNode's that resemble:
         //                  <LINK rel="canonical" ...>
         //
         // EQ_CI_TRM: Check the 'rel' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison
         //            Trim the 'rel' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
        
         Vector<TagNode> v = InnerTagGet.all
             (html, "LINK", "REL", TextComparitor.EQ_CI_TRM, "canonical");
        
         if (v.size() == 0) return null;
        
         if (v.size() > 1) throw new MalformedHTMLException(
             "The Web-Page you have passed has precisely " + v.size() +
             " Canonical-URL LINK-Tags, but it may not have more than 1.  This is " +
             "invalid HTML."
         );
        
         String s = v.elementAt(0).AV("href");
        
         if (s == null) throw new MalformedHTMLException(
             "The HTML LINK-Tag that was retrieved, contained a " +
             "REL=canonical Attribute-Value pair, but did not have an HREF-Attribute." +
             "This is invalid HTML."
         );
        
         if (s.length() == 0) throw new MalformedHTMLException(
             "The HTML LINK-Tag that was retrieved contained a zero-length " +
             "String as the Attribute-Value for the HREF-Attribute.   This is not " +
             "invalid, but poorly formatted HTML."
         );
        
         return s;
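
The method therefore distinguishes three outcomes: no canonical tag (null is returned), exactly one well-formed tag (its HREF value is returned), and anything else (an exception). The sketch below restates that decision logic in a self-contained form so the cases can be seen side by side. It rests on assumptions: the class CanonicalSketch is hypothetical, a List of HREF values (with null meaning "tag present but no HREF") stands in for the Vector of TagNode's, and IllegalStateException stands in for MalformedHTMLException.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: IllegalStateException replaces MalformedHTMLException.
final class CanonicalSketch
{
    // 'hrefs' holds one entry per canonical LINK-Tag found; a null entry means "no HREF".
    static String canonical(List<String> hrefs)
    {
        if (hrefs.isEmpty()) return null;

        if (hrefs.size() > 1) throw new IllegalStateException(
            "found " + hrefs.size() + " canonical LINK-Tags, but at most 1 is allowed");

        String s = hrefs.get(0);

        if (s == null) throw new IllegalStateException("canonical LINK-Tag has no HREF");
        if (s.isEmpty()) throw new IllegalStateException("canonical HREF is zero-length");

        return s;
    }

    public static void main(String[] args)
    {
        System.out.println(canonical(List.of()));                        // prints "null"
        System.out.println(canonical(List.of("https://example.com/")));  // prints the URL

        try { canonical(Arrays.asList((String) null)); }
        catch (IllegalStateException e) { System.out.println(e.getMessage()); }
    }
}
```

A caller can thus treat null as "this page declares no canonical URL" and reserve exception handling for pages whose markup is inconsistent.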