Package Torello.HTML

Class Features


  • public class Features
    extends java.lang.Object
    Tools to retrieve and insert tags into the <HEAD> of a web-page.

    Simple Tools for Retrieving and Updating Page-Header Pieces

    Note that when many updates are to be performed, it is more efficient to extract a SubSection, update that SubSection, and then re-insert it into the page afterwards.

    Example:
    // Scrape any Page
    Vector<HTMLNode> page = HTMLPage.getPageTokens(new URL("http://some.url.com/page.html"), false);
    
    // IMPORTANT: By extracting a "SubSection", the next several lines which insert HTML into the
    //            header section **DO NOT** require shifting hundreds of HTML nodes forward to
    //            perform these inserts.  Only the nodes in the header are shifted forward during
    //            these insert operations.
    
    SubSection header = TagNodePeekInclusive.first(page, "head");
    
    // Add Some HTML / CSS Header Elements
    Features.insertFavicon(header.html, "../../SiteLogo.png");
    Features.insertCSSLink(header.html, "../../MyCSSPage.css");
    Features.Meta.insertUTF8MetaTag(header.html);
    Features.Meta.insertKeyWords(header.html, "Java", "HTML", "Parse");
    
    // Re-Insert the header back into the main page.  Shifting all of the nodes on the main-page
    // is now done only once!
    
    ReplaceNodes.r(page, header);
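
The cost argument in the example above can be illustrated with plain JDK collections. This is only a sketch under stated assumptions: it uses a Vector<String> of tokens in place of the library's Vector<HTMLNode>, and the class and method names (HeaderSpliceSketch, extract, replace) are hypothetical stand-ins, not the library's SubSection or ReplaceNodes API.

```java
import java.util.List;
import java.util.Vector;

// HYPOTHETICAL stand-in: plain Strings are used instead of HTMLNode instances.
class HeaderSpliceSketch
{
    // Copies the elements in [start, end] (inclusive) out of 'page' into a new, small Vector.
    static Vector<String> extract(Vector<String> page, int start, int end)
    { return new Vector<>(page.subList(start, end + 1)); }

    // Replaces the elements in [start, end] (inclusive) of 'page' with 'section'.
    // The tail of the large Vector is shifted once for the removal and once for the
    // bulk insert, no matter how many elements were added to 'section' beforehand.
    static void replace(Vector<String> page, int start, int end, List<String> section)
    {
        page.subList(start, end + 1).clear();
        page.addAll(start, section);
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(List.of(
            "<html>", "<head>", "</head>", "<body>", "...", "</body>", "</html>"
        ));

        // Work on the small header copy; each insert only shifts header elements.
        Vector<String> header = extract(page, 1, 2);
        header.add(1, "<meta charset='utf-8' />");
        header.add(1, "<link rel='stylesheet' type='text/css' href='MyCSSPage.css' />");

        // Splice the updated header back into the page in one step.
        replace(page, 1, 2, header);
        System.out.println(page);
    }
}
```

Each insert into the small copy moves only the handful of header elements behind it, while the long tail of the page is moved just during the final splice.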
    

    This class handles some of the extremely common features found in HTML web-pages. The collection of capabilities listed here is sometimes referred to as "SEO" or "Search Engine Optimization." Generally, the features do not actually work as well as Search Engine Companies would like you to believe. Sure, there are SEO tags for companies that cater to very specialized, and very "niche markets." If you have a website that sells cup-cakes in Dallas, and you specialize in cup-cakes, your SEO settings will probably work all-right - probably!

    If you have decided to write a Java-Based HTML Search Engine, and would like your Java Libraries to be ranked at the top of search-engine results any time a user types the words "Java and HTML" into a browser, there is not a lot SEO will be able to do for you - not even using the features in this "Features" class!


Stateless Class: This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is very similar to the Java-Bean @Stateless Annotation.
  • 1 Constructor(s), 1 declared private, zero-argument constructor
  • 10 Method(s), 10 declared static
  • 7 Field(s), 7 declared static, 7 declared final


    • Field Detail

      • NO_HEADER_MESSAGE

        public static final java.lang.String NO_HEADER_MESSAGE
        See Also:
        Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String NO_HEADER_MESSAGE =
                "You are attempting to insert an HTML INSERT-STR, but such an element belongs in the " +
                "page's header.  Unfortunately, the page or sub-page you have passed does not have a " +
                "<HEAD>...</HEAD> sub-section.  Therefore, there is no place to insert the elements.";
        
      • favicon

        public static final java.lang.String favicon
        This String may be inserted in the HTML <HEAD> ... </HEAD> section to add a "logo-image" at the top-left corner of the web-browser's tab for the page when it loads. This image is commonly called a 'favicon', and that is the name used throughout this class.
        See Also:
        insertFavicon(Vector, String), hasFavicon(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String favicon =
                "<link rel='icon' type='image/INSERT-IMAGE-TYPE-HERE' href='INSERT-URL-STRING-HERE' />";
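
As a rough illustration of how the two placeholders in this template get filled in, here is a stand-alone sketch. It copies the template text from the field above; the guessExtension helper is hypothetical and much cruder than whatever the library's IF.getGuess actually does, since it only looks at the text after the last '.' in the file-name.

```java
import java.util.Locale;

class FaviconTemplateSketch
{
    // Same template text as the 'favicon' field shown above.
    static final String FAVICON =
        "<link rel='icon' type='image/INSERT-IMAGE-TYPE-HERE' href='INSERT-URL-STRING-HERE' />";

    // HYPOTHETICAL helper: returns the lower-cased text after the last '.' of the
    // file-name portion of the URL, or null when no extension is present.
    static String guessExtension(String url)
    {
        String file = url.substring(url.lastIndexOf('/') + 1);
        int    dot  = file.lastIndexOf('.');

        if ((dot == -1) || (dot == file.length() - 1)) return null;
        return file.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Substitutes the URL and the guessed image-type into the template.
    static String buildFaviconTag(String url)
    {
        String ext = guessExtension(url);

        if (ext == null) throw new IllegalArgumentException
            ("Could not determine an image type for: " + url);

        return FAVICON
            .replace("INSERT-URL-STRING-HERE", url)
            .replace("INSERT-IMAGE-TYPE-HERE", ext);
    }
}
```

Calling buildFaviconTag("../../SiteLogo.png") yields a link element whose type attribute is 'image/png' and whose href is the URL that was passed.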
        
      • cssExternalSheet

        public static final java.lang.String cssExternalSheet
        This String may be inserted in the HTML <HEAD> ... </HEAD> section to add a Cascading Style Sheet (a ".css" file) to your page. The web-browser that ultimately loads the HTML that you are exporting will render the style elements across all the HTML elements in your page that match the CSS selectors. Rather than explaining how CSS works here, this field simply provides the String that is ultimately instantiated as a TagNode.
        See Also:
        insertCSSLink(Vector, String), getAllCSSLinks(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String cssExternalSheet =
                "<link rel='stylesheet' type='text/css' href='INSERT-URL-STRING-HERE' />";
        
      • javaScriptExternalPage

        public static final java.lang.String javaScriptExternalPage
        This String may be inserted in the HTML <HEAD> ... </HEAD> section to add a Java-Script '.js' file. The web-browser will download this Java-Script page from the URL that you ultimately provide and load all variable definitions, and dispatch to any methods that are invoked by the event-handlers when user or operating system events are fired.

        IMPORTANT NOTE: Inserting an external java-script page has one important difference vis-a-vis inserting an external CSS page. Inserting a link to a '.js' page requires both the opened and the closed HTML <SCRIPT ..></SCRIPT> tag-elements. This is expected and required even when (especially when) there is no actual java-script code being placed on the '.html' page itself. Effectively, regardless of whether you are putting actual java-script code on your HTML page, or just inserting a link knowing the browser will download the external '.js' file for you, you still must create both the open and the closed HTML <SCRIPT SRC='...'></SCRIPT> elements and insert them into your vectorized-html web-page.

        HTML Elements:
         <!-- This is a short note about including the HTML SCRIPT element in your web-pages. -->
         <HTML>
         <HEAD>
         <!-- Version #1 Inserting a java-script 'variables & functions' external-page -->
         <SCRIPT TYPE='text/javascript' SRC='/script/javaScriptFiles/functions.js'>
         </SCRIPT>
         <!-- Right here (line above) we always need the closing Script-tag, even when there is no
              actual java-script present, and the methods/variables are going to be downloaded from
              the java-script file identified by the SRC="..." attribute! -->
        
         <SCRIPT TYPE='text/javascript'>
         var someVar1;
         var someVar2;
         
         function someFunction()
         { return;    }
         
         </SCRIPT> <!-- Either way, the closing-script tag is expected. -->
        
        See Also:
        insertExternalJavaScriptLink(Vector, String), getAllExternalJSLinks(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String javaScriptExternalPage =
                "<script type='text/javascript' src='INSERT-URL-STRING-HERE'>";
        
      • canonicalTag

        public static final java.lang.String canonicalTag
        If you have pages on your site that are almost identical, then you may need to inform search engines which one to prioritize. Or you might have syndicated content on your site which was republished elsewhere. You can do both of these things without incurring a duplicate content penalty – as long as you use a canonical tag.

        Instead of confusing Google and missing your ranking on the SERP's, you are guiding the crawlers as to which URL counts as the “main” one. This places the emphasis on the right URL and prevents the others from cannibalizing your SEO.

        Use canonical tags to avoid having problems with duplicate content that may affect your rankings.

        NOTE: Content of this java-documentation description was copied from a page on web-domain 'http://searchenginewatch.com'. It was lifted on May 24th, 2019. See link below, if still valid:
        https://searchenginewatch.com/2018/04/04/a-quick-and-easy-guide-to-meta-tags-in-seo/
        See Also:
        insertCanonicalURL(Vector, String), hasCanonicalURL(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String canonicalTag = 
                "<link rel='canonical' href='INSERT-URL-STRING-HERE' />";
        
      • NEWLINE

        protected static final TextNode NEWLINE
        This is a new-line HTMLNode
        Code:
        Exact Field Declaration Expression:
        protected static final TextNode NEWLINE = new TextNode("\n");
        
    • Method Detail

      • checkForSingleQuote

        protected static void checkForSingleQuote​(java.lang.String s)
        This method checks whether the parameter string contains a single-quotation punctuation-mark anywhere in the String. If so, an exception is thrown. This is generally an internal-helper method.
        Parameters:
        s - This is any java-string, but generally it is one used to insert into an HTML 'content' attribute.
        Throws:
        QuotesException - If the passed parameter string contains any instance of single-quotation.
        Code:
        Exact Method Body:
         int pos;
        
         if ((pos = s.indexOf("'")) != -1) throw new QuotesException(
             "The passed string-parameter may not contain a single-quote punctuation mark.  " +
             "Yours was: [" + s + "], and has a single-quotation mark at string-position " +
             "[" + pos + "]"
         );
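
The reason for this check is that the templates in this class wrap attribute values in single-quotes, so a single-quote inside the value would end the attribute early. A minimal stand-alone sketch of the idea follows; it throws the JDK's IllegalArgumentException as a stand-in for the library's QuotesException, and the cssLink helper is hypothetical.

```java
class SingleQuoteCheckSketch
{
    // Rejects any String containing a single-quote.
    // IllegalArgumentException is a stand-in for the library's QuotesException.
    static void checkForSingleQuote(String s)
    {
        int pos = s.indexOf('\'');

        if (pos != -1) throw new IllegalArgumentException(
            "String [" + s + "] has a single-quote at string-position [" + pos + "]"
        );
    }

    // HYPOTHETICAL helper: builds a single-quoted attribute only after the check passes.
    static String cssLink(String url)
    {
        checkForSingleQuote(url);
        return "<link rel='stylesheet' type='text/css' href='" + url + "' />";
    }
}
```

Without the check, a value such as it's.css would produce href='it's.css', which an HTML parser reads as the attribute value "it" followed by stray text.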
        
      • insertFavicon

        public static void insertFavicon​(java.util.Vector<HTMLNode> html,
                                         java.lang.String imageURLAsString)
        This inserts a favicon HTML link element into the right location so that a particular web-page will render a "browser icon image" in the left corner of the page's browser-tab when the page loads into a browser.
        Parameters:
        html - The vectorized-html web-page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        imageURLAsString - This is the String that will be copied into the public static final String 'favicon' and converted into an HTML 'TagNode'. It will then be inserted as the first element of the html page header.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <link rel='icon' href='image_url'> would have to be inserted.
        QuotesException - If the image URL uses a single-quote mark, anywhere in the URL-string.
        See Also:
        favicon, checkForSingleQuote(String)
        Code:
        Exact Method Body:
         // Insert the Favicon <LINK ...> element into the <HEAD> section of the input html page.
         // <link rel='icon' type='image/INSERT-IMAGE-TYPE-HERE' href='INSERT-URL-STRING-HERE' />
        
         checkForSingleQuote(imageURLAsString);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "favicon <LINK> element"));
        
         String ext = IF.getGuess(imageURLAsString).extension;
        
         if (ext == null) throw new IllegalArgumentException(
             "The Image-Type of the 'imageURLAsString' parameter could not be determined.  " +
             "The method IF.getGuess(faviconURL) returned null.  Please provide a favicon with " +
             "standard image file-type.  This is required because the image-type is required " +
             "to be placed inside the HTML <LINK TYPE=... HREF=...> Element 'TYPE' Attribute."
         );
        
         // Build a new Favicon TagNode.
         TagNode faviconTN = new TagNode(
             favicon
                 .replace("INSERT-URL-STRING-HERE", imageURLAsString)
                 .replace("INSERT-IMAGE-TYPE-HERE", ext)
         );
        
         // Insert the Favicon into the page.  Put it at the top of the header, just after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, faviconTN, NEWLINE);
        
      • hasFavicon

        public static java.lang.String hasFavicon​
                    (java.util.Vector<? extends HTMLNode> html)
        
        This method will search for an HTML <LINK REL="icon" ...> element, specifically expecting the link element to contain an inner-tag / attribute name 'REL' whose value is 'icon'. If it finds one, it will return the value of the other attribute named 'HREF=...'.
        Parameters:
        html - Any html page, but preferably one that contains a <LINK REL="icon" ...> element.
        Returns:
        This method will return the String value of the 'HREF=...' attribute found inside the '<LINK>' element, if this page or sub-page has such an element, with such an attribute. If there are no LINK elements found on this page, then 'null' will be returned.

        NOTE: In the event that multiple copies of the HTML <LINK> element are found, and more than one has a 'REL' attribute whose value contains the String 'icon', this method will just return the first value it finds.
        See Also:
        InnerTagGet, favicon, TagNode.AV(String)
        Code:
        Exact Method Body:
         // InnerTagGet.all: Returns a vector of TagNode's that resemble: <LINK rel="icon" ...>
         //
         // EQ_CI_TRM: Check the 'rel' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison.
         //            Trim the 'rel' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
        
         Vector<TagNode> list = InnerTagGet.all
             (html, "link", "rel", TextComparitor.EQ_CI_TRM, "icon");
        
         // If there were no HTML "<LINK ...>" elements with REL='ICON' attributes, then
         // there was no favicon.
        
         if (list.size() == 0) return null;
        
         // Just in case there were multiple favicon <LINK ...> tags, just return the first
         // one found.  Inside of a <LINK REL="icon" HREF="..."> the 'HREF' Attribute contains
         // the Image-URL.  Use TagNode.AV("HREF") to retrieve that image url.
        
         String s;
         for (TagNode tn : list) if ((s = tn.AV("href")) != null) return s;
        
         // If for some reason, none of these <LINK REL='ICON' ...> elements had an "HREF" 
         // attribute, then just return null.
        
         return null;
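
The lookup logic above (trimmed, case-insensitive match on 'rel', then the first non-null 'href') can be sketched without the library. In this sketch each <LINK> element is modeled as a plain Map of attribute names to values, which is an assumption made for illustration only; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

class FirstIconHrefSketch
{
    // Each Map models the attributes of one <LINK> element (HYPOTHETICAL representation).
    // Returns the 'href' of the first element whose trimmed 'rel' equals "icon",
    // ignoring case, or null when there is no such element.
    static String firstIconHref(List<Map<String, String>> linkElements)
    {
        for (Map<String, String> attrs : linkElements)
        {
            String rel = attrs.get("rel");
            if ((rel == null) || (! rel.trim().equalsIgnoreCase("icon"))) continue;

            // A matching element without an 'href' is skipped, not treated as an error.
            String href = attrs.get("href");
            if (href != null) return href;
        }

        return null;
    }
}
```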
        
      • insertCSSLink

        public static void insertCSSLink​
                    (java.util.Vector<HTMLNode> html,
                     java.lang.String externalCSSFileURLAsString)
        
        This inserts an HTML '<LINK ...>' element into the right location for linking an externally-defined Cascading Style Sheet '.css' page.
        Parameters:
        html - Any vectorized-html web-page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        externalCSSFileURLAsString - This is the String that will be copied into the public static final String 'cssExternalSheet' and converted into an HTML 'TagNode'. It will then be inserted as the first element of the html page header.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <link rel='stylesheet' type='text/css' href='local-url/someFile.css' /> would have to be inserted.
        QuotesException - If the CSS-sheet URL uses a single-quote mark, anywhere in the URL-string.
        See Also:
        cssExternalSheet, cssExternalSheetWithMediaAttribute, insertCSSLink(Vector, String, String), getAllCSSLinks(Vector), checkForSingleQuote(String), DotPair, TagNode
        Code:
        Exact Method Body:
         // Inserts an external CSS Link into the <HEAD> section of this html page vector
         // <link rel='stylesheet' type='text/css' href='INSERT-URL-STRING-HERE' />
        
         checkForSingleQuote(externalCSSFileURLAsString);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException(
             NO_HEADER_MESSAGE.replace
                 ("INSERT-STR", "externally-linked CSS page <LINK> element")
         );
        
         TagNode cssTN   = new TagNode
             (cssExternalSheet.replace("INSERT-URL-STRING-HERE", externalCSSFileURLAsString));
        
         // Insert the Style-Sheet link into the page.  Put it at the top of the header,
         // just after <HEAD>
        
         Util.insertNodes(html, header.start + 1, NEWLINE, cssTN, NEWLINE);
        
      • insertCSSLink

        public static void insertCSSLink​
                    (java.util.Vector<HTMLNode> html,
                     java.lang.String externalCSSFileURLAsString,
                     java.lang.String mediaInnerTagValue)
        
        This inserts an HTML '<LINK ...>' element into the right location for linking an externally-defined Cascading Style Sheet '.css' page.
        Parameters:
        html - Any vectorized-html web-page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        externalCSSFileURLAsString - This is the String that will be copied into the public static final String 'cssExternalSheet' and converted into an HTML 'TagNode'. It will then be inserted as the first element of the html page header.
        mediaInnerTagValue - Externally linked CSS pages, which are included using the HTML <LINK ...> element may explicitly request a 'media' attribute be inserted into the link element. That media attribute may take one of five values. The media attribute in a link tag specifies when the CSS rules are to be applied.

        Here are the most common values for attribute 'media,' below:

        Attribute Value    Intended CSS Meaning
        screen             indicates for use on a computer screen.
        projection         for projected presentations.
        handheld           for handheld devices (typically with small screens).
        print              to style printed web pages.
        all                (default value) This is what most people choose. You can leave off the media attribute completely if you want your styles to be applied for all media types.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <link ... type='text/css' rel="stylesheet" href="local-url/someFile.css" media="some-media"> node would have to be inserted.
        QuotesException - If the CSS-sheet URL, or the media-tag, use a single-quote mark, anywhere inside the String's.
        See Also:
        cssExternalSheet, cssExternalSheetWithMediaAttribute, insertCSSLink(Vector, String), getAllCSSLinks(Vector), checkForSingleQuote(String), DotPair, TagNode
        Code:
        Exact Method Body:
         // Inserts an external CSS Link (with 'media' attribute) into the <HEAD> section of
         // this html page vector 
         // <link rel='stylesheet' type='text/css' href='INSERT-URL-STRING-HERE'
         //      media='INSERT-MEDIA-ATTRIBUTE-VALUE-HERE' />
        
         checkForSingleQuote(externalCSSFileURLAsString);
         checkForSingleQuote(mediaInnerTagValue);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException(
             NO_HEADER_MESSAGE.replace
                 ("INSERT-STR", "externally-linked CSS page <LINK> element")
         );
        
         // Build the TagNode
         TagNode cssTN   = new TagNode(
             cssExternalSheetWithMediaAttribute
                 .replace("INSERT-URL-STRING-HERE", externalCSSFileURLAsString)
                 .replace("INSERT-MEDIA-ATTRIBUTE-VALUE-HERE", mediaInnerTagValue)
         );
        
         // Insert the Style-Sheet link into the page.  Put it at the top of the header, just
         // after <HEAD>
        
         Util.insertNodes(html, header.start + 1, NEWLINE, cssTN, NEWLINE);
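
A stand-alone sketch of the two-placeholder substitution follows. Two assumptions are made here: the template text is reconstructed from the comment at the top of the method body above (the declaration of 'cssExternalSheetWithMediaAttribute' itself is not shown on this page), and the Media enum is a hypothetical convenience that restricts callers to the five values listed in the table, which the library method itself does not appear to do.

```java
import java.util.Locale;

class CssMediaLinkSketch
{
    // HYPOTHETICAL enum covering the five 'media' values listed in the table above.
    enum Media { SCREEN, PROJECTION, HANDHELD, PRINT, ALL }

    // ASSUMPTION: template text reconstructed from the method-body comment,
    // not copied from the actual field declaration.
    static final String TEMPLATE =
        "<link rel='stylesheet' type='text/css' href='INSERT-URL-STRING-HERE' " +
        "media='INSERT-MEDIA-ATTRIBUTE-VALUE-HERE' />";

    // Builds the <LINK> element text for a style-sheet URL and a media type.
    static String build(String url, Media media)
    {
        // The attribute values are single-quoted, so the URL may not contain one.
        if (url.indexOf('\'') != -1) throw new IllegalArgumentException
            ("URL may not contain a single-quote: " + url);

        return TEMPLATE
            .replace("INSERT-URL-STRING-HERE", url)
            .replace("INSERT-MEDIA-ATTRIBUTE-VALUE-HERE", media.name().toLowerCase(Locale.ROOT));
    }
}
```

Using an enum removes the need for a quote check on the media value, at the price of rejecting any media query not in the list.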
        
      • getAllCSSLinks

        public static java.util.Vector<TagNode> getAllCSSLinks​
                    (java.util.Vector<? extends HTMLNode> html)
        
        This will retrieve all linked CSS pages from a vectorized-html web-page.
        Parameters:
        html - This may be any vectorized-html web-page.
        Returns:
        This will return the links as a list of TagNode
        See Also:
        insertCSSLink(Vector, String), insertCSSLink(Vector, String, String), InnerTagGet
        Code:
        Exact Method Body:
         // InnerTagGet.all: Returns a vector of TagNode's that resemble: 
         //                  <LINK rel="stylesheet" ...>
         //
         // EQ_CI_TRM: Check the 'rel' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison
         //            Trim the 'rel' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
        
         return InnerTagGet.all(html, "link", "rel", TextComparitor.EQ_CI_TRM, "stylesheet");
        
      • insertExternalJavaScriptLink

        public static void insertExternalJavaScriptLink​
                    (java.util.Vector<HTMLNode> html,
                     java.lang.String externalJSFileURLAsString)
        
        This inserts an HTML '<SCRIPT ...>' element, and its closing '</SCRIPT>' tag, into the right location for linking an externally-defined java-script '.js' file-page.
        Parameters:
        html - Any vectorized-html web-page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        externalJSFileURLAsString - This is the String that will be copied into the public static final String 'javaScriptExternalPage' and converted into an HTML 'TagNode'. It will then be inserted as the first element of the html page header.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <SCRIPT ... SRC="local-url/someScriptFile.js"></SCRIPT> nodes would have to be inserted.
        QuotesException - If the java-script page-URL uses a single-quote mark, anywhere in the url-string.
        See Also:
        javaScriptExternalPage, getAllExternalJSLinks(Vector), checkForSingleQuote(String), TagNode, TextNode, DotPair, HTMLTags.hasTag(String, TC)
        Code:
        Exact Method Body:
         // Builds an external Java-Script link, and inserts it into the header portion of
         // this html page.
         // <script type='text/javascript' src='INSERT-URL-STRING-HERE'>
        
         checkForSingleQuote(externalJSFileURLAsString);
        
         // Build an HTML <SCRIPT ...> node, and a </SCRIPT> node.
         HTMLNode n = new TagNode(javaScriptExternalPage.replace
                         ("INSERT-URL-STRING-HERE", externalJSFileURLAsString));
        
         HTMLNode closeN = HTMLTags.hasTag("script", TC.ClosingTags);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException(
             NO_HEADER_MESSAGE.replace(
                 "INSERT-STR", "externally-linked Java-Script <SCRIPT> ... </SCRIPT> elements")
         );
        
         // Insert the Java-Script link into the page.  Put it at the top of the header, just
         // after <HEAD>
        
         Util.insertNodes(html, header.start + 1, NEWLINE, n, closeN, NEWLINE);
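
The placement step (an opening and a closing script tag, surrounded by new-lines, inserted directly after <HEAD>) can be sketched with a plain list of tokens. This is an illustration under assumptions: Strings stand in for HTMLNode instances, IllegalStateException stands in for NodeNotFoundException, and the class and method names are hypothetical.

```java
import java.util.List;

class ScriptLinkInsertSketch
{
    // Inserts "\n", <script ... src='jsURL'>, </script>, "\n" directly after the first
    // "<head>" token.  Plain Strings stand in for the library's HTMLNode instances.
    static void insertScriptLink(List<String> tokens, String jsURL)
    {
        if (jsURL.indexOf('\'') != -1) throw new IllegalArgumentException
            ("URL may not contain a single-quote: " + jsURL);

        // Locate the opening <HEAD> tag.
        int head = -1;

        for (int i = 0; i < tokens.size(); i++)
            if (tokens.get(i).equalsIgnoreCase("<head>")) { head = i; break; }

        // IllegalStateException is a stand-in for NodeNotFoundException.
        if (head == -1) throw new IllegalStateException("There is no <HEAD> section.");

        // Both the opening and the closing tag are inserted, in one bulk operation.
        tokens.addAll(head + 1, List.of(
            "\n",
            "<script type='text/javascript' src='" + jsURL + "'>",
            "</script>",
            "\n"
        ));
    }
}
```

The list passed in must be modifiable (for example an ArrayList or a Vector), since the method inserts into it in place.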
        
      • getAllExternalJSLinks

        public static java.lang.String[] getAllExternalJSLinks​
                    (java.util.Vector<? extends HTMLNode> html)
        
        Inserting java-script directly onto an HTML page and including an external link to a '.js' file are very similar tasks. The construct is <SCRIPT TYPE='text/javascript'> ... </SCRIPT> either way. When the actual functions and methods are pasted into the HTML page directly, they are placed exactly where the ellipsis '...' appears in the construct above. When a link is made to an external page (search for 'linking external java-script pages from the same host' for background), both the opening and closing <SCRIPT> ... </SCRIPT> tag-elements must still be included, while the text-content in place of the ellipsis is left blank. The URL of the java-script page is provided in the src='js_file_url' inner-tag value.

        This method will retrieve any and all script nodes that meet these criteria:

        1. The "script body" must be empty, meaning there is no java-script between the open and close script-tags
        2. The src='' attribute must contain some non-null, non-zero-length value
        Parameters:
        html - This is any vectorized-html web-page.
        Returns:
        This will return a list of relative URL's to externally linked java-script pages as Strings.
        See Also:
        InnerTagGetInclusive, javaScriptExternalPage, insertExternalJavaScriptLink(Vector, String), TagNode, TextNode, TagNode.AV(String), HTMLNode.str
        Code:
        Exact Method Body:
         // InnerTagGetInclusive.all: Returns a vector of TagNode's that resemble:
         //                              <SCRIPT TYPE="javascript" ...>
         //
         // CN_CI: Check the 'type' Attribute-Value using a Case-Insensitive, "Contains"
         //        String-Comparison
         //        'contains' rather than 'equals' testing is done because this value may be
         //        "javascript", but it may also be "text/javascript"
         //
         // Inclusive: This means that everything between the <SCRIPT type="javascript"> ... and
         //            the closing </SCRIPT> tag are returned in a vector of vectors.
        
         Vector<Vector<HTMLNode>> v = InnerTagGetInclusive.all
             (html, "script", "type", TextComparitor.CN_CI, "javascript");
        
         Stream.Builder<String> b = Stream.builder();
        
         TOP:
         for (Vector<HTMLNode> scriptSection : v)
         {
             String srcValue=null;
             for (HTMLNode n : scriptSection)
             {
                 if (n.isTagNode())
                     if ((srcValue = ((TagNode) n).AV("src")) != null)
                         break;
        
                 if (n.isTextNode())
                     if (n.str.trim().length() > 0)
                         break TOP;
             }
             b.add(srcValue);
         }
        
         return b.build().toArray(String[]::new);
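
The two criteria listed in the description (empty script body, non-empty 'src') can be expressed as a small filter. The sketch below follows those two criteria as written and is not a transcription of the method body above; the Script class is a hypothetical stand-in for one <SCRIPT> ... </SCRIPT> section.

```java
import java.util.ArrayList;
import java.util.List;

class ExternalScriptFilterSketch
{
    // HYPOTHETICAL model of one <SCRIPT> ... </SCRIPT> section.
    static final class Script
    {
        final String src;   // value of the 'src' attribute, or null if absent
        final String body;  // text between the opening and closing tags

        Script(String src, String body) { this.src = src; this.body = body; }
    }

    // Keeps the 'src' of every script whose body is blank and whose 'src' is non-empty.
    static String[] externalLinks(List<Script> scripts)
    {
        List<String> ret = new ArrayList<>();

        for (Script s : scripts)
        {
            boolean emptyBody = (s.body == null) || s.body.trim().isEmpty();
            boolean hasSrc    = (s.src != null) && (! s.src.isEmpty());

            // A script that fails either test is skipped, and the scan continues.
            if (emptyBody && hasSrc) ret.add(s.src);
        }

        return ret.toArray(new String[0]);
    }
}
```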
        
      • insertCanonicalURL

        public static void insertCanonicalURL​(java.util.Vector<HTMLNode> html,
                                              java.lang.String canonicalURLAsStr)
        This method will insert a "canonical url" into the web-page passed via the html-parameter. This canonical-url will be placed in an html <link rel='canonical' href='the_url'> element. This element must be placed in the head section of the passed html page, and if the vectorized-html page that was passed does not contain a 'header' section, a 'NodeNotFoundException' (a run-time / unchecked Exception) will be thrown.
        Parameters:
        html - This is any vectorized-html web-page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        canonicalURLAsStr - This text-String will be substituted for the 'href' attribute-value inside an HTML <LINK> element, and then inserted into the vectorized-page at the top of the <HEAD>...</HEAD> section.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <link rel='canonical' href='canonical_url'> would have to be inserted.
        QuotesException - If the canonical-page URL uses a single-quote mark, anywhere in the url-string.
        See Also:
        canonicalTag, hasCanonicalURL(Vector), checkForSingleQuote(String), TagNode, DotPair
        Code:
        Exact Method Body:
         // Inserts a link element into the header of this page
         // <link rel='canonical' href='INSERT-URL-STRING-HERE' />
        
         checkForSingleQuote(canonicalURLAsStr);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "Canonical-url <LINK> element"));
        
         // Builds the canonical <LINK ...> element
         TagNode linkTN  = new TagNode
             (canonicalTag.replace("INSERT-URL-STRING-HERE", canonicalURLAsStr));
        
         // Insert the canonical-url into the page.  Put it at the top of the header, just
         // after <HEAD>
        
         Util.insertNodes(html, header.start + 1, NEWLINE, linkTN, NEWLINE);
        
      • hasCanonicalURL

        public static java.lang.String hasCanonicalURL​
                    (java.util.Vector<? extends HTMLNode> html)
                throws MalformedHTMLException
        
        This method will check whether a vectorized-html page has an HTML <LINK REL='canonical' ...> tag, which informs search-engines that a "Canonical URL" is available to visit when indexing a web-site with many pages and sub-pages. Canonical URL's are similar to a top-level "table of contents" web-page that allows the search-engine to avoid sifting through thousands of sub-pages, and trying to index all of them.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        Returns:
        This will return whatever text was placed inside the canonical-url HREF='some_url' attribute/value pair of the HTML link tag. If there were no HTML <LINK REL='canonical' HREF='some_url'> tag, then this method will return null.
        Throws:
        MalformedHTMLException - This exception will be thrown if there are multiple html tags that match the link, and rel='canonical' search criteria requirements. If an HTML element <link rel='canonical'> is found, but that element does not have an href='...' attribute, or that attribute is of zero length, then this is a situation that will also force this exception to throw.
        See Also:
        InnerTagGet, canonicalTag, insertCanonicalURL(Vector, String), TagNode.AV(String)
        Code:
        Exact Method Body:
         // InnerTagGet.all: Returns a vector of TagNode's that resemble:
         //                  <LINK rel="canonical" ...>
         //
         // EQ_CI_TRM: Check the 'rel' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison
         //            Trim the 'rel' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
        
         Vector<TagNode> v = InnerTagGet.all
             (html, "link", "rel", TextComparitor.EQ_CI_TRM, "canonical");
        
         if (v.size() == 0) return null;
        
         if (v.size() > 1) throw new MalformedHTMLException(
             "The web-page you have passed has precisely " + v.size() +
             " canonical-url link elements, but it may not have more than 1.  This is " +
             "invalid HTML."
         );
        
         String s = v.elementAt(0).AV("href");
        
         if (s == null) throw new MalformedHTMLException(
             "The HTML link element that was retrieved, contained a " +
             "rel='canonical' inner-tag / value pair, but did not have an href='...' " +
             "attribute.  This is invalid HTML."
         );
        
         if (s.length() == 0) throw new MalformedHTMLException(
             "The HTML link element that was retrieved contained a zero-length " +
             "string as an attribute-value for the href='...' attribute.   This is not " +
             "invalid, but poorly formatted HTML."
         );
        
         return s;
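
The zero / one / many decision made by this method can be sketched without the library. As in the other sketches, each <LINK> element is modeled as a plain Map of attributes (an assumption for illustration), IllegalStateException stands in for the checked MalformedHTMLException, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CanonicalLookupSketch
{
    // Each Map models the attributes of one <LINK> element (HYPOTHETICAL representation).
    // IllegalStateException is a stand-in for the library's MalformedHTMLException.
    static String canonicalHref(List<Map<String, String>> linkElements)
    {
        // Collect every element whose trimmed 'rel' equals "canonical", ignoring case.
        List<Map<String, String>> matches = new ArrayList<>();

        for (Map<String, String> attrs : linkElements)
        {
            String rel = attrs.get("rel");
            if ((rel != null) && rel.trim().equalsIgnoreCase("canonical")) matches.add(attrs);
        }

        // No canonical link at all is a normal result, not an error.
        if (matches.isEmpty()) return null;

        if (matches.size() > 1) throw new IllegalStateException
            ("Found " + matches.size() + " canonical link elements, expected at most 1.");

        String href = matches.get(0).get("href");

        if ((href == null) || href.isEmpty()) throw new IllegalStateException
            ("The canonical link element has a missing or zero-length 'href' attribute.");

        return href;
    }
}
```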