Package Torello.HTML

Class Balance


  • public class Balance
    extends java.lang.Object
    Utilities for checking that opening and closing TagNode elements match up (that the HTML is balanced).

    This class provides tools for inspecting one particular aspect of HTML validity: properly balanced Opening & Closing Tags. Keep in mind that checks of this sort are heuristic rather than definitive. For instance, most Web-Browsers (as of the writing of this Java-Doc explanation) do not even require that all HTML-Tags actually be closed. Even a page with an opened-but-not-closed <DIV> will often display just fine in Google Chrome.

    Plenty of pages use opening <LI> tags, without any matching </LI>, inside an Ordered-List (an <OL>). This is also sometimes accepted practice for <TR> and even <TD> tags.

    Balance Heuristic:
    All of the methods below generate an "Open and Closed Count" per page, for each HTML-Tag that the user has requested be counted. As an example, an HTML-Page (or page sub-section) containing 5 opening <DIV> tags, but only 4 closing </DIV> tags, would have a Tag-Balance count of +1. This may seem somewhat trivial at first; however, hunting down "bugs" in HTML can be complicated enough that tools which find such mistakes are often invaluable.

    Balance Tag-Count Examples:
    A Tag-Balance Count of '0' means that, on the page provided, there is exactly one closing-tag for each and every opening-tag of that HTML element. A few sample return values are provided in the table below. Remember, the primary 'check' method in this class computes a count for each and every HTML-Tag present on the page. Tags with a '0' count may then be removed from the result (for instance with checkNonZero), so that only non-zero tags are mentioned in the returned Hashtable.

    Here are a few sample counts for some common HTML-Tags:

    Tag    Balance Count    Meaning
    TD     -1               There is an "extra" closing Table-Data Cell on the page. Specifically, there is one more </TD> tag than there are <TD> tags.
    DIV     0               The number of <DIV> tags is precisely equal to the number of </DIV> tags on the page provided.
    B      +1               Somewhere on the page there is an opening Bold-Tag (a <B> tag) that is never closed by a </B>.
    TR     +2               There are two Table-Rows whose opening <TR> tags are never closed.
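    The counting rule described above (each opening tag adds 1, each closing tag subtracts 1) can be illustrated with a small, self-contained Java sketch. It does not use Torello's classes: the page is modelled as a hypothetical List<String> in which "td" stands for an opening <TD> and "/td" for a closing </TD>, and the class name BalanceCountSketch is made up for this example. The token list is chosen so that the result reproduces the four rows of the table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: tokens use a HYPOTHETICAL encoding where "td" stands for an opening <TD>
// and "/td" for a closing </TD>. This is not Torello's Vector<HTMLNode> representation.
class BalanceCountSketch
{
    // Opening tags add 1, closing tags subtract 1; the final value is the Tag-Balance count.
    static Map<String, Integer> balance(List<String> tokens)
    {
        Map<String, Integer> counts = new LinkedHashMap<>();  // keeps first-seen order

        for (String tok : tokens)
        {
            boolean closing = tok.startsWith("/");
            String  name    = closing ? tok.substring(1) : tok;
            counts.merge(name, closing ? -1 : 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args)
    {
        // Reproduces the four rows of the table above
        List<String> page = List.of("tr", "td", "/td", "/td", "div", "/div", "b", "tr");
        System.out.println(balance(page));  // {tr=2, td=-1, div=0, b=1}
    }
}
```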





    Depth Heuristic:
    The 'depth' methods use a different approach to investigating HTML-Tag validity. The 'depth' of an HTML-Tag is simply its level of nesting at any particular point on the page. The Maximum-Depth of an HTML-Tag is the greatest number of levels to which that tag is nested anywhere on the page. A page whose <DIV> tag has a Maximum-Depth of +3 is a page where (in at least one location) there are three levels of nested dividers.

    The Minimum-Depth of a tag for any given page should usually be zero. A page containing tags that have a negative depth is considered invalid. A negative depth means that, at some point, a Closing-Tag was placed without any still-open Opening-Tag for it to close.
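    A minimal sketch of the depth heuristic for a single tag keeps a running count and records its lowest and highest values. It assumes a hypothetical List<String> token encoding ("div" opens, "/div" closes) and an illustrative class name, DepthSketch; neither comes from the library. The sample input ends with a stray </DIV>, so its Minimum-Depth drops below zero.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: HYPOTHETICAL encoding ("div" = <DIV>, "/div" = </DIV>).
// Returns {minimumDepth, maximumDepth, finalCount} for a single tag.
class DepthSketch
{
    static int[] depth(List<String> tokens, String tag)
    {
        int min = 0, max = 0, cur = 0;

        for (String tok : tokens)
        {
            if      (tok.equals(tag))       cur++;     // opening tag: one level deeper
            else if (tok.equals("/" + tag)) cur--;     // closing tag: one level shallower
            else                            continue;  // other tags don't affect this one

            min = Math.min(min, cur);  // goes negative if a closing tag has nothing to close
            max = Math.max(max, cur);  // deepest nesting seen so far
        }
        return new int[] { min, max, cur };
    }

    public static void main(String[] args)
    {
        // Three nested dividers, followed by one stray closing </DIV>
        List<String> page = List.of("div", "div", "div", "/div", "/div", "/div", "/div");
        System.out.println(Arrays.toString(depth(page, "div")));  // [-1, 3, -1]
    }
}
```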



    Stateless Class:
    This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-style files, without any accessible constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.
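    The static-functional pattern itself can be sketched in a few lines of Java. StaticFunctionalSketch and addToCount are hypothetical names used only for illustration; the point is the private, zero-argument constructor (which prevents instantiation) and the purely static, state-free method.

```java
// HYPOTHETICAL example of a static-functional class: no instance fields,
// a private constructor, and only static methods.
final class StaticFunctionalSketch
{
    // Private, zero-argument constructor: code outside this class cannot instantiate it.
    private StaticFunctionalSketch() { }

    // Pure function: the result depends only on the arguments, never on stored state.
    static int addToCount(int count, boolean isClosing)
    {
        return count + (isClosing ? -1 : 1);
    }

    public static void main(String[] args)
    {
        // One opening tag followed by one closing tag leaves the count at zero
        System.out.println(addToCount(addToCount(0, false), true));  // 0
    }
}
```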

    • 1 Constructor(s), 1 declared private, zero-argument constructor
    • 16 Method(s), 16 declared static
    • 0 Field(s)


    • Method Summary

       
      Balanced Open-Closed HTML Tag Checks
      Modifier and Type Method
      static Hashtable<String,Integer> check(Vector<? super TagNode> html)
      static int[] check(Vector<? super TagNode> html, String... htmlTags)
      static Hashtable<String,Integer> checkNonZero(Hashtable<String,Integer> ht)
      static int checkTag(Vector<? super TagNode> html, String htmlTag)
      static int[] nonNestedCheck(Vector<? super TagNode> html, String htmlTag)

      Balanced Open-Closed Tag Check & Print to String
      Modifier and Type Method
      static String CB(Vector<HTMLNode> html)

      Nested HTML Tag Checks
      Modifier and Type Method
      static Hashtable<String,int[]> depth(Vector<? super TagNode> html)
      static Hashtable<String,int[]> depth(Vector<? super TagNode> html, String... htmlTags)
      static Hashtable<String,int[]> depthGreaterThanOne(Hashtable<String,int[]> ht)
      static Hashtable<String,int[]> depthInvalid(Hashtable<String,int[]> ht)
      static int[] depthTag(Vector<? super TagNode> html, String htmlTag)
      static Ret2<int[],int[]> locationsAndDepth(Vector<? super TagNode> html, String htmlTag)

      Printing Tag-Check Hashtables
      Modifier and Type Method
      static String toStringBalance(int[] balanceCheckReport, String... htmlTags)
      static String toStringBalance(Hashtable<String,Integer> balanceCheckReport)
      static String toStringDepth(Hashtable<String,int[]> depthReport)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • CB

        public static java.lang.String CB(java.util.Vector<HTMLNode> html)
        Invokes: check(Vector), checkNonZero(Hashtable), toStringBalance(Hashtable)



        Example:
         String b = Balance.CB(a.articleBody);
         System.out.println((b == null) ? "Page has Balanced HTML" : b);
         
         // If the page has an equal number of opening and closing tags, this prints:
         // Page has Balanced HTML
         // Otherwise, it prints the balance-report
        
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        Note that this parameter is declared as Vector<HTMLNode>, not with the wild-card expression '? super TagNode'. Because Java generics are invariant, a Vector<TagNode> or a Vector<Object> will not be accepted by this method; such Vectors may instead be passed directly to the check(...) methods.
        Returns:
        Will return null if the snippet or page has 'balanced' HTML, otherwise returns the trimmed balance-report as a String.
        Code:
        Exact Method Body:
         String ret = toStringBalance(checkNonZero(check(html)));
        
         return (ret.length() == 0) ? null : ret;
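    The idea behind CB(...) (report only the non-zero counts, or return null when there are none) can be sketched without the library as follows. The class name ReportSketch, the Map-based input, and the tab-separated line format are all assumptions for illustration; the real report text is produced by toStringBalance(...) and may look different.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: HYPOTHETICAL "tag<TAB>count" report format, not toStringBalance's output.
class ReportSketch
{
    // Returns null when every count is zero, otherwise one line per unbalanced tag.
    static String report(Map<String, Integer> counts)
    {
        StringBuilder sb = new StringBuilder();

        // A TreeMap copy gives the report a stable, alphabetical order
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet())
            if (e.getValue() != 0)
                sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');

        return (sb.length() == 0) ? null : sb.toString();
    }

    public static void main(String[] args)
    {
        String r = report(Map.of("div", 0, "b", 1));
        System.out.println((r == null) ? "Page has Balanced HTML" : r);
    }
}
```

    Returning null rather than an empty String mirrors the convention described in the Returns section above, so callers can test for balance with a simple null check.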
        
      • check

        public static java.util.Hashtable<java.lang.String,java.lang.Integer> check
                    (java.util.Vector<? super TagNode> html)
        
        Creates a Hashtable that has a count of all open and closed HTML tags found on the page.

        This Hashtable may be regarded as maintaining "counts" on each-and-every HTML tag, to identify whether there is a one-to-one balance mapping between opening and closing tags for each element. When the count in the Hashtable generated by this method is non-zero for a particular HTML-Tag, it means that there is an unequal number of opening and closing elements for that tag.

        Suppose this method were to produce a Hashtable, and that Hashtable were queried for a count on the HTML <DIV> tag (dividers). If that count turned out to be a positive number, it would mean that the Vectorized-HTML had more opening <DIV> tags than closing </DIV> tags.

        Browser Validity:
        There are some browser-parse advocates who may state that not all HTML Tags have to be closed. For instance, there are plenty of pages out there that won't always use a '</LI>' Tag for elements of an Ordered or Un-Ordered List.

        These types of subtle nuances hint at the commonly-heard phrase "Browser War," and the concept of validity, therefore, is not addressed in this class.

        The following example will help explain the use of this method. If an HTML page needs to be checked to see that all elements are properly opened and closed, this method can be used to return a list of any HTML element tag that does not have an equal number of opening and closing tags.

        In this example, the generated Java-Doc HTML-Page for class TagNode is checked.

        Example:
         String                      html    = FileRW.loadFileToString(htmlFileName);
         Vector<HTMLNode>            v       = HTMLPage.getPageTokens(html, false);
         Hashtable<String, Integer>  b       = Balance.check(v);
         StringBuffer                sb      = new StringBuffer();
        
         // This part builds a text-output in a string buffer, which is then printed to the screen.
         for (String key : b.keySet())
         {
             Integer i = b.get(key);
         
             // Only print keys that had a "non-zero count"
             // A Non-Zero-Count implies Opening-Tag-Count and Closing-Tag-Count are not equal!
         
             if (i.intValue() != 0) sb.append(key + "\t" + i.intValue() + "\n");
         }
         
         System.out.print(sb);
         
         // This example output was: "i   -1", because of an extra closing italics tag (</I>).
         // NOTE: To find where this unmatched element is, use method: nonNestedCheck(Vector, String)
        
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? super TagNode' means that both a Vector<TagNode> and a Vector<HTMLNode> are accepted by this parameter, and neither will cause an exception to be thrown.

        Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        Returns:
        A Hashtable map of the count of each HTML-Tag present in the input Vector.

        For instance, if this Vector had five <A HREF=...> (Anchor-Link) tags, and six </A> tags, then the returned Hashtable would have a String-key equal to "a" with an integer value of -1.
        See Also:
        FileRW.loadFileToString(String), HTMLPage.getPageTokens(CharSequence, boolean)
        Code:
        Exact Method Body:
         Hashtable<String, Integer> ht = new Hashtable<>();
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text, and
         // not HTML Comments
        
         for (Object o : html) if (o instanceof TagNode)
         {
             TagNode tn = (TagNode) o;
        
             // Singleton tags are also known as 'self-closing' tags.  BR, HR, IMG, etc...
             if (HTMLTags.isSingleton(tn.tok)) continue;
        
             Integer I = ht.get(tn.tok);
             int     i = (I != null) ? I.intValue() : 0;
        
             // An opening-version (TC.OpeningTags, For Instance <DIV ...>) will ADD 1 to the count
             // A closing-tag (For Instance: </DIV>) will SUBTRACT 1 from the count
        
             i += tn.isClosing ? -1 : 1;
        
             // Update the return result Hashtable for this particular HTML-Element (tn.tok)
             ht.put(tn.tok, Integer.valueOf(i));
         }
        
         return ht;
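    Singleton (self-closing) tags are skipped before any counting happens, because they never have a closing version. The sketch below mirrors that rule on a hypothetical List<String> token encoding ("p" opens, "/p" closes). Its SINGLETONS set is a small illustrative subset chosen for this example, not the list that HTMLTags.isSingleton(...) actually consults, and the class name is made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: HYPOTHETICAL token encoding and a HYPOTHETICAL, partial singleton list.
class SingletonSkipSketch
{
    static final Set<String> SINGLETONS = Set.of("br", "hr", "img");

    static Map<String, Integer> check(List<String> tokens)
    {
        Map<String, Integer> ht = new HashMap<>();

        for (String tok : tokens)
        {
            boolean closing = tok.startsWith("/");
            String  name    = closing ? tok.substring(1) : tok;

            // <BR>, <HR>, <IMG> have no closing version, so they are never counted
            if (SINGLETONS.contains(name)) continue;

            ht.merge(name, closing ? -1 : 1, Integer::sum);
        }
        return ht;
    }

    public static void main(String[] args)
    {
        System.out.println(check(List.of("p", "br", "img", "/p")));  // {p=0}
    }
}
```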
        
      • check

        public static int[] check(java.util.Vector<? super TagNode> html,
                                  java.lang.String... htmlTags)
        Creates an array that includes an open-and-close 'count' for each HTML-Tag that was requested via the passed input String[]-Array parameter 'htmlTags'.

        Browser Validity:
        There are some browser-parse advocates who may state that not all HTML Tags have to be closed. For instance, there are plenty of pages out there that won't always use a '</LI>' Tag for elements of an Ordered or Un-Ordered List.

        These types of subtle nuances hint at the commonly-heard phrase "Browser War," and the concept of validity, therefore, is not addressed in this class.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? super TagNode' means that both a Vector<TagNode> and a Vector<HTMLNode> are accepted by this parameter, and neither will cause an exception to be thrown.

        Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.

        The HTML-Element Open-Close-Counts are computed from this page.
        htmlTags - This may be one, or many, HTML-Tags whose open-close count needs to be computed. Any HTML Element that is not present in this list will not have a count computed.

        The count results are stored in an int[]-Array that should be considered "parallel" to this input Var-Args-Array.
        Returns:
        An array containing the open-close count of each requested HTML-Tag, as computed from the input vectorized-html parameter 'html'. For instance, if the following values were passed to this method:

        • A Vectorized-HTML page that had 5 '<SPAN ...>' open-elements, and 6 '</SPAN>' closing SPAN-Tags.
        • And at least one of the Strings in the Var-Args parameter 'htmlTags' was equal to the String "SPAN" (case insensitive).
        • ==> Then the array-position corresponding to the position in array 'htmlTags' that had the "SPAN" would have a value of '-1'.
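    Below is a sketch of the parallel-array contract described above, assuming a hypothetical List<String> token encoding ("span" opens, "/span" closes) and a made-up class name. Lower-casing each requested name stands in for the case-insensitive matching mentioned above. Unlike the Exact Method Body for this method, which makes a single pass and keeps the counts in a Hashtable, this version simply re-scans the tokens once per requested tag; it is easier to read but costs O(tokens * tags).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Sketch only: HYPOTHETICAL token encoding; ret[i] is the open-close count for htmlTags[i].
class ParallelCountSketch
{
    static int[] check(List<String> tokens, String... htmlTags)
    {
        int[] ret = new int[htmlTags.length];  // Java initializes every slot to 0

        for (int i = 0; i < htmlTags.length; i++)
        {
            String tag = htmlTags[i].toLowerCase(Locale.ROOT);  // case-insensitive request
            for (String tok : tokens)
            {
                if      (tok.equals(tag))       ret[i]++;
                else if (tok.equals("/" + tag)) ret[i]--;
            }
        }
        return ret;
    }

    public static void main(String[] args)
    {
        // 5 opening <SPAN> tags and 6 closing </SPAN> tags, as in the list above
        List<String> page = new ArrayList<>(Collections.nCopies(5, "span"));
        page.addAll(Collections.nCopies(6, "/span"));
        System.out.println(Arrays.toString(check(page, "SPAN", "div")));  // [-1, 0]
    }
}
```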
        Throws:
        HTMLTokException - If any of the tags passed are not valid HTML tags.
        SingletonException - If any of the String-Tags passed to parameter 'htmlTags' are 'singleton' (Self-Closing) Tags, then this exception throws
        Code:
        Exact Method Body:
         // Check that these are all valid HTML Tags, throw an exception if not.
         htmlTags = ARGCHECK.htmlTags(htmlTags);
        
         // Temporary Hash-table, used to store the count of each htmlTag
         Hashtable<String, Integer> ht = new Hashtable<>();
        
         // Initialize the temporary hash-table.  This will be discarded at the end of the method,
         // and converted into a parallel array.  (Parallel to the input String... htmlTags array).
         // Also, check to make sure the user hasn't requested a count of Singleton HTML Elements.
        
         for (String htmlTag : htmlTags)
         {
             if (HTMLTags.isSingleton(htmlTag)) throw new SingletonException(
                 "One of the tags you have passed: [" + htmlTag + "] is a singleton-tag, " +
                 "and is only allowed opening versions of the tag."
             );
        
             ht.put(htmlTag, Integer.valueOf(0));
         }
        
         Integer I;
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text, and
         // not HTML Comments
         for (Object o : html) if (o instanceof TagNode)
         {
             TagNode tn = (TagNode) o;
        
             // Get the current count from the hash-table
             I = ht.get(tn.tok);
        
             // The hash-table only holds elements we are counting, if null, then skip.
             if (I == null) continue;
        
             // Save the new, computed count, in the hash-table
             //
             // An opening-version (TC.OpeningTags, For Instance <DIV ...>) will ADD 1 to the count
             // A closing-tag (For Instance: </DIV>) will SUBTRACT 1 from the count
        
             ht.put(tn.tok, Integer.valueOf(I.intValue() + (tn.isClosing ? -1 : 1)));
         }
        
         // Convert the hash-table to an integer-array, and return this to the user
         int[] ret = new int[htmlTags.length];
        
         for (int i=0; i < ret.length; i++)
             ret[i] = 0;
        
         for (int i=0; i < htmlTags.length; i++)
             if ((I = ht.get(htmlTags[i])) != null) 
                 ret[i] = I.intValue();
            
         return ret;
        
      • checkNonZero

        public static java.util.Hashtable<java.lang.String,java.lang.Integer> checkNonZero
                    (java.util.Hashtable<java.lang.String,java.lang.Integer> ht)
        
        Creates a Hashtable containing the open-close count of every HTML-Tag in the input whose count-value is not equal to zero.

        This method reports when there are unbalanced HTML-Tags on a page, and strictly ignores any & all tags with a count of zero. Specifically, if a tag has a 1-to-1 open-close count, then it will not have a key in the returned Hashtable.

        Browser Validity:
        There are some browser-parse advocates who may state that not all HTML Tags have to be closed. For instance, there are plenty of pages out there that won't always use a '</LI>' Tag for elements of an Ordered or Un-Ordered List.

        These types of subtle nuances hint at the commonly-heard phrase "Browser War," and the concept of validity, therefore, is not addressed in this class.

        Cloned Input:
        This method clones the input Hashtable parameter 'ht' and removes, from the clone, the entries whose count is equal to zero. The original table is left unmodified, which allows the user to perform other operations with its values without those changes affecting this method once it has started processing.
        Parameters:
        ht - This should be a Hashtable that was produced by a call to one of the two available check(...) methods.
        Returns:
        A copy of the input Hashtable containing only the entries whose count is non-zero. For instance, if the Vector originally passed to check(...) had 5 '<A ...>' (Anchor-Link) elements and six '</A>' elements, then this Hashtable would have a String-key 'a' with an integer value of '-1'.
        Code:
        Exact Method Body:
         @SuppressWarnings("unchecked")
         Hashtable<String, Integer>  ret     = (Hashtable<String, Integer>) ht.clone();
         Enumeration<String>         keys    = ret.keys();
        
         while (keys.hasMoreElements())
         {
             String key = keys.nextElement();
        
             // Remove any keys (HTML element-names) that have a normal ('0') count.
             if (ret.get(key).intValue() == 0) ret.remove(key);
         }
        
         return ret;
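    The same contract (return a copy without the zero-count entries, and leave the input untouched) can also be written with Hashtable's copy constructor and Collection.removeIf, instead of clone() plus an Enumeration. This is an alternative sketch with a hypothetical class name, not the library's code. Like clone(), the copy constructor produces a shallow copy.

```java
import java.util.Hashtable;

// Sketch only: an ALTERNATIVE way to express checkNonZero's contract.
class NonZeroSketch
{
    static Hashtable<String, Integer> nonZero(Hashtable<String, Integer> ht)
    {
        Hashtable<String, Integer> ret = new Hashtable<>(ht);  // shallow copy of the input
        ret.values().removeIf(count -> count == 0);            // drop balanced tags
        return ret;
    }

    public static void main(String[] args)
    {
        Hashtable<String, Integer> counts = new Hashtable<>();
        counts.put("div", 0);
        counts.put("a", -1);

        Hashtable<String, Integer> result = nonZero(counts);
        System.out.println(result + " " + counts.size());  // {a=-1} 2
    }
}
```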
        
      • checkTag

        public static int checkTag(java.util.Vector<? super TagNode> html,
                                   java.lang.String htmlTag)
        Computes an open-close count for just one particular HTML Element, indicating whether that Element has been properly opened and closed. The count (an integer value) is returned by this method.

        Browser Validity:
        There are some browser-parse advocates who may state that not all HTML Tags have to be closed. For instance, there are plenty of pages out there that won't always use a '</LI>' Tag for elements of an Ordered or Un-Ordered List.

        These types of subtle nuances hint at the commonly-heard phrase "Browser War," and the concept of validity, therefore, is not addressed in this class.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? super TagNode' means that both a Vector<TagNode> and a Vector<HTMLNode> are accepted by this parameter, and neither will cause an exception to be thrown.

        Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        htmlTag - This is the HTML element whose open-close count needs to be computed.
        Returns:
        The open-close count for 'htmlTag' in this Vector. For instance, if the user had requested that HTML Anchor Links be counted, and if the input Vector had 5 '<A ...>' (Anchor-Link) elements and six '</A>', then this method would return -1.
        Throws:
        HTMLTokException - If 'htmlTag' is not a valid HTML tag.
        SingletonException - If this 'htmlTag' is a 'singleton' (Self-Closing) Tag, this exception will throw.
        Code:
        Exact Method Body:
         // Check that this is a valid HTML Tag, throw an exception if invalid
         htmlTag = ARGCHECK.htmlTag(htmlTag);
        
         if (HTMLTags.isSingleton(htmlTag)) throw new SingletonException(
             "The tag you have passed: [" + htmlTag + "] is a singleton-tag, and is only " +
             "allowed opening versions of the tag."
         );
        
         TagNode tn;     int i = 0;
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text, and
         // not HTML Comments
        
         for (Object o : html) if (o instanceof TagNode) 
        
             // If we encounter an HTML Element whose tag is the tag whose count we are 
             // computing, then....
        
             if ((tn = (TagNode) o).tok.equals(htmlTag))
                    
                 // An opening-version (TC.OpeningTags, For Instance <DIV ...>) will ADD 1 to the count
                 // A closing-tag (For Instance: </DIV>) will SUBTRACT 1 from the count
        
                 i += tn.isClosing ? -1 : 1;
        
         return i;
        
      • depth

        public static java.util.Hashtable<java.lang.String,int[]> depth
                    (java.util.Vector<? super TagNode> html)
        
        This method will calculate the "Maximum" and "Minimum" depth for every HTML 5.0 Tag found on a page. The Max-Depth is the greatest number of opening tags of a particular element that are open at the same time, i.e. that have been encountered without yet being matched by a closing version of the same element. In the example below, the maximum "open-count" for the HTML 'divider' Element (<DIV>) is '2'. This is because a second <DIV> element is opened before the first is closed.

        HTML Elements:
         <DIV class="MySection"><H1>These are my ideas:</H1>
         <!-- Above is an outer divider, below is an inner divider -->
         <DIV class="MyNumbers">Here are the points:
         <!-- HTML Content Here -->
         </DIV></DIV>
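    Applying the depth computation to the HTML above gives a Maximum-Depth of 2 for <DIV>. In this sketch the page is written as a hypothetical List<String> of tag names ("div" opens, "/div" closes); text and comments are left out because they are never counted, and the class name is made up. Each value is an int[] laid out as {minimum, maximum, current count}, following the array layout documented in the Returns section.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: HYPOTHETICAL token encoding; values are {minDepth, maxDepth, currentCount}.
class AllTagsDepthSketch
{
    static Map<String, int[]> depth(List<String> tokens)
    {
        Map<String, int[]> ht = new LinkedHashMap<>();

        for (String tok : tokens)
        {
            boolean closing = tok.startsWith("/");
            String  name    = closing ? tok.substring(1) : tok;

            int[] arr = ht.computeIfAbsent(name, k -> new int[3]);  // starts as {0, 0, 0}
            arr[2] += closing ? -1 : 1;
            if (arr[2] < arr[0]) arr[0] = arr[2];  // new minimum
            if (arr[2] > arr[1]) arr[1] = arr[2];  // new maximum
        }
        return ht;
    }

    public static void main(String[] args)
    {
        // <DIV><H1></H1><DIV></DIV></DIV>, the tag structure of the example above
        List<String> page = List.of("div", "h1", "/h1", "div", "/div", "/div");
        for (Map.Entry<String, int[]> e : depth(page).entrySet())
            System.out.println(e.getKey() + " " + Arrays.toString(e.getValue()));
        // div [0, 2, 0]
        // h1 [0, 1, 0]
    }
}
```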
        


        Browser Validity:
        Generally, there are very few elements where the maximum depth should ever be greater than 1. For many standard elements such as the "Anchor Tag" (HTML '<A HREF=...>') having a maximum depth other than 1 would generally be thought of as "Invalid HTML."

        What to do about such occurrences shall be left to the programmer. Of course, there are elements that commonly reach a depth greater than 1, for instance: '<SPAN STYLE=...>' tags, <table> tags, and of course any number of nested <DIV> tags.
        On a page with nested tables, for example, the elements 'tr', 'td', and 'table' (among others) could all reach depths much higher than 1.

        'Count' Computation-Heuristic:
        This maximum and minimum depth count does not pay any attention to whether HTML open and close tags "enclose each-other" or are "interleaved." The mechanics of the for-loop which calculates the count should explain this computation clearly enough; it may be viewed in this method's highlighted source-code, below.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? super TagNode' means that both a Vector<TagNode> and a Vector<HTMLNode> are accepted by this parameter, and neither will cause an exception to be thrown.

        Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        Returns:
        The returned Hashtable will contain an integer-array for each HTML Element that was found on the page. Each of these arrays shall be of length 3.

        1. Minimum Depth: return_array[0]
        2. Maximum Depth: return_array[1]
        3. Total Count: return_array[2]


        REDUNDANCY NOTE: The third element of the returned array should be identical to the result produced by an invocation of method: Balance.checkTag(html, htmlTag);
        Throws:
        HTMLTokException - If any of the tags passed are not valid HTML tags.
        SingletonException - If this 'htmlTag' is a 'singleton' (Self-Closing) Tag, this exception will throw.
        Code:
        Exact Method Body:
         Hashtable<String, int[]> ht = new Hashtable<>();
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text, and not HTML Comments
         for (Object o : html) if (o instanceof TagNode) 
         {
             TagNode tn = (TagNode) o;
        
             // Don't keep a count on singleton tags.
             if (HTMLTags.isSingleton(tn.tok)) continue;
        
             int[] curMaxAndMinArr = ht.get(tn.tok);
        
             // If this is the first encounter of a particular HTML Element, create a MAX/MIN
             // integer array, and initialize it's values to zero.
        
             if (curMaxAndMinArr == null)
             {
                 curMaxAndMinArr = new int[3];
        
                 curMaxAndMinArr[0] = 0;     // Current Min Depth Count for Element "tn.tok" is zero
                 curMaxAndMinArr[1] = 0;     // Current Max Depth Count for Element "tn.tok" is zero
                 curMaxAndMinArr[2] = 0;     // Current Computed Depth Count for "tn.tok" is zero
        
                 ht.put(tn.tok, curMaxAndMinArr);
             }
        
             // curCount += tn.isClosing ? -1 : 1;
             //
             // An opening-version (TC.OpeningTags, For Instance <DIV ...>) will ADD 1 to the count
             // A closing-tag (For Instance: </DIV>) will SUBTRACT 1 from the count
        
             curMaxAndMinArr[2] += tn.isClosing ? -1 : 1;
        
             // If the current depth-count is a "New Minimum" (a new low! :), then save it in the
             // minimum pos of the output-array.
        
             if (curMaxAndMinArr[2] < curMaxAndMinArr[0]) curMaxAndMinArr[0] = curMaxAndMinArr[2];
        
             // If the current depth-count (for this tag) is a "New Maximum" (a new high), save it
             // to the max-pos of the output-array.
        
             if (curMaxAndMinArr[2] > curMaxAndMinArr[1]) curMaxAndMinArr[1] = curMaxAndMinArr[2];
         }
        
         return ht;
        
      • depth

        public static java.util.Hashtable<java.lang.String,int[]> depth
                    (java.util.Vector<? super TagNode> html,
                     java.lang.String... htmlTags)
        
        This method will calculate the "Maximum" and "Minimum" depth for every HTML Tag listed in the var-args String[] htmlTags parameter. The Max-Depth is the greatest number of opening tags of a particular element that are open at the same time, i.e. that have been encountered without yet being matched by a closing version of the same element. In the example below, the maximum 'open-count' for the HTML 'divider' Element (<DIV>) is '2'. This is because a second <DIV> element is opened before the first is closed.

        HTML Elements:
         <DIV class="MySection"><H1>These are my ideas:</H1>
         <!-- Above is an outer divider, below is an inner divider -->
         <DIV class="MyNumbers">Here are the points:
         <!-- HTML Content Here -->
         </DIV></DIV>
        


        Browser Validity:
        Generally, there are very few elements where the maximum depth should ever be greater than 1. For many standard elements such as the "Anchor Tag" (HTML '<A HREF=...>') having a maximum depth other than 1 would generally be thought of as "Invalid HTML."

        What to do about such occurrences shall be left to the programmer. Of course, there are elements that commonly reach a depth greater than 1, for instance: '<SPAN STYLE=...>' tags, <table> tags, and of course any number of nested <DIV> tags.
        On a page with nested tables, for example, the elements 'tr', 'td', and 'table' (among others) could all reach depths much higher than 1.

        'Count' Computation-Heuristic:
        This maximum and minimum depth count does not pay any attention to whether HTML open and close tags "enclose each-other" or are "interleaved." The mechanics of the for-loop which calculates the count should explain this computation clearly enough; it may be viewed in this method's highlighted source-code, below.

        Var-Args Addition:
        This method differs from the method with an identical name (defined above) in that it adds a String-VarArgs parameter that allows a user to decide which tags he would like counted and returned in this Hashtable, and which he would like to ignore.

        If one of the requested HTML-Tags from this String-VarArgs parameter is not actually present on the page, the returned Hashtable will still contain an int[]-Array for that tag, and all of the values in that array will be equal to zero.
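    A sketch of that behaviour, assuming the same kind of hypothetical List<String> token encoding, already lower-cased tag names, and a made-up class name: the map is pre-seeded with a zero-filled int[3] for every requested tag, so a requested tag that never appears still gets an entry, while tags that were not requested are skipped entirely.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: HYPOTHETICAL token encoding; values are {minDepth, maxDepth, currentCount}.
class RequestedDepthSketch
{
    static Map<String, int[]> depth(List<String> tokens, String... htmlTags)
    {
        Map<String, int[]> ht = new LinkedHashMap<>();
        for (String tag : htmlTags) ht.put(tag, new int[3]);  // pre-seeded with {0, 0, 0}

        for (String tok : tokens)
        {
            boolean closing = tok.startsWith("/");
            int[]   arr     = ht.get(closing ? tok.substring(1) : tok);

            if (arr == null) continue;  // tag was not requested: ignore it

            arr[2] += closing ? -1 : 1;
            arr[0] = Math.min(arr[0], arr[2]);
            arr[1] = Math.max(arr[1], arr[2]);
        }
        return ht;
    }

    public static void main(String[] args)
    {
        Map<String, int[]> r = depth(List.of("div", "/div", "p"), "div", "table");
        r.forEach((k, v) -> System.out.println(k + " " + Arrays.toString(v)));
        // div [0, 1, 0]
        // table [0, 0, 0]
    }
}
```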
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? super TagNode' means that both a Vector<TagNode> and a Vector<HTMLNode> are accepted by this parameter, and neither will cause an exception to be thrown.

        Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        Returns:
        The returned Hashtable will contain an integer-array for each HTML Element requested via the parameter 'htmlTags'. Each of these arrays shall be of length 3.

        1. Minimum Depth: return_array[0]
        2. Maximum Depth: return_array[1]
        3. Total Count: return_array[2]


        REDUNDANCY NOTE: The third element of the returned array should be identical to the result produced by an invocation of method: Balance.checkTag(html, htmlTag);
        Throws:
        HTMLTokException - If any of the tags passed are not valid HTML tags.
        SingletonException - If any of the tags passed to 'htmlTags' is a 'singleton' (Self-Closing) Tag, this exception will throw.
        Code:
        Exact Method Body:
         // Check that these are all valid HTML Tags, throw an exception if not.
         htmlTags = ARGCHECK.htmlTags(htmlTags);
        
         Hashtable<String, int[]> ht = new Hashtable<>();
        
         // Initialize the temporary hash-table.  This will be discarded at the end of the method,
         // and converted into a parallel array.  (Parallel to the input String... htmlTags array).
         // Also, check to make sure the user hasn't requested a count of Singleton HTML Elements.
        
         for (String htmlTag : htmlTags)
         {
             if (HTMLTags.isSingleton(htmlTag)) throw new SingletonException(
                 "One of the tags you have passed: [" + htmlTag + "] is a singleton-tag, " +
                 "and is only allowed opening versions of the tag."
             );
        
             // Insert an initialized array (init to zero) for this HTML Tag/Token
             int[] arr = new int[3];
        
             arr[0] = 0;     // Current Minimum Depth Count for HTML Element "tn.tok" is zero
             arr[1] = 0;     // Current Maximum Depth Count for HTML Element "tn.tok" is zero
             arr[2] = 0;     // Current Computed Depth Count is HTML Element "tn.tok" is zero
        
             ht.put(htmlTag, arr);
         }
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text,
         // and not HTML Comments
        
         for (Object o: html) if (o instanceof TagNode) 
         {
             TagNode tn = (TagNode) o;
        
             int[] curMaxAndMinArr = ht.get(tn.tok);
        
             // If this is null, we are attempting to perform the count on an HTML Element that
             // wasn't requested by the user with the var-args 'String... htmlTags' parameter.
             // The Hashtable was initialized to only have those tags. (see about 5 lines above 
             // where the Hashtable is initialized)
        
             if (curMaxAndMinArr == null) continue;
        
             // An opening-version (TC.OpeningTags, For Instance <DIV ...>) will ADD 1 to the count
             // A closing-tag (For Instance: </DIV>) will SUBTRACT 1 from the count
        
             curMaxAndMinArr[2] += tn.isClosing ? -1 : 1;
        
             // If the current depth-count is a "New Minimum" (a new low! :), then save it in the
             // minimum pos of the output-array.
        
             if (curMaxAndMinArr[2] < curMaxAndMinArr[0]) curMaxAndMinArr[0] = curMaxAndMinArr[2];
        
             // If the current depth-count (for this tag) is a "New Maximum" (a new high), save it
             // to the max-pos of the output-array.
        
             if (curMaxAndMinArr[2] > curMaxAndMinArr[1]) curMaxAndMinArr[1] = curMaxAndMinArr[2];
        
             // NOTE:    No need to update the hash-table, since this is an array - changing its
             //          values is already "reflected" into the Hashtable.
         }
        
         return ht;
        
      • depthInvalid

        public static java.util.Hashtable<java.lang.String,int[]> depthInvalid
                    (java.util.Hashtable<java.lang.String,int[]> ht)
        
        Filters a depth-report Hashtable, which holds a minimum depth, maximum depth and count for each HTML tag, down to the tags whose depth-counts may be invalid. Any HTML Tag that meets ALL of the following criteria is removed from the returned Hashtable:

        • Minimum Depth is '0' - i.e. a closing tag never precedes its opening tag.
        • Count is '0' - i.e. there is a 1-to-1 ratio of opening and closing tags for the particular HTML Element.


        NOTE: Each removed tag therefore has a 1:1 ratio of opening and closing versions, and there is no position in the vector where a closing tag comes before an opening tag that it could match.

        Cloned Input:
        This method clones the original input Hashtable, and removes the tags whose depth-calculations are valid, as described above. The original table is left unmodified, so the user may continue to perform other operations with it.
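        The filtering rule can be sketched in a few lines. This is a minimal, hypothetical sketch (DepthInvalidSketch is not a Torello.HTML class) that assumes a java.util.Map in place of Hashtable, assumes each int[] follows the {min-depth, max-depth, count} layout produced by the depth methods, and uses Collection.removeIf on a copy instead of an Enumeration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for the Hashtable used by Balance,
// and each int[] is assumed to hold {minimum depth, maximum depth, final count}.
final class DepthInvalidSketch
{
    static Map<String, int[]> depthInvalid(Map<String, int[]> depthReport)
    {
        // Work on a shallow copy, so the caller's report is left untouched.
        Map<String, int[]> ret = new HashMap<>(depthReport);

        // Drop every tag whose minimum depth is 0 (no closing tag precedes its
        // opening tag) AND whose final count is 0 (openings equal closings).
        ret.values().removeIf(arr -> arr[0] == 0 && arr[2] == 0);

        return ret;
    }
}
```

With a report of {div: {0, 3, 0}, a: {-1, 1, 0}, p: {0, 1, 1}}, the sketch keeps "a" (a closing </A> appeared first) and "p" (one <P> left unclosed), and drops the balanced "div".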
        Parameters:
        ht - This should be a Hashtable that was produced by a call to one of the two available depth(...) methods.
        Returns:
        This shall return a Hashtable of the HTML Tags that are potentially (but not guaranteed to be) invalid.
        Code:
        Exact Method Body:
         @SuppressWarnings("unchecked")
         Hashtable<String, int[]>    ret     = (Hashtable<String, int[]>) ht.clone();
         Enumeration<String>         keys    = ret.keys();
        
         // Keys are removed from 'ret' while its keys are being enumerated.  Hashtable's
         // key-Enumeration is not fail-fast, so this does not throw; calling ret.remove(key)
         // inside a for-each loop over ret.keySet() would instead risk a
         // ConcurrentModificationException.
        
         while (keys.hasMoreElements())
         {
             String  key = keys.nextElement();
             int[]   arr = ret.get(key);
        
             if ((arr[0] == 0) && (arr[2] == 0)) ret.remove(key);
         }
        
         return ret;
        
      • depthGreaterThanOne

        public static java.util.Hashtable<java.lang.String,int[]> depthGreaterThanOne
                    (java.util.Hashtable<java.lang.String,int[]> ht)
        
        Filters a depth-report Hashtable, which holds a minimum depth, maximum depth and count for each HTML tag, down to the tags that may have been nested. Any HTML Tag that meets the following criterion is removed from the returned Hashtable:

        • Maximum Depth is precisely '1' - i.e. each element of this tag is closed before a second one is opened.


        Cloned Input:
        This method clones the original input Hashtable, and removes the tags whose maximum-depth is exactly one. The original table is left unmodified, so the user may continue to perform other operations with it.
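        A minimal, hypothetical sketch of this filter follows (DepthGreaterThanOneSketch is not a library class). It assumes a java.util.Map in place of Hashtable and the same {min-depth, max-depth, count} layout for each int[]. Because only a maximum depth of exactly 1 is removed, a tag whose maximum depth stayed at 0 (for instance, one that only ever appeared as a stray closing tag) also remains in the result.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each int[] is assumed to hold {minimum depth, maximum depth, final count}.
final class DepthGreaterThanOneSketch
{
    static Map<String, int[]> depthGreaterThanOne(Map<String, int[]> depthReport)
    {
        // Shallow copy: the caller's report is not modified.
        Map<String, int[]> ret = new HashMap<>(depthReport);

        // Remove only the tags whose maximum depth is exactly 1.  Nested tags
        // (max > 1) remain, and so do tags whose max depth never rose above 0.
        ret.values().removeIf(arr -> arr[1] == 1);

        return ret;
    }
}
```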
        Parameters:
        ht - This should be a Hashtable that was produced by a call to one of the two available depth(...) methods.
        Returns:
        This shall return a Hashtable of the HTML Tags that are potentially (but not guaranteed to be) invalid.
        Code:
        Exact Method Body:
         @SuppressWarnings("unchecked")
         Hashtable<String, int[]>    ret     = (Hashtable<String, int[]>) ht.clone();
         Enumeration<String>         keys    = ret.keys();
        
         // Keys are removed from 'ret' while its keys are being enumerated.  Hashtable's
         // key-Enumeration is not fail-fast, so this does not throw; calling ret.remove(key)
         // inside a for-each loop over ret.keySet() would instead risk a
         // ConcurrentModificationException.
        
         while (keys.hasMoreElements())
         {
             String  key = keys.nextElement();
             int[]   arr = ret.get(key);
        
             if (arr[1] == 1) ret.remove(key);
         }
        
         return ret;
        
      • depthTag

        public static int[] depthTag(java.util.Vector<? super TagNode> html,
                                     java.lang.String htmlTag)
        This method will calculate the "Maximum" and "Minimum" depth for a particular HTML Tag. The Max-Depth is the largest number of opening tags of this element that are open at the same time, that is, opened before any matching closing version of the same Element is encountered. For instance: <DIV ...><DIV ..> Some Page</DIV></DIV> has a maximum depth of '2'. This means there is a point in the vectorized-html where two successive divider elements have been opened before even one has been closed. The Min-Depth falls below '0' only when a closing tag appears before an opening tag that it could match.

        Browser Validity:
        Generally, there are very few elements whose maximum depth should ever be greater than 1. For many standard elements such as the "Anchor Tag" (HTML '<A HREF=...>'), having a maximum depth greater than 1 would generally be thought of as "Invalid HTML."

        What to do about such occurrences shall be left to the programmer. Of course, there are elements that commonly reach a depth greater than 1, for instance: '<SPAN STYLE=...>' tags, <table> tags, and of course any number of nested <DIV> tags.
        In a page with nested tables, for example, the elements 'tr', 'td', 'table' (among others) could all reach depths much higher than 1.

        'Count' Computation-Heuristic:
        This maximum and minimum depth count does not pay any attention to whether the open and close tags of different HTML elements "enclose each-other" or are "interleaved": each element is counted on its own. The mechanics of the for-loop which calculates the count should explain this computation clearly enough; it may be viewed in this method's source code, below.
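        To make the depth definition and the interleaving remark concrete, here is a minimal sketch of the same counting loop, detached from Torello.HTML. It assumes a hypothetical TagStub class (holding only 'tok' and 'isClosing') in place of TagNode; anything that is not a TagStub, such as a text String, is skipped, much as the method skips text and comments.

```java
import java.util.List;

// Hypothetical stand-in for TagNode: only the two fields the count needs.
final class TagStub
{
    final String  tok;        // lower-case element name, e.g. "div"
    final boolean isClosing;  // true for </DIV>, false for <DIV ...>

    TagStub(String tok, boolean isClosing) { this.tok = tok; this.isClosing = isClosing; }
}

final class DepthTagSketch
{
    // Returns {minimum depth, maximum depth, final count} for one element name.
    static int[] depthTag(List<?> html, String htmlTag)
    {
        int depth = 0, max = 0, min = 0;

        for (Object o : html)
        {
            if (! (o instanceof TagStub)) continue;   // text and other nodes are skipped
            TagStub t = (TagStub) o;
            if (! t.tok.equals(htmlTag)) continue;    // other elements never affect this count

            depth += t.isClosing ? -1 : 1;            // opening adds 1, closing subtracts 1
            if (depth > max) max = depth;
            if (depth < min) min = depth;
        }

        return new int[] { min, max, depth };
    }
}
```

For <DIV><DIV> Some Page</DIV></DIV> the sketch returns {0, 2, 0}. For the interleaved sequence <B><I></B></I>, the "b" result is {0, 1, 0}, exactly as if the tags had been properly nested.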
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? super TagNode' means that a Vector<TagNode> or a Vector<HTMLNode> are both accepted by this parameter. Neither will cause an exception to be thrown.

        Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        htmlTag - This is the HTML element whose maximum and minimum depth-count needs to be computed.
        Returns:
        The returned integer-array shall be of length 3.

        1. Minimum Depth: return_array[0]
        2. Maximum Depth: return_array[1]
        3. Total Count: return_array[2]


        REDUNDANCY NOTE: The third element of the returned array should be identical to the result produced by an invocation of method: Balance.checkTag(html, htmlTag);
        Throws:
        HTMLTokException - If 'htmlTag' is not a valid HTML tag.
        SingletonException - If this 'htmlTag' is a 'singleton' (Self-Closing) Tag, this exception will throw.
        Code:
        Exact Method Body:
         // Check that this is a valid HTML Tag, throw an exception if invalid
         htmlTag = ARGCHECK.htmlTag(htmlTag);
        
         if (HTMLTags.isSingleton(htmlTag)) throw new SingletonException(
             "The tag you have passed: [" + htmlTag + "] is a singleton-tag, and is only allowed " +
             "opening versions of the tag."
         );
        
         TagNode tn;     int i = 0;      int max = 0;        int min = 0;
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text, and not HTML Comments
         for (Object o : html) if (o instanceof TagNode)
        
             if ((tn = (TagNode) o).tok.equals(htmlTag))
             {
                 // An opening-version (TC.OpeningTags, For Instance <DIV ...>) will ADD 1 to the count
                 // A closing-tag (For Instance: </DIV>) will SUBTRACT 1 from the count
        
                 i += tn.isClosing ? -1 : 1;
        
                 if (i > max) max = i;
                 if (i < min) min = i;
             }
        
         // Generate the output array, and return
         int[] ret = new int[3];
        
         ret[0] = min;
         ret[1] = max;
         ret[2] = i;
        
         return ret;
        
      • nonNestedCheck

        public static int[] nonNestedCheck(java.util.Vector<? super TagNode> html,
                                           java.lang.String htmlTag)
        This will find the (likely) places where the "non-nested HTML Elements" have become nested. For the purposes of finding mismatched elements - such as an unclosed "Italics" Element, or an "Extra" Italics Element - this method will find places where a new HTML Tag has opened before a previous one has been closed - or vice-versa (where there is an 'extra' closed-tag).

        Certainly, where "nesting" is normally acceptable (for instance, the HTML divider '<DIV>...</DIV>' construct), the results of this method have little meaning. Fortunately, for the vast majority of HTML Elements, such as <I>, <B> and <A>, nesting the tags is neither allowed nor encouraged.

        The following example illustrates the application. If a user has determined that there is an unclosed HTML italics element (<I>...</I>) somewhere on a page that contains numerous italics elements, this method can pinpoint the failure instantly. Note that the file being checked is a Java-Doc generated output HTML file; the documentation for this package received a copious amount of attention due to the sheer number of method-names and class-names used throughout.

        Example:
         String           fStr    = FileRW.loadFileToString("javadoc/Torello/HTML/TagNode.html");
         Vector<HTMLNode> v       = HTMLPage.getPageTokens(fStr, false);
         int[]            posArr  = Balance.nonNestedCheck(v, "i");
         
         // Below, the class 'Debug' is used to pretty-print the vectorized-html page.  Here, the
         // output will find the lone, non-closed, HTML italics <I> ... </I> tag-element, and output
         // it to the terminal-window.  The parameter '5' means the nearest 5 elements (in either
         // direction) are printed, in addition to the elements at the indices in the posArr.
         // Parameter 'true' implies that two curly braces are printed surrounding the matched node.
         
         System.out.println(Debug.print(v, posArr, 5, " Skip a few ", true, Debug::K));
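        The detection rule (flagging the second of two successive opening-tags, or of two successive closing-tags) can also be sketched independently of the library. This hypothetical version assumes a TagStub class with only 'tok' and 'isClosing' in place of TagNode, and a nullable Boolean in place of the TC enum; like the method body, it collects indices with an IntStream.Builder.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical stand-in for TagNode: only the two fields the check needs.
final class TagStub
{
    final String  tok;
    final boolean isClosing;

    TagStub(String tok, boolean isClosing) { this.tok = tok; this.isClosing = isClosing; }
}

final class NonNestedSketch
{
    // Returns the list-indices of every tag named 'htmlTag' that has the same
    // direction (opening or closing) as the previous tag with that name.
    static int[] nonNestedCheck(List<?> html, String htmlTag)
    {
        IntStream.Builder b = IntStream.builder();
        Boolean lastWasClosing = null;   // null until the first matching tag is seen

        for (int i = 0; i < html.size(); i++)
        {
            Object o = html.get(i);
            if (! (o instanceof TagStub)) continue;
            TagStub t = (TagStub) o;
            if (! t.tok.equals(htmlTag)) continue;

            // Two openings, or two closings, in a row: record the second one.
            if (lastWasClosing != null && lastWasClosing == t.isClosing) b.add(i);

            lastWasClosing = t.isClosing;
        }

        return b.build().toArray();
    }
}
```

On the token sequence <I>a</I><I>b<I>c</I> the sketch returns {5}, the index of the second of two successive <I> openings. A stray closing tag at the very start is not reported, since there is no earlier tag to compare it with; the method body behaves the same way, because its 'last' variable starts out as null.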
        
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? super TagNode' means that a Vector<TagNode> or a Vector<HTMLNode> are both accepted by this parameter. Neither will cause an exception to be thrown.

        Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        htmlTag - This is the HTML element whose maximum and minimum depth-counts were not 1 and 0, respectively. The locations where the depth becomes negative, or exceeds 1, are returned in the integer array. Put simply: when two opening-tags or two closing-tags of this element are found in succession, the index of the second tag is recorded into the output array.
        Returns:
        This returns an array of vectorized-html index-locations (index-pointers) at which an extra opening-tag, or an extra closing-tag, occurs. This facilitates finding tags that are not intended to be nested. If "tag-nesting" is expected (for example, with HTML divider 'DIV' elements), then the results returned by this method will not be useful.
        Throws:
        HTMLTokException - If 'htmlTag' is not a valid HTML tag.
        SingletonException - If this 'htmlTag' is a 'singleton' (Self-Closing) Tag, this exception will throw.
        See Also:
        FileRW.loadFileToString(String), HTMLPage.getPageTokens(CharSequence, boolean), Debug.print(Vector, int[], int, String, boolean, BiConsumer)
        Code:
        Exact Method Body:
         // Java Streams are an easier way to keep variable-length lists.  They use
         // "builders" - and this one is for an "IntStream"
        
         IntStream.Builder b = IntStream.builder();
        
         // Check that this is a valid HTML Tag, throw an exception if invalid
         htmlTag = ARGCHECK.htmlTag(htmlTag);
        
         if (HTMLTags.isSingleton(htmlTag)) throw new SingletonException(
             "The tag you have passed: [" + htmlTag + "] is a singleton-tag, and is only " +
             "allowed opening versions of the tag."
         );
        
         Object o;     TagNode tn;     int len = html.size();      TC last = null;
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text,
         // and not HTML Comments
        
         for (int i=0; i < len; i++)
        
             if ((o = html.elementAt(i)) instanceof TagNode) 
                 if ((tn = (TagNode) o).tok.equals(htmlTag))
                 {
                     if ((tn.isClosing)      && (last == TC.ClosingTags))    b.add(i);
                     if ((! tn.isClosing)    && (last == TC.OpeningTags))    b.add(i);
        
                     last = tn.isClosing ? TC.ClosingTags : TC.OpeningTags;
                 }
        
         return b.build().toArray();
        
      • locationsAndDepth

        public static Ret2<int[],int[]> locationsAndDepth
                    (java.util.Vector<? super TagNode> html,
                     java.lang.String htmlTag)
        
        For likely more than 95% of HTML tags, encountering nested instances of the same tag is highly unlikely. Unfortunately, for two or three of the most common tags in use, for instance <DIV> and <SPAN>, finding where a mismatch has occurred (tracking down an "unclosed divider") is an order of magnitude more difficult than finding an unclosed anchor '<A HREF...>'. This method returns two parallel arrays. The first array contains vector indices. The second array contains the depth (nesting level) of that tag at each of those positions. In this way, finding an unclosed divider is tantamount to finding the point after which closing-dividers evaluate to a depth of '1' (one) rather than '0' (zero).

        NOTE: This method can be highly useful for SPAN and DIV, while the "non-standard depth locations" method can be extremely useful for simple, non-nested tags such as Anchor, Paragraph, Section, etc. - HTML Elements that are mostly never nested.

        Example:
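         // Load an HTML File to a String.  (Java does not expand '~', so the home directory
         // is looked up explicitly.)
         String file = LFEC.loadFile(System.getProperty("user.home") + "/HTML/MyHTMLFile.html");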
         
         // Parse, and convert to vectorized-html
         Vector<HTMLNode> v = HTMLPage.getPageTokens(file, false);
         
         // Run this method
         Ret2<int[], int[]> r = Balance.locationsAndDepth(v, "div");
         
         // This array has vector-indices
         int[] posArr = (int[]) r.a;
         
         // This (parallel) array has the depth at that index.
         int[] depthArr = (int[]) r.b;
         
         for (int i=0; i < posArr.length; i++) System.out.println(
             "(" + posArr[i] + ", " + depthArr[i] + "):\t" +    // Prints the Vector-Index, and Depth
             C.BRED + v.elementAt(posArr[i]).str + C.RESET      // Prints the actual HTML divider.
         );
        

        The above code would produce a list of HTML Divider elements, along with their index in the Vector, and the exact depth (number of nested, open 'DIV' elements) at that location. This is usually helpful when trying to find unclosed HTML Tags.
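        The parallel-array idea can be sketched without the library as well. In this hypothetical sketch, TagStub stands in for TagNode and LocAndDepth stands in for Ret2<int[], int[]>; neither is a Torello.HTML class. The recorded depth is the value immediately after each matching tag, using the same +1 / -1 rule as the method body.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical stand-in for TagNode.
final class TagStub
{
    final String  tok;
    final boolean isClosing;

    TagStub(String tok, boolean isClosing) { this.tok = tok; this.isClosing = isClosing; }
}

// Hypothetical stand-in for Ret2<int[], int[]>: two parallel arrays.
final class LocAndDepth
{
    final int[] locations;   // list-indices where 'htmlTag' occurs
    final int[] depths;      // depth immediately after the tag at the same position

    LocAndDepth(int[] locations, int[] depths) { this.locations = locations; this.depths = depths; }
}

final class LocationsAndDepthSketch
{
    static LocAndDepth locationsAndDepth(List<?> html, String htmlTag)
    {
        IntStream.Builder locations = IntStream.builder();
        IntStream.Builder depths    = IntStream.builder();
        int depth = 0;

        for (int i = 0; i < html.size(); i++)
        {
            Object o = html.get(i);
            if (! (o instanceof TagStub)) continue;
            TagStub t = (TagStub) o;
            if (! t.tok.equals(htmlTag)) continue;

            depth += t.isClosing ? -1 : 1;   // opening adds 1, closing subtracts 1
            locations.add(i);
            depths.add(depth);
        }

        return new LocAndDepth(locations.build().toArray(), depths.build().toArray());
    }
}
```

For the sequence <DIV> x <DIV></DIV><DIV></DIV>, the sketch yields locations {0, 2, 3, 4, 5} and depths {1, 2, 1, 2, 1}. Every closing </DIV> leaves the depth at 1 rather than 0, which points back to the unclosed divider at index 0.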
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? super TagNode' means that a Vector<TagNode> or a Vector<HTMLNode> are both accepted by this parameter. Neither will cause an exception to be thrown.

        Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        htmlTag - This is the HTML element whose opening and closing tags are to be located, typically one that has an imbalanced OPEN-CLOSE ratio in the tree.
        Returns:
        Two parallel arrays, as follows:

        1. Ret2.a (int[])

          This shall be an integer array of Vector-indices where the HTML Element has been found.

        2. Ret2.b (int[])

          This shall contain the depth of 'htmlTag' at each Vector-index identified in the first array.
        Throws:
        HTMLTokException - If 'htmlTag' is not a valid HTML tag.
        SingletonException - If this 'htmlTag' is a 'singleton' (Self-Closing) Tag, this exception will throw.
        Code:
        Exact Method Body:
         // Java Streams are an easier way to keep variable-length lists.  They use
         // "builders" - and this one is for an "IntStream"
        
         IntStream.Builder locations         = IntStream.builder();
         IntStream.Builder depthAtLocation   = IntStream.builder();
        
         // Check that this is a valid HTML Tag, throw an exception if invalid
         htmlTag = ARGCHECK.htmlTag(htmlTag);
        
         if (HTMLTags.isSingleton(htmlTag)) throw new SingletonException(
             "The tag you have passed: [" + htmlTag + "] is a singleton-tag, and is only " +
             "allowed opening versions of the tag."
         );
        
         Object o;     TagNode tn;     int len = html.size();      int depth = 0;
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text, and
         // not HTML Comments
        
         for (int i=0; i < len; i++)

             if ((o = html.elementAt(i)) instanceof TagNode)
                 if ((tn = (TagNode) o).tok.equals(htmlTag))
                 {
                     depth += tn.isClosing ? -1 : 1;

                     locations.add(i);
                     depthAtLocation.add(depth);
                 }
        
         return new Ret2<int[], int[]>
             (locations.build().toArray(), depthAtLocation.build().toArray());
        
      • toStringDepth

        public static java.lang.String toStringDepth
                    (java.util.Hashtable<java.lang.String,int[]> depthReport)
        
        Converts a depth report to a String, for printing.
        Parameters:
        depthReport - This should be a Hashtable returned by any of the depth-methods.
        Returns:
        This shall return the report as a String.
        Code:
        Exact Method Body:
         StringBuilder sb = new StringBuilder();
        
         for (String htmlTag : depthReport.keySet())
         {
             int[] arr = depthReport.get(htmlTag);
        
             sb.append(
                 "HTML Element: [" + htmlTag + "]:\t" +
                 "Min-Depth: " + arr[0] + ",\tMax-Depth: " + arr[1] + ",\tCount: " + arr[2] + "\n"
             );
         }
        
         return sb.toString();
        
      • toStringBalance

        public static java.lang.String toStringBalance
                    (java.util.Hashtable<java.lang.String,java.lang.Integer> balanceCheckReport)
        
        Converts a balance report to a String, for printing.
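        The column alignment in the method body (the longest tag name plus the 17 characters of label text, plus 2 spaces of padding) can be sketched with String.format's left-justification flag. This hypothetical sketch (BalanceReportSketch is not a library class) omits the NOTE(...) annotation, whose output is not described here, and does not pad the count column. A TreeMap is used in the usage below so that row order is predictable, since a Hashtable's iteration order is not.

```java
import java.util.Map;

final class BalanceReportSketch
{
    // One line per tag; the label column is padded to the longest label plus two
    // spaces, so that every count starts in the same column.
    static String toStringBalance(Map<String, Integer> report)
    {
        int labelWidth = 0;

        for (String tag : report.keySet())
            labelWidth = Math.max(labelWidth, ("HTML Element: [" + tag + "]:").length());

        StringBuilder sb = new StringBuilder();

        for (Map.Entry<String, Integer> e : report.entrySet())
        {
            String label = "HTML Element: [" + e.getKey() + "]:";
            sb.append(String.format("%-" + (labelWidth + 2) + "s", label))
              .append(e.getValue())
              .append('\n');
        }

        return sb.toString();
    }
}
```

For a TreeMap holding {a=-1, div=2}, the longest label is "HTML Element: [div]:" (20 characters), so both counts start at column 22.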
        Parameters:
        balanceCheckReport - This should be a Hashtable returned by any of the balance-check methods.
        Returns:
        This shall return the report as a String.
        Code:
        Exact Method Body:
         StringBuilder   sb              = new StringBuilder();
         int             maxTagLen       = 0;
         int             maxValStrLen    = 0;
         int             maxAbsValStrLen = 0;
         int             val;
         String          valAsStr;
        
         // For good spacing purposes, we need the length of the longest of the tags.
         for (String htmlTag : balanceCheckReport.keySet())
             if (htmlTag.length() > maxTagLen)
                 maxTagLen = htmlTag.length();
        
         // 17 is the length of the string below, 2 is the amount of extra-space needed
         maxTagLen += 17 + 2; 
        
         for (int v : balanceCheckReport.values())
             if ((valAsStr = ("" + v)).length() > maxValStrLen)
                 maxValStrLen = valAsStr.length();
        
         for (int v : balanceCheckReport.values())
             if ((valAsStr = ("" + Math.abs(v))).length() > maxAbsValStrLen)
                 maxAbsValStrLen = valAsStr.length();
        
         for (String htmlTag : balanceCheckReport.keySet())
        
             sb.append(
                 StringParse.rightSpacePad("HTML Element: [" + htmlTag + "]:", maxTagLen) +
                 StringParse.rightSpacePad(
                     ("" + (val = balanceCheckReport.get(htmlTag).intValue())),
                     maxValStrLen
                 ) +
                 NOTE(val, htmlTag, maxAbsValStrLen) +
                 "\n"
             );
        
         return sb.toString();
        
      • toStringBalance

        public static java.lang.String toStringBalance
                    (int[] balanceCheckReport,
                     java.lang.String... htmlTags)
        
        Converts a balance report to a String, for printing.
        Parameters:
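        balanceCheckReport - This should be an int[] balance report, parallel to the 'htmlTags' array (one count per tag).
        htmlTags - The HTML tags that were used to generate 'balanceCheckReport', listed in the same order.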
        Returns:
        This shall return the report as a String.
        Throws:
        java.lang.IllegalArgumentException - This exception throws if the lengths of the two input arrays are not equal. It is imperative that the balance report being printed was created using the html-tags that are listed in the var-args parameter 'htmlTags'. If the two arrays are the same length, but the tags used to create the report are not the same ones passed to 'htmlTags', the logic cannot detect the difference, and no exception is thrown.
        Code:
        Exact Method Body:
         if (balanceCheckReport.length != htmlTags.length) throw new IllegalArgumentException(
             "The balance report that you are checking was not generated using the html token " +
             "list provided, they are different lengths.  balanceCheckReport.length: " +
             "[" + balanceCheckReport.length + "]\t htmlTags.length: [" + htmlTags.length + "]"
         );
        
         StringBuilder sb = new StringBuilder();
        
         for (int i=0; i < balanceCheckReport.length; i++)
             sb.append("HTML Element: [" + htmlTags[i] + "]:\t" + balanceCheckReport[i] + "\n");
        
         return sb.toString();