Package Torello.HTML

Class Util.Count

  • Enclosing class:
    Util

    public static class Util.Count
    extends java.lang.Object



    Stateless Class:
    This class contains no program-state and cannot be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files: they have no accessible constructors and no non-static member fields. The concept is very similar to the Java-Bean's @Stateless Annotation.

    • 1 Constructor(s), 1 declared private, zero-argument constructor
    • 15 Method(s), 15 declared static
    • 0 Field(s)


    • Method Summary

       
      Count CommentNode instances
      Modifier and Type Method
      static int commentNodes​(Vector<HTMLNode> page)
      static int commentNodes​(Vector<HTMLNode> page, int sPos, int ePos)
      static int commentNodes​(Vector<HTMLNode> page, DotPair dp)
       
      Count TagNode instances
      Modifier and Type Method
      static int tagNodes​(Vector<HTMLNode> page)
      static int tagNodes​(Vector<HTMLNode> page, int sPos, int ePos)
      static int tagNodes​(Vector<HTMLNode> page, DotPair dp)
       
      Count TextNode instances
      Modifier and Type Method
      static int textNodes​(Vector<HTMLNode> page)
      static int textNodes​(Vector<HTMLNode> page, int sPos, int ePos)
      static int textNodes​(Vector<HTMLNode> page, DotPair dp)
       
      Count all New-Lines
      Modifier and Type Method
      static int newLines​(Vector<? extends HTMLNode> html)
      static int newLines​(Vector<? extends HTMLNode> html, int sPos, int ePos)
      static int newLines​(Vector<? extends HTMLNode> html, DotPair dp)
       
      Count TagNode Tokens
      Modifier and Type Method
      static Ret2<Hashtable<String, Integer>, Hashtable<String, Integer>> tagNodesToTable(Vector<HTMLNode> page)
      static Ret2<Hashtable<String, Integer>, Hashtable<String, Integer>> tagNodesToTable(Vector<HTMLNode> page, int sPos, int ePos)
      static Ret2<Hashtable<String, Integer>, Hashtable<String, Integer>> tagNodesToTable(Vector<HTMLNode> page, DotPair dp)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • textNodes

        public static int textNodes​(java.util.Vector<HTMLNode> page,
                                    int sPos,
                                    int ePos)
        Counts the number of TextNode's in a Vector<HTMLNode> between the demarcated array / Vector positions, 'sPos' and 'ePos'.
        Parameters:
        page - Any HTML page.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of TextNode's in the Vector between the demarcated indices.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        Code:
        Exact Method Body:
         int counter = 0;
         LV  l       = new LV(page, sPos, ePos);
        
         // Iterates the entire page between sPos and ePos, incrementing the count for every
         // instance of text-node.
        
         for (int i=l.start; i < l.end; i++) if (page.elementAt(i).isTextNode()) counter++;
        
         return counter;
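
The 'sPos' / 'ePos' rules listed under "Throws" can be sketched without the library. The class below is only an illustration under stated assumptions: `RangeCountSketch`, `resolveEnd` and `count` are hypothetical names, a plain `List<String>` stands in for `Vector<HTMLNode>`, and the checks simply follow the documented rules rather than reproducing the actual `LV` class.

```java
import java.util.List;
import java.util.function.Predicate;

final class RangeCountSketch
{
    // Hypothetical helper: validates 'sPos' and resolves 'ePos' using the rules
    // listed in the "Throws" section above.  This is NOT the library's LV class.
    static int resolveEnd(int size, int sPos, int ePos)
    {
        if (sPos < 0 || sPos >= size)
            throw new IndexOutOfBoundsException("sPos out of range: " + sPos);

        // A negative 'ePos' is first reset to the size of the list.
        if (ePos < 0) ePos = size;

        if (ePos == 0 || ePos > size)
            throw new IndexOutOfBoundsException("ePos out of range: " + ePos);

        if (sPos > ePos)
            throw new IndexOutOfBoundsException("sPos is larger than ePos");

        return ePos;
    }

    // Counts the elements in [sPos, ePos) that satisfy the predicate.
    static <T> int count(List<T> list, int sPos, int ePos, Predicate<T> test)
    {
        int end     = resolveEnd(list.size(), sPos, ePos);
        int counter = 0;

        for (int i = sPos; i < end; i++) if (test.test(list.get(i))) counter++;

        return counter;
    }

    public static void main(String[] args)
    {
        // Assumption for this sketch: a "text node" is any string not starting with '<'.
        List<String>      nodes  = List.of("<p>", "Hello", "</p>", "<!-- note -->", "World");
        Predicate<String> isText = s -> !s.startsWith("<");

        System.out.println(count(nodes, 0, -1, isText));  // prints 2 (whole list)
        System.out.println(count(nodes, 0, 3, isText));   // prints 1 (indices 0, 1, 2)
    }
}
```

Because 'sPos' is inclusive and 'ePos' is exclusive, the second call visits indices 0 through 2 only, so "World" at index 4 is not counted.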
        
      • commentNodes

        public static int commentNodes​(java.util.Vector<HTMLNode> page,
                                       int sPos,
                                       int ePos)
        Counts the number of CommentNode's in a Vector<HTMLNode> between the demarcated array / Vector positions.
        Parameters:
        page - Any HTML page.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of CommentNode's in the Vector between the demarcated indices.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        Code:
        Exact Method Body:
         int counter = 0;
         LV  l       = new LV(page, sPos, ePos);
        
         // Iterates the entire page between sPos and ePos, incrementing the count for every
         // instance of comment-node.
        
         for (int i=l.start; i < l.end; i++)  if (page.elementAt(i).isCommentNode()) counter++;
        
         return counter;
        
      • tagNodes

        public static int tagNodes​(java.util.Vector<HTMLNode> page,
                                   int sPos,
                                   int ePos)
        Counts the number of TagNode's in a Vector<HTMLNode> between the demarcated array / Vector positions.
        Parameters:
        page - Any HTML page.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of TagNode's in the Vector between the demarcated indices.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        Code:
        Exact Method Body:
         int counter = 0;
         LV  l       = new LV(page, sPos, ePos);
        
         // Iterates the entire page between sPos and ePos, incrementing the count for every
         // instance of TagNode.
        
         for (int i=l.start; i < l.end; i++) if (page.elementAt(i).isTagNode()) counter++;
        
         return counter;
        
      • tagNodesToTable

        public static Ret2<java.util.Hashtable<java.lang.String,​java.lang.Integer>,​java.util.Hashtable<java.lang.String,​java.lang.Integer>> tagNodesToTable​
                    (java.util.Vector<HTMLNode> page,
                     int sPos,
                     int ePos)
        
        For each tag in HTML-5 (according to class HTMLTags), this method counts the number of instances of each TagNode contained by a Vector<HTMLNode>. The count is performed on nodes between the parameter-provided array-indices, and the results are placed into two Hashtable's.
        Parameters:
        page - Any HTML page.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The returned Ret2 instance contains the following data:

        • ret2.a:

          A java.util.Hashtable that contains one entry for each HTML-Tag present within the page's demarcated array-indices - 'sPos' and 'ePos'.

          The keys in this table are Java String's that contain a Lower-Case Tag-Token (such as: "div", "p", "span", etc...). The values in this table contain a count on the number of Open-Tags that were identified within the page.

        • ret2.b:

          A java.util.Hashtable with counts for each and every "Closed Tag" on the page, all in an identical manner to that which was described, above, for ret2.a - except the counts in this table are for Closed-Tag's rather than Open-Tag's - </div> tags, rather than <DIV ...> tags.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        Code:
        Exact Method Body:
         LV      l   = new LV(page, sPos, ePos);
         TagNode tn  = null;
        
         Hashtable<String, Integer> openTags     = new Hashtable<>();
         Hashtable<String, Integer> closedTags   = new Hashtable<>();
        
         // Iterates the entire page between sPos and ePos, incrementing the count for every
         // instance of TagNode.
        
         for (int i=l.start; i < l.end; i++)
         {
             if ((tn = page.elementAt(i).ifTagNode()) == null) continue;
        
             Hashtable<String, Integer>  ht      = tn.isClosing ? closedTags : openTags;
             Integer                     count   = ht.get(tn.tok);
        
             if (count == null)  count = 1;
             else                count = count + 1;
        
             ht.put(tn.tok, count);
         }
        
         return new Ret2<>(openTags, closedTags);
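
The open-tag / closed-tag tally described above can be illustrated in isolation. This is a minimal sketch under stated assumptions: `TagTableSketch`, `Tag` and `Tables` are hypothetical stand-ins (for `TagNode` and `Ret2` respectively), and `Hashtable.merge` is used here in place of the explicit null-check shown in the method body.

```java
import java.util.Hashtable;
import java.util.List;

final class TagTableSketch
{
    // Hypothetical stand-in for a tag node: a lower-case token plus a closing flag.
    static final class Tag
    {
        final String  tok;
        final boolean isClosing;

        Tag(String tok, boolean isClosing) { this.tok = tok; this.isClosing = isClosing; }
    }

    // Hypothetical stand-in for the two-table return value.
    static final class Tables
    {
        final Hashtable<String, Integer> open   = new Hashtable<>();
        final Hashtable<String, Integer> closed = new Hashtable<>();
    }

    // Tallies each token into the 'open' or 'closed' table, depending on the flag.
    static Tables count(List<Tag> tags)
    {
        Tables t = new Tables();

        // merge(...) stores 1 for a token's first occurrence, and adds 1 afterwards.
        for (Tag tag : tags)
            (tag.isClosing ? t.closed : t.open).merge(tag.tok, 1, Integer::sum);

        return t;
    }

    public static void main(String[] args)
    {
        List<Tag> tags = List.of(
            new Tag("div", false), new Tag("p", false), new Tag("p", true),
            new Tag("p", false),   new Tag("div", true)
        );

        Tables t = count(tags);

        System.out.println(t.open.get("p"));     // prints 2
        System.out.println(t.closed.get("p"));   // prints 1
        System.out.println(t.open.get("span"));  // prints null (token never seen)
    }
}
```

As in the method body above, a token that never occurs has no entry at all, so a lookup returns null rather than zero.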
        
      • newLines

        public static int newLines​(java.util.Vector<? extends HTMLNode> html,
                                   int sPos,
                                   int ePos)
        Counts the number of new-line symbols present on the partial HTML page. The count is the sum, over every HTMLNode.str in the range, of the matches for the standard new-line symbols: \r\n, \r, \n. UNIX, MSFT, Apple, etc. forms of line-termination are therefore all treated equally.
        Parameters:
        html - This may be any Vectorized-HTML Web-Page (or sub-page).

        The Variable-Type Wild-Card Expression '? extends HTMLNode' means that a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> will all be accepted by this parameter.

        These 'sub-type' Vectors are often returned as search results from the classes in the 'NodeSearch' package.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter.

        This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of new-line characters in all of the HTMLNode's that occur between vectorized-page positions 'sPos' and 'ePos'.

        NOTE: The regular-expression used here 'NEWLINEP' is as follows:
         private static final Pattern NEWLINEP = Pattern.compile("\\r\\n|\\r|\\n");
        
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        StringParse.NEWLINEP
        Code:
        Exact Method Body:
         int newLineCount    = 0;
         LV  l               = new LV(html, sPos, ePos);
        
         for (int i=l.start; i < l.end; i++)
        
             // Uses the Torello.Java.StringParse "New Line RegEx"
             for (   Matcher m = StringParse.NEWLINEP.matcher(html.elementAt(i).str);
                     m.find();
                     newLineCount++);
        
         return newLineCount;
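
To show how the regular expression treats the different line-endings, here is a small self-contained sketch. It assumes a plain `List<String>` in place of the `HTMLNode.str` fields, and `NewLineCountSketch` and `countNewLines` are hypothetical names; only the pattern itself is copied from the note above.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NewLineCountSketch
{
    // Same expression as quoted above.  "\r\n" is listed first, so a Windows
    // line-ending is matched once, rather than as a '\r' followed by a '\n'.
    private static final Pattern NEWLINEP = Pattern.compile("\\r\\n|\\r|\\n");

    // Sums the new-line matches found in every string of the list.
    static int countNewLines(List<String> strs)
    {
        int count = 0;

        for (String s : strs)
        {
            Matcher m = NEWLINEP.matcher(s);
            while (m.find()) count++;
        }

        return count;
    }

    public static void main(String[] args)
    {
        List<String> strs = List.of("<p>", "line one\r\nline two\n", "</p>", "\r\r\n");

        // Second string: "\r\n" and "\n" (2 matches).  Last string: "\r" then "\r\n" (2 matches).
        System.out.println(countNewLines(strs));  // prints 4
    }
}
```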