Package Torello.HTML

Class HTMLNode

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.CharSequence, java.lang.Cloneable
    Direct Known Subclasses:
    CommentNode, TagNode, TextNode

    public abstract class HTMLNode
    extends java.lang.Object
    implements java.lang.CharSequence, java.io.Serializable, java.lang.Cloneable
    This class is mostly a wrapper for class java.lang.String, and serves as the abstract parent of the three types of HTML elements offered by the Java HTML Library.

    This abstract class is the parent class of TagNode, TextNode, and CommentNode. It is an "immutable" class, meaning that the contents of an HTMLNode can never change once it has been constructed. In practice, roughly 80% of the nodes on a page never need to be changed, and keeping multiple instances of a BR, HR, H1, H2, H3, B, or I node is unnecessary. The method HTMLPage.getPageTokens() is responsible for creating HTMLNode instances. To change an HTMLNode, one must instantiate a new one.

    Light-Weight, Immutable Data-Class: All three of the standard classes that inherit from abstract class HTMLNode are very light-weight, and contain no internal state other than the String which represents the TagNode, CommentNode, or TextNode itself. Every HTMLNode is an immutable data-class, without exception. The TagNode class has quite a few 'getter' methods, but each and every 'setter' method returns a new instance of TagNode that contains a completely new String inside. The classes TextNode and CommentNode do not have any 'setter' methods at all.
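
    The 'setter returns a new instance' idea described above can be sketched as follows. This is a minimal illustration only: SketchTag and withTagName are hypothetical names invented for this example and are not part of the library's TagNode API.

```java
class ImmutableSetterDemo {

    // Hypothetical stand-in for an immutable node; this is NOT the library's TagNode.
    static final class SketchTag {
        public final String str;   // the only state the node carries

        SketchTag(String str) { this.str = str; }

        // 'Setter' in the immutable style: returns a NEW instance holding a new
        // String and leaves 'this' untouched.  The method name is an assumption.
        SketchTag withTagName(String newName) {
            int end = 1;   // scan past the current element name
            while (end < str.length()
                    && !Character.isWhitespace(str.charAt(end))
                    && str.charAt(end) != '>') {
                end++;
            }
            return new SketchTag("<" + newName + str.substring(end));
        }
    }

    public static void main(String[] args) {
        SketchTag div  = new SketchTag("<DIV CLASS=\"main\">");
        SketchTag span = div.withTagName("SPAN");
        System.out.println(div.str);    // the original node is unchanged
        System.out.println(span.str);   // the new node carries the new String
    }
}
```

    Because the original instance is never modified, it can be shared freely between lists and threads without defensive copying.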

    The three subclasses of abstract class HTMLNode are very light-weight. They contain a number of public methods, but do not have heavy internal state (either static or non-static). Below is a list of the internal fields that each of the three subclasses adds to those of the ancestor HTMLNode class:

    • class TagNode adds a field public final boolean isClosing, which tells a user whether this tag has a forward-slash immediately following the '<' (less-than symbol), that is, as the second character of the String (index 1). This is how one identifies a 'closing-version' of the element. For instance, '</DIV>' and '</SPAN>' would both have their public final boolean isClosing fields set to TRUE. There is also a public final String tok field added to instances of TagNode that identifies which HTML element the TagNode represents. For example, an HTML Element such as <A HREF="http://My.URL.com" TARGET=_blank> would have its String 'tok' field set to 'a'.

    • class TextNode: this subclass of HTMLNode does not add any internal state at all. It has exactly the same internally-maintained fields as its parent-class. The inherited public final String str field simply holds the text that this text-node represents.

    • class CommentNode: for searching purposes and ease of use, class CommentNode, the third and final class to inherit from HTMLNode, keeps one extra internal field, public final String body. This field largely duplicates the public final String str field inherited from the HTMLNode class. The subtle difference is that 'str' includes the HTML comment delimiters <!-- and -->, while the 'body' of the comment sometimes needs to be searched quickly on its own. The public final String body therefore leaves off the leading and trailing comment delimiters: <!-- and -->
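
    As a rough illustration of what the three extra fields hold, the sketch below derives comparable values from raw strings. The helper names (isClosing, tok, body) mirror the field names above, but the parsing logic is an assumption written for this example and is not the library's own implementation.

```java
import java.util.Locale;

class NodeFieldsSketch {

    // True when '/' immediately follows '<', i.e. it is the second character (index 1).
    static boolean isClosing(String tagStr) {
        return tagStr.length() > 1 && tagStr.charAt(1) == '/';
    }

    // Lower-case element name, e.g. "a" for <A HREF=...> (simplified parsing).
    static String tok(String tagStr) {
        int start = isClosing(tagStr) ? 2 : 1;
        int end = start;
        while (end < tagStr.length() && Character.isLetterOrDigit(tagStr.charAt(end))) {
            end++;
        }
        return tagStr.substring(start, end).toLowerCase(Locale.ROOT);
    }

    // Comment text without the leading "<!--" and the trailing "-->".
    // Assumes the argument really is a complete comment.
    static String body(String commentStr) {
        return commentStr.substring(4, commentStr.length() - 3);
    }

    public static void main(String[] args) {
        System.out.println(isClosing("</DIV>"));
        System.out.println(tok("<A HREF=\"http://My.URL.com\" TARGET=_blank>"));
        System.out.println("[" + body("<!-- note -->") + "]");
    }
}
```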

    Below is the inheritance diagram (with fields) of the three concrete-classes that extend the abstract class HTMLNode:

    [Figure: HTMLNode Inheritance Diagram]
    See Also:
    Serialized Form


    • Constructor Summary

      Constructors 
      Modifier Constructor
      protected HTMLNode​(String s)
    • Method Summary

       
      'instanceof' Operator-Replacement Methods
      Modifier and Type Method
      boolean isCommentNode()
      boolean isTagNode()
      boolean isTextNode()
       
      Methods: interface java.lang.CharSequence
      Modifier and Type Method
      char charAt​(int index)
      int length()
      CharSequence subSequence​(int start, int end)
      String toString()
       
      Methods: class java.lang.Object
      Modifier and Type Method
      abstract HTMLNode clone()
      boolean equals​(Object o)
      int hashCode()
       
      Simple Loop Optimization Methods
      Modifier and Type Method
      TagNode openTag()
      TagNode openTagPWA()
      • Methods inherited from class java.lang.Object

        finalize, getClass, notify, notifyAll, wait, wait, wait
      • Methods inherited from interface java.lang.CharSequence

        chars, codePoints
    • Field Detail

      • serialVersionUID

        public static final long serialVersionUID
        This fulfils the serialVersionUID requirement for all classes that implement Java's interface java.io.Serializable. Java's built-in serialization is easy to use, and can make saving program state while debugging much easier. It can also be used in place of more complicated systems such as Hibernate to store data.
        See Also:
        Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final long serialVersionUID = 1;
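
        A minimal sketch of saving and restoring a node with Java's built-in serialization is shown below. SketchNode is a hypothetical stand-in for a concrete HTMLNode subclass, and the save / load helper names are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializableNodeSketch {

    // Hypothetical stand-in for a serializable, immutable node.
    static final class SketchNode implements Serializable {
        private static final long serialVersionUID = 1;
        public final String str;
        SketchNode(String str) { this.str = str; }
    }

    // Writes any Serializable object into an in-memory byte array.
    static byte[] save(Serializable o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Reads back one object from bytes produced by save(...).
    static Object load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SketchNode copy = (SketchNode) load(save(new SketchNode("<BR>")));
        System.out.println(copy.str);
    }
}
```

        The same two helpers work with a FileOutputStream / FileInputStream pair when the state needs to survive between program runs.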
        
      • str

        public final java.lang.String str
        This is an immutable field that stores the complete contents of an HTML node. It holds either the full text of an HTML tag (for a TagNode), a run of the text that appears between tags on an HTML page (for a TextNode), or a complete comment including its delimiters (for a CommentNode).

        FOR INSTANCE:

        • A subclass of HTMLNode, TagNode, could contain the String "<SPAN STYLE="CSS INFO">" inside this str field.
        • Another subclass of HTMLNode, TextNode, could contain the String "This is a news-page from www.Gov.CN Chinese Government Portal." inside this str field.

        NOTE: Because all subclasses of HTMLNode are immutable, a programmer who wishes to change the contents of an HTML page must generally create new nodes rather than change these fields.
        Code:
        Exact Field Declaration Expression:
        public final String str;
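
        The NOTE above implies that editing a page means swapping nodes in a list rather than mutating them. A minimal sketch, assuming the page is held as a java.util.List; SketchText and replaceText are hypothetical names, not library API:

```java
import java.util.ArrayList;
import java.util.List;

class ReplaceNodeSketch {

    // Hypothetical stand-in for an immutable text node.
    static final class SketchText {
        public final String str;
        SketchText(String str) { this.str = str; }
    }

    // Puts a NEW node into every slot whose text equals oldText.
    // Returns how many nodes were replaced.
    static int replaceText(List<SketchText> page, String oldText, String newText) {
        int replaced = 0;
        for (int i = 0; i < page.size(); i++) {
            if (page.get(i).str.equals(oldText)) {
                page.set(i, new SketchText(newText));
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        List<SketchText> page = new ArrayList<>();
        page.add(new SketchText("Hello"));
        page.add(new SketchText("World"));
        replaceText(page, "World", "Reader");
        System.out.println(page.get(1).str);
    }
}
```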
        
    • Constructor Detail

      • HTMLNode

        protected HTMLNode​(java.lang.String s)
        Constructor that builds a new HTMLNode
        Parameters:
        s - A valid string of an HTML element.
    • Method Detail

      • hashCode

        public final int hashCode()
        Java's hash-code requirement.

        FINAL METHOD: This method is final, and cannot be modified by sub-classes.
        Overrides:
        hashCode in class java.lang.Object
        Returns:
        A hash-code that may be used when storing this node in a Java hash-based collection, such as a java.util.HashMap or java.util.HashSet.
        Code:
        Exact Method Body:
         return this.str.hashCode();
        
      • equals

        public final boolean equals​(java.lang.Object o)
        Java's public boolean equals(Object o) requirements.

        FINAL METHOD: This method is final, and cannot be modified by sub-classes.
        Overrides:
        equals in class java.lang.Object
        Parameters:
        o - This may be any Java Object, but only ones of 'this' type whose internal-values are identical will cause this method to return TRUE.
        Returns:
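        TRUE if 'o' is the same reference as 'this', or if 'o' is an instance of the same concrete class as 'this' and its 'str' field is equal to this node's 'str' field.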
        Code:
        Exact Method Body:
         return      (this == o)
                 || (    (o != null)
                     &&  (this.getClass().equals(o.getClass()))
                     &&  (((HTMLNode) o).str.equals(this.str)));
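
        The combined effect of hashCode() and equals() can be sketched with stand-in classes. Node, Text, and Comment below are hypothetical simplifications that copy only the two method bodies shown on this page.

```java
import java.util.HashSet;
import java.util.Set;

class EqualsHashSketch {

    // Hypothetical stand-in mirroring the two method bodies documented here.
    abstract static class Node {
        public final String str;
        Node(String s) { this.str = s; }

        @Override public final int hashCode() { return this.str.hashCode(); }

        @Override public final boolean equals(Object o) {
            return (this == o)
                || ((o != null)
                    && this.getClass().equals(o.getClass())
                    && ((Node) o).str.equals(this.str));
        }
    }

    static final class Text    extends Node { Text(String s)    { super(s); } }
    static final class Comment extends Node { Comment(String s) { super(s); } }

    public static void main(String[] args) {
        Set<Node> set = new HashSet<>();
        set.add(new Text("hello"));
        set.add(new Text("hello"));      // equal to the first: same class, same str
        set.add(new Comment("hello"));   // same hash-code, but a different class
        System.out.println(set.size());
    }
}
```

        Two nodes of different concrete classes may share a hash-code, which is permitted by the hashCode contract; equals() still tells them apart through the getClass() comparison.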
        
      • clone

        public abstract HTMLNode clone()
        Sub-classes of HTMLNode must be Cloneable.
        Overrides:
        clone in class java.lang.Object
        Returns:
        Must return an identical copy of 'this' node. The returned reference must not be the 'this' reference itself.
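
        One way a subclass could satisfy this contract is sketched below, using a covariant return type. Node and Text are hypothetical stand-ins; how the library's own subclasses implement clone() is not shown on this page.

```java
class CloneSketch {

    // Hypothetical abstract parent that forces subclasses to provide clone().
    abstract static class Node implements Cloneable {
        public final String str;
        Node(String s) { this.str = s; }
        @Override public abstract Node clone();
    }

    static final class Text extends Node {
        Text(String s) { super(s); }

        // Covariant return: callers holding a Text get a Text back, no cast needed.
        // A new object is built, so the result is never the 'this' reference.
        @Override public Text clone() { return new Text(this.str); }
    }

    public static void main(String[] args) {
        Text original = new Text("Some text");
        Text copy = original.clone();
        System.out.println(copy != original);
        System.out.println(copy.str.equals(original.str));
    }
}
```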
      • toString

        public final java.lang.String toString()
        Java's toString() requirement.

        FINAL METHOD: This method is final, and cannot be modified by sub-classes.
        Specified by:
        toString in interface java.lang.CharSequence
        Overrides:
        toString in class java.lang.Object
        Returns:
        A String-representation of this HTMLNode.
        Code:
        Exact Method Body:
         return this.str;
        
      • charAt

        public final char charAt​(int index)
        Returns the char value at the specified index of the field: public final String str. An index ranges from '0' (zero) to HTMLNode.str.length() - 1. The first char value of the sequence is at index zero, the next at index one, and so on, as for array indexing.

        NOTE: If the char value specified by the index is a surrogate, the surrogate value is returned.

        FINAL METHOD: This method is final, and cannot be modified by sub-classes.
        Specified by:
        charAt in interface java.lang.CharSequence
        Parameters:
        index - The index of the char value to be returned
        Returns:
        The specified char value
        Code:
        Exact Method Body:
         return str.charAt(index);
        
      • length

        public final int length()
        Returns the length of the field public final String str. The length is the number of 16-bit chars in the sequence.

        FINAL METHOD: This method is final, and cannot be modified by sub-classes.
        Specified by:
        length in interface java.lang.CharSequence
        Returns:
        the number of chars in this.str
        Code:
        Exact Method Body:
         return str.length();
        
      • subSequence

        public final java.lang.CharSequence subSequence​(int start,
                                                        int end)
        Returns a CharSequence that is a subsequence of the public final String str field of 'this' HTMLNode. The subsequence starts with the char value at index start and ends with the char value at index end - 1. The length (in chars) of the returned sequence is end - start, so if start == end then an empty sequence is returned.

        FINAL METHOD: This method is final, and cannot be modified by sub-classes.
        Specified by:
        subSequence in interface java.lang.CharSequence
        Parameters:
        start - The start index, inclusive
        end - The end index, exclusive
        Returns:
        The specified subsequence
        Code:
        Exact Method Body:
         return str.substring(start, end);
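
        Because the four CharSequence methods simply delegate to 'str', a node can be handed directly to any JDK API that accepts a CharSequence. The sketch below shows this with java.util.regex; Node is a hypothetical stand-in and hasHref is a helper name invented for this example.

```java
import java.util.regex.Pattern;

class CharSequenceSketch {

    // Hypothetical stand-in that delegates CharSequence to its 'str' field.
    static final class Node implements CharSequence {
        public final String str;
        Node(String s) { this.str = s; }

        @Override public char charAt(int index) { return str.charAt(index); }
        @Override public int length() { return str.length(); }
        @Override public CharSequence subSequence(int start, int end) {
            return str.substring(start, end);
        }
        @Override public String toString() { return str; }
    }

    private static final Pattern HREF =
        Pattern.compile("\\bhref\\s*=", Pattern.CASE_INSENSITIVE);

    // Pattern.matcher accepts any CharSequence, so no toString() call is needed.
    static boolean hasHref(CharSequence node) {
        return HREF.matcher(node).find();
    }

    public static void main(String[] args) {
        Node anchor = new Node("<A HREF=\"index.html\">");
        System.out.println(hasHref(anchor));
        System.out.println(new StringBuilder().append(anchor).append("Home").append("</A>"));
    }
}
```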
        
      • isCommentNode

        public boolean isCommentNode()
        This method will return TRUE for any instance of 'CommentNode'.

        The purpose of this method is to provide an efficient check for whether a reference of type 'HTMLNode' actually points to an instance of the subclass CommentNode. This is (marginally) more efficient than using the Java 'instanceof' operator.
        Returns:
        This (top-level inheritance-tree) method always returns FALSE. The '.java' file for class CommentNode overrides this method, and returns TRUE.
        See Also:
        CommentNode.isCommentNode()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass CommentNode, where it shall return
         // TRUE.  Neither class TextNode, nor class TagNode will over-ride this method.
        
         return false;
        
      • isTextNode

        public boolean isTextNode()
        This method will return TRUE for any instance of 'TextNode'.

        The purpose of this method is to provide an efficient check for whether a reference of type 'HTMLNode' actually points to an instance of the subclass TextNode. This is (marginally) more efficient than using the Java 'instanceof' operator.
        Returns:
        This (top-level inheritance-tree) method always returns FALSE. The '.java' file for class TextNode overrides this method, and returns TRUE.
        See Also:
        TextNode.isTextNode()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass TextNode, where it shall return
         // TRUE.  Neither class CommentNode, nor class TagNode will over-ride this method.
        
         return false;
        
      • isTagNode

        public boolean isTagNode()
        This method will return TRUE for any instance of 'TagNode'.

        The purpose of this method is to provide an efficient check for whether a reference of type 'HTMLNode' actually points to an instance of the subclass TagNode. This is (marginally) more efficient than using the Java 'instanceof' operator.
        Returns:
        This (top-level inheritance-tree) method always returns FALSE. The '.java' file for class TagNode overrides this method, and returns TRUE.
        See Also:
        TagNode.isTagNode()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass TagNode, where it shall return
         // TRUE.  Neither class TextNode, nor class CommentNode will over-ride this method.
        
         return false;
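
        The override pattern behind the three 'instanceof' replacement methods can be sketched as follows. The classes are hypothetical stand-ins: the parent returns false for every check, and each subclass overrides exactly one method. The count helper is an invented example of a loop that uses them.

```java
import java.util.Arrays;
import java.util.List;

class NodeKindSketch {

    // Parent: every check defaults to false.
    abstract static class Node {
        boolean isTagNode()     { return false; }
        boolean isTextNode()    { return false; }
        boolean isCommentNode() { return false; }
    }

    // Each subclass overrides only its own check.
    static final class Tag     extends Node { @Override boolean isTagNode()     { return true; } }
    static final class Text    extends Node { @Override boolean isTextNode()    { return true; } }
    static final class Comment extends Node { @Override boolean isCommentNode() { return true; } }

    // Returns { tagCount, textCount, commentCount } for a page of nodes.
    static int[] count(List<Node> page) {
        int[] counts = new int[3];
        for (Node n : page) {
            if (n.isTagNode())          counts[0]++;
            else if (n.isTextNode())    counts[1]++;
            else if (n.isCommentNode()) counts[2]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Node> page = Arrays.asList(new Tag(), new Text(), new Tag(), new Comment());
        System.out.println(Arrays.toString(count(page)));
    }
}
```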
        
      • openTagPWA

        public TagNode openTagPWA()
        PWA: Open Tag, 'Possibly With Attributes'. This method is offered as an optimization tool for quickly finding HTML tags that have attributes from inside a search-loop. Since this class, class HTMLNode, is the abstract parent of all three HTML element-types, and since TagNode is the only one of those three that overrides this method, the "optimization" achieved is that, by default, most concrete instances of this abstract parent class HTMLNode will immediately return null when queried by a search-loop, which makes the process of finding TagNodes with a particular attribute much more efficient.

        The purpose of this method is to quickly return a node that has been cast to an instance of TagNode, if it is, indeed, a TagNode and if it has an internal-String long-enough to possibly contain attributes (inner-tags).
        Returns:
        This method shall always return null, unless this method has been overridden by a sub-class. Only TagNode overrides this method, and if the particular instance of TagNode being tested contains an internal 'str' field that is long enough to have attributes (and its 'isClosing' field is FALSE), then and only then shall this return a non-null value.

        When the overridden TagNode sub-class returns a non-null result, that value will always be equal to 'this'.

        AGAIN: For overriding sub-class TagNode, when this method returns a non-null value (always 'this' instance), it will be because the internal-String is actually long enough to be worthy of being tested for attribute / inner-tag key-value pairs.
        See Also:
        TagNode.openTagPWA()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass TagNode.
         // For instances of inheriting class TextNode and CommentNode, this always returns null.
         // In 'TagNode' this method returns 'this' or null, based on the 'isClosing' field from
         // that class, and the length of the 'str' field from this class.
        
         return null;
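
        A search-loop built on this method might look like the sketch below. All classes are hypothetical stand-ins. The length threshold used in the override is an assumption chosen for illustration (the shortest string that could hold an attribute), not the library's actual rule, and the substring test for the attribute is deliberately naive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class OpenTagPwaSketch {

    abstract static class Node {
        public final String str;
        Node(String s) { this.str = s; }
        Tag openTagPWA() { return null; }   // default for every non-tag node
    }

    static final class Text extends Node { Text(String s) { super(s); } }

    static final class Tag extends Node {
        final String tok;
        final boolean isClosing;
        Tag(String s, String tok) { super(s); this.tok = tok; this.isClosing = s.startsWith("</"); }

        // ASSUMED rule: '<' + tok + ' ' + one char + '>' is the shortest tag with an attribute.
        @Override Tag openTagPWA() {
            return (!isClosing && str.length() >= tok.length() + 4) ? this : null;
        }
    }

    // Returns the first opening tag whose text contains "attr=" (naive check), or null.
    static Tag firstWithAttribute(List<Node> page, String attr) {
        String needle = attr.toLowerCase(Locale.ROOT) + "=";
        for (Node n : page) {
            Tag t = n.openTagPWA();           // null for text, closing and too-short tags
            if (t != null && t.str.toLowerCase(Locale.ROOT).contains(needle)) return t;
        }
        return null;
    }

    public static void main(String[] args) {
        List<Node> page = Arrays.asList(
            new Text("hi"), new Tag("<BR>", "br"),
            new Tag("</DIV>", "div"), new Tag("<DIV CLASS=\"x\">", "div"));
        System.out.println(firstWithAttribute(page, "class").str);
    }
}
```

        The loop body needs only a single null check per node; no 'instanceof' test or cast appears at the call site.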
        
      • openTag

        public TagNode openTag()
        This method is offered as an optimization tool for quickly finding HTML tags that are opening tags. Since this class, class HTMLNode, is the abstract parent of all three HTML element-types, and since TagNode is the only one of those three that overrides this method, the "optimization" achieved is that, by default, most concrete instances of this abstract parent class HTMLNode will immediately return null when queried by a search-loop, which makes the process of finding opening TagNodes more efficient.
        Returns:
        This method shall always return null, unless this method has been overridden by a sub-class. Only TagNode overrides this method, and if the particular instance of TagNode being tested has a FALSE value assigned to its 'isClosing' field, then and only then shall this return a non-null value.

        When the overridden TagNode sub-class returns a non-null result, that value will always be equal to 'this'.
        See Also:
        TagNode.openTag()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass TagNode.
         // For instances of inheriting class TextNode and CommentNode, this always returns null.
         // In 'TagNode' this method returns 'this' or null, based on that class' 'isClosing' field.
        
         return null;
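
        As a usage illustration, the sketch below collects the element names of all opening tags on a page. The classes are hypothetical stand-ins, and openingTagNames is a helper invented for this example; only the 'return this unless isClosing' behavior is taken from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OpenTagSketch {

    abstract static class Node {
        Tag openTag() { return null; }   // default for text and comment nodes
    }

    static final class Text extends Node { }

    static final class Tag extends Node {
        final String tok;
        final boolean isClosing;
        Tag(String tok, boolean isClosing) { this.tok = tok; this.isClosing = isClosing; }

        // Non-null (always 'this') only for opening tags.
        @Override Tag openTag() { return isClosing ? null : this; }
    }

    // Collects the 'tok' of every opening tag, in page order.
    static List<String> openingTagNames(List<Node> page) {
        List<String> names = new ArrayList<>();
        for (Node n : page) {
            Tag t = n.openTag();
            if (t != null) names.add(t.tok);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Node> page = Arrays.asList(
            new Tag("div", false), new Text(), new Tag("b", false),
            new Tag("b", true), new Tag("div", true));
        System.out.println(openingTagNames(page));
    }
}
```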