Package Torello.HTML

Class HTMLNode

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.CharSequence, java.lang.Cloneable
    Direct Known Subclasses:
    CommentNode, TagNode, TextNode

    public abstract class HTMLNode
    extends java.lang.Object
    implements java.lang.CharSequence, java.io.Serializable, java.lang.Cloneable
    PRIMARY ANCESTOR DATA-CLASS:
    This class is the Ancestor of the Primary Java-HTML Data-Classes in this Package: TagNode, TextNode and CommentNode


    Inheritance Tree Diagram:
    Below is the inheritance diagram (with fields) of the three concrete-classes that extend the abstract class HTMLNode:

    [Image: HTMLNode Inheritance Diagram]
    This class is mostly a wrapper for class java.lang.String, and serves as the abstract parent of the three types of HTML elements offered by the Java HTML Library.

    This abstract class is an "immutable" class, meaning that the contents of an HTMLNode can never change. Roughly 80% of HTMLNode instances would never need to change anyway, since they don't use Attributes. In those cases, even having multiple instances of such tags (for instance: BR, HR, H1, H2, H3, B, or I) is unnecessary.

    When parsing HTML Elements that are 'Element-Only' (without any Attributes / Inner-Tags), the parser will return singleton-instances of the class TagNode to avoid generating extra-amounts of references in Java's Memory Heap.

    This works out fine since these nodes are immutable, and therefore reusing the same node references in different Vectors won't have any side-effects. The class HTMLPage has a suite of methods that take care of instantiating new HTMLNodes.
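
    The sharing described above can be sketched as follows. This is only an illustration under stated assumptions: the class 'Node', the method 'shared' and the cache are hypothetical stand-ins invented for this example, and are not the parser's actual mechanism.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

class SharedNodeSketch
{
    // Hypothetical stand-in for an attribute-free TagNode: an immutable String wrapper.
    static final class Node
    {
        final String str;
        Node(String str) { this.str = str; }
    }

    // One shared instance per distinct tag String (illustrative cache, not the library's).
    private static final Map<String, Node> CACHE = new HashMap<>();

    static Node shared(String html)
    { return CACHE.computeIfAbsent(html, Node::new); }

    public static void main(String[] args)
    {
        Vector<Node> pageA = new Vector<>();
        Vector<Node> pageB = new Vector<>();

        pageA.add(shared("<BR>"));
        pageB.add(shared("<BR>"));

        // Both vectors hold the very same object.  Because 'str' is final,
        // neither page can observe a change made through the other.
        System.out.println(pageA.get(0) == pageB.get(0));
    }
}
```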


    Light-Weight, Immutable Data-Class
    All three of the standard classes that inherit from the abstract class HTMLNode are very light-weight, and do not contain any internal state other than the internal String which represents the TagNode, CommentNode, or TextNode itself.

    When it is said that 100% of HTMLNodes are immutable data-classes, it is similar to saying that these classes are little more than 'extension-classes' for the Java class java.lang.String - which also happens to be immutable.

    The flagship data-class of this JAR-Library, TagNode, has a number of 'getter' methods. However, each and every one of its 'setter' methods actually returns a new instance of TagNode. This is quite similar to how Java handles changing or updating the contents of a String.
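
    A minimal sketch of this "setter returns a new instance" pattern, shown next to the String behavior it mirrors. The class 'Node' and the method name 'withStr' are hypothetical, invented for this example; they are not TagNode's actual API.

```java
class WitherSketch
{
    // Hypothetical stand-in for an immutable node; 'withStr' is an invented name.
    static final class Node
    {
        final String str;
        Node(String str) { this.str = str; }

        // A "setter" on an immutable class builds and returns a new object.
        Node withStr(String newStr) { return new Node(newStr); }
    }

    public static void main(String[] args)
    {
        String s1 = "<div>";
        String s2 = s1.toUpperCase();   // s1 is untouched; s2 is a new String
        System.out.println(s1 + " " + s2);

        Node n1 = new Node("<DIV>");
        Node n2 = n1.withStr("<DIV CLASS=main>");   // n1 is untouched; n2 is a new Node
        System.out.println(n1.str + " " + n2.str);
    }
}
```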

    The classes TextNode and CommentNode do not have any 'setter' methods at all. Also, all three concrete sub-classes of HTMLNode have a constructor that accepts a simple java.lang.String.
    See Also:
    TagNode, TextNode, CommentNode, Serialized Form


    • Field Detail

      • serialVersionUID

        public static final long serialVersionUID
        This fulfils the SerialVersion UID requirement for all classes that implement Java's interface java.io.Serializable. Using the Serializable implementation offered by Java is very easy, and can make saving program state when debugging a lot easier. It can also be used in place of more complicated systems like Hibernate to store data.
        See Also:
        Constant Field Values
        Code:
        Exact Field Declaration Expression:
         public static final long serialVersionUID = 1;
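
        As a rough illustration of saving and restoring state with standard Java serialization, the sketch below round-trips an object through a byte array. The class 'Node' and the helper 'roundTrip' are hypothetical stand-ins for this example only; any Serializable class, including the real node classes, would be handled by the same java.io calls.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializeSketch
{
    // Hypothetical stand-in: a Serializable, immutable String wrapper.
    static final class Node implements Serializable
    {
        public static final long serialVersionUID = 1;
        final String str;
        Node(String str) { this.str = str; }
    }

    // Writes the node to a byte array, then reads a copy back from it.
    static Node roundTrip(Node n)
    {
        try
        {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();

            try (ObjectOutputStream oos = new ObjectOutputStream(baos))
            { oos.writeObject(n); }

            try (ObjectInputStream ois = new ObjectInputStream
                (new ByteArrayInputStream(baos.toByteArray())))
            { return (Node) ois.readObject(); }
        }
        catch (IOException | ClassNotFoundException e)
        { throw new IllegalStateException(e); }
    }

    public static void main(String[] args)
    {
        Node copy = roundTrip(new Node("<HR>"));
        System.out.println(copy.str);
    }
}
```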
        
      • str

        public final java.lang.String str
        This is an immutable field. It stores the complete contents of an HTML node: either the full text of an HTML tag (for a TagNode), the raw text found between tags on an HTML page (for a TextNode), or the text of an HTML comment (for a CommentNode).

        FOR INSTANCE:

        • A subclass of HTMLNode - TagNode - could contain the String "<SPAN STYLE="CSS INFO">" inside this str field.
        • Another sub-class of HTMLNode - TextNode - could contain the String "This is a news-page from www.Gov.CN Chinese Government Portal." inside this str field.

        NOTE: Because sub-classes of HTMLNode are all immutable, a programmer who wishes to change the contents of an HTML page is generally required to create new nodes, rather than changing these fields.
    • Constructor Detail

      • HTMLNode

        protected HTMLNode​(java.lang.String s)
        Constructor that builds a new HTMLNode
        Parameters:
        s - A valid string of an HTML element.
    • Method Detail

      • hashCode

        public final int hashCode()
        Java's hash-code requirement.

        Final Method:
        This method is final, and cannot be modified by sub-classes.
        Overrides:
        hashCode in class java.lang.Object
        Returns:
        A hash-code that may be used when storing this node in a Java hash-based collection, such as a HashSet or HashMap.
        Code:
        Exact Method Body:
         return this.str.hashCode();
        
      • equals

        public final boolean equals​(java.lang.Object o)
        Java's public boolean equals(Object o) requirements.

        Final Method:
        This method is final, and cannot be modified by sub-classes.
        Overrides:
        equals in class java.lang.Object
        Parameters:
        o - This may be any Java Object, but only an object of the same concrete class as 'this' whose str field is identical will cause this method to return TRUE.
        Returns:
        TRUE if 'o' is an HTMLNode of the same class as 'this', and its str field equals this.str.
        Code:
        Exact Method Body:
         if (o == null) return false;
         if (o == this) return true;
        
         if (! this.getClass().equals(o.getClass())) return false;
        
         return ((HTMLNode) o).str.equals(this.str);
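
        The consequences of the two method bodies above can be seen in a small sketch. The classes 'Node', 'Text' and 'Comment' are hypothetical stand-ins that copy only the hashCode() and equals(Object) logic shown on this page; they are not the library's classes.

```java
import java.util.HashSet;
import java.util.Set;

class EqualsSketch
{
    // Hypothetical stand-in reproducing the hashCode / equals bodies shown above.
    static abstract class Node
    {
        final String str;
        Node(String s) { this.str = s; }

        @Override public final int hashCode() { return this.str.hashCode(); }

        @Override public final boolean equals(Object o)
        {
            if (o == null) return false;
            if (o == this) return true;
            if (! this.getClass().equals(o.getClass())) return false;
            return ((Node) o).str.equals(this.str);
        }
    }

    static final class Text extends Node { Text(String s) { super(s); } }
    static final class Comment extends Node { Comment(String s) { super(s); } }

    public static void main(String[] args)
    {
        Node a = new Text("hello");
        Node b = new Text("hello");
        Node c = new Comment("hello");

        System.out.println(a.equals(b));                  // same class, same str
        System.out.println(a.equals(c));                  // different class, same str
        System.out.println(a.hashCode() == c.hashCode()); // hash depends on str only

        // A HashSet treats 'a' and 'b' as one element, but keeps 'c' separately.
        Set<Node> set = new HashSet<>();
        set.add(a); set.add(b); set.add(c);
        System.out.println(set.size());
    }
}
```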
        
      • clone

        public abstract HTMLNode clone()
        Sub-classes of HTMLNode must be Cloneable.
        Overrides:
        clone in class java.lang.Object
        Returns:
        Must return an identical copy of 'this' node. The object reference cannot be 'this' reference.
      • toString

        public final java.lang.String toString()
        Java's toString() requirement.

        Final Method:
        This method is final, and cannot be modified by sub-classes.
        Specified by:
        toString in interface java.lang.CharSequence
        Overrides:
        toString in class java.lang.Object
        Returns:
        A String-representation of this HTMLNode.
        Code:
        Exact Method Body:
         return this.str;
        
      • charAt

        public final char charAt​(int index)
        Returns the char value at the specified index of the field: public final String str. An index ranges from '0' (zero) to HTMLNode.str.length() - 1. The first char value of the sequence is at index zero, the next at index one, and so on, as for array indexing.

        NOTE: If the char value specified by the index is a surrogate, the surrogate value is returned.

        Final Method:
        This method is final, and cannot be modified by sub-classes.
        Specified by:
        charAt in interface java.lang.CharSequence
        Parameters:
        index - The index of the char value to be returned
        Returns:
        The specified char value
        Code:
        Exact Method Body:
         return str.charAt(index);
        
      • length

        public final int length()
        Returns the length of the field public final String str. The length is the number of 16-bit chars in the sequence.

        Final Method:
        This method is final, and cannot be modified by sub-classes.
        Specified by:
        length in interface java.lang.CharSequence
        Returns:
        the number of chars in this.str
        Code:
        Exact Method Body:
         return str.length();
        
      • subSequence

        public final java.lang.CharSequence subSequence​(int start,
                                                        int end)
        Returns a CharSequence that is a subsequence of the public final String str field of 'this' HTMLNode. The subsequence starts with the char value at index start and ends with the char value at index end - 1. The length (in chars) of the returned sequence is end - start, so if start == end then an empty sequence is returned.

        Final Method:
        This method is final, and cannot be modified by sub-classes.
        Specified by:
        subSequence in interface java.lang.CharSequence
        Parameters:
        start - The start index, inclusive
        end - The end index, exclusive
        Returns:
        The specified subsequence
        Code:
        Exact Method Body:
         return str.substring(start, end);
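
        Because charAt, length, subSequence and toString together satisfy java.lang.CharSequence, a node can be handed directly to any JDK API that accepts a CharSequence. The sketch below passes one to java.util.regex. The class 'Node' and the helper 'tagName' are hypothetical, written only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharSequenceSketch
{
    // Hypothetical stand-in that delegates every CharSequence method to 'str'.
    static final class Node implements CharSequence
    {
        final String str;
        Node(String s) { this.str = s; }

        @Override public char charAt(int index) { return str.charAt(index); }
        @Override public int length() { return str.length(); }
        @Override public CharSequence subSequence(int start, int end)
        { return str.substring(start, end); }
        @Override public String toString() { return str; }
    }

    private static final Pattern TAG_NAME = Pattern.compile("^</?(\\w+)");

    // Accepts any CharSequence, so a node needs no conversion to String first.
    static String tagName(CharSequence node)
    {
        Matcher m = TAG_NAME.matcher(node);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args)
    {
        Node n = new Node("<SPAN STYLE=\"color: red\">");

        System.out.println(tagName(n));
        System.out.println(n.subSequence(1, 5));
        System.out.println(n.length());
    }
}
```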
        
      • isCommentNode

        public boolean isCommentNode()
        This method will return TRUE for any instance of 'CommentNode'.

        The purpose of this method is to efficiently return TRUE whenever an instance of 'HTMLNode' should be checked to see if it is actually an inherited instance of CommentNode. This is (marginally) more efficient than using the Java 'instanceof' operator.
        Returns:
        This (top-level inheritance-tree) method always returns FALSE. The '.java' file for class CommentNode overrides this method, and returns TRUE.
        See Also:
        CommentNode.isCommentNode()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass CommentNode, where it shall return
         // TRUE.  Neither class TextNode, nor class TagNode will over-ride this method.
        
         return false;
        
      • isTextNode

        public boolean isTextNode()
        This method will return TRUE for any instance of 'TextNode'.

        The purpose of this method is to efficiently return TRUE whenever an instance of 'HTMLNode' should be checked to see if it is actually an inherited instance of TextNode. This is (marginally) more efficient than using the Java 'instanceof' operator.
        Returns:
        This (top-level inheritance-tree) method always returns FALSE. The '.java' file for class TextNode overrides this method, and returns TRUE.
        See Also:
        TextNode.isTextNode()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass TextNode, where it shall return
         // TRUE.  Neither class CommentNode, nor class TagNode will over-ride this method.
        
         return false;
        
      • isTagNode

        public boolean isTagNode()
        This method will return TRUE for any instance of 'TagNode'.

        The purpose of this method is to efficiently return TRUE whenever an instance of 'HTMLNode' should be checked to see if it is actually an inherited instance of TagNode. This is (marginally) more efficient than using the Java 'instanceof' operator.
        Returns:
        This (top-level inheritance-tree) method always returns FALSE. The '.java' file for class TagNode overrides this method, and returns TRUE.
        See Also:
        TagNode.isTagNode()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass TagNode, where it shall return
         // TRUE.  Neither class TextNode, nor class CommentNode will over-ride this method.
        
         return false;
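
        The three 'is' methods follow one pattern: the parent returns false, and exactly one sub-class overrides each method to return true. A minimal sketch of that pattern inside a loop is given below. The classes here are hypothetical stand-ins, not the library's classes, and the sketch makes no claim about relative speed versus 'instanceof'.

```java
import java.util.Arrays;
import java.util.List;

class TypeFlagSketch
{
    // Hypothetical parent: every type-test defaults to false.
    static abstract class Node
    {
        boolean isTagNode()     { return false; }
        boolean isTextNode()    { return false; }
        boolean isCommentNode() { return false; }
    }

    // Each sub-class overrides only its own test.
    static final class Tag extends Node
    { @Override boolean isTagNode() { return true; } }

    static final class Text extends Node
    { @Override boolean isTextNode() { return true; } }

    static final class Comment extends Node
    { @Override boolean isCommentNode() { return true; } }

    // Counts text nodes using the virtual method rather than 'instanceof'.
    static int countText(List<Node> page)
    {
        int count = 0;
        for (Node n : page) if (n.isTextNode()) count++;
        return count;
    }

    public static void main(String[] args)
    {
        List<Node> page = Arrays.asList(new Tag(), new Text(), new Comment(), new Text());
        System.out.println(countText(page));
    }
}
```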
        
      • openTagPWA

        public TagNode openTagPWA()
        PWA: Open Tag, 'Possibly With Attributes'

        This method is offered as an optimization tool for quickly finding HTML Tags which may possess attributes using a search-loop.

        The optimization is made possible given the following characteristics of this class:

        • This class - class 'HTMLNode' - is the 'abstract' parent class of all three node types - TagNode, TextNode and CommentNode
        • This method shall always return null, and only class 'TagNode' overrides this method to return something else.
        • Most concrete instances will therefore immediately return null when queried inside of a search-loop; the only exceptions are TagNode instances that may actually have Attribute Key-Value Pairs.


        This makes the process of finding TagNode's having a particular attribute much more efficient.

        The purpose of this method is to quickly return a node that has been cast to an instance of TagNode, if it is, indeed, a TagNode and if it has an internal-String long-enough to possibly contain attributes (inner-tags).
        Returns:
        This method shall always return null, unless this method has been overridden by a sub-class. Only TagNode overrides this method, and this method will return 'this' instance, if and only if the following conditions hold:

        • If 'this' instance is a TagNode
        • If 'this' instance's isClosing field is false.
        • If the 'length()' of the str field is at least equal to the 'length()' of the tok field plus 4.


        AGAIN: These conditions should imply that 'this' is not only an instance of the TagNode subclass of HTMLNode, but furthermore that this is an Opening-Tag, whose internal String is long enough to "Possibly Contain Attributes" (hence the name-acronym).

        This is a much more efficient and elegant way to optimize code when searching for tags that have attribute / inner-tag key-value pairs.
        See Also:
        TagNode.openTagPWA(), isOpenTagPWA()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass TagNode.
         // For instances of inheriting class TextNode and CommentNode, this always returns null.
         // In 'TagNode' this method returns 'this' based on the 'isClosing' field from that class,
         // and the length of the 'str' field from this class.
        
         return null;
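
        A search-loop built on this method might look like the sketch below. The stand-in classes are hypothetical: the real TagNode derives 'tok' and 'isClosing' by itself, whereas here they are passed to the constructor for brevity. The override simply encodes the three conditions listed under "Returns" above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OpenTagPWASketch
{
    // Hypothetical parent: always answers null.
    static abstract class Node
    {
        final String str;
        Node(String str) { this.str = str; }
        Tag openTagPWA() { return null; }
    }

    static final class Text extends Node
    { Text(String s) { super(s); } }

    static final class Tag extends Node
    {
        final String tok;
        final boolean isClosing;

        Tag(String str, String tok, boolean isClosing)
        { super(str); this.tok = tok; this.isClosing = isClosing; }

        // Opening tag whose String is at least tok.length() + 4 chars long.
        @Override Tag openTagPWA()
        { return (!isClosing && str.length() >= tok.length() + 4) ? this : null; }
    }

    // One virtual call and one null-check per node; no 'instanceof', no cast.
    static List<Tag> findPWA(List<Node> page)
    {
        List<Tag> ret = new ArrayList<>();

        for (Node n : page)
        {
            Tag t = n.openTagPWA();
            if (t != null) ret.add(t);
        }

        return ret;
    }

    public static void main(String[] args)
    {
        List<Node> page = Arrays.asList(
            new Tag("<DIV CLASS=a>", "div", false),
            new Text("hello"),
            new Tag("<BR>", "br", false),
            new Tag("</DIV>", "div", true));

        for (Tag t : findPWA(page)) System.out.println(t.str);
    }
}
```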
        
      • openTag

        public TagNode openTag()
        This method is offered as an optimization tool for quickly finding Opening HTML Tags within a search loop or a Java Stream filter-method.

        The optimization is made possible given the following characteristics of this class:

        • This class - class 'HTMLNode' - is the 'abstract' parent class of all three node types - TagNode, TextNode and CommentNode
        • This method shall always return null, and only class 'TagNode' overrides this method to return something else.
        • Most concrete instances will therefore immediately return null when queried inside of a search-loop; the only exceptions are TagNode instances that are Opening-Tags, rather than Closing-Tags.
        Returns:
        This method shall always return null, unless this method has been overridden by a sub-class. Only TagNode overrides this method, and this method will return 'this' instance, if and only if the following conditions hold:

        • If 'this' instance is a TagNode
        • If 'this' instance's isClosing field is false.


        AGAIN: These conditions should imply that 'this' is not only an instance of the TagNode subclass of HTMLNode, but furthermore that this is an opening tag, rather than a closing.

        When the overridden TagNode sub-class returns a non-null result, that value will always be equal to 'this'.
        See Also:
        TagNode.openTag()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass TagNode.
         // For instances of inheriting class TextNode and CommentNode, this always returns null.
         // In 'TagNode' this method returns 'this' based on that class' 'isClosing' field.
        
         return null;
        
      • isOpenTagPWA

        public boolean isOpenTagPWA()
        PWA: Open Tag, 'Possibly With Attributes'

        This method is offered as an optimization tool for quickly finding HTML Tags which may possess attributes using a search-loop.

        The optimization is made possible given the following characteristics of this class:

        • This class - class 'HTMLNode' - is the 'abstract' parent class of all three node types - TagNode, TextNode and CommentNode
        • This method shall always return false, and only class 'TagNode' overrides this method to return something else.
        • Most concrete instances will therefore immediately return false when queried inside of a search-loop; the only exceptions are TagNode instances that may actually have Attribute Key-Value Pairs.


        This method will function in an almost identical fashion to openTagPWA(), while having the subtle difference that its return-value is a 'boolean', rather than an instance of the TagNode itself. This can facilitate the use of this method inside filter calls inside of Java Stream, for instance.

        Stream Invocation-Stack:
        In the example below, a Web-Page was copied to the clipboard, and then saved to a file on the File-System. The content of the flat-file 'ChatGPT.Transcript.html' is just some HTML that was block-copied from the well-known Web-Site having that name.

        To simplify the HTML that was cut and pasted, removing all of the HTML Attributes that were added will improve readability of the page itself. However, to preserve the text-arrangements inside the CSS-Tags present on HTML Tables, Ordered-Lists, and UnOrdered-Lists, those Tags' CSS Attributes have to be preserved!

        This method, 'isOpenTagPWA()', is invoked inside of a Java Stream Invocation-Stack just to make sure that the nodes passed to Attributes.removeAll are only nodes that are TagNode instances that potentially could contain attributes, and are not nodes inside of an HTML <UL>, <OL> or <TABLE> Tag-Pair.

        Example:
        Vector<HTMLNode>    webPage = HTMLPage.getPageTokens("ChatGPT.Transcript.html", false);
        List<DotPair>       dpAll   = TagNodeFindInclusive.all(webPage, "ul", "ol", "table");
        
        int[] iArr = DPUtil
            .excludedToStream(dpAll, webPage.size(), true)
            .filter((int pos) -> webPage.elementAt(pos).isOpenTagPWA())
            .toArray();
        
        Attributes.removeAll(webPage, iArr);
        
        Returns:
        This method shall always return FALSE, unless it has been overridden by a subclass. Subclass TagNode overrides this, and will return TRUE if and only if the following conditions hold:

        • If 'this' instance is a TagNode
        • If 'this' instance's isClosing field is false.
        • If the 'length()' of the str field is at least equal to the 'length()' of the tok field plus 4.


        AGAIN: These conditions should imply that 'this' is not only an instance of the TagNode subclass of HTMLNode, but furthermore that this is an Opening-Tag, whose internal String is long enough to "Possibly Contain Attributes" (hence the name-acronym).

        This is a much more efficient and elegant way to optimize code when searching for tags that have attribute / inner-tag key-value pairs.
        See Also:
        TagNode.isOpenTagPWA(), openTagPWA()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass TagNode.
         // For instances of inheriting class TextNode and CommentNode, this always returns false.
         // In 'TagNode' this method returns TRUE based on the 'isClosing' field from that class,
         // and the length of the 'str' field from this class.
        
         return false;
        
      • isOpenTag

        public boolean isOpenTag()
        This method is offered as an optimization tool for quickly finding Opening HTML Tags within a search loop or a Java Stream filter-method.

        The optimization is made possible given the following characteristics of this class:

        • This class - class 'HTMLNode' - is the 'abstract' parent class of all three node types - TagNode, TextNode and CommentNode
        • This method shall always return false, and only class 'TagNode' overrides this method to return something else.
        • Most concrete instances will therefore immediately return false when queried inside of a search-loop; the only exceptions are TagNode instances that are Opening-Tags, rather than Closing-Tags.


        This method will function almost identically to openTag(), with the subtle difference being that it returns a TRUE / FALSE boolean, instead of an instance-reference.
        Returns:
        This method shall always return FALSE, unless it has been overridden by a subclass. Subclass TagNode overrides this, and will return TRUE if and only if the following conditions hold:

        • If 'this' instance is a TagNode
        • If 'this' instance's isClosing field is false.


        AGAIN: These conditions should imply that 'this' is not only an instance of the TagNode subclass of HTMLNode, but furthermore that this is an opening tag, rather than a closing.
        See Also:
        TagNode.isOpenTag(), openTag()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass TagNode.
         // For instances of inheriting class TextNode and CommentNode, this always returns FALSE.
         // In 'TagNode' this method returns TRUE based on that class 'isClosing' field.
        
         return false;
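
        Since the return-value is a boolean, this method slots directly into a Stream filter. The sketch below collects the index positions of all opening tags. The stand-in classes and the helper 'openTagPositions' are hypothetical, and 'isClosing' is passed to the constructor here purely for brevity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

class IsOpenTagSketch
{
    // Hypothetical parent: always answers false.
    static abstract class Node
    {
        final String str;
        Node(String str) { this.str = str; }
        boolean isOpenTag() { return false; }
    }

    static final class Text extends Node
    { Text(String s) { super(s); } }

    static final class Tag extends Node
    {
        final boolean isClosing;

        Tag(String str, boolean isClosing)
        { super(str); this.isClosing = isClosing; }

        @Override boolean isOpenTag() { return !isClosing; }
    }

    // Index positions of every opening tag on the page.
    static int[] openTagPositions(List<Node> page)
    {
        return IntStream
            .range(0, page.size())
            .filter((int pos) -> page.get(pos).isOpenTag())
            .toArray();
    }

    public static void main(String[] args)
    {
        List<Node> page = Arrays.asList(
            new Tag("<P>", false),
            new Text("hello"),
            new Tag("</P>", true),
            new Tag("<BR>", false));

        System.out.println(Arrays.toString(openTagPositions(page)));
    }
}
```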
        
      • ifTagNode

        public TagNode ifTagNode()
        Loop Optimization Method

        When this method is invoked on an instance of sub-class TagNode, this method produces 'this' instance.
        Returns:
        This method is overridden by sub-class TagNode, and in that class, this method simply returns 'this'. The other sub-classes of this (abstract) class inherit this version of this method, and therefore return null.
        See Also:
        TagNode.ifTagNode()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass TagNode, where it shall return
         // 'this'.  Neither class TextNode, nor class CommentNode will over-ride this method.
        
         return null;
        
      • ifTextNode

        public TextNode ifTextNode()
        Loop Optimization Method

        When this method is invoked on an instance of sub-class TextNode, this method produces 'this' instance.
        Returns:
        This method is overridden by sub-class TextNode, and in that class, this method simply returns 'this'. The other sub-classes of this (abstract) class inherit this version of this method, and therefore return null.
        See Also:
        TextNode.ifTextNode()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass TextNode, where it shall return
         // 'this'.  Neither class TagNode, nor class CommentNode will over-ride this method.
        
         return null;
        
      • ifCommentNode

        public CommentNode ifCommentNode()
        Loop Optimization Method

        When this method is invoked on an instance of sub-class CommentNode, this method produces 'this' instance.
        Returns:
        This method is overridden by sub-class CommentNode, and in that class, this method simply returns 'this'. The other sub-classes of this (abstract) class inherit this version of this method, and therefore return null.
        See Also:
        CommentNode.ifCommentNode()
        Code:
        Exact Method Body:
         // This method will *only* be over-ridden by subclass CommentNode, where it shall return
         // 'this'.  Neither class TagNode, nor class TextNode will over-ride this method.
        
         return null;
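
        The three 'if' methods combine a type-test and a cast into one call, which keeps a loop body short. A minimal sketch that gathers only the text of a page is shown below. The stand-in classes and the helper 'textOnly' are hypothetical, written for this example only.

```java
import java.util.Arrays;
import java.util.List;

class IfNodeSketch
{
    // Hypothetical parent: answers null unless a sub-class overrides.
    static abstract class Node
    {
        final String str;
        Node(String str) { this.str = str; }
        Text ifTextNode() { return null; }
    }

    static final class Tag extends Node
    { Tag(String s) { super(s); } }

    static final class Text extends Node
    {
        Text(String s) { super(s); }
        @Override Text ifTextNode() { return this; }
    }

    // Concatenates the contents of every text node, skipping all other nodes.
    static String textOnly(List<Node> page)
    {
        StringBuilder sb = new StringBuilder();

        for (Node n : page)
        {
            Text t = n.ifTextNode();        // already typed as Text; no cast needed
            if (t != null) sb.append(t.str);
        }

        return sb.toString();
    }

    public static void main(String[] args)
    {
        List<Node> page = Arrays.asList(
            new Tag("<B>"), new Text("Hello"), new Tag("</B>"), new Text(", world"));

        System.out.println(textOnly(page));
    }
}
```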
        
      • asTagNode

        public final TagNode asTagNode()
        Compile-Time "Syntactic Sugar" for casting an HTMLNode to a TagNode.

        Final Method:
        This method is final, and cannot be modified by sub-classes.
        Returns:
        Simply returns 'this' instance. (Note that the method Class.cast(Object) performs a run-time type check, and otherwise returns its argument unchanged; its benefit here is giving the compiler a statically-typed result without an explicit cast expression.)
        Throws:
        java.lang.ClassCastException - IMPORTANT: If the instance is a TextNode or CommentNode, rather than a TagNode, then (naturally) the JVM will immediately throw a casting exception.
        Code:
        Exact Method Body:
         return TagNode.class.cast(this);
        
      • asTextNode

        public final TextNode asTextNode()
        Compile-Time "Syntactic Sugar" for casting an HTMLNode to a TextNode.

        Final Method:
        This method is final, and cannot be modified by sub-classes.
        Returns:
        Simply returns 'this' instance. (Note that the method Class.cast(Object) performs a run-time type check, and otherwise returns its argument unchanged; its benefit here is giving the compiler a statically-typed result without an explicit cast expression.)
        Throws:
        java.lang.ClassCastException - IMPORTANT: If the instance is a TagNode or CommentNode, rather than a TextNode, then (naturally) the JVM will immediately throw a casting exception.
        Code:
        Exact Method Body:
         return TextNode.class.cast(this);
        
      • asCommentNode

        public final CommentNode asCommentNode()
        Compile-Time "Syntactic Sugar" for casting an HTMLNode to a CommentNode.

        Final Method:
        This method is final, and cannot be modified by sub-classes.
        Returns:
        Simply returns 'this' instance. (Note that the method Class.cast(Object) performs a run-time type check, and otherwise returns its argument unchanged; its benefit here is giving the compiler a statically-typed result without an explicit cast expression.)
        Throws:
        java.lang.ClassCastException - IMPORTANT: If the instance is a TagNode or TextNode, rather than a CommentNode, then (naturally) the JVM will immediately throw a casting exception.
        Code:
        Exact Method Body:
         return CommentNode.class.cast(this);
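
        The difference between the 'as' methods and the 'if' methods is what happens on a type mismatch: the former throw, while the latter return null. The sketch below demonstrates the throwing behavior of Class.cast. The stand-in classes and the helper 'castFails' are hypothetical, written for this example only.

```java
class AsNodeSketch
{
    // Hypothetical parent reproducing the asTagNode body shown above.
    static abstract class Node
    {
        final Tag asTagNode() { return Tag.class.cast(this); }
    }

    static final class Tag extends Node { }
    static final class Text extends Node { }

    // Reports whether asTagNode() throws for the given node.
    static boolean castFails(Node n)
    {
        try
        { n.asTagNode(); return false; }

        catch (ClassCastException e)
        { return true; }
    }

    public static void main(String[] args)
    {
        Node tag  = new Tag();
        Node text = new Text();

        System.out.println(tag.asTagNode() == tag);  // same reference comes back
        System.out.println(castFails(tag));
        System.out.println(castFails(text));
    }
}
```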