Package Torello.HTML
Class HTMLNode
- java.lang.Object
-
- Torello.HTML.HTMLNode
-
- All Implemented Interfaces:
java.io.Serializable, java.lang.CharSequence, java.lang.Cloneable
- Direct Known Subclasses:
CommentNode, TagNode, TextNode
public abstract class HTMLNode extends java.lang.Object implements java.lang.CharSequence, java.io.Serializable, java.lang.Cloneable
Top-Level, Ancestor, (Abstract) Data-Class:
This class is the ancestor of the primary Java-HTML data-classes in this package: TagNode, TextNode and CommentNode.
Inheritance Tree Diagram:
[Figure: inheritance diagram (with fields) of the three concrete classes that extend the abstract class HTMLNode]
This class is mostly a wrapper for class java.lang.String, and serves as the abstract parent of the three types of HTML elements offered by the Java HTML Library.
This abstract class is an "immutable" class, meaning that the contents of an HTMLNode can never change. Roughly 80% of HTMLNode instances would never need to change anyway, since they don't use attributes. In those cases, even having multiple instances of such tags (for instance: BR, HR, H1, H2, H3, B, or I) is unnecessary.
When parsing HTML elements that are 'Element-Only' (without any attributes / inner-tags), the parser returns singleton instances of the class TagNode to avoid creating extra objects on Java's memory heap.
This works out fine: since these nodes are immutable, reusing the same node references in different Vectors has no side-effects. The class HTMLPage has a suite of methods that handle instantiating new HTMLNodes.
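The reuse of immutable nodes described above can be sketched with a small cache. This is only an illustration and not the library's parser: the class `SimpleTag` and its factory method `of` are hypothetical stand-ins, and the sole assumption carried over from the text is that a node never changes once it has been built.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an immutable, attribute-free tag node.
final class SimpleTag {
    final String str;
    private static final Map<String, SimpleTag> CACHE = new HashMap<>();

    private SimpleTag(String str) { this.str = str; }

    // Returns one shared instance per distinct tag string.
    static synchronized SimpleTag of(String str) {
        return CACHE.computeIfAbsent(str, SimpleTag::new);
    }
}

class SingletonReuseSketch {
    // Two "pages" can hold the very same reference without side-effects,
    // because a SimpleTag cannot change after construction.
    static boolean sharesReference() {
        List<SimpleTag> pageA = List.of(SimpleTag.of("<br>"), SimpleTag.of("<hr>"));
        List<SimpleTag> pageB = List.of(SimpleTag.of("<br>"));
        return pageA.get(0) == pageB.get(0);
    }

    public static void main(String[] args) {
        System.out.println(sharesReference());
    }
}
```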
Light-Weight, Immutable Data-Class
All three of the standard classes that inherit from the abstract class HTMLNode are very light-weight, and do not contain any internal state other than the internal String which represents the TagNode, CommentNode, or TextNode itself.
Saying that all HTMLNodes are immutable data-classes amounts to saying that these classes are little more than 'extension-classes' of the Java class java.lang.String, which also happens to be immutable.
The flagship data-class of this JAR-Library, TagNode, has a number of 'getter' methods. However, each and every one of its 'setter' methods actually returns a new instance of TagNode. This is quite similar to how Java handles changing or updating the contents of a String.
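A 'setter' that returns a new instance might look like the sketch below. The class `ImmutableTag` and the method `withAttribute` are hypothetical names invented for illustration; the real TagNode methods are not reproduced here.

```java
// Hypothetical stand-in: an immutable node whose "setter" returns a new instance.
final class ImmutableTag {
    final String str;

    ImmutableTag(String str) { this.str = str; }

    // Does not modify 'this'; builds a new object, much as String.replace does.
    ImmutableTag withAttribute(String name, String value) {
        String opened = str.substring(0, str.length() - 1); // drop the trailing '>'
        return new ImmutableTag(opened + " " + name + "=\"" + value + "\">");
    }
}

class SetterSketch {
    // Returns the original and the updated strings, to show the original is untouched.
    static String[] demo() {
        ImmutableTag original = new ImmutableTag("<div>");
        ImmutableTag updated  = original.withAttribute("class", "news");
        return new String[] { original.str, updated.str };
    }

    public static void main(String[] args) {
        for (String s : demo()) System.out.println(s);
    }
}
```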
The classes TextNode and CommentNode do not have any 'setter' methods at all. Also, all three concrete subclasses of HTMLNode have a constructor that accepts a simple java.lang.String.
See Also:
TagNode, TextNode, CommentNode, Serialized Form
Source File: Torello/HTML/HTMLNode.java (File Size: 20,463 Bytes; Line Count: 484)
-
-
Field Summary
Fields
Modifier and Type    Field
static long          serialVersionUID
String               str
-
Constructor Summary
Constructors
Modifier     Constructor
protected    HTMLNode(String s)
-
Method Summary
'instanceof' Operator-Replacement Methods
Modifier and Type    Method
boolean              isCommentNode()
boolean              isTagNode()
boolean              isTextNode()

Simple & Overloaded Type-Tests
Modifier and Type    Method
CommentNode          ifCommentNode()
TagNode              ifTagNode()
TextNode             ifTextNode()

Simple Casting Syntactic Sugar
Modifier and Type    Method
CommentNode          asCommentNode()
TagNode              asTagNode()
TextNode             asTextNode()

TagNode Loop Optimization Methods
Modifier and Type    Method
boolean              isOpenTag()
boolean              isOpenTagPWA()
TagNode              openTag()
TagNode              openTagPWA()

Methods: interface java.lang.CharSequence
Modifier and Type    Method
char                 charAt(int index)
int                  length()
CharSequence         subSequence(int start, int end)
String               toString()

Methods: class java.lang.Object
Modifier and Type    Method
boolean              equals(Object o)
int                  hashCode()

Methods: interface java.lang.Cloneable
Modifier and Type    Method
abstract HTMLNode    clone()
-
-
-
Field Detail
-
serialVersionUID
public static final long serialVersionUID
This fulfils the serialVersionUID requirement for all classes that implement Java's interface java.io.Serializable. Using the Serializable implementation offered by Java is very easy, and can make saving program state when debugging a lot easier. It can also be used in place of more complicated systems like "Hibernate" to store data.
See Also:
Constant Field Values
Code:
Exact Field Declaration Expression:
public static final long serialVersionUID = 1;
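Saving and restoring state through java.io.Serializable can be sketched as an in-memory round trip. The class `SerialNode` is a hypothetical stand-in (not HTMLNode itself), and the byte-array streams are used only so that the example touches no files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a serializable, immutable node.
class SerialNode implements Serializable {
    public static final long serialVersionUID = 1;
    public final String str;
    SerialNode(String str) { this.str = str; }
}

class SerializationSketch {
    // Writes the node to an in-memory byte array, then reads a copy back.
    static SerialNode roundTrip(SerialNode n) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(n);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerialNode) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new SerialNode("<p>")).str);
    }
}
```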
-
str
public final java.lang.String str
This is an immutable field. It stores the complete contents of an HTML node. It can be either the textual contents of an HTML TagNode, or the text found directly inside an HTML page.
FOR INSTANCE:
- A subclass of HTMLNode, TagNode, could contain the String "<SPAN STYLE="CSS INFO">" inside this str field.
- Another subclass of HTMLNode, TextNode, could contain the String "This is a news-page from www.Gov.CN Chinese Government Portal." inside this str field.
NOTE: Because subclasses of HTMLNode are all immutable, a programmer who wishes to change the contents of an HTML page is generally required to create new nodes, rather than change these fields.
Code:
Exact Field Declaration Expression:
public final String str;
-
-
Constructor Detail
-
HTMLNode
protected HTMLNode(java.lang.String s)
Constructor that builds a new HTMLNode.
Parameters:
s - A valid string of an HTML element.
-
-
Method Detail
-
hashCode
public final int hashCode()
Java's hash-code requirement.
Final Method:
This method is final, and cannot be modified by sub-classes.
Overrides:
hashCode in class java.lang.Object
Returns:
A hash-code that may be used when storing this node in a hash-based Java collection, such as a HashMap or HashSet.
Code:
Exact Method Body:
return this.str.hashCode();
-
equals
public final boolean equals(java.lang.Object o)
Java's public boolean equals(Object o) requirements.
Final Method:
This method is final, and cannot be modified by sub-classes.
Overrides:
equals in class java.lang.Object
Parameters:
o - This may be any Java Object, but only objects of exactly the same class as 'this' whose str fields are identical will cause this method to return TRUE.
Returns:
TRUE if 'this' equals the other object; FALSE otherwise.
Code:
Exact Method Body:
if (o == null) return false;
if (o == this) return true;
if (! this.getClass().equals(o.getClass())) return false;
return ((HTMLNode) o).str.equals(this.str);
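The effect of the class check in equals, together with the String-based hashCode, can be seen in a small sketch. The classes `BaseNode`, `TextLike` and `CommentLike` are hypothetical stand-ins that copy the two method bodies shown above; they are not the library's classes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins that copy the equals / hashCode bodies shown above.
abstract class BaseNode {
    final String str;
    BaseNode(String s) { this.str = s; }

    @Override public final int hashCode() { return this.str.hashCode(); }

    @Override public final boolean equals(Object o) {
        if (o == null) return false;
        if (o == this) return true;
        if (! this.getClass().equals(o.getClass())) return false;
        return ((BaseNode) o).str.equals(this.str);
    }
}
final class TextLike    extends BaseNode { TextLike(String s)    { super(s); } }
final class CommentLike extends BaseNode { CommentLike(String s) { super(s); } }

class EqualsSketch {
    // Counts how many distinct nodes a HashSet keeps.
    static int distinctCount() {
        Set<BaseNode> set = new HashSet<>();
        set.add(new TextLike("hello"));
        set.add(new TextLike("hello"));     // equal to the first: same class, same str
        set.add(new CommentLike("hello"));  // same str and hash-code, but a different class
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount());
    }
}
```

Two nodes with the same text but different concrete classes share a hash-code yet are not equal, which is permitted by the hashCode contract.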
-
clone
-
toString
public final java.lang.String toString()
Java's toString() requirement.
Final Method:
This method is final, and cannot be modified by sub-classes.
Specified by:
toString in interface java.lang.CharSequence
Overrides:
toString in class java.lang.Object
Returns:
A String-representation of this HTMLNode.
Code:
Exact Method Body:
return this.str;
-
charAt
public final char charAt(int index)
Returns the char value at the specified index of the field public final String str. An index ranges from 0 (zero) to HTMLNode.str.length() - 1. The first char value of the sequence is at index zero, the next at index one, and so on, as for array indexing.
NOTE: If the char value specified by the index is a surrogate, the surrogate value is returned.
Final Method:
This method is final, and cannot be modified by sub-classes.
Specified by:
charAt in interface java.lang.CharSequence
Parameters:
index - The index of the char value to be returned
Returns:
The specified char value
Code:
Exact Method Body:
return str.charAt(index);
-
length
public final int length()
Returns the length of the field public final String str. The length is the number of 16-bit chars in the sequence.
Final Method:
This method is final, and cannot be modified by sub-classes.
Specified by:
length in interface java.lang.CharSequence
Returns:
The number of chars in this.str
Code:
Exact Method Body:
return str.length();
-
subSequence
public final java.lang.CharSequence subSequence(int start, int end)
Returns a CharSequence that is a subsequence of the public final String str field of 'this' HTMLNode. The subsequence starts with the char value at index start and ends with the char value at index end - 1. The length (in chars) of the returned sequence is end - start, so if start == end then an empty sequence is returned.
Final Method:
This method is final, and cannot be modified by sub-classes.
Specified by:
subSequence in interface java.lang.CharSequence
Parameters:
start - The start index, inclusive
end - The end index, exclusive
Returns:
The specified subsequence
Code:
Exact Method Body:
return str.substring(start, end);
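One practical consequence of implementing java.lang.CharSequence is that a node can be handed directly to APIs that accept a CharSequence, such as java.util.regex.Pattern.matcher. The sketch below uses a hypothetical stand-in class `SeqNode` with the same four delegating methods; the regular expression is only an illustrative assumption, not something the library defines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in that delegates CharSequence to an internal String.
final class SeqNode implements CharSequence {
    final String str;
    SeqNode(String s) { this.str = s; }

    @Override public char charAt(int index) { return str.charAt(index); }
    @Override public int length() { return str.length(); }
    @Override public CharSequence subSequence(int start, int end) { return str.substring(start, end); }
    @Override public String toString() { return str; }
}

class CharSequenceSketch {
    // Illustrative pattern: captures the element name of an opening or closing tag.
    private static final Pattern TAG_NAME = Pattern.compile("^</?([A-Za-z][A-Za-z0-9]*)");

    // The node is passed as-is; no call to toString() is needed at the call site.
    static String tagName(CharSequence node) {
        Matcher m = TAG_NAME.matcher(node);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(tagName(new SeqNode("<SPAN STYLE=\"CSS INFO\">")));
    }
}
```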
-
isCommentNode
public boolean isCommentNode()
This method will return TRUE for any instance of 'CommentNode'.
The purpose of this method is to efficiently return TRUE whenever an instance of 'HTMLNode' should be checked to see if it is actually an inherited instance of CommentNode. This is (marginally) more efficient than using the Java 'instanceof' operator.
Returns:
This (top-level inheritance-tree) method always returns FALSE. The '.java' file for class CommentNode overrides this method, and returns TRUE.
See Also:
CommentNode.isCommentNode()
Code:
Exact Method Body:
// This method will *only* be over-ridden by subclass CommentNode, where it shall return
// TRUE. Neither class TextNode, nor class TagNode will over-ride this method.
return false;
-
isTextNode
public boolean isTextNode()
This method will return TRUE for any instance of 'TextNode'.
The purpose of this method is to efficiently return TRUE whenever an instance of 'HTMLNode' should be checked to see if it is actually an inherited instance of TextNode. This is (marginally) more efficient than using the Java 'instanceof' operator.
Returns:
This (top-level inheritance-tree) method always returns FALSE. The '.java' file for class TextNode overrides this method, and returns TRUE.
See Also:
TextNode.isTextNode()
Code:
Exact Method Body:
// This method will *only* be over-ridden by subclass TextNode, where it shall return
// TRUE. Neither class CommentNode, nor class TagNode will over-ride this method.
return false;
-
isTagNode
public boolean isTagNode()
This method will return TRUE for any instance of 'TagNode'.
The purpose of this method is to efficiently return TRUE whenever an instance of 'HTMLNode' should be checked to see if it is actually an inherited instance of TagNode. This is (marginally) more efficient than using the Java 'instanceof' operator.
Returns:
This (top-level inheritance-tree) method always returns FALSE. The '.java' file for class TagNode overrides this method, and returns TRUE.
See Also:
TagNode.isTagNode()
Code:
Exact Method Body:
// This method will *only* be over-ridden by subclass TagNode, where it shall return
// TRUE. Neither class TextNode, nor class CommentNode will over-ride this method.
return false;
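The override pattern behind the three type-test methods can be sketched as follows. The classes `KindNode`, `KindTag`, `KindText` and `KindComment` are hypothetical stand-ins: the base class answers FALSE to every test and each subclass overrides exactly one of them, as described above.

```java
import java.util.List;

// Hypothetical stand-ins for the real class hierarchy.
abstract class KindNode {
    boolean isTagNode()     { return false; }
    boolean isTextNode()    { return false; }
    boolean isCommentNode() { return false; }
}
final class KindTag     extends KindNode { @Override boolean isTagNode()     { return true; } }
final class KindText    extends KindNode { @Override boolean isTextNode()    { return true; } }
final class KindComment extends KindNode { @Override boolean isCommentNode() { return true; } }

class TypeTestSketch {
    // Returns { tagCount, textCount, commentCount } without using 'instanceof'.
    static int[] countKinds(List<KindNode> page) {
        int[] counts = new int[3];
        for (KindNode n : page) {
            if      (n.isTagNode())     counts[0]++;
            else if (n.isTextNode())    counts[1]++;
            else if (n.isCommentNode()) counts[2]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] c = countKinds(
            List.of(new KindTag(), new KindText(), new KindTag(), new KindComment()));
        System.out.println(c[0] + " tags, " + c[1] + " text nodes, " + c[2] + " comments");
    }
}
```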
-
openTagPWA
public TagNode openTagPWA()
PWA: Open Tag, 'Possibly With Attributes'
This method is offered as an optimization tool for quickly finding, within a search-loop, HTML tags which possess attributes.
The optimization is made possible given the following characteristics of this class:
- This class, 'HTMLNode', is the abstract parent class of all three node types: TagNode, TextNode and CommentNode.
- This method always returns null here, and only class 'TagNode' overrides it to return something else.
- Most concrete instances will immediately return null when queried inside a search-loop; the exceptions are TagNode instances that may actually have attribute key-value pairs.
This makes the process of finding TagNodes having a particular attribute much more efficient.
The purpose of this method is to quickly return a node that has been cast to an instance of TagNode if it is, indeed, a TagNode and if it has an internal String long enough to possibly contain attributes (inner-tags).
Returns:
This method always returns null unless it has been overridden by a subclass. Only TagNode overrides this method, and the override returns 'this' instance if and only if the following conditions hold:
① 'this' instance is a TagNode
② 'this' instance's isClosing field is false
③ The length() of the str field is at least equal to the length() of the tok field plus 4
These conditions should imply that 'this' is not only an instance of the TagNode subclass of HTMLNode, but furthermore that it is an Opening-Tag whose internal String is long enough to "Possibly Contain Attributes" (hence the acronym in the name).
This is a much more efficient and elegant way to optimize code when searching for tags that have attribute / inner-tag key-value pairs.
See Also:
TagNode.openTagPWA(), isOpenTagPWA()
Code:
Exact Method Body:
// This method will *only* be over-ridden by subclass TagNode.
// For instances of inheriting class TextNode and CommentNode, this always returns null.
// In 'TagNode' this method returns 'this' based on the 'isClosing' field from that class,
// and the length of the 'str' field from this class.
return null;
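The three conditions listed under "Returns" could be satisfied by an override along the following lines. This is a sketch under stated assumptions, not the source of TagNode: the classes `PwaNode`, `PwaText` and `PwaTag` are hypothetical stand-ins, and only the field names str, tok and isClosing are taken from the text above.

```java
// Hypothetical stand-ins; only the field names come from the documentation above.
abstract class PwaNode {
    final String str;
    PwaNode(String str) { this.str = str; }

    // Base version: always null, as described above.
    PwaTag openTagPWA() { return null; }
}

final class PwaText extends PwaNode {
    PwaText(String str) { super(str); }
}

final class PwaTag extends PwaNode {
    final String tok;        // element name, for example "div"
    final boolean isClosing; // true for a closing tag such as "</div>"

    PwaTag(String str, String tok, boolean isClosing) {
        super(str);
        this.tok = tok;
        this.isClosing = isClosing;
    }

    // Conditions 2 and 3 from the list above; condition 1 holds because this is PwaTag.
    @Override PwaTag openTagPWA() {
        return (!isClosing && str.length() >= tok.length() + 4) ? this : null;
    }
}

class OpenTagPwaSketch {
    public static void main(String[] args) {
        PwaNode n = new PwaTag("<a href=\"x\">", "a", false);
        PwaTag tn = n.openTagPWA();
        System.out.println(tn != null ? tn.tok : "no attributes possible");
    }
}
```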
-
openTag
public TagNode openTag()
This method is offered as an optimization tool for quickly finding opening HTML tags within a search-loop or a Java Stream filter-method.
The optimization is made possible given the following characteristics of this class:
- This class, 'HTMLNode', is the abstract parent class of all three node types: TagNode, TextNode and CommentNode.
- This method always returns null here, and only class 'TagNode' overrides it to return something else.
- Most concrete instances will immediately return null when queried inside a search-loop; the exceptions are TagNode instances that are Opening-Tags, rather than Closing-Tags.
Returns:
This method always returns null unless it has been overridden by a subclass. Only TagNode overrides this method, and the override returns 'this' instance if and only if the following conditions hold:
① 'this' instance is a TagNode
② 'this' instance's isClosing field is false
These conditions should imply that 'this' is not only an instance of the TagNode subclass of HTMLNode, but furthermore that it is an opening tag, rather than a closing tag.
When the overriding TagNode method returns a non-null result, that value will always be equal to 'this'.
See Also:
TagNode.openTag()
Code:
Exact Method Body:
// This method will *only* be over-ridden by subclass TagNode.
// For instances of inheriting class TextNode and CommentNode, this always returns null.
// In 'TagNode' this method returns 'this' based on that class' 'isClosing' field.
return null;
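Because openTag() yields either the node itself or null, it combines naturally with a Stream map-then-filter step. The sketch below assumes hypothetical stand-in classes `OpenNode`, `OpenText` and `OpenTag`; only the isClosing field name comes from the documentation.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the real class hierarchy.
abstract class OpenNode {
    final String str;
    OpenNode(String str) { this.str = str; }
    OpenTag openTag() { return null; }
}
final class OpenText extends OpenNode {
    OpenText(String str) { super(str); }
}
final class OpenTag extends OpenNode {
    final boolean isClosing;
    OpenTag(String str, boolean isClosing) { super(str); this.isClosing = isClosing; }
    @Override OpenTag openTag() { return isClosing ? null : this; }
}

class OpenTagSketch {
    // Keeps only opening tags; no 'instanceof' test and no explicit cast is needed.
    static List<String> openingTags(List<OpenNode> page) {
        return page.stream()
            .map(OpenNode::openTag)
            .filter(Objects::nonNull)
            .map(tag -> tag.str)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<OpenNode> page = List.of(
            new OpenTag("<p>", false), new OpenText("hi"), new OpenTag("</p>", true));
        System.out.println(openingTags(page));
    }
}
```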
-
isOpenTagPWA
public boolean isOpenTagPWA()
PWA: Open Tag, 'Possibly With Attributes'
This method is offered as an optimization tool for quickly finding, within a search-loop, HTML tags which possess attributes.
The optimization is made possible given the following characteristics of this class:
- This class, 'HTMLNode', is the abstract parent class of all three node types: TagNode, TextNode and CommentNode.
- This method always returns false here, and only class 'TagNode' overrides it to return something else.
- Most concrete instances will immediately return false when queried inside a search-loop; the exceptions are TagNode instances that may actually have attribute key-value pairs.
This method functions in an almost identical fashion to openTagPWA(), with the subtle difference that its return-value is a 'boolean', rather than an instance of the TagNode itself. This can facilitate the use of this method inside filter calls on a Java Stream, for instance.
Stream Invocation-Stack:
In the example below, a web page was copied to the clipboard, and then saved to a file on the file-system. The content of the flat-file 'ChatGPT.Transcript.html' is just some HTML that was block-copied from the well-known web site having that name.
Removing all of the HTML attributes that were added simplifies the HTML that was cut and pasted, and improves the readability of the page itself. However, to preserve the text arrangement of HTML tables, ordered lists and unordered lists, the CSS attributes of those tags have to be preserved.
This method, 'isOpenTagPWA()', is invoked inside a Java Stream invocation-stack just to make sure that the nodes passed to Attributes.removeAll are only TagNode instances that could potentially contain attributes, and are not nodes inside an HTML <UL>, <OL> or <TABLE> tag-pair.
Example:
Vector<HTMLNode> webPage = HTMLPage.getPageTokens("ChatGPT.Transcript.html", false);
List<DotPair> dpAll = TagNodeFindInclusive.all(webPage, "ul", "ol", "table");

int[] iArr = DPUtil
    .excludedToStream(dpAll, webPage.size(), true)
    .filter((int pos) -> webPage.elementAt(pos).isOpenTagPWA())
    .toArray();

Attributes.removeAll(webPage, iArr);
Returns:
This method always returns FALSE unless it has been overridden by a subclass. Subclass TagNode overrides this, and will return TRUE if and only if the following conditions hold:
① 'this' instance is a TagNode
② 'this' instance's isClosing field is false
③ The length() of the str field is at least equal to the length() of the tok field plus 4
These conditions should imply that 'this' is not only an instance of the TagNode subclass of HTMLNode, but furthermore that it is an Opening-Tag whose internal String is long enough to "Possibly Contain Attributes" (hence the acronym in the name).
This is a much more efficient and elegant way to optimize code when searching for tags that have attribute / inner-tag key-value pairs.
See Also:
TagNode.isOpenTagPWA(), openTagPWA()
Code:
Exact Method Body:
// This method will *only* be over-ridden by subclass TagNode.
// For instances of inheriting class TextNode and CommentNode, this always returns false.
// In 'TagNode' this method returns TRUE based on the 'isClosing' field from that class,
// and the length of the 'str' field from this class.
return false;
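The index-filtering idea in the example above can be reproduced with the standard library alone. In this sketch the classes `AttrNode`, `AttrText` and `AttrTag` are hypothetical stand-ins, and java.util.stream.IntStream.range takes the place of the library's own position stream; it only shows how a boolean test selects candidate positions.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical stand-ins; only the field names str, tok and isClosing come from the text.
abstract class AttrNode {
    final String str;
    AttrNode(String str) { this.str = str; }
    boolean isOpenTagPWA() { return false; }
}
final class AttrText extends AttrNode {
    AttrText(String str) { super(str); }
}
final class AttrTag extends AttrNode {
    final String tok;
    final boolean isClosing;
    AttrTag(String str, String tok, boolean isClosing) {
        super(str);
        this.tok = tok;
        this.isClosing = isClosing;
    }
    @Override boolean isOpenTagPWA() {
        return !isClosing && str.length() >= tok.length() + 4;
    }
}

class IsOpenTagPwaSketch {
    // Collects the positions of nodes that might carry attributes.
    static int[] candidatePositions(List<AttrNode> page) {
        return IntStream.range(0, page.size())
            .filter(pos -> page.get(pos).isOpenTagPWA())
            .toArray();
    }

    public static void main(String[] args) {
        List<AttrNode> page = List.of(
            new AttrTag("<div class=\"a\">", "div", false),
            new AttrText("hi"),
            new AttrTag("<br>", "br", false),
            new AttrTag("</div>", "div", true));
        System.out.println(java.util.Arrays.toString(candidatePositions(page)));
    }
}
```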
-
isOpenTag
public boolean isOpenTag()
This method is offered as an optimization tool for quickly finding opening HTML tags within a search-loop or a Java Stream filter-method.
The optimization is made possible given the following characteristics of this class:
- This class, 'HTMLNode', is the abstract parent class of all three node types: TagNode, TextNode and CommentNode.
- This method always returns false here, and only class 'TagNode' overrides it to return something else.
- Most concrete instances will immediately return false when queried inside a search-loop; the exceptions are TagNode instances that are Opening-Tags, rather than Closing-Tags.
This method functions almost identically to openTag(), with the subtle difference being that it returns a TRUE / FALSE boolean, instead of an instance-reference.
Returns:
This method always returns FALSE unless it has been overridden by a subclass. Subclass TagNode overrides this, and will return TRUE if and only if the following conditions hold:
① 'this' instance is a TagNode
② 'this' instance's isClosing field is false
These conditions should imply that 'this' is not only an instance of the TagNode subclass of HTMLNode, but furthermore that it is an opening tag, rather than a closing tag.
See Also:
TagNode.isOpenTag(), openTag()
Code:
Exact Method Body:
// This method will *only* be over-ridden by subclass TagNode.
// For instances of inheriting class TextNode and CommentNode, this always returns FALSE.
// In 'TagNode' this method returns TRUE based on that class' 'isClosing' field.
return false;
-
ifTagNode
public TagNode ifTagNode()
Loop Optimization Method
When this method is invoked on an instance of subclass TagNode, it returns 'this' instance.
Returns:
This method is overridden by subclass TagNode, and in that class it simply returns 'this'. The other subclasses of this (abstract) class inherit this version of the method, and therefore return null.
See Also:
TagNode.ifTagNode()
Code:
Exact Method Body:
// This method will *only* be over-ridden by subclass TagNode, where it shall return
// 'this'. Neither class TextNode, nor class CommentNode will over-ride this method.
return null;
-
ifTextNode
public TextNode ifTextNode()
Loop Optimization Method
When this method is invoked on an instance of subclass TextNode, it returns 'this' instance.
Returns:
This method is overridden by subclass TextNode, and in that class it simply returns 'this'. The other subclasses of this (abstract) class inherit this version of the method, and therefore return null.
See Also:
TextNode.ifTextNode()
Code:
Exact Method Body:
// This method will *only* be over-ridden by subclass TextNode, where it shall return
// 'this'. Neither class TagNode, nor class CommentNode will over-ride this method.
return null;
-
ifCommentNode
public CommentNode ifCommentNode()
Loop Optimization Method
When this method is invoked on an instance of subclass CommentNode, it returns 'this' instance.
Returns:
This method is overridden by subclass CommentNode, and in that class it simply returns 'this'. The other subclasses of this (abstract) class inherit this version of the method, and therefore return null.
See Also:
CommentNode.ifCommentNode()
Code:
Exact Method Body:
// This method will *only* be over-ridden by subclass CommentNode, where it shall return
// 'this'. Neither class TagNode, nor class TextNode will over-ride this method.
return null;
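Inside a loop, the 'if' family of methods lets the type-test and the cast happen in one step: the result is assigned, compared against null, and then used with its subclass type. The classes `IfNode`, `IfTag` and `IfText` below are hypothetical stand-ins, and the tok field is borrowed from the TagNode description elsewhere on this page.

```java
import java.util.List;

// Hypothetical stand-ins for the real class hierarchy.
abstract class IfNode {
    IfTag ifTagNode() { return null; }
}
final class IfText extends IfNode { }
final class IfTag extends IfNode {
    final String tok;
    IfTag(String tok) { this.tok = tok; }
    @Override IfTag ifTagNode() { return this; }
}

class IfTagNodeSketch {
    // Counts the tags whose element name equals 'tok', with no 'instanceof' and no cast.
    static int countTok(List<IfNode> page, String tok) {
        int count = 0;
        IfTag tn;
        for (IfNode n : page)
            if (((tn = n.ifTagNode()) != null) && tn.tok.equals(tok))
                count++;
        return count;
    }

    public static void main(String[] args) {
        List<IfNode> page = List.of(new IfTag("p"), new IfText(), new IfTag("div"), new IfTag("p"));
        System.out.println(countTok(page, "p"));
    }
}
```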
-
asTagNode
public final TagNode asTagNode()
Compile-Time "Syntactic Sugar" for casting an HTMLNode to a TagNode.
Final Method:
This method is final, and cannot be modified by sub-classes.
Returns:
Simply returns 'this' instance. (Note that the method Class.cast(Object) does not convert or copy the object; it only checks the type at run time and gives the compiler the 'proof' it needs for its type-analysis.)
Throws:
java.lang.ClassCastException -
Important:
If the instance is a TextNode or CommentNode, rather than a TagNode, then (naturally) the JVM will immediately throw a casting exception.
Code:
Exact Method Body:
return TagNode.class.cast(this);
-
asTextNode
public final TextNode asTextNode()
Compile-Time "Syntactic Sugar" for casting an HTMLNode to a TextNode.
Final Method:
This method is final, and cannot be modified by sub-classes.
Returns:
Simply returns 'this' instance. (Note that the method Class.cast(Object) does not convert or copy the object; it only checks the type at run time and gives the compiler the 'proof' it needs for its type-analysis.)
Throws:
java.lang.ClassCastException -
Important:
If the instance is a TagNode or CommentNode, rather than a TextNode, then (naturally) the JVM will immediately throw a casting exception.
Code:
Exact Method Body:
return TextNode.class.cast(this);
-
asCommentNode
public final CommentNode asCommentNode()
Compile-Time "Syntactic Sugar" for casting an HTMLNode to a CommentNode.
Final Method:
This method is final, and cannot be modified by sub-classes.
Returns:
Simply returns 'this' instance. (Note that the method Class.cast(Object) does not convert or copy the object; it only checks the type at run time and gives the compiler the 'proof' it needs for its type-analysis.)
Throws:
java.lang.ClassCastException -
Important:
If the instance is a TagNode or TextNode, rather than a CommentNode, then (naturally) the JVM will immediately throw a casting exception.
Code:
Exact Method Body:
return CommentNode.class.cast(this);
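The behaviour of Class.cast(Object) that the three 'as' methods rely on can be checked with standard-library types alone. The sketch below uses String and Integer in place of the node classes, purely as an illustration: a successful cast hands back the very same reference, and a mismatched one throws ClassCastException.

```java
class CastSketch {
    // A successful Class.cast returns the same reference; nothing is copied.
    static boolean sameReference() {
        Object o = "text";
        String s = String.class.cast(o);
        return s == o;
    }

    // A mismatched Class.cast throws ClassCastException at run time.
    static boolean throwsOnMismatch() {
        Object o = Integer.valueOf(7);
        try {
            String.class.cast(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameReference() + " " + throwsOnMismatch());
    }
}
```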
-
-