Package Torello.HTML
Class HTMLNode
java.lang.Object
    Torello.HTML.HTMLNode
All Implemented Interfaces: java.io.Serializable, java.lang.CharSequence, java.lang.Cloneable
Direct Known Subclasses: CommentNode, TagNode, TextNode
public abstract class HTMLNode extends java.lang.Object implements java.lang.CharSequence, java.io.Serializable, java.lang.Cloneable
PRIMARY ANCESTOR DATA-CLASS:
This class is the ancestor of the primary Java-HTML data-classes in this package: TagNode, TextNode and CommentNode.
Inheritance Tree Diagram:
[Diagram: inheritance tree (with fields) of the three concrete classes that extend the abstract class HTMLNode.]
This class is mostly a wrapper for class java.lang.String, and serves as the abstract parent of the three types of HTML elements offered by the Java HTML Library.
This abstract class is an "immutable" class, meaning that the contents of an HTMLNode can never change. Roughly 80% of HTMLNode instances would never be changed in any case, since they don't use Attributes. For such tags (for instance: BR, HR, H1, H2, H3, B, or I), even having multiple instances is unnecessary.
When parsing HTML Elements that are 'Element-Only' (without any Attributes / Inner-Tags), the parser will return singleton instances of the class TagNode to avoid creating extra objects in Java's Memory Heap.
This works out fine: since these nodes are immutable, reusing the same node reference in different Vectors has no side-effects. The class HTMLPage has a suite of methods that take care of instantiating new HTMLNodes.
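The singleton reuse described above can be illustrated with a minimal sketch. The classes SketchTag and SketchTagCache are hypothetical stand-ins and are not part of the Java HTML Library. The caching strategy (a HashMap keyed by the tag text) is only an assumption about one way such reuse could be arranged, not the parser's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

// Hypothetical immutable node: its only state is a final String.
final class SketchTag
{
    final String str;
    SketchTag(String str) { this.str = str; }
}

// Hypothetical cache handing out one shared instance per attribute-free tag.
final class SketchTagCache
{
    private static final Map<String, SketchTag> CACHE = new HashMap<>();

    // Returns the shared instance for the given tag text, creating it on first use.
    static synchronized SketchTag get(String tagText)
    { return CACHE.computeIfAbsent(tagText, SketchTag::new); }

    public static void main(String[] args)
    {
        Vector<SketchTag> pageA = new Vector<>();
        Vector<SketchTag> pageB = new Vector<>();
        pageA.add(get("<BR>"));
        pageB.add(get("<BR>"));

        // Both vectors hold the very same object. Because it is immutable,
        // neither vector can affect the other through it.
        System.out.println(pageA.get(0) == pageB.get(0));  // prints true
    }
}
```

Sharing is only safe here because SketchTag exposes no way to modify its String after construction.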
Light-Weight, Immutable Data-Class
All three of the standard classes that inherit from the abstract class HTMLNode are very light-weight, and do not contain any internal state other than the internal String which represents the TagNode, CommentNode, or TextNode itself.
When it is said that 100% of HTMLNodes are immutable data-classes, it is similar to saying that these classes are little more than 'extension-classes' of the Java class java.lang.String, which also happens to be immutable.
The flagship data-class of this JAR-Library, TagNode, has a number of 'getter' methods. However, each and every one of the 'setter' methods actually returns a new instance of TagNode. This is quite similar to how Java handles changing or updating the contents of a String.
The classes TextNode and CommentNode do not have any 'setter' methods at all. Also, all three concrete subclasses of HTMLNode have a constructor that accepts a simple java.lang.String.
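The 'setter returns a new instance' convention can be sketched as follows. The class ImmutableNodeSketch and its method withStr are hypothetical names chosen for illustration; they do not correspond to any method of TagNode.

```java
// Hypothetical immutable data-class; 'withStr' is an illustrative name only.
final class ImmutableNodeSketch
{
    public final String str;

    ImmutableNodeSketch(String str) { this.str = str; }

    // A "setter" in the immutable style: the receiver is left untouched,
    // and a new instance carrying the new value is returned instead.
    ImmutableNodeSketch withStr(String newStr)
    { return new ImmutableNodeSketch(newStr); }

    public static void main(String[] args)
    {
        ImmutableNodeSketch original = new ImmutableNodeSketch("<DIV>");
        ImmutableNodeSketch changed  = original.withStr("<DIV CLASS=\"main\">");

        System.out.println(original.str);  // still "<DIV>"
        System.out.println(changed.str);

        // java.lang.String behaves the same way: toUpperCase() returns a new String.
        String s = "abc";
        String t = s.toUpperCase();
        System.out.println(s + " " + t);   // prints abc ABC
    }
}
```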
- See Also:
TagNode, TextNode, CommentNode, Serialized Form
-
-
Field Summary
Fields
Modifier and Type    Field
static long          serialVersionUID
String               str
-
Constructor Summary
Constructors
Modifier     Constructor
protected    HTMLNode(String s)
-
Method Summary
'instanceof' Operator-Replacement Methods
Modifier and Type    Method
boolean              isCommentNode()
boolean              isTagNode()
boolean              isTextNode()
Simple & Overloaded Type-Tests
Modifier and Type    Method
CommentNode          ifCommentNode()
TagNode              ifTagNode()
TextNode             ifTextNode()
Simple Casting Syntactic Sugar
Modifier and Type    Method
CommentNode          asCommentNode()
TagNode              asTagNode()
TextNode             asTextNode()
Loop Optimization Methods
Modifier and Type    Method
boolean              isOpenTag()
boolean              isOpenTagPWA()
TagNode              openTag()
TagNode              openTagPWA()
Methods: interface java.lang.CharSequence
Modifier and Type    Method
char                 charAt(int index)
int                  length()
CharSequence         subSequence(int start, int end)
String               toString()
Methods: class java.lang.Object
Modifier and Type    Method
boolean              equals(Object o)
int                  hashCode()
Methods: interface java.lang.Cloneable
Modifier and Type    Method
abstract HTMLNode    clone()
-
-
-
Field Detail
-
serialVersionUID
public static final long serialVersionUID
This fulfils the SerialVersion UID requirement for all classes that implement Java's interface java.io.Serializable. Using the Serializable implementation offered by Java is very easy, and can make saving program state when debugging a lot easier. It can also be used in place of more complicated systems like "hibernate" to store data as well.
- See Also:
- Constant Field Values
- Code:
- Exact Field Declaration Expression:
public static final long serialVersionUID = 1;
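A minimal sketch of saving and restoring a node through Java Serialization is shown here. SerializableNodeSketch is a hypothetical stand-in for an HTMLNode subclass, and the in-memory byte array is used only to keep the example self-contained; a file stream could be substituted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical serializable data-class standing in for an HTMLNode subclass.
final class SerializableNodeSketch implements Serializable
{
    public static final long serialVersionUID = 1;
    public final String str;

    SerializableNodeSketch(String str) { this.str = str; }

    // Writes the node to an in-memory byte array and reads it back.
    static SerializableNodeSketch roundTrip(SerializableNodeSketch node)
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes))
            { out.writeObject(node); }

            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray())))
            { return (SerializableNodeSketch) in.readObject(); }
        }
        catch (IOException | ClassNotFoundException e)
        { throw new IllegalStateException(e); }
    }

    public static void main(String[] args)
    {
        SerializableNodeSketch copy = roundTrip(new SerializableNodeSketch("<BR>"));
        System.out.println(copy.str);  // prints <BR>
    }
}
```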
-
str
public final java.lang.String str
This is an immutable field. It stores the complete contents of an HTML node. It can be either the "textual contents" of an HTML TagNode, or the text (directly) of the text inside of an HTML page!
FOR INSTANCE:
- A subclass of HTMLNode, TagNode, could contain the String "<SPAN STYLE="CSS INFO">" inside this str field.
- Another subclass of HTMLNode, TextNode, could contain the String "This is a news-page from www.Gov.CN Chinese Government Portal." inside this str field.
NOTE: Because subclasses of HTMLNode are all immutable, a programmer who wishes to change the contents of an HTML page is generally required to create new nodes, rather than changing these fields.
-
-
Constructor Detail
-
HTMLNode
protected HTMLNode(java.lang.String s)
Constructor that builds a new HTMLNode.
- Parameters:
s - A valid string of an HTML element.
-
-
Method Detail
-
hashCode
public final int hashCode()
Java's hash-code requirement.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Overrides:
hashCode in class java.lang.Object
- Returns:
- A hash-code that may be used when storing this node in a Java hash-based collection, such as java.util.HashMap or java.util.HashSet.
- Code:
- Exact Method Body:
return this.str.hashCode();
-
equals
public final boolean equals(java.lang.Object o)
Java's public boolean equals(Object o) requirements.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Overrides:
equals in class java.lang.Object
- Parameters:
o - This may be any Java Object, but only ones of 'this' type whose internal-values are identical will cause this method to return TRUE.
- Returns:
TRUE if 'this' equals another HTMLNode object.
- Code:
- Exact Method Body:
if (o == null) return false;
if (o == this) return true;
if (! this.getClass().equals(o.getClass())) return false;
return ((HTMLNode) o).str.equals(this.str);
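The sketch below mirrors the equals and hashCode bodies quoted above on hypothetical stand-in classes (EqNodeSketch, EqTextSketch and EqCommentSketch are not library classes). It is meant to show one consequence of the getClass() comparison: two nodes with identical text but different concrete types share a hash-code, yet are not equal.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical abstract parent, copying the equals / hashCode logic quoted above.
abstract class EqNodeSketch
{
    public final String str;
    protected EqNodeSketch(String s) { this.str = s; }

    @Override public final int hashCode() { return this.str.hashCode(); }

    @Override public final boolean equals(Object o)
    {
        if (o == null) return false;
        if (o == this) return true;
        if (! this.getClass().equals(o.getClass())) return false;
        return ((EqNodeSketch) o).str.equals(this.str);
    }
}

final class EqTextSketch extends EqNodeSketch
{ EqTextSketch(String s) { super(s); } }

final class EqCommentSketch extends EqNodeSketch
{ EqCommentSketch(String s) { super(s); } }

final class EqualsDemoSketch
{
    public static void main(String[] args)
    {
        Set<EqNodeSketch> set = new HashSet<>();
        set.add(new EqTextSketch("hello"));
        set.add(new EqTextSketch("hello"));     // equal to the first: not added again
        set.add(new EqCommentSketch("hello"));  // same text, different class: added

        System.out.println(set.size());  // prints 2
    }
}
```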
-
clone
-
toString
public final java.lang.String toString()
Java's toString() requirement.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Specified by:
toString in interface java.lang.CharSequence
- Overrides:
toString in class java.lang.Object
- Returns:
- A String-representation of this HTMLNode.
- Code:
- Exact Method Body:
return this.str;
-
charAt
public final char charAt(int index)
Returns the char value at the specified index of the field: public final String str. An index ranges from '0' (zero) to HTMLNode.str.length() - 1. The first char value of the sequence is at index zero, the next at index one, and so on, as for array indexing.
NOTE: If the char value specified by the index is a surrogate, the surrogate value is returned.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Specified by:
charAt in interface java.lang.CharSequence
- Parameters:
index - The index of the char value to be returned
- Returns:
- The specified char value
- Code:
- Exact Method Body:
return str.charAt(index);
-
length
public final int length()
Returns the length of the field public final String str. The length is the number of 16-bit chars in the sequence.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Specified by:
length in interface java.lang.CharSequence
- Returns:
- The number of chars in this.str
- Code:
- Exact Method Body:
return str.length();
-
subSequence
public final java.lang.CharSequence subSequence(int start, int end)
Returns a CharSequence that is a subsequence of the public final String str field of 'this' HTMLNode. The subsequence starts with the char value at index start and ends with the char value at index end - 1. The length (in chars) of the returned sequence is end - start, so if start == end then an empty sequence is returned.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Specified by:
subSequence in interface java.lang.CharSequence
- Parameters:
start - The start index, inclusive
end - The end index, exclusive
- Returns:
- The specified subsequence
- Code:
- Exact Method Body:
return str.substring(start, end);
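One practical benefit of implementing java.lang.CharSequence is that a node can be handed directly to standard APIs that accept a CharSequence, such as java.util.regex.Pattern.matcher. The sketch below uses a hypothetical stand-in class, CharSeqNodeSketch, with the same four method bodies listed above; the helper tagName and its regular expression are illustrative assumptions only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical String-wrapper implementing CharSequence by delegation.
final class CharSeqNodeSketch implements CharSequence
{
    public final String str;
    CharSeqNodeSketch(String s) { this.str = s; }

    @Override public char charAt(int index) { return str.charAt(index); }
    @Override public int length() { return str.length(); }
    @Override public CharSequence subSequence(int start, int end)
    { return str.substring(start, end); }
    @Override public String toString() { return str; }

    // Runs a regular expression directly on the node (no toString() call needed).
    // Returns the tag name, or null if the node does not start with '<'.
    static String tagName(CharSeqNodeSketch node)
    {
        Matcher m = Pattern.compile("^</?(\\w+)").matcher(node);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args)
    {
        CharSeqNodeSketch node = new CharSeqNodeSketch("<SPAN STYLE=\"CSS INFO\">");
        System.out.println(tagName(node));           // prints SPAN
        System.out.println(node.subSequence(1, 5));  // prints SPAN
    }
}
```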
-
isCommentNode
public boolean isCommentNode()
This method will return TRUE for any instance of 'CommentNode'.
The purpose of this method is to efficiently return TRUE whenever an instance of 'HTMLNode' should be checked to see if it is actually an inherited instance of CommentNode. This is (marginally) more efficient than using the Java 'instanceof' operator.
- Returns:
- This (top-level inheritance-tree) method always returns FALSE. The '.java' file for class CommentNode overrides this method, and returns TRUE.
- See Also:
CommentNode.isCommentNode()
- Code:
- Exact Method Body:
// This method will *only* be over-ridden by subclass CommentNode, where it shall return
// TRUE. Neither class TextNode, nor class TagNode will over-ride this method.
return false;
-
isTextNode
public boolean isTextNode()
This method will return TRUE for any instance of 'TextNode'.
The purpose of this method is to efficiently return TRUE whenever an instance of 'HTMLNode' should be checked to see if it is actually an inherited instance of TextNode. This is (marginally) more efficient than using the Java 'instanceof' operator.
- Returns:
- This (top-level inheritance-tree) method always returns FALSE. The '.java' file for class TextNode overrides this method, and returns TRUE.
- See Also:
TextNode.isTextNode()
- Code:
- Exact Method Body:
// This method will *only* be over-ridden by subclass TextNode, where it shall return
// TRUE. Neither class CommentNode, nor class TagNode will over-ride this method.
return false;
-
isTagNode
public boolean isTagNode()
This method will return TRUE for any instance of 'TagNode'.
The purpose of this method is to efficiently return TRUE whenever an instance of 'HTMLNode' should be checked to see if it is actually an inherited instance of TagNode. This is (marginally) more efficient than using the Java 'instanceof' operator.
- Returns:
- This (top-level inheritance-tree) method always returns FALSE. The '.java' file for class TagNode overrides this method, and returns TRUE.
- See Also:
TagNode.isTagNode()
- Code:
- Exact Method Body:
// This method will *only* be over-ridden by subclass TagNode, where it shall return
// TRUE. Neither class TextNode, nor class CommentNode will over-ride this method.
return false;
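The three type-test methods follow a common pattern: the abstract parent returns false, and exactly one subclass overrides each method to return true. A minimal sketch of that pattern, using hypothetical stand-in classes (none of the names below belong to the library), might look like this:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical abstract parent: every type-test defaults to false.
abstract class TypeTestNodeSketch
{
    public final String str;
    protected TypeTestNodeSketch(String s) { this.str = s; }

    public boolean isTagNode()     { return false; }
    public boolean isTextNode()    { return false; }
    public boolean isCommentNode() { return false; }
}

// Each subclass overrides only the one test that describes itself.
final class TypeTestTagSketch extends TypeTestNodeSketch
{
    TypeTestTagSketch(String s) { super(s); }
    @Override public boolean isTagNode() { return true; }
}

final class TypeTestTextSketch extends TypeTestNodeSketch
{
    TypeTestTextSketch(String s) { super(s); }
    @Override public boolean isTextNode() { return true; }
}

final class TypeTestCommentSketch extends TypeTestNodeSketch
{
    TypeTestCommentSketch(String s) { super(s); }
    @Override public boolean isCommentNode() { return true; }
}

final class TypeTestDemoSketch
{
    // Counts text nodes with a virtual method call instead of 'instanceof'.
    static int countTextNodes(List<TypeTestNodeSketch> page)
    {
        int count = 0;
        for (TypeTestNodeSketch n : page) if (n.isTextNode()) count++;
        return count;
    }

    public static void main(String[] args)
    {
        List<TypeTestNodeSketch> page = Arrays.asList(
            new TypeTestTagSketch("<P>"),
            new TypeTestTextSketch("Hello"),
            new TypeTestCommentSketch("<!-- note -->"),
            new TypeTestTagSketch("</P>")
        );
        System.out.println(countTextNodes(page));  // prints 1
    }
}
```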
-
openTagPWA
public TagNode openTagPWA()
PWA: Open Tag, 'Possibly With Attributes'
This method is offered as an optimization tool for quickly finding HTML tags which possess attributes using a search-loop.
The optimization is made possible given the following characteristics of this class:
- This class, 'HTMLNode', is the 'abstract' parent class of all three node types: TagNode, TextNode and CommentNode.
- This method shall always return null, and only class 'TagNode' overrides this method to return something else.
- Most concrete instances will immediately return null when queried inside of a search-loop. The only exceptions are TagNode instances that are opening tags long enough to possibly contain Attribute Key-Value Pairs.
This makes the process of finding TagNodes having a particular attribute much more efficient.
The purpose of this method is to quickly return a node that has been cast to an instance of TagNode, if it is, indeed, a TagNode and if it has an internal String long enough to possibly contain attributes (inner-tags).
- Returns:
- This method shall always return null, unless this method has been overridden by a sub-class. Only TagNode overrides this method, and this method will return 'this' instance, if and only if the following conditions hold:
    ① 'this' instance is a TagNode
    ② 'this' instance's isClosing field is false.
    ③ The 'length()' of the str field is at least equal to the 'length()' of the tok field plus 4.
AGAIN: These conditions should imply that 'this' is not only an instance of the TagNode subclass of HTMLNode, but furthermore that this is an Opening-Tag, whose internal String is long enough to "Possibly Contain Attributes" (hence the name-acronym).
This is a much more efficient and elegant way to optimize code when searching for tags that have attribute / inner-tag key-value pairs.
- See Also:
TagNode.openTagPWA(), isOpenTagPWA()
- Code:
- Exact Method Body:
// This method will *only* be over-ridden by subclass TagNode.
// For instances of inheriting class TextNode and CommentNode, this always returns null.
// In 'TagNode' this method returns 'this' based on the 'isClosing' field from that class,
// and the length of the 'str' field from this class.
return null;
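The search-loop usage that openTagPWA() is designed for can be sketched on hypothetical stand-in classes. PwaNodeSketch, PwaTextSketch and PwaTagSketch are not library classes, and the override in PwaTagSketch is an assumption written directly from the three documented conditions rather than a copy of the real TagNode source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical abstract parent: the default answer is always null.
abstract class PwaNodeSketch
{
    public final String str;
    protected PwaNodeSketch(String s) { this.str = s; }

    public PwaTagSketch openTagPWA() { return null; }
}

final class PwaTextSketch extends PwaNodeSketch
{ PwaTextSketch(String s) { super(s); } }

final class PwaTagSketch extends PwaNodeSketch
{
    public final String tok;         // tag name, e.g. "a"
    public final boolean isClosing;  // true for tags such as </a>

    PwaTagSketch(String s, String tok, boolean isClosing)
    { super(s); this.tok = tok; this.isClosing = isClosing; }

    // Assumed override: opening tag, and str at least tok.length() + 4 chars long.
    @Override public PwaTagSketch openTagPWA()
    { return (!isClosing && str.length() >= tok.length() + 4) ? this : null; }
}

final class PwaDemoSketch
{
    // Collects only the nodes worth inspecting for attributes.
    static List<PwaTagSketch> candidates(List<PwaNodeSketch> page)
    {
        List<PwaTagSketch> ret = new ArrayList<>();
        for (PwaNodeSketch n : page)
        {
            PwaTagSketch tn = n.openTagPWA();
            if (tn != null) ret.add(tn);
        }
        return ret;
    }

    public static void main(String[] args)
    {
        List<PwaNodeSketch> page = Arrays.asList(
            new PwaTagSketch("<a href=\"x\">", "a", false),
            new PwaTextSketch("link text"),
            new PwaTagSketch("</a>", "a", true),
            new PwaTagSketch("<br>", "br", false)
        );
        System.out.println(candidates(page).size());  // prints 1
    }
}
```

In this sketch, "<br>" is rejected because its length (4) is smaller than the length of "br" plus 4, so no attribute could fit inside it.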
-
openTag
public TagNode openTag()
This method is offered as an optimization tool for quickly finding Opening HTML Tags within a search loop or a Java Stream filter-method.
The optimization is made possible given the following characteristics of this class:
- This class, 'HTMLNode', is the 'abstract' parent class of all three node types: TagNode, TextNode and CommentNode.
- This method shall always return null, and only class 'TagNode' overrides this method to return something else.
- Most concrete instances will immediately return null when queried inside of a search-loop. The only exceptions are TagNode instances that are Opening-Tags, rather than Closing-Tags.
- Returns:
- This method shall always return null, unless this method has been overridden by a sub-class. Only TagNode overrides this method, and this method will return 'this' instance, if and only if the following conditions hold:
    ① 'this' instance is a TagNode
    ② 'this' instance's isClosing field is false.
AGAIN: These conditions should imply that 'this' is not only an instance of the TagNode subclass of HTMLNode, but furthermore that this is an opening tag, rather than a closing tag.
When the overridden TagNode sub-class returns a non-null result, that value will always be equal to 'this'.
- See Also:
TagNode.openTag()
- Code:
- Exact Method Body:
// This method will *only* be over-ridden by subclass TagNode.
// For instances of inheriting class TextNode and CommentNode, this always returns null.
// In 'TagNode' this method returns 'this' based on that class 'isClosing' field.
return null;
-
isOpenTagPWA
public boolean isOpenTagPWA()
PWA: Open Tag, 'Possibly With Attributes'
This method is offered as an optimization tool for quickly finding HTML tags which possess attributes using a search-loop.
The optimization is made possible given the following characteristics of this class:
- This class, 'HTMLNode', is the 'abstract' parent class of all three node types: TagNode, TextNode and CommentNode.
- This method shall always return false, and only class 'TagNode' overrides this method to return something else.
- Most concrete instances will immediately return false when queried inside of a search-loop. The only exceptions are TagNode instances that are opening tags long enough to possibly contain Attribute Key-Value Pairs.
This method functions almost identically to openTagPWA(), with the subtle difference that its return-value is a 'boolean', rather than an instance of the TagNode itself. This can facilitate the use of this method inside filter calls on a Java Stream, for instance.
Stream Invocation-Stack:
In the example below, a Web-Page was copied to the clip-board, and then saved to a file on the File-System. The content of the flat-file 'ChatGPT.Transcript.html' is just some HTML that was block-copied from the well-known Web-Site having that name.
To simplify the HTML that was cut and pasted, all of the HTML Attributes are removed, which improves the readability of the page itself. However, to preserve the text-arrangement of HTML Tables, Ordered-Lists, and UnOrdered-Lists, the CSS Attributes on those tags have to be preserved!
The method 'isOpenTagPWA()' is invoked inside of a Java Stream Invocation-Stack to make sure that the nodes passed to Attributes.removeAll are only TagNode instances that could potentially contain attributes, and are not nodes inside of an HTML <UL>, <OL> or <TABLE> Tag-Pair.
Example:
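// Load the saved page as a Vector of HTMLNode's
Vector<HTMLNode> webPage = HTMLPage.getPageTokens("ChatGPT.Transcript.html", false);

// Locate every <UL>, <OL> and <TABLE> Tag-Pair on the page
List<DotPair> dpAll = TagNodeFindInclusive.all(webPage, "ul", "ol", "table");

// Keep only the positions outside of those Tag-Pairs that hold an
// Open-Tag, 'Possibly With Attributes'
int[] iArr = DPUtil
    .excludedToStream(dpAll, webPage.size(), true)
    .filter((int pos) -> webPage.elementAt(pos).isOpenTagPWA())
    .toArray();

// Remove the attributes from the nodes at those positions
Attributes.removeAll(webPage, iArr);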
- Returns:
- This method shall always return FALSE, unless it has been overridden by a subclass. Subclass TagNode overrides this, and will return TRUE if and only if the following conditions hold:
    ① 'this' instance is a TagNode
    ② 'this' instance's isClosing field is false.
    ③ The 'length()' of the str field is at least equal to the 'length()' of the tok field plus 4.
AGAIN: These conditions should imply that 'this' is not only an instance of the TagNode subclass of HTMLNode, but furthermore that this is an Opening-Tag, whose internal String is long enough to "Possibly Contain Attributes" (hence the name-acronym).
This is a much more efficient and elegant way to optimize code when searching for tags that have attribute / inner-tag key-value pairs.
- See Also:
TagNode.isOpenTagPWA(), openTagPWA()
- Code:
- Exact Method Body:
// This method will *only* be over-ridden by subclass TagNode.
// For instances of inheriting class TextNode and CommentNode, this always returns false.
// In 'TagNode' this method returns TRUE based on the 'isClosing' field from that class,
// and the length of the 'str' field from this class.
return false;
-
isOpenTag
public boolean isOpenTag()
This method is offered as an optimization tool for quickly finding Opening HTML Tags within a search loop or a Java Stream filter-method.
The optimization is made possible given the following characteristics of this class:
- This class, 'HTMLNode', is the 'abstract' parent class of all three node types: TagNode, TextNode and CommentNode.
- This method shall always return false, and only class 'TagNode' overrides this method to return something else.
- Most concrete instances will immediately return false when queried inside of a search-loop. The only exceptions are TagNode instances that are Opening-Tags, rather than Closing-Tags.
This method will function almost identically to openTag(), with the subtle difference being that it returns a TRUE / FALSE boolean, instead of an instance-reference.
- Returns:
- This method shall always return FALSE, unless it has been overridden by a subclass. Subclass TagNode overrides this, and will return TRUE if and only if the following conditions hold:
    ① 'this' instance is a TagNode
    ② 'this' instance's isClosing field is false.
AGAIN: These conditions should imply that 'this' is not only an instance of the TagNode subclass of HTMLNode, but furthermore that this is an opening tag, rather than a closing tag.
- See Also:
TagNode.isOpenTag(), openTag()
- Code:
- Exact Method Body:
// This method will *only* be over-ridden by subclass TagNode.
// For instances of inheriting class TextNode and CommentNode, this always returns FALSE.
// In 'TagNode' this method returns TRUE based on that class 'isClosing' field.
return false;
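Because isOpenTag() returns a boolean, it can serve directly as a Stream filter predicate. The sketch below shows this on hypothetical stand-in classes (OpenNodeSketch, OpenTextSketch and OpenTagSketch are not library classes; the override is assumed from the documented conditions), collecting the positions of every opening tag in a list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical abstract parent: the default answer is always false.
abstract class OpenNodeSketch
{
    public final String str;
    protected OpenNodeSketch(String s) { this.str = s; }

    public boolean isOpenTag() { return false; }
}

final class OpenTextSketch extends OpenNodeSketch
{ OpenTextSketch(String s) { super(s); } }

final class OpenTagSketch extends OpenNodeSketch
{
    public final boolean isClosing;

    OpenTagSketch(String s, boolean isClosing)
    { super(s); this.isClosing = isClosing; }

    // Assumed override: TRUE for opening tags only.
    @Override public boolean isOpenTag() { return !isClosing; }
}

final class OpenTagStreamSketch
{
    // Returns the list positions of every opening tag on the page.
    static int[] openTagPositions(List<OpenNodeSketch> page)
    {
        return IntStream
            .range(0, page.size())
            .filter(pos -> page.get(pos).isOpenTag())
            .toArray();
    }

    public static void main(String[] args)
    {
        List<OpenNodeSketch> page = Arrays.asList(
            new OpenTagSketch("<p>", false),
            new OpenTextSketch("hello"),
            new OpenTagSketch("</p>", true),
            new OpenTagSketch("<br>", false)
        );
        System.out.println(Arrays.toString(openTagPositions(page)));  // prints [0, 3]
    }
}
```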
-
ifTagNode
public TagNode ifTagNode()
Loop Optimization Method
When this method is invoked on an instance of sub-class TagNode, this method produces 'this' instance.
- Returns:
- This method is overridden by sub-class TagNode, and in that class, this method simply returns 'this'. The other sub-classes of this (abstract) class inherit this version of this method, and therefore return null.
- See Also:
TagNode.ifTagNode()
- Code:
- Exact Method Body:
// This method will *only* be over-ridden by subclass TagNode, where it shall return
// 'this'. Neither class TextNode, nor class CommentNode will over-ride this method.
return null;
-
ifTextNode
public TextNode ifTextNode()
Loop Optimization Method
When this method is invoked on an instance of sub-class TextNode, this method produces 'this' instance.
- Returns:
- This method is overridden by sub-class TextNode, and in that class, this method simply returns 'this'. The other sub-classes of this (abstract) class inherit this version of this method, and therefore return null.
- See Also:
TextNode.ifTextNode()
- Code:
- Exact Method Body:
// This method will *only* be over-ridden by subclass TextNode, where it shall return
// 'this'. Neither class TagNode, nor class CommentNode will over-ride this method.
return null;
-
ifCommentNode
public CommentNode ifCommentNode()
Loop Optimization Method
When this method is invoked on an instance of sub-class CommentNode, this method produces 'this' instance.
- Returns:
- This method is overridden by sub-class CommentNode, and in that class, this method simply returns 'this'. The other sub-classes of this (abstract) class inherit this version of this method, and therefore return null.
- See Also:
CommentNode.ifCommentNode()
- Code:
- Exact Method Body:
// This method will *only* be over-ridden by subclass CommentNode, where it shall return
// 'this'. Neither class TagNode, nor class TextNode will over-ride this method.
return null;
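The 'if...Node' methods combine a type-test and a cast into a single call: the result is either null or an already-typed reference. A minimal sketch of a loop built on that idea follows. The classes IfNodeSketch, IfTagSketch and IfTextSketch are hypothetical stand-ins, not library classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical abstract parent: the default answer is always null.
abstract class IfNodeSketch
{
    public final String str;
    protected IfNodeSketch(String s) { this.str = s; }

    public IfTextSketch ifTextNode() { return null; }
}

final class IfTagSketch extends IfNodeSketch
{ IfTagSketch(String s) { super(s); } }

final class IfTextSketch extends IfNodeSketch
{
    IfTextSketch(String s) { super(s); }

    // Only the text-node subclass returns itself.
    @Override public IfTextSketch ifTextNode() { return this; }
}

final class IfNodeDemoSketch
{
    // Concatenates the text of every text node, skipping all other nodes.
    static String visibleText(List<IfNodeSketch> page)
    {
        StringBuilder sb = new StringBuilder();
        IfTextSketch txn;

        // One call both tests the type and yields a typed reference.
        for (IfNodeSketch n : page)
            if ((txn = n.ifTextNode()) != null) sb.append(txn.str);

        return sb.toString();
    }

    public static void main(String[] args)
    {
        List<IfNodeSketch> page = Arrays.asList(
            new IfTagSketch("<b>"),
            new IfTextSketch("Hello"),
            new IfTagSketch("</b>"),
            new IfTextSketch(" world")
        );
        System.out.println(visibleText(page));  // prints Hello world
    }
}
```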
-
asTagNode
public final TagNode asTagNode()
Compile-Time "Syntactic Sugar" for casting an HTMLNode to a TagNode.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Returns:
- Simply returns 'this' instance. (Note that the method Class.cast(Object) only performs a run-time type check and returns the same reference; its purpose here is to give the compiler's type-analysis a TagNode to work with.)
- Throws:
java.lang.ClassCastException - IMPORTANT: If the instance is a TextNode or CommentNode, rather than a TagNode, then (naturally) the JVM will immediately throw a casting exception.
- Code:
- Exact Method Body:
return TagNode.class.cast(this);
-
asTextNode
public final TextNode asTextNode()
Compile-Time "Syntactic Sugar" for casting an HTMLNode to a TextNode.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Returns:
- Simply returns 'this' instance. (Note that the method Class.cast(Object) only performs a run-time type check and returns the same reference; its purpose here is to give the compiler's type-analysis a TextNode to work with.)
- Throws:
java.lang.ClassCastException - IMPORTANT: If the instance is a TagNode or CommentNode, rather than a TextNode, then (naturally) the JVM will immediately throw a casting exception.
- Code:
- Exact Method Body:
return TextNode.class.cast(this);
-
asCommentNode
public final CommentNode asCommentNode()
Compile-Time "Syntactic Sugar" for casting an HTMLNode to a CommentNode.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Returns:
- Simply returns 'this' instance. (Note that the method Class.cast(Object) only performs a run-time type check and returns the same reference; its purpose here is to give the compiler's type-analysis a CommentNode to work with.)
- Throws:
java.lang.ClassCastException - IMPORTANT: If the instance is a TagNode or TextNode, rather than a CommentNode, then (naturally) the JVM will immediately throw a casting exception.
- Code:
- Exact Method Body:
return CommentNode.class.cast(this);
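The behaviour of the 'as...Node' methods, including the ClassCastException case, can be sketched on hypothetical stand-in classes (CastNodeSketch, CastTagSketch and CastTextSketch are not library classes). The only real API used is java.lang.Class.cast(Object).

```java
// Hypothetical abstract parent with a casting helper shaped like the body above.
abstract class CastNodeSketch
{
    public final String str;
    protected CastNodeSketch(String s) { this.str = s; }

    // Class.cast checks the run-time type, then returns the same reference.
    public final CastTagSketch asTagNode() { return CastTagSketch.class.cast(this); }
}

final class CastTagSketch extends CastNodeSketch
{ CastTagSketch(String s) { super(s); } }

final class CastTextSketch extends CastNodeSketch
{ CastTextSketch(String s) { super(s); } }

final class CastDemoSketch
{
    // Returns true when the cast succeeds, false when a ClassCastException is thrown.
    static boolean castSucceeds(CastNodeSketch n)
    {
        try { n.asTagNode(); return true; }
        catch (ClassCastException e) { return false; }
    }

    public static void main(String[] args)
    {
        CastNodeSketch tag = new CastTagSketch("<P>");

        // The returned reference is the very same object, now typed as a tag.
        System.out.println(tag.asTagNode() == tag);                   // prints true

        // A text node cannot be viewed as a tag node.
        System.out.println(castSucceeds(new CastTextSketch("hi")));   // prints false
    }
}
```

When the node type is not known in advance, a type-test should precede the cast so that the exception is never raised.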
-
-