Package Torello.HTML
Class SubSection
java.lang.Object
    Torello.HTML.SubSection
All Implemented Interfaces: java.io.Serializable, java.lang.CharSequence, java.lang.Cloneable, java.lang.Comparable<Replaceable>, Replaceable
public class SubSection extends java.lang.Object implements java.lang.CharSequence, java.io.Serializable, java.lang.Cloneable, Replaceable
Allows the NodeSearch Package to simultaneously return both an HTML-Vector sub-list and the location (as an instance of DotPair) where that sub-list was found.
This class is a simple data-structure class used to represent "sub-sections" of a vectorized-html web-page. It keeps both a copy of the html (as a page-Vector) and the location in the original page from which that html was copied. The location is saved as an instance of DotPair.
If the above sounds like technical jargon, please review class DotPair and notice that its start and end fields are merely pointers into a vectorized-html web-page. They 'point' to the starting and ending places on the main html web-page where the contents of this "sub-section" or "sub-list" were stored.
This class implements the Replaceable interface, which encapsulates both a portion of html and the location inside the main page from which it was copied. Both are exposed as public fields in this data-structure.
STALE-DATA NOTE:
The burden of ensuring that stale data is not contained inside an instance of class SubSection is left to the programmer using this class. If the original page Vector is modified, the values inside the location field can become stale with respect to the original page, even if the modified portion does not overlap this sub-section. This can happen whenever nodes are added to or removed from the original page: the location pointers would no longer hold indices that represent the sub-section intended when this instance was created.
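The staleness described above can be reproduced with nothing but java.util.Vector. The following is only an illustrative sketch: it uses Vector<String> as a stand-in for Vector<HTMLNode>, and the names StaleIndexDemo, copyRange, start and end are hypothetical (with 'end' treated as inclusive).

```java
import java.util.List;
import java.util.Vector;

public class StaleIndexDemo
{
    // Copies the inclusive range [start, end] out of 'page'.
    // Stand-in for the copy that a sub-section keeps of its nodes.
    static Vector<String> copyRange(Vector<String> page, int start, int end)
    { return new Vector<>(page.subList(start, end + 1)); }

    public static void main(String[] args)
    {
        Vector<String> page =
            new Vector<>(List.of("<html>", "<p>", "Hello", "</p>", "</html>"));

        // Record where the paragraph lives: indices 1 through 3 (inclusive)
        int start = 1, end = 3;
        Vector<String> copied = copyRange(page, start, end);

        // Modify the page at a position that does NOT overlap the recorded range
        page.insertElementAt("<!-- comment -->", 0);

        // The saved indices now select different nodes than the ones that were copied
        Vector<String> nowAtSameIndices = copyRange(page, start, end);

        System.out.println("copied when recorded: " + copied);
        System.out.println("found at same indices: " + nowAtSameIndices);
    }
}
```

Because every node after the insertion point shifts by one, the recorded pair (1, 3) no longer brackets the paragraph, even though the paragraph itself was never touched.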
Implements Replaceable:
This class implements the Replaceable interface. This means that it may be used for efficiently modifying, updating, or replacing many segments of an HTML-Page using the method:
ReplaceNodes.r(Vector<HTMLNode>, Iterable<Replaceable>, boolean)
Whenever the Java HTML JAR Library's HTML-Vectors are being used to modify or update an HTML-Page, it helps to remember that shifting elements in a list (in this package, a Vector<HTMLNode>) can be inefficient if many nodes are going to be inserted and removed. Inserting a node into a Vector requires shifting forward every node that occurs after the insertion index.
By first extracting HTML nodes or sub-lists using the NodeSearch Package's Peek-Operations (all of which return Replaceable instances), a user can operate on much smaller pieces of HTML. Once all updates have been made, the original Vector can be rebuilt all at once using the efficient updater method named above.
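The idea of rebuilding a page all at once, rather than shifting nodes on every change, can be sketched as follows. This is not the library's implementation of the updater method; it is a minimal illustration that assumes a hypothetical Piece class (an original range with an exclusive end, plus its replacement nodes), uses Vector<String> in place of Vector<HTMLNode>, and requires the pieces to be sorted by start index and non-overlapping.

```java
import java.util.List;
import java.util.Vector;

public class SinglePassRebuild
{
    // Hypothetical stand-in for a Replaceable: the original range [start, end)
    // and the (possibly modified) nodes that should take its place.
    static final class Piece
    {
        final int start, end;
        final List<String> nodes;

        Piece(int start, int end, List<String> nodes)
        { this.start = start; this.end = end; this.nodes = nodes; }
    }

    // Builds a new Vector by walking the original exactly once.
    // ASSUMPTION: 'pieces' is sorted by 'start' and the ranges do not overlap.
    static Vector<String> rebuild(Vector<String> original, List<Piece> pieces)
    {
        Vector<String> result = new Vector<>();
        int pos = 0;

        for (Piece p : pieces)
        {
            result.addAll(original.subList(pos, p.start));  // untouched nodes before the piece
            result.addAll(p.nodes);                         // replacement nodes
            pos = p.end;                                    // skip the original range
        }

        result.addAll(original.subList(pos, original.size())); // nodes after the last piece
        return result;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(List.of("a", "b", "c", "d", "e"));

        List<Piece> pieces = List.of(
            new Piece(1, 2, List.of("B1", "B2")),
            new Piece(3, 5, List.of("X"))
        );

        System.out.println(rebuild(page, pieces));
    }
}
```

Each node is appended to the result once, so no node is shifted repeatedly, no matter how many pieces are replaced.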
The Replaceable interface provides quite a number of HTML modification methods to add, change or eliminate the original HTML found on a page. Note that when operating on a piece of HTML that has been extracted, the effects of stale index-pointers become irrelevant!
- See Also:
HTMLNode, DotPair, NodeIndex, TagNodePeekInclusive, InnerTagPeekInclusive, Serialized Form
Hi-Lited Source-Code: Torello/HTML/SubSection.java
File Size: 12,097 Bytes Line Count: 287 '\n' Characters Found
-
-
Field Summary
Serializable ID
    Modifier and Type                Field
    static long                      serialVersionUID
Alternate Comparator
    Modifier and Type                Field
    static Comparator<SubSection>    comp2
SubSection Fields
    Modifier and Type                Field
    Vector<HTMLNode>                 html
    DotPair                          location
-
Constructor Summary
Constructors
    Constructor                                            Description
    SubSection(DotPair location, Vector<HTMLNode> html)    This just builds a new instance of this class.
-
Method Summary
Methods: interface Torello.HTML.Replaceable
    Modifier and Type    Method
    boolean              addAllInto(int index, Vector<HTMLNode> fileVec)
    boolean              addAllInto(Vector<HTMLNode> fileVec)
    Vector<HTMLNode>     currentNodes()
    int                  currentSize()
    HTMLNode             firstCurrentNode()
    HTMLNode             lastCurrentNode()
    int                  originalLocationEnd()
    int                  originalLocationStart()
    int                  originalSize()
    int                  update(Vector<HTMLNode> originalHTML)
Methods: interface java.lang.CharSequence
    Modifier and Type    Method
    char                 charAt(int index)
    int                  length()
    CharSequence         subSequence(int start, int end)
    String               toString()
Methods: interface java.lang.Cloneable
    Modifier and Type    Method
    SubSection           clone()
Methods: class java.lang.Object
    Modifier and Type    Method
    int                  hashCode()
-
Methods inherited from class java.lang.Object
equals, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface Torello.HTML.Replaceable
clearHTML, compareTo, isSynthetic, moveAndUpdate, setHTML, setHTML
-
-
-
-
Field Detail
-
serialVersionUID
public static final long serialVersionUID
This fulfils the SerialVersion UID requirement for all classes that implement Java's interface java.io.Serializable. Using the Serializable implementation offered by Java is very easy, and can make saving program state when debugging a lot easier. It can also be used in place of more complicated systems like "hibernate" to store data.
- See Also:
- Constant Field Values
- Code:
- Exact Field Declaration Expression:
public static final long serialVersionUID = 1;
-
location
public final DotPair location
This public field identifies the location of a particular sub-section within a vectorized-html web-page. The location is specified by the class DotPair public fields: public final int 'start' and public final int 'end'.
- See Also:
DotPair
- Code:
- Exact Field Declaration Expression:
public final DotPair location;
-
html
-
comp2
public static java.util.Comparator<SubSection> comp2
This is an "alternative Comparator" that can be used for sorting instances of this class. It should work with the Collections.sort(List, Comparator) method in the standard JDK package java.util.
Comparator Heuristic:
This simply compares the public, DotPair-typed field location of the two instances using that class' secondary instance of Comparator. DotPair's secondary comparator may be viewed at: DotPair.comp2.
- See Also:
DotPair.comp2
- Code:
- Exact Field Declaration Expression:
public static Comparator<SubSection> comp2 = (SubSection ss1, SubSection ss2) -> DotPair.comp2.compare(ss1.location, ss2.location);
-
-
Constructor Detail
-
SubSection
public SubSection(DotPair location, java.util.Vector<HTMLNode> html)
This just builds a new instance of this class. It represents a 'sub-section' of the html-page that needs to be encapsulated into an object-instance. The contents of this data-structure are merely the two parameters that are passed to this constructor.
- Parameters:
location - This parameter value is assigned immediately to the internal field public DotPair location. It is a two-integer, Vector-index class that points to the starting and ending index-positions, inside the main html-Vector, of the 'SubSection' being constructed here.
html - This parameter may be any vectorized-html web-page, but the intention is that this Vector is an exact "cloned range" (a copy of a portion of the web-page) whose starting and ending integer Vector-index positions are demarcated by the contents of the parameter 'location'.
- Throws:
java.lang.IllegalArgumentException - This exception is thrown if either of these two scenarios occurs:
    - If the input Vector<HTMLNode> 'html' has html.size() == 0.
    - If html.size() != location.size().
java.lang.NullPointerException - If either 'location' or 'html' is null (see the constructor body below).
- See Also:
location, html, DotPair, NodeIndex, HTMLNode
- Code:
- Exact Constructor Body:
    if (location == null) throw new NullPointerException
        ("Parameter 'DotPair location' to SubSection constructor was null.");

    if (html == null) throw new NullPointerException
        ("Parameter 'Vector<HTMLNode> html' to SubSection constructor was null.");

    if (html.size() == 0) throw new IllegalArgumentException(
        "Parameter 'Vector<HTMLNode> html' to SubSection constructor has size zero, but " +
        "this is not allowed here."
    );

    if (location.size() != html.size()) throw new IllegalArgumentException(
        "Field 'public final int end' [value=" + location.end + "] of passed-parameter " +
        "'DotPair location' to SubSection constructor is different than the length of the " +
        "html-vector [" + html.size() + "]."
    );

    this.location = location;
    this.html = html;
-
-
Method Detail
-
clone
public SubSection clone()
Java's interface Cloneable requirements. This instantiates a new SubSection with identical Vector<HTMLNode> html and DotPair location fields. As the method body shows, the copy is shallow: the new instance refers to the same Vector and DotPair objects as this one.
- Overrides:
clone in class java.lang.Object
- Returns:
A new SubSection whose internal fields are identical to this one.
- Code:
- Exact Method Body:
return new SubSection(location, html);
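Since the method body passes the existing html reference straight into the constructor, a clone and its source share one Vector. The sketch below illustrates the consequence with a hypothetical MiniSection stand-in (not the real SubSection, and holding Vector<String> rather than Vector<HTMLNode>), alongside one possible way to obtain an independent copy.

```java
import java.util.List;
import java.util.Vector;

public class ShallowCloneDemo
{
    // Hypothetical stand-in for SubSection: two indices and a node Vector
    static final class MiniSection
    {
        final int start, end;
        final Vector<String> html;

        MiniSection(int start, int end, Vector<String> html)
        { this.start = start; this.end = end; this.html = html; }

        // Mirrors the listed method body: the SAME Vector instance is reused
        MiniSection shallowCopy()
        { return new MiniSection(start, end, html); }

        // One way to get an independent copy: duplicate the Vector first
        MiniSection independentCopy()
        { return new MiniSection(start, end, new Vector<>(html)); }
    }

    public static void main(String[] args)
    {
        MiniSection original =
            new MiniSection(0, 1, new Vector<>(List.of("<b>", "</b>")));

        MiniSection shallow = original.shallowCopy();
        shallow.html.add("added via shallow copy");

        MiniSection independent = original.independentCopy();
        independent.html.add("added via independent copy");

        System.out.println("original:    " + original.html);
        System.out.println("independent: " + independent.html);
    }
}
```

Edits made through the shallow copy show up in the original, while edits made through the independent copy do not.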
-
hashCode
public int hashCode()
Java's hash-code requirement.
- Overrides:
hashCode in class java.lang.Object
- Returns:
A hash-code that may be used when storing this node in a java hashed-collection. The starting location of this SubSection ought to be a unique hash.
- Code:
- Exact Method Body:
return location.start;
-
toString
public final java.lang.String toString()
Java's toString() requirement.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Specified by:
toString in interface java.lang.CharSequence
- Overrides:
toString in class java.lang.Object
- Returns:
A String-representation of the HTML held by this SubSection.
- See Also:
toString()
- Code:
- Exact Method Body:
return Util.pageToString(html);
-
charAt
public final char charAt(int index)
Returns the char value at the specified index of the String defined by an invocation of the method: Util.pageToString(html). An index ranges from '0' (zero) to length() - 1. The first char value of the sequence is at index '0', the next at index one, and so on, as for array indexing.
NOTE: If the char value specified by the index is a surrogate, the surrogate value is returned.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Specified by:
charAt in interface java.lang.CharSequence
- Parameters:
index - The index of the char value to be returned
- Returns:
The specified char value
- See Also:
toString()
- Code:
- Exact Method Body:
return toString().charAt(index);
-
length
public final int length()
Returns the length of the String defined by an invocation of the method: Util.pageToString(html). The length is the number of 16-bit char's in the sequence.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Specified by:
length in interface java.lang.CharSequence
- Returns:
The number of chars in the String returned by toString()
- See Also:
toString()
- Code:
- Exact Method Body:
return toString().length();
-
subSequence
public final java.lang.CharSequence subSequence(int start, int end)
Returns a java.lang.CharSequence that is a subsequence of the String defined by an invocation of the method: Util.pageToString(html). The subsequence starts with the char value at the specified index and ends with the char value at index end - 1. The length (in char's) of the returned sequence is end - start, so if start == end then an empty sequence is returned.
Final Method:
This method is final, and cannot be modified by sub-classes.
- Specified by:
subSequence in interface java.lang.CharSequence
- Parameters:
start - The start index, inclusive
end - The end index, exclusive
- Returns:
The specified subsequence
- See Also:
toString()
- Code:
- Exact Method Body:
return toString().substring(start, end);
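The listed bodies of charAt, length and subSequence each call toString(), so every call converts the whole node list to a String again. When scanning character by character, it may be cheaper to convert once and work on the resulting String. The sketch below illustrates this with a hypothetical Rebuilding class that counts its conversions; String.join over a Vector<String> is only a stand-in for Util.pageToString(html).

```java
import java.util.List;
import java.util.Vector;

public class CachedStringDemo
{
    // Hypothetical stand-in: a CharSequence that rebuilds its String on every call,
    // following the same pattern as the method bodies listed above.
    static final class Rebuilding implements CharSequence
    {
        final Vector<String> html;
        int builds = 0;     // how many times the String has been rebuilt

        Rebuilding(Vector<String> html) { this.html = html; }

        @Override public String toString()
        { builds++; return String.join("", html); }

        @Override public char charAt(int index)
        { return toString().charAt(index); }

        @Override public int length()
        { return toString().length(); }

        @Override public CharSequence subSequence(int start, int end)
        { return toString().substring(start, end); }
    }

    // Counts occurrences of 'c', converting the sequence to a String exactly once
    static int count(CharSequence cs, char c)
    {
        String s = cs.toString();
        int n = 0;
        for (int i = 0; i < s.length(); i++) if (s.charAt(i) == c) n++;
        return n;
    }

    public static void main(String[] args)
    {
        Rebuilding r = new Rebuilding(new Vector<>(List.of("<p>", "hi", "</p>")));

        System.out.println("count of 'p': " + count(r, 'p'));
        System.out.println("conversions:  " + r.builds);
    }
}
```

Looping with cs.length() and cs.charAt(i) directly on the sequence would instead trigger a fresh conversion on every iteration.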
-
originalSize
public int originalSize()
Description copied from interface: Replaceable
Reports how many nodes were copied into this instance. For implementing classes that inherit NodeIndex, this value will always be one. For others, it should report exactly how many HTMLNode's were copied.
- Specified by:
originalSize in interface Replaceable
- Returns:
Number of nodes originally contained by this instance.
The purpose of Replaceable's is to allow a user to modify HTML using a smaller sub-list, without having to operate on the entire HTML-Vector. Since adding & removing nodes is one variant of Vector-modification, the original-size may often differ from the current-size.
When modifying HTML, if a web-page is broken into smaller pieces, and changes are restricted to those smaller sub-lists (and the original page is rebuilt, all at once, after all changes have been made), then those modifications should require far fewer time-consuming list-shift operations, which can substantially improve the performance of the code.
- Code:
- Exact Method Body:
return location.size();
-
currentSize
public int currentSize()
Description copied from interface: Replaceable
Returns how many nodes are currently in this instance.
- Specified by:
currentSize in interface Replaceable
- Returns:
Number of nodes. See originalSize() for an explanation of the original size versus the current size.
- Code:
- Exact Method Body:
return html.size();
-
originalLocationStart
public int originalLocationStart()
Description copied from interface: Replaceable
Returns the start-location within the original page-Vector from whence the HTML contents of this instance were retrieved.
Start is Inclusive:
The returned value is inclusive of the actual, original-range of this instance. This means the first HTMLNode copied into this instance' internal data-structure was at originalLocationStart().
Implementations of Replaceable:
The two concrete implementations of this interface (NodeIndex and SubSection) both enforce the 'final' modifier on their location-fields. (See: NodeIndex.index and location).
- Specified by:
originalLocationStart in interface Replaceable
- Returns:
The Vector start-index from whence this HTML was copied.
- Code:
- Exact Method Body:
return location.start;
-
originalLocationEnd
public int originalLocationEnd()
Description copied from interface: Replaceable
Returns the end-location within the original page-Vector from whence the HTML contents of this instance were retrieved.
End is Exclusive:
The returned value is exclusive of the actual, original-range of this instance. This means the last HTMLNode copied into this instance' internal data-structure was at originalLocationEnd() - 1.
Implementations of Replaceable:
The two concrete implementations of this interface (NodeIndex and SubSection) both enforce the 'final' modifier on their location-fields. (See: NodeIndex.index and location).
- Specified by:
originalLocationEnd in interface Replaceable
- Returns:
The Vector end-index from whence this HTML was copied.
- Code:
- Exact Method Body:
return location.end + 1;
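The "+ 1" in the method body converts an inclusive end index into the exclusive end reported by this method. The sketch below illustrates the two conventions using plain integers and java.util.List.subList, which also takes an exclusive end; the helper names toExclusive and inclusiveSize are hypothetical, and the inclusive-size formula is an assumption about how such a range would be counted.

```java
import java.util.List;
import java.util.Vector;

public class RangeConventionDemo
{
    // Converts an inclusive end index into the exclusive end used by List.subList
    static int toExclusive(int inclusiveEnd)
    { return inclusiveEnd + 1; }

    // Number of elements in the inclusive range [start, inclusiveEnd]
    static int inclusiveSize(int start, int inclusiveEnd)
    { return inclusiveEnd - start + 1; }

    public static void main(String[] args)
    {
        Vector<String> page =
            new Vector<>(List.of("<html>", "<p>", "Hello", "</p>", "</html>"));

        // The paragraph occupies indices 1 through 3, both inclusive
        int start = 1, inclusiveEnd = 3;

        List<String> paragraph = page.subList(start, toExclusive(inclusiveEnd));

        System.out.println("nodes: " + paragraph);
        System.out.println("size:  " + inclusiveSize(start, inclusiveEnd));
    }
}
```

With an exclusive end, the node count is simply end minus start, which is why exclusive ends are convenient for methods such as subList.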
-
currentNodes
public java.util.Vector<HTMLNode> currentNodes()
Description copied from interface: Replaceable
All nodes currently contained by this Replaceable. The concrete classes which implement Replaceable (SubSection & TagNodeIndex) allow for the html they hold to be modified. The modification to a Replaceable happens independently from the original HTML Page out of which it was copied.
Replaceable's are, sort-of, the exact opposite of Java's List method 'subList'. According to the Sun / Oracle Documentation for java.util.List.subList(int fromIndex, int toIndex), any changes made to an instance of a 'subList' are immediately reflected back into the original List from which it was created.
The List.subList operation has the advantage of being extremely easy to work with; however, an HTML-Page Vector has the potential of being hundreds of nodes long. Any operations that involve insertion or deletion will likely be terribly inefficient.
When the HTML inside of a Replaceable is modified, nothing happens to the original Vector whatsoever! Until a user requests that the original HTML-Vector be updated to reflect all changes that he or she has made, the original HTML remains untouched. When an update request is finally issued, all changes are made at once.
Again, see Replacement.run to understand how quick updates on HTML-Pages are done using the Replaceable interface.
- Specified by:
currentNodes in interface Replaceable
- Returns:
An HTML-Vector of the nodes.
- Code:
- Exact Method Body:
return html;
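The contrast with List.subList drawn above can be seen using only the JDK. In this sketch, Vector<String> stands in for Vector<HTMLNode>, and the method names editThroughView and editCopy are hypothetical: the first edits a live subList view, the second edits a detached copy of the same range.

```java
import java.util.List;
import java.util.Vector;

public class SubListVersusCopy
{
    // Edits a live view: the change is written through to 'page' immediately
    static void editThroughView(Vector<String> page)
    {
        List<String> view = page.subList(1, 3);
        view.add("NEW");
    }

    // Edits a detached copy of the same range: 'page' is left untouched
    static Vector<String> editCopy(Vector<String> page)
    {
        Vector<String> copy = new Vector<>(page.subList(1, 3));
        copy.add("NEW");
        return copy;
    }

    public static void main(String[] args)
    {
        Vector<String> pageA = new Vector<>(List.of("a", "b", "c", "d"));
        editThroughView(pageA);
        System.out.println("after view edit: " + pageA);

        Vector<String> pageB = new Vector<>(List.of("a", "b", "c", "d"));
        Vector<String> copy = editCopy(pageB);
        System.out.println("after copy edit: " + pageB + ", copy = " + copy);
    }
}
```

The detached copy behaves like the html field described here: its contents can change freely, and the page only changes when the copy is explicitly written back.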
-
firstCurrentNode
public HTMLNode firstCurrentNode()
Description copied from interface: Replaceable
The first node currently contained by this Replaceable
- Specified by:
firstCurrentNode in interface Replaceable
- Returns:
The first node
- Code:
- Exact Method Body:
return currentNodes().elementAt(0);
-
lastCurrentNode
public HTMLNode lastCurrentNode()
Description copied from interface: Replaceable
The last node currently contained by this Replaceable
- Specified by:
lastCurrentNode in interface Replaceable
- Returns:
The last node
- Code:
- Exact Method Body:
return html.elementAt(html.size() - 1);
-
addAllInto
public boolean addAllInto(java.util.Vector<HTMLNode> fileVec)
Description copied from interface: Replaceable
Add all nodes currently retained in this instance into the HTML-Vector parameter 'fileVec'. The nodes are appended to the end of 'fileVec'. Implementing classes NodeIndex and SubSection simply use the Java Vector methods add (for NodeIndex) and addAll (for SubSection).
- Specified by:
addAllInto in interface Replaceable
- Parameters:
fileVec - The HTML-Vector into which the nodes will be appended (to the end of this Vector, using Vector methods add or addAll, dependent upon whether one or more-than-one nodes are being inserted).
- Returns:
The result of Vector method add, or method addAll
- Code:
- Exact Method Body:
return fileVec.addAll(html);
-
addAllInto
public boolean addAllInto(int index, java.util.Vector<HTMLNode> fileVec)
Description copied from interface: Replaceable
Add all nodes currently retained in this instance into the HTML-Vector parameter 'fileVec'.
- Specified by:
addAllInto in interface Replaceable
- Parameters:
index - The 'fileVec' parameter's Vector-index where these nodes are to be inserted
fileVec - The HTML-Vector into which the nodes will be inserted at 'index' (using Vector methods add or addAll, dependent upon whether one or more-than-one nodes are being inserted).
- Returns:
The result of Vector method add, or method addAll
- Code:
- Exact Method Body:
return fileVec.addAll(index, html);
-
update
public int update(java.util.Vector<HTMLNode> originalHTML)
Description copied from interface: Replaceable
Replaces the original range of nodes inside originalHTML with the current-nodes of this instance, using the original-location of the node(s).
Replaceable's Primary Value:
The main value of the Replaceable interface is that it allows HTML Pages to be replaced or modified more quickly. If many changes need to be made to a page, first extracting and copying the sub-sections that need changing into Replaceable instances (using the Peek operations in package NodeSearch), and then copying those sections back into the original page-Vector after changing them, avoids the cost that would be incurred from repeatedly inserting and shifting a long list of nodes in a large HTML Page.
Therefore, this method is probably best avoided, as it defeats the entire purpose of a Replaceable. This method will update the nodes at the location in the original Vector, which is fine, but if more than one update / change is needed, calling this method over and over again re-introduces the exact shifting that Replaceable's were meant to avoid in the first place!
The following example should make this clear:
Example:
    Vector<HTMLNode> page = HTMLPage.getPageTokens(new URL("http://some.url.com/"), false);
    Vector<SubSection> myTableRows = TagNodePeekInclusive.all(page, "tr");

    TagNode OPEN_SPAN = HTMLTags.hasTag("SPAN", TC.OpeningTags);
    TagNode CLOSE_SPAN = HTMLTags.hasTag("SPAN", TC.ClosingTags);

    int counter = 1;

    for (SubSection tableRow : myTableRows)
    {
        // Retrieve the <TR> Tag & Give it a CSS-ID
        TagNode tr = tableRow.html.elementAt(0).asTagNode().setID("ROW" + counter++, null);

        // Put the newly created <TR ID=..> into the vector. It was the first-element in the SubSection
        tableRow.html.setElementAt(tr, 0);

        // Add a <SPAN>...</SPAN> surrounding the first line of text
        // NOTE: This assumes that tableRow[1] (second SubSection node) is a TextNode with text
        tableRow.html.insertElementAt(OPEN_SPAN, 1);
        tableRow.html.insertElementAt(CLOSE_SPAN, 3);
    }

    // *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
    // This version DESTROYS THE BENEFIT of using TagNodePeekInclusive
    // *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
    //
    // Here, if the original html-page was thousands of nodes long, every table-row
    // update will force thousands of nodes to be shifted to the right over-and-over
    // again!

    for (SubSection tableRow : myTableRows) tableRow.update(page);

    // *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
    // This builds a new Vector much more efficiently, avoiding costly node-shifting
    // *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

    page = ReplaceNodes.r(page, myTableRows, false).a;
- Specified by:
update in interface Replaceable
- Parameters:
originalHTML - The original page-Vector from which the nodes in this instance were retrieved
- Returns:
The change in the size of the Vector
- See Also:
Replacement.run(Vector, Iterable, boolean)
- Code:
- Exact Method Body:
return ReplaceNodes.r(originalHTML, location, html);
-
-