Package Torello.HTML.Tools.NewsSite
Interface Pause
-
- All Superinterfaces:
java.io.Serializable
public interface Pause extends java.io.Serializable
While the main iteration loop for downloading news articles is running, the loop variables are kept current in an instance of this interface. If the programmer watching the downloader decides to take a break and presses Control-C, download progress is not lost, and articles that have already been saved need not be downloaded again.
This interface allows a user to stop a download of a large number of URLs and restart it without returning to the start of the article URL list. It exists as a separate interface, rather than as a few static methods inside the downloader class, so that the user can specify where the state is saved. If the Java Virtual Machine is halted while the downloader is iterating over hundreds of articles, having the intermediate state saved is beneficial.
The interface Pause provides a simple factory method which returns an implementation of the interface Pause that uses just a file name and the file system to save intermediate state. If this is unacceptable to the user, writing an implementation of interface Pause that does not depend on the file system should be easy. The only requirements made herein are saving and retrieving, when requested, three integer "state-parameters."
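As an illustration of the point above, here is a minimal sketch of state storage that does not touch the file system. It is not the library's code: `InMemoryPauseSketch`, `Result`, and `State` are hypothetical stand-ins for an implementation of Pause, for DownloadResult, and for Ret4, so that the example compiles on its own. A real implementation would implement Pause and declare `throws PauseException`.

```java
import java.util.Vector;

// Hypothetical sketch: keeps the download state in memory instead of a file.
final class InMemoryPauseSketch {

    // Hypothetical stand-in for the library's DownloadResult enumerated type.
    enum Result { OK, FAILED }

    // Hypothetical stand-in for Ret4: the 2-D Vector plus the three counters.
    static final class State {
        final Vector<Vector<Result>> results;
        final int outer, inner, success;

        State(Vector<Vector<Result>> results, int outer, int inner, int success) {
            this.results = results;
            this.outer = outer;
            this.inner = inner;
            this.success = success;
        }
    }

    private State saved; // null until initialize() or saveState() has been called

    // Mirrors initialize(): afterwards loadState() reports all counters as zero.
    void initialize() {
        saved = new State(new Vector<>(), 0, 0, 0);
    }

    // Mirrors saveState(...): remembers the Vector reference and the counters.
    void saveState(Vector<Vector<Result>> results, int outer, int inner, int success) {
        saved = new State(results, outer, inner, success);
    }

    // Mirrors loadState(): returns whatever was saved most recently.
    State loadState() {
        if (saved == null) throw new IllegalStateException("no state has been saved yet");
        return saved;
    }
}
```

State held only in memory does not survive a JVM halt, so a store like this is useful mainly for tests; a durable variant would write the same four values to a database or a remote service.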
Also, no means is provided for actually halting the downloading process. The Pause interface does not stop the program; it merely saves the relevant counter and Vector-index information to a small file. This enables the downloader, when requested, to resume the article-download process where it left off, whether the user halted the process manually or the process crashed during the download.
IMPORTANT: If there are hundreds and hundreds of articles to download, which can occur when a first-time scrape of a news website is being performed, the best way to halt the downloading is simply to press Control-C on the keyboard. The last successful download will be recorded in the "State Backup Monitor", which is what this interface is.
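The resume behaviour described above can be pictured as a nested loop that starts from saved indices rather than from zero. The sketch below is an assumption about how such a loop might look, not the downloader's actual code: `ResumeLoopSketch` and `positionsFrom` are hypothetical names, and the sketch treats the two saved indices as the next position to attempt, so that an all-zero state means "start from the beginning".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a resumable two-level iteration over article URLs.
final class ResumeLoopSketch {

    // sectionSizes[i] is the number of article URLs in newspaper section i.
    // Returns the visited positions as "outer:inner" strings, in visiting order.
    static List<String> positionsFrom(int[] sectionSizes, int outerStart, int innerStart) {
        List<String> visited = new ArrayList<>();
        for (int outer = outerStart; outer < sectionSizes.length; outer++) {
            // Only the section being resumed starts part-way through;
            // every later section starts at its first article.
            int inner = (outer == outerStart) ? innerStart : 0;
            for (; inner < sectionSizes[outer]; inner++) {
                visited.add(outer + ":" + inner);
                // A real downloader would attempt the download here and then
                // save the updated counters, so an interruption loses little.
            }
        }
        return visited;
    }
}
```

With two sections of sizes 2 and 3, starting from (0, 0) visits all five positions, while starting from (1, 1) visits only the last two.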
Hi-Lited Source-Code: Torello/HTML/Tools/NewsSite/Pause.java
File Size: 5,319 Bytes Line Count: 101 '\n' Characters Found
-
-
Field Summary
Serializable ID
Modifier and Type    Field
static long          serialVersionUID
-
Method Summary
Modifier and Type                                                  Method
static Pause                                                       getFSInstance(String saveFileName)
void                                                               initialize()
Ret4<Vector<Vector<DownloadResult>>, Integer, Integer, Integer>    loadState()
void                                                               saveState(Vector<Vector<DownloadResult>> results, int outerCounter, int innerCounter, int successCounter)
-
-
-
Field Detail
-
serialVersionUID
static final long serialVersionUID
This fulfils the SerialVersion UID requirement for all classes that implement Java's interface java.io.Serializable. Using the Serializable implementation offered by Java is very easy, and can make saving program state when debugging a lot easier. It can also be used in place of more complicated systems like "hibernate" to store data.
- See Also:
- Constant Field Values
- Code:
- Exact Field Declaration Expression:
public static final long serialVersionUID = 1;
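To make the remark about Java serialization concrete, here is a small round-trip sketch using only the JDK. The `SerializableSketch` and `Counters` classes are hypothetical and are not part of the library; they only show how an object that declares a serialVersionUID can be written to bytes and read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch: serializes three counters to bytes and back.
final class SerializableSketch {

    // Hypothetical state holder; the field names are illustrative only.
    static final class Counters implements Serializable {
        private static final long serialVersionUID = 1L;
        final int outer, inner, success;

        Counters(int outer, int inner, int success) {
            this.outer = outer;
            this.inner = inner;
            this.success = success;
        }
    }

    // Writes the object with Java's built-in serialization.
    static byte[] toBytes(Counters c) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads the object back; the cast is safe because toBytes wrote a Counters.
    static Counters fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Counters) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Swapping the byte-array streams for file streams would give a file-backed variant of the same idea.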
-
-
Method Detail
-
saveState
void saveState(java.util.Vector<java.util.Vector<DownloadResult>> results, int outerCounter, int innerCounter, int successCounter) throws PauseException
This method needs to save the current download state. The three integers provided are all that the download logic needs in order to identify which newspaper article URLs have already been downloaded and, therefore, where to begin the download process after a pause or break. The instance of Vector that is required by this method's parameter list contains the "Download Results" for each news Article in the URL list.
- Parameters:
results - This is the two-dimensional Vector that contains instances of 'DownloadResult'. Each news Article in each section of a newspaper website has a specific location in this two-dimensional Vector. As the downloader scrapes (or fails to scrape) news Articles, the result of each scrape (or scrape attempt) is inserted into this 2-D Vector.
outerCounter - This is the outer-Vector index of the last URL downloaded.
innerCounter - This is the inner-Vector index of the last URL downloaded.
successCounter - This is how many of the URLs were downloaded without throwing any exceptions.
- Throws:
PauseException
-
loadState
Ret4<java.util.Vector<java.util.Vector<DownloadResult>>,java.lang.Integer,java.lang.Integer,java.lang.Integer> loadState () throws PauseException
This method loads the state of the downloader. This can be helpful if the user wishes to "pause" the download when long lists of article URLs are being retrieved. Also, if the downloader exits due to an exception, the state of the download is maintained.
- Returns:
- An instance of Ret4<Vector<Vector<DownloadResult>>, Integer, Integer, Integer>
Ret4.a - The current state of the "Return Vector". This two-dimensional Vector fills up with instances of enumerated type DownloadResult.
Ret4.b - The outer-Vector index of the last attempted newspaper article URL download.
Ret4.c - The inner-Vector index of the last attempted newspaper article URL download.
Ret4.d - The number of article URLs that have successfully downloaded.
- Throws:
PauseException
-
initialize
void initialize() throws PauseException
If the Pause implementation needs initialization, it ought to implement this method.
IMPORTANT: The initialize process should ensure that a call to loadState() will return a Ret4 data-structure whose integer fields are all equal to zero. These fields are counters, and if they are non-zero when the download begins, many news articles will not be scraped.
ALSO: On initialization, the value for the 2-D Vector in the Ret4 data-structure need only be present; it does not matter what values have been inserted into it, nor the sizes of the sub-Vectors. Do note that its values will be clobbered by the downloader if and when the downloader determines that the download process is starting at the beginning.
- Throws:
PauseException - This exception is thrown if the implementation of this interface fails to init or load.
-
getFSInstance
static Pause getFSInstance(java.lang.String saveFileName) throws PauseException
This method is a static factory method that returns an instance of this interface Pause that uses the file system for saving the state to a user-specified file name.
- Parameters:
saveFileName - This is just the name of the data-file where state shall be saved. This state contains the download results and three integer counters, so the data-file is small.
- Returns:
- A functioning instance of this interface - one that uses a flat file for saving state.
- Throws:
PauseException
- Code:
- Exact Method Body:
return new PauseFS(saveFileName);
-
-