Package Torello.HTML.Tools.Images
Class URLIterator
java.lang.Object
    Torello.HTML.Tools.Images.URLIterator
All Implemented Interfaces:
java.util.Iterator<java.net.URL>
public class URLIterator extends java.lang.Object implements java.util.Iterator<java.net.URL>
An Iterator that is intended to be used for retrieving the image URLs from a page.
Generally, writing an Iterator is easy; however, if you have ever scraped a few of these sites, it can seem repetitive and error-prone. This class is here so I don't have to retype it.
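To make the idea concrete, here is a minimal sketch of the kind of page-number Iterator this class packages up. It is a hypothetical stand-in (the name PageURLIterator is invented here), not the library's source. Starting the counter at start - 1 is an assumption, since the constructor body is not shown on this page; it is chosen so that the first call to next() returns the URL for page start, which agrees with the hasNext() and next() bodies documented below.

```java
import java.net.URL;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical stand-in, not the library's source: iterates pages start..end (inclusive)
// and asks a caller-supplied function to build the URL for each page number.
class PageURLIterator implements Iterator<URL>
{
    private final int end;
    private final IntFunction<URL> urlGetter;
    private int cur;  // ASSUMPTION: starts one below 'start', so next() first returns page 'start'

    PageURLIterator(int start, int end, IntFunction<URL> urlGetter)
    {
        this.end        = end;
        this.urlGetter  = urlGetter;
        this.cur        = start - 1;
    }

    @Override
    public boolean hasNext() { return cur < end; }

    @Override
    public URL next()
    {
        // Checks before incrementing, so calls past the end leave the counter unchanged.
        if (cur >= end) throw new NoSuchElementException("No page after page " + end);
        return urlGetter.apply(++cur);
    }
}
```

Because an Iterator is not an Iterable, a for-each loop over it needs a small adapter, for example `for (URL u : (Iterable<URL>) () -> iter) { ... }`.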
Hi-Lited Source-Code: Torello/HTML/Tools/Images/URLIterator.java
File Size: 6,685 Bytes; Line Count: 149 ('\n' Characters Found)
Constructor Summary
Constructor: URLIterator(int start, int end, IntFunction<URL> urlGetter)
Description: Perhaps as more of these "wonderful" photo-bomb sites are published, more versions of this iterator will be written.
Method Summary
Methods: interface java.util.Iterator
    Modifier and Type     Method
    boolean               hasNext()
    URL                   next()
Methods: Static Factory Builder, Typical
    Modifier and Type     Method
    static URLIterator    usual(String baseURLStr, int startPageNum, int lastPageNum)
    static URLIterator    usual(String url, String appendParamStr, int startPageNum, int lastPageNum)
Methods: Internal Exception Check
    Modifier and Type     Method
    static void           CHECK_EXCEPTIONS(String url, int startPageNum, int lastPageNum)
Constructor Detail
URLIterator
public URLIterator(int start, int end, java.util.function.IntFunction<java.net.URL> urlGetter)
Perhaps as more of these "wonderful" photo-bomb sites are published, more versions of this iterator will be written. Right now, the easiest way to iterate through the forty or fifty pages of photos is to indicate the start and end page numbers, and to require the user/programmer to provide a lambda function that "makes" the URL out of the current page number.
Parameters:
start - This is the integer that is the "first" page of the site.
HTML Elements:
<!--
    This URL has a lot of "Cute Little Bears" being saved in Siberia. The way you can scrape
    all 39 photos quickly is to iterate through each of the PHP calls via the value passed
    to "page="
-->
<A HREF='https://www.jerusalemonline.com/view/bear-cubs-jol/?page=1'>
end - This is the integer that contains the last page of the photo-site collection. In the particular case of the "Bears who lost their momma in Siberia", the last page that is currently available is page number 39.
urlGetter - Any programmer who is familiar with Java lambda functions should recognize this as Java's version of a "function pointer" from C and C++. This function must take an integer (a page number) as input and return a URL as output. It will be called once for each page on the site.
Example:
import java.net.MalformedURLException;
import java.net.URL;
import Torello.HTML.Tools.Images.URLIterator;

// One might think this should be a single-line lambda expression. Single-line function
// pointers are quite common, but the URL constructor can throw a MalformedURLException,
// which is a checked exception (it is not a subclass of RuntimeException), so this short
// lambda has to include a try-catch. Here, the checked exception is simply converted to a
// NullPointerException, which is unchecked. In practice, if proper values are provided,
// no exception will occur.
String urlStr = "https://www.jerusalemonline.com/view/bear-cubs-jol/?page=";

URLIterator iter = new URLIterator(1, 39, (int curPage) ->
{
    try
        { return new URL(urlStr + curPage); }
    catch (MalformedURLException e)
        { throw new NullPointerException("Malformed URL Exception: " + e.toString()); }
});
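The example above converts the checked exception into a NullPointerException. As an alternative (a swapped-in technique, not what this library does), the try-catch can be written once in a small helper so that each urlGetter stays a one-liner. The names UrlGetters, ThrowingIntFunction and unchecked are hypothetical. The sketch relies only on the fact that MalformedURLException is a subclass of IOException, which lets it be wrapped in the standard java.io.UncheckedIOException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.IntFunction;

// Hypothetical helpers: let a URL-building lambda throw a checked IOException
// (such as MalformedURLException) and rethrow it as an unchecked exception.
final class UrlGetters
{
    @FunctionalInterface
    interface ThrowingIntFunction<T>
    {
        T apply(int value) throws IOException;
    }

    // Adapts a throwing function into a plain IntFunction, as the urlGetter parameter expects.
    static <T> IntFunction<T> unchecked(ThrowingIntFunction<T> f)
    {
        return value ->
        {
            try
                { return f.apply(value); }
            catch (IOException e)
                { throw new UncheckedIOException(e); }
        };
    }

    private UrlGetters() { }
}
```

With this in place, a getter could be written as `UrlGetters.unchecked(curPage -> new URL(urlStr + curPage))`, and any malformed URL surfaces as an UncheckedIOException whose cause is the original MalformedURLException.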
Method Detail
hasNext
public boolean hasNext()
Just checks if there are more elements available.
Specified by:
hasNext in interface java.util.Iterator<java.net.URL>
Returns:
TRUE if there are more pages to check, and FALSE otherwise.
Code:
Exact Method Body:
return cur < end;
next
public java.net.URL next()
Meets the requirements of Java's standard Iterator interface.
Specified by:
next in interface java.util.Iterator<java.net.URL>
Returns:
The "next" URL element from the photo site.
Code:
Exact Method Body:
cur++;

if (cur > end) throw new NoSuchElementException(
    "The current iteration counter is: " + cur +
    " but unfortunately, the max-page-number you passed to the constructor is: " + end
);

return getter.apply(cur);
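The hasNext() and next() bodies above share a single counter, cur. The trace below re-creates only that counter logic, copying the two method bodies but returning page numbers instead of URLs, to show how many URLs a full iteration produces. The class name CounterTrace is hypothetical, and it assumes (the constructor body is not shown here) that cur starts at start - 1. Under that assumption a range of 1 to 39 yields exactly 39 pages, and one further call to next() throws NoSuchElementException.

```java
import java.util.NoSuchElementException;

// Re-creates only the counter logic of hasNext()/next(), returning page numbers instead
// of URLs. ASSUMPTION: the constructor initializes cur to (start - 1).
final class CounterTrace
{
    private final int end;
    private int cur;

    CounterTrace(int start, int end) { this.end = end; this.cur = start - 1; }

    boolean hasNext() { return cur < end; }

    int next()
    {
        cur++;  // incremented before the bounds check, as in the documented body
        if (cur > end) throw new NoSuchElementException("cur = " + cur + ", end = " + end);
        return cur;
    }

    // Drains the trace with the usual hasNext()/next() loop and counts the pages visited.
    static int countPages(int start, int end)
    {
        CounterTrace t = new CounterTrace(start, end);
        int count = 0;
        while (t.hasNext()) { t.next(); count++; }
        return count;
    }
}
```

Because next() increments cur before the bounds check, every call made after the last page advances the counter once more; hasNext() keeps returning false either way.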
usual
public static URLIterator usual(java.lang.String baseURLStr, int startPageNum, int lastPageNum) throws java.net.MalformedURLException
Builds a URLIterator whose URL for each page is baseURLStr followed by the page number, after first validating the inputs with CHECK_EXCEPTIONS.
Throws:
java.net.MalformedURLException - if baseURLStr is not a valid URL.
Code:
Exact Method Body:
CHECK_EXCEPTIONS(baseURLStr, startPageNum, lastPageNum);

return new URLIterator(startPageNum, lastPageNum, (int curPage) ->
{
    try
        { return new URL(baseURLStr + curPage); }
    catch (MalformedURLException e)
        { throw new NullPointerException("Malformed URL Exception" + e.toString()); }

    // CHEAP-TRICK: Compile-Time Exception to Runtime Exception... However, the
    // base-URL has already been tested, and therefore this exception NEEDS to be
    // suppressed... NOTE: This exception should *NEVER* throw...
});
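Reading the body above, this factory appends each page number to baseURLStr. The sketch below produces the same sequence of URLs, but eagerly, as a List instead of an Iterator, and without the CHECK_EXCEPTIONS validation step. The names UsualSketch and pageURLs are hypothetical. It uses the bear-cub page from the constructor documentation as the base string.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

final class UsualSketch
{
    // Hypothetical eager equivalent of usual(baseURLStr, start, last): the URL for each page
    // is baseURLStr followed by the page number, for every page from start to last inclusive.
    static List<URL> pageURLs(String baseURLStr, int start, int last)
        throws MalformedURLException
    {
        List<URL> urls = new ArrayList<>();
        for (int page = start; page <= last; page++)
            urls.add(new URL(baseURLStr + page));
        return urls;
    }
}
```

Called as `UsualSketch.pageURLs("https://www.jerusalemonline.com/view/bear-cubs-jol/?page=", 1, 39)`, it yields 39 URLs, from `...?page=1` through `...?page=39`.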
usual
public static URLIterator usual(java.lang.String url, java.lang.String appendParamStr, int startPageNum, int lastPageNum) throws java.net.MalformedURLException
Builds a URLIterator whose URL for each page is url, followed by the page number, followed by appendParamStr. The inputs are validated first by passing the sample string url + 1 + appendParamStr to CHECK_EXCEPTIONS.
Throws:
java.net.MalformedURLException - if that sample string is not a valid URL.
Code:
Exact Method Body:
CHECK_EXCEPTIONS(url + 1 + appendParamStr, startPageNum, lastPageNum);

return new URLIterator(startPageNum, lastPageNum, (int curPage) ->
{
    try
        { return new URL(url + curPage + appendParamStr); }
    catch (MalformedURLException e)
        { throw new NullPointerException("Malformed URL Exception" + e.toString()); }

    // CHEAP-TRICK: Compile-Time Exception to Runtime Exception... However, the
    // base-URL has already been tested, and therefore this exception NEEDS to be
    // suppressed... NOTE: This exception should *NEVER* throw...
});
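This overload splices the page number between two strings, and validates with page 1 as a representative sample regardless of the actual startPageNum. The sketch below isolates those two string constructions. The class and method names are hypothetical, and so is the example query string `&size=large`; it simply stands in for whatever trailing parameters a site needs.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class UsualWithSuffixSketch
{
    // The string this overload hands to CHECK_EXCEPTIONS: page 1 is used as a sample,
    // independent of the startPageNum that was actually requested.
    static String validationSample(String url, String appendParamStr)
    {
        return url + 1 + appendParamStr;
    }

    // The URL produced for a single page: the page number sits between the two strings.
    static URL pageURL(String url, int curPage, String appendParamStr)
        throws MalformedURLException
    {
        return new URL(url + curPage + appendParamStr);
    }
}
```

For a prefix of `https://example.com/photos?page=` and a suffix of `&size=large`, page 12 becomes `https://example.com/photos?page=12&size=large`.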
CHECK_EXCEPTIONS
public static void CHECK_EXCEPTIONS(java.lang.String url, int startPageNum, int lastPageNum) throws java.net.MalformedURLException
Validates the page-number range and the URL string before any iteration begins.
Throws:
java.net.MalformedURLException - if url is not a valid URL string.
java.lang.IllegalArgumentException - if either page number is negative, or if startPageNum is greater than lastPageNum.
java.lang.NullPointerException - if url is null.
Code:
Exact Method Body:
// FAIL-FAST: Check user input before the iterator starts iterating.
if (startPageNum < 0) throw new IllegalArgumentException(
    "The value passed to the starting-page-number parameter [" + startPageNum + "], " +
    "was negative. Most often it is 1 or, possibly, 0."
);

if (lastPageNum < 0) throw new IllegalArgumentException(
    "The value passed to the ending-page-number parameter [" + lastPageNum + "], was negative."
);

if (startPageNum > lastPageNum) throw new IllegalArgumentException(
    "The value passed to the ending-page-number parameter [" + startPageNum + "], was greater " +
    "than the value passed to ending-page-number parameter [" + lastPageNum + "]."
);

if (url == null) throw new NullPointerException ("A null value was passed as a url.");

// FAIL-FAST: This should be a valid URL as a String. This invocation will throw the
// MalformedURLException if it is not.
new URL(url);
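The fail-fast checks above run in a fixed order: the page range first, then the null check, then URL syntax via the URL constructor. The sketch below restates them in that order (with shortened messages) so their behavior can be exercised in isolation. The names FailFastSketch and checkInputs are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical restatement of the CHECK_EXCEPTIONS logic, in the same order:
// page-number range first, then null, then URL syntax (via the URL constructor).
final class FailFastSketch
{
    static void checkInputs(String url, int startPageNum, int lastPageNum)
        throws MalformedURLException
    {
        if (startPageNum < 0)
            throw new IllegalArgumentException("Negative starting page number: " + startPageNum);

        if (lastPageNum < 0)
            throw new IllegalArgumentException("Negative ending page number: " + lastPageNum);

        if (startPageNum > lastPageNum) throw new IllegalArgumentException(
            "Starting page number [" + startPageNum + "] is greater than ending page number [" +
            lastPageNum + "]"
        );

        if (url == null) throw new NullPointerException("A null value was passed as a url.");

        new URL(url);  // throws MalformedURLException if the string is not a valid URL
    }
}
```

One consequence of the call order in the two-argument usual(...): the url is concatenated before CHECK_EXCEPTIONS runs, and Java string concatenation turns a null reference into the text "null". With valid page numbers, a null url therefore arrives as a string beginning with "null1" and is reported as a MalformedURLException rather than a NullPointerException.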