Class URLIterator

  • All Implemented Interfaces:
    java.util.Iterator<java.net.URL>

    public class URLIterator
    extends java.lang.Object
    implements java.util.Iterator<java.net.URL>
    An Iterator intended for retrieving the image URLs from the pages of a photo site.

    Writing an Iterator is generally easy. However, if you have ever scraped a few of these sites, rewriting the same page-counting loop each time can seem repetitive and error-prone. This class is here so I don't have to retype it again.


    • Constructor Summary

      Constructors 
      Constructor Description
      URLIterator​(int start, int end, IntFunction<URL> urlGetter)
      Perhaps as more of these "wonderful" photo-bomb sites are published, more versions of this iterator shall occur.
    • Method Summary

       
      Methods: interface java.util.Iterator
      Modifier and Type Method
      boolean hasNext()
      URL next()
       
      Methods: Static Factory Builder, Typical
      Modifier and Type Method
      static URLIterator usual​(String baseURLStr, int startPageNum, int lastPageNum)
      static URLIterator usual​(String url, String appendParamStr, int startPageNum, int lastPageNum)
       
      Methods: Internal Exception Check
      Modifier and Type Method
      static void CHECK_EXCEPTIONS​(String url, int startPageNum, int lastPageNum)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.util.Iterator

        forEachRemaining, remove
    • Constructor Detail

      • URLIterator

        public URLIterator​(int start,
                           int end,
                           java.util.function.IntFunction<java.net.URL> urlGetter)
        Perhaps as more of these "wonderful" photo-bomb sites are published, more versions of this iterator shall occur. Right now, the easiest way to iterate through the forty or fifty pages of photos is to indicate the first and last page numbers, and to require the user/programmer to provide a lambda function that builds the URL from the current page number.
        Parameters:
        start - This is the integer that is the "first" page of the site.

        HTML Elements:
         <!-- This URL has a lot of "Cute Little Bears" being saved in Siberia.
              The way you can scrape all 39 photos quickly is to iterate through
              each of the PHP calls via the value passed to "page=" -->
         <A HREF='https://www.jerusalemonline.com/view/bear-cubs-jol/?page=1'>
        
        end - This is the integer that contains the last page of the photo-site collection. In the particular case of the "Bears who lost their momma in Siberia" - the last page that is currently available is page number 39.
        urlGetter - Any programmer familiar with Java lambda functions should know that this is just Java's version of a "function pointer" from C and C++. It must be a function that takes an integer (a page number) as input and returns a URL as output. It will be called once for each page on the site.

        Example:
         // Generally, one might think this should be a single-line lambda expression.  Though
         // single-line function pointers are quite common, calling the constructor of a URL can
         // throw a MalformedURLException, and because that exception is not a sub-class of
         // RuntimeException, this short lambda has to include a try-catch.  Here, the checked
         // exception is simply converted to NullPointerException - which is unchecked.  The
         // reality is that if proper values are entered for start and end, no exceptions will
         // occur.
         URLIterator iter = new URLIterator(1, 39, (int curPage) ->
         {   
             try
                 { return new URL(urlStr + curPage); }
             catch (MalformedURLException e)
                 { throw new NullPointerException("Malformed URL Exception" + e.toString()); }
         });
        
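The try-catch wrapping described above can also be packaged once and reused. The following is only a sketch: `UrlGetterSketch` and `pageGetter` are hypothetical names that are not part of URLIterator, and the checked exception is wrapped in an IllegalArgumentException (with the original exception kept as its cause) instead of the NullPointerException used in the example above.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.function.IntFunction;

class UrlGetterSketch
{
    // Hypothetical helper, not part of URLIterator: returns a function that
    // turns a page number into a URL by appending it to 'urlStr'.
    static IntFunction<URL> pageGetter(String urlStr)
    {
        return (int curPage) ->
        {
            try
                { return new URL(urlStr + curPage); }
            catch (MalformedURLException e)
            {
                // Checked -> unchecked, keeping the original exception as the cause
                throw new IllegalArgumentException("Malformed URL: " + urlStr + curPage, e);
            }
        };
    }

    public static void main(String[] args)
    {
        IntFunction<URL> getter =
            pageGetter("https://www.jerusalemonline.com/view/bear-cubs-jol/?page=");

        // Prints the URL built for page 1
        System.out.println(getter.apply(1));
    }
}
```

A function built this way has the `IntFunction<URL>` type that the constructor's urlGetter parameter expects, so it could presumably be passed in place of the inline lambda.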
    • Method Detail

      • hasNext

        public boolean hasNext()
        Just checks if there are more elements available.
        Specified by:
        hasNext in interface java.util.Iterator<java.net.URL>
        Returns:
        TRUE if there are more pages to check, and FALSE otherwise.
        Code:
        Exact Method Body:
         return cur < end;
        
      • next

        public java.net.URL next()
        Advances to the next page and returns its URL, as required by Java's standard Iterator interface.
        Specified by:
        next in interface java.util.Iterator<java.net.URL>
        Returns:
        This shall return the "next" URL element from the Photo Site.
        Code:
        Exact Method Body:
         cur++;
         if (cur > end) throw new NoSuchElementException(
             "The current iteration counter is: " + cur +
             " but unfortunately, the max-page-number you passed to the constructor is: " + end 
         );
         return getter.apply(cur);
        
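To see how the two method bodies above cooperate, here is a minimal stand-in. It is a sketch under assumptions: the class name `PageCursorSketch` is hypothetical, it yields Strings rather than URLs to stay short, and the initial value `cur = start - 1` is an assumption, since the real constructor body is not shown on this page. With that assumption, both the first and the last page number are included in the iteration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

class PageCursorSketch implements Iterator<String>
{
    private final int end;
    private final IntFunction<String> getter;
    private int cur;

    PageCursorSketch(int start, int end, IntFunction<String> getter)
    {
        // ASSUMPTION: the cursor starts one position before the first page,
        // because next() increments it before calling the getter.
        this.cur    = start - 1;
        this.end    = end;
        this.getter = getter;
    }

    // Same test as the documented hasNext() body
    @Override
    public boolean hasNext()
    { return cur < end; }

    // Same steps as the documented next() body
    @Override
    public String next()
    {
        cur++;
        if (cur > end) throw new NoSuchElementException
            ("Counter is " + cur + ", but the last page is " + end);
        return getter.apply(cur);
    }

    public static void main(String[] args)
    {
        Iterator<String> iter = new PageCursorSketch(1, 3, (int n) -> "page=" + n);

        // Visits page=1, page=2 and page=3, in that order
        while (iter.hasNext()) System.out.println(iter.next());
    }
}
```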
      • usual

        public static URLIterator usual​(java.lang.String baseURLStr,
                                        int startPageNum,
                                        int lastPageNum)
                                 throws java.net.MalformedURLException
        Throws:
        java.net.MalformedURLException - If baseURLStr cannot be parsed as a URL.
        Code:
        Exact Method Body:
         CHECK_EXCEPTIONS(baseURLStr, startPageNum, lastPageNum);
        
         return new URLIterator(startPageNum, lastPageNum, (int curPage) ->
         {   
             try
                 { return new URL(baseURLStr + curPage); }
             catch (MalformedURLException e)
                 { throw new NullPointerException("Malformed URL Exception" + e.toString()); }
                 // CHEAP-TRICK: Compile-Time Exception to Runtime Exception...  However, the 
                 // base-URL has already been tested, and therefore this exception NEEDS to be 
                 // suppressed...  NOTE: This exception should *NEVER* throw...
         });
        
      • usual

        public static URLIterator usual​(java.lang.String url,
                                        java.lang.String appendParamStr,
                                        int startPageNum,
                                        int lastPageNum)
                                 throws java.net.MalformedURLException
        Throws:
        java.net.MalformedURLException - If url + 1 + appendParamStr cannot be parsed as a URL.
        Code:
        Exact Method Body:
         CHECK_EXCEPTIONS(url + 1 + appendParamStr, startPageNum, lastPageNum);
        
         return new URLIterator(startPageNum, lastPageNum, (int curPage) ->
         {   
             try
                 { return new URL(url + curPage + appendParamStr); }
             catch (MalformedURLException e)
                 { throw new NullPointerException("Malformed URL Exception" + e.toString()); }
                 // CHEAP-TRICK: Compile-Time Exception to Runtime Exception...  However, the 
                 // base-URL has already been tested, and therefore this exception NEEDS to be 
                 // suppressed...  NOTE: This exception should *NEVER* throw...
         });
        
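The four-argument variant places the page number between two strings, which suits sites where the page parameter is not the last thing in the address. The sketch below only illustrates that concatenation: `AppendParamSketch`, `pageURLs`, and the example.com address are hypothetical, and the loop assumes that both startPageNum and lastPageNum are included.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class AppendParamSketch
{
    // Hypothetical helper: eagerly builds the URLs that an iterator created
    // from (url, appendParamStr, startPageNum, lastPageNum) would be expected
    // to produce, one per page number.
    static List<URL> pageURLs
        (String url, String appendParamStr, int startPageNum, int lastPageNum)
        throws MalformedURLException
    {
        List<URL> ret = new ArrayList<>();

        for (int curPage = startPageNum; curPage <= lastPageNum; curPage++)
            ret.add(new URL(url + curPage + appendParamStr));

        return ret;
    }

    public static void main(String[] args) throws MalformedURLException
    {
        // Hypothetical site whose page number sits in the middle of the query string
        for (URL u : pageURLs("https://example.com/gallery?page=", "&size=large", 1, 3))
            System.out.println(u);
    }
}
```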
      • CHECK_EXCEPTIONS

        public static void CHECK_EXCEPTIONS​(java.lang.String url,
                                            int startPageNum,
                                            int lastPageNum)
                                     throws java.net.MalformedURLException
        Throws:
        java.net.MalformedURLException - If url cannot be parsed as a URL.
        Code:
        Exact Method Body:
         // FAIL-FAST: Check user input before the iterator starts iterating.
         if (startPageNum < 0) throw new IllegalArgumentException(
             "The value passed to the starting-page-number parameter [" + startPageNum + "], " +
             "was negative.  Most often it is 1 or, possibly, 0."
         );
        
         if (lastPageNum < 0) throw new IllegalArgumentException(
             "The value passed to the ending-page-number parameter [" + lastPageNum + "], was negative."
         );
        
         if (startPageNum > lastPageNum) throw new IllegalArgumentException(
             "The value passed to the starting-page-number parameter [" + startPageNum + "], was greater " +
             "than the value passed to the ending-page-number parameter [" + lastPageNum + "]."
         );
        
         if (url == null) throw new NullPointerException
             ("A null value was passed as a url.");
        
         // FAIL-FAST:   This should be a valid URL as a String.  This invocation will throw the
         //              MalformedURLException if it is not.
         new URL(url);
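
The order of the checks above decides which exception a caller sees when more than one input is bad. The following sketch makes that visible with a condensed stand-in: `FailFastSketch`, `check`, and `outcome` are hypothetical names, the messages are shortened, and only the sequence of tests is copied from the method body above.

```java
import java.net.MalformedURLException;
import java.net.URL;

class FailFastSketch
{
    // Stand-in with the same checks, in the same order, as the method body above
    static void check(String url, int startPageNum, int lastPageNum)
        throws MalformedURLException
    {
        if (startPageNum < 0)
            throw new IllegalArgumentException("negative start: " + startPageNum);

        if (lastPageNum < 0)
            throw new IllegalArgumentException("negative end: " + lastPageNum);

        if (startPageNum > lastPageNum)
            throw new IllegalArgumentException("start is greater than end");

        if (url == null) throw new NullPointerException("null url");

        new URL(url);   // throws MalformedURLException if 'url' is not a valid URL
    }

    // Hypothetical helper: reports which exception type, if any, the checks raise
    static String outcome(String url, int startPageNum, int lastPageNum)
    {
        try
            { check(url, startPageNum, lastPageNum); return "ok"; }
        catch (MalformedURLException | RuntimeException e)
            { return e.getClass().getSimpleName(); }
    }

    public static void main(String[] args)
    {
        System.out.println(outcome("https://example.com/?page=", 1, 39));
        System.out.println(outcome("not a url", 1, 39));

        // The page-number checks run before the null check
        System.out.println(outcome(null, 5, 1));
    }
}
```

Because the page-number tests come first, a null url combined with a bad range is reported as an IllegalArgumentException rather than a NullPointerException.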