Class ImageScraper.AdditionalParameters

  • Enclosing class:
    ImageScraper

    public static class ImageScraper.AdditionalParameters
    extends java.lang.Object
    A class that allows a user to further configure how images are downloaded.

    The configuration parameters requested by the ImageScraper constructor are all mandatory: a "source" and a "target" must be specified for the images being retrieved from the internet. The "extra" parameters in this class let the user, among other things, prepend a prefix String to each downloaded file-name and decide what happens when a URL download generates an exception.


    • Field Summary

      Fields 
      Modifier and Type Field Description
      String fileNamePrefix
      When this field is null, it is ignored; but if not null, this String will be prepended to each file-name that is saved or stored to the file-system.
      ImageScraper.FileNameRetriever getImageFileSaveName
      When this field is null, it is ignored; but if not null, each time an image is written to the file-system, this java.util.function.Function<URL, String> will be queried for a file-name before writing the image-file to the file-system.
      long maxDownloadWaitTime
      If you do not want the downloader to hang on an image, which is sometimes an issue depending upon the site from which you are making a request, set this parameter, and the downloader will not wait past that amount of time to download an image.
      static long serialVersionUID
      boolean skipBase64EncodedImages
      This scraper has the ability to decode and save Base-64 Images, and they may be downloaded or skipped - based on this boolean.
      boolean skipOnIOException
      When this field is TRUE, if an attempt to download an image generates an exception, the exception-throw will not halt the download, but rather the image will be skipped, and a download attempt will be performed on the next image in the list.
      Predicate<URL> skipURL
      When this field is null, it is ignored; but if this field is not null, then before any URL is connected for download, the download mechanism will ask this URL-Predicate for permission first.
      boolean useDefaultCounterForImageFileNames
      When TRUE, images will be saved according to a counter; when this is FALSE, the software will attempt to save these images using their original filenames - picked from the URL.
      TimeUnit waitTimeUnits
      This is the "unit of measurement" for the field long maxDownloadWaitTime.
    • Constructor Summary

      Constructors 
      Constructor Description
      AdditionalParameters()
      This constructor will return an instance of AdditionalParameters whose values provide the following MOST COMMON behaviour choices:
      Parameter                             Value
      skipOnIOException                     TRUE
      useDefaultCounterForImageFileNames    TRUE
      skipBase64EncodedImages               FALSE
      All other parameters are set to null (0 for maxDownloadWaitTime), and will be ignored.
      AdditionalParameters​(boolean skipOnIOException, Predicate<URL> skipURL, String fileNamePrefix, boolean useDefaultCounterForImageFileNames, ImageScraper.FileNameRetriever getImageFileSaveName, boolean skipBase64EncodedImages, long maxDownloadWaitTime, TimeUnit waitTimeUnits)
      Use this constructor to instantiate this class.
    • Method Summary

      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • serialVersionUID

        public static final long serialVersionUID
        This fulfils the SerialVersion UID requirement for all classes that implement Java's interface java.io.Serializable. Using the Serializable implementation offered by Java is very easy, and can make saving program state when debugging a lot easier. It can also be used to store data in place of more complicated systems like "hibernate".
        See Also:
        Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final long serialVersionUID = 1;
        
      • skipOnIOException

        public final boolean skipOnIOException
        When this field is TRUE, if an attempt to download an image generates an exception, the exception-throw will not halt the download, but rather the image will be skipped, and a download attempt will be performed on the next image in the list. The exception will be stored in the 'Results' return object.
        Code:
        Exact Field Declaration Expression:
        public final boolean skipOnIOException;
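
        The skip-and-continue behaviour described above can be sketched roughly as follows. This is only an illustration of the pattern, not the library's code: the Download interface, the Outcome holder (a stand-in for the 'Results' object, whose real shape is not shown on this page) and the FAKE downloader are all hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SkipOnIOExceptionSketch {
    /** Hypothetical stand-in for one image download; not a library type. */
    interface Download {
        String fetch(String url) throws IOException;
    }

    /** Hypothetical result holder; the library's own 'Results' class is not shown here. */
    static final class Outcome {
        final List<String> saved = new ArrayList<>();
        final List<IOException> errors = new ArrayList<>();
    }

    /** Fake downloader that fails for any URL containing "broken". */
    static final Download FAKE = url -> {
        if (url.contains("broken")) throw new IOException("cannot fetch " + url);
        return "saved:" + url;
    };

    static Outcome downloadAll(List<String> urls, Download download, boolean skipOnIOException) {
        Outcome outcome = new Outcome();
        for (String url : urls) {
            try {
                outcome.saved.add(download.fetch(url));
            } catch (IOException e) {
                // Halt the whole run unless the caller asked to skip failures.
                if (!skipOnIOException) throw new UncheckedIOException(e);
                outcome.errors.add(e); // remember the failure, then move on to the next image
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        Outcome o = downloadAll(Arrays.asList("a.png", "broken.png", "c.png"), FAKE, true);
        System.out.println(o.saved + " / failures: " + o.errors.size());
    }
}
```

        With the flag set to false, the same call stops at the first failing image instead of recording the failure.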
        
      • skipURL

        public final java.util.function.Predicate<java.net.URL> skipURL
        When this field is null, it is ignored; but if this field is not null, then before any URL is connected for download, the download mechanism will ask this URL-Predicate for permission first. If this Predicate returns FALSE for a particular URL, that image will not be downloaded, and instead, skipped.
        Code:
        Exact Field Declaration Expression:
        public final Predicate<URL> skipURL;
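
        As a rough illustration, a Predicate<URL> suitable for this field might look like the sketch below. It follows the description as written here (a FALSE result means the image is skipped). The class name, the NOT_A_GIF constant and the url(...) helper are hypothetical and exist only for this example.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;
import java.util.function.Predicate;

final class SkipUrlPredicateSketch {
    // Per the field description: true = permission granted, false = skip this URL.
    static final Predicate<URL> NOT_A_GIF =
        url -> !url.getPath().toLowerCase(Locale.ROOT).endsWith(".gif");

    /** Hypothetical helper that builds a URL without a checked exception. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(NOT_A_GIF.test(url("https://example.com/photo.jpg")));  // prints true
        System.out.println(NOT_A_GIF.test(url("https://example.com/spacer.GIF"))); // prints false
    }
}
```

        The predicate inspects only the path, so a query string after the file-name does not affect the decision.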
        
      • fileNamePrefix

        public final java.lang.String fileNamePrefix
        When this field is null, it is ignored; but if not null, this String will be prepended to each file-name that is saved or stored to the file-system.
        Code:
        Exact Field Declaration Expression:
        public final String fileNamePrefix;
        
      • useDefaultCounterForImageFileNames

        public final boolean useDefaultCounterForImageFileNames
        When TRUE, images will be saved according to a counter; when this is FALSE, the software will attempt to save these images using their original filenames - picked from the URL. Saving using a counter is the default behaviour for this class.
        Code:
        Exact Field Declaration Expression:
        public final boolean useDefaultCounterForImageFileNames;
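
        To make the interaction between fileNamePrefix and the counter concrete, here is a minimal sketch of how such a name could be assembled. The three-digit width follows the constructor's parameter notes; the counterName helper and its extension argument are assumptions for illustration, and the scraper's real naming code is not shown on this page.

```java
import java.util.Locale;

final class CounterFileNameSketch {
    /**
     * Hypothetical helper: builds "<prefix><3-digit counter>.<extension>".
     * A null prefix is treated as "no prefix", mirroring the field description.
     */
    static String counterName(String fileNamePrefix, int counter, String extension) {
        String prefix = (fileNamePrefix == null) ? "" : fileNamePrefix;
        return String.format(Locale.ROOT, "%s%03d.%s", prefix, counter, extension);
    }

    public static void main(String[] args) {
        System.out.println(counterName("vacation-", 7, "jpg")); // prints vacation-007.jpg
        System.out.println(counterName(null, 12, "png"));       // prints 012.png
    }
}
```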
        
      • getImageFileSaveName

        public final ImageScraper.FileNameRetriever getImageFileSaveName
        When this field is null, it is ignored; but if not null, each time an image is written to the file-system, this java.util.function.Function<URL, String> will be queried for a file-name before writing the image-file to the file-system. If this field is non-null, but images are being sent to Consumer<BufferedImage, IF> downloadedImageAltTarget, rather than being saved to the file-system, then this field is also ignored.
        Code:
        Exact Field Declaration Expression:
        public final FileNameRetriever getImageFileSaveName;
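
        A URL-to-file-name mapping of the kind described above could be sketched as follows. Because the exact signature of ImageScraper.FileNameRetriever does not appear on this page, the example uses a plain java.util.function.Function<URL, String> as a stand-in; the class name, the LAST_PATH_SEGMENT constant, the "image" fallback and the url(...) helper are all hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.function.Function;

final class FileNameFromUrlSketch {
    // Stand-in for a file-name retriever: use the last path segment of the URL.
    static final Function<URL, String> LAST_PATH_SEGMENT = url -> {
        String path = url.getPath();                       // the query string is not part of the path
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.isEmpty() ? "image" : name;            // fallback chosen for this sketch only
    };

    /** Hypothetical helper that builds a URL without a checked exception. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(LAST_PATH_SEGMENT.apply(url("https://example.com/img/cat.png?size=large"))); // prints cat.png
    }
}
```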
        
      • skipBase64EncodedImages

        public final boolean skipBase64EncodedImages
        This scraper has the ability to decode and save Base-64 Images, and they may be downloaded or skipped - based on this boolean. If an Iterable<TagNode> is passed to the constructor, and one of those TagNodes contains an Image Element (<IMG SRC="data:image/jpeg;base64,...data">), this class has the ability to interpret and save the image to a regular image file. The no-argument constructor sets this field to FALSE, so Base-64 images are downloaded; set it to TRUE to skip them.
        Code:
        Exact Field Declaration Expression:
        public final boolean skipBase64EncodedImages;
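
        For readers unfamiliar with Base-64 image sources, the sketch below shows one way the payload of such a SRC attribute can be recognised and decoded using the standard java.util.Base64 class. It is not the scraper's own decoding code; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64ImageSketch {
    private static final String MARKER = ";base64,";

    /** True when the SRC value looks like "data:<media-type>;base64,<payload>". */
    static boolean isBase64DataUri(String src) {
        return src.startsWith("data:") && src.contains(MARKER);
    }

    /** Decodes the payload that follows ";base64," into raw image bytes. */
    static byte[] decode(String src) {
        if (!isBase64DataUri(src)) {
            throw new IllegalArgumentException("not a Base-64 data URI");
        }
        String payload = src.substring(src.indexOf(MARKER) + MARKER.length());
        return Base64.getDecoder().decode(payload);
    }

    public static void main(String[] args) {
        // "R0lGODlh" is the Base-64 form of the six-byte GIF signature "GIF89a".
        byte[] bytes = decode("data:image/gif;base64,R0lGODlh");
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // prints GIF89a
    }
}
```

        The decoded bytes could then be written to a file exactly as bytes fetched over the network would be.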
        
      • maxDownloadWaitTime

        public final long maxDownloadWaitTime
        If you do not want the downloader to hang on an image, which is sometimes an issue depending upon the site from which you are making a request, set this parameter, and the downloader will not wait past that amount of time to download an image. The default value for this parameter is 10 seconds. If you do not wish to set a maximum wait time (the download time-out), leave the parameter "waitTimeUnits" set to null, and this parameter will be ignored.
        Code:
        Exact Field Declaration Expression:
        public final long maxDownloadWaitTime;
        
      • waitTimeUnits

        public final java.util.concurrent.TimeUnit waitTimeUnits
        This is the "unit of measurement" for the field long maxDownloadWaitTime.

        NOTE: This parameter may be null. If it is, both this parameter and the parameter long maxDownloadWaitTime will be ignored, and the default maximum wait time (download time-out setting) will be used instead.

        READ: the documentation for the package java.util.concurrent, and for the class java.util.concurrent.TimeUnit, for more information.
        Code:
        Exact Field Declaration Expression:
        public final TimeUnit waitTimeUnits;
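
        The pairing of maxDownloadWaitTime with waitTimeUnits can be illustrated with a standard Future.get(timeout, unit) call. This is a sketch under assumptions, not the downloader's real mechanism: the fetchWithTimeout helper and its onTimeout fallback are hypothetical, and the 10-second default is taken from the maxDownloadWaitTime description.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class DownloadTimeoutSketch {
    static final long DEFAULT_WAIT = 10;                    // documented default: 10 seconds
    static final TimeUnit DEFAULT_UNITS = TimeUnit.SECONDS;

    /** Runs the task, giving up after the requested wait; returns onTimeout if it took too long. */
    static <T> T fetchWithTimeout(Callable<T> task, long maxDownloadWaitTime,
                                  TimeUnit waitTimeUnits, T onTimeout) {
        // A null unit means "ignore maxDownloadWaitTime and use the default time-out".
        long wait = (waitTimeUnits == null) ? DEFAULT_WAIT : maxDownloadWaitTime;
        TimeUnit units = (waitTimeUnits == null) ? DEFAULT_UNITS : waitTimeUnits;

        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<T> pending = pool.submit(task);
            try {
                return pending.get(wait, units);
            } catch (TimeoutException e) {
                pending.cancel(true);                       // stop waiting on the slow download
                return onTimeout;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        } finally {
            pool.shutdownNow();                             // interrupts the worker if it is still running
        }
    }

    public static void main(String[] args) {
        String fast = fetchWithTimeout(() -> "ok", 1, TimeUnit.SECONDS, "timed out");
        String slow = fetchWithTimeout(() -> { Thread.sleep(2000); return "late"; },
                                       50, TimeUnit.MILLISECONDS, "timed out");
        System.out.println(fast + " / " + slow);            // prints ok / timed out
    }
}
```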
        
    • Constructor Detail

      • AdditionalParameters

        public AdditionalParameters​
                    (boolean skipOnIOException,
                     java.util.function.Predicate<java.net.URL> skipURL,
                     java.lang.String fileNamePrefix,
                     boolean useDefaultCounterForImageFileNames,
                     ImageScraper.FileNameRetriever getImageFileSaveName,
                     boolean skipBase64EncodedImages,
                     long maxDownloadWaitTime,
                     java.util.concurrent.TimeUnit waitTimeUnits)
        
        Use this constructor to instantiate this class. What each of these parameters means to the downloader is described in the comments for the corresponding fields of this class (above).
        Parameters:
        skipOnIOException - When TRUE, an image that fails to download is "skipped", which prevents the downloading process from halting.
        skipURL - A Java Predicate for deciding which images should be skipped. This parameter may be null. If it is, it will be ignored, and the downloader will attempt to download all images.
        fileNamePrefix - A standard Java-String that may be inserted before the file-name of each image downloaded, as a 'file-name prefix'. This parameter may be null, and if it is, file-name prefixes will not be used.
        useDefaultCounterForImageFileNames - It is usually a good idea to replace the file-name for an image retrieved from a web-site with a simple, three-digit, counter-name. Image file names on a web-site can often be long PKID Strings obtained from SQL database queries. To use a standard "counter" set this parameter to TRUE.
        getImageFileSaveName - This parameter may be used to convert image file-names used on a web-page to user-generated image-file-names. This parameter may be null, and if it is, it will be ignored. If this parameter is non-null, it takes precedence over the boolean passed to parameter 'useDefaultCounterForImageFileNames'.
        skipBase64EncodedImages - When TRUE, the downloader skips HTML Image Elements whose image-data is encoded into the HTML Element itself using Base-64 Image-Encoding; when FALSE, such images are converted and saved. Thumbnails and other small images are sometimes stored on web-pages using such encoding.
        maxDownloadWaitTime - This parameter will be ignored unless a non-null value has been passed to parameter 'waitTimeUnits'. This may be used to prevent the downloader from hanging when collecting images for a web-page.
        waitTimeUnits - A java.util.concurrent.TimeUnit describing the units used for the previous parameter, 'maxDownloadWaitTime'.
        Code:
        Exact Constructor Body:
         this.skipOnIOException                      = skipOnIOException;
         this.skipURL                                = skipURL;
         this.fileNamePrefix                         = fileNamePrefix;
         this.useDefaultCounterForImageFileNames     = useDefaultCounterForImageFileNames;
         this.getImageFileSaveName                   = getImageFileSaveName;
         this.skipBase64EncodedImages                = skipBase64EncodedImages;
         this.maxDownloadWaitTime                    = maxDownloadWaitTime;
         this.waitTimeUnits                          = waitTimeUnits;
        
         if (maxDownloadWaitTime < 0) throw new IllegalArgumentException(
             "You have passed a negative number for parameter maxDownloadWaitTime, and this is " +
             "not allowed here."
         );
        
      • AdditionalParameters

        public AdditionalParameters()
        This constructor will return an instance of AdditionalParameters whose values provide the following MOST COMMON behaviour choices:
        Parameter                             Value
        skipOnIOException                     TRUE
        useDefaultCounterForImageFileNames    TRUE
        skipBase64EncodedImages               FALSE
        All other parameters are set to null (0 for maxDownloadWaitTime), and will be ignored.
        Code:
        Exact Constructor Body:
         this(true, null, null, true, null, false, 0, null);
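
        The two constructor bodies shown on this page can be exercised with a small stand-in class, sketched below. It is not the library class: AdditionalParametersSketch is a hypothetical name, Function<URL, String> replaces ImageScraper.FileNameRetriever (whose definition is not shown here), and the negative-value check is moved ahead of the assignments for readability.

```java
import java.net.URL;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Predicate;

final class AdditionalParametersSketch {
    final boolean skipOnIOException;
    final Predicate<URL> skipURL;
    final String fileNamePrefix;
    final boolean useDefaultCounterForImageFileNames;
    final Function<URL, String> getImageFileSaveName; // stand-in for ImageScraper.FileNameRetriever
    final boolean skipBase64EncodedImages;
    final long maxDownloadWaitTime;
    final TimeUnit waitTimeUnits;

    AdditionalParametersSketch(boolean skipOnIOException, Predicate<URL> skipURL,
                               String fileNamePrefix, boolean useDefaultCounterForImageFileNames,
                               Function<URL, String> getImageFileSaveName,
                               boolean skipBase64EncodedImages,
                               long maxDownloadWaitTime, TimeUnit waitTimeUnits) {
        // Same rule as the documented constructor body: negative wait times are rejected.
        if (maxDownloadWaitTime < 0) {
            throw new IllegalArgumentException("maxDownloadWaitTime may not be negative");
        }
        this.skipOnIOException = skipOnIOException;
        this.skipURL = skipURL;
        this.fileNamePrefix = fileNamePrefix;
        this.useDefaultCounterForImageFileNames = useDefaultCounterForImageFileNames;
        this.getImageFileSaveName = getImageFileSaveName;
        this.skipBase64EncodedImages = skipBase64EncodedImages;
        this.maxDownloadWaitTime = maxDownloadWaitTime;
        this.waitTimeUnits = waitTimeUnits;
    }

    /** Mirrors the documented defaults: this(true, null, null, true, null, false, 0, null). */
    AdditionalParametersSketch() {
        this(true, null, null, true, null, false, 0, null);
    }

    public static void main(String[] args) {
        AdditionalParametersSketch defaults = new AdditionalParametersSketch();
        System.out.println(defaults.skipOnIOException);       // prints true
        System.out.println(defaults.skipBase64EncodedImages); // prints false
        System.out.println(defaults.waitTimeUnits);           // prints null
    }
}
```

        Passing a negative maxDownloadWaitTime to the eight-argument constructor throws IllegalArgumentException, even when waitTimeUnits is null and the value would otherwise be ignored.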