Package Torello.HTML.Tools.NewsSite
Class NewsSites
- java.lang.Object
  - Torello.HTML.Tools.NewsSite.NewsSites
public class NewsSites extends java.lang.Object
This class is nothing more than an 'Example Class' that contains configurations for several foreign-language news web-sites, from both overseas and Latin America.
This class provides five example News Websites with all of the necessary configurations that would be passed to ScrapeURLs, and (subsequently) ScrapeArticles.
The following news-oriented web-sites are provided in this "example" (of sorts) class:
- https://abc.es
- https://elnacional.com
- https://elespectador.com
- https://www.gov.cn
- https://elpulso.mx
Side Note: Scraping major Associated Press news-sites such as Fox-News, CNN, MSNBC, and Yahoo! News is not a problem for this software - although taking both spiritual and moral stances against the terror that these organizations have caused the world is largely the driving force behind wanting to scrape foreign news sites.
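Below is a condensed sketch of how one of these NewsSite constants flows through ScrapeURLs and then ScrapeArticles. It is abbreviated from the runExample() method documented at the bottom of this page; the import statements are a best guess at the package locations, and the directory names are simply the ones that example uses.

import Torello.HTML.Tools.NewsSite.*;  // NewsSite, ScrapeURLs, ScrapeArticles, ...
import Torello.Java.*;                 // (assumed location of StorageWriter)
import java.util.Vector;

// Pick one of the NewsSite constants defined in this class
StorageWriter log = new StorageWriter();
NewsSite      ns  = NewsSites.GovCNCarousel;

// Step 1: Visit each of the site's section-pages and collect the Article URL's
Vector<Vector<String>> articleURLs = ScrapeURLs.get(ns, log);

// Step 2: Download each Article-Body, saving the results to the file-system
ScrapedArticleReceiver receiver = ScrapedArticleReceiver.saveToFS("cnb/articleData/");
Pause                  pause    = Pause.getFSInstance("cnb/state.dat");
pause.initialize();

ScrapeArticles.download
    (receiver, articleURLs, ns.articleGetter, true, null, false, pause, log);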
Hi-Lited Source-Code:
- View Here: Torello/HTML/Tools/NewsSite/NewsSites.java
- Open New Browser-Tab: Torello/HTML/Tools/NewsSite/NewsSites.java
File Size: 36,013 Bytes, Line Count: 748 '\n' Characters Found
-
-
Field Summary
Example of (Extremely-Simple) News Web-Sites: Instantiated Singleton Constants

Modifier and Type    Field
static NewsSite      ABCES
static NewsSite      ElEspectador
static NewsSite      ElNacional
static NewsSite      GovCN
static NewsSite      GovCNCarousel
static NewsSite      Pulso
-
Method Summary
Functional-Interface Lambda-Target Methods (Functions for 'Function-Pointers')

Modifier and Type       Method
static Vector<String>   ABC_LINKS_GETTER(URL url, Vector<HTMLNode> page)
static Vector<String>   EL_ESPECTADOR_LINKS_GETTER(URL url, Vector<HTMLNode> page)
static Vector<String>   EL_NACIONAL_LINKS_GETTER(URL url, Vector<HTMLNode> page)
static Vector<String>   GOVCN_CAROUSEL_LINKS_GETTER(URL url, Vector<HTMLNode> page)

Command Line Invocation Methods

Modifier and Type   Method
static void         main(String[] argv)
static void         runExample()
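Each of the four lambda-target methods above has the same shape: it accepts the section-page URL together with that page's parsed HTML Vector, and returns the Article-Link URL's it finds. Assuming the LinksGet functional interface declares a single abstract method matching that (URL, Vector<HTMLNode>) -> Vector<String> signature, any of them can be handed to a NewsSite definition as an ordinary method reference:

// A "function-pointer" to the ABC.ES link-getter; this assumes LinksGet's abstract
// method matches the (URL url, Vector<HTMLNode> page) -> Vector<String> shape above.
LinksGet abcLinks = NewsSites::ABC_LINKS_GETTER;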
-
-
-
Field Detail
-
ABCES
public static final NewsSite ABCES
This is the NewsSite definition for the Newspaper located at: https://www.abc.es/

Newspaper Name:              ABC España
Country of Origin:           Spain
Website URL:                 https://abc.es
Newspaper Printing Language: Spanish

- Newspaper Article Groups / Sections (Scrape Sections): Retrieved from Data File
- StrFilter (News Web-Site Section-Page Article-Link <A HREF=...> Filter): 'HREF' must end with '.html'
  See: StrFilter.comparitor(TextComparitor, String[])
  See: TextComparitor.EW_CI
- LinksGet (Used to manually retrieve Article-Link URL's): Invokes method ABC_LINKS_GETTER(URL, Vector)
- ArticleGet (Retrieves Article-Body Content from an Article-Link Web-Page): <MAIN>...</MAIN>
  See: ArticleGet.usual(String)
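For reference, a filter like the one described in this table could presumably be built directly from the factory method cited above. This is only a sketch, assuming the String[] parameter of StrFilter.comparitor is the var-args list of compare-String's:

// 'HREF' must end with '.html'; TextComparitor.EW_CI is the "ends-with,
// case-insensitive" comparison cited in the table above.
StrFilter articleLinkFilter = StrFilter.comparitor(TextComparitor.EW_CI, ".html");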
View a copy of the logs that are generated from using this NewsSite instance:
- ABC.ES ScrapeURLs LOG
- ScrapeArticles
IMPORTANT NOTE: Though the ScrapeURLs code will check for duplicate URL's that may be returned within any given section, Article URL's may be repeated among the different sections of the newspaper. Since the URL-scrape returned nearly 3,000 articles, the log of an Article scrape is not included here. Proper duplicate-URL checking code has obviously been written, but would be too complicated to show in this example.
CHANGE: There are no guarantees when scraping HTML from the Internet. If any of the news-providers in this example-class were to modify or update the HTML that serves their news-stories, there is a real chance that the "Getters" and "Filters" in these examples would no longer be valid. It is important to realize, though, that even if the HTML wrappers for the Article Bodies or the Article Links were to change on the source news-site, updating the Links and Article Getters (or the Links Filter) is at most a change of 5 lines of code.
If, at some point, use of this class results in a long stream of messages indicating that no Article URL-Links were identified, or that the Article-Bodies failed to be extracted, simply look at the raw HTML from the site and change the getters or Regular-Expressions accordingly.
NOTE: The logs included in this class' documentation were generated by scrapes in September of 2020.
-
Pulso
public static final NewsSite Pulso
This is the NewsSite definition for the Newspaper located at: https://www.elpulso.mx/

Newspaper Name:              El Pulso, México
Country of Origin:           México
Website URL:                 https://elpulso.mx
Newspaper Printing Language: Spanish

- Newspaper Article Groups / Sections (Scrape Sections): Retrieved from Data File
- StrFilter (News Web-Site Section-Page Article-Link <A HREF=...> Filter): 'HREF' must match:
  http://some.domain/YYYY/MM/DD/<article-name>/
- LinksGet (Used to manually retrieve Article-Link URL's): null. Retrieves all Anchor-Links on a
  Section-Page. Note that URL's must still pass the previous StrFilter (above) in order to be
  parsed as Articles.
- ArticleGet (Retrieves Article-Body Content from an Article-Link Web-Page):
  <DIV CLASS="entry-content">...</DIV>
  See: ArticleGet.usual(TextComparitor, String[])
  See: TextComparitor.C
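The date-based URL-shape above can be checked with an ordinary java.util.regex pattern. The expression below merely illustrates the http://some.domain/YYYY/MM/DD/<article-name>/ form from the table; it is not the exact filter this class uses, and the sample URL is hypothetical.

import java.util.regex.Pattern;

// Illustrative only: four-digit year, two-digit month / day, then the article-name
Pattern p = Pattern.compile("^https?://[^/]+/\\d{4}/\\d{2}/\\d{2}/[^/]+/$");

boolean ok = p.matcher("https://elpulso.mx/2020/09/15/some-article-name/").matches(); // true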
-
ElNacional
public static final NewsSite ElNacional
This is the NewsSite definition for the Newspaper located at: https://www.elnacional.com/

Newspaper Name:              El Nacional
Country of Origin:           Venezuela
Website URL:                 https://elnacional.com
Newspaper Printing Language: Spanish

- Newspaper Article Groups / Sections (Scrape Sections): Retrieved from Data File
- URLFilter (News Web-Site Section-Page Article-Link <A HREF=...> Filter): null. The LinksGet
  provided here will only return valid Article URL's, so there is no need for a URLFilter.
- LinksGet (Used to manually retrieve Article-Link URL's): Invokes method
  EL_NACIONAL_LINKS_GETTER(URL, Vector)
- ArticleGet (Retrieves Article-Body Content from an Article-Link Web-Page):
  <ARTICLE>...</ARTICLE>
  See: ArticleGet.usual(String)
View a copy of the logs that are generated from using this NewsSite.
CHANGE: There are no guarantees when scraping HTML from the Internet. If any of the news-providers in this example-class were to modify or update the HTML that serves their news-stories, there is a real chance that the "Getters" and "Filters" in these examples would no longer be valid. It is important to realize, though, that even if the HTML wrappers for the Article Bodies or the Article Links were to change on the source news-site, updating the Links and Article Getters (or the Links Filter) is at most a change of 5 lines of code.
If, at some point, use of this class results in a long stream of messages indicating that no Article URL-Links were identified, or that the Article-Bodies failed to be extracted, simply look at the raw HTML from the site and change the getters or Regular-Expressions accordingly.
NOTE: The logs included in this class' documentation were generated by scrapes in September of 2020.
-
ElEspectador
public static final NewsSite ElEspectador
This is the NewsSite definition for the Newspaper located at: https://www.elespectador.com/

Newspaper Name:              El Espectador
Country of Origin:           Colombia
Website URL:                 https://elespectador.com
Newspaper Printing Language: Spanish

- Newspaper Article Groups / Sections (Scrape Sections): Retrieved from Data File
- StrFilter (News Web-Site Section-Page Article-Link <A HREF=...> Filter): 'HREF' must end with
  a forward-slash '/' character.
  See: TextComparitor.ENDS_WITH
- LinksGet (Used to manually retrieve Article-Link URL's): Invokes method
  EL_ESPECTADOR_LINKS_GETTER(URL, Vector)
- ArticleGet (Retrieves Article-Body Content from an Article-Link Web-Page):
  <DIV CLASS="l-main">...</DIV>
  See: ArticleGet.usual(TextComparitor, String[])
  See: TextComparitor.C
View a copy of the logs that are generated from using this NewsSite.
CHANGE: There are no guarantees when scraping HTML from the Internet. If any of the news-providers in this example-class were to modify or update the HTML that serves their news-stories, there is a real chance that the "Getters" and "Filters" in these examples would no longer be valid. It is important to realize, though, that even if the HTML wrappers for the Article Bodies or the Article Links were to change on the source news-site, updating the Links and Article Getters (or the Links Filter) is at most a change of 5 lines of code.
If, at some point, use of this class results in a long stream of messages indicating that no Article URL-Links were identified, or that the Article-Bodies failed to be extracted, simply look at the raw HTML from the site and change the getters or Regular-Expressions accordingly.
NOTE: The logs included in this class' documentation were generated by scrapes in September of 2020.
-
GovCNCarousel
public static final NewsSite GovCNCarousel
This is the NewsSite definition for the Newspaper located at: https://www.gov.cn/

The "Carousels" are just the emphasized or "hilited" links that appear on three separate pages. There is also a complete-link NewsSite definition (GovCN, below) that will retrieve all links, not just the links hilited by the carousel.

Newspaper Name:              Chinese Government Web Portal
Country of Origin:           People's Republic of China
Website URL:                 https://gov.cn
Newspaper Printing Language: Mandarin Chinese

- Newspaper Article Groups / Sections (Scrape Sections): Retrieved from Data File
- StrFilter (News Web-Site Section-Page Article-Link <A HREF=...> Filter): 'HREF' must match:
  "^http://www.gov.cn/(?:.+?/)?\\d{4}-\\d{2}/\\d{2}/(?:.+?/)?content_\\d+.htm(?:l)?(#\\d+)?"
- LinksGet (Used to manually retrieve Article-Link URL's): Invokes method
  GOVCN_CAROUSEL_LINKS_GETTER(URL, Vector)
- ArticleGet (Retrieves Article-Body Content from an Article-Link Web-Page):
  <DIV CLASS="article ...">...</DIV>
  See: ArticleGet.usual(TextComparitor, String[])
  See: TextComparitor.C
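The regular expression in the table above can be exercised directly with java.util.regex. The article URL below is hypothetical, but it shows the 'content_<digits>.htm' shape that this filter accepts:

import java.util.regex.Pattern;

// The StrFilter regular-expression, copied from the table above
Pattern govCN = Pattern.compile(
    "^http://www.gov.cn/(?:.+?/)?\\d{4}-\\d{2}/\\d{2}/(?:.+?/)?content_\\d+.htm(?:l)?(#\\d+)?"
);

// Hypothetical article-link, shaped like the URL's this site publishes
boolean ok = govCN
    .matcher("http://www.gov.cn/xinwen/2020-09/15/content_5543535.htm")
    .matches(); // true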
View a copy of the logs that are generated from using this NewsSite.
CHANGE: There are no guarantees when scraping HTML from the Internet. If any of the news-providers in this example-class were to modify or update the HTML that serves their news-stories, there is a real chance that the "Getters" and "Filters" in these examples would no longer be valid. It is important to realize, though, that even if the HTML wrappers for the Article Bodies or the Article Links were to change on the source news-site, updating the Links and Article Getters (or the Links Filter) is at most a change of 5 lines of code.
If, at some point, use of this class results in a long stream of messages indicating that no Article URL-Links were identified, or that the Article-Bodies failed to be extracted, simply look at the raw HTML from the site and change the getters or Regular-Expressions accordingly.
NOTE: The logs included in this class' documentation were generated by scrapes in September of 2020.
-
GovCN
public static final NewsSite GovCN
This is the NewsSite definition for the Newspaper located at: https://www.gov.cn/

This version of the "Gov.CN" website will scour a larger set of section URL's, and will not limit the returned Article-Links to just those found on the Java-Script carousel. The Java-Script Carousel will almost always have a total of five news-article links available, while this 'NewsSite' definition may return up to thirty or forty different articles per news-section.

Newspaper Name:              Chinese Government Web Portal
Country of Origin:           People's Republic of China
Website URL:                 https://gov.cn
Newspaper Printing Language: Mandarin Chinese

- Newspaper Article Groups / Sections (Scrape Sections): Retrieved from Data File
- StrFilter (News Web-Site Section-Page Article-Link <A HREF=...> Filter): 'HREF' must match:
  "^http://www.gov.cn/(?:.+?/)?\\d{4}-\\d{2}/\\d{2}/(?:.+?/)?content_\\d+.htm(?:l)?(#\\d+)?"
- LinksGet (Used to manually retrieve Article-Link URL's): null. Retrieves all Anchor-Links on a
  Section-Page. Note that URL's must still pass the previous StrFilter (above) in order to be
  parsed as Articles.
- ArticleGet (Retrieves Article-Body Content from an Article-Link Web-Page):
  <DIV CLASS="article ...">...</DIV>
  See: ArticleGet.usual(TextComparitor, String[])
  See: TextComparitor.C
View a copy of the logs that are generated from using this NewsSite.
CHANGE: There are no guarantees when scraping HTML from the Internet. If any of the news-providers in this example-class were to modify or update the HTML that serves their news-stories, there is a real chance that the "Getters" and "Filters" in these examples would no longer be valid. It is important to realize, though, that even if the HTML wrappers for the Article Bodies or the Article Links were to change on the source news-site, updating the Links and Article Getters (or the Links Filter) is at most a change of 5 lines of code.
If, at some point, use of this class results in a long stream of messages indicating that no Article URL-Links were identified, or that the Article-Bodies failed to be extracted, simply look at the raw HTML from the site and change the getters or Regular-Expressions accordingly.
NOTE: The logs included in this class' documentation were generated by scrapes in September of 2020.
-
-
Method Detail
-
runExample
public static void runExample() throws java.io.IOException
This example will run the news-site scrape on the Chinese Government News Article Carousel.
IMPORTANT NOTE: This method will create a directory called "cnb" on your file-system, where it will write the contents of (most likely) 15 newspaper articles to disk as HTML files. The output log generated by this method may be viewed here: Gov.CN.log.html
- Throws:
java.io.IOException
- This throws for IO errors that may occur when reading the web-server, or when saving the web-pages or images to the file-system.
- See Also:
FileRW.delTree(String, boolean, Appendable), NewsSite, FileRW.writeFile(CharSequence, String), C.toHTML(String, boolean, boolean, boolean)
- Code:
- Exact Method Body:
StorageWriter log = new StorageWriter();

// This directory will contain ".dat" files that are simply "Serialized" HTML Vectors.
// Each ".dat" file will contain precisely one HTML page.
final String dataFilesDir = "cnb" + File.separator + "articleData" + File.separator;

// This directory will contain sub-directories with ".html" files (and image-files)
// for each news-article that is saved / downloaded.
final String htmlFilesDir = "cnb" + File.separator + "articleHTML" + File.separator;

// This CLEARS WHATEVER DATA IS CURRENTLY IN THE DIRECTORY (by deleting all its contents)
// The following code is the same as the UNIX Shell Command:
// rm -r cnb/articleData/
// mkdir cnb/articleData
FileRW.delTree(dataFilesDir, true, log);

// The following code is the same as the UNIX Shell Command:
// rm -r cnb/articleHTML/
// mkdir cnb/articleHTML
FileRW.delTree(htmlFilesDir, true, log);

// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
// Previous Download Data Erased (if any), Start today's News-Site Scrape
// *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

// Use the "GovCNCarousel" instance that is created in this class as a NewsSite
NewsSite ns = NewsSites.GovCNCarousel;

// Call the "ScrapeURLs" class to retrieve all of the available newspaper articles
// on the Java-Script "Article Carousel". Again, the "Article Carousel" is just this
// little widget at the top of the page that rotates (usually) five hilited / emphasized
// news-article links for today.
Vector<Vector<String>> articleURLs = ScrapeURLs.get(ns, log);

// This is usually not very important if only a small number of articles are being
// scraped. When downloading hundreds of articles - being able to pause if there is a
// web-site IOError (And restart) is very important.
//
// The standard factory-generated "getFSInstance" creates a small file on the file-system
// for saving the "Download State" while downloading...
Pause pause = Pause.getFSInstance("cnb" + File.separator + "state.dat");
pause.initialize();

// The "Scraped Articles" will be sent to the directory named by "dataFilesDir".
// Using the File-System to save these articles is the default-factory means for
// saving article-data. Writing a customized "ScrapedArticleReceiver" to do anything
// from saving article-data to a Data-Base up to and including e-mailing article data
// is possible using a self-written "ScrapedArticleReceiver".
ScrapedArticleReceiver receiver = ScrapedArticleReceiver.saveToFS(dataFilesDir);

// This will download each of the articles from their web-page URL. The web-page
// article URL's were retrieved by "ScrapeURLs". The saved HTML (as HTML Vectors)
// is sent to the "Article Receiver" (defined in the previous step). These news articles
// are saved as ".dat" since they are serialized java-objects.
//
// Explaining some "unnamed parameters" passed to the method invocation below:
//
// true:  [skipArticlesWithoutPhotos] Skips Mandarin Chinese Newspaper Articles that do not
//        include at least one photo. Photos usually help when reading foreign news articles.
// null:  [bannerAndAdFinder] Some sites include images for Facebook links or advertising.
//        Gov.CN usually doesn't have these, but occasionally there are extraneous links.
//        For the purposes of this example, this parameter is ignored, and passed null.
// false: [keepOriginalPageHTML] The "Complete Page" content before the Article Body is
//        extracted from the Article Web-Page is not saved. This can occasionally be useful
//        if the HTML <HEAD>...</HEAD> has JSON or React-JS data to extract.
ScrapeArticles.download
    (receiver, articleURLs, ns.articleGetter, true, null, false, pause, log);

// Now this will convert each of the ".dat" files to an ".html" file - and also it
// will download the pictures / images included in the article.
//
// Explaining some "unnamed parameters" passed to the method invocation below:
//
// true: [cleanIt] This runs some basic HTML remove operations. The best way to see
//       what the parameter "cleanIt" asks to have removed is to view the class "ToHTML".
// null: [HTMLModifier] Cleaning up other extraneous links and content in a newspaper
//       article body like advertising or links to other articles is usually necessary.
//       Anywhere between 1 and 10 lines of NodeSearch Removal Operations will get rid of
//       unnecessary HTML. For the purposes of this example, such a cleaning operation is
//       not done here - although the final articles do include some "links to other
//       articles" that are not "CLEANED" like they should be.
ToHTML.convert(dataFilesDir, htmlFilesDir, true, null, log);

// NOTE: The log of running this command on Debian UNIX / LINUX may be viewed in the
// JavaDoc Comments in the top of this method. If this method is run in an MS-DOS
// or Windows Environment, there will be no screen colors available to view.
FileRW.writeFile(
    C.toHTML(log.getString(), true, true, true),
    "cnb" + File.separator + "Gov.CN.log.html"
);
-
main
public static void main(java.lang.String[] argv) throws java.io.IOException
Prints the contents of the Data File. Invoking this command allows a programmer to see which "sub-sections" are ascribed to each of the different newspaper definitions in this class. Each "sub-section" is nothing more than a URL-branch of the primary web-site URL.
HTML Elements:

<!-- If the following were the primary news-site -->
http://news.baidu.com

<!-- This would be a "sub-section" of the primary site -->
http://news.baidu.com/sports
Can be called from the command line.
If a single command-line argument is passed to "argv[0]", the contents of the "Sections URL Data File" will be output to a text-file that is named using the String passed to "argv[0]".
.- Parameters:
argv
- These are the command line arguments passed by the JRE to this method.
- Throws:
java.io.IOException
- If there are any problems while attempting to save the output to the output file (if one was named / requested).
- Code:
- Exact Method Body:
// Uncomment this line to run the example code (instead of section-data print)
// runExample(); System.exit(0);

// The data-file is loaded into private field "newsPaperSections".
// This private field is a Hashtable<String, Vector<URL>>. Convert each of
// these sections so that they may be printed to terminal, and maybe to a text
// file.
StringBuilder sb = new StringBuilder();

for (String newspaper : newsPaperSections.keySet())
{
    sb.append(newspaper + '\n');

    for (URL section : newsPaperSections.get(newspaper))
        sb.append(section.toString() + '\n');

    sb.append("\n\n***************************************************\n\n");
}

String s = sb.toString();
System.out.println(s);

// If there is a command-line parameter, it shall be interpreted as a file-name.
// The contents of the "sections data-file" (as text) will be written to a file on the
// file-system using the String-value of "argv[0]" as the name of the output-filename.
if (argv.length == 1) FileRW.writeFile(s, argv[0]);
-
ABC_LINKS_GETTER
public static java.util.Vector<java.lang.String> ABC_LINKS_GETTER (java.net.URL url, java.util.Vector<HTMLNode> page)
The News Site at address "https://www.abc.es/" is slightly more complicated when retrieving News-Article Links.
Notice that each newspaper-article URL-link is "wrapped" in an HTML '<ARTICLE>...</ARTICLE>' Element.
If this code were translated into an "XPath Query" or "CSS Selector", it would read: article a. Specifically, it says to find all 'Anchor' elements that are descendants of 'Article' Elements.
- See Also:
TagNodeFindL1Inclusive.all(Vector, String), TagNodeGet.first(Vector, int, int, TC, String[]), TagNode.AV(String)
- Code:
- Exact Method Body:
Vector<String> ret = new Vector<>();
TagNode        tn;
String         urlStr;

// Links are kept inside <ARTICLE> ... </ARTICLE> on the main / section page.
for (DotPair article : TagNodeFindL1Inclusive.all(page, "article"))

    // Now find the <A HREF=...> ... </A>
    if ((tn = TagNodeGet.first(page, article.start, article.end, TC.OpeningTags, "a")) != null)

        if ((urlStr = tn.AV("href")) != null)
            ret.add(urlStr);

return ret;
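For illustration, this getter could also be invoked by hand on a downloaded section-page. The section URL below is hypothetical, and HTMLPage.getPageTokens(URL, boolean) is assumed to be this library's standard page-tokenizer:

// Hypothetical ABC.ES section-page
URL sectionURL = new URL("https://www.abc.es/economia/");

// Parse the section-page into the HTML Vector representation this library uses
Vector<HTMLNode> page = HTMLPage.getPageTokens(sectionURL, false);

// Extract the Article-Links wrapped in <ARTICLE> ... </ARTICLE> Elements
Vector<String> articleLinks = NewsSites.ABC_LINKS_GETTER(sectionURL, page);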
-
EL_NACIONAL_LINKS_GETTER
public static java.util.Vector<java.lang.String> EL_NACIONAL_LINKS_GETTER (java.net.URL url, java.util.Vector<HTMLNode> page)
The News Site at address "https://www.ElNacional.com/" is slightly more complicated when retrieving News-Article Links.
Notice that each newspaper-article URL-link is "wrapped" in an HTML '<DIV CLASS="td-module-thumb">...</DIV>' Element.
If this code were translated into an "XPath Query" or "CSS Selector", it would read: div.td-module-thumb a. Specifically, it says to find all 'Anchor' elements that are descendants of 'DIV' Elements where said Divider's CSS CLASS contains 'td-module-thumb'.
- See Also:
InnerTagFindInclusive.all(Vector, String, String, TextComparitor, String[]), TagNodeGet.first(Vector, int, int, TC, String[]), TagNode.AV(String)
- Code:
- Exact Method Body:
Vector<String> ret = new Vector<>();
TagNode        tn;
String         urlStr;

// Links are kept inside <DIV CLASS=td-module-thumb> ... </DIV> on the main / section page.
for (DotPair article : InnerTagFindInclusive.all
    (page, "div", "class", TextComparitor.C, "td-module-thumb"))

    // Now find the <A HREF=...> ... </A>
    if ((tn = TagNodeGet.first
        (page, article.start, article.end, TC.OpeningTags, "a")) != null)

        if ((urlStr = tn.AV("href")) != null)
            ret.add(urlStr);

return ret;
-
EL_ESPECTADOR_LINKS_GETTER
public static java.util.Vector<java.lang.String> EL_ESPECTADOR_LINKS_GETTER (java.net.URL url, java.util.Vector<HTMLNode> page)
The News Site at address "https://www.ElEspectador.com/" is slightly more complicated when retrieving News-Article Links.
Notice that each newspaper-article URL-link is "wrapped" in an HTML '<DIV CLASS="Card ...">...</DIV>' Element.
If this code were translated into an "XPath Query" or "CSS Selector", it would read: div.Card a.card-link. Specifically, it says to find all 'Anchor' elements whose CSS Class contains 'card-link' and which are descendants of 'DIV' Elements where said Divider's CSS CLASS contains 'Card'.
- See Also:
InnerTagFindInclusive.all(Vector, String, String, TextComparitor, String[]), InnerTagGet.first(Vector, int, int, String, String, TextComparitor, String[]), TagNode.AV(String)
- Code:
- Exact Method Body:
Vector<String> ret = new Vector<>();
TagNode        tn;
String         urlStr;

// Links are kept inside <DIV CLASS="Card ..."> ... </DIV> on the main / section page.
for (DotPair article : InnerTagFindInclusive.all
    (page, "div", "class", TextComparitor.C, "Card"))

    // Now find the <A CLASS="card-link" HREF=...> ... </A>
    if ((tn = InnerTagGet.first
        (page, article.start, article.end, "a", "class", TextComparitor.C, "card-link")) != null)

        if ((urlStr = tn.AV("href")) != null)
            ret.add(urlStr);

return ret;
-
GOVCN_CAROUSEL_LINKS_GETTER
public static java.util.Vector<java.lang.String> GOVCN_CAROUSEL_LINKS_GETTER (java.net.URL url, java.util.Vector<HTMLNode> page)
The News Site at address "https://www.gov.cn/" has a Java-Script "Links Carousel". Essentially, there is a section with "Showcased News Articles" that is intended to emphasize anywhere between four and eight primary articles.
This Links-Carousel is wrapped in an HTML Divider Element as below: <DIV CLASS="slider-carousel">
If this code were translated into an "XPath Query" or "CSS Selector", it would read: div[class=slider-carousel] a. Specifically, it says to find all 'Anchor' elements that are descendants of '<DIV CLASS="slider-carousel">' Elements.
- See Also:
InnerTagGetInclusive.first(Vector, String, String, TextComparitor, String[]), TagNodeGet.all(Vector, TC, String[]), TagNode.AV(String)
- Code:
- Exact Method Body:
Vector<String> ret = new Vector<>();
String         urlStr;

// Find the first <DIV CLASS="slider-carousel"> ... </DIV> section
Vector<HTMLNode> carouselDIV = InnerTagGetInclusive.first
    (page, "div", "class", TextComparitor.CN_CI, "slider-carousel");

// Retrieve any HTML Anchor <A HREF=...> ... </A> found within the contents of the
// Divider.
for (TagNode tn : TagNodeGet.all(carouselDIV, TC.OpeningTags, "a"))
    if ((urlStr = tn.AV("href")) != null)
        ret.add(urlStr);

return ret;
-
-