Package Torello.Browser
Class Example01
- java.lang.Object
-
- Torello.Browser.Example01
-
public class Example01 extends java.lang.Object
An example of this package's utility. This class initiates a connection to a headless Chrome instance and visits a page.
Viewing the Output:
The text output generated by this example (the text printed to the terminal) may be viewed at the link below:
Example01.out.html
Installing Chrome in GCP Cloud Shell:
These are the commands that I type inside a GCP (Google Cloud Platform) Debian Terminal/Shell to make sure that a headless Chrome browser is working. ChatGPT explained it to me, and wrote me a shell script to do the installation. I only do development on cloud servers, rather than local machines. I use laptops way too much.
If you are programming on your own computer, you likely already have a CDP-compatible web browser installed. If so, you should skip the installation step completely.
If you need to install Chrome, here's the script that A.I. wrote for me in the summer of '25. It still works great in GCP.
UNIX or DOS Shell Command:
# Update package list
sudo apt-get update

# Install just the essentials for headless Chrome
sudo apt-get install -y \
    fonts-liberation \
    libnss3 \
    libatk1.0-0 \
    libxss1 \
    libgdk-pixbuf2.0-0 \
    libgtk-3-0 \
    libasound2t64 \
    libnspr4 \
    xdg-utils \
    wget \
    ca-certificates

# Download Chrome manually
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb

# Install Chrome and auto-fix dependencies
sudo dpkg -i google-chrome-stable_current_amd64.deb || sudo apt-get -fy install
The above shell commands, again, were generated by ChatGPT on July 11th, 2025. They installed a working copy of Chrome inside my Linux instance without any errors occurring. The output generated by the above commands is reproduced here.
Starting Chrome in the Cloud:
Once Google Chrome has been installed in your GCP Cloud Shell environment, you can start a headless Chrome instance that continues running in the background, even if you hit ^C, close your terminal, or go refill your drink at Starbucks.
This isn't the same as launching a full Compute Engine instance; you're just spinning up a background terminal process inside your ephemeral Cloud Shell session. The process will live until you shut down your shell, or until the session times out.
To launch Chrome headlessly in a way that ignores ^C and keyboard input:
UNIX or DOS Shell Command:
nohup google-chrome --headless --disable-gpu --remote-debugging-port=9222 \
    --no-sandbox --disable-dev-shm-usage > /dev/null 2>&1 & disown
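This command uses:
- nohup - Makes the process ignore the hangup signal (SIGHUP), so it does not die when the terminal closes.
- & - Puts the Chrome process in the background immediately, which also keeps ^C typed at the prompt from reaching it.
- disown - Removes the process from the shell's job table, so the shell no longer signals it when the shell exits.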
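For readers who would rather start Chrome from Java than from the shell, the following is a minimal sketch using only the JDK's ProcessBuilder. It is not part of Example01; the class name HeadlessChromeLauncher is hypothetical, and the argument list simply mirrors the shell command above. A child process started this way does not get the nohup/disown behavior, so it is an assumption that the JVM's lifetime is acceptable as the browser's lifetime.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper, not part of the Torello.Browser package.
class HeadlessChromeLauncher {

    // Builds the same argument list as the shell command shown above.
    // The binary name "google-chrome" is taken from that command.
    static List<String> buildCommand(int debugPort) {
        return List.of(
            "google-chrome",
            "--headless",
            "--disable-gpu",
            "--remote-debugging-port=" + debugPort,
            "--no-sandbox",
            "--disable-dev-shm-usage"
        );
    }

    // Starts Chrome as a child process and discards its output,
    // which mirrors the "> /dev/null 2>&1" redirection.
    static Process launch(int debugPort) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(debugPort));
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        return pb.start();
    }

    public static void main(String[] args) {
        // Only prints the command line; call launch(9222) to actually start Chrome.
        System.out.println(String.join(" ", buildCommand(9222)));
    }
}
```

ProcessBuilder.Redirect.DISCARD requires Java 9 or later.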
To check if Chrome is running later:
UNIX or DOS Shell Command:
ps aux | grep '[g]oogle-chrome'

## The above command should produce output such as:
narrati+ 5916 5.9 1.5 34396956 249664 pts/3 S<l 20:20 0:01 /usr/bin/google-chrome ...
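A similar check can be made from Java by testing whether anything accepts connections on the remote-debugging port. The sketch below is an illustration under stated assumptions: the class name DebugPortProbe is hypothetical, and it only proves that some process is listening on the port, not that the process is Chrome.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the Torello.Browser package.
class DebugPortProbe {

    // Returns true when something accepts TCP connections on host:port
    // within the given timeout, and false otherwise.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 9222 is the port used in the launch command above.
        System.out.println("Port 9222 listening: " + isListening("127.0.0.1", 9222, 500));
    }
}
```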
To kill the headless Chrome instance when you're done:
UNIX or DOS Shell Command:
pkill -f 'google-chrome.*--headless'

## To kill by Process-ID
kill <PID>

Page originally drafted by ChatGPT on July 11th, 2025.
Edited and formatted by Ralph Torello for use in the Java HTML Library documentation.
Hi-Lited Source-Code:
- View Here: Torello/Browser/Example01.java
- Open New Browser-Tab: Torello/Browser/Example01.java
File Size: 15,133 Bytes Line Count: 388 '\n' Characters Found
-
-
Field Summary
Fields
Modifier and Type             Field
protected static ConnRecord   connRec
protected static String       samAltmanURL
-
Method Summary
Main Method
Modifier and Type                  Method
static void                        main(String[] argv)

Example Steps
Modifier and Type                  Method
protected static WebSocketSender   STEP_01_openBrowserWebSocket()
protected static void              STEP_02_closeAllPages(WebSocketSender bws)
protected static String            STEP_03_openSamAltmanPage(WebSocketSender bws)
protected static WebSocketSender   STEP_04_getPageWebSocket(String targetID)
protected static String            STEP_05_runJavaScript(WebSocketSender pws)
protected static String[]          STEP_06_extractImageURLs(String html)
protected static void              STEP_07_downloadImages(String[] imageURLs)
-
-
-
Field Detail
-
samAltmanURL
protected static final java.lang.String samAltmanURL
The URL that is being scraped in this example.
- See Also:
- Constant Field Values
- Code:
- Exact Field Declaration Expression:
protected static final String samAltmanURL = "https://en.wikipedia.org/wiki/Sam_Altman";
-
connRec
protected static final ConnRecord connRec
- Code:
- Exact Field Declaration Expression:
protected static final ConnRecord connRec = new ConnRecord();
-
-
Method Detail
-
main
public static void main(java.lang.String[] argv) throws java.lang.Exception
This class is intended to be invoked from the Command Line.
- Throws:
java.lang.Exception
- Code:
- Exact Method Body:
// Opening a WebSocket Browser-Connection to the currently running Chrome-Instance
final WebSocketSender bws = STEP_01_openBrowserWebSocket();

// Close any currently opened pages / tabs inside the browser
STEP_02_closeAllPages(bws);

// Open a Browser-Page (using 'bws') for reading Sam Altman's Wikipedia Profile
final String targetID = STEP_03_openSamAltmanPage(bws);

// Create / Build a WebSocket-Connection object to the newly opened Sam Altman Page.
final WebSocketSender pws = STEP_04_getPageWebSocket(targetID);

// Execute some Java-Script so that the scrape code may run
final String html = STEP_05_runJavaScript(pws);

// Print the Image-URL's, retrieve those URL's too
final String[] imgURLs = STEP_06_extractImageURLs(html);

// Download the Images into a download folder
STEP_07_downloadImages(imgURLs);

bws.disconnect();
pws.disconnect();
-
STEP_01_openBrowserWebSocket
protected static WebSocketSender STEP_01_openBrowserWebSocket () throws java.lang.Exception
This method demonstrates the first step in connecting to Chrome via the Chrome DevTools Protocol (CDP). It connects to a headless instance of Chrome that is already running with remote debugging enabled, and establishes the primary WebSocket connection used for browser-level CDP communication. This connection targets the browser-level control endpoint, not a tab-specific page socket.
Internally, the method body shown below calls BrowserConn.getBrowserConn(9222, false), where the port matches the --remote-debugging-port=9222 flag used to start Chrome, and the /json/version endpoint is queried to retrieve WebSocket metadata. That metadata is used to construct a WebSocketSender for JSON request-response communication with Chrome. The body then pauses for one second before returning.
If you're trying to automate or control browser behavior from Java, this is where it all begins: getting a working WebSocket connection to the Chrome backend.
- Throws:
java.lang.Exception
- Code:
- Exact Method Body:
Printing.notice("Opening a WebSocket Browser Connection...");

final BrowserConn browserConn = BrowserConn.getBrowserConn(9222, false);

System.out.println(
    '\n' + BCYAN + "Example01.java: " + RESET +
    BRED + "Opened Browser Connection:\n" + RESET +
    browserConn.toString()
);

final WebSocketSender bws = browserConn.createSender(Example01.connRec);

// Chat-GPT once suggested this line.  I just haven't removed it.  It's not hurting anyone!
Thread.sleep(1000);

return bws;
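To make the /json/version lookup concrete, here is a minimal sketch of pulling the webSocketDebuggerUrl field out of that endpoint's JSON response. It is not how BrowserConn does it internally (that is not shown on this page); the class name VersionEndpointParser is hypothetical, the sample response is made up, and a regular expression stands in for a real JSON parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Torello.Browser package.
class VersionEndpointParser {

    // Matches the key "webSocketDebuggerUrl" followed by a quoted string value.
    private static final Pattern WS_URL =
        Pattern.compile("\"webSocketDebuggerUrl\"\\s*:\\s*\"([^\"]+)\"");

    // Returns the browser-level WebSocket URL, if the response contains one.
    static Optional<String> webSocketDebuggerUrl(String json) {
        Matcher m = WS_URL.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up response body; the browser id is for illustration only.
        String sample = "{ \"Browser\": \"Chrome/0.0.0.0\", "
            + "\"webSocketDebuggerUrl\": \"ws://localhost:9222/devtools/browser/example-id\" }";

        System.out.println(webSocketDebuggerUrl(sample).orElse("not found"));
    }
}
```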
-
STEP_02_closeAllPages
protected static void STEP_02_closeAllPages(WebSocketSender bws) throws java.lang.Exception
This step closes all existing pages (i.e., browser tabs) currently open in the Chrome instance. CDP allows enumeration of all tabs via a call to /json/list, and each tab provides a targetId property that can be passed to Target.closeTarget.
The method calls Target.getTargets() to obtain all open targets, then iterates through them and sends a Target.closeTarget(tid) command for each one. The body below applies no filter on the target type; the isSamAltman predicate it declares is unused. This is useful to start from a clean browser state before performing automation.
If Chrome was already running with many tabs open, this call helps ensure that subsequent tab-based automation starts in a predictable environment.
- Throws:
java.lang.Exception
- Code:
- Exact Method Body:
Printing.notice("Closing All Currently Open Pages, using BrowserConn");

// This is currently unused.  I used to filter for only the opened Wiki-Pages, but now this
// method simply closes every open page.  No sense in deleting this line, though
final Predicate<Target.TargetInfo> isSamAltman = (Target.TargetInfo t) ->
    t.type.equals("page") && (t.url != null) && (t.url.startsWith(samAltmanURL));

System.out.println
    ('\n' + BCYAN + "Example01.java: " + RESET + "Getting all tabs...");

final Target.TargetInfo[] allTabs = Target
    .getTargets(null /* FilterEntry[] */)
    .exec(bws)
    .await();

System.out.println
    ('\n' + BCYAN + "Example01.java: " + RESET + "Found " + allTabs.length + " tabs.");

if (allTabs.length > 0)

    for (int i = 0; i < allTabs.length; i++)
    {
        final String tid = allTabs[i].targetId;
        System.out.println(BRED + "Closing Tab: " + RESET + tid);
        Target.closeTarget(tid).exec(bws).await();
    }
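For orientation, the two CDP commands used in this step travel over the WebSocket as small JSON objects with an id, a method and optional params. The sketch below builds those messages by hand; the class name CdpCommands is hypothetical, the library's own serialization is not shown on this page, and the target id is assumed to contain no characters that need JSON escaping.

```java
// Hypothetical helper, not part of the Torello.Browser package.
class CdpCommands {

    // Builds the Target.getTargets command, which takes no required parameters.
    static String getTargets(int id) {
        return "{\"id\":" + id + ",\"method\":\"Target.getTargets\"}";
    }

    // Builds the Target.closeTarget command for one target.
    // Assumption: targetId needs no JSON escaping.
    static String closeTarget(int id, String targetId) {
        return "{\"id\":" + id + ",\"method\":\"Target.closeTarget\","
            + "\"params\":{\"targetId\":\"" + targetId + "\"}}";
    }

    public static void main(String[] args) {
        System.out.println(getTargets(1));
        System.out.println(closeTarget(2, "EXAMPLE-TARGET-ID"));
    }
}
```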
-
STEP_03_openSamAltmanPage
protected static java.lang.String STEP_03_openSamAltmanPage (WebSocketSender bws) throws java.lang.Exception
This step creates a new browser tab (a new "target") by invoking Target.createTarget with a specific URL, in this case the Sam Altman Wikipedia page. This uses the WebSocketSender connection previously established to send a CDP request and parse the result.
The return value is a Target.TargetID, represented as a java.lang.String containing the tab identifier, which is used in the next step to get its associated WebSocket.
This step illustrates how CDP allows opening URLs without user interaction, one of the key features that powers headless automation.
- Throws:
java.lang.Exception
- Code:
- Exact Method Body:
Printing.notice("Opening a Sam Altman Wikipedia Page, using BrowserConn.");

final String targetID = Target
    .createTarget()
    .accept("url", samAltmanURL)
    .build()
    .exec(bws)
    .await();

final Target.TargetInfo targetInfo = Target
    .getTargetInfo(targetID)
    .exec(bws)
    .await();

System.out.println(
    '\n' + BCYAN + "Example01.java: " + RESET +
    BRED + "Created New Tab:\n" + RESET +
    targetInfo.toString()
);

// I leave these one second delays here.  AGAIN - Chat-GPT suggested them to me once.
// Chat-GPT, in every sense of the word, knows more about my code than I do!  (The CDP
// Protocol is a very well understood protocol - just not in Java so much)
Thread.sleep(1000);

return targetID;
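The "send a request, then wait for its result" pattern behind calls such as exec(bws).await() can be pictured as a table of pending requests keyed by message id. The sketch below is one possible way to do that with CompletableFuture; it is not the WebSocketSender implementation, the class name PendingRequests is hypothetical, and the id is located with a naive regular expression that a real client would replace with a JSON parser.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of the Torello.Browser package.
class PendingRequests {

    // One outstanding request: its id and the future that will hold the raw response.
    static final class Ticket {
        final int id;
        final CompletableFuture<String> response = new CompletableFuture<>();
        Ticket(int id) { this.id = id; }
    }

    // Naive: finds the first "id": <number> pair in the message text.
    private static final Pattern ID = Pattern.compile("\"id\"\\s*:\\s*(\\d+)");

    private final AtomicInteger nextId = new AtomicInteger(1);
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Reserves the next message id and remembers the future waiting on it.
    Ticket newRequest() {
        Ticket t = new Ticket(nextId.getAndIncrement());
        pending.put(t.id, t.response);
        return t;
    }

    // Intended to be called for every incoming WebSocket text frame.
    // Returns true if the frame answered a pending request.
    boolean onMessage(String json) {
        Matcher m = ID.matcher(json);
        if (!m.find()) return false;   // no id found, so it is treated as an event
        CompletableFuture<String> f = pending.remove(Integer.parseInt(m.group(1)));
        if (f == null) return false;   // unknown or already answered id
        f.complete(json);
        return true;
    }

    public static void main(String[] args) {
        PendingRequests requests = new PendingRequests();
        Ticket t = requests.newRequest();

        // Simulated response; "EXAMPLE" is a made-up target id.
        requests.onMessage("{\"id\":" + t.id + ",\"result\":{\"targetId\":\"EXAMPLE\"}}");
        System.out.println(t.response.join());
    }
}
```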
-
STEP_04_getPageWebSocket
protected static WebSocketSender STEP_04_getPageWebSocket (java.lang.String targetID) throws java.lang.Exception
Once a tab is opened with a known targetId, this step retrieves the specific WebSocket endpoint associated with that tab. CDP uses one WebSocket per tab, and this is necessary for interacting with page-level domains such as Page, Runtime, or DOM.
The method uses the /json/list HTTP endpoint to get metadata for all tabs and filters by targetId to find the matching webSocketDebuggerUrl. Then, it opens a new WebSocketSender for that tab.
From this point forward, CDP messages targeting the loaded page must use this tab-specific WebSocket.
- Throws:
java.lang.Exception
- Code:
- Exact Method Body:
Printing.notice("Create PageConn Web-Socket Connection to Altman's Wiki");

// Attach to that Sam Altman Page (switch to tab-level WebSocket)
final PageConn pageConn = PageConn
    .getAllPageConn(9222, false)
    .filter((PageConn pc) -> pc.id.equals(targetID))
    .findFirst()
    .orElseThrow(() -> new RuntimeException("The Page-Connection was Not found !!!"));

System.out.println(
    '\n' + BCYAN + "Example01.java: " + RESET +
    BRED + "Found Page Connection to Sam Altman Wiki:\n" + RESET +
    pageConn.toString()
);

final WebSocketSender pws = pageConn.createSender(Example01.connRec);

// I think this is the last one...  Wait 1 second, it might make a difference while the
// page actually loads, and the Web-Socket connects...  I have no idea!  It's just 1 second!
Thread.sleep(1000);

return pws;
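The filter-by-id lookup described for this step can be sketched with plain Java streams. The PageEntry class below is a hypothetical stand-in for one entry of the /json/list response (it is not the library's PageConn), and the ids and URLs are made up.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not part of the Torello.Browser package.
class PageEndpointLookup {

    // Minimal stand-in for one entry of the /json/list response.
    static final class PageEntry {
        final String id;
        final String webSocketDebuggerUrl;

        PageEntry(String id, String webSocketDebuggerUrl) {
            this.id = id;
            this.webSocketDebuggerUrl = webSocketDebuggerUrl;
        }
    }

    // Returns the WebSocket URL of the first entry whose id equals targetId.
    static Optional<String> findSocketUrl(List<PageEntry> entries, String targetId) {
        return entries.stream()
            .filter(e -> e.id.equals(targetId))
            .map(e -> e.webSocketDebuggerUrl)
            .findFirst();
    }

    public static void main(String[] args) {
        List<PageEntry> entries = List.of(
            new PageEntry("AAA", "ws://localhost:9222/devtools/page/AAA"),
            new PageEntry("BBB", "ws://localhost:9222/devtools/page/BBB")
        );

        System.out.println(findSocketUrl(entries, "BBB").orElse("not found"));
    }
}
```

Returning an Optional leaves the "not found" decision to the caller, much as the method body above ends its stream with orElseThrow.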
-
STEP_05_runJavaScript
protected static java.lang.String STEP_05_runJavaScript (WebSocketSender pws) throws java.lang.Exception
Before sending JavaScript commands to the browser tab, this method enables the CDP domains it uses. It sends Page.enable(), DOM.enable() and RunTime.enable() commands to inform Chrome that you intend to receive events from those domains, then calls RunTime.evaluate() with the expression document.documentElement.outerHTML and returns the page's HTML as a String.
Enabling a domain registers your WebSocket session as a subscriber for that domain's events; without it, those events are not delivered to the session. In CDP, Runtime.evaluate itself can generally be called without Runtime.enable, so the enable calls here matter mainly for event delivery.
Think of this as turning on the light switches: telling Chrome what features you intend to use during the session.
- Throws:
java.lang.Exception
- Code:
- Exact Method Body:
Printing.notice("Execute the needed Java Script, so the Scraper can Run");

// Enable the Page domain
System.out.println('\n' + BCYAN + "Example01.java: " + RESET + "Page.enable()");
Page.enable(null /* Boolean */).exec(pws).await();

// Enable the DOM domain
System.out.println('\n' + BCYAN + "Example01.java: " + RESET + "DOM.enable()");
DOM.enable(null /* String */).exec(pws).await();

// Enable the Runtime domain
System.out.println('\n' + BCYAN + "Example01.java: " + RESET + "RunTime.enable()");
RunTime.enable().exec(pws).await();

// This is the actual last one.  Make sure that the DOM & RunTime modules are running!
Thread.sleep(1000);

// 5. Evaluate the HTML via JavaScript
System.out.println('\n' + BCYAN + "Example01.java: " + RESET + "RunTime.evaluate()");

final RunTime.evaluate$$RET r = RunTime
    .evaluate()
    .accept("expression", "document.documentElement.outerHTML")
    .accept("returnByValue", true)
    .build()
    .exec(pws)
    .await();

System.out.println(
    '\n' + BCYAN + "Example01.java: " + RESET + "Response RemoteObject:" + '\n' +
    r.result.toString()
);

final String html = ((JsonString) r.result.value).getString();

return html;
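On the wire, the evaluate call above corresponds to a Runtime.evaluate message whose params carry the expression and the returnByValue flag. The sketch below builds that message by hand, including the JSON string escaping that an arbitrary JavaScript expression would need. The class name EvaluateRequest is hypothetical, and the library's own serialization is not shown on this page.

```java
// Hypothetical sketch, not part of the Torello.Browser package.
class EvaluateRequest {

    // Escapes the characters that may not appear raw inside a JSON string.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    // Remaining control characters become four-digit hex escapes.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else          sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a Runtime.evaluate command with the two parameters used in this step.
    static String evaluate(int id, String expression, boolean returnByValue) {
        return "{\"id\":" + id
            + ",\"method\":\"Runtime.evaluate\""
            + ",\"params\":{\"expression\":\"" + jsonEscape(expression) + "\""
            + ",\"returnByValue\":" + returnByValue + "}}";
    }

    public static void main(String[] args) {
        System.out.println(evaluate(5, "document.documentElement.outerHTML", true));
    }
}
```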
-
STEP_06_extractImageURLs
protected static java.lang.String[] STEP_06_extractImageURLs (java.lang.String html) throws java.lang.Exception
This method parses the HTML string returned by the previous step and extracts the image URLs from it. No CDP call is made here: the HTML is tokenized with HTMLPage.getPageTokens, every opening <img> tag is located with TagNodeFind.all, and the src attribute of each tag is collected with Attributes.retrieve.
The result is a String[] of image URLs, which is printed to the terminal and returned.
The cross-boundary data flow, running code inside Chrome and pulling the result into your Java program, already happened in the previous step; this step works only on the Java side.
- Throws:
java.lang.Exception
- See Also:
HTMLPage.getPageTokens(CharSequence, boolean), TagNodeFind, Attributes.retrieve(Vector, int[], String)
- Code:
- Exact Method Body:
Printing.notice("Parsing HTML for Images Printing the URL's");

final Vector<HTMLNode> altPage = HTMLPage.getPageTokens(html, false);
final int[]            images  = TagNodeFind.all(altPage, TC.OpeningTags, "img");
final String[]         imgURLs = Attributes.retrieve(altPage, images, "src");
final int              numImg  = imgURLs.length;

System.out.println
    ('\n' + BCYAN + "Example01.java: " + RESET + "Number of Images Found: " + numImg);

for (int i = 0; i < numImg; i++) System.out.println("    " + imgURLs[i]);

return imgURLs;
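For readers without the library on hand, the idea of "find every img tag and collect its src value" can be approximated with the JDK alone. The sketch below uses a regular expression as a rough stand-in; it is not how HTMLPage, TagNodeFind or Attributes work, the class name ImgSrcExtractor is hypothetical, and a regular expression will miss cases that a real HTML tokenizer handles (unquoted attribute values, for example).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of the Torello.Browser package.
class ImgSrcExtractor {

    // Matches an opening img tag and captures a quoted src value (group 2).
    // The whitespace required before "src" avoids matching attributes like data-src.
    private static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the src values in document order.
    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) urls.add(m.group(2));
        return urls;
    }

    public static void main(String[] args) {
        // Made-up HTML fragment for illustration.
        String html = "<p><img alt=\"a\" src=\"//upload.example.org/a.png\">"
            + "<IMG SRC='b.jpg'></p>";

        for (String url : extract(html)) System.out.println(url);
    }
}
```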
-
STEP_07_downloadImages
protected static void STEP_07_downloadImages(java.lang.String[] imageURLs) throws java.lang.Exception
The final step downloads each image URL retrieved in the previous step and saves the results to disk. Only protocol-relative URLs (those beginning with //) are passed to the downloader, each prefixed with https:. The filenames are derived from the tail end of the URL path, and all downloads are saved to a configurable local directory (req.targetDirectory, set to image-downloads/ in the body below).
This method doesn't involve CDP; it's just traditional HTTP file downloading using ImageScraper.download(). But it completes the use case: open a tab, run JS to scrape content, and persist the result.
This step closes the automation loop: going from page navigation to content extraction and finally saving that content offline.
Make sure that a directory named image-downloads/ exists as a sub-directory of the directory from which this method is invoked.
- Throws:
java.lang.Exception
- See Also:
ImageScraper.download(Request, Appendable), Request, Results, ImageScraper.shutdownTOThreads()
- Code:
- Exact Method Body:
Printing.notice("Download the Image's into a folder");

final Stream.Builder<String> builder = Stream.builder();

for (int i = 0; i < imageURLs.length; i++)
    if (imageURLs[i].startsWith("//"))
        builder.accept("https:" + imageURLs[i]);

// Build a Request-Object
final List<String>  imgURLsList = builder.build().collect(Collectors.toList());
final Request       req         = Request.buildFromStrIter(imgURLsList);

// Add a few more Scraper-Configurations to the Request Object
req.targetDirectory                     = "image-downloads/";
req.useDefaultCounterForImageFileNames  = true;
req.skipOnDownloadException             = true;
req.verbosity                           = Verbosity.Normal;

// Run the scraper, Send all Text-Output to 'System.out' (Ignore / Discard Results)
try
    { final Results results = ImageScraper.download(req, System.out); }

catch (Exception e)
    { System.out.println(EXCC.toString(e)); }

// This needs to happen, or this entire program will hang / lock up the terminal
finally
    { ImageScraper.shutdownTOThreads(); }
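The URL preparation done at the top of this method body can be isolated into a small, testable sketch. The class name ImageUrlPrep and the helper fileNameOf are hypothetical; fileNameOf only illustrates one way to take the tail end of a URL path and makes no claim about how ImageScraper actually names files. The null check is an addition that the method body above does not have.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of the Torello.Browser package.
class ImageUrlPrep {

    // Keeps only protocol-relative URLs ("//host/path") and prefixes them with
    // "https:", like the loop in the method body above. The null check is an
    // extra guard added here.
    static List<String> toAbsolute(String[] urls) {
        List<String> out = new ArrayList<>();
        for (String u : urls)
            if (u != null && u.startsWith("//")) out.add("https:" + u);
        return out;
    }

    // Illustration only: returns the text after the last '/' of the URL,
    // ignoring any query string or fragment.
    static String fileNameOf(String url) {
        int cut = url.length();
        int q = url.indexOf('?');
        if (q >= 0) cut = Math.min(cut, q);
        int h = url.indexOf('#');
        if (h >= 0) cut = Math.min(cut, h);
        String path = url.substring(0, cut);
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Made-up URLs for illustration.
        String[] found = { "//upload.example.org/a/b.png", "/static/logo.svg" };

        for (String url : toAbsolute(found))
            System.out.println(url + " -> " + fileNameOf(url));
    }
}
```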
-
-