Package Torello.HTML

Class Escape


  • public final class Escape
    extends java.lang.Object
    Easy utilities for escaping and un-escaping HTML characters such as &nbsp;, and even code-point based Emoji's.

    There are dozens of "Escaped HTML" symbols in the HTML language. This class helps convert an "escaped character" into the underlying ASCII or Unicode 'char' it represents, and vice versa.



    Stateless Class:
    This class neither contains any program-state nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Enterprise Java Beans @Stateless Annotation.

    • 1 Constructor(s), 1 declared private, zero-argument constructor
    • 11 Method(s), 11 declared static
    • 6 Field(s), 6 declared static, 6 declared final


    • Method Summary

      Basic Methods
      Modifier and Type   Method
      static boolean      hasHTMLEsc(char c)
      static void         printHTMLEsc()

      Escape Characters to HTML Escape-Strings
      Modifier and Type   Method
      static String       escChar(char c, boolean use16BitEscapeSequence)
      static String       escCodePoint(int codePoint, boolean use16BitEscapeSequence)
      static String       htmlEsc(char c)

      Un-Escape HTML Escape-Strings to Characters
      Modifier and Type   Method
      static char         escHTMLToChar(String escHTML)
      static String       replace(String s)
      static String       replaceAll(String s)   (Deprecated)
      static String       replaceAll_DEC(String str)
      static String       replaceAll_HEX(String str)
      static String       replaceAll_TEXT(String str)

    • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • printHTMLEsc

        public static void printHTMLEsc()
        Prints the HTML Escape Character lookup table to System.out. This is useful for debugging.

        View Escape-Codes:
        The JAR Data-File List included within the page attached (below) is a complete list of all text-String HTML Escape Sequences that are known to this class. This list does not include any Code Point, Hex or Decimal Number sequences.

        All HTML Escape Sequences
      • escHTMLToChar

        public static char escHTMLToChar(java.lang.String escHTML)
        Converts a single String from an HTML-escape sequence into the appropriate character.

        &[escape-sequence]; ==> actual ASCII or UniCode character.
        Parameters:
        escHTML - An HTML escape sequence.
        Returns:
        the ASCII or Unicode character represented by this escape sequence.

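        This method returns ASCII NUL, i.e. (char) 0 (not the digit '0'), if the input does not represent a valid HTML Escape sequence, or if it represents a number too large to fit into a single Java char.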
        Code:
        Exact Method Body:
         if (! escHTML.startsWith("&") || ! escHTML.endsWith(";")) return (char) 0;
        
         String  s = escHTML.substring(1, escHTML.length() - 1);
        
         // Temporary Variable.
         int     i = 0;
        
         // Since the EMOJI Escape Sequences use Code Points, they generally cannot be
         // converted into a single Character.  Skip them.
        
         if (HEX_CODE.matcher(s).find())
         {
             if ((i = Integer.parseInt(s.substring(2), 16)) < Character.MAX_VALUE)
                 return (char) i;
             else
                 return 0;
         }
        
         // Again, deal with Emoji's here...  Parse the integer, and make sure it is a
         // character in the standard UNICODE range.
        
         if (DEC_CODE.matcher(s).find()) 
         {
             if ((i = Integer.parseInt(s.substring(1))) < Character.MAX_VALUE)
                 return (char) i;
             else
                 return 0;
         }
        
         // Now check if the provided Escape String is listed in the htmlEscChars Hashtable.
         Character c = htmlEscChars.get(s);
        
         // If the character was found in the table that lists all escape sequence characters,
         // then return it.  Otherwise just return ASCII zero.
        
         return (c != null) ? c.charValue() : 0;
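        For illustration, here is a minimal, self-contained sketch of the same decision order (hexadecimal, then decimal, then the named-sequence table). It is not the library's source: the class name EntityToCharSketch and its three-entry NAMED map are hypothetical stand-ins for the internal htmlEscChars table and the HEX_CODE / DEC_CODE fields, which are not shown on this page.

```java
import java.util.Map;

// Hypothetical sketch of the documented escHTMLToChar behavior; not the library's source.
final class EntityToCharSketch
{
    // Hypothetical three-entry stand-in for the internal 'htmlEscChars' table.
    private static final Map<String, Character> NAMED = Map.of("amp", '&', "lt", '<', "gt", '>');

    /** Converts one "&...;" sequence to a char, or returns (char) 0 if that is impossible. */
    static char toChar(String esc)
    {
        if (esc.length() < 3 || !esc.startsWith("&") || !esc.endsWith(";")) return (char) 0;

        String body = esc.substring(1, esc.length() - 1);

        // Named sequences ("amp", "lt", ...) come straight from the lookup table.
        if (!body.startsWith("#"))
        {
            Character c = NAMED.get(body);
            return (c != null) ? c.charValue() : (char) 0;
        }

        boolean isHex = body.startsWith("#x") || body.startsWith("#X");
        int value;

        try
        {
            value = isHex
                ? Integer.parseInt(body.substring(2), 16)   // "&#x4E2D;" -> 0x4E2D
                : Integer.parseInt(body.substring(1));      // "&#48;"    -> 48
        }
        catch (NumberFormatException e) { return (char) 0; }

        // Values outside 0..0xFFFF (e.g. most Emoji) cannot fit in one char.
        return (value >= 0 && value <= Character.MAX_VALUE) ? (char) value : (char) 0;
    }

    public static void main(String[] args)
    {
        System.out.println(toChar("&#x67;"));             // g
        System.out.println(toChar("&amp;"));              // &
        System.out.println((int) toChar("&#128512;"));    // 0
    }
}
```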
        
      • replaceAll_HEX

        public static java.lang.String replaceAll_HEX(java.lang.String str)
        Returns a String in which every Hexadecimal Escape Sequence has been replaced by the actual ASCII or Unicode character it represents. Sequences whose value is larger than Character.MAX_VALUE (such as most Emoji's) are left unchanged.

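        Hexadecimal HTML Escape-Sequence Examples:
        Substring from Input     Web-Browser Converts To
        &#xAA;                   'ª'
        &#x67;                   'g'
        &#x84;                   '„'

        Note: for most values from &#x80; through &#x9F;, web-browsers apply the HTML specification's Windows-1252 remapping, which is why &#x84; is displayed as '„' (U+201E). This method parses the number literally, so it converts &#x84; to the control character U+0084 instead.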

        This method might be thought of as loosely similar to the 'Chr()' function found in Pascal and BASIC, which converts a number into its character, except that it operates on HTML escape sequences.
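        The sketch below shows one way to un-escape hexadecimal sequences in a single pass with java.util.regex, using Matcher.appendReplacement rather than the TreeMap plus StrReplace approach used by the method body. The pattern and the class name HexUnescapeSketch are assumptions for illustration; like the method above, it leaves sequences above 0xFFFF untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a regex-driven alternative to the TreeMap + StrReplace approach.
final class HexUnescapeSketch
{
    // Assumed pattern: "&#x" followed by 1-6 hex digits and a semicolon.
    private static final Pattern HEX = Pattern.compile("&#[xX]([0-9A-Fa-f]{1,6});");

    static String unescapeHex(String str)
    {
        Matcher       m  = HEX.matcher(str);
        StringBuilder sb = new StringBuilder();

        while (m.find())
        {
            int value = Integer.parseInt(m.group(1), 16);

            // Leave code points that do not fit in one 16-bit char (e.g. Emoji) as-is.
            String replacement = (value <= Character.MAX_VALUE)
                ? String.valueOf((char) value)
                : m.group();

            // quoteReplacement stops '$' and '\' from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }

        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(unescapeHex("&#x67;o &#x41;!"));   // go A!
        System.out.println(unescapeHex("&#x1F600;"));         // &#x1F600;
    }
}
```

Quoting the replacement matters here: &#x24; decodes to '$', which appendReplacement would otherwise treat as the start of a group reference.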
        Parameters:
        str - any String that contains an HTML Escape Sequence &#x[HEXADECIMAL VALUE];
        Returns:
        a String, with all of the hexadecimal escape sequences removed and replaced with their equivalent ASCII or UniCode Characters.
        See Also:
        replaceAll_DEC(String str), StrReplace.r(String, String[], char[])
        Code:
        Exact Method Body:
         // This is the RegEx Matcher from the top.  It matches Strings that look like: &#x[hex digits];
         Matcher m = HEX_CODE.matcher(str);
        
         // Save the escape-string regex search matches in a TreeMap.  We need to use a
         // TreeMap because it is much easier to check if a particular escape sequence has already
         // been found.  It is easier to find duplicates with TreeMap's.
        
         TreeMap<String, Character> escMap = new TreeMap<>();
        
         while (m.find())
         {
             // Use Base-16 Integer-Parse
             int i = Integer.valueOf(m.group(1), 16);
        
             // Do not un-escape EMOJI's... It makes a mess - they are sequences of characters
             // not single characters.
        
             if (i > Character.MAX_VALUE) continue;
        
             // Retrieve the Text Information about the HTML Escape Sequence
             String text = m.group();
        
             // Record each distinct escape sequence only once.
             if (! escMap.containsKey(text)) escMap.put(text, Character.valueOf((char) i));
         }
                
         // Build the matchStr's and replaceChar's arrays.  These are just the KEY's and
         // the VALUE's of the TreeMap<String, Character> which was just built.
         // NOTE: A TreeMap is used *RATHER THAN* two parallel arrays in order to avoid keeping
         //       duplicates when the replacement occurs.
        
         String[]    matchStrs       = escMap.keySet().toArray(new String[escMap.size()]);
         char[]      replaceChars    = new char[escMap.size()];
        
         // Lookup each "ReplaceChar" in the TreeMap, and put it in the output "replaceChars"
         // array.  The class StrReplace will replace all the escape sequences with the actual
         // characters.
        
         for (int i=0; i < matchStrs.length; i++) replaceChars[i] = escMap.get(matchStrs[i]);
        
         return StrReplace.r(str, matchStrs, replaceChars);
        
      • replaceAll_DEC

        public static java.lang.String replaceAll_DEC(java.lang.String str)
        This method functions the same as replaceAll_HEX(String) - except it replaces only HTML Escape sequences that are represented using decimal (base-10) values. 'replaceAll_HEX(...)' works on hexadecimal (base-16) values.

        Base-10 HTML Escape-Sequence Examples:
        Substring from Input     Web-Browser Converts To
        &#48;                    '0'
        &#64;                    '@'
        &#123;                   '{'
        &#125;                   '}'

        Base-10 & Base-16 Escape-Sequence Difference:
        • &#x[hex base-16 value]; There is an 'x' as the third character in the String
        • &#[decimal base-10 value]; There is no 'x' in the escape-sequence String!

        This short example delineates the difference between an HTML escape-sequence that employs Base-10 numbers, and one using Base-16 (Hexadecimal) numbers.
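        To make the difference concrete, the following sketch recognizes both numeric forms with one regular expression and picks the radix from the presence of the 'x'. The class NumericEscapeSketch and its method valueOf are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: one pattern covering both numeric forms.
final class NumericEscapeSketch
{
    // Group 1 captures the optional 'x'; group 2 captures the digits.
    private static final Pattern NUM = Pattern.compile("&#([xX]?)([0-9A-Fa-f]+);");

    /** Returns the numeric value of a single numeric escape sequence, or -1 if it is not one. */
    static int valueOf(String esc)
    {
        Matcher m = NUM.matcher(esc);
        if (!m.matches()) return -1;

        boolean isHex = !m.group(1).isEmpty();

        // Without the 'x', only decimal digits are legal: "&#4A;" is not a valid escape.
        if (!isHex && !m.group(2).chars().allMatch(Character::isDigit)) return -1;

        try { return Integer.parseInt(m.group(2), isHex ? 16 : 10); }
        catch (NumberFormatException e) { return -1; }   // too many digits for an int
    }

    public static void main(String[] args)
    {
        System.out.println(valueOf("&#64;"));    // 64  (decimal)
        System.out.println(valueOf("&#x64;"));   // 100 (hexadecimal)
    }
}
```

The same digits "64" yield two different characters: '@' in Base 10, and 'd' (100) in Base 16.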
        Parameters:
        str - any String that contains the HTML Escape Sequence &#[DECIMAL VALUE];.
        Returns:
        a String, with all of the decimal escape sequences replaced by their equivalent ASCII or Unicode characters.

        If the input String does not contain such a sequence, then this method returns the same input-String reference as its return value.
        See Also:
        replaceAll_HEX(String str), StrReplace.r(String, String[], char[])
        Code:
        Exact Method Body:
         // This is the RegEx Matcher from the top.  It matches Strings that look like: &#\d+;
         Matcher m = DEC_CODE.matcher(str);
        
         // Save the escape-string regex search matches in a TreeMap.  We need to use a
         // TreeMap because it is much easier to check if a particular escape sequence has already
         // been found.  It is easier to find duplicates with TreeMap's.
        
         TreeMap<String, Character> escMap = new TreeMap<>();
        
         while (m.find())
         {
             // Use Base-10 Integer-Parse
             int i = Integer.valueOf(m.group(1));
        
             // Do not un-escape EMOJI's... It makes a mess - they are sequences of characters
             // not single characters.
        
             if (i > Character.MAX_VALUE) continue;
        
             // Retrieve the Text Information about the HTML Escape Sequence
             String text = m.group();
        
             // Record each distinct escape sequence only once.
             if (! escMap.containsKey(text)) escMap.put(text, Character.valueOf((char) i));
         }
                
         // Build the matchStr's and replaceChar's arrays.  These are just the KEY's and
         // the VALUE's of the TreeMap<String, Character> which was just built.
         // NOTE: A TreeMap is used *RATHER THAN* two parallel arrays in order to avoid keeping
         //       duplicates when the replacement occurs.
        
         String[]    matchStrs       = escMap.keySet().toArray(new String[escMap.size()]);
         char[]      replaceChars    = new char[escMap.size()];
        
         // Lookup each "ReplaceChar" in the TreeMap, and put it in the output "replaceChars"
         // array.  The class StrReplace will replace all the escape sequences with the actual
         // characters.
        
         for (int i=0; i < matchStrs.length; i++) replaceChars[i] = escMap.get(matchStrs[i]);
        
         return StrReplace.r(str, matchStrs, replaceChars);
        
      • replaceAll_TEXT

        public static java.lang.String replaceAll_TEXT(java.lang.String str)
        Replaces all named (text-word) HTML Escape Sequences, such as &amp; or &lt;, with the characters they represent. Numeric (Base-10 or Base-16) sequences are not affected.

        Standard (Text) HTML Escape-Sequence Examples:
        ASCII or UNICODE         Can be Escaped Using (in HTML)
        " (double-quote)         &quot;
        & (ampersand)            &amp;
        < (less-than)            &lt;
        > (greater-than)         &gt;
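        The following sketch illustrates the lookup-table idea: each "&name;" found in the input is looked up, and names missing from the table are left exactly as they were. The four-entry NAMED map and the class NamedUnescapeSketch are hypothetical; the real class consults its much larger internal table.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: replace named escapes via a lookup table; unknown names are left untouched.
final class NamedUnescapeSketch
{
    // Hypothetical, tiny stand-in for the class's full internal table.
    private static final Map<String, Character> NAMED =
        Map.of("quot", '"', "amp", '&', "lt", '<', "gt", '>');

    private static final Pattern NAME = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    static String unescapeNamed(String str)
    {
        Matcher       m  = NAME.matcher(str);
        StringBuilder sb = new StringBuilder();

        while (m.find())
        {
            Character c           = NAMED.get(m.group(1));
            String    replacement = (c != null) ? c.toString() : m.group();

            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }

        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(unescapeNamed("a &lt; b &amp;&amp; c &gt; d"));   // a < b && c > d
        System.out.println(unescapeNamed("&bogus;"));                        // &bogus;
    }
}
```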

        View Escape-Codes:
        The list included within the page attached (below) is a complete list of all Text-String HTML Escape Sequences known to this class. This list does not include any Code Point, Hex or Decimal Number sequences.

        All HTML Escape Sequences
        Parameters:
        str - any String that contains HTML Escape Sequences that need to be converted to their ASCII-UniCode character representations.
        Returns:
        a String, with all of the recognized text (named) escape sequences replaced by their ASCII or Unicode characters. Names that are not in the internal lookup table are left unchanged.
        Throws:
        java.lang.IllegalStateException
        See Also:
        replaceAll_HEX(String str), StrReplace.r(String, boolean, String[], Torello.Java.Function.ToCharIntTFunc)
        Code:
        Exact Method Body:
         // We only need to find which escape sequences are in this string.
         // Use a TreeMap<String, String> to list them, so that each one is recorded only once.
        
         Matcher                 m        = TEXT_CODE.matcher(str);
         TreeMap<String, String> escMap   = new TreeMap<>();
        
         while (m.find())
         {
             // Retrieve the Text Information about the HTML Escape Sequence
             String text     = m.group();
             String sequence = text.substring(1, text.length() - 1);
        
             // Check if it is a valid HTML 5 Escape Sequence.
             if ((! escMap.containsKey(text)) && htmlEscChars.containsKey(sequence))
                 escMap.put(text, sequence);
         }
                
         // Convert the TreeMap's key-set to a String[] array... and use StrReplace
         String[] escArr = new String[escMap.size()];
        
         return StrReplace.r(
             str, false, escMap.keySet().toArray(escArr),
             (int i, String sequence) -> htmlEscChars.get(escMap.get(sequence))
         );
        
      • replaceAll

        @Deprecated
        public static java.lang.String replaceAll(java.lang.String s)
        Deprecated.
        Applies replaceAll_TEXT(String), then replaceAll_DEC(String), then replaceAll_HEX(String) to the input, converting all three kinds of HTML Escape Sequence in one call.
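        Because replaceAll runs three separate passes in sequence (TEXT, then DEC, then HEX), text produced by an earlier pass can be decoded again by a later one. The sketch below reproduces that ordering with two simplified stand-in passes (textPass and decPass are hypothetical, not the library's methods) to show how "&amp;#65;", which a browser displays literally as &#65;, ends up as "A". The single-pass replace(String) method resumes scanning after each replacement, so it does not decode its own output a second time.

```java
// Sketch of sequential multi-pass un-escaping (simplified stand-ins, not library code).
final class ChainedPassSketch
{
    // Pass 1 stand-in: only decodes "&amp;".
    static String textPass(String s) { return s.replace("&amp;", "&"); }

    // Pass 2 stand-in: only decodes "&#65;".
    static String decPass(String s)  { return s.replace("&#65;", "A"); }

    public static void main(String[] args)
    {
        String input = "&amp;#65;";                      // a browser shows: &#65;
        System.out.println(textPass(input));             // &#65;
        System.out.println(decPass(textPass(input)));    // A  (decoded twice)
    }
}
```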
        Parameters:
        s - This may be any Java String which may (or may not) contain HTML Escape sequences.
        Returns:
        a new String where all HTML escape-sequence substrings have been replaced with their natural character representations.
        See Also:
        replaceAll_DEC(String), replaceAll_HEX(String), replaceAll_TEXT(String)
        Code:
        Exact Method Body:
         return replaceAll_HEX(replaceAll_DEC(replaceAll_TEXT(s)));
        
      • replace

        public static java.lang.String replace(java.lang.String s)
        This is an optimized HTML String-replacement method. It will substitute all HTML Escape Sequences with the actual characters they represent.

        Emoji's:
        In keeping with the other methods in this class, any HTML Emoji Escape Sequences shall not be replaced. Emoji's are identified by Code Points, and although replacing such escape sequences is not difficult, the substitution is never a single Java char: every code point above 0xFFFF requires a surrogate pair (two Java char's).

        There is an alternate method that can substitute the actual Java char's for a Code-Point Escape-Sequence.

        Code-Point:
        For those familiar with Code Points: this method simply skips any Base-10 or Base-16 escape sequence whose number is larger than Character.MAX_VALUE.

        It is important to remember that every Java String is simply a char-Array wrapped in a java.lang.String instance. Since the primitive type 'char' is a 16-bit value, a code point larger than 0xFFFF cannot be stored in a single char. Although Java handles Code Points just fine (for example through Character.toChars(int) and String.codePoints()), their conversion is left to a separate method in this class.
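        The sketch below makes the 16-bit limit visible using only standard java.lang.Character and String methods; the class and its helper charsNeeded are illustrative names. A Basic Multilingual Plane character such as the snowman needs one char, while the grinning-face Emoji needs a surrogate pair.

```java
// Illustrates why code points above 0xFFFF never become a single Java char.
final class SurrogateSketch
{
    /** Number of 16-bit chars Java needs to store one code point (1 or 2). */
    static int charsNeeded(int codePoint) { return Character.toChars(codePoint).length; }

    public static void main(String[] args)
    {
        System.out.println(charsNeeded(0x2603));    // 1  (snowman, inside plane 0)
        System.out.println(charsNeeded(0x1F600));   // 2  (grinning face: a surrogate pair)

        String face = new String(Character.toChars(0x1F600));
        System.out.println(face.length() + " chars, "
            + face.codePointCount(0, face.length()) + " code point");   // 2 chars, 1 code point
        System.out.println(Character.isHighSurrogate(face.charAt(0)));   // true
    }
}
```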

        Rendering Emoji's:
        Many standard web-pages use very few of the more advanced Escape-Sequences, although Emoji's are somewhat popular. The issue isn't whether 'Code Point' based Escape-Sequences can be converted or handled, but whether you really want to leave the comfortable world of HTML Escape-Sequences for your Code Point related characters.

        Once a Code Point sequence has been un-escaped, it will only be visible in text-editors and viewers that are capable of rendering Code Points or Emoji's (and not all text editors can do this!)
        Parameters:
        s - This may be any Java String which may (or may not) contain HTML Escape sequences.
        Returns:
        a new String where all HTML escape-sequence substrings have been replaced with their natural character representations.
        Code:
        Exact Method Body:
         // The primary optimization is to do this the "C" way (As in The C Programming Language)
         // The String to Escape is converted to a character array, and the characters are shifted
         // as the Escape Sequences are replaced.  This is all done "in place" without creating
         // new substring's in memory.
        
         char[] c = s.toCharArray();
        
         // These two pointers are kept as the "Source Character" - as in the next character to
         // "Read" ... and the "Destination Character" - as in the next location to write.
        
         int sourcePos   = 0;
         int destPos     = 0;
        
         while (sourcePos < c.length)
        
             // All Escape Sequences begin with the Ampersand Symbol.  If the next character
             // does not begin with the Ampersand, we should skip and move on.  Copy the next source
             // character to the next destination location, and continue the loop.
        
             if (c[sourcePos] != '&')
             { c[destPos++]=c[sourcePos++];  continue; }
            
             // Here, an Ampersand has been found.  Now check if the character immediately 
             // following the Ampersand is a Pound Sign.  If it is a Pound Sign, that implies
             // this escape sequence is simply going to be a number.
        
             else if ((sourcePos < (c.length-1)) && (c[sourcePos + 1] == '#'))
             {
                 int     evaluatingPos   = sourcePos + 1;
                 boolean isHex           = false;
        
                 // If the Character after the Pound Sign is an 'x' (or 'X'), it means that the
                 // number that has been escaped is a Base 16 (Hexadecimal) number.
                 // IMPORTANT: Check to see that the Pound Sign wasn't the last char in the String
        
                 if (evaluatingPos + 1 < c.length)
                     if ((c[evaluatingPos + 1] == 'x') || (c[evaluatingPos + 1] == 'X'))
                     { isHex = true; evaluatingPos++; }
        
                 // Keep skipping the digits, until a non-digit character is identified.  A
                 // Base 16 number may also contain the letters A-F / a-f, so the radix matters.
                 int radix = isHex ? 16 : 10;
                 while ((++evaluatingPos < c.length) && (Character.digit(c[evaluatingPos], radix) != -1));
        
                 // If the character immediately after the last digit isn't a ';' (Semicolon),
                 // then this entire thing is NOT an escaped HTML character.  In this case, make
                 // sure to copy the next source-character to the next destination location in the
                 // char[] array...  Then continue the loop to the next 'char' (after Ampersand)
        
                 if ((evaluatingPos == c.length) || (c[evaluatingPos] != ';'))
                     { c[destPos++]=c[sourcePos++];  continue; }
        
                 int escapedChar;
        
                 try
                 { 
                     // Make sure to convert 16-bit numbers using the 16-bit radix using the
                     // standard java parse integer way.
        
                     escapedChar = isHex
                         ? Integer.parseInt(s.substring(sourcePos + 3, evaluatingPos), 16)
                         : Integer.parseInt(s.substring(sourcePos + 2, evaluatingPos));
                 }
        
                 // If for whatever reason java was unable to parse the digits in the escape
                 // sequence, then copy the next source-character to the next destination-location
                 // and move on in the loop.
        
                 catch (NumberFormatException e)
                     { c[destPos++]=c[sourcePos++];  continue; }
        
                 // If the character was an Emoji, then it would be a number greater than
                 // 2^16.  Emoji's use Code Points - which are multiple characters used up
                 // together.  Their escape sequences are always characters larger than 65,535.
                 // If so, just copy the next source-character to the next destination location, and
                 // move on in the loop.
        
                 if (escapedChar > Character.MAX_VALUE)
                     { c[destPos++]=c[sourcePos++];  continue; }
        
                 // Replace the next "Destination Location" with the (un) escaped char.
                 c[destPos++] = (char) escapedChar;
        
                 // Skip the entire HTML Escape Sequence by skipping to the location after the
                 // position where the "evaluation" (all this processing) was occurring.  This
                 // just happens to be the next-character immediately after the semi-colon
        
                 sourcePos = evaluatingPos + 1;  // now points just past the ';' (semicolon)
             }
        
             // An Ampersand was just found, but it was not followed by a '#' (Pound Sign).  This
             // means that it is not a "numbered" (to invent a term) HTML Escape Sequence.  Instead
             // we shall check if there is a valid Escape-String (before the next semi-colon) that
             // can be identified in the Hashtable 'htmlEscChars'
        
             else if (sourcePos < (c.length - 1))
             {
                 // We need to create a 'temp variable' and it will be called "evaluating position"
                 int evaluatingPos = sourcePos;
        
                 // All text (non "Numbered") HTML Escape String's are comprised of letter or digits
                 while ((++evaluatingPos < c.length) && Character.isLetterOrDigit(c[evaluatingPos]));
        
                 // If the character immediately after the last letter or digit is not a semi-colon,
                 // then there is no way this is an HTML Escape Sequence.  Copy the next source to
                 // the next destination location, and continue with the loop.
        
                 if ((evaluatingPos == c.length) || (c[evaluatingPos] != ';'))
                     { c[destPos++]=c[sourcePos++];  continue; }
        
                 // Get the replacement character from the lookup table.
                 Character replacement = htmlEscChars.get(s.substring(sourcePos + 1, evaluatingPos));
        
                 // The lookup table will return null if there this was not a valid escape sequence.
                 // If this was not a valid sequence, just copy the next character from the source
                 // location, and move on in the loop.
        
                 if (replacement == null)
                     { c[destPos++]=c[sourcePos++];  continue; }
        
                 c[destPos++]    = replacement;
                 sourcePos       = evaluatingPos + 1;
             }
        
             else
                 { c[destPos++]=c[sourcePos++];  continue; }
        
         return new String(c, 0, destPos);
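        The in-place "read position / write position" technique used above can be seen in isolation in the sketch below. The task is hypothetical and much simpler: it collapses each "&amp;" to "&" inside one char[] array. Because the write position never passes the read position, no character is overwritten before it has been read, and no intermediate substrings are created.

```java
// Sketch of the in-place two-pointer compaction used by replace(String).
final class TwoPointerSketch
{
    static String collapseAmp(String s)
    {
        char[] c   = s.toCharArray();
        int    src = 0;   // next position to read
        int    dst = 0;   // next position to write (always <= src)

        while (src < c.length)
        {
            if (s.startsWith("&amp;", src))
            {
                c[dst++] = '&';   // write the un-escaped char
                src     += 5;     // skip the 5-char sequence "&amp;"
            }
            else
                c[dst++] = c[src++];   // copy one char unchanged
        }

        // Only the first 'dst' chars are meaningful output.
        return new String(c, 0, dst);
    }

    public static void main(String[] args)
    {
        System.out.println(collapseAmp("a&amp;b &amp;amp;"));   // a&b &amp;
    }
}
```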
        
      • escChar

        public static java.lang.String escChar(char c,
                                               boolean use16BitEscapeSequence)
        This method shall simply escape any char into an HTML Escape String.
        Input 'char'                    Returned Strings
        '中' (Middle / China)            "&#20013;" (Base 10)    "&#x4E2D;" (Base 16)
        '日' (Japan / Sun)               "&#26085;" (Base 10)    "&#x65E5;" (Base 16)
        'Ñ' (Upper-Case N with Tilde)   "&#209;" (Base 10)      "&#xD1;" (Base 16)
        'ñ' (Lower-Case N with Tilde)   "&#241;" (Base 10)      "&#xF1;" (Base 16)
        '☃' (Snowman Glyph)             "&#9731;" (Base 10)     "&#x2603;" (Base 16)

        Java 'char' Primitive-Type:
        The Java primitive 'char' type is a 16-bit type (2^16 = 65,536 values, 0 through 65,535), which corresponds exactly to the first plane (plane 0) of the 17 UNICODE planes. This plane is also known as the Basic Multilingual Plane.

        Nearly any foreign-language character a programmer is likely to need (including the commonly used Chinese characters) can be found in this plane with a bit of searching. Any modern web-browser can display these characters if they are escaped using the HTML Escape Sequences returned by this method.

        Modern-Browsers & UTF-8:
        As an aside, if a programmer includes the HTML Element: <META CHARSET="utf-8"> in the <HEAD>...</HEAD> portion of an HTML Page, it becomes easy to include such characters (from the Multi-Lingual Plane) without even needing to use Escape-Sequences for the characters.

        Any Web-Browser that knows beforehand that non-ASCII characters (above character #127 / 0x7F) are being transmitted as UTF-8 will decode them correctly. In this case, escaping the char's becomes unnecessary.
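        To make the trade-off concrete, the sketch below compares how many bytes a character occupies when it is sent raw as UTF-8 versus as a Base-10 escape sequence (which is plain ASCII). It uses only java.nio.charset.StandardCharsets; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

// Compares raw UTF-8 size with escape-sequence size for the same character.
final class Utf8SizeSketch
{
    static int utf8Bytes(String s) { return s.getBytes(StandardCharsets.UTF_8).length; }

    public static void main(String[] args)
    {
        String raw     = "\u4E2D";     // '中' sent directly (requires <META CHARSET="utf-8">)
        String escaped = "&#20013;";   // the same character as a Base-10 escape

        System.out.println(utf8Bytes(raw));       // 3
        System.out.println(utf8Bytes(escaped));   // 8
    }
}
```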
        Parameters:
        c - Any Java Character. Note that the Java Primitive Type 'char' is a 16-bit type. This parameter equates to the UNICODE Characters 0x0000 up to 0xFFFF.
        use16BitEscapeSequence - If the user would like the returned, escaped, String to use Base 16 for the escaped digits, pass TRUE to this parameter. If the user would like to retrieve an escaped String that uses standard Base 10 digits, then pass FALSE to this parameter.
        Returns:
        The passed character parameter 'c' will be converted to an HTML Escape Sequence. For instance, if the character '我', which is the Chinese character for "I, me, myself", were passed to this method, then the String "&#25105;" would be returned.

        If the parameter 'use16BitEscapeSequence' had been passed TRUE, then this method would, instead, return the String "&#x6211;".
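        The documented contract (TRUE selects Base 16, FALSE selects Base 10) can be checked against the sample table with the short sketch below. EscCharSketch is a hypothetical re-statement of that contract, not the library class.

```java
// Sketch of escaping one char; 'hex' selects Base 16 (TRUE) or Base 10 (FALSE).
final class EscCharSketch
{
    static String esc(char c, boolean hex)
    {
        return hex
            ? "&#x" + Integer.toHexString(c).toUpperCase() + ";"   // e.g. "&#x6211;"
            : "&#" + (int) c + ";";                               // e.g. "&#25105;"
    }

    public static void main(String[] args)
    {
        System.out.println(esc('\u6211', false));   // &#25105;
        System.out.println(esc('\u6211', true));    // &#x6211;
        System.out.println(esc('\u00D1', true));    // &#xD1;
    }
}
```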
        Code:
        Exact Method Body:
         // TRUE selects Base 16 ("&#x...;"), FALSE selects Base 10 ("&#...;").
         return use16BitEscapeSequence
             ? "&#x" + Integer.toHexString((int) c).toUpperCase() + ";"
             : "&#" + ((int) c) + ";";
        
      • escCodePoint

        public static java.lang.String escCodePoint(int codePoint,
                                                    boolean use16BitEscapeSequence)
        This method shall simply escape any Code Point integer into an HTML Escape String. Below is a list of a few examples of commonly used Code Points. As stated, the entire Basic Multilingual Plane, which is Plane 0 of the UNICODE space, fits into the 16-bit Java primitive type 'char'. For characters in that plane, "Code Points" have very little application to software: Java's 16-bit 'char' primitive type gives them to the programmer "for free", without needing to think past the primitive type 'char'.

        Although "Code Points" were developed decades ago, today one of their most common uses is the Emoji's found on numerous web-sites. It is important to note that not all Emoji's fit into a single Code Point, and, as such, equating a "Code Point" with an "Emoji" is actually incorrect. For the more complicated Emoji's, all that is really going on is that sequences of code points are being sent to, and interpreted by, the web-browser as a single glyph or character-image.

        Escaping Emoji's:
        Just as with Foreign-Language characters, the code points themselves (without having been escaped) can be included directly in a text file, as long as the HTML-File indicates that UTF-8 (non-ASCII) data is being transmitted. To avoid these Escape-Sequences altogether, include the following meta tag in the HTML <HEAD>...</HEAD> section, and then write the characters directly:

        HTML-Tag to Include: <META CHARSET="utf-8">.

        And here is a (very) brief sample table of Code Points (Emoji's and a few ordinary characters) and their HTML Escape-Sequences:
        Input Code Point (int)                      Returned Strings
        😀 (Grinning Face)            128512         "&#128512;" (Base 10)   "&#x1F600;" (Base 16)
        👍 (Thumbs Up)                128077         "&#128077;" (Base 10)   "&#x1F44D;" (Base 16)
        🌮 (Taco)                     127790         "&#127790;" (Base 10)   "&#x1F32E;" (Base 16)
        'A' (Upper-Case A)            65 (ASCII)     "&#65;" (Base 10)       "&#x41;" (Base 16)
        '0' (Number Zero)             48 (ASCII)     "&#48;" (Base 10)       "&#x30;" (Base 16)
        '中' (Middle-China)            20013          "&#20013;" (Base 10)    "&#x4E2D;" (Base 16)
        'ü' (German Umlaut)           252            "&#252;" (Base 10)      "&#xFC;" (Base 16)
        'Ñ' (Spanish N with Tilde)    209            "&#209;" (Base 10)      "&#xD1;" (Base 16)

        Again, If the '.html' files you are providing to a web-browser indicate the <META CHARSET="utf-8">, it is not necessary to provide HTML escape sequences for an Emoji, or any 'Code Point' at all. Instead, if the text-editor you are using to edit your '.html' files can handle code points, they may be included directly into the 'html' file itself.

        Multi-Code-Point Emoji's:
        There are numerous Emoji's that are represented by a sequence of code points, AND NOT just a single code point integer. A single HTML escape sequence cannot represent such a "conglomerate" Emoji, so one call to this method can only produce the escape for one of its code points.

        The Emoji's below are sequences of code points rather than single code points. The simplest approach is to include them directly in the '.html' file itself; if they are escaped instead, every code point in the sequence must be escaped, in order, for the web-browser to render the combined glyph.
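        Emoji    Code Point Sequence
        👁️‍🗨️      "Eye in Speech Bubble": U+1F441 U+200D U+1F5E8 ==> 👁 (Eye, U+1F441) + GLUE (Zero-Width Joiner, U+200D) + 🗨 (Speech Bubble, U+1F5E8)
                 (The fully-qualified form of this Emoji also places the variation selector U+FE0F after the Eye and after the Speech Bubble.)
        👉🏿       "Index-Finger Pointing, Dark Hand": U+1F449 U+1F3FF ==> 👉 (Index Finger Pointing, U+1F449) + Dark Skin Color (U+1F3FF)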
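        A multi-code-point Emoji is a sequence, so no single escape represents it. The sketch below shows how such a String decomposes into its individual code points, using the standard String.codePoints() stream, and what the Base-16 escape of each one looks like. EscapeSequenceSketch and escapeAll are illustrative names, not methods of this class.

```java
import java.util.stream.Collectors;

// Sketch: escape every code point of a String, keeping their order.
final class EscapeSequenceSketch
{
    /** Escapes each code point of 's' as "&#x...;". */
    static String escapeAll(String s)
    {
        return s.codePoints()
                .mapToObj(cp -> "&#x" + Integer.toHexString(cp).toUpperCase() + ";")
                .collect(Collectors.joining());
    }

    public static void main(String[] args)
    {
        // U+1F449 (index finger pointing) followed by U+1F3FF (dark skin color modifier)
        String darkPointer = new String(Character.toChars(0x1F449))
                           + new String(Character.toChars(0x1F3FF));

        System.out.println(darkPointer.length());     // 4 (two surrogate pairs)
        System.out.println(escapeAll(darkPointer));   // &#x1F449;&#x1F3FF;
    }
}
```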
        Parameters:
        codePoint - This will take any integer. It will be interpreted as a UNICODE code point.

        NOTE: Java uses 16-bit values for its primitive 'char' type. This range is also the first plane of the UNICODE space, referred to as the Basic Multilingual Plane. Any value passed to this method that is at most 65,535 (0xFFFF) receives the same escape-String that it would from a call to the method escChar(char, boolean).
        use16BitEscapeSequence - If the user would like the returned, escaped, String to use Base 16 for the escaped digits, pass TRUE to this parameter. If the user would like to retrieve an escaped String that uses standard Base 10 digits, then pass FALSE to this parameter.
        Returns:
        The code point will be converted to an HTML Escape Sequence, as a java.lang.String. For instance if the code point for "the snowman" glyph (character ☃), which happens to be represented by a code point that is below 65,535 (and, incidentally, does "fit" into a single Java 'char') - this method would return the String "&#9731;".

        If the parameter 'use16BitEscapeSequence' had been passed TRUE, then this method would, instead, return the String "&#x2603;".
        Throws:
        java.lang.IllegalArgumentException - If Character.isValidCodePoint(int) reports that 'codePoint' is not a valid code point. Not every integer "fits" into the 17 Unicode "planes"; each plane contains 65,536 (2^16) code points, so valid values run from 0 through 0x10FFFF.
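        The valid range, and the plane structure it implies, can be explored with the sketch below. It relies only on the standard Character.isValidCodePoint(int); planeOf is a hypothetical helper that mirrors the validation this method performs before escaping.

```java
// Illustrates the valid code point range checked by Character.isValidCodePoint(int).
final class CodePointRangeSketch
{
    /** The Unicode plane (0 through 16) that a valid code point belongs to. */
    static int planeOf(int codePoint)
    {
        if (!Character.isValidCodePoint(codePoint))
            throw new IllegalArgumentException("Not a code point: " + codePoint);

        return codePoint >>> 16;   // each plane holds 2^16 = 65,536 code points
    }

    public static void main(String[] args)
    {
        System.out.println(planeOf(0x2603));                         // 0  (Basic Multilingual Plane)
        System.out.println(planeOf(0x1F600));                        // 1
        System.out.println(planeOf(0x10FFFF));                       // 16 (the last valid code point)
        System.out.println(Character.isValidCodePoint(0x110000));   // false
    }
}
```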
        Code:
        Exact Method Body:
         if (! Character.isValidCodePoint(codePoint)) throw new IllegalArgumentException(
             "The integer you have passed to this method [" + codePoint + "] was deemed an " +
             "invalid Code Point after a call to: [java.lang.Character.isValidCodePoint(int)].  " +
             "Therefore this method is unable to provide an HTML Escape Sequence."
         );
        
         // TRUE selects Base 16 ("&#x...;"), FALSE selects Base 10 ("&#...;").
         return use16BitEscapeSequence
             ? "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";"
             : "&#" + codePoint + ";";
        
      • hasHTMLEsc

        public static boolean hasHTMLEsc(char c)
        Check the internal Escape Sequence Lookup Table. If there is an escape sequence String associated with the char provided to this method, then return TRUE. If there is no such Escape Sequence in the Lookup Table associated with parameter 'c', then return FALSE.

        The Lookup Table can identify whether char parameter 'c' has an associated HTML Escape Sequence. Escape sequences are always short text-Strings that were selected by the W3C (long ago, in the 1990's).

        Returns TRUE if there is an associated String escape-sequence for char-parameter 'c', and FALSE otherwise. Please review the brief sample table below:
        Input Character              Method Return Value
        '&' (ampersand)              TRUE
        'A' (letter-A)               FALSE
        '<' (less-than-symbol)       TRUE
        '9' (number-9)               FALSE
        '>' (greater-than-symbol)    TRUE

        View Escape-Codes:
        The list included within the page attached (below) is a complete list of all Text-String HTML Escape-Sequences that are known to this class. This list does not include any Code-Point, Hex or Decimal-Number sequences.

        All HTML Escape Sequences
        Parameters:
        c - Any ASCII or UNICODE Character
        Returns:
        TRUE if there is a String escape sequence for this character, and FALSE otherwise.
        See Also:
        htmlEsc(char)
        Code:
        Exact Method Body:
         return htmlEscSeq.get(Character.valueOf(c)) != null;
        
      • htmlEsc

        public static java.lang.String htmlEsc(char c)
        Check the internal Escape Sequence Lookup Table. If there is an escape sequence String associated with the char provided to this method, then return it.

        For Instance:
        Input Character              Method Return Value
        '&' (ampersand)              "amp"
        'A' (letter-A)               null
        '<' (less-than-symbol)       "lt"
        '9' (number-9)               null
        '>' (greater-than-symbol)    "gt"

        View Escape Codes:
        The list included within the page attached (below) is a complete list of all Text-String HTML Escape-Sequences that are known to this class. This list does not include any Code-Point, Hex or Decimal-Number sequences.

        All HTML Escape Sequences
        Parameters:
        c - Any ASCII or UNICODE Character
        Returns:
        The String that is used by web-browsers to escape this ASCII / Uni-Code character - if there is one saved in the internal Lookup Table. If the character provided does not have an associated HTML Escape String, then 'null' is returned.

        NOTE: The entire escape-String is not provided, just the inner-characters. The leading '&' (Ampersand) and the trailing ';' (Semi-Colon) are not appended to the returned String.
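        Because only the inner name is returned, a caller that wants a complete escape must add the '&' and ';' itself. The sketch below does this while escaping a whole String. The four-entry NAMES map is a hypothetical stand-in for the internal char-to-name table, and NamedEscapeSketch is not part of this class.

```java
import java.util.Map;

// Sketch: escape chars that have a named sequence; all other chars are copied unchanged.
final class NamedEscapeSketch
{
    // Hypothetical stand-in for the internal char -> name table.
    private static final Map<Character, String> NAMES =
        Map.of('&', "amp", '<', "lt", '>', "gt", '"', "quot");

    static String escapeNamed(String s)
    {
        StringBuilder sb = new StringBuilder(s.length());

        for (char c : s.toCharArray())
        {
            String name = NAMES.get(c);   // like htmlEsc(c): the inner name, or null

            if (name != null) sb.append('&').append(name).append(';');
            else              sb.append(c);
        }

        return sb.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(escapeNamed("if (a < b && c > 9)"));   // if (a &lt; b &amp;&amp; c &gt; 9)
    }
}
```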
        See Also:
        hasHTMLEsc(char)
        Code:
        Exact Method Body:
         return htmlEscSeq.get(Character.valueOf(c));