Package Torello.CSS

Class UnicodeRange

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.CharSequence, java.lang.Comparable<java.lang.CharSequence>

    public class UnicodeRange
    extends CSSToken
    implements java.lang.CharSequence, java.io.Serializable, java.lang.Comparable<java.lang.CharSequence>
    This is a Token Data-Class. It is a descendant of the root CSSToken-Class: CSSToken. Instances of this class are usually produced by the CSSTokenizer class. Many (but not all) CSSToken subclasses provide a static method named 'build' for creating instances. Any CSSToken-subclass that is neither a singleton-instance nor an "Error-Subtype" should have such a builder. Singleton instances do not need builders, and the two Error-Subtype Classes can only be generated by the tokenizer.

    All CSSToken subclasses have a CSSToken.str field which contains the exact character data that was extracted and used to construct the instance. All subclasses also have several "Loop Optimization" methods. These methods may or may not be useful in light of newer language features available in JDK 17 & 21, including pattern matching for 'instanceof' (the 'instanceof Type varName' form, which tests and binds a variable in one conditional expression).
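
    As an illustration of the 'instanceof' pattern-matching form mentioned above, here is a minimal sketch. The classes DemoToken and DemoRange are hypothetical stand-ins invented for this example, not the real Torello.CSS types, and the code assumes JDK 16 or later.

```java
final class InstanceofPatternSketch
{
    // Hypothetical stand-in for a token base class (NOT Torello.CSS.CSSToken)
    static class DemoToken
    {
        final String str;
        DemoToken(String str) { this.str = str; }
    }

    // Hypothetical stand-in for a range token (NOT Torello.CSS.UnicodeRange)
    static final class DemoRange extends DemoToken
    {
        final int start, end;

        DemoRange(String str, int start, int end)
        {
            super(str);
            this.start = start;
            this.end = end;
        }
    }

    // Sums the number of code points covered by every DemoRange in the list.
    // 'instanceof DemoRange r' tests the type and binds 'r' in a single step,
    // so no separate cast is needed inside the loop.
    static int countCodePointsCovered(java.util.List<DemoToken> tokens)
    {
        int total = 0;

        for (DemoToken t : tokens)
            if (t instanceof DemoRange r) total += r.end - r.start + 1;

        return total;
    }

    public static void main(String[] args)
    {
        java.util.List<DemoToken> tokens = java.util.List.of(
            new DemoToken("a"),
            new DemoRange("U+0-F", 0x0, 0xF)
        );

        System.out.println(countCodePointsCovered(tokens));
    }
}
```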

    The algorithms in this tokenizer are based solely on the CSS Working-Group's Syntax-Documentation. This document may be viewed here: CSS Working-Group CSS-Syntax. The external site that maintains these CSS drafts is located at drafts.csswg.org
    Represents a range of characters in Unicode.
    See Also:
    Serialized Form


    • Field Detail

      • serialVersionUID

        protected static final long serialVersionUID
        This fulfils the SerialVersion UID requirement for all classes that implement Java's interface java.io.Serializable. Using the Serializable implementation offered by Java is very easy, and can make saving program state when debugging a lot easier. It can also be used in place of more complicated systems like "Hibernate" to store data.
        See Also:
        Constant Field Values
        Code:
        Exact Field Declaration Expression:
         protected static final long serialVersionUID = 1;
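
        To illustrate the "saving program state" use mentioned above, the following is a minimal sketch of a serialization round trip using only java.io. The class DemoToken is a hypothetical stand-in invented for this example; it is not a Torello.CSS class, and the helper names toBytes and fromBytes are likewise assumptions.

```java
final class SerializationRoundTripSketch
{
    // Hypothetical Serializable class (NOT part of Torello.CSS)
    static final class DemoToken implements java.io.Serializable
    {
        private static final long serialVersionUID = 1;

        final String str;

        DemoToken(String str) { this.str = str; }
    }

    // Serializes any Serializable object into a byte array
    static byte[] toBytes(java.io.Serializable o)
    {
        java.io.ByteArrayOutputStream baos = new java.io.ByteArrayOutputStream();

        try (java.io.ObjectOutputStream oos = new java.io.ObjectOutputStream(baos))
            { oos.writeObject(o); }
        catch (java.io.IOException e)
            { throw new IllegalStateException("serialization failed", e); }

        return baos.toByteArray();
    }

    // Rebuilds an object from bytes produced by toBytes
    static Object fromBytes(byte[] bytes)
    {
        try (java.io.ObjectInputStream ois =
                new java.io.ObjectInputStream(new java.io.ByteArrayInputStream(bytes)))
            { return ois.readObject(); }
        catch (java.io.IOException | ClassNotFoundException e)
            { throw new IllegalStateException("deserialization failed", e); }
    }

    public static void main(String[] args)
    {
        DemoToken original  = new DemoToken("U+0025-00FF");
        DemoToken restored  = (DemoToken) fromBytes(toBytes(original));

        System.out.println(restored.str);
    }
}
```

        The same pattern works with a FileOutputStream / FileInputStream pair when the state should survive between program runs.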
        
    • Method Detail

      • build

        public static UnicodeRange build​(java.lang.String rangeStr)
        Static-Builder Method for creating an instance of this class. This Static-Method is a substitute for an actual Constructor. Because many of the 'consume(...)' methods in the Token Classes for Torello.CSS actually generate more than one CSSToken instance, writing publicly available constructors is largely impossible.

        The upside to this approach is that the build methods and the consume methods share identical code. Furthermore, this code is (nearly) perfectly based on the Pseudo-Code on the CSS Working-Group Website.
        Parameters:
        rangeStr - Any Java-String that can be parsed into an instance of UnicodeRange
        Returns:
        An instance of UnicodeRange.

        If the contents of the Input-String parameter 'rangeStr' cannot be consumed, exactly, by this class' 'consume' method, then a TokenizeException is thrown.
        Throws:
        TokenizeException - This exception may be thrown for any number of reasons involving the inability to parse input parameter 'rangeStr'.
      • is

        public static boolean is​(int[] css,
                                 int sPos)
        Checks whether or not the next token to consume is a Unicode Range.

        Tokenizer: UnicodeRange Check Method, Pseudo-Code

        Making use of the CSS Parser DOES NOT require any knowledge of how the underlying Pass 1 Tokenizer actually works. Browser-War people are usually pretty convincing that parsing CSS is a "Moving Target" type of operation, not to be engaged by mere mortals.

        Below is the CSS Working Group's Pseudo-Code for checking whether three code points would start a unicode-range. You may review it if you are at wit's end, and have nothing better to do. There is no need to actually invoke this method; it is here solely for informational purposes.

        These Parsing Pseudo-Code Instructions and Rail-Road Diagrams have been copied from the CSS-Working-Group Web-Site:
        https://drafts.csswg.org/css-syntax/#check-if-three-code-points-would-start-a-unicode-range

        4.3.11. Check if three code points would start a unicode-range

        This section describes how to check if three code points would start a unicode-range. The algorithm described here can be called explicitly with three code points, or can be called with the input stream itself. In the latter case, the three code points in question are the current input code point and the next two input code points, in that order.

        Note: This algorithm will not consume any additional code points.

        If all of the following are true:

        • The first code point is either U+0055 LATIN CAPITAL LETTER U (U) or U+0075 LATIN SMALL LETTER U (u)

        • The second code point is U+002B PLUS SIGN (+).

        • The third code point is either U+003F QUESTION MARK (?) or a hex digit

        then return true.

        Otherwise return false.

        Parameters:
        css - CSS-String as an array of code-points.
        sPos - The array-index where the tokenizer is to consume its next token
        Returns:
        TRUE if and only if the next token in the array is a Unicode-Range
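
        The three-code-point check quoted above can be sketched as follows. This is an illustrative re-implementation written directly from the pseudo-code, not the library's actual source; the class name and the helper isHexDigit are assumptions, and the bounds check (returning false when fewer than three code points remain) is a choice made for this sketch.

```java
final class UnicodeRangeStartSketch
{
    // A hex digit in CSS Syntax terms: 0-9, A-F, or a-f
    static boolean isHexDigit(int c)
    {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'F')
            || (c >= 'a' && c <= 'f');
    }

    // Returns true if the three code points starting at sPos would start a
    // unicode-range.  Nothing is consumed; the array is only inspected.
    static boolean wouldStartUnicodeRange(int[] css, int sPos)
    {
        // Assumption of this sketch: too few remaining code points means "no"
        if (sPos < 0 || css.length - sPos < 3) return false;

        int first   = css[sPos];
        int second  = css[sPos + 1];
        int third   = css[sPos + 2];

        return  (first == 'U' || first == 'u')
            &&  (second == '+')
            &&  (third == '?' || isHexDigit(third));
    }

    public static void main(String[] args)
    {
        int[] css = "U+26".codePoints().toArray();
        System.out.println(wouldStartUnicodeRange(css, 0));
    }
}
```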
      • consume

        protected static void consume​
                    (int[] css,
                     ByRef<java.lang.Integer> POS,
                     java.util.function.Consumer<CSSToken> returnParsedToken)
        
        This is a tokenizer method which "consumes" the next UnicodeRange-Token from the input Code-Point Array.

        Tokenizer: UnicodeRange Consume Method, Pseudo-Code

        Making use of the CSS Parser DOES NOT require any knowledge of how the underlying Pass 1 Tokenizer actually works. Browser-War people are usually pretty convincing that parsing CSS is a "Moving Target" type of operation, not to be engaged by mere mortals.

        Below is the CSS Working Group's UnicodeRange Pseudo-Code. You may review it if you are at wit's end, and have nothing better to do. There is no need to actually invoke this method; it is here solely for informational purposes.

        These Parsing Pseudo-Code Instructions and Rail-Road Diagrams have been copied from the CSS-Working-Group Web-Site:
        https://drafts.csswg.org/css-syntax/#consume-unicode-range-token

        Consume a unicode-range token

        This section describes how to consume a unicode-range token from a stream of code points. It returns a <unicode-range-token>.

        Note: This algorithm does not do the verification of the first few code points that are necessary to ensure the returned code points would constitute an <unicode-range-token>. Ensure that the stream would start a unicode-range before calling this algorithm.

        Note: This token is not produced by the tokenizer under normal circumstances. This algorithm is only called during consume the value of a unicode-range descriptor, which itself is only called as a special case for parsing the unicode-range descriptor; this single invocation in the entire language is due to a bad syntax design in early CSS.

        1. Consume the next two input code points and discard them.

        2. Consume as many hex digits as possible, but no more than 6. If less than 6 hex digits were consumed, consume as many U+003F QUESTION MARK (?) code points as possible, but no more than enough to make the total of hex digits and U+003F QUESTION MARK (?) code points equal to 6.

          Let first segment be the consumed code points.

        3. If first segment contains any question mark code points, then:

          1. Replace the question marks in first segment with U+0030 DIGIT ZERO (0) code points, and interpret the result as a hexadecimal number. Let this be start of range.

          2. Replace the question marks in first segment with U+0046 LATIN CAPITAL LETTER F (F) code points, and interpret the result as a hexadecimal number. Let this be end of range.

          3. Return a new <unicode-range-token> starting at start of range and ending at end of range.

        4. Otherwise, interpret first segment as a hexadecimal number, and let the result be start of range.

        5. If the next 2 input code points are U+002D HYPHEN-MINUS (-) followed by a hex digit, then:

          1. Consume the next input code point.

          2. Consume as many hex digits as possible, but no more than 6. Interpret the consumed code points as a hexadecimal number. Let this be end of range.

          3. Return a new <unicode-range-token> starting at start of range and ending at end of range.

        6. Otherwise, return a new <unicode-range-token> both starting and ending at start of range.

        <unicode-range-token>
        'U' or 'u', then '+', then one of:

        • hex digit, 1-6 times

        • hex digit, 1-5 times, then '?', 1 to (6 - digits) times

        • hex digit, 1-6 times, then '-', then hex digit, 1-6 times
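
        The six numbered steps above can be sketched in Java as shown below. This is an illustrative re-implementation of the quoted pseudo-code, not the library's actual 'consume' method: the signature differs (it takes a plain int position and returns an int[] of start, end and next position instead of using ByRef and a Consumer), and all names are assumptions. As the first Note states, the caller is expected to have already verified that the input would start a unicode-range.

```java
final class UnicodeRangeConsumeSketch
{
    static boolean isHexDigit(int c)
    {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'F')
            || (c >= 'a' && c <= 'f');
    }

    // Returns { start of range, end of range, index of first unconsumed code point }.
    // Precondition: css[pos..] would start a unicode-range ("U+" then '?' or hex digit).
    static int[] consume(int[] css, int pos)
    {
        // Step 1: consume and discard 'U' (or 'u') and '+'
        pos += 2;

        // Step 2: up to 6 hex digits, then '?' until the total reaches 6
        int segStart = pos;
        while (pos < css.length && pos - segStart < 6 && isHexDigit(css[pos])) pos++;

        int hexCount = pos - segStart;
        while (pos < css.length && pos - segStart < 6 && css[pos] == '?') pos++;

        int     questionCount   = pos - segStart - hexCount;
        String  firstSegment    = new String(css, segStart, pos - segStart);

        // Step 3: wildcards present; '?' becomes '0' for the start, 'F' for the end
        if (questionCount > 0) return new int[] {
            Integer.parseInt(firstSegment.replace('?', '0'), 16),
            Integer.parseInt(firstSegment.replace('?', 'F'), 16),
            pos
        };

        // Step 4: no wildcards; the segment is the start of the range
        int startOfRange = Integer.parseInt(firstSegment, 16);

        // Step 5: '-' followed by a hex digit introduces an explicit end of range
        if (pos + 1 < css.length && css[pos] == '-' && isHexDigit(css[pos + 1]))
        {
            pos++; // consume the '-'

            int endStart = pos;
            while (pos < css.length && pos - endStart < 6 && isHexDigit(css[pos])) pos++;

            int endOfRange = Integer.parseInt(new String(css, endStart, pos - endStart), 16);
            return new int[] { startOfRange, endOfRange, pos };
        }

        // Step 6: a single code point; start and end are the same
        return new int[] { startOfRange, startOfRange, pos };
    }

    public static void main(String[] args)
    {
        int[] r = consume("U+4??".codePoints().toArray(), 0);
        System.out.printf("U+%X-%X, next index %d%n", r[0], r[1], r[2]);
    }
}
```

        Returning the next position alongside the range keeps the sketch free of mutable wrapper types; a real tokenizer would more likely advance a shared position reference, as the ByRef parameter in the signature above suggests.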