========
Patterns
========

.. Modified: 2018-10-23/05:38-0400 btiffin
.. Copyright 2016 Brian Tiffin
.. GPL 3.0+ :ref:`license`
.. This file is part of the Unicon Programming documentation

.. image:: images/unicon.png
   :align: center

.. only:: html

    :ref:`genindex` :floatright:`<a href="http://unicon.sourceforge.net/">Unicon</a>`

.. index::
   pair: pattern; scanning
   single: patterns

.. _patterns:

Unicon Pattern data
===================

.. index::
   pair: patterns; SNOBOL

.. _snobol patterns:

SNOBOL patterns
---------------

Unicon version 13 alpha has :ref:`SNOBOL` inspired pattern matching. New
functions and operators were added to emulate the very powerful, and well
studied, ``SNOBOL`` pattern matching features. This augments ``String
scanning`` quite nicely.

These features introduce a new datatype, ``pattern``.

Details are in Technical Report UTR18a, http://unicon.org/utr/utr18.pdf.

SNOBOL is still relevant to many developers, and SNOBOL4 implementations have
been made freely available, thanks in large part to Catspaw Inc.

There is also a very comprehensive tutorial hosted at http://www.snobol4.org.
Chapter 4 of the tutorial is about Pattern Matching.

http://www.snobol4.org/docs/burks/tutorial/ch4.htm

This is a conversion (with some changes to add a test pass, and to output
results) of the small program listed in section 4.7 of that page:

.. literalinclude:: examples/snobols.icn
   :language: unicon
   :start-after: ##+

.. only:: html

    .. rst-class:: rightalign

        :download:`examples/snobols.icn`

.. program-output:: unicon -s snobols.icn -x
   :cwd: examples

:ref:`Clint`, along with Sudarshan Gaikaiwari and John Goettsche, carefully
designed this feature set to be an almost one-to-one correspondence to
``SNOBOL`` patterns. It provides a highly viable path for porting old,
beloved, ``SNOBOL`` programs to Unicon.
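To make that correspondence a little more concrete, here is a minimal sketch,
assuming a Unicon 13 (or later) build with pattern support. The SNOBOL4
statement in the comment and the names ``line``, ``word`` and ``rest`` are
hypothetical, made up for illustration and not taken from the tutorial. The
pattern functions ``Break``, ``Span`` and ``Rem`` are used here as the
capitalized SNOBOL4 style functions described in UTR18a; check the report if
a particular build rejects them.

.. code-block:: unicon

    # Sketch only: assumes Unicon 13+ with SNOBOL style pattern support.
    # Hypothetical SNOBOL4 original, for comparison (not Unicon code):
    #     LINE BREAK(' ') . WORD SPAN(' ') REM . REST
    procedure main()
       local line, word, rest
       line := "SNOBOL lives on"
       # Break(' ') matches up to, but not including, the first blank,
       # Span(' ') consumes the run of blanks, Rem() takes what is left.
       # -> is conditional assignment: word and rest are only assigned
       # when the whole pattern match succeeds.
       if line ?? ((Break(' ') -> word) || Span(' ') || (Rem() -> rest)) then {
          write("word: ", word)
          write("rest: ", rest)
       }
       else write("no match")
    end

Swapping ``->`` for ``=>`` would make the assignments immediate, in the
SNOBOL ``$`` style, so a variable could be set even when a later part of the
pattern fails.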
*Unicon currently lacks the full* ``eval`` *potential of* :ref:`SNOBOL` *but
ameliorates that downside, somewhat, by allowing invocation of functions and
methods along with variable and field references inside patterns.*

.. index:: patterns; internals

Internals
.........

To see a little bit of how the implementation actually works, let's take a
look at the preprocessor output.

*The listing below has extra blank lines squeezed out,* ``cat -s``\ *, and is
reformatted,* ``fmt``\ *. This is only for human curiosity, and the listing
below is not the version sent to the compiler.*

.. command-output:: unicon -s -E snobols.icn | cat -s | fmt
   :cwd: examples
   :shell:

Nice. The SNOBOL operators are actually a new class of functions.

*I talked with Clinton about this, and for now, those functions are for
compiler internal use only. It is much smarter, and cleaner, to use the
operators.*

.. index:: regular expressions, patterns; regex

.. _regular expressions:

Regular expressions
-------------------

When SNOBOL patterns were added to Unicon, regular expression features were
also added. This means Unicon has the power of :ref:`string scanning`,
:ref:`snobol patterns` and ``regular expressions`` available, and all three
features can be freely mixed in string manipulation expressions. *Raising the
bar*.

Regular expression literals are surrounded by angle brackets, not quotes.
Pattern matching uses the :ref:`?? <binop??>` operator.

As of early Unicon release 13, regular expressions are limited to ``basic``
regex patterns.

.. literalinclude:: examples/hello-regex.icn
   :language: unicon
   :start-after: ##+

The program displays a message when the subject includes some form of
``Hello, world``. In the example, the first and last elements of the string
list do not match. The regular expression looks for Hello with or without a
capital H, an optional comma, any number of spaces or tabs (including zero),
followed by World (or world), with an optional exclamation mark.

.. program-output:: unicon -s hello-regex.icn -x
   :cwd: examples

.. index:: patterns; operators

Pattern operators
-----------------

- ``??`` - a variant form of string scanning; ``s ?? p`` matches a pattern,
  not a general Unicon expression as with :ref:`? <binop?>` scanning.
  Unanchored.
- ``=p`` - anchored match of pattern ``p``.
- ``.|`` - pattern alternation. Accepts Unicon expressions as an operand.
- ``->`` - conditional assignment; the matched substring is assigned only if
  the overall pattern match succeeds.
- ``=>`` - immediate assignment; the assignment happens as soon as the
  subpattern matches, even if the overall match later fails.
- ``.>`` - cursor position assignment.
- ``<r>`` - a regular expression literal is surrounded by angle brackets
  (chevrons).

.. index:: patterns; syntax

Regex syntax
------------

Regular expressions can include the following components:

- ``r`` - ordinary symbol that matches r.
- ``r1 r2`` - juxtaposition is concatenation.
- ``r1 | r2`` - regular expression alternation (not a generator).
- ``r*`` - match zero or more occurrences of r.
- ``r+`` - match one or more occurrences of r.
- ``r?`` - match zero or one occurrence of r.
- ``r{n}`` - braces surround an integer count; match n occurrences of r.
- ``"lit"`` - match the literal string, with the usual escapes allowed.
- ``'lit'`` - cset literal matching any one character of the set, escapes
  allowed.
- ``[chars]`` - cset literal with dash range syntax.
- ``.`` - match any character *except newline*.
- ``(r)`` - parentheses are used for grouping.

.. only:: html

    ..

    --------

    :ref:`genindex` | Previous: :doc:`strings` | Next: :doc:`objects` |