========
Patterns
========

.. Modified: 2018-10-23/05:38-0400 btiffin
.. Copyright 2016 Brian Tiffin
.. GPL 3.0+ :ref:`license`
.. This file is part of the Unicon Programming documentation

.. image:: images/unicon.png
   :align: center

.. only:: html

    :ref:`genindex`
    :floatright:`<a href="http://unicon.sourceforge.net/">Unicon</a>`

.. index::
   pair: pattern; scanning
   single: patterns

.. _patterns:

Unicon Pattern data
===================

.. index::
   pair: patterns; SNOBOL

.. _snobol patterns:

SNOBOL patterns
---------------
Unicon version 13 alpha has :ref:`SNOBOL`-inspired pattern matching.  New
functions and operators were added to emulate the powerful and well-studied
``SNOBOL`` pattern matching features.  This augments ``String scanning``
quite nicely.  These features introduce a new datatype, ``pattern``.

Details are in Technical Report UTR18a, http://unicon.org/utr/utr18.pdf.

SNOBOL is still relevant to many developers and SNOBOL4 implementations have
been made freely available, thanks in large part to Catspaw Inc.

There is also a very comprehensive tutorial hosted at http://www.snobol4.org.

Chapter 4 of the tutorial is about Pattern Matching.

http://www.snobol4.org/docs/burks/tutorial/ch4.htm

The following is a conversion (with some changes to add a test pass and to
output results) of the small program listed in section 4.7 of that page:

.. literalinclude:: examples/snobols.icn
   :language: unicon
   :start-after: ##+

.. only:: html

    .. rst-class:: rightalign

        :download:`examples/snobols.icn`

.. program-output:: unicon -s snobols.icn -x
   :cwd: examples

:ref:`Clint`, along with Sudarshan Gaikaiwari and John Goettsche, carefully
designed this feature set to be an almost one-to-one correspondence to
``SNOBOL`` patterns.  It provides a highly viable path for porting old,
beloved ``SNOBOL`` programs to Unicon.

*Unicon currently lacks the full* ``eval`` *potential of* :ref:`SNOBOL` *but
ameliorates that downside, somewhat, by allowing invocation of functions and
methods along with variable and field references inside patterns.*

.. index:: patterns; internals

Internals
.........
To see a little bit of how the implementation actually works, let's take a
look at the preprocessor output. *The listing below has extra blank lines
squeezed out,* ``cat -s``\ *, and is reformatted,* ``fmt``\ *.  This is only
for human curiosity; the listing below is not the version sent to the
compiler.*

.. command-output:: unicon -s -E snobols.icn | cat -s | fmt
   :cwd: examples
   :shell:

Nice.  The SNOBOL operators are actually a new class of functions.

*I talked with Clinton about this, and for now, those functions are for
compiler internal use only. Much smarter, and cleaner, to use the operators.*

 
.. index:: regular expressions, patterns; regex

.. _regular expressions:

Regular expressions
-------------------
When SNOBOL patterns were added to Unicon, regular expression features were
also added.  This means Unicon has the power of :ref:`string scanning`,
:ref:`snobol patterns` and ``regular expressions`` available.  And all three
features can be freely mixed in string manipulation expressions. *Raising the
bar*.

Regular expression literals are surrounded by angle brackets, not quotes.
Pattern matching uses a :ref:`?? <binop??>` operator. As of early Unicon
release 13, regular expressions are limited to ``basic`` regex patterns.


.. literalinclude:: examples/hello-regex.icn
   :language: unicon
   :start-after: ##+

The program displays a message when the subject includes some form of
``Hello, world``.  In the example, the first and last elements of the string
list do not match.  The regular expression looks for Hello with or without a
capital H, an optional comma, any number of spaces or tabs (including zero),
followed by World (or world), with an optional exclamation mark.

.. program-output:: unicon -s hello-regex.icn -x
   :cwd: examples


.. index:: patterns; operators

Pattern operators
-----------------

- ``??`` - a variant form of string scanning: ``s ?? p`` matches the subject
  ``s`` against a pattern ``p``, rather than evaluating a general Unicon
  expression as with :ref:`? <binop?>` scanning.  The match is unanchored.
- ``=p`` - anchored match of pattern, p.
- ``.|`` - a pattern alternation.  Accepts Unicon expressions as an operand.
- ``->`` - conditional assignment.
- ``=>`` - immediate assignment, performed as soon as the subpattern matches
  (regardless of whether the overall match eventually succeeds).
- ``.>`` - cursor position assignment.
- ``<r>`` - a regular expression literal, surrounded by angle brackets
  (chevrons).
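
The operators above might be combined along these lines.  This is only a
sketch, not taken from the Unicon distribution and not checked against any
particular build: the subject strings and the variable names ``word``, ``w``,
``pet`` and ``found`` are invented for illustration, and ``Break`` and
``Span`` are assumed to be the Unicon pattern counterparts of the SNOBOL
primitives of the same names.

.. code-block:: unicon

   # Sketch only: subject strings and variable names are hypothetical.
   procedure main()
      # Skip ahead to the first letter, then capture the run of letters.
      # The parentheses keep the conditional assignment on the Span part.
      word := Break(&letters) || (Span(&letters) -> w)
      if "  42 apples" ?? word then
         write("word: ", w)

      # Alternation of two literal strings, captured only on success.
      pet := ("cat" .| "dog") -> found
      if "hot dog stand" ?? pet then
         write("pet: ", found)
   end

Because ``??`` is unanchored, neither pattern needs to begin at the first
character of its subject.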


.. index:: patterns; syntax

Regex syntax
------------
Regular expressions can include the following components:

- ``r`` - an ordinary symbol that matches itself.
- ``r1 r2`` - juxtaposition is concatenation.
- ``r1 | r2`` - regular expression alternation (not a generator).
- ``r*`` - match zero or more occurrences of r.
- ``r+`` - match one or more occurrences of r.
- ``r?`` - match zero or one occurrence of r.
- ``r{n}`` - braces surround an integer count, match exactly n occurrences of r.
- ``"lit"`` - match the literal string, with the usual escapes allowed.
- ``'lit'`` - cset literal matching any one character of the set, escapes
  allowed.
- ``[chars]`` - cset literal with dash range syntax.
- ``.`` - match any character *except newline*.
- ``(r)`` - parentheses are used for grouping.
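
As a small illustration of several of these components together, the
following sketch builds a regular expression from a bracketed cset, the ``+``
repetition, a quoted literal and a brace count.  The name ``price`` and the
test strings are made up for this example, and the code has not been checked
against any particular Unicon build, so treat it as an assumption about how
the pieces fit rather than a verified program.

.. code-block:: unicon

   # Sketch only: the pattern name and subject strings are hypothetical.
   procedure main()
      # One or more digits, a literal dot, then exactly two digits.
      price := <[0-9]+"."[0-9]{2}>
      every s := !["total 12.50", "no digits here", "7.5"] do
         if s ?? price then write("matched: ", s)
         else write("no match: ", s)
   end

The quoted ``"."`` is used instead of a bare ``.`` so that the dot is taken
literally rather than as *any character except newline*.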

 
.. only:: html
   
   ..

   --------

   :ref:`genindex` | Previous: :doc:`strings` | Next: :doc:`objects`

   |