=========
Datatypes
=========
.. Modified: 2017-07-16/02:54-0400 btiffin
.. Copyright 2016 Brian Tiffin
.. This file is part of the Unicon Programming document
.. GPL 3.0+ :ref:`license`
.. image:: images/unicon.png
:align: center
.. only:: html
:ref:`genindex`
:floatright:`Unicon`
`Icon` is rich in datatypes. ``Unicon`` is just that much richer.
Immutable Unicon Datatypes
==========================
``Unicon`` starts out with some immutable types:
- Integer (arbitrary size)
- floating point Real numbers
- String
- Cset (sets of characters, single byte values 0 through 255)
Note: :ref:`String` is an immutable type. New strings will be formed for
operations that look like they are modifying a string in place. *This has
consequences*, detailed in the :ref:`String` entry.
.. index:: !Integer, datatype; integer
.. _Integer:
Integer
-------
Integers in Unicon can be any size (when the ``large integers`` feature is
compiled in) and are always exact values.
.. index:: radix, base
Radix prefix
............
Integer literals in source code can be written in any base from 2 through 36
by using a radix prefix; the default is decimal, base 10. The radix is always a
decimal number, followed by the letter ``r`` or ``R``, followed by the digits
of the value.
.. sourcecode:: unicon
42
2r101010
5r132
16r2A
16R2a
36r16
The above are all valid literals for the value forty-two. Base 0 and base 1
are invalid, and produce a compile time error:
.. literalinclude:: examples/numeric-literals-errors.icn
:language: unicon
:start-after: ##+
.. only:: html
.. rst-class:: rightalign
:download:`examples/numeric-literals-errors.icn`
Sample run (with errors):
.. command-output:: unicon -s numeric-literals-errors.icn -x
:cwd: examples
:returncode: 1
Getting rid of the troublesome lines (and adding some larger numbers):
.. literalinclude:: examples/numeric-literals.icn
:language: unicon
:start-after: ##+
.. only:: html
.. rst-class:: rightalign
:download:`examples/numeric-literals.icn`
Sample run:
.. command-output:: unicon -s numeric-literals.icn -x
:cwd: examples
The radix also determines which ``digits`` are allowed in the value. For base
eight, 8 and 9 are illegal, not being part of the valid set of digit symbols.
For base two, only 0 and 1 are allowed. For bases above ten, letters are used:
``A`` (or ``a``) represents ten, ``B`` (or ``b``) is eleven, and so on up to
``Z`` (or ``z``), which represents thirty-five with radix 36.
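To make the digit rule concrete, here is a small sketch that evaluates a digit
string in a given base by hand. The procedure ``digits_to_int`` is a
hypothetical helper written only for illustration; it is not part of the
Unicon library, and real programs should simply use radix literals.

.. sourcecode:: unicon

    procedure main()
        write(digits_to_int("2A", 16))       # 42
        write(digits_to_int("16", 36))       # 42
        write(digits_to_int("101010", 2))    # 42
    end

    # digits_to_int is a hypothetical helper, for illustration only
    procedure digits_to_int(s, base)
        local symbols, n, c, d
        symbols := "0123456789abcdefghijklmnopqrstuvwxyz"
        n := 0
        # map(s) folds upper case to lower case, so "A" and "a" are the same digit
        every c := !map(s) do {
            # the position of c in symbols, minus one, is the digit value
            d := (find(c, symbols) - 1) | stop("not a digit: ", c)
            if d >= base then stop("digit ", c, " is not valid in base ", base)
            n := n * base + d
        }
        return n
    end

Each digit's value must be less than the radix, which is exactly why ``8`` is
rejected in base eight and ``4`` in base three.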
Given
.. sourcecode:: unicon
write(3r42)
``unicon`` reports a syntax error, as ``4`` is not a valid digit in base three:
::
File numbers.icn; Line 16 # invalid integer literal
numbers.icn:16: # "42": syntax error (104;258)
Case sensitivity
................
Unlike most of Unicon, which is case sensitive, the radix indicator is case
insensitive: ``R`` and ``r`` both work. So do upper and lower case letters
when used as ``digits``. ``2A`` and ``2a`` are both valid representations of
forty-two in hexadecimal (assuming the ``16r`` radix is given first). All of
``16R2a``, ``16R2A``, ``16r2a`` and ``16r2A`` represent ``42`` in base 10.
Octal
.....
Unicon does NOT follow the C convention of ``0`` prefixed literals being
treated as octal (base eight) values. Use ``8r0777`` if you feel the need to
prefix your octal constants with a zero; ``8r777`` works just as well for
representing five hundred eleven. As a Unicon numeric literal, ``042`` is
forty-two, not thirty-four as it would be in C and some other languages.
.. sourcecode:: unicon
    042 ~= 34
    042 = 42
    042 = 8r52
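A short program shows the difference between a leading zero and an explicit
octal radix. The permission value ``8r755`` is just an illustrative choice.

.. sourcecode:: unicon

    procedure main()
        write(042)       # 42: a leading zero does not make a literal octal
        write(8r042)     # 34: an explicit radix prefix does
        write(8r755)     # 493: octal suits things like Unix permission bits
    end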
.. index:: scaling suffix
Scaling suffix
..............
And just to add to the flexibility, Unicon supports a trailing suffix that
closely resembles the International System of Units (``SI``) prefixes, but
scaled for binary computers rather than in decimal thousands. Unicon uses
``1024`` based scaling.
- K (or k) kilo, literal is multiplied by :math:`1024`
- M (or m) mega, literal is multiplied by :math:`1024^2`
- G (or g) giga, :math:`1024^3`
- T (or t) tera, :math:`1024^4`
- P (or p) peta, :math:`1024^5`
.. sourcecode:: unicon
write(42)
write(42K)
write(42M)
write(42g)
write(42t)
write(42P)
Gives
::
42
43008
44040192
45097156608
46179488366592
47287796087390208
To keep some sanity, the suffixes are only supported for decimal (base ten)
literals, *even though the scaling itself is binary (powers of 1024) and not
decimal thousands*. For instance ``36r16K`` is a base 36 literal of the value
``16K`` (1532 decimal), not ``36r16`` modified by a suffix ``K``. The scaling
suffixes are also case insensitive.
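The contrast between a scaled decimal literal and a base 36 literal that
happens to end in ``K`` can be seen directly:

.. sourcecode:: unicon

    procedure main()
        write(16K)       # 16384: decimal 16 scaled by 1024
        write(36r16K)    # 1532: here K is the base 36 digit for twenty
        write(2m)        # 2097152: lower case m works the same as M
    end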
Positive and negative
.....................
The ``+`` (plus) and ``-`` (minus) signs can be used with any of these
literals, and come before any radix specifier.
.. sourcecode:: unicon
    -2r101010 = -42
    -16r2A = -42
    -16R2a = -42
    +36r16 = +42
There is no such thing as a negative base in ``Unicon``, so the sign always
affects the value, never the radix (or the meaning of the scaling suffix).
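A quick check that the sign applies to the finished value, after any radix or
scaling has been worked out:

.. sourcecode:: unicon

    procedure main()
        write(-16r2A)       # -42: minus applied to the value of 16r2A
        write(-2r101010)    # -42
        write(-1K)          # -1024: the sign applies to the scaled value
    end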
.. index:: integer; large
Arbitrary magnitude
...................
One of the nicer features of Unicon is the unlimited integer size.
.. sourcecode:: unicon
n := 123456789012345678901234567890123456789012345678901234T
write(n)
Gives that long literal (scaled by tera, ``T``, :math:`1024^4`), which displays as
::
135742175046962388768696238876869623887686962388768695614475075584
Unicon Integer data is *always* exact, regardless of magnitude, *unless you
run out of memory or some other error condition is triggered*.
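The following sketch contrasts exact Integer arithmetic with Real arithmetic
at the same magnitude. It assumes a Unicon build with the ``large integers``
feature compiled in.

.. sourcecode:: unicon

    procedure main()
        big := 2^100
        write(big)                        # 1267650600228229401496703205376
        write(*string(big))               # 31: the number of digits
        write((big + 1) - big)            # 1: Integer arithmetic stays exact
        write((2.0^100 + 1) - 2.0^100)    # 0.0: the + 1 is lost in a Real
    end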
Play nice
.........
Given all that flexibility, do yourself, and everyone else, a favour and stick
with literals that conceptually make sense for the task at hand. Don't use
base thirteen literals, just because you can. Stick with the ten fingers for
most code, and go to another radix only when it makes sense. Use ``8r`` octal
numbers when dealing with things like Unix permissions, or ``2r`` for bit
patterns. Base twenty-three literals will just cause confusion for no reason,
and slow down everyone who wants to read through your program sources.
*Using* ``unicon`` *is proof enough that you are a smart cookie.*
.. index:: !Real, datatype; real
.. _Real:
Real numbers
------------
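Unicon also supports double precision floating point real numbers, with
literals written in decimal (base ten).

A Real literal is an optional ``+`` or ``-`` sign, digits, a decimal point
(period), more digits, and an optional exponent made of ``E`` followed by an
optionally signed integer. As the examples show, the digits on one side of the
decimal point may be left out.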
.. sourcecode:: unicon
write(1.23)
write(.23)
write(1.)
write(1.23E42)
write(-1.23E-42)
Floating point is an inexact science. The internal representation is an
approximation brought on by differences between binary and decimal notation.
``0.5`` is exact, in both decimal and base two arithmetic, but many values,
such as ``0.1`` or ``0.3``, have no exact binary representation and can cause
problems when scaled, multiplied and divided. Be wary of precision and
rounding errors, and keep a healthy skepticism when dealing with floating
point double precision values.
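A small sketch of why exact equality tests on Real values are risky. The
tolerance of ``1.0e-9`` is an arbitrary choice for illustration; a suitable
tolerance depends on the problem at hand.

.. sourcecode:: unicon

    procedure main()
        write(0.5 + 0.25)    # 0.75: both values are exact in binary
        x := 0.1 + 0.2
        # 0.1, 0.2 and 0.3 have no exact binary representation
        if x = 0.3 then write("equal") else write("not equal")    # not equal
        # compare against a tolerance instead of testing for equality
        if abs(x - 0.3) < 1.0e-9 then write("close enough")      # close enough
    end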
*Don't rely on floating point math for financial calculations. Use fixed
point integer math or use something like* :ref:`GnuCOBOL` *for problems that
require bank safe computations.*
Having said that, there *is* science behind floating point representation and
for most engineering problems, *close enough*, is usually a realistic
expectation.
.. index:: coercion; floating point
Floating point coercion
.......................
Unicon picks integer or floating point arithmetic based on the datatypes
forming the computation. When an Integer is combined with a Real, the Integer
is promoted to Real before the calculation is done.
.. sourcecode:: unicon
write("Integer division")
every i := 1 to 8 do write(2 / i)
write()
write("Real division")
every i := 1 to 8 do write(2.0 / i)
These give two completely different sets of output. The first ``every`` loop
uses integer division, fractions lost and mostly zeros. The second uses
floating point math, the ``2.0`` literal *forcing* the floating point
computation, and a ``Real`` result:
::
prompt$ unicon numbers.icn -x
Integer division
2
1
0
0
0
0
0
0
Real division
2.0
1.0
0.6666666666666666
0.5
0.4
0.3333333333333333
0.2857142857142857
0.25
Any ``Real`` value in a computational input will cause a ``Real`` result.
When all the input values are ``Integer``, the result is Integer, even if it
seems like it should cause a fractional answer.
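The built-in ``type()`` function makes the result datatype visible; a Real on
either side of the operator is enough to get a Real result.

.. sourcecode:: unicon

    procedure main()
        write(7 / 2, " ", type(7 / 2))        # 3 integer
        write(7 / 2.0, " ", type(7 / 2.0))    # 3.5 real
        write(7.0 / 2, " ", type(7.0 / 2))    # 3.5 real
        write(2 * 3.0)                        # 6.0
    end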
.. index:: coercion; string to numeric
String to number coercion
.........................
Unicon does similar implicit coercion of data types when given a ``String``
value as part of a numeric expression. If a ``String`` can safely be
converted to a numeric value, ``Integer`` or ``Real``, it will be. If not, a
run-time error is raised.
.. sourcecode:: unicon
write("String as Integer division")
every i := 1 to 4 do write("2"/i)
write()
write("String as Real division")
every i := 1 to 4 do write("2.0"/i)
write()
write("String as garbage division")
every i := 1 to 4 do write("2.o"/i)
Gives::
String as Integer division
2
1
0
0
String as Real division
2.0
1.0
0.6666666666666666
0.5
String as garbage division
Run-time error 102
File numbers.icn; Line 40
numeric expected
offending value: "2.o"
Traceback:
main()
{"2.o" / 1} from line 40 in numbers.icn
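To avoid the run-time error, a program can test a string first. This sketch
uses the built-in ``numeric()`` function, which converts its argument to an
Integer or Real, and fails (instead of raising an error) when it cannot.

.. sourcecode:: unicon

    procedure main()
        every s := "2" | "2.0" | "2.o" do {
            if n := numeric(s) then
                write(image(s), " converts to ", image(n))
            else
                write(image(s), " is not numeric")
        }
    end

This should write ``"2" converts to 2``, ``"2.0" converts to 2.0`` and
``"2.o" is not numeric``.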
Explicit type conversions
.........................
Conversion of data types can also be explicit.
.. sourcecode:: unicon
    write("String as explicit Real division")
every i := 1 to 4 do write(real("2")/i)
Produces::
String as explicit Real division
2.0
1.0
0.6666666666666666
0.5
The built-in functions ``real(x)`` and ``integer(x)`` can be used to convert
``String``, ``Real`` and ``Integer`` data to the given numeric form; unlike
implicit coercion, they fail (rather than raise a run-time error) when the
value cannot be converted. ``integer(r)`` is a truncating conversion:
``integer(2.3)`` and ``integer(2.9)`` both return 2. ``real(i)`` may lose
precision once the integer value exceeds what a double precision floating point
value can hold exactly (beyond :math:`2^{53}`, about 16 decimal digits).
.. sourcecode:: unicon
n := 123456789012345678901234567890123456789012345678901234
write(n)
write(real(n))
write(integer(real(n)))
Shows as
::
123456789012345678901234567890123456789012345678901234
1.234567890123457e+53
123456789012345677902421375322642595439917720609488896
Precision is lost after about 16 significant decimal digits. Don't be firing
any NASA spacecraft out toward Jupiter without very careful consideration of
floating point Real number precision and accuracy. Unicon is not at fault
here, it is in the nature of approximation with floating point
representations.
Again, in most day to day real world operations, unless you are trying to fire
a rocket with sub-nanometre accuracy across a trillion kilometre distance,
Unicon Real values will be close enough. For pure mathematics? *Not even in
the same ball park*.
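The truncation and failure behaviour of the explicit conversion functions can
be checked with a short program:

.. sourcecode:: unicon

    procedure main()
        write(integer(2.3))        # 2
        write(integer(2.9))        # 2: the fraction is truncated, not rounded
        write(integer("42"))       # 42
        write(real("42"))          # 42.0
        # these functions fail, instead of raising a run-time error,
        # when the value cannot be converted
        if not integer("forty-two") then write("not convertible")
    end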
.. index:: !Cset, datatype; cset
.. _Cset:
Cset
----
A character set, and a highly efficient datatype for pattern matching.
Csets are limited to single byte values, 0 through 255. There are no
duplicates within a Cset. Cset literals use single quotes in Unicon.
.. literalinclude:: examples/cset-samples.icn
:language: unicon
:start-after: ##+
.. only:: html
.. rst-class:: rightalign
:download:`examples/cset-samples.icn`
Sample run:
.. command-output:: unicon -s cset-samples.icn -x
:cwd: examples
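A few more Cset operations, as a sketch: duplicates collapse, a Cset written
as a string comes out in character code order, and the ``++`` (union), ``**``
(intersection) and ``--`` (difference) operators combine Csets. The keyword
``&digits`` is the built-in Cset of the ten decimal digits.

.. sourcecode:: unicon

    procedure main()
        c := 'hello'
        write(*c)                  # 4: the duplicate l is dropped
        write(c)                   # ehlo: written in character code order
        write('abc' ++ 'cde')      # abcde: union
        write('abc' ** 'cde')      # c: intersection
        write('abc' -- 'cde')      # ab: difference
        write(upto(&digits, "Unicon 13"))    # 8: position of the first digit
    end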
.. index:: !String, datatype; string
.. _String:
String
------
This is where Unicon shines. String manipulation is at the heart and soul of
:ref:`Icon`, Unicon, and back all the way to :ref:`SNOBOL`. Pattern matching,
string scanning, slicing, dicing and transformation operations you probably
haven't even thought of yet. All nicely packaged in the Unicon executable,
very clearly, and very concisely. Ralph Griswold was one of those genius
level computer scientists, who led the core Icon developers to exceed
expectations and go above and beyond the norm. Clint Jeffery and his team are
now pushing those expectations out even further with Unicon. *Release 13 of
Unicon has SNOBOL inspired pattern matching operations built in, a huge
testament to the legacy and future of Unicon programming. More on that in
the* :ref:`SNOBOL patterns ` *chapter.*
In ``Unicon``, String data is *immutable*, and is never changed in place. New
String data is created, as required.
.. sourcecode:: unicon
s[2:3] := "D"
That expression does not modify the character of ``s`` at index 2 in place;
it is equivalent to the expression:
.. sourcecode:: unicon
s := s[1:2] || "D" || s[3:0]
This creates a new string by copying the existing parts. *Internal memory
management means that this is a safe thing to do, as many millions of times as
an algorithm may need. The heaps are efficiently managed by the Unicon
runtime*.
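Because a new string is built, any other variable still holding the old string
is unaffected. The variable names here are only illustrative.

.. sourcecode:: unicon

    procedure main()
        s := "abc"
        t := s            # t holds the same string value
        s[2:3] := "D"     # builds a new string and assigns it to s
        write(s)          # aDc
        write(t)          # abc: t is unaffected
    end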
.. index:: indexing, subscripts
.. _indexing:
Indexing
........
String positions used for indexing (or *subscripting*) are counted using a
cursor that floats "between" characters. Indexing starts at 1, with the
virtual cursor positioned before the first character. Position 0 is the end of
the string, and negative positions count backwards from the end: -1 is the
position between the last two characters of the string.
For instance::
"ABC"
That string has positions: 1, 2, 3, 4 (or 0) counting from start to end.
Using negative indexing, the positions are: -3, -2, -1, and 0, counting from
the end to the start going backwards.
.. blockdiag::
blockdiag indexing {
start [shape=beginpoint];
a [label="A", shape="square"];
b [label="B", shape="square"];
c [label="C", shape="square"];
end [shape=endpoint];
start -> a [label="1"];
a -> b [label="2"];
b -> c [label="3"];
c -> end [label="4 / 0"];
}
From the end that is
.. blockdiag::
blockdiag backwards {
start [shape=beginpoint];
a [label="A", shape="square"];
b [label="B", shape="square"];
c [label="C", shape="square"];
end [shape=endpoint];
start -> a [label="-3", dir="back"];
a -> b [label="-2", dir="back"];
b -> c [label="-1", dir="back"];
c -> end [label="0", dir="back"];
}
It is important to get used to the idea of position values being *before*,
*between* and *after* characters, and that zero is the position past the end
of a string.
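Subscripts and slices of ``"ABC"`` put those positions to work. A single
subscript ``s[i]`` selects the character just after position ``i``, and a
slice ``s[i:j]`` selects the characters between two positions.

.. sourcecode:: unicon

    procedure main()
        s := "ABC"
        write(*s)         # 3: the size of the string
        write(s[1])       # A: the character after position 1
        write(s[-1])      # C: the character after position -1
        write(s[1:3])     # AB: between positions 1 and 3
        write(s[2:0])     # BC: from position 2 to the end
        write(s[-2:0])    # BC: the same slice, counted from the end
        write(s[4:0])     # an empty line: positions 4 and 0 are both the end
    end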
Pattern
-------
Unicon now supports SNOBOL based pattern data.
Regular Expression
------------------
Along with patterns, Unicon has also added regular expression (regex) literals
for string matching. Only basic regex features are supported at this point.
Non computational types
=======================
Unicon also supports a variety of *non computational* types. These range from
File and Window to other internally managed, usually *immutable*, types.
.. note:: Unicon does not have pointers, but does manage internal references.
.. index:: File
.. _file:
File
----
A File is the value returned from :ref:`open` when using file modes such as
``r`` (read), ``w`` (write) or ``a`` (append).
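A minimal sketch of writing and then reading back a file. The file name
``example.txt`` is hypothetical. ``open`` fails when the file cannot be
opened, so the ``| stop(...)`` alternative reports the problem and ends the
program.

.. sourcecode:: unicon

    procedure main()
        local f, line
        # "example.txt" is a hypothetical file name, for illustration only
        f := open("example.txt", "w") | stop("cannot open example.txt for writing")
        write(type(f))            # file
        write(f, "first line")    # a File as the first argument selects the output
        write(f, "second line")
        close(f)

        f := open("example.txt", "r") | stop("cannot open example.txt for reading")
        # read() fails at end of file, which ends the loop
        while line := read(f) do write(line)
        close(f)
    end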
.. index:: Window
.. _window:
Window
------
A graphics context, returned from :ref:`open` when using the ``g`` (graphics)
file mode.
Co-expression
-------------
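A very handy code datatype: a co-expression captures an expression so that
its results can be produced one at a time, on demand, with the ``@``
activation operator.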
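As a small sketch, ``create`` wraps a generator in a co-expression, and each
``@`` activation resumes it for the next result:

.. sourcecode:: unicon

    procedure main()
        # create captures the expression without evaluating it
        c := create (1 to 3)
        write(type(c))    # co-expression
        write(@c)         # 1: each activation produces the next result
        write(@c)         # 2
        write(@c)         # 3
        # once the expression runs out of results, activation fails
        if not @c then write("exhausted")
    end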
Next up, structures
...................
``Icon`` and ``Unicon`` then add a very nice set of aggregate :ref:`structures
` and other high level datatypes. Most of these types are
*mutable*: their contents can be changed in place, rather than copied the way
:ref:`string` data is.
From this point on, Icon won't be mentioned as often; this is a Unicon book.
.. only:: html
..
----------
:ref:`genindex` | Previous: :doc:`unicon` | Next: :doc:`structures`