.. index:: blog; 2017/01/02

.. Modified: 2017-01-07/02:09-0500

Unicon FFI
==========

Well met,

Let 2017 be the year of ``uniffi``, the Unicon Foreign Function Interface.

Ok, Unicon already has a Foreign Function Interface.  `loadfunc` and similar C
function interfacing have been in Unicon since its inception, dating back to
at least Icon version 8.10, March of 1993.  There were two C interfaces
documented for that release: outbound `callout`, and inbound ``icon_call``.

Sadly, the inbound code in Unicon for ``icon_call`` is no longer available,
*but read on for a possible future, perhaps better alternative*.

The outbound interface `callout` is still in Unicon version 13, but requires a
special build of the entire compiler/runtime system to replace an internal
stub function called ``extcall``, in :file:`src/runtime/extcall.r`, which by
default just returns error code 216.  Anyone is free to dig into this
interface, which is actually fairly well documented by `ralph` in IPD217,
http://www2.cs.arizona.edu/icon/ftp/doc/ipd217.pdf

It's old, and usable, but all the recent activity has been focused on
`loadfunc`.  A small layer of code was added in Icon version 9 (the base Icon
used for the Unicon core; much has changed in Unicon since then) to load C
function entry points at runtime from dynamic shared object libraries.  And
`loadfunc` was born.  Foreign functions can be loaded into Unicon at runtime
without the special builds that ``extcall`` and `callout` require.

There are a lot of `loadfunc` examples peppered throughout the Unicon
Programming document set.  It opens up doors to C libraries, which are
numerous and ubiquitous.

One issue with `loadfunc` is that the functions called have to comply with a
Unicon calling convention.  Routines are passed an ``argc argv`` style Unicon
frame, using a count of passed in descriptors.  These descriptors need to be
manually converted to C native data, passed on to other C routines, and then
converted back to Unicon data types for returning results.  There are copious
examples of managing this protocol, and support macros in
:file:`ipl/cfuncs/icall.h` that make this all pretty easy.  But, it is still
an extra layer of burden placed on a Unicon programmer aiming to use an
existing C library solution to a problem, or for a speed boost.
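To give a feel for the shape of that protocol, here is a minimal sketch in
:t:`C`.  The ``descriptor`` type below is a simplified, hypothetical stand-in
(the real layout and the conversion macros live in :file:`icall.h`), and
``add_wrapper`` and ``add_longs`` are made up names.  It assumes the result is
handed back in ``argv[0]``, the arguments arrive in ``argv[1]`` onward, and a
zero return means success; check :file:`icall.h` before relying on any of that.

.. sourcecode:: c

    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical, simplified stand-in for a Unicon descriptor.
       The real definition is in ipl/cfuncs/icall.h. */
    typedef enum { T_INTEGER, T_REAL } tag_t;
    typedef struct {
        tag_t tag;
        union { long i; double r; } v;
    } descriptor;

    /* The existing C routine we actually want to reach. */
    long add_longs(long a, long b) { return a + b; }

    /* Wrapper in the argc/argv style: check the argument count and types,
       convert to native C data, call, then convert the result back.
       101 is only an illustrative nonzero error code. */
    int add_wrapper(int argc, descriptor *argv)
    {
        if (argc < 2)
            return 101;
        if (argv[1].tag != T_INTEGER || argv[2].tag != T_INTEGER)
            return 101;

        long a = argv[1].v.i;            /* descriptor -> C long */
        long b = argv[2].v.i;

        argv[0].tag = T_INTEGER;         /* C long -> result descriptor */
        argv[0].v.i = add_longs(a, b);
        return 0;
    }

    int main(void)
    {
        descriptor frame[3];             /* slot 0 is the result */
        frame[1].tag = T_INTEGER; frame[1].v.i = 40;
        frame[2].tag = T_INTEGER; frame[2].v.i = 2;

        int rc = add_wrapper(2, frame);
        assert(rc == 0 && frame[0].v.i == 42);
        printf("rc=%d result=%ld\n", rc, frame[0].v.i);
        return 0;
    }

Every C function needs a wrapper along these lines, which is the burden
being described.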

And now a step up.

libffi
------

``libffi`` is a foreign function interface library that manages the call
frame setup for all kinds of different calling conventions.  32bit, 64bit and
many different operating systems are all supported.  This layer was put to use
to alleviate the need to use `loadfunc` for many/most/all C functions that a
Unicon programmer may want to call.  Once loaded (the experimental
``native(...)`` function is not built into Unicon, so it uses `loadfunc` to
bootstrap), all a Unicon programmer needs to do is call ``native``:

.. sourcecode:: unicon

    dlHandle := addLibrary("libraryName")

    result := native("function", returnType, arguments,...)
    more := native("otherFunction", returnType, arguments,...)
    ...

And that's it.  Under the covers the ``native`` function finds an entry point
(usually after a supporting call to ``addLibrary``, which takes the name of a
Dynamic Shared Object module archive (a DLL)), marshals the :t:`Unicon`
arguments for use by :t:`C`, and dispatches a call/return sequence.
Results from :t:`C` are converted to the specified Unicon ``returnType`` and
passed back to :t:`Unicon`.  Almost all of this becomes invisible to the Unicon
programmer.  All you need to do is call ``native`` with a function link name
and arguments.  Almost all :t:`C` native data types are supported.
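The lookup, marshal, dispatch and convert steps can be sketched in plain
:t:`C`.  This is *not* how ``libffi`` or ``native`` work inside: ``libffi``
builds the call frame dynamically for any signature, while this toy fakes it
with a fixed table of two known signatures.  Every name here (``value``,
``entry``, ``call_by_name``, ``triple``, ``half``) is hypothetical and only
there for illustration.

.. sourcecode:: c

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical tagged value standing in for a Unicon Integer or Real. */
    typedef enum { V_INTEGER, V_REAL } vtag;
    typedef struct { vtag tag; long i; double r; } value;

    /* Two ordinary C functions playing the part of a loaded library. */
    static int    triple(int n)  { return 3 * n; }
    static double half(double x) { return x / 2.0; }

    /* One table entry per callable: link name, signature kind, address. */
    typedef enum { SIG_INT_INT, SIG_DOUBLE_DOUBLE } sigkind;
    typedef struct { const char *name; sigkind sig; void (*fn)(void); } entry;

    static const entry table[] = {
        { "triple", SIG_INT_INT,       (void (*)(void))triple },
        { "half",   SIG_DOUBLE_DOUBLE, (void (*)(void))half   },
    };

    /* Find the entry point, marshal the argument to C, make the call, and
       convert the C result back.  Returns 0 on success, -1 otherwise. */
    int call_by_name(const char *name, value arg, value *result)
    {
        for (size_t k = 0; k < sizeof table / sizeof table[0]; k++) {
            if (strcmp(table[k].name, name) != 0)
                continue;
            switch (table[k].sig) {
            case SIG_INT_INT: {
                if (arg.tag != V_INTEGER)
                    return -1;
                int (*f)(int) = (int (*)(int))table[k].fn;
                result->tag = V_INTEGER;
                result->i = f((int)arg.i);
                return 0;
            }
            case SIG_DOUBLE_DOUBLE: {
                double (*f)(double) = (double (*)(double))table[k].fn;
                double x = (arg.tag == V_REAL) ? arg.r : (double)arg.i;
                result->tag = V_REAL;
                result->r = f(x);
                return 0;
            }
            }
        }
        return -1;                       /* no such link name */
    }

    int main(void)
    {
        value r;
        value n = { V_INTEGER, 14, 0.0 };
        value x = { V_REAL, 0, 9.0 };

        assert(call_by_name("triple", n, &r) == 0 && r.i == 42);
        printf("triple -> %ld\n", r.i);
        assert(call_by_name("half", x, &r) == 0 && r.r == 4.5);
        printf("half -> %g\n", r.r);
        return 0;
    }

The table is exactly the part that does not scale: each new signature needs
another ``case``.  Building that frame at runtime is the job ``libffi`` takes
over.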

And that is a wrinkle. :t:`C` call frames need to know the exact type of each
argument, and what type to return (including nothing, termed ``void``).  For
many types, ``native`` can just convert to reasonable :t:`C` types.  Integer
to ``int``, Real to ``double``, String to ``char *`` etc, using the handy
macros built into :file:`icall.h`.  Sometimes this is wrong.  :t:`C`
has more than one floating point type, most commonly 32 bit ``float`` and 64
bit ``double``.  There are also distinctions for 8bit, 16bit, 32bit, 64bit
integers, in both signed and unsigned forms.  :t:`Unicon` just has Integer and
Real.

``native`` allows for type overrides in the function call, using two element
lists.

.. sourcecode:: unicon

    result := native("function", TYPEFLOAT, [x, TYPEFLOAT], [y, TYPEFLOAT])

The real values from :t:`Unicon` are demoted to :t:`C` ``float`` data, and the
returned ``float`` is promoted to an acceptable :t:`Unicon` `real`
form.
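The demotion is not free.  A small :t:`C` sketch of what the round trip does
to a value, assuming the usual case where a Unicon real is held as a :t:`C`
``double``; ``demote`` and ``promote`` are hypothetical helper names, not part
of ``native``:

.. sourcecode:: c

    #include <assert.h>
    #include <stdio.h>

    /* Demote a double (standing in for a Unicon real) to a C float. */
    float demote(double x) { return (float)x; }

    /* Promote a C float result back to a double. */
    double promote(float x) { return (double)x; }

    int main(void)
    {
        double exact = 0.5;      /* exactly representable as a float */
        double inexact = 0.1;    /* not exactly representable in binary */

        /* 0.5 survives the round trip, 0.1 loses precision on the way. */
        assert(promote(demote(exact)) == exact);
        assert(promote(demote(inexact)) != inexact);

        printf("%.17g -> %.17g\n", exact, promote(demote(exact)));
        printf("%.17g -> %.17g\n", inexact, promote(demote(inexact)));
        return 0;
    }

So a ``TYPEFLOAT`` override is only worth using when the :t:`C` prototype
really does say ``float``.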

These type specifications can be freely mixed:

.. sourcecode:: unicon

    result := native("mixed", TYPEINT, [x, TYPEFLOAT], [y, TYPEDOUBLE])

That assumes that ``mixed`` has a :t:`C` prototype of ``int mixed(float x, double
y)`` and makes the proper arrangements for the function call, returning an
Integer result back to Unicon.

.. note::

    Please note that this experiment is at a very early stage, and some of the
    type constant names, and argument lists may change before this ever gets
    accepted into Unicon proper; if it ever gets accepted.

libharu
.......

This entire exercise started with a desire to integrate PDF generation in
Unicon by leveraging ``libharu``, the PDF writer library.  There are dozens
of functions in ``libharu``, and each one would have required a small
``loadfunc`` call convention wrapper written in :t:`C`.  That
led to an initial version of ``native()`` that took on the task of preparing a
:t:`C` call frame using inline assembler, which works, but is limited to
x86_64 System V call conventions.  See `cnative` for that blurb.

After finishing a trial of `cnative`, ``libffi`` was discovered.  It does the
same job as `cnative` and far more: there is a single interface, no burden to
write umpteen dozen small pieces of assembler to support the various platforms
that Unicon is currently built to run on, and it is well supported by a team of
experts in the area of foreign function calls.

Here is what the ``libharu`` integration example looks like:

.. literalinclude:: ../programs/uniffi/haru.icn
   :language: unicon
   :start-after: ##+

.. only:: html

    .. rst-class:: rightalign

        :download:`../programs/uniffi/haru.icn`

Fairly short, and sweet.

This sample barely scratches the surface of ``libharu`` features (simply
drawing a partial arc, filled in red). What it highlights is that the calls
occurred with no extra :t:`C` source required.

This is where the excitement might start to build.  :t:`Unicon` programmers
can focus on :t:`Unicon`, leaving :t:`C` to the :t:`C` folk.

Here is a small GnuCOBOL program that was used during testing:

.. literalinclude:: ../programs/uniffi/cobolnative.cob
   :language: cobol
   :start-after: *>+<*

.. only:: html

    .. rst-class:: rightalign

        :download:`../programs/uniffi/cobolnative.cob`

The Unicon caller:

.. literalinclude:: ../programs/uniffi/cobffi.icn
   :language: unicon
   :start-after: ##+

.. only:: html

    .. rst-class:: rightalign

        :download:`../programs/uniffi/cobffi.icn`


And a sample run:

.. command-output:: cobc -m -Wno-unfinished cobolnative.cob
   :cwd: ../programs/uniffi/

.. command-output:: unicon -s cobffi.icn -x
   :cwd: ../programs/uniffi/

``libffi`` makes calling GnuCOBOL modules from Unicon a complete breeze.

Next steps
..........

I plan on pestering Clinton and Jafar, and whoever else will listen, to help
polish this up, and hopefully get it added to the Unicon build system proper.
It currently lacks some features; not all datatypes are properly supported and
there needs to be some deep discussion about how indirect data references
(:t:`C` pointers) should be handled (they cannot be allowed to change
immutable Unicon data, so an interstitial layer will need to be worked out).
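One possible shape for that interstitial layer, purely as a sketch and not a
design decision: hand the :t:`C` function a private scratch copy, so that
whatever it scribbles on never touches the original.  ``call_with_copy`` and
``shout`` are hypothetical names, and a plain :t:`C` string stands in for the
immutable Unicon data.

.. sourcecode:: c

    #include <assert.h>
    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* A typical C routine that modifies its argument in place. */
    void shout(char *s)
    {
        for (; *s; s++)
            *s = (char)toupper((unsigned char)*s);
    }

    /* Copy the immutable input, let fn work on the copy, and return the
       copy.  The caller owns the result and must free() it.  Returns NULL
       if memory cannot be allocated. */
    char *call_with_copy(const char *immutable, void (*fn)(char *))
    {
        size_t n = strlen(immutable) + 1;
        char *scratch = malloc(n);
        if (scratch == NULL)
            return NULL;
        memcpy(scratch, immutable, n);
        fn(scratch);
        return scratch;
    }

    int main(void)
    {
        const char *original = "unicon";   /* stands in for immutable data */
        char *changed = call_with_copy(original, shout);
        if (changed == NULL)
            return 1;

        assert(strcmp(original, "unicon") == 0);   /* source untouched */
        assert(strcmp(changed, "UNICON") == 0);
        printf("%s -> %s\n", original, changed);
        free(changed);
        return 0;
    }

The cost is an extra copy per call, and it says nothing yet about structures
that contain pointers of their own; that is part of the discussion still to
be had.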

I'd be honoured to continue this with a formal Unicon Technical Report, and
will do so if that's what it takes to advance this flag.

On the other side of the coin...

C calling Unicon
----------------

The ``unicon -C`` native compile sequence is pretty handy.  It creates a
native executable by generating C source code and compiling that intermediate
into a native binary.  The one point lacking is that it assumes a ``main`` is
generated from the Unicon side, and does all the linking steps assuming that
point of view.  I'd like to extend ``unicon -C`` with a new compile time
option (something like ``--no-main`` or ``--object``, or ``-c`` meaning
compile/don't link, but the ``-c`` idea was deemed to conflict with the
current meaning of *generate ucode*) to produce object code, ready for
linking to other programs.

Initial trials of this have been proven (in a hack sort of way) by editing
the :t:`C` code generated by ``unicon -C`` to rename *main* to *somecode*,
and then dropping the link phase from the ``gcc`` invocation so that it
simply generates an object file with ``gcc -c``.  That code was
then linked to a GnuCOBOL test program, and :t:`Unicon` was called, data
passed in, results returned.

The hack even went as far as returning a pointer
to the :t:`Unicon` `global` variable structure that is part of native
executables, but that part would not be part of any production level release.
First a shared memory space sequence would be worked out, instead of
pointers into :t:`Unicon` space (which can be garbage collected and moved at
any time, outside normal control of a developer).

:t:`Unicon` object files (meaning ``.o`` files, not class objects) will
alleviate some of the need to resurrect ``icon_call`` to allow :t:`C` programs
to call :t:`Unicon` programs.  :t:`Unicon` will then be able to take part in
all forms of mixed language programming.  Shareable libraries could be created
that will allow foreign languages to enjoy direct benefit from :t:`Unicon`
language features without knowing anything about :t:`Unicon` source code.
*Though one of the goals will be to demonstrate how easy that code is to read
and write*.

The first round of experiments relied on statically linking to the :t:`Unicon`
runtime system, but another phase may provide for a :file:`libunicon.so`
that could be dynamically linked into these callable :t:`Unicon` modules.
This would make for very small, easy to manage :t:`Unicon` application level
link libraries (or singleton object files). 

Continuing this experiment has been given the nod by `clint`, but there are
many details to work out, and it won't be part of :t:`Unicon` until the entire
sequence is ready at a level of quality expected by :t:`Unicon` developers.
There will be copious amounts of documentation available during the design,
development and implementation stages.

There are lots of things to discuss, and many possibilities await.

You can follow along in the SourceForge Discussion pages at

https://sourceforge.net/p/unicon/discussion/contributions/

*Have good, make well, happiest of 2017s*


.. post:: Jan 02 2017
   :tags: uniffi
   :category: extension
   :author: Brian Tiffin
   :location: on.ca
   :language: en