Building GNAT on MUSL, now always static

May 28th, 2018

An update on the previous version.

The produced gcc compiler builds static executables and no dynamically linked executables1.

For detailed instructions in how to run the script see the readme-2018-05-28.txt.

Updated!, the 2018-05-28 file contained a broken patch

New version!

  1. Building dynamically linked executables is controlled by a specfile. GCC has a builtin specfile, code for this file can be found in the gcc/config directory of the gcc source. The C++ and Ada front-ends have a slightly different handling of these settings and for these also some code had to be changed. The changes can be found in the 'gcc-4.9.adacore2016-2-musl.diff' file. []

Building GNAT on MUSL, now with partial and parallel build support

May 15th, 2018

An update on the previous version.

The previous version needed git for getting the source of libelf-compat, this was ripped out1.
The previous version cleaned and reused the build directory for different stages, this was changed to use a separate build directory for each stage and no more cleanup. Last, the script now builds with parallel make options, further speeding up the build process2.

  1. A tarred version of libelf-compat does exist on the internets, however that version does not match the one in the git repository. []
  2. The problem with building in parallel seemed to be with some of the ada specific build rules. But changing those rules did nothing to fix the problem. In fact, the installation of a previous step had failed. Which was a direct result of using the environment variable MAKEFLAGS, this environment variable is used in the scripts but also read in the make program. So, make install was run with parallel jobs and promptly failed. The actions of one of the rules used a variable in a loop and that variable was changed by an action in another rule. The fix was 2-fold, use MAKEOPTS, change the install step to always use make -j1. []

Building GNAT on MUSL, now with ARM 64-bit support

April 30th, 2018

An update on the previous version. I thought that version already supported ARM 64-bit processors1, but unfortunately it did not.

So another debug cycle ensued. As it turned out the code that is used to generate aarch64 instructions had a wrong #ifdef line. Once this bug was fixed, the next bug cropped up and with a mean time between a possible fix and the correct fix taking days, the whole exercise took weeks2. After a week or so, the cross-compiler seemed to work. Next, I wanted to compile a native compiler for the target platform with the help of the cross-compiler. Again, time was spent, and fortunately a compiler could be build and after some more work3 it can build the FFA code on aarch64 systems.

Some small things are left to do; the scripts can do with some clean-up and the native compilers are not tarred to a file, finally, I want to try little-endian ppc.

Update:The native compilers do not contain the gprbuild tool, this is still on the todo list.

  1. Build scripts designate these processors with 'aarch64'. []
  2. Try the fix, start a rebuild which will take an hour if unsuccessful and 3 if successful, look at it after half a day or more, see that it failed, try something else, etc. etc. []
  3. It also needs an assembler and a C library []

Sending arrays of octets between C and Ada

March 2nd, 2018

The Ada language and the C language have a very different concept of strings and characters. I'll try to follow Ossasepia and use the term octet for an 8 bit integer and use char for C and Character for Ada. In Ada the Character is defined as an enumerated type ranging from the Ada.Characters.Latin_1.NUL (Character'Val (0)) character to the Ada.Characters.Latin_1.LC_Y_Diaeresis (Character'Val (255)) character. This range is exactly the same as the range of all valid octets and so characters can be stored as octets. As characters are supposed to represent another domain as natural numbers these need to be converted back and forth through the Character'Val and Character'Ord functions. In C all is a lot more muddier and a char can either be seen as an 8 bit integer or as a symbol for a language, it all depends on the context. So far for context, now to address the cause, how to convert arrays of characters between C and Ada1. I consider the Char_Ptr support from the ada standard library out of bounds2, this investigation is based on the 'char_array' type from the Ada package `Interfaces.C. First, the code;

The code defines 5 different methods to interact with strings between C and Ada;

  1. The basic method, call a function from Ada to C. The character_array in Ada, will turn into a char * pointer in C. A parameter needs to be added to pass the length of the array.
  2. The basic method, now two way. Ada will call C and C will immediately call Ada. The Ada function uses a constrained character array so no count parameter is needed for it.
  3. A GNAT specific method, import a C function but use the Ada calling convention. The character_array in Ada will be a structure in C. This layout of this structure is based on how GNAT does this internally.
  4. Call to C as in (1), but then call Ada as in (3). Note that Ada methods can be exported using the Ada calling convention.
  5. Like (2) but now the Ada procedure does not have a constrained character array as parameter but an unconstrained one, so a count parameter is needed for Ada too.

First, to define the procedures (please also read the calling conventions section of the GNAT documentation):

with Interfaces.C; use Interfaces.C;

package C_Array is

        -- The basic method, call C using a pointer and a count
        procedure C_Fill_1(CH : in out char_array; Count : Integer);
        pragma Import(C, C_Fill_1, "c_fill_1");

        -- Same method as 'C_Fill_1', but the C function will call Ada.
        procedure C_Fill_2(CH : in out char_array; Count : Integer);
        pragma Import(C, C_Fill_2, "c_fill_2");

        -- Same method as 'C_Fill_1', but the C function will call Ada using Ada calling conventions
        procedure C_Fill_3(CH : in out char_array; Count : Integer);
        pragma Import(C, C_Fill_3, "c_fill_3");

        -- Call to C using Ada calling conventions
        procedure C_Fill_4(CH : in out char_array);
        pragma Import(Ada, C_Fill_4, "c_fill_4");

        -- Same method as 'C_Fill_1', the C function will call Ada using an unconstrained array and a count.
        procedure C_Fill_5(CH : in out char_array; Count : Integer);
        pragma Import(C, C_Fill_5, "c_fill_5");

        -- For method 2, the C function will make a call to a function with a constrained array parameter
        subtype constrained_char_array is char_array(0 .. 100);
        procedure ADA_Fill_2(CH : in out constrained_char_array);
        pragma Export(C, ADA_Fill_2, "ada_fill_2");

        -- For method 3, the C function will make a call to Ada usinging Ada calling conventions.
        procedure ADA_Fill_3(CH : in out char_array);
        pragma Export(Ada, ADA_Fill_3, "ada_fill_3");

        -- For method 5, the C function will make a call to a function with a unconstrained array parameter
        subtype constrained_char_array is char_array(0 .. 100);
        procedure ADA_Fill_5(CH : in out char_array; Count: Integer);
        pragma Export(C, ADA_Fill_5, "ada_fill_5");

end C_Array;

Next, the more interesting bit, the C functions;

#include <stdint.h>
#include <stdio.h>

typedef struct B {
        size_t LB0;
        size_t UB0;
} B_t;

typedef struct U {
        char * P_ARRAY;
        B_t * P_BOUNDS;
} U_t;

void ada_fill_2(char * buffer);
void ada_fill_3(U_t array, int count);
void ada_fill_5(char * buffer, int count);

void c_fill_1(char * buffer, int count) {
        int i;

        printf("c_fill_1; buffer = %p, count = %dn", buffer, count);

        for(i = 0; i < count; i++) {
                buffer[i] = '1';

void c_fill_2(char * buffer, int count) {
        printf("c_fill_2; buffer = %p, count = %dn", buffer, count);

void c_fill_3(char * buffer, int count) {
        B_t b;
        U_t a;
        b.LB0 = 0;
        b.UB0 = count;
        a.P_ARRAY = buffer;
        a.P_BOUNDS = &b;

        printf("c_fill_3; buffer = %p, count = %dn", buffer, count);

        ada_fill_3(a, count);

void c_fill_4(U_t array) {
        int i = 0;
        char * buffer = array.P_ARRAY;

        printf("c_fill_4; buffer = %p, count = %dn", array.P_ARRAY, array.P_BOUNDS->UB0 - array.P_BOUNDS->LB0);

        for(i = array.P_BOUNDS->LB0; i <= array.P_BOUNDS->UB0; i++) {
                buffer[i] = '4';

void c_fill_5(char * buffer, int count) {
        printf("c_fill_5; buffer = %p, count = %dn", buffer, count);
        ada_fill_5(buffer, count);

The first 2 functions are simple. Because the array is constrained in Ada there is no need for the count parameter, however the actual length of the array in C must always be the same as the one in Ada. Next the two methods that took the most time to figure out. I could not find any useful description of the so called Ada Calling Convention. No such convention seems to be specified, and every ada implementation is free to implement this as they see fit. The C code will be tight to GNAT when using this method and maybe even specific versions of GNAT. The Ada calling convention for arrays is implemented in the interface between the GNAT frontend and the GCC backend3. In the interface, the GNAT code tree is converted into a GCC code tree, and most of these functions are recursive and try to get information from different parts of both trees. In short, reading this code is not that easy, but from the code I could determine that unconstrained arrays are send as a structure with two fields, one field is a pointer to the start of the array, and the other is a pointer to a structure having again two fields. This last structure has a field for the lower bound and one for the upper bound of the array. The exact layout of the structure was a bit harder to determine so an extra flag for the compiler was needed -fdump-tree-original4. From that dump, I could determine the structure5. The C function is not more secure with this structure, but the Ada implementation will be. Finally, we finish with the more usual way of calling an Ada function with an unconstrained character_array and a count variable.

For reference, the Ada implementation. Note that for the fifth case we cannot use the upper bound of the array. This upper bound is undefined (and in practice will be the maximum value of the given range type).

with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
with Ada.Long_Integer_Text_IO; use Ada.Long_Integer_Text_IO;

package body C_Array is

        -- We have a statically defined length so the range will be fine.
        -- The call in C code to this procedure must use a buffer with at least the constrained range.
        procedure ADA_Fill_2(CH : in out constrained_char_array) is
                Put(" lb=" & size_t'Image(CH'First));
                Put(" ub=" & size_t'Image(CH'Last));

                for I in CH'Range loop
                        CH(I) := To_C('2');
                end loop;
        end Ada_Fill_2;

        -- The call in the C code needs to send an Ada array.
        procedure ADA_Fill_3(CH : in out char_array) is
                Put(" lb=" & size_t'Image(CH'First));
                Put(" ub=" & size_t'Image(CH'Last));

                for I in CH'Range loop
                        CH(I) := To_C('3');
                end loop;
        end Ada_Fill_3;

        -- For calls from C without a constained type or ada array, an extra count parameter is needed.
        procedure ADA_Fill_5(CH : in out char_array; Count: Integer) is
                Put("ada_fill_5; count="); Put(Count);
                Put(" lb=" & size_t'Image(CH'First));
                Put(" ub=" & size_t'Image(CH'Last));

                -- the Range cannot be used, the 'Last index is wrong.
                for I in ch'First .. size_t(Count) loop
                        CH(I) := To_C('5');
                end loop;
        end Ada_Fill_5;
end C_Array;

The code includes a simple main program that calls all five functions;

with C_Array; use C_Array;
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;

with Interfaces.C; use Interfaces.C;

procedure Array_Main is
        work_array : char_array(0 .. 100);
        output_string : String(1 .. 101) := (others => ' ');
        C : Integer := 0;
        To_Ada(work_array, output_string, C, False);
        Put("c_fill_1 output ="); Put(output_string);

        To_Ada(work_array, output_string, C, False);
        Put("c_fill_2 output ="); Put(output_string);

        To_Ada(work_array, output_string, C, False);
        Put("c_fill_4 output ="); Put(output_string);

        To_Ada(work_array, output_string, C, False);
        Put("c_fill_4 output ="); Put(output_string);

        To_Ada(work_array, output_string, C, False);
        Put("c_fill_5 output ="); Put(output_string);

The output will be;

c_fill_1; buffer = 0x7ffdf87fd910, count = 100
c_fill_1 output =1111111111111111111111111111...
c_fill_2; buffer = 0x7ffdf87fd910, count = 100
ada_fill_2; lb= 0 ub= 100
c_fill_2 output =2222222222222222222222222222...
c_fill_3; buffer = 0x7ffdf87fd910, count = 100
ada_fill_3; lb= 0 ub= 100
c_fill_4 output =3333333333333333333333333333...
c_fill_4; buffer = 0x7ffdf87fd910, count = 100
c_fill_4 output =4444444444444444444444444444...
c_fill_5; buffer = 0x7ffdf87fd910, count = 100
ada_fill_5; count=        100 lb= 0 ub= 18446744073709551615
c_fill_5 output =5555555555555555555555555555...

A random remark; it is not a good idea to call To_Ada on the unconstrained array from method 56. First, To_Ada is not more efficient than a character by character conversion, in fact it is implemented as such. Second, To_Ada will use the Last parameter of the `character_array and that parameter is set to the maximum value of size_t (Ada will check on this bound but a segmentation fault will happen first). Either copy the character_array to a constrained character array first, or write a custom conversion function.

Another random remark; the Ada standard library can be studied with the Ada 2012 LRM and understood better with the GNAT source code. It helps to have a cross-referenced, browser readable version of the GNAT source code at hand (there is one in the GNAT Book, but that one is incomplete). To make such a version do:

0) Make a directory and go to it

    mkdir ada-html
    cd ada-html

1) Find the gnat runtime library
    (i.e. the directory containing adainclude and adalib)
    It should be in the gnat install directory,
    as the lib/gcc/<machine>/<gcc version>/ directory

    For AdaCore 2016, (linux 64bit) this directory can set with:

    RT_DIR = $(dirname `which gnatmake`)/../lib/gcc/x86_64-pc-linux-gnu/4.9.4

2) Copy the source files from adainclude:

    cp $RT_DIR/adainclude/*.ad* .

3) Copy the ali files from adalib (needed for cross references):

    cp $RT_DIR/adalib/*.ali .

4) Make the html files with the `` script: -f -D *.ad*

5) Go to the 'html' directory and look (open index.htm in a browser):

    cd html
  1. I've done this a couple of times and my knowledge thus gained was largely anecdotal. This won't do for republic business so hence this article. []
  2. please read the GNAT source code file i-cstrin.adb'. This should put you off the idea of using the Char_Ptr []
  3. all the code can be found in the gcc/ada/gcc-interfaces directory []
  4. This flag will dump the gcc version of the tree in a somewhat readable fashion. The dump does omit information so it also needs to be followed by a recompile and dump with -fdump-tree-original-raw (see also the GCC Command Line Switches). The second dump can be used to determine the types of the fields in the structures []
  5. For every kind of array this structure will have the same field but the fields will have different types. For integer array, the pointer will be an integer pointer. For Strings the boundary fields (LB0/UB0) will be 32 bit integers. So this kind of interfacing needs to be repeated on a case by case bases []
  6. This can be determined from reading the code in the i-c.adb file []

GNAT and Zero Foot Print Runtimes

February 27th, 2018

Executables written in ADA and compiled with GNAT, include a portion of the GNAT runtime library, the C library and some start-up/shut-down code. The total sum of extra code bytes added to any statically compiled executable can easily become 2 megabytes. Most of this code will never be called. Good luck to you, if you want to read the decoded assembly lines of such an executable. As the same luck would have it, the GNAT system includes the possibility to compile code with a custom runtime library and GCC itself can use different C libraries.

The gnat compilers can use alternate GNAT runtimes. The most common use of such alternate runtimes is for embedded systems with little or no OS. A guide to building such an embedded runtime system can be found here12. This guide provided the basis for the runtime described here.

A minimal runtime can be found in the following vpatch;

The vpatch needs to be pressed, and the code can be build with gprbuild.

All optional code has been removed from the gpr project file3.

library project Gnat_Runtime is
   for Languages   use ("Ada");

   for Source_Dirs use ("adainclude");
   for Object_Dir use "obj";
   for Library_Kind use "static";
   for Library_Name use "gnat";
   for Library_Dir use "adalib";

   package Builder is
      for Default_Switches ("Ada") use (
              "-gnatg", "-gnatyN",
              "-gnatec=" & Gnat_Runtime'Project_Dir & "restrict.adc");
   end Builder;

   package Compiler is
      for Default_Switches ("Ada") use (
   end Compiler;

end Gnat_Runtime;

If you inspect the pressed directory, you can see that a GNAT runtime is a directory with two subdirectories adainclude and adalib. The adainclude directory needs to contain the ada library files, the adalib needs to contain the libgnat.a and the .ali files. The obj directory is for intermediary build products and can be safely deleted after a build. Notice that the two sub-directories adainclude and adalib are all that is needed for a GNAT runtime.

├── adainclude
│   ├──                  -- The Ada package (Pure, empty, just the root)
│   ├── a-inteio.adb             -- Ada.Integer_Text_IO (for Put function of an Integer)
│   ├──             -- Ada.Integer_Text_IO
│   ├── a-textio.adb             -- Ada.Text_IO (a few functions (Put, New_Line) for standard output)
│   ├──             -- Ada.Text_IO
│   ├──             -- For interaction with C
│   ├── last_chance_handler.adb  -- No exception handling, so a last_chance_handler is needed
│   ├──  --
│   ├── s-elaall.adb             -- Elaboration code, see file for explanation
│   ├──             --
│   ├── s-parame.adb             -- Parameters for the C interface, needed by the last_chance_handler
│   ├──             --
│   └──               -- The configuration of the runtime
├── adalib                       -- Output directory, will contain libgnat.a
│   └── README                   -- Placeholder
├── gnat_runtime.gpr             -- The gprbuild project file
├── obj                          -- Output directory for intermediate results
│   └── README                   -- Placeholder
├── README                       -- Very small explanation
└── restrict.adc                 -- Restrictions

The "" file is paramount in configuring any GNAT runtime library. This file is well documented in AdaCore's configurable runtime documentation. That document describes the AdaCore Pro code, it is also valid for the GPL version4. The file is also the file to put restrictions in, these restrictions will be valid for the runtime code and all code compiled with the runtime.

The "|adb" files are necessary as the default exception handling mechanism is not included in this runtime (and the "|adb" are needed for the last_chance_handler). The handler function needs to eat C strings, so some very ugly Ada code is needed here5.

I've written two very basic examples to test this runtime6, one to test the size of the generated executable and one to test if the error handling still works as expected. These examples can be found in the following patch:

The examples can be build with the project file in the examples subdirectory. The examples work with the default runtime and the zfp runtime. To build with the zfp runtime do gprbuild --RTS=../. The example can be build with a glibc based gnat (GNAT AdaCore 2016) and a musl based gnat (using the MUSL build instructions), a small table;

C Library GNAT Runtime Executable Size (Kbytes) Stripped Size (Kbytes) Comments
GNU default 1226 851  
GNU zfp 962 738
MUSL default 669 122
MUSL zfp 60 54
  1. A very basic C hello world program also compiles into an executable of around 1Mb. Running nm on the generated binary is unnerving, it seems like a large portion of the gnu library is included by default. I've compiled the gnu library from source with the dynamic NSS library support turned off, this does not give any improvement. I've edited the startup code in the gnu c library, also without improvement on this point7.
  2. This needed a patch in the build process, by default the gnat library will not build with function and data sections8. Without these sections the executable size was 1641 Kbytes. The AdaCode GNAT 2016 static library was build with function sections and without any debug information9.
  3. It may be possible to get this even smaller, with the right options and restrictions the GNAT specific initialize and finalize functions can be removed. Maybe editing gnatbind and then recompiling could also result in less bytes. Both seem to me to be too much effort for the expected gain.

As it turns out, if only a limited set of the Ada library is employed, the default GNAT runtime does not add that much extra code to an executable.

Next, the '"constraint/constraint.adb"' file. The code in this file is designed to trigger a constraint error. For every constraint error the code should end up in the last_change_handler function. This is easily tested by running the constraint binary10.

  1. A WIKI, these still exist!. And it can only be accessed over HTTPS. Why?? []
  2. AdaCore also publishes code to generate Zero Foot Print runtimes or Bare metal BSPs. I could not get this script to work for an linux system. The code seems to support it, but no. []
  3. An alternative runtime library needs to be build with the -gnatg flag. This flag has a number of effects, one of them is to handle all warnings as if these where errors. Another is to enable style checking warnings. If you do not want to follow the GNAT style, these checks can be undone again with -gnatyN. []
  4. The document describes Zero Cost Exceptions, these are zero cost in time overhead, not in space, the amount of code this adds is not small. Also, as it is gcc backend code, you get gcc code, probably the same used for C++ exception handling. For less complexity and better understanding of the code, do without the whole Ada exception handling mechanism. []
  5. It makes sense to write the last_change_handler function in C. Unfortunately making the project an Ada + C project will result in a broken library. Some flags are removed from the compilation phase. These can be put back by updating the project file but so far that did not help. []
  6. I can report that FFA chapters 1 to 3 also work with the runtime []
  7. Some more things discovered, it is not possible to build just a static library by default. Also the library needs to be build with optimization on or the build will fail []
  8. To determine if a library was compiled with "-ffunction-sections -fdata-sections", run objdump -t <libraryname>.a. Look at the section of a function, if it is called .text then it was not compiled with the flags, if it is called .text.<functionname> then it was compiled with the flags. []
  9. Compiling the GNAT library without debug information proved to be one of the harder parts. The "-g" flag is sprinkled in many of the Makefiles of gcc. As a temporary fix, the adalib was build separately from the compiler, this last part needs to be included in the MUSL build instructions []
  10. You might think of restriction pragma's as providing checks against the source code. These checks will limit the kind of constructs you are allowed to make in writing Ada code. This is true for a lot of the restrictions, but not for the 'No_Exceptions' restriction. The No_Exceptions restriction will turn off all constraint checking. See the Ada 2012 LRM []

From the proverbial Peanut Gallery

February 13th, 2018

I've been investing a lot of time1 into The Republic and it's products to date. And now it seems to be dead in the water.

Reading the logs, reading, researching the FFA code, all these activities have helped to eradicate some of the cockroaches in my mind. A big step, setting up this blog, has been a victory. Sloppy thinking simply presents itself once layed down in written sentences, if not immediately then after a re-read or after comments by Stanislav. Also, not all thoughts can be put to paper, how is that2? So all this time has not been wasted.

But with the most probable demise of the BingoBoingo ISP, the whole republic is on very shaky footing.

Last Friday (or Saturday morning), Mircea Popescu correctly pulled the plug. Of the various reactions to be expected, we got none. Well, one!, Stanislav wanted to maybe take the business, but luckily he retreated3 45.

In the mean time Mircea Popescu has fired off a number of put downs in BingoBoingos' direction, in what feels to me to be an attempt to get same BingoBoingo of his ass. Unfortunately only self lamenting comments have been coming back6. Let's assume you want to stay in Uruguay, probably under another lords directions. You would want to show off that you can get busy7. Start working on a few issues;

  • Make a list of what has been done, was is to be done and was is being done8.
  • The rack contains one server so test and publish speeds for upload and download, sustained and peak etc.
  • Go online and figure out the import tariffs, yes on somethings you may need to play ball with the local government9.
  • Make a list of the local costs / delivery time for some of the parts and publish this10.
  • Read up on tax law in Uruguay, especially VAT etc, make some post about to show your understanding.
  • And think about this, once back, was the 1 BTC a loan? how would you pay this off? Or will this be a write-off on Mircea Popescus' side?. Simply the cost of doing business, or something.

As for the lords, "and now?", currently there is a rack11, there is a willing boy on the ground12, there is a bank-account, there is an office. A company is almost to be had for a given amount of coin13. I had every faith in being a client of an ISP business that is run / financed in some form by Mircea Popescu14, as he is committed to go for the long run. So, I hope the Lords will get of their asses.

  1. Considering that I've been reading the logs for about 1.5 yr, and increasing my meager involvement in little steps over the same period, I would say about 6 months. This is but a drop compared to the the investments made by the lords. []
  2. This is from a comment by Mircea Popescu either on trilema or in the logs. This needs be linked, and will be, but later []
  3. From my understanding of the logs and his blog, Stanislav is an Engineer. Brilliant in most things he will set out to do. However in the little of successful businesses I've seen, the great leader, the starter, the one who makes it a success, has a completely different character. As the starter you'll have to (what seems to an engineer) over-estimate the proceeds and about right estimate the costs. The engineer by definition has a keen feel for the costs but not so much for the proceeds. Or differently, the costs is something you can calculate now to a good certainty, the proceeds not. Taking risks is probably another factor. Also, a business starter without the sense to involve one ore more Engineer(s) is stupid and will fail. I fall in the same category as Stanislav, only I'm less brilliant and not so well educated, so I do have to work in the mines (but I do my best / most interesting work with men that have some sort of vision). []
  4. It must be noted that Stanislav is still the only one of the lords whot tries to stick his head out. []
  5. One of the first comments Stanislav makes is that he does not know how to run a VPS server, somehow supposing this is going to be the main business. If I would like to run a VPS server, I would like to do so with an expert or someone smart enough to either hire one or simply not be providing a VPS server. Instead of a VPS server, I would gladly pay money to run on a shared gentoo server managed by Stanislav (with users from the wot only). The possibilities of having an ISP with people you trust! But we digress []
  6. Yes, own up to your failure, but after that, start moving []
  7. And not by taking night time pictures of the local orc carnival []
  8. This has value even if you fly back []
  9. did you know this orcistan is in a free-trade zone with four others?. Governments are even known to publish exact tariff tables etc []
  10. It may not be worth sourcing locally, but how to figure this out? []
  11. at least partially paid for []
  12. Who is in the wot and even a lord! []
  13. Negotiations are possible. Or maybe to lords are such a wise bunch that they can think to pressure Mircea Popescu by letting time run out and so devaluate the whole business []
  14. And who is to say he will not be willing to provide key insights to help make this adventure into a success []

On FFA Chapter 2

February 12th, 2018

To be specific: "Finite Field Arithmetic." Chapter 2: Logical and Bitwise Operations.

The first section contains a short (re-)introduction and a list of additional files to download. With these files, the code can be updated and build again. This section is easy to follow and finish.

Next a short exposition on the early feedback on chapter 1, with the conclusion that W_Swap is now removed.

A definition of four predicate functions on Word types is given first. These are all very minimal and well designed. I could construct an alternative version of W_NZeroP, but it was so much more ugly than the published version. My stance on removing all unused functions is quickly becoming unattainable, clearly a general purpose library is being build and not a RSA specific one1. A retreat is in order, from now on everything dubious will be removed. Where dubious is an arbitrarily defined quality.

Next up, are 5 basic functions on the FZ type.

As FZ_Clear "illustrates the use of Ada's array assignment", we'll need to investigate it's consequences. We do not have to worry that this statement can be dependent on the array contents. We do have to worry about optimizations the compiler might perform (the famous memset removal in gcc). Let's try to create a small example with the FZ_Clear procedure2. First, a patch on the chapter 2 code:

The inline_always pragma has been moved from the 'fz_basics.adb' to '' in this patch.

The somewhat interesting code is in 'clear_mod.adb'.

with Ada.Text_IO; use Ada.Text_IO;

-- From FFA:
with Words;    use Words;
with FZ_Type;  use FZ_Type;
with FZ_Arith; use FZ_Arith;

-- FFA Ch. 2
with FZ_Basic; use FZ_Basic;
with FFA_IO;   use FFA_IO;

package body Clear_Mod is

   procedure Clear_Procedure is
      Z : FZ (0 .. 128);
      FZ_Set_Head (Z, 20);
      FZ_Clear (Z);
      Put_Line ("--- cleared: ---");
      Dump (Z);
      FZ_Set_Head (Z, 22);
      Z (2)   := 16#DEAD#;
      Z (3)   := 16#FAAF#;
      Z (4)   := 16#1001#;
      Z (100) := 16#0110#;
      Z (128) := 16#F1F1#;
      Dump (Z);
      FZ_Clear (Z);
   end Clear_Procedure;

   procedure Print_Procedure is
      Z : FZ (0 .. 256);
      Y : FZ (0 .. 132);
      Dump (Y);
      Put_Line ("--- <<<< ---");
   end Print_Procedure;

end Clear_Mod;

In clear_procedure we change a FZ, it is cleared, it's state is dumped on the terminal, some interesting words are put in the FZ and finally the FZ is cleared again. This last action is just to be sure that no information will be left on the stack after this procedure finishes. Without inlining, this function works perfectly3. With inlining, the code is different, the compiler will determine that the final clear is useless and remove it.

c/ffa/fzclear/bin/clear:     file format elf64-x86-64

Disassembly of section .text:

0000000000401530 <clear_mod__clear_procedure>:
  401530:       55                      push   %rbp
  401531:       48 89 e5                mov    %rsp,%rbp
  401534:       53                      push   %rbx
  401535:       48 81 ec 38 14 00 00    sub    $0x1438,%rsp
  40153c:       48 83 0c 24 00          orq    $0x0,(%rsp)
  401541:       48 81 c4 20 10 00 00    add    $0x1020,%rsp
  401548:       ba 14 00 00 00          mov    $0x14,%edx
  40154d:       be 30 c8 49 00          mov    $0x49c830,%esi
  401552:       48 8d 9d e0 fb ff ff    lea    -0x420(%rbp),%rbx
  401559:       48 89 df                mov    %rbx,%rdi
  40155c:       e8 ef 00 00 00          callq  401650 <fz_basic__fz_set_head>
  401561:       31 c0                   xor    %eax,%eax
  401563:       48 89 df                mov    %rbx,%rdi
  401566:       b9 81 00 00 00          mov    $0x81,%ecx
  40156b:       f3 48 ab                rep stos %rax,%es:(%rdi)
  40156e:       be 38 c8 49 00          mov    $0x49c838,%esi
  401573:       bf 20 c8 49 00          mov    $0x49c820,%edi
  401578:       e8 f3 2e 00 00          callq  404470 <ada__text_io__put_line__2>
  40157d:       48 89 df                mov    %rbx,%rdi
  401580:       be 30 c8 49 00          mov    $0x49c830,%esi
  401585:       e8 d6 fd ff ff          callq  401360 <ffa_io__dump__2>
  40158a:       bf 01 00 00 00          mov    $0x1,%edi
  40158f:       e8 bc 2a 00 00          callq  404050 <ada__text_io__new_line__2>
  401594:       ba 16 00 00 00          mov    $0x16,%edx
  401599:       48 89 df                mov    %rbx,%rdi
  40159c:       be 30 c8 49 00          mov    $0x49c830,%esi
  4015a1:       e8 aa 00 00 00          callq  401650 <fz_basic__fz_set_head>
  4015a6:       48 89 df                mov    %rbx,%rdi
  4015a9:       be 30 c8 49 00          mov    $0x49c830,%esi
  4015ae:       48 c7 85 f0 fb ff ff    movq   $0xdead,-0x410(%rbp)
  4015b5:       ad de 00 00
  4015b9:       48 c7 85 f8 fb ff ff    movq   $0xfaaf,-0x408(%rbp)
  4015c0:       af fa 00 00
  4015c4:       48 c7 85 00 fc ff ff    movq   $0x1001,-0x400(%rbp)
  4015cb:       01 10 00 00
  4015cf:       48 c7 85 00 ff ff ff    movq   $0x110,-0x100(%rbp)
  4015d6:       10 01 00 00
  4015da:       48 c7 45 e0 f1 f1 00    movq   $0xf1f1,-0x20(%rbp)
  4015e1:       00
  4015e2:       e8 79 fd ff ff          callq  401360 <ffa_io__dump__2>
  4015e7:       bf 01 00 00 00          mov    $0x1,%edi
  4015ec:       e8 5f 2a 00 00          callq  404050 <ada__text_io__new_line__2>
  4015f1:       48 81 c4 18 04 00 00    add    $0x418,%rsp
  4015f8:       5b                      pop    %rbx
  4015f9:       5d                      pop    %rbp
  4015fa:       c3                      retq
  4015fb:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

Another fortunate side-effect of reading the FFA chapters, is that I now have a good opportunity to learn x86 assembly language4. The final FZ_Clear implementation is removed from the procedure, so it follows that the values should be still on the stack and it may be possible to read these. And that is why the module also contains Print_Procedure, with some padding FZ and a FZ to capture the stack. On my local gnat installation (the gnat 2016 binary from adacore), the dump of Y shows the contents of the local variable Z from Print_Procedure. This is probably not what you would expect, the best option to me seems to make sure that FZ_Clear is never inlined. Unfortunately, I've not found a pragma Never_Inline.

With FZ_Get_Head and FZ_Set_Head we get some confusion of terminology, suddenly we are talking about a head or first or youngest word of an FZ. And it becomes clearer that the words in an FZ have an order and we are missing an exact definition of FZ outside of the code itself. Here is one attempt to do so, an FZ is a number5 between 0 and 2 to a given power. This power is determined by the number of words in a FZ multiplied by the number of bits in a word (the bitness). An FZ can be seen as the summation over the array of words that make up the FZ where each word is multiplied by 2 to the power of the bitness times the position of that word in the array. For the math to be a correct, we define the first word in the array to have position 06. As for a visual representation, when the words (and the bytes inside the words) are written from left to right, the tail or last or oldest word is written first and the head or first or youngest is written last. Although the FZ_Get_Head and FZ_Set_Head procedures triggered a nice train of thought, I would remove these.

The best is left for last in this module, FZ_Mux is indeed crucial. It's implementation in terms of W_Mux is trivial, so if you understand W_Mux all is well. I found one alternative for this function, but it is wasteful compared to this one. Image if you define a 2d array with as many rows as there are words in X and each row having two columns. Put each word in X in the first column of the corresponding row in the 2d array. Put each word in Y in the second column. Now the result can be formed by iterating over the array and accessing the element from the right column. Unfortunately, this hypothetical code based on a 2d array, accesses memory ever so slightly differently based on the Sel bit, and thus might leak information.

Next up, two unary predicate operators on the FZ type are introduced. The FP_ZeroP procedure can be used to determine if every word in the FZ has the value 0. The FP_OddP can be used to check if the last bit of the head word is set, and here FZ_Get_Head is not used in the implementation.

Three binary predicate operators are defined in fz_cmd.adb. The FZ_EqP procedure follows the pattern first seen in FP_ZeroP. The FZ_LessThanP and FZ_GreaterThanP are implemented in terms of the subtraction operator. The result is determined by the borrow. Now, you just have to determine if the order of the arguments for the FZ_Sub call is correct.

With a set of 7 bitwise operators, the introduction of the ffa library in chapter 2 is finished. In, binary operators are defined for two FZ operands, and separately for one FZ with one Word operand. These last versions work on the head word of an FZ, and we'll just have to see if there is any use for these procedures. Another remark, although these procedures could have been implemented in terms of the FZ_Get_Head / FZ_Set_Head procedures, again they have not. A final remark on the same procedures is that these modify the first input FZ operand where the full FZ procedures fill a third operand7.

We skip the demo and finish for the day, on to the next chapter!.

  1. This disclaimer was stated in channel and on the pages, so only my stupidity can be blamed []
  2. This is where I found a problem with the inlining on my GNAT installation. After resolving this we can continue. []
  3. If you turn off the inlining, you will see that the array assignment will be implemented as a memset []
  4. The first FZ_Clear call is inlined and implemented on the rep stos line. The xor %eax, %eax sets the 32bits %eax register to 0, which by extension also set the overlapping 64bits %rax to zero. The value in the first argument to the rep stos is therefore 0. []
  5. For a good book on numbers, read Frege []
  6. This in contrast to most of the actual ada code where the array indexing starts at 1. []
  7. I do not know the use for these functions. If we see a word as a FZ type of any length with that word as the head and all other words set to zero, then the implementation of FZ_And_W and FZ_Xor_W is wrong. []

MP-WP, a genesis

February 8th, 2018

This site runs on a found version of mp-wp. To get it to work on a php 5.6 installation, I needed to make some changes. These were done in an ad-hoc fashion and, as a result of the sometimes live fixing process, I had no log or
record of what I did. This cannot stand, so I redid most of the word starting
by making a genesis of the mp-wp, I found in the channel1.

As a vpatch cannot contain binaries or special character sequences, I removed all the binaries (gif, png, jpg and swf) and put these in tarfiles. The sha512sum of the tarfiles I put in a README and in the tarfiles.sha512sum.
All removals and edits done to the version I found and before the patch was made are described in the README file.

This genesis does not contain images, so these are distributed separately in

After pressing the genesis, go into the blog directory. Copy the tar-files to the blog directory. Check the sha 512 sums. In the blog directory unpack each tar-file (tar xf wp-admin-images.tar.gz etc). Done.

On this genesis, I currently have two patches, one to suppress and fix some warnings;

And one to fix the comments;

There is still a whole lot of pruning on the todo list, with the first item being to investigate and hopefully remove the 'kses' filtering system.

  1. I know that work was / is being done on a genesis vpatch of mp-wp, but that seems to be stuck for now. []

GNAT and Pragma Inline_Always: A report

February 5th, 2018

Will the GNAT ADA compiler inline a function when using Pragma Inline_Always? and proves the dump of a function a red herring? In short, yes it will and no it doesn't. The Pragma Inline_Always needs to be put in the specification files ('*.ads') and not in the implementation files ('*.adb'). The dump of a
function in a binary can accurately show if inlining was performed or not.

To run the code, use the following v-patch on the code in FFA chapter 11;

We start with the preliminaries, as for the gnat version (gprbuild -version):

GPRBUILD GPL 2016 (20160515) (x86_64-pc-linux-gnu)
Copyright (C) 2004-2016, AdaCore
This is free software; see the source for copying conditions.

In FFA almost all functions have the pragma Inline_Always adornment. According to the gcc 4.9.4 documentation: Similar to pragma Inline except that inlining is not subject to the use of option -gnatn or -gnatN and the inlining happens regardless of whether these options are used.

All of the code is based on a copy of the ffademo files provided in the FFA chapter 1 genesis.

I base the tests on the FZ_Add procedure and use it to implement the slowest possible multiplication function. This multiplication function is specified to multiply a FZ by a natural number. To multiply we will add the input FZ to a tally as many times as given by the natural number.

   procedure Mul_By_Sum
     (N        : in     Natural;
      X        : in     FZ;
      Z        :    out FZ;
      Overflow :    out WBool)

      C : WBool := 0;
      Z := (others => 0);
      for i in 1 .. N loop
         FZ_Add (X, Z, Z, C);
      end loop;
      Overflow := C;
   end Mul_By_Sum;

   procedure Inline_Experiment is
      X : FZ (1 .. 256) := (others => 0);
      Z : FZ (1 .. 256) := (others => 0);

      -- Carry.
      C : WBool := 0;

      N : Indices := 256;
      for i in 1 .. N loop
         X (i) := Word (i);
      end loop;

      Mul_By_Sum (10000000, X, Z, C);

      Dump (Z);

   end Inline_Experiment;

The experiment is divided over 3 programs, each with their own set of files;

  • inline_exp1_main; the FZ_Add and W_Carry procedures from libffa are used as is.
  • inline_exp2_main; the FZ_Add and W_Carry procedures are copied and included in the body of the module. For clarity we add a '2' to the procedure names.
  • inline_exp3_main: the FZ_Add and W_Carry functions are copied to a separate module and pragma Inline_Always is included in the specification file. For clarity we add a '3' to the procedure names.

Read the code in ffainline (except for the mul_by_sum these are all either calls to ffalib or copies of the code from ffalib), build and run everything with:

cd ffa/ffainline


time ./bin/inline_exp1_main > exp1.txt

time ./bin/inline_exp2_main > exp2.txt

time ./bin/inline_exp3_main > exp3.txt

diff exp0.txt exp1.txt

diff exp0.txt exp2.txt

diff exp0.txt exp3.txt

The diffs should be empty, and the timings should differ. A table with
my timings;

Experiment Time
1 8.579
2 4.924
3 4.871

The second and third versions are a little more than 40% faster than the first version. The only difference between the codes is where the procedures are defined, so the timing difference is all due to inlining. Let's check by disassembling the code with objdump. As everything will be disassembled by objdump, we need a small script to return the just code for one function;


FUNCTIONLINE=`objdump -t $1  |  sort | nl | grep $2`
LINENR=`echo $FUNCTIONLINE | awk '{print $1}'`
NEXTLINE=`objdump -t $1  | sort | nl -s ' @ ' | grep $NEXTLINENR @ `
START=`echo $FUNCTIONLINE | awk '{print $2}'`
STOP=`echo $NEXTLINE | awk '{print $3}'`
objdump --start-address=0x$START --stop-address=0x$STOP -S -d $1

The dump for the first version of mul by sum2.

./ inline_exp1_main inline_exp1__mul_by_sum

inline_exp1_main:     file format elf64-x86-64

Disassembly of section .text:

0000000000401430 <inline_exp1__mul_by_sum>:
  401430:       55                      push   %rbp
  401431:       48 89 e5                mov    %rsp,%rbp
  401434:       41 57                   push   %r15
  401436:       41 56                   push   %r14
  401438:       41 55                   push   %r13
  40143a:       41 54                   push   %r12
  40143c:       53                      push   %rbx
  40143d:       48 81 ec 38 10 00 00    sub    $0x1038,%rsp
  401444:       48 83 0c 24 00          orq    $0x0,(%rsp)
  401449:       48 81 c4 20 10 00 00    add    $0x1020,%rsp
  401450:       49 63 40 04             movslq 0x4(%r8),%rax
  401454:       49 89 cc                mov    %rcx,%r12
  401457:       49 63 08                movslq (%r8),%rcx
  40145a:       49 89 d7                mov    %rdx,%r15
  40145d:       31 d2                   xor    %edx,%edx
  40145f:       41 89 fd                mov    %edi,%r13d
  401462:       48 89 75 c8             mov    %rsi,-0x38(%rbp)
  401466:       4c 89 c3                mov    %r8,%rbx
  401469:       39 c1                   cmp    %eax,%ecx
  40146b:       7f 1c                   jg     401489 <inline_exp1__mul_by_sum+0x59>
  40146d:       48 29 c8                sub    %rcx,%rax
  401470:       48 8d 14 c5 08 00 00    lea    0x8(,%rax,8),%rdx
  401477:       00
  401478:       48 b8 00 00 00 00 04    movabs $0x400000000,%rax
  40147f:       00 00 00
  401482:       48 39 c2                cmp    %rax,%rdx
  401485:       48 0f 47 d0             cmova  %rax,%rdx
  401489:       31 f6                   xor    %esi,%esi
  40148b:       4c 89 e7                mov    %r12,%rdi
  40148e:       e8 bd 80 03 00          callq  439550 <memset>
  401493:       45 85 ed                test   %r13d,%r13d
  401496:       7e 38                   jle    4014d0 <inline_exp1__mul_by_sum+0xa0>
  401498:       45 31 f6                xor    %r14d,%r14d
  40149b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  4014a0:       48 8b 7d c8             mov    -0x38(%rbp),%rdi
  4014a4:       41 83 c6 01             add    $0x1,%r14d
  4014a8:       4d 89 e0                mov    %r12,%r8
  4014ab:       49 89 d9                mov    %rbx,%r9
  4014ae:       4c 89 e2                mov    %r12,%rdx
  4014b1:       48 89 d9                mov    %rbx,%rcx
  4014b4:       4c 89 fe                mov    %r15,%rsi
  4014b7:       e8 c4 00 00 00          callq  401580 <fz_arith__fz_add>
  4014bc:       45 39 f5                cmp    %r14d,%r13d
  4014bf:       75 df                   jne    4014a0 <inline_exp1__mul_by_sum+0x70>
  4014c1:       48 83 c4 18             add    $0x18,%rsp
  4014c5:       5b                      pop    %rbx
  4014c6:       41 5c                   pop    %r12
  4014c8:       41 5d                   pop    %r13
  4014ca:       41 5e                   pop    %r14
  4014cc:       41 5f                   pop    %r15
  4014ce:       5d                      pop    %rbp
  4014cf:       c3                      retq
  4014d0:       31 c0                   xor    %eax,%eax
  4014d2:       48 83 c4 18             add    $0x18,%rsp
  4014d6:       5b                      pop    %rbx
  4014d7:       41 5c                   pop    %r12
  4014d9:       41 5d                   pop    %r13
  4014db:       41 5e                   pop    %r14
  4014dd:       41 5f                   pop    %r15
  4014df:       5d                      pop    %rbp
  4014e0:       c3                      retq
  4014e1:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  4014e8:       00 00 00
  4014eb:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

Can you see the callq 401580 <fz_arith__fz_add> in there? clearly the FZ_Add was not inlined. Now for the second version;

./ inline_exp2_main inline_exp2__mul_by_sum

inline_exp2_main:     file format elf64-x86-64

Disassembly of section .text:

0000000000401420 <inline_exp2__mul_by_sum>:
  401420:       55                      push   %rbp
  401421:       48 89 e5                mov    %rsp,%rbp
  401424:       41 57                   push   %r15
  401426:       41 56                   push   %r14
  401428:       41 55                   push   %r13
  40142a:       41 54                   push   %r12
  40142c:       53                      push   %rbx
  40142d:       48 81 ec 38 10 00 00    sub    $0x1038,%rsp
  401434:       48 83 0c 24 00          orq    $0x0,(%rsp)
  401439:       48 81 c4 20 10 00 00    add    $0x1020,%rsp
  401440:       49 63 40 04             movslq 0x4(%r8),%rax
  401444:       49 89 cf                mov    %rcx,%r15
  401447:       49 63 08                movslq (%r8),%rcx
  40144a:       49 89 d6                mov    %rdx,%r14
  40144d:       31 d2                   xor    %edx,%edx
  40144f:       89 fb                   mov    %edi,%ebx
  401451:       49 89 f4                mov    %rsi,%r12
  401454:       4d 89 c5                mov    %r8,%r13
  401457:       39 c1                   cmp    %eax,%ecx
  401459:       7f 1c                   jg     401477 <inline_exp2__mul_by_sum+0x57>
  40145b:       48 29 c8                sub    %rcx,%rax
  40145e:       48 8d 14 c5 08 00 00    lea    0x8(,%rax,8),%rdx
  401465:       00
  401466:       48 b8 00 00 00 00 04    movabs $0x400000000,%rax
  40146d:       00 00 00
  401470:       48 39 c2                cmp    %rax,%rdx
  401473:       48 0f 47 d0             cmova  %rax,%rdx
  401477:       31 f6                   xor    %esi,%esi
  401479:       4c 89 ff                mov    %r15,%rdi
  40147c:       e8 af 7e 03 00          callq  439330 <memset>
  401481:       85 db                   test   %ebx,%ebx
  401483:       0f 8e db 00 00 00       jle    401564 <inline_exp2__mul_by_sum+0x144>
  401489:       49 63 06                movslq (%r14),%rax
  40148c:       45 8b 55 00             mov    0x0(%r13),%r10d
  401490:       45 8b 5d 04             mov    0x4(%r13),%r11d
  401494:       45 8b 76 04             mov    0x4(%r14),%r14d
  401498:       49 89 c5                mov    %rax,%r13
  40149b:       48 89 45 c0             mov    %rax,-0x40(%rbp)
  40149f:       49 63 c2                movslq %r10d,%rax
  4014a2:       4c 89 ef                mov    %r13,%rdi
  4014a5:       48 29 c7                sub    %rax,%rdi
  4014a8:       49 8d 04 ff             lea    (%r15,%rdi,8),%rax
  4014ac:       45 31 ff                xor    %r15d,%r15d
  4014af:       48 89 45 c8             mov    %rax,-0x38(%rbp)
  4014b3:       4c 89 e8                mov    %r13,%rax
  4014b6:       48 f7 d8                neg    %rax
  4014b9:       49 8d 3c c4             lea    (%r12,%rax,8),%rdi
  4014bd:       4d 63 e6                movslq %r14d,%r12
  4014c0:       41 83 c7 01             add    $0x1,%r15d
  4014c4:       45 39 ee                cmp    %r13d,%r14d
  4014c7:       0f 8c 93 00 00 00       jl     401560 <inline_exp2__mul_by_sum+0x140>
  4014cd:       48 8b 45 c0             mov    -0x40(%rbp),%rax
  4014d1:       4c 8b 4d c8             mov    -0x38(%rbp),%r9
  4014d5:       31 f6                   xor    %esi,%esi
  4014d7:       4c 8d 40 ff             lea    -0x1(%rax),%r8
  4014db:       48 89 f0                mov    %rsi,%rax
  4014de:       66 90                   xchg   %ax,%ax
  4014e0:       49 83 c0 01             add    $0x1,%r8
  4014e4:       45 39 c3                cmp    %r8d,%r11d
  4014e7:       4a 8b 0c c7             mov    (%rdi,%r8,8),%rcx
  4014eb:       7c 53                   jl     401540 <inline_exp2__mul_by_sum+0x120>
  4014ed:       45 39 c2                cmp    %r8d,%r10d
  4014f0:       7f 4e                   jg     401540 <inline_exp2__mul_by_sum+0x120>
  4014f2:       49 8b 11                mov    (%r9),%rdx
  4014f5:       48 01 c8                add    %rcx,%rax
  4014f8:       49 83 c1 08             add    $0x8,%r9
  4014fc:       48 01 d0                add    %rdx,%rax
  4014ff:       48 89 d6                mov    %rdx,%rsi
  401502:       48 21 ca                and    %rcx,%rdx
  401505:       49 89 41 f8             mov    %rax,-0x8(%r9)
  401509:       48 09 ce                or     %rcx,%rsi
  40150c:       48 f7 d0                not    %rax
  40150f:       48 21 f0                and    %rsi,%rax
  401512:       48 09 d0                or     %rdx,%rax
  401515:       48 c1 e8 3f             shr    $0x3f,%rax
  401519:       4d 39 e0                cmp    %r12,%r8
  40151c:       75 c2                   jne    4014e0 <inline_exp2__mul_by_sum+0xc0>
  40151e:       48 89 c6                mov    %rax,%rsi
  401521:       44 39 fb                cmp    %r15d,%ebx
  401524:       75 9a                   jne    4014c0 <inline_exp2__mul_by_sum+0xa0>
  401526:       48 89 f0                mov    %rsi,%rax
  401529:       48 83 c4 18             add    $0x18,%rsp
  40152d:       5b                      pop    %rbx
  40152e:       41 5c                   pop    %r12
  401530:       41 5d                   pop    %r13
  401532:       41 5e                   pop    %r14
  401534:       41 5f                   pop    %r15
  401536:       5d                      pop    %rbp
  401537:       c3                      retq
  401538:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40153f:       00
  401540:       45 39 d5                cmp    %r10d,%r13d
  401543:       7c 05                   jl     40154a <inline_exp2__mul_by_sum+0x12a>
  401545:       45 39 de                cmp    %r11d,%r14d
  401548:       7e a8                   jle    4014f2 <inline_exp2__mul_by_sum+0xd2>
  40154a:       be 2f 00 00 00          mov    $0x2f,%esi
  40154f:       bf 80 c4 49 00          mov    $0x49c480,%edi
  401554:       31 c0                   xor    %eax,%eax
  401556:       e8 e1 0e 00 00          callq  40243c <__gnat_rcheck_CE_Index_Check>
  40155b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  401560:       31 f6                   xor    %esi,%esi
  401562:       eb bd                   jmp    401521 <inline_exp2__mul_by_sum+0x101>
  401564:       31 c0                   xor    %eax,%eax
  401566:       48 83 c4 18             add    $0x18,%rsp
  40156a:       5b                      pop    %rbx
  40156b:       41 5c                   pop    %r12
  40156d:       41 5d                   pop    %r13
  40156f:       41 5e                   pop    %r14
  401571:       41 5f                   pop    %r15
  401573:       5d                      pop    %rbp
  401574:       c3                      retq
  401575:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40157c:       00 00 00
  40157f:       90                      nop

In this version the call is gone, and the instructions that can be expected from FZ_Add and W_Carry are included. So inlining is happening and can be checked with objdump.

In the logs an assertion was made that gnat/gcc would include an inlined function for reference in the produced binary. This is not so, setting the "-ffunction-sections" flag for the compiler and combining this with the "--gc-sections" for the linker will remove all sections for symbols that are not referenced3.

  1. This an demonstration patch, purely to investigate the inline options, not for production purposes []
  2. an ada symbol name can be easily guessed; all lower case module name, two _, all lower case procedure / function name []
  3. easy to check with nm on the binaries, no fz_add in experiment 2 or 3 []

On FFA Chapter 1

January 26th, 2018

To be specific:
"Finite Field Arithmetic." Chapter 1: Genesis.

Chapter 1 starts with a background on the FFA code and a list of the files you need to start out with. It contains a link to FSF GNAT, but a gnat like this will not work as either the ADA part is broken (gcc < 5) or the gcc itself is tampered with (gcc > 5). You will need a GNAT from adacore1.

Next comes the unpacking and building description, following this part is straightforward.

For building the code, the gprbuild tool is used. This is a typical alternative build system, this one has superior support for ADA. I do not recommend it for anything else. If you do not trust this tool, it is not hard to write a makefile or a simple bash script based on the few command line parameters given in this script. Of the flags used, I still have one -fdump-scos2 that seems unnecessary, apparently it creates coverage information.

After the description of the build we get this line;

Do not run the demo yet. (you read programs before running, don’t you?)

No, I do not!. I have not read gcc or this previously unknown 'gprbuild'. And those where unsigned!

The code starts with a file that did not readily fit in my head; restrict.adc. The only way to get to grips with this file was reading the gnat documentation and the ada reference manual. I made a list of all restrictions according to the GNAT documentation and then checked which where used. To summarise; everything that touches any tasking (ada term for multi-threading) is off limits, everything that does dynamic allocation is off limits, everything that depends on a clock is off limits (delay and Calendar), plus some more to make the code as static as possible. A lot of these pragmas are specific for GNAT. One, I could not find documentation for: "No_Fixed_IO". This is the list of restrictions available in GNAT but not used by FFA3.

No_Anonymous_Allocators         Max_Storage_At_Blocking
Max_Entry_Queue_Length          No_Access_Subprograms
No_Dependence                   No_Direct_Boolean_Operators
No_Exception_Handlers           No_Exceptions
No_Fixed_Point                  No_IO
No_Local_Allocators             No_Long_Long_Integers
No_Recursion                    No_Reentrancy
No_Specification_of_Aspect      No_Use_Of_Entity
No_Obsolescent_Features         SPARK_05

The implementation is split up over different ADA modules, you can read the code on the site but make sure to study the copy created in the V-press.

The first module is defined in '', it is specified to be a Pure module, meaning the package will not have an internal state4. In '', we get to define the size of a word expressed as a number of bits per word, this is called MachineBitness. And a ByteBits, which is needed for communicating the ffa words as this will be on a per byte basis. Unfortunately, it also defines a MachineBitnessLog2, to be explained in another chapter5.

The next module is a fun paragraph, try to parse it

.. of FFA, the machine Word. A Word, from this point on, is simply a single machine word on your particular machine.

Or a little bit less self-referential; Word is the smallest unit used for the arithmetic in FFA, values of type Word can range from 0 (inclusive) up to 2 to power of the Bitness (exclusive). Remarkably, 'WBit_Index' is also defined here and maybe the bit is the smallest unit in FFA6

I could not determine the reason of this line; for Word'Size use Bitness;. The standard, does not help7.

The '' is a module to include the shifts, ask yourself, why not use the pragma Provide_Shift_Operators?

The '' file is an irritating Ada necessity, all important information is in 'word_ops.adb'. This last file must be inspected in depth. The carry and borrow functions are brilliant inventions so pay close attention, some extra comments and hints are provided (although the "To derive the..." lines are not very useful, you should be able to get this information from reading the code.). Working these two functions out with a 2-bit word on a paper will help. Also, can you think of another constant time implementation for W_Carry or W_Borrow?

The W_Mux function will be used for all selections and it needs a lot of attention too. Ask yourself; "If WBool is defined as subtype WBool is Word range 0 .. 1;, what is the result of (Sel-1) if Sel is 0?", plus "In how many ways can Mux be defined?"

W_Mux and W_Swap are both unused in this chapter.

The module ''; "Gaze upon the combination of abstract mathematical concepts and low level bit fiddling, all in one small library!". Ask yourself; "Do we need Word_Index or Indices?8".

In '', the Precondition pragma is used twice. An array Words with unspecified length is denoted by FZ, addition and subtraction are defined for values of FZ of equal length, hence preconditions are needed to check the lengths for all instantiations of these functions.

The code in 'fz_arith.adb', is not very complex but does close need attention9. Look at the summation, the sum is (A+B) + Carry, but the next carry is determined by just A, B and the Sum. Are we lucky this is correct?. The whole comment starting with The alert reader might ask why the iteration schemes differ... could have been avoided by making the FZ_Add iteration scheme follow the FZ_Sub scheme. This would increase the usefulness of FZ_Add.

I will skip commenting on the input/output and demo

The comments section is interesting, especially seeing how the only non-used function (W_Swap) gets all the attention

  1. specifically 2015 or 2016, not earlier and also not later. []
  2. see GNAT User's Guide []
  3. Why not use No_Fixed_Point or No_Reentrancy? []
  4. Ada has a complex module system all kinds of silly rules. []
  5. I recommend to remove this, when you implement this promised chapter it will then quickly fail and you can see if it is explained. []
  6. This is another unexplained constant, probably "to be used later". So a good algorithm for understanding the FFA code would be to remove every statement / definition that is not used in the same defining chapter, to be added if and when needed. []
  7. First let's read the "Static Semantics", it mentions 2 situation where this matters, one is about packed records, I would set the size for a Word in packed records on a case by case basis. The other situation is for "untyped conversions" and these should never happen in FFA code. For fun, let's read "Implementation Advice", this is typical Ada language reference manual language. []
  8. This is a serious question, as each new type/subtype definition introduces an extra concept to be dealt with an understood. []
  9. For example; from the fact that this code compiles, we can conclude that Ada is case insensitive. []