UDP - No C

November 13th, 2018

Some time ago, I promised to replace the C code in the udp library of Stanislav (on loper-os.org) with assembly code.
I've finally done so for 64-bit Intel Linux.

This release is divided into two parts: (a) replacing the string-to-IP-address functions with pure Ada functions and (b) replacing the C calls with assembly equivalents. For other platforms, (b) will need to be changed and the tree forked at that point.

The replacement of the string-to-IP-address functions only works for the most common way to write IP addresses [1][2].
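
As a rough illustration of the accepted notation, here is a Python sketch of a strict parser. It is not the Ada code from the patch, and the function name parse_dotted_quad is made up for this example.

```python
def parse_dotted_quad(text):
    """Parse 'a.b.c.d' (decimal, each part 0..255) into four integers.

    Strict on purpose: no hexadecimal, no surrounding spaces, and every
    part is read as decimal (so "010" is 10, never octal 8).
    """
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected 4 numbers separated by dots")
    octets = []
    for part in parts:
        # str.isdigit() also accepts some non-ASCII digits, so test explicitly
        if part == "" or any(c not in "0123456789" for c in part):
            raise ValueError("not a decimal number: %r" % part)
        value = int(part)
        if value > 255:
            raise ValueError("number out of range: %r" % part)
        octets.append(value)
    return tuple(octets)
```

Anything the C library would additionally accept (hexadecimal, octal, leading spaces) is rejected here with a ValueError.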

Next, the actual replacement of the C functions. This adds an extra module that provides Ada versions of some of the Linux syscalls. Note that some of the code is a bit non-Ada; I wanted to keep the interfaces of the original C functions intact.

Finally, my signatures for the earlier patches.

  1. As 4 decimal numbers, each separated by a dot: "127.0.0.1". Each decimal number may range from 0 to 255.
  2. The C-library versions of these functions also accept hexadecimal and octal numbers and can handle space characters at the start of each number.

Building GNAT on MUSL, no more /usr/include/x86_64-linux-gnu

September 24th, 2018

An update on the previous version.

The produced gcc compiler builds static executables and no dynamically linked executables.

The compiler produced with the previous releases worked with several distributions but mysteriously failed on some. It seemed that the directory /usr/include/x86_64-linux-gnu was added by some developer to the include path on systems that have this directory. The files under that directory are specific to the GNU C Library and fail when included in MUSL C based builds. Of course, you wonder why this directory is always included; it turns out this is part of the default specfile for gcc [1].

Before removing the line from the configuration, I wanted to know the history and possible usefulness of this item. The line can be found in the gcc/config/i386/gnu-user64.h file. My first step was the gcc git repository: this configuration item was not in the current source or in any previous version of gcc. Next up was the AdaCore release, and it did include the item. Could it be that this was copied from a distribution? Debian, Gentoo and Red Hat all do not include a patch for gcc with this item. In short, it's a specific line added by a developer at or for AdaCore.

If I were to guess at the usefulness of this item, I would propose this scheme: the AdaCore compilers can live in any directory and may be used to build code that includes system specific files. The compiler has tricks to find these files, but always relative to the path of the compiler. The binary compiler package does not include those files (and as these are closely tied to the version of the GNU C library on the system, it cannot contain those files and always work). To make gcc find the right path by default, this solution was found and implemented.

This guess, plus the problem that a MUSL C based gcc compiler cannot use the files in /usr/include/x86_64-linux-gnu, plus the observation that the default gcc source code does not include the line, warrants removal of the line [2].

Still undetermined: why does compilation sometimes work with the previous version of the code? Some systems do not include a /usr/include/x86_64-linux-gnu directory, but others do and still the compilation does not fail. I'll have to install more distributions to figure this one out, or if you have such a system, could you compile something with: gcc -v -Wmissing-include-dirs and report on the output?

For detailed instructions on how to run the script, see the readme-2018-09-24.txt.

  1. In the past year, I've bumped against this specific configuration item before and I even changed the path for an AdaCore gcc installation. I was lazy and stupid and did not look into it any further.
  2. To see the result of the algorithm that gcc uses for the compile path do: gcc -v -Wmissing-include-dirs

GNAT Zero Foot Print - Take 5 - Assert and Aggregates

September 17th, 2018

Unfortunately, I've added more files to the ZFP runtime. These files are all needed to support the full Ada syntax:

Assert
The mechanism behind the Assert pragma depends on the Ada.Assertions module (implemented in the files adainclude/a-assert.ads and adainclude/a-assert.adb), see also the LRM. This module was added, but no visible effect was found when compiling an Ada module with an assert pragma. The GNAT compiler instead uses the Raise_Assert_Failure procedure (in file adainclude/s-assert.adb).
Aggregates
Some operations on arrays apply to every element of an array. For example, an array can be cleared with something like A := (others => 0). These operations are called aggregates and depend on memcpy and memcmp. These functions are not available when compiling without any C library. The GNAT source contains Ada implementations for both, and I've included the memcmp function (in file adainclude/s-memcom.adb). Such a version is portable but not really optimal, so for memcpy I've included the code from the MUSL C library. So far I've only used memcmp, so more testing is needed.
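
To show what "portable but not really optimal" means in practice, here is a byte-by-byte comparison written as a Python sketch. It mirrors the contract of the C memcmp function; it is neither the GNAT nor the MUSL source.

```python
def memcmp(a, b, n):
    """Compare the first n bytes of a and b.

    Returns a negative number, zero or a positive number when a sorts
    before, equal to or after b. One byte per step: portable, but slow
    compared with a version that compares whole machine words.
    """
    for i in range(n):
        if a[i] != b[i]:
            return a[i] - b[i]
    return 0
```

An optimized version returns the same results but compares larger units at a time.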

To use this code, download the patch and signature;

And then press and play.

Annotated Assembly Code for a Boot Record

August 16th, 2018

Below are my notes to help me understand the boot code published here: http://btcbase.org/log/2018-07-06#1832315. The boot loader is the first code run after the BIOS (it is 512 bytes long and loaded by the BIOS); it in turn loads the rest of the OS / application, switches to 64-bit mode and starts executing that code.

Lines 1-6:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Boot Loader - QEMU Variant
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
        payload_blocks   equ 14        ;; N * 512b blocks to load
        stack_top        equ 0x90000   ;; top of stack
	kernel_offset    equ 0x1000    ;; bottom of kernel

A number of constants is defined, the assembler will replace all occurrences of these names with the values after equ.

Line 9:
    	[BITS 16]

All lines after the '[BITS 16]' statement will be assembled as 16-bit Intel code. The boot process always starts with the processor in "real mode"; in this mode all code is supposed to follow the 16-bit 8086 instruction set.

Line 10:
        section .text

Code and data can be compiled into sections, the boot program will be contained in a single section which is labelled with ".text".

Line 11:
	jmp     init

First line of actual code, a jump instruction to the body of the code. Between the jmp and the body, some data and utility functions can be defined.

Lines 12-15:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
gdtr:
    	dw	gdt_end - gdt - 1 ; GDT limit
	dw	gdt		  ; GDT base

A definition for a Global Descriptor Table (GDT). This particular definition is for an empty table with just one entry. This GDT will not be used and can be removed from the file. A GDT is a simple vector of 64-bit (8 byte) elements. The GDT register contains the length of the table and a pointer to the table: the first 2 bytes encode the length (in bytes, not in elements), the second 2 bytes the position. The length in bytes must be decreased by 1.

Lines 17-23:
gdt:	times 8 db 0		; null descriptor
gdt_end:
	gdt64		dq 0x0000000000000000
	.code 		equ $ - gdt64
	dq 		0x0020980000000000
	.data 		equ $ - gdt64
	dq 		0x0000900000000000

A definition for a GDT that will be used. The first element is zero (apparently BIOS programs may expect this), the second is for the code section. The .code equ statement defines a constant (it is not the same as a .code section in assembly or object files); the constant will have a value of 8. The code segment element defines the offset in memory where that segment starts, its size and some flags. To decode a GDT entry, label the bytes from right to left, starting at 0 and ending at 7. The base (start address, in bytes or pages) is constructed from bytes 7, 4, 3 and 2 and is a 32-bit value. The size (number of bytes or pages) is constructed from bytes 0 and 1 and half of byte 6. The other half of byte 6 defines the size flags. Byte 5 is used for flags.

In the number 0x0020980000000000, base and limit are both zero. The size flags field is 0x2 or 0b0010, which means this is a 64-bit descriptor. The flags field is 0x98, or 0b10011000. From high to low: high bit set == valid entry; 00 == privilege, ring 0; 1 == always set; 1 == executable; 0 == code can be run only in ring level 0; 0 == code segment cannot be read (it can never be written to, by definition); 0 == accessed bit, will be set by the processor.

The .data constant and the quad word after it describe a data block with the same base as the code block; this is not a 64-bit segment. The flags are 0x90, or 0b10010000, which means a valid data entry with ring 0 privilege that grows up and is not writable. The last entry (gdt64.pointer, loaded later with lgdt) is not an entry in the table but the contents for the GDT register: first a 16-bit length in bytes (minus 1), next the 16-bit position of the start of the table. How these flags, bases and lengths work out will hopefully become clear in the memory handling code.
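
The byte juggling described above can be checked with a few lines of Python. This is only a sketch for these notes; the function name decode_descriptor and the dictionary keys are made up, and only a few of the flag bits are decoded.

```python
def decode_descriptor(d):
    """Split a 64-bit GDT entry into base, limit and a few flags."""
    limit = (d & 0xFFFF) | (((d >> 48) & 0xF) << 16)            # bytes 0, 1 and low half of 6
    base = ((d >> 16) & 0xFFFFFF) | (((d >> 56) & 0xFF) << 24)  # bytes 2, 3, 4 and 7
    access = (d >> 40) & 0xFF                                   # byte 5
    flags = (d >> 52) & 0xF                                     # high half of byte 6
    return {
        "base": base,
        "limit": limit,
        "present": bool(access & 0x80),     # high bit of byte 5
        "ring": (access >> 5) & 0x3,        # privilege level
        "executable": bool(access & 0x08),  # code (True) or data (False)
        "long_mode": bool(flags & 0x2),     # the 64-bit flag
    }

code = decode_descriptor(0x0020980000000000)
data = decode_descriptor(0x0000900000000000)
```

Both entries decode to base 0 and limit 0; only the code entry has the executable and 64-bit flags set.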

Lines 29-38:
DiskPacket:
	db	0x10
	db	0
d_blk:	dw	payload_blocks	; int 13 resets this to # of blocks actually read/written

db_off:	dw	after_me	; memory buffer destination offset
db_seg:	dw	0	        ; memory buffer destination segment

d_lba:	dd	1		; put the lba to read in this spot
	dd	0		; more storage bytes only for big lba's ( > 4 bytes )

The BIOS provides services to the boot program; one of these services is reading sectors from the disk. The service needs a structure filled with the number of sectors to read from the disk (14 in this code), where to put the data read (just after the code that was loaded from the same disk and is now running) and the LBA address (1 is the block just after the boot block).
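
For reference, the same 16-byte structure can be built with Python's struct module. This is a sketch under the assumption that the boot sector sits at 0x7C00, so that after_me resolves to 0x7E00; the function name is made up.

```python
import struct

BOOT_SECTOR_ADDR = 0x7C00  # where the BIOS loads the boot sector
SECTOR_SIZE = 512

def disk_address_packet(blocks, offset, segment, lba):
    """Pack the packet used by INT 13h, AH=42h (little-endian, no padding)."""
    return struct.pack(
        "<BBHHHQ",
        0x10,     # size of this packet in bytes
        0,        # reserved, always 0
        blocks,   # number of sectors to read
        offset,   # destination buffer offset
        segment,  # destination buffer segment
        lba,      # 64-bit starting block number (the two dd fields)
    )

packet = disk_address_packet(14, BOOT_SECTOR_ADDR + SECTOR_SIZE, 0, 1)
```

The first byte of the packet is its own length, which is why the assembly starts the structure with db 0x10.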

Lines 40-54:
read_sector:
 	mov 	si, DiskPacket		; address of "disk address packet"
	mov 	ah, 0x42		; AL is unused
	mov	dl, [BootDrv]
	or 	dl, 0x80		; drive number 0 (OR the drive # with 0x80)
	int 	0x13
	jc 	bad_disk
	inc	dword [d_lba]
	ret
bad_disk:
        mov     si, disk_sad_msg
        call    print
halt:
        hlt
        jmp halt

The call to read blocks from the hard disk. The BIOS loads the first block and puts it at 0x7c00. The other blocks need to be loaded by the boot code (and will be placed at 0x7e00). This is a standard implementation of how to call the BIOS and load the blocks. The service is activated by interrupt 0x13 with the AH register set to 0x42 and the DL register set to the boot drive. The service sets the carry flag on any error, and the boot code will then print a message and halt the machine. As for line 47, I have no idea why the value at the d_lba address needs to be increased.

Lines 57-70:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Print string at si using bios console
print:
        mov    al, [si]
        inc    si
        or     al, al
        jz     end_print    ; end at NUL
        mov    ah, 0x0e     ; op 0x0e
        mov    bh, 0x00     ; page number
        mov    bl, 0x07     ; color
        int    0x10         ; INT 10 - BIOS print char
        jmp    print
end_print:
        ret

Print the characters of a zero-terminated buffer one at a time using a BIOS service.

Lines 75-79:
data:
        start_msg      db 13, 10, "Loading payload from disk...", 13, 10, 0
	end_msg        db "Running Payload...", 13, 10, 0
        disk_sad_msg   db "Disk Error!", 13, 10, 0
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Text strings to print, 13 == CR, 10 == LF, 0 is end of string byte

Line 81:
	BootDrv        db 0 ; drive that we booted from

Byte to store the number of the boot drive

Lines 85-89:
init:
        xor     ax, ax
        mov     ds, ax
        mov     es, ax
        mov     ss, ax

Set ax to zero and copy this value into ds (data segment), es (extra segment), ss (stack segment).

Lines 90-91:
       	mov	bp, 0x9c00  ; init realmode stack
	mov     sp, bp

Set up a stack location; note that this is 8k bytes away from the start of the boot code (0x9c00 - 0x7c00 = 0x2000). The current minimal OS code is 3.3k, so this is far enough away.

The stack is only used for a couple of calls in this boot rom and will not grow down by more than 1 word (the instruction pointer will be pushed on the stack).

Line 92:
        mov	[BootDrv], dl  ; where we booted from

The bios will fill the lower part of the dx register with the index of the boot drive, store this index in memory

Lines 96-97:
	mov     si, start_msg
        call    print

Print a start message to the boot screen

Line 98:
	call    read_sector

Read the rest of the rom

Lines 99-100:
	mov     si, end_msg
        call    print

Print an end message, rom has been read

Line 101:
        cli

Clear the interrupt flag; maskable interrupts are disabled from here on.

Lines 102-119:
	;; enable a20
	call	a20_loop
	jnz	a20_done
	mov	al, 0xD1
	out	0x64, al
	call	a20_loop
	jnz	a20_done
	mov	al, 0xDF
	out	0x60, al
a20_loop:
	mov	ecx, 0x20000
a20_loop_2:
	jmp 	short a20_c
a20_c:
	in	al, 0x64
	test	al, 0x2
	loopne	a20_loop_2
a20_done:

An internet search for the A20 line in Intel processors will inform you of some interesting properties of these processors. In short, address line 20 (the 21st line, counting from A0) is disabled at boot, so memory above 1 MB cannot be fully accessed; to get to 64-bit mode the address line has to be enabled. The most standard method to enable the line is to send a message to the keyboard controller, and this is done in this code. Strangely, the a20_loop code is missing a 'ret' statement after line 118. Even if ret is added, the statements at 104 and 108 will rarely do anything: loopne leaves the loop when the Z-flag is set (the controller is ready) or when the counter runs out, so the jnz is only taken on a timeout. The jump at line 114 is there to get a small delay. After line 110 an extra call to the loop and an unconditional jump to a20_done should be added. The boot rom works, but only because the QEMU BIOS already enables the A20 line.
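
The effect of a disabled A20 line is easy to show with some arithmetic. The Python sketch below models it as a mask on the address; the function names are made up for this note.

```python
def real_mode_address(segment, offset):
    """Linear address of segment:offset in real mode."""
    return (segment << 4) + offset

def on_the_bus(linear, a20_enabled):
    """Address that reaches memory; line 20 is forced to 0 when disabled."""
    if a20_enabled:
        return linear
    return linear & ~(1 << 20)

# FFFF:0010 is the first byte above 1 MB
top = real_mode_address(0xFFFF, 0x0010)
wrapped = on_the_bus(top, a20_enabled=False)
direct = on_the_bus(top, a20_enabled=True)
```

With the line disabled the address 0x100000 wraps around to 0, and every odd megabyte maps onto the even megabyte below it.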

Lines 123-124:
	xor	bx, bx
	mov	es, bx

Build a PML4 page table; first set up the registers. I will need to look up how these page tables work. Set the BX register to 0 and copy this value to ES. ES should still be zero from the code at line 88, but it may be that the register was changed in the BIOS code.

Line 125:
	cld

Clear the direction flag, so that the following string operations increment DI.

Lines 126-128:
	mov	di, 0xA000
	mov	ax, 0xB00F
	stosw

Store the value 0xB00F at address 0xA000 and increase DI by 2.

Lines 129-131:
	xor	ax, ax
	mov	cx, 0x07FF
	rep 	stosw

Store the word (2 byte) value 0 for 2047 times; together with the word stored before, this fills a 4k page.

Lines 132-136:
	mov	ax, 0xC00F
	stosw
	xor	ax, ax
	mov	cx, 0x07FF
	rep 	stosw

The PDP table: start with 0xC00F, followed by zeros.

Lines 137-141:
	mov	ax, 0x018F
	stosw
	xor	ax, ax
	mov	cx, 0x07FF
	rep 	stosw

The PD table: start with 0x018F, followed by zeros.

This ends the set-up of the paging tables.
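
As a note to self on how to read the three values, here is a small Python sketch that splits the low word of each entry into an address and flag bits. The flag names follow the x86-64 paging entry layout as I understand it; the function name is made up.

```python
FLAG_NAMES = {
    0: "present",
    1: "writable",
    2: "user",
    3: "write-through",
    7: "page-size",   # in a PD entry: maps a 2 MB page directly
    8: "global",
}

def decode_entry(word):
    """Split the low 16 bits of a paging entry into address and flags."""
    address = word & 0xF000  # low bits of the 4k aligned address
    flags = [name for bit, name in sorted(FLAG_NAMES.items())
             if word & (1 << bit)]
    return address, flags

pml4 = decode_entry(0xB00F)  # stored at 0xA000, points to the PDP table
pdp = decode_entry(0xC00F)   # stored at 0xB000, points to the PD table
pd = decode_entry(0x018F)    # stored at 0xC000, a 2 MB page at address 0
```

So the three tables sit in consecutive 4k pages (0xA000, 0xB000 and 0xC000), and the single PD entry appears to map the first 2 MB of memory onto itself.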

Lines 143-144:
	mov 	eax, 10100000b		; PAE, PGE
	mov 	cr4, eax

To enable 64-bit mode, two bits in the CR4 control register need to be set: (1) Physical Address Extension (bit 5), which enables 36-bit instead of 32-bit addresses, and (2) Page Global Enable (bit 7), which enables global pages that are maintained for all tasks. The Intel documentation notes that the PG flag (in CR0) must be set first; in this code it is set after this statement, at lines 151-153. Note that even in real mode the 32-bit registers are available.

Lines 145-146:
	mov 	edx, 0x0000A000		; PML4
	mov 	cr3, edx

The address of the paging table is stored in CR3 (and 0xA000 was used in the setup for the paging tables)

Lines 147-150:
	mov 	ecx, 0xC0000080		; EFER.LME
	rdmsr				; long mode!
	or 	eax, 0x00000100
	wrmsr

Change a Model Specific Register (MSR): the address of the register must be put in ECX, and the value of the register is passed in EAX and EDX. In this case a bit in the MSR IA32_EFER must be set; its address is 0xC0000080. The bit enables IA-32e mode. As no flag is set in the code segment descriptor bits at this point, the mode will be the so-called "compatibility mode". The actual mode (64-bit or less) will then be determined from the GDT, and in the GDT the 64-bit flag was set.

Lines 151-153:
	mov	ebx, cr0		; long mode
	or	ebx, 0x80000001		; Paging and protection
	mov	cr0, ebx		; Skip pmode

Enable paging (PG, bit 31) and protected mode (PE, bit 0) in CR0.
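
The three constants used in this mode switch are just bit masks. A quick Python check (the helper name set_bits is made up) lists which bits each one sets:

```python
def set_bits(value):
    """Return the positions of the bits that are 1 in value."""
    return [bit for bit in range(value.bit_length()) if value & (1 << bit)]

CR4_VALUE = 0b10100000  # PAE (bit 5) and PGE (bit 7)
EFER_LME = 0x00000100   # long mode enable (bit 8) in MSR 0xC0000080
CR0_MASK = 0x80000001   # PE (bit 0) and PG (bit 31)

bits = {
    "cr4": set_bits(CR4_VALUE),
    "efer": set_bits(EFER_LME),
    "cr0": set_bits(CR0_MASK),
}
```

Note that CR4 is overwritten with a plain mov, while EFER and CR0 are read first and the bits are or-ed in.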

Line 154:
	lgdt	[gdt64.pointer]

The GDT register is loaded, and the CPU will use the GDT from now on.

Line 155:
 	jmp	gdt64.code:longmode     ; CS, 64b seg

A mixed-size jmp; nasm implements code for this. As gdt64.code selects the code descriptor, in which the 64-bit flag is set, the jmp is into a 64-bit segment.

Line 156:
[BITS 64]

Generate 64 bit code starting from this point

Lines 158-162:
	;; set up new code/data/stack segments
        mov     ebp, stack_top
	mov     esp, ebp
	extern main
        jmp main

Set up the C stack and jump to main.

Line 164:
	times	510-($-$$) db 0

Fill up any leftover space with zero bytes but leave out the 2 last bytes

Lines 165-166:
bootsig:
	dw 0xAA55

All boot sectors end with the two bytes 0x55 and 0xAA (the word 0xAA55 is stored low byte first).
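
The padding and the signature together make the sector exactly 512 bytes. A Python sketch of the same layout (the function name make_boot_sector is made up, and the two code bytes are only a stand-in for the real boot code):

```python
import struct

SECTOR_SIZE = 512

def make_boot_sector(code):
    """Pad code with zero bytes to 510 bytes and append the signature."""
    if len(code) > SECTOR_SIZE - 2:
        raise ValueError("boot code does not fit in one sector")
    padding = b"\x00" * (SECTOR_SIZE - 2 - len(code))
    # "<H" stores the 16-bit word little-endian, like dw in nasm
    return code + padding + struct.pack("<H", 0xAA55)

sector = make_boot_sector(b"\xeb\xfe")  # 'jmp $', an endless loop
```

The byte at offset 510 is 0x55 and the byte at offset 511 is 0xAA.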

Line 168:
after_me:

Label to use for loading the data from this disk into physical memory.

GNAT Zero Foot Print - Take 4 - Introduction of the platform

August 13th, 2018

An Ada runtime library is used to provide a standard interface to different operating systems and hardware. Already two different ways of compilation, (1) based on the C library and (2) based on assembly code, are supported in the ZFP library. Both versions can be had by pressing a different node of the v-tree. Although this works, it all becomes complicated when I want to add the same file to both systems and have to maintain multiple branches. Also, I want to add a version of the library with no OS support and one with 64-bit ARM support and probably MIPS and so on and so forth.

I needed to do a major overhaul of the code to support different platforms. An option was added to the gprbuild project file, and with this option different source directories are selected to compile the library. All the sources have been distributed over different directories: one directory adainclude for generic (non-platform specific) code and multiple directories under the platform directory for all those files that are different per system. Now that all the source files are in different directories, the only way the runtime can be used is once it is installed [1].

To use this new code, download the patch and signature;

After pressing, you'll need to do the following magic commands in the zfp directory [2]:

make clean MODE=x86_64-asm

make MODE=x86_64-asm

make install MODE=x86_64-asm PREFIX=prefix-asm

To check;

cd examples

make clean

make RTS=../prefix-asm

This will build the assembly based GNAT library. For the C based library, do the following in the zfp directory:

make clean

make MODE=x86_64-c

make install MODE=x86_64-c PREFIX=prefix-c

Again, to check

cd examples

make clean

make RTS=../prefix-c

Once built and installed into a prefix directory the default GNAT, the C and asm library can all be used to build the examples. The only thing to be set is the runtime directory with the RTS environment variable.

  1. At installation time the source files will nicely be put into the target adainclude directory with the gprinstall command.
  2. make is necessary; gprbuild is fine for building Ada libraries and executables, but when it comes to a simple rule to copy a file to a new name (so that gprinstall can pick it up and install that file) you can forget about it.

GNAT Zero Foot Print - Take 3 - Regrind

August 7th, 2018

No new code in this installment. Instead, a regrind of all 3 patches, after a helpful suggestion to do so by Diana Coman. With this regrind, I updated the patches to follow the current thinking in vpatch management: the whole package under a common subdirectory, addition of a manifest and all files hashed with Keccak.

You can download and press the files with

v.pl init http://ave1.org/code/zfp

but you will have to comment out the hash checking code.

GNAT Zero Foot Print - Take 2 - No C

July 6th, 2018

"Libc gotta go."

—Stanislav Datskovskiy

And it will. In an, at this moment, unknown number of steps the C library can be ripped out from the Ada Runtime library and be replaced with Ada and assembly code. In the first step, all C calls need to be replaced with Ada code and possibly some assembly to perform system calls to the Linux kernel. The second step is then to replace the C library specific start-up code with code for Ada.

I start with the previous version of a minimal ZFP library for Linux. This library uses only two calls to the C library, one to output characters and the other to exit the program. Both are replaced with a direct system call [1]. The second change is to include a file with startup code [2]. The resulting code is published in the following vpatch (with signature).

Combine this patch with those from the previous installment, press and build it. Building the code needs to be done with the Makefile [3].

<<create a directory and put a .wot directory in it with at least my key>>

v.pl init http://ave1.org/code/zfp

v.pl p a zfp_2_noc.vpatch

cd a

make

cd examples

make

All system calls can be found in the adainclude/s-syscal.adb file. The Write function (used for outputting characters) is implemented as a single assembly statement, syscall, with the parameter list specified to fill the right processor registers. The function starts with a conversion from characters to bytes [4] and ends with a check of the return value. After a completed system call the RAX register holds a return code. If an error occurred during execution of the system call, the register contains the error code as a negative number (always between -1 and -4096). If the execution was successful, the register contains 0 or any other 64-bit number outside the range -1 to -4096.

function Write (fd : in Int; S : in String; E : out ErrorCode) return Int is
    type byte is mod 2**8;
    B : array (S'Range) of byte;
    R : Int := 0;
 begin
    for I in S'Range loop
       B (I) := Character'Pos (S (I));
    end loop;
    Asm
      ("syscall",
       Outputs => (Int'Asm_Output ("=a", R)),
       Inputs  =>
         (Int'Asm_Input ("a", SYSCALL_WRITE),
          Int'Asm_Input ("D", fd),
          System.Address'Asm_Input ("S", B'Address),
          Int'Asm_Input ("d", B'Length)),
       Volatile => True);
    if R < 0 and R >= -(2**12) then
       E := ErrorCode'Val (-R);
       R := -1;
    else
       E := OK;
    end if;
    return R;
 end Write;
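
The check at the end of Write can be restated in a few lines of Python. This sketch only mirrors the range test from the Ada code above; the function name is made up.

```python
def decode_syscall_result(rax):
    """Interpret RAX after a syscall, mirroring the check in Write.

    Returns (result, error): (-1, error code) on failure and
    (rax, 0) on success.
    """
    if -4096 <= rax < 0:
        return -1, -rax
    return rax, 0
```

For example, a write to a closed file descriptor leaves -9 in RAX, which decodes to error code 9 (EBADF on Linux), while a successful write of 12 bytes leaves 12.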

The a-textio.adb and last_chance_handler.adb files have been updated to use the system calls instead of the C library. The s-maccod.ads file was added from the GNAT runtime library to support the inline assembly code. The other addition is the startup.S file. In its simplest working form it just needs to contain one definition of a global (_start), a call to a main function and a syscall to exit the program:

.global _start

_start:
  call main

  /* exit code */
  mov $60, %rax
  mov $0, %rdi
  syscall

The version in the patch also stores the argument count and a pointer to the argument array in two globals. Both globals are unused for now but will be needed for future parsing of any command line arguments.

The final noteworthy change is the inclusion of a runtime.xml file. The gprbuild command will use this file to set flags for all projects that are built with the runtime library. For reasons, this file is written as an xml file containing gprbuild project statements:

<?xml version="1.0" ?>

<gprconfig>
  <configuration>
   <config>
   package Linker is
      for Required_Switches use Linker'Required_Switches &amp;
        ("${RUNTIME_DIR(ada)}/adalib/libgnat.a") &amp;
        ("-nostdlib", "-nodefaultlibs", "-lgcc");

      for Required_Switches use Linker'Required_Switches &amp;
          ("${RUNTIME_DIR(ada)}/adalib/start.o");
   end Linker;

   package Binder is
      for Required_Switches ("Ada") use Binder'Required_Switches ("Ada") &amp;
       ("-nostdlib") ;
   end Binder;
   </config>
  </configuration>
</gprconfig>

The linker flags are set so that no standard C library or startup code is included in the resulting binary. As we are then lacking the default startup code, an extra line is added to include the start.o code with every compile.

In the end, the fun part: a working binary. The hello world example from the previous installment can be built and its size inspected. It is now at 2.6k (down from 54k) on my computer [5].

In the final end, I will include another reference to AdaCore's configurable runtime documentation. The GNAT documentation has been very helpful for learning the GNAT system and developing this library.

  1. The main difficulty in doing so is to learn how the Linux system calls work and get a better understanding of the inline assembly statements. Stan's demo.asm posted in the logs proved very helpful for this process.
  2. This file is now written in assembly, although (upon reflection) it should be possible to rewrite it in Ada.
  3. I did not find a method to compile one separate file into an object file with gprbuild.
  4. Which in practice will be a copy operation.
  5. Of course, this minimal library is too minimal. In some cases (for example when a string is concatenated) the compiler will generate memcpy or memset calls. We need to provide replacement Ada functions for each. This is not difficult as the ada 2017 code contains pure Ada versions for all of these.

Building GNAT on MUSL, updated tar line

June 3rd, 2018

An update on the previous version.

The produced gcc compiler builds static executables and no dynamically linked executables.

For detailed instructions on how to run the script, see the readme-2018-06-01.txt.

PGPy a review

May 29th, 2018

The code of PGPy [1] sucks.

A good indication of the quality of a Python package is the 'requirements.txt' file, reproduced here;

cryptography>=1.1
enum34
pyasn1
six>=1.9.0
singledispatch

The cryptography package will need to be reviewed separately. A quick look at the PyPI package index for cryptography is already good for some lulz [2]. The enum34 package brings the Python 3 enumeration type to Python 2. Only one object is used from the package pyasn1, and the functionality provided by this object could all have been implemented in an hour in PGPy. If you see six as a requirement, you know you are in trouble. The six package is for when a package author wants to program in Python 2 but also wants to make it's [3] program work in Python 3 without any conversion. So six indicates that you will be reading code that is not Python 2 and will use from __future__ import print_function, from __future__ import division and more. Any author writing packages requiring six can be safely negrated. The singledispatch package is again something from Python 3 [4]. Based on these requirements alone, I conclude that PGPy sucks.

Next, the types.py file in the pgpy directory. The code in this file failed to run on my systems and so triggered this review. The first class definition therein is an Armorable class. Clearly the authors did not know it is forbidden to define any -able class in Python. The Armorable class contains the full implementation of converting objects into armored [5] text and vice versa. This is a mistake, as -able stands for "capable of being ..."; the being in that fragment will need to be implemented by something else. If something is drinkable it usually does not drink itself, but it has properties that make it drinkable to someone. And which of its many properties make it drinkable is determined by the drinker, not by the drunk. Based on this class alone, I conclude that PGPy sucks.

Two classes in types.py are defined with a meta class (the Armorable and the MetaDispatchable). The whole metaclass mess is defined in PEP-3119. Go and read it if you want to waste your time. The definition of MetaDispatchable provides for an extra complex and custom object-orientation. Remember, we are reviewing a package to handle PGP code. Another strike against PGPy and I will not bore you with more.

  1. Yes, not pgpy or PGPY but PGPy
  2. The first and only example is with a Fernet symmetric encryption recipe, as if that is something.
  3. Yes, it's
  4. I'm not following Python 3, but clearly the development of Python 3 has gone off the deep end
  5. A PGPism

Convert a TMSR key to PGP

May 29th, 2018

A script is floating around to convert the TMSR key format (e,n,comment) to a PGP key for digesting in phuctor. This script did not work on the machines I tried it on. Of course, the script is fine, it's PGPy that is broken: I could not get it to install. As I'm programming in Python for a living and have all kinds of stupid in me, I decided to try and fix the PGPy code that failed to install. An hour was thus spent and some material gathered for a future blog post, but not any working code [1].

After that I decided to spend another hour making an alternative that uses only standard Python modules. I read RFC 4880 a month ago; this left me with a headache back then. The thing is unreadable. So to make this script, I made extensive use of the search function in my browser and only read those lines that helped in writing the script.

The script;

# Written for Python 2: byte strings are plain str objects
import struct
import time
import sys
import base64
import math

# some format strings for the struct module
# these are used to encode integers and shorts to arrays of bytes
# '>' stands for big-endian as this is what is used in the PGP format
openpgp_publickey_format = ">BIB"
mpi_format = ">H"
packet_length_format = ">I"
crc_format = ">I"

# determine the index of the highest bit set to 1 in a number
def count_bits(B):
  R = 0
  i = 0
  while B > 0:
    i += 1
    if B & 0x1:
      R = i
    B >>= 1
  return R

# Convert a number to an array of bytes
# The bytes in the array are stored in big-endian order.
# The most significant byte is stored as the first item
# in the array
def number_to_bytes(B):
  R = []
  bits = 0
  while B > 0xff:
    bits += 8
    R.append(B & 0xff)
    B >>= 8
  R.append(B)
  bits += count_bits(B)
  return bits, ''.join(map(chr, reversed(R)))

# An MPI is a byte array that starts with a two byte
# length header. The length is given in bits.
def number_to_mpi(B):
  C, A = number_to_bytes(B)
  return struct.pack(mpi_format, C) + A

# A PGP public key header consists of a byte "4",
# an integer (4 bytes) to denote the timestamp
# and a byte "1" (RSA).
def public_key_header(T):
  return struct.pack(openpgp_publickey_format, 4, T, 1)

# A public key packet is the public key header
# plus 2 MPI numbers, the RSA modulus (N) and
# the RSA exponent (e).
def public_key_packet(t, n, e):
  return ''.join((public_key_header(t), number_to_mpi(n), number_to_mpi(e),))

# A comment or userid packet is a string encoded as utf-8
def userid_packet(s):
  return s.encode('utf8')

# The PGP format is a stream of "packets".
# Each packet has a header. This header consists of a tag
# and a length field. The tag has flags to determine if it is a
# "new" or "old" packet.
# The only supported encoding in this script is "new".
def encode_packet(packet_bytes, tag = 6):
  # 0x80, 8th bit always set, 7th bit set --> new packet
  h = 0x80 | 0x40
  # 0-5 bits -> the tag
  h |= tag

  # convert the integer to a byte
  header = chr(h)

  # dude, this is totally how you may save 2 or 3 bytes with minimal complexity
  l = len(packet_bytes)
  if l < 192:
    header += chr(l)
  elif l < 8384:
    l -= 192
    o1 = l >> 8
    o2 = l & 0xff
    header += chr(o1 + 192) + chr(o2)
  else:
    header += chr(0xff) + struct.pack(packet_length_format, l)

  return header + packet_bytes

# When you encode binary data as ASCII text with base64
# this data becomes fragile. So a CRC code is needed to
# detect corruption.
def crc24(s):
  R = 0xB704CE
  for char in s:
    B = ord(char)
    R ^=  B << 16
    for i in range(8):
      R <<= 1
      if R & 0x1000000:
        R ^= 0x1864CFB
  return R & 0xFFFFFF

# Create a public key for consumption by Phuctor.
# The public key needs to contain 2 packets
# one for the key data (n, e)
# one for the comment
# It must be in the armor / ascii format.
def enarmored_public_key(n, e, comment, t):
  R = []
  # the header
  R.append("-----BEGIN PGP PUBLIC KEY BLOCK-----")
  R.append("")

  # the packets in bytes
  A = encode_packet(public_key_packet(t, n, e), 6)
  A += encode_packet(userid_packet(comment), 13)

  # the packets in base64 encoding with line length max 76
  s=base64.b64encode(A)
  i = 0
  while i < len(s):
    R.append(s[i:i+76])
    i += 76

  # the CRC
  R.append("="+base64.b64encode(struct.pack(crc_format, crc24(A))[1:]))

  # the footer
  R.append("")
  R.append("-----END PGP PUBLIC KEY BLOCK-----")

  return '\n'.join(R)

# read a file with comma separated lines
# each line is in the TMSR format: e,n,comment
if __name__ == "__main__":
  ser = 1
  for x in sys.stdin:
    x = x.strip()

    # ignore empty lines
    if len(x) == 0 or x.startswith('#'):
      continue

    # the comment may contain commas, so split on the first 2
    e,n,comment = x.split(',', 2)

    t0 = int(time.time())
    with open("{0}.txt".format(ser), "wb") as stream:
      stream.write(enarmored_public_key(int(n), int(e), comment, t0))

    ser += 1
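
For readers on Python 3, the two trickiest encodings in the script (the "new" packet length and the MPI) can be sketched with bytes objects. This is a separate illustration based on my reading of RFC 4880, not part of the patch; the function names new_format_length and mpi are made up.

```python
import struct

def new_format_length(length):
    """Encode a packet body length as 1, 2 or 5 octets."""
    if length < 192:
        return bytes([length])
    if length < 8384:
        length -= 192
        return bytes([(length >> 8) + 192, length & 0xFF])
    return b"\xff" + struct.pack(">I", length)

def mpi(number):
    """Encode a non-negative integer as an MPI.

    A 2-byte big-endian bit count is followed by the number itself
    in big-endian byte order.
    """
    bits = number.bit_length()
    return struct.pack(">H", bits) + number.to_bytes((bits + 7) // 8, "big")
```

The common RSA exponent 65537 needs 17 bits, so mpi(65537) is the two header bytes 0x00 0x11 followed by 0x01 0x00 0x01.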

And the patch itself with signature;

  1. I've been reading code (both open and closed source) for a large part of my life. I started this whole career by typing over BASIC programs into my father's Commodore 128 and then stumbled along. The code I read in these popular security programs (pgpy, openssl, openssh, pgp) is markedly worse than any I encountered before. I can only imagine the kind of cockroaches that are attracted to this foul smelling mess.