
24 bit Memory Management

A future release will support 24 bit words. This document tells you how that will work.


The largest Ice40 FPGA has 1 Mbit of single port memory, divided into 4 blocks of 256 Kbit, each organized as 16 bit wide words.  The Hana cpu allocates 1 block to each core.  To the developer, each processor looks like a 24 bit processor with 10,666 words of memory, but there are a few places where the abstraction breaks down.  The Hana 4+4/24 processor is built on a 16 bit wide memory, so reading a 24 bit data word actually takes 2 clock cycles, during which the processor pauses.  Furthermore, the processor can store more instructions than you would expect using 24 bit words, because the instructions are compressed.  Other than that, the abstraction works very nicely.  You can read the details below.

The FPGA memory words are 16 bits wide, but the Hana processor words are 24 bits wide.  So every Hana word is stored in one and a half FPGA words, and requires two memory accesses.  Hana words at even addresses occupy one FPGA word and the first half of the next FPGA word.  Hana words at odd addresses occupy the second half of that FPGA word and the following FPGA word.  During reads, the processor takes two clock cycles to read the two FPGA words and merges the two parts.  During writes, no pause is required: the second half of the word can be written while the processor is doing the next thing.  There is a 4 bit write mask, so it is possible to write to just half of an FPGA word.
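The packing described above can be sketched in Python: two 24 bit Hana words (48 bits) fit exactly into three 16 bit FPGA words, so the layout repeats every pair of Hana addresses.  This is an illustrative model only; the function names and the bit ordering within the shared middle word are assumptions, not the actual hardware layout.

```python
def store(mem, hana_addr, value):
    """Write one 24-bit Hana word into a list of 16-bit FPGA words.

    Even Hana addresses fill one FPGA word plus the first half of the
    next; odd addresses fill the second half of that word plus the
    following FPGA word.  The partial writes model the write mask.
    """
    base = (hana_addr // 2) * 3          # three FPGA words per word pair
    if hana_addr % 2 == 0:
        mem[base] = (value >> 8) & 0xFFFF
        # masked write: only the upper byte of the shared word changes
        mem[base + 1] = ((value & 0xFF) << 8) | (mem[base + 1] & 0x00FF)
    else:
        # masked write: only the lower byte of the shared word changes
        mem[base + 1] = (mem[base + 1] & 0xFF00) | ((value >> 16) & 0xFF)
        mem[base + 2] = value & 0xFFFF

def load(mem, hana_addr):
    """Read one 24-bit Hana word back: two FPGA reads, merged."""
    base = (hana_addr // 2) * 3
    if hana_addr % 2 == 0:
        return (mem[base] << 8) | (mem[base + 1] >> 8)
    return ((mem[base + 1] & 0xFF) << 16) | mem[base + 2]
```

For example, storing two Hana words at addresses 0 and 1 touches FPGA words 0 through 2, with both writes sharing FPGA word 1.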

We have to worry about contention for memory.  The Hana processor is based on the J1 and Mecrisp cpus, which require dual port memory.  Every clock cycle, the J1 processor reads an instruction from memory, and during many clock cycles the application instructions also read or write memory.  That works fine on more expensive FPGAs with dual port memory, but completely breaks down on the more economical ICE40 FPGAs.  The simple solution is to give priority to application reads and writes and pause the processor, but that would slow the application.
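The priority scheme just described amounts to a one-line arbiter: each cycle the single memory port serves at most one request, and an application data access always wins over an instruction fetch.  A minimal sketch, with the function name and request representation as assumptions:

```python
def arbitrate(data_req, fetch_req):
    """Pick which request the single memory port serves this cycle.

    Application reads/writes have priority; instruction fetches use
    the remaining idle cycles.  Either argument may be None.
    """
    if data_req is not None:
        return data_req      # data access wins; a pending fetch waits
    return fetch_req         # otherwise fetch, or None (port idle)
```

Without the prefetching described next, every cycle in which `data_req` wins is a cycle in which the processor has no new instruction and must pause.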

To avoid this memory contention slowdown, the instructions are pre-fetched: they are read whenever possible, and cached.  This includes reading the instructions at jump addresses, so that frequently there is no stall during a jump.  Thus there is no need for instruction inlining, which saves precious memory.
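The prefetch idea can be modeled as a small cache that is filled on idle memory cycles, and that also fetches jump targets so a taken jump often finds its instruction already waiting.  This is a toy model under stated assumptions; the class, the instruction representation, and the replacement policy (here: none) are all invented for illustration.

```python
class Prefetcher:
    """Toy model of instruction prefetch on a single-port memory.

    On each idle memory cycle it fetches the next sequential
    instruction; when the cached instruction at the current pc is a
    jump, it also fetches the jump target.
    """
    def __init__(self, mem):
        self.mem = mem           # instructions as ("op", operand...) tuples
        self.cache = {}          # address -> instruction

    def idle_cycle(self, pc):
        # sequential prefetch of the next instruction
        if pc + 1 < len(self.mem) and pc + 1 not in self.cache:
            self.cache[pc + 1] = self.mem[pc + 1]
        # if the instruction at pc is a jump, prefetch its target too
        inst = self.cache.get(pc)
        if inst is not None and inst[0] == "jmp":
            self.cache.setdefault(inst[1], self.mem[inst[1]])

    def fetch(self, pc):
        # a hit avoids a stall; a miss (None) would pause the processor
        return self.cache.get(pc)
```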

To reduce the number of instruction fetches, and to minimize memory consumption, instructions are compressed.  Hana instruction size varies from 4 bits to 8 bits.  The most frequently used instructions are stored in 4 bits; these include call, exit, swap, >r, r>, dup, and over.  Less frequently used instructions require 8 bits.  Instructions which require an address, such as read, write, jump, and conditional jump, require 4 bits plus the address; small addresses (<4096) are stored in 12 bits.  The Hana processor also compresses small literals: small literals (between -128 and 127) are stored as 4 bits plus 1 byte.  The compression algorithm is available in the open source Forth code.  If you want documentation, just ask.  I expect it to change rapidly, so I am not yet documenting it.  In particular, I am not quite confident what the most frequent Forth words are, and I also expect the choice of words implemented in hardware to evolve rapidly.
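The size classes above can be illustrated with a nibble-oriented encoder.  The opcode tables below are invented for illustration (the text says the real assignments are still in flux), but the sizes match the description: 4 bits for the most frequent words, 4 bits plus a 12 bit address, and 4 bits plus 1 byte for small literals.

```python
# Hypothetical opcode tables -- NOT the real Hana encoding.
FREQUENT = {"call": 0x0, "exit": 0x1, "swap": 0x2, ">r": 0x3,
            "r>": 0x4, "dup": 0x5, "over": 0x6}
ADDRESSED = {"read": 0x8, "write": 0x9, "jump": 0xA, "0jump": 0xB}
LIT = 0xC                                # prefix nibble for a small literal

def encode(word, operand=None):
    """Return one instruction as a list of 4-bit nibbles."""
    if word in FREQUENT:
        return [FREQUENT[word]]          # 4 bits total
    if word in ADDRESSED:
        assert 0 <= operand < 4096       # small address fits in 12 bits
        return [ADDRESSED[word],
                operand >> 8, (operand >> 4) & 0xF, operand & 0xF]
    if word == "lit":
        assert -128 <= operand <= 127    # small literal fits in 1 byte
        b = operand & 0xFF
        return [LIT, b >> 4, b & 0xF]    # 4 bits + 1 byte
    raise ValueError("8-bit opcodes for less frequent words omitted")
```

A frequent word like dup costs one nibble, a jump to a small address costs four nibbles, and a small literal costs three, which is how more instructions fit than a flat 24 bit encoding would allow.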

"I work very hard so that I can be lazy."  The point of all of this complexity, is to make it appear to the developer that he has a 24 bit processor with more than 10,666 words of memory. 




 Built using the  Forest Map Wiki