mxForth is an extremely efficient Forth compiler (presently there is no freely available Forth compiler
generating faster code). It will generate applications in which the interpreter is still active. This is an option that costs
real money (licensing!) in commercial compilers.
The downside of mxForth is that it is really slow at compiling big programs, and intrinsic functions (FIND WORD CMOVE etc.) are not optimized. This can be a problem sometimes. (If it is, you should probably be using iForth.) Of course, not everybody is a speed freak; some people really care about functionality, user interface building, library support... In that case you should look for a professional Forth package (e.g. ProForth for Windows from MPE or SwiftForth from FORTH Inc.).
There are presently versions of mxForth for Linux 2.0 and for Windows NT 4.0.
In the past I have written many meta and target compilers. Most of this vast array of source code is not usable anymore, mainly because it is written for 16-bit PC-Forths with very peculiar extensions. In order to save some of this tremendous amount of work from oblivion I decided to go against my principles and write a metacompiler in iForth (which is a full ANS Forth). For this I used the best ideas from the past but, because over-generality tends to lead to obesity, I made some clear design decisions from the start. META and mxForth will become part of the iForth distribution.
The first release of META is now ready. It fits in a 24 Kbyte text file. I've written an example application called mxForth that fits in 98 K of text. META can generate any kind of Forth for any kind of machine and any kind of operating system, but mxForth (which looks like an unoptimized evil brother of iForth) is a subroutine-threaded Forth for the Pentium meant to run under Linux. The OS part of mxForth is split off into a C server program that performs all I/O duties and loads mxForth into memory. This C server program could just as well be written in any other language (assembly language, for instance). mxForth has most ANS wordsets; only the floating-point wordset is really missing. All the file words are there. Extras are some Linux-specific words to do timing, shell to the OS, change the working directory, etc. As a bonus one can write C subroutines, add them to the server code and call them from Forth. It is not yet possible to let C call Forth. mxForth can be reduced to about 40 kB (presently it is 700 K); the C server is 18 kB.
META itself is very nice, I think. Most standard Forth operators like @ ! MOVE DUMP SEE , DIS are available in a META wordlist where they operate transparently on the target memory space. To prevent me going crazy, the new words are prefixed by "T", so we have T@ T! TMOVE TDUMP TSEE TDIS TWORDS etc. The only thing you can't do is execute the new code. The meta compiler is multi-pass, so you can create very complicated forward references if need be (mxForth currently needs 4 passes to build). A symbol table is generated at the end of the compile to aid in debugging (I had to learn gdb to debug mxForth, which was a terrible experience).
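To make the idea of T@ and T! concrete, here is a minimal sketch in C of how a host program can read and write cells in a separate target image. It is not taken from META: the buffer name, its size, the function names and the choice of 32-bit little-endian cells (natural for a Pentium target) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical target image: the host keeps the new system's memory in an
   ordinary byte buffer, and target addresses are offsets into it. */
#define TARGET_SIZE 1024u
static uint8_t target_image[TARGET_SIZE];

/* Analogue of T! : store a 32-bit cell at a target address, low byte first
   (little-endian, as on Intel hardware). */
void t_store(uint32_t value, uint32_t taddr)
{
    assert(taddr <= TARGET_SIZE - 4u);      /* the whole cell must fit */
    for (unsigned i = 0; i < 4; i++)
        target_image[taddr + i] = (uint8_t)(value >> (8 * i));
}

/* Analogue of T@ : fetch a 32-bit cell from a target address. */
uint32_t t_fetch(uint32_t taddr)
{
    uint32_t value = 0;
    assert(taddr <= TARGET_SIZE - 4u);
    for (unsigned i = 0; i < 4; i++)
        value |= (uint32_t)target_image[taddr + i] << (8 * i);
    return value;
}
```

Because the host only ever moves bytes in and out of the buffer, it can inspect and patch the image freely, but nothing here can run the code stored in it, which matches the restriction mentioned above.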
As Anton Ertl has shown many times in the past, a subroutine-threaded Intel Forth is very inefficient, and direct-threaded seems to be best. Now, a direct-threaded Forth called eForth is already available. It must be assembled using MASM (which is enough to drive red-blooded Linuxers up the wall). So I decided to do it differently this time. As efficiency was no primary aim, I valued the fact that a sr-Forth is about as simple as one can imagine. But of course, I was curious about the performance anyway. To test that, I ran the four benchmarks in the [xxx] distribution. The results are shown below. You will notice that mxForth is about 2 times faster than [xxx] 0.3.0 straight-out-of-the-box, on an Intel P5-166. OPTIMIZE is set to 4 (the default), which means that CONSTANT etc. is immediate, and that LITERAL and BRANCH etc. generate inline machine code (which is the natural thing to do for a sr-Forth). Note that OPTIMIZE is an mxForth word, not a metacompile option. Maybe I will add some MACRO words to mxForth to see how high one can tune the performance. Wil Baden's Pinhole optimizer cannot easily be added to a subroutine-threaded Forth. This is something I found out too late.
    gcc -O4 mxforth.c -omxforth

Do not touch the mxforth.img file in any way. The META compiler will generate a new mxForth when iForth is invoked as follows:

    i4 in mxf.cmd

(Here I assume you have aliased the commands to start iForth with i4.)
[LF is an editor written in ANS Forth by Leo Wong -mhx]
LF.FRT runs (using function keys, reverse video, bold, and dynamic memory). I also added the Laxen & Perry F83 block editor and the "report" writer from Starting Forth. mxForth does not have floating-point or locals. Dictionary manipulations are possible, but some words are missing [this has been corrected -mhx]. It has FORGET, but no MARKER (FORGET crashes if you nest VOCABULARY's) [MARKER is now in, but the vocabulary problem remains -mhx]. mxForth is case-sensitive [not anymore, it is now an option -mhx].
I had to put in a little extra work to ensure the figures I gave stay true, even with the original benchmarks. The new meta.frt and mxforth.zip files are now available. mxForth has become about 30% faster than my latest figures; I seem to have overdone it. My apologies to those people who have already downloaded the package.
The updated results are as follows:
 siev   bubble  matrix   fib    machine and configuration
--------------------------------------------------------------------------
 8.72    9.17    8.79   10.34   Intel Pentium 166MHz; 256K cache; gcc-2.7.2 --enable-force-reg; [xxx]-0.3.0;
 4.03    5.12    3.71    4.39   Intel Pentium 166MHz; 256K cache; mxForth 2.1 with OPTIMIZE=4

mxForth speedup with regards to [xxx]:

 siev   bubble  matrix   fib
--------------------------------------------
 2.16    1.8     2.37    2.36   times faster
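The speedup figures are simply the [xxx] time divided by the mxForth time for each benchmark. A minimal sketch of that arithmetic in C follows; the function names are made up for this example and are not part of any benchmark suite.

```c
#include <assert.h>
#include <stddef.h>

/* Speedup of a candidate over a baseline: baseline time divided by
   candidate time. Both times must be positive and in the same unit. */
double speedup(double baseline_seconds, double candidate_seconds)
{
    assert(baseline_seconds > 0.0 && candidate_seconds > 0.0);
    return baseline_seconds / candidate_seconds;
}

/* Fill out[i] with the speedup for each of the n benchmarks. */
void speedups(const double *baseline, const double *candidate,
              double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = speedup(baseline[i], candidate[i]);
}
```

Fed with the two rows of timings above (8.72, 9.17, 8.79, 10.34 against 4.03, 5.12, 3.71, 4.39), this gives ratios that round to the values in the speedup table.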
The main change is that I altered the model to have the top of stack in a register. mxForth is still subroutine-threaded. Furthermore I removed the numerous no-ops that 2.1 inserted to ensure alignment. The main drawback of this is that mxForth 2.2 has become very difficult to SEE.
To satisfy my curiosity I also added a switch to remove all optimizations from mxForth: OPTIMIZE OFF. In this case mxForth runs at about 0.75 times [xxx]'s speed (confirming bare subroutine-threading is very slow on Intel).
Below are the results from the standard benchmarks in the [xxx] distribution as run on mxForth 2.2 for Linux. mxForth is subroutine-threaded with TOS in a register. The results for [xxx] are shown as a base-line figure; [xxx] is currently the fastest freely available Linux Forth (it is also quite a lot faster than Win32Forth for NT on the same machine). For all the following timings I used an Intel Pentium 166MHz with 256K cache.
 siev   bubble  matrix   fib    configuration (times in seconds)
--------------------------------------------------------------------------
 8.72    9.17    8.79   10.34   gcc-2.7.2 --enable-force-reg; [xxx]-0.3.0;
15.19   17.05   11.98   16.79   mxForth 2.2 with OPTIMIZE=0
12.15   12.47   10.60   13.60   mxForth 2.2 with OPTIMIZE=4
 5.86    7.66    8.12    5.97   mxForth 2.2 with OPTIMIZE=7
10.30   14.10   16.25   18.04   mxForth 2.2 with OPTIMIZE=8
10.30   12.60    6.37    7.84   mxForth 2.2 with OPTIMIZE=10
 9.34    9.85    4.36    7.84   mxForth 2.2 with OPTIMIZE=11
 7.18    8.69    4.89   10.67   mxForth 2.2 with OPTIMIZE=12
 3.86    8.68    4.91    2.99   mxForth 2.2 with OPTIMIZE=14
 3.26    4.24    2.72    2.99   mxForth 2.2 with OPTIMIZE=15

What are the optimizations?
OPTIMIZE (bit# set)    optimization performed
----------------------------------------------------------
        0              smart VARIABLE and CONSTANT
        1              LITERAL compiles inline instead of subroutine
        2              LOOP, branch and ?branch compile jumps
        3              MACRO's are expanded inline

Individual bits in OPTIMIZE can be set to enable the different optimizers. The results show that simply inlining code (8) is not very effective as a first step, it may make some algorithms slower. The best approach seems to be a smart CONSTANT, VARIABLE and LITERAL plus inlined jumps. Non-smart CONSTANT and VARIABLE are possible (14) but a factor 1.5 lower speed results when a lot of memory referencing is going on.
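Since OPTIMIZE in mxForth 2.2 is a bit mask, a value such as 14 is easiest to read by splitting it into its bits. The following C sketch decodes a value according to the bit table for 2.2; the enum and function names are hypothetical and exist only in this example, not in mxForth.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values following the OPTIMIZE table for mxForth 2.2.
   The names are invented for this sketch. */
enum optimize_bit {
    OPT_SMART_VAR_CONST = 1 << 0,   /* smart VARIABLE and CONSTANT  */
    OPT_INLINE_LITERAL  = 1 << 1,   /* LITERAL compiles inline      */
    OPT_INLINE_JUMPS    = 1 << 2,   /* LOOP, branch, ?branch: jumps */
    OPT_EXPAND_MACROS   = 1 << 3    /* MACROs expanded inline       */
};

/* True when the given optimizer bit is set in an OPTIMIZE value. */
bool optimizer_enabled(unsigned optimize, enum optimize_bit bit)
{
    return (optimize & (unsigned)bit) != 0;
}

/* Write one letter per enabled optimizer, lowest bit first, and '-' for
   each disabled one: V(ariable/constant), L(iteral), J(umps), M(acros).
   The result is a NUL-terminated 4-character string. */
void describe_optimize(unsigned optimize, char out[5])
{
    static const char letters[4] = { 'V', 'L', 'J', 'M' };
    assert(out != 0);
    for (unsigned bit = 0; bit < 4; bit++)
        out[bit] = (optimize & (1u << bit)) ? letters[bit] : '-';
    out[4] = '\0';
}
```

With this encoding, 8 is macro expansion alone, 7 is everything except macro expansion, and 14 is everything except the smart VARIABLE and CONSTANT.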
What are the limits? iForth is still about 2 times faster than mxForth. The RAFTS project promises Forth code that will be at least 1.3 times faster than iForth/bigFORTH (approaching C).
Just like Leo Wong with his fascination for LF/MF/HF, I just can't let go of mxForth.
We're skipping release 2.3 and going to 2.4 immediately. In 2.3 I added more macro words and made sure John Hayes's TESTER.FRT ran without errors. In 2.4 I succeeded in having mxForth run Anton Ertl's POSTPONE.FS. This is quite an achievement for an optimizing Forth (I think). I now have a somewhat generalized approach to defining optimizing macro words that doesn't need state smartness. The macros even succeed in generating optimized code when POSTPONEd or [COMPILE]d. There are *no* state-smart words in mxForth anymore, none. I will let this technique ripen some more before I backmerge it into iForth. I don't see yet how I can re-use the macros in the metacompiler to make sure the code for mxForth itself is optimized, but it must be in there somewhere.
Maybe interesting for some people: I've ported mxForth to Windows NT 4.0. It might even work for Windows '95 as long as SYSTEM is not used (I did not test this!). To be able to testdrive mxForth under Windows I have made a simple C server available (mxserver.exe) for download at http://www.IAEhv.nl/users/mhx. iForth users can use their iforth.exe C server by simply renaming the image file from mxforth.img to iforth.img. The documentation on my homepage has been expanded to reflect the latest changes.
Of course mxForth still has the facility to extend it through the C server. Just write a C function, put its address in the jump table, and mxForth can call it with SYSCALL. (This might be fun for fooling around with Windows DLL's interactively).
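As a rough picture of what "put its address in the jump table" means on the C side, here is a minimal sketch of a function-pointer table with a dispatcher. It is not the actual mxserver source: the function signature, the two example functions, the table layout and the idea that SYSCALL selects an entry by index are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical convention: every server function takes one cell-sized
   argument and returns one cell-sized result. The real calling
   convention of the mxForth server is not described in the text. */
typedef long (*server_fn)(long arg);

/* Two stand-in "user written" C functions. */
static long srv_double(long arg) { return 2 * arg; }
static long srv_negate(long arg) { return -arg; }

/* The jump table: adding a function means adding its address here. */
static const server_fn jump_table[] = { srv_double, srv_negate };
#define JUMP_TABLE_SIZE (sizeof jump_table / sizeof jump_table[0])

/* What the server could do when Forth issues a SYSCALL-style request:
   look up the entry and call through the pointer. */
long dispatch_syscall(size_t index, long arg)
{
    assert(index < JUMP_TABLE_SIZE);    /* reject unknown entries */
    return jump_table[index](arg);
}
```

The appeal of a table like this is that the Forth side never needs to know where a C function lives in memory; it only needs to agree with the server on which slot is which.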
Although I did my best, I seem to have reached the limit of what can be done with the mxForth model (subroutine, TOS in register, datastack indexed by ebp, Forth return == hardware machine stack). In the end iForth seems to be at least 1.25 times faster than mxForth (iForth uses the hardware stack for the Forth data stack). I looked into CF32 by Tom Zimmer - Thom Almy. The programs I could get to run reliably on CF32 are 1.25 to 2 times faster than mxForth/iForth. The extra speed comes from the IN/OUT compiler hint CF32 allows the programmer to give to the compiler.
And now, the results for the standard benchmarks in the [xxx] distribution as run on mxForth 2.4 for NT (The Linux timings are exactly the same).
Note: The results for [xxx] are shown as a base-line figure. This is *not* to imply [xxx] or mxForth are the Alpha and Omega of Forthing and all other Forths are not worthy of your attention. [xxx]'s authors have expressed an interest in building an efficient Forth kernel and that's about all mxForth is good for at the moment. For building eye-popping, jaw-flapping applications under Win32s/Win95/WinNT use Win32Forth, for Linux use ... weell, who wants that stuff anyway :-)
For all the following timings I used an Intel Pentium 166MHz with 256K cache.
 siev   bubble  matrix   fib    configuration (times in seconds)
-------------------------------------------------------------------
 8.72    9.17    8.79   10.34   gcc-2.7.2 +force-reg; [xxx]-0.3.0;
 3.26    4.24    2.72    2.99   mxForth 2.2 with OPTIMIZE=15
 2.61    2.13    2.29    1.75   mxForth 2.4 with OPTIMIZE=6 (max)

There is quite a lot of improvement of 2.4 over 2.2, OTOH 2.4 was very difficult to get right.
-- Program output (NT 4.0) ----------------------------------------

mxForth server 0.69 (console), Jul 30 1997, 16:49:11.
Stuffed mxForth at 0041618A [entry: 0x420000]
Current process priority is 32.
mxForth vsn 2.4
FORTH> cd mxf2/work
c:\dfwforth\examples\mxf2\work ok
FORTH> include benches.frt
Running the Ertl Suite...
Sieving...2607 ms elapsed
redefining list
Bubbling...2129 ms elapsed
Bubbling with flag...2454 ms elapsed
Matrix multiply in progress..2286 ms elapsed
Fibonacci (optimized)...9227465 1690 ms elapsed
Fibonacci (original)...9227465 1746 ms elapsed ok