Bytecode Compiler & VM

username-removed-240561 requested to merge pull/5/bytecode into master

Created by: acook

This re-implements blacklight in terms of bytecode and a VM. It also brings a large number of changes, big and small, to behaviour and syntax. It should greatly increase the execution speed and significantly decrease the memory usage of all blacklight programs.

Internals

Roughly speaking, here are the internal file structure changes:

  • lexer -> compiler
  • lexer -> literal_matchers (matchers only)
  • evaluator -> vm
  • operations -> bytecodes
  • datatypes -> (individual file per type)
  • datatypes -> item_interfaces (interfaces only)
  • datatypes -> item_functions (currently equality only)
  • stack -> meta (MetaStack only)
  • stack -> object_stack (ObjectStack only)

New internal concepts introduced are:

  • VMstate is a struct that keeps tabs on everything. It is roughly equivalent to a function frame in many languages, since there can be several at once and new ones are instantiated for block calls and threads. Each VMstate is named after its origin, for example main for the primary thread, block for block calls, and work for threads spawned by the work op.
  • Meta is derived from the MetaStack, but is built specifically for Stacks, has all unnecessary functionality removed and a few things added to better serve its use case.
  • Likewise the ObjectStack is built specifically for Objects and has a bare minimum of functionality.
  • number_packing is a mechanism based on encoding/binary but extended to 32 and 64 bit integers, and currently is just a handful of functions.
  • BC Analyzer - there is just a basic stub of this, but it will allow advanced metaprogramming with Blocks, essentially allowing a programmer to disassemble and reassemble code on the fly in individual or large chunks. Some work on the symbol table and other features will help this along.
  • sequence is an internal interface which allows any type with sequence-like functionality to respond to basic sequencing operations. (This replaces the lower-case vector interface, which was mostly unspecified and ambiguous.) V and T are members and B is an honorary member (it will become a full-fledged member later).
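
As a rough illustration of how VMstate and number_packing could fit together, here is a minimal sketch in Go. Everything in it is an assumption made for illustration: the opcode names, the vmState fields, the 8-byte big-endian operand layout, and the helper names are hypothetical and are not taken from the blacklight source. It only shows the general idea of a named frame that walks bytecode and unpacks fixed-width integers with encoding/binary.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical opcodes; the real blacklight bytecode set is not shown here.
const (
	opPushInt byte = iota // followed by an 8-byte big-endian integer operand
	opAdd                 // pops two integers, pushes their sum
)

// packInt64 encodes n as 8 big-endian bytes (assumed layout).
func packInt64(n int64) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, uint64(n))
	return buf
}

// unpackInt64 decodes the first 8 bytes of buf back into an int64.
func unpackInt64(buf []byte) int64 {
	return int64(binary.BigEndian.Uint64(buf))
}

// vmState is a stand-in for VMstate: a labelled frame with its own
// bytecode, program counter, and stack.
type vmState struct {
	label string // e.g. "main", "block", or "work"
	code  []byte
	pc    int
	stack []int64
}

func newVMState(label string, code []byte) *vmState {
	return &vmState{label: label, code: code}
}

// run executes the bytecode until the program counter passes the end.
func (vm *vmState) run() {
	for vm.pc < len(vm.code) {
		op := vm.code[vm.pc]
		vm.pc++
		switch op {
		case opPushInt:
			vm.stack = append(vm.stack, unpackInt64(vm.code[vm.pc:vm.pc+8]))
			vm.pc += 8
		case opAdd:
			n := len(vm.stack)
			sum := vm.stack[n-2] + vm.stack[n-1]
			vm.stack = append(vm.stack[:n-2], sum)
		}
	}
}

// sampleProgram assembles bytecode that pushes -5 and 47, then adds them.
func sampleProgram() []byte {
	code := []byte{opPushInt}
	code = append(code, packInt64(-5)...)
	code = append(code, opPushInt)
	code = append(code, packInt64(47)...)
	return append(code, opAdd)
}

func main() {
	vm := newVMState("main", sampleProgram())
	vm.run()
	fmt.Println(vm.label, vm.stack) // main [42]
}
```

A block call or a work thread would, under the same assumptions, simply create another vmState with a different label and its own stack.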

All Types

In the original interpreter implementation all types were structs; now they're mapped to lower-level implementations, which will save memory overhead. The intermediary representation is gone as well, replaced by the bytecode. The only types that are still structs are Stack, Queue, Object, and Tag.
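
To make "mapped to lower-level implementations" concrete, here is a small sketch of what such a mapping can look like in Go. The type and method names (item, number, text, Len) are hypothetical and not taken from the blacklight source, and it assumes N is backed by a 64-bit integer and T by a Go string. The point is only that a defined type over a builtin can still carry methods and satisfy interfaces without a wrapping struct.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// item is a hypothetical stand-in for the interface all blacklight items share.
type item interface {
	String() string
}

// Assumed shapes: defined types directly over Go builtins, no struct wrapper.
type number int64
type text string

func (n number) String() string { return fmt.Sprintf("%d", int64(n)) }
func (t text) String() string   { return "'" + string(t) + "'" }

// Len counts code points rather than bytes, the kind of method a
// sequence-like interface might require (an assumption).
func (t text) Len() int { return utf8.RuneCountInString(string(t)) }

func main() {
	items := []item{number(-3), text("héllo")}
	for _, it := range items {
		fmt.Println(it.String())
	}
	fmt.Println(text("héllo").Len()) // 5
}
```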

The item-creation syntax has been standardized for types with such ops:

  • newq is now q-new
  • news and <> are now s-new

You may still use the empty literal versions for types with literals:

  • () for V
  • [] for B

Any -to-cv ops are now -to-t to match the CV -> T renaming.

C type

The C (char) type literal syntax has been changed and expanded. It now maps directly onto Go's rune type and is entirely UTF-8. It will likely be renamed from C to R soon so people don't confuse it with C-style chars.

Before:

\32  ;; defines a C in terms of a decimal number

After:

\'   ;; any single UTF-8 codepoint, including multibyte characters
\a32 ;; an ascii decimal
\u27 ;; a utf-8 hex code
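
One way a matcher could turn these three forms into a Go rune is sketched below. This is not the compiler's actual code: parseCharLiteral is a hypothetical name, and the disambiguation rule (a single code point after the backslash is taken literally, otherwise an a or u prefix selects decimal or hex) is an assumption inferred from the examples above.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// parseCharLiteral converts a C literal token such as \', \a32 or \u27
// into a rune. Hypothetical helper; the rules here are assumptions.
func parseCharLiteral(tok string) (rune, error) {
	if len(tok) < 2 || tok[0] != '\\' {
		return 0, fmt.Errorf("not a C literal: %q", tok)
	}
	body := tok[1:]

	// Form 1: exactly one code point after the backslash, taken literally.
	if utf8.RuneCountInString(body) == 1 {
		r, size := utf8.DecodeRuneInString(body)
		if r == utf8.RuneError && size == 1 {
			return 0, fmt.Errorf("invalid UTF-8 in %q", tok)
		}
		return r, nil
	}

	switch body[0] {
	case 'a': // Form 2: ASCII code in decimal
		n, err := strconv.ParseUint(body[1:], 10, 8)
		if err != nil || n > 127 {
			return 0, fmt.Errorf("invalid ASCII decimal in %q", tok)
		}
		return rune(n), nil
	case 'u': // Form 3: code point in hex
		n, err := strconv.ParseUint(body[1:], 16, 32)
		if err != nil || !utf8.ValidRune(rune(n)) {
			return 0, fmt.Errorf("invalid hex code point in %q", tok)
		}
		return rune(n), nil
	}
	return 0, fmt.Errorf("unrecognized C literal: %q", tok)
}

func main() {
	for _, tok := range []string{`\'`, `\a32`, `\u27`, `\é`} {
		r, err := parseCharLiteral(tok)
		fmt.Printf("%s -> %q %v\n", tok, r, err)
	}
}
```

Under these rules \a and \u on their own still denote the letters a and u, since a single code point is always taken literally.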

T type

The T (Text) type was previously known as CV (CharVector), but distinguishing it from untyped V became a necessity. This means all -to-cv ops are now -to-t, which is also shorter and clearer. The change leads to some minor behavioural differences, but all normal V ops will still work just fine on T. It also allows T constants to be efficiently declared in the bytecode syntax and makes a whole class of T-specific ops easier to implement.

T literals now allow escaping of single-quote characters.

B type

The B (Block) type was previously known as WV (WordVector), but thinking of it in terms of just Words made it difficult to encode Blocks and call them like functions. In the future it will be possible to dump a B into a V of Ws, along with other nifty metaprogramming features.

N type

The compiler can now understand negative N (Number) literals.

Known Issues

There are no known issues that weren't already present in the interpreter version.
