AssemblyScript Introduction (Part 1)

in coinexchain •  4 years ago 

AssemblyScript is not a brand-new programming language. Its syntax is a strict subset of the syntax of the popular TypeScript language, tailored specifically for WebAssembly (hereinafter referred to as Wasm). The following diagram shows the syntactic relationship among JavaScript, TypeScript, and AssemblyScript.

This article aims to discuss how AssemblyScript programs are compiled into Wasm modules. To understand the basic syntax and usage of AssemblyScript language, you can refer to the AssemblyScript tutorial. In our previous Wasm articles, we have discussed the Wasm binary format and instruction set in detail. We already know that the Wasm binary module organizes content by section, and there are currently 12 different types of sections. Let’s briefly review the content of these sections:

  • The custom section (ID is 0) stores auxiliary information such as function names. Such information does not affect the execution semantics of Wasm, and there won’t be any problem even if it is completely discarded.

  • The type section (ID is 1) stores function types (also called function signatures) and block types.

  • The import section (ID is 2) stores the import information. There are four kinds of items that can be imported: functions, tables, memories, and global variables.

  • The function section (ID is 3) stores the function signature information defined internally. This is an index table, which stores the index of the signature of the internally defined function in the type section.

  • The table section (ID is 4) stores internally defined table information. Restricted by the Wasm specification, the module can import or define only one table.

  • The memory section (ID is 5) stores internally defined memory information. Restricted by the Wasm specification, the module can import or define only one block of memory.

  • The global section (ID is 6) stores internally defined global variable information.

  • The export section (ID is 7) stores export information. Like the import section, there are four kinds of items that can be exported: functions, tables, memories, and global variables.

  • The start section (ID is 8) stores the start function index.

  • The element section (ID is 9) stores the initialization data of the table.

  • The code section (ID is 10) stores the local variable information and bytecode of the internally defined functions.

  • The data section (ID is 11) stores initialization data of the memory.
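
The section layout reviewed above can be observed directly in a binary module. The following is a minimal TypeScript sketch, not part of the AssemblyScript toolchain, that walks the sections of a module held in a Uint8Array. The helper names readU32 and listSections and the hand-assembled tiny module are illustrative assumptions; the sketch relies only on the standard layout of a 4-byte magic number, a 4-byte version, and then a sequence of (ID, LEB128 size, content) records.

```typescript
// Names of the 12 section kinds, indexed by section ID.
const SECTION_NAMES: string[] = [
  "custom", "type", "import", "function", "table", "memory",
  "global", "export", "start", "element", "code", "data",
];

// Decode an unsigned LEB128 integer starting at `offset`.
function readU32(bytes: Uint8Array, offset: number): { value: number; next: number } {
  let value = 0;
  let shift = 0;
  let pos = offset;
  while (true) {
    if (pos >= bytes.length) throw new Error("unexpected end of input");
    const byte = bytes[pos++] as number;
    value += (byte & 0x7f) * Math.pow(2, shift);
    shift += 7;
    if ((byte & 0x80) === 0) return { value, next: pos };
  }
}

// List the sections of a binary module as (id, name, size) records.
function listSections(bytes: Uint8Array): { id: number; name: string; size: number }[] {
  const magic = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
  if (bytes.length < 8 || magic.some((b, i) => bytes[i] !== b)) {
    throw new Error("not a Wasm binary");
  }
  const sections: { id: number; name: string; size: number }[] = [];
  let pos = 8; // skip the 4-byte magic number and the 4-byte version
  while (pos < bytes.length) {
    const id = bytes[pos] as number;
    const { value: size, next } = readU32(bytes, pos + 1);
    sections.push({ id, name: SECTION_NAMES[id] ?? "unknown", size });
    pos = next + size; // jump over the section content
  }
  return sections;
}

// A hand-assembled module that contains a single function type, () -> ().
const tiny = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version 1
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section, 4 bytes of content
]);
console.log(listSections(tiny)); // one record: id 1, name "type", size 4
```

In practice the bytes would come from a compiled .wasm file rather than a literal array, and wasm-objdump reports the same information in far more detail.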

Let's take a look at how AssemblyScript programs are compiled into Wasm modules, or to be more specific, how the key information needed to run the program is stored in the various sections. We will use the wasm2wat and wasm-objdump command line tools provided by WABT to observe the binary modules generated by the AssemblyScript compiler.

Type Section

All function signatures in the program will be collected by the compiler and put into the type section of the binary module. Here is an example:

declare function f1(x: i32): i32;
declare function f2(x: f32, y: f32): f32;
declare function f3(x: f32, y: f32): f32;

export function f4(a: i32, b: i32): i32 {
  return f1(b) + f1(b);
}
export function f5(a: f32, b: f32, c: f32): f32 {
  return f2(a, b) + f3(b, c);
}

In the above example, three external functions are declared and two internal functions are defined. Note that we need to mark the internal functions as exported (or turn off compiler optimization); otherwise they may be optimized out by the compiler. Compile the above program using the AssemblyScript compiler, and then use the wasm2wat command to convert the generated binary module to text format. The result is shown below:

(module
(type (;0;) (func (param f32 f32) (result f32)))
(type (;1;) (func (param i32) (result i32)))
(type (;2;) (func (param i32 i32) (result i32)))
(type (;3;) (func (param f32 f32 f32) (result f32)))
(import "index" "f1" (func (;0;) (type 1)))
(import "index" "f2" (func (;1;) (type 0)))
(import "index" "f3" (func (;2;) (type 0)))
(func (;3;) (type 2) (; unrelated code ;) )
(func (;4;) (type 3) (; unrelated code ;) )
;; unrelated code
)

Since the signatures of f2() and f3() are the same, there are a total of four function signatures. As you can see, the compiler puts the shared signature of f2() and f3() first, followed by the signatures of f1(), f4(), and f5().
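
The deduplication of signatures can be sketched in a few lines of TypeScript. This is only an illustration of the idea, not the compiler's actual algorithm: the names Decl and collectTypes are hypothetical, and the sketch assigns indices in first-seen order, whereas the real compiler output above places the shared f2()/f3() signature at index 0.

```typescript
// A declared or defined function, reduced to its name and signature.
type Decl = { name: string; params: string[]; result: string };

// Intern signatures so that identical ones share a single type index.
function collectTypes(decls: Decl[]): { types: string[]; typeIndex: { [name: string]: number } } {
  const types: string[] = [];
  const typeIndex: { [name: string]: number } = {};
  for (const d of decls) {
    const key = `(${d.params.join(", ")}) -> ${d.result}`;
    let idx = types.indexOf(key);
    if (idx < 0) {
      idx = types.length; // first time this signature is seen
      types.push(key);
    }
    typeIndex[d.name] = idx;
  }
  return { types, typeIndex };
}

const collected = collectTypes([
  { name: "f1", params: ["i32"], result: "i32" },
  { name: "f2", params: ["f32", "f32"], result: "f32" },
  { name: "f3", params: ["f32", "f32"], result: "f32" },
  { name: "f4", params: ["i32", "i32"], result: "i32" },
  { name: "f5", params: ["f32", "f32", "f32"], result: "f32" },
]);
console.log(collected.types.length); // 4: f2 and f3 share one entry
```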

Import & Export Section

As mentioned earlier, the module can import or export four kinds of items: functions, tables, memories, and global variables. As the above example shows, if compiler optimization is not taken into consideration, functions in AssemblyScript are compiled into Wasm functions. Global variables have a similar correspondence, which we will see shortly. The following example shows how to declare external functions and global variables:

declare function add(a: i32, b: i32): i32;

@external("sub2")
declare function sub(a: i32, b: i32): i32;

@external("math", "mul2")
declare function mul(a: i32, b: i32): i32;

@external("math", "pi")
declare const pi: f32;

export function main(): void {
  add(1, 2);
  sub(1, 2);
  mul(1, pi as i32);
}

By default, the AssemblyScript compiler takes the file name of the compiled program as the external module name and the function or global variable name as the member name. You can also use the @external annotation to explicitly specify the member name alone, or to specify both the external module name and the member name. Table and memory are special, and we will discuss them in detail later. The AssemblyScript compiler provides the --importTable and --importMemory options. If these two options are specified during compilation, table and memory import items (env.table and env.memory) will be generated in the import section of the module. Save the above example as index.ts and then compile it with these two options. Below is the compilation result (which has been converted to text format):

(module
(type (;0;) (func (param i32 i32) (result i32)))
(type (;1;) (func))
(import "index" "add" (func (;0;) (type 0)))
(import "index" "sub2" (func (;1;) (type 0)))
(import "math" "mul2" (func (;2;) (type 0)))
(import "math" "pi" (global (;0;) f32))
(import "env" "memory" (memory (;0;) 0))
(import "env" "table" (table (;0;) 1 funcref))
(func (;3;) (type 1) (; unrelated code ;) )
;; unrelated code
)
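
The naming rule described above can be captured in a small TypeScript sketch. The function importName and the External type are hypothetical helpers written for illustration; they only model the rule as stated in this article (file name as the default module name, declaration name as the default member name, with @external overriding one or both) and are not taken from the compiler's source.

```typescript
// Arguments of an @external annotation: either the member name alone,
// or the module name followed by the member name.
type External = [string] | [string, string];

// Resolve the (module, member) pair under which a declared item is imported.
function importName(fileName: string, declName: string, external?: External): [string, string] {
  const defaultModule = fileName.replace(/\.ts$/, "");
  if (external === undefined) return [defaultModule, declName];
  if (external.length === 1) return [defaultModule, external[0]];
  return [external[0], external[1]];
}

console.log(importName("index.ts", "add"));                  // ["index", "add"]
console.log(importName("index.ts", "sub", ["sub2"]));         // ["index", "sub2"]
console.log(importName("index.ts", "mul", ["math", "mul2"])); // ["math", "mul2"]
```

The three results correspond to the first three import entries in the module above.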

The functions and global variables marked as exported in the AssemblyScript language will be put by the compiler in the export section of the module. The memory is exported by default, but this can be turned off with the --noExportMemory option; the table is not exported by default, but this can be turned on with the --exportTable option. Let's look at another example:

export const pi: f32 = 3.14;

export function add(a: i32, b: i32): i32 { return a + b; }
export function sub(a: i32, b: i32): i32 { return a - b; }
export function mul(a: i32, b: i32): i32 { return a * b; }

Compile the above example with the --exportTable option, and here is the compiled module (which has been converted to text format):

(module
(type (;0;) (func (param i32 i32) (result i32)))
(func (;0;) (type 0) (; unrelated code ;) )
(func (;1;) (type 0) (; unrelated code ;) )
(func (;2;) (type 0) (; unrelated code ;) )
(table (;0;) 1 funcref)
(memory (;0;) 0)
(global (;0;) f32 (f32.const 0x1.91eb86p+1 (;=3.14;)))
(export "memory" (memory 0))
(export "table" (table 0))
(export "pi" (global 0))
(export "add" (func 0))
(export "sub" (func 1))
(export "mul" (func 2))
(elem (;0;) (i32.const 1) func)
)
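
The global initializer in this module is written as the hexadecimal float 0x1.91eb86p+1 because 3.14 cannot be represented exactly as an f32. As a quick check in plain TypeScript (outside AssemblyScript, where every number is a 64-bit float), Math.fround gives the nearest 32-bit float, and the hexadecimal literal can be evaluated by hand. The variable names below are illustrative only.

```typescript
// Nearest 32-bit float to 3.14, as stored in the f32 global.
const stored = Math.fround(3.14);

// 0x1.91eb86p+1 evaluated by hand: (1 + 0x91eb86 / 16^6) * 2^1.
const fromHex = (1 + 0x91eb86 / Math.pow(2, 24)) * 2;

console.log(stored === fromHex); // true
console.log(stored === 3.14);    // false: the f32 value is only an approximation
```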

Function & Code Section

As mentioned above, the function information defined in the module is divided into two sections: the signature information of the function is in the type section, and the local variable information and bytecode of the function are in the code section. If optimization is turned off completely, then there should be a direct correspondence between the functions defined in the AssemblyScript language and the functions in the Wasm module. That is to say, each function defined in the language will produce an entry in the function section and code section of the module. Let's look at an example:

function add(a: i32, b: i32): i32 { return a + b; }
function sub(a: i32, b: i32): i32 { return a - b; }
function mul(a: i32, b: i32): i32 { return a * b; }
export function main(): void {
  add(1, 2);
  sub(1, 2);
  mul(1, 2);
}

When compiler optimization is turned on, this correspondence may be broken for non-exported internal functions. For easier observation, we can specify the -O0 option to turn off optimization during compilation. The following is the compiled module (which has been converted to text format):

(module
(type (;0;) (func (param i32 i32) (result i32)))
(type (;1;) (func))
(func (;0;) (type 0) (; unrelated code ;) )
(func (;1;) (type 0) (; unrelated code ;) )
(func (;2;) (type 0) (; unrelated code ;) )
(func (;3;) (type 1) (; unrelated code ;) )
(table (;0;) 1 funcref)
(memory (;0;) 0)
(export "memory" (memory 0))
(export "main" (func 3))
(elem (;0;) (i32.const 1) func)
)

It is more intuitive to observe the function section and code section with the wasm-objdump command. The following is the output (irrelevant content is omitted):

...
Section Details:

Type[2]:
- type[0] (i32, i32) -> i32
- type[1] () -> nil
Function[4]:
- func[0] sig=0
- func[1] sig=0
- func[2] sig=0
- func[3] sig=1 <main>
Table[1]: ...
Memory[1]: ...
Export[2]: ...
Elem[1]: ...
Code[4]:
- func[0] size=7
- func[1] size=7
- func[2] size=7
- func[3] size=23 <main>
Custom: ...

Table & Element Section

Tables in Wasm are mainly used to implement function pointers in C/C++ and other languages. AssemblyScript, like JavaScript and TypeScript, supports first-class functions, and this is also achieved through the Wasm table. Here is an example:

type op = (a: i32, b: i32) => i32;

function add(a: i32, b: i32): i32 { return a + b; }
function sub(a: i32, b: i32): i32 { return a - b; }
function mul(a: i32, b: i32): i32 { return a * b; }

export function calc(a: i32, b: i32, op: (x: i32, y: i32) => i32): i32 {
  return op(a, b);
}
export function main(a: i32, b: i32): void {
  calc(a, b, add);
  calc(a, b, sub);
  calc(a, b, mul);
}

The following is the compiled module (which has been converted to text format). Please pay attention to the table section and element section. We will talk more about Wasm table in subsequent articles.

(module
(type (;0;) (func (param i32 i32) (result i32)))
(type (;1;) (func (param i32 i32)))
(type (;2;) (func (param i32 i32 i32) (result i32)))
(func (;0;) (type 2) (param i32 i32 i32) (result i32)
  (call_indirect (type 0)
    (local.get 0) (local.get 1)
    (block (result i32) ;; label = @1
      (global.set 0 (i32.const 2))
      (local.get 2)
    )
  )
)
(func (;1;) (type 0) (; unrelated code ;) )
(func (;2;) (type 0) (; unrelated code ;) )
(func (;3;) (type 0) (; unrelated code ;) )
(func (;4;) (type 1) (param i32 i32)
  (drop (call 0 (local.get 0) (local.get 1) (i32.const 1)))
  (drop (call 0 (local.get 0) (local.get 1) (i32.const 2)))
  (drop (call 0 (local.get 0) (local.get 1) (i32.const 3)))
)
(table (;0;) 4 funcref)
(memory (;0;) 0)
(global (;0;) (mut i32) (i32.const 0))
(export "memory" (memory 0))
(export "calc" (func 0))
(export "main" (func 4))
(elem (;0;) (i32.const 1) func 1 2 3)
)
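
To make the role of the table more concrete, here is a rough TypeScript model of what the module above does. It is a sketch under simplifying assumptions, not a description of the Wasm runtime: the table is modeled as a plain array, a trap is modeled as a thrown Error, and the signature check that call_indirect performs is omitted.

```typescript
type BinOp = (a: number, b: number) => number;

// The three internal functions, with i32 wrap-around semantics.
const add: BinOp = (a, b) => (a + b) | 0;
const sub: BinOp = (a, b) => (a - b) | 0;
const mul: BinOp = (a, b) => Math.imul(a, b);

// Mirrors `(table 4 funcref)` initialized by `(elem (i32.const 1) func 1 2 3)`:
// slot 0 stays empty, slots 1 to 3 hold add, sub, and mul.
const table: (BinOp | null)[] = [null, add, sub, mul];

// Mirrors `call_indirect`: the function value is just an index into the table.
function calc(a: number, b: number, opIndex: number): number {
  const op = table[opIndex];
  if (!op) throw new Error(`trap: no function at table index ${opIndex}`);
  return op(a, b);
}

console.log(calc(6, 3, 1)); // 9  (add)
console.log(calc(6, 3, 2)); // 3  (sub)
console.log(calc(6, 3, 3)); // 18 (mul)
```

This matches the compiled main() above, which passes the constants 1, 2, and 3 to calc() instead of passing the functions themselves.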

Memory & Data Section

We will discuss AssemblyScript memory management in detail in subsequent articles. Let's first look at a simple example:

declare function printChar(c: i32): void;

export function main(): void {
  const str = "Hello, World!\n";
  for (let i = 0; i < str.length; i++) {
    printChar(str.charCodeAt(i));
  }
}

AssemblyScript strings use UTF-16 encoding, and string literals are placed in the data section. The following is the compiled module (with compiler optimization on). Please pay attention to the memory section and data section:

(module
(type (;0;) (func))
(type (;1;) (func (param i32)))
(import "index" "printChar" (func (;0;) (type 1)))
(func (;1;) (type 0) (; unrelated code ;) )
(memory (;0;) 1)
(export "memory" (memory 0))
(export "main" (func 1))
(data (;0;) (i32.const 1024) "\1c\00\00\00\01\00\00\00\01\00\00\00\1c\00\00\00H\00e\00l\00l\00o\00,\00 \00W\00o\00r\00l\00d\00!\00\0a")
)
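
The character payload in the data segment can be reproduced with a short TypeScript sketch. The helper utf16le is a hypothetical name; it encodes each UTF-16 code unit as two bytes in little-endian order, which is why every ASCII character in the segment is followed by \00. The 16 header bytes that precede the characters are produced by the AssemblyScript runtime and are not modeled here.

```typescript
// Encode a string as UTF-16 little-endian bytes, two bytes per code unit.
function utf16le(s: string): Uint8Array {
  const out = new Uint8Array(s.length * 2);
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    out[i * 2] = unit & 0xff;   // low byte first
    out[i * 2 + 1] = unit >> 8; // then the high byte
  }
  return out;
}

const payload = utf16le("Hello, World!\n");
console.log(payload.length);              // 28
console.log(payload.length.toString(16)); // "1c"
```

The 14 characters occupy 28 bytes, which agrees with the value 0x1c that appears in the header bytes of the segment.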

The AssemblyScript compiler also provides two options, --initialMemory and --maximumMemory, which let us explicitly control the initial and maximum number of memory pages. We will not go into details here.

Global Section

As can be seen from the above, the AssemblyScript language uses Wasm global variables to implement global variables in the language. When the compiler optimization is turned off completely, each global variable defined in the AssemblyScript language will occupy an item in the global section of the generated module. Let's look at an example:

var g1: i32 = 100;
export var g2: i32 = 200;
export var g3: i64 = 300;
export const pi: f32 = 3.14;

export function main(): i32 {
  return g1;
}

The following is the compiled module (with compiler optimization off). As you can see, all four global variables appear in the global section:

(module
(type (;0;) (func (result i32)))
(func (;0;) (type 0) (result i32) (global.get 0))
(table (;0;) 1 funcref)
(memory (;0;) 0)
(global (;0;) (mut i32) (i32.const 100))
(global (;1;) (mut i32) (i32.const 200))
(global (;2;) (mut i64) (i64.const 300))
(global (;3;) f32 (f32.const 0x1.91eb86p+1 (;=3.14;)))
(export "memory" (memory 0))
(export "g2" (global 1))
(export "g3" (global 2))
(export "pi" (global 3))
(export "main" (func 0))
(elem (;0;) (i32.const 1) func)
)

Start Section

The start section specifies the index of a start function. The specified function is invoked automatically when the module is instantiated, so that it can perform additional initialization. Here is an example:

declare function max(a: i32, b: i32): i32;
declare function printI32(n: i32): void;

var x = max(123, 456);

export function main(): void {
  printI32(x);
}

In this example, two external functions are declared, and a global variable x and a function main() are defined. Because the initial value of x is not a constant, the AssemblyScript compiler needs to put the initialization logic of x into a function and record the index of a start function that calls it in the start section. Please see the compiled module below (the index of the start function is 4):

(module
(type (;0;) (func))
(type (;1;) (func (param i32)))
(type (;2;) (func (param i32 i32) (result i32)))
(import "index" "max" (func (;0;) (type 2)))
(import "index" "printI32" (func (;1;) (type 1)))
(func (;2;) (type 0)
  (global.set 0 (call 0 (i32.const 123) (i32.const 456)))
)
(func (;3;) (type 0) (call 1 (global.get 0)))
(func (;4;) (type 0) (call 2))
(table (;0;) 1 funcref)
(memory (;0;) 0)
(global (;0;) (mut i32) (i32.const 0))
(export "memory" (memory 0))
(export "main" (func 3))
(start 4)
(elem (;0;) (i32.const 1) func)
)
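
The effect of the start section can be modeled with a small TypeScript simulation. This is only a sketch of the control flow in the module above, not real instantiation code: the names Imports and instantiate are hypothetical, and the imported functions are supplied as ordinary TypeScript functions.

```typescript
// The two functions that the module imports from its host.
type Imports = {
  max(a: number, b: number): number;
  printI32(n: number): void;
};

// Models instantiation: the start function runs before any export is used.
function instantiate(imports: Imports): { main(): void } {
  let x = 0;                                               // global 0, initially 0
  const initGlobals = () => { x = imports.max(123, 456); }; // func 2
  const main = () => { imports.printI32(x); };              // func 3
  const start = () => { initGlobals(); };                   // func 4, named by (start 4)
  start();                                                  // invoked automatically
  return { main };
}

const printed: number[] = [];
const instance = instantiate({
  max: (a, b) => Math.max(a, b),
  printI32: (n) => { printed.push(n); },
});
instance.main();
console.log(printed); // [456]
```

Without the call to start(), main() would print 0, the placeholder value that the global section assigns to x.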

Custom Section

As mentioned above, the custom section mainly stores additional information, such as function names and other debugging information. The Wasm specification defines only one standard custom section, "name", which is specifically designed to store name information. By default, the AssemblyScript compiler does not generate a "name" custom section, but it can be enabled with the --debug option. Let's recompile the above example with the --debug option and observe the generated binary module with the wasm-objdump command (some irrelevant content is omitted):

...
Section Details:

Type[3]:
- type[0] () -> nil
- type[1] (i32) -> nil
- type[2] (i32, i32) -> i32
Import[2]:
- func[0] sig=2 <assembly/index/max> <- index.max
- func[1] sig=1 <assembly/index/printI32> <- index.printI32
Function[3]:
- func[2] sig=0 <start:assembly/index>
- func[3] sig=0 <assembly/index/main>
- func[4] sig=0 <~start>
Table[1]: ...
Memory[1]: ...
Global[1]:
- global[0] i32 mutable=1 - init i32=0
Export[2]: ...
Start:
- start function: 4
Elem[1]: ...
Code[3]:
- func[2] size=12 <start:assembly/index>
- func[3] size=6 <assembly/index/main>
- func[4] size=4 <~start>
Custom:
- name: "name"
- func[0] <assembly/index/max>
- func[1] <assembly/index/printI32>
- func[2] <start:assembly/index>
- func[3] <assembly/index/main>
- func[4] <~start>
Custom:
- name: "sourceMappingURL"

Summary

In this article, we have discussed how AssemblyScript programs are compiled into Wasm modules, focusing on what information is stored in the various sections of the Wasm module. Next time we will discuss how the AssemblyScript language uses the Wasm instruction set to implement various grammatical elements.
