Metacza: Syntax
File |
Input files of Metacza are always encoded as UTF-8, UTF-16 or UTF-32. Both little endian and big endian are supported.
Output files are always UTF-8 encoded. All output streams (in particular, the error stream) are also taken to be UTF-8 encoded.
A Metacza file always starts with #!, possibly preceded by an optional Unicode BOM (U+FEFF) in either UTF-8, UTF-16 or UTF-32 encoding. From this, the input encoding is inferred.
The first line must contain the string metacza. A valid first line would be:
#! this is ignored metacza
Command line options may follow the metacza keyword on the first line:
#! metacza --namespace superlib
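The first-line rules above can be sketched in Python. This is only an illustration, not Metacza's implementation: the names detect_encoding and first_line_options are hypothetical, and the sketch assumes that a file without a BOM is recognised by the byte pattern of its leading #! and that options are separated by whitespace.

```python
import codecs

# BOMs to try; the UTF-32 LE BOM comes before the UTF-16 LE BOM because
# the former starts with the bytes of the latter.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]
ENCODINGS = [name for _, name in BOMS]


def detect_encoding(data: bytes) -> str:
    """Infer the input encoding from an optional BOM and the leading '#!'."""
    for bom, name in BOMS:
        if data.startswith(bom):
            if data[len(bom):].startswith("#!".encode(name)):
                return name
            raise ValueError("BOM is not followed by '#!'")
    # Assumption: without a BOM, the byte pattern of '#!' identifies the encoding.
    for name in ENCODINGS:
        if data.startswith("#!".encode(name)):
            return name
    raise ValueError("file does not start with '#!'")


def first_line_options(data: bytes) -> list:
    """Return the options that follow the 'metacza' keyword on the first line."""
    text = data.decode(detect_encoding(data)).lstrip("\ufeff")
    first_line = text.splitlines()[0]
    # Everything before the keyword is ignored, as in '#! this is ignored metacza'.
    _, keyword, rest = first_line.partition("metacza")
    if not keyword:
        raise ValueError("first line must contain 'metacza'")
    return rest.split()
```

For the two example lines above, first_line_options yields an empty list and the two items --namespace and superlib, respectively.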
Comments are C++ style line comments, i.e., they start with //:
// This is a comment
The body of a file is a sequence of statements.
The interpretation of the file can be ended before the actual end of the file by using a special token:
__END__
Statements |
Prototypes |
f(x,y,z);
g(a,b...);
Defaults may be given:
int2str(n, base = 10, len = 1)
As an abbreviation, classifications may be used instead of identifiers in the argument list. In that case, var is optional.
f(x: raw(int), y: _(_), const z, const w: _(_));
This is equivalent to:
let
    var x: raw(int)
    var y: _(_)
    const z
    const w: _(_)
in
    f(x,y,z,w)
If the kind is a pack, i.e., it ends in ..., then the variable is expanded using ..., too. For example:
data string(ch: raw(char)...)
This is equivalent to:
let var ch: raw(char)... in data string(ch...)
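The abbreviation rule for argument lists can be illustrated with a small Python rewriter. This is a sketch under stated assumptions, not how Metacza itself works: desugar and split_top_level are hypothetical names, default values are not handled, and the result is printed on a single line.

```python
def split_top_level(text: str) -> list:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def desugar(head: str, args: str) -> str:
    """Move embedded classifications of an argument list into a let block."""
    decls, names = [], []
    for arg in split_top_level(args):
        left, colon, kind = (part.strip() for part in arg.partition(":"))
        words = left.split()
        if len(words) == 2 and words[0] in ("var", "const"):
            qualifier, ident = words
        elif colon:
            qualifier, ident = "var", left  # 'var' is optional before a kind
        else:
            names.append(left)              # plain identifier, nothing to move
            continue
        decls.append(f"{qualifier} {ident}" + (f": {kind}" if colon else ""))
        # A pack kind (ending in ...) makes the variable expand with ... too.
        names.append(ident + ("..." if kind.endswith("...") else ""))
    call = f"{head}({','.join(names)})"
    return f"let {' '.join(decls)} in {call}" if decls else call
```

Calling desugar("data string", "ch: raw(char)...") returns the let form shown above for the string example.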
Function Definition |
Functions may be defined using multiple clauses that may be distributed even among different files.
fib(n) = fib(n-1) + fib(n-2);
fib(0) = 0;
fib(1) = 1;
The first use of a function is taken to be prototypical; the following uses are pattern matches. The prototypical usage can be declared outside the current file by using a const declaration.
Each function definition clause may be optionally preceded by a let block. This can be used for local classifications of the parameters, but definitions are also allowed and will be local to the function.
let const a: _; in f(a) = 5;
Like with prototypes, function heads may contain embedded classifications.
f(const red) = green
g(car(x: _(_))) = true
This is equivalent to:
let const red in f(red) = green
let var x: _(_) in g(car(x)) = true
Data Definition |
Data definitions look like prototypes with data prefixed.
data red;
data complex(_,_);
Inheritance can be specified using a colon:
data foo;
data bar : foo;
A let block is possible for local definitions (e.g. to add a tag):
let tag = 5 in data myint(_);
Preprocessor Statements |
The following preprocessor statements are supported:
#if ...
#ifdef ...
#ifndef ...
#else
#elif ...
#endif
#define ...
#undef ...
#include ...
#warning ...
#error ...
#line ...
#pragma ...
Evaluate Once Statement |
The evaluate once statement is only allowed at top level. All statements following it until the end of the file are protected against repeated evaluation.
pragma once;
Namespace Statements |
Metacza supports the following namespace related statements with the same syntax as C++:
namespace myScope { STMT... }
namespace newName = oldScope::oldName;
using namespace myScope;
Further, Metacza supports some extended syntax:
namespace someScope::myScope { STMT... }
namespace someScope::newName = oldScope::oldName;
Assertions |
Assertions look similar to C, but take an additional argument for the failure message string.
assert(EXPR, "Some Failure Message");
As an extension, the message may be left out. In that case, a generic assertion failure message is generated in the C++ output.
assert(EXPR);
Classification |
A classification starts with either var or const followed by an identifier. An optional colon and kind specification may follow. It is terminated with a semicolon.
var a;
const b;
var c: _;
const d: _(_);
Raw C++ Code |
Raw C++ code that will be copied to the output file unmodified can be put between %{ and %}. Metacza keeps track of string and character constants and line comments in the raw C++ code so that a closing %} may be used there without closing the Metacza statement. Note that C style comments are not supported even in raw C++ code and will cause an error.
%{ static int const value = 10; %}
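How the closing %} might be located while skipping strings, character constants and line comments can be sketched as follows. This is an assumption-laden illustration, not Metacza's scanner: find_raw_block_end is a hypothetical name, and C++ raw string literals and digit separators are not handled.

```python
def find_raw_block_end(text: str, start: int) -> int:
    """Return the index of the '%}' closing a raw block whose body begins at start."""
    i, n = start, len(text)
    while i < n:
        pair = text[i:i + 2]
        if pair == "%}":
            return i
        if pair == "/*":
            # C style comments are rejected, as described above.
            raise ValueError("C style comments are not supported in raw C++ code")
        if pair == "//":
            # Line comment: skip to the start of the next line.
            newline = text.find("\n", i)
            i = n if newline < 0 else newline + 1
        elif text[i] in "\"'":
            # String or character constant: skip to the matching quote,
            # stepping over backslash escapes.
            quote = text[i]
            i += 1
            while i < n and text[i] != quote:
                i += 2 if text[i] == "\\" else 1
            i += 1
        else:
            i += 1
    raise ValueError("raw block is not closed by '%}'")
```

A %} inside a string constant or after // is skipped, so only the final %} of a block ends the scan.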
Expressions |
Expressions generally follow C++ syntax, but a few things are a little different. This mainly concerns handling of operator precedence (I personally hate this in C/C++/Perl), the if-else construction, and let blocks.
Operators are translated to invocations of meta template functions just like any funcall. This section also lists the template names that are used when compiling in Boost.MPL mode. In those cases where this differs from native Metacza mode, the differences are listed.
In raw mode (i.e., for expressions marked with a raw() functor), all operators are translated to plain C/C++ syntax.
Constant Literals |
Metacza's basic literals are booleans, integers, and strings.
Integer literals are always unsigned: a preceding minus is parsed as a prefix operator and not part of the literal.
Integer literals may be given in decimal, octal, hexadecimal and binary notation and may contain underbar characters for improving readability (just like in Perl).
Strings come in three flavours, depending on their desired encoding in the output file: UTF-8, UTF-16, and UTF-32. They follow C++11 syntax. Strings without prefix are UTF-8 encoded just like strings with u8 prefix.
// Expression        // Boost MPL                      // native (if different)
///////////////////////////////////////////////////////////////////////////////
true                 // true_
false                // false_
10                   // int_<10>                       PInt<10>
50                   // int_<50>                       ...
2_542                // int_<2542>
0x100                // int_<256>
0b1010_1011_1111     // int_<2751>
0777                 // int_<511>
"hello"              // string<'hell','o'>             string<'h','e','l','l','o'>
u8"hello"            // string<'hell','o'>             string<'h','e','l','l','o'>
u"ab"                // vector<char16_t,u'a',u'b'>     string16<u'a',u'b'>
U"ab"                // vector<char32_t,U'a',U'b'>     string32<U'a',U'b'>
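The integer notations in the table can be checked with a short Python sketch. The function name parse_int_literal is hypothetical, and the assumption that both lower-case and upper-case prefixes are accepted is not taken from the text above.

```python
def parse_int_literal(token: str) -> int:
    """Convert an unsigned integer literal to its value."""
    digits = token.replace("_", "")        # underbars only aid readability
    if digits.startswith(("0x", "0X")):    # hexadecimal
        return int(digits[2:], 16)
    if digits.startswith(("0b", "0B")):    # binary
        return int(digits[2:], 2)
    if len(digits) > 1 and digits[0] == "0":
        return int(digits[1:], 8)          # a leading zero means octal
    return int(digits, 10)
```

A leading minus is deliberately not accepted here, since it is a prefix operator and not part of the literal.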
Identifiers |
Metacza identifiers have the same syntax as in C++. Identifiers must not contain two or more consecutive underbars.
foo
bar10
x123_456
The special identifier _ is always a new anonymous identifier.
_ = print(5);
Funcalls |
EXPR ( EXPR, ..., EXPR )
Funcall Like Operators |
print(EXPR)
raw(EXPR)
Unary Prefix Operator |
// Expression   // Translated As
////////////////////////////////////////////
+ EXPR          // identity
- EXPR          // negate
* EXPR          // treated specially: unlambda
! EXPR          // not_
~ EXPR          // bitxor_<~0,...>
Binary Infix Operators |
// Expression     // Translated As
////////////////////////////////////////////////////////
EXPR + EXPR       // plus
EXPR - EXPR       // minus
EXPR * EXPR       // times
EXPR / EXPR       // divides
EXPR % EXPR       // modulus
EXPR << EXPR      // shift_left
EXPR >> EXPR      // shift_right
EXPR == EXPR      // equal_to
EXPR != EXPR      // not_equal_to
EXPR < EXPR       // less
EXPR > EXPR       // greater
EXPR <= EXPR      // less_equal
EXPR >= EXPR      // greater_equal
EXPR & EXPR       // bitand_
EXPR | EXPR       // bitor_
EXPR ^ EXPR       // xor_
EXPR && EXPR      // and_
EXPR || EXPR      // or_
Unary Suffix Operators |
EXPR...
As an abbreviation, the following expressions are equivalent:
_...
...
Lambda Expressions |
{ EXPR }
{ (x,y) = EXPR }                   // with a parameter list
{ (x...) = EXPR }                  // and many others.
{ let STMT... in (x,y,z) = EXPR }  // with local definitions and params
Special Expressions |
( EXPR )
let STMT... in EXPR
EXPR if EXPR else EXPR   // translated as eval_if (in raw mode: ?:)
Operator Precedence |
In general, any sequence of operators in Metacza needs to be disambiguated by parentheses. This means that operator precedence usually is not used or needed, because Metacza forces you to use parentheses anyway.
In a few cases, precedence is exploited and parentheses are not needed. This section lists these cases.
Sequences of summation operators + and - may be used without parentheses. This includes both prefix and infix operators:
-1 + 5 + -8 - +9
Sequences of commutative, associative operators may be used without parentheses.
1 * 2 * 3 * 4 * 5
1 | 2 | 4 | 8 | 16
1 ^ 2 ^ 3 ^ 4 ^ 5
1 & 3 & 7 & 15
(foo == 10) && !bar && true
(foo == 10) || !bar || false
Funcalls in the same expression as prefix operators may be used without parentheses. The funcall takes precedence:
+f(5) // same as +(f(5)) (NOT: (+f)(5))
Sequences of if expressions may be used without parentheses.
a if x else b if y else c
The condition in an if expression needs no parentheses.
a if n == 1 else b
Further, syntax not part of expressions (but looking like an operator) may be used together with any operator without parentheses.
f = 5 + x; // no need for parens here: = is not an expression operator
Apart from that, all uses of multiple operators must use parentheses.
a + b * c                   // ERROR
a + (b * c)                 // GOOD
(a + b) * c                 // GOOD
a && b || c                 // ERROR
a && (b || c)               // GOOD
(a && b) || c               // GOOD
let a = b in a + b          // ERROR
let a = b in (a + b)        // GOOD
(let a = b in a) + b        // GOOD
a + b if c != 10 else d     // ERROR
a + (b if c != 10 else d)   // GOOD
(a + b) if c != 10 else d   // GOOD
-a * b                      // ERROR
-(a * b)                    // GOOD
(-a) * b                    // GOOD
+a...                       // ERROR
+(a...)                     // GOOD
(+a)...                     // GOOD syntax, but error otherwise
+a if c else -a             // ERROR
(+a) if c else -a           // GOOD
+(a if c else -a)           // GOOD
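The rule for chains of infix operators can be summarised in a few lines of Python. This is a simplified model under stated assumptions: needs_parentheses is a hypothetical name, only the infix operators at one nesting level are considered, and prefix operators, if expressions and let blocks are left out.

```python
SUMMATION = {"+", "-"}                       # may be mixed freely
CHAINABLE = {"*", "|", "^", "&", "&&", "||"}  # may only repeat themselves


def needs_parentheses(operators: list) -> bool:
    """Tell whether a flat chain of infix operators must be parenthesised."""
    distinct = set(operators)
    if len(operators) <= 1:
        return False   # a single operator is never ambiguous
    if distinct <= SUMMATION:
        return False   # e.g. 1 + 5 - 8
    if len(distinct) == 1 and distinct <= CHAINABLE:
        return False   # e.g. 1 * 2 * 3
    return True        # everything else, e.g. a + b * c
```

For example, needs_parentheses(["+", "*"]) is True, matching the a + b * c case above, while needs_parentheses(["*", "*"]) is False.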
Kinds |
_          // some value (will become 'typename' in C++)
_()        // nullary function
_(_)       // unary function
_(_...)    // function w/ arbitrarily many arguments
_(_,_)     // binary function
           // other functions work accordingly
raw(int)   // C++ type 'int'
           // other C++ types work accordingly, but currently,
           // only (qualified) identifiers are allowed in raw().
As an abbreviation, the following kinds are equivalent:
_...
...
Layout Rules |
Metacza uses a mild set of layout rules so that semicolons separating statements are usually not needed.
For statements, you may drop the semicolon and start a new statement without the separator if the token starting the new statement is
- the first token on its line, ignoring whitespace and comments, and
- located at or to the left of the current reference column.
A reference column is defined by a previous statement: if a statement is the first statement on that line, its first column is the new reference column.
stmt1(x)
|
|____ reference column
With several statements, only the first one on a line defines the reference column:
stmt1(x) ; stmt2(x) // only the first stmt on a line defines the reference column
|
|___ reference column
By these rules, the following few lines are well-formed; the semicolon usually ending a statement can be dropped when the next statement starts at the reference column.
stmt1(x) // no semicolon needed because
stmt2(x) // this starts on the same column
stmt3(x); stmt4(x) // no-one prevents you from using semicolons
stmt5(x); stmt6(x) // after stmt4, none is needed: stmt5 is further left
|
|___ reference column
This rule has a consequence: expression parsing stops at any token where a semicolon may be dropped. Suffix and infix operators that are the first token on a line (ignoring whitespace and comments) must therefore be strictly to the right of the reference column; otherwise they are not recognised as part of the expression.
f(x) = x
+ 5       // ERROR: + is not further right than reference column
f(x) = x
  + 5     // OK: strictly right of reference column
|
|___ reference column
Note that if some structure must be parsed as the next token, then it will be parsed regardless of the layout rule:
f(x) = x +
5         // OK, since after +, there must be an expression
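The basic layout rule can be modelled in Python as follows. This is a deliberately small sketch, not Metacza's parser: split_statements is a hypothetical name, indentation is assumed to use spaces, and explicit semicolons, nested blocks, // inside strings, and the forced continuation just described are all ignored.

```python
def split_statements(source: str) -> list:
    """Group source lines into statements using the reference column."""
    statements = []
    reference_column = None
    for line in source.splitlines():
        code = line.split("//", 1)[0].rstrip()   # drop line comments
        if not code.strip():
            continue                              # blank or comment-only line
        column = len(code) - len(code.lstrip())
        if reference_column is None or column <= reference_column:
            # At or left of the reference column: a new statement starts,
            # and its first column becomes the new reference column.
            statements.append(code.strip())
            reference_column = column
        else:
            # Strictly to the right: the line continues the current statement.
            statements[-1] += " " + code.strip()
    return statements
```

With this model, a + 5 line in column 0 after f(x) = x becomes a separate (and therefore invalid) statement, while an indented + 5 is joined to the definition.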
The reference column is set recursively, i.e., inner blocks set the reference column only locally. When the inner block closes, the reference column before the block is active again.
f(x) =
|
|___ outer reference column
    let y = 5
        |
        |___ inner reference column
    in
|
|___ outer reference column
    (x + y)