Cigma is an experimental programming language designed to take advantage of modern C++ within a simplified and redesigned syntax. Cigma programs can be more concise and (once learned) easier to read than equivalent C++ programs, while remaining fully interoperable with existing C++ code and toolchains.
The Developer Guide explains how the Cigma programming language is implemented and is meant for people wishing to contribute to the project. Refer to the User Guide to learn about the language itself.
Compilation
This project uses CMake as its build system. At least version 3.3 is required (however, running the tests requires C++20 support, which is only available from CMake 3.12).
The reference implementation of the Cigma language is written in C++11 and a compatible compiler is required. GCC, Clang and Visual Studio are known to work.
The flex lexer generator is required to build the reference parser.
The reference implementation of the language uses the bison parser generator. Bison 3 or newer is required. When building on Windows, a precompiled version of flex and bison will be downloaded automatically.
The documentation is generated using docra. This tool and AsciiDoctor are required to generate the document you are currently reading.
Once you have all the prerequisites, you can launch the build system.
Linux
1. Create a build directory:
   mkdir -p build/dir && cd "$_"
2. Configure the build system in it:
   cmake /path/to/source/cigma
3. Build!
   make
Windows
Visual Studio, as of version 15.5, is capable of opening and configuring CMake projects directly: just open the source code folder using … then do … once configured.
Targets
The build system provides the following targets:
- bison_check: Reports errors or conflicts in the reference Bison grammar.
- bison_sync: Uses yppsync to update helper code within the reference Bison grammar.
- bison_unsync: Uses yppsync to remove helper code from the reference Bison grammar.
- boot_ast: Library containing the reference implementation of the Cigma grammar.
- boot_cpp: Bootstrap library to translate Cigma ASTs into compilable C++ code.
- boot_dot: Bootstrap library to print Cigma ASTs in Graphviz DOT format.
- boot_yml: Bootstrap library to print Cigma ASTs in YAML format.
- install: Default CMake installation target.
- uninstall: Remove files copied by the install target.
- yppsync: This is an internal development tool used to simplify the implementation of the reference Bison parser.
Bootstrapped implementation
The Cigma programming language is designed to map very closely to modern C++. For this reason, the simplest way to compile Cigma programs is to translate them into C++ code first, and then compile the result using any C++ toolchain.
The first implementation of such a translator is being written in C++11, and will be used to bootstrap a second translator, written fully in Cigma. This section describes the implementation of this first-level translator.
Grammar
One of the explicit design goals of Cigma is to have a regular grammar without ambiguities (unlike C++, which is highly non-regular and notoriously difficult to parse), all while retaining the expressive power of modern C++. After a lot of iteration, the current Cigma grammar manages to express C++20 programs using a much simpler (and, arguably, more readable) LALR(1) grammar. This section describes the current implementation of the grammar, built using the classical combination of Bison and Flex.
Parser
The Bison compiler-compiler is able to generate parsers for different kinds of grammars, the simplest ones being LALR(1). LALR(1) grammars require no backtracking and use only a single lookahead token in order to decide the next grammar rule to follow. These limitations produce fast parsers but impose lots of constraints on the design of the language, which delayed the development of the Cigma language for a long time.
C++ translation
Initialization
Initialization in C++ is unfortunately a very complex topic: there are dozens of possible initialization syntaxes, each with its own issues. Despite long and ongoing modernization efforts, there still isn’t a generic, concise and consistent set of rules for C++ initialization. When mapping (unambiguous) Cigma initialization syntax to (ambiguous) C++ initialization syntax, the translator implements the following guidelines.
Guidelines for direct initialization
Direct initialization uses <(…) syntax and invokes constructors, including constructors on predefined types (bool, int, …).
Zero arguments
1. Should translate to T t = T() if both name and Type are available at the callsite.
2. Should translate to T() if only the Type is available.
3. Should translate to t() if only the name is available.
- (1) does not suffer from most vexing parse, unlike T t()
- (1) is also used by the STL to always enforce direct default initialization regardless of the Type
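The difference between form (1) and the vexing `T t()` form can be sketched in plain C++. This is only an illustration, not translator output: the `Counter` type and all function names are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type, used only for illustration.
struct Counter {
    int value;
    Counter() : value(7) {}
};

// Guideline (1): `T t = T()` works for predefined and class types alike.
int zero_int() {
    int t = int();          // value-initialized: t is 0, not indeterminate
    return t;
}

int default_counter() {
    Counter t = Counter();  // runs the default constructor
    return t.value;
}

// The form the guideline avoids: inside a function, `Counter f();`
// declares a function named f instead of creating an object.
bool vexing_parse_declares_function() {
    Counter f();            // most vexing parse (compilers usually warn here)
    return std::is_function<decltype(f)>::value;
}

// Small self-check collecting the observations above.
void check_zero_argument_forms() {
    assert(zero_int() == 0);
    assert(default_counter() == 7);
    assert(vexing_parse_declares_function());
}
```

Calling `check_zero_argument_forms()` from any `main` exercises all three cases.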
One argument
1. Should translate to T t(arg) if both name and Type are available at the callsite.
2. Should translate to T(arg) if only the Type is available.
3. Should translate to t(arg) if only the name is available (implicit initialization must be allowed).
- (1) does not suffer from most vexing parse, unlike T t(arg)
- Works with predefined types (int a({42}) raises a warning)
- (1) does not accidentally create an std::initializer_list (in particular if T is auto, since C++11 with current compilers)
This form does allow narrowing conversions. Uniform initialization would have prevented narrowing conversions, but at the same time it makes it easy to call the std::initializer_list constructor by accident. Using this form, something like :/std:list[:!int]<(10) will have the expected result (create a list of ten elements) instead of accidentally creating a list containing a single element with value ten. The second meaning can be unambiguously expressed with aggregate initialization, like :/std:list[:!int]<{10}.
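On the C++ side, the two meanings described in the note look roughly as follows. This is a minimal sketch with hypothetical function names, and it assumes the Cigma forms map to `std::list<int>(10)` and `std::list<int>{10}` respectively.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// T(arg): parentheses select the size constructor.
std::size_t size_with_parens() {
    std::list<int> l(10);   // ten value-initialized elements
    return l.size();
}

// T{arg}: braces prefer the std::initializer_list constructor.
std::size_t size_with_braces() {
    std::list<int> l{10};   // a single element with value 10
    return l.size();
}

// Parentheses accept narrowing conversions: the fraction is dropped.
int narrowed() {
    double wide = 3.9;
    int a(wide);            // accepted; a brace initializer would be diagnosed
    return a;
}

void check_one_argument_forms() {
    assert(size_with_parens() == 10);
    assert(size_with_braces() == 1);
    assert(narrowed() == 3);
}
```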
More arguments
1. Should translate to T t(arg1, arg2, …) if both name and Type are available at the callsite.
2. Should translate to T(arg1, arg2, …) if only the Type is available.
3. Should translate to t(arg1, arg2, …) if only the name is available (implicit initialization must be allowed).
- Prevents accidentally calling std::initializer_list constructors (for example in std::vector)
This form was chosen to avoid ambiguities for the same reason as the case above; however, this allows narrowing conversions in constructor calls.
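The `std::vector` case mentioned above, together with the narrowing trade-off, can be illustrated with a short sketch. The `Point` type and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type with a two-argument constructor.
struct Point {
    int x;
    int y;
    Point(int px, int py) : x(px), y(py) {}
};

// T t(arg1, arg2): the (count, value) constructor is selected.
std::size_t vector_size_with_parens() {
    std::vector<int> v(3, 7);   // three elements, each equal to 7
    return v.size();
}

// T t{arg1, arg2}: the std::initializer_list constructor wins instead.
std::size_t vector_size_with_braces() {
    std::vector<int> v{3, 7};   // two elements: 3 and 7
    return v.size();
}

// The price of parentheses: narrowing goes through silently.
int narrowed_coordinate() {
    double wide = 2.75;
    Point p(wide, 1);           // with braces this would be diagnosed as narrowing
    return p.x;
}

void check_multi_argument_forms() {
    assert(vector_size_with_parens() == 3);
    assert(vector_size_with_braces() == 2);
    assert(narrowed_coordinate() == 2);
}
```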
Guidelines for aggregate initialization
Aggregate initialization uses <{…} syntax and initializes individual fields in user-defined aggregates (or calls the aggregate constructor if available).
Zero arguments
1. Should translate to T t{} if both name and Type are available at the callsite.
2. Should translate to T{} if only the Type is available.
3. Should translate to t{} if only the name is available.
- Applies to enums as well as other user-defined types (if T is a template parameter, <() should be preferred as it default initializes both aggregates and non-aggregates)
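A small C++ sketch of the empty-brace forms, including the template case from the note. The `Pixel` and `Color` types and the helper names are hypothetical.

```cpp
#include <cassert>
#include <string>

struct Pixel { int x; int y; };     // hypothetical aggregate
enum class Color { Red, Green };    // hypothetical enum

// T t{}: every member of the aggregate is zero-initialized.
Pixel zero_pixel() {
    Pixel t{};
    return t;
}

// T{}: an enum is initialized to the value 0.
Color zero_color() {
    return Color{};
}

// When T is a template parameter, T() handles aggregates and
// non-aggregates uniformly.
template <typename T>
T make_default() {
    return T();
}

void check_empty_brace_forms() {
    assert(zero_pixel().x == 0 && zero_pixel().y == 0);
    assert(zero_color() == Color::Red);
    assert(make_default<Pixel>().y == 0);
    assert(make_default<std::string>().empty());
    assert(make_default<int>() == 0);
}
```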
One argument
1. Should translate to T t = {arg} if both name and Type are available at the callsite.
2. Should translate to T{arg} if only the Type is available (and it is not auto).
3. Should translate to t{arg} if only the name is available.
- Equals sign in (1) is required when T is auto to consistently obtain an std::initializer_list
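Why the equals sign matters for auto can be checked at compile time. The sketch below assumes a compiler that implements the N3922 deduction rule (current GCC, Clang and MSVC do); function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>

// With the equals sign, auto deduces std::initializer_list<int>.
std::size_t list_size() {
    auto t = {42};
    static_assert(std::is_same<decltype(t), std::initializer_list<int>>::value,
                  "copy-list-initialization deduces an initializer_list");
    return t.size();
}

// Without it, a single braced element deduces the element type itself.
int direct_value() {
    auto t{42};
    static_assert(std::is_same<decltype(t), int>::value,
                  "direct-list-initialization deduces int");
    return t;
}

void check_auto_brace_forms() {
    assert(list_size() == 1);
    assert(direct_value() == 42);
}
```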
More arguments
1. Should translate to T t = {arg1, arg2, …}
2. Should translate to T{arg1, arg2, …} if only the Type is available (and it is not auto)
3. Should translate to t{arg1, arg2, …} if only the name is available
- Equals sign in (1) is required when T is auto to generate an std::initializer_list
- Equals sign not required anymore for templated containers like std::array if using C++11 post CWG 1270 or C++14 and beyond
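The multi-argument brace forms, including the `std::array` case, could look like this in the emitted C++. This is a hedged sketch: `Pixel` and the function names are hypothetical, and the `std::array` line relies on brace elision as resolved by CWG 1270.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>

struct Pixel { int x; int y; };   // hypothetical aggregate

// Form (1): T t = {arg1, arg2}
int second_member() {
    Pixel t = {1, 2};
    return t.y;
}

// Form (2): T{arg1, arg2} creates a temporary
int temporary_sum() {
    return Pixel{3, 4}.x + Pixel{3, 4}.y;
}

// std::array accepts a flat brace list without the equals sign.
int array_sum() {
    std::array<int, 3> a{1, 2, 3};
    return a[0] + a[1] + a[2];
}

// With auto, the equals sign yields an std::initializer_list<int>.
std::size_t auto_list_size() {
    auto t = {1, 2, 3};
    return t.size();
}

void check_multi_brace_forms() {
    assert(second_member() == 2);
    assert(temporary_sum() == 7);
    assert(array_sum() == 6);
    assert(auto_list_size() == 3);
}
```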
Guidelines for copy initialization
Save for the exceptions mentioned above, the translator should never emit copy initializers (using an equals sign) in its C++ output, and always favour uniform initialization instead.
Implementation of initialization guidelines
Depending on the surrounding context, the guidelines just described result in the following behaviour.
Declaration group
An ast::DeclGroup is used wherever multiple names can be declared with the same Type (and possibly the same initializer); typical examples are function arguments, members and local variables. Within this context, the following initializers can be found:
- ast::ExprCtor: Performs explicit direct initialization. Since the Type name is known at the callsite, we follow the above guidelines directly. (TODO)
- ast::ExprInitList: Performs explicit aggregate initialization with positional (not designated) initializers. Here we can also follow the guidelines directly. This is encountered for example when initializing an array.
- ast::ExprsByMember: Performs explicit aggregate initialization like above, but with designated initializers (only available since C++20). (TODO)
- ast::Expr: The Type is being initialized with an equals sign only; according to the guidelines, a single-argument list initializer should be emitted. There is one exception when setting default values for function and template arguments, where braces are not supported in all compilers, and another when the Type is auto, as that generates an std::initializer_list. … = to work around MSVC bug https://developercommunity.visualstudio.com/content/problem/1232251/error-in-brace-initialization-of-pointer-to-functi.html … const int X = {42} syntax so we must omit the brackets there as well (although that’s fine for GCC).
Constructor declaration
An ast::DeclCtor is used to declare constructors. Such a declaration may contain initializers that are used to initialize base classes or class members. Within this context, the following initializers can be found:
- ast::ExprCtor: Should emit a delegating constructor. (TODO)
- ast::ExprsByPos: Initializes class members by position (not by name). Translating this syntax requires iterating over the struct or class members. (TODO)
- ast::ExprsByMember: Initializes class members by name. Since the name is available, the initialization guidelines can be followed directly. (TODO)
- ast::ExprsByBase: Initializes base classes (this is basically a list of ast::ExprInitType). (TODO)
- ast::ExprInitType: Initializes a single base class. This stage is reached via recursion from the previous case. (TODO)
- ast::Expr: This combination is probably impossible in the current grammar and should raise an error. (TODO)
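For orientation, the C++ constructs these initializer kinds are expected to produce are shown in the sketch below. The mapping in the comments is one plausible reading of the cases above, not confirmed translator output; `Base`, `Widget` and the function names are hypothetical.

```cpp
#include <cassert>

struct Base {
    int id;
    explicit Base(int i) : id(i) {}
};

struct Widget : Base {
    int width;
    int height;

    // Base class initializer (compare ast::ExprsByBase / ast::ExprInitType)
    // followed by member initializers by name (compare ast::ExprsByMember).
    Widget(int i, int w, int h) : Base(i), width(w), height(h) {}

    // Delegating constructor (compare ast::ExprCtor): forwards to the
    // three-argument constructor above.
    Widget() : Widget(0, 1, 1) {}
};

void check_constructor_forms() {
    Widget custom(5, 20, 30);
    assert(custom.id == 5 && custom.width == 20 && custom.height == 30);

    Widget fallback;
    assert(fallback.id == 0 && fallback.width == 1 && fallback.height == 1);
}
```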
Type initialization
An ast::ExprInitType is an initializer that explicitly describes the name of the Type being constructed (as an ast::TypePath) in addition to the expression (or expressions) to initialize it with. In this context, the initializer is used in expressions and not as part of a base class initializer.
- ast::ExprCtor: Calls the constructor for the given Type. The result can be used for direct initialization. The guidelines still apply given that the Type name is known. (TODO)
- ast::ExprInitList: Performs explicit list initialization using a braced-init-list according to the guidelines. (TODO)
- ast::ExprsByMember: This could be used to explicitly initialize a temporary using designated initializers. (TODO)
- ast::Expr: Explicitly calls the constructor with a single argument, as per the guidelines. (TODO)
Positional expression group
An ast::ExprsByPos is not an initializer per se, but it may be found by recursion when visiting other kinds of initializers. According to the guidelines, this should always be translated into a braced-init-list.
Member-wise expression group
The group ast::ExprsByMember is of course used to initialize members by name. As such, when emitting C++ initializers for it, it is only necessary to recurse into the appropriate initializer generator for each member.
Anything else
TODO
Self-hosted implementation
Once the bootstrapped translator is able to understand and correctly translate a large enough subset of the Cigma programming language to C++, a self-hosted translator will be implemented in Cigma itself.
Appendix A: Bootstrap AST
ast::Visitor
Base class to create AST visitors.
struct Visitor {...};
ast::Visitor::visit
Methods to reimplement to visit specific kinds of AST nodes.
virtual void visit(Node *, AttrPtr &n) {}
virtual void visit(Node *, BlockPtr &n);
virtual void visit(Node *, DeclPtr &n);
virtual void visit(Node *, ExprPtr &n);
virtual void visit(FilePtr &);
virtual void visit(Node *, FlowPtr &n);
virtual void visit(Node *, IdPtr &n) {}
void visit(Node*, NodePtr &);
This method is not meant to be overridden.
virtual void visit(Node *, PatternPtr &n);
virtual void visit(Node *, TokenPtr &n);
virtual void visit(Node *, TypePtr &n);
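How such a visitor might be used can be sketched with stand-in types. Everything in the `sketch` namespace below is hypothetical: the real `ast::Node`, `DeclPtr` and `ExprPtr` definitions are not shown in this guide, and dispatching through `std::dynamic_pointer_cast` is an assumption made only to keep the example self-contained.

```cpp
#include <cassert>
#include <memory>
#include <vector>

namespace sketch {

struct Node { virtual ~Node() = default; };
using NodePtr = std::shared_ptr<Node>;

struct Expr : Node {};
struct Decl : Node { std::vector<NodePtr> children; };
using ExprPtr = std::shared_ptr<Expr>;
using DeclPtr = std::shared_ptr<Decl>;

struct Visitor {
    virtual ~Visitor() = default;

    // Overridable hooks; the default Decl hook recurses into children.
    virtual void visit(Node *, DeclPtr &n) {
        for (auto &child : n->children) visit(n.get(), child);
    }
    virtual void visit(Node *, ExprPtr &) {}

    // Non-virtual dispatcher, not meant to be overridden.
    void visit(Node *parent, NodePtr &n) {
        if (auto d = std::dynamic_pointer_cast<Decl>(n)) visit(parent, d);
        else if (auto e = std::dynamic_pointer_cast<Expr>(n)) visit(parent, e);
    }
};

// Example visitor: counts expression nodes.
struct ExprCounter : Visitor {
    using Visitor::visit;   // keep the other overloads visible
    int count = 0;
    void visit(Node *, ExprPtr &) override { ++count; }
};

// Builds a tiny tree (a Decl holding an Expr and a nested Decl with
// one more Expr) and counts its expressions.
int count_exprs_in_sample() {
    auto inner = std::make_shared<Decl>();
    inner->children.push_back(std::make_shared<Expr>());
    auto root = std::make_shared<Decl>();
    root->children.push_back(std::make_shared<Expr>());
    root->children.push_back(inner);

    ExprCounter counter;
    NodePtr start = root;
    counter.visit(nullptr, start);
    return counter.count;
}

void check_visitor() { assert(count_exprs_in_sample() == 2); }

} // namespace sketch
```

The `using Visitor::visit;` line matters in this sketch: overriding one overload in a derived class would otherwise hide the remaining ones.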
Appendix C: Grammar rules
constraint
  : "::" concept_suffix template_args [1]
  | "::" "!" concept_suffix template_args
  | "::" "[" expr_paren "]" [2]
  | "::" "!" "[" expr_paren "]"
  | "::" decl_constraint_args_opt "{" constraint_statements list_end "}"
  | "::" "!" decl_constraint_args_opt "{" constraint_statements list_end "}"
  ;
[1] Named constraint specialization
[2] Compile-time expression constraint (must evaluate to true)
decl
  : access decl_template_args_opt constraints_opt decl_group
  | access decl_bind [1]
  | decl_use
  | decl_label
  | access decl_template_args_opt constraints_opt decl_type
  | access "|" type type_enum
  | access decl_template_args_opt constraints_opt decl_nspace
  | access decl_template_args_opt decl_concept
  | access decl_template_args_opt constraints_opt decl_oper block_opt
  | access decl_template_args_opt constraints_opt decl_ctor block_opt
  | access decl_template_args_opt constraints_opt decl_dtor block_opt
  | access decl_template_args_opt constraints_opt decl_guide
  ;
[1] structured binding
decl_template_arg
  : type_id concept_path_opt
  | type_id concept_path_opt "=" type_noauto
  | decl_template_args decl_template_arg_variadic_type
  | ":" NOID concept_path_opt [1]
  | ":" NOID concept_path_opt "=" type_noauto [2]
  | decl_template_arg_variadic_type
  | decl_group
  | decl_names concept_path
  | decl_names concept_path expr_init
  | decl_template_args type_id concept_path_opt
  | decl_template_args type_id concept_path_opt "=" type_noauto
  ;
[1] anonymous type
[2] anonymous default argument
decl_template_args
[1] fully specialized templates have nothing here
decl_type
  : type_path attrs_opt "=" type
  | type_path attrs_opt "=" "|" type type_enum [1]
  | type_path attrs_opt "|" type type_enum [2]
  | type_path attrs_opt type_struct
  | type_path attrs_opt "|" type_struct [3]
  | type_path attrs_opt "?" type_struct [4]
  | type_path attrs_opt
  ;
[1] Unscoped enum
[2] Scoped enum
[3] Union
[4] Typeclass
expr_delete
  : DELETE expr_primary
  | DELETE "." "(" ")" expr_primary
  ;
Access to tagged union member (not yet implemented)
::std::get_if<$expr>(& $expr_postfix)
::std::get_if<$type>(& $expr_postfix)
get<$expr>($expr_postfix)
get<$type>($expr_postfix)
std::any_cast<$type>(& $expr)
std::any_cast<$type>($expr)
dynamic_cast<$type>($expr)
conditional expression
bind subexpression to a local name
std::bind
std::bind_front
/std/placeholders _N
Define lambda and call it in place
Compile-time version of the above
dynamic_cast
static_cast
/std:is_same[:A, :B] value
!/std:is_same[:A, :B] value
Used as don’t care element in /std tie
Used as don’t care element in /std tie
Simplified if
Simplified else
Simplified else if
Simplified switch
Match range
Match range
Match expression
Match via extractor
Dereference previous match
Match if convertible to true
Match variant by type
Match variant by index
Match via any_cast
Match via dynamic_cast
Match tagged union member
Match members by name
Match name
Match anything
Nested switch
Bind to local variable
Structured binding
Capture by-copy by default
Capture by-reference by default
Capture by-copy
Capture by-copy of pack expansion
Capture by-copy with initializer
Capture by-reference
Capture by-reference of pack expansion
Capture by-reference with initializer
Capture by-copy of current object
Capture by-reference of current object
designated initialization
in-place initialization
expr_fold
  : "(" expr oper_fold_right ")" [1]
  | "(" oper_fold_left expr ")" [2]
  | "(" expr oper_fold_right expr ")" [3]
  | "(" expr oper_fold_left expr ")" [4]
  ;
[1] unary right fold
[2] unary left fold
[3] binary right fold
[4] binary left fold
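The four fold kinds correspond to the four C++17 fold-expression forms. The sketch below uses subtraction because it is not associative, so each form gives a different result. The function names and the initial value 100 are arbitrary choices for illustration, and the assumption is that each Cigma alternative maps to the C++ form shown in the comment.

```cpp
#include <cassert>

// [1] unary right fold: (pack op ...)  ->  x1 - (x2 - x3)
template <typename... Ts>
int unary_right(Ts... xs) { return (xs - ...); }

// [2] unary left fold: (... op pack)  ->  (x1 - x2) - x3
template <typename... Ts>
int unary_left(Ts... xs) { return (... - xs); }

// [3] binary right fold: (pack op ... op init)  ->  x1 - (x2 - (x3 - 100))
template <typename... Ts>
int binary_right(Ts... xs) { return (xs - ... - 100); }

// [4] binary left fold: (init op ... op pack)  ->  ((100 - x1) - x2) - x3
template <typename... Ts>
int binary_left(Ts... xs) { return (100 - ... - xs); }

void check_folds() {
    assert(unary_right(1, 2, 3) == 2);
    assert(unary_left(1, 2, 3) == -4);
    assert(binary_right(1, 2, 3) == -98);
    assert(binary_left(1, 2, 3) == 94);
}
```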
expr_primary
  : expr_literal
  | expr_literal ID
  | expr_range
  | expr_fold
  | expr_block
  | expr_if
  | expr_switch
  | expr_new
  | expr_delete
  | expr_call
  | expr_sizeof
  | expr_alignof
  | name_path
  | expr_typeid
  | expr_compiletime
  | member_prefix "." overload_id template_args_opt
  | THIS
  | "<" type "{" exprs "}" [1]
  | type_path "&" member_id [2]
  | expr_paren
  | expr_init_ctor
  | expr_init_list
  ;
[1] std::initializer_list
[2] pointer to member
exprs_byindex
  : exprs_byindex list_sep expr
  | exprs_byindex list_sep "." expr_paren "=" expr
  | exprs_byindex list_sep "." expr_range "=" expr
  | "." expr_paren "=" expr
  | "." expr_range "=" expr
  | "." expr_compiletime "=" expr [1]
  | "." "[" type "]" "=" expr [2]
  ;
[1] /std in_place_index
[2] /std in_place_type
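Callouts [1] and [2] refer to the standard in-place construction tags. As a rough illustration of what the emitted C++17 could look like (the `Value` alias and function names are hypothetical, and the exact translation is an assumption):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>

using Value = std::variant<int, std::string>;

// std::in_place_index<N>: construct alternative N directly from arguments.
std::size_t index_built_by_index() {
    Value v(std::in_place_index<1>, 3, 'x');   // builds std::string(3, 'x')
    return v.index();
}

// std::in_place_type<T>: construct alternative T directly from arguments.
std::string text_built_by_type() {
    Value v(std::in_place_type<std::string>, "abc");
    return std::get<std::string>(v);
}

void check_in_place_forms() {
    assert(index_built_by_index() == 1);
    assert(text_built_by_type() == "abc");
}
```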
overload_id
  : OV_XOR
  | OV_NEG
  | OV_MOD_EQ
  | OV_ADD
  | OV_ADD_EQ
  | OV_SUB
  | OV_SUB_EQ
  | OV_OR
  | OV_OR_EQ
  | OV_AND
  | OV_AND_EQ [1]
  | OV_LOR
  | OV_INC
  | OV_LAND
  | OV_LSH
  | OV_LSH_EQ
  | OV_RSH
  | OV_RSH_EQ
  | OV_LT
  | OV_LE
  | OV_GT
  | OV_GE
  | OV_EQ
  | OV_DEC
  | OV_NE
  | OV_LEG
  | OV_AWAIT
  | OV_USER
  | OV_ASSIGN
  | OV_MUL
  | OV_MUL_EQ
  | OV_DIV
  | OV_DIV_EQ
  | OV_MOD
  ;
[1]
specop_id
  : OV_PTR
  | "(" decl_proc_args_opt ")" type_qualifiers_opt [1]
  | NEW
  | NEW "." "(" ")"
  | DELETE
  | DELETE "." "(" ")"
  | "[" decl_proc_args_opt "]" type_qualifiers_opt [2]
  | "[" "[" decl_proc_args_opt "]" "]" type_qualifiers_opt [3]
  | "(" decl_proc_args_opt ")" type_qualifiers_opt "->" [4]
  | "[" decl_proc_args_opt "]" type_qualifiers_opt "->" [5]
  | "[" "[" decl_proc_args_opt "]" "]" type_qualifiers_opt "->" [6]
  | "(" type ")" [7]
  | "[" type "]" [8]
  | "[" "[" type "]" "]" [9]
  ;
[1] subscript operator
[2] subscript operator (constexpr)
[3] subscript operator (consteval)
[4] call operator
[5] call operator (constexpr)
[6] call operator (consteval)
[7] cast operator
[8] cast operator (constexpr)
[9] cast operator (consteval)
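For readers less familiar with the C++ side, the three operator families named in the callouts look like this when declared constexpr. The `Table` type is hypothetical, and the consteval variants (C++20) are deliberately left out of this sketch.

```cpp
#include <cassert>

// Hypothetical aggregate with constexpr special operators.
struct Table {
    int data[3];

    // Subscript operator.
    constexpr int operator[](int i) const { return data[i]; }

    // Call operator.
    constexpr int operator()(int a, int b) const { return data[a] + data[b]; }

    // Cast (conversion) operator.
    constexpr explicit operator bool() const { return data[0] != 0; }
};

constexpr Table table{{4, 5, 6}};

// Being constexpr, all three are usable in constant expressions.
static_assert(table[1] == 5, "subscript at compile time");
static_assert(table(0, 2) == 10, "call at compile time");
static_assert(static_cast<bool>(table), "cast at compile time");

// They remain ordinary operators at run time.
int runtime_lookup(int i) { return table[i]; }

void check_special_operators() {
    assert(runtime_lookup(2) == 6);
    assert(table(0, 1) == 9);
}
```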
start
  : CIGMA_START cigma CIGMA_END
  | CIGMA_ATTR attr
  | CIGMA_EXPRS exprs
  | CIGMA_IDS nspace_path
  | CIGMA_IDS type_path
  | CIGMA_IDS member_path
  | CIGMA_FLOW flow
  | CIGMA_TYPE type
  | CIGMA_END
  | LEXED_START buffer LEXED_END
  | MACRO_START macro MACRO_END
  | CIGMA_ATTRS attrs
  | CIGMA_BLOCK block
  | CIGMA_CONSTRAINT constraint
  | CIGMA_CONSTRAINTS constraints
  | CIGMA_DECL decl
  | CIGMA_DECLS decls_global list_end
  | CIGMA_VAR decl_var
  | CIGMA_EXPR expr
  ;
type_arg_direction_opt
[1] Input argument
[2] Output argument
[3] In-out argument
[4] Forward argument
type_novariadic
  : ":"
  | ":" type_qualifiers
  | type_noauto
  | ":" "|" type_struct [1]
  ;
[1] in-line union
type_path_body
  : type_ids_body
  | nspace_path type_ids
  | "|" types_args [1]
  | "." types_args [2]
  | "." "(" ":" "?" ")" [3]
  | "[" ".." type "]" [4]
  ;
[1] std::variant
[2] std::tuple
[3] std::any
[4] std::initializer_list
type_qualifiable_body
  : type_path_body
  | type_func_body
  | type_coro_body
  | type_pointer_member_body
  | "[" expr_paren "]" [1]
  ;
[1] decltype
type_qualifier
  : "!"
  | "long"
  | "." expr_args
  | "." "[" expr "]" [1]
  | "~" "~"
  | lifetime_id
  | "?"
  | "signed"
  | "unsigned"
  | "volatile"
  | "~" "volatile"
  | "&"
  | "~" "&"
  | "^"
  | "'"
  ;
[1] std::array