Cigma is an experimental programming language designed to take advantage of modern C++ within a simplified and redesigned syntax. Cigma programs can be more concise and (once learned) easier to read than equivalent C++ programs, while remaining fully interoperable with existing C++ code and toolchains.

The Developer Guide explains how the Cigma programming language is implemented and is meant for people wishing to contribute to the project. Refer to the User Guide to learn about the language itself.

Compilation

This project uses CMake as its build system. At least version 3.3 is required; running the tests, however, requires C++20 support, which CMake only provides from version 3.12.

The reference implementation of the Cigma language is written in C++11 and a compatible compiler is required. GCC, Clang and Visual Studio are known to work.

The reference parser is built with the flex lexer generator and the bison parser generator, so both are required. Bison 3 or newer is required. When building on Windows, a precompiled version of flex and bison is downloaded automatically.

The documentation is generated using docra. This tool and AsciiDoctor are required to generate the document you are currently reading.

Once you have all the prerequisites, you can launch the build system.

Linux

  • Create a build directory

    mkdir -p build/dir && cd "$_"
  • Configure the build system in it

    cmake /path/to/source/cigma
  • Build!

    make

Windows

Visual Studio, as of version 15.5, is capable of opening and configuring CMake projects directly. Open the source code folder using File > Open > Folder, then select CMake > Build All once the project is configured.

Targets

The build system provides the following targets:

bison_check

Reports errors or conflicts in the reference Bison grammar.

bison_sync

Uses yppsync to update helper code within the reference Bison grammar.

bison_unsync

Uses yppsync to remove helper code from the reference Bison grammar.

boot_ast

Library containing the reference implementation of the Cigma grammar.

boot_cpp

Bootstrap library to translate Cigma ASTs into compilable C++ code.

boot_dot

Bootstrap library to print Cigma ASTs in Graphviz DOT format.

boot_yml

Bootstrap library to print Cigma ASTs in YAML format.

install

Default CMake installation target.

uninstall

Remove files copied by the install target.

yppsync

This is an internal development tool used to simplify the implementation of the reference Bison parser.

Bootstrapped implementation

The Cigma programming language is designed to map very closely to modern C++. For this reason, the simplest way to compile Cigma programs is to translate them into C++ code first, and then compile the result using any C++ toolchain.

The first implementation of such a translator is being written in C++11, and will be used to bootstrap a second translator, written fully in Cigma. This section describes the implementation of this first-level translator.

Grammar

One of the explicit design goals of Cigma is to have a regular grammar without ambiguities (unlike C++, which is highly non-regular and notoriously difficult to parse), while retaining the expressive power of modern C++. After many iterations, the current Cigma grammar manages to express C++20 programs using a much simpler (and, arguably, more readable) LALR(1) grammar. This section describes the current implementation of the grammar, built using the classical combination of Bison and Flex.

Parser

The Bison compiler-compiler is able to generate parsers for different kinds of grammars, the simplest being LALR(1). LALR(1) grammars require no backtracking and use only a single lookahead token to decide the next grammar rule to follow. These limitations produce fast parsers but impose many constraints on the design of the language, which delayed the development of Cigma for a long time.

C++ translation

Initialization

Initialization in C++ is unfortunately a very complex topic: there are tens of possible initialization syntaxes, each with its own issues. Despite long and ongoing modernization efforts, there still isn't a generic, concise and consistent set of rules for C++ initialization. When mapping (unambiguous) Cigma initialization syntax to (ambiguous) C++ initialization syntax, the translator implements the following guidelines.

Guidelines for direct initialization

Direct initialization uses <(…) syntax and invokes constructors, including constructors on predefined types (bool, int, …).

Zero arguments
  1. Should translate to T t = T() if both name and Type are available at the callsite.

  2. Should translate to T() if only the Type is available.

  3. Should translate to t() if only the name is available.

Why
  • (1) does not suffer from most vexing parse, unlike T t()

  • (1) is also the form used by the STL to always enforce value initialization (which zeroes predefined types) regardless of the Type
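The two points above can be checked with a small C++ sketch. The type `Widget` and all function names are hypothetical and exist only for illustration; they are not part of the translator.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type used only for illustration.
struct Widget {
    int value;
};

// Guideline (1): "T t = T()" value-initializes t, so a predefined type
// starts out as zero.
inline int zero_arg_int() {
    int n = int();
    return n;
}

// The same form zeroes the members of a simple struct.
inline int zero_arg_widget() {
    Widget w = Widget();
    return w.value;
}

// The most vexing parse: at block scope "Widget w();" declares a function
// named w that returns a Widget; no object is created.
inline bool vexing_parse_declares_function() {
    Widget w();  // function declaration, not a variable
    return std::is_function<decltype(w)>::value;
}

// Small self-check that can be called from any main().
inline void check_zero_arguments() {
    assert(zero_arg_int() == 0);
    assert(zero_arg_widget() == 0);
    assert(vexing_parse_declares_function());
}
```

Most compilers emit a warning for the `Widget w();` line, which is exactly the pitfall that the `T t = T()` form avoids.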

One argument
  1. Should translate to T t(arg) if both name and Type are available at the callsite.

  2. Should translate to T(arg) if only the Type is available.

  3. Should translate to t(arg) if only the name is available (implicit initialization must be allowed).

Why
  • (1) does not suffer from most vexing parse, unlike T t(arg)

  • Works with predefined types (int a({42}) raises a warning)

  • (1) does not accidentally create a std::initializer_list (in particular if T is auto, since C++11 with current compilers)

This form does allow narrowing conversions. Uniform initialization would have prevented them, but it also makes it easy to call the std::initializer_list constructor by accident. Using this form, something like :/std:list[:!int]<(10) has the expected result (a list of ten elements) instead of accidentally creating a list containing a single element with value ten. The second meaning can be expressed unambiguously with aggregate initialization, as in :/std:list[:!int]<{10}.
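On the C++ side, the difference between the two spellings can be observed directly. The following is a minimal sketch with hypothetical function names; it assumes that <(10) ends up as parentheses and <{10} as braces in the emitted code, as the guidelines describe.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Parentheses call the size constructor: ten value-initialized elements.
inline std::size_t size_with_parentheses() {
    std::list<int> l(10);
    return l.size();
}

// Braces select the std::initializer_list constructor: a single element
// with value 10.
inline std::size_t size_with_braces() {
    std::list<int> l{10};
    return l.size();
}

// Parentheses allow narrowing: the double is truncated to 3.
// "int n{d};" would be rejected as a narrowing conversion.
inline int narrowing_with_parentheses() {
    double d = 3.9;
    int n(d);
    return n;
}

// Small self-check that can be called from any main().
inline void check_one_argument() {
    assert(size_with_parentheses() == 10);
    assert(size_with_braces() == 1);
    assert(narrowing_with_parentheses() == 3);
}
```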
More arguments
  1. Should translate to T t(arg1, arg2, …​) if both name and Type are available at the callsite.

  2. Should translate to T(arg1, arg2, …​) if only the Type is available.

  3. Should translate to t(arg1, arg2, …​) if only the name is available (implicit initialization must be allowed).

Why
  • Prevents accidentally calling std::initializer_list constructors (for example in std::vector)

This form was chosen to avoid ambiguities, for the same reason as in the case above. However, it allows narrowing conversions in constructor calls.
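The std::vector case mentioned above is the classic illustration. This sketch (function names are hypothetical) shows how the same two arguments select different constructors depending on the bracket style.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parentheses select the (count, value) constructor: three elements, all 7.
inline std::vector<int> with_parentheses() {
    return std::vector<int>(3, 7);
}

// Braces select the std::initializer_list constructor: the elements 3 and 7.
inline std::vector<int> with_braces() {
    return std::vector<int>{3, 7};
}

// Small self-check that can be called from any main().
inline void check_more_arguments() {
    assert(with_parentheses().size() == 3);
    assert(with_parentheses()[0] == 7);
    assert(with_braces().size() == 2);
    assert(with_braces()[0] == 3);
}
```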
Guidelines for aggregate initialization

Aggregate initialization uses <{…​} syntax and initializes individual fields in user-defined aggregates (or calls the aggregate constructor if available).

Zero arguments
  1. Should translate to T t{} if both name and Type are available at the callsite.

  2. Should translate to T{} if only the Type is available.

  3. Should translate to t{} if only the name is available.

Why
  • Applies to enums as well as other user-defined types (if T is a template parameter, <() should be preferred, as it default-initializes both aggregates and non-aggregates)
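As a minimal sketch of the emitted form, the following uses a hypothetical aggregate `Point` and a hypothetical enum `Color` to show what `T t{}` produces in each case.

```cpp
#include <cassert>

// Hypothetical aggregate and enum used only for illustration.
struct Point {
    int x;
    int y;
};

enum class Color { Red, Green, Blue };

// "T t{}" on an aggregate value-initializes every field.
inline Point empty_point() {
    Point p{};
    return p;
}

// "T t{}" on an enum yields the value zero, here Color::Red.
inline Color empty_color() {
    Color c{};
    return c;
}

// Small self-check that can be called from any main().
inline void check_empty_braces() {
    assert(empty_point().x == 0);
    assert(empty_point().y == 0);
    assert(empty_color() == Color::Red);
}
```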

One argument
  1. Should translate to T t = {arg} if both name and Type are available at the callsite.

  2. Should translate to T{arg} if only the Type is available (and it is not auto).

  3. Should translate to t{arg} if only the name is available.

Why
  • The equals sign in (1) is required when T is auto to consistently obtain a std::initializer_list
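The effect of the equals sign on auto deduction can be verified with the sketch below. It assumes a compiler that applies the revised deduction rule for `auto x{value}` (current GCC, Clang and MSVC do); the function names are hypothetical.

```cpp
#include <cassert>
#include <initializer_list>
#include <type_traits>

// With the equals sign, auto deduces std::initializer_list<int>.
inline bool equals_sign_gives_initializer_list() {
    auto a = {42};
    return std::is_same<decltype(a), std::initializer_list<int>>::value
        && a.size() == 1;
}

// Without the equals sign, current compilers deduce a plain int.
inline bool no_equals_sign_gives_int() {
    auto b{42};
    return std::is_same<decltype(b), int>::value && b == 42;
}

// Small self-check that can be called from any main().
inline void check_auto_deduction() {
    assert(equals_sign_gives_initializer_list());
    assert(no_equals_sign_gives_int());
}
```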

More arguments
  1. Should translate to T t = {arg1, arg2, …​}

  2. Should translate to T{arg1, arg2, …​} if only the Type is available (and it is not auto)

  3. Should translate to t{arg1, arg2, …​} if only the name is available

Why
  • The equals sign in (1) is required when T is auto to generate a std::initializer_list

  • The equals sign is no longer required for templated containers like std::array when using C++11 post CWG 1270, or C++14 and beyond
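Both points can be illustrated with std::array. This is a sketch with hypothetical function names, assuming a compiler that implements CWG 1270 (brace elision without the equals sign).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Guideline (1) with auto: "= {...}" produces a std::initializer_list.
inline std::size_t auto_list_size() {
    auto values = {1, 2, 3};
    return values.size();
}

// Guideline (1) with a named Type: aggregate initialization of std::array.
inline int array_sum_with_equals() {
    std::array<int, 3> a = {1, 2, 3};
    return a[0] + a[1] + a[2];
}

// Guideline (2): no equals sign, accepted since CWG 1270.
inline int array_sum_without_equals() {
    const std::array<int, 3> a = std::array<int, 3>{1, 2, 3};
    return a[0] + a[1] + a[2];
}

// Small self-check that can be called from any main().
inline void check_brace_lists() {
    assert(auto_list_size() == 3);
    assert(array_sum_with_equals() == 6);
    assert(array_sum_without_equals() == 6);
}
```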

Guidelines for copy initialization

Save for the exceptions mentioned above, the translator should never emit copy initializers (using an equals sign) in its C++ output, and always favour uniform initialization instead.

Implementation of initialization guidelines

Depending on the surrounding context, the guidelines just described result in the following behaviour.

Declaration group

An ast::DeclGroup is used wherever multiple names can be declared with the same Type (and possibly the same initializer). Typical examples are function arguments, members and local variables. Within this context, the following initializers can be found:

ast::ExprCtor

Performs explicit direct initialization. Since the Type name is known at the callsite, we follow the above guidelines directly. (TODO)

ast::ExprInitList

Performs explicit aggregate initialization with positional (not designated) initializers. Here we can also follow the guidelines directly. This is encountered for example when initializing an array.

ast::ExprsByMember

Performs explicit aggregate initialization like above, but with designated initializers (only available since C++20). (TODO)

ast::Expr

The Type is being initialized with an equals sign only. According to the guidelines, a single-argument list initializer should be emitted. There is one exception when setting default values for function and template arguments, where braces are not supported in all compilers, and another when the Type is auto, as that generates a std::initializer_list. A further exception works around an MSVC bug (https://developercommunity.visualstudio.com/content/problem/1232251/error-in-brace-initialization-of-pointer-to-functi.html) involving the const int X = {42} syntax, so the braces must be omitted there as well (although that is fine for GCC).

Constructor declaration

An ast::DeclCtor is used to declare constructors. Such a declaration may contain initializers that are used to initialize base classes or class members. Within this context, the following initializers can be found:

ast::ExprCtor

Should emit a delegating constructor. (TODO)

ast::ExprsByPos

Initializes class members by position (not by name). Translating this syntax requires iterating over the struct or class members. (TODO)

ast::ExprsByMember

Initializes class members by name. Since the name is available, the initialization guidelines can be followed directly. (TODO)

ast::ExprsByBase

Initializes base classes (this is basically a list of ast::ExprInitType). (TODO)

ast::ExprInitType

Initializes a single base class. This stage is reached via recursion from the previous case. (TODO)

ast::Expr

This combination is probably impossible in the current grammar and should raise an error. (TODO)
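For orientation, the C++ constructs that these initializers presumably map to are delegating constructors, member initializers and base class initializers. The sketch below shows them side by side; `Base`, `Derived` and `area` are hypothetical names, and the code is hand-written rather than actual translator output (most of these cases are still marked TODO).

```cpp
#include <cassert>

// Hypothetical classes; not output of the actual translator.
struct Base {
    int id;
    explicit Base(int id_) : id(id_) {}
};

struct Derived : Base {
    int width;
    int height;

    // Base class initializer followed by member initializers by name.
    Derived(int id_, int w, int h) : Base(id_), width(w), height(h) {}

    // Delegating constructor (C++11): forwards to the constructor above.
    explicit Derived(int side) : Derived(0, side, side) {}
};

inline int area(const Derived &d) {
    return d.width * d.height;
}

// Small self-check that can be called from any main().
inline void check_constructor_initializers() {
    Derived square(4);
    assert(square.id == 0);
    assert(area(square) == 16);
}
```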

Type initialization

An ast::ExprInitType is an initializer that explicitly describes the name of the Type being constructed (as an ast::TypePath) in addition to the expression (or expressions) to initialize it with. In this context, the initializer is used in expressions and not as part of a base class initializer.

ast::ExprCtor

Calls the constructor for the given Type. The result can be used for direct initialization. The guidelines still apply given that the Type name is known. (TODO)

ast::ExprInitList

Performs explicit list initialization using a braced-init-list according to the guidelines. (TODO)

ast::ExprsByMember

This could be used to explicitly initialize a temporary using designated initializers. (TODO)

ast::Expr

Explicitly calls the constructor with a single argument, as per the guidelines. (TODO)

Positional expression group

An ast::ExprsByPos is not an initializer per se, but it may be found by recursion when visiting other kinds of initializers. According to the guidelines, this should always be translated into a braced-init-list.
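The shape of that output can be sketched with a small helper. `emit_braced_init_list` is a hypothetical function that works on already translated expression strings; the real translator operates on AST nodes and may be structured quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins already translated C++ expressions into a
// braced-init-list such as "{1, 2, x + 1}".
inline std::string emit_braced_init_list(const std::vector<std::string> &exprs) {
    std::string out = "{";
    for (std::size_t i = 0; i < exprs.size(); ++i) {
        if (i != 0) {
            out += ", ";  // separator between positional expressions
        }
        out += exprs[i];
    }
    out += "}";
    return out;
}

// Small self-check that can be called from any main().
inline void check_braced_init_list() {
    std::vector<std::string> exprs;
    assert(emit_braced_init_list(exprs) == "{}");
    exprs.push_back("1");
    exprs.push_back("x + 1");
    assert(emit_braced_init_list(exprs) == "{1, x + 1}");
}
```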

Member-wise expression group

The group ast::ExprsByMember is used to initialize members by name. When emitting C++ initializers for it, it is only necessary to recurse into the appropriate initializer generator for each member.

Self-hosted implementation

Once the bootstrapped translator is able to understand and correctly translate to C++ a large enough subset of the Cigma programming language, a self-hosted translator will be implemented in Cigma itself.

Appendix A: Bootstrap AST

ast::Attr

struct Attr : Node {...};

ast::Block

struct Block : Node {...};

ast::Comment

struct Comment : Node {...};

ast::Decl

struct Decl : Node {...};

ast::DeclCtor

struct DeclCtor : Decl {...};

ast::DeclDtor

struct DeclDtor : Decl {...};

ast::DeclForeign

struct DeclForeign : Decl {...};

ast::DeclGroup

struct DeclGroup : Decl {...};

ast::DeclLabel

struct DeclLabel : Decl {...};

ast::DeclModule

struct DeclModule : Decl {...};

ast::DeclName

struct DeclName : Decl {...};

ast::DeclNspace

struct DeclNspace : Decl {...};

ast::DeclType

struct DeclType : Decl {...};

ast::Expr

struct Expr : Node {...};

ast::ExprAddress

struct ExprAddress : ExprUnary {...};

ast::ExprAlignof

struct ExprAlignof : Expr {...};

ast::ExprAny

struct ExprAny : Expr {...};

ast::ExprAwait

struct ExprAwait : Expr {...};

ast::ExprBinary

struct ExprBinary : Expr {...};

ast::ExprBinaryCompiletime

struct ExprBinaryCompiletime : Expr {...};

ast::ExprBind

struct ExprBind : Expr {...};

ast::ExprBlock

struct ExprBlock : Expr {...};

ast::ExprCall

struct ExprCall : Expr {...};

ast::ExprCast

struct ExprCast : Expr {...};

ast::ExprCtor

struct ExprCtor : Expr {...};

ast::ExprDelete

struct ExprDelete : Expr {...};

ast::ExprDeleteArray

struct ExprDeleteArray : Expr {...};

ast::ExprDereference

struct ExprDereference : Expr {...};

ast::ExprDtor

struct ExprDtor : Expr {...};

ast::ExprFold

struct ExprFold : Expr {...};

ast::ExprForeign

struct ExprForeign : Expr {...};

ast::ExprForward

struct ExprForward : Expr {...};

ast::ExprGet

struct ExprGet : Expr {...};

ast::ExprGroup

struct ExprGroup : Expr {...};

ast::ExprIf

struct ExprIf : Expr {...};

ast::ExprIfBind

struct ExprIfBind : Expr {...};

ast::ExprIfTrue

struct ExprIfTrue : Expr {...};

ast::ExprIndex

struct ExprIndex : Expr {...};

ast::ExprInitList

struct ExprInitList : Expr {...};

ast::ExprInitMember

This class is currently not used.

struct ExprInitMember : Expr {...};

ast::ExprInitType

struct ExprInitType : Expr {...};

ast::ExprLambda

struct ExprLambda : Expr {...};

ast::ExprLiteral

struct ExprLiteral : Expr {...};

ast::ExprMember

struct ExprMember : Expr {...};

ast::ExprMemberAddress

struct ExprMemberAddress : Expr {...};

ast::ExprMemberDereference

struct ExprMemberDereference : Expr {...};

ast::ExprMove

struct ExprMove : Expr {...};

ast::ExprNew

struct ExprNew : Expr {...};

ast::ExprPath

struct ExprPath : Expr {...};

ast::ExprPlaceholder

struct ExprPlaceholder : Expr {...};

ast::ExprRange

struct ExprRange : Expr {...};

ast::ExprSelectBase

struct ExprSelectBase : Expr {...};

ast::ExprSizeof

struct ExprSizeof : Expr {...};

ast::ExprSwitch

struct ExprSwitch : Expr {...};

ast::ExprTypeid

struct ExprTypeid : Expr {...};

ast::ExprUnary

struct ExprUnary : Expr {...};

ast::ExprsByBase

struct ExprsByBase : ExprsByPos {...};

ast::ExprsByEnum

struct ExprsByEnum : ExprsByName {...};

ast::ExprsById

struct ExprsById : ExprsByName {...};

ast::ExprsByIndex

struct ExprsByIndex : ExprGroup {...};

ast::ExprsByMember

struct ExprsByMember : ExprsByName {...};

ast::ExprsByName

struct ExprsByName : ExprGroup {...};

ast::ExprsByPos

struct ExprsByPos : ExprGroup {...};

ast::File

struct File {...};

ast::Flow

struct Flow : Node {...};

ast::FlowBreak

struct FlowBreak : Flow {...};

ast::FlowCatch

struct FlowCatch : Flow {...};

ast::FlowContinue

struct FlowContinue : Flow {...};

ast::FlowGoto

struct FlowGoto : Flow {...};

ast::FlowIf

struct FlowIf : Flow {...};

ast::FlowLoop

struct FlowLoop : Flow {...};

ast::FlowRange

struct FlowRange : Flow {...};

ast::FlowReturn

struct FlowReturn : Flow {...};

ast::FlowSwitch

struct FlowSwitch : Flow {...};

ast::FlowThrow

struct FlowThrow : Flow {...};

ast::FlowTry

struct FlowTry : Flow {...};

ast::FlowYield

struct FlowYield : Flow {...};

ast::Node

struct Node {...};

ast::Pattern

struct Pattern : Node {...};

ast::PatternAnyType

struct PatternAnyType : Pattern {...};

ast::PatternBinding

struct PatternBinding : Pattern {...};

ast::PatternConcept

struct PatternConcept : Pattern {...};

ast::PatternDereference

struct PatternDereference : Pattern {...};

ast::PatternDynType

struct PatternDynType : Pattern {...};

ast::PatternExpr

struct PatternExpr : Pattern {...};

ast::PatternExtractor

struct PatternExtractor : Pattern {...};

ast::PatternIfTrue

struct PatternIfTrue : Pattern {...};

ast::PatternMember

struct PatternMember : Pattern {...};

ast::PatternMembers

struct PatternMembers : Pattern {...};

ast::PatternOr

struct PatternOr : Pattern {...};

ast::PatternRange

struct PatternRange : Pattern {...};

ast::PatternSwitch

struct PatternSwitch : Pattern {...};

ast::PatternTaggedUnion

struct PatternTaggedUnion : Pattern {...};

ast::PatternType

struct PatternType : Pattern {...};

ast::PatternVariantByExpr

struct PatternVariantByExpr : Pattern {...};

ast::PatternVariantByType

struct PatternVariantByType : Pattern {...};

ast::Scope

struct Scope {...};

ast::Token

struct Token {...};

ast::TokenBuffer

struct TokenBuffer : Token {...};

ast::TokenLiteral

struct TokenLiteral : Token {...};

ast::Type

struct Type : Node {...};

ast::TypeArray

struct TypeArray : TypeQualifier {...};

ast::TypeAuto

struct TypeAuto : Type {...};

ast::TypeDecay

struct TypeDecay : TypeQualifier {...};

ast::TypeDirection

struct TypeDirection : TypeQualifier {...};

ast::TypeEnum

struct TypeEnum : Type {...};

ast::TypeExpr

struct TypeExpr : Type {...};

ast::TypeFunc

struct TypeFunc : Type {...};

ast::TypeLifetime

struct TypeLifetime : TypeQualifier {...};

ast::TypeLong

struct TypeLong : TypeQualifier {...};

ast::TypeModifierSize

struct TypeModifierSize : Type {...};

ast::TypeMovable

struct TypeMovable : TypeQualifier {...};

ast::TypeMutable

struct TypeMutable : TypeQualifier {...};

ast::TypePartial

struct TypePartial : TypeQualifier {...};

ast::TypePath

struct TypePath : Type {...};

ast::TypePointer

struct TypePointer : TypeQualifier {...};

ast::TypePointerMember

struct TypePointerMember : TypePointer {...};

ast::TypePrimitive

struct TypePrimitive : Type {...};

ast::TypeQualifier

struct TypeQualifier : Type {...};

ast::TypeReference

struct TypeReference : TypeQualifier {...};

ast::TypeSigned

struct TypeSigned : TypeQualifier {...};

ast::TypeStruct

struct TypeStruct : Type {...};

ast::TypeUnsigned

struct TypeUnsigned : TypeQualifier {...};

ast::TypeVoid

struct TypeVoid : Type {...};

ast::TypeVolatile

struct TypeVolatile : TypeQualifier {...};

ast::Visitor

Base class to create AST visitors.

struct Visitor {...};

ast::Visitor::visit

Methods to reimplement to visit specific kinds of AST nodes.


virtual void visit(Node *, AttrPtr &n) {}

virtual void visit(Node *, BlockPtr &n);

virtual void visit(Node *, DeclPtr &n);

virtual void visit(Node *, ExprPtr &n);

virtual void visit(FilePtr &);

virtual void visit(Node *, FlowPtr &n);

virtual void visit(Node *, IdPtr &n) {}

void visit(Node*, NodePtr &);

This method is not meant to be overridden.


virtual void visit(Node *, PatternPtr &n);

virtual void visit(Node *, TokenPtr &n);

virtual void visit(Node *, TypePtr &n);
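A custom visitor would typically override only the overloads it cares about. The following self-contained sketch mimics that pattern with simplified stand-ins: the `Node`, `Expr`, `ExprPtr` and `Visitor` definitions below are assumptions made for illustration (including the use of std::shared_ptr and the recursing default implementation) and do not reproduce the real declarations in boot_ast. `ExprCounter` and `count_exprs` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Simplified stand-ins for the real AST classes.
struct Node {
    virtual ~Node() = default;
};

struct Expr : Node {
    std::vector<std::shared_ptr<Expr>> children;
};

using ExprPtr = std::shared_ptr<Expr>;

// Stand-in visitor with the same overload shape as ast::Visitor::visit.
struct Visitor {
    virtual ~Visitor() = default;

    // Default behaviour assumed in this sketch: recurse into the children.
    virtual void visit(Node *, ExprPtr &n) {
        for (auto &child : n->children) {
            visit(n.get(), child);
        }
    }
};

// Hypothetical visitor that counts every expression it encounters.
struct ExprCounter : Visitor {
    int count = 0;

    void visit(Node *parent, ExprPtr &n) override {
        ++count;
        Visitor::visit(parent, n);  // keep walking the subtree
    }
};

inline int count_exprs(ExprPtr root) {
    ExprCounter counter;
    counter.visit(nullptr, root);
    return counter.count;
}

// Small self-check that can be called from any main().
inline void check_visitor() {
    ExprPtr root = std::make_shared<Expr>();
    root->children.push_back(std::make_shared<Expr>());
    assert(count_exprs(root) == 2);
}
```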

ast::id

struct Id : Node {...};

Appendix B: Bootstrap C++ translator

Appendix C: Grammar rules

decl_template_args

[1] fully specialized templates have nothing here

expr_delete

  • Access to tagged union member (not yet implemented)
  • ::std::get_if<$expr>(& $expr_postfix)
  • ::std::get_if<$type>(& $expr_postfix)
  • get<$expr>($expr_postfix)
  • get<$type>($expr_postfix)
  • std::any_cast<$type>(& $expr)
  • std::any_cast<$type>($expr)
  • dynamic_cast<$type>($expr)
  • conditional expression
  • bind subexpression to a local name
  • std::bind
  • std::bind_front
  • /std/placeholders _N
  • Define lambda and call it in place
  • Compile-time version of the above
  • dynamic_cast
  • static_cast
  • /std:is_same[:A, :B] value
  • !/std:is_same[:A, :B] value
  • Used as don't care element in /std tie
  • Used as don't care element in /std tie
  • Simplified if
  • Simplified else
  • Simplified else if
  • Simplified switch
  • Match range
  • Match range
  • Match expression
  • Match via extractor
  • Dereference previous match
  • Match if convertible to true
  • Match variant by type
  • Match variant by index
  • Match via any_cast
  • Match via dynamic_cast
  • Match tagged union member
  • Match members by name
  • Match name
  • Match anything
  • Nested switch
  • Bind to local variable
  • Structured binding
  • Capture by-copy by default
  • Capture by-reference by default
  • Capture by-copy
  • Capture by-copy of pack expansion
  • Capture by-copy with initializer
  • Capture by-reference
  • Capture by-reference of pack expansion
  • Capture by-reference with initializer
  • Capture by-copy of current object
  • Capture by-reference of current object
  • designated initialization
  • in-place initialization

expr_fold

[1] unary right fold

[2] unary left fold

[3] binary right fold

[4] binary left fold
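These four variants correspond to the four kinds of C++17 fold expressions. The sketch below shows the C++ side only (the Cigma spelling is not reproduced here) and uses subtraction so that the grouping is visible in the result; the function names are hypothetical and a C++17 compiler is assumed.

```cpp
#include <cassert>

// [1] Unary right fold: (args - ...) groups as a1 - (a2 - a3).
template <typename... Ts>
constexpr auto unary_right(Ts... args) {
    return (args - ...);
}

// [2] Unary left fold: (... - args) groups as (a1 - a2) - a3.
template <typename... Ts>
constexpr auto unary_left(Ts... args) {
    return (... - args);
}

// [3] Binary right fold: (args - ... - init) groups as a1 - (a2 - (a3 - init)).
template <typename... Ts>
constexpr auto binary_right(int init, Ts... args) {
    return (args - ... - init);
}

// [4] Binary left fold: (init - ... - args) groups as ((init - a1) - a2) - a3.
template <typename... Ts>
constexpr auto binary_left(int init, Ts... args) {
    return (init - ... - args);
}

// Small self-check that can be called from any main().
inline void check_folds() {
    assert(unary_right(10, 3, 2) == 9);       // 10 - (3 - 2)
    assert(unary_left(10, 3, 2) == 5);        // (10 - 3) - 2
    assert(binary_right(1, 10, 3, 2) == 8);   // 10 - (3 - (2 - 1))
    assert(binary_left(1, 10, 3, 2) == -14);  // ((1 - 10) - 3) - 2
}
```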

member_id

: "." ID
| "." NOID [1]
;

[1] Currently only used for padding in bitfields

nspace_id

: "/" ID
| "/" NOID [1]
;

[1] Anonymous namespace

return_type

: "->" type_novariadic
| "->" [1]
| nothing
;

[1] "void" return type

specop_id

[1] subscript operator

[2] subscript operator (constexpr)

[3] subscript operator (consteval)

[4] call operator

[5] call operator (constexpr)

[6] call operator (consteval)

[7] cast operator

[8] cast operator (constexpr)

[9] cast operator (consteval)

type_arg_direction_opt

: ">>" [1]
| "<<" [2]
| "<>" [3]
| "->" [4]
| nothing
;

[1] Input argument

[2] Output argument

[3] In-out argument

[4] Forward argument

type_path_body

[1] std::variant

[2] std::tuple

[3] std::any

[4] std::initializer_list

Appendix D: Grammar symbols