This tool scans your source code for comments with a specific syntax and generates cross-referenced AsciiDoc documentation based on templates that you define.

This Developer Guide explains how to build and modify the source code; refer to the User Guide to learn how to use this software.
This document, like the User Guide, has been generated using docra itself and can also serve as a demonstration of how this tool can be used.

Compilation

This project uses CMake as its build system. Version 3.0 or newer is required, since the build relies on at least the following CMake features:

  • ExternalProject_Add (to download googletest)

  • Command mode with tar to download flex (only on Windows)

  • Setting CPACK_RPM_FILE_NAME in CPack

Docra is written in C++11 and a compatible compiler is required; GCC, Clang and Visual Studio are known to work. In particular, the compiler must support <filesystem>.

The flex lexer generator is required for compiling; when building on Windows, a prebuilt version will be downloaded automatically.

If asciidoctor is installed, docra will use it to render its own documentation as HTML and man pages. The older asciidoc can also be used as a fallback, but currently only to generate the man pages.

If pygmentize is installed, the docra source code will be rendered as HTML, and the generated HTML cross-reference will link to these local files instead of the public repository. To enable this, compile the project once, then re-run cmake in the build directory so that it can find the docra executable you just compiled and run it on its own source code.
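
In practice this bootstrap looks something like the following (the paths are illustrative):

  cmake /path/to/source/docra && make   # first pass: builds the docra executable
  cmake . && make docs_html             # second pass: cmake now finds ./docra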

Once you have all the prerequisites, you can launch the build system.

Linux

  • Create a build directory

    mkdir -p build/dir && cd "$_"
  • Configure the build system in it

    cmake /path/to/source/docra
  • Build!

    make

Windows

Visual Studio, as of version 15.5, is capable of opening and configuring CMake projects directly: open the source code folder using File › Open › Folder, then run CMake › Build All once configuration completes.

Targets

The build system provides the following targets:

docs_adoc

Run docra on its own source code to render documentation templates into AsciiDoc files.

docs_html

Generate HTML documentation from the generated AsciiDoc source.

docs_man

Generate man pages from the generated AsciiDoc source.

lex_adoc

Build lexer library for AsciiDoc input.

lex_hash

Build lexer library for #-commented languages.

lex_clike

Build lexer library for C-like languages.

lex_ini

Build lexer library for INI-like input.

lex_verilog

Build lexer library for Verilog input.

lex_vhdl

Build lexer library for VHDL input.

lex_xml

Build lexer library for XML.

lex_yacc

Build lexer library for YACC-like grammars.

docra

Build the main executable.

install

Default CMake installation target.

uninstall

Remove files copied by the install target.
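
For example, with the default Makefile generator on Linux, the documentation targets can be chained in a single invocation (one possible workflow, not the only one):

  make docs_adoc docs_html docs_man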

Language

The software is written in C++. Since this tool depends on asciidoctor, an initial idea was to write it in Ruby; however, I had never written anything in Ruby before and decided to stick with a known language instead. This also means that the docra executable runs standalone with no runtime dependencies (although you will need asciidoctor anyway to turn its output into anything usable). Being just plain C++ also means the main tool can already be compiled to JS (via emscripten), allowing it, in the future, to be distributed and run as an extension inside editors like Code and Atom.

More specifically, we use C++11 (this is in fact my first C++11 project); the main reason for moving on from C++98 was to finally have access to the <filesystem> standard library. Other than that and the occasional auto, we do not use many C++11 features.

Walkthrough

The code is organized around four main areas, which correspond to the successive steps executed once docra is launched. These proceed as follows:

Diagram

Cross-reference

As with many side projects, the quality of this code suffered a lot from my only being able to contribute to it in tiny chunks of one hour or less after work. A lot could be cleaned up.

This section has been generated using the ?includexref directive.

Chunk

The low-level input to the generic Lexer


Chunk.depth

Hierarchy level. This field is also set by the low-level lexers, since they know how to deconstruct the hierarchy for each language.


Chunk.it

Current lexing position within a Chunk. This iterator is repositioned by the Lexer while scanning str.


Chunk.lineno

Line within the input file. This field is set by the low-level lexers.


Chunk.str

The text contained by the Chunk. This field is filled by the specific low-level lexer.


Chunk.tag

Only set in pre-parsed tags; the tag will simply be copied into the resulting token.


Chunk::KIND

The specific kind of Chunk


Chunk::KIND.CODE

A Chunk that contains code


Chunk::KIND.COMMENT

A Chunk that contains a comment


Chunk::KIND.INVALID

Refers to an uninitialized Chunk


Chunk::KIND.TAG

A special Chunk containing a pre-parsed tag
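
Piecing the entries above together, the Chunk interface looks roughly like this (a sketch reconstructed from this cross-reference; the exact types, defaults and field order are assumptions, not the verbatim header):

#include <string>

struct Chunk {
 enum class KIND { INVALID, CODE, COMMENT, TAG };

 KIND kind = KIND::INVALID;       // the specific kind of Chunk
 std::string str;                 // text content, filled by the low-level lexer
 std::string::const_iterator it;  // current lexing position within str
 int lineno = 0;                  // line within the input file
 int depth = 0;                   // hierarchy level, set by the low-level lexers
 std::string tag;                 // only set in pre-parsed TAG chunks (actual type unknown)
};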


Config


Config::begin

ConfigSections::iterator begin(const std::string &sec);

Returns → iterator over all sections called sec


Config::end

ConfigSections::iterator end();

Config::get

const ConfigEntries &get(const std::string &sec, const std::string &sub);

Config::set

void set(const std::string &sec, const std::string &sub, const std::string &key, const std::string &val, bool replace);
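
Taken together, a hypothetical use of these accessors could look like this (the section, subsection and key names are invented for the example):

Config cfg;
cfg.set("output", "", "dir", "build/docs", true);      // add or replace a key
const ConfigEntries &entries = cfg.get("output", "");  // read the entries back
for (auto it = cfg.begin("output"); it != cfg.end(); ++it) {
 // visit every section called "output"
}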

FileScopes

Map from a file path to the root of its scope tree (populated by Parser::parse )

typedef std::map<fs::path, Scope> FileScopes;

Param


Param.desc

Tokens desc;

List of tokens describing this parameter


Param.desc_code

Tokens desc_code;

Code block describing this parameter


Param.inlined

bool inlined;

This param is listed in a group (and should not be rendered by itself but inside the group)


Param.name

std::string name;

Name of this parameter


Parser_Source

Takes a stream of Token from Lexer to create a parse tree


Parser_Source.lexer

Lexer &lexer;

Reference to the current Lexer


Parser_Source.project

Project &project;

Reference to the current Project


Parser_Source::parse

bool parse(int verb, const fs::path &, Scope &);

Prefix

Document-wide state for a given prefix


Prefix.symbols

Map from prefixed symbol name to all its existing suffixes


Scope

Representation of a generic lexical scope within a Project


Scope.children

Scopes children;

List of scopes semantically contained in this Scope


Scope.lineno_end

int lineno_end;

Source location of scope end


Scope.lineno_start

int lineno_start;

Source location of scope start


Scope.parent

Scope* parent;

Pointer to Scope containing this one


Scope.suffix

Suffix *suffix;

Points to the suffix corresponding to the scope tag (optional)


Scope::add_child

Scope* add_child(int lineno);

Insert a new scope below the current hierarchy
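
A minimal sketch of what add_child has to do, consistent with the fields listed above (Scopes is assumed to be a node-stable container such as std::list, so the returned pointer stays valid):

Scope* Scope::add_child(int lineno) {
 children.emplace_back();      // create the new Scope in place
 Scope &child = children.back();
 child.parent = this;          // link it back to the containing scope
 child.lineno_start = lineno;  // record where the new scope begins
 return &child;
}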


Suffix

All data attached to a given (prefix, id, suffix) tuple


Suffix.desc_code

Tokens desc_code;

Code block describing this entity


Suffix.desc_long

Tokens desc_long;

Long description of this entity


Suffix.desc_short

Tokens desc_short;

Short description of this entity


Suffix.lineno_code

int lineno_code;

If this tag refers to code, the corresponding line number


Suffix.params

Params params;

Local symbols valid within the scope of this full tag


Suffix.path

fs::path path;

Path to file that this tag belongs to


Suffix.project

Project *project;

Project containing this tag


Suffix.render_codeblock

bool render_codeblock(std::ostream &out, const Tokens &desc, Tokens::const_iterator &it) const;
out (I)

Stream to write to

it (IO)

Iterator to start of block


Suffix.render_description

bool render_description(const Config &cfg, std::ostream &out, const Tokens &desc) const;

Suffix.scope

Scope *scope;

Pointer into the scope tree to where this suffix was defined


Suffix.tag

Tag tag;

The full resolved tag identifying this suffix


Suffix.token_tag

Token token_tag;

A copy of the original suffix tag


Symbol

All data assigned to a given tag id

All the suffixes, but appearing in the same order they were inserted


Symbol.render_xref

bool render_xref(const Config &cfg, std::ostream &out) const;

Symbol.render_xref_yacc

bool render_xref_yacc(const Config &cfg, std::ostream &out) const;

Symbol.suffixes

Map from an optional suffix to the associated Suffix data


operator == (const Location&)

Comparison operator for source locations

inline bool operator == (const Location &a, const Location &b);
a (I)

left-hand operand

b (I)

right-hand operand

Returns → 'true' if both operands point to the same source location


Lexer

Language-agnostic base class for each lexer


Lexer.chunk

Chunk chunk;

Last Chunk produced by the low-level lexer.


Lexer.depth

int depth;

Current position in the hierarchy. This can briefly differ from chunk.depth; the lexer will generate SCOPE_XXX tokens until the two are balanced.


Lexer.depth_delta

int depth_delta;

Cumulative depth variation to be applied at the next EOL


Lexer.eof

bool eof;

True at end of input. See lex_eof.


Lexer.file

FILE *file;

Handle to the input file. We use stdio’s FILE since that’s what Flex expects.


Lexer.fini

virtual void fini() = 0;

Dual of init


Lexer.init

virtual void init(const fs::path &path, FILE *file) = 0;
path (I)

path to file

file (I)

previously opened file handle


Lexer.lex_eof

Called by the low-level lexers.

int lex_eof();

Sets the eof flag


Lexer.match

std::smatch match;

Result of last regexp match


Lexer.next_chunk

virtual bool next_chunk() = 0;

Returns → 'false' on EOF, 'true' otherwise

Low-level chunking. This function is reimplemented by each low-level lexer.


Lexer.next_token

Produce a new Token

Returns → 'false' at the end of input, 'true' otherwise

The last token can be accessed at Lexer.token (or with Lexer.current_token ).

Tokens are matched using C++11 regular expressions. Two regexes are used; the first one is:

static std::regex re_comment(
 "(<\\?)([a-z]*)" "|"  (1)
 "(\\?>)" "|"  (2)
 "``([^`]+)``" "|"  (3)
 "(\r?\n)" "|"  (4)
 "("  (5)
  "(([^\\s`]*)`([^`]*)`([^\\s:]*))(:)?"
 ")"
);
1 See Token::KIND.BLOCK_ENTER
2 See Token::KIND.BLOCK_EXIT
3 See Token::KIND.LITERAL
4 See Token::KIND.EOL
5 See Token::KIND.TAG

Tags follow this format:

prefix`id`suffix:

The id can be empty, and the suffix and colon can be omitted. The full syntax is explained in (TODO).
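
As a quick illustration, the first regex can be exercised directly with std::regex_search; the comment string below and its w`Widget`ctor: tag are made up for the example:

#include <iostream>
#include <regex>
#include <string>

int main() {
 std::regex re_comment(
  "(<\\?)([a-z]*)" "|"
  "(\\?>)" "|"
  "``([^`]+)``" "|"
  "(\r?\n)" "|"
  "("
   "(([^\\s`]*)`([^`]*)`([^\\s:]*))(:)?"
  ")"
 );

 std::string input = "see w`Widget`ctor: for details";
 std::smatch m;
 if (std::regex_search(input, m, re_comment)) {
  std::cout << "prefix: " << m[8] << "\n"    // "w"
            << "id:     " << m[9] << "\n"    // "Widget"
            << "suffix: " << m[10] << "\n";  // "ctor"
 }
}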

The second regex is:

static std::regex re_comment_boc(
 "(<\\?)([a-z]*)" "|"  (1)
 "(\\?>)" "|"  (2)
 "``([^`]*)``" "|"  (3)
 "(\r?\n)" "|"  (4)
 "("  (5)
  "(([^\\s`]*)`([^`]*)`([^\\s:]*))(:)?" "|"
  "^(([A-Z]+):)"  get rid of this exception! It only creates problems!
 ")"
);
1 See Token::KIND.BLOCK_ENTER
2 See Token::KIND.BLOCK_EXIT
3 See Token::KIND.LITERAL
4 See Token::KIND.EOL
5 See Token::KIND.TAG

This syntax matches tags in the form

ABCD:

In this case, ABCD becomes the prefix, the (TODO) id is empty, and the colon is no longer optional. This syntax is only used to mark positions in the code as TODOs, FIXMEs and so forth; for more details, see (TODO).

It can happen that next_chunk returns 'true' while the last chunk is invalid; this is in fact what is returned on eof! See lex_eof. Code chunks are passed on to the parser as code tokens with no modification; only comments are parsed. Some rules only apply at the beginning of a comment; in those cases, .boc is set accordingly. In a C++11 regex match result, anything that comes after the end of the match constitutes the 'suffix'.

If its 'matched' attribute is true, then it contains a valid value.

Moving the Chunk.it iterator to the beginning of the prefix will move the tokenizer forward.

If there’s nothing left to parse, the tokenizer will consume the next chunk.

'true' is returned because one Token was already successfully produced.

The only way to reset a match variable seems to be to swap it with a new, local variable that will be cleaned up outside this scope. If the regex matches somewhere after the beginning of the string, 'prefix' will be set accordingly. This part of the string will be returned as a simple DOC_STR token.

A DOC_STR is also returned if nothing is matched.
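
The match-reset idiom mentioned above, as a minimal standalone sketch:

{
 std::smatch fresh;
 std::swap(match, fresh);  // 'match' is now empty; the old state dies with 'fresh'
}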


Lexer.path

fs::path path;

Optional path to file being scanned


Lexer.scanner

void *scanner;

Opaque pointer to Flex state


Lexer.token

Token token;

Last Token produced by the lexer


Lexer_Adoc

Lexer recognizing Asciidoc comment syntax

struct Lexer_Adoc : public Lexer

Lexer_Adoc::init

void init(const fs::path &path, FILE *file);

Lexer_CLike

Lexer for C-like languages. This includes C, C++, Java, Javascript and so forth.

struct Lexer_CLike : public Lexer

Lexer_CLike::init

void init(const fs::path &path, FILE *file);

Lexer_Dummy

Dummy lexer to test generic logic only

struct Lexer_Dummy : public Lexer

Lexer_Dummy::next_chunk

bool next_chunk();

This function does nothing but initialize and return a single Chunk.COMMENT chunk. This is used to unit-test the comment syntax.


Lexer_Hash

Lexer for #-commented languages

struct Lexer_Hash : public Lexer

Lexer_Hash::init

void init(const fs::path &path, FILE *file);

Lexer_Ini

Lexer for 'ini-like' configuration files

struct Lexer_Ini : public Lexer

Lexer_Ini::init

void init(const fs::path &path, FILE *file);

Lexer_Verilog

Specific lexer for Verilog and SystemVerilog

struct Lexer_Verilog : public Lexer

Lexer_Verilog::init

void init(const fs::path &path, FILE *file);

Lexer_Vhdl

Specific lexer for VHDL

struct Lexer_Vhdl : public Lexer

Lexer_Vhdl::init

void init(const fs::path &path, FILE *file);

Lexer_Xml

Lexer for XML

struct Lexer_Xml : public Lexer

Lexer_Yacc

Lexer for Yacc/Bison grammars

struct Lexer_Yacc : public Lexer

Since a terminal symbol can be identified both by an identifier and an arbitrary string, this can be used to find the identifier starting from the string.


Lexer_Yacc.curr_id

The identifier for the token or rule currently being processed


Lexer_Yacc.curr_pattern

A pattern is a list of terminals and nonterminals; a rule can match several patterns.


Lexer_Yacc.curr_token_str

If curr_id points to a token, holds its (optional) string representation


Lexer_Yacc.line_buffer

Buffer holding current line


Lexer_Yacc.token_ids

Holds all the token identifiers


Lexer_Yacc::init

void init(const fs::path &path, FILE *file);

Parser_Source

l (I)

a Lexer

p (I)

a Project


Parser_Source::parse

Parse the current document

root_scope (I)

The top-level Scope representing the document

Returns → always 'true' (TODO: should actually return false on error)

For the first tag inside a scope that inherited its tag from its parent: if we are on the same line as the scope opening, then we are the actual tag for this scope! The first tag inside a scope that does not have one assigned yet is assigned to the scope itself (if within a couple of lines from the start). If this tag follows some code, AND that code is not already tagged, then this tag is considered to belong to that line of code.

If this tag does NOT have code already associated with it, and it’s not a '#' tag, then the code will come later, but we must already create a suffix for it! (Or we have nowhere to put any subsequent DOC_STR s.)

If the scope starts on a line that was tagged, and that tag is NOT already matched, use that tag for the scope itself. (Add exception in case where tag is on the SAME line as the scope, because in this case the two clearly belong together).

A DOC_STR inside a block is just added verbatim to that block.

If a comment starts with a tag, it starts a tagged block. Comments on lines immediately following it are considered part of the same block (but they must start with a space! This is a simple way to separate them from commented code).

If a "? " comment occurs inside a block, it will be rendered in asciidoc as a numbered code block callout. (but all I have to do here is add it to the description, the renderer will do the rest).

If a comment starts with "? ", it starts a hierachical block (or it continues the previous one, there is no difference) As such it should associate with the innermost containing scope with a suffix associated to it.

Recurse up the scope stack to find where this belongs. Comments that end up here did NOT start with "? "; they will be accepted as docstrings only if they are preceded by a line belonging to a documentation block.

Additional conditions for valid comments are that:

  • either they are empty (and will count as an empty line in the resulting asciidoc),

  • or they start with an arrow sign, in which case they are documenting a function return value,

  • or they start with a space, as a way to distinguish them from a commented line of code.
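
A minimal sketch of these acceptance checks, with invented names; the arrow marker is assumed to be "->" here, and the real parser interleaves these tests with the scope logic described above:

#include <string>

bool accept_docstring(const std::string &comment, bool follows_doc_block) {
 if (!follows_doc_block)
  return false;                  // must be preceded by a documentation block
 if (comment.empty())
  return true;                   // counts as an empty line in the asciidoc
 if (comment.compare(0, 2, "->") == 0)
  return true;                   // documents a function return value
 return comment.front() == ' ';  // a leading space separates docs from commented-out code
}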


Project

Aggregates comments collected from all files found in the command line input


Project.insert_tag

Suffix *insert_tag(const Tag &tag, Token &&token, Scope *scope);

Project.locate

Suffix *locate(const std::string &prefix, const std::string &symbol, const std::string &suffix = "");

Suffix *locate(const Tag &tag);

Project.locate_prefix

Prefix *locate_prefix(const std::string &prefix);

Project.locate_symbol

Symbol *locate_symbol(const std::string &prefix, const std::string &symbol);

Symbol *locate_symbol(const Tag &tag);
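A hypothetical lookup using the overloads above (assuming a populated project; the prefix and symbol are invented):

Suffix *s = project.locate("w", "Widget");
if (s) {
 // s->desc_short, s->scope, ... are now available
}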

Project::analyze_directory

Recursively scan a directory looking for docra tags

bool analyze_directory(int verb, const fs::path &path, int level);
path (I)

Directory to scan

level (I)

Relative depth of path in source hierarchy

Returns → 'false' on error, 'true' otherwise
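
A hedged sketch of the recursion this entry describes, using the <filesystem> API; error handling and the verbosity logic are omitted, and docra's real implementation may differ:

bool Project::analyze_directory(int verb, const fs::path &path, int level) {
 bool ok = true;
 for (const auto &entry : fs::directory_iterator(path)) {
  if (fs::is_directory(entry.path()))
   ok &= analyze_directory(verb, entry.path(), level + 1);  // descend one level
  else
   ok &= analyze_file(verb, entry.path(), level + 1);       // lex and parse the file
 }
 return ok;
}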


Project::analyze_file

Lex and parse the given file

bool analyze_file(int verb, const fs::path &path, int level);
path (I)

Path to file

level (I)

Relative depth of path in source hierarchy

Returns → 'false' on error, 'true' otherwise

Find all equivalent suffixes. If a suffix at the same position already existed: update its suffix (WHY!?), and update its scope.


Project::scopes

Association of each file path in the project with its syntax tree

FileScopes scopes;

Scope::add_child

Creates a new Scope inside the current one


Tag

Identifier of a documented entity


Tag.empty

bool empty();

Returns → whether this tag is empty


Tag.erase

Clears out the tag contents

void erase();

Tag.prefix

Prefix for a KIND.TAG


Tag.suffix

Suffix for a KIND.TAG


Tag.symbol

Identifier for a KIND.TAG


Tag.xref_id

Turn a tag into an asciidoc identifier for cross-referencing

std::string xref_id() const;

Token

A piece of input recognized by the Lexer


Token.boc

Beginning of comment


Token.brief

If a tag, whether it’s followed by a colon (unused!)


Token.kind

The Kind of this particular token


Token.lineno

Line number of token


Token.str

Token content (for KIND.CODE or KIND.DOC_STR )


Token::KIND

The specific kind of Token


Token::KIND.BLOCK_ENTER

The beginning of an embedded code block


Token::KIND.BLOCK_EXIT

The end of an embedded code block


Token::KIND.CODE

A single line of code. A Chunk of Chunk.CODE kind goes through the Lexer without modification.


Token::KIND.DOC_STR

Emitted by default for parts of comments


Token::KIND.INLINE_PARAM

A param identifier, rendered inline


Token::KIND.LITERAL

An inline literal expression


Token::KIND.SCOPE_ENTER

Emitted when entering a scope


Token::KIND.SCOPE_EXIT

Emitted when leaving a scope


Token::KIND.TAG

A tag identifier, with optional prefix and suffix


main

Entry point of the program

int main(int argc, char *argv[]);
argc (I)

Number of command-line arguments

argv (I)

String array of arguments


make_regex

Convert a pattern into a regex object

std::regex make_regex(const std::string &pattern, bool ispath = false);