This tool scans your source code for comments with a specific syntax and generates cross-referenced AsciiDoc documentation based on templates that you define.
The Developer Guide explains how to build and modify the source code; refer to the User Guide to learn how to use this software.
This document, like the User Guide, has been generated using docra itself, and can also serve as a demonstration of how this tool can be used.
Compilation
This project uses CMake as its build system. Version 3.0 or newer is required, since at least the following CMake features are expected:
- ExternalProject_Add (to download googletest)
- Command mode with tar to download flex (only on Windows)
- Setting CPACK_RPM_FILE_NAME in CPack
Docra is written in C++11 and requires a compatible compiler; GCC, Clang and Visual Studio are known to work. In particular, the compiler must support <filesystem>.
The flex lexer generator is required for compilation; when building on Windows, a prebuilt version is downloaded automatically.
If asciidoctor is installed, docra will use it to render its own documentation as HTML and man pages. The older asciidoc can also be used as a fallback, but currently only to generate the man pages.
If pygmentize is installed, the docra source code will be rendered as HTML, and the generated HTML cross-reference will link to these local files instead of the public repository. (To enable this, compile the project once, then re-run cmake in the build directory so that it can find the docra executable you just compiled and run it on its own source code.)
Once you have all the prerequisites, you can launch the build system.
Linux
- Create a build directory:
mkdir -p build/dir && cd "$_"
- Configure the build system in it:
cmake /path/to/source/docra
- Build!
make
Windows
Visual Studio, as of version 15.5, is capable of opening and configuring CMake projects directly: just open the source code folder, then build once the configuration step has completed.
Targets
The build system provides the following targets:
- docs_adoc: Run docra on its own source code to render documentation templates into AsciiDoc files.
- docs_html: Generate HTML documentation from the generated AsciiDoc source.
- docs_man: Generate man pages from the generated AsciiDoc source.
- lex_adoc: Build lexer library for AsciiDoc input.
- lex_hash: Build lexer library for #-commented languages.
- lex_clike: Build lexer library for C-like languages.
- lex_ini: Build lexer library for INI-like input.
- lex_verilog: Build lexer library for Verilog input.
- lex_vhdl: Build lexer library for VHDL input.
- lex_xml: Build lexer library for XML.
- lex_yacc: Build lexer library for YACC-like grammars.
- docra: Build the main executable.
- install: Default CMake installation target.
- uninstall: Remove files copied by the install target.
Language
The software is written in C++. Since this tool depends on asciidoctor, an initial idea was to write it in Ruby; however, I had never written anything in Ruby before and decided to stick with a known language instead. This also means that the docra executable runs standalone with no runtime dependencies (although you will need asciidoctor anyway to turn its output into anything usable). Being plain C++ also means the main tool can already be compiled to JS (via emscripten), allowing it, in the future, to be distributed and run as an extension inside editors like Code and Atom.
More specifically, we use C++11 (this is in fact my first C++11 project); the main reason for moving on from C++98 was to finally have access to the <filesystem> standard library. Other than that and the occasional auto, we do not use many C++11 features.
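As a minimal sketch of the kind of directory walking <filesystem> enables (illustrative only: the directory name is made up, and a toolchain where <filesystem> is available non-experimentally is assumed):

#include <filesystem>
#include <iostream>

namespace fs = std::filesystem; // the docra source uses the same alias (see FileScopes)

int main() {
    // Recursively walk a source tree and pick out the C++ files,
    // roughly what a comment scanner must do on startup.
    for (const auto &entry : fs::recursive_directory_iterator("src"))
        if (entry.path().extension() == ".cpp")
            std::cout << entry.path() << '\n';
}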
Walkthrough
The code is organized around four main areas, which correspond to the successive steps that are executed once docra is launched. These proceed as follows:
Cross-reference
Like any side project, the quality of this code suffered a lot from being able to contribute to it only in tiny chunks of one hour or less after work. A lot could be cleaned up.
This section has been generated using the ?includexref directive.
Chunk
The low-level input to the generic Lexer
Chunk.depth
Hierarchy level. This field is also set by the low-level lexers, since they know how to deconstruct the hierarchy for each language.
Chunk.it
Current lexing position within a Chunk
This iterator is repositioned by the Lexer while scanning str
Config
Config::begin
ConfigSections::iterator begin(const std::string &sec);
Returns → iterator over all sections called sec
Config::set
void set(const std::string &sec, const std::string &sub, const std::string &key, const std::string &val, bool replace);
FileScopes
Map from a file path to the root of its scope tree (populated by Parser::parse)
typedef std::map<fs::path, Scope> FileScopes;
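A minimal usage sketch (assuming Scope is default-constructible; the path is made up):

FileScopes scopes;
// operator[] default-constructs the root Scope the first time a path is seen
Scope &root = scopes[fs::path("src/lexer.cpp")];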
Param
Param.inlined
bool inlined;
This param is listed in a group (and should not be rendered by itself but inside the group)
Parser_Source
Takes a stream of Token from the Lexer to create a parse tree
Prefix
Document-wide state for a given prefix
Scope
Representation of a generic lexical scope within a Project
Scope.suffix
Suffix *suffix;
Points to the suffix corresponding to the scope tag (optional)
Scope::add_child
Scope* add_child(int lineno);
Insert a new scope below the current hierarchy
Suffix
All data attached to a given (prefix, id, suffix) tuple
Suffix.lineno_code
int lineno_code;
If this tag refers to code, the corresponding line number
Suffix.render_codeblock
bool render_codeblock(std::ostream &out, const Tokens &desc, Tokens::const_iterator &it) const;
Suffix.render_description
bool render_description(const Config &cfg, std::ostream &out, const Tokens &desc) const;
Suffix.scope
Scope *scope;
Pointer into the scope tree to where this suffix was defined
Symbol
All data assigned to a given tag id
All the suffixes, but appearing in the same order they were inserted
Symbol.render_xref_yacc
bool render_xref_yacc(const Config &cfg, std::ostream &out) const;
operator == (const Location&)
Comparison operator for source locations
inline bool operator == (const Location &a, const Location &b);
Returns → 'true' if both operands point to the same source location
Lexer
Language-agnostic base class for each lexer
Lexer.chunk
Chunk chunk;
Last Chunk produced by the low-level lexer
This can briefly differ from chunk.depth; the lexer will generate SCOPE_XXX tokens until balanced.
Lexer.next_token
Produce a new Token
Returns → 'false' at the end of input, 'true' otherwise
The last token can be accessed at Lexer.token (or with Lexer.current_token).
Tokens are matched using C++11 regular expressions. Two regexes are used; the first one is:
static std::regex re_comment(
"(<\\?)([a-z]*)" "|" (1)
"(\\?>)" "|" (2)
"``([^`]+)``" "|" (3)
"(\r?\n)" "|" (4)
"(" (5)
"(([^\\s`]*)`([^`]*)`([^\\s:]*))(:)?"
")"
);
1 | Opening <? of an inline directive, followed by its lowercase name |
2 | Closing ?> of an inline directive |
3 | An inline literal quoted between double backticks |
4 | A line break |
5 | A documentation tag of the form prefix`id`suffix: |
Tags are written in the following format:
prefix`id`suffix:
The id can be empty; the suffix and colon can be omitted. The full syntax is explained in (TODO).
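As an illustration, here is a minimal, self-contained sketch that runs the regex above over an invented comment line and pulls the tag apart (the tag names are made up; the group numbers follow the pattern as written):

#include <iostream>
#include <regex>
#include <string>

int main() {
    // Same pattern as re_comment above
    std::regex re_comment(
        "(<\\?)([a-z]*)" "|"
        "(\\?>)" "|"
        "``([^`]+)``" "|"
        "(\r?\n)" "|"
        "("
        "(([^\\s`]*)`([^`]*)`([^\\s:]*))(:)?"
        ")"
    );

    std::string line = "desc`Chunk`depth: hierarchy level"; // invented tag
    std::smatch m;
    if (std::regex_search(line, m, re_comment) && m[7].matched) {
        std::cout << "prefix=" << m[8]          // "desc"
                  << " id=" << m[9]             // "Chunk"
                  << " suffix=" << m[10]        // "depth"
                  << " colon=" << m[11].matched // 1 (present)
                  << '\n';
    }
}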
The second regex is:
static std::regex re_comment_boc(
"(<\\?)([a-z]*)" "|" (1)
"(\\?>)" "|" (2)
"``([^`]*)``" "|" (3)
"(\r?\n)" "|" (4)
"(" (5)
"(([^\\s`]*)`([^`]*)`([^\\s:]*))(:)?" "|"
"^(([A-Z]+):)" get rid of this exception! It only creates problems!
")"
);
1 | Opening <? of an inline directive, followed by its lowercase name |
2 | Closing ?> of an inline directive |
3 | An inline literal quoted between double backticks |
4 | A line break |
5 | A documentation tag, or an all-caps prefix tag at the beginning of a comment |
This syntax matches tags in the form
ABCD:
In this case, ABCD becomes the prefix, the id is empty, and the colon is no longer optional. This syntax is only used to mark positions in the code as TODOs, FIXMEs, and so forth.
For more details, (TODO).
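For example (illustrative comments; the lowercase tag names are invented):

// TODO: matched only by the second regex, at the beginning of a comment
// fix`lexer`cleanup: the general prefix`id`suffix: form, matched anywhere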
It can happen that next_chunk returns 'true' and the last chunk is invalid; this is in fact what is returned on EOF! See lex_eof.
Code chunks are passed on to the parser as code tokens with no modification; only comments are parsed.
Some rules only apply at the beginning of a comment. In those cases, boc is set accordingly.
In C++11 regex, anything that comes after the end of the match constitutes the 'suffix'. If its 'matched' attribute is true, then it contains a valid value.
Moving the Chunk.it iterator to the beginning of the prefix will move the tokenizer forward.
If there's nothing left to parse, the tokenizer will consume the next chunk.
'true' is returned because one Token was already successfully produced.
The only way to reset a match variable seems to be to swap it with a new, local variable that will be cleaned up outside this scope.
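A minimal sketch of that trick (std::smatch shown; any std::match_results specialization behaves the same):

#include <regex>

void reset_match(std::smatch &m) {
    // Swap with a fresh local; the old state is destroyed when 'fresh' goes out of scope.
    std::smatch fresh;
    std::swap(m, fresh);
}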
If the regex matches somewhere after the beginning of the string, 'prefix' will be set accordingly. This part of the string will be returned as a simple DOC_STR token.
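A minimal, self-contained sketch of how match_results exposes both pieces (the pattern is a simplified stand-in, not the real re_comment):

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::regex re("`([^`]*)`:"); // stand-in: match just a `id`: tag
    std::string line = "leading text `id`: trailing text";
    std::smatch m;
    if (std::regex_search(line, m, re)) {
        // Text before the match: this is what is emitted as a plain DOC_STR token
        std::cout << "prefix: \"" << m.prefix().str() << "\"\n";
        // Text after the match: valid only if its 'matched' attribute is set
        if (m.suffix().matched)
            std::cout << "suffix: \"" << m.suffix().str() << "\"\n";
    }
}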
Lexer_CLike
Lexer for C-like languages. This includes C, C++, Java, JavaScript, and so forth…
struct Lexer_CLike : public Lexer
Lexer_Dummy
Dummy lexer to test generic logic only
struct Lexer_Dummy : public Lexer
Lexer_Dummy::next_chunk
bool next_chunk();
This function does nothing but initialize and return a single Chunk.COMMENT chunk. This is used to unit-test the comment syntax.
Lexer_Yacc
Lexer for Yacc/Bison grammars
struct Lexer_Yacc : public Lexer
Since a terminal symbol can be identified both by an identifier and an arbitrary string, this can be used to find the identifier starting from the string.
Lexer_Yacc.curr_pattern
A pattern is a list of terminals and nonterminals; a rule can match several patterns
Lexer_Yacc.curr_token_str
If curr_id points to a token, holds its (optional) string representation
Parser_Source
- l (I): a Lexer
- p (I): a Project
Parser_Source::parse
Parse the current document
- root_scope (I): The top-level Scope representing the document
Returns → always 'true' (TODO: should actually return false on error)
First tag inside a scope that inherited its tag from its parent: if we are on the same line as the scope opening, then we are the actual tag for this scope!

The first tag inside a scope that does not have one assigned yet is assigned to the scope itself (if within a couple of lines from the start).

If this tag follows some code, AND that code is not already tagged, then consider this tag to belong to that line of code.

If this tag does NOT have some code already associated with it, and it's not a '#' tag, then the code will come later, but we must already create a suffix for it! (Or we have nowhere to put any subsequent DOC_STRs.)

If the scope starts on a line that was tagged, and that tag is NOT already matched, use that tag for the scope itself. (With an exception in case the tag is on the SAME line as the scope, because in this case the two clearly belong together.)

A DOC_STR inside a block is just added verbatim to that block.

If a comment starts with a tag, it starts a tagged block. Comments on lines immediately following it should be considered part of the same block (but they must start with a space! This is a simple way to separate them from commented code).

If a "? " comment occurs inside a block, it will be rendered in asciidoc as a numbered code block callout (all that needs to happen here is adding it to the description; the renderer will do the rest).

If a comment starts with "? ", it starts a hierarchical block (or it continues the previous one; there is no difference). As such it should associate with the innermost containing scope that has a suffix associated with it. Recurse up the scope stack to find where this belongs…

Comments that end up here did NOT start with "? "; they will be accepted as docstrings only if they are preceded by a line belonging to a documentation block.

Additional conditions for valid comments:
- either they are empty (and will count as an empty line in the resulting asciidoc),
- or they start with an arrow sign, in which case they are documenting a function return value.
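To make these rules concrete, here is a hypothetical C++ fragment annotated the way the parser expects (the tag ids and prefixes are invented for the example):

struct Chunk { // `Chunk`: tag on the same line as the scope opening, so it documents the scope itself
    int depth; // `Chunk`depth: follows untagged code on the same line, so it documents that line
    // ? A "? " comment associates with the innermost enclosing scope that has a suffix;
    //  continuation lines must start with a space, separating them from commented code
};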
Project
Aggregates comments collected from all files found in the command line input
Project.locate
Suffix *locate(const std::string &prefix, const std::string &symbol, const std::string &suffix = "");
Suffix *locate(const Tag &tag);
Project.locate_symbol
Symbol *locate_symbol(const std::string &prefix, const std::string &symbol);
Symbol *locate_symbol(const Tag &tag);
Project::analyze_directory
Recursively scan a directory looking for cigma tags
bool analyze_directory(int verb, const fs::path &path, int level);
Returns → 'false' on error, 'true' otherwise
Project::analyze_file
Lex and parse the given file
bool analyze_file(int verb, const fs::path &path, int level);
Returns → 'false' on error, 'true' otherwise
Find all equivalent suffixes.
If a suffix at the same position already existed:
- Update its suffix (WHY!?)
- Update its scope
Project::scopes
Association of each file path in the project with its syntax tree
FileScopes scopes;
Tag
Identifier of a documented entity
Tag.xref_id
Turn a tag into an asciidoc identifier for cross-referencing
std::string xref_id() const;
Token
A piece of input recognized by the Lexer
Token::KIND.CODE
A single line of code
A Chunk of Chunk.CODE kind goes through the Lexer without modification
make_regex
Convert a pattern into a regex object
std::regex make_regex(const std::string &pattern, bool ispath = false);
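A minimal usage sketch based on the signature above (the pattern is illustrative; what ispath changes is not documented here, so it is left at its default):

std::regex re = make_regex("[A-Z]+:");                      // default: ispath = false
bool hit = std::regex_search(std::string("TODO: fix"), re); // true when the pattern matches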