Docra — User Guide

This tool scans your source code for comments with a specific syntax and generates cross-referenced AsciiDoc documentation based on templates that you define.

This User Guide only explains how to use the software; refer to the Developer Guide to learn how to build and modify the source code.

This document, like the Developer Guide, has been generated using docra itself and can also serve as a demonstration of how this tool can be used.

Motivation

This tool was developed to fill what I perceived to be a gap in existing documentation systems. My requirements at the time were the following (in no particular order):

Should be based on a lightweight markup language (like reStructuredText, Markdown or AsciiDoc)
Should allow keeping the prose describing a particular piece of functionality close to the code implementing it, so that both can be kept in sync with relative ease
Should give the technical writer full control over the structure of the final document
Should be embeddable in virtually any programming and markup language (including things like arbitrary configuration files)

Asciidoc(tor)

Unless otherwise indicated, "AsciiDoc" always refers to the language as implemented by Asciidoctor (which is actively maintained), and not to the old Python implementation of the language (last released in 2013).

When using docra, you are mostly writing source code comments in AsciiDoc syntax (with some minor docra-specific extensions). There are several reasons for this choice over one of the other lightweight markups:

It is well documented
It is less cumbersome than reStructuredText
It is less limited than Markdown
It has a single, well-maintained reference implementation (unlike the myriad Markdown derivatives)
Its syntax has a direct mapping to DocBook, a standard language used to author technical content
It can generate a variety of diagram types from plain-text descriptions, through the asciidoctor-diagram plugin

Comparisons

Doxygen

Doxygen has sophisticated lexers for a variety of languages and some level of Markdown support. It should be possible to customize the output of Doxygen by manipulating its XML output. However, two features that are used heavily in docra are not present in Doxygen: transclusions (embedding text from an arbitrary comment in the code into the output document, and back-referencing it to the corresponding source code) and tag selectors (using regular expressions to control which comment tags are accessed by a given output directive).

Sphinx

In principle, Sphinx is able to support many languages through a feature called domains. However, documenting anything other than Python (the language it was originally designed for) remains a suboptimal experience. Also, reStructuredText is, subjectively, verbose and yet very hard to remember.

Language-specific tools

Java has JavaDoc, Rust has Rustdoc, Kotlin has Dokka, JavaScript has JSDoc (and many others). However, each of these tools only works on the single language it was designed for. Docra is meant to work across language boundaries. This is particularly relevant when trying to document points of interaction between linguistically separate domains (for example, between a kernel driver and the user-level code interacting with it, or between HDL code programmed on an FPGA and the software accessing it, or between two components written in different languages communicating with each other through some middleware protocol).

CWEB

CWEB, created by the legendary Professor Knuth in 1987, was an obvious inspiration for docra. Unfortunately, CWEB itself is intentionally restricted to languages like C, C++ and Java, and TeX is considerably harder to write than AsciiDoc. A quick perusal of a simple C++ library written in CWEB (http://www.literateprogramming.com/string.w) shows that CWEB documents are quite cumbersome to write, and inscrutable to read without a post-processor.

Limitations

Using docra in your project requires some manual work. Docra has, by design, no knowledge about how to lexically understand whatever programming language you are using, and relies heavily on the comments that you put in the source code. Additionally, by default no output will be produced unless you write a template specifying the outline of the document that you want to generate. This makes it hard to convert existing large projects to docra. Docra is much better suited for smaller projects, in particular side-projects, where the periods of time available for design, implementation and testing are short and infrequent, and up-to-date documentation is always needed to maintain focus and momentum (docra itself is one such project).

Supported languages

Docra can currently be used with:

C-style languages (C, C++, Java, D, JavaScript, Swift, Kotlin, possibly others…)
INI-style files (like those used to configure git, samba, docra itself and many more…)
#-commented languages (Bash, TCL, many configuration file formats…)
AsciiDoc text files
YACC and Bison grammars
VHDL files
XML files

Installation

At the moment, no prebuilt binary packages are available and you will have to build the project from source as described in the Developer Guide.

Configuration

By default, docra will search for a configuration file called Docrafile or .docra in the current working directory. If additional directories are given on the command line, they will be interpreted as source paths and docra will try to load the default configuration file from there as well before trying to analyze the directory contents. The location of the configuration file can be overridden with the -c (--config) command line option.

Example

As an example, this is the configuration used by docra when generating its own documentation:

# (1)
[common]
  outdir = docs/
  exclude = tests/
  exclude = extras/
  exclude = *.html
  exclude = *.pdf
  exclude = *.png

# (2)
[path "./"]

[path "extras/docker/"]
  level = 2

# (3)
[lexer "clike"]
  include = *.lpp

# (4)
[template "docs/dg.dc"]

[template "docs/ug.dc"]

[template "docs/docra.dc"]

(1) Common section
(2) Path specifications
(3) Lexer-specific section
(4) Templates

The common section defines global parameters, for example the default output directory (this can be overridden on the command line) and some global exclude patterns for files that should not be scanned by docra (in addition to the exclude patterns that are predefined by default).

The path sections define, using relative paths, the files that we want to analyze. If no such section is present then the current working directory will be traversed completely. The second path directive shows how to explicitly add a path that is otherwise excluded by the global settings. The level parameter defines the nesting level to be used when generating relative links in the cross-referenced output. This value defaults to the length of the path so level = 2 would not be necessary here but is included to complete the example.

The lexer section defines parameters that are specific to one of the different lexers implemented in docra. These are described further in the documentation. Each lexer comes with a predefined list of extensions that it can recognize. The lines in this example extend that list, so that any file containing a flex scanner (*.lpp) is analyzed with the same rules used to analyze C-like languages.

The template section defines the docra templates to be used when generating the AsciiDoc output. By convention, these files have a .dc or .dcrt extension.

Syntax

The configuration file format is very similar to that of git. Lines that start with a # or ; character are ignored and can be used for comments (including docra documentation!). The rest of the file is composed of sections and variables. A section begins with the name of the section in square brackets:

[sectionname]

and ends when the next section begins or the file ends. In some cases, a section must be defined with an additional subsection name in quotes, like so:

[sectionname "subsectionname"]

Any non-empty line that is neither a comment nor a section header is recognized as setting a variable. Since each variable must belong to a section, the first variable must be preceded by a section header. Variables have the following format:

variable = value
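The syntax rules above can be sketched as a small parser. This is only an illustration in Python, not docra's actual implementation: the function name `parse_config`, the regular expression used for section headers and the error handling are all assumptions made for the example.

```python
import re

# Assumed grammar, reconstructed from the prose above; docra's real parser may differ.
SECTION_RE = re.compile(r'^\[(\w+)(?:\s+"([^"]*)")?\]$')


def parse_config(text):
    """Return a list of (section, subsection, variables) tuples.

    Repeated variables (for example several `exclude` lines) are kept in order.
    """
    sections = []
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        header = SECTION_RE.match(line)
        if header:
            current = (header.group(1), header.group(2), [])
            sections.append(current)
            continue
        if current is None:
            raise ValueError(f"line {lineno}: variable outside of any section")
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'variable = value'")
        current[2].append((name.strip(), value.strip()))
    return sections


EXAMPLE = '''
# comment line
[common]
  outdir = docs/
  exclude = tests/
  exclude = *.pdf

[path "extras/docker/"]
  level = 2
'''

parsed = parse_config(EXAMPLE)
```

Keeping the variables as an ordered list rather than a dictionary matters here, because variables such as exclude can be repeated and patterns are processed in the order in which they appear.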

The following sections and variables are recognized:

[common] section

This section defines global parameters that are applied in addition to the internal defaults. The following variables are recognized in this section:

exclude = pattern

Exclude rules determine whether a file or directory will be ignored by docra while traversing a given filesystem hierarchy. Several rules are defined by default to exclude unrecognized file formats, hidden ("dot") folders or paths that do not contain source code.

include = pattern

Include rules determine whether a file or directory that would have otherwise been excluded should instead be analyzed by docra.

outdir = output/path

Sets the directory where the AsciiDoc files should be generated. The current working directory is used by default. The -o command line option has higher precedence than this parameter and can be used to override it.

verbose = verblevel

The global verbosity level, identified by an integer. The following levels are defined, in increasing order of verbosity:

0 : Display critical errors only
1 : Display warnings
2 : Display information messages (default)
3 : Display extra information for debugging
4 : Trace internal state of each lexer and parser operation

The verbosity level can be further increased or decreased from the command line, using the -v, -V and -q parameters.

[lexer "type"] section

This section can be used to control how docra decides what lexer to use to analyze a given file. The following variables are available:

include = pattern: Files matching this pattern will be scanned by the given lexer. If multiple lexers accept the same patterns the result is undefined.
exclude = pattern: Files matching this pattern will not be scanned by this lexer. Another lexer may accept this pattern instead.

The following lexer types are available:

adoc: This lexer recognizes line comments and block comments according to the AsciiDoc specification. This lexer is used by default on AsciiDoc files. This lexer is also used to scan docra templates.
clike: This lexer recognizes the syntax used for line comments and block comments in many C-inspired languages. In addition, it can do some very rudimentary semantic analysis by tracking the nesting level of {…}-delimited blocks as it scans the input. This latter information is used to simplify the generation of cross-referenced identifiers. This lexer is used by default on C, C++, JavaScript, JSON, Java, Kotlin and Groovy files.
hash: This lexer is extremely simple and will simply recognize a # character as the beginning of a comment until the end of the line. Anything else is classified as code. This lexer is used by default on CMake, shell, YAML, TCL, make and docker files.
ini: This lexer will recognize any line starting with # or ; as a comment. In addition, it recognizes INI-style section names in square brackets. The generated token stream uses the latter information to classify together all lines within a given section. This lexer is used by default on INI and docra configuration files.
python: Very similar to the hash lexer, the Python lexer only adds the ability to deduce scope nesting by tracking leading whitespace on each line.
verilog: This lexer recognizes basic Verilog and SystemVerilog syntax (comments and blocks).
vhdl: This lexer tries to recognize the structure of VHDL code. It is based on the VHDL 2008 grammar kindly provided by Sigasi.
xml: This lexer recognizes XML comments. Like the clike lexer, it tracks the current nesting level within the document. This lexer is used by default on XML files.
yacc: This lexer recognizes terminal and non-terminal symbol definitions in Yacc/Bison grammars. This lexer is used by default on .y and .ypp files (Flex scanners use the clike lexer instead).

[path "relative/path/"] section

A path section can be used to tell docra to traverse specific paths. By default, if no such section is present, the entire source directory is traversed. If at least one such section is present, only the paths explicitly included in the configuration will be traversed. Allowed variables are:

level: The nesting level that this path starts at when traversing. This means that only the last level elements of the path will be used when constructing links to source code. This parameter defaults to the length of the path, which should be sufficient in most cases.
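One possible reading of the level parameter is sketched below in Python. The helper `link_path` is hypothetical and does not exist in docra. It assumes that level counts components of the configured path from the right, and that a missing level keeps all of them, as the default described above suggests.

```python
from pathlib import PurePosixPath


def link_path(source_root, file_path, level=None):
    """Path used in a source link for file_path found under source_root.

    Assumption: only the last `level` components of source_root are kept;
    by default all of them are kept (level = length of the path).
    """
    root_parts = PurePosixPath(source_root).parts
    relative = PurePosixPath(file_path).relative_to(source_root)
    if level is None:
        level = len(root_parts)
    level = max(0, min(level, len(root_parts)))  # clamp to a valid range
    kept = root_parts[len(root_parts) - level:]
    return str(PurePosixPath(*kept, relative))


full = link_path("extras/docker/", "extras/docker/Dockerfile")
short = link_path("extras/docker/", "extras/docker/Dockerfile", level=1)
```

Under these assumptions, `full` keeps both components of the configured path, while `short` keeps only the last one.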

[template "file/path.dc"] section

Such a section tells docra to load the given file as an AsciiDoc template. These templates will be converted to AsciiDoc files after docra has collected the documentation tags from the entire source code tree.

outdir = output/path: This variable has the same meaning as the common parameter of the same name; it can be used to override the output directory for a specific template.

Patterns

Patterns can be used in a configuration file to select specific sets of files or directories. Two pattern syntaxes are available:

Globbing syntax: In this case only two special globbing characters are supported, * matching any number of characters and ? matching any single character.

For example, a pattern like *.adoc can be used to match any AsciiDoc file. Something like version?.log will match version1.log, versionb.log and so on.

Regular expression syntax: Given the limitations of the above syntax, it is possible to use full regular expressions (as implemented in the C++ standard library) by starting the pattern with a $ sign (the dollar will not be part of the regular expression).

For example, a pattern like $^(.*/)?[.]git/$ can be used to match a .git folder anywhere in the hierarchy being traversed. Note that the terminating dollar is part of the regular expression (as it matches the end of the string). Also note that this could not be accomplished just with globbing, as that would have matched any file or directory whose name ended in .git.

The second syntax is clearly more powerful but globbing is simpler and easier to read and should be preferred when possible. In these patterns, / should be used as the path separator. When running on Windows, those forward slashes will be automatically converted to backslashes, so that patterns will behave in the same way across platforms.

When patterns are used to specify include and exclude filters, they are processed in the order they appear in the configuration file. The first matching pattern is used to decide whether to include or exclude a given path.
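The two pattern syntaxes and the first-match rule can be sketched as follows. This is a Python approximation, not docra's code: docra uses the C++ standard library regular expressions, whose dialect differs from Python's in some details. The helpers `compile_pattern` and `is_included` are hypothetical, and so are the assumptions that a glob must match the whole path while a regular expression is searched for within it. The first rule in the list is made up for the example.

```python
import re


def compile_pattern(pattern):
    """Compile a docra-style pattern into a predicate on path strings.

    Assumptions (not confirmed by the guide): globs must match the whole
    path, while '$'-prefixed regular expressions are searched for, which
    would explain the explicit ^ and $ anchors in the guide's example.
    """
    if pattern.startswith("$"):
        regex = re.compile(pattern[1:])  # the leading '$' is not part of the regex
        return lambda path: regex.search(path) is not None
    # Globbing: only '*' and '?' are special, everything else is literal.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    regex = re.compile("".join(parts), re.DOTALL)
    return lambda path: regex.fullmatch(path) is not None


def is_included(path, rules, default=True):
    """Apply (kind, pattern) rules in file order; the first match decides."""
    for kind, pattern in rules:
        if compile_pattern(pattern)(path):
            return kind == "include"
    return default


rules = [
    ("include", "$^docs/.*[.]png$"),  # hypothetical rule, listed first so it wins
    ("exclude", "*.png"),
    ("exclude", "*.pdf"),
]
```

With these rules, a PNG file under docs/ is kept because the include rule is reached first, while any other PNG file is excluded by the second rule.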

Documentation

Example

Syntax

Templates

Example

Syntax

Patterns

Patterns can be used to control which tags are selected by template directives like ?includexref.

Consider the following two rules:

$^(TODO|FIXME|BUG)$``
$^(TODO|FIXME|BUG)$`$^.+$`

The first rule only includes tags under the TODO, FIXME and BUG prefixes with an empty symbol name. The second rule only includes those with a non-empty symbol name.

This syntax is identical to that of configuration file patterns, the only difference being that in this case path separators are not touched. Both the tag prefix and the symbol are regular expressions. The symbol part is surrounded by backticks, following the usual tag syntax, so a regular expression used inside a tag selector should not contain backticks.
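A tag selector of this form could be evaluated as in the following Python sketch. The helpers `compile_pattern` and `compile_selector` are hypothetical names, and the sketch assumes that a tag is fully described by a (prefix, symbol) pair and that an empty symbol pattern only matches an empty symbol. Python's regular expression dialect is used in place of the C++ one.

```python
import re


def compile_pattern(pattern):
    """'$'-prefixed patterns are regexes; anything else is a '*'/'?' glob."""
    if pattern.startswith("$"):
        regex = re.compile(pattern[1:])
        return lambda text: regex.search(text) is not None
    glob = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern
    )
    regex = re.compile(glob, re.DOTALL)
    return lambda text: regex.fullmatch(text) is not None


def compile_selector(selector):
    """Split 'PREFIX`SYMBOL`' and return a predicate on (prefix, symbol)."""
    prefix_pattern, backtick, rest = selector.partition("`")
    if not backtick or not rest.endswith("`"):
        raise ValueError("selector must end with a backtick-quoted symbol part")
    symbol_pattern = rest[:-1]  # drop the closing backtick
    match_prefix = compile_pattern(prefix_pattern)
    match_symbol = compile_pattern(symbol_pattern)
    return lambda prefix, symbol: match_prefix(prefix) and match_symbol(symbol)


# The two rules shown above.
no_symbol = compile_selector("$^(TODO|FIXME|BUG)$``")
with_symbol = compile_selector("$^(TODO|FIXME|BUG)$`$^.+$`")
```

Splitting at the first backtick is what makes backticks unusable inside the regular expressions themselves, which matches the restriction stated above.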

Appendix A: `docra(1)` man page

Options

-v, --verbose: Emit verbose output. Repeat option to increase verbosity level.
-V, --less-verbose: Less verbose mode. Repeat option to reduce verbosity level.
-q, --quiet: Quiet mode. Suppress any status output message.
-c, --config=config: Load settings from config configuration file. If this option is repeated, only the last one will be considered.
-o, --outdir=outdir: Output generated files to outdir.
-u, --urlprefix=prefix: Use prefix as prefix when creating hyperlinks to source code. This option can be repeated after the first path specification to change its value for the following paths.
-U, --urlsuffix=suffix: Use suffix as suffix when creating hyperlinks to source code. This option can be repeated after the first path specification to change its value for the following paths.
-l, --level=level: For each source path specification, discard all but the last level levels of hierarchy in the output. This option can be repeated after the first path specification to change its value for the following paths.
-p, --paths: Just list to stdout the paths of the files that would be parsed and exit.
sourcepath: Either a folder or a regular file, to be scanned in addition to the paths included by the configuration file.
template.dc: Extra template to be rendered in addition to those specified by the configuration file.

Exit Status

Non-zero in case of invalid configuration and zero otherwise.
