Documenting your Python projects using GenAI

Diverger
Feb 27, 2024

by Manuel Renner

TL;DR

  • Manual documentation is a crucial yet tedious task which, if overlooked, can quickly become a bottleneck in the development and maintenance of your software projects.
  • Advances in generative AI are changing this by allowing us to generate documentation for our code automatically. However, to document an entire project, an LLM needs a large amount of context about the codebase and how its different parts interact with each other.
  • For this purpose, we’ve built pycodedoc, a Python tool that can automatically document large Python projects using LLMs.
  • It follows a bottom-up approach: it first generates descriptions of low-level entities in the codebase, such as functions and classes, then moves up to modules and their relationships until it reaches an understanding of the project as a whole.
  • The tool is fully open-source, can be executed using a CLI application, and comes with a Python API for easy integration with other tools.

Problem statement: manual documentation

In software development, documenting code is a crucial but often tedious task. In an ideal world, every technical detail related to the project structure, modules, classes, and entity relationships is documented, and every subsequent change to your codebase is reflected in the existing documentation. In practice, however, documentation is often never written in the first place, or is not maintained properly over time, so it drifts out of date and eventually becomes obsolete.

Solution: generative AI

pycodedoc is a Python tool that leverages generative AI to automatically generate documentation. It analyses your Python project to generate comprehensive markdown documentation that includes an overview of the project, detailed descriptions of modules, classes, and their relationships, and even graphs illustrating the code’s execution flow.

The tool includes an easy-to-use command-line interface (CLI) as well as a flexible Python API for those wanting to extend the tool or integrate it with other Python applications.

How it works

The tool uses the OpenAI API to generate descriptions at different levels of detail within the codebase. The workflow it follows is outlined below:

pycodedoc workflow

Project documentation is generated using a bottom-up approach: it starts with generating descriptions of low-level entities in the codebase such as functions and classes before moving up to modules and their relationships until it can generate an overview of the project as a whole. The output from each step is used as context for the following step, allowing the LLM to gain a gradual understanding of the overall codebase as it moves up the process.
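The following minimal, self-contained sketch illustrates the idea of a bottom-up pass. It is not pycodedoc's implementation. describe() is a hypothetical stand-in for an LLM call that only builds a summary string, and Module is a made-up toy representation of a codebase. The sketch shows how each level's output becomes the context for the next level.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Toy module (hypothetical): a name plus the functions and classes it defines."""
    name: str
    functions: list[str] = field(default_factory=list)
    classes: list[str] = field(default_factory=list)


def describe(kind: str, name: str, context: list[str]) -> str:
    """Hypothetical stand-in for an LLM call.

    A real tool would send a prompt plus `context` to a model; here we only
    build a string so the data flow is visible and the sketch runs offline.
    """
    return f"{kind} {name} (built from {len(context)} lower-level descriptions)"


def document_bottom_up(modules: list[Module]) -> dict:
    # Level 1: describe low-level entities (functions and classes) with no extra context.
    entity_docs = {
        m.name: [describe("function", f, []) for f in m.functions]
        + [describe("class", c, []) for c in m.classes]
        for m in modules
    }
    # Level 2: each module is described using its entities' descriptions as context.
    module_docs = {
        m.name: describe("module", m.name, entity_docs[m.name]) for m in modules
    }
    # Level 3: module relationships are described using all module descriptions.
    relations = describe("relations", "between modules", list(module_docs.values()))
    # Level 4: the project overview sees the module descriptions plus the relations.
    project = describe("project", "overview", [*module_docs.values(), relations])
    return {"entities": entity_docs, "modules": module_docs,
            "relations": relations, "project": project}


docs = document_bottom_up([Module("parser", ["parse"], ["Parser"]), Module("cli", ["main"])])
print(docs["project"])  # project overview (built from 3 lower-level descriptions)
```

The key design point is that no level re-reads the raw source of the levels above it. Each level works from the condensed descriptions produced below it, which keeps every prompt small even for large projects.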

Installation

The Python package can be installed from PyPI using the following command:

pip install pycodedoc

Additional requirements

  • OpenAI API key: pycodedoc uses OpenAI models to generate the documentation. Before running the tool, make sure the OPENAI_API_KEY environment variable is set.
  • Graphviz [optional]: you will need to install Graphviz if you want pycodedoc to generate execution flow graphs of your code. An easy way to install it is via conda: conda install graphviz
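Before a first run, you can confirm the requirements above with a small check script. This is a sketch using only the standard library. The check for the dot executable is an assumption: we assume pycodedoc renders its graphs through the standard Graphviz binaries, which provide dot. The function name check_pycodedoc_setup is hypothetical.

```python
import importlib.metadata
import os
import shutil


def check_pycodedoc_setup() -> dict[str, bool]:
    """Hypothetical helper: report whether each prerequisite appears to be present."""
    try:
        importlib.metadata.version("pycodedoc")  # raises PackageNotFoundError if not installed
        installed = True
    except importlib.metadata.PackageNotFoundError:
        installed = False
    return {
        "pycodedoc installed": installed,
        # The OpenAI models can only be called if this variable is set and non-empty.
        "OPENAI_API_KEY set": bool(os.environ.get("OPENAI_API_KEY")),
        # Assumption: graph generation needs the Graphviz `dot` binary on PATH.
        "graphviz (dot) on PATH": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for item, ok in check_pycodedoc_setup().items():
        print(f"{'OK     ' if ok else 'MISSING'} {item}")
```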

CLI usage

The easiest way to run the tool is via the CLI. If you have cloned the pycodedoc repository and want to try the tool by making it document itself, you can run the following command (this will cost approx. $0.01):

pycodedoc -d src/pycodedoc

The output is written to a markdown file at “./docs/project-doc.md”.
NOTE: Only Python files are currently used for documenting the project.
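If you want to trigger the CLI from a Python script, for example in a CI job, you can wrap the command above with subprocess. This is a sketch under a few assumptions. The helper names are hypothetical. The output path is taken from the article and assumed to be relative to the directory you run the command from. python_files() only approximates which files are in scope, because pycodedoc may apply its own filtering, such as skipping certain folders.

```python
import subprocess
from pathlib import Path

# Output location stated in the article; assumed relative to the working directory.
DEFAULT_OUTPUT = Path("docs") / "project-doc.md"


def python_files(base_dir: str) -> list[Path]:
    """List the .py files under base_dir, i.e. the kind of files the CLI documents.

    Assumption: pycodedoc's own file selection may differ (e.g. skipped folders).
    """
    return sorted(Path(base_dir).rglob("*.py"))


def build_command(base_dir: str) -> list[str]:
    """Build the same call as `pycodedoc -d <base_dir>` from the article."""
    return ["pycodedoc", "-d", base_dir]


def run_pycodedoc(base_dir: str) -> str:
    """Run the CLI (this calls the OpenAI API and incurs a cost) and return the Markdown output."""
    subprocess.run(build_command(base_dir), check=True)  # raises CalledProcessError on non-zero exit
    return DEFAULT_OUTPUT.read_text(encoding="utf-8")
```

For example, python_files("src/pycodedoc") lets you preview the scope of a run before paying for it, and run_pycodedoc("src/pycodedoc") returns the generated Markdown as a string.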

Here is a comprehensive list of the options you can use as part of the CLI:

List of options for CLI usage

See the project’s README for more details on CLI usage.

API usage

In addition to the CLI, pycodedoc can be used via its Python API. This enables you to build on top of the tool or integrate it with other existing Python projects.

Generating the full documentation

The code below is equivalent to running pycodedoc -d src/pycodedoc from the CLI:

from pycodedoc import DocGen

# to modify the prompts used, you can pass a dict reflecting the prompts.toml file to the `prompts=` argument
docgen = DocGen(base_dir="src/pycodedoc") # other configurations are the same as in the CLI
docgen.generate_documentation()

Generating part of the documentation

The generate_descriptions method can be used to generate descriptions at different levels of detail in the codebase.

from pycodedoc import DocGen

# to modify the prompts used, you can pass a dict reflecting the prompts.toml file to the `prompts=` argument
docgen = DocGen(base_dir="src/pycodedoc") # other configurations are the same as in the CLI
docgen.generate_descriptions("functions")
# OUTPUT -> list of descriptions for each function in the project
docgen.generate_descriptions("functions", module_path="parser.py")
# OUTPUT -> list of descriptions for the functions inside the parser.py module
docgen.generate_descriptions("classes")
# OUTPUT -> list of descriptions for each class in the project
docgen.generate_descriptions("classes", module_path="parser.py")
# OUTPUT -> list of descriptions for the classes inside the parser.py module
docgen.generate_descriptions("modules")
# OUTPUT -> list of descriptions for each module in the project
docgen.generate_descriptions("modules_relations")
# OUTPUT -> descriptions of the relationships between the project's modules
docgen.generate_descriptions("project")
# OUTPUT -> project description [str]. NOTE: the module descriptions must be generated first

Together, these steps make up the process of generating the overall project documentation.
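To run the levels yourself, for example to inspect or post-process each one, you can call generate_descriptions in bottom-up order. This is a sketch only. The helper names are hypothetical. The level order follows the article's listing and its note that module descriptions must exist before the project description. Whether a DocGen instance reuses descriptions from earlier calls on the same object is also an assumption.

```python
# Levels in bottom-up order, as listed in the article.
LEVELS = ("functions", "classes", "modules", "modules_relations", "project")


def generate_all_levels(docgen, levels=LEVELS) -> dict:
    """Hypothetical helper: call generate_descriptions once per level, lowest first."""
    results = {}
    for level in levels:
        results[level] = docgen.generate_descriptions(level)
    return results


def describe_functions_per_module(docgen, module_paths) -> dict:
    """Hypothetical helper: function descriptions keyed by module, via `module_path=`."""
    return {
        path: docgen.generate_descriptions("functions", module_path=path)
        for path in module_paths
    }
```

With pycodedoc installed, this would be used as generate_all_levels(DocGen(base_dir="src/pycodedoc")). Passing a shorter levels tuple lets you stop early, for example after "modules", to review the module descriptions before paying for the project overview.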

Conclusion

By bridging the gap between the need for comprehensive documentation and the desire to focus on core development tasks, pycodedoc aims to boost software development productivity: it improves the quality and relevance of documentation while saving developers time and energy.

While the tool currently focuses on Python projects only, we are working on generalising it to other programming languages so that it can document larger and more complex codebases with a broader set of technologies.

We hope you found this article useful. If you have any questions, please reach out and we’ll be happy to help. Happy coding!
