By Esras in Technology — 18 Oct 2023

An Opinionated Take on Python Formatting

As opinionated as Black, without the lack of options!

Recently, a friend pointed me towards Ruff (a Python linter), which has a Python formatter that's in alpha. I am not fond of Black's approach, but I have found that projects like YAPF and Clang-Format sometimes offer too much configurability, which leads either to decision paralysis or to awful fights within teams.

Ruff's default formatter right now, once configured for line length and tabs for indentation, is almost exactly what I want. So, let's break down some of the decisions via my .style.yapf file and show why I made each one.

Will this be entirely consistent? Probably not, but I try to be, and I give a justification for each decision.

TL;DR

ruff format ./* is actually pretty good and you should probably use it.

Basis

There are a few guiding principles for the decisions I made:

Code is semantic; we want the formatting to represent the meaning
Monitors are larger, so more code can fit on the screen at once, even if using a larger font size

Almost every decision derives from those two principles. Barring the above, falling back to community convention is preferred.

The Configuration

First, here's the complete configuration of my .style.yapf file, which lives in the root of a given Python project.

I use the PyYapf Sublime Text plugin to auto-format as I write / review code, and also have the yapf pre-commit hook so that it runs on every commit, if for some reason my editor missed something.

The walkthrough below groups the options by the type of thing they modify, while the main file is organized alphabetically.

Complete style file

[style]
based_on_style = pep8
align_closing_bracket_with_visual_indent = false
allow_multiline_lambdas = false
allow_multiline_dictionary_keys = false
allow_split_before_default_or_named_assigns = true
allow_split_before_dict_value = true
arithmetic_precedence_indication = false
blank_line_before_nested_class_or_def = true
blank_line_before_class_docstring = false
blank_line_before_module_docstring = false
blank_lines_around_top_level_definition = 2
blank_lines_between_top_level_imports_and_variables = 1
coalesce_brackets = false
column_limit = 120
continuation_align_style = fixed
continuation_indent_width = 1
dedent_closing_brackets = true
indent_closing_brackets = false
disable_ending_comma_heuristic = true
each_dict_entry_on_separate_line = true
force_multiline_dict = true
indent_dictionary_value = true
indent_width = 1
indent_blank_lines = false
join_multiple_lines = false
space_between_ending_comma_and_closing_bracket = true
space_inside_brackets = false
spaces_around_power_operator = false
spaces_around_default_or_named_assign = false
spaces_around_dict_delimiters = false
spaces_around_list_delimiters = false
spaces_around_subscript_colon = false
spaces_around_tuple_delimiters = false
spaces_before_comment = 2
split_arguments_when_comma_terminated = false
split_all_comma_separated_values = true
split_all_top_level_comma_separated_values = true
split_before_arithmetic_operator = true
split_before_bitwise_operator = true
split_before_closing_bracket = true
split_before_dict_set_generator = true
split_before_dot = true
split_before_expression_after_opening_paren = true
split_before_first_argument = true
split_before_logical_operator = true
split_before_named_assigns = true
split_complex_comprehension = true
use_tabs = true

Overall

use_tabs = true - The biggest one up front. You can configure your editor to make tab sizes whatever you'd like them to be, and in the splitting / blocks section, I'll show why I think this is preferred and obviates the need for "alignment" concerns. Indentation helps with indicating when something is subordinate, and is simpler on analyzers (citation needed).
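
As a rough illustration of the "configure your editor" point, the sketch below uses Python's built-in str.expandtabs to render the same tab-indented source at two different tab widths. The sample source and the render helper are made up for this example; an editor does the equivalent at display time without touching the file.

```python
# A tab-indented snippet, as it would be stored on disk.
SOURCE = "def greet(name):\n\tif name:\n\t\tprint(name)\n"


def render(source: str, tab_width: int) -> str:
	"""Approximate how an editor with the given tab width displays the source."""
	return source.expandtabs(tab_width)


narrow = render(SOURCE, 2)  # someone who likes compact indentation
wide = render(SOURCE, 8)  # someone who likes very visible indentation

print(narrow)
print(wide)
```

The stored bytes never change; only the displayed width does, which is why each reader can pick their own.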

column_limit = 120 - I find that this is a reasonable compromise between "this line is too long" and "why is this line only using a fourth of my monitor?" Realistically, I think it can be fine to exceed the line length in some circumstances, and I think that most code will be below this line length anyway (barring complex list comprehensions and lambdas).
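
To get a feel for how often code actually crosses that limit, here is a minimal sketch of a checker. The lines_over_limit helper and its tab_width parameter are hypothetical and only approximate what an editor would display; this is not how yapf itself measures lines.

```python
def lines_over_limit(source: str, limit: int = 120, tab_width: int = 4) -> list[int]:
	"""Return the 1-based numbers of lines whose displayed width exceeds the limit."""
	return [
		number
		for number, line in enumerate(source.splitlines(), start=1)
		if len(line.expandtabs(tab_width)) > limit
	]


# One short line, then one line that is well past 120 columns.
sample = "short = 1\n" + "long_name = " + "'x' + " * 30 + "'x'\n"
offenders = lines_over_limit(sample)
print(offenders)
```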

disable_ending_comma_heuristic = true - The point of the code formatter is to format it consistently always. Having an extra knob inside of your code that is not explicitly called out like a #yapf: disable-type comment just makes it more complicated.

based_on_style = pep8 - If a new option comes along, fall back to PEP8 as a sane default.

Indentation

Intent: Visually distinguish when a set of statements is subordinate to another, such as with code flow through if / else and for statements or in class / function declarations or invocations.

indent_width = 1 - One tab is all you need.

indent_blank_lines = false - Don't add whitespace to diffs unnecessarily.

indent_closing_brackets = false and dedent_closing_brackets = true - Oddly two config options, but the closing character should be on another line at the same level as the expression, as it is closing out that statement and indicating it is complete, as opposed to half-finished.

align_closing_bracket_with_visual_indent = false - Similar config option to the above two.

Example

# Preferred
    config = {
        'key1': 'value1',
        'key2': 'value2',
    }  # <--- this bracket is dedented and on a separate line

    time_series = self.remote_client.query_entity_counters(
        entity='dev3246.region1',
        key='dns.query_latency_tcp',
        transform=Transformation.AVERAGE(window=timedelta(seconds=60)),
        start_ts=now()-timedelta(days=3),
        end_ts=now(),
    )  # <--- this bracket is dedented and on a separate line

# Undesirable
    config = {
        'key1': 'value1',
        'key2': 'value2',
        }  # <--- this bracket is indented and on a separate line

    time_series = self.remote_client.query_entity_counters(
        entity='dev3246.region1',
        key='dns.query_latency_tcp',
        transform=Transformation.AVERAGE(window=timedelta(seconds=60)),
        start_ts=now()-timedelta(days=3),
        end_ts=now(),
        )  # <--- this bracket is indented and on a separate line

Taken directly from the yapf readme.

coalesce_brackets = false - Prefer to keep the ending characters separated to continue enforcing the proper indentation. People in the JS community might disagree on this one.

continuation_align_style = fixed and continuation_indent_width = 1 - When statements need to split, prefer to indent only once and only by one tab. If something is further split inside the expression, indent it further by another single tab.
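
A small hand-formatted sketch of what I mean by "indent only once": each nested split adds exactly one more tab, and nothing is aligned to an opening bracket. The describe function and its arguments are invented for the example, and this is my reading of the intent rather than captured yapf output.

```python
def describe(name, tags, limits):
	return {"name": name, "tags": tags, "limits": limits}


# Every level of nesting is one tab deeper than the expression that contains it.
result = describe(
	"cache",
	[
		"fast",
		"volatile",
	],
	{
		"max_items": 100,
		"ttl_seconds": 60,
	},
)
print(result)
```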

indent_dictionary_value = true - If, for some reason, you have a key and a value that cannot fit on the same line, even with all of the other splitting, prefer it to be indented to indicate that it is not a key.
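
For a concrete picture, here is a hand-written sketch (not actual yapf output) of a value that has been pushed onto its own line. The dictionary contents are invented; the point is only that the extra tab marks the second line as a value rather than another key.

```python
settings = {
	"a_key_whose_value_is_too_long_to_share_the_line":
		"a value pushed to its own line and indented one more tab than its key",
	"short": "fits",
}
print(len(settings))
```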

Code Blocks (Braces)

Intent: Code blocks should be clearly separated and opening and closing elements kept separate to help identify contained elements.

split_arguments_when_comma_terminated = false - Similar to the heuristic option above, don't make the process more complicated than it already is.

split_all_comma_separated_values = true - Despite the name, this only splits if the line length is exceeded. So if your simple dictionary fits on one line, great! Otherwise, don't split in the middle of it and end up with two lines of similar meaning; instead, put each element on its own line so it is clear how many elements it has.

Example

# This is < 120 characters, so will not be broken up.
my_simple_dict = {"key_one": "value_one", "key_two": "value_two"}

# The following is too long
my_simple_dict = {"key_one": "value_one", "key_two": "value_two", "key_three": "value_three", "key_four": "value_four", "key_five": "value_five"}

# Instead of this (or a variant):
my_simple_dict = {"key_one": "value_one", "key_two": "value_two", "key_three": "value_three", "key_four": "value_four", 
"key_five": "value_five"}

# It should look like this:
my_simple_dict = {
    "key_one": "value_one",
    "key_two": "value_two",
    "key_three": "value_three",
    "key_four": "value_four",
    "key_five": "value_five"
}

split_before_{arithmetic|bitwise|logical}_operator = true - If you have multiple operations, it's easier to show what is happening to the next element if the operator is on a new line.

Example

# If the following must be split...
value = one + two - three + four / five

# Prefer this (parentheses are needed for the split to be valid Python):
value = (
	one
	+ two
	- three
	+ four
	/ five
)

# Over this:
value = (
	one +
	two -
	three +
	four /
	five
)

split_before_closing_bracket = true - Should be irrelevant given some other splitting options, but if a split is happening, put the closing bracket on its own line to respect indentation rules.

split_before_dict_set_generator = true - I'll just link to the documentation on this one. I don't know that anything I've written has had to use this.

split_before_dot = true - Same justification as the arithmetic operator above.

split_before_expression_after_opening_paren = true - If you are splitting an expression in the first place, then don't keep the first argument on the same line: it is associated with the items on the new lines, not the calling expression. This is the main one that helps avoid the "Tabs for indentation, spaces for alignment" problem.

Example

function_call(argument_one, some_other_function, dictionary_defined_somewhere, **expanded_dictionary)

# Should split to:
function_call(
	argument_one, 
	some_other_function, 
	dictionary_defined_somewhere, 
	**expanded_dictionary
)

# As opposed to:
function_call(argument_one, 
	some_other_function, 
	dictionary_defined_somewhere, 
	**expanded_dictionary
)

split_before_first_argument = true - Same as the prior, but prefer to keep the parenthesis on the line of the outer expression.

split_before_named_assigns = true - Same as prior.

allow_split_before_default_or_named_assigns = true - Same as prior.

allow_split_before_dict_value = true - Same as prior.

split_complex_comprehension = true - For generator expressions that become very complex, this is there to help split it to multiple lines similarly to how other content is getting split, that is, before the for and if statements of the expression as much as possible.
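
Here is a hand-split sketch of the shape I'm after, with one clause per line and the breaks placed before for and if. The records data is made up, and the layout is my illustration of the intent rather than verified yapf output.

```python
records = [
	{"name": "alpha", "score": 91, "active": True},
	{"name": "beta", "score": 95, "active": False},
	{"name": "gamma", "score": 72, "active": True},
]

# The result expression, the loop, and each filter all get their own line.
top_names = [
	record["name"].upper()
	for record in records
	if record["active"]
	if record["score"] >= 90
]
print(top_names)
```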

join_multiple_lines = false - Single line if statements are some of my least favorite constructs, which likely comes from seeing them misused in C/C++. Prefer legibility over compactness.
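
To show what the option is avoiding, the sketch below puts the joined and split forms side by side. Both functions are invented for the example and behave identically; the only difference is how easy the branch body is to spot.

```python
def clamp_joined(value, limit):
	if value > limit: return limit  # compact, but the branch body is easy to miss
	return value


def clamp_split(value, limit):
	if value > limit:
		return limit
	return value


print(clamp_joined(5, 3), clamp_split(5, 3))
```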

allow_multiline_lambdas = false - I'm open to debate on this. Most of the lambdas I've seen should be one-liners, or else they should be broken into a full function.
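
A quick sketch of the distinction, using invented order data (name, quantity, unit price in cents): the one-line lambda is fine as a sort key, while anything that needs unpacking or more than one step reads better as a named function.

```python
orders = [("pear", 3, 50), ("apple", 10, 20), ("fig", 1, 300)]

# A one-line lambda reads fine as a sort key.
by_quantity = sorted(orders, key=lambda order: order[1])


# Once the logic needs more than one step, a named function is easier to follow.
def order_total(order):
	name, quantity, unit_price_cents = order
	return quantity * unit_price_cents


by_total = sorted(orders, key=order_total)
print(by_quantity)
print(by_total)
```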

allow_multiline_dictionary_keys = false - The key is a semantic whole and should be kept together, even if it is incredibly long for some reason.

each_dict_entry_on_separate_line = true - If splitting a dictionary, have each entry (key and value) on a separate line to distinguish data boundaries.

Spaces

Intent: Visually separate operations from one another to ensure that elements don't get blended together. The set of config options here:

space_between_ending_comma_and_closing_bracket = true
space_inside_brackets = false
spaces_around_power_operator = false
spaces_around_default_or_named_assign = false
spaces_around_dict_delimiters = false
spaces_around_list_delimiters = false
spaces_around_subscript_colon = false
spaces_around_tuple_delimiters = false
spaces_before_comment = 2
arithmetic_precedence_indication = false

space_between_ending_comma_and_closing_bracket = true - If you're using a trailing comma, distinguish it from the closing element.

space_inside_brackets = false - Setting this to true would add spaces between all braces and brackets, including in array accesses, like array[3], so disabled as the symbology already distinguishes it.

spaces_around_power_operator = false - Python's power operator (**) already visually distinguishes the two values with more spacing.

spaces_around_default_or_named_assign = false - I don't necessarily agree with this one, but Python community convention is that named / default assigns in a function definition do not have spaces.

spaces_around_{dict|list|tuple}_delimiters = false - This changes {1: 2} to { 1: 2 }. I find this unnecessary, especially given the indentation and splitting rules. Note that the tuple version of this refers to the parentheses, even though the comma is technically the delimiter.

spaces_around_subscript_colon = false - Disabled for the same reason as the power operator: the colon is already the distinction. Otherwise it would convert an_array_to_slice[1:20:3] to an_array_to_slice[1 : 20 : 3], which feels like the reading equivalent of speedbumps.

spaces_before_comment = 2 - Debatable usefulness, but I like the extra space to keep the code and comments separate. Editor theming largely makes this irrelevant. There is also a case to be made to use a multi-number variant that aligns end-comments on successive lines, but generally that's not something I'm using in my code.

arithmetic_precedence_indication = false - When using lots of numbers and operations next to one another, it can become confusing to have some of the operators lumped together. Otherwise, the splitting options should catch on anyway.
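
For reference, this is roughly the difference the option makes, as I understand it from the yapf README: when enabled, spaces are dropped around the higher-precedence operators so they visually group together. The two spellings below are hand-typed for comparison, not captured formatter output, and they evaluate identically.

```python
# Option off (my choice): every binary operator gets the same spacing.
uniform = 1 * 2 + 3 / 4
uniform_parens = (1 + 2) * (3 - 4)

# Option on: the tighter-binding operators are lumped together.
grouped = 1*2 + 3/4
grouped_parens = (1+2) * (3-4)

print(uniform, grouped)
```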

Splitting

Intent: If something should not be on a single line, then all of its elements should be split to help distinguish individual elements.

Blank Lines

Intent: Blank lines should help separate sections but not be excessive.

blank_line_before_nested_class_or_def = true
blank_line_before_class_docstring = false
blank_line_before_module_docstring = false
blank_lines_around_top_level_definition = 2
blank_lines_between_top_level_imports_and_variables = 1

blank_line_before_nested_class_or_def = true - I personally like the separation between the class and first definition. They are part of the same idea and are delineated by the indentation of the child, but it helps to separate the idea of the next chunk of code, since it is no more special than any other child function or class.

Example

# Before
class Foo:
  def function_one():
    pass

  def function_two():
    pass

---
# After
class Foo:

  def function_one():
    pass

  def function_two():
    pass

blank_line_before_{class|module}_docstring = false - The docstring itself, combined with editor highlighting, already separates this in a meaningful way. The docstring also is associated with the overarching class, even if it is indented.

blank_lines_around_top_level_definition = 2 - When there are multiple classes or definitions in a file, I prefer a bit of visual separation between them while scanning code.

blank_lines_between_top_level_imports_and_variables = 1 - No strong opinions here. There is some discussion about another knob for isort-related configuration.

Closing Statements

Phew, that's kind of a mess, and I could go back and add more examples, but I wanted to get this out fairly quickly for some other discussions and to be able to pass it to people when they talk about editor configs.
