.. title: A Hint for Language-Specific Runtime Function Optimization in RPython's Meta-JIT
.. slug: record-known-result
.. date: 2025-05-23 20:00:00 UTC
.. tags: pypy, jit
.. category: 
.. link: 
.. description: 
.. type: rest

**Meta Note:** This post was in my draft folder for the PyPy blog for three
years, and various people gave the feedback "wtf are you even talking about".
Since I haven't worked on it since then I clearly am not going to fix anything
about it, so I'll just post it almost unchanged to my personal blog now.

RPython's meta-JIT cannot reason about the properties of many RPython
implementation-level functions, because those are often specific to the
different interpreters written in RPython. Expressing language-specific
optimizations on these functions would allow the JIT to more effectively reason
about language-specific data structures and potentially remove costly calls
into the runtime support code. In this blog post I will describe a new hint
that makes it possible to express invariants about the functions of the data
structures used in a specific language interpreter. These hints make it
possible for the meta-JIT optimizer to do more language-specific optimizations
across function calls
without having to change the meta-JIT itself.

Introduction and Background
===========================

RPython has a meta-JIT, that can be applied to a variety of languages, one of
them being PyPy, a Python implementation (that's really two implementations,
one PyPy2, one PyPy3). Most modern languages have many
built-in functions and data-types that are written in the implementation
language (RPython, in our case, C in the case of CPython). We want the JIT
optimizer to be able to reason about these functions and datatypes.

One way to do this would be to extend the meta-JIT by adding a new language-specific
optimization pass that knows how to deal with these functions. This approach
however goes against the idea of a meta-JIT, since we want the JIT to be
language independent.

So we developed various tools in the past that make it
possible for the author of RPython interpreters to express various
optimizations. These tools take the forms of *hints*. Hints are small
changes in the RPython source of the interpreter that communicate with the
meta-JIT. Hints can take one of two major forms, either they are special
functions that will be called at runtime from the RPython code of the
interpreter, or decorators that can be applied to the RPython functions that
make up the interpreter. Both the special functions as well as the decorators
are part of the ``rpython.rlib.jit`` library.

One thing I want to stress here at the beginning is that all these hints, and
everything I talk about in the rest of the blog post, is needed by the people
that are working on the RPython code that makes up the PyPy Python interpreter
(or other languages implemented in RPython). "Regular" Python code that just
runs on PyPy does not need these hints, and in fact cannot use them.


The ``@elidable`` Decorator
---------------------------

One very important decorator is called ``@elidable``, which makes it possible
to mark an RPython
function as pure (actually a small generalization, but for the purpose of this
post thinking of them as pure is fine). This decorator was described `in a blog
post in 2011`__, it was still called ``@purefunction`` then. The post later
turned into a paper__.

.. __: https://www.pypy.org/posts/2011/03/controlling-tracing-of-interpreter-with-871085470935630424.html
.. __: https://www3.hhu.de/stups/downloads/pdf/BoCuFiLePeRi11.pdf

An elidable function will be constant-folded by the JIT if all its arguments
are constant. If it can't be removed because some arguments aren't constant,
it's still subject to `common subexpression elimination`__ (CSE). Subsequent
calls to the same function with the same arguments can be removed and replaced
by the previous result.

.. __: https://en.wikipedia.org/wiki/Common_subexpression_elimination

There are lots of examples of ``@elidable`` functions, both in RPython in
general and in the PyPy interpreter. In the post linked above they are used for
optimizing object operations in a tiny example interpreter.

But also a lot of functionality that is more directly exposed to the Python
programmer is elidable. Most string methods are elidable for example:
``str.lower`` will always return the same result for some argument, and if you
call it twice on the same string, it will also return the same result. So
adding the ``@elidable`` decorator to the implementation of these string methods
allows the JIT to constant-fold them if they are applied to a constant string, and to
remove a second identical call.

Another big class of examples are all the
implementation functions for our big integer operations, which underlie the
Python ``int`` type once the values don't fit into a machine word anymore. On
the implementation level we implement those in an ``rbigint`` class, and most
methods of that are also elidable. This enables the JIT to constant-fold big
integer addition, and do CSE on big integer arithmetic.

Limitations of ``@elidable``
-------------------------------

This is all very useful! But it's still only a limited amount of things that
the interpreter author can express to the JIT with this one function decorator.
Recently (*ie in 2022*) I merged a branch that adds two new hints that the interpreter author
can use to communicate possible optimizations to the meta-JIT! The work has
been an ongoing project since a while. So far, only the meta-JIT support has
been merged, in future enhancements we plan to apply them to the PyPy Python
interpreter where appropriate.


``record_known_result``
=======================

So, what are the new hints? One of them is called ``record_known_result``, and that
hint is what this blog post is about. The other is called
``record_exact_value``, it's conceptually quite simple, but it's much harder to
see how it can be used. It was implemented by Lin Cheng from Cornell, and it is
described (together with a possible optimization to PyPy that uses the hint) in
another paper__.

.. __: https://dl.acm.org/doi/10.1145/3368826.3377907

What is ``record_known_result`` used for? One of the limitations of ``elidable``
is that often there are properties that connect *several* function calls that
are connected in some way. Sometimes there are functions that are inverses of
each other, so that ``f(g(x)) == x``  for all ``x`` (example: negation on
numbers is its own inverse, ``--x == x``). Sometimes functions are
idempotent__, which means that if you call the function several times you can
remove all but the first call. An example would be ``abs`` on numbers, after
the first call to ``abs`` the result is positive, so calling the function again
on the result has no effect, i.e. ``abs(abs(x)) == abs(x)``. These properties
could in theory be nicely used by a hypothetical optimizer that knows about
them and the functions. However, as described above, we don't want to change
the meta-JIT to add knowledge about interpreter-specific functionality. So we
wanted to add a hint that can express them to the meta-JIT.

.. __: https://en.wikipedia.org/wiki/Idempotence#Idempotent_functions

What could hints look like, that make it possible to express these
properties? At first I was experimenting with some declarative decorators
like ``@idempotent`` and ``@is_inverse_of(func)`` but I felt like it wouldn't
scale to add lots of these decorators and support for all of them in the
meta-JIT. In the end I found a not fully obvious hint that is not a
decorator, that is powerful enough to implement at least these last two and
many similar patterns. This hint piggy-backs on the existing CSE
implementation of pure functions in the meta-JIT.

The hint works as follows: It is a new function called
``record_known_result(result, func, *args)``. Calling that function has no
direct effect at runtime. Instead, it communicates to the meta-JIT, that if
it sees a function call to ``func`` with the arguments ``*args``, it can replace
the result of that function call with ``res``.

Since this is pretty abstract, it's best to look at an example. How would you
use this to express the fact that ``rbigint_neg`` is its own inverse, which is
function in the runtime that is responsible for computing the negative value
of a big integers? The implementation of ``rbigint_neg`` looks roughly like
this (it's actually a method and a tiny bit more complicated, but not really
significantly so):

.. code:: python

    @elidable
    def rbigint_neg(self):
        return rbigint(digits=self.digits, sign=-self.sign)

If we want to use the new hint to express that
``rbigint_neg(rbigint_neg(x)) == x``,
we need to rewrite the function somewhat, by introducing a pure helper
function that does the actual computation, and turning the original function
into a wrapper that calls the helper:

.. code:: python

    @elidable
    def _rbigint_neg_helper(self):
        return rbigint(digits=self.digits, sign=-self.sign)
        
    def rbigint_neg(self):
        res = _rbigint_neg_helper(self)
        record_known_result(self, _rbigint_neg_helper, res)
        return res

But when we trace the ``rbigint_neg`` function, the ``record_known_result`` hint tells the JIT
the following information: if at any point in the future (meaning further down the
trace) we see another call to ``_rbigint_neg_helper`` with ``res`` as the argument,
we can replace that call directly with ``self``, which is exactly the property
that ``_rbigint_neg_helper`` is its own inverse. As another example, let's
express the idempotence of ``bytes.lower``. We can imagine the implementation
looking something like this (`the "real" implementation`__ is actually quite
different, we don't want the extra copy of ``bytes.join``):

.. __: https://foss.heptapod.net/pypy/pypy/-/blob/ab597702f7d9a267d3ae7c3fc91a5f25cd36a12e/rpython/rtyper/lltypesystem/rstr.py#L526

.. code:: python

    @elidable
    def bytes_lower(b):
        # implementation looks very different in practice
        # just an illustration!
        res = ['\x00'] * len(b)
        for i, c in enumerate(b):
            if 'A' <= c <= 'Z':
                c = chr(ord(c) - ord('A') + ord('a'))
            res[i] = c
        return b"".join(res)

To express that the function is idempotent, we need to express that
``bytes_lower(bytes_lower(b)) == b``. We express this again with the same
approach, move the implementation into a helper function, call the helper from
the original function and call ``record_known_result`` too:

.. code:: python

    @elidable
    def _bytes_lower_helper(b):
        ... # as above

    def bytes_lower(b):
        res = _bytes_lower_helper(b)
        record_known_result(res, _bytes_lower_helper, res)
        return res


This tells the meta-JIT that if ``res`` is later itself passed to
``_bytes_lower_helper``, it can remove that call and replace it immediately
with ``res`` (because ``res`` is already all lower cased, as its the result of
a call to ``lower``), i.e. that ``_bytes_lower_helper`` is idempotent. (There
are also other properties of lower and upper we could express in this way, for
example that ``bytes.lower(bytes.upper(x)) == bytes.lower(x)``, let's leave it
at that for now though).

Both of these usage patterns of ``record_known_result`` could of course also be
pulled out into general decorators again. For example a generic ``@idempotent``
decorator could be implemented like this:

.. code:: python

    def idempotent(func):
        # idempotent implies elidable
        func = elidable(func)
        def wrapper(arg):
            res = func(arg)
            record_known_result(res, func, res)
            return res
        return wrapper

Then the decorator could be used like this for ``bytes_lower``:

.. code:: python

    @idempotent
    def bytes_lower(b):
        # implementation as in the original code above
        ...


Implementing ``record_known_result``
========================================

How is ``record_known_result`` implemented? As I wrote above, the implementation
of that hint builds on the existing support for ``elidable`` functions
in the optimizer of the meta-JIT. There are several optimizations that do
something with elidable function calls: `constant folding`__, CSE__, `dead code
elimination`__. Let's look at those work on ``elidable`` functions:

 - Constant folding removes calls to elidable functions with constant results
   (technically this is a bit complicated, but conceptually this is what
   happens).
 - CSE will replace calls to an elidable function by previous results, if they
   appear a second time further down the trace.
 - Dead code elimination will remove elidable function calls in the trace that
   have unused results.

.. __: https://en.wikipedia.org/wiki/Constant_folding
.. __: https://en.wikipedia.org/wiki/Common_subexpression_elimination
.. __: https://en.wikipedia.org/wiki/Dead_code_elimination

So if there is a trace like this:

.. code:: python

    r1 = call_elidable((f), (1)) # constant-folded to 17
    r2 = call_elidable((g), a, b)
    r3 = call_elidable((g), a, b) # replaced by r2
    r4 = call_elidable((h), c, d) # removed, result unused
    print(r1, r2, r3)

It will be optimized to:

.. code:: python

    r2 = call_elidable((g), a, b)
    print((17), r2, r2)

Some general notes about these traces: They are all in `single-static-assignment
form`__ (SSA), meaning that every variable is assigned to only once. `¹`_
They are also slightly simplified compared to "real" traces.

.. __: https://en.wikipedia.org/wiki/Static_single_assignment_form

Let's look at how the CSE pass that optimizes elidable calls, that is part of
the meta-JIT works. In pseudocode it could look something like this:

.. code:: python

    def cse_elidable_calls(trace):
        seen_calls = {}
        output_trace = []
        for op in trace:
            if is_call_elidable(op):
                # op.args are the function,
                # followed by the arguments
                # which are variables or constants
                key = op.args
                previous_op = seen_calls.get(key)
                if previous_op is not None:
                    replace_result_with(op, previous_op)
                    # don't need to emit the op
                    continue
                else:
                    seen_calls[key] = op
            output_trace.append(op)
        return output_trace

There is quite a bit of hand-waving here, particularly around how
``replace_result_with`` can work. But this is conceptually what the real
optimization does. `²`_

Making use of the information provided by ``record_known_result`` is done by
changing the CSE pass in particular. Let's say you trace something like this:

.. code:: python

    x = bytes_lower(s)
    ... some other code ...
    y = bytes_lower(x)
    print(x, y)


This should  trigger the idempotence optimization. The resulting trace could
look like this:

.. code:: python

    # bytes_lower itself is inlined into the trace:
    r1 = call_elidable((_bytes_lower_helper), s1)
    record_known_result(r1, (_bytes_lower_helper), r1)
    ... intermediate operations ...

    # second call to bytes_lower inlined into the trace:
    r2 = call_elidable((_bytes_lower_helper), r1)
    record_known_result(r2, (_bytes_lower_helper), r2)
    print(r1, r2)

The CSE pass on elidable functions will now optimize away the call that results
in ``r2``. It does this not by replacing ``r2`` by a previous call to
``_bytes_lower_helper`` with the same arguments (such a call doesn't exist),
but instead makes use of the information conveyed by the first
``record_known_result`` trace operation. That operation states that if you see
a call like the second ``_bytes_lower_helper`` you can replace it with ``r1``.
The resulting optimized trace therefore looks like this:

.. code:: python

    r1 = call_elidable((_bytes_lower_helper), s1)
    ... intermediate optimizations, optimized ...
    # call removed, r2 replaced with r1
    print(r1, r1)

The ``record_known_result`` operations are also removed, because further
optimization passes and the backends don't need them. To get this effect, we
have to change the pseudocode above to teach the CSE pass about
``record_known_result`` operations in the following way:

.. code:: python

    def cse_elidable_calls(trace):
        seen_calls = {}
        output_trace = []
        for op in trace:
            # <---- start new code
            if is_record_known_result(op):
                # remove the first argument,
                # which is the result
                key = op.args[1:]
                # the remaining key is function called,
                # followed by arguments, like below
                seen_calls[key] = op.args[0]
                # don't emit the record_known_result op
                continue
            # end new code ---->
            if is_call_elidable(op):
                # op.args are the function,
                # followed by the arguments
                # which are variables or constants
                key = op.args
                previous_op = seen_calls.get(key)
                if previous_op is not None:
                    replace_result_with(op, previous_op)
                    # don't need to emit the op
                    continue
                else:
                    seen_calls[key] = op
            output_trace.append(op)
            return output_trace

That's all! So from the point of view of the implementation of CSE of elidable
functions, the new hint is actually very natural.

In the case of function inverses, dead code elimination also plays an important
role. Let's look at the trace of a double negation, maybe like this:
``x = -y; ...; print(-x)``:

.. code:: python

    r1 = call_elidable((_rbigint_neg_helper), a1)
    record_known_result(a1, (_rbigint_neg_helper), r1)
    ... intermediate stuff
    r2 = call_elidable((_rbigint_neg_helper), r1)
    record_known_result(r1, (_rbigint_neg_helper), r2)
    print(r2)

After CSE, the second call is removed and the trace looks like this, because
``r2`` was found to be the same as ``a1``:

.. code:: python

    r1 = call_elidable((_rbigint_neg_helper), a1) # dead
    ... intermediate stuff, CSEd
    # call removed
    print(a1)

Now dead code elimination notices that the first call is not needed any more
either and removes it.

What is good about this design? It very neatly ties into the existing
infrastructure and is basically only about 100 lines of changes in the
meta-JIT. The amount of work the optimizer does stays essentially the same, as
the new hints are basically directly usable by CSE which we already do anyway.

Performance effects
====================

So far, we haven't actually used this new hint in PyPy much. At this point, the
hint is only a new tool in the interpreter author toolbox, and we still need to
find the best places to use this tool. The only use of the hint so far is an
annotation that tells the JIT that encoding and decoding to and from utf8 are
inverses of each other, to be able to optimize this kind of code:
``x = someunicode.encode("utf-8").decode("utf-8")`` by replacing ``x`` with
``someunicode`` (of course in practice there is usually again some distance
between the encode and decode calls). This happens in a bunch of places in real
code that I saw, but I didn't do a super careful study of what the performance
effect is yet.

Limitations
=============

What are the problems and the limitations of the approach I described in this
post?

Correctness remains tricky! If you write the wrong hints, the meta-JIT will
potentially miscompile your users' programs. To at least get some signal for
that, ``record_known_result`` actually performs the hinted call and does an
assert on the result if you run the program untranslated while executing tests.
In combination with for example property-based testing this can find a lot of
the bugs, but is of course no guarantee.

Many things aren't expressible. The new hint is much less powerful than some of
the recent pattern based optimization systems (e.g. `metatheory.jl`__) that
allow library authors to
express rewrites. Instead, we designed the hint to minimally fit into the
existing optimizers at the cost of power and ease of use. The most
obvious limitation compared to pattern based approaches is that the
``record_known_result`` hint cannot quantify over unknown values. It can only use values
that are available in the program. As an example, it's not really possible to
express that ``bigint_sub(x, x) == bigint(0)`` for *arbitrary* big integers
``x``.

.. __: https://arxiv.org/abs/2112.14714

Another limitation of the hint is that currently it is only applicable to
pure/elidable functions. This makes it not really applicable to any kind of
*mutable* data structure. As an example, in theory ``sorted(list)`` is
idempotent, but only as long as the lists involved aren't mutated between the
two calls to ``sorted``. Reasoning about mutation doesn't really fit into the
model easily. The meta-JIT itself is actually able to do a lot of tracking of
what kinds of mutations occurred and what the heap must look like. But we
haven't found a good way to combine this available information with
user-provided information about function behaviour.

Conclusion
==============

We added two new hints to RPython's meta-JIT that allow the interpreter author
to express language-specific optimizations. We are still only getting used to
these new hints and their possible applications and will need to collect more
experience about how big the performance implications are in practice for real
programs.

Footnotes
------------

.. _`¹`:

¹ In fact, there is not really a concept of "variable" at all, instead all
variables are identical with the operations that produce them.

.. _`²`:

² Some details on the hand-waving: replacing ops with other ops is implemented
using a union-find__ data-structure to efficiently allow doing arbitrary
replacements. These replacements need to influence the lookup in the
``seen_calls`` dict, so in practice it's not even a dictionary at all. Another
way that the pseudocode is simplified is that we don't in practice have tiny
passes like this that go over the trace again and again. Instead, we have a
single optimization pass that goes over the trace in forward direction once.

.. __: https://en.wikipedia.org/wiki/Disjoint-set_data_structure

 
