Static Lookups

Let's write some circuits weird machines

In this installment, we will introduce the concept of lookups.

Rather than trotting through yet another example with range checks, we will explore lookups by creating a "circuit" that can (very efficiently) match regular expressions. Regular expression matching has been used in projects such as zk-email, although the approach outlined here is more efficient than that of zk-email owing to the power of Halo2.

You might be thinking that regular expressions are relatively inefficient to check using a circuit over a finite field, and you would be right. Fortunately, lookups enable us to implement this kind of very non-field-friendly functionality in an efficient way. It might also sound complicated, but in fact, the whole (barebones) circuit fits in just over 200 lines of code.

Brief Detour: Regular Expressions

Cite

Some people, when confronted with a problem, think "I know, I'll use regular expressions.".
Now they have two problems.

-- Jamie Zawinski

The particular regular expression we will match is a+b+c, meaning:

  • One or more as.
  • Followed by one or more bs.
  • Followed by a single c.

For example, it would match:

  • aaabbc
  • abc

But not:

  • aaabbb
  • bbbc
  • aaac

For us, the convenient way to view a regular expression is as a "Non-Deterministic Finite Automaton" (NFA). You can visualize this as a directed graph where each node is a state and every edge is labeled with a character. You are allowed to move between states if the next character in the string matches the edge label. For instance, for our regular expression a+b+c, an NFA would look like this:

Regex

Meaning:

  • You start in a state ST_A.
  • In this state ST_A, you can either:
    • Move to ST_A if the next character is a.
    • Move to ST_B if the next character is a.
  • In state ST_B, you can either:
    • Move to ST_B if the next character is b.
    • Move to ST_C if the next character is b.
  • In state ST_C, you can:
    • Only move to ST_DONE if the next character is c.

For instance, if the string we are matching is aaabbc, we would move through the states like this:

ST_A -> ST_A -> ST_A -> ST_B -> ST_B -> ST_C -> ST_DONE

Conversely, if you have the string aaabbb, you would get stuck in state ST_B after the third b and not be able to move to ST_C. We can represent this "NFA" as a table of valid state transitions:

StateCharacterNext State
ST_AaST_A
ST_AaST_B
ST_BbST_B
ST_BbST_C
ST_CcST_DONE

In Rust we could encode this table as follows:

const ST_A: usize = 1;
const ST_B: usize = 2;
const ST_C: usize = 3;

// start and done states
const ST_START: usize = ST_A;
const ST_DONE: usize = 4;

// end of file marker:
// "dummy padding character"
const EOF: usize = 0xFFFF;

// conversion of the regular expression: a+b+c
const REGEX: [(usize, usize, Option<char>); 6] = [
    (ST_A, ST_A, Some('a')),    // you can stay in ST_A by reading 'a'
    (ST_A, ST_B, Some('a')),    // or move to ST_B by reading 'a'
    (ST_B, ST_B, Some('b')),    // you can stay in ST_B by reading 'b'
    (ST_B, ST_C, Some('b')),    // or move to ST_C by reading 'b'
    (ST_C, ST_DONE, Some('c')), // you can move to ST_DONE by reading 'c'
    (ST_DONE, ST_DONE, None),   // you can stay in ST_DONE by reading EOF
];

The "values" of the states (e.g. const ST_A: usize = 1) are just arbitrary distinct integers. We introduce a special EOF character which we will use to "pad" the end of the string: our circuit has a fixed size, but we want it to accommodate strings of different lengths. We also add another transition ST_DONE -> ST_DONE for the EOF character: you can stay in the ST_DONE state forever by consuming the special EOF padding character.

If you are still confused, it simply means that we are now matching strings like:

aaabbc<EOF><EOF><EOF>...<EOF>

Of some fixed length, where <EOF> is a special character.

Note

In this example, we will just match the regular expression, but in general you want to do something more interesting with the result or restrict the string being matched.

For instance, in zk-email it is used to extract the sender's address from an email after verifying a DKIM signature on the email: in this case the string being matched comes with a digital signature (an RSA signature).

In our case, the string ends up in a column and it is trivial to use it for further processing, like hashing it to check a digital signature on it.

Configuration

In order for us to match against the regular expression, we will have the prover supply a table of transitions, we will then look up each transition in a fixed table of valid state transitions. We need two gates:

  • A gate to force the current state to be a fixed value: this is used at the start and end, to ensure that the prover starts in the ST_A state and ends in the ST_DONE state.

  • A gate to check the transition is valid: this is used to check each transition in the table.

Our configuration step looks as follows:

    fn configure(meta: &mut ConstraintSystem<F>) -> Self::Config {
        let q_regex = meta.complex_selector();
        let q_match = meta.complex_selector();

        let st = meta.advice_column();
        let ch = meta.advice_column();

        let fix_st = meta.fixed_column();

        let tbl_st_cur = meta.lookup_table_column();
        let tbl_st_nxt = meta.lookup_table_column();
        let tbl_ch = meta.lookup_table_column();

The new thing to notice is the meta.lookup_table_column() function calls: these introduce a new column on which we can lookup rows. In our case, the lookup table will have three columns:

  1. The current state (e.g. ST_B).
  2. The next state (e.g. ST_C).
  3. The character being consumed (e.g. b).

Gate: Fix

Let us start with the gate that can force the current state to be a fixed value, to the reader, there should be no surprises here:

        meta.create_gate("fix-st", |meta| {
            let st = meta.query_advice(st, Rotation::cur());
            let fix_st = meta.query_fixed(fix_st, Rotation::cur());
            let en = meta.query_selector(q_match);
            vec![en * (st - fix_st)]
        });

It simply:

  • Reads the current state, st.
  • Reads the fixed state, fix_st.
  • Forces st = fix_st if q_match = 1.

Gate: Transition

The new magic is happening in the "transition" gate:

        meta.lookup("transition-st", |meta| {
            let st_cur = meta.query_advice(st, Rotation::cur());
            let st_nxt = meta.query_advice(st, Rotation::next());
            let ch = meta.query_advice(ch, Rotation::cur());
            let en = meta.query_selector(q_regex);
            vec![
                (en.clone() * st_cur, tbl_st_cur),
                (en.clone() * st_nxt, tbl_st_nxt),
                (en.clone() * ch, tbl_ch),
            ]
        });

Let's break it down:

  • It reads the current state, st_cur, from the st (state column).
  • It reads the next state, st_nxt, from the st (state column).
  • It reads the next character, ch, from the ch (character column).

The layout looks like this:

It then multiplies them all by the selector q_regex, meaning:

  • If q_regex = 0, then the lookup checks:
    • (0, 0, 0) in (tbl_st_cur, st_nxt, tbl_ch).
  • If q_regex = 1, then the lookup checks:
    • (st_cur, st_nxt, ch) in (tbl_st_cur, st_nxt, tbl_ch).

This 3-tuple is looked up in our table of valid state transitions.

Synthesize

With our gates and lookup table in place, we can now synthesize the circuit.

Assign Table

Let us now take a look at how we populate the table of valid state transitions:

    fn synthesize(
        &self,
        config: Self::Config, //
        mut layouter: impl Layouter<F>,
    ) -> Result<(), Error> {
        // assign the transition table
        layouter.assign_table(
            || "table",
            |mut table| {
                // convert the numbers to field elements
                let mut transitions: Vec<(F, F, F)> = vec![
                    // (0, 0, 0) is in the table to account for q_regex = 0
                    (F::ZERO, F::ZERO, F::ZERO),
                ];
                for tx in REGEX.iter() {
                    let (st_cur, st_nxt, ch) = tx;
                    transitions.push((
                        F::from(*st_cur as u64),
                        F::from(*st_nxt as u64),
                        ch.map(|c| F::from(c as u64)).unwrap_or(F::from(EOF as u64)),
                    ));
                }

                // assign the table
                for (offset, (st_cur, st_nxt, char)) in transitions //
                    .into_iter()
                    .enumerate()
                {
                    table.assign_cell(
                        || format!("st_cur"),
                        config.tbl_st_cur,
                        offset,
                        || Value::known(st_cur),
                    )?;
                    table.assign_cell(
                        || format!("st_nxt"),
                        config.tbl_st_nxt,
                        offset,
                        || Value::known(st_nxt),
                    )?;
                    table.assign_cell(
                        || format!("char"),
                        config.tbl_ch,
                        offset,
                        || Value::known(char),
                    )?;
                }
                Ok(())
            },
        )?;

Take a second to look at this code. It's mostly self-explanatory, but let's go through it:

  1. We use layouter.assign_table to get a closure with a Table argument.
  2. We convert the REGEX table into tuples of field elements:
  • We encode the state as a field element.
  • We encode each (e.g. ASCII) character as a field element in the obvious way.
  • We encode EOF as some arbitrary value outside the range of valid characters.
  1. This is then stored in transitions as a vector of such tuples.
  2. Finally, we assign the vector of tuples to the table using table.assign_cell.

Start Constraint

At the first row of our region, we will need to force the state to be ST_A (aka. ST_START). This is done by assigning the fixed column fix_st to be ST_A and turning on the q_match selector:

        layouter.assign_region(
            || "regex",
            |mut region| {
                // at offset 0, the state is ST_START
                region.assign_fixed(
                    || "initial state",
                    config.fix_st,
                    0,
                    || Value::known(F::from(ST_START as u64)),
                )?;
                config.q_match.enable(&mut region, 0)?;

Nothing new here, we saw similar code in the section of Column<Fixed>.

Assign Transitions

The meat of the circuit is in the region where we assign the sequence of states we are transitioning through, as well as the sequence of characters we are consuming:

                // assign each step
                for i in 0..MAX_STR_LEN {
                    // enable the regex automaton
                    config.q_regex.enable(&mut region, i)?;

                    // state
                    region.assign_advice(
                        || "st",
                        config.st,
                        i,
                        || {
                            self.sts.as_ref().map(|s| {
                                F::from(
                                    s.get(i) //
                                        .cloned()
                                        .unwrap_or(ST_DONE)
                                        as u64,
                                )
                            })
                        },
                    )?;

                    // character
                    region.assign_advice(
                        || "ch",
                        config.ch,
                        i,
                        || {
                            self.str.as_ref().map(|s| {
                                s.chars()
                                    .nth(i)
                                    .map(|c| F::from(c as u64))
                                    .unwrap_or(F::from(EOF as u64))
                            })
                        },
                    )?;
                }

End Constraint

At the end, similar to the start, we need to ensure that the state is ST_DONE:

                // at offset MAX_STR_LEN, the state is ST_START
                region.assign_advice(
                    || "st",
                    config.st,
                    MAX_STR_LEN,
                    || Value::known(F::from(ST_DONE as u64)),
                )?;
                region.assign_fixed(
                    || "final state",
                    config.fix_st,
                    MAX_STR_LEN,
                    || Value::known(F::from(ST_DONE as u64)),
                )?;
                config.q_match.enable(&mut region, MAX_STR_LEN)?;
                Ok(())
            },
        )?;

Exercises

The full code is at the end.

Exercise

Modify the regular expression to be a*b+c, i.e. the change of a+ to a*.

Hint

You do not need to change the circuit, only the REGEX constant.

However, this exercise is slightly more complicated than it seems at first: we must somehow be able to transition from matching as without consuming an a (in case there are zero a).

Exercise

Create a circuit which computes a single round of AES encryption.

You may ignore the key schedule.

Hint

Use four separate lookup tables to represent:

  • The S-box. A table with 256 entries.
  • The GF(2^8) addition (XOR). A table with 256x256 entries.
  • A "multiplication by 2" in GF(2^8) table. A table with 256 entries.
  • A "multiplication by 3" in GF(2^8) table. A table with 256 entries.

Hint

The above is dominated by the XOR table, which is a 256x256 table. To avoid this massive table we can use the fact that XOR acts component-wise on the bits of the input:

\[ (l_1, l_2) \oplus (r_1, r_2) = (l_1 \oplus r_1, l_2 \oplus r_2) \]

Therefore, we can replace a single lookup in a 256x256 table with two lookups in 16x16 tables.

Exercise

Create a circuit which takes an AES ciphertext as public input (instance) and a key as private input (witness) and checks that the ciphertext decrypts to a known plaintext.

use std::marker::PhantomData;

use halo2_proofs::{
    circuit::{Layouter, SimpleFloorPlanner, Value},
    dev::MockProver,
    plonk::{
        Advice,
        Circuit,
        Column, //
        ConstraintSystem,
        Error,
        Fixed,
        Selector,
        TableColumn,
    },
    poly::Rotation,
};

use ff::{Field, PrimeField};

// ANCHOR: regex
const ST_A: usize = 1;
const ST_B: usize = 2;
const ST_C: usize = 3;

// start and done states
const ST_START: usize = ST_A;
const ST_DONE: usize = 4;

// end of file marker:
// "dummy padding character"
const EOF: usize = 0xFFFF;

// conversion of the regular expression: a+b+c
const REGEX: [(usize, usize, Option<char>); 6] = [
    (ST_A, ST_A, Some('a')),    // you can stay in ST_A by reading 'a'
    (ST_A, ST_B, Some('a')),    // or move to ST_B by reading 'a'
    (ST_B, ST_B, Some('b')),    // you can stay in ST_B by reading 'b'
    (ST_B, ST_C, Some('b')),    // or move to ST_C by reading 'b'
    (ST_C, ST_DONE, Some('c')), // you can move to ST_DONE by reading 'c'
    (ST_DONE, ST_DONE, None),   // you can stay in ST_DONE by reading EOF
];
// ANCHOR_END: regex

const MAX_STR_LEN: usize = 20;

struct TestCircuit<F: Field> {
    _ph: PhantomData<F>,
    str: Value<String>,
    sts: Value<Vec<usize>>,
}

#[derive(Clone, Debug)]
struct TestConfig<F: Field + Clone> {
    _ph: PhantomData<F>,
    q_match: Selector,
    q_regex: Selector,  // enable the regex gate
    st: Column<Advice>, // current state of automaton
    ch: Column<Advice>, // current character
    tbl_st_cur: TableColumn,
    tbl_st_nxt: TableColumn,
    tbl_ch: TableColumn,
    fix_st: Column<Fixed>,
}

impl<F: PrimeField> Circuit<F> for TestCircuit<F> {
    type Config = TestConfig<F>;
    type FloorPlanner = SimpleFloorPlanner;

    fn without_witnesses(&self) -> Self {
        TestCircuit {
            _ph: PhantomData,
            str: Value::unknown(), // the string
            sts: Value::unknown(), // state of the automaton
        }
    }

    // ANCHOR: columns
    fn configure(meta: &mut ConstraintSystem<F>) -> Self::Config {
        let q_regex = meta.complex_selector();
        let q_match = meta.complex_selector();

        let st = meta.advice_column();
        let ch = meta.advice_column();

        let fix_st = meta.fixed_column();

        let tbl_st_cur = meta.lookup_table_column();
        let tbl_st_nxt = meta.lookup_table_column();
        let tbl_ch = meta.lookup_table_column();

        // ANCHOR_END: columns

        // ANCHOR: fix
        meta.create_gate("fix-st", |meta| {
            let st = meta.query_advice(st, Rotation::cur());
            let fix_st = meta.query_fixed(fix_st, Rotation::cur());
            let en = meta.query_selector(q_match);
            vec![en * (st - fix_st)]
        });
        // ANCHOR_END: fix

        // ANCHOR: lookup
        meta.lookup("transition-st", |meta| {
            let st_cur = meta.query_advice(st, Rotation::cur());
            let st_nxt = meta.query_advice(st, Rotation::next());
            let ch = meta.query_advice(ch, Rotation::cur());
            let en = meta.query_selector(q_regex);
            vec![
                (en.clone() * st_cur, tbl_st_cur),
                (en.clone() * st_nxt, tbl_st_nxt),
                (en.clone() * ch, tbl_ch),
            ]
        });
        // ANCHOR_END: lookup

        TestConfig {
            _ph: PhantomData,
            q_regex,
            st,
            ch,
            tbl_st_cur,
            tbl_st_nxt,
            tbl_ch,
            fix_st,
            q_match,
        }
    }

    // ANCHOR: assign_table
    fn synthesize(
        &self,
        config: Self::Config, //
        mut layouter: impl Layouter<F>,
    ) -> Result<(), Error> {
        // assign the transition table
        layouter.assign_table(
            || "table",
            |mut table| {
                // convert the numbers to field elements
                let mut transitions: Vec<(F, F, F)> = vec![
                    // (0, 0, 0) is in the table to account for q_regex = 0
                    (F::ZERO, F::ZERO, F::ZERO),
                ];
                for tx in REGEX.iter() {
                    let (st_cur, st_nxt, ch) = tx;
                    transitions.push((
                        F::from(*st_cur as u64),
                        F::from(*st_nxt as u64),
                        ch.map(|c| F::from(c as u64)).unwrap_or(F::from(EOF as u64)),
                    ));
                }

                // assign the table
                for (offset, (st_cur, st_nxt, char)) in transitions //
                    .into_iter()
                    .enumerate()
                {
                    table.assign_cell(
                        || format!("st_cur"),
                        config.tbl_st_cur,
                        offset,
                        || Value::known(st_cur),
                    )?;
                    table.assign_cell(
                        || format!("st_nxt"),
                        config.tbl_st_nxt,
                        offset,
                        || Value::known(st_nxt),
                    )?;
                    table.assign_cell(
                        || format!("char"),
                        config.tbl_ch,
                        offset,
                        || Value::known(char),
                    )?;
                }
                Ok(())
            },
        )?;
        // ANCHOR_END: assign_table

        // ANCHOR: region_start
        layouter.assign_region(
            || "regex",
            |mut region| {
                // at offset 0, the state is ST_START
                region.assign_fixed(
                    || "initial state",
                    config.fix_st,
                    0,
                    || Value::known(F::from(ST_START as u64)),
                )?;
                config.q_match.enable(&mut region, 0)?;
                // ANCHOR_END: region_start

                // ANCHOR: region_steps
                // assign each step
                for i in 0..MAX_STR_LEN {
                    // enable the regex automaton
                    config.q_regex.enable(&mut region, i)?;

                    // state
                    region.assign_advice(
                        || "st",
                        config.st,
                        i,
                        || {
                            self.sts.as_ref().map(|s| {
                                F::from(
                                    s.get(i) //
                                        .cloned()
                                        .unwrap_or(ST_DONE)
                                        as u64,
                                )
                            })
                        },
                    )?;

                    // character
                    region.assign_advice(
                        || "ch",
                        config.ch,
                        i,
                        || {
                            self.str.as_ref().map(|s| {
                                s.chars()
                                    .nth(i)
                                    .map(|c| F::from(c as u64))
                                    .unwrap_or(F::from(EOF as u64))
                            })
                        },
                    )?;
                }
                // ANCHOR_END: region_steps

                // ANCHOR: region_end
                // at offset MAX_STR_LEN, the state is ST_START
                region.assign_advice(
                    || "st",
                    config.st,
                    MAX_STR_LEN,
                    || Value::known(F::from(ST_DONE as u64)),
                )?;
                region.assign_fixed(
                    || "final state",
                    config.fix_st,
                    MAX_STR_LEN,
                    || Value::known(F::from(ST_DONE as u64)),
                )?;
                config.q_match.enable(&mut region, MAX_STR_LEN)?;
                Ok(())
            },
        )?;
        // ANCHOR_END: region_end

        Ok(())
    }
}

fn main() {
    use halo2_proofs::halo2curves::bn256::Fr;

    // run the MockProver
    let circuit = TestCircuit::<Fr> {
        _ph: PhantomData,
        // the string to match
        str: Value::known("aaabbbc".to_string()),
        // manually create a trace of the state transitions
        sts: Value::known(vec![
            ST_A,    // ST_A -a-> ST_A (START)
            ST_A,    // ST_A -a-> ST_A
            ST_A,    // ST_A -a-> ST_A
            ST_B,    // ST_A -a-> ST_B
            ST_B,    // ST_B -b-> ST_B
            ST_B,    // ST_B -b-> ST_B
            ST_C,    // ST_B -b-> ST_C
            ST_DONE, // ST_C -c-> ST_DONE
        ]),
    };
    let prover = MockProver::run(8, &circuit, vec![]).unwrap();
    prover.verify().unwrap();
}