Static Lookups
Let's write some circuits weird machines
In this installment, we will introduce the concept of lookups.
Rather than trotting through yet another example with range checks, we will explore lookups by creating a "circuit" that can (very efficiently) match regular expressions. Regular expression matching has been used in projects such as zk-email, although the approach outlined here is more efficient than that of zk-email owing to the power of Halo2.
You might be thinking that regular expressions are relatively inefficient to check using a circuit over a finite field, and you would be right. Fortunately, lookups enable us to implement this kind of very non-field-friendly functionality in an efficient way. It might also sound complicated, but in fact, the whole (barebones) circuit fits in just over 200 lines of code.
Brief Detour: Regular Expressions
Some people, when confronted with a problem, think "I know, I'll use regular expressions.".
Now they have two problems.
-- Jamie Zawinski
The particular regular expression we will match is a+b+c
, meaning:
- One or more
a
s. - Followed by one or more
b
s. - Followed by a single
c
.
For example, it would match:
aaabbc
abc
But not:
aaabbb
bbbc
aaac
For us, the convenient way to view a regular expression is
as a "Non-Deterministic Finite Automaton" (NFA).
You can visualize this as a directed graph where each node is a state
and every edge is labeled with a character.
You are allowed to move between states if the next character in the string matches the edge label.
For instance, for our regular expression a+b+c
, an NFA would look like this:
Meaning:
- You start in a state
ST_A
. - In this state
ST_A
, you can either:- Move to
ST_A
if the next character isa
. - Move to
ST_B
if the next character isa
.
- Move to
- In state
ST_B
, you can either:- Move to
ST_B
if the next character isb
. - Move to
ST_C
if the next character isb
.
- Move to
- In state
ST_C
, you can:- Only move to
ST_DONE
if the next character isc
.
- Only move to
For instance, if the string we are matching is aaabbc
,
we would move through the states like this:
ST_A -> ST_A -> ST_A -> ST_B -> ST_B -> ST_C -> ST_DONE
Conversely, if you have the string aaabbb
,
you would get stuck in state ST_B
after the third b
and not be able to move to ST_C
.
We can represent this "NFA" as a table of valid state transitions:
State | Character | Next State |
---|---|---|
ST_A | a | ST_A |
ST_A | a | ST_B |
ST_B | b | ST_B |
ST_B | b | ST_C |
ST_C | c | ST_DONE |
In Rust we could encode this table as follows:
const ST_A: usize = 1;
const ST_B: usize = 2;
const ST_C: usize = 3;
// start and done states
const ST_START: usize = ST_A;
const ST_DONE: usize = 4;
// end of file marker:
// "dummy padding character"
const EOF: usize = 0xFFFF;
// conversion of the regular expression: a+b+c
const REGEX: [(usize, usize, Option<char>); 6] = [
(ST_A, ST_A, Some('a')), // you can stay in ST_A by reading 'a'
(ST_A, ST_B, Some('a')), // or move to ST_B by reading 'a'
(ST_B, ST_B, Some('b')), // you can stay in ST_B by reading 'b'
(ST_B, ST_C, Some('b')), // or move to ST_C by reading 'b'
(ST_C, ST_DONE, Some('c')), // you can move to ST_DONE by reading 'c'
(ST_DONE, ST_DONE, None), // you can stay in ST_DONE by reading EOF
];
The "values" of the states (e.g. const ST_A: usize = 1
) are just arbitrary distinct integers.
We introduce a special EOF
character which we will use to "pad" the end of the string:
our circuit has a fixed size, but we want it to accommodate strings of different lengths.
We also add another transition ST_DONE -> ST_DONE
for the EOF
character:
you can stay in the ST_DONE
state forever by consuming the special EOF
padding character.
If you are still confused, it simply means that we are now matching strings like:
aaabbc<EOF><EOF><EOF>...<EOF>
Of some fixed length, where <EOF>
is a special character.
In this example, we will just match the regular expression, but in general you want to do something more interesting with the result or restrict the string being matched.
For instance, in zk-email it is used to extract the sender's address from an email after verifying a DKIM signature on the email: in this case the string being matched comes with a digital signature (an RSA signature).
In our case, the string ends up in a column and it is trivial to use it for further processing, like hashing it to check a digital signature on it.
Configuration
In order for us to match against the regular expression, we will have the prover supply a table of transitions, we will then look up each transition in a fixed table of valid state transitions. We need two gates:
-
A gate to force the current state to be a fixed value: this is used at the start and end, to ensure that the prover starts in the
ST_A
state and ends in theST_DONE
state. -
A gate to check the transition is valid: this is used to check each transition in the table.
Our configuration step looks as follows:
fn configure(meta: &mut ConstraintSystem<F>) -> Self::Config {
let q_regex = meta.complex_selector();
let q_match = meta.complex_selector();
let st = meta.advice_column();
let ch = meta.advice_column();
let fix_st = meta.fixed_column();
let tbl_st_cur = meta.lookup_table_column();
let tbl_st_nxt = meta.lookup_table_column();
let tbl_ch = meta.lookup_table_column();
The new thing to notice is the meta.lookup_table_column()
function calls:
these introduce a new column on which we can lookup rows.
In our case, the lookup table will have three columns:
- The current state (e.g.
ST_B
). - The next state (e.g.
ST_C
). - The character being consumed (e.g.
b
).
Gate: Fix
Let us start with the gate that can force the current state to be a fixed value, to the reader, there should be no surprises here:
meta.create_gate("fix-st", |meta| {
let st = meta.query_advice(st, Rotation::cur());
let fix_st = meta.query_fixed(fix_st, Rotation::cur());
let en = meta.query_selector(q_match);
vec![en * (st - fix_st)]
});
It simply:
- Reads the current state,
st
. - Reads the fixed state,
fix_st
. - Forces
st = fix_st
ifq_match = 1
.
Gate: Transition
The new magic is happening in the "transition" gate:
meta.lookup("transition-st", |meta| {
let st_cur = meta.query_advice(st, Rotation::cur());
let st_nxt = meta.query_advice(st, Rotation::next());
let ch = meta.query_advice(ch, Rotation::cur());
let en = meta.query_selector(q_regex);
vec![
(en.clone() * st_cur, tbl_st_cur),
(en.clone() * st_nxt, tbl_st_nxt),
(en.clone() * ch, tbl_ch),
]
});
Let's break it down:
- It reads the current state,
st_cur
, from thest
(state column). - It reads the next state,
st_nxt
, from thest
(state column). - It reads the next character,
ch
, from thech
(character column).
The layout looks like this:
It then multiplies them all by the selector q_regex
, meaning:
- If
q_regex = 0
, then the lookup checks:(0, 0, 0)
in(tbl_st_cur, st_nxt, tbl_ch)
.
- If
q_regex = 1
, then the lookup checks:(st_cur, st_nxt, ch)
in(tbl_st_cur, st_nxt, tbl_ch)
.
This 3-tuple is looked up in our table of valid state transitions.
Synthesize
With our gates and lookup table in place, we can now synthesize the circuit.
Assign Table
Let us now take a look at how we populate the table of valid state transitions:
fn synthesize(
&self,
config: Self::Config, //
mut layouter: impl Layouter<F>,
) -> Result<(), Error> {
// assign the transition table
layouter.assign_table(
|| "table",
|mut table| {
// convert the numbers to field elements
let mut transitions: Vec<(F, F, F)> = vec![
// (0, 0, 0) is in the table to account for q_regex = 0
(F::ZERO, F::ZERO, F::ZERO),
];
for tx in REGEX.iter() {
let (st_cur, st_nxt, ch) = tx;
transitions.push((
F::from(*st_cur as u64),
F::from(*st_nxt as u64),
ch.map(|c| F::from(c as u64)).unwrap_or(F::from(EOF as u64)),
));
}
// assign the table
for (offset, (st_cur, st_nxt, char)) in transitions //
.into_iter()
.enumerate()
{
table.assign_cell(
|| format!("st_cur"),
config.tbl_st_cur,
offset,
|| Value::known(st_cur),
)?;
table.assign_cell(
|| format!("st_nxt"),
config.tbl_st_nxt,
offset,
|| Value::known(st_nxt),
)?;
table.assign_cell(
|| format!("char"),
config.tbl_ch,
offset,
|| Value::known(char),
)?;
}
Ok(())
},
)?;
Take a second to look at this code. It's mostly self-explanatory, but let's go through it:
- We use
layouter.assign_table
to get a closure with aTable
argument. - We convert the
REGEX
table into tuples of field elements:
- We encode the state as a field element.
- We encode each (e.g. ASCII) character as a field element in the obvious way.
- We encode
EOF
as some arbitrary value outside the range of valid characters.
- This is then stored in
transitions
as a vector of such tuples. - Finally, we assign the vector of tuples to the table using
table.assign_cell
.
Start Constraint
At the first row of our region, we will need to force the state to be ST_A
(aka. ST_START
).
This is done by assigning the fixed column fix_st
to be ST_A
and turning on the q_match
selector:
layouter.assign_region(
|| "regex",
|mut region| {
// at offset 0, the state is ST_START
region.assign_fixed(
|| "initial state",
config.fix_st,
0,
|| Value::known(F::from(ST_START as u64)),
)?;
config.q_match.enable(&mut region, 0)?;
Nothing new here, we saw similar code in the section of Column<Fixed>
.
Assign Transitions
The meat of the circuit is in the region where we assign the sequence of states we are transitioning through, as well as the sequence of characters we are consuming:
// assign each step
for i in 0..MAX_STR_LEN {
// enable the regex automaton
config.q_regex.enable(&mut region, i)?;
// state
region.assign_advice(
|| "st",
config.st,
i,
|| {
self.sts.as_ref().map(|s| {
F::from(
s.get(i) //
.cloned()
.unwrap_or(ST_DONE)
as u64,
)
})
},
)?;
// character
region.assign_advice(
|| "ch",
config.ch,
i,
|| {
self.str.as_ref().map(|s| {
s.chars()
.nth(i)
.map(|c| F::from(c as u64))
.unwrap_or(F::from(EOF as u64))
})
},
)?;
}
End Constraint
At the end, similar to the start, we need to ensure that the state is ST_DONE
:
// at offset MAX_STR_LEN, the state is ST_START
region.assign_advice(
|| "st",
config.st,
MAX_STR_LEN,
|| Value::known(F::from(ST_DONE as u64)),
)?;
region.assign_fixed(
|| "final state",
config.fix_st,
MAX_STR_LEN,
|| Value::known(F::from(ST_DONE as u64)),
)?;
config.q_match.enable(&mut region, MAX_STR_LEN)?;
Ok(())
},
)?;
Exercises
The full code is at the end.
Hint
Hint
You do not need to change the circuit, only the REGEX
constant.
However, this exercise is slightly more complicated than it seems at first:
we must somehow be able to transition from matching a
s without consuming an a
(in case there are zero a
).
Create a circuit which computes a single round of AES encryption.
You may ignore the key schedule.
Hint
Hint
Use four separate lookup tables to represent:
- The S-box. A table with 256 entries.
- The GF(2^8) addition (XOR). A table with 256x256 entries.
- A "multiplication by 2" in GF(2^8) table. A table with 256 entries.
- A "multiplication by 3" in GF(2^8) table. A table with 256 entries.
Hint
Hint
The above is dominated by the XOR table, which is a 256x256 table. To avoid this massive table we can use the fact that XOR acts component-wise on the bits of the input:
\[ (l_1, l_2) \oplus (r_1, r_2) = (l_1 \oplus r_1, l_2 \oplus r_2) \]
Therefore, we can replace a single lookup in a 256x256 table with two lookups in 16x16 tables.
Create a circuit which takes an AES ciphertext as public input (instance) and a key as private input (witness) and checks that the ciphertext decrypts to a known plaintext.
use std::marker::PhantomData;
use halo2_proofs::{
circuit::{Layouter, SimpleFloorPlanner, Value},
dev::MockProver,
plonk::{
Advice,
Circuit,
Column, //
ConstraintSystem,
Error,
Fixed,
Selector,
TableColumn,
},
poly::Rotation,
};
use ff::{Field, PrimeField};
// ANCHOR: regex
const ST_A: usize = 1;
const ST_B: usize = 2;
const ST_C: usize = 3;
// start and done states
const ST_START: usize = ST_A;
const ST_DONE: usize = 4;
// end of file marker:
// "dummy padding character"
const EOF: usize = 0xFFFF;
// conversion of the regular expression: a+b+c
const REGEX: [(usize, usize, Option<char>); 6] = [
(ST_A, ST_A, Some('a')), // you can stay in ST_A by reading 'a'
(ST_A, ST_B, Some('a')), // or move to ST_B by reading 'a'
(ST_B, ST_B, Some('b')), // you can stay in ST_B by reading 'b'
(ST_B, ST_C, Some('b')), // or move to ST_C by reading 'b'
(ST_C, ST_DONE, Some('c')), // you can move to ST_DONE by reading 'c'
(ST_DONE, ST_DONE, None), // you can stay in ST_DONE by reading EOF
];
// ANCHOR_END: regex
const MAX_STR_LEN: usize = 20;
struct TestCircuit<F: Field> {
_ph: PhantomData<F>,
str: Value<String>,
sts: Value<Vec<usize>>,
}
#[derive(Clone, Debug)]
struct TestConfig<F: Field + Clone> {
_ph: PhantomData<F>,
q_match: Selector,
q_regex: Selector, // enable the regex gate
st: Column<Advice>, // current state of automaton
ch: Column<Advice>, // current character
tbl_st_cur: TableColumn,
tbl_st_nxt: TableColumn,
tbl_ch: TableColumn,
fix_st: Column<Fixed>,
}
impl<F: PrimeField> Circuit<F> for TestCircuit<F> {
type Config = TestConfig<F>;
type FloorPlanner = SimpleFloorPlanner;
fn without_witnesses(&self) -> Self {
TestCircuit {
_ph: PhantomData,
str: Value::unknown(), // the string
sts: Value::unknown(), // state of the automaton
}
}
// ANCHOR: columns
fn configure(meta: &mut ConstraintSystem<F>) -> Self::Config {
let q_regex = meta.complex_selector();
let q_match = meta.complex_selector();
let st = meta.advice_column();
let ch = meta.advice_column();
let fix_st = meta.fixed_column();
let tbl_st_cur = meta.lookup_table_column();
let tbl_st_nxt = meta.lookup_table_column();
let tbl_ch = meta.lookup_table_column();
// ANCHOR_END: columns
// ANCHOR: fix
meta.create_gate("fix-st", |meta| {
let st = meta.query_advice(st, Rotation::cur());
let fix_st = meta.query_fixed(fix_st, Rotation::cur());
let en = meta.query_selector(q_match);
vec![en * (st - fix_st)]
});
// ANCHOR_END: fix
// ANCHOR: lookup
meta.lookup("transition-st", |meta| {
let st_cur = meta.query_advice(st, Rotation::cur());
let st_nxt = meta.query_advice(st, Rotation::next());
let ch = meta.query_advice(ch, Rotation::cur());
let en = meta.query_selector(q_regex);
vec![
(en.clone() * st_cur, tbl_st_cur),
(en.clone() * st_nxt, tbl_st_nxt),
(en.clone() * ch, tbl_ch),
]
});
// ANCHOR_END: lookup
TestConfig {
_ph: PhantomData,
q_regex,
st,
ch,
tbl_st_cur,
tbl_st_nxt,
tbl_ch,
fix_st,
q_match,
}
}
// ANCHOR: assign_table
fn synthesize(
&self,
config: Self::Config, //
mut layouter: impl Layouter<F>,
) -> Result<(), Error> {
// assign the transition table
layouter.assign_table(
|| "table",
|mut table| {
// convert the numbers to field elements
let mut transitions: Vec<(F, F, F)> = vec![
// (0, 0, 0) is in the table to account for q_regex = 0
(F::ZERO, F::ZERO, F::ZERO),
];
for tx in REGEX.iter() {
let (st_cur, st_nxt, ch) = tx;
transitions.push((
F::from(*st_cur as u64),
F::from(*st_nxt as u64),
ch.map(|c| F::from(c as u64)).unwrap_or(F::from(EOF as u64)),
));
}
// assign the table
for (offset, (st_cur, st_nxt, char)) in transitions //
.into_iter()
.enumerate()
{
table.assign_cell(
|| format!("st_cur"),
config.tbl_st_cur,
offset,
|| Value::known(st_cur),
)?;
table.assign_cell(
|| format!("st_nxt"),
config.tbl_st_nxt,
offset,
|| Value::known(st_nxt),
)?;
table.assign_cell(
|| format!("char"),
config.tbl_ch,
offset,
|| Value::known(char),
)?;
}
Ok(())
},
)?;
// ANCHOR_END: assign_table
// ANCHOR: region_start
layouter.assign_region(
|| "regex",
|mut region| {
// at offset 0, the state is ST_START
region.assign_fixed(
|| "initial state",
config.fix_st,
0,
|| Value::known(F::from(ST_START as u64)),
)?;
config.q_match.enable(&mut region, 0)?;
// ANCHOR_END: region_start
// ANCHOR: region_steps
// assign each step
for i in 0..MAX_STR_LEN {
// enable the regex automaton
config.q_regex.enable(&mut region, i)?;
// state
region.assign_advice(
|| "st",
config.st,
i,
|| {
self.sts.as_ref().map(|s| {
F::from(
s.get(i) //
.cloned()
.unwrap_or(ST_DONE)
as u64,
)
})
},
)?;
// character
region.assign_advice(
|| "ch",
config.ch,
i,
|| {
self.str.as_ref().map(|s| {
s.chars()
.nth(i)
.map(|c| F::from(c as u64))
.unwrap_or(F::from(EOF as u64))
})
},
)?;
}
// ANCHOR_END: region_steps
// ANCHOR: region_end
// at offset MAX_STR_LEN, the state is ST_START
region.assign_advice(
|| "st",
config.st,
MAX_STR_LEN,
|| Value::known(F::from(ST_DONE as u64)),
)?;
region.assign_fixed(
|| "final state",
config.fix_st,
MAX_STR_LEN,
|| Value::known(F::from(ST_DONE as u64)),
)?;
config.q_match.enable(&mut region, MAX_STR_LEN)?;
Ok(())
},
)?;
// ANCHOR_END: region_end
Ok(())
}
}
fn main() {
use halo2_proofs::halo2curves::bn256::Fr;
// run the MockProver
let circuit = TestCircuit::<Fr> {
_ph: PhantomData,
// the string to match
str: Value::known("aaabbbc".to_string()),
// manually create a trace of the state transitions
sts: Value::known(vec![
ST_A, // ST_A -a-> ST_A (START)
ST_A, // ST_A -a-> ST_A
ST_A, // ST_A -a-> ST_A
ST_B, // ST_A -a-> ST_B
ST_B, // ST_B -b-> ST_B
ST_B, // ST_B -b-> ST_B
ST_C, // ST_B -b-> ST_C
ST_DONE, // ST_C -c-> ST_DONE
]),
};
let prover = MockProver::run(8, &circuit, vec![]).unwrap();
prover.verify().unwrap();
}