|
2001-05-02
Security proof
The insecurity level
of a keyed function is by convention defined as the probability that you
might distinguish it from a random function after a certain number of
queries to it using any distinguisher. (cf Goldwasser & Bellare 1999)
The task at hand is to (1) find an appropriate way to model Steak as a structure
of keyed functions, (2) model the way these functions are promoted, and (3) find the most effective distinguisher for each
of these functions.
Firstly, let Fk: {0,1}8
=> {0,1}8 be the function such that k is the key data
of Steak, and Fk(x) is the key stream output of
Steak given the key data k and the preceding cipher text byte x.
If the vector size is large enough, this function has a very low insecurity
level. In order to prove so, the following observations should be made:
Proposition 1: Let h be a selector of
any combination of 256 distinct elements out of a set of at least 65536 of
the least natural numbers. Let the injective function Gh:
{0,1}8 => {0,1}32 be such a combination. Then, for
any random function g: {0,1}8 => {0,1}8, there is an
h such that for all x we have that g(x) = Gh(x)
mod 256.
Now, let’s structure
the function Fk in the following manner: Let Gik:
{0,1}8 => {0,1}32 be the dynamic key data generator
of Fk at round i. Then for all x we have
that Fk(x) = Lk(Gnk(x))
mod 256, where n is the vector size, and the function Lk
simply undoes the key data feedback of the last round. Furthermore, due to
the PV-pipe of the feedback process, we have that the 256-element combination
corresponding to the function Gik has a
selector corresponding to the preceding round function Gi-1k.
In other words, since the main function of Steak is bijective, the probability
increases with the vector size of Steak that for any random function g
there is an equivalent function Fk. The bijectivity of
the main function also ensures that the number of collisions for each Fk
in the set {Fk} converges to a constant function of k
as the vector size increases. This means that for all practical purposes
the function Fk might be modeled as a random function.
Secondly, we ought to prove that the process
of selecting the next function Fx°k also
might be modeled as a random function.
However, one ought to
notice that there would be a potential weakness to any such proof. For each
of the 22048 possible g there is only a limited number of keys k such that Fk
is equivalent to g. If this set is known, then for each cipher text x it
would be trivial to compute the set of possible functions Fx°k.
For all we know it would be an immensely complex task to find the set {k}
corresponding to a random function g since the set of all keys is larger
than 26000, but we can’t prove that it can’t be
done.
One should also note
that there are more than 24000 full sized keys
corresponding to each function g. Since all S-boxes are single cycle
permutations, the ratio of keys actually inconsistent with the extension of the promotion from Fk to Fx°k
cannot be larger than 3 divided by 256 (given that full sized random keys
are used), because that's the trivial upper bound of the probability that,
for a vector size of one, you will not find two keys k, k',
that differs in at most three of the S-boxes and are such that both Fk
and Fk' are equivalent to g0 and for all
possible values of x both Fx°k
and Fx°k' are equivalent to g1.
(Note that this is an approximate model. The MTT is not MDS, so the relation
between the sboxes would have to be more complex, but since the MTT is
bijective the probability of finding them is roughly bounded by the same
number, in particular if the vector size is larger than one.) If one does
not take the static and dynamic vectors into account, this entails that the
unicity distance of Steak cannot be less than 3 bytes and key retreival
requires an effort of at least O(224), where O is bounded
downwards by the effort of finding the set {k}
corresponding to the value pair (x,g(x)) for a random function g.
These bounds are fully independant of the security of the promotion
function.
This said, the proof
is more or less trivial.
To begin with, it is
important that the dynamic key data Vm
significantly contributes to the promotion of
Fk to Fx°k. If it
doesn't, then it might be possible to extract the s-boxes and the vectors
separately from the sequence (or rather 256-branched tree) of (Fmk),
and it would make less difference whether the dynamic vector is pseudo
random or not. More specifically, we would prefer that for any random
function g and fixed Ak and Dk,
there is a constant epistemic probability that one might find a vector Vm
such that Fmk = g. This seems to be the
case: Each Gi°mk
depends (close to) linearly on the value of Vm[n-i+1],
but a linear change to Gi°mk
will, after it has passed through Ak[n-i]
and Dk, result in a non-linear change to Gi+1°mk.
This is largely due to the fact that the s-boxes are single cycle, and hence
never can be fully linear. Consequently, the probability is extremely low
that there are non-zero values x and y such that (Vm[n-i+1]+x,Vm[n-i])
and (Vm[n-i+1],Vm[n-i]+y)
will result in the same function Gi+1°mk.
The degree to which this property holds for large chunks of Vm
ought to be high. The s-boxes will always be non-linear to some degree, and
this non-linearity will show in the way Gi°mk
selects Gi+1°mk.
The pseudo
randomness of Vm
might be established as follows: Let Hi°m°jk
be such that for any i, j, m, k, we have that Hi°m°jk(PVn-i+j+1)
= PVn-i+1,
where PVi
= V[i] according to the notation of the algorithm. Define the
value PVn-i+j+1
such that if n-i+j+1 is greater than n, then
this is the corresponding internal state of Fm-1k
at round i-j-n, etc. Note that the four most and least
significant bits of H1°m°jk
will always depend (close to linearly) on the cipher text byte Cm-1.
This byte will be pseudo random if the least significant byte of Hn°m-1°jk
is pseudo random. It isn't material in this context that this byte is
public. Now, according to the assements about the impact of Vm
on the selection of each Gi°mk,
it is correspondingly reasonable to assess that there is a value e such
that if j > e, then Hi°m°jk
is a pseudo random permutation. (Our statistical tests indicate that e
= 3 is sufficient.) This entails that if e.g. n = 8 and j = 4,
then the string of V1[5],
V1[1],
V2[5],
V2[1],
... will be pseudo random up to a length corresponding to at least the
strength of each H4°m°4k,
H8°m°4k.
The entropy of the promotion from Fmk to Fm+1k
cannot be less than the entropy of this string, because the strings V1[8],
V1[4],..
, V1[7],
V1[3],
... etc will all have the same security bound, and if the conjunction of
them would decrease the strength of the promotion from Fmk
to Fm+1k, then this must be
due to the way some Hi°m°jk
selects the next Hi+1°m°jk,
and that would be contrary to our assumptions that this step cause a high
probability security increment.
It is important to
note that there is a close relationship between the function Fmk
and the promotion of Fmk
to Fm+1k,
since each Gi°mk
is constituted
by the same steps in the algorithm as Hi°m°jk.
There is however a "but": Although the set {...,(Gi-1°mk(x),Gi°mk(x)),...}
is a subset of Hi°m°jk,
knowing Gi°mk
extensively, but nothing about Gi-1°mk,
will not leak any information at all about Hi°m°jk
given that i > 1. In the informal sense, a cipher like Steak will
never be cracked by an adversary who first learns the value of one vector
element, solves an equation to learn another value, etc, but rather as the
result of an statistical attack of some kind. Primarily because the only
thing known to attacker in the first place is the cipher text and possibly
the plain text - secondarily because unequivocal knowledge of isolated
elements of key data will still leave too many unknown variables for the
equations to be solved. The point of this exercise is foremost to prove that
it is impossible that knowledge of one function Fmk
could enable
the adversary to make more than a lucky guess about any internal key data,
but also to justify the assessment that knowledge of several functions in
the tree (Fmk)
could not enable the adversary to do significantly better, because the
relationship between these functions is too complex.
For example: Knowing
both Gi-1°mk
and Gi°mk
extensively will reveal no more than a 2-24 fraction of the
extension of Hi°m°jk.
Furthermore, knowing Gi-1°mk
and Gi°mk
extensively would make it theoretically possible to compute the set {B}
of all possible values of the sum B = A[n-i+1]+V[n-i]
through exhaustive key search, but only Bwould be revealed. If the
set {B} is sufficiently large, extensive knowledge of Gi-1°m+1k
and Gi°m+1k
would not add a significant amount of information about either Vm[n-i]
or Vm+1[n-i].
A conservative estimation is that |{B}| > 231.9 (based
on the number of ways to modify the s-boxes, and obtain the same functions Gi-1°mk
and Gi°mk,
that would not be possible because the s-boxes are single cycle and the MTT
is not MDS), implying that one would need at least 10 value pairs (Gi-1°mk(x),
Gi°mk(x))
for different m before the intersection of the sets {B} was
less than 231. Given that the adversary's knowledge of Gi-1°mk
and Gi°mk
was likely to actually only be probabilistic to begin with, the increment of
knowledge obtained by making more queries is more or less negligleble.
Thirdly, we prove that the advantage of
any distinguisher for Steak has a polynomial bound, and base this on few
assumptions:
- The relation between the key
stream output and the least byte V[1][0] of the internal state
vector has no distinguisher with an advantage greater than any other
distinguisher for Steak. This is a justified assumption: V[1][0]
is a bijective function of Vm[1][0]
with the key stream output x as the key parameter, and Vm[1]
will have an impact on x only after being distorted by nearly all
other key data. This assumption is clearly similar to the assessment
that Fmk might be modeled as a random
function.
- The probability is negligleble that, for any pair of key data k and
k', there is a more efficient way to tell if Fk
= Fk' other than computation of at least one element
in the each respective code book.
The second
assumption is justified by the properties of Gi°mk
and implies that it would be infeasible to find an advantageous trade-off
between time, memory and probability, for any method that would narrow the
set {k} corresponding to Fmk
down to the
subset corresponding to Fm+1k.
Suppose that we have found a way of representing a subset Mm
of the set {k} corresponding to Fmk.
(For the sake of simplicitly, I will assume that Mm
is represented either extensionally, or by the combination of formulaes in
the vectors and s-boxes separately. There might be ways to represent
Mm
that put the s-boxes and the vectors into the same formula, but I will
assume that since the s-boxes belong to the entire set of single cycle
permutations, it will be infeasible to parameterize the entire key space
using such formulae, and that it in particular will be infeasible to
"do calculus" on such formulae, e.g. to take two arbitrary
formulae and in linear time calculate the intersection of the two sets.)
Suppose that the
probability that m, k is actually in M m
equals p = |Mm|/|{k}|,
and that we also have found the next subset Mm+1
and that |Mm|
= |Mm+1|.
Then, by the second assumption, we would have to have at least 2|Mm|
bytes of memory or perform O(|Mm|2)
computations in order to find the intersection of Mm
and Mm+1.
If we instead have a method of telling if two sets Mm
and Mm'
are intersected, and do so with a constant probability of q, then the
advantage of the distinguisher would be an exponential function of the
number of queries, but the effort would be a polynomial function of that
same number of queries (we would have to test each pair Mm
and Mm'
or else the advantage would at most increase polynomially). If we don't have
that kind of resources, and we won't have that if p is significantly
larger than 2-4000,
and if the effort is a linear function of the number of queries, then the
advantage of the distinguisher we are working with cannot increase
exponentially depending on the number of queries (as it would if we were
able to unequivocally represent the entire sets {k} and were able to
find and represent their intersections).
Conversely, if Mm
has a brute forceable size, then p cannot be significantly
larger than 2-4000,
and the attack would most likely fail for the first sequence of queries. The
attack would not be successful unless it was possible to learn something
from the failures, so that the set M'm
selected for the next attempt had the same cardinality, but the probability
was larger that k actually belonged to it. It is a provable fact that
the set Mm
would be too small for such an attack to successfully eliminate possible
values of the vector elements (except Vm[n][3],
which would already be known),
because too many s-boxes in the complement set of Mm
would make each of these vector element values possible. The converse is
also true, because it is, by assumption, not possible to unequivocally rule
out that a particular set of s-boxes is inconsistent with a sequence (Fmk)
with substantially less effort than brute force on the vectors. So the
elimination would have to be probabilistic, and hence the argument of the
preceding paragraph applies.
The advantage of any
distinguisher must hence be bounded by a polynomial function of the number
of queries. This concludes this rather informal proof. Q.E.D.
Practically, some considerations regarding
differential crypto analysis are accounted for on
a separate page. Click here to go there. The PRNG tests performed on Steak are also of interest in
this context. Click here to go there.
In order to determine
the insecurity level of any of the round functions, one might proceed as follows: Suppose that you use step (2.2) of the
encryption algorithm as a 32-bit block cipher, with B as the plain text and
E as the cipher text and the key data D set up as with Steak. You
would then be able to unequivocally determine the S-boxes with only 255
blocks of chosen plain text, such that if i is not equal to j,
then Bi[0] is not equal to Bj[0], Bi[1]
is not equal to Bj[1], Bi[2] is not
equal to Bj[2] and Bi[3] is not equal
to Bj[3]. A high probability distinguisher would be to
make only two queries by selecting E and E’ such that only the most
significant byte differs. The advantage of this distinguisher would be as
high as 1-224.
However, the advantage
of this distinguisher would quickly be lost if whitening steps were added
and the function was iterated. In fact, the advantage of the distinguisher drops to absolute
zero after four iterations, given that the number of queries is two. If the
number of queries were higher, the advantage would presumably be higher, but still
insignificant.
|