StreamSec  

 

 
News
Products
Research
How to - algorithms with proofs
PCFB-mode
Steak
Key Setup Scheme
Feedback process
Main Function
Algorithm
Matrix Table Transform
Hash Scheme
PRNG test results
Security proof
Source - Steak
Source - SecUtils
Source - CombUtils
Order
Demos
Downloads
About Streamsec
Contact

2001-05-02

Security proof

The insecurity level of a keyed function is by convention defined as the probability that you might distinguish it from a random function after a certain number of queries to it using any distinguisher. (cf Goldwasser & Bellare 1999) The task at hand is to (1) find an appropriate way to model Steak as a structure of keyed functions, (2) model the way these functions are promoted, and (3) find the most effective distinguisher for each of these functions.

Firstly, let Fk: {0,1}8 => {0,1}8 be the function such that k is the key data of Steak, and Fk(x) is the key stream output of Steak given the key data k and the preceding cipher text byte x. If the vector size is large enough, this function has a very low insecurity level. In order to prove so, the following observations should be made:

Proposition 1: Let h be a selector of any combination of 256 distinct elements out of a set of at least 65536 of the least natural numbers. Let the injective function Gh: {0,1}8 => {0,1}32 be such a combination. Then, for any random function g: {0,1}8 => {0,1}8, there is an h such that for all x we have that g(x) = Gh(x) mod 256.

Now, let’s structure the function Fk in the following manner: Let Gik: {0,1}8 => {0,1}32 be the dynamic key data generator of Fk at round i. Then for all x we have that Fk(x) = Lk(Gnk(x)) mod 256, where n is the vector size, and the function Lk simply undoes the key data feedback of the last round. Furthermore, due to the PV-pipe of the feedback process, we have that the 256-element combination corresponding to the function Gik has a selector corresponding to the preceding round function Gi-1k. In other words, since the main function of Steak is bijective, the probability increases with the vector size of Steak that for any random function g there is an equivalent function Fk. The bijectivity of the main function also ensures that the number of collisions for each Fk in the set {Fk} converges to a constant function of k as the vector size increases. This means that for all practical purposes the function Fk might be modeled as a random function.

Secondly, we ought to prove that the process of selecting the next function Fx°k also might be modeled as a random function.

However, one ought to notice that there would be a potential weakness to any such proof. For each of the 22048 possible g there is only a limited number of keys k such that Fk is equivalent to g. If this set is known, then for each cipher text x it would be trivial to compute the set of possible functions Fx°k. For all we know it would be an immensely complex task to find the set {k} corresponding to a random function g since the set of all keys is larger than 26000, but we can’t prove that it can’t be done. 

One should also note that there are more than 24000  full sized keys corresponding to each function g. Since all S-boxes are single cycle permutations, the ratio of keys actually inconsistent with the extension of the promotion from Fk to Fx°k  cannot be larger than 3 divided by 256 (given that full sized random keys are used), because that's the trivial upper bound of the probability that, for a vector size of one, you will not find two keys k, k', that differs in at most three of the S-boxes and are such that both Fk and Fk' are equivalent to g0 and for all possible values of x both Fx°k and Fx°k' are equivalent to g1. (Note that this is an approximate model. The MTT is not MDS, so the relation between the sboxes would have to be more complex, but since the MTT is bijective the probability of finding them is roughly bounded by the same number, in particular if the vector size is larger than one.) If one does not take the static and dynamic vectors into account, this entails that the unicity distance of Steak cannot be less than 3 bytes and key retreival requires an effort of at least O(224), where O is bounded downwards by the effort of finding the set {k} corresponding to the value pair (x,g(x)) for a random function g. These bounds are fully independant of the security of the promotion function.

This said, the proof is more or less trivial.

To begin with, it is important that the dynamic key data Vm significantly contributes to the promotion of  Fk to Fx°k. If it doesn't, then it might be possible to extract the s-boxes and the vectors separately from the sequence (or rather 256-branched tree) of (Fmk), and it would make less difference whether the dynamic vector is pseudo random or not. More specifically, we would prefer that for any random function g and fixed Ak and Dk, there is a constant epistemic probability that one might find a vector Vm such that Fmk = g. This seems to be the case: Each Gi°mk depends (close to) linearly on the value of Vm[n-i+1], but a linear change to Gi°mk will, after it has passed through Ak[n-i] and Dk, result in a non-linear change to Gi+1°mk. This is largely due to the fact that the s-boxes are single cycle, and hence never can be fully linear. Consequently, the probability is extremely low that there are non-zero values x and y such that (Vm[n-i+1]+x,Vm[n-i]) and (Vm[n-i+1],Vm[n-i]+y) will result in the same function Gi+1°mk. The degree to which this property holds for large chunks of Vm ought to be high. The s-boxes will always be non-linear to some degree, and this non-linearity will show in the way Gi°mk selects Gi+1°mk.

The pseudo randomness of Vm might be established as follows: Let Hi°m°jk be such that for any i, j, m, k, we have that Hi°m°jk(PVn-i+j+1) = PVn-i+1, where PVi = V[i] according to the notation of the algorithm. Define the value PVn-i+j+1 such that if n-i+j+1 is greater than n, then this is the corresponding internal state of Fm-1k at round i-j-n, etc. Note that the four most and least significant bits of Hm°jk will always depend (close to linearly) on the cipher text byte Cm-1. This byte will be pseudo random if the least significant byte of Hn°m-1°jk is pseudo random. It isn't material in this context that this byte is public. Now, according to the assements about the impact of Vm on the selection of each Gi°mk, it is correspondingly reasonable to assess that there is a value e such that if j > e, then Hi°m°jk is a pseudo random permutation. (Our statistical tests indicate that e = 3 is sufficient.) This entails that if e.g. n = 8 and j = 4, then the string of V1[5], V1[1], V2[5], V2[1], ... will be pseudo random up to a length corresponding to at least the strength of each Hm°4k, Hm°4k. The entropy of the promotion from Fmk to Fm+1k cannot be less than the entropy of this string, because the strings V1[8], V1[4],.. , V1[7], V1[3], ... etc will all have the same security bound, and if the conjunction of them would decrease the strength of the promotion from Fmk to Fm+1k, then this must be due to the way some Hi°m°jk selects the next Hi+1°m°jk, and that would be contrary to our assumptions that this step cause a high probability security increment.

It is important to note that there is a close relationship between the function Fmk and the promotion of Fmk to Fm+1k, since each Gi°mk is constituted by the same steps in the algorithm as Hi°m°jk. There is however a "but": Although the set {...,(Gi-1°mk(x),Gi°mk(x)),...} is a subset of Hi°m°jk, knowing Gi°mk extensively, but nothing about Gi-1°mk, will not leak any information at all about Hi°m°jk given that i > 1. In the informal sense, a cipher like Steak will never be cracked by an adversary who first learns the value of one vector element, solves an equation to learn another value, etc, but rather as the result of an statistical attack of some kind. Primarily because the only thing known to attacker in the first place is the cipher text and possibly the plain text - secondarily because unequivocal knowledge of isolated elements of key data will still leave too many unknown variables for the equations to be solved. The point of this exercise is foremost to prove that it is impossible that knowledge of one function Fmk could enable the adversary to make more than a lucky guess about any internal key data, but also to justify the assessment that knowledge of several functions in the tree (Fmk) could not enable the adversary to do significantly better, because the relationship between these functions is too complex.

For example: Knowing both Gi-1°mk and Gi°mk extensively will reveal no more than a 2-24 fraction of the extension of Hi°m°jk. Furthermore, knowing Gi-1°mk and Gi°mk extensively would make it theoretically possible to compute the set {B} of all possible values of the sum B = A[n-i+1]+V[n-i] through exhaustive key search, but only Bwould be revealed. If the set {B} is sufficiently large, extensive knowledge of Gi-1°m+1k and Gi°m+1k  would not add a significant amount of information about either Vm[n-i] or Vm+1[n-i]. A conservative estimation is that |{B}| > 231.9 (based on the number of ways to modify the s-boxes, and obtain the same functions Gi-1°mk and Gi°mk, that would not be possible because the s-boxes are single cycle and the MTT is not MDS), implying that one would need at least 10 value pairs (Gi-1°mk(x), Gi°mk(x)) for different m before the intersection of the sets {B} was less than 231. Given that the adversary's knowledge of Gi-1°mk and Gi°mk was likely to actually only be probabilistic to begin with, the increment of knowledge obtained by making more queries is more or less negligleble.

Thirdly, we prove that the advantage of any distinguisher for Steak has a polynomial bound, and base this on few assumptions:

  1. The relation between the key stream output and the least byte V[1][0] of the internal state vector has no distinguisher with an advantage greater than any other distinguisher for Steak. This is a justified assumption: V[1][0] is a bijective function of Vm[1][0] with the key stream output x as the key parameter, and Vm[1] will have an impact on x only after being distorted by nearly all other key data. This assumption is clearly similar to the assessment that Fmk might be modeled as a random function.
  2. The probability is negligleble that, for any pair of key data k and k', there is a more efficient way to tell if Fk = Fk' other than computation of at least one element in the each respective code book. 

The second assumption is justified by the properties of Gi°mk and implies that it would be infeasible to find an advantageous trade-off between time, memory and probability, for any method that would narrow the set {k} corresponding to Fmk down to the subset corresponding to Fm+1k. Suppose that we have found a way of representing a subset Mm of the set {k} corresponding to Fmk. (For the sake of simplicitly, I will assume that Mm is represented either extensionally, or by the combination of formulaes in the vectors and s-boxes separately. There might be ways to represent Mm that put the s-boxes and the vectors into the same formula, but I will assume that since the s-boxes belong to the entire set of single cycle permutations, it will be infeasible to parameterize the entire key space using such formulae, and that it in particular will be infeasible to "do calculus" on such formulae, e.g. to take two arbitrary formulae and in linear time calculate the intersection of the two sets.)

Suppose that the probability that m, k is actually in Mm equals p = |Mm|/|{k}|, and that we also have found the next subset Mm+1 and that |Mm| = |Mm+1|. Then, by the second assumption, we would have to have at least 2|Mm| bytes of memory or perform O(|Mm|2) computations in order to find the intersection of Mm and Mm+1. If we instead have a method of telling if two sets Mm and Mm' are intersected, and do so with a constant probability of q, then the advantage of the distinguisher would be an exponential function of the number of queries, but the effort would be a polynomial function of that same number of queries (we would have to test each pair Mm and Mm' or else the advantage would at most increase polynomially). If we don't have that kind of resources, and we won't have that if p is significantly larger than 2-4000, and if the effort is a linear function of the number of queries, then the advantage of the distinguisher we are working with cannot increase exponentially depending on the number of queries (as it would if we were able to unequivocally represent the entire sets {k} and were able to find and represent their intersections).

Conversely, if Mm has a brute forceable size, then  p cannot be significantly larger than 2-4000, and the attack would most likely fail for the first sequence of queries. The attack would not be successful unless it was possible to learn something from the failures, so that the set M'm selected for the next attempt had the same cardinality, but the probability was larger that k actually belonged to it. It is a provable fact that the set Mm would be too small for such an attack to successfully eliminate possible values of the vector elements (except Vm[n][3], which would already be known), because too many s-boxes in the complement set of Mm would make each of these vector element values possible. The converse is also true, because it is, by assumption, not possible to unequivocally rule out that a particular set of s-boxes is inconsistent with a sequence (Fmk) with substantially less effort than brute force on the vectors. So the elimination would have to be probabilistic, and hence the argument of the preceding paragraph applies.

The advantage of any distinguisher must hence be bounded by a polynomial function of the number of queries. This concludes this rather informal proof. Q.E.D. 

Practically, some considerations regarding differential crypto analysis are accounted for on a separate page. Click here to go there. The PRNG tests performed on Steak are also of interest in this context. Click here to go there.

In order to determine the insecurity level of any of the round functions, one might proceed as follows: Suppose that you use step (2.2) of the encryption algorithm as a 32-bit block cipher, with B as the plain text and E as the cipher text and the key data D set up as with Steak. You would then be able to unequivocally determine the S-boxes with only 255 blocks of chosen plain text, such that if i is not equal to j, then Bi[0] is not equal to Bj[0], Bi[1] is not equal to Bj[1], Bi[2] is not equal to Bj[2] and Bi[3] is not equal to Bj[3]. A high probability distinguisher would be to make only two queries by selecting E and E’ such that only the most significant byte differs. The advantage of this distinguisher would be as high as 1-224.

However, the advantage of this distinguisher would quickly be lost if whitening steps were added and the function was iterated. In fact, the advantage of the distinguisher drops to absolute zero after four iterations, given that the number of queries is two. If the number of queries were higher, the advantage would presumably be higher, but still insignificant.

©2000-2001StreamSec