Author: Peter Fairbrother
Date: Dec 6, 2006 01:13
Peter Fairbrother wrote:
> Dmitry Chumack wrote:
>
>> Hi *
>>
>> I have a question. There are a few dozen __terabytes__ of data. I need
>> to split this data into blocks of fixed size. Then I need to calculate a
>> sha1 hash for each block and store it. The question is: what minimal
>> block size should I choose to minimize the probability of two blocks
>> having equal hashes? What would the probability of such a coincidence be
>> if I use md5, crc32 or sha2 instead of sha1? Or if I use a combination
>> of these algorithms (e.g. sha1 and crc32, or sha2 and md5 and crc32)?
>>
>> Thanks in advance.
>>
>
> For n << b
Ooops - should read "for n^2 << 2^b". Sorry.
> a good approximation of the probability of a collision is
> (n^2)/(2^b) where n is the number of blocks and b is the size of the hash in
> bits.
>
> That's assuming no-one is trying to cheat and create...
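
A minimal Python sketch of the approximation discussed above, assuming the
hash outputs behave like uniformly random b-bit values and nobody is
constructing collisions on purpose. It uses the standard birthday estimate
1 - exp(-n(n-1)/2^(b+1)), which for n^2 << 2^b is about n^2/2^(b+1), i.e.
half the n^2/2^b figure quoted above and the same order of magnitude. The
function names, the HASH_BITS table and the 20 TB / 4 KiB figures are
illustrative assumptions, not taken from the thread.

```python
import math

# Digest sizes in bits for the algorithms named in the question
# ("sha2" is taken here to mean SHA-256, which is an assumption).
HASH_BITS = {"crc32": 32, "md5": 128, "sha1": 160, "sha256": 256}


def block_count(total_bytes: int, block_bytes: int) -> int:
    """Number of fixed-size blocks needed to cover total_bytes (rounded up)."""
    return -(-total_bytes // block_bytes)


def collision_probability(n_blocks: int, hash_bits: int) -> float:
    """Birthday estimate 1 - exp(-n(n-1) / 2^(b+1)) for random b-bit hashes."""
    if n_blocks < 2:
        return 0.0
    # Exact integer arithmetic first, then a single division to a float.
    exponent = n_blocks * (n_blocks - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is tiny (e.g. around 1e-30).
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # Hypothetical workload: 20 TB of data split into 4 KiB blocks.
    n = block_count(20 * 10**12, 4096)
    for name, bits in HASH_BITS.items():
        print(f"{name:>6}: n={n}, p ~ {collision_probability(n, bits):.3e}")
```

Because the estimate grows with the square of the block count but shrinks
exponentially with the hash width, halving the block size only multiplies the
probability by about four, while each extra hash bit halves it.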