Home     |     Java    |     Php General    |     Oracle Database    |     Oracle Server  

MS Dynamics CRM 3.0

  •  Setting up and Configuring Microsoft Dynamics CRM 3.0
  •  Managing Security and Information Access
  •  Entity Customization: Concepts and Attributes
  •  Entity Customization: Forms and Views
  •  Entity Customization: Relationships, Custom Entities, and Site Map
  •  Reporting and Analysis
  •  Workflow
  •  Server-Side SDK
  •  Client-Side SDK
  •  Integration with External Applications
  • Cervo Technologies
    The Right Source to Outsource

    Sharepoint Portal Server KB

    Microsoft CRM Info

    WPF Interview Questions

    SilverLight Interview Qs

    Asp.Net 2.0 Interview Qs

    Asp.NET 1.1 FAQs

    Oracle Interview Questions

    SAP Interview Questions

    Java Programming

    MD5 algorithm


    I'm trying to implement the MD5 algorithm. (Note: I'm not looking for an
    existing MD5 library; this is an exercise). It compiles and returns
    something that looks like a hash superficially, but that's clearly the
    wrong result. Even if there were a Unicode/Ascii problem, the empty
    string should return the correct result (since it is always padded to
    one 1-bit and 511 0-bits), but it doesn't.

    Also, several strings (say, "" and "The") return the same incorrect hash
    - but by dumping the 128-bit hash in each iteration, I've found only
    very little in common between the intermediate steps, only the final one
    is completely identical.

    This is the function, as implemented from the pseudocode on Wikipedia.
    http://en.wikipedia.org/wiki/MD5#Pseudocode

    --------------------------------------------------------

    Constants:

    int[4] s = {
                    0x67452301,
                    0xEFCDAB89,
                    0x98BADCFE,
                    0x10325476
            };      
    int[64] shift= {
             7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,
       5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,
       4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,
       6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21
                  };

    int[64] k;

       for (int i=0;i<64;i++)
       {
         k[i]=(int)(Math.abs(Math.sin(i+1))*1073741824*4);
       }

    private static int[] processFin(int[] message)
    {
       int[] result=s.clone(); // initialize
       // message is already padded, just split:
       int blocks=message.length/16;
       for (int i=0;i<blocks;i++)
       { // for each 16-int block
         int[] state=result.clone(); // copy state
         for (int j=0;j<64;j++)
           {
             int  f,g,temp;
             if  (j<16)
             {
               f= state[1] & state[2]  | ~state[1]  &  state[3];
               g=j;
             }
             else  if  (j<32)
             {
               f=state[3]  &  state[1] | ~state[3] & state[2];
               g=(5*j+1)%16;
             }
             else  if  (j<48)
             {
               f=state[1]^state[2]^state[3];
               g=(3*j+5)%16;
             }
             else
             {
               f=state[2]^ (state[1] | ~state[3]);
               g=(7*j)%16;
             }
             temp=state[3];
             state[3]=state[2];
             state[2]=state[1];
             state[1]=(state[0]+f+k[j]+message[i*16+g])  <<  shift[j];
             state[0]=temp;
           }
           for(int  j=0;j<4;j++)  result[i]+=state[i];
         }
         return  result;

    }

    ---------------------------------------------
    The hex string returned for both "" and "The" is
    3c00a101efcdab8998badcfe10325476.

    I just can't see what's wrong with the code - all parts of the
    pseudocode description seem to be implemented exactly as they should
    be... unless the signed ints don't wrap modulo 2^32 as unsigned ones
    should. But they do for all examples I could think of, and anyway, that
    shouldn't cause those hashes to be exactly the same...

    --cb

    Her i have found some intressting webpages about it. Maybe it helps.

    http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html

    sebbaer schrieb:

    > Her i have found some intressting webpages about it. Maybe it helps.

    > http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html

    If I had found something that helped me in the first 10 Google results,
    I wouldn't have had to ask here, now would I? :)

    The only thing I could use would be some kind of step-by-step
    walkthrough for the MD5 calculation of a short string, so I could check
    it against the intermediate steps in my own calculation

    --cb

    On Mon, 14 May 2007 13:40:38 +0200, Christoph Burschka
    <christoph.bursc@rwth-aachen.de> wrote, quoted or indirectly quoted
    someone who said :

    >The only thing I could use would be some kind of step-by-step
    >walkthrough for the MD5 calculation of a short string, so I could check
    >it against the intermediate steps in my own calculation

    Can you find Sun's code?  Do you have an assembler level bebugger than
    will let you single step through the code?
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com

    On Mon, 14 May 2007 13:40:38 +0200, Christoph Burschka
    <christoph.bursc@rwth-aachen.de> wrote, quoted or indirectly quoted
    someone who said :

    >The only thing I could use would be some kind of step-by-step
    >walkthrough for the MD5 calculation of a short string, so I could check
    >it against the intermediate steps in my own calculation

    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com

    Christoph Burschka wrote:
    > private static int[] processFin(int[] message)
    > {
    >   int[] result=s.clone(); // initialize
    >   // message is already padded, just split:
    >   int blocks=message.length/16;
    >   for (int i=0;i<blocks;i++)
    >   { // for each 16-int block

    It seems you are skipping the last 16 element in messages. Perhaps the
    "i < blocks" condition should have been "i <= blocks"?

    Regards,
    --
    Filip Larsen

    On Mon, 14 May 2007 12:38:17 +0200, Christoph Burschka

    [snip]

    >private static int[] processFin(int[] message)
    >{
    >   int[] result=s.clone(); // initialize
    >   // message is already padded, just split:

    Have you implemented the padding correctly?  To quote RFC 1321:
    "3.1 Step 1. Append Padding Bits

       The message is "padded" (extended) so that its length (in bits) is
       congruent to 448, modulo 512. That is, the message is extended so
       that it is just 64 bits shy of being a multiple of 512 bits long.
       Padding is always performed, even if the length of the message is
       already congruent to 448, modulo 512.

       Padding is performed as follows: a single "1" bit is appended to
       the message, and then "0" bits are appended so that the length in
       bits of the padded message becomes congruent to 448, modulo 512. In
       all, at least one bit and at most 512 bits are appended.

    3.2 Step 2. Append Length

       A 64-bit representation of b (the length of the message before the
       padding bits were added) is appended to the result of the previous
       step. In the unlikely event that b is greater than 2^64, then only
       the low-order 64 bits of b are used. (These bits are appended as
       two 32-bit words and appended low-order word first in accordance
       with the previous conventions.)

       At this point the resulting message (after padding with bits and
       with b) has a length that is an exact multiple of 512 bits.
       Equivalently, this message has a length that is an exact multiple
       of 16 (32-bit) words. Let M[0 ... N-1] denote the words of the
       resulting message, where N is a multiple of 16."

    Note the endianness of the 64 bit message length field, which is not
    clearly explained in the Wikipedia article - two big-endian 32 bit
    words in little-endian order.  This is not a standard byte order and
    you will need to implement it carefully.

    It is also necessary to get all endian issues resolved correctly
    within the algorithm itself - they can often be a source of problems
    when implementing cryptographic algorithms.

    RFC 1321 (http://tools.ietf.org/html/rfc1321) also includes a
    canonical implementation of MD5 in C, which might be helpful to read.
    It also contains a few more Test Vectors than the Wikipedia article.

    rossum

    "Christoph Burschka" <christoph.bursc@rwth-aachen.de> wrote in message

    news:5ar05nF2psib9U1@mid.dfncis.de...

    > sebbaer schrieb:
    >> Her i have found some intressting webpages about it. Maybe it helps.

    >> http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html

    > If I had found something that helped me in the first 10 Google results,
    > I wouldn't have had to ask here, now would I? :)

        You'd be surprised...

        - Oliver

    Mh... you had me there for a few moments. But no, the message is already padded
    to a multiple of 16, so blocks is exactly the right number of blocks (if the
    message has 32 elements, blocks will be 2, and the loop will go from block #0 to
    block #1).

    If i is the number of the block, the messages in the block are message[16*i ..
    16*i+15]. Since blocks is exactly the message length divided by 16, the loop I
    have now goes from message[0] to message[(message.length / 16 -1) * 16 +15], or
    message[message.length-1]. If I changed the boundary condition to i<=blocks, I'd
    be running out of the array boundary.

    --cb

    Whoops. I didn't get the little-endian thing. So far I just padded my int[]
    array to a length of 14 mod 16, then added the length value (long) as two int
    words in normal big-endian order.

    So what I should do is just swap the order of the two ints that I split the long
    value into, leaving the ints themselves unchanged.

    Mh. But that can't be my only mistake. Consider the zero-length example - ""
    will always be padded to 0x8000[...]00, and the unpadded length, 0, is the same
    regardless of endianness. Still, this message gives me the wrong result.

    Not even mentioning that I'm somewhat skeptic of the possibility that the order
    of two 32bit words at the end of the message could so badly break the hashing
    function as to give it the same hash for two random short messages... but I'll
    fix this and see what happens.

    --cb

    Christoph Burschka wrote:
    > Mh... you had me there for a few moments. But no, the message is already padded
    > to a multiple of 16, so blocks is exactly the right number of blocks (if the
    > message has 32 elements, blocks will be 2, and the loop will go from block #0 to
    > block #1).

    Ok, hard to discount that without seeing the padding code.

    >  for(int  j=0;j<4;j++)  result[i]+=state[i];

    Next bug candidate is this line, where it looks like the arrays should
    have been indexed by j and not i.

    Regards,
    --
    Filip Larsen

    Whoops again. There's a lot more wrong with my padding than I thought. I appear
    to be adding the message length some 8 words before the actual end, and I don't
    even know how I managed to do that.

    Also, the initialization of my sine table is messed up - casting from double to
    int will apparently just set too-high values to INT_MAX_VALUE. But if I cast the
    sine to int before multiplying it by 2^32, well... it just ends up at 0, of
    course. I'll need to do something else here...

    Aha! Casting first to long to get a whole number, then to int, appears to give
    me something closer to what I want - the long->int conversion wraps instead of
    setting to INT_MAX_VALUE. Whether the result is also the correct one remains to
    be seen...

    --cb

    Wow... that thing is literally crawling with bugs. Unfortunately, that still
    didn't do the trick - now the result is different again, but the hashes are
    still wrong and also nearly identical:

    []: 0112a34159cdab8981f7dcfea9eb2476
    [ ]: 0112a34159cdab8981f7dcfea9eb2476
    [The]: 8112a34159cdab8981f7dcfea9eb2476
    [ttr]: 8112a34159cdab8981f7dcfea9eb2476
    [ t t]: 7512a34159cdab8981f7dcfea9eb2476
    [The quick brown fox jumps over the lazy dog]: f3b76b417dcdab89cf30dcfe97eb2476

    I think I'll sleep on it and look at the code again when I'm more awake. Thanks
    very much for the help! :)

    --
    cb

    Mh... several more problems fixed. For example, I missed another
    Endianness reversal: The words of the message itself are processed with
    the lowest byte first.

    My fix doesn't look too pretty, but I think it's the fastest it gets
    without heavily optimizing:

            private static int littleEndian(int in)
            {
                    int out=0;
                    if (in<0)
                    {
                            in-=Integer.MIN_VALUE;
                            out+=128;
                    }
                    for (int i=0;i<4;i++)
                    {
                            out+=in%256*Math.pow(256,3-i);
                            in/=256;
                    }
                    return out;
            }

    Also, I missed a crucial part of the algorithm:

             b := ((a + f + k[i] + w[g]) leftrotate r[i]) + b

    And finally, it appears that the bitwise shift operator isn't doing what
    I thought it would - no rotation; bits to the left are removed, 0s are
    added to the right. Too bad - I'd hoped I could use the built-in <<
    operator; now I need a new function for it. (i=i*2+(i<0?1:0)) is
    reasonably elegant for this, however.

    Oh, and the final problem: I forgot to reverse the Endianness of the
    hash result after calculation.

    With all this, I am now finally getting the correct result for the
    zero-length message (d41d8cd98f00b204e9800998ecf8427e). I'm still
    getting incorrect results for actual messages, but they have nothing in
    common and I can be reasonably sure that's because of incorrect padding
    or character encoding, not the algorithm.

    --
    cb

    Add to del.icio.us | Digg this | Stumble it | Powered by Megasolutions Inc