This project has moved. For the latest updates, please go here.

HASHBYTES comparison

Aug 19, 2011 at 11:01 PM

I'm trying to take the output of your component using the MD5 algorithm with the TSQL HASHBYTES function using the same algorithm.  For simplicity I am using only a single field but not getting the same hash result.  How can I get the two to compare?

Coordinator
Aug 20, 2011 at 8:03 AM

MultipleHash converts unicode character data, and this could be causing the hash to be different between MultipleHash and HASHBYTES.  MultipleHash uses UTF8 as the encoding on all string types.  Use a CAST(xxx AS VARCHAR(MAX)) within SQL Server's HASHBYTES to work around this.

The other potential cause of differences between the two is the "Safe Null Handling", which keeps track of the length of columns, and if there are nullable column's and their state.  This is to prevent the situation shown in the examples below.

The end result of this is that the string being passed into HASHBYTES won't be as simple as you expect.

The string passed into Hash generator in Multiple Hash will be each column converted to Bytes concatenated together, followed by a string of Y/N flags (Null/NotNull), and integer lengths of the data in strings.

For example:

The string "Hello" is to be hashed, with Safe Null Handling enabled.  The string "HelloN5" is what will actually be hashed. (0x7CBAC2525C55EBD0BD136025D3F0AB63)

The strings "Hello", NULL and "Worlds" are to be hashed, with Safe Null Handling enabled.  The string "HelloWorldsN5YN6" is what will actually be hashed. (0xE2AA92A697B6167999A1AF863AE7DBBB)

The strings "Hello", "Worlds" and NULL are to be hashed, with Safe Null Handling enabled.  The string "HelloWorldsN5N6Y" is what will actually be hashed. (0xABADA53B6A002C54F5AA96FA82773189)

The strings "Hello", NULL and "Worlds" are to be hashed, with Safe Null Handling disabled.  The string "HelloWorlds" is what will actually be hashed. (0x5537544C38F136C6DAECD7C9EE031342)

The strings "Hello", "Worlds" and NULL are to be hashed, with Safe Null Handling disabled.  The string "HelloWorlds" is what will actually be hashed. (0x5537544C38F136C6DAECD7C9EE031342)

 

Please note that the last two examples show why Safe Null Handling is required, as they both generate the same Hash value...

Look at the function CalculateHash in the file Utility.cs if you wish to see what is actually being done in the C# code.

Oct 25, 2013 at 4:13 PM
Is it still valid?

When trying to hash using SHA1 the 2 strings 'C-2012-76187' AND '45' with Safe Null Handling enabled, I seems that I obtain 0x7CF71B76B3B3FB80F38BAACB467306E99CBEBACA000000000000000000000000 (version 1.6.1 + SSIS 2012).

SQL 2012 returns :
SELECT HASHBYTES('SHA1',CAST('C-2012-7618745N12N2' AS VARCHAR(MAX))) AS SHA1 ---> 0x3236A5939C26718BB9480393B1C97C122956F3D0

Did I miss something or are you using another algorithm?

Thank you for your time and cooperation
Jun 16, 2014 at 6:50 PM
I have this exact same problem. I must've tried 4-5 different methods and not one generates the same SHA1 hash in SQL than what is generated by Multiple Hash. I've used varchar, varbinary, even nvarchar. I also used a CLR function to generate the hash value. No go on all.

The sample query below is where I last left off at.
SET CONCAT_NULL_YIELDS_NULL OFF;     -- Not needed in SQL 2012. Use CONCAT() instead of +.

-- patssn is a decimal(10,0) field; patdob is a datetime field
SELECT
    HASHBYTES('SHA1', CONVERT(varchar(10), pt.patssn) + CONVERT(varchar(27), CONVERT(datetime2(7), pt.patdob)) +
        CASE WHEN pt.patssn IS NULL THEN 'Y' ELSE 'N' END +
        CASE WHEN pt.patdob IS NULL THEN 'Y' ELSE 'N' END
    ) AS HashBytes_VarChar,
    dbo.fnComputeHash_String(CONVERT(varchar(10), pt.patssn) + CONVERT(varchar(27), CONVERT(datetime2(7), pt.patdob)) +
        CASE WHEN pt.patssn IS NULL THEN 'Y' ELSE 'N' END +
        CASE WHEN pt.patdob IS NULL THEN 'Y' ELSE 'N' END, N'SHA1') AS HashCLR_VarChar,
    HASHBYTES('SHA1', CONVERT(varbinary(MAX), pt.patssn) + CONVERT(varbinary(MAX), CONVERT(datetime2(7), pt.patdob)) +
        CONVERT(varbinary(MAX), CASE WHEN pt.patssn IS NULL THEN 'Y' ELSE 'N' END) +
        CONVERT(varbinary(MAX), CASE WHEN pt.patdob IS NULL THEN 'Y' ELSE 'N' END)
    ) AS HashBytes_Binary,
        HASHBYTES('SHA1', CONVERT(varbinary(MAX), CONVERT(varchar(10), pt.patssn) + CONVERT(varchar(27), CONVERT(datetime2(7), pt.patdob)) +
        CASE WHEN pt.patssn IS NULL THEN 'Y' ELSE 'N' END +
        CASE WHEN pt.patdob IS NULL THEN 'Y' ELSE 'N' END)
    ) AS HashBytes_Binary2
FROM ...
The only time I get a matching hash is when the patssn and patdob fields are both null. The content in each field is only standard ASCII characters.
Coordinator
Jun 17, 2014 at 12:45 PM
MultipleHash is using the SSIS data types. These are NOT the same as the TSQL data types.
String's are easy, with a Y/N for Null, and a Length as string added to the end of the data to be hashed.
Date types are relatively easy, with the data converted to "121", and a length byte prefixed for that string. A Y/N for Null is added to the end of the data to be hashed. (No length string though).
Numeric types are difficult as SSIS and TSQL store them completely differently.

FYI:
Only variable length columns will have the length string appended to the end of the data to be hashed.

Examples:
-- Hash Bytes compared to Multiple Hash

-- Strings are Easy...
-- String "Hello"
-- Hashed Value = "HelloN5"
-- MD5 Result 0x7CBAC2525C55EBD0BD136025D3F0AB63
SELECT HASHBYTES('MD5', CAST('HelloN5' AS VARCHAR(MAX)))

-- Strings "Hello", "World"
-- Hashed Value = "HelloWorldN5N5"
-- MD5 Result 0x0F7830EDA9458CECD751ED7B357FB371
SELECT HASHBYTES('MD5', CAST('HelloWorldN5N5' AS VARCHAR(MAX)))

-- String "C-2012-76187", "45"
-- Hashed Value = "C-2012-7618745N12N2"
-- MD5 Result  0x82A96729550948C40513AC4308F7561E
-- SHA1 Result 0x3236A5939C26718BB9480393B1C97C122956F3D0
SELECT HASHBYTES('MD5', CAST('C-2012-7618745N12N2' AS VARCHAR(MAX)))
SELECT HASHBYTES('SHA1', CAST('C-2012-7618745N12N2' AS VARCHAR(MAX)))

-- DateTime2 = '2014-06-17 12:00:00Z'
-- Decimal 10,0 = 10
-- Hashed Value as Binary = 0x1B323031342D30362D31372031323A30303A30302E303030303030300A0000000000000000000000000000004E4E
-- Hashed Value as String  =  **2 0 1 4 - 0 6 - 1 7   1 2 : 0 0 : 0 0 . 0 0 0 0 0 0 0 **                              N N
-- The string example is not real as the 1B, 0A and 00's don't translate real well...
-- SHA1 Result 0x570B4CC87C13ABCC3B04D5A1AAC2079EEC90B495

-- How to cast the DateTime2 type...
DECLARE @DateTimeValue DATETIME2(7) = CAST('2014-06-17 12:00:00Z' AS DATETIME2(7))
SELECT CONCAT(CHAR(LEN(CONVERT(VARCHAR(30), @DateTime2, 121))), CONVERT(VARCHAR(30), @DateTime2, 121))

-- How to cast a Decimal type...
-- I haven't been able to do this.
-- Example Decimal TSQL values and the value that would be placed into the Hash.
--DECIMAL(10,0)
-- 10.0 -> 0A-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
-- 11.0 -> 0B-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
--DECIMAL(10,1)
-- 10.0 -> 64-00-00-00-00-00-00-00-00-00-00-00-00-00-01-00
-- 10.1 -> 65-00-00-00-00-00-00-00-00-00-00-00-00-00-01-00
-- 11.2 -> 70-00-00-00-00-00-00-00-00-00-00-00-00-00-01-00
-- 5000.0 -> 50-C3-00-00-00-00-00-00-00-00-00-00-00-00-01-00

-- INTEGER
-- 45 -> 2D-00-00-00
Hopefully there is enough information here to enable you to create the DECIMAL conversion that is needed.
Jun 17, 2014 at 1:30 PM
I am a little confused on the datetime2 portion. I am not able to even reproduce the binary representation. If I do the following:
DECLARE @DateTimeValue DATETIME2(7) = CAST('2014-06-17 12:00:00Z' AS DATETIME2(7))
SELECT CONVERT(varbinary(max), CONCAT(CHAR(LEN(CONVERT(VARCHAR(30), @DateTimeValue , 121))), CONVERT(VARCHAR(30), @DateTimeValue , 121)))
SQL returns 0x1B323031342D30362D31372031323A30303A30302E30303030303030. If you concatenate the 'NN' portion, to the above string, i.e.
SELECT CONVERT(varbinary(max), CONCAT(CHAR(LEN(CONVERT(VARCHAR(30), @DateTimeValue , 121))), CONVERT(VARCHAR(30), @DateTimeValue , 121), 'NN'))
You'll get 0x1B323031342D30362D31372031323A30303A30302E303030303030304E4E. If I reverse your binary string "0x1B323031342D30362D31372031323A30303A30302E303030303030300A0000000000000000000000000000004E4E",
[0x1B]2014-06-17 12:00:00.0000000
               NN
How do you end up with the 0x0A and 0x00's in the first place?
Coordinator
Jun 18, 2014 at 10:25 AM
Sorry to have confused you.

The Hashed Value as Binary below is made up of the input DateTime2 and the Decimal(10,0) values.
-- DateTime2 = '2014-06-17 12:00:00Z'
-- Decimal 10,0 = 10
-- Hashed Value as Binary = 0x1B323031342D30362D31372031323A30303A30302E303030303030300A0000000000000000000000000000004E4E
The 1B323031342D30362D31372031323A30303A30302E30303030303030 is the DateTime2 column.
The 0A000000000000000000000000000000 is the Decimal(10,0) which in SSIS is DT_NUMERIC(10,0).
The two 4E's (N) are because there are two input columns.

If you were just doing the datetime2 column, (ie. single column being hashed), the Hashed Value would be:
0x1B323031342D30362D31372031323A30303A30302E303030303030304E
Marked as answer by kmartin on 7/8/2014 at 6:14 AM