cardinal_pythonlib.hash


Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).

This file is part of cardinal_pythonlib.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Hash functions

In general, consider these hash functions:

  • hash64(), using MurmurHash3 to provide a 64-bit integer: for fast INSECURE COMPARISON operations.

  • an Hmac* class for SECURE cryptographic hashes.

Regarding None/NULL values (in CRATE):

  • For difference detection, it may be helpful to be able to compare a standard hash, in which case somehash(None) == somehash("None") == 'abcdefsomething'.

  • It is vital not to hash NULL patient IDs, though: for example, two different patients without an NHS number must not be equated by comparison on a hash of the (NULL) NHS number.

  • For anonymisation, this is handled in these functions:

    crate_anon/anonymise/anonymise.py / process_table()
    -> crate_anon/anonymise/configfiles.py / Config.encrypt_master_pid()
    -> crate_anon/anonymise/patient.py / Patient.get_rid
        ... via PatientInfo.rid
        ... to Config.encrypt_primary_pid()
    
class cardinal_pythonlib.hash.GenericHasher[source]

Abstract base class for a hasher.

hash(raw: Any) str[source]

Returns a hash of its input.

output_length() int[source]

Returns the length of the hashes produced by this hasher.

sqla_column_type() TypeEngine[source]

Returns a SQLAlchemy Column type instance, specifically String(length=self.output_length()).

class cardinal_pythonlib.hash.GenericHmacHasher(digestmod: Any, key: str)[source]

Generic representation of a hasher that hashes things via an HMAC (a hash-based message authentication code). See https://en.wikipedia.org/wiki/HMAC

HMAC hashers are the thing to use if what you are hashing is secret.

Parameters:
  • digestmod – see hmac.HMAC.__init__()

  • key – cryptographic key to use

hash(raw: Any) str[source]

Returns the hex digest of a HMAC-encoded version of the input.

class cardinal_pythonlib.hash.GenericSaltedHasher(hashfunc: Callable[[bytes], Any], salt: str)[source]

Generic representation of a simple salted hasher that stores a hash function and a salt.

Note that these are vulnerable to attack: if an attacker knows a (message, digest) pair, it may be able to calculate another. See https://benlog.com/2008/06/19/dont-hash-secrets/ and https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.134.8430

You should use HMAC instead if the thing you are hashing is secret.

Parameters:
  • hashfunc – hash function to use

  • salt – salt to use (following UTF-8 encoding)

hash(raw: Any) str[source]

Returns a hash of its input.

class cardinal_pythonlib.hash.HmacMD5Hasher(key: str)[source]

HMAC hasher based on MD5. (Even though MD5 is insecure, HMAC-MD5 is better. See Bellare M, Canetti R, Krawcyk H. Keying hash functions for message authentication. Lect. Notes Comput. Sci. Adv. Cryptol. - Crypto 96 Proc. 1996; 1109: 1–15.)

Parameters:
  • digestmod – see hmac.HMAC.__init__()

  • key – cryptographic key to use

class cardinal_pythonlib.hash.HmacSHA256Hasher(key: str)[source]

HMAC hasher based on SHA256.

Parameters:
  • digestmod – see hmac.HMAC.__init__()

  • key – cryptographic key to use

class cardinal_pythonlib.hash.HmacSHA512Hasher(key: str)[source]

HMAC hasher based on SHA512.

Parameters:
  • digestmod – see hmac.HMAC.__init__()

  • key – cryptographic key to use

class cardinal_pythonlib.hash.MD5Hasher(salt: str)[source]

Salted hasher based on MD5.

MD5 is cryptographically FLAWED; avoid using it or this class.

Parameters:
  • hashfunc – hash function to use

  • salt – salt to use (following UTF-8 encoding)

class cardinal_pythonlib.hash.SHA256Hasher(salt: str)[source]

Salted hasher based on SHA256.

Parameters:
  • hashfunc – hash function to use

  • salt – salt to use (following UTF-8 encoding)

class cardinal_pythonlib.hash.SHA512Hasher(salt: str)[source]

Salted hasher based on SHA512.

Parameters:
  • hashfunc – hash function to use

  • salt – salt to use (following UTF-8 encoding)

cardinal_pythonlib.hash.bytes_to_long(bytesdata: bytes) int[source]

Converts an 8-byte sequence to a long integer.

Parameters:

bytesdata – 8 consecutive bytes, as a bytes object, in little-endian format (least significant byte [LSB] first)

Returns:

integer

cardinal_pythonlib.hash.compare_python_to_reference_murmur3_32(data: Any, seed: int = 0) None[source]

Checks the pure Python implementation of 32-bit murmur3 against the mmh3 C-based module.

Parameters:
  • data – data to hash

  • seed – seed

Raises:

AssertionError – if the two calculations don’t match

cardinal_pythonlib.hash.compare_python_to_reference_murmur3_64(data: Any, seed: int = 0) None[source]

Checks the pure Python implementation of 64-bit murmur3 against the mmh3 C-based module.

Parameters:
  • data – data to hash

  • seed – seed

Raises:

AssertionError – if the two calculations don’t match

cardinal_pythonlib.hash.hash32(data: Any, seed: int = 0) int[source]

Non-cryptographic, deterministic, fast hash.

Parameters:
  • data – data to hash

  • seed – seed

Returns:

signed 32-bit integer

cardinal_pythonlib.hash.hash64(data: Any, seed: int = 0) int[source]

Non-cryptographic, deterministic, fast hash.

Parameters:
  • data – data to hash

  • seed – seed

Returns:

signed 64-bit integer

cardinal_pythonlib.hash.main() None[source]

Command-line validation checks.

cardinal_pythonlib.hash.murmur3_64(data: bytes | bytearray, seed: int = 19820125) int[source]

Pure 64-bit Python implementation of MurmurHash3; see https://stackoverflow.com/questions/13305290/is-there-a-pure-python-implementation-of-murmurhash (plus RNC bugfixes).

Parameters:
  • data – data to hash

  • seed – seed

Returns:

integer hash

cardinal_pythonlib.hash.murmur3_x86_32(data: bytes | bytearray, seed: int = 0) int[source]

Pure 32-bit Python implementation of MurmurHash3; see https://stackoverflow.com/questions/13305290/is-there-a-pure-python-implementation-of-murmurhash.

Parameters:
  • data – data to hash

  • seed – seed

Returns:

integer hash

cardinal_pythonlib.hash.pymmh3_hash128(key: bytes | bytearray, seed: int = 0, x64arch: bool = True) int[source]

Implements 128bit murmur3 hash, as per pymmh3.

Parameters:
  • key – data to hash

  • seed – seed

  • x64arch – is a 64-bit architecture available?

Returns:

integer hash

cardinal_pythonlib.hash.pymmh3_hash128_x64(key: bytes | bytearray, seed: int) int[source]

Implements 128-bit murmur3 hash for x64, as per pymmh3, with some bugfixes.

Parameters:
  • key – data to hash

  • seed – seed

Returns:

integer hash

cardinal_pythonlib.hash.pymmh3_hash128_x86(key: bytes | bytearray, seed: int) int[source]

Implements 128-bit murmur3 hash for x86, as per pymmh3, with some bugfixes.

Parameters:
  • key – data to hash

  • seed – seed

Returns:

integer hash

cardinal_pythonlib.hash.pymmh3_hash64(key: bytes | bytearray, seed: int = 0, x64arch: bool = True) Tuple[int, int][source]

Implements 64bit murmur3 hash, as per pymmh3. Returns a tuple.

Parameters:
  • key – data to hash

  • seed – seed

  • x64arch – is a 64-bit architecture available?

Returns:

tuple of integers, (signed_val1, signed_val2)

Return type:

tuple

cardinal_pythonlib.hash.signed_to_twos_comp(val: int, n_bits: int) int[source]

Convert a signed integer to its “two’s complement” representation.

Parameters:
  • val – signed integer

  • n_bits – number of bits (which must reflect a whole number of bytes)

Returns:

two’s complement version

Return type:

unsigned integer

cardinal_pythonlib.hash.to_bytes(data: Any) bytearray[source]

Convert anything to a bytearray.

See

cardinal_pythonlib.hash.to_str(data: Any) str[source]

Convert anything to a str.

cardinal_pythonlib.hash.twos_comp_to_signed(val: int, n_bits: int) int[source]

Convert a “two’s complement” representation (as an integer) to its signed version.

Parameters:
  • val – positive integer representing a number in two’s complement format

  • n_bits – number of bits (which must reflect a whole number of bytes)

Returns:

signed integer

See https://stackoverflow.com/questions/1604464/twos-complement-in-python