cardinal_pythonlib.hash
Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).
This file is part of cardinal_pythonlib.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Hash functions
In general, consider these hash functions:
- hash64(), using MurmurHash3 to provide a 64-bit integer: for fast INSECURE COMPARISON operations.
- an Hmac* class for SECURE cryptographic hashes.
Regarding None/NULL values (in CRATE):
For difference detection, it may be helpful to be able to compare a standard hash, in which case somehash(None) == somehash("None") == 'abcdefsomething'.
It is vital not to hash NULL patient IDs, though: for example, two different patients without an NHS number must not be equated by comparison on a hash of the (NULL) NHS number.
For anonymisation, this is handled in these functions:
crate_anon/anonymise/anonymise.py / process_table() -> crate_anon/anonymise/configfiles.py / Config.encrypt_master_pid() -> crate_anon/anonymise/patient.py / Patient.get_rid ... via PatientInfo.rid ... to Config.encrypt_primary_pid()
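The rule about NULL identifiers can be sketched as follows. This is a hypothetical illustration using only the Python standard library; the helper name `hash_patient_id` and the example key are assumptions and are not part of cardinal_pythonlib or CRATE.

```python
import hashlib
import hmac
from typing import Optional


def hash_patient_id(pid: Optional[str], key: bytes) -> Optional[str]:
    """Hypothetical helper: HMAC-SHA256 hex digest, or None for a NULL ID."""
    if pid is None:
        # Never hash a missing identifier: every NULL would share one digest,
        # so unrelated patients would appear to match.
        return None
    return hmac.new(key, pid.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-key-for-illustration-only"
print(hash_patient_id(None, key))  # None, not a digest
same = hash_patient_id("1234567890", key) == hash_patient_id("1234567890", key)
print(same)  # True: equal IDs give equal digests under the same key
```

Passing None straight through keeps a missing value distinguishable from a real one in any later comparison.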
class cardinal_pythonlib.hash.GenericHmacHasher(digestmod: Any, key: str)
Generic representation of a hasher that hashes things via an HMAC (a hash-based message authentication code). See https://en.wikipedia.org/wiki/HMAC
HMAC hashers are the thing to use if what you are hashing is secret.
Parameters:
- digestmod – see hmac.HMAC.__init__()
- key – cryptographic key to use
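As a rough sketch of what an HMAC-based hasher computes, the standard library's `hmac` module can be used directly. The helper `hmac_hex` below is hypothetical and is not this class's implementation. It assumes the key and message are UTF-8 encoded before hashing.

```python
import hashlib
import hmac


def hmac_hex(key: str, message: str, digestmod=hashlib.sha256) -> str:
    """Hypothetical helper: keyed digest of a message, as a hex string."""
    return hmac.new(
        key.encode("utf-8"), message.encode("utf-8"), digestmod
    ).hexdigest()


digest = hmac_hex("secret-key", "some message")
# compare_digest avoids leaking information through comparison timing.
print(hmac.compare_digest(digest, hmac_hex("secret-key", "some message")))
```

The same message hashed under a different key gives an unrelated digest, which is what makes the construction suitable for secret inputs.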
class cardinal_pythonlib.hash.GenericSaltedHasher(hashfunc: Callable[[bytes], Any], salt: str)
Generic representation of a simple salted hasher that stores a hash function and a salt.
Note that these are vulnerable to attack: if an attacker knows a (message, digest) pair, they may be able to calculate another. See https://benlog.com/2008/06/19/dont-hash-secrets/ and https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.134.8430
You should use HMAC instead if the thing you are hashing is secret.
Parameters:
- hashfunc – hash function to use
- salt – salt to use (following UTF-8 encoding)
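For illustration, a simple salted hash might look like the sketch below. The helper `salted_sha256_hex` is hypothetical. Prepending the salt to the message is an assumption about the construction, not a statement of how this class combines them.

```python
import hashlib


def salted_sha256_hex(salt: str, message: str) -> str:
    """Hypothetical helper: digest of salt + message, both UTF-8 encoded."""
    return hashlib.sha256((salt + message).encode("utf-8")).hexdigest()


print(salted_sha256_hex("some-salt", "some message"))
```

With hash functions such as MD5 or SHA-256, a digest of the form hash(secret + message) is open to length-extension attacks, which is the weakness the note above describes.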
class cardinal_pythonlib.hash.HmacMD5Hasher(key: str)
HMAC hasher based on MD5. (Even though MD5 is insecure, HMAC-MD5 is better. See Bellare M, Canetti R, Krawczyk H. Keying hash functions for message authentication. Lect. Notes Comput. Sci. Adv. Cryptol. - Crypto 96 Proc. 1996; 1109: 1–15.)
class cardinal_pythonlib.hash.MD5Hasher(salt: str)
Salted hasher based on MD5.
MD5 is cryptographically FLAWED; avoid using it or this class.
cardinal_pythonlib.hash.bytes_to_long(bytesdata: bytes) → int
Converts an 8-byte sequence to a long integer.
Parameters: bytesdata – 8 consecutive bytes, as a bytes object, in little-endian format (least significant byte [LSB] first)
Returns: integer
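The little-endian convention can be illustrated with the built-in `int.from_bytes`. The helper `le_bytes_to_int` is a hypothetical stand-in, not the library function. Treating the result as unsigned is an assumption.

```python
def le_bytes_to_int(bytesdata: bytes) -> int:
    """Hypothetical helper: interpret 8 little-endian bytes as an unsigned int."""
    if len(bytesdata) != 8:
        raise ValueError("expected exactly 8 bytes")
    return int.from_bytes(bytesdata, byteorder="little", signed=False)


# The first byte is the least significant one.
print(le_bytes_to_int(b"\x01\x00\x00\x00\x00\x00\x00\x00"))  # 1
print(le_bytes_to_int(b"\x00\x01\x00\x00\x00\x00\x00\x00"))  # 256
```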
cardinal_pythonlib.hash.compare_python_to_reference_murmur3_32(data: Any, seed: int = 0) → None
Checks the pure Python implementation of 32-bit murmur3 against the mmh3 C-based module.
Parameters:
- data – data to hash
- seed – seed
Raises: AssertionError – if the two calculations don’t match
cardinal_pythonlib.hash.compare_python_to_reference_murmur3_64(data: Any, seed: int = 0) → None
Checks the pure Python implementation of 64-bit murmur3 against the mmh3 C-based module.
Parameters:
- data – data to hash
- seed – seed
Raises: AssertionError – if the two calculations don’t match
cardinal_pythonlib.hash.hash32(data: Any, seed: int = 0) → int
Non-cryptographic, deterministic, fast hash.
Parameters:
- data – data to hash
- seed – seed
Returns: signed 32-bit integer
cardinal_pythonlib.hash.hash64(data: Any, seed: int = 0) → int
Non-cryptographic, deterministic, fast hash.
Parameters:
- data – data to hash
- seed – seed
Returns: signed 64-bit integer
cardinal_pythonlib.hash.murmur3_64(data: Union[bytes, bytearray], seed: int = 19820125) → int
Pure 64-bit Python implementation of MurmurHash3; see https://stackoverflow.com/questions/13305290/is-there-a-pure-python-implementation-of-murmurhash (plus RNC bugfixes).
Parameters:
- data – data to hash
- seed – seed
Returns: integer hash
cardinal_pythonlib.hash.murmur3_x86_32(data: Union[bytes, bytearray], seed: int = 0) → int
Pure 32-bit Python implementation of MurmurHash3; see https://stackoverflow.com/questions/13305290/is-there-a-pure-python-implementation-of-murmurhash.
Parameters:
- data – data to hash
- seed – seed
Returns: integer hash
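To show the shape of the algorithm, here is a minimal pure Python sketch of the 32-bit x86 MurmurHash3 variant. It is written from the public algorithm description and is not the library's code. The name `murmur3_32_sketch` is hypothetical. Returning an unsigned value is a choice made for this example.

```python
def murmur3_32_sketch(data: bytes, seed: int = 0) -> int:
    """Hypothetical sketch: unsigned 32-bit MurmurHash3 (x86_32) of data."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF

    def rotl(x: int, r: int) -> int:
        # 32-bit left rotation
        return ((x << r) | (x >> (32 - r))) & mask

    def mix_k(k: int) -> int:
        # Scramble one block before it is folded into the state
        k = (k * c1) & mask
        k = rotl(k, 15)
        return (k * c2) & mask

    h = seed & mask
    n_blocks = len(data) // 4

    # Body: consume the input four bytes at a time, little-endian
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        h ^= mix_k(k)
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: the remaining 1 to 3 bytes, if any
    tail = data[4 * n_blocks:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))

    # Finalization: fold in the length, then avalanche the bits
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


print(hex(murmur3_32_sketch(b"some data", seed=0)))
```

Every intermediate value is masked to 32 bits because Python integers do not overflow on their own.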
cardinal_pythonlib.hash.pymmh3_hash128(key: Union[bytes, bytearray], seed: int = 0, x64arch: bool = True) → int
Implements 128-bit murmur3 hash, as per pymmh3.
Parameters:
- key – data to hash
- seed – seed
- x64arch – is a 64-bit architecture available?
Returns: integer hash
cardinal_pythonlib.hash.pymmh3_hash128_x64(key: Union[bytes, bytearray], seed: int) → int
Implements 128-bit murmur3 hash for x64, as per pymmh3, with some bugfixes.
Parameters:
- key – data to hash
- seed – seed
Returns: integer hash
cardinal_pythonlib.hash.pymmh3_hash128_x86(key: Union[bytes, bytearray], seed: int) → int
Implements 128-bit murmur3 hash for x86, as per pymmh3, with some bugfixes.
Parameters:
- key – data to hash
- seed – seed
Returns: integer hash
cardinal_pythonlib.hash.pymmh3_hash64(key: Union[bytes, bytearray], seed: int = 0, x64arch: bool = True) → Tuple[int, int]
Implements 64-bit murmur3 hash, as per pymmh3. Returns a tuple.
Parameters:
- key – data to hash
- seed – seed
- x64arch – is a 64-bit architecture available?
Returns: tuple of integers, (signed_val1, signed_val2)
Return type: tuple
cardinal_pythonlib.hash.signed_to_twos_comp(val: int, n_bits: int) → int
Convert a signed integer to its “two’s complement” representation.
Parameters:
- val – signed integer
- n_bits – number of bits (which must reflect a whole number of bytes)
Returns: two’s complement version
Return type: unsigned integer
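One way to picture this conversion is bit masking, sketched below. The helper `to_twos_complement` is hypothetical and is not the library's implementation. The range check is an assumption about sensible behaviour.

```python
def to_twos_complement(val: int, n_bits: int) -> int:
    """Hypothetical helper: unsigned n_bits-wide encoding of a signed int."""
    lowest = -(1 << (n_bits - 1))
    highest = (1 << (n_bits - 1)) - 1
    if not lowest <= val <= highest:
        raise ValueError(f"{val} does not fit in {n_bits} signed bits")
    # Masking maps negative values onto the upper half of the unsigned range.
    return val & ((1 << n_bits) - 1)


print(to_twos_complement(-1, 8))    # 255
print(to_twos_complement(-128, 8))  # 128
print(to_twos_complement(5, 8))     # 5
```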
cardinal_pythonlib.hash.to_bytes(data: Any) → bytearray
Convert anything to a bytearray.
See
cardinal_pythonlib.hash.twos_comp_to_signed(val: int, n_bits: int) → int
Convert a “two’s complement” representation (as an integer) to its signed version.
Parameters:
- val – positive integer representing a number in two’s complement format
- n_bits – number of bits (which must reflect a whole number of bytes)
Returns: signed integer
See https://stackoverflow.com/questions/1604464/twos-complement-in-python
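The reverse conversion can be sketched by testing the sign bit. The helper `from_twos_complement` is hypothetical and is not the library's implementation. The range check is an assumption.

```python
def from_twos_complement(val: int, n_bits: int) -> int:
    """Hypothetical helper: signed value of an unsigned n_bits-wide encoding."""
    if not 0 <= val < (1 << n_bits):
        raise ValueError(f"{val} is not an unsigned {n_bits}-bit value")
    if val & (1 << (n_bits - 1)):
        # Sign bit set: the value represents val - 2**n_bits.
        return val - (1 << n_bits)
    return val


print(from_twos_complement(255, 8))  # -1
print(from_twos_complement(128, 8))  # -128
print(from_twos_complement(127, 8))  # 127
```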