cardinal_pythonlib.hash
Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).
This file is part of cardinal_pythonlib.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Hash functions
In general, consider these hash functions:
- hash64(), using MurmurHash3 to provide a 64-bit integer: for fast INSECURE COMPARISON operations.
- an Hmac* class for SECURE cryptographic hashes.
Regarding None/NULL values (in CRATE):
For difference detection, it may be helpful to be able to compare a standard hash, in which case somehash(None) == somehash("None") == 'abcdefsomething'.
It is vital not to hash NULL patient IDs, though: for example, two different patients without an NHS number must not be equated by comparison on a hash of the (NULL) NHS number.
For anonymisation, this is handled in these functions:
crate_anon/anonymise/anonymise.py / process_table() -> crate_anon/anonymise/configfiles.py / Config.encrypt_master_pid() -> crate_anon/anonymise/patient.py / Patient.get_rid ... via PatientInfo.rid ... to Config.encrypt_primary_pid()
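The distinction between the two cases can be sketched as follows. This is an illustration only, not CRATE's code: the helper names hash_for_change_detection and hash_patient_id are hypothetical, and SHA-256 and HMAC-SHA256 from the Python standard library stand in for whichever hashes are actually configured.

```python
import hashlib
import hmac
from typing import Optional


def hash_for_change_detection(value: object) -> str:
    """Hypothetical helper: hash str(value), so None hashes like the string "None"."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def hash_patient_id(pid: Optional[str], key: bytes) -> Optional[str]:
    """Hypothetical helper: keyed hash of an ID, but never of a missing ID."""
    if pid is None:
        # A NULL ID stays NULL, so two patients without an ID
        # never end up sharing the same digest.
        return None
    return hmac.new(key, pid.encode("utf-8"), hashlib.sha256).hexdigest()


# For change detection, None and "None" deliberately collide.
print(hash_for_change_detection(None) == hash_for_change_detection("None"))
# For identifiers, a missing value yields no hash at all.
print(hash_patient_id(None, b"example-key"))
```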
- class cardinal_pythonlib.hash.GenericHmacHasher(digestmod: Any, key: str)
Generic representation of a hasher that hashes things via an HMAC (a hash-based message authentication code). See https://en.wikipedia.org/wiki/HMAC
HMAC hashers are the thing to use if what you are hashing is secret.
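As a rough illustration of what an HMAC hasher does, the sketch below uses only the Python standard library (hmac and hashlib) rather than this class itself. The function name hmac_hex is hypothetical, and encoding the key and message as UTF-8 is an assumption made for the example.

```python
import hashlib
import hmac


def hmac_hex(key: str, message: str, digestmod: str = "sha256") -> str:
    """Hypothetical helper: hex HMAC digest of a message under a secret key."""
    return hmac.new(
        key.encode("utf-8"), message.encode("utf-8"), digestmod
    ).hexdigest()


digest = hmac_hex("my secret key", "patient 12345")
print(digest)

# When checking a digest supplied by someone else, use a constant-time
# comparison rather than ==.
print(hmac.compare_digest(digest, hmac_hex("my secret key", "patient 12345")))
```

The same key and message always give the same digest; without the key, an attacker who sees digests should not be able to compute the digest of a new message.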
- class cardinal_pythonlib.hash.GenericSaltedHasher(hashfunc: Callable[[bytes], Any], salt: str)
Generic representation of a simple salted hasher that stores a hash function and a salt.
Note that these are vulnerable to attack: if an attacker knows a (message, digest) pair, they may be able to calculate another. See https://benlog.com/2008/06/19/dont-hash-secrets/ and https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.134.8430
You should use HMAC instead if the thing you are hashing is secret.
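For orientation, a simple salted hasher might look like the sketch below. The class name SaltedHasherSketch is hypothetical, and prepending the salt to the message is an assumption about how salt and message are combined, not a statement of what this library does.

```python
import hashlib
from typing import Any, Callable


class SaltedHasherSketch:
    """Hypothetical sketch: stores a hash function and a salt."""

    def __init__(self, hashfunc: Callable[[bytes], Any], salt: str) -> None:
        self.hashfunc = hashfunc
        self.salt_bytes = salt.encode("utf-8")

    def hash(self, raw: str) -> str:
        # Assumption: digest = H(salt + message).
        return self.hashfunc(self.salt_bytes + raw.encode("utf-8")).hexdigest()


hasher = SaltedHasherSketch(hashlib.sha256, salt="pepper")
print(hasher.hash("hello"))
```

With a construction of this H(salt + message) shape and a hash such as MD5 or SHA-256, a length-extension attack can let someone who knows one digest compute the digest of a longer message without knowing the salt, which is one reason to prefer HMAC for secret material.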
- class cardinal_pythonlib.hash.HmacMD5Hasher(key: str)
HMAC hasher based on MD5. (Even though MD5 is insecure, HMAC-MD5 is better. See Bellare M, Canetti R, Krawczyk H. Keying hash functions for message authentication. Lect. Notes Comput. Sci. Adv. Cryptol. - Crypto 96 Proc. 1996; 1109: 1–15.)
- class cardinal_pythonlib.hash.MD5Hasher(salt: str)
Salted hasher based on MD5.
MD5 is cryptographically FLAWED; avoid using it or this class.
- cardinal_pythonlib.hash.bytes_to_long(bytesdata: bytes) -> int
Converts an 8-byte sequence to a long integer.
- Parameters:
bytesdata – 8 consecutive bytes, as a bytes object, in little-endian format (least significant byte [LSB] first)
- Returns:
integer
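The conversion described above can be sketched with int.from_bytes from the standard library. The function name bytes_to_long_sketch is hypothetical, and treating the result as unsigned is an assumption, since the description does not say.

```python
import struct


def bytes_to_long_sketch(bytesdata: bytes) -> int:
    """Hypothetical helper: 8 little-endian bytes to an unsigned integer."""
    if len(bytesdata) != 8:
        raise ValueError("expected exactly 8 bytes")
    return int.from_bytes(bytesdata, byteorder="little", signed=False)


data = bytes([0x01, 0x02, 0, 0, 0, 0, 0, 0])
# Least significant byte first: 0x01 + (0x02 << 8) = 513
print(bytes_to_long_sketch(data))
# struct's "<Q" (little-endian unsigned 64-bit) gives the same value.
print(struct.unpack("<Q", data)[0])
```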
- cardinal_pythonlib.hash.compare_python_to_reference_murmur3_32(data: Any, seed: int = 0) -> None
Checks the pure Python implementation of 32-bit murmur3 against the mmh3 C-based module.
- cardinal_pythonlib.hash.compare_python_to_reference_murmur3_64(data: Any, seed: int = 0) -> None
Checks the pure Python implementation of 64-bit murmur3 against the mmh3 C-based module.
- cardinal_pythonlib.hash.hash32(data: Any, seed: int = 0) -> int
Non-cryptographic, deterministic, fast hash.
- cardinal_pythonlib.hash.hash64(data: Any, seed: int = 0) -> int
Non-cryptographic, deterministic, fast hash.
- cardinal_pythonlib.hash.murmur3_64(data: bytes | bytearray, seed: int = 19820125) -> int
Pure 64-bit Python implementation of MurmurHash3; see https://stackoverflow.com/questions/13305290/is-there-a-pure-python-implementation-of-murmurhash (plus RNC bugfixes).
- cardinal_pythonlib.hash.murmur3_x86_32(data: bytes | bytearray, seed: int = 0) -> int
Pure 32-bit Python implementation of MurmurHash3; see https://stackoverflow.com/questions/13305290/is-there-a-pure-python-implementation-of-murmurhash.
- cardinal_pythonlib.hash.pymmh3_hash128(key: bytes | bytearray, seed: int = 0, x64arch: bool = True) -> int
Implements 128-bit murmur3 hash, as per pymmh3.
- cardinal_pythonlib.hash.pymmh3_hash128_x64(key: bytes | bytearray, seed: int) -> int
Implements 128-bit murmur3 hash for x64, as per pymmh3, with some bugfixes.
- cardinal_pythonlib.hash.pymmh3_hash128_x86(key: bytes | bytearray, seed: int) -> int
Implements 128-bit murmur3 hash for x86, as per pymmh3, with some bugfixes.
- cardinal_pythonlib.hash.pymmh3_hash64(key: bytes | bytearray, seed: int = 0, x64arch: bool = True) -> Tuple[int, int]
Implements 64-bit murmur3 hash, as per pymmh3. Returns a tuple.
- cardinal_pythonlib.hash.signed_to_twos_comp(val: int, n_bits: int) -> int
Convert a signed integer to its “two’s complement” representation.
- cardinal_pythonlib.hash.to_bytes(data: Any) -> bytearray
Convert anything to a bytearray. See
- cardinal_pythonlib.hash.twos_comp_to_signed(val: int, n_bits: int) -> int
Convert a “two’s complement” representation (as an integer) to its signed version.
- Parameters:
- Returns:
signed integer
See https://stackoverflow.com/questions/1604464/twos-complement-in-python
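The two conversions (signed to two's complement, and back) can be sketched as below. These are illustrative stand-ins with hypothetical names ending in _sketch; the range checks are an assumption added for the example and may not match the library's behaviour.

```python
def signed_to_twos_comp_sketch(val: int, n_bits: int) -> int:
    """Hypothetical helper: signed integer to its n-bit two's complement pattern."""
    if not -(1 << (n_bits - 1)) <= val < (1 << (n_bits - 1)):
        raise ValueError("value does not fit in n_bits")
    # Masking maps negative values to 2**n_bits + val.
    return val & ((1 << n_bits) - 1)


def twos_comp_to_signed_sketch(val: int, n_bits: int) -> int:
    """Hypothetical helper: n-bit two's complement pattern to a signed integer."""
    if not 0 <= val < (1 << n_bits):
        raise ValueError("value is not an n_bits pattern")
    if val & (1 << (n_bits - 1)):
        # Sign bit set: the pattern represents val - 2**n_bits.
        return val - (1 << n_bits)
    return val


print(signed_to_twos_comp_sketch(-1, 8))    # 255
print(twos_comp_to_signed_sketch(255, 8))   # -1
print(twos_comp_to_signed_sketch(127, 8))   # 127
```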