BIP39 Seed Phrase / Mnemonic Generator in Python

Seed phrases are sequences of words that encode binary data; that data is used to seed the generation of private keys for crypto wallets, hence the name "seed phrases."
BIP39 is a standard for generating these seed phrases. It defines a list of 2048 words that are easy to distinguish, easy to remember, and sorted for efficient searching. A seed phrase can be 12, 15, 18, 21, or 24 words long.
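The allowed phrase lengths follow from simple arithmetic: BIP39 appends one checksum bit for every 32 bits of entropy and then encodes every 11 bits as one word. A minimal sketch of that calculation, where `word_count` is a hypothetical helper name chosen for illustration and not part of any library:

```python
def word_count(entropy_bits: int) -> int:
    """Return the number of words BIP39 produces for a given entropy size."""
    if entropy_bits % 32 != 0 or not 128 <= entropy_bits <= 256:
        raise ValueError("entropy must be a multiple of 32 between 128 and 256 bits")
    checksum_bits = entropy_bits // 32               # one checksum bit per 32 entropy bits
    return (entropy_bits + checksum_bits) // 11      # each word encodes 11 bits


for bits in (128, 160, 192, 224, 256):
    print(bits, "bits ->", word_count(bits), "words")
```

The five valid entropy sizes map to the five phrase lengths listed above.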
After a short setup, generating a seed phrase according to BIP39 takes the following 5 steps:
- Setup: Download the BIP39 English wordlist and save it in the project folder as wordlist.txt.
import secrets
import hashlib
wordlist_path = "./wordlist.txt"
with open(wordlist_path, "r") as f:
    wordlist = [w.strip() for w in f.readlines()]
strength = 128 # strength should be a multiple of 32 and between 128 and 256
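Before going further, it can be worth checking that the setup values are usable, since a truncated wordlist file or an unsupported strength would otherwise only surface as a confusing error later. The sketch below is one possible set of sanity checks; `check_wordlist` and `check_strength` are hypothetical helper names, not part of BIP39 or any library:

```python
def check_wordlist(words: list) -> None:
    """Raise ValueError if `words` cannot be a BIP39 wordlist."""
    if len(words) != 2048:
        raise ValueError(f"expected 2048 words, got {len(words)}")
    if any(not w for w in words):
        raise ValueError("wordlist contains an empty entry")
    if len(set(words)) != len(words):
        raise ValueError("wordlist contains duplicate entries")


def check_strength(strength: int) -> None:
    """Raise ValueError if `strength` is not a supported entropy size in bits."""
    if strength % 32 != 0 or not 128 <= strength <= 256:
        raise ValueError("strength must be a multiple of 32 between 128 and 256")
```

Calling `check_wordlist(wordlist)` and `check_strength(strength)` right after the setup code would stop the script early if either value is wrong.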
- Generating entropy (randomness): More entropy means better security, because it becomes less likely that someone will guess the exact seed phrase. The trade-off is a longer phrase: every additional 32 bits of entropy adds 3 words. The entropy should come from a cryptographically secure random number generator, such as the one provided by your operating system. BIP39 requires at least 128 bits of entropy, which provides a high level of security against brute-force attacks.
entropy = secrets.token_bytes(strength // 8) # as one byte is 8 bits
- Generating the checksum: The checksum is used to detect errors in the seed phrase. It consists of the first (entropy length / 32) bits of the SHA-256 hash of the entropy, appended to the end of the entropy. If the seed phrase is mistyped or copied incorrectly, the checksum will most likely not match, so the wallet can reject the phrase instead of deriving the wrong private keys. Because the checksum is only 4 to 8 bits long, it catches most errors but not all of them.
entropy_hash = hashlib.sha256(entropy).hexdigest()
seed_bin = (
    bin(int.from_bytes(entropy, byteorder="big"))[2:].zfill(len(entropy) * 8)
    + bin(int(entropy_hash, 16))[2:].zfill(256)[: len(entropy) * 8 // 32]
)
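The checksum step can also be written with integer arithmetic instead of string slicing, which some readers may find easier to follow. The sketch below is an alternative formulation and not the code used above; `checksum_bits` is a hypothetical helper name. It takes the SHA-256 digest as one 256-bit integer and shifts away everything except the leading checksum bits:

```python
import hashlib


def checksum_bits(entropy: bytes) -> str:
    """Return the BIP39 checksum of `entropy` as a string of '0'/'1' characters."""
    cs_len = len(entropy) * 8 // 32              # one checksum bit per 32 entropy bits
    digest = hashlib.sha256(entropy).digest()
    digest_int = int.from_bytes(digest, byteorder="big")
    top_bits = digest_int >> (256 - cs_len)      # keep only the leading cs_len bits
    return format(top_bits, "b").zfill(cs_len)


# 16 zero bytes is the entropy used in a well-known BIP39 test vector.
print(checksum_bits(bytes(16)))
```

For 128 bits of entropy the result is 4 bits long, and for 256 bits it is 8 bits long.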
- Splitting into groups of bits: BIP39 splits the entropy plus checksum into groups of 11 bits, since the BIP39 wordlist contains 2048 = 2¹¹ words. Each group of 11 bits therefore maps to exactly one word from the wordlist.
seed_bin_groups = []
for i in range(len(seed_bin) // 11):
    seed_bin_groups.append(seed_bin[i * 11 : (i + 1) * 11])
- Convert to decimal: The groups of 11 bits are converted to decimal numbers, which are used as indices in the BIP39 wordlist.
seed_dec_groups = list(map(lambda x: int(x, 2), seed_bin_groups))
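To make the splitting and conversion steps concrete, here is a small worked example on a hand-written bit string. It is only a sketch: `bits_to_indices` is a hypothetical helper name, and the 33-bit input is made up for illustration and is not a valid entropy-plus-checksum string.

```python
def bits_to_indices(bits: str) -> list:
    """Split a bit string into 11-bit groups and convert each group to an integer."""
    if len(bits) % 11 != 0:
        raise ValueError("bit string length must be a multiple of 11")
    return [int(bits[i : i + 11], 2) for i in range(0, len(bits), 11)]


# Three groups: the smallest index, a small index, and the largest index.
example = "00000000000" + "00000000011" + "11111111111"
print(bits_to_indices(example))  # [0, 3, 2047]
```

Every resulting index lies between 0 and 2047, so it is always a valid position in the 2048-word list.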
- Convert to Words: Finally, each decimal number is used to look up the corresponding word from the BIP39 wordlist, resulting in a list of words that make up the seed phrase.
seed_word_list = list(map(lambda x: wordlist[x], seed_dec_groups))
seed_phrase = " ".join(seed_word_list)
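The same steps can be run in reverse to check a phrase that a user typed in: look up each word's index, rebuild the bit string, split it back into entropy and checksum, and recompute the checksum. The sketch below illustrates this under a few assumptions. The function names are hypothetical, and `demo_wordlist` is a made-up stand-in of 2048 placeholder words so that the example runs without wordlist.txt; in real use you would pass the `wordlist` loaded during setup.

```python
import hashlib


def entropy_to_indices(entropy: bytes) -> list:
    """Return the BIP39 word indices for `entropy` (entropy bits plus checksum bits)."""
    ent_bits = len(entropy) * 8
    cs_len = ent_bits // 32
    digest = int.from_bytes(hashlib.sha256(entropy).digest(), "big")
    # Append the leading cs_len bits of the hash to the entropy.
    combined = (int.from_bytes(entropy, "big") << cs_len) | (digest >> (256 - cs_len))
    bits = format(combined, "b").zfill(ent_bits + cs_len)
    return [int(bits[i : i + 11], 2) for i in range(0, len(bits), 11)]


def is_valid_phrase(phrase: str, wordlist: list) -> bool:
    """Return True if every word is known and the embedded checksum matches."""
    words = phrase.split()
    if len(words) not in (12, 15, 18, 21, 24):
        return False
    try:
        indices = [wordlist.index(w) for w in words]
    except ValueError:                       # a word is not in the wordlist
        return False
    bits = "".join(format(i, "011b") for i in indices)
    cs_len = len(bits) // 33                 # 1 checksum bit per 32 entropy bits
    ent_bits = len(bits) - cs_len
    entropy = int(bits[:ent_bits], 2).to_bytes(ent_bits // 8, "big")
    # Re-encode the recovered entropy and compare, which also compares the checksum.
    return entropy_to_indices(entropy) == indices


# Stand-in wordlist for illustration only; NOT the real BIP39 words.
demo_wordlist = [f"w{i:04d}" for i in range(2048)]
phrase = " ".join(demo_wordlist[i] for i in entropy_to_indices(bytes(16)))
print(is_valid_phrase(phrase, demo_wordlist))
```

Looking words up with `wordlist.index` is a linear scan, which is fine for a sketch; a dictionary from word to index would be the usual choice if many phrases need checking.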
It's important to remember that BIP39 is only one way to make seed phrases, and that different wallets and apps may use different standards or different versions of the BIP39 standard. Even so, BIP39 is widely used and has become a de facto standard in the crypto community.