title txtIntroduction to the bookThere's a lot of excitement about Bitcoin and cryptocurrencies We hear about startupsinvestmentsmeetups, and even buying pizza with Bitcoin Optimists claim that Bitcoin will fundamentalroken and will suffer an inev politics around the world Pessimists claim Bitopayments, economics, and evenherentlyUnderlying these differing views is significant confusion about what Bitcoin is and how itwrote this book to help cut through the hype and get to the core of what makes BitcoinuniqueTo really understand what is special about Bitcoin, we need to understand how it works at atechnlevel Bitcoin truly is a new technology and we can only get so far by explaining it througanalogies to past technologiesWe'll assume that you have a basic understanding of computer science how computers work
structures and algorithms, and some programming experience If you're an undergraduate orgraduate student of computer science, a software developer, an entrepreneur, or ahobbyist, this textbook is for youIn this series of eleven chapters, we'll address the important questions about Bitcoin HowBitcoin work? What makes it different? How secure are your bitcoins? How anonymous arecoInusers? What applications can we build using Bitcoin as a platform? Can cryptocurrencies beregulated? If we were designing a new cryptocurrency today what would we change? Whatfuture hold?Each chapter has a series of homework questions to help you understand these questions at alevel We highly recommend you work through them In addition, there is a series of fiveprogramming assignments in which you'll implement various components of Bitcoin in simplifiedmodels If you're an auditory learner, most of the material of this book is also available assyou should also supplement your learninginformation you can find onlineBitcoin wiki, forums, and research papers, and by interacting with your peersBitcoin community
is this: if H has an n-bit output, then it can take any of 2" values Solving the puzzleequires finding an input so that the output falls within the set y, which is typically much smaller thaoftouts The size of y detehow hard the puzzle is If y is the setbit streas if Y has only 1 element the puzzle is maximally hard The fact thatpuzzle id has high min-entropy ensures that there are no shortcuts On the contrary, if a particularsomeone could cheat say bymputing a solution to the puzzleIf a search puzzle is puzzle- friendly this implies that there s no solving strategy for this puzzle which isve, we can do it this way as long as we can generate puzzle- lDs in a stse this idea later when we talk about bSHA-256 We've discussed three properties of hash functions and one appNow lets discuss a particular hash function that we re going to use a lot in this book there are lotshash functionxistence
but this is the oneuses primarlyts called sHA-256Recall that we require that our hasons workputs of arbitrary length Luckily, as longwe can build a hash function that works on fixed-length inputs there s a generic method to convert itrks on arbitrary-length inputs It's called the Merkle-Damgard transformSHA-256 is one of a number of commed hash functions that make use of this methodommon terminology the underlying fixed-length collision-resistant hash function is called thecompression function It has been proven that if the underlying compression function is collisionttrd tnple Say tiakes inputs of lenand produces an output of a smaller length n the input to the hash function, which can be of any sizedivided into blocksonstruction works as followse output of the preere is no previous block output, we instead use an initialization Vector (v) This number is reusedfor every call to the hash function and in practice you cala standards docuction that takes 768-bit input and produces 256-bit outputs, the
512 bitsessageMessage(block 1)(block 2)(block n256 bits256 bitsFigure 13: SHA-256 Hash Function(simplified/ SHA-256 uses the Merkle-Damgard transform to turnfixed-length colliscompressnto a hash functWeve talked about hash functions, cryptographic hasjons with special properties, applof those properties, and a specific hash function that we use in Bitcoin In the nextdiscuss ways of using hash functions to build more complicated data structures that are used12 Hash Pointers and data structuresto dhash pointers and thA hasha datamply a pointer to where some information is stored together with a cryptographic haFormation
Whereas a regular pointer gives you a way to retrieve the information, a hash pointeralso gives you a way to verify that the information hasnt changedH()igure 1 4 Hash pointer a hash pointer is a pointer to where data is stored together with aat data ae fixed point in til
shbuildds of data structuresitively we can take a familiar datastructureuses pointers such as a linkor a binary search tree and implement it witlpointers, instead of pBlock chaiFigure 15, we bked list using hash ps Were goingstructure a block chain Whereas asegular linked list where you have a series of blocks, eaclblock has dataointer to the previous block in the list in a block chain theblockeaof thprevious block was, but it also contains a digest of that value that allows us to verify that the valuee store the headmost recent data blockH(1)prev: H()prev:H()I prev: H(dataatadataFigure 15 Block chain
a block chain is a linked list that is built with hash pointers instead of pointersa use case for a block chain is a tamper-evident log that is, we want to build a log dataof data, andthe log, we' re going to detect ito understand why a block chaves this tamper-evident property let's ask what happens if aadversary wants to tamper with data thats in the middSpecifically, the adversarysdshblock chain wont be able to detect the tampering To achieve this goal, the adversary changesdata of some block k Since the data has bianged the hash in block k+1ash will not mate altered content since the hash function is collision resistant and so we widetect the inconsistencyock k and theup this change by changing thee list Specifically, as long as we store the hash pointer at the head of the list in a place where thethe adversary will bedetected
The upshot of this is that if the adversary wants to tamper with data anywhere in thisder to keep the story consistent he's going to have to tamper witlters all the wayback to the beginning and he s ultimately going too a roadblock becaont be ablemper with the head of the list Thus it emerges that by just rerwe' ve essentially remembered a tamper-evident hash of the entire list So we can build a block chainwe will call the genesis blockat the block chain construction is similar to the merkle-Damgard constructs section Indeed, they are quite similar and the same security argumentapplies to both of thempre H(preAH()atadataFigure 16 Tamper-evident log
If an adversary modifies data anywhere in the block chain, it will resulttheg block being incIf we store the head ofhe adversary modifies all of the pointers to be consistent with the modified data, the head pointedetect the tampeMerkle trees Another useful data structure thatd using hash pointers is a binary tree Anary tree with hash pointers is known as a merkle tree, after its inventor Ralph Merkle Suppose wehave a number of blocks containing data theese data blocks into pairs of two and then for each pair we build a data structure that has two hasrs, one toof thks These data structures makegroup these into groups of two, and forair create a new data structure that contains thehash of each We continue doing this until we reach a single block, the root of the tree
H(1)H(1)H(1)H(1)H(1)H(H()H(、)H()H(1)H()H(1))H(、)(data(data)(data)(dataFigure 1 7 Merkle tree In a Merkle tree data blocks are groupedof each ofese blocks is stored in a parent node the parent nodes are in turn grouped in pairs and their hashesstored one level up the tree
This continues all the way up the tree until we reach the root nodeAs before, we remember just the hash pointer at the head of the tree We now have the abilityaverse down through the hash pointers to any point in thedata hasn't been tampered wicause, use block chaers with some data block at the bottom ofe the hashter thats oneot match, and even if he continues to tamper with this block, the change will eventuallypropagate to the top of the tree where he won t be able to tamper with the hash pointer thatstored So again, any attempt to tamper with any piece of data will be detected by just rememberingopProof of membership Another nice feature of merkle trees is that unlike the block chain that we builtoof oprove that a certae remember justoot then tocks on the path from the data block to theof the tree, as the blocks on this path are enough to allow us to verify the hashes all the way up to therksthere are n nodes in the tree only about log n)items need to be shown and since each step justrequires computing the hash of the child block, it takes about log(n time for us to verify it and sober of blockselatively short time, Verification thus runs in time and space that are loga
nodes in the treeH(1)H()H()H(1)Figure 1 8 Proof of membershipe that a data blockshow the blocks in the path from that data block to the rootA sorted merkle tree is just a merkle tree where we take the blocks at the bottom and we sort thesing some ordering function this can be alphabetical, lexicographical order, numerical order, orroof of non-membership with a sorted Merkle tree, it becomes possible to verify non-membershipogarithmic time and space that is, we can prove that a particular block is not in the merkle treequestion would be and showing the path to the item that is just after where it would be
If these twobe between the two items shown but therebetweeWe've discussed using hash pointers in linked lists and binary trees, but more generally, it turnshat we can use hash pointers in any pointer-based data structure as long as the data structuredoesn't haveIf thdata structure thenbe abakehashes match up If you think aboutacyclic data structure, we can starteaves, othat don't have any pointt of theompute the hashesd the beginning but in a structure with cycles, there s no end we canstart with and compute back frobuild a directedof hash pointers an
be able to verify membership in that graph very efficiently and it will be easy to compute Using hashpointers in this manner is a general trick that you' ll see time and agthe context of thedistributed data structures and throughout the algorithmuss later in this chapter and13 Digital Signaturesis section, we'll look at digitalisIs the secog withhash functioatblocks forytonproperties from digital signatures that correspond well to the handwritten signature analogy firstlymakeify that it's valide signature to be tied to a particular document so that the signature cannot be used to indicateproperty is analogous to assuring that somebody can't take your signature and snip it off onethe bottom of another oneow can we build this in a digital form using cryptography Firstprevious intuitivediscussion slightly more concrete
This will allow us to reason better about digital signature schemesss their security propertiesDigital signature scheme a digital sith(sk, pk): generateKeys(keysize) The generateKeys method takes a key size and generatesThekey sk is kesig:=sign(sk, message)The sign method takes a message, msg, and a erify yourrybody Anyosecret key, sk, asut and outputs a signature for the msg under skify(pk, message, sig) The verify method takes a message, a signature and ae isvalidbe truekey pk, and fovalid signatures must verifye))Signatures are existentially unforgeableWe note that generateKeys and sign can be randomized algorithms Indeed generateKeys had bettered because it ought to be generating different keys for different people verieow examine the two properties that we require of a digital signature scheme in more d
etaie first property is straightforward -that valid signatures must verify If I sign a message with sk, mysecret key and someone later tries to validate that signature over that same message using my pubChapter 1: Introduction to Cryptography Cryptocurrenciesto control supply andrtiesfiat cus control the money supply anese secussible to counterfeit Ultimately law enforceakie rules ohat prevent people frpeople If Alice convinces bob that she paid him a digital cple, she should not be able toa centragests, cryptocurrencies make heavy use of cryptography Cryptography piinto a mathematical protocol
Before we can properly understand cryptocurrenciesCat are notoriously subtle and complicated to understand Fortunately Bitcoin only relies on aecifically study cryptographic hashes and digital signatures, two primitives that prove to be veryintroduce more complicated cchofs that are used in proposed exted modificationOnce weve learnt the necessary cryptographicves, we'l discuss some of thebuild cryptocurrencies We'll complete this chapter with some examples of11 Cryptographic Hash Functionse first cryptographic primitive that we'll need to understand is a cryptographic hash function Ahash functionathematical function with the followbeofzefoIt is efficiently computable intuitively this means that for a giveut string, you
tput of thet of timputing the hash of an n-bit string should have a running time that is o(n)on cryptographic hash fube cryptographically secure, were goinguire that it has the following three additionWe'll look more closely at each of these properties to gain an understanding of why it's usefulfunction that behaves that way the reader who has studied cryptography should be aware that theatment of hash functhis book is a bit differentstandard cryptography textbook Thenctions, but one that will be useful for cryptocurrencies specificallyProperty 1: Collision-resistance The first property that we needat it's collision -resistant
acccurs when two distincproduce the same output AHsistant if nobodyFormallyCollision-resistance: Ato be collision resistant if itsible to find twevalues, x and yatx≠y,yetH(x)=H(yFigure 11 A hash collision x and y are distinct values, yetnput into hash function H, theyuce thenotice that we said nobody can find a collision, but weno collisions exist Actually, weknow for a fact that collisions do exist ande this by a sistringstings of a specific fixed length because the input space is larger than the output space(indeed tlame output string In fact by the pigeonhole principle there will necessarily be a very large numbe
possible outputspossible inputsFigure 12 Because the nue number of outputs we are guaranteedere must be at least one output to which the hash function maps more than one inputNow, to make things even worse we said that it has to be impossible to find a collision Yet there areed to find a collision Consider themethod for finollision for a hash function with a 256-bit output size: pick 2distinct values, compute thehashchd check if thtwo outputsuts than possible outputs, some pair of them must collide when you apply the hash functioe method above is guaranteed to find a collision but if we pick random inputs and compute theith high probability long beforning 2+1 inputs In fact ifwe randomly choose just 2+1 inputs, it turns out theres a 99
8% chance that at least two of themare going to collide The fact that we can find a collision by only examining roughly the square root ofbility known as the birthdayparadox In the homework questions at the end of this chapter we will examine this in more detacollision-detection algorithm works for every hash function But, of course, the problem with it isong ti6tpnction 2250+1 times in the worst case and about 22 tiputer calculates 10,000 hdde bybeof tlodds that tlfoundds that the earthmeteore have thus seen a generalpractical algorithm to find a collisy hash function Auld be useddeh theision deteithnot feasible to use, there still may be some other algorithm that can efficiently find a collision for afic hash functConsider for
H(x)=x modfunction meetss of a hash function as it as inputs offixed sized output(256 bits), and is efficiently computable But this function also has an efficientmethod for finding a collision Notice that this function just returns the last 256 bits of the input Onewould be the valuesgeneric collision detection method is not usable in practice there are at least some hash functions foron method does existsuch measistant However there are no hasctions proven to besistant The cryptographichash functions that we rely on in practice are just furwhich pd collisions and haven 't vet succeeded In some cases such as the old md5 hash functionntually found after yehe function to be deprecated andphased out of practical use And so we choose to believe that those are collision resistantge digests Now that we know what coWhat is colltance useful for HeIf we know that two inputs x andollision-resistant hash function h have different hashes then it 's safe to assume that x and y aredifferent- if someone knew an x and y that were different but had the same hash that would violateour assumption that h is collision resistantis argument allows us to use hash outputs as a message digest Consider Secure Box,authenticated online file storage system that allows users to upload files and ensure their integritythat afile, and wants to be ablerifyater that the file she downloads is the same as the one she uploads
one way to do that would be tosaveo the file she downloads while this workss integrity, she can just use the local copy directlyCollision-free hashes pt and efficient solutiAlice iust needs toomputes the hash of the downloaded file and compares it to the one she stored If the hashes arede thatdeed the one she uploaded, but if they are differenthen Alice can conclude that the file has been tampered with remembering the hash thus allows herdetect accidental corruption of the file during transmission or on Secure Box's servers but alsontentional modification of the file by the server Such guarantees in the face of potentially maliciousbehavior by other entitiesThe hash serves as a fixed length digest or unambiguous summary of a message this gives us a veryefficient way to remember things we' ve seen before and recognize them again whereas the efile might have been gigabytes long the hash is of fixed length, 256-bits for the hle This greatly reduceser and throughout thebook, we'll see applications for which it's useful to use a hash as a message dige
Property 2: Hiding The second property that we want from our hash functions iss hidinghiding property asserts that if we re given the output of the hash function y H(x), theres no feasiay to figthe inputform Consider the following simple example: we re going to do an experiment where we flip a coire result of the coin flip was heads were going to announce the hash of the string"headsfigure out what the string was that was hashed (we 'll soon see why we might want to play games liketring"tails", and they could see which one they were given Andust a couple steps, they caless what the string was because there were only two possible values ofx, and it was easy for the adversary to just try both of them in order to be able to achieve the hidingperty it needs to be the case that there s no value of x which is particularly likely thaat' s, in some sense, very spread out If x is chosen from such a set, this methodThe big question is: can we achieve the hiding property when the values that we want do not comefrom a spread out set as in orrhaput thatad out byspread out We can now be slightly more precisewe meag(the double vertiadenotes concatenation)Hiding a hash fundhiding if: when a secret value r is chosen from a probability distributionhat has high min-entropy then given h(r/x) it is infeasible to find xformation-theory, min-entropy is a measure of how predictable an outcd higlmin-entropy captures the intuitive idea that the distributionom the distribution thehat s likely to occur So, for a concrete example if r is chosen uniformly from among all of the stringsat are 256 bitsstring was chosen with probability 1/2256, wlApplication: Commitments
Now lets look at an application of the hiding property In particular, whato dod a commitment a commitment is the digitaof taking aen you do that, you' ve committed yourself to what's inside the envelope But you haven 't openeven though you've committed to a value the value remains a secret from everyone else Later,you can open the envelope and reveal the value that you committed to earlie
sts of(com): =commit(msg, key) The commit function takes a message and secret key as inputd returns a commitmeValid verify(com, key, msg) The verify function takes a commitment key, and messageas input It returns true if the ceWBinding: For any value of key, it is infeasible to find two messages, msg and msg'such thamsg msgand verify(commit(msg, key), key, msg)==trueeme, one commits to a value and publishes the commitmertage is analogous to putting the sealed envelope on the table at a later point if they want to reveay publish the key key and the value, msgcan verify that msg was indeed the value committed to earlier this stage is analogous to opening upe envelopeerties dictate that thvelope First, given com, the commitment, someone looking at the envelope cae message is The second property is that it's binding That when you commit to what's in thevelope, you cant change your mind later
That is, it's infeasible to find two different messages, suchat you can commle message, and then later claim that you committed to anotherSo how do we know that these two properties hold? Before we can answer this, we need to discussent scheme We can do so using a cryptograowing commitment schemcommit(msg): =(H(key /msg), keyo where key is a random 256-bit vi· verify(com, key msg):= true if H(key∥com; false otherwiseenerate a random 256-bihich will serve as the ked then we return the hash of thehe message and they re going tomequal to the commitment that they sawofhe instantiation of commit and verify as well as h(key l msg) for com, then these propertiesHkey∥msg, infeasib|eBinding: For any value of key, it is infeasible to find two messages, msg and msg such thatmsg≠ msg'and H(key∥msg)=H(key∥msg
The hiding property of commitments is exactly the hiding property that we required for our haslfunctions If key was chosen as a random 256-bit value then the hiding property says that if we hafeasiblehe message from the hashput And it turns out that the binding property is implied by the collision-resistant property of thehash functithe hash functioon-resistant then it will be infeasible toinct values msg and msg' such that h(key msg) =h(key msg) since such values woulde, if h is a hash function that is collisstant and hiding this commitment schee sense that it will have the necessary security propertiesProperty 3: Puzzle friendliness The third security property weto need from hash functions isuzzle- friendly
is a btedfirst ewhatproperty is usefulPuzzle friendliness a hash function H is said to be puzzle- friendly if for every possible n-bit outputif k is chosen from a distribution with high min-entropy then it is infeasibley in time significantly less than 2tuitively,this means is that ifts to target the hash function to come out to someparticular output value y that if there's part of the input that is chosen in a suitably randomized wayt’difffinder value that hits exactly that targetApplication: Search puzzle Now,property In this application, we're going to build a search puzzle a mathematical problem whiche in order to find theshortcuts That is there's no way to find a valid solution other than searching that large spaceSearch puzzle Aonsists ofthe puzzle-/D), chosench thatH(id‖x)∈t's possible that yofind a colthe form hike∥H(key∥