How long does it take a new Bitcoin full node to sync to the blockchain in 2016? Does libsecp256k1 speed up the sync?

Bitcoin Core version 0.12.0 shipped with a new, heavily optimized elliptic curve math library, libsecp256k1. Prior to the release, there were whispers that the new library would significantly speed up the blockchain sync – a sync which had been taking me 12-24 hours to complete. Excited by the potential speed-up, I decided to benchmark the sync and talk about it on reddit live. Needless to say, I was amazed at the difference. Thanks to the core developers for this!


http://i.imgur.com/obcLjDD.png

These are good questions, so I wrote a small bash script that gathers some data about the sync time of the new release candidate, whose optimized library for elliptic curve operations has been rumored to give up to a “700%” increase in the speed of verifying transactions. Disclaimer: it’s not wonderfully modular or well thought out; I wrote it quickly just to see how the performance was.

So let’s benchmark Bitcoin Core with the new libsecp256k1 to see this speed-up ourselves. Here is a script for making a log and getting a bird’s-eye view of the sync process. Spoiler: it was really, really fast.
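The actual benchmark was a quick-and-dirty bash script (it’s on github along with the data). For illustration only, here is a minimal Python sketch of the same idea – poll the running node with bitcoin-cli getblockcount at a fixed interval and log a timestamped block height, which is enough to plot the sync progress afterwards. The file name and polling interval are arbitrary choices for the sketch:

# Minimal sketch of the logging idea (the real benchmark was a bash script;
# this is just an illustration). Polls a running bitcoind via bitcoin-cli
# and appends "unix_time,block_height" lines to a log file.
import subprocess
import time

LOG_FILE = "sync_progress.csv"   # arbitrary output path for this sketch
POLL_SECONDS = 60                # sample once per minute

def get_block_count():
    """Ask the local bitcoind how many blocks it has processed so far."""
    out = subprocess.check_output(["bitcoin-cli", "getblockcount"])
    return int(out.strip())

if __name__ == "__main__":
    while True:
        try:
            height = get_block_count()
            with open(LOG_FILE, "a") as f:
                f.write("%d,%d\n" % (int(time.time()), height))
        except subprocess.CalledProcessError:
            pass  # node still starting up, or RPC temporarily unavailable
        time.sleep(POLL_SECONDS)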

Motivation

This is the kind of thing I had always thought about doing, but never did. It’s simple enough, but it still takes time to write a useful (albeit simple) script that gives you the data you want. Not only that, but previously, when syncs were taking days to complete, this sort of test would eat up my machine.

For as long as I have been involved with bitcoin, I have known the user base to be (I love you guys) …well, prone to getting carried away with fear of the impending doom of BTC – especially when that doom is to be brought about by some physical limitation of the client, its capacity, or its scalability.

In fact, back in 2012-2013, the total size of the blockchain was all the rage, and the buzzword of the times was “blockchain pruning”. There was a lot of media focus on the problem, and developers and theorists alike put forth a concerted effort to find a way to reduce the very scary, problematic growth rate of the block chain!

That’s right: everyone was talking about the block chain already being too large on disk, and growing too fast. How bloody ironic, eh? I won’t say it. Don’t even say it!

Over time I have built up a thick skin for these mobs of fear and uncertainty. I just keep using my coins and take the doomsday scenarios with less than a grain of salt, because I have seen bitcoin overcome obstacle after obstacle so many times that it becomes almost illogical to be anything but calm through it all, and to feel as though we are in good hands with the developers (we are).

But node sync time has been a subtle concern of mine for many months – not because I read about it on www.scary-bitcoin-news.ninja, but because I watched the sync time grow over a long period and become a real hassle for me as a casual node operator.

In early 2013, it took “pretty damn long” to bootstrap a new node. But I was able to run a full node that year, all year, on a mid-level laptop that I would wipe and re-sync every few months. By the end of 2014 I was no longer running the laptop node, and I noticed long sync times on my main high-performance monster of a desktop. It took so much longer, in fact, that I considered not running the node at all, which would have really sucked for me and put a big damper on the way I had gotten used to using bitcoin. By the end of 2015 I was trying to avoid a fresh node sync at all costs, which is hard for a man who re-installs his OS every few months. The sync would take between 36 and 96 hours on either of my computers, depending on how uninterrupted I could keep the process running, the state of my internet service, and other variables.




It dawned on me at some point that if I was dreading a sync, I couldn’t have been the only one. And so I did consider whether node sync time was a real deterrent to running a node.

However, I decided to go for it, because I figured that if the figures I had heard were true and this library gave a significant speedup, the sync might take less than a day. When I inquired in the IRC channel, I was told it “could be as fast as 4 hours.” Hot damn – this I had to see. So I finally did it. Result: so happy! It finished in a blazing 4 hours, 4 minutes, and 4 seconds! Sweet victory for node operators and bitcoin as a whole!

This truly made me very happy, as it served my purposes and, in no small part, the community as a whole. I was stoked, to such a degree that I had to spread the joy around the community, encouraging others to take note of the speedy sync time by benchmarking it themselves. I made a reddit post and put the code on github, along with the data from my 4-hour full bootstrap of Bitcoin Core.

Well anyway, I enjoyed this little treat from the devs, and hope y’all do as well.  Thanks guys! Long live the core!




Bitcoin data manipulation and plotting in Python

 

plot.png – https://raw.githubusercontent.com/Altoidnerd/bitcoin-price/master/plot.png
I wrote a Python mini-API to manipulate and visualize the historical price of bitcoin using the Coinbase version 1 API.

bitcoin-price (python module)

Get bitcoin price data from the coinbase API and parse it into an easily manipulated form.

>>> from fast_dump_v22 import *
>>> get_page(1)

Will return all the data on page 1 of coinbase’s bitcoin price API (this is the latest data).
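For context, here is a rough sketch of what a function like get_page() might do under the hood. It assumes Coinbase’s old v1 historical-prices endpoint, which served pages of comma-separated timestamp,price rows (newest first, a thousand rows per page); the actual implementation in fast_dump_v22 may differ.

# Illustrative sketch only -- the real get_page() in fast_dump_v22 may differ.
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

# Assumed (long-deprecated) Coinbase v1 endpoint returning CSV pages.
V1_HISTORICAL_URL = "https://api.coinbase.com/v1/prices/historical?page=%d"

def get_page_sketch(page_number, show=False):
    """Fetch one page of historical prices as a raw price_data string."""
    if show:
        print("... getting page %d" % page_number)
    return urlopen(V1_HISTORICAL_URL % page_number).read().decode("utf-8")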

You can almost always turn on optional debug statements

>>> get_first_N(3, show=True)
... fetching first 3

You can get all the price data that coinbase has

>>> get_all(show=True)
... getting page 1
... getting page 2
... getting page 3
... getting page 4
... getting page 5
... getting page 6
... getting page 7
... getting page 8
... getting page 9
... etc ...

All the ‘get_*’ functions return a price_data string, which is interlaced timestamps and prices littered with newlines and commas. You can print them to see what is going on more clearly:

>>> print(get_page(1))
2015-09-11T06:44:04-05:00,241.14
2015-09-11T06:34:04-05:00,240.8
2015-09-11T06:24:04-05:00,240.75
2015-09-11T06:14:05-05:00,240.68
2015-09-11T06:04:04-05:00,240.83
2015-09-11T05:54:05-05:00,240.92
2015-09-11T05:44:04-05:00,240.64
2015-09-11T04:34:04-05:00,241.27
2015-09-11T04:24:04-05:00,240.73
...

Turn on the optional show switch for printing large vectors

>>> prices(get_page(11), show=True)
... returning 1000 prices in specified range ...
['239.9',
'239.9',
'239.4',
'239.77',
'239.33',
'239.99',
'239.81',
'240.28',
'240.4',

You can use prices(data)[k] and timestamps(data)[j] to return the kth price in data, or the jth timestamp in data.

>>> data = get_page(1)
>>> prices(data)[4]
'241.2'
>>> prices(data, index=4)
'241.2'

are two equivalent ways of returning only the price at index 4 in the requested range (in this case, page 1). The same works for timestamps.

>>> timestamps(get_page(1)+get_page(2))[1166] == timestamps(get_first_N(2), index=1166)
True

This shows the expressiveness of this module. In general:

>>> prices(get_page(2)) == parse(get_page(2))[0]
True

prices() and timestamps() are just convenience functions that return a specific index, or indices, of a parse() result.

>>> parse(get_page(1)+get_page(2)+get_page(3))[0] == prices(get_first_N(3))
True
>>> parse(get_page(2)+get_page(3))[0] == prices(get_range(2,3))
True

The parse() function is there to manually control the output instead of just getting prices or timestamps:

>>> x = parse(get_page(1))
>>> x[0][0]
'241.2'
>>> x[0][1]
'241.14'
>>> x[1][1]
'2015-09-11T04:34:04-07:00'

As you can see, parse(price_data)[0][k] returns the kth price in the list, and parse(price_data)[1][k] returns the kth timestamp.
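Since the price_data strings are just “timestamp,price” lines, it is easy to picture how parse(), prices() and timestamps() hang together. The following is only a sketch of that idea, not the actual code in fast_dump_v22:

# Sketch of the parsing idea; the real fast_dump_v22 functions may differ.
# price_data is a string of "timestamp,price" rows separated by newlines.

def parse_sketch(price_data):
    """Return [prices, timestamps] as two parallel lists of strings."""
    prices_list, stamps = [], []
    for line in price_data.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        stamp, price = line.split(",", 1)
        stamps.append(stamp)
        prices_list.append(price)
    return [prices_list, stamps]          # index [0] -> prices, [1] -> timestamps

def prices_sketch(price_data, index=None):
    parsed = parse_sketch(price_data)[0]
    return parsed if index is None else parsed[index]

def timestamps_sketch(price_data, index=None):
    parsed = parse_sketch(price_data)[1]
    return parsed if index is None else parsed[index]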

The parse() function takes care of some weird edge cases:

>>> get_first_N(3) == get_page(1)+get_page(2)+get_page(3)
False
>>> parse(get_first_N(3)) == parse(get_page(1)+get_page(2)+get_page(3))
True
>>> x = get_page(1)
>>> y = get_range(2,7)
>>> prices(get_first_N(7)) == prices(x+y)
True

In general,

OPERATOR( get_page(1) + get_page(2) + ... + get_page(k) ) == OPERATOR(get_first_N(k))

where OPERATOR is parse(), prices(), or timestamps(). We also know prices() can display and return ranges of values. When returning large vectors, you can verify their length by setting show=True. The “show” parameter is optional for all get_* functions and prints some information about the operation being performed.

>>> print( prices(get_first_N(11), show=True) )
... returning 11000 prices in specified range...

since each page is a thousand pairs of values (timestamp, price).

>>> len(prices(get_page(2)))
1000
>>> prices(get_page(2)) # returns a long list
['230.11',
'229.64',
'230.04',
'229.71',
'229.69',
'229.74',
'229.92',
'229.43',
'229.43',
'229.41',
...

fast_dump_v1* are older versions that are somewhat different; they were designed to store the fetched data in the .data directory. In v2*, this was abandoned in favor of stdout redirection.
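In practice that just means printing the price_data and redirecting it wherever you like. A hypothetical invocation (adjust to however you actually run the module) might look like:

$ python -c "from fast_dump_v22 import *; print(get_all())" > all_prices.csv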
// 2015 altoidnerd.com

Can you represent a sine curve using sawtooth waves?


It occurred to me when looking at this picture that there may be a time in a laboratory setting when representing a sine curve with sawtooth waves might be useful.  Can it be done?

The answer is yes – for the impatient, here is the plot I eventually made. The recipe is given below…

(plot: a sine curve built from a weighted sum of sawtooth waves)

It was explained to me in the subreddit /r/puremathematics by reddit user /u/Gro-Tsen, as follows:

First, let me discuss how one can formally compute the coefficients expressing a sine wave as a sum of sawtooth waves: assume we have a formal sum

f(x) = g(x) + c₂·g(2x) + c₃·g(3x) + c₄·g(4x) + …

where c₂,c₃,c₄,… are known coefficients, and we want to invert this to find a similar expression for g in function of f,

g(x) = f(x) + d₂·f(2x) + d₃·f(3x) + d₄·f(4x) + …

(our goal is to compute the d coefficients in function of the c's).

This can be done inductively as follows: assuming the N−1 first d's are known (starting with just d₁=1), truncate the expression of g to the first N terms (leaving the N-th d coefficient, d[N], as an unknown) and substitute this in the first expression, then equate the first N coefficients: clearly this will give an equation determining the unknown d[N] in function of the known ones and the c's, in fact, for all N>1 this gives

d[N] = − sum(d[i]·c[N/i]) where i ranges over divisors of N (including i=1, for which d₁=1, but excluding i=N)

so we can compute

  • d₂ = −c₂
  • d₃ = −c₃
  • d₄ = −c₄+(c₂)²
  • d₅ = −c₅
  • d₆ = −c₆ + 2·c₂·c₃

and so on (for any prime number, d[p] = −c[p] as is clear from my inductive formula).





Now we can try the above purely formal method in the case where g(x) = sin(x) and f(x) is the sawtooth wave defined by f(x)=x/2 for −π<x<π. We have

f(x) = sin(x) − sin(2x)/2 + sin(3x)/3 − sin(4x)/4 + …

in other words c[i] = (−1)^(i+1)/i and we can compute the d’s from the above process:

1, 1/2, -1/3, 1/2, -1/5, -1/6, -1/7, 1/2, 0, -1/10, -1/11, -1/6, -1/13, -1/14, 1/15, 1/2, -1/17, 0, -1/19, -1/10, 1/21, -1/22, -1/23, -1/6, 0, -1/26, 0, -1/14, -1/29, 1/30, -1/31, 1/2, …

so we should have sin(x) = f(x) + f(2x)/2 − f(3x)/3 + f(4x)/2 − f(5x)/5 − f(6x)/6 − f(7x)/7 + f(8x)/2 − f(10x)/10 − … (where, again, f(x) is x/2 − π·floor((x+π)/(2π))).

Unfortunately, this reasoning was completely formal and does not say anything about convergence. I don't think one can reasonably expect convergence a.e. or L² convergence, because one can easily see that d[2n] is always 1/2, for any n>0, so the d[i] don't even tend to zero! Still, there's probably some weak sense in which the series converges (e.g., I'm pretty sure it converges as distributions), but since I'm an algebraist and not an analyst I'll just leave it at that.




Well “terrific,” I thought. But does it really work? /u/Gro-Tsen warned us that it would not converge, and he was correct. I fired up Mathematica and generated the following image with 100 terms of the expansion /u/Gro-Tsen provided, displaying the weird convergence (code below). I still don’t know if it’s the convergence of the series causing this effect, or the built-in machine representation of a sawtooth wave in Mathematica.

The coefficients were pulled from OEIS sequence A067856 with each entry divided by its index. Here is the Mathematica code for the plot:

Tooth[x_] := SawtoothWave[(x - Pi)/(2*Pi)] - .5

Plot[Tooth[x], {x, -2*Pi, 2*Pi}]  (* this will verify that the function Tooth[] is valid *)

BigArr = {1, 1, -1, 2, -1, -1, -1, 4, 0, -1, -1, -2, -1, -1, 1, 8, -1, 0, -1, -2, 1, -1, -1, -4, 0, -1, 0, -2, -1, 1, -1, 16, 1, -1, 1, 0, -1, -1, 1, -4, -1, 1, -1, -2, 0, -1, -1, -8, 0, 0, 1, -2, -1, 0, 1, -4, 1, -1, -1, 2, -1, -1, 0, 32, 1, 1, -1, -2, 1, 1, -1, 0, -1, -1, 0, -2, 1, 1, -1, -8, 0, -1, -1, 2, 1, -1, 1, -4, -1, 0, 1, -2, 1, -1, 1, -16, -1, 0, 0, 0} (* first 100 terms of OEIS A067856 *)

BigArray = BigArr*(Array[1/# &, 100, 1]) (* divide each term by its index to get the d coefficients *)

Plot[Array[Tooth[# x] &, 100, 1].BigArray, {x, -2*Pi, 2*Pi},
 ImageSize -> 1800] (* partial sum of the first 100 weighted sawtooth terms *)
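As a sanity check on those coefficients, the recursion d[N] = −sum(d[i]·c[N/i]) from above is easy to run directly. Here is a small Python sketch (my own, not part of /u/Gro-Tsen’s answer) that recomputes the d’s with exact fractions for c[i] = (−1)^(i+1)/i; the first ten values come out to 1, 1/2, −1/3, 1/2, −1/5, −1/6, −1/7, 1/2, 0, −1/10, matching the sequence listed earlier.

# Recompute d[N] = -sum(d[i]*c[N/i]) over proper divisors i of N, with
# d[1] = 1 and c[i] = (-1)^(i+1)/i (the sawtooth expansion coefficients).
# Exact fractions are used so nothing is lost to rounding.
from fractions import Fraction

def sawtooth_coefficients(n_terms):
    c = {i: Fraction((-1) ** (i + 1), i) for i in range(1, n_terms + 1)}
    d = {1: Fraction(1)}
    for n in range(2, n_terms + 1):
        total = Fraction(0)
        for i in range(1, n):
            if n % i == 0:                 # i is a proper divisor of n
                total += d[i] * c[n // i]
        d[n] = -total
    return [d[n] for n in range(1, n_terms + 1)]

if __name__ == "__main__":
    print(sawtooth_coefficients(10))
    # -> 1, 1/2, -1/3, 1/2, -1/5, -1/6, -1/7, 1/2, 0, -1/10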
