Block production pause during PoW


#1

This has been mentioned a few times in the TG miner group but I wanted to get it discussed here as well.

As I understand it currently (please correct me if I’m wrong):

As we reach the end of a round, the PoW starts: all miners again try to qualify as shard miners, and everyone tries to qualify for the one DS committee spot. Currently, because there are few miners, the difficulty is very low, so the mining process concludes relatively quickly, e.g. in 30-90 seconds.

During this PoW window, no new blocks are produced and the whole network is paused until the new miner shards are created; then it resumes.

As I understand it, the aim is to eventually have this window be around 5 minutes long for proper PoW. For a blockchain that advertises itself as high-throughput, a 5-minute pause every DS round does not sound very good.

Of course, this can be mitigated to some degree by making each round contain more blocks, but that does not change the fact that the network will pause for up to 5 minutes.
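
To put rough numbers on that, here is a small sketch of the fraction of wall-clock time the network would spend paused. Only the 5-minute window comes from the discussion; the 2-minute block time and the blocks-per-round values are hypothetical inputs chosen for illustration, not actual network parameters.

```python
def paused_fraction(blocks_per_round: int, block_time_s: float, pow_window_s: float) -> float:
    """Fraction of wall-clock time in which no blocks are produced.

    Assumes one PoW pause per round and a fixed block time (both simplifications).
    """
    producing_s = blocks_per_round * block_time_s
    return pow_window_s / (producing_s + pow_window_s)


if __name__ == "__main__":
    # Hypothetical round lengths; a longer round shrinks the average,
    # but the single longest pause stays at the full PoW window.
    for blocks in (30, 100, 300):
        frac = paused_fraction(blocks, block_time_s=120.0, pow_window_s=300.0)
        print(f"{blocks:>4} blocks/round: paused {frac:.1%} of the time, "
              f"longest pause still 300 s")
```

Longer rounds only lower the average downtime; a transaction that arrives at the wrong moment still waits out the whole window.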

Naive question/suggestion:

Is it not possible to start the PoW X blocks before the end of the round, so that the new lists of miners are ready to take over with minimal pausing?
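
A minimal sketch of what that head start would buy, under loud assumptions: the block time is fixed, the PoW window is fixed, and forming the new shards after PoW takes no extra time. The function names and the example numbers are made up for illustration and are not part of the Zilliqa codebase.

```python
import math


def pause_seconds(pow_window_s: float, block_time_s: float, head_start_blocks: int) -> float:
    """Remaining pause if PoW starts `head_start_blocks` blocks before the round ends."""
    overlap_s = head_start_blocks * block_time_s
    return max(0.0, pow_window_s - overlap_s)


def min_head_start_blocks(pow_window_s: float, block_time_s: float) -> int:
    """Smallest X (in blocks) for which the PoW window is fully hidden behind block production."""
    return math.ceil(pow_window_s / block_time_s)


if __name__ == "__main__":
    # Hypothetical values: 5-minute PoW window, 2-minute blocks.
    for x in range(0, 4):
        print(f"X = {x}: pause = {pause_seconds(300.0, 120.0, x):.0f} s")
    print("minimum X for no pause:", min_head_start_blocks(300.0, 120.0))
```

The sketch ignores any extra load that running PoW alongside block production would put on the nodes.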


#2

This sounds pretty critical indeed. I have not followed all the discussions lately - are there any notable statements from the Zilliqa team on how to mitigate the issue?


#3

I think so. This will become a showstopper for real-world dApps; the Zilliqa team needs to find a new algorithm to solve this.


#4

There was a discussion with Amrit on the official Telegram channel at the beginning of December:

"TN:
Doesn’t this mean that every hour the network stands still for 5 minutes? Is there no better solution, as this doesn’t seem ideal?

Amrit Kumar:
This is certainly not ideal. However, the network needs enough time to be synchronized. Without that, the network may end up with nodes that have different views. We take advantage of this vacuous epoch to distribute rewards. An ideal solution would be to ensure a cross-shard consensus that will guarantee that every node in the entire network has seen every transaction and has updated the global state. But the protocol required to do this becomes very involved.

Digamma 889:
Okay. Thanks for the explanation. Are there any theories or plans to realize such cross-shard consensus in terms of implementing such a solution?

Amrit Kumar:
You may refer to the Chainspace/Omniledger papers, which contain some ideas. But I am not sure how well they will do in practice"


#5

For gaming DApps you could use layer 2 solutions, but for payments you’d probably want these transactions on-chain. In the worst case, a wait of several minutes (however explainable) could make the chain a poor fit for payment DApps, I think.


#6

I guess the hard question is what is acceptable at layer 1. Right now, TX block production max time is ~2 mins. If you throw in the Vacuous epoch (~2 mins) + PoW window (~5 mins) + first TX epoch (~2 mins), the max time for one transaction confirmation in the worst case scenario will be ~9 mins.
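
The worst-case figure is just the sum of the three phases a badly timed transaction has to sit through. As a quick sketch, using the approximate durations quoted above and assuming the transaction just misses the last TX block before the vacuous epoch (the constant and function names are mine):

```python
# Approximate phase durations in minutes, taken from the figures quoted above.
VACUOUS_EPOCH_MIN = 2
POW_WINDOW_MIN = 5
FIRST_TX_EPOCH_MIN = 2


def worst_case_confirmation_min(pow_window_min: float = POW_WINDOW_MIN) -> float:
    """Worst-case wait for a transaction that just misses the last TX block of a DS epoch.

    It waits through the vacuous epoch, the PoW window, and the first TX epoch
    of the next DS epoch before it can be confirmed.
    """
    return VACUOUS_EPOCH_MIN + pow_window_min + FIRST_TX_EPOCH_MIN


if __name__ == "__main__":
    print(worst_case_confirmation_min())  # 2 + 5 + 2 = 9 minutes
```

Passing a different `pow_window_min` shows how much of the worst case is due to the PoW window alone.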

Will you tolerate this for a settlement layer in a decentralized network? Is there a need to decrease it further and what does decreasing this time window do to the network? Could all these issues be mitigated fully by implementing a layer 2 solution that splits the payment and settlement layer, as do regular payment systems we have right now with credit cards and banks?

So which parameters can be tweaked, and what are the implications of lowering them?

  • TX epoch - If we decrease the TX epoch from ~2 mins to ~1 min, we risk the network not syncing in time. If that happens, we run the risk of multiple regular view changes, as nodes do not agree with each other.
  • PoW Window - If we decrease the PoW window, regular folks who do not have a killer rig might not be able to join the network, as they can’t find a PoW solution that meets the difficulty threshold within the time window.
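
The PoW window trade-off can be made a bit more concrete with the usual memoryless model of mining: if a solution takes `expected_hashes` attempts on average, a miner hashing at a constant rate finds at least one within the window with probability 1 - exp(-hashrate * window / expected_hashes). This is a sketch under that assumption; the hash rate and difficulty numbers below are hypothetical and not taken from the Zilliqa network.

```python
import math


def p_solution_in_window(hashrate_hps: float, window_s: float, expected_hashes: float) -> float:
    """Probability of finding at least one PoW solution within the window.

    Models solutions as a Poisson process with mean hashrate * window / expected_hashes.
    """
    mean_solutions = hashrate_hps * window_s / expected_hashes
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    return -math.expm1(-mean_solutions)


if __name__ == "__main__":
    # Hypothetical miner: 20 MH/s; hypothetical difficulty: 2e9 expected hashes.
    for window_s in (300.0, 120.0, 60.0):
        p = p_solution_in_window(20e6, window_s, 2e9)
        print(f"window {window_s:>5.0f} s: P(qualify) = {p:.3f}")
```

Under this model, shortening the window hurts small rigs disproportionately, since a large rig's probability is already close to 1.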

What are some things we could work on together in the future to solve this, then?

  • Cross-shard consensus - This is extremely hard to do as nodes have to do heavy communications among shards, and it might force the protocol to have to be synchronous at all times. This is not ideal as cross-region latency in real life does not permit that.
  • Mining proxy - Regular folk can pool their mining resources together to supply PoW solutions via RPC to a computer that is running as a node in Zilliqa network. This way, we introduce some centralization to prevent cutting them off totally, but we can reduce the PoW window significantly to say ~2 mins.

P.S. Not speaking as an authority, but as a regular folk among you all.


#7

There was an answer to this question from Amrit today on the Telegram channel:

Amrit Kumar: There are a couple of ways to fix this issue and we plan to explore them soon. The reason there is downtime is that the network (all the shards) is being reshuffled entirely after every hour or so.

The first solution to the problem is to start doing PoW 5 minutes before the actual end of the DS epoch. The advantage of this approach is that there is no downtime now. The disadvantage is that the DS will have to do validation of these PoW submissions along with other things. The same holds for nodes which are in the current network configuration and wish to be part of the network post shuffle. In short, the end effect could be a slight drop in the TPS during this period.

The second solution is not to reshuffle the entire network in one go but to do so gradually. For instance, you could remove 10 shard nodes every few minutes and get new ones to replace them.

Last but not least, a third solution would be to make some optimisations to reduce this 5-minute window.
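
For illustration, the gradual reshuffle described as the second solution could look roughly like the sketch below. It is a toy model, not the team's design: the function name, the node labels, and the rule "retire the longest-serving nodes first" are all assumptions, and it leaves out PoW admission and random shard assignment entirely.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


def gradual_reshuffle(current: Iterable[str], newcomers: Iterable[str],
                      batch_size: int = 10) -> Iterator[List[str]]:
    """Yield the shard membership after each replacement step.

    Each step retires up to `batch_size` of the longest-serving nodes and admits
    the same number of newcomers, so the shard size never changes.
    """
    members: Deque[str] = deque(current)
    waiting: Deque[str] = deque(newcomers)
    while waiting:
        n = min(batch_size, len(waiting), len(members))
        if n == 0:
            break  # nothing to replace (empty shard)
        for _ in range(n):
            members.popleft()                  # retire the oldest member
            members.append(waiting.popleft())  # admit the next newcomer
        yield list(members)


if __name__ == "__main__":
    old = [f"old{i}" for i in range(30)]
    new = [f"new{i}" for i in range(25)]
    for step, shard in enumerate(gradual_reshuffle(old, new), start=1):
        kept = sum(name.startswith("old") for name in shard)
        print(f"step {step}: {kept} original nodes remain of {len(shard)}")
```

Because only a batch changes at a time, the rest of the shard could in principle keep producing blocks during each step.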


#8

Good to hear they are thinking about it…

Solution 1 is, I think, the best option, even if it does increase the workload on the nodes.

Solution 2 would have to be done very carefully, because you do not want to change the PoW cycle. The whole idea is to have one big PoW event per epoch that shuffles things around; with more such events, it would be possible to use the same GPU to become a miner multiple times…


#9

The problem that you are trying to highlight with the second solution is valid.