Ethereum DeAnonymisation Techniques
Deanonymize Ethereum with Bitcoin methods
Ethereum Architecture
- Ethereum stores state and account balances directly
- Smart contracts
- Each smart contract is identified by an Ethereum address
- Written in Solidity
Blockchain
- Content in a block
- Block header - contains the hashes of the transaction trie root and the sibling list
- A trie data structure containing the transactions
- A list of block headers for siblings of the block’s parent
- P2P network
- Based on Kademlia P2P distributed hash table (DHT)
- Each node is identified by its nodeID, which is the node’s public key
- Each node keeps track of its peers as in the Kademlia protocol
- Process of maintaining peers
- Peers are kept in rows, where row $i$ contains the peers whose nodeID shares its first $i$ bits with the node's own nodeID.
- The node maintains at most $k$ peers in each row
- A client discovers more peers by sending a `findnode` query with its own nodeID
- The recipient uses the received nodeID to select which of its peers to return to the querying node
- It performs XOR on the sha3 hashes of the sender’s nodeID and the nodeID of each peer it knows.
- The 16 closest results are returned to the querying node
- The querying node recursively queries the newly discovered peers until no new peers are discovered
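The peer-selection step above can be sketched as follows. This is a minimal illustration, not the actual client code: the function names are made up for this note, and `hashlib.sha3_256` is used only as a stand-in, because Ethereum's "sha3" is Keccak-256, which produces different digests from the standardized SHA3-256.

```python
import hashlib


def xor_distance(id_a, id_b):
    """XOR distance between the hashes of two node IDs, as an integer."""
    # Assumption: sha3_256 stands in for Keccak-256 (the digests differ).
    hash_a = hashlib.sha3_256(id_a).digest()
    hash_b = hashlib.sha3_256(id_b).digest()
    return int.from_bytes(hash_a, "big") ^ int.from_bytes(hash_b, "big")


def closest_peers(target_id, known_peers, k=16):
    """Return the k known peers whose hashed IDs are closest to the target."""
    return sorted(known_peers, key=lambda peer: xor_distance(target_id, peer))[:k]


def shared_prefix_bits(id_a, id_b):
    """Number of leading bits the two hashed IDs have in common (the row index)."""
    return 256 - xor_distance(id_a, id_b).bit_length()


# Usage: 40 fake 64-byte node IDs; ask which 16 are closest to one of them.
peers = [bytes([i]) * 64 for i in range(40)]
neighbours = closest_peers(peers[3], peers)
```

A smaller XOR distance means a longer shared prefix, which is why the same quantity serves both for choosing a row and for ranking the peers to return.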
Existing attack on Bitcoin
Link addresses to IP
- Deanonymize a node through its entry nodes
- Requires a tremendous amount of network capacity
- Only feasible for large corporations and governments
- Might not be feasible in the future as the Bitcoin network gets bigger
Cluster different Bitcoin addresses
- Scraper - Crawls the web for Bitcoin addresses
- Block parser - Stores all Bitcoin transactions in a database and then clusters addresses based on two heuristics
- Assume multiple inputs in a single transaction are from the same user
- Assume the change address and the input address are from the same user
- Needs a lot of computational power
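The two clustering heuristics can be sketched with a union-find structure. This is a simplified illustration under stated assumptions: the transaction format (a dict with `inputs` and an optional, already identified `change` address) is hypothetical, and real tools need extra logic to decide which output is the change address.

```python
def cluster_addresses(transactions):
    """Group addresses that the two heuristics attribute to the same user."""
    parent = {}

    def find(addr):
        # Find the cluster representative, compressing the path on the way.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)  # register every input address
        # Heuristic 1: all inputs of one transaction belong to one user.
        for addr in inputs[1:]:
            union(inputs[0], addr)
        # Heuristic 2: the change address belongs to the owner of the inputs.
        change = tx.get("change")  # hypothetical, pre-identified field
        if change is not None and inputs:
            union(inputs[0], change)

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())


txs = [
    {"inputs": ["A", "B"], "change": "C"},
    {"inputs": ["B", "D"]},
    {"inputs": ["E"]},
]
clusters = cluster_addresses(txs)
```

Here `A`, `B`, `C` and `D` end up in one cluster because `B` appears as an input in both of the first two transactions, while `E` stays on its own.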
Transfer attack to Ethereum
- Link addresses to IP
- Ethereum does not have fixed, static entry nodes
- Connections between peers are based on distance and are more volatile
- Each node has more than 8 connections
- May need to use a different method to identify entry/peer nodes
- Cluster Ethereum addresses
- Multiple-input transactions don’t exist in Ethereum
- Due to the lack of UTXOs
- Deanonymization based on transaction inputs is therefore not possible
- The scraping function in BitIodine can be applied to Ethereum for addresses that are available online
Graph analysis on Ethereum
Type of Analysis
Degree Distribution
- Fraction $P_k$ of nodes with degree $k$
Clustering Coefficient
- Average of the local clustering coefficient over all nodes with degree larger than one
Assortativity Coefficient
- Correlation between the features of connected, non-identical node pairs. The feature used is the number of upstream and downstream edges
Pearson Coefficient
- Measures the strength and direction of the linear relationship between pairs of continuous variables. Used to evaluate the indegree and outdegree of a node.
Strongly Connected Component (SCC)
- For every pair of nodes $u$ and $v$, there is a directed path from $u$ to $v$ and from $v$ to $u$
Weakly Connected Component (WCC)
- For every pair of nodes $u$ and $v$, there is a path between $u$ and $v$ when edge direction is ignored
Importance
- Degree centrality - Ranks a node by its own degree
- PageRank - Also considers the importance of the node's neighbors
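The structural measures defined above (degree distribution, WCC and SCC) can be computed with the standard library alone. The sketch below assumes a graph given as a list of directed `(source, target)` edges; the function names are illustrative, and the SCC routine uses Kosaraju's two-pass algorithm, which is one common choice rather than necessarily the method used in the analysis summarized here.

```python
from collections import Counter, defaultdict


def degree_distribution(edges):
    """Return {k: P_k}, the fraction of nodes whose total degree is k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    return {k: c / len(degree) for k, c in counts.items()}


def _reachable(start, adj, seen):
    """Collect every unseen node reachable from start (iterative DFS)."""
    comp, stack = set(), [start]
    seen.add(start)
    while stack:
        node = stack.pop()
        comp.add(node)
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return comp


def weakly_connected_components(edges):
    """Components of the graph when edge direction is ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    return [_reachable(n, adj, seen) for n in list(adj) if n not in seen]


def strongly_connected_components(edges):
    """Kosaraju: order nodes by DFS finish time, then search the reversed graph."""
    fwd, rev, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
        nodes.update((u, v))
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(fwd.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd.get(nxt, ()))))
                    break
            else:
                order.append(node)  # all successors done: node is finished
                stack.pop()
    seen = set()
    return [_reachable(n, rev, seen) for n in reversed(order) if n not in seen]


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "f")]
dist = degree_distribution(edges)
wccs = weakly_connected_components(edges)
sccs = strongly_connected_components(edges)
```

In this toy graph the cycle `a → b → c → a` forms one SCC while `d`, `e` and `f` are singleton SCCs, so there are more SCCs than WCCs.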
Ether between two accounts (ETA)
- Average amount of ETH transferred per transaction between two accounts.
Common edges
- Set of common edges and number of common edges between different graphs
Evolution
- Measure the metrics mentioned above and plot them over time
MFG (money flow graph) Analysis
- Clustering coefficient
- 0.12 - meaning that if two accounts A and B both trade with C, A and B are likely to trade with each other.
- Assortativity coefficient
- Approaches 0
- A node's degree and the existence of an edge between two nodes are not correlated; one does not imply the other.
- Assumes that money flow is driven by the users.
- Pearson coefficient
- 0.45, which is moderately large
- Indicates that a node with a large indegree is likely to have a large outdegree
- SCC/WCC
- Largest SCC contains 75% of the nodes
- Those nodes may be exchange markets
- Number of SCCs > number of WCCs
PageRank
- Top 10 most important nodes in MFG
- The identities of all top-10 nodes are known
- Degree centrality
- 8/10 are exchange markets
- one is a name service
- one is a mining pool
ETA
- CDF plot of Ether per transaction between two accounts
- 63.3% transfer no more than 1 Ether
- 80.6% transfer no more than 10 Ether
- Turning points in the CDF are caused by large-value transactions of exactly 100 and 1K Ether
- Common edges
- Proportion of common edges, $|MFG \cap CCG| / |MFG|$, is 0.005
- Ether transfers are mostly not from contract creators to the contracts they created
- Evolution Analysis
- The numbers of nodes and edges increase 📈, because more Ether is transferred over time ↗️
- The SCC grows larger over time. It consists of large exchange markets
- The number of SCCs increases over time
- The number of WCCs remains relatively stable
- Some WCCs merge because their accounts send money to exchanges
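The ETA figures in this section come from an empirical CDF, which can be sketched as follows. The input format, a list of `(sender, receiver, amount_in_ether)` tuples, and the function names are assumptions made for illustration, and the sketch treats each ordered (sender, receiver) pair as one MFG edge.

```python
from collections import defaultdict


def eta(transfers):
    """Average Ether per transaction for every (sender, receiver) pair."""
    total = defaultdict(float)
    count = defaultdict(int)
    for sender, receiver, amount in transfers:
        total[(sender, receiver)] += amount
        count[(sender, receiver)] += 1
    return {pair: total[pair] / count[pair] for pair in total}


def fraction_at_most(values, threshold):
    """Empirical CDF at one point: the share of values <= threshold."""
    values = list(values)
    return sum(1 for v in values if v <= threshold) / len(values)


# Made-up transfers, only to show the calls.
transfers = [
    ("a", "b", 0.5),
    ("a", "b", 1.5),
    ("b", "c", 5.0),
    ("c", "a", 200.0),
]
per_pair = eta(transfers)
share_le_1 = fraction_at_most(per_pair.values(), 1)
share_le_10 = fraction_at_most(per_pair.values(), 10)
```

Evaluating `fraction_at_most` at 1 and 10 Ether on the real data would give the two percentages quoted in the ETA list, and evaluating it over a grid of thresholds gives the full CDF plot.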
CCG (contract creation graph) Analysis
- Degree distribution
- CCG follows the power law
- A few nodes create a large number of smart contracts
- Clustering coefficient
- 0, because no contract is created by two different nodes, so the CCG contains no triangles
- Assortativity coefficient
- -0.29
- Large-degree nodes more often connect to small-degree nodes
- Not many contracts create other contracts
- SCC/WCC
- Largest SCC has only one node - because no cycle
- Largest WCC has size 1,501,271
- Which is 17.2% of all contracts
- 5,554 WCCs have a size larger than 10, which is 7% of all WCCs
Degree centrality
- PageRank - cannot provide useful information
- Common edges
- Proportion of common edges, $|CCG \cap CIG| / |CCG|$, is low: 0.26
- 74% of smart contracts are not invoked by their creators.
- Smart contracts aim to serve others
- Evolution Analysis
- Nodes and edges increase 📈
- More smart contracts are deployed 🚢
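The common-edge proportion used in this section is a set intersection divided by the size of the first graph's edge set. A minimal sketch, assuming each graph is represented as a collection of directed `(source, target)` pairs with weights ignored (the function name is illustrative):

```python
def common_edge_ratio(graph_a, graph_b):
    """Share of graph_a's edges that also appear in graph_b."""
    edges_a, edges_b = set(graph_a), set(graph_b)
    if not edges_a:
        raise ValueError("graph_a has no edges, so the ratio is undefined")
    return len(edges_a & edges_b) / len(edges_a)


# Toy data: x created c1 and c2, y created c3, and c1 created c4.
ccg = {("x", "c1"), ("x", "c2"), ("y", "c3"), ("c1", "c4")}
# Only x ever invokes a contract it created (c1).
cig = {("x", "c1"), ("z", "c2"), ("z", "c3")}

ratio = common_edge_ratio(ccg, cig)
not_invoked_by_creator = 1 - ratio
```

With a ratio of 0.26 on the real graphs, the complement is the 74% of contracts that are not invoked by their creators.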
CIG (contract invocation graph) Analysis
- Degree distribution
- Follow power law ⚖️
- Most accounts invoke very few contracts
- Clustering Coefficient
- Approaches 0
- If A → B and A → C, B and C are unlikely to call each other
- B and C are independent
- Assortativity Coefficient
- Approaches 0
- No relation between B and C, which are connected through A
- Pearson Coefficient
- EOAs (externally owned accounts) are not considered
- 0.01, which implies a weak correlation between indegree and outdegree
- SCC/WCC
- Number of SCCs > number of smart contracts
- Implies many community structures where nodes are intensely connected with each other
PageRank
- Top 10 are smart contracts
- 4/10 are token contracts
- 1 is a gambling application
Degree centrality
- Evolution Analysis
- More contracts are being invoked over time
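The Pearson coefficient between indegree and outdegree, reported for both the MFG and the CIG, can be computed as below. This is a sketch with an illustrative function name; it takes directed `(source, target)` edges and leaves any filtering (such as excluding EOAs for the CIG) to the caller.

```python
from collections import Counter
from math import sqrt


def degree_pearson(edges):
    """Pearson correlation between the indegree and outdegree of each node."""
    indeg, outdeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(indeg) | set(outdeg)
    xs = [indeg[n] for n in nodes]   # missing keys count as 0
    ys = [outdeg[n] for n in nodes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a degree is constant")
    return cov / sqrt(var_x * var_y)


# Every node receives as much as it sends: perfect positive correlation.
balanced = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a")]
# One hub only sends and everyone else only receives: perfect negative.
star = [("hub", "x"), ("hub", "y"), ("hub", "z")]

r_balanced = degree_pearson(balanced)
r_star = degree_pearson(star)
```

Values near 0, like the 0.01 reported for the CIG, sit between these two extremes and indicate that knowing how often a contract is called says little about how many contracts it calls.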
Application based on graph analysis
Attack forensics
- Given a malicious smart contract
- Find accounts controlled by the attacker
- Build the WCC of the CCG that contains the malicious contract to collect all related contracts
- For each node in the WCC, locate its callers in the CIG
- List the EOAs that invoked all the malicious contracts
- Anomaly detection
- Detect abnormal contract creation
- Input: $x$ (account), MFG, CCG, CIG and three thresholds
- Obtain all the contracts created by $x$
- Consider the number of edges and their weights
- Contracts in the WCC are rarely used in both money transfer and contract invocation
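The attack-forensics steps listed earlier in this section can be sketched as a two-stage lookup. Everything here is an illustrative assumption: the function name, the edge-list inputs, and the explicit set of known EOAs. The sketch returns, for each EOA, the set of related contracts it invoked, so the caller can decide whether to keep EOAs that touched any of them or only those that touched all of them.

```python
from collections import defaultdict


def forensic_callers(malicious_contract, ccg_edges, cig_edges, eoas):
    """Collect contracts related to a malicious one and the EOAs calling them."""
    eoas = set(eoas)
    # Step 1: the WCC of the CCG around the malicious contract.
    adj = defaultdict(set)
    for creator, created in ccg_edges:
        adj[creator].add(created)
        adj[created].add(creator)
    component, stack = {malicious_contract}, [malicious_contract]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in component:
                component.add(nxt)
                stack.append(nxt)
    contracts = component - eoas  # drop the creating EOA, keep the contracts

    # Step 2: for each related contract, find the EOAs that invoked it.
    invoked = defaultdict(set)
    for caller, callee in cig_edges:
        if callee in contracts and caller in eoas:
            invoked[caller].add(callee)
    return contracts, dict(invoked)


# Toy data: "attacker" creates m1, and m1 creates m2.
ccg = [("attacker", "m1"), ("m1", "m2"), ("other", "c9")]
cig = [
    ("attacker", "m1"),
    ("attacker", "m2"),
    ("helper", "m2"),
    ("victim", "c9"),
    ("m1", "m2"),
]
known_eoas = {"attacker", "helper", "other", "victim"}

contracts, invoked = forensic_callers("m1", ccg, cig, known_eoas)
invoked_all = sorted(e for e, called in invoked.items() if called == contracts)
```

In the toy data, `attacker` is the only EOA that invoked every contract in the component, while `helper` invoked just one of them.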
Deanonymization
- Use WCC with CCG
- The root is an EOA
- Process the comments and strings using NLP