Ethereum DeAnonymisation Techniques

Deanonymizing Ethereum with Bitcoin methods

Ethereum Architecture

  • Ethereum stores state and account balances directly
  • Smart contracts
    • Each smart contract is identified by an Ethereum address
    • Written in Solidity
  • Blockchain

    • Content in a block
      • Block header - contains the hashes of the transaction trie root and the sibling list
      • A trie data structure containing the transactions
      • A list of block headers for the siblings of the block’s parent (uncle blocks)
  • P2P network
    • Based on Kademlia P2P distributed hash table (DHT)
    • Each node is identified by a nodeID, which is the node’s public key
    • Each node keeps track of its peers as specified by the Kademlia protocol
    • Process of maintaining peers
      1. Peers are stored in rows, where each row $i$ contains peers whose nodeID shares its first $i$ bits with the node's own nodeID.
      2. In each row, the node maintains up to $k$ peers.
      3. A client discovers more peers by sending a findnode query containing its own nodeID.
      4. The recipient uses the received nodeID to select which of its peers it will return to the querying node.
      5. To do so, it XORs the sha3 hash of the sender’s nodeID with the sha3 hash of the nodeID of each of its known peers.
      6. The 16 closest results (smallest XOR distance) are returned to the querying node.
      7. The querying node recursively queries the newly discovered peers until no new peers are discovered.
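
The peer selection in steps 4 to 6 can be sketched as follows. This is a minimal illustration, not the client implementation: Ethereum hashes node IDs with Keccak-256, which differs from the standard `hashlib.sha3_256` used here as a stand-in, and the node IDs and the helper names (`node_hash`, `closest_peers`, `shared_prefix_bits`) are made up for the example.

```python
import hashlib


def node_hash(node_id):
    # ASSUMPTION: sha3_256 is a stand-in. Ethereum uses Keccak-256, which
    # pads differently and therefore produces different digests.
    return int.from_bytes(hashlib.sha3_256(node_id).digest(), "big")


def xor_distance(a, b):
    """Kademlia distance: XOR of the two hashed node IDs, read as an integer."""
    return node_hash(a) ^ node_hash(b)


def closest_peers(target, known_peers, count=16):
    """Return up to `count` known peers with the smallest XOR distance to `target`."""
    return sorted(known_peers, key=lambda peer: xor_distance(target, peer))[:count]


def shared_prefix_bits(a, b):
    """Number of leading bits the two hashed IDs have in common (the row index i)."""
    return 256 - xor_distance(a, b).bit_length()


# Hypothetical 64-byte node IDs; real ones are public keys.
me = bytes([0]) * 64
peers = [bytes([i]) * 64 for i in range(1, 41)]

result = closest_peers(me, peers)
print(len(result), "peers returned")
print("row of the closest peer:", shared_prefix_bits(me, result[0]))
```

Sorting the whole peer list is the simplest way to show the idea; a real routing table would only inspect the rows near the target.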

Existing attack on Bitcoin

  • Link addresses to IP

    • Deanonymize a node through its entry nodes
    • Requires a tremendous amount of network capacity
    • Only feasible for large corporations and governments
    • May become infeasible in the future as the Bitcoin network grows
  • Cluster different Bitcoin addresses

    • Scraper - Crawls the web for Bitcoin addresses
    • Block parser - Stores all Bitcoin transactions in a database and then clusters addresses based on two heuristics
      • Assume multiple inputs in a single transaction belong to the same user
      • Assume the change address and the input addresses belong to the same user
      • Needs a lot of computational power
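
The first heuristic (multiple inputs belong to one user) amounts to merging address sets, which a union-find structure handles well. The sketch below assumes each transaction is given simply as a list of its input addresses. The function name and the sample data are illustrative, and the change-address heuristic is not covered.

```python
from collections import defaultdict


def cluster_addresses(transactions):
    """Group addresses that appear together as inputs of the same transaction."""
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        find(inputs[0])  # register single-input transactions as well
        for addr in inputs[1:]:
            union(inputs[0], addr)

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical input-address lists, one per transaction.
transactions = [["a", "b"], ["b", "c"], ["d"], ["e", "f"]]
clusters = cluster_addresses(transactions)
print(clusters)
```

Addresses `a`, `b` and `c` end up in one cluster even though no single transaction contains all three, because the heuristic is applied transitively.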

Transfer attack to Ethereum

  • Link addresses to IP
    • Ethereum does not have fixed, static entry nodes
    • Connections between peers are based on distance and are more volatile
    • Each node has more than 8 connections
    • A different method may be needed to identify entry/peer nodes
  • Cluster Ethereum addresses
    • Multiple-input transactions do not exist in Ethereum
    • This is due to the lack of UTXOs
    • Clustering addresses based on transaction inputs is therefore not possible
    • The scraping function in BitIodine can be applied to Ethereum addresses that are available online

Graph analysis on Ethereum

Types of Analysis

  • Degree Distribution - Fraction $P_k$ of nodes with degree $k$
  • Clustering Coefficient - Average of the local clustering coefficients over all nodes with degree larger than one
  • Assortativity coefficient - Correlation between the features of connected non-identical node pairs. The feature used is the number of upstream and downstream edges
  • Pearson Coefficient - Measures the strength and direction of the linear relationship between pairs of continuous variables. Here it is evaluated on the indegree and outdegree of each node.

  • Strongly Connected Component (SCC) - A set of nodes in which, for every pair of nodes $u$ and $v$, there is a directed path from $u$ to $v$ and from $v$ to $u$
  • Weakly Connected Component (WCC) - A set of nodes in which, for every pair of nodes $u$ and $v$, there is an undirected path between $u$ and $v$; edge direction is ignored
  • Importance
    • Degree centrality - Considers a node's degree
    • PageRank - Considers the importance of a node's neighbors
  • Ether between two accounts (ETA) - Average amount of ETH transferred per transaction between two accounts
  • Common edges - The set and the number of edges shared by different graphs
  • Evolution - Measure the metrics mentioned above and plot them over time
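
As a small worked example of two of these metrics, the sketch below computes the degree distribution $P_k$ and the Pearson coefficient between indegree and outdegree for a toy directed graph. The edge list is invented, and using the total degree (indegree plus outdegree) for $P_k$ is an assumption made for the example.

```python
from collections import Counter
from math import sqrt


def degrees(edges):
    """Return the sorted node list plus indegree and outdegree counters."""
    indeg, outdeg, nodes = Counter(), Counter(), set()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
        nodes.update((src, dst))
    return sorted(nodes), indeg, outdeg


def degree_distribution(edges):
    """Fraction P_k of nodes whose total degree (in + out) equals k."""
    nodes, indeg, outdeg = degrees(edges)
    counts = Counter(indeg[n] + outdeg[n] for n in nodes)
    return {k: c / len(nodes) for k, c in sorted(counts.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Toy directed graph: (sender, receiver) pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]

dist = degree_distribution(edges)
nodes, indeg, outdeg = degrees(edges)
r = pearson([indeg[n] for n in nodes], [outdeg[n] for n in nodes])
print(dist)
print(round(r, 3))
```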

Money Flow Graph (MFG) Analysis

  • Clustering coefficient
    • 0.12, meaning that if two accounts A and B both trade with C, A and B are likely to trade with each other.
  • Assortativity coefficient
    • Approaches 0
    • A node's degree and the existence of an edge between two nodes are not correlated. One does not imply the other.
    • Assumption: money flow is driven by the users.
  • Pearson coefficient
    • 0.45, which is moderately large
    • Indicates that a node with a large indegree is likely to have a large outdegree
  • SCC/WCC
    • The largest SCC contains 75% of the nodes
    • Those nodes may be exchange markets
    • The number of SCCs is larger than the number of WCCs
  • PageRank

    • Top 10 most important nodes in the MFG
    • The identities of all top 10 nodes are known
  • Degree centrality
    • 8 of the top 10 are exchange markets
    • One is a name service
    • One is a mining pool
  • ETA

    • CDF plot of Ether per transaction between two accounts
    • 63.3% transfer no more than 1 Ether
    • 80.6% transfer no more than 10 Ether
    • The turning points in the CDF are caused by transactions with large values of exactly 100 and 1K Ether
  • Common edges
    • The proportion $|MFG \cap CCG| / |MFG|$ is 0.005
    • Ether transfers are rarely from contract creators to the contracts they created
  • Evolution Analysis
    • The number of nodes and edges increases 📈, because more Ether is transferred over time ↗️
    • The largest SCC becomes larger over time. It consists of large exchange markets
    • The number of SCCs increases over time
    • The number of WCCs remains relatively stable
    • Some WCCs merge because their accounts send money to the same exchange
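
The ETA figures above come from a cumulative distribution, which can be reproduced in a few lines. The sketch below assumes transfers are available as `(sender, receiver, amount)` tuples and treats each ordered account pair separately. The sample transfers are invented and do not reflect the measured data.

```python
from collections import defaultdict


def eta_per_pair(transfers):
    """Average Ether per transaction for each (sender, receiver) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum of amounts, tx count]
    for sender, receiver, amount in transfers:
        entry = totals[(sender, receiver)]
        entry[0] += amount
        entry[1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


def fraction_at_most(values, threshold):
    """One point of the empirical CDF: share of values <= threshold."""
    values = list(values)
    return sum(v <= threshold for v in values) / len(values)


# Hypothetical transfers in Ether.
transfers = [
    ("a", "b", 0.5),
    ("a", "b", 1.5),
    ("b", "c", 10.0),
    ("c", "a", 100.0),
]

eta = eta_per_pair(transfers)
print(eta)
print(fraction_at_most(eta.values(), 1), fraction_at_most(eta.values(), 10))
```

Evaluating `fraction_at_most` over a range of thresholds gives the points of the CDF plot.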

Contract Creation Graph (CCG) Analysis

  • Degree distribution
    • The CCG follows a power law
    • A few nodes create a large number of smart contracts
  • Clustering coefficient
    • 0, because each contract is created by exactly one account, so the graph contains no triangles
  • Assortativity Coefficient
    • -0.29
    • Large-degree nodes more often connect with small-degree nodes
    • Not many contracts create other contracts
  • SCC/WCC
    • The largest SCC has only one node, because the graph has no cycles
    • The largest WCC has 1,501,271 nodes
      • This is 17.2% of all contracts
      • 5,554 WCCs have more than 10 nodes, which is 7% of all WCCs
  • Degree centrality

    • PageRank - cannot provide useful information
  • Common edges
    • The proportion $|CCG \cap CIG| / |CCG|$ is low: 0.26
    • 74% of smart contracts are not invoked by their creators.
    • Smart contracts aim to serve others
  • Evolution Analysis
    • The number of nodes and edges increases 📈
    • More smart contracts are deployed 🚢
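
The common-edges ratio used above is a set intersection over edge lists. A minimal sketch, with invented edges and a hypothetical helper name `common_edge_ratio`:

```python
def common_edge_ratio(graph_a, graph_b):
    """Return the edges shared by both graphs and their share of graph_a's edges."""
    edges_a, edges_b = set(graph_a), set(graph_b)
    if not edges_a:
        raise ValueError("graph_a has no edges")
    common = edges_a & edges_b
    return common, len(common) / len(edges_a)


# Hypothetical edges: (creator, contract) in the CCG, (caller, contract) in the CIG.
ccg = [("u1", "c1"), ("u1", "c2"), ("u2", "c3"), ("c3", "c4")]
cig = [("u1", "c1"), ("u3", "c2"), ("u3", "c3")]

common, ratio = common_edge_ratio(ccg, cig)
print(common, ratio)
```

In this toy data only `u1` invokes a contract it created, so one of the four creation edges is shared.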

Contract Invocation Graph (CIG) Analysis

  • Degree distribution
    • Follows a power law ⚖️
    • Most accounts invoke very few contracts
  • Clustering Coefficient
    • Approaches 0
    • If A → B and A → C, B and C are unlikely to call each other
    • B and C are independent
  • Assortativity Coefficient
    • Approaches 0
    • No correlation between B and C, which are both connected to A
  • Pearson Coefficient
    • Externally owned accounts (EOAs) are not considered
    • 0.01, which implies a weak correlation between indegree and outdegree
  • SCC/WCC
    • The number of SCCs is larger than the number of smart contracts
    • Implies many community structures in which nodes are intensely connected with each other
  • PageRank

    • The top 10 are all smart contracts
    • 4 of the 10 are token contracts
    • 1 is a gambling application
  • Degree centrality

  • Evolution Analysis
    • More contracts are being invoked over time
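
The near-zero clustering coefficient can be seen on a toy graph. The sketch below computes the average local clustering coefficient over nodes with degree larger than one, ignoring edge direction. Ignoring direction is an assumption of this example, and both edge lists are invented.

```python
from itertools import combinations


def average_clustering(edges):
    """Average local clustering coefficient over nodes with degree > 1."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops do not form triangles
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    coefficients = []
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count pairs of neighbors that are themselves connected.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients) if coefficients else 0.0


# A invokes B, C and D, which never call each other: coefficient 0.
star = [("A", "B"), ("A", "C"), ("A", "D")]
# A triangle with one extra node attached, for contrast.
triangle = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]

print(average_clustering(star))
print(round(average_clustering(triangle), 4))
```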

Applications based on graph analysis

  • Attack forensics

    • Given a malicious smart contract
    • Find the accounts controlled by the attacker
    • Build the WCC that contains it in the CCG to collect all related contracts
    • For each node in the WCC, locate its callers in the CIG
    • List the EOAs that invoked the malicious contracts
  • Anomaly detection
    • Detect abnormal contract creation
    • Input: $x$ (account), MFG, CCG, CIG and three thresholds
    • Obtain all the contracts created by $x$
    • Consider the number of edges and their weights
    • Contracts in the WCC are rarely used in both money transfers and contract invocations
  • Deanonymization

    • Use the WCCs of the CCG
    • The root is an EOA
    • Process the comments and strings using NLP
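
The attack forensics steps can be sketched as a breadth-first search over the CCG followed by a lookup in the CIG. This is an illustration under stated assumptions, not the original tooling: edges are `(source, target)` tuples, any account that never appears as the target of a creation edge is treated as an EOA, and all account names are invented.

```python
from collections import defaultdict, deque


def weakly_connected_component(edges, start):
    """All nodes reachable from `start` when edge direction is ignored."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def suspect_accounts(ccg_edges, cig_edges, malicious_contract):
    """Return the related contracts and the EOAs that invoked them."""
    # ASSUMPTION: every target of a creation edge is a contract; all other
    # accounts are treated as EOAs.
    contracts = {created for _, created in ccg_edges}
    component = weakly_connected_component(ccg_edges, malicious_contract)
    related = component & contracts
    callers = defaultdict(set)
    for caller, callee in cig_edges:
        if callee in related and caller not in contracts:
            callers[caller].add(callee)
    return related, dict(callers)


# Hypothetical graphs.
ccg = [("attacker", "m1"), ("m1", "m2"), ("attacker", "m3"), ("alice", "c1")]
cig = [("attacker", "m1"), ("bot", "m2"), ("bot", "m3"), ("alice", "c1"), ("m1", "m2")]

related, callers = suspect_accounts(ccg, cig, "m1")
print(sorted(related))
print(callers)
```

Starting from `m1`, the component also pulls in `m2` and `m3`, and the lookup surfaces `bot` as a second account that only ever touched contracts from that component.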