Ethereum Overview and Privacy Attacks

Ethereum

Accounts

  • 20-byte address
  • A state transition (state -> state’) is a transfer of value or information between accounts
  • Contains 4 fields
    • Nonce: counter that makes sure each transaction can be processed only once
    • Ether balance
    • Contract code (for contract account)
    • Storage
  • Type of accounts
    • Externally owned accounts
      • Controlled by private keys
      • Can send messages by creating and signing transactions
    • Contract accounts
      • Controlled by contract code
      • Code is activated when a message is received
      • Code can read/write internal storage, send messages, or create contracts in return
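As a rough illustration of the four account fields listed above, here is a minimal Python sketch. The class name `Account`, the field names and the `is_contract` helper are made up for illustration and are not taken from any Ethereum client.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Account:
    """Toy model of an Ethereum account (names are illustrative only)."""
    nonce: int = 0        # counter so each transaction is processed only once
    balance: int = 0      # ether balance, kept as an integer amount
    code: bytes = b""     # contract code; empty for externally owned accounts
    storage: Dict[int, int] = field(default_factory=dict)  # empty by default

    def is_contract(self) -> bool:
        # In this sketch, an account with non-empty code is a contract account.
        return len(self.code) > 0


eoa = Account(balance=10)              # externally owned account
contract = Account(code=b"\x60\x00")   # contract account with placeholder code
```

In this sketch the only thing that separates the two account types is whether `code` is empty, which mirrors the "controlled by private keys" versus "controlled by contract code" split.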

Contract

  • An autonomous agent that lives inside the Ethereum environment
  • Executes specific code when “triggered”

Transaction and message

  • Transaction: a signed data package that stores a message to be sent from an externally owned account
  • STARTGAS and GASPRICE prevent infinite loops by limiting the number of computational steps
  • Gas consumed depends on the amount of computation and the amount of data
  • A transaction contains
    • Recipient of the message
    • Signature of the sender
    • Amount of ether to transfer
    • Data field (optional)
    • STARTGAS: maximum number of computational steps allowed
    • GASPRICE: fee the sender pays per computational step (gas)

Messages

  • A message is like a transaction, except that it is produced by a contract
  • A message contains the sender, recipient, amount of ether being sent, data field (optional) and STARTGAS

Ethereum state transition function

  • Validate the transaction
  • Calculate the transaction fee STARTGAS * GASPRICE
  • Subtract the fees from the sender’s account balance and increment the sender’s nonce
  • Transfer the transaction value from sender to recipient.
    • If the recipient doesn’t exist, create a new account
    • If the recipient is a contract account, run the contract’s code until completion or until it runs out of gas
  • If execution ran out of gas or the sender doesn’t have sufficient funds, revert all state changes except the payment of the fees
  • Otherwise, refund the fees for the remaining gas to the sender
  • Fees for the gas consumed are sent to the miner
  • Example of the transition function (sending 10 ether with 2000 gas, a gasprice of 0.001 ether and 64 bytes of data)
    • Data being sent:
      • byte[0:31] = 2
      • byte[32:63] = “CHARLIE”
    • Contract code:
      if !self.storage[calldataload(0)]:
        self.storage[calldataload(0)] = calldataload(32)
    
    • Validate the transaction
    • Check the sender has at least 2000 * 0.001 = 2 ether in its account
    • Subtract 2 ether from the sender’s account
    • Initialize gas = 2000
    • Assume the transaction is 170 bytes long and the byte-fee is 5 gas per byte
    • Subtract the byte fee: 2000 - 170 * 5 = 2000 - 850 = 1150 gas
    • Subtract 10 ether from the sender’s account (the ether sent in the transaction)
    • Execute the code (assume it takes 187 gas)
      • Check whether the contract’s storage at index 2 is empty
      • If it is empty, store the string “CHARLIE” there
    • Gas remaining: 1150 - 187 = 963
    • Add 963 * 0.001 = 0.963 ether back to the sender’s account
  • If there is no contract at the receiving end, the total fee is just GASPRICE * the length of the transaction in bytes, and the data sent is ignored
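The arithmetic of the example above can be checked with a small Python sketch. The function name `apply_transfer`, its parameters and the starting balance of 20 ether are assumptions made for illustration. The sketch only tracks the sender's balance and gas; it does not model contract storage or the miner's account.

```python
from decimal import Decimal


def apply_transfer(balance, value, startgas, gasprice, tx_bytes, byte_fee, exec_gas):
    """Return (gas_left, refund, sender_balance) for a toy state transition."""
    max_fee = startgas * gasprice
    if balance < max_fee:
        raise ValueError("sender cannot cover STARTGAS * GASPRICE")
    balance -= max_fee                      # pay the maximum fee up front
    gas = startgas - tx_bytes * byte_fee    # charge gas per byte of the transaction
    if gas < exec_gas or balance < value:
        # Out of gas or insufficient funds: revert everything except the fee.
        return 0, Decimal(0), balance
    balance -= value                        # transfer the transaction value
    gas -= exec_gas                         # gas consumed by the contract code
    refund = gas * gasprice                 # unused gas is refunded to the sender
    return gas, refund, balance + refund


gas_left, refund, final_balance = apply_transfer(
    balance=Decimal(20),        # hypothetical starting balance of the sender
    value=Decimal(10),
    startgas=2000,
    gasprice=Decimal("0.001"),
    tx_bytes=170,
    byte_fee=5,
    exec_gas=187,
)
print(gas_left, refund, final_balance)
```

With the numbers from the example, 963 gas remains and 0.963 ether is refunded, so a sender who started with 20 ether ends with 8.963 ether.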

Code execution (More to come)

  • Each operation in the script can interact with the stack, memory, and the contract’s long-term storage

Blockchain and mining

  • Unlike Bitcoin, an Ethereum block stores the transaction list, the most recent state, the block number and the difficulty
  • Unlike Bitcoin, the state information is part of the last block, so there is no need to store the entire block history
  • Block validation algorithm

    • Check that the previous block referenced exists and is valid
    • Check that the timestamp of the block is greater than that of the previous block and less than 15 minutes into the future
    • Check that the block number, difficulty, transaction root, uncle root and gas limit are valid
    • Check that the PoW of the block is valid
    • Let $S[0]$ be the state at the end of the previous block and $TX$ the block’s list of $n$ transactions. For all $i$ in $0 \ldots n-1$, set $S[i+1]=APPLY(S[i], TX[i])$
    • $S_{FINAL}$ is $S[n]$ with the block reward paid to the miner
    • Check that the Merkle tree root of the state $S_{FINAL}$ is equal to the final state root provided in the block header
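The last three steps of the block validation algorithm (apply the transactions in order, pay the block reward, compare state roots) can be sketched in Python as follows. This is a toy model under stated assumptions: the state is a plain dictionary of balances, `apply_tx` only moves value, and `state_root` uses a SHA-256 hash of the sorted state as a stand-in for the real Merkle tree root. All function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

State = Dict[str, int]
Tx = Tuple[str, str, int]   # (sender, recipient, amount)


def apply_tx(state: State, tx: Tx) -> State:
    """Toy APPLY(S, TX): move `amount` from sender to recipient."""
    sender, recipient, amount = tx
    if state.get(sender, 0) < amount:
        raise ValueError("invalid transaction")
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def state_root(state: State) -> str:
    # Stand-in for the Merkle tree root: hash of the sorted state (assumption).
    return hashlib.sha256(repr(sorted(state.items())).encode()).hexdigest()


def validate_state(prev_state: State, txs: List[Tx], miner: str,
                   reward: int, claimed_root: str) -> bool:
    state = prev_state                       # S[0]
    try:
        for tx in txs:                       # i = 0 .. n-1
            state = apply_tx(state, tx)      # S[i+1] = APPLY(S[i], TX[i])
    except ValueError:
        return False
    final = dict(state)                      # S_FINAL = S[n] plus block reward
    final[miner] = final.get(miner, 0) + reward
    return state_root(final) == claimed_root


prev = {"alice": 10}
expected = {"alice": 6, "bob": 4, "miner": 2}
ok = validate_state(prev, [("alice", "bob", 4)], "miner", 2, state_root(expected))
```

Here `ok` is `True` because replaying the single transaction and paying the reward reproduces the claimed state; a block containing an unaffordable transfer would be rejected.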

Application

  • Token systems
  • Identity and Reputation Systems
  • Decentralized File Storage

Practical Deanonymization Attack in Ethereum

Two problems this paper is trying to solve

    • What coverage of Ethereum nodes the attacker can establish connections with
    • How to infer the source node accurately given the mechanisms adopted in Ethereum

Ethereum P2P network Primer

  • Fully distributed P2P network
  • Uses the devp2p protocol
  • Terms
    • Ethereum Node Records
      • Ethereum Node Records (ENRs) are a standardized format for network addresses on Ethereum
      • A node record consists of three parts: a signature, a sequence number, and key/value pairs of node information (IP, port and so on)
    • Node Discovery Protocol
      • Based on the Kademlia DHT, for storage and retrieval of Ethereum nodes
      • Each node has a cryptographic identity
      • Public key: Node ID
      • Private key: Sign Transactions
      • Logical distance: bitwise XOR of the hashes of two node IDs
      • Runs over UDP
        • Ping and Pong: detect node status
        • FindNode and Neighbors: find the nodes closest to a target
        • ENRRequest and ENRResponse: request a node record
    • RLPx Transport Protocol
      • TCP-based protocol for information exchange between nodes
      • Purpose: key exchange and protocol handshake
      • Key exchange: Diffie-Hellman algorithm
      • Protocol handshake: exchange Hello messages, which contain the protocol version, clientID, capabilities, listening port and nodeID
    • Application-level protocols
      • Ethereum Wire Protocol (eth)
        • Main protocol
        • Exchange status handshake after RLPx handshake
        • Status handshake contains protocol version, networkID, difficulty, current block hash, genesis block hash and forkID
      • Light Ethereum Subprotocol (les)
      • Parity Light Protocol (pip)
      • Light nodes (les and pip) only download block headers and query other information on demand
      • Light nodes create transactions 🔨 but don’t participate in relaying transactions ❌
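The logical distance used by the Node Discovery Protocol above can be sketched in Python. One loud assumption: the real protocol hashes node IDs with Keccak-256, which is not in the Python standard library, so SHA-256 stands in for it here. The function names are hypothetical.

```python
import hashlib
from typing import List


def node_hash(node_id: bytes) -> int:
    # Assumption: SHA-256 stands in for the Keccak-256 hash used by Ethereum.
    return int.from_bytes(hashlib.sha256(node_id).digest(), "big")


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style logical distance: XOR of the two node ID hashes."""
    return node_hash(a) ^ node_hash(b)


def closest_nodes(target: bytes, candidates: List[bytes], k: int) -> List[bytes]:
    """Return the k candidates closest to the target (what FindNode asks for)."""
    return sorted(candidates, key=lambda n: xor_distance(n, target))[:k]


nodes = [b"node-a", b"node-b", b"node-c"]
nearest = closest_nodes(b"node-b", nodes, k=2)
```

A node is always at distance 0 from itself, so `nearest[0]` is `b"node-b"`; the order of the remaining nodes depends on the hash values and has nothing to do with network latency or geography.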

Address and transaction

  • Address
    • Externally owned address (EOA)
      • Generated using secp256k1
      • Controlled by a private key 🔐
    • Smart contract account address
      • Determined by the sender’s address and the number of transactions it has generated (nonce)
      • Controlled by the contract code
  • Transaction
    • Initiated by an EOA and authorized by its signature
    • Types of transaction
      • Normal Transaction
        • EOA → EOA
      • Contract Deploying Transactions
        • EOA → zero-account
        • Deploys a smart contract 📃
      • Contract Executing Transactions
        • EOA → Deployed contract address
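The three transaction types above differ only in their recipient, which a short Python sketch can make explicit. The function name, the example addresses and the use of `None` for the deployment recipient (described above as the zero-account) are assumptions for illustration.

```python
from typing import Optional, Set


def classify_transaction(to: Optional[str], known_contracts: Set[str]) -> str:
    """Classify a transaction by its recipient field."""
    if to is None:
        # No regular recipient: the transaction deploys a new contract.
        return "contract deployment"
    if to in known_contracts:
        # Recipient is an already deployed contract: its code gets executed.
        return "contract execution"
    # Otherwise the recipient is treated as an EOA: a plain value transfer.
    return "normal"


known = {"0xc0ffee"}   # hypothetical set of deployed contract addresses
kinds = [classify_transaction(t, known) for t in (None, "0xc0ffee", "0xabc123")]
```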

Deanonymization of a P2P network

  • A node of a P2P network can be either the creator or a forwarder of a transaction
  • Goal: identify the creator, i.e. the source node
  • Based on the assumption that a supernode connected to all nodes is able to conclude which node is the source node

  • Connection to the Ethereum nodes
    • Build ethNodeFinder to find nodes on the Ethereum network
    • ethTxListener provides a view of transaction propagation by connecting to synced nodes
  • Infer the source node

    ### FirstReach Estimator

    • Uses the minimum delay from node to supernode
    • Assumes that fewer hops result in a shorter delay
    • The node with the shortest delay is taken as the source
    • Assumes that the delay over one hop is always shorter than the delay over two hops
    • Triangle inequality violations can break this assumption

    ### FirstSent Estimator

    • $\delta$ is the delay obtained from the RTT (round-trip time)
    • Subtract the delay from the time of arrival to estimate when the node sent the transaction
    • The delay is hard to estimate
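The two basic estimators can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the paper's implementation: `arrival` maps each node to the time the supernode first received the transaction from it, `delay` maps each node to an estimated one-way delay $\delta$, and the example timings are invented.

```python
from typing import Dict, List


def first_reach(arrival: Dict[str, float], k: int = 1) -> List[str]:
    """Rank nodes by arrival time at the supernode; earliest first."""
    return sorted(arrival, key=lambda n: arrival[n])[:k]


def first_sent(arrival: Dict[str, float], delay: Dict[str, float],
               k: int = 1) -> List[str]:
    """Rank nodes by estimated sending time (arrival time minus delay)."""
    sent = {n: t - delay[n] for n, t in arrival.items()}
    return sorted(sent, key=lambda n: sent[n])[:k]


# Invented timings in seconds: node A is far away, so its copy arrives later.
arrival = {"A": 0.120, "B": 0.100, "C": 0.150}
delay = {"A": 0.080, "B": 0.020, "C": 0.030}

reach_guess = first_reach(arrival)         # earliest arrival
sent_guess = first_sent(arrival, delay)    # earliest estimated send time
```

In this made-up case FirstReach picks B while FirstSent picks A, which shows how a long link to the real source can mislead FirstReach. Returning the top `k` candidates instead of one corresponds to reducing the anonymity set to `k` nodes.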

    ### ML-based Estimator

    • In some scenarios, FirstReach outperforms FirstSent
    • Source inference is reduced to a binary classification problem

Experiment

  • P2P network
    • Run 40 instances of ethNodeFinder
    • 32K nodes found
      • 10K full nodes
      • 22K include non-synced nodes, forked nodes and light nodes
    • Client
      • Geth client: 89%
      • OpenEthereum client: 8%
      • Other: 3%
  • Connections coverage
    • 10 instances of ethTxListener deployed
    • Each instance maintains connections with a non-overlapping part of the nodes in the entire network
    • More than 90% of nodes maintain a connection with the supernode
    • Broke
  • Deanonymization with basic estimators
    • Tested on both the Ropsten testnet and the Ethereum mainnet
    • ethTxPretender: generates transactions and sends them to only one selected node
    • Testnet
      • 300 randomly selected nodes
      • Sent 10 transactions to each node
      • Total: 3000 transactions
    • Mainnet
      • 100 randomly selected targets
      • 1 transaction per node
      • Total: 100 transactions
    • Result

      • With $k$ = 10, the two estimators have accuracies of 93% and 91% respectively

      The FirstSent estimator is able to reduce the anonymity set of a transaction to 10 nodes (about 0.1% of all nodes on mainnet) with a success rate of 93%.

  • Deanonymization with ML-based estimator
    • Sometimes FirstReach is correct ✅ while FirstSent fails ❌
    • Introduces an ML classifier that combines the two estimators
    • Based on observation, the RandomForest classifier seems to give the best result

    The testnet yields a better result; accuracy on mainnet is 88%.

    • ML features
      • Reach_Time_Diff: Time difference between the arrival timestamp of the node and the minimum arrival timestamp
      • Sent_Time_Diff: Time difference between the estimated sending timestamp of the node and the minimum estimated sending timestamp. The delay is based on TCP timestamps
      • Inst_Delay: The delay measured closest to the time the transaction arrives
      • Avg_Delay: Average of all delays measured within 10 seconds before and after the transaction arrives
      • Delay_STD: Standard deviation of all delays measured within 10 seconds before and after the transaction arrives
    • Results of different classifiers
      ![](https://hackmd.io/_uploads/ryj2iyV93.png)
    
  • Comparison between the ML and basic estimators
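To make the ML features listed above more concrete, here is a hedged Python sketch that computes them for one node. The data layout is an assumption: `arrival` maps nodes to arrival timestamps and `delays` maps nodes to lists of `(timestamp, delay)` measurements. The sketch also assumes the estimated sending time uses the measurement closest to the arrival, and it uses the population standard deviation; the paper may define both differently.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Measurements = Dict[str, List[Tuple[float, float]]]   # node -> [(time, delay)]


def inst_delay(node: str, arrival: Dict[str, float], delays: Measurements) -> float:
    # Delay measurement taken closest in time to the transaction's arrival.
    t = arrival[node]
    return min(delays[node], key=lambda m: abs(m[0] - t))[1]


def features(node: str, arrival: Dict[str, float], delays: Measurements,
             window: float = 10.0) -> Dict[str, float]:
    t = arrival[node]
    # Estimated sending time of every node: arrival time minus its delay.
    sent = {n: arrival[n] - inst_delay(n, arrival, delays) for n in arrival}
    # Delays measured within `window` seconds before or after the arrival.
    near = [d for ts, d in delays[node] if abs(ts - t) <= window]
    return {
        "Reach_Time_Diff": t - min(arrival.values()),
        "Sent_Time_Diff": sent[node] - min(sent.values()),
        "Inst_Delay": inst_delay(node, arrival, delays),
        "Avg_Delay": mean(near) if near else 0.0,
        "Delay_STD": pstdev(near) if near else 0.0,
    }


# Invented measurements for two nodes.
arrival = {"A": 100.0, "B": 100.5}
delays = {"A": [(95.0, 0.2), (99.0, 0.4), (120.0, 0.9)], "B": [(100.0, 0.1)]}
feats_a = features("A", arrival, delays)
feats_b = features("B", arrival, delays)
```

Each node's feature vector, labeled by whether that node really was the source, would then be one training row for the binary classifier.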