Ethereum Overview and Privacy Attacks
Ethereum
Accounts
- 20-byte address
- The state is made up of accounts; a state transition is a direct transfer of value or information between accounts
- Each account contains 4 fields
- Nonce: a counter that ensures each transaction can be processed only once
- Ether balance
- Contract code (for contract account)
- Storage
- Types of accounts
- Externally owned accounts
- Controlled by private keys
- Can send messages by creating and signing a transaction
- Contract accounts
- Controlled by contract code
- The code is activated whenever the account receives a message
- The code can read from and write to internal storage, send other messages, or create contracts in turn
Contract
- An autonomous agent that lives inside the Ethereum execution environment
- Always executes a specific piece of code when "triggered" by a message or transaction
Transaction and message
Transaction
- A signed data package that stores a message to be sent from an externally owned account
- `STARTGAS` and `GASPRICE` prevent infinite loops by limiting the number of computational steps a transaction can trigger
- The `gas` consumed depends on the amount of computation and the size of the data
- A transaction contains
- The recipient of the message
- The signature of the sender
- The amount of ether to be transferred
- A data field (optional)
- `STARTGAS`: the maximum number of computational steps the transaction's execution may take
- `GASPRICE`: the fee the sender pays per computational step (per unit of gas)
Messages
- A message is like a transaction, except that it is produced by a contract rather than by an externally owned account
- A message contains the sender, the recipient, the amount of ether being sent, an optional data field, and a `STARTGAS` value
Ethereum state transition function
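- Validate the transaction: check that it is well-formed, that the signature is valid, and that the nonce matches the nonce in the sender's account
- Calculate the transaction fee as `STARTGAS * GASPRICE`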
- Subtract the fees from the sender’s account balance and increment the sender’s nonce
- Transfer the transaction value from the sender's account to the recipient's account
- If the recipient account does not exist yet, create it
- If the recipient is a contract account, run the contract's code until it completes or the execution runs out of gas
- If the value transfer failed because the sender did not have enough ether, or the execution ran out of gas, revert all state changes except the payment of the fees
- Otherwise, refund the fees for all remaining gas to the sender
- The fees paid for the gas consumed are sent to the miner
- Example of the state transition function: send 10 ether with 2000 gas, a gas price of 0.001 ether, and 64 bytes of data
- Data being sent:
- byte[0:31] = 2
- byte[32:63] = “CHARLIE”
- Contract code: `if !self.storage[calldataload(0)]: self.storage[calldataload(0)] = calldataload(32)`
- Validate the transaction
- Check that the sender has at least 2000 × 0.001 = 2 ether in its account
- Subtract 2 ether from the sender's account
- Initialize gas = 2000
- Assume the transaction is 170 bytes long and the byte fee is 5 gas per byte, so 170 × 5 = 850 gas is charged
- Remaining gas: 2000 - 850 = 1150
- Subtract 10 more ether from the sender's account (the ether sent in the transaction) and add it to the contract's account
- Execute the code (assume it consumes 187 gas)
- Check whether the contract's storage at index 2 is empty
- If it is empty, store the string “CHARLIE” at index 2
- Remaining gas: 1150 - 187 = 963
- Add 963 × 0.001 = 0.963 ether back to the sender's account
- If there is no contract at the receiving end, the total fee is simply `GASPRICE` multiplied by the length of the transaction in bytes, and the data sent with the transaction is ignored
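The worked example can be replayed as a short Python sketch. This is a toy model, not client code. `apply_example` and its parameters are hypothetical. Balances are `Decimal` ether amounts, contract storage starts empty, and the 5-gas byte fee is the example's own assumption. The out-of-gas branch mirrors the "revert all state changes except the fees" rule.

```python
from decimal import Decimal

GAS_PRICE = Decimal("0.001")  # ether per unit of gas, from the example
START_GAS = 2000              # STARTGAS, from the example
BYTE_FEE = 5                  # gas per transaction byte (the example's assumption)


def apply_example(sender_balance: Decimal, value: Decimal = Decimal("10"),
                  tx_len: int = 170, code_gas: int = 187):
    """Toy replay of the example. Returns (sender, contract, miner_fee, storage)."""
    upfront = START_GAS * GAS_PRICE                 # 2000 * 0.001 = 2 ether
    if sender_balance < upfront:
        raise ValueError("invalid: sender cannot pay STARTGAS * GASPRICE")
    sender = sender_balance - upfront               # the fee is taken first
    gas = START_GAS - tx_len * BYTE_FEE             # 2000 - 170 * 5 = 1150
    if gas < 0:
        raise ValueError("invalid: not enough gas to pay for the transaction bytes")
    # Insufficient funds for the value, or out of gas: revert all but the fee.
    if sender < value or code_gas > gas:
        return sender, Decimal(0), upfront, {}
    sender -= value                                 # 10 ether moves to the contract
    contract = value
    storage = {}                                    # toy: contract storage starts empty
    calldata = {0: 2, 32: "CHARLIE"}                # bytes 0-31 and bytes 32-63
    if not storage.get(calldata[0]):                # if !self.storage[calldataload(0)]
        storage[calldata[0]] = calldata[32]         # store "CHARLIE" at index 2
    gas -= code_gas                                 # 1150 - 187 = 963
    refund = gas * GAS_PRICE                        # 963 * 0.001 = 0.963 ether
    sender += refund
    miner_fee = upfront - refund                    # fees for the gas actually consumed
    return sender, contract, miner_fee, storage
```

Suppose the sender starts with 100 ether. The sender ends up with 100 - 2 - 10 + 0.963 = 88.963 ether, and the miner receives 2 - 0.963 = 1.037 ether.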
Code execution (More to come)
- Contract code is a low-level, stack-based bytecode; each operation can interact with the stack, memory, and the contract's long-term storage
Blockchain and mining
- Unlike Bitcoin, an Ethereum block contains not only the transaction list but also a copy of the most recent state, the block number, and the difficulty
- Because the state is stored with the latest block, a node does not need to store the entire block history
Block validation algorithm
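- Check that the referenced previous block exists and is valid
- Check that the timestamp of the block is greater than that of the previous block and less than 15 minutes into the future
- Check that the block number, difficulty, transaction root, uncle root, and gas limit are valid
- Check that the proof of work on the block is valid
- Let $S[0]$ be the state at the end of the previous block and $TX$ the block's list of $n$ transactions. For all $i$ in $0 \ldots n-1$, set $S[i+1] = \mathrm{APPLY}(S[i], TX[i])$; if any application returns an error, or the total gas consumed exceeds the block's gas limit, the block is invalid
- $S_{FINAL}$ is $S[n]$ plus the block reward paid to the miner
- Check that the Merkle tree root of $S_{FINAL}$ equals the final state root provided in the block header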
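The state-application steps of block validation can be sketched as a loop in Python. The state is a toy `account -> balance` dict. `validate_block` and `toy_apply` are hypothetical helpers. The Merkle Patricia root is replaced by a SHA-256 hash of the sorted state, which is an assumption made only to keep the example self-contained.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]                               # toy state: account -> balance
Apply = Callable[[State, dict], Tuple[State, int]]   # returns (new state, gas used)


def state_root(state: State) -> str:
    """Stand-in for the Merkle Patricia state root (assumption: SHA-256 of sorted JSON)."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def validate_block(prev_state: State, txs: List[dict], apply: Apply, miner: str,
                   reward: int, header_root: str, gas_limit: int) -> bool:
    state = dict(prev_state)                         # S[0]: state after the previous block
    gas_used = 0
    for tx in txs:                                   # i = 0 .. n-1
        state, gas = apply(state, tx)                # S[i+1] = APPLY(S[i], TX[i]); raises on error
        gas_used += gas
        if gas_used > gas_limit:
            raise ValueError("total gas exceeds the block gas limit")
    state[miner] = state.get(miner, 0) + reward      # S_FINAL = S[n] + block reward
    return state_root(state) == header_root


def toy_apply(state: State, tx: dict) -> Tuple[State, int]:
    """Hypothetical APPLY: a plain value transfer charged a flat 21000 gas (no fees)."""
    if state.get(tx["from"], 0) < tx["value"]:
        raise ValueError("insufficient balance")
    new = dict(state)
    new[tx["from"]] -= tx["value"]
    new[tx["to"]] = new.get(tx["to"], 0) + tx["value"]
    return new, 21000
```

The header/PoW checks are omitted; only the $S[i+1] = \mathrm{APPLY}(S[i], TX[i])$ loop, the gas-limit check, the block reward, and the root comparison are modelled.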
Application
- Token systems
- Identity and Reputation Systems
- Decentralized File Storage
Practical Deanonymization Attack in Ethereum
Two problems this paper tries to solve
- What coverage of Ethereum nodes can the attacker achieve, i.e. how many nodes can it connect to?
- Which estimator can be adopted in Ethereum to infer the source node of a transaction accurately?
Ethereum P2P network Primer
- Fully distributed P2P network
- Uses the devp2p protocol stack
- Terms
Ethereum Node Records
- A node record consists of three parts: a signature, a sequence number, and key/value pairs of node information (IP address, ports, and so on)
- Ethereum Node Records (ENRs) are a standardized format for network addresses on Ethereum.
Node Discovery Protocol
- Based on the Kademlia DHT
- Used to store and retrieve information about Ethereum nodes
- Each node has a cryptographic identity
- Public key: Node ID
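- Private key: signs the node's discovery packets and node record
- Logical distance: the bitwise XOR of the hashes of two node IDs; the log distance is the position of the highest differing bit
- Runs over UDP, with three request/response pairs
- Ping and Pong: check whether a node is alive
- FindNode and Neighbors: find the nodes closest to a target
- EnrRequest and EnrResponse: request a node's record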
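The XOR metric can be shown in a few lines of Python. This sketch assumes the node IDs have already been hashed to 256-bit integers, so the keccak256 step is omitted. The helper names are hypothetical, and `k = 16` is an assumed default for how many neighbours a FindNode/Neighbors exchange returns.

```python
from typing import List


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two (already hashed) node IDs."""
    return a ^ b


def log_distance(a: int, b: int) -> int:
    """Position of the highest differing bit (0 when the IDs are equal)."""
    return (a ^ b).bit_length()


def closest(target: int, known: List[int], k: int = 16) -> List[int]:
    """The k known IDs closest to target, as a Neighbors reply would list them."""
    return sorted(known, key=lambda n: xor_distance(n, target))[:k]
```

For example, `log_distance(0b1000, 0b0001)` is 4, because the IDs first differ at bit 4 (counting from 1).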
RLPx Transport Protocol
- TCP-based protocol for information exchange between nodes
- Purpose: Key exchange and protocol handshake
- Key exchange: elliptic-curve Diffie-Hellman (ECDH)
- Protocol handshake: Exchange Hello messages which contain protocol version, clientID, capabilities, listening port and nodeID
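The key-exchange idea can be illustrated with a toy elliptic-curve Diffie-Hellman on secp256k1, the curve Ethereum node keys use. This sketch shows only the shared-secret step. The ECIES-encrypted auth messages, nonces, and key derivation of the real RLPx handshake are omitted. `keypair` and `shared_secret` are hypothetical helpers, and the scalar multiplication is not constant-time, so this is for illustration only.

```python
import secrets

# secp256k1: y^2 = x^3 + 7 over F_P, base point G of prime order N
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(p1, p2):
    """Add two affine points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                          # p1 == -p2
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P       # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P      # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def scalar_mult(k, point):
    """Double-and-add scalar multiplication (not constant time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def keypair():
    priv = secrets.randbelow(N - 1) + 1
    return priv, scalar_mult(priv, G)


def shared_secret(my_priv, their_pub):
    """Both sides reach priv_a * priv_b * G; the x-coordinate is the shared secret."""
    return scalar_mult(my_priv, their_pub)[0]
```

Each peer sends only its public point. Each then multiplies the other's point by its own private key, so both arrive at the same secret without ever transmitting it.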
Application-level protocols
- Ethereum Wire Protocol (eth)
- Main protocol
- Peers exchange a status handshake after the RLPx handshake
- Status handshake contains protocol version, networkID, difficulty, current block hash, genesis block hash and forkID
- Light Ethereum Subprotocol (les)
- Parity Light Protocol (pip)
- Light nodes (les and pip clients) only download block headers and query other information on demand
- Light nodes create transactions 🔨 but don't relay transactions ❌
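The status handshake lets a peer decide early whether the other side is on the same chain. This sketch models the status fields listed above together with a simplified compatibility check. The `Status` class and the `compatible` function are hypothetical. Real clients also validate the forkID (EIP-2124), and this sketch deliberately leaves that rule out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    """Fields of the eth status message; forkID is kept as an opaque string here."""
    protocol_version: int
    network_id: int
    total_difficulty: int
    head_hash: str
    genesis_hash: str
    fork_id: str


def compatible(local: Status, remote: Status) -> bool:
    """Simplified check: same protocol version, network ID, and genesis block."""
    return (local.protocol_version == remote.protocol_version
            and local.network_id == remote.network_id
            and local.genesis_hash == remote.genesis_hash)
```

A crawler could use a check like this to separate peers on the target network from peers on testnets or forks.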
Address and transaction
- Address
Externally owned account (EOA) address
- Derived from a secp256k1 key pair
- Controlled by a private key :closed_lock_with_key:
Smart contract account address
- Determined by the creator's address and the number of transactions it has sent (its nonce)
- Controlled by the contract code
- Transaction
- Initiated by an EOA, which signs the transaction
- Types of transactions
- Normal Transaction
- EOA → EOA
- Contract Deploying Transactions
- EOA → zero-account
- To deploy smart contract📃
- Contract Executing Transactions
- EOA → Deployed contract address
Deanonymization of a P2P network
- A node in the P2P network can be either the creator or a forwarder of a transaction
- Goal: identify the creator, i.e. the source node
The attack assumes that a supernode connected to all nodes can determine which node is the source
- Connecting to Ethereum nodes
- Build ethNodeFinder to find nodes on the Ethereum network
- Build ethTxListener to observe how transactions propagate, by connecting to synced nodes
Infer the source node
### FirstReach Estimator
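- Uses the minimum delay from a node to the supernode, i.e. when each node's copy of the transaction reaches the supernode
- Assumes that fewer hops result in shorter delay
- The node with the shortest delay (whose copy arrives first) is taken as the source
- Assumes that the delay over one hop is always shorter than over two hops
- This assumption fails under triangle inequality violations, where a two-hop path is faster than the direct link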
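FirstReach reduces to sorting nodes by when their copy of the transaction reached the supernode. This sketch is illustrative only. `first_reach` is a hypothetical name, and the input is assumed to map each node ID to the earliest arrival timestamp (in seconds) observed from that node.

```python
from typing import Dict, List


def first_reach(arrivals: Dict[str, float], k: int = 1) -> List[str]:
    """Return the k nodes whose copy of the transaction arrived first."""
    return sorted(arrivals, key=arrivals.get)[:k]
```

Suppose node A is the true source but its direct link to the supernode is slow (copy arrives at 0.20 s), while A's neighbour B relays the transaction over a fast link (0.15 s). In that case `first_reach` returns `["B"]`, which is exactly the triangle-inequality failure described above.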
### FirstSent Estimator
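- $\delta$ is the transmission delay between a node and the supernode, estimated from the round-trip time (RTT)
- Subtract $\delta$ from the arrival time to estimate when each node sent the transaction; the node with the earliest estimated sending time is taken as the source
- Drawback: the delay is hard to estimate accurately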
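FirstSent corrects each arrival time by the estimated link delay before ranking. In this sketch, $\delta$ is approximated as half the measured RTT, which assumes symmetric links. The paper's exact delay estimate may differ, and `first_sent` is a hypothetical name.

```python
from typing import Dict, List


def first_sent(arrivals: Dict[str, float], rtts: Dict[str, float], k: int = 1) -> List[str]:
    """Rank nodes by estimated sending time = arrival - delta, with delta = RTT / 2."""
    sent = {node: t - rtts[node] / 2 for node, t in arrivals.items()}
    return sorted(sent, key=sent.get)[:k]
```

Take a slow-link source A (arrival 0.20 s, RTT 0.30 s) and a fast relay B (arrival 0.15 s, RTT 0.02 s). The estimated sending times are 0.05 s and 0.14 s, so A is ranked first. The result is only as good as the RTT-based estimate of $\delta$.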
### ML-based Estimator
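- In some scenarios, FirstReach performs better than FirstSent
- Source inference is reduced to a binary classification problem: for each candidate node, decide whether it is the source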
Experiment
- P2P network
- Ran 40 instances of ethNodeFinder
- 32K nodes found
- 10K full nodes
- 22K others, including non-synced nodes, forked nodes, and light nodes
- Client
- Geth client: 89%
- OpenEthereum client: 8%
- Other: 3%
- Connection coverage
- 10 instances of ethTxListener deployed
- Each instance maintains connections with a non-overlapping portion of the nodes in the network
- More than 90% of nodes maintain a connection with the supernode
- Broke
- Deanonymization with basic estimators
- Tested on both the Ropsten testnet and Ethereum mainnet
- ethTxPretender: generates transactions and sends each one to a single selected node only
- Testnet
- 300 randomly selected nodes
- Sent 10 transactions to each node
- Total: 3000 transactions
- Mainnet
- 100 randomly selected target nodes
- 1 transaction per node
- Total: 100 transactions
Result
- With $k = 10$ (each estimator outputs its top 10 candidate nodes), the two estimators reach accuracies of 93% (FirstSent) and 91% (FirstReach)
The FirstSent estimator can reduce the anonymity set of a transaction to 10 nodes (about 0.1% of all nodes on mainnet) with a success rate of 93%.
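The top-$k$ success rate can be computed by counting the trials in which the true source lands among an estimator's $k$ candidates. In this sketch, `top_k_accuracy` and `anonymity_fraction` are hypothetical helpers. An estimator is assumed to be any function `(observations, k) -> ranked candidate list`, for example a FirstReach- or FirstSent-style ranking.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Observations = Dict[str, float]                        # node -> timestamp
Estimator = Callable[[Observations, int], List[str]]   # returns the top-k candidates


def top_k_accuracy(trials: Sequence[Tuple[Observations, str]],
                   estimator: Estimator, k: int = 10) -> float:
    """Fraction of trials whose true source is among the estimator's k candidates."""
    hits = sum(1 for obs, source in trials if source in estimator(obs, k))
    return hits / len(trials)


def anonymity_fraction(k: int, total_nodes: int) -> float:
    """Size of the reduced anonymity set relative to the whole network."""
    return k / total_nodes
```

With roughly 10K full nodes (the count from the experiment), `anonymity_fraction(10, 10000)` gives 0.001. That matches the "about 0.1% of all nodes" figure.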
- Deanonymization with ML-based estimator
- Sometimes FirstReach is correct ✅ while FirstSent fails ❌
- Introduced an ML classifier that combines the two estimators
- Empirically, the random forest classifier gave the best results
The testnet yields better results; on mainnet the accuracy is 88%
- ML features
Reach_Time_Diff
: Time difference between the node's arrival timestamp and the minimum arrival timestamp
Sent_Time_Diff
: Time difference between the node's estimated sending timestamp and the minimum estimated sending timestamp; the delay is estimated from TCP timestamps
Inst_Delay
: The delay measured closest in time to the transaction's arrival
Avg_Delay
: Average of all delays measured within 10 seconds before and after the transaction arrived
Delay_STD
: Standard deviation of all delays measured within 10 seconds before and after the transaction arrived
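The five features can be computed per candidate node as shown in this sketch. `node_features` is a hypothetical helper. Delay samples are assumed to be `(timestamp, delay)` pairs, "within 10 seconds" is measured around each node's own arrival time, and the standard deviation is the population one. These are all assumptions, since the paper's exact definitions may differ.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def node_features(arrivals: Dict[str, float], sent_est: Dict[str, float],
                  delays: Dict[str, List[Tuple[float, float]]],
                  window: float = 10.0) -> Dict[str, Dict[str, float]]:
    """Compute the five features for every candidate node of one transaction.

    arrivals: node -> timestamp at which its copy of the transaction arrived
    sent_est: node -> estimated sending timestamp (arrival minus estimated delay)
    delays:   node -> non-empty list of (measurement timestamp, measured delay)
    """
    min_arrival = min(arrivals.values())
    min_sent = min(sent_est.values())
    rows = {}
    for node, t in arrivals.items():
        samples = delays[node]
        nearby = [d for ts, d in samples if abs(ts - t) <= window]
        inst = min(samples, key=lambda s: abs(s[0] - t))[1]   # sample closest to arrival
        rows[node] = {
            "Reach_Time_Diff": t - min_arrival,
            "Sent_Time_Diff": sent_est[node] - min_sent,
            "Inst_Delay": inst,
            # Assumption: fall back to Inst_Delay / 0.0 if no sample lies in the window.
            "Avg_Delay": mean(nearby) if nearby else inst,
            "Delay_STD": pstdev(nearby) if nearby else 0.0,
        }
    return rows
```

Each row becomes one feature vector, labelled 1 for the true source and 0 otherwise. A random forest, for example scikit-learn's `RandomForestClassifier`, could then be trained on these vectors.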
- Results of different classifiers
![](https://hackmd.io/_uploads/ryj2iyV93.png)
- Comparison between the ML and basic estimators