Ethereum Overview and Privacy Attacks
Ethereum
Accounts
- 20-byte address
- A state transition (state -> state’) is a transfer of value or information between accounts
- Contain 4 fields
- Nonce: a counter that makes sure each transaction can be processed only once
- Ether balance
- Contract code (for contract account)
- Storage
- Types of accounts
- Externally owned accounts
- Controlled by private keys
- Can send messages by creating and signing a transaction
- Contract accounts
- Controlled by contract code
- Code is activated when a message is received
- Code can read from and write to internal storage, send messages, or create contracts in turn
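To make the four account fields concrete, here is a minimal sketch of an account record. The class and attribute names are hypothetical, and the model is simplified: it keeps code and storage inline, and it assumes an account counts as a contract account whenever its code is non-empty.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Simplified account with the four fields listed above (illustrative only)."""
    nonce: int = 0                    # counts transactions already processed
    balance: int = 0                  # ether balance, in the smallest unit (assumption)
    code: bytes = b""                 # empty for externally owned accounts
    storage: dict[int, int] = field(default_factory=dict)

    @property
    def is_contract(self) -> bool:
        # Assumption for this sketch: non-empty code means a contract account.
        return len(self.code) > 0


eoa = Account(balance=10)
contract = Account(code=b"\x60\x00")
print(eoa.is_contract, contract.is_contract)
```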
Contract
- Externally owned accounts
- An autonomous agent that lives inside the Ethereum execution environment
- Executes a specific piece of code when “triggered” by a message or transaction
Transaction and message
Transaction
- A signed data package that stores a message to be sent from an externally owned account
- STARTGAS and GASPRICE prevent infinite loops by limiting the number of computation steps
- The gas consumed depends on the amount of computation and the size of the data
- A transaction contains
- Recipient of the message
- Signature of the sender
- Amount of ether to transfer
- Data field (optional)
- STARTGAS: maximum number of computation steps allowed
- GASPRICE: fee the sender pays per computational step (gas)
Messages
- A message is like a transaction, except that it is produced by a contract
- A message contains the sender, recipient, amount of ether being sent, data field (optional), and STARTGAS
Ethereum state transition function
- Validate the transaction
- Calculate the transaction fee: STARTGAS * GASPRICE
- Subtract the fee from the sender’s account balance and increment the sender’s nonce
- Transfer the transaction value from the sender to the recipient
- If the recipient does not exist, create a new account
- If the recipient is a contract account, run the contract’s code until it completes or runs out of gas
- If execution runs out of gas or the sender does not have sufficient funds, revert all state changes except the payment of the fees
- Otherwise, refund the fee for the remaining gas to the sender
- Fees for the gas consumed are sent to the miner
- Example of the transition function (send 10 ether with 2000 gas, a gasprice of 0.001 ether, and 64 bytes of data)
- Data being sent:
- byte[0:31] = 2
- byte[32:63] = “CHARLIE”
- Contract code: `if !self.storage[calldataload(0)]: self.storage[calldataload(0)] = calldataload(32)`
- Validate the transaction
- Check that the sender has at least 2000 * 0.001 = 2 ether in its account
- Subtract 2 ether from the sender’s account
- Initialize gas to 2000
- Assume the transaction is 170 bytes long and the byte-fee is 5
- Subtract 170 * 5 = 850 gas: 2000 - 850 = 1150
- Subtract 10 ether from the sender’s account (the ether sent in the transaction) and add it to the contract’s account
- Execute the code (assume it takes 187 gas)
- Check whether the contract’s storage at index 2 is empty
- If it is empty, store the string “CHARLIE”
- Gas remaining: 1150 - 187 = 963
- Add 963 * 0.001 = 0.963 ether back to the sender’s account
- If there is no contract at the receiving end, the total fee is simply GASPRICE multiplied by the length of the transaction in bytes, and the data sent is ignored
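The arithmetic of this example can be replayed with a small sketch. This is a toy model under stated assumptions, not the real state transition function: the function name, the dictionary layout and the `miner` account are hypothetical, the execution cost (187 gas) is passed in rather than measured, and running out of gas simply raises an error instead of reverting.

```python
from decimal import Decimal

BYTE_FEE = 5  # gas charged per transaction byte, as assumed in the example


def apply_example_tx(state, sender, contract, miner, value, startgas,
                     gasprice, tx_bytes, key, word, exec_gas):
    """Replay the worked example; returns (gas_left, refund)."""
    balance = state["balance"]
    max_fee = startgas * gasprice
    if balance[sender] < max_fee:
        raise ValueError("sender cannot pay STARTGAS * GASPRICE")
    balance[sender] -= max_fee                 # pay the maximum fee up front
    gas = startgas - tx_bytes * BYTE_FEE       # charge for the transaction bytes
    balance[sender] -= value                   # move the transferred ether
    balance[contract] += value
    # Contract code: store the word only if the slot is still empty.
    if not state["storage"].get(key):
        state["storage"][key] = word
    gas -= exec_gas
    if gas < 0:
        raise ValueError("out of gas (revert is not modelled in this sketch)")
    refund = gas * gasprice                    # unused gas goes back to the sender
    balance[sender] += refund
    balance[miner] += (startgas - gas) * gasprice  # consumed gas pays the miner
    return gas, refund


state = {
    "balance": {"sender": Decimal(20), "contract": Decimal(0), "miner": Decimal(0)},
    "storage": {},
}
gas_left, refund = apply_example_tx(
    state, "sender", "contract", "miner", value=Decimal(10), startgas=2000,
    gasprice=Decimal("0.001"), tx_bytes=170, key=2, word="CHARLIE", exec_gas=187)
print(gas_left, refund)
```

`Decimal` is used instead of floats so that the ether amounts stay exact; the starting balance of 20 ether is an arbitrary choice for the demonstration.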
Code execution (More to come)
- Each operation in the code can access the stack, memory, and the contract’s long-term storage
Blockchain and mining
- Unlike a Bitcoin block, an Ethereum block stores the transaction list, the most recent state, the block number and the difficulty
- Unlike in Bitcoin, the state information is part of the last block, so there is no need to store the entire block history
Block validation algorithm

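- Check that the previous block referenced exists and is valid
- Check that the timestamp is greater than that of the previous block and less than 15 minutes into the future
- Check that the block number, difficulty, transaction root, uncle root and gas limit are valid
- Check that the PoW of the block is valid
- Let $S[0]$ be the state at the end of the previous block and $TX$ the block’s list of $n$ transactions. For all $i$ in $0 \ldots n-1$, set $S[i+1]=APPLY(S[i], TX[i])$
- $S_{FINAL}$ is $S[n]$ with the block reward paid to the miner
- Check that the Merkle tree root of the state $S_{FINAL}$ equals the final state root provided in the block header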
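The last three steps (apply every transaction in order, pay the block reward, compare state roots) can be sketched as follows. Everything here is a simplification under stated assumptions: the function names are hypothetical, transactions are plain balance transfers, and `toy_state_root` is a SHA-256 over the sorted state, standing in for the real Merkle tree root. Header and PoW checks are omitted.

```python
import hashlib


def toy_state_root(state):
    # Stand-in for the Merkle root of the state (NOT the real construction).
    return hashlib.sha256(repr(sorted(state.items())).encode()).hexdigest()


def apply_transfer(state, tx):
    """APPLY(S, TX) for a toy transfer tx = (sender, recipient, amount, gas_used)."""
    sender, recipient, amount, gas_used = tx
    if state.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    new_state = dict(state)
    new_state[sender] -= amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state, gas_used


def validate_block_body(s0, txs, gas_limit, miner, reward, header_root):
    state, total_gas = s0, 0
    for tx in txs:                       # i = 0 .. n-1: S[i+1] = APPLY(S[i], TX[i])
        try:
            state, used = apply_transfer(state, tx)
        except ValueError:
            return False                 # any failed application invalidates the block
        total_gas += used
        if total_gas > gas_limit:
            return False                 # block exceeds its gas limit
    final = dict(state)                  # S_FINAL = S[n] plus the block reward
    final[miner] = final.get(miner, 0) + reward
    return toy_state_root(final) == header_root


s0 = {"alice": 50, "bob": 0}
txs = [("alice", "bob", 10, 21000), ("bob", "carol", 4, 21000)]
root = toy_state_root({"alice": 40, "bob": 6, "carol": 4, "miner": 5})
print(validate_block_body(s0, txs, 100000, "miner", 5, root))
```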
Applications
- Token systems
- Identity and Reputation Systems
- Decentralized File Storage
Practical Deanonymization Attack in Ethereum
Two problems this paper tries to solve
- What coverage of Ethereum nodes an attacker can achieve, i.e. how many nodes the attacker can connect to
- Which estimators can be adopted in Ethereum to infer the source node accurately
Ethereum P2P network Primer
- Fully distributed P2P network
- Uses the devp2p protocol
- Terms
Ethereum Node Records
- Ethereum Node Records (ENRs) are a standardized format for network addresses on Ethereum
- A node record consists of three parts: a signature, a sequence number, and key/value pairs of node information such as IP address and port
Node Discovery Protocol
- Based on the Kademlia DHT
- Used for storage and retrieval of Ethereum nodes
- Each node has a cryptographic identity
- Public key: Node ID
- Private key: Sign Transactions
- Logical distance: the bitwise XOR of the hashes of two node IDs, interpreted as a number
- Runs over UDP
- Ping and Pong: detect node status
- FindNode and Neighbors: find the nodes closest to a target
- EnrRequest and EnrResponse: request a node record
RLPx Transport Protocol
- TCP-based protocol for information exchange between nodes
- Purpose: key exchange and protocol handshake
- Key exchange: Diffie-Hellman algorithm
- Protocol handshake: exchange Hello messages, which contain the protocol version, clientID, capabilities, listening port and nodeID
Application-level protocols
- Ethereum Wire Protocol (eth)
- Main protocol
- Exchanges a Status handshake after the RLPx handshake
- The Status handshake contains the protocol version, networkID, difficulty, current block hash, genesis block hash and forkID
- Light Ethereum Subprotocol (les)
- Parity Light Protocol (pip)
- Light nodes using les or pip download only block headers and query other information on demand
- Light nodes create transactions 🔨 but don’t participate in relaying transactions ❌
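As a rough illustration of the logical distance used by node discovery, the sketch below ranks node IDs by XOR distance to a target, which is what a FindNode lookup is after. The function names are hypothetical. Note the loud assumption: Ethereum hashes node IDs with Keccak-256, which Python's standard library does not provide, so `hashlib.sha3_256` (a different function) is used purely as a stand-in and the resulting distances are not the real ones.

```python
import hashlib


def node_hash(node_id: bytes) -> int:
    # Stand-in hash: sha3_256 is NOT Keccak-256; used here only for illustration.
    return int.from_bytes(hashlib.sha3_256(node_id).digest(), "big")


def xor_distance(a: bytes, b: bytes) -> int:
    """Logical distance: XOR of the two hashes, read as an integer."""
    return node_hash(a) ^ node_hash(b)


def closest(target: bytes, candidates: list[bytes], k: int) -> list[bytes]:
    """Return the k candidates with the smallest XOR distance to the target."""
    return sorted(candidates, key=lambda c: xor_distance(target, c))[:k]


nodes = [b"node-a", b"node-b", b"node-c", b"node-d"]
print(closest(b"node-a", nodes, 2))
```

The distance of a node to itself is 0 and the metric is symmetric, which is what lets every node agree on who is "close" to a given target.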
Address and transaction
- Address
Externally owned address (EOA)
- Generated from a secp256k1 key pair
- Controlled by a private key 🔐
Smart contract account address
- Determined by the sender’s address and the number of transactions it has generated (nonce)
- Controlled by the contract code
- Transaction
- Initiated by a signature from an EOA
- Types of transactions
- Normal Transaction
- EOA → EOA
- Contract Deploying Transactions
- EOA → zero-account
- Deploys a smart contract 📃
- Contract Executing Transactions
- EOA → Deployed contract address
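The three transaction types differ only in their recipient, which a small helper can make explicit. This is a sketch with hypothetical names; it assumes the zero-account of a deployment is represented as `None`, and that the caller already knows whether the recipient address holds contract code.

```python
from typing import Optional


def classify_transaction(to: Optional[str], recipient_has_code: bool) -> str:
    """Classify a transaction sent by an EOA by looking at its recipient."""
    if to is None:                 # assumption: None models the zero-account
        return "contract deploying"
    if recipient_has_code:         # recipient is a deployed contract
        return "contract executing"
    return "normal"                # plain EOA -> EOA transfer


print(classify_transaction("0xabc", recipient_has_code=False))
```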
Deanonymization of a P2P network
- A node in a P2P network can be either the creator or a forwarder of a transaction
- Goal: identify the creator, i.e. the source node
This is based on the assumption that a supernode connected to all nodes is able to conclude which node is the source node

- Connection to Ethereum nodes
- Build ethNodeFinder to find nodes on the Ethereum network
- Build ethTxListener to observe the propagation of transactions by connecting to synced nodes
Infer the source node
### FirstReach Estimator

- Uses the minimum delay from a node to the supernode
- Assumes that fewer hops result in a shorter delay
- The node with the shortest delay is taken to be the source
- Assumes that the delay over one hop is always shorter than over two hops
- This assumption breaks under triangle inequality violations
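A minimal sketch of the FirstReach idea, assuming the supernode has recorded, for each connected node, the timestamp at which that node first delivered the transaction. The function name and the sample timestamps are hypothetical; returning the top $k$ candidates instead of a single node is an assumption that matches how the results are reported for $k$ = 10.

```python
def first_reach(arrivals: dict[str, float], k: int = 1) -> list[str]:
    """Rank nodes by arrival time at the supernode; earliest first."""
    ranked = sorted(arrivals.items(), key=lambda item: item[1])
    return [node for node, _ in ranked[:k]]


# Arrival timestamps in seconds (made-up values for illustration).
arrivals = {"A": 10.120, "B": 10.095, "C": 10.140}
print(first_reach(arrivals, k=2))  # ['B', 'A']
```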

### FirstSent Estimator

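- $\delta$ is the delay between a node and the supernode, obtained from the RTT (round-trip time)
- Subtract the delay from the time of arrival to estimate when each node sent the transaction
- The delay is hard to estimate accurately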
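In formula form, the estimated sending time of node $v$ is $\hat{t}_{sent}(v) = t_{arrive}(v) - \delta_v$, and the node with the smallest $\hat{t}_{sent}$ is taken as the source. The sketch below is one way to compute this; the function name and sample values are hypothetical, and taking $\delta_v$ as half of the measured RTT is an assumption of this sketch, not something the notes specify.

```python
def first_sent(arrivals: dict[str, float], rtts: dict[str, float],
               k: int = 1) -> list[str]:
    """Rank nodes by estimated sending time (arrival time minus delay)."""
    # Assumption: one-way delay delta is approximated as RTT / 2.
    estimated = {node: t - rtts[node] / 2 for node, t in arrivals.items()}
    return sorted(estimated, key=estimated.get)[:k]


# Made-up values: B arrives first, but A sits behind a much slower link.
arrivals = {"A": 10.120, "B": 10.095, "C": 10.140}
rtts = {"A": 0.100, "B": 0.020, "C": 0.040}
print(first_sent(arrivals, rtts, k=2))  # ['A', 'B']
```

In this example the earliest arrival comes from B, yet after correcting for link delay A has the earliest estimated sending time, so the two basic estimators can disagree.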
### ML-based Estimator
- In some scenarios, FirstReach outperforms FirstSent
- Source inference is reduced to a binary classification problem
Experiment
- P2P network
- Ran 40 instances of ethNodeFinder
- 32K nodes found
- 10K full nodes
- 22K other nodes, including non-synced nodes, forked nodes and light nodes
- Clients
- Geth client: 89%
- OpenEthereum client: 8%
- Other: 3%
- Connection coverage
- 10 instances of ethTxListener deployed
- Each instance maintains connections with a non-overlapping part of the nodes in the network
- More than 90% of nodes maintain a connection with the supernode
- Deanonymization with basic estimators
- Tested on both the Ropsten testnet and the Ethereum mainnet
- ethTxPretender: generates transactions and sends them to only one selected node
- Testnet
- 300 randomly selected nodes
- Sent 10 transactions to each node
- Total: 3000 transactions
- Mainnet
- 100 randomly selected targets
- 1 transaction per node
- Total: 100 transactions
Result


- With $k$ = 10, the two estimators reach accuracies of 93% and 91% respectively
The FirstSent estimator is able to reduce the anonymity set of a transaction to 10 nodes (about 0.1% of all nodes on mainnet) with a success rate of 93%.
- Deanonymization with ML-based estimator
- Sometimes FirstReach is correct ✅ where FirstSent fails ❌
- Introduced an ML classifier that combines the two estimators
- Based on observation, the RandomForest classifier seems to give the best result
Results are better on the testnet; the result on mainnet is 88%
- ML features
- Reach_Time_Diff: time difference between the arrival timestamp of the node and the minimum arrival timestamp
- Sent_Time_Diff: time difference between the estimated sending timestamp of the node and the minimum estimated sending timestamp. The delay is based on TCP timestamps
- Inst_Delay: the delay measured closest in time to the transaction’s arrival
- Avg_Delay: average of all delays measured within 10 seconds before and after the transaction arrives
- Delay_STD: standard deviation of all delays measured within 10 seconds before and after the transaction arrives
- Results of different classifiers
- Comparison between the ML-based and basic estimators
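The five ML features can be derived from per-node measurements roughly as follows. This is a sketch under stated assumptions, not the paper's code: the function name and input layout are hypothetical, Inst_Delay is read as the delay sample closest in time to the arrival, the population standard deviation is used, and nodes without any delay sample inside the window are skipped. The resulting rows could then be fed to a binary classifier such as a random forest.

```python
from statistics import mean, pstdev


def build_features(arrivals, sent_estimates, delay_samples, window=10.0):
    """Return {node: feature dict}.

    arrivals[node]       -> arrival timestamp at the supernode
    sent_estimates[node] -> estimated sending timestamp
    delay_samples[node]  -> list of (timestamp, measured delay)
    """
    min_arrival = min(arrivals.values())
    min_sent = min(sent_estimates.values())
    features = {}
    for node, t_arr in arrivals.items():
        # Keep only delay samples within +/- window seconds of the arrival.
        nearby = [(abs(ts - t_arr), d) for ts, d in delay_samples.get(node, [])
                  if abs(ts - t_arr) <= window]
        if not nearby:
            continue  # assumption: skip nodes with no usable delay sample
        delays = [d for _, d in nearby]
        features[node] = {
            "Reach_Time_Diff": t_arr - min_arrival,
            "Sent_Time_Diff": sent_estimates[node] - min_sent,
            "Inst_Delay": min(nearby)[1],   # sample closest in time to arrival
            "Avg_Delay": mean(delays),
            "Delay_STD": pstdev(delays),
        }
    return features


arrivals = {"A": 100.0, "B": 100.5}
sent_estimates = {"A": 99.9, "B": 100.45}
delay_samples = {
    "A": [(95.0, 0.10), (99.0, 0.12), (120.0, 0.50)],  # last sample is outside the window
    "B": [(100.4, 0.05)],
}
features = build_features(arrivals, sent_estimates, delay_samples)
print(features["A"])
```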


