Here's what I spent the night doing. I've been slurping the blockchain for two months with the software from http://ethslurp.com. It lets me download all the transactions for any given Ethereum address to my local hard drive in a tab-separated format. It also lets me convert each Ethereum transaction's input data field into a human-readable version of the data, so I can see which function call the transaction is making on a smart contract.
I've been downloading The DAO's transactions. There are nearly 140,000 of them. I store all the transactions in a single file with the fields in each record separated by tabs. Here comes the hard-core Linux command line shit. Here's what I did:
cat TheDao.txt | cut -f7 | cut -f1 -d'|' | sort | uniq | sed 's/^/.\/seperate_function /' >functions.txt
This gave me a list of all the unique function names that had ever been called on the DAO, with a call to a shell script inserted at the front of each line. So the file functions.txt looked something like this:
./seperate_function createTokenProxy
...
./seperate_function transferFrom
./seperate_function transfer
./seperate_function vote
and so on. Next I wrote the shell script, called 'seperate_function', thus:
grep "$1" TheDao.txt >"$1.txt"
which created a separate file for each function of the DAO. Then I took all the records in the function files for createTokenProxy and the two transfer functions, and I pulled out all ~38,500 Ethereum accounts that had ever owned DAO tokens. (The only way to get DAO tokens was to buy them during the creation period or to buy them on the open market, i.e. transfer...)
So now I had a big file with a single column of 38,500 Ethereum accounts that had ever owned DAO tokens. Next I ran this command:
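If you want to try the split-and-collect step yourself, here's a rough sketch. Two things to know. First, a plain grep matches substrings, so grepping for transfer also picks up the transferFrom records; the helper below matches the function name exactly instead. Second, the post doesn't show how the accounts were pulled out of the records, so `unique_addresses` is purely my assumption: it grabs anything that looks like a 20-byte address, which over-collects (the DAO's own address will show up, for one). The helper names are made up, and the only layout assumed is the one the cut commands imply: tab-separated records with "functionName|arg|arg" in field 7.

```shell
#!/bin/sh
# Sketch only: assumes tab-separated records whose 7th field looks like
# "functionName|arg1|arg2". Helper names are hypothetical.

# Print the records of file $1 whose function name is exactly $2.
records_for_function() {
    awk -F'\t' -v fn="$2" '{ split($7, call, "|"); if (call[1] == fn) print }' "$1"
}

# Print every distinct 20-byte address (0x + 40 hex digits) found on stdin.
# -w stops 64-digit transaction hashes from matching on their first 40 digits;
# tr folds hex letters to lower case so mixed-case copies collapse to one.
unique_addresses() {
    grep -owE '0x[0-9a-fA-F]{40}' | tr 'A-F' 'a-f' | sort -u
}

# Usage (not run here, needs the real data file):
#   for fn in createTokenProxy transfer transferFrom; do
#       records_for_function TheDao.txt "$fn"
#   done | unique_addresses > ownership_transactions.txt
```

Expect the result to need some cleanup before it matches a hand-picked list of token holders.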
cat ownership_transactions.txt | awk '{ print $1 "," $1 }' | sed -f codify.sed >geth_script
The awk script makes two copies of the account number on each line. You'll see why by looking at the file codify.sed which had this code in it:
s/^/console.log("/
s/$/"));/
s/,/"+"\\t"+theDAO.balanceOf("/
Running this command gave me a file with 38,500 lines that looked something like this:
console.log("0x098300d08a0e00...." + "\t" + theDAO.balanceOf("0x098300d08a0e00...."));
To this I prepended code to establish the DAO variable (theDAO) in geth.
I then started geth in a separate window and ran this command:
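Here's a minimal sketch of that whole generation step in one place, in case it helps. It is not the exact code I ran: `codify` does the awk and sed work in a single sed expression, and `write_preamble` is a guess at what the prepended code might look like. The helper names, the cut-down ABI (just a standard token balanceOf entry) and the use of `eth.contract(...).at(...)` from the geth console's bundled web3 are all assumptions; you pass in the contract address yourself.

```shell
#!/bin/sh
# Sketch only: helper names, the one-entry ABI and the preamble are assumptions.

# Read one address per line on stdin and write one console.log statement per
# line. "&" is the whole matched line; "\\t" leaves a literal \t for JavaScript.
# balanceOf stays outside the quotes so geth evaluates it instead of printing it.
codify() {
    sed -e 's/.*/console.log("&" + "\\t" + theDAO.balanceOf("&"));/'
}

# Write JavaScript that defines theDAO for the contract address given as $1.
write_preamble() {
    cat <<EOF
var daoAbi = [{"constant":true,"inputs":[{"name":"_owner","type":"address"}],"name":"balanceOf","outputs":[{"name":"balance","type":"uint256"}],"type":"function"}];
var theDAO = eth.contract(daoAbi).at("$1");
EOF
}

# Usage (not run here; put the real contract address in DAO_ADDRESS):
#   write_preamble "$DAO_ADDRESS" > geth_script
#   codify < ownership_transactions.txt >> geth_script
```

Keeping the preamble in its own function means the 38,500 generated lines never have to be edited by hand.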
cat geth_script | geth attach
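One thing this command doesn't do is save the output anywhere. A possible way to capture it is sketched below. It assumes each printed row is an address, a tab, and a balance, and that the console may mix in other stuff (prompts, lines saying undefined) that should be thrown away. I haven't pinned down the exact console output format, so treat the filter and the name `extract_balances` as a guess.

```shell
#!/bin/sh
# Sketch only: the console output format is an assumption.

# Keep only "address<TAB>balance" pieces from whatever arrives on stdin.
extract_balances() {
    tab=$(printf '\t')
    grep -oE "0x[0-9a-fA-F]{40}${tab}[^[:space:]]+"
}

# Usage (not run here, needs a running geth node):
#   geth attach < geth_script | extract_balances > balances.txt
```

The balance is matched as "anything up to the next whitespace" so that large values printed in exponent form still get through.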
An hour later I had a list of all the Ethereum addresses that had ever owned DAO tokens and their current account balances, which I posted here: http://daodeepdive.com/data/balances/. Check it out.