Introduction and Motivation
I bought a house a few months ago. Thinking back on the experience, I was pretty intimidated when I was signing the documents at my lawyer's office. All I was thinking of was how in a few minutes I will be spending more money than what I have spent in 20+ years combined. The main reason for buying a house was to save money on monthly rents; however, I also wanted it to be an investment.
After officially becoming a home owner, even though I was aware that houses were long-term investments, I could not resist but to check the price every once in a while. As a matter of fact, I checked Zillow more frequently than I did when I was looking for a house to buy. Eventually, when my house price finally went up enough to yield profit (considering all the costs), it made me so happy that I wanted to just sell the house, take the extra income, and move to a storage.
Now I wish I had recorded daily prices so that I could make a visualization out of it. Although it's not too late yet, now that I've lost the incentive to check the price unless to ensure that it's still in a good shape, I would have to really commit myself to visit Zillow every day. Instead of doing that, I could automate the process and make my bot do it for me. In this Steemit story, I will present to you how I built my first script that 1) collects data from a website and 2) stores information in Google Document. I also intend to further smooth out the process using the AWS, but I will save that for a future story. This will be an introduction written by someone who has no related experience, so I hope this will be easy to follow even for beginners.
Prerequisites
Your computer should be able to run Python programs. Further, you should have a Google account and a Zillow account, so that you can create a personal Google Spreadsheet and retrieve the APIs.
To install Python, go to http://www.python.org/ and download the latest Python version or Python version 2.7 and follow the instruction to install.
Zillow
Since we need to use Zillow's API to get Zestimate, we need to ask Zillow (nicely) that we would like to have access to their API. Please follow this link: https://www.zillow.com/howto/api/APIOverview.htm and follow their instructions. You will receive an email with your Zillow Web Services Identification, or ZWSID. You will need to include this in your code to send a request.
Google Spreadsheet
Create a new Spreadsheet and name it however you want to. Then rename Sheet1 to something more meaningful. Those are highlighted in yellow.
A new Spreadsheet by default has 1000 rows. Let us get rid of rows 2 to 1000. You can do this as follows: click 2, hold the shift button, scroll down to the bottom, click 1000, right click, select Delete rows 2 - 1000. The reason for this deletion is because later we will be appending data to the last column of the spreadsheet, and we don't want to start adding data to row 1001.
We will use the first three cells as the header:
Now it gets a bit more advanced, but you can do it. Go to https://console.developers.google.com/cloud-resource-manager to obtain OAuth2 credentials. This is required to access Google Spreadsheet using Python. To get started, create a new project.
Name it however you want.
Find the link to manage credentials, click on the blue button, and select Service account key.
Create a JSON file by making a Service account name and clicking on the Create button.
They will automatically start the download, and if you open the JSON file, it should look like this:
Don't worry if your file does not have asterisks because there shouldn't be any. That was me hiding up my keys. Now, we want to share our Google Spreadsheet with the client email that can be found in the JSON file. Copy the email address next to client_email
that should look like this:
Go back to your spreadsheet and share that spreadsheet with the email address you copied.
Send the invite, and note that you don't have to accept the invite. In fact, you will get an email saying that the delivery has failed. Do not worry about it. You are all ready to manipulate Google Spreadsheet with Python. Time to move on to the harder stuff!
Python
Environment
We will use pip
to install all the necessary imports. Open up the command prompt in admin mode by right clicking Command Prompt and selecting Run as administrator.
Now, run the following commands one by one.
pip install requests
pip install gspread
pip install oauth2client
This will set up the necessary environment to run the script that we are about to write. I will explain what those modules do as we build our program. Note that you may not have pip
installed if you are using outdated Python version. In which case, just upgrade Python or install pip
in another way.
Code
import requests
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import time
ZWSID = "<insert your ZWSID>"
zpid = "<insert your zpid>"
url = 'http://www.zillow.com/webservice/GetZestimate.htm?zws-id={}&zpid={}'.format(ZWSID, zpid)
page = requests.get(url)
tree = page.text
location = tree.find('<zestimate>')
price = tree[location + 34:location + 40]
scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('House Price Example-2b118bc598d6.json', scope)
gc = gspread.authorize(credentials)
google_doc_link = 'https://docs.google.com/spreadsheets/d/1X6tRFUBAAMdEgSTU1EZBqOnaAHWSyKPOnkEzyqZfIK4/edit#gid=0'
wks = gc.open_by_url(google_doc_link).worksheet('Random House')
wks.append_row([time.strftime("%m/%d/%Y"), time.strftime("%H:%m"), price])
Breakdown
Let us see how that code works and which parts you should change to apply this for your purpose.
import time
import requests
import gspread
from oauth2client.service_account import ServiceAccountCredentials
This first chunk imports modules that we will be using.
ZWSID = "?"
zpid = "52388704"
url = 'http://www.zillow.com/webservice/GetZestimate.htm?zws-id={}&zpid={}'.format(ZWSID, zpid)
Find your house on Zillow and replace ZWSID and zpid with your own information. You should have received an email with your ZWSID already, and you can locate the eight digit zpid in your Zillow url:
I used a zpid of a random house I found with the search queryrandom Zillow address
. If this is your house, then what a coincidence :) Please let me know if you would like me to stop using your house as an example.
page = requests.get(url)
Recall that you imported requests
. We will use get()
, which is a function from the requests
module. Feed in the Zillow link to the get()
function, then the page
variable will contain a response from the request that we sent to that link.
tree = page.text
The page
variable is an object, which means (at least for our purposes) that it contains excessive information. We just need to see whatpage
's text is. We will store page.text
into a new variable called tree
.
location = tree.find('<zestimate>')
price = tree[location + 34:location + 40]
Print tree
(print(tree)
) to see the text in xml format. We want to find the < zestimate> tag, so we use Python's built-in function called .find()
. Store the resulting index in a variable called location
. For simplicity, let's assume location
is 1030, and your house price in USD is 345678. Then we find how far from index 1030 the house price starts at. We can either count, or we can guestimate and use trial and error until we correctly detect the initial position of the number 3. In my case, the offset amount was 34. If your house is 6 figures, then the final position will be 34+6, or 40. If your house is 7 figures, then you will add 7. If your house is 8 figures, then my Bitcoin address is 1MXDqAAAB1hHjUppoYV4h4517nx2ck7ynd
. Store that sliced information into a variable called price
.
scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('House Price Example-2b118bc598d6.json', scope)
gc = gspread.authorize(credentials)
This part is to let you have access to Google Docs' API. Just make sure you have your json file (from Google Docs) saved in the same folder as your Python file.
google_doc_link = 'https://docs.google.com/spreadsheets/d/1X6tRFUBAAMdEgSTU1EZBqOnaAHWSyKPOnkEzyqZfIK4/edit#gid=0'
wks = gc.open_by_url(google_doc_link).worksheet('Random House')
wks.append_row([time.strftime("%m/%d/%Y"), time.strftime("%H:%m"), price])
Replace the above Google Doc with your own, and replace the worksheet name with your own. If you haven't changed it, then it will be 'Sheet1' (i.e. .worksheet('Sheet1')). The last code appends a new row and adds three cells: Date, time, and price.
Now if you check your spreadsheet, then you will see a new entry like this:
Batch
Just to make it easier to run the program, let's create a batch file. Open up Notepad and save after writing the following lines:
chdir /d <<yourdirectory>>
python demo.py
Make sure to replace <<yourdirectory>>
with your actual directory that you saved your Python and JSON files in.
You can also create a shortcut and save it to your Desktop, where your file will be the most visible.
Conclusion
That's it! Run the Python program every day, and in a few months you will have a good amount of data. Note that you can take advantage of the API and retrieve other information, so explore and do something creative. In my next story, I will show how we can use AWS to collect data even more easily.
Congratulations @peterout! You have completed some achievement on Steemit and have been rewarded with new badge(s) :
You published your First Post
You got a First Vote
You made your First Comment
You got a First Reply
Click on any badge to view your own Board of Honor on SteemitBoard.
For more information about SteemitBoard, click here
If you no longer want to receive notifications, reply to this comment with the word
STOP
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Congratulations @peterout! You received a personal award!
You can view your badges on your Steem Board and compare to others on the Steem Ranking
Vote for @Steemitboard as a witness to get one more award and increased upvotes!
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
In general there are lots of ways to monitor collect your house price data. So, I recommend to look at different real estate api (even different zillow) to define the best for your goal(s)
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit