I'm running a petition on causes.com, and I wanted to do some geographic analysis of the results. Causes.com gives me information about the signers with regard to their name, email address, country and zip code. The country information was useful for doing the first phase of the analysis, which I wrote about in the Lumira space. However, I wanted to do a bit more fine grained analysis, and that means I needed to convert that zip code information into something more directly useful (e.g., city, county and state for the United States). There's 1500 data points though, so I need an automated method. Google Geogoding API to the rescue.
The Google Geocoding API is a REST services, so I created a REST client in PowerBuilder.Net using the following URL:
- http://maps.google.com/maps/api/geocode/xml?components=country:USA|postal_code:{zip}&sensor=false
Where {zip} represents the zip code that I'll be passing in as an argument. Most REST services don't have a WSDL you can use to create client data types from. For the PowerBuilder.Net REST client perhaps the simplest way to get it to be able to determine what data types to create is to provide it with a sample response from the service. What I provided was the following:
- <?xml version="1.0" encoding="UTF-8"?>
- <GeocodeResponse>
- <status>OK</status>
- <result>
- <type>postal_code</type>
- <formatted_address>Azusa, CA 91702, USA</formatted_address>
- <address_component>
- <long_name>91702</long_name>
- <short_name>91702</short_name>
- <type>postal_code</type>
- </address_component>
- <address_component>
- <long_name>Azusa</long_name>
- <short_name>Azusa</short_name>
- <type>locality</type>
- <type>political</type>
- </address_component>
- <address_component>
- <long_name>Los Angeles</long_name>
- <short_name>Los Angeles</short_name>
- <type>administrative_area_level_2</type>
- <type>political</type>
- </address_component>
- <address_component>
- <long_name>California</long_name>
- <short_name>CA</short_name>
- <type>administrative_area_level_1</type>
- <type>political</type>
- </address_component>
- <address_component>
- <long_name>United States</long_name>
- <short_name>US</short_name>
- <type>country</type>
- <type>political</type>
- </address_component>
- <geometry>
- <location>
- <lat>34.2667473</lat>
- <lng>-117.8545867</lng>
- </location>
- <location_type>APPROXIMATE</location_type>
- <viewport>
- <southwest>
- <lat>34.1031590</lat>
- <lng>-118.0513950</lng>
- </southwest>
- <northeast>
- <lat>34.3791980</lat>
- <lng>-117.6556881</lng>
- </northeast>
- </viewport>
- <bounds>
- <southwest>
- <lat>34.1031590</lat>
- <lng>-118.0513950</lng>
- </southwest>
- <northeast>
- <lat>34.3791980</lat>
- <lng>-117.6556881</lng>
- </northeast>
- </bounds>
- </geometry>
- </result>
- </GeocodeResponse>
Note that the information I want comes in the "administrative_area_level_1" (State), "administrative_area_level_2" (County) and "locality" (City) address component types. I'm also going to grab the approximate lattitude and longitude coordinates from the geometry portion of the response.
With the REST proxy created, I create a WPF application that has a datawindow that reads the CSV data that causes.com gives me and then loops through it adding the information from the Google Geocoding API. The main code of interest is as follows.
- long ll_index, ll_count
- string ls_zip, ls_lat, ls_long, ls_country, ls_county, ls_city, ls_state
- googlegeo_proxy proxy
- GeocodeResponse response
- ll_count = dw_1.RowCount()
- proxy = create googlegeo_proxy
- FOR ll_index = 1 to ll_count
- ls_country = dw_1.Object.country[ll_index]
- if ls_country <> "United States" THEN CONTINUE
- ls_zip = dw_1.Object.zip[ll_index]
- if Len ( ls_zip ) = 4 then
- ls_zip = '0' + ls_zip
- end if
- response = proxy.GetMessage ( ls_zip )
- if response.status = "OK" then
- ls_lat = String ( response.result.geometry.location.lat )
- ls_long = string ( response.result.geometry.location.lng )
- System.Collections.IEnumerator enum
- enum = response.result.address_component.GetEnumerator()
- GeocodeResponseResultAddress_component address
- do while enum.MoveNext()
- address = enum.Current
- Choose CASE address.@type[1]
- CASE "administrative_area_level_1"
- ls_state = address.long_name
- CASE "administrative_area_level_2"
- ls_county = address.long_name
- CASE "administrative_area_level_3"
- ls_city = address.long_name
- CASE "locality"
- ls_city = address.long_name
- END choose
- loop
- dw_1.Object.lattitude[ll_index] = ls_lat
- dw_1.Object.longitude[ll_index] = ls_long
- dw_1.Object.state[ll_index] = ls_state
- dw_1.Object.county[ll_index] = ls_county
- dw_1.Object.city[ll_index] = ls_city
- else
- dw_1.Object.lattitude[ll_index] = response.status
- end if
- // Wait 2 seconds or Google will put us into OVER_QUERY_LIMIT condition
- Sleep ( 2 )
- NEXT
The zip codes were imported as numbers, so I'm prefixing them with 0 of they are only four digits long. I'm using an enum to loop through the address_components because PowerBuilder.Net doesn't offer a particularly easy way of requesting a specific element in the collection. And referencing address.@type[1] is a bit of a hack, as there are a number of type attributes returned and I'm assuming the one I want is always the first one. That code will break if they don't always come back in the order I expect.
The last thing you might note is that I do a Sleep(2) between calls to the API. Google places some restrictions on calls to their APIs to prevent abuse, often in terms of total number of calls or calls per day. In the case of the Geocoding API, the restriction is that you can't call it more often than once every 2 seconds. The Sleep makes sure that I don't exceed that limit and start getting overy query limit error response.
And here's the results on a state by state basis for the United States.
Unfortuantely, Lumira was unable to recognize enough of the city and county names that Google provided that it make trying to do the analysis on any finer detail rather difficult.
No comments:
Post a Comment