Saturday, August 24, 2013

Using the Google Geocoding API from PowerBuilder.Net

I'm running a petition on causes.com, and I wanted to do some geographic analysis of the results.  Causes.com gives me information about the signers with regard to their name, email address, country and zip code.  The country information was useful for doing the first phase of the analysis, which I wrote about in the Lumira space.  However, I wanted to do a bit more fine grained analysis, and that means I needed to convert that zip code information into something more directly useful (e.g., city, county and state for the United States).  There's 1500 data points though, so I need an automated method.  Google Geogoding API to the rescue.




The Google Geocoding API is a REST services, so I created a REST client in PowerBuilder.Net using the following URL:


  1. http://maps.google.com/maps/api/geocode/xml?components=country:USA|postal_code:{zip}&sensor=false  


Where {zip} represents the zip code that I'll be passing in as an argument.  Most REST services don't have a WSDL you can use to create client data types from.  For the PowerBuilder.Net REST client perhaps the simplest way to get it to be able to determine what data types to create is to provide it with a sample response from the service.  What I provided was the following:

  1. <?xml version="1.0" encoding="UTF-8"?>  
  2. <GeocodeResponse>  
  3.  <status>OK</status>  
  4.  <result>  
  5.   <type>postal_code</type>  
  6.   <formatted_address>Azusa, CA 91702, USA</formatted_address>  
  7.   <address_component>  
  8.    <long_name>91702</long_name>  
  9.    <short_name>91702</short_name>  
  10.    <type>postal_code</type>  
  11.   </address_component>  
  12.   <address_component>  
  13.    <long_name>Azusa</long_name>  
  14.    <short_name>Azusa</short_name>  
  15.    <type>locality</type>  
  16.    <type>political</type>  
  17.   </address_component>  
  18.   <address_component>  
  19.    <long_name>Los Angeles</long_name>  
  20.    <short_name>Los Angeles</short_name>  
  21.    <type>administrative_area_level_2</type>  
  22.    <type>political</type>  
  23.   </address_component>  
  24.   <address_component>  
  25.    <long_name>California</long_name>  
  26.    <short_name>CA</short_name>  
  27.    <type>administrative_area_level_1</type>  
  28.    <type>political</type>  
  29.   </address_component>  
  30.   <address_component>  
  31.    <long_name>United States</long_name>  
  32.    <short_name>US</short_name>  
  33.    <type>country</type>  
  34.    <type>political</type>  
  35.   </address_component>  
  36.   <geometry>  
  37.    <location>  
  38.     <lat>34.2667473</lat>  
  39.     <lng>-117.8545867</lng>  
  40.    </location>  
  41.    <location_type>APPROXIMATE</location_type>  
  42.    <viewport>  
  43.     <southwest>  
  44.      <lat>34.1031590</lat>  
  45.      <lng>-118.0513950</lng>  
  46.     </southwest>  
  47.     <northeast>  
  48.      <lat>34.3791980</lat>  
  49.      <lng>-117.6556881</lng>  
  50.     </northeast>  
  51.    </viewport>  
  52.    <bounds>  
  53.     <southwest>  
  54.      <lat>34.1031590</lat>  
  55.      <lng>-118.0513950</lng>  
  56.     </southwest>  
  57.     <northeast>  
  58.      <lat>34.3791980</lat>  
  59.      <lng>-117.6556881</lng>  
  60.     </northeast>  
  61.    </bounds>  
  62.   </geometry>  
  63.  </result>  
  64. </GeocodeResponse>  

Note that the information I want comes in the "administrative_area_level_1" (State), "administrative_area_level_2" (County) and "locality" (City) address component types.  I'm also going to grab the approximate lattitude and longitude coordinates from the geometry portion of the response.

With the REST proxy created, I create a WPF application that has a datawindow that reads the CSV data that causes.com gives me and then loops through it adding the information from the Google Geocoding API.  The main code of interest is as follows.

  1. long     ll_index, ll_count  
  2. string     ls_zip, ls_lat, ls_long, ls_country, ls_county, ls_city, ls_state  
  3. googlegeo_proxy     proxy  
  4. GeocodeResponse response  
  5.  ll_count = dw_1.RowCount()  
  6.  proxy = create googlegeo_proxy  
  7.  FOR ll_index = 1 to ll_count  
  8.      ls_country = dw_1.Object.country[ll_index]  
  9.      if ls_country <> "United States" THEN CONTINUE  
  10.      ls_zip = dw_1.Object.zip[ll_index]  
  11.      if Len ( ls_zip ) = 4 then  
  12.           ls_zip = '0' + ls_zip  
  13.      end if  
  14.      response = proxy.GetMessage ( ls_zip )  
  15.      if response.status = "OK" then  
  16.           ls_lat = String ( response.result.geometry.location.lat )  
  17.           ls_long = string ( response.result.geometry.location.lng )  
  18.            System.Collections.IEnumerator enum   
  19.           enum = response.result.address_component.GetEnumerator()  
  20.           GeocodeResponseResultAddress_component address  
  21.           do while enum.MoveNext()  
  22.                address = enum.Current  
  23.                Choose CASE address.@type[1]   
  24.                     CASE "administrative_area_level_1"  
  25.                          ls_state = address.long_name  
  26.                     CASE "administrative_area_level_2"  
  27.                          ls_county = address.long_name  
  28.                     CASE "administrative_area_level_3"  
  29.                          ls_city = address.long_name  
  30.                     CASE "locality"  
  31.                          ls_city = address.long_name  
  32.                END choose  
  33.           loop  
  34.           dw_1.Object.lattitude[ll_index] = ls_lat  
  35.           dw_1.Object.longitude[ll_index] = ls_long  
  36.           dw_1.Object.state[ll_index] = ls_state  
  37.           dw_1.Object.county[ll_index] = ls_county  
  38.           dw_1.Object.city[ll_index] = ls_city  
  39.      else  
  40.           dw_1.Object.lattitude[ll_index] = response.status  
  41.      end if  
  42.      // Wait 2 seconds or Google will put us into OVER_QUERY_LIMIT condition  
  43.      Sleep ( 2 )  
  44. NEXT  

The zip codes were imported as numbers, so I'm prefixing them with 0 of they are only four digits long.  I'm using an enum to loop through the address_components because PowerBuilder.Net doesn't offer a particularly easy way of requesting a specific element in the collection.  And referencing address.@type[1] is a bit of a hack, as there are a number of type attributes returned and I'm assuming the one I want is always the first one.  That code will break if they don't always come back in the order I expect.

The last thing you might note is that I do a Sleep(2) between calls to the API.  Google places some restrictions on calls to their APIs to prevent abuse, often in terms of total number of calls or calls per day.  In the case of the Geocoding API, the restriction is that you can't call it more often than once every 2 seconds.  The Sleep makes sure that I don't exceed that limit and start getting overy query limit error response.

And here's the results on a state by state basis for the United States.

petitionsignersbystate.png

Unfortuantely, Lumira was unable to recognize enough of the city and county names that Google provided that it make trying to do the analysis on any finer detail rather difficult.

No comments: