Open Data API: Socrata Queries

Chicago's open data is managed on the Socrata open data platform. 

The Chicago portal offers very helpful filtering and downloading capabilities through its web interface, but applications need API access.

Socrata provides an API called SODA, the Socrata Open Data API, but its query format is not easy to learn and use. The most up-to-date description of SODA gives a lot of information to get you started.

Socrata provides a Ruby gem called socrata-ruby to access the API, but as far as I can tell it does not run any queries other than text searches.

Chicago has provided a Ruby gem called windy that wraps SODA in a much simpler interface. Unfortunately, windy doesn't permit any selection of rows, so it always loads a complete dataset. That makes it very handy for exploring small datasets but not large ones like the one my app uses, unless you create a small view in the web interface and explore that with the gem.

(Note:  There's something for PHP, too, but I don't know much about it.  See this blog post.) 

I have written an app that generates filtered SODA queries, something none of the existing tools do. An app to write queries sounds like overkill, but look at the format of this fairly simple query filter:

{ "originalViewId"=>"qnrb-dui6",
  "name"=>"not used but must be specified",
  "query"=>{
    "filterCondition"=>
      {"type"=>"operator",
       "value"=>"AND",
       "children"=>
       [{"type"=>"operator",
         "value"=>"EQUALS",
         "children"=>[
           {"columnId"=>3850119, "type"=>"column"},
           {"type"=>"literal", "value"=>"2011"}]},
        {"type"=>"operator",
         "value"=>"GREATER_THAN_OR_EQUALS",
         "children"=>[
           {"columnId"=>3850106, "type"=>"column"},
           {"type"=>"literal", "value"=>"2011-11-26T00:00:00"}]},
        {"type"=>"operator",
         "value"=>"BETWEEN",
         "children"=>[
               {"columnId"=>3850105, "type"=>"column"},
               {"type"=>"literal", "value"=>"HT606237"},
               {"type"=>"literal", "value"=>"HT606284"}]}
       ]
      }
  }
}

The query this specifies is "all 2011 crime records where date >= 11/26/2011 and HT606237 <= case number <= HT606284", so you can see why a generator is needed!
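To show what such a generator does, here is a minimal Ruby sketch of the idea. It is not the app's actual code. It builds the same filter hash from a list of [operator, column ID, value...] entries joined with AND. The method names soda_condition and soda_query are hypothetical. The view ID, column IDs, and values come from the example above.

```ruby
require "json"

# Hypothetical helper: one SODA filter condition, e.g.
# ["BETWEEN", 3850105, "HT606237", "HT606284"] becomes an operator node with a
# column child followed by one literal child per value.
def soda_condition(operator, column_id, *values)
  {
    "type" => "operator",
    "value" => operator,
    "children" => [{ "columnId" => column_id, "type" => "column" }] +
      values.map { |value| { "type" => "literal", "value" => value } }
  }
end

# Hypothetical helper: wraps the conditions in an AND node inside the
# query structure shown above.
def soda_query(view_id, conditions)
  {
    "originalViewId" => view_id,
    "name" => "not used but must be specified",
    "query" => {
      "filterCondition" => {
        "type" => "operator",
        "value" => "AND",
        "children" => conditions.map { |operator, column_id, *values|
          soda_condition(operator, column_id, *values)
        }
      }
    }
  }
end

query = soda_query("qnrb-dui6", [
  ["EQUALS", 3850119, "2011"],
  ["GREATER_THAN_OR_EQUALS", 3850106, "2011-11-26T00:00:00"],
  ["BETWEEN", 3850105, "HT606237", "HT606284"]
])

puts JSON.pretty_generate(query)
```

Ruby hash equality ignores key order, so comparing this result with == against the hand-written hash above gives true.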

Another complication is that fields are specified by number, not by name, and those field numbers can and do change over the life of a Socrata view.
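One way to cope with changing column numbers is to look them up by name each time you build a query. This sketch assumes the view's metadata is available as JSON with a "columns" array whose entries have "id" and "name" keys. Check that against the current SODA documentation before relying on it. The method name and the sample column names are hypothetical.

```ruby
require "json"

# Hypothetical helper: map column names to the numeric column IDs a query needs.
# ASSUMPTION: the metadata JSON has a "columns" array of objects with "id" and "name".
def column_ids_by_name(metadata_json)
  JSON.parse(metadata_json).fetch("columns").each_with_object({}) do |column, ids|
    ids[column.fetch("name")] = column.fetch("id")
  end
end

# Illustrative metadata: the IDs are the ones used in the example query,
# the column names are made up for this sketch.
sample_metadata = <<~JSON
  {"columns": [
    {"id": 3850105, "name": "Case Number"},
    {"id": 3850106, "name": "Date"},
    {"id": 3850119, "name": "Year"}
  ]}
JSON

ids = column_ids_by_name(sample_metadata)
puts ids.fetch("Year")  # prints 3850119
```

Using fetch rather than [] means a renamed or missing column fails loudly instead of silently producing a nil column ID.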

Illinois open data is also served by Socrata, so this query app can be used there as well.

Advice for Beginners

If you're new to Ruby and Rails and don't know how to start, or how to get past your first learning steps, here are some resources you can try:

Learning Ruby

  • "Learn to Program" by Chris Pine - a beginner's programming book with lots of Ruby exercises. (earlier version online)
  • Hackety Hack - a fun way for beginners to learn Ruby
  • Test-First Teaching - click on 'Learn Ruby'
  • Ruby Koans - a self-guided journey through topics in Ruby for beginners and experts alike
  • Ruby Warrior - write and refine some Ruby code to get your warrior to the top of a hazardous tower
  • Ruby Quiz - a guided tour through the world of possibility; use your Ruby to build simple apps, games, and solve problems


Learn Rails


Watch screencasts


Learn about learning

Apprenticeship Patterns - advice for aspiring programmers


Learn from colleagues

  • Join your local Ruby users group
  • Find Ruby and Rails meetups anywhere at Ruby in Person
  • Attend a RailsBridge workshop


Get experience

  • Just do it. Write and publish your own Rails app. 
  • Come to a hack session
  • Start your own hack session group

h/t to the RailsBridge team for many of these suggestions

2010 Conference Retrospective

Reviewing the conferences I attended last year, I see eight on Ruby and three on databases.

  1. Mountain West Ruby Conference
  2. Great Lakes Ruby Bash
  3. Code Retreat Floyd
  4. Mongo NYC
  5. GoRuCo
  6. Windy City DB
  7. MidWest Ruby Conference
  8. Windy City Rails
  9. Ruby DCamp
  10. Mongo Chicago
  11. RubyConf Uruguay

I may need to get a new hobby.

Twitter and Ruby

I've been working with a friend on an app that uses the Twitter API. We've learned a few things about the API, and about using it from Ruby, that may be useful to people starting out on a Twitter app.

The first thing to realize is that there are three completely different Twitter APIs, usually known as the REST API, the Search API, and the Streaming API. 

The APIs provide different information. The REST API gives timelines, status data, and user information on request. The Search API gives search results and trend information on request. The Streaming API holds open a persistent connection that delivers a stream of tweets.

Some terminology:  A 'status' means a tweet.  A 'timeline' is a set of tweets sent or received by a user.

You'll need to read through the API documentation at http://dev.twitter.com to decide which API(s) you'll need for your app and how to handle authentication. 

As you work on accessing the API, you'll find that http://dev.twitter.com/console, or curl from the command line, will be most useful for debugging. If you get stuck, the Google group Twitter Development Talk is quite good and is monitored by Twitter technical people.

Once you've got things working in curl, how about using Ruby to access the APIs?  We tried several gems and this is what we ended up with.  Your mileage may vary, of course.

For the REST and Search APIs, John Nunemaker's twitter gem is the place to start. We use it to look up the user ID for a Twitter screen name. We haven't tried the recently rewritten version 1.0 yet.

Our primary requirement, picking out all tweets from a list of users that contain a particular string, needs the Streaming API.

We tried several gems before we got access to the Streaming API to work reliably. 

The yajl-ruby gem ought to work now, since a bug was recently fixed in version 0.7.8, but we found that it hangs inexplicably at times. The hang doesn't seem to have anything to do with the Ruby code in the gem, so we were unable to fix it.

We then moved on to the gems that use EventMachine and found, after some difficulties with Ilya Grigorik's em-http-request, that Voloko's twitter-stream gem works really well for us.
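Whichever gem delivers the stream, you may still want a client-side check that each status meets the requirement above: sent by one of the listed users and containing the string. This minimal sketch assumes each status arrives as a raw JSON string with "text" and "user" => {"id" => ...} fields. The TweetFilter class is hypothetical and is not part of twitter-stream or any other gem.

```ruby
require "json"
require "set"

# Hypothetical client-side filter: keep a status only if it comes from one of
# the followed user IDs and its text contains the phrase (case-insensitively).
# ASSUMPTION: statuses arrive as raw JSON strings with "text" and "user" => {"id"}.
class TweetFilter
  def initialize(user_ids, phrase)
    @user_ids = Set.new(user_ids)
    @phrase = phrase.downcase
  end

  # Returns the parsed status Hash when it matches, otherwise nil.
  def match(raw_json)
    status = JSON.parse(raw_json)
    return nil unless status.is_a?(Hash)
    user = status["user"]
    text = status["text"]
    return nil unless user.is_a?(Hash) && text.is_a?(String)
    return nil unless @user_ids.include?(user["id"])
    text.downcase.include?(@phrase) ? status : nil
  rescue JSON::ParserError
    nil # ignore blank keep-alive lines and partial messages
  end
end

filter = TweetFilter.new([12345, 67890], "#windycityrails")
p filter.match('{"text":"See you at #WindyCityRails","user":{"id":12345}}') # a match: the parsed Hash is printed
p filter.match('{"text":"Nothing relevant","user":{"id":12345}}')           # no match: prints nil
```

Matching case-insensitively is a design choice made for this sketch. Drop the downcase calls if you need exact matches.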

As always in the Ruby world, there's more than one way to do it.  This is what worked for us.

Standard Library RDoc for Ruby 1.9.1 and 1.9.2

Update 20 December 2011:  And go here for 1.9.3 documentation.

Update 31 August:  Now just go to RubyDoc to get 1.9.2 documentation in YARD format.

If you've been looking for the documentation for the Standard Library in Ruby 1.9.1 and wondered why it isn't on Ruby-Doc, wonder no more.  It really is there but is just not mentioned on the main page.  Go to the page described as 1.9.1 Core API and you'll see that it includes both the Core API and the Standard Library.

Most of the changes in Ruby 1.9.2 seem to be in the Standard Library, but since 1.9.2 has just been released, no one has posted its documentation yet. It's easy enough to generate it yourself, though, if you have 158MB of space for it:

First find where the Ruby source is on your machine (mine is in /Users/me/.rvm/src/ruby-1.9.2-p0) and cd to that directory.  Then type the command

rdoc  -o  /path/to/some/new/directory  lib  ext

It runs in a minute or two, about 10 times faster than the same command in 1.9.1, and puts most, but not all, of the necessary files into your new directory in Darkfish format. The result is easy enough to read as long as you don't mind long unalphabetized lists, but you will need to supply some JavaScript files to make the 'click to toggle source' function work.

Copy the files from the Darkfish template's js directory (mine is in /Users/me/.rvm/src/ruby-1.9.2-p0/lib/rdoc/generator/template/darkfish/js) to a new subdirectory named js in the directory you specified for rdoc -o.
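If you'd rather script the copy, here is a small Ruby sketch using FileUtils from the standard library. The method name copy_darkfish_js is hypothetical. Pass in your own Darkfish js path and the directory you gave to rdoc -o.

```ruby
require "fileutils"

# Hypothetical helper: copy everything in the Darkfish template's js directory
# into a js subdirectory of the rdoc -o output directory; returns that path.
def copy_darkfish_js(darkfish_js_dir, rdoc_output_dir)
  target = File.join(rdoc_output_dir, "js")
  FileUtils.mkdir_p(target)
  # "dir/." copies the directory's contents rather than the directory itself.
  FileUtils.cp_r(File.join(darkfish_js_dir, "."), target)
  target
end

# Example, using the paths from this post (substitute your own):
# copy_darkfish_js(
#   "/Users/me/.rvm/src/ruby-1.9.2-p0/lib/rdoc/generator/template/darkfish/js",
#   "/path/to/some/new/directory"
# )
```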

Now when you open index.html in the new directory you'll have the 1.9.2 Standard Library documentation with source code displayed when you click on the method name.

Making OSS presentable, not regrettable

Everything about Ruby Midwest a week ago in Kansas City was very well done.  In addition to the many useful presentations on the latest developments in Ruby and related technologies we heard a lot about the social aspects of open source.

Chris Wanstrath of GitHub (aka defunkt) started the conference with a keynote about making OSS presentable, not regrettable.

I thought it was worth recording the points he made.  My notes on his talk are below.  Any errors can be blamed on me and my transcription of the talk.

Ruby Midwest Keynote  16 July 2010

The social aspect of an OSS project is often bad even when the code is good. Maybe it's too easy to release open source software. The Perl community does a good job managing this, and we can learn from them. Good examples of well-presented projects are ClickToFlash and Homebrew.

Keep in mind the rules listed here whenever you release software. How do you determine if a project is 'released'? If you have a readme with install instructions, your project is released, not just shared or distributed.

Things every released project must have/do:

1.  A license file, to show that it is open source.

2.  Easy and reliable installation instructions. A quickstart procedure is most important to ensure adoption. This is an important part of MySQL's success -- Postgres has much lower adoption because its installation is much more difficult. Also important is reducing conflict so quickstart will work the first time: JQuery no-conflict mode made it easy to try along with other JS libraries when JQuery was new. If your project is intended for general use, make it easy for even non-Rubyists to install. Redis, for example, is very easy to install. [gh note: Here's an example of a project that needs work to appeal to non-Rubyists.]

3.  Who is your API for? Machines or people? If you have a command line interface, you have a human API. Probably you should have a CLI to make it very easy for people to start using your code.

4.  Provide examples. Good ones! The ASIHTTPRequest library for iPhone is good at this: all common tasks have examples.

5.  Have a public API that does not change. The code behind the API changes but the API does not. RDoc is not a public API. It's a method list.

6.  Have man pages. Debian considers a missing man page to be a bug. Many man formats can be used. Try Ronn to convert markdown to roff.

7.  Provide a list of dependencies. If you need Bundler, say so.

8.  Set user expectations. How do they know what it should do? Redis is good at this.

9.  It's OK to stop maintaining a project, as long as you say so at the top of the readme.

10. It's OK if it's not actually used anywhere, even by you, if you say so at the top of the readme.

11. Provide the right number of features. It's better to allow plugins than to include every possible feature, but be sure to provide some way to find the plugins, such as a forum or wiki.

12. Lack of competition is a very bad thing. If someone else's library doesn't do what you want, write your own small, focused app. For example: DelayedJob and Resque.

13. Don't do too few releases or too many releases. Make it easy to fix things quickly and release quickly.

14. Provide a change log, not just a git commit list. Note that Hoe can automate this.

15. See Tom Preston-Werner's article  on how to do semantic versioning. God 0.12 is a very bad example.  Is your project beta or  production?  If it's in production anywhere it's not beta!

16. Readme should include meta-info like how to get help, how to contribute, how to run tests. Keep it up to date and remember beginners.

17. Be a lazy maintainer. Provide a road map of things you want to do and others may write them for you. One project where this worked well was Scott Chacon's Showoff.

18. Avoid linkrot. This is why you don't use blogs as documentation. Redis has a good and downloadable wiki.

19. Don't have your own domain name unless it's a really big project [gh note:  And even if it is a big project, don't do this.] Keep it in GitHub so it's findable and maintained.

20. When naming your project, google it first to see if it is distinctive enough but not too distinctive. Resque, for example, should be a distinctive name but it's screwed up by Google search.

21. Market it. Promote the fundamental feature of the app and let people discover the rest. A good example of this was the 'write a blog in 15 minutes' video in the early Rails days. Good marketing examples to follow: Rails, JQuery, Redis, Homebrew, Django, Unicorn, ASIHTTPRequest.
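Point 15 refers to semantic versioning. Here is a rough sketch of the spec's core rules, assuming strict MAJOR.MINOR.PATCH version strings. An incompatible change to the public API bumps MAJOR. A backward-compatible new feature bumps MINOR. A backward-compatible bug fix bumps PATCH. The bump_version method is a hypothetical illustration, not part of any gem.

```ruby
# Hypothetical helper illustrating semantic versioning (MAJOR.MINOR.PATCH).
def bump_version(version, change)
  parts = version.split(".")
  raise ArgumentError, "expected MAJOR.MINOR.PATCH, got #{version.inspect}" unless parts.size == 3
  major, minor, patch = parts.map { |part| Integer(part, 10) }
  case change
  when :breaking then "#{major + 1}.0.0"               # incompatible public API change
  when :feature  then "#{major}.#{minor + 1}.0"        # backward-compatible new feature
  when :fix      then "#{major}.#{minor}.#{patch + 1}" # backward-compatible bug fix
  else raise ArgumentError, "unknown change type: #{change.inspect}"
  end
end

puts bump_version("1.4.2", :breaking) # prints 2.0.0
puts bump_version("1.4.2", :feature)  # prints 1.5.0
puts bump_version("1.4.2", :fix)      # prints 1.4.3
```

In the 0.y.z range, the spec treats the project as initial development where anything may change. The real decision there is when to declare 1.0.0, and the spec's FAQ says software already used in production should probably be 1.0.0 already.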

 

 

MWRC 2010

Pat Eyler and Mike Moore did a great job putting on Mountain West Ruby Conference, as usual.  Some of us who stayed on in Salt Lake City had dinner Saturday:

Left to right:  James Golick, Andrew Shafer, Chad Woolley, Brandon Dimcheff, Brian Mitchell, Giles Bowkett, Yukihiro Matsumoto, me, Dave Brady.  Not pictured:  Alistair Cockburn, Liz Brady.

Good food and good company:  Who could ask for anything more?  Getting my picture taken with Matz!

Query Testing

Whatever database you use, whether a conventional RDB or one of the new NoSQL databases, the most important and most frequently violated rule of databases still applies:

NEVER do an un-indexed read!

Or to be more exact:  Never do an un-indexed read on any customer-facing query on any table with more than one record.

We all know this rule and we all intend to follow it, but it can be difficult to do so consistently in an agile environment with constant changes and frequent deployments. The usual methods for making sure queries are appropriately indexed are:

1.  Review and analyze indexing schemes before major deployments.

2.  Monitor performance after deployment, either formally with New Relic or informally when customers call to complain about response time.

Looking at this list, we can see that the first method is BDUF (Big Design Up Front) and the second is releasing untested code. Both of those are anathema to agile developers.

So what should we be doing? 

We should have tests for our queries just like we have tests for our code.

Look at the code in http://gist.github.com/328822 for an example of query tests using MongoDB and the mongo gem in Ruby. This code uses MongoDB's explain method to assign a rating from 0 to 100 that indicates not only whether the query used an index but also how efficiently it used it. A little experimentation will show what your minimum passing rating should be, but 50 is a reasonable place to start.
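The gist's own code isn't reproduced here, but a rating along those lines might look like the following hypothetical sketch. It assumes the explain output format of MongoDB versions of that era. In that format, "cursor" is "BasicCursor" for a full collection scan and "BtreeCursor <index name>" when an index is used, "nscanned" counts the entries examined, and "n" counts the documents returned. The formula is an illustrative assumption, not necessarily the gist's: 0 for a scan, otherwise the share of scanned entries that were returned. Obtain the explain hash however your driver returns it.

```ruby
# Hypothetical rating in the spirit of the gist, not its actual code.
# ASSUMPTION: pre-3.0 MongoDB explain output with "cursor", "nscanned" and "n".
def index_rating(explain)
  cursor = explain.fetch("cursor")
  return 0 if cursor.start_with?("BasicCursor") # full collection scan: un-indexed read
  scanned = explain.fetch("nscanned")
  returned = explain.fetch("n")
  return 100 if scanned.zero?                   # nothing had to be examined
  # Share of examined index entries that ended up in the result, as 0..100.
  [(100.0 * returned / scanned).round, 100].min
end

puts index_rating("cursor" => "BasicCursor", "nscanned" => 5000, "n" => 3)         # prints 0
puts index_rating("cursor" => "BtreeCursor year_1", "nscanned" => 400, "n" => 100) # prints 25
```

In a check-in test, you would then assert that the rating for each customer-facing query is at least your chosen minimum, such as the 50 suggested above.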

Query tests like these will not be fast enough to include in the unit tests that run every time you save, but they would be ideal for the tests that run when code is checked in. Wherever you run them, they will save you from ever releasing un-indexed queries again.

Disambiguation

Let me start like Wikipedia with a disambiguation page.

Ginny Hendry is not a common name, but there are two of us in the US with a web presence.

Me

ginnyhendry on Twitter

ghendry on GitHub

ginnyhendry on GMail

ghendry on LinkedIn

www.ghendry.net

and now blogging here at ghendry.posterous.com

The other Ginny Hendry

Ginny Hendry on Facebook

Classmates.com Barrington RI

blogging at heartofginny.com

About

A Chicago software developer writing about Ruby, web development, and other topics that interest me.