feed-abstract gem updated to support Twitter RSS and Atom

I updated my feed-abstract gem to support Twitter RSS/Atom: it now automatically parses hashtags and turns them into RSS item subjects/categories. Huzzah! This is pretty fun, as it lets tweets be aggregated into TagTeam seamlessly, where they can be remixed, archived, and searched by tag.
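To make the hashtag-to-category idea concrete, here is a minimal Ruby sketch. It is not feed-abstract's actual code: the `hashtag_categories` helper, the regex (a "#" at the start of the text or after whitespace, followed by word characters), and the choice to downcase tags are all assumptions for illustration.

```ruby
# Hypothetical sketch, not feed-abstract's implementation: extract hashtags
# from a tweet so they can be used as RSS/Atom item categories.
# Assumed rule: "#" at the start of the text or after whitespace, then word chars.
HASHTAG_PATTERN = /(?:\A|\s)#(\w+)/

def hashtag_categories(tweet_text)
  tags = tweet_text.scan(HASHTAG_PATTERN).flatten # one capture per match
  tags.map(&:downcase).uniq # assumed normalization: "#Rails" == "#rails"
end

p hashtag_categories("Shipping #TagTeam today! #rails #Rails #ruby")
# => ["tagteam", "rails", "ruby"]
```

Requiring whitespace (or the start of the text) before the "#" keeps URL fragments such as `http://example.com/#anchor` from being mistaken for tags.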

You can get at Twitter RSS/Atom via URLs like:

https://search.twitter.com/search.atom?q=<URL-encoded hashtag>

so, for #rails:

https://search.twitter.com/search.atom?q=%23rails

I’m sure there are more search parameters available too. If you want RSS, just change the “.atom” to “.rss”.
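Building those URLs by hand is easy to get wrong because "#" must be encoded as "%23". A small sketch using Ruby's standard `URI.encode_www_form_component`; the `twitter_search_feed_url` helper name is hypothetical and simply follows the URL pattern shown above.

```ruby
require "uri"

# Hypothetical helper following the search URL pattern above.
# format is "atom" or "rss"; the hashtag is URL-encoded ("#" -> "%23").
def twitter_search_feed_url(hashtag, format: "atom")
  raise ArgumentError, "format must be atom or rss" unless %w[atom rss].include?(format)
  query = URI.encode_www_form_component(hashtag)
  "https://search.twitter.com/search.#{format}?q=#{query}"
end

puts twitter_search_feed_url("#rails")
# prints https://search.twitter.com/search.atom?q=%23rails
```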

TagTeam close to 1.0

A long, complicated project of mine (under the direction of Peter Suber and the auspices of the Harvard Library Lab) is nearing its release date – TagTeam (source, demo site).

TagTeam is an RSS/Atom/RDF aggregator that allows administrators to remix and republish feeds on multiple levels. It also allows for the filtering of tags – additions, substitutions, and removals in a flexible “tiered” filtering system.
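To illustrate what "tiered" tag filtering can mean, here is a minimal Ruby sketch. It is not TagTeam's code; the `AddTag`/`ReplaceTag`/`RemoveTag` structs and the idea that each tier receives the output of the previous one are assumptions made for the example.

```ruby
# Hypothetical sketch of tiered tag filtering (not TagTeam's implementation).
# Each tier is an ordered list of filters; each tier sees the previous tier's output.
AddTag     = Struct.new(:tag)
ReplaceTag = Struct.new(:from, :to)
RemoveTag  = Struct.new(:tag)

def apply_filter(tags, filter)
  case filter
  when AddTag     then tags | [filter.tag]                                 # add if missing
  when ReplaceTag then tags.map { |t| t == filter.from ? filter.to : t }.uniq
  when RemoveTag  then tags - [filter.tag]
  else raise ArgumentError, "unknown filter: #{filter.inspect}"
  end
end

def apply_tiers(tags, tiers)
  tiers.reduce(tags) do |current, tier|
    tier.reduce(current) { |acc, filter| apply_filter(acc, filter) }
  end
end

first_tier  = [ReplaceTag.new("oa", "open access"), AddTag.new("tagteam")]
second_tier = [RemoveTag.new("tagteam")] # a later tier can undo an earlier one
p apply_tiers(["oa", "rails"], [first_tier, second_tier])
# => ["open access", "rails"]
```

Ordering matters here: because the second tier runs last, it can override decisions made in the first, which is what makes a layered setup flexible.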

It uses the feed-abstract gem I wrote to create a “common object graph” between the different feed formats – this has been a huge time saver and made feed parsing much more reliable.
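The "common object graph" idea is essentially the adapter pattern: each feed format gets its own small mapping onto one shared item shape. A hedged sketch follows. `CommonItem`, `from_rss`, `from_atom`, and the plain-hash input (standing in for already-parsed XML) are all hypothetical and are not feed-abstract's API.

```ruby
# Hypothetical sketch, not feed-abstract's API: map format-specific entries
# onto one shared shape so the rest of the app only deals with CommonItem.
# Input hashes stand in for already-parsed XML; list-valued fields are arrays.
CommonItem = Struct.new(:title, :link, :guid, :categories)

# RSS 2.0 style: <title>, <link>, <guid>, zero or more <category>.
def from_rss(entry)
  CommonItem.new(entry["title"], entry["link"], entry["guid"],
                 Array(entry["category"]))
end

# Atom style: <title>, <link href rel>, <id>, <category term>.
# A link without rel is treated as rel="alternate", per Atom's default.
def from_atom(entry)
  link = Array(entry["links"]).find { |l| l["rel"].nil? || l["rel"] == "alternate" }
  CommonItem.new(entry["title"], link && link["href"], entry["id"],
                 Array(entry["categories"]).map { |c| c["term"] })
end

rss  = from_rss("title" => "Hello", "link" => "http://example.com/1",
                "guid" => "tag:example.com,1", "category" => ["rails"])
atom = from_atom("title" => "Hello", "id" => "tag:example.com,1",
                 "links" => [{ "href" => "http://example.com/1" }],
                 "categories" => [{ "term" => "rails" }])
p rss == atom # => true
```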

fulltext wildcard searching with ruby/rails and sunspot

I love Sunspot for full-text searching in Rails apps, but it took me a while to figure out how to do left-bound wildcard searching in full-text indexed fields.

Say we’re searching for “collis” in a set of full-text indexed fields. With the default Solr config that Sunspot supplies, you have to search for the entire word. To get “colli” or “coll” to return records with “collis” in the full-text index, you just need to modify the Solr config (in $RAILS_ROOT/solr/conf/schema.xml), changing:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

to:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

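which makes the analyzer create left-bound n-grams (“edge n-grams”) for indexed terms. At index time each whitespace-separated term is lowercased and stored as every prefix from 1 to 50 characters long (“c”, “co”, “col”, “coll”, “colli”, “collis”). At query time terms are only split and lowercased, so a query for “coll” matches the stored “coll” prefix directly. You’ll need to restart Solr and reindex for the change to take effect. This taught me:

  1. Solr/Lucene/Sunspot rock, and
  2. I have more to learn about Solr config, because schema.xml looks like it exposes some very powerful search juju.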
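A rough Ruby model of what the new index-time analyzer does to a field value, and why a plain lowercased query term then matches. This only mirrors the `minGramSize="1" maxGramSize="50" side="front"` settings above; `edge_ngrams` and `prefix_match?` are hypothetical helpers for illustration, not Solr or Sunspot code.

```ruby
# Model of the index-time chain: whitespace split, lowercase, then every
# leading substring (edge n-gram) of each token, min_gram..max_gram chars.
def edge_ngrams(text, min_gram: 1, max_gram: 50)
  text.split.flat_map do |token|
    word = token.downcase
    upper = [max_gram, word.length].min
    (min_gram..upper).map { |n| word[0, n] }
  end
end

# Model of the query-time chain: the term is only lowercased, so it matches
# if it equals one of the grams stored at index time.
def prefix_match?(indexed_text, query_term)
  edge_ngrams(indexed_text).include?(query_term.downcase)
end

p edge_ngrams("Collis")
# => ["c", "co", "col", "coll", "colli", "collis"]
p prefix_match?("Dan Collis", "Coll") # => true
p prefix_match?("Dan Collis", "olli") # => false, only prefixes are indexed
```

The trade-off is index size: every term is stored once per prefix length, which is the price of making left-bound matches a simple exact lookup.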

Thanks to Arndt Lehmann’s tip on this page.

mod_passenger error page includes external stylesheet.

No sir, I don’t like it. Not at all.

The default mod_passenger “this app wouldn’t start” page includes an external CSS file:

http://www.modrails.com/error_pages/1.0/error_page.css

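which is odd, considering the page already has a bunch of inline CSS. I guess it’s there to pull in the images... but it also means every visitor who hits the error page makes a request to modrails.com, unintentionally disclosing to a third party that your site runs Passenger and that your app failed to start. Yuck.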

Difference between “count” and “length” in Rails.

I made an interesting discovery the other day about the difference between how “count” and “length” work on an ActiveRecord association, one that can be exploited to speed up an app, as long as you understand the consequences.

  • Invoking “count” on a dependent relationship (e.g. a has_many association) will always hit the database, running something similar to select count(id) as count_all from contacts every time you invoke it.
  • Invoking “length” on a dependent relationship will only hit the database once, loading the associated records. If the relationship has already been populated (say by :include-ing the dependent objects in your Model.find call), the database won’t get hit at all.

So, if you’re OK with a count that isn’t guaranteed to be 100% accurate at the moment you ask for it, “length” will do the right thing: the first time it’s called it runs the SQL to load the records, and after that it just gives you the size of the in-memory array, which may be out of sync with the database.

This can have significant performance benefits if you’re iterating through many records and emitting counts of dependent objects. In one case, :include-ing the dependent objects and using “length” cut the SQL cost of a page view 40-fold.
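The trade-off can be modelled without a database. `FakeAssociation` below is a hypothetical toy, not ActiveRecord: `count` always runs a "query", `length` loads once and then answers from memory, and `preloaded: true` stands in for having :include-d the association. The query numbers are illustrative only, not the 40-fold measurement above.

```ruby
# Toy model (hypothetical, not ActiveRecord) of count vs. length.
class FakeAssociation
  attr_reader :queries

  def initialize(rows, preloaded: false)
    @rows = rows                          # stands in for the live database table
    @loaded = preloaded ? rows.dup : nil  # :include-d rows are already in memory
    @queries = 0
  end

  def count
    @queries += 1 # like SELECT count(...) on every call
    @rows.size
  end

  def length
    unless @loaded
      @queries += 1 # like SELECT * ..., first call only
      @loaded = @rows.dup
    end
    @loaded.size # cached array, may be stale
  end
end

db_rows = [:a, :b]
contacts = FakeAssociation.new(db_rows)
p contacts.length # => 2 (one query)
db_rows << :c     # another process inserts a row
p contacts.count  # => 3 (fresh query)
p contacts.length # => 2 (stale cached array, no query)

# A page that shows one count per parent: 1 query for the parents,
# 1 more if children were :include-d, plus whatever each association runs.
def queries_for_page(parents, method, preloaded:)
  associations = Array.new(parents) { FakeAssociation.new([1, 2, 3], preloaded: preloaded) }
  associations.each { |a| a.public_send(method) }
  1 + (preloaded ? 1 : 0) + associations.sum(&:queries)
end

p queries_for_page(50, :count, preloaded: false) # => 51
p queries_for_page(50, :length, preloaded: true) # => 2
```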

Once again, it’s good to know your tools.

Rails.vim

Frequently a tool crops up that makes me feel foolish for not using it – that tool today is Rails.vim. It rocks in so many ways I am seriously considering having Tim Pope’s babies – were such a thing possible.

  • Automatic method completion!
  • Automatic switching to fixtures, unit tests, and other context-sensitive actions!
  • Excellent syntax highlighting!
  • Hooks into your script/ directory!

And really too much stuff to articulate here. Just use it! Yay for IDEs that don’t slow you down – I’m looking at you, RadRails.

Getting it working under modern Ubuntus (ubunti?) is stupid easy (as a normal user):

 sudo aptitude install vim-rails
 vim-rails-setup

Rails 2.2, postgres, and testing

If you’re using Postgresql with Rails 2.2, your testing database user needs to be a “superuser” for your tests to run. This is not a good thing. It’s the equivalent of running as root. But, if you’ve got a completely separate development/testing server it’s better than not being able to test at all.

If your postgres testing user isn’t a superuser, you’ll get errors like:

ActiveRecord::StatementInvalid: PGError: ERROR:  permission denied: "RI_ConstraintTrigger_17866" is a system trigger
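Why superuser? As far as I can tell, the Rails 2.2 PostgreSQL adapter wraps fixture loading in `ALTER TABLE ... DISABLE TRIGGER ALL` and `ENABLE TRIGGER ALL` so that foreign keys don't get in the way. "ALL" includes the internal `RI_ConstraintTrigger_*` triggers that enforce foreign keys, and PostgreSQL only lets a superuser disable those. The Ruby below roughly mirrors that wrapping; `with_triggers_disabled`, `quote_ident`, and the logging lambda used as an SQL executor are hypothetical stand-ins, not the adapter's real code.

```ruby
# Hypothetical sketch of the SQL wrapped around fixture loading.
# quote_ident is a simplified identifier quoter (doubles embedded quotes).
def quote_ident(name)
  '"' + name.gsub('"', '""') + '"'
end

# execute is any callable that runs SQL; here it just records statements.
def with_triggers_disabled(tables, execute)
  execute.call(tables.map { |t| "ALTER TABLE #{quote_ident(t)} DISABLE TRIGGER ALL" }.join(";"))
  yield
ensure
  # Re-enable even if loading the fixtures raised.
  execute.call(tables.map { |t| "ALTER TABLE #{quote_ident(t)} ENABLE TRIGGER ALL" }.join(";"))
end

log = []
with_triggers_disabled(%w[contacts people], ->(sql) { log << sql }) do
  log << "-- fixtures get inserted here"
end
puts log
```

Run against a real database as a non-superuser, the first statement is the one that fails with the "is a system trigger" error.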

You can create a postgres superuser thusly (replace your_username with your testing database user):

> su - postgres
> psql template1
template1=# create user your_username superuser;

Update the testing stanza of your config/database.yml and you should be good to go.