Wrap-up of my first week at thoughtbot

Random thoughts about my first week as a developer at thoughtbot:

  • I have much to learn about design patterns. The GoF book will be my close friend for a while.
  • I know more about metaprogramming than I thought; I just needed the terminology.
  • Pairing is an amazing way to transfer knowledge and should be done frequently. I learned a ton both during sessions and afterwards, because I took notes and then researched and practiced the concepts that were new to me.
  • thoughtbot invests a huge amount of time (and therefore money) improving the skills of its developers. It shows.
  • osx is a fine desktop environment, but the amount of crap you need to wade through to set it up as a ruby development environment (as compared to debian / ubuntu) is kind of hilarious. That said, it’s working fine now and it wasn’t particularly hard; it just didn’t feel “native” like it does on linux. I could take or leave osx, but it’d be hard to stop using a retina display – text is absolutely gorgeous.
  • I don’t know what I’d do without homebrew.
  • thoughtbot dotfiles are great, and the vim setup was very similar to my own but with significant additional refinements. Use them.

feed-abstract gem updated to support twitter RSS and Atom

I updated my feed-abstract gem to support twitter RSS/Atom: it will automatically parse hashtags and turn them into RSS item subjects/categories. Huzzah! This is pretty fun, as it allows tweets to be aggregated into TagTeam seamlessly, where they can be remixed, archived, and searched by tag.

You can get at twitter RSS/Atom via URLs like:

https://search.twitter.com/search.atom?q=<url-encoded hashtag>

so:

https://search.twitter.com/search.atom?q=%23rails

I’m sure there are more search parameters available too. If you want RSS, just change the “.atom” to “.rss”.
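
Putting it together, consuming one of those feeds with feed-abstract looks roughly like this. This is a sketch from memory – treat the exact require path and accessor names (especially subjects) as assumptions and check the gem’s README:

require 'open-uri'
require 'feed-abstract' # require path is from memory; check the gem's README

# Hashtags in each tweet are parsed out and exposed as item subjects/categories.
feed = FeedAbstract::Feed.new(open('https://search.twitter.com/search.atom?q=%23rails'))
feed.items.each do |item|
  puts item.title
  puts item.subjects.inspect # e.g. ["rails"] – accessor name is an assumption
end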

OpenWRT, rsync, and linux love

I use an rsync / hard link backup system of my own design (but similar in concept to this).  I have it providing 180 days of backups for numerous production machines spread around the internet, along with more permanent external backups provided by spideroak (referral link, but we both get free stuff if you sign up).  My internal backup system serves as my hot backups, so I want it available 24×7 via a remote file mount (sshfs) should I need it.
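
For the unfamiliar: the core trick is rsync’s --link-dest option. Each snapshot is a complete directory tree, but any file unchanged since the previous snapshot is a hard link into it, so unchanged files cost almost no extra disk. A bare-bones sketch – not my actual script, and the host and paths are made up:

#!/bin/sh
# Bare-bones rsync / hard-link snapshot. Host and paths are made up.
DEST=/mnt/backups/web01
NEW="$DEST/back-$(date +%Y-%m-%d-%H:%M:%S)"
# Most recent previous snapshot, if any (the names sort chronologically).
LAST=$(ls -d "$DEST"/back-* 2>/dev/null | tail -n 1)

rsync -a --delete ${LAST:+--link-dest="$LAST"} web01.example.com:/var/www/ "$NEW"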

The machine I was running it on, though, was WAY overpowered and idled at around 70 watts – this was having a noticeable effect on our electricity bill. So I put openwrt on my asus wl500gp v2 and have that now doing my backups. It’s silent, fanless, and combined with a good external USB drive has as much storage as you can afford. It also idles at 5 watts total, device and USB drive combined!

Notes:

  • The external drive is formatted for small files – more inodes via a higher inode / block ratio. I just did “mkfs.ext3 -T small /dev/partition” from a full linux machine for the format – the “small” usage type is a better fit for the many small files and directory entries an rsync / hard link backup system creates.
  • I installed the openwrt image with the 2.4 kernel because it seems to have better hardware support for this device – it works great.
  • I disabled the wlan and lan, leaving only the wan enabled with a static IP. I port forwarded an external port from my verizon router to allow ssh access from anywhere.
  • I had to install openssh via opkg because the dropbear ssh client doesn’t support outgoing key auth, or if it does it doesn’t support openssh-style keys.
  • I switched the default shell from ash to bash – just too many minor differences for me.
  • rsync is available via opkg. Install it (see the opkg sketch after this list).
  • cron is provided by busybox and has some minor differences in crontab syntax; I could not get @reboot jobs working.
  • USB storage is fairly easy to set up. I found, however, that the external device partitions were recognized at “/dev/discs/disc0/part1” instead of the more traditional “/dev/sda1” locations. No biggie, just odd.  You should read and implement the “start on boot” section.
  • You can see syslog output via the command “logread”.
  • I needed to slow down the automount process via a “sleep” command to allow the drive to spin up before mounting. Details here. Once I put in that delay, automount worked great.
  • “find” provided by busybox is way limited compared to gnu find, and I can’t seem to locate gnu find in the otherwise complete openwrt repos. Busybox find can’t search on modification times or link counts – both key to how I implemented my backup system, since I reap backup directories via their modification times to expire old backups. I installed ruby and ruby-core (which contains the ruby stdlib that provides the file / directory classes) and wrote my own little timed reaper. Source is below.
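
The package installs above boil down to a few opkg commands. From memory, so package names may differ slightly between openwrt releases:

opkg update
opkg install bash rsync
opkg install openssh-client  # dropbear's client wouldn't do outgoing key auth for me
opkg install ruby ruby-core  # stdlib pieces used by the reaper script below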

It’s working great so far – quiet, low-power and fast enough for me.

Stupidly simple timed directory reaper written in ruby

# expire_directories.rb. My backup directory names all look like "back-2012-01-27-04:44:05",
# hence the regex along with the date check.
require 'find'
require 'fileutils'

abort "usage: ruby expire_directories.rb days_to_keep" if ARGV[0].to_i < 1

days = ARGV[0].to_i
cutoff = Time.now - (60 * 60 * 24 * days)

Find.find('./') do |path|
  if FileTest.directory?(path)
    if path.scan('/').size == 1
      # A top-level directory named like a backup, older than the cutoff.
      if path.match(/back/) && File.stat(path).mtime < cutoff
        puts "Removing: #{path}"
        FileUtils.rm_rf(path)
      end
    else
      # Don't descend into the backup directories themselves.
      Find.prune
    end
  end
end

invoked thusly:

cd /some/directory/that/contains/your/backup_directories && ruby ~/bin/expire_directories.rb 60

So pass it the number of days, and be sure you're in the proper directory before running it – it's doing an "rm -rf".
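
Busybox cron handles the standard five-field syntax fine (just not @reboot), so scheduling the reaper looks like any other crontab entry. Paths here are hypothetical:

# Expire backups older than 60 days, every morning at 4:30.
30 4 * * * cd /mnt/usb/backups && ruby /root/bin/expire_directories.rb 60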

fulltext wildcard searching with ruby/rails and sunspot

I love Sunspot for full-text searching in Rails apps, but it took me a while to figure out how to do left-bound wildcard searching in full-text indexed fields.
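
To make the problem concrete, here’s a minimal Sunspot setup of the sort I mean (the model and field are made up):

class Person < ActiveRecord::Base
  searchable do
    text :last_name
  end
end

# With the stock schema.xml, only whole words match:
Person.search { fulltext 'collis' }.results # finds "Collis"
Person.search { fulltext 'colli' }.results  # finds nothing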

So – with the default solr config supplied by sunspot, a search against fulltext indexed fields for “collis” has to match the entire word. To get “colli” or “coll” to return records with “collis” in the fulltext index, you just need to modify the solr config (in $RAILS_ROOT/solr/conf/schema.xml), changing:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

to:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

which essentially makes the index-time analyzer emit left-bound n-grams for each term – “collis” is indexed as “c”, “co”, “col”, “coll”, “colli”, and “collis” – while query terms are left whole, so a search for “colli” now matches. This taught me:

  1. Solr/lucene/sunspot rock, and
  2. I have more to learn about solr config because the schema.xml looks like it exposes some very powerful search juju.

Thanks to Arndt Lehmann’s tip on this page.