Full-text wildcard searching with Ruby/Rails and Sunspot

I love Sunspot for full-text searching in Rails apps, but it took me a while to figure out how to do left-bound (prefix) wildcard searching in full-text indexed fields.

Say we’re searching for “collis” in a set of full-text indexed fields. With the default Solr config supplied by Sunspot, you have to search for the entire word. To get “colli” or “coll” to return records with “collis” in the full-text index, you just need to modify the Solr schema (in $RAILS_ROOT/solr/conf/schema.xml), changing:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

to:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

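which essentially makes the index-time analyzer store left-bound n-grams of each indexed term (“c”, “co”, “col” and so on, up to 50 characters), while the query-time analyzer leaves the search term whole, so “coll” matches one of the n-grams stored for “collis”. Note that the new definition also swaps the standard tokenizer for a whitespace tokenizer, and that existing records have to be reindexed before the change shows up in search results. This taught me: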

  1. Solr/Lucene/Sunspot rock, and
  2. I have more to learn about Solr config, because schema.xml looks like it exposes some very powerful search juju.

Thanks to Arndt Lehmann’s tip on this page.
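
To make the n-gram idea concrete, here’s a rough Ruby sketch of what the two analyzers above do to a term. It is not Solr code: edge_ngrams, index_tokens, query_tokens and matches? are hypothetical names used only for illustration, and treating a match as "every query token appears among the indexed tokens" is a simplifying assumption about how the query gets evaluated.

```ruby
# Front-edge n-grams of a term, imitating what an edge n-gram filter with
# side="front" stores at index time (illustration only, not Solr code).
def edge_ngrams(term, min_gram = 1, max_gram = 50)
  longest = [term.length, max_gram].min
  (min_gram..longest).map { |len| term[0, len] }
end

# Index-time analysis: split on whitespace, lowercase, expand to edge n-grams.
def index_tokens(text)
  text.split.map(&:downcase).flat_map { |term| edge_ngrams(term) }.uniq
end

# Query-time analysis: split on whitespace and lowercase only.
def query_tokens(text)
  text.split.map(&:downcase)
end

# Simplifying assumption: a record matches when every query token is
# found among its indexed tokens.
def matches?(record_text, query)
  indexed = index_tokens(record_text)
  query_tokens(query).all? { |token| indexed.include?(token) }
end

p edge_ngrams("collis")                 # ["c", "co", "col", "coll", "colli", "collis"]
puts matches?("Collis Avenue", "coll")  # true
puts matches?("Collis Avenue", "ollis") # false
```

The last line shows why this is left-bound only: “ollis” is never stored, because every n-gram starts at the first character of the term.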

Figuring out what’s behind a listening process without a program / pid.

Say you’re being a good sysadmin and you’re checking out listeners on your machines:

root@deathstar:~# netstat -pant | grep LISTEN
tcp        0      0 127.0.0.1:873           0.0.0.0:*               LISTEN      3947/rsync      
tcp        0      0 0.0.0.0:8649            0.0.0.0:*               LISTEN      3826/gmond      
tcp        0      0 192.168.10.122:9102     0.0.0.0:*               LISTEN      4167/bacula-fd  
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      3229/portmap    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      3468/sshd       
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      3930/master     
tcp        0      0 0.0.0.0:44572           0.0.0.0:*               LISTEN      -               
tcp        0      0 0.0.0.0:34271           0.0.0.0:*               LISTEN      3247/rpc.statd  
tcp6       0      0 :::22                   :::*                    LISTEN      3468/sshd       

lolwut is listening on 44572?

root@deathstar:~# lsof -i -n -P | grep 44572

returns nothing! IT MUST BE A ROOTKIT!!eleventy!!!

Wait – maybe portmapper has assigned it to a kernel-level server?

root@deathstar:~# pmap_dump
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  44915  status
    100024    1   tcp  34271  status
    100021    1   tcp  44572  nlockmgr
    100021    3   tcp  44572  nlockmgr
    100021    4   tcp  44572  nlockmgr

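Whew, it’s nlockmgr, the lock manager that handles NFS file locking. It runs inside the kernel, which would explain why neither netstat nor lsof has a program or pid to show for it.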

So the moral of the story: just because netstat can’t determine a program or pid doesn’t mean you’re in trouble. Check portmapper’s assignments via pmap_dump first (rpcinfo -p lists the same registrations if pmap_dump isn’t installed), as the listener might be innocuous.

Apparently netstat can be made aware of portmapper assignments, but either it isn’t aware of kernel-level services or it just doesn’t work for me.
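
If you’d rather not eyeball the two listings, the cross-check above can be sketched in a few lines of Ruby. This is only a sketch: the method names are hypothetical, the sample text is trimmed from the output shown above, and the parsing assumes the column layouts seen there (netstat -pant with the pid/program in the seventh column, pmap_dump with program, version, protocol, port, name).

```ruby
# Ports of LISTEN lines in `netstat -pant` output whose PID/Program column is "-".
def unowned_listen_ports(netstat_output)
  netstat_output.each_line.map { |line|
    fields = line.split
    next unless fields[5] == "LISTEN" && fields[6] == "-"
    fields[3].split(":").last.to_i # local address looks like 0.0.0.0:44572
  }.compact
end

# Hash of tcp port => RPC service names from pmap_dump-style output.
def rpc_services_by_port(pmap_output)
  services = {}
  pmap_output.each_line do |line|
    program, _version, proto, port, name = line.split
    next unless proto == "tcp" && port =~ /\A\d+\z/
    names = (services[port.to_i] ||= [])
    label = name || "program #{program}"
    names << label unless names.include?(label)
  end
  services
end

# Pair every ownerless listening port with whatever portmapper knows about it.
def explain_unowned(netstat_output, pmap_output)
  rpc = rpc_services_by_port(pmap_output)
  unowned_listen_ports(netstat_output).map { |port| [port, rpc.fetch(port, [])] }
end

NETSTAT = <<~TEXT
  tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      3468/sshd
  tcp        0      0 0.0.0.0:44572           0.0.0.0:*               LISTEN      -
  tcp        0      0 0.0.0.0:34271           0.0.0.0:*               LISTEN      3247/rpc.statd
TEXT

PMAP = <<~TEXT
  100000    2   tcp    111  portmapper
  100024    1   tcp  34271  status
  100021    1   tcp  44572  nlockmgr
  100021    3   tcp  44572  nlockmgr
TEXT

explain_unowned(NETSTAT, PMAP).each do |port, names|
  verdict = names.empty? ? "not registered with portmapper, keep digging" : "RPC service: #{names.join(', ')}"
  puts "#{port}: #{verdict}"
end
# prints "44572: RPC service: nlockmgr"
```

On a live box you could feed it the real command output (for example with Ruby backticks) instead of the sample text. A port that comes back with no RPC service is the one that deserves a closer look.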