indexer.rb |
|
|---|---|
require File.expand_path(File.dirname(__FILE__) + '/common_words.rb') |
|
Boring new error |
|
|
Just makes a new subclass of RuntimeError so we can raise database errors later. |
class AttrError < RuntimeError
end |
Indexing |
module Silver
  class Index |
|
Create a new index for a given database or web service. Silver does not connect to a database itself; you pass it a query as a block, so you can use whatever ORM you like best. key is the name under which this index is stored in Redis. time_field is the name of the field used to determine whether there are new entries. redis_options is passed straight through to Redis.new. query is a block that takes the date of the most recently indexed item as a parameter. The query should always sort entries in descending order of time_field, because the first entry's time is recorded as the most recently indexed date. Example to index all large pictures based on their captions:
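A minimal sketch of a query block that satisfies this contract, using an in-memory array in place of a real database. The `PICTURES` data, its field names, and the `picture_query` name are hypothetical and exist only for illustration; a real block would run the equivalent query through your ORM.

```ruby
require 'date'

# Hypothetical stand-in for a database table of pictures.
PICTURES = [
  { id: 1, caption: "Sunset over Lisbon",  size: :large, created_at: DateTime.new(2011, 3, 1) },
  { id: 2, caption: "Thumbnail of a cat",  size: :small, created_at: DateTime.new(2011, 3, 5) },
  { id: 3, caption: "Harbour at midnight", size: :large, created_at: DateTime.new(2011, 3, 9) }
].freeze

# The query block receives the date of the most recently indexed item and
# returns only the large pictures newer than that date, newest first.
picture_query = lambda do |last_date|
  PICTURES
    .select { |pic| pic[:size] == :large && pic[:created_at] > last_date }
    .sort_by { |pic| pic[:created_at] }
    .reverse
end

# With nothing indexed yet, the index passes a date of Jan. 1, 1970.
new_pictures = picture_query.call(DateTime.new(1970, 1, 1))
```

Given the constructor signature, a block like this would presumably be handed over as `Silver::Index.new("pictures", :created_at, &picture_query)`, where the key name `"pictures"` is again an assumption.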
|
    def initialize(key,time_field,redis_options={},&query)
      @r = Redis.new(redis_options)
      # Index data is kept in Redis database number 12.
      @r.select 12
      @key = key
      @time_field = time_field
      @query = query
      @new_results = []
      @indexed_items = []
    end |
|
Looks up the most recently indexed date and fetches only entries newer than that date from the database. accessor should be a block that takes an individual result and returns an array containing, first, the item’s id and, second, the value by which to index. Returns false if there are no new entries and true otherwise. Ex:
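A minimal sketch of an accessor block, assuming each result behaves like a hash with `:id` and `:caption` fields. Both the sample row and the `caption_accessor` name are hypothetical.

```ruby
# Hypothetical result row, shaped like a hash returned by an ORM.
picture = { id: 3, caption: "Harbour at midnight" }

# The accessor returns a two-element array: [id, value to index by].
caption_accessor = lambda { |result| [result[:id], result[:caption]] }

id, text = caption_accessor.call(picture)
```

Such a block would then be passed as `index.find_and_update(&caption_accessor)`, where `index` is an existing Silver::Index instance.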
|
    def find_and_update(&accessor)
      # Fall back to the Unix epoch when nothing has been indexed yet.
      last_date = @r.get("#{@key}:last") || "Jan. 1, 1970"
      new_results = @query.call(DateTime.parse(last_date))
      if new_results.empty?
        false
      else
        # Results are sorted newest first, so the first entry carries the latest date.
        new_date = new_results[0][@time_field].to_s
        @r.set("#{@key}:last",new_date)
        @new_results = new_results
        parse(accessor)
        true
      end
    end
    private
|
|
Sends results off to a double metaphone method to allow for fuzzy searching. |
    def parse(accessor)
      @new_results.each do |result|
        begin
          value = accessor.call(result)
          # value is nil at this point, so report the raw result rather than value[0].
          raise AttrError, "Specified attribute not found in item #{result.inspect}" if !value
          time = Time.parse(result[@time_field].to_s).to_i
          morphed_words = morph(value[1])
          morphed_words.each{|word| write(word,value[0],time)}
        rescue AttrError => e
          puts e
        end
      end
    end
|
|
Takes a metaphoned word, the id of the row that contains it, and the created/modified time of that row, and writes to Redis the phonemes by which that row is now indexed. Each phoneme becomes a sorted-set key that holds the row id, scored by that time. |
    def write(word,id,time)
      text = word[0]
      phonemes = word[1].compact
      phonemes.each do |phoneme|
        @r.zadd "#{@key}:#{phoneme}", time, id
      end
    end |
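The key layout that write produces can be sketched without a Redis server. This is only an illustration: the `store` hash and `zadd` lambda are hypothetical stand-ins for Redis sorted sets, the index name and timestamp are made up, and the phoneme code is a placeholder rather than real Double Metaphone output.

```ruby
# In-memory stand-in for Redis sorted sets: key => { member => score }.
store = Hash.new { |hash, key| hash[key] = {} }

# Mimics the argument order of ZADD: key, score, member.
zadd = lambda { |key, score, member| store[key][member] = score }

index_key = "pictures"                 # hypothetical index name
word      = ["Harbour", ["HRPR", nil]] # [text, phonemes]; "HRPR" is a placeholder code
id        = 3                          # hypothetical row id
time      = 1_299_628_800              # hypothetical Unix timestamp

# Same loop shape as write: one sorted-set entry per non-nil phoneme.
word[1].compact.each do |phoneme|
  zadd.call("#{index_key}:#{phoneme}", time, id)
end
```

After this runs, `store` holds a single key, `"pictures:HRPR"`, whose only member is the row id 3 with the timestamp as its score. Scoring by time is what lets a lookup on a phoneme return ids ordered by recency.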
|
Note: this method needs some work and is not perfect. Takes a string, removes any file extension, splits it on non-alpha characters, and throws out empty words, words shorter than 5 characters that are not capitalized, and all common words. Finally, it converts each word to its metaphone equivalents and returns an array pairing each original word with its metaphones. |
    def morph(string)
      # Use non-destructive gsub so the caller's string is not modified.
      string = string.gsub(/\.[a-zA-Z]{1,4}$/,"")
      words = string.split(/[^a-zA-Z]/).reject{|q| q == "" || (q.length < 5 && q.capitalize != q)} - COMMON_WORDS
      words.map{|q| [q,Text::Metaphone.double_metaphone(q)]}
    end
  end
end |
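The word-filtering rules in morph can be tried in isolation. The sketch below leaves out the metaphone step (which needs the text gem) and uses a tiny hypothetical `SAMPLE_COMMON_WORDS` list in place of the real COMMON_WORDS; `filter_words` is an illustrative name, not part of Silver.

```ruby
# Hypothetical stand-in for the COMMON_WORDS list loaded from common_words.rb.
SAMPLE_COMMON_WORDS = %w[about their which].freeze

# Applies the same filtering as morph, minus the metaphone conversion.
def filter_words(string)
  # Strip a trailing file extension of one to four letters.
  stripped = string.sub(/\.[a-zA-Z]{1,4}$/, "")
  # Split on non-alpha characters, then drop empty strings and short
  # words that are not capitalized, and finally remove common words.
  words = stripped.split(/[^a-zA-Z]/).reject do |w|
    w.empty? || (w.length < 5 && w.capitalize != w)
  end
  words - SAMPLE_COMMON_WORDS
end

kept = filter_words("Rome harbour at night, about 9pm.jpeg")
```

Here `kept` is `["Rome", "harbour", "night"]`: "Rome" survives despite being short because it is capitalized, "at" and "pm" are dropped as short lowercase words, and "about" is removed as a common word.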