At gap intelligence we work really hard and pride ourselves in making sure we have Great Effin' Data (GFD). And to get that GFD, we have to constantly update our data; we sometimes even have to delete data. This wouldn't be a problem if we didn't have distributed systems that are constantly syncing with our GFD via our internal API.

What this means is that we need to keep track of every record that gets created, updated and deleted. Keeping track of created and updated records is not a big deal. We can use a created/updated timestamp. Pretty easy.

Deleted records though, well that's a whole can of worms.

One solution to this problem is to implement what is called as "soft delete." This is the action of flagging records as deleted instead of actually deleting them. And there are a handful of good gems out there that make soft delete a fairly straightforward task. The issue is, our codebase is anything but straightforward.

We deal with a ton of data. Millions and millions of rows worth of data across a myriad of tables in our database. And we generate VERY complicated reports that are close to impossible to create with anything but raw SQL queries (the sql_query gem helps us do just that). Because of this, we can't just use any of the popular soft delete gems.

We could go through our code and make sure we add the deleted flag to every single table in every single query we write. But this would be an ongoing solution with which we would never really be done with. And every new report or query we write, we need to make sure we NEVER forget to include that one line that filters out deleted records. Yeah, not fun.

So taking a queue from Sean Huber's post Porting ActiveRecord "soft delete" behavior to Postgres, we decided to implement soft delete at the database level, while still working directly with ActiveRecord like business as usual.

Automate Migrations

The first thing we need to do is rename our original table and append with_deleted at the end. After that, we create a view that filters out deleted records, we create a trigger to make sure deleted records really just get their deleted_at flag updated, and we create any indexes we may need. And if you're doing this for quite a few models, you probably would like to automate this.

Fortunately, Rails is awesome and gives us the ability to create our own custom generators.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
require 'rails/generators/migration'

class SoftDeleteGenerator < Rails::Generators::NamedBase
  include Rails::Generators::Migration

  def self.source_root
    @_soft_delete_source_root ||= File.expand_path('../templates', __FILE__)
  end

  def self.next_migration_number(path)
    next_migration_number = current_migration_number(path) + 1
    ActiveRecord::Migration.next_migration_number(next_migration_number)
  end

  def create_migration_files
    migration_template 'soft_delete_add_deleted_at.rb', "db/migrate/add_deleted_at_to_#{table_name}.rb"
    migration_template 'soft_delete_rename_table.rb', "db/migrate/rename_#{table_name}_table.rb"
    migration_template 'soft_delete_create_view.rb', "db/migrate/create_#{table_name}_view.rb"
    migration_template 'soft_delete_create_trigger.rb', "db/migrate/create_soft_delete_#{table_name}_trigger.rb"
    migration_template 'soft_delete_add_deleted_at_indexes.rb', "db/migrate/add_deleted_at_indexes_on_#{table_name}.rb"
  end
end

This uses migration templates like this one:

1
2
3
4
5
class AddDeletedAtTo<%=table_name.camelize.pluralize%> < ActiveRecord::Migration
  def change
    add_column :<%=table_name.pluralize%>, :deleted_at, :timestamp
  end
end

With this, we don't have to worry about generating all these migration files manually – huge time saver.

We can just call our new generator and have all the appropriate migrations built for us.

rails g soft_delete [MODEL_NAME]

Integrating with ActiveRecord

We then started writing custom WithDeleted classes for each model. We hoped to give ourselves the ability to query for deleted data with all our existing filters.

Say we wanted to query for all Pricing records from December 2015:

1
Pricing.where(date: '2015-12-01'..'2015-12-31')

And if we needed to include deleted records:

1
2
3
PricingWithDeleted.where(date: '2015-12-01'..'2015-12-31')
# or
Pricing.with_deleted.where(date: '2015-12-01'..'2015-12-31')

What's happening here is that the Pricing model queries our newly created database view called pricings, which is automatically filtering out all records marked as deleted. Our PricingWithDeleted model queries the original table that we renamed to pricings_with_deleted, which does not filter any records out.

This is the ultimate goal.

The problem we started facing was that we started writing a WithDeleted class for every model, and then couldn't figure out an elegant way to share code across them all. Every model has different relationships with different names. So instead of writing a WithDeleted class for every model in our app, we broke out the good old Ruby meta-programming.

* Disclaimer: please don't judge my entire life's work by what you are about to read.

Down the Rabbit Hole We Go

Take a look at the final version of our dynamic class code, and then I'll try to justify why we went this route.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
def Object.const_missing(name)
  return super unless name.to_s.end_with?('WithDeleted') && name =~ /.WithDeleted/

  class_name = name.to_s

  @looked_for ||= {}

  return if @looked_for[class_name]
  return Object.const_get(class_name) if Object.const_defined?(class_name)

  @looked_for[class_name] = 1

  Object.const_set(class_name, new_deleted_class(class_name))
end

private

def association_with_deleted_name(association)
  "#{association.name}_with_deleted"
end

def build_collection_association(association, klass)
  if association.class == ActiveRecord::Reflection::HasAndBelongsToManyReflection
    generate_has_and_belongs_to_many(association, klass)
  elsif association.class == ActiveRecord::Reflection::ThroughReflection
    generate_has_many_through(association, klass)
  else
    generate_has_many(association, klass)
  end
end

def configure_associations(klass)
  dependent_associations(klass).each do |association|
    next unless deleted_table_exists?(association)

    if association.collection?
      build_collection_association(association, klass)
    elsif association.belongs_to?
      generate_belongs_to(association, klass)
    elsif association.has_one?
      generate_has_one(association, klass)
    end

    define_method(association.name) do
      if association.collection?
        send(association_with_deleted_name(association))
      else
        send(association_with_deleted_name(association)).becomes("#{association.class_name}WithDeleted".constantize)
      end
    end
  end
end

def configure_sti(klass)
  if klass.superclass.superclass == ActiveRecord::Base
    klass.inheritance_column = nil
  else
    klass.define_singleton_method('sti_name') do
      klass.name.chomp('WithDeleted')
    end
  end
end

def deleted_table_exists?(association)
  association_table_name = association.name.to_s.singularize.camelize.constantize.table_name
  ActiveRecord::Base.connection.table_exists?("#{association_table_name}_with_deleted")
end

def dependent_associations(klass)
  klass.reflect_on_all_associations
end

def generate_belongs_to(association, klass)
  klass.belongs_to association_with_deleted_name(association).to_sym,
                   foreign_key: association.foreign_key,
                   class_name: "#{association.name.to_s.singularize.camelize}WithDeleted"
end

def generate_has_and_belongs_to_many(association, klass)
  klass.has_and_belongs_to_many association_with_deleted_name(association).to_sym,
                                class_name: "#{association.name.to_s.singularize.camelize}WithDeleted",
                                join_table: "#{association.join_table}_with_deleted",
                                association_foreign_key: "#{association.name.to_s.singularize}_id",
                                foreign_key: association.foreign_key
end

def generate_has_one(association, klass)
  klass.has_one association_with_deleted_name(association).to_sym,
                foreign_key: association.foreign_key,
                class_name: "#{association.name.to_s.singularize.camelize}WithDeleted"
end

def generate_has_many(association, klass)
  klass.has_many association_with_deleted_name(association).to_sym,
                 foreign_key: association.foreign_key,
                 class_name: "#{association.name.to_s.singularize.camelize}WithDeleted"
end

def generate_has_many_through(association, klass)
  if association.through_reflection.klass.respond_to?(:with_deleted) && association.source_reflection.klass.respond_to?(:with_deleted)
    klass.has_many association_with_deleted_name(association).to_sym,
                   association.scope,
                   through: "#{association.options[:through]}_with_deleted".to_sym,
                   source: "#{association.name.to_s.singularize}_with_deleted".to_sym,
                   foreign_key: association.foreign_key,
                   join_table: association.join_table,
                   class_name: "#{association.name.to_s.singularize.camelize}WithDeleted"
  else
    klass.has_many association_with_deleted_name(association).to_sym,
                   association.scope,
                   through: association.through_reflection.name,
                   source: association.source_reflection.name,
                   foreign_key: association.foreign_key,
                   join_table: association.join_table,
                   class_name: association.source_reflection.klass.to_s
  end
end

def new_deleted_class(class_name)
  superclass = class_name.chomp('WithDeleted').constantize

  Class.new(superclass) do
    self.table_name = "#{superclass.table_name}_with_deleted"

    def self.name
      "#{superclass.name}WithDeleted"
    end

    configure_sti(self)
    configure_associations(self)
  end
end

Ok, so what is going on here?

First, we figure out if the class name ends in WithDeleted. If it doesn't, then carry on as usual.

1
return super unless name.to_s.end_with?('WithDeleted') && name =~ /.WithDeleted/

We then have to keep track of all the classes we have dynamically defined. This is to make sure we aren't already in the middle of defining this very same class, i.e., infinite loop.

1
2
3
4
class_name = name.to_s
@looked_for ||= {}

return if @looked_for[class_name]

If the class was already defined, then just return that definition.

1
return Object.const_get(class_name) if Object.const_defined?(class_name)

We keep track of every dynamic class we have defined.

1
@looked_for[class_name] = 1

If the class hasn't been defined, define it.

1
Object.const_set(class_name, new_deleted_class(class_name))

Dynamic WithDeleted class definition

We get the original class name by chomping off WithDeleted from the end, and we use this class as the superclass for our new WithDeleted class.

1
2
3
def new_deleted_class(class_name)
  superclass = class_name.chomp('WithDeleted').constantize
  Class.new(superclass) do

The table_name is then set to the newly renamed _with_deleted database table.

1
self.table_name = "#{superclass.table_name}_with_deleted"

And finally we configure associations.

1
2
3
4
5
6
7
def self.name
      "#{superclass.name}WithDeleted"
    end
    configure_sti(self)
    configure_associations(self)
  end
end

Associations were a huge pain to figure out and since this blog post is longer than anticipated, I'll refrain from going crazy trying to do so. Maybe I'll leave this for a future blog post.

The one thing to notice from these association definitions is that we are overriding every single association that has a corresponding with_deleted database table. If the table exists, we create a new association with with_deleted appended at the end, and then override the original association by making it call this newly defined with_deleted association.

The reason behind overriding the original association is that if you have a deleted record loaded, and you try to get information from one of its associations, you need to make sure that the with_deleted association record was loaded. We wanted to make sure our code pretty much worked normally, so it just made sense to have the original association forward to the correspondingwith_deleted association.

Conclusion

I won't lie, there was a lot of trial and error with this implementation. And even after writing this blog post, I'm not 100% convinced this is the best option out there. In fact, there are still ongoing discussions on this implementation and if it would have made sense or have been easier to just implement one of the many soft delete gems out there – and make sure we never forget to filter out deleted records.

And there are some gotchas to take into account, like not being able to chain the with_deleted scope at the end of other chained scopes. But these are concessions we were willing to make with this implementation.

Even though this may not be the most elegant solution out there, or even be a solution that works for most other companies, I do strongly believe it is the best option for us right now. It helps us soft delete records, while stil being able to query deleted records and make sure we notify our other applications accordingly. We have had this implementation working in production for some time now and aside from some of those little gotchas we have to keep reminding ourselves about, it seems to be working rather well.

I'd love to hear what people think of this approach, and if anyone has questions, comments and/or recommendations on how to make it better.

web development graphic