At gap intelligence we work really hard and pride ourselves in making sure we have Great Effin' Data (GFD). And to get that GFD, we have to constantly update our data; we sometimes even have to delete data. This wouldn't be a problem if we didn't have distributed systems that are constantly syncing with our GFD via our internal API.
What this means is that we need to keep track of every record that gets created, updated and deleted. Keeping track of created and updated records is not a big deal. We can use a created/updated timestamp. Pretty easy.
Deleted records though, well that's a whole can of worms.
One solution to this problem is to implement what is called as "soft delete." This is the action of flagging records as deleted instead of actually deleting them. And there are a handful of good gems out there that make soft delete a fairly straightforward task. The issue is, our codebase is anything but straightforward.
We deal with a ton of data. Millions and millions of rows worth of data across a myriad of tables in our database. And we generate VERY complicated reports that are close to impossible to create with anything but raw SQL queries (the sql_query gem helps us do just that). Because of this, we can't just use any of the popular soft delete gems.
We could go through our code and make sure we add the deleted flag to every single table in every single query we write. But this would be an ongoing solution with which we would never really be done with. And every new report or query we write, we need to make sure we NEVER forget to include that one line that filters out deleted records. Yeah, not fun.
So taking a queue from Sean Huber's post Porting ActiveRecord "soft delete" behavior to Postgres, we decided to implement soft delete at the database level, while still working directly with ActiveRecord like business as usual.
Automate Migrations
The first thing we need to do is rename our original table and append with_deleted
at the end. After that, we create a view that filters out deleted records, we create a trigger to make sure deleted records really just get their deleted_at
flag updated, and we create any indexes we may need. And if you're doing this for quite a few models, you probably would like to automate this.
Fortunately, Rails is awesome and gives us the ability to create our own custom generators.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | require 'rails/generators/migration' class SoftDeleteGenerator < Rails::Generators::NamedBase include Rails::Generators::Migration def self.source_root @_soft_delete_source_root ||= File.expand_path('../templates', __FILE__) end def self.next_migration_number(path) next_migration_number = current_migration_number(path) + 1 ActiveRecord::Migration.next_migration_number(next_migration_number) end def create_migration_files migration_template 'soft_delete_add_deleted_at.rb', "db/migrate/add_deleted_at_to_#{table_name}.rb" migration_template 'soft_delete_rename_table.rb', "db/migrate/rename_#{table_name}_table.rb" migration_template 'soft_delete_create_view.rb', "db/migrate/create_#{table_name}_view.rb" migration_template 'soft_delete_create_trigger.rb', "db/migrate/create_soft_delete_#{table_name}_trigger.rb" migration_template 'soft_delete_add_deleted_at_indexes.rb', "db/migrate/add_deleted_at_indexes_on_#{table_name}.rb" end end |
This uses migration templates like this one:
1 2 3 4 5 | class AddDeletedAtTo<%=table_name.camelize.pluralize%> < ActiveRecord::Migration def change add_column :<%=table_name.pluralize%>, :deleted_at, :timestamp end end |
With this, we don't have to worry about generating all these migration files manually – huge time saver.
We can just call our new generator and have all the appropriate migrations built for us.
rails g soft_delete [MODEL_NAME]
Integrating with ActiveRecord
We then started writing custom WithDeleted
classes for each model. We hoped to give ourselves the ability to query for deleted data with all our existing filters.
Say we wanted to query for all Pricing records from December 2015:
1 | Pricing.where(date: '2015-12-01'..'2015-12-31') |
And if we needed to include deleted records:
1 2 3 | PricingWithDeleted.where(date: '2015-12-01'..'2015-12-31') # or Pricing.with_deleted.where(date: '2015-12-01'..'2015-12-31') |
What's happening here is that the Pricing
model queries our newly created database view called pricings
, which is automatically filtering out all records marked as deleted. Our PricingWithDeleted
model queries the original table that we renamed to pricings_with_deleted
, which does not filter any records out.
This is the ultimate goal.
The problem we started facing was that we started writing a WithDeleted
class for every model, and then couldn't figure out an elegant way to share code across them all. Every model has different relationships with different names. So instead of writing a WithDeleted
class for every model in our app, we broke out the good old Ruby meta-programming.
* Disclaimer: please don't judge my entire life's work by what you are about to read.
Down the Rabbit Hole We Go
Take a look at the final version of our dynamic class code, and then I'll try to justify why we went this route.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 | def Object.const_missing(name) return super unless name.to_s.end_with?('WithDeleted') && name =~ /.WithDeleted/ class_name = name.to_s @looked_for ||= {} return if @looked_for[class_name] return Object.const_get(class_name) if Object.const_defined?(class_name) @looked_for[class_name] = 1 Object.const_set(class_name, new_deleted_class(class_name)) end private def association_with_deleted_name(association) "#{association.name}_with_deleted" end def build_collection_association(association, klass) if association.class == ActiveRecord::Reflection::HasAndBelongsToManyReflection generate_has_and_belongs_to_many(association, klass) elsif association.class == ActiveRecord::Reflection::ThroughReflection generate_has_many_through(association, klass) else generate_has_many(association, klass) end end def configure_associations(klass) dependent_associations(klass).each do |association| next unless deleted_table_exists?(association) if association.collection? build_collection_association(association, klass) elsif association.belongs_to? generate_belongs_to(association, klass) elsif association.has_one? generate_has_one(association, klass) end define_method(association.name) do if association.collection? send(association_with_deleted_name(association)) else send(association_with_deleted_name(association)).becomes("#{association.class_name}WithDeleted".constantize) end end end end def configure_sti(klass) if klass.superclass.superclass == ActiveRecord::Base klass.inheritance_column = nil else klass.define_singleton_method('sti_name') do klass.name.chomp('WithDeleted') end end end def deleted_table_exists?(association) association_table_name = association.name.to_s.singularize.camelize.constantize.table_name ActiveRecord::Base.connection.table_exists?("#{association_table_name}_with_deleted") end def dependent_associations(klass) klass.reflect_on_all_associations end def generate_belongs_to(association, klass) klass.belongs_to association_with_deleted_name(association).to_sym, foreign_key: association.foreign_key, class_name: "#{association.name.to_s.singularize.camelize}WithDeleted" end def generate_has_and_belongs_to_many(association, klass) klass.has_and_belongs_to_many association_with_deleted_name(association).to_sym, class_name: "#{association.name.to_s.singularize.camelize}WithDeleted", join_table: "#{association.join_table}_with_deleted", association_foreign_key: "#{association.name.to_s.singularize}_id", foreign_key: association.foreign_key end def generate_has_one(association, klass) klass.has_one association_with_deleted_name(association).to_sym, foreign_key: association.foreign_key, class_name: "#{association.name.to_s.singularize.camelize}WithDeleted" end def generate_has_many(association, klass) klass.has_many association_with_deleted_name(association).to_sym, foreign_key: association.foreign_key, class_name: "#{association.name.to_s.singularize.camelize}WithDeleted" end def generate_has_many_through(association, klass) if association.through_reflection.klass.respond_to?(:with_deleted) && association.source_reflection.klass.respond_to?(:with_deleted) klass.has_many association_with_deleted_name(association).to_sym, association.scope, through: "#{association.options[:through]}_with_deleted".to_sym, source: "#{association.name.to_s.singularize}_with_deleted".to_sym, foreign_key: association.foreign_key, join_table: association.join_table, class_name: "#{association.name.to_s.singularize.camelize}WithDeleted" else klass.has_many association_with_deleted_name(association).to_sym, association.scope, through: association.through_reflection.name, source: association.source_reflection.name, foreign_key: association.foreign_key, join_table: association.join_table, class_name: association.source_reflection.klass.to_s end end def new_deleted_class(class_name) superclass = class_name.chomp('WithDeleted').constantize Class.new(superclass) do self.table_name = "#{superclass.table_name}_with_deleted" def self.name "#{superclass.name}WithDeleted" end configure_sti(self) configure_associations(self) end end |
Ok, so what is going on here?
First, we figure out if the class name ends in WithDeleted
. If it doesn't, then carry on as usual.
1 | return super unless name.to_s.end_with?('WithDeleted') && name =~ /.WithDeleted/ |
We then have to keep track of all the classes we have dynamically defined. This is to make sure we aren't already in the middle of defining this very same class, i.e., infinite loop.
1 2 3 4 | class_name = name.to_s @looked_for ||= {} return if @looked_for[class_name] |
If the class was already defined, then just return that definition.
1 | return Object.const_get(class_name) if Object.const_defined?(class_name) |
We keep track of every dynamic class we have defined.
1 | @looked_for[class_name] = 1 |
If the class hasn't been defined, define it.
1 | Object.const_set(class_name, new_deleted_class(class_name)) |
Dynamic WithDeleted
class definition
We get the original class name by chomping off WithDeleted
from the end, and we use this class as the superclass for our new WithDeleted
class.
1 2 3 | def new_deleted_class(class_name) superclass = class_name.chomp('WithDeleted').constantize Class.new(superclass) do |
The table_name
is then set to the newly renamed _with_deleted
database table.
1 | self.table_name = "#{superclass.table_name}_with_deleted" |
And finally we configure associations.
1 2 3 4 5 6 7 | def self.name "#{superclass.name}WithDeleted" end configure_sti(self) configure_associations(self) end end |
Associations were a huge pain to figure out and since this blog post is longer than anticipated, I'll refrain from going crazy trying to do so. Maybe I'll leave this for a future blog post.
The one thing to notice from these association definitions is that we are overriding every single association that has a corresponding with_deleted
database table. If the table exists, we create a new association with with_deleted
appended at the end, and then override the original association by making it call this newly defined with_deleted
association.
The reason behind overriding the original association is that if you have a deleted record loaded, and you try to get information from one of its associations, you need to make sure that the with_deleted
association record was loaded. We wanted to make sure our code pretty much worked normally, so it just made sense to have the original association forward to the correspondingwith_deleted
association.
Conclusion
I won't lie, there was a lot of trial and error with this implementation. And even after writing this blog post, I'm not 100% convinced this is the best option out there. In fact, there are still ongoing discussions on this implementation and if it would have made sense or have been easier to just implement one of the many soft delete gems out there – and make sure we never forget to filter out deleted records.
And there are some gotchas to take into account, like not being able to chain the with_deleted
scope at the end of other chained scopes. But these are concessions we were willing to make with this implementation.
Even though this may not be the most elegant solution out there, or even be a solution that works for most other companies, I do strongly believe it is the best option for us right now. It helps us soft delete records, while stil being able to query deleted records and make sure we notify our other applications accordingly. We have had this implementation working in production for some time now and aside from some of those little gotchas we have to keep reminding ourselves about, it seems to be working rather well.
I'd love to hear what people think of this approach, and if anyone has questions, comments and/or recommendations on how to make it better.