Thinking About Model Extraction

Created 2024-02-04 / Edited 2024-03-23

Sharing some thoughts on my Recurse Center project, exploring how to bring Alloy or similar tools into the Day Job.

Robot opening a jar


I've been doing searches for academic papers or projects with keywords of Ruby and Alloy, as well as brushing up on my Alloy. The rough idea that I'm pursuing is to take a working codebase, extract a model, and then do stuff with it.

So... let's imagine...

$ cd codebase
codebase$ extract-model
codebase$ cat result.als
codebase$ alloy run result.als

Executing "Run Default for 4 but 4 int, 4 seq expect 1"
   Solver=minisat(jni) Bitwidth=4 MaxSeq=4 SkolemDepth=1 Symmetry=OFF Mode=batch
   2727 vars. 508 primary vars. 4414 clauses. 108ms.
   . found. Predicate is consistent, as expected. 14ms.

codebase$ █

... and already we have a problem. Because there is no such alloy cli interface. It has a lovely GUI, but no way to run something lke that (yet). It appears that you can reach in and run stuff in the java, write your own wrapper, interface with the java, etc. So already off to a yak-shaving start!

Still. The idea is you run this magical extract-model command on a codebase, I'm using Rails because it has a structure that suits this, and it spits out a model. The model should contain modely-stuff -- database tables, classes, API interactions maybe.

THEN what we'll do is turn the model BACK into some rspec.

THEN we'll add some further declarations and turn those back into rspec too.

Hmm. Maybe we can slurp in the existing rspec and try to turn that into some constraints or validations on the Alloy side.

And what can we learn or do with an Alloy model? I'm not sure yet! The rough theory is that we have this Simplified World we can work with, or reason about, and that it can reflect and guide the Real World. I kinda doubt that folks will want to muck about much in the Alloy directly (though you never know), so more like a weird intermediate representation for codebase analysis and transformation, basically.

I'm thinking the extract-model and might be some LLM magic, but also we can follow what they were trying with Rubicon, getting right into some Rails-specific tooling to help extract a model. While I know the most about Rails, I think it'd be very cool to run our mythical extract-model against other codebases. Still -- specialty knowledge and specialization on Rails is interesting.

All through it I keep thinking ... can I make this practical? Can I make this into a tool that I'd actually use in my day job? One of the Recurse center ideas is to work at the "edge" of your abilities and understanding; for me making something in this space that is both powerful and useful dances right on this edge.