Can LLMs model real-world systems in TLA+?

(sigops.org)

59 points | by mad 13 hours ago

8 comments

tmaly 4 hours ago
I remember NVIDA sponsored a TLA+ challenge last year https://foundation.tlapl.us/challenge/index.html
[-]
- uptodatenews 3 hours ago
  Whoa didn't even know cool
iFire 1 hour ago
I don't use tla+ to model real-world systems anymore, Claude is able to model systems in Lean 4 and the binary executable can handle real input or I can directly generate c / rust on proofs with numeric types that have ring structure (integers, rationals, bits).
https://github.com/lambdaclass/truth_research_zk
[-]
- dev_arvin2000 1 hour ago
  [flagged]
tombert 2 hours ago
Claude has certainly been getting better with TLA+. It's not perfect yet but for laughs I got it to model the rules of Monopoly last night [1]. I haven't done any exhaustive checking on it yet, but it certainly looks passable.
It is pretty impressive at how good it's gotten at this, in a relatively short amount of time no less. I still usually write my specs by hand, but who knows how much longer I'll be doing that.
[1] https://pdfhost.io/v/KU2j37YKrP_Monopoly
[-]
- ofrzeta 1 hour ago
  It looks quite complicated and I have no idea what it is doing. Obviously, since I don't know about TLA+. But what about someone who knows TLA+? It still seems hard to make sure it is valid. And it's just for a relatively simple game.
atomicnature 1 hour ago
Just a question to people who may know better than me about this.
I thought the whole point of trying to write out TLA+ is so that you get a better idea of what you want and put it into formal language?
I get that an LLM can assist/help with expressing what we want in formal language a bit, but if one automates all this there is no human intent/design anymore.
If the LLM generates both the design (TLA+) and writes an arbitrary program that satisfies said design -- what exactly have we proved?
What assurance do humans get since human doesn't know or cannot specify what they want.
[-]
- majormajor 30 minutes ago
  An LLM-generated TLA+ model can be verified for certain things in a way that LLM-generated code can't. It's infamously hard to exhaustively unit-test concurrency.
  Whether or not you're modeling the right things or verifying the right things, of course... that's always left as an exercise for the user. ;)
  (How to prove the implementation code is guaranteed to match the spec is a trick I haven't seen generalized yet, either, too.)
dgacmu 4 hours ago
This post reads like an accidental advertisement for approaches like Verus [1], which couple the implementation and verification so you can't end up with a model that diverges from the actual implementation. I'm personally much more optimistic about the verus approach, but I freely admit that's my builder bias speaking.
[1] https://github.com/verus-lang/verus
[-]
- dev_arvin2000 1 hour ago
  [dead]
pzoln 45 minutes ago
Sorry, must be a very naive question, but what if you give LLM just a source code (maybe even obfuscate the names like Raft and Etcd) and ask it to create a TLA+ spec of that?
asxndu 3 hours ago
[dead]
uptodatenews 4 hours ago
[dead]