TalkTastic Insiders

Tackling AI's Unsolved Problem: Validating LLM Output Accuracy

Update: I am working on one of the bigger unsolved problems in AI research, apparently.

I was talking with a friend, an AI researcher and part-time computer science professor, who mentioned that one of the great unsolved problems in all of ML/AI is reliably generating accurate outputs from LLMs. It's frontier research, he said. He mentioned this in passing, while explaining a project he's working on. In my head, I'm thinking: "Wait a sec, that's exactly what I'm trying to do right now!"

Step 1 to fix TalkTastic is to actually understand the codebase and map out how it all works. My idea is to compress the codebase for our macOS app by roughly 45x (1.8M tokens → ~40k tokens) without any loss of accuracy, so I can use Claude to reason over the whole thing and start shipping at warp speed. To do this, I need to create a highly accurate intermediate representation of the codebase that's incredibly detailed yet compact enough to fit within Claude's context window.

Sure, AI can generate a summary of anything, but can you rely on that summary to be 100% accurate and not have missed anything? Nope. The core challenge is: when LLMs generate output, how do you validate accuracy? For me, manually validating every claim isn't feasible.

So apparently, to solve our little "how do I fix my busted codebase" problem, I need to crack an unsolved problem in all of AI. Strangely exhilarating. Haven't proven it yet, but I think I'm onto something. Stay tuned.
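
One concrete slice of that validation problem: every symbol an LLM-generated summary mentions should actually exist in the code. Here is a minimal sketch of that check, assuming the summary marks symbols in backticks and the code is Swift; the file names (summary.md, Sources/) are placeholders, and this only catches fabricated names, not omissions:

```python
import re
from pathlib import Path

def referenced_symbols(summary: str) -> set[str]:
    """Identifiers the summary claims exist, assuming it writes them
    in backticks, e.g. `AudioCapture.start()` -> AudioCapture."""
    return {m.group(1) for m in re.finditer(r"`([A-Za-z_]\w*)", summary)}

def declared_symbols(root: Path, exts=(".swift", ".h", ".m")) -> set[str]:
    """Names declared in source files. A crude regex pass, not a parser --
    just enough to catch outright hallucinations."""
    decl = re.compile(r"\b(?:class|struct|enum|protocol|func)\s+([A-Za-z_]\w*)")
    names: set[str] = set()
    for path in root.rglob("*"):
        if path.suffix in exts:
            names |= {m.group(1) for m in decl.finditer(path.read_text(errors="ignore"))}
    return names

def unverified_claims(summary: str, root: Path) -> set[str]:
    """Symbols the summary mentions that the codebase never declares."""
    return referenced_symbols(summary) - declared_symbols(root)

if __name__ == "__main__":
    # summary.md and Sources/ are placeholder paths.
    bad = unverified_claims(Path("summary.md").read_text(), Path("Sources"))
    print("unverified claims:", sorted(bad) or "none")
```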

  • Bailey Cook

    Reinforcing answers that work? Hopefully we can get a flywheel going one day where reasoning models use syntax/compile errors as a reinforcement loop. I don’t understand why I don’t see more of it (it must be hard).
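
    A minimal sketch of the inference-time version of that loop (an RL version would use the same pass/fail signal as a reward). The ask_llm helper is hypothetical, and Python's built-in compile() stands in for a real compiler:

    ```python
    def ask_llm(prompt: str) -> str:
        """Hypothetical model call; wire up a real provider here."""
        raise NotImplementedError

    def generate_until_it_compiles(task: str, max_rounds: int = 5) -> str:
        """Feed compiler errors back to the model until the code parses."""
        prompt = task
        for _ in range(max_rounds):
            code = ask_llm(prompt)
            try:
                compile(code, "<llm>", "exec")  # cheap syntax/compile check
                return code                     # success signal
            except SyntaxError as err:
                # Failure signal: the error message becomes the next prompt.
                prompt = f"{task}\n\nYour last attempt failed:\n{err}\nFix it."
        raise RuntimeError("no compiling candidate after max_rounds")
    ```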

  • Jan

    Two ideas for solutions: 1) Pseudocode à la Sudolang. Pseudocode saves tokens compared to natural-language prompts and produces more predictable output that conforms to the data structures you supply. 2) Referential transparency. Pure functions, deterministic and free of side effects, are referentially transparent, so for certain programs you'll be able to swap out the implementations for the results.
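
    A minimal illustration of idea 2 in Python (the toy slow_square function is hypothetical): because a pure function always maps the same input to the same output, recording its results once and substituting the lookup for the implementation cannot change program behavior, only its cost.

    ```python
    def slow_square(n: int) -> int:
        """Pure: deterministic, no side effects, hence referentially transparent."""
        return n * n

    # Run the implementation once and record the results...
    results = {n: slow_square(n) for n in range(5)}

    # ...then swap the implementation for its results. For a referentially
    # transparent function this substitution is always safe.
    def square(n: int) -> int:
        return results[n]

    assert all(square(n) == slow_square(n) for n in range(5))
    ```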

  • Messina

    Yeah, I was wondering if there ought to be a new method for "compiling to an LLM", where you write your intended outcome in pseudocode, and then an MoE (mixture of experts) collaborates to create the most efficient algorithm to achieve the results, similar to this: https://minimaxir.com/2025/01/write-better-code/
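
    One hedged reading of "compiling to an LLM", sketched in Python: several models each propose an implementation of the same pseudocode, and a harness keeps the fastest candidate whose output matches a reference. The propose_implementations function is a hypothetical stand-in for the model calls, stubbed here with hand-written candidates:

    ```python
    import timeit

    def propose_implementations(pseudocode: str) -> list[str]:
        """Hypothetical: each expert returns source for a solve(data) function.
        Stubbed with two hand-written candidates for the same pseudocode."""
        return [
            "def solve(data):\n    return sorted(data)[-1]",  # sort, take last
            "def solve(data):\n    return max(data)",         # linear scan
        ]

    def fastest(pseudocode: str, sample) -> str:
        """Materialize each candidate, reject disagreements, keep the quickest."""
        best_src, best_time, reference = None, float("inf"), None
        for src in propose_implementations(pseudocode):
            ns = {}
            exec(src, ns)                # define solve() in a fresh namespace
            result = ns["solve"](sample)
            if reference is None:
                reference = result
            elif result != reference:
                continue                 # outputs disagree: reject candidate
            t = timeit.timeit(lambda: ns["solve"](sample), number=1000)
            if t < best_time:
                best_src, best_time = src, t
        return best_src

    print(fastest("return the largest element of data", list(range(10_000))))
    ```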

  • Matt Mireles

    Update: After throwing A LOT of compute at the problem, my approach is starting to work!
