Refining Language Models at Scale
This one is for business enthusiasts open to fresh, unconventional perspectives on organizational structure, ideally with an interest in understanding leadership and operational dynamics through a religious lens, and comfortable drawing parallels between divine entities and corporate roles.

Picture this: I’m running a toy factory, Santa Claus style. I’ve got a team of elves crafting toys, each following the same blueprint, but, you know, they’re elves! Each one adds a personal touch, based on how good or bad the child has been, making every toy unique. It’s charming, right?
But here’s the thing: as Santa’s Operations Admin, I need to keep track of all these toys so I can let the elves know whether they’re doing a good or bad job.
This is pretty much what it’s like for businesses using language models. Take ChatKJV, for instance, where we handle thousands of conversations. Each one is unique, despite the same underlying instructions going into the language model. It’s just like those elves making different toys from the same blueprint, but for different kids.
In the world of artificial intelligence, large language models (LLMs) like GPT-4 have become increasingly popular due to their ability to understand and generate human-like text. However, as the scale of these models grows, so does the complexity of managing and refining their inputs and outputs. Every conversation is unique, with user inputs generating dynamic outputs. Essentially, each conversation becomes a new data point.
The challenge here is that the instructions given to the LLM remain constant, despite the vast variety in data points. This makes it difficult for businesses to optimize these instructions based on the data they are handling.
Businesses that rely on LLMs as AI agents (e.g. AutoGPT) generate a lot of conversations, and today those conversations are not just user responses: they include operational instructions, application variables, and more. We need to be able to see each conversation, understand it, and group it with similar conversations. This will help us fine-tune the instructions we’re giving our language models and further improve them.
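As a rough sketch of what "seeing" each conversation could look like in practice, the snippet below logs every prompt/response pair as a structured record. The field names and the JSONL file are my own assumptions for illustration; they are not part of any existing ChatKJV or AutoGPT tooling.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Hypothetical log file; any append-only store (JSONL, a database table,
# an analytics pipeline) would serve the same purpose.
LOG_PATH = Path("conversation_log.jsonl")

def log_conversation(system_prompt: str, user_input: str, model_output: str,
                     model_name: str, app_variables: Optional[dict] = None) -> dict:
    """Append one conversation turn as a structured, queryable record."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "system_prompt": system_prompt,        # the constant "blueprint"
        "user_input": user_input,              # the part that makes each data point unique
        "model_output": model_output,
        "app_variables": app_variables or {},  # operational instructions, app state, etc.
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record

# Example usage with made-up values:
log_conversation(
    system_prompt="Respond with a relevant KJV scripture and a short reflection.",
    user_input="I feel anxious about my new job.",
    model_output="Philippians 4:6 - Be careful for nothing...",
    model_name="gpt-4",
)
```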
Sample Scenario
Below is a screenshot of prompts and responses exported from ChatKJV and tagged in an Excel sheet.

Notice how we manually infer the type of input, make assumptions about the user’s goals, and tag the topic into categories we think are similar. At scale, we can’t do this for conversations numbering in the hundreds of thousands; it just isn’t practical.
But when fine-tuning our model, we need to understand the kinds of inputs users supply: for example, one person is asking “is it okay to kill people” while another is asking “is God gay?”
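One way to approximate that manual tagging at scale is to embed each user input and assign it to the nearest of a few seed categories. This is a minimal sketch assuming the sentence-transformers and scikit-learn packages are available; the category names are illustrative, not an actual ChatKJV taxonomy.

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative seed categories; a real taxonomy would come from the
# manually tagged spreadsheet above.
CATEGORIES = {
    "ethical question": "Is this action morally right or wrong?",
    "doctrinal question": "What does scripture say about this topic?",
    "personal struggle": "I am going through something difficult and need encouragement.",
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder
category_names = list(CATEGORIES)
category_vecs = model.encode(list(CATEGORIES.values()))

def tag_input(user_input: str) -> str:
    """Assign a user input to the closest seed category by cosine similarity."""
    input_vec = model.encode([user_input])
    scores = cosine_similarity(input_vec, category_vecs)[0]
    return category_names[scores.argmax()]

print(tag_input("is it okay to kill people"))  # expected: something like "ethical question"
print(tag_input("is God gay?"))                # expected: something like "doctrinal question"
```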
By building a visibility system, engineers can log these inputs and outputs, fine-tune and improve against them, and gradually test newer models on the same inputs.
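Once inputs and outputs are logged, "testing newer models on the same inputs" can be as simple as replaying the log against a candidate model and comparing outputs side by side. The sketch below assumes the JSONL log from the earlier snippet and the official openai Python client; the candidate model name is a placeholder.

```python
import json
from pathlib import Path
from openai import OpenAI  # assumes the openai>=1.0 Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment
LOG_PATH = Path("conversation_log.jsonl")
CANDIDATE_MODEL = "gpt-4o"  # placeholder for whichever newer model is being evaluated

def replay_against(model_name: str, limit: int = 50) -> list:
    """Re-run logged user inputs against a candidate model for side-by-side review."""
    results = []
    with LOG_PATH.open(encoding="utf-8") as f:
        for line in list(f)[:limit]:
            record = json.loads(line)
            response = client.chat.completions.create(
                model=model_name,
                messages=[
                    {"role": "system", "content": record["system_prompt"]},
                    {"role": "user", "content": record["user_input"]},
                ],
            )
            results.append({
                "user_input": record["user_input"],
                "old_output": record["model_output"],
                "new_output": response.choices[0].message.content,
            })
    return results

for row in replay_against(CANDIDATE_MODEL, limit=5):
    print(row["user_input"], "->", row["new_output"][:80])
```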
Possible Execution
To tackle this issue, we need a system to help refine our language model inputs and outputs. Imagine software that could collect, categorize, and analyze each output. It would find patterns, give us insights, and help us optimize our future instructions. Right now, we’re trying to sort through a mountain of LLM outputs without a plan; we’re missing valuable insights, and it’s costing us a lot of time.
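To make "find patterns" concrete, one minimal approach is unsupervised clustering over conversation embeddings, so that similar prompts end up in the same group without any manual tagging. This sketch reuses the embedding approach from the tagging example above; the sample prompts and cluster count are made up.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Stand-in for prompts pulled from the conversation log.
prompts = [
    "is it okay to kill people",
    "is God gay?",
    "what does the Bible say about forgiveness",
    "I feel anxious about my exams",
    "is lying ever justified",
    "I am grieving the loss of a parent",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(prompts)

# The cluster count is arbitrary here; in practice it would be tuned
# (e.g. with a silhouette score) against the real volume of conversations.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

for cluster_id in sorted(set(labels)):
    members = [p for p, label in zip(prompts, labels) if label == cluster_id]
    print(f"cluster {cluster_id}: {members}")
```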
We need a system that helps us see each conversation, understand it, and group it with similar conversations. This will help us fine-tune the instructions we’re giving our language models, and it will make us more efficient. One startup currently working on this is English to Bits, which for now is specifically targeted at autonomous coding assistants, like what Remix IDE did for Solidity, but even better.
I believe that this is an important problem to solve, and I’m excited to see what the future holds.
This article is a 🌱 Seedling
More on Seedlings