Practical AI – Episode #237

Automating code optimization with LLMs

featuring Mike Basios, CTO & Co-founder at TurinTech AI


You might have heard a lot about code generation tools using AI, but could LLMs and generative AI make our existing code better? In this episode, we sit down with Mike from TurinTech to hear about practical code optimizations using AI “translation” of slow to fast code. We learn about their process for accomplishing this task along with impressive results when automated code optimization is run on existing open source projects.


Sponsors

Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com

Fly.io – The home of Changelog.com. Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.

Typesense – Lightning fast, globally distributed Search-as-a-Service that runs in memory. You literally can’t get any faster!

Changelog News – A podcast+newsletter combo that’s brief, entertaining & always on-point. Subscribe today.


Chapters

1 00:07 Welcome to Practical AI
2 00:43 Code optimizing with Mike Basios
3 03:19 Solving code
4 07:24 The AI code ecosystem
5 10:41 Other targets
6 12:58 AI rephrasing?
7 15:28 Sponsor: Changelog News
8 16:40 State of current models
9 20:31 Improvements to devs
10 22:31 Managing your AI intern
11 25:09 Custom LLM models
12 29:49 Biggest challenges
13 33:19 Hallucination & optimization
14 35:42 Test chaining?
15 39:09 LLM workflow
16 41:25 Most exciting developments
17 43:40 Looking forward to faster code
18 44:14 Outro

Transcript



Welcome to another episode of Practical AI. This is Daniel Whitenack, I’m the founder of Prediction Guard, and I’m joined as always by my co-host, Chris Benson, who is a tech strategist at Lockheed Martin. How are you doing, Chris?

Doing well today, Daniel. There’s so much going on these days in this industry in terms of AI… Just constantly learning new stuff and finding out who’s doing what.

Yeah, it’s almost like you need to optimize some things about your life to keep up. Would you say that’s accurate?

Yeah. Speaking of optimization, I think that’s a thread to pull right there.

Yeah, yeah… So speaking of today, we have with us Mike Basios, who is the CTO and co-founder at TurinTech AI. Welcome, Mike.

Hello, guys. Nice to meet you.

Yeah. Well, we alluded to optimization, and I know one of the things that TurinTech is working on is code optimization with AI… And maybe some people - actually, probably a lot of people listening to this podcast - are familiar with certain AI-flavored developer tools, maybe GitHub Copilot or something like that, for generation. Before we dive into AI-driven code optimization, I'm wondering if you could take a moment to set the stage for those who aren't aware: what do you mean when you say code optimization? Why is it useful? How has code optimization been part of the developer lifecycle over time?

We're in an era nowadays where we see more and more applications consuming a lot of cloud resources, and everybody's trying to optimize the performance of their code. Now, when we're talking about code optimization, and performance in particular, typically people want to optimize things like making the application faster, or memory consumption… We all remember everybody complaining about Chrome using too much memory, for example. Or CPU usage, which is closely connected to the energy that software uses. If we talk about mobile applications, we would all like them to be more efficient and consume less energy. And practically, that's the area we have been focusing on in my research, in our group, and in the company.

Maybe you could talk a little bit about some of the history of that research. I’m sure that that has, just like everything else, been impacted by this kind of latest wave of AI technologies and generative AI… But I know that the company and yourself have been involved in research prior to, and all during the development of these things… So could you give us a little bit of background on how you first started thinking about these problems, and how it kind of developed over time?

Yeah, code optimization is not a new thing. If you read research papers from 20-30 years ago, everybody wanted to optimize and make code efficient. There are a lot of tools, like profilers, that help developers find hotspots. The biggest problem in this area is: okay, we profile our code, but how can we automatically improve it to make it faster? Typically, this is a very, very manual process, and the majority of people who work in this area are super-specialized. And it's more and more difficult nowadays to find people who know how to optimize the performance of their code, because programming languages are becoming higher and higher level. People now write more in languages like Python, JavaScript, TypeScript… So nowadays, the companies that really deal with code optimization are either companies from the hardware space, like Intel and NVIDIA, who write specialized software that takes advantage of their hardware and shows it outperforming, or technology companies that need the scalability. The majority of developers will not treat the performance of their code as the first thing they need to optimize.

And we have built a platform that practically helps engineers automate this kind of process: it makes it easier for developers to identify places in their code that are slow, and then to optimize them without necessarily having the specialized knowledge, and to do it automatically. Now, the history of code optimization, as I said, started as a very, very manual process; very few people knew how to optimize code for specific hardware. In the past, I guess, they had to read books and compiler documentation. Eventually you had better and better compilers, so you could use compilation options to optimize your code, tune the compiler flags, and so on. And then you had a lot of profiling tools that also helped developers optimize. But still, all of these processes have mostly been manual or semi-manual. And that's where we see the advances of AI helping.

[06:23] And to give you a bit of context on how we started our startup - this started after we published a paper in 2018, I think, at the Foundations of Software Engineering conference, where we showed that we could automatically help developers choose better data structures: taking code, looking at its data structures, and optimizing it by suggesting variations. For example, in a language like Java you don't always need an ArrayList if a LinkedList may be better in your scenario. So we tried these small changes and showed they had a good performance impact. But at that point, the majority of people were translating code manually, with rule-based transformations like regular expressions: "If I see this pattern, convert it to this pattern." That is how people did these things with code refactoring tools, until LLMs came into the discussion.
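To make the data structure idea concrete, here's a minimal Python analogue of that kind of one-line swap (the Java example would contrast ArrayList and LinkedList the same way; the exact timings are illustrative):

```python
import timeit
from collections import deque

def drain_list(n: int) -> None:
    # list.pop(0) shifts every remaining element, so draining is O(n^2)
    queue = list(range(n))
    while queue:
        queue.pop(0)

def drain_deque(n: int) -> None:
    # deque.popleft() is O(1), so draining is O(n)
    queue = deque(range(n))
    while queue:
        queue.popleft()

if __name__ == "__main__":
    n = 50_000
    print("list :", timeit.timeit(lambda: drain_list(n), number=3))
    print("deque:", timeit.timeit(lambda: drain_deque(n), number=3))
```

Same behavior, same interface at the call sites that matter, but one data structure choice dominates the runtime.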

Speaking of that, kind of getting into this latest wave of what I'll just refer to as AI-related developer tools… Again, people might be familiar with code generation tools, or explainers, or something like that… I've also seen agentic tools that will write a PR for you, where you say, "Hey, I want to do this thing", and then a PR is generated… Here, given the focus on code optimization, could you draw out, for people who are maybe just getting into this, when they would want to use this kind of tool versus some of these other applications of generative AI to code?

Yeah, how does it fit into that ecosystem?

So the way we present the code optimization tool currently is as part of your CI/CD process. You make a pull request, then you run some unit tests, you have your integration testing, potentially a security scanning tool like Snyk or Checkmarx, and then the next step is, depending on where your application is deployed, we analyze your code and tell you "Make these changes, because you can get a 20% improvement in CPU and execution time." That is how we currently present it: as a CI/CD tool in the developer toolchain. However, if you think about the technology underneath, the way I see it, all dev tools are going to be using AI and taking advantage of LLM-based solutions. From the moment you're using an LLM, whether you generate code or translate code, it's the same kind of approach; it depends on the data you apply the LLM to.
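As a rough sketch of what such a CI/CD step could look like, here's a hypothetical pipeline script; `request_optimizations` and its response fields are invented for illustration, not TurinTech's actual API:

```python
import subprocess
import sys

def tests_pass() -> bool:
    # Optimization only makes sense on a green build.
    return subprocess.run(["pytest", "-q"]).returncode == 0

def request_optimizations(diff: str) -> list[dict]:
    raise NotImplementedError("stand-in for a call to the optimizer service")

def main() -> int:
    if not tests_pass():
        print("Tests failing; skipping the optimization step.")
        return 1
    diff = subprocess.run(["git", "diff", "origin/main"],
                          capture_output=True, text=True).stdout
    for s in request_optimizations(diff):
        print(f"{s['file']}: est. {s['cpu_gain_pct']}% CPU, "
              f"{s['time_gain_pct']}% execution time")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```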

So for code generation tools - take the PR example - the way those LLMs have been trained is that they see comments next to code, so you say, "If I give you these comments, can you predict the code?" In code translation, where you don't know anything about the speed of the code, you say, "I have seen this C++ code, and the equivalent Python code." And LLMs do it. Copilot or ChatGPT, for example, can translate code. Is it perfect? Probably not yet, but that is the fundamental technology.

[10:01] Code optimization sits in the same set of tools, but it says, "This is slow code that I have seen. Now, I have seen a faster variation, so I can recommend this faster code to you." And eventually you can expose the LLM, like any other LLM-based tool built into VS Code, in an editor, and developers will get suggestions for faster code from our LLM. That doesn't mean it will necessarily be beautiful code; it means it will be faster on the hardware you need to run it on.

I’m kind of curious, as we’re talking about speed… Are there any other dimensions that are relevant in there, that you guys are interested in, that are either adjacent to speed or contribute to speed, or any other characteristics that may not be directly speed-specific, but are things that you’re starting to target or expect to target?

We apply multi-objective optimization. So when you do those translations, you can have different objectives that you try to optimize. Speed is one factor, memory usage is another… CPU usage… Typically, there is a trade-off between speed and memory usage; if you have more memory, you'd like to use it to increase speed. So our tool gives you different suggestions for what you need. And we see this paradigm being used in other use cases. Users have told us, "I want to improve the readability of my code, as long as I know I don't impact performance", for example. Because there are managers who have, say, five projects with five teams of developers, and they would like to guarantee that the quality is good across all of them.
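As an illustration of the multi-objective idea, here's a small sketch that keeps only the candidates on the speed/memory Pareto front; the Candidate fields and numbers are made up for the example:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    runtime_ms: float
    memory_mb: float

def dominates(a: Candidate, b: Candidate) -> bool:
    # a dominates b if it is no worse on both objectives and better on one.
    return (a.runtime_ms <= b.runtime_ms and a.memory_mb <= b.memory_mb
            and (a.runtime_ms < b.runtime_ms or a.memory_mb < b.memory_mb))

def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

variants = [
    Candidate("original", runtime_ms=120, memory_mb=300),
    Candidate("llm_v1",   runtime_ms=95,  memory_mb=340),  # faster, more RAM
    Candidate("llm_v2",   runtime_ms=100, memory_mb=280),  # faster and leaner
    Candidate("llm_v3",   runtime_ms=110, memory_mb=350),  # dominated by llm_v2
]
print([c.name for c in pareto_front(variants)])  # ['llm_v1', 'llm_v2']
```

The tool can then present the surviving trade-offs and let the user pick based on what their deployment cares about.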

So this AI approach, these tools, Artemis or other tools in the LLM space, will be able to help with those. But the biggest problem of any LLM-based tool currently - Chris, you mentioned you're working for Lockheed Martin - is that the code generated by any LLM is not guaranteed to work 100%. You also need existing tools to check whether this code is secure, and you do not know if it will break your code. So we're still in the early stages of incorporating these LLMs; that's why it's also easy for a developer to give it [unintelligible 00:12:37.11] right now. But I'm pretty sure companies that work on bug fixing and code security checks are using, and will be using, more and more LLMs, because they can train on all the data they have access to, and so they can have a competitive advantage.

Would it be a good parallel just to kind of draw to people’s mind like maybe something they’ve seen before in other domains…? It almost sounds like a parallel to kind of a rephrasing type of prompt in an LLM, where you might say – I do this a lot with emails, or other things, like “Here’s my really bad email. Make it flow better and sound better”, or “Here’s my goofy email. Make it business professional”, or something like that. You drew the comparison to maybe machine translation, or something like that… Is rephrasing kind of a good way to think about this?

Let me give you a very simple example. Let's say you have an essay that you need to write, and you have ten paragraphs in that essay. And you would like a version of that essay that is much better, so you can get a better grade. Now, what we do is look at all the paragraphs and provide you with better variations. So we give you version one, with three changes applied by different LLMs. Then you need somebody to grade that essay; in our case, we measure how fast the code is. So somebody says, "Okay, you have a 70%." Then we take that output, give it back to the LLM, and it gives you another version.

[14:25] So practically, you start from version zero of your essay, you apply the different LLMs, you get feedback - so it's like reinforcement learning, live learning - you get different variations, and that score keeps increasing. And at the end, you have a translated version of your original, produced by the LLMs and all those [unintelligible 00:14:42.00], which is better on the metric you chose. In our platform, we test that the code passes, we compile it, and we also measure the performance. In the essay scenario, you would have a teacher doing the grading.

And if you think about how OpenAI and everybody else has been doing their training, they usually take LLMs and then use reinforcement learning, RLHF, and all those techniques. We have done that in the code optimization setting. That's why we have had some impressive results, taking an open source library, just putting it into the tool, and suddenly the execution time is optimized by 30% without us doing anything. The models learn by themselves.
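A minimal sketch of that measure-and-regenerate loop might look like this; both helpers are hypothetical stand-ins, `ask_llm_for_variant` for an LLM call and `benchmark` for the platform's compile/test/timing harness:

```python
from typing import Optional, Tuple

def ask_llm_for_variant(code: str, feedback: Optional[str]) -> str:
    raise NotImplementedError("stand-in for an LLM call")

def benchmark(code: str) -> Tuple[bool, float]:
    raise NotImplementedError("stand-in for compile + unit tests + timing")

def optimize(source: str, rounds: int = 5) -> str:
    ok, best_ms = benchmark(source)
    assert ok, "the original code must pass its own tests"
    best_code, feedback = source, None
    for _ in range(rounds):
        candidate = ask_llm_for_variant(best_code, feedback)
        ok, ms = benchmark(candidate)
        if ok and ms < best_ms:
            # Keep the improvement and report the new score, like handing
            # the graded essay back for another revision.
            best_code, best_ms = candidate, ms
            feedback = f"Your last version passed and ran in {ms:.1f} ms; improve it."
        else:
            feedback = "Your last version failed or was slower; try a different approach."
    return best_code
```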

So Mike, you got into something that I'm super-interested in, in terms of how you're going about this problem: you alluded to the fact that you're using this sort of reinforcement learning loop, or feedback loop, to improve the performance of your tools. Given that you and your team have worked with code generation or code-specific models for some time now, before we get into some of the cool stuff that you've done specifically, could you just comment on the state of code generation models that are out there, on maybe the open source side specifically? If you want to highlight any closed source ones, that's perfectly fine, too… But from your perspective, how is the ecosystem of code generation models changing and advancing, and what is the state of it these days, I guess?

So I was one of the very first believers in LLM-assisted code generation tools. I tried to get beta access to GitHub Copilot, and even tried to use GPT-3 to see all this kind of technology… I'm a very big believer, because I have seen members of our team using LLMs to build things much, much faster. For example, we had a backend engineer, we needed a prototype for a frontend, and he just used one of the closed source LLMs, and in one day he built a new UI in a language he didn't even know; he didn't know TypeScript.

[18:19] So I see that they are very, very promising. Right now I see more value for good developers who already know the basics of computer science. I'm not a believer in "Hey, you don't need to code; the LLM will do it for you." Some videos are potentially over-promoting that. But if you're a good enough developer, you know what you need, and it can dramatically help you build simple applications, generate tests, generate comments for your code… And on performance - from our experience, I believe GitHub Copilot and ChatGPT still outperform the other models, but we see more and more open source models becoming very, very good in the different languages. We have been trying LLaMA 2, we have been trying CodeGen, we have been trying all of these, and we even expose them in our platform, so people can compare their results.

Those tools will need to become a bit easier for developers and VS Code extensions to use, because the API exposure that some of the closed source models provide out of the box solves a lot of headaches for a lot of developers. That's why a lot of developers still prefer them. But the open source models are definitely very good. And I see them becoming even better when they are fine-tuned on a specific language or a specific context. I'll give you an example with SQL: let's say we want to optimize the SQL queries people run against their databases. You can take one of the open source models and fine-tune it on SQL, and you will probably outperform GPT-3.5, or 4, in the context that you have. But I'm truly a very big believer. There's also a bit of psychology among developers [unintelligible 00:20:20.27], but I believe people who use these tools will have an advantage over people who don't.
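Here's a hedged sketch of that fine-tuning idea using the Hugging Face Trainer API; the model name, the single toy (slow, fast) SQL pair, and the hyperparameters are all placeholders, and a real run would need a GPU and a substantial dataset:

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "codellama/CodeLlama-7b-hf"  # any open code model would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

pairs = [("SELECT * FROM t WHERE id IN (SELECT id FROM u)",
          "SELECT t.* FROM t JOIN u ON t.id = u.id")]  # toy example

def encode(slow: str, fast: str) -> dict:
    # Causal-LM fine-tuning: prompt and target live in one sequence,
    # and the labels are a copy of the input ids.
    ids = tokenizer(f"-- slow:\n{slow}\n-- fast:\n{fast}",
                    truncation=True, max_length=512,
                    return_tensors="pt").input_ids[0]
    return {"input_ids": ids, "labels": ids.clone()}

train_dataset = [encode(s, f) for s, f in pairs]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sql-optimizer",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=train_dataset,
)
trainer.train()
```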

So that raises an interesting point… It's a little bit of a tangent, but you've kind of implied it… This is changing the way we humans are coding. I know you also talked about whether LLMs would just write the code for us, and the overhype around that today… But it's changing the way that we code as humans, and it's extending our capabilities dramatically, in terms of being able to reach beyond what we might have been able to do two years ago, for instance. Do you see that accelerating? If you look at a traditional coding team from a few years ago, as you're producing your product for folks, and you start to recognize that individual coders are elbowing their way out of their traditional swim lanes with these new tooling capabilities that you're providing, how does that change things? How does the market you're looking at change going forward?

I believe dramatically. And we have tested this with our internal developers. I have seen a dramatic difference in productivity between developers without access to LLMs and developers who have it. There are two things. One is that you can make developers more efficient, which everybody likes to hear; but at a more senior management level, unfortunately, you do not need as many developers as you would have needed before. You cannot avoid this. And developers should think of it like, "Okay, I am competing with another developer on my team to produce, let's say, an API, or a UI." If that person has access to Copilot or ChatGPT and the other person doesn't, I guarantee you, with very high probability, the one with ChatGPT will outperform and get faster results. It is like having a very, very good assistant next to you that you should use. Otherwise, you're losing. That's how I see it.

[22:31] Yeah. As a two-second follow-up to that point you’ve just made… When we start pervasively – because I think I’ve seen some stuff recently over the last month or two, that the majority of active developers out there are now using LLMs. So there’s been really rapid adoption here within the developer community. So we’ve kind of moved very quickly from those who had, versus those who didn’t have, into a world where everybody has. They may not all be using exactly the same models as things progress, but everyone has it. Any thoughts about kind of what that means? It’s like, you’re democratizing LLMs across the population of developers, and now they’re competing. So it’s kind of like me and my LLMs are competing against you and your LLMs as a developer. Any thoughts, waxing poetic a little bit, about what the implications there are?

Yeah, I mean, it’s a weird world that we – I don’t know anymore. I code less and less. But I now can code, again, because – for example, we have a data scientist on the team, and he says “I don’t feel like I’m a coder anymore. I’m just a manager, or a user of this LLM. I validate the output.” And yeah, it’s okay. And it’s a bit ridiculous, he says, because even simple things like copy-paste, “I will not bother. I’ll just say, Okay, can you refactor this?” It definitely has changed. And okay, there may be implications about creativity of people, and if you go into the AI/LLM space, for example for images, when those models generate images, somebody, a painter may say to you “Hey, you may lose creativity, because you generate always the same thing…” I don’t have answers for those things, and I don’t think a lot of people have answers. We’ll just see how things go, sincerely. [laughs] But yeah, it’s interesting, right?

Yeah, this idea of being a manager of your assistants I think is really helpful. I forget who it was we had on the show, Chris, but they were saying, “Hey, if you think about this thing just like a high school intern, or something…” Like, is a high school intern going to solve all of your problems? No. But if they work all day on your problem, or let’s say you have an infinite number of those high school interns that just can do work all the time… Is that useful? Certainly. There’s a management aspect to that, right? Probably more so with high school interns than with LLMs, I’m not sure. But yeah, I love that metaphor. That’s really good.

I’m also wondering… So this particular application of AI within someone’s codebase, I think similar to what we’ve seen in other cases with Copilot and some of the things that have happened there, it’s a very sensitive area, particularly for enterprise business users. I think if you’re like an indie hacker, and like you say, you are wanting to create a TypeScript UI and you don’t know TypeScript - boom, you can get some really cool results really quickly. Of course, enterprise code is part of the IP of a company. There’s two aspects of that, one of which is the fact that companies have been hesitant or even sued others over usage of their code or data in ways that they didn’t expect… I think though the other aspect of this that you alluded to is it really is powerful when you start to bring your own data to the table, especially with these open models… Both because they have kind of privacy-conserving deployments, and there’s also like code preference things and other things that your company might have.

[26:19] So I’m wondering if you could speak to that a little bit and how you envision – you’re helping build a product that’s doing code optimizations for people, so how do you think about people creating customized models for their codebases, and the sort of proliferation of these customer-specific models, and the hosting of those, what goes through your mind when you’re thinking about those things?

That’s a very, very important topic, and we have quite a lot of experience with this. I will mention an example where we went to a client, very big technology firm, super-big, one of the best, and we said “Hey, this is our platform, we have these LLMs. You can use any LLM of your choice, like GPT-4, or open source LLMs like LLaMA 2, etc.” In the beginning, they said, “We don’t have any approval for OpenAI”, because first, they don’t know the IP issue, and second, they don’t want the code to go outside. We’re talking about proprietary code at a lot of such companies. So the solution there was, okay, you can use the custom open source LLM on your data; we do not see anything, so it’s a custom solution on premise. And while they’re using a product, then practically our platform allows them to generate their own training datasets.

So for example, they use Artemis, they optimize code, they see that sometimes the code gets optimized and sometimes it doesn't, but they generate the data. We cannot see that data; the client keeps it for fine-tuning their own model. Then, through the platform, they can further fine-tune their own model. That is how the industry will go, especially in the financial sector, the technology sector, or defense companies: they will never let their code go outside, even at an open source level. But LLMs are super-powerful there. One client said, "I'm not sure. If you give me a recommendation from an LLM, who is liable if that code doesn't work? What about the IP issues?" I said, "How do you solve it now?" They have IP checkers, etc. I said, "You can just use the IP checker here, too; the same code, the same process that you use today." But also, LLMs can do very good similarity search, and we have added this functionality. So if you have other codebases and similar functions in your codebase, you can fine-tune nicely; or, the same way people are building chatbots on their documents, you can build your own chatbot on your code. It's practically similarity search, and then we can make recommendations. We even identified three teams that had implemented the same functionality slightly differently. So you even save time.
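A minimal sketch of that kind of similarity search over a codebase; `embed` stands in for any code-embedding model, and the 0.95 threshold is illustrative:

```python
import numpy as np

def embed(code: str) -> np.ndarray:
    raise NotImplementedError("stand-in for a code-embedding model")

def near_duplicates(functions: dict[str, str], threshold: float = 0.95):
    # Embed every function, unit-normalize, and flag pairs whose cosine
    # similarity is high enough to suggest duplicated functionality.
    names = list(functions)
    vecs = np.stack([embed(functions[n]) for n in names])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T  # cosine similarities, since rows are unit vectors
    return [(names[i], names[j], float(sims[i, j]))
            for i in range(len(names))
            for j in range(i + 1, len(names))
            if sims[i, j] >= threshold]
```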

So if you know how to use the underlying technology properly, you can take advantage of it. And the biggest example is Databricks and MosaicML. They acquired MosaicML, and that helps organizations fine-tune their own LLMs on their own data. Because no organization will hand that data to OpenAI or any other company to fine-tune on top of, especially a big organization where there is value in that data. The same thing applies to coding. That's how we see it, and that's why we adapted the product accordingly.

[29:49] My next question is, I guess, sort of selfish, and I like to ask this of people that have really built impressive things with this kind of new reasoning layer of LLMs… I’m wondering, as you look back on building this product for code optimization with LLMs, are there any challenges that were unexpected that you had to overcome? And are there any sort of takeaways that you would give to practitioners that are maybe working on their own products or integrations with LLMs? What has been important for you to stress? …especially as a CTO and bringing new people into the team, as you’re working with these types of models, what’s important in your mind, and any of those challenges that have come up, anything you’d like to highlight?

If you are building applications and your application depends on LLM output, then as a first stage I would recommend something like closed source API usage, because it solves the headache of deploying your own LLM and having good GPUs. That is a problem that is hard to scale; at this moment, most teams don't know how to do it. And of course, there are a lot of startups, a lot of companies, working on this, on how to deploy your own LLM in a scalable way. But that can be a nightmare to build. So if your business is not about how to deploy an LLM, and the value is somewhere else, and somebody provides it as a service, it makes sense to use it.

So on our side, we say you can import any LLM you have access to with an API key and a secret key; then your application becomes much easier. But then you have the problem that you need to solve [unintelligible 00:31:42.02] is used on OpenAI, because data goes there. In the financial sector, this currently cannot be accepted.

Then our product has to deploy LLaMA 2 itself. So you have to build that yourself, or use a service. We had to build it, because we were one of the early adopters… But there are tools; Hugging Face provides a very nice API for you to deploy. They changed the license recently, I think, but I believe the majority of people can still use it.

So the speed of LLMs is definitely a big problem for scaling an application. Then there are other issues with LLMs, which were big in the beginning and are now being fixed, like token size. Every time you ask, the result may be incomplete, and then how do you deal with the previous context, and all of this… You need to spend time with it to do it properly. I'm expecting more and more tools and open source projects, like LangChain and those, to solve some of these things.
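One common way to deal with the incomplete-output problem is a continuation loop. Here's a sketch, with `call_llm` as a hypothetical stand-in that returns the text plus a finish reason, the shape most chat APIs expose:

```python
from typing import List, Dict, Tuple

def call_llm(messages: List[Dict[str, str]]) -> Tuple[str, str]:
    raise NotImplementedError("stand-in for a chat-completion API call")

def complete_fully(prompt: str, max_rounds: int = 4) -> str:
    messages = [{"role": "user", "content": prompt}]
    parts = []
    for _ in range(max_rounds):
        text, finish_reason = call_llm(messages)
        parts.append(text)
        if finish_reason != "length":  # the model finished on its own
            break
        # Keep the partial answer in context and ask for the remainder.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user",
                         "content": "Continue exactly where you stopped."})
    return "".join(parts)
```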

And of course, the biggest problem a lot of people talk about is hallucination. You cannot necessarily trust the models, and you cannot just generate code and execute that code in your backend, because somebody may do an SQL injection. Similar to how people were doing SQL injections in the past, especially for coding, you can have a kind of LLM injection. So you need to be very careful about exposing the prompt to the end user, because somebody can do real damage. So… those things happen.
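A sketch of basic hygiene against that kind of injection: keep trusted instructions separate from untrusted user text, and never execute generated code without at least parsing and checking it first (`call_llm` is again a hypothetical stand-in):

```python
import ast
from typing import List, Dict

SYSTEM_PROMPT = "You optimize Python code. Output only one Python function."

def call_llm(messages: List[Dict[str, str]]) -> str:
    raise NotImplementedError("stand-in for a chat-completion API call")

def generate_optimized(user_code: str) -> str:
    # The user's code travels as data in its own message, never spliced
    # into the trusted instruction string.
    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_code}]
    return call_llm(messages)

def is_plausibly_safe(generated: str) -> bool:
    try:
        tree = ast.parse(generated)  # must at least be valid Python
    except SyntaxError:
        return False
    # Accept only a plain function definition; a real system would also
    # sandbox execution and run the test suite before trusting it.
    return len(tree.body) == 1 and isinstance(tree.body[0], ast.FunctionDef)
```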

When you’re thinking about hallucinations from LLMs, and you’re working on a problem like optimization, and stuff, and you acknowledged earlier some of the problems you face is, one of them, you may not get the right code or compilable code, because it’s the output of an LLM… How do you approach that specific problem? I was actually wondering that earlier, and the conversation continued on without it, but we kind of circled back around… How do you think about dealing with hallucination when you’re dealing with optimization and correctness, and improving in that way, balancing the two?

Very good question. So it depends on the programming language and the existing tools that you can also use. If you go with a programming language like Haskell, and functional programming, in theory you can have stronger proofs that the code before and the code after work the same. NASA, for example, would want that kind of proof; without it, you cannot touch the code. The second mechanism is that ideally we would like applications to have unit tests covering all the scenarios, so when you make a change… Ideally. But not all codebases have that.

You mean not all codebases are fully covered? [laughter]

[34:25] Yeah… [laughs] Unfortunately, there are open source projects where the unit tests don't even pass. We take a codebase, [unintelligible 00:34:35.18] or whatever, and they don't pass by default. So this still needs to improve; hopefully, LLMs can improve it. The third mechanism is that we aim for minimum code changes with the biggest optimization, and we go gradually. For example, we first target data structure optimizations, which are one or two lines you can check. Then single for loops, then double for loops; you go gradually. And we currently make a pull request with the recommended changes, so we still want the developer to validate those changes… Because you cannot take that risk. And also, from a psychology perspective, if you have a tool there that you consider "my performance expert", and at the end it tells you, "Hey, make these three changes" that you can verify, it's no different from changing the name of the tool to a developer's name and making a pull request; that person won't know where the pull request came from, right? So you follow the same process. That's how we see it.
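That gated process could be sketched like this; every helper here is a hypothetical stand-in for the compile, test, benchmark, and PR steps just described:

```python
def compiles(code: str) -> bool: raise NotImplementedError
def passes_unit_tests(code: str) -> bool: raise NotImplementedError
def measure_runtime(code: str) -> float: raise NotImplementedError
def minimal_diff(old: str, new: str) -> str: raise NotImplementedError
def open_pull_request(diff: str, title: str) -> None: raise NotImplementedError

def propose_change(original: str, candidate: str) -> bool:
    # Gates 1 and 2: the candidate must build and pass the existing tests.
    if not compiles(candidate) or not passes_unit_tests(candidate):
        return False
    # Gate 3: it must be measurably faster, not noise-level different.
    speedup = measure_runtime(original) / measure_runtime(candidate)
    if speedup < 1.05:
        return False
    # Final gate is human: open a small, reviewable pull request.
    open_pull_request(diff=minimal_diff(original, candidate),
                      title=f"Perf: ~{(speedup - 1) * 100:.0f}% faster, tests green")
    return True  # a developer still reviews and merges
```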

Something that I’m kind of getting in what you’re saying as well, which I know is often a misconception that I run into when I’m either doing workshops, or working hands-on with people with generative models, is there’s typically this misconception that you need to package everything into a single prompt, and then output your final result as a sort of one-step thing. I’m getting the sense that your workflow, for one, it probably involves multiple calls throughout the codebase, because of the context size; I would assume it’s partly because of that. But then also, you mentioned this kind of iterative element, where - hey, there’s kind of big rocks that you can move, that are the sort of worst offending areas, so there’s hierarchy in that respect… Bt also, it seems like - let’s just assume; I know it’s not a good assumption, but if we assume that a person’s codebase is fully tested, integration tests, unit tests, it seems like this is something you could just loop over and over and over again to get increasing optimizations, probably with diminishing return. Could you speak a little bit to how you as a team think about that chaining element, I guess would be the way to say it? And then also, maybe iterative element.

First of all, the way we have approached this - let's say you take the original version of some code; there are two approaches there. One is to apply one LLM and take the first three or four suggestions from that LLM, and then see which one works. Even the papers evaluating how good these models are on codebases will say something like "of the top five recommendations from the LLM, three out of five outperform." So it's not one chance, apply the LLM once, and done.

Now, the best approach, from what we have seen, is that you get the first version, you apply it, and if you have the ability to get feedback from what you applied, that is where those LLMs are very, very good. So for example, you say "Optimize this code", you try to pass the unit tests, or you try to compile the code and you get the compilation error message. Then you automatically go back to the LLM and say, "The recommendation you gave me didn't pass because of this error." And then it can give you something better. It's like the Wolverine technique; I think somebody did a demo where they were showing how you can compile…

[38:20] Because if you think about how you use those LLMs - let's say "Write my email"; they recommend something, and then you say, "Sorry, this is too formal. I wanted it a bit more friendly." This, I believe, is currently the best approach. So it's an iterative approach; if you have a way to measure and give the feedback back, you get the best result. If we take the logs and give them to the LLM, it gets even better. But even if you just say, "Hey, sorry, this is not good. Give me something better", it will again try to improve within the context.
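Here's a sketch of that compile-and-feed-the-error-back loop (the Wolverine-style technique mentioned above); `call_llm` is a hypothetical stand-in, while the syntax check on each candidate is real Python:

```python
from typing import List, Dict, Optional

def call_llm(messages: List[Dict[str, str]]) -> str:
    raise NotImplementedError("stand-in for a chat-completion API call")

def repair_until_compiles(code: str, max_attempts: int = 3) -> Optional[str]:
    messages = [{"role": "user", "content": f"Optimize this code:\n{code}"}]
    for _ in range(max_attempts):
        candidate = call_llm(messages)
        try:
            compile(candidate, "<candidate>", "exec")  # syntax-level check
            return candidate  # next step would be unit tests and benchmarks
        except SyntaxError as err:
            # Feed the exact error back so the model can repair its output.
            messages.append({"role": "assistant", "content": candidate})
            messages.append({"role": "user",
                             "content": f"That didn't compile: {err}. Please fix it."})
    return None
```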

You can even do things like - again, I saw this in a presentation from LangChain, and a similar one - combining two or three versions from different LLMs; you take the three best, combine them, and say "Take context from here", "This part didn't work…" So there are different approaches.

You just stole my question right out of my mouth. I was gonna ask [unintelligible 00:39:10.08] multiple LLMs and integrating them; I was kind of wondering how you were thinking about that… Because you were addressing, like, when one LLM gave you multiple suggestions back in terms of optimization, and you're trying those out, and how you might extend that to multiple LLMs, because we're getting into this world with an increasing number; we're going to be awash in LLMs before long… And as you have so many APIs, or so many deployments, available to you, how does that change? It sounds like your workflow would handle that regardless, but does that add value, or do you think there are diminishing returns as you keep adding LLMs into it?

I think there is value, and it also gives the user the flexibility to not be locked into a single LLM. For example, it may be a pricing issue; a new model this week may have a bigger token size; it may be the performance of it. So you cannot rely on one single LLM. Nowadays, because it's so easy to build LLMs if you have the data, our business doesn't depend on "Hey, we have the best LLM"; somebody else can suddenly give you a better LLM in a week, and then you're out of business, because you'd need to spend 100 million to train on GPUs. So in our case, combining LLMs, using LLMs, and keeping that workflow LLM-agnostic is the way to go. Now, if people pay for better LLMs and access, the end result is better. One issue we have to mention here, though, which a lot of people may not know: it is a bit tricky when you use output from one LLM with another LLM. There are IP issues, etc. So you cannot - we are also investigating exactly…

It’s a good point.

[40:55] Yeah, but in theory you cannot use ChatGPT output to fine-tune LLaMA in a commercial setting. The Alpaca paper, I think the first one, showed that you can say to ChatGPT "Give me examples", then fine-tune another model with them, and have it work. But you are not allowed to do that on the commercial side. I'm sure there will eventually be two open source LLMs where you can. As long as you have the framework, we allow those things to happen.
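Keeping the workflow LLM-agnostic, as Mike describes, usually comes down to a small interface that any model can hide behind; here's a sketch with illustrative stub providers (neither class is a real client library):

```python
from abc import ABC, abstractmethod

class LLM(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class HostedModel(LLM):
    def __init__(self, api_key: str, model: str):
        self.api_key, self.model = api_key, model
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("call the vendor's API here")

class OnPremModel(LLM):
    def __init__(self, endpoint: str):
        self.endpoint = endpoint
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("call the self-hosted inference server here")

def optimize_with(llm: LLM, code: str) -> str:
    # The surrounding workflow (benchmark, tests, PR) never changes when
    # the model behind `llm` does.
    return llm.complete(f"Suggest a faster version of:\n{code}")
```

Swapping providers for pricing, context size, or licensing reasons then touches one constructor call, not the pipeline.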

As we kind of near the end of our conversation here, I’m wondering if you can paint a bit of a picture for us from your perspective as someone working day to day in developer tools that are AI-driven, what are some of the most exciting things that keep you up at night as you look forward to sort of the next year? It could be things you’re working on, but it might just be generally how this field is developing. What’s really exciting for you as a person building these sorts of tools as you look to the next year or so?

I personally believe this is just the start of the power of this technology. We already see how much it has changed the way people are coding. What I want to see, and I see more and more: people nowadays want to talk, use speech-to-text, and then get code, these kinds of things. They are trying to make developers even lazier, you know?

So from my perspective, from what we are trying to do: we are in the process of putting open source projects through our platform and really getting great optimizations that we can give to the community. For example, if I can automatically make a very slow machine learning library 30%, 40% faster, make a pull request, and show people how easily we can optimize the speed, then everybody can benefit from that. And it excites me to know that we still haven't found the limitations of the current technology, or how much inefficient code is out there. And I'm excited to find out how much we can improve in an automatic way… Like, I don't know, Redis, these kinds of things that everybody uses every day, because they can make our laptops faster. Already my fan is going crazy… [laughs] And yeah, this combination of LLMs and coding is something very, very exciting, because we don't know its limitations. That's what I want to find out.

That’s awesome. Well, we will certainly be on the edge of our seat as you’re exploring those limitations and those possibilities. I really appreciate you joining us, Mike; it’s been a great conversation, and I’m very much looking forward to my code running faster, despite my ignorance of how to make it do that. So thank you so much, and we’ll talk to you soon.

Thanks a lot.

Thank you. Thank you, guys.

