Introducing Devin – The “First” AI Agent Software Engineer

Devin – The “First” AI Agent

Cognition AI just unveiled Devin, the quote-unquote “first” AI software engineer. Now, if you’ve watched my channel at all, you know they’re not the first, but their demos were still quite impressive. The amount of traction their marketing got over the last few days, though, was the most impressive part of the entire thing. So today I’m going to go over all of their demos, show you what’s unique about Devin, and tell you how they were able to achieve such an incredible launch.
Before we get into everything else, let me show you their launch video, which was hosted by their CEO, Scott Wu. Scott Wu is very impressive, and I’m going to show you why right after this video.

Hey, I’m Scott from Cognition AI, and today I’m really excited to introduce you to Devin, the first AI software engineer. Let me show you an example of Devin in action. I’m going to ask Devin to benchmark the performance of Llama across a couple of different API providers. From now on, Devin is in the driver’s seat. First, Devin makes a step-by-step plan of how to tackle the problem. After that, it builds a whole project using all the same tools that a human software engineer would use. Devin has its own command line, its own code editor, and even its own browser. In this case, Devin decides to use the browser to pull up API documentation so that it can read up and learn how to plug into each of these APIs. Here Devin runs into an unexpected error. Devin actually decides to add a debugging print statement, reruns the code with the debugging print statement, and then uses the error in the logs to figure out how to fix the bug. Finally, Devin decides to build and deploy a website with full styling as the visualization. You can see the website here. All of this is possible today because of the advances we’ve made in both reasoning and long-term planning. It’s a really hard problem and we’ve only just started, but we’re super excited about the progress we’ve made so far. In the meantime, if you’d like to try out Devin on your own real-world tasks, send us a request below and we’d be happy to forward it to Devin.

All right, so you saw the video. But if you watched my video from yesterday going over Pythagora, it’s essentially the same thing. Devin definitely has a very unique UI, but in terms of capabilities, I’ve seen GPT Pilot (also known as Pythagora), I’ve seen MetaGPT, and I’ve seen a handful of other projects that do this just as well or better. So what is so unique about Devin? There are a few reasons why they were able to get such an explosive launch day. First, they
raised a lot of money: a $21 million Series A led by Founders Fund. Founders Fund is an incredibly well-known fund in Silicon Valley, and just having its backing, with all of their investors tweeting about them, helped get the ball rolling for the incredible reach they achieved this week. If you’re not familiar with Founders Fund, it was founded by Peter Thiel, a very well-known libertarian billionaire who co-founded PayPal, went on to write the first outside check into Facebook, and is close friends with Mark Zuckerberg. Sean Parker was also part of Founders Fund. So Devin has a very reputable firm backing it.

But that’s not all. Let me also show you a video about the CEO that went absolutely viral. He is a very, very sharp person, to say the least. Watch this.

We’re going to move on to the last matchup of the first round, which has Victoria Sha, our seventh seed, against Scott Wu from Louisiana, our tenth seed. And the question is: if the pattern shown continues, what is the letter in the 2010th position? Scott? A. A is the correct answer.
And the next question is: what is the value of 255... Scott? 5,000. 5,000 is the correct answer. Moving on to the third question of our matchup: the digits 1, 2, 3, 4, and 5 can be arranged to... Scott? 60. 60 is the correct answer, which means that Scott has won that matchup.

All right, so shortly after the launch that video went viral. Everybody looked at it and said, wow, this guy is super smart. I mean, it’s obvious from that video that he was a significant overachiever, and now he’s building this company, so of course people are going to keep their eye on it. In their blog post, they describe Devin as the world’s first fully autonomous AI software engineer. I don’t agree with that, but that’s fine; it’s all a marketing play, and just saying they were the first, even though we know they’re not, also helped them get the reach that they did. From the blog post: with our advances in long-term reasoning and planning, Devin can plan and execute complex engineering tasks requiring thousands of decisions. Devin can recall relevant context at every
step, learn over time, and fix mistakes. This is all stuff we’ve seen other AI coding assistants do, but there is something truly unique about Devin, and that is its UI. They put everything into a single, very pretty-looking UI with common developer tools, including a shell, a code editor, and a browser, within a sandboxed compute environment. So basically you have a single view with everything you need as you’re coding with Devin (the shell, editor, and browser, plus the planner), and that was really impressive, because typically you’re switching back and forth between different tools when working with an AI coding assistant. Okay, so this is a very impressive demo, and really all of their demos were quite impressive. I think that’s also part of the reason they were able to make such a big splash with their launch.

Let’s look at this first demo. Sarah pastes a blog post about something she saw, where AI can essentially generate an image with hidden text, and all she does is paste the blog post and say, hey, I saw this, can you set this up and make it
happen? That’s really cool. Devin scans the page, scrapes the content, figures out what it needs to do, comes up with a plan, installs everything, and finally outputs what she needs. It also fixes errors, which is great but not unique. She loads up the page, and there it is: Sarah and Devin. Very, very cool. I think what was truly unique about this was that it was as simple as dropping a blog post in and saying, hey, check this out, read it, and let me know. But again, this is all powered by GPT-4. They don’t have a special model; whatever they’ve built on top of GPT-4 is very impressive, but the underlying model is the same one any other AI project is using. In fact, one limitation of Devin is that it’s not open source, so you can’t plug in your own model: you can’t run a local model, you can’t run Claude, and you can’t run any model except the one they’ve chosen, which may be a benefit or a drawback depending on how you look at
it.

Okay, the next demo shows an engineer from Cognition telling Devin to build a personal website that runs the Game of Life. If you’re not familiar, Conway’s Game of Life is a simple algorithm that shows how life progresses using pixels, and it’ll speak for itself when you see it. He says this is for Devin, so make it customized to Devin as well. Let’s watch Devin build this out. Devin follows up with clarifying questions, which again is nothing special; we saw that in our Pythagora video yesterday. But once again, I think what really sets Devin apart is the user interface, which is very nice and very clean. So here it is: here’s the Game of Life it created. He follows up with: we need it to say Devin in very cool text in the middle of the screen; actually, never mind, make it start with the word Devin in pixels. They go back and forth in a really nice conversation window, and eventually Devin is able to output the Game of Life starting with the characters “Devin” as the initial pixels. And here on the right
side we can see why the UI is so nice: we have the shell, the browser, the editor, and the planner all in one window, so you don’t have to switch around. Finally, the site is deployed to Netlify, and we have the demo that starts with the text Devin; as you move your mouse across the screen, it spawns new live pixels and the Game of Life progresses.

Now, this next demo is going to show Devin finding a bug in code that the original engineer was not able to find. What I think is special about this is that they could just paste a GitHub repository, and Devin is going to scan it and figure out what’s going on. Most AI coding assistants aren’t able to iterate on existing codebases. There are a couple of exceptions, like Aider, which uses Universal Ctags to do it. This is a very hard problem, and I wonder how Devin was able to solve it. My guess is Universal Ctags, but all of the code we’re seeing in these demos is less than a few thousand lines, so my guess is that maybe they just
didn’t, and they’re just fitting the whole codebase inside GPT-4’s context window, which is 128,000 tokens for GPT-4 Turbo. So here we go: it cloned the repo, explored the directory structure of CP book, reviewed the code, and it’s going to find a bug. Let’s watch the rest of the video now.

And then I told Devin what test case I wanted. I just told Devin, you know, these are the inputs, and then try checking for these conditions for me. Devin wrote the test without too much trouble, so then I asked Devin to actually expand the test a little bit. This time, after Devin ran the tests, Devin actually found a test failure. Now, even if the code were correct, there could be errors in the test itself, but the tests seemed pretty reasonable, so there probably shouldn’t have been a failure. So Devin went and tried to debug the program for me. Devin here actually added a print statement to debug the outputs and the inputs to the failing test, reran the tests, and found which case was wrong. After fixing this, Devin
reran the tests, and now I can be confident that my code is correct, and I have some tests to prove it. Thanks, Devin.

Okay, the next video we’re going to watch is AI training AI, which is very meta and very cool, I must admit. Let’s take a look.

Hey guys, today I’m going to show you an AI training an AI. Here we’re going to take the QLoRA repo, which is a fine-tuning method for quantized large language models. We’re going to feed this repo to our agent, Devin, and all we have to ask Devin is to fine-tune a 7B Llama model. So the prompt we’re seeing here is “Can you fine-tune a 7B Llama model using,” followed by the QLoRA GitHub URL, and then “There should be instructions in the README.” I really think this is cool: you essentially just paste in a link, say learn everything you can about what you find here, and then move on and actually do some coding. That’s something unique to Devin that I really haven’t seen elsewhere, but it’s not necessarily very hard to build. All it
is is web scraping plus agents.

Devin clones the repo, figures out how to run it using the README, and sets up all of the requirements using pip. Okay, so it’s downloading and installing all the dependencies, which is really nice, and again everything’s in one window, which is also really nice. It’s able to start running the training job. There are a few hiccups where Devin runs into some CUDA issues, which is to be expected with open-source repos, but it’s not a problem: Devin looks at the NVIDIA environment and figures out how to reinstall the packages to make it work. After a few more runs to figure out the correct model names, Devin successfully gets the training run working. Here we see training proceeding smoothly; the loss is going down. It’s outputting every step along the way, so you can always go back to a previous step. And there it is, training the model: AI training AI. Very cool conceptually.

In this next video, we have an issue that was created on a repository, and the issue is given to Devin to fix. The prompt says: write a fix for
this issue; do not commit or push your changes. It’s very interesting that they have to explicitly say do not commit or push your changes, because it means Devin has access to commit and push, which is very cool. Devin replies: All right, I’ll take a look at the issue and work on a fix. Just to clarify, you want me to prepare a fix but not commit or push changes to the repository, correct? Correct. So then it continues on, reads the GitHub issue, and prepares a fix for it, but does not actually commit or push it. Very cool.

Okay, in this next blog post, the Cognition team shows Devin iterating on an existing large codebase, and it’s true, this is a large codebase. That surprised me, because I didn’t think it would be able to really understand the context of a very large codebase, but I think this was just a really nice demo. Let me show you the video and then explain why.

Hey, I’m Neil, and I want to show you an example of Devin, our AI software engineer, helping me fix a bug. I’ve been using this repo called SymPy. SymPy is an algebra system
written in Python, and I noticed this issue where, when you take the log of a fraction, you get zoo, which is SymPy’s complex infinity. That’s definitely wrong, but instead of trying to figure this out myself, I just asked Devin to take a look. Devin immediately jumps in, sets up the code, and is able to reproduce that same zoo output. Devin then finds the right part of the code and adds print statements in order to figure out the cause of the issue, and we can see here that the cause is that integer division leads to a zero, and then we take the log of zero. Based on that, Devin is able to fix the issue in the code by replacing that integer division with true division, then cleans up the debug output and verifies that the result is what we want. Devin even runs the tests in the repo as well to make sure nothing else is broken. So that was great; it saved me a ton of time. Thank you, Devin.

Okay, now that you’ve seen the video: I could be wrong, but what I’m seeing is that it identified a single file that it needed to fix. This was not a change across multiple files, and it was not a file with multiple dependencies outside of itself. So it was able to read all of the code, add some tests to it, figure out what’s wrong, and fix it, but everything was encapsulated within this single file, which was, I don’t know, maybe a thousand lines of code. Still impressive, but not the breakthrough I think they were trying to convey.

Now, I want to be clear: I’m not trying to pooh-pooh Devin. This is fantastic, and I am a big believer that AI is going to take over a
lot of programming jobs, and this is just another step toward that. Where I have a problem is the way they position themselves as the first AI software engineer, which is clearly not the case, because we have multiple other examples of companies doing just that, including the one in the video I posted yesterday.

Now for the coolest demo, in my opinion: Devin is actually able to make money. Devin took a job on Upwork and was able to take the requirements of the software job and complete them successfully, meaning it was going to earn money. I know a lot of you are probably looking at this and thinking, oh my God, Upwork is done, and maybe in the future you’re right. Let’s take a look at that video now.

Hey, I’m Walden, one of the developers here at Cognition AI. We were playing around with whether or not Devin could start a side hustle on Upwork, so here’s an actual real job from Upwork where the client wants to set up this computer vision model, which actually looks quite interesting. It seems
very difficult to set up; I’m not sure how I would start doing this. But you give the task to Devin and ask Devin to figure it out, and things just kick off. Devin immediately goes ahead, and you can see it starts setting up the repo. It actually runs into some versioning issues here, and if you watch how Devin deals with it, Devin is actually updating the code to make these things work. It continues with loading and importing packages, and you can see it actually downloads images from the internet to run through the model. You can see here that some issues come up, but Devin knows how to handle these things and kind of pushes through. If you look closely, Devin is actually doing print-line debugging here, adding these statements to track where the data flows, and Devin continues to do this until it understands how everything works. It then updates the code with the fixes. After removing the print-line statements, Devin continues this pattern of fixing code and running it again until it runs the image model across all these road images from around the world. We can then ask Devin for a report, at which point Devin sends over some sample images of roads with the damage marked out, along with a nice .txt file explaining Devin’s work and the different kinds of outputs of the model. Good job, Devin.

All right, so everything is really cool. I’m going to give them props: the demos are awesome and the interface is great. Now let’s look at their performance. We evaluated Devin on SWE-bench, which is a software
engineering benchmark. Devin correctly resolves about 14% of the issues end to end, whereas the previous state of the art is about 2%. So let’s look at the examples. Now, Devin is comparing itself against other large language models, which is not accurate, and I’m going to tell you why. So: Devin at about 14%, Claude 2 at 4.8%, all the way down to GPT-4 at 1.74%. Okay, aside from the rest of this graph (excluding Devin) probably being wrong, let’s take a look at what’s actually being compared, because it’s not apples to apples. Devin is an agent, meaning it probably has multiple agents working together in unison to accomplish different tasks, and it’s able to iterate on code multiple times until it gets the right answer. Everything else is essentially zero-shot, in one go, which just means no examples: they just copy and paste something and say fix this, or do this, or build this. So it’s not truly an apples-to-apples comparison. What I would have liked to see is Devin versus Pythagora (GPT Pilot), MetaGPT, SuperAGI, CrewAI, and all of the other agent platforms out there. That’s what I want to see, and it would have made a very compelling comparison. But look, the team is amazing, and what they’ve built so far is amazing. It’s very, very cool, and I’m excited to see how they progress. It’s not open source, and of course I would much rather it were open source. I would much rather
be able to plug my own models into Devin. But that aside, very cool, and congrats on the launch.
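
The Game of Life demo described earlier is easier to follow with the rules spelled out. Conway’s Game of Life updates a grid of cells in generations. A live cell with two or three live neighbours survives. A dead cell with exactly three live neighbours becomes alive. Every other cell is dead in the next generation.

Below is a minimal Python sketch of one generation step. It assumes an unbounded grid stored as a set of (row, col) coordinates, and the `step` function name is just an illustrative choice. It illustrates the algorithm only; it is not the code Devin generated for the demo site.

```python
from collections import Counter

Cell = tuple[int, int]  # (row, col) coordinates on an unbounded grid


def step(live: set[Cell]) -> set[Cell]:
    """Return the next generation of live cells under Conway's rules."""
    # Count live neighbours for every cell that touches at least one live cell.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


# A "blinker" flips between a vertical and a horizontal line of three cells.
blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(step(blinker)))  # [(1, 0), (1, 1), (1, 2)]
```

Storing only the live cells keeps each step proportional to the population rather than the grid size. A fixed-size 2D array is the other common choice when rendering to a bounded canvas.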

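Rough arithmetic can sanity-check the guess that Cognition may simply be fitting whole repositories into GPT-4’s context window. A common rule of thumb for English text is about four characters per token, though code often tokenizes differently. Take a 3,000-line codebase at an assumed average of 40 characters per line. That is roughly 120,000 characters, or about 30,000 tokens, comfortably inside a 128,000-token window.

The sketch below automates that estimate. Several values are assumptions for illustration: the four-characters-per-token ratio, the `.py` suffix filter, the 8,000-token headroom, and the helper names. Nothing here describes how Devin actually manages context.

```python
from pathlib import Path

CONTEXT_WINDOW = 128_000  # tokens; GPT-4 Turbo's advertised context length
CHARS_PER_TOKEN = 4       # rough heuristic for English text, not a real tokenizer


def estimate_repo_tokens(root: str, suffixes: tuple[str, ...] = (".py",)) -> int:
    """Roughly estimate the token count of all matching files under `root`."""
    total_chars = sum(
        len(path.read_text(encoding="utf-8", errors="ignore"))
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix in suffixes
    )
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(token_estimate: int, headroom: int = 8_000) -> bool:
    """Check the estimate, leaving headroom for the prompt, tool output, and reply."""
    return token_estimate + headroom <= CONTEXT_WINDOW


# Worked example from the text: 3,000 lines at ~40 characters per line.
estimate = 3_000 * 40 // CHARS_PER_TOKEN
print(estimate, fits_in_context(estimate))  # 30000 True
```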
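
Neil’s SymPy walkthrough pins the bug on integer division producing a zero before the logarithm is taken. The minimal sketch below reproduces that failure mode with the Python standard library instead of SymPy. The helper names are hypothetical, and this is not SymPy’s actual code.

Floor division `1 // 2` evaluates to `0`, and `math.log(0)` raises `ValueError`. SymPy’s `log(0)` instead returns `zoo` (complex infinity), the output Neil saw. Switching to true division, as in the demo’s fix, gives the expected result.

```python
import math


def log_ratio_buggy(p: int, q: int) -> float:
    """Hypothetical helper: takes log(p/q) but uses floor division."""
    ratio = p // q  # floor division: 1 // 2 == 0, so the fraction collapses to zero
    return math.log(ratio)  # math.log(0) raises ValueError (SymPy's log(0) gives zoo)


def log_ratio_fixed(p: int, q: int) -> float:
    """Same hypothetical helper with the fix from the demo: true division."""
    ratio = p / q  # true division: 1 / 2 == 0.5
    return math.log(ratio)


print(log_ratio_fixed(1, 2))  # about -0.693
try:
    log_ratio_buggy(1, 2)
except ValueError:
    print("buggy version raised ValueError")
```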

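The apples-to-apples complaint about the SWE-bench chart comes down to control flow. A zero-shot baseline gets one attempt per issue. An agent can run the tests, read the failure log, and try again, the same print-debug-and-rerun loop shown in several of the demos.

The toy sketch below contrasts the two. `propose_patch` and `run_tests` are hypothetical stand-ins for a model call and a test runner, and the retry budget of five is an arbitrary assumption. None of this is anything Cognition has described about Devin’s internals.

```python
from typing import Callable

ProposeFn = Callable[[str], str]            # prompt -> candidate patch (hypothetical model call)
TestFn = Callable[[str], tuple[bool, str]]  # patch -> (passed, log) (hypothetical test runner)


def zero_shot(propose_patch: ProposeFn, run_tests: TestFn, issue: str) -> bool:
    """One attempt and no feedback, as the transcript describes the baselines."""
    passed, _ = run_tests(propose_patch(issue))
    return passed


def agent_loop(propose_patch: ProposeFn, run_tests: TestFn, issue: str,
               max_attempts: int = 5) -> bool:
    """Retry, appending the previous failure log to the prompt each time."""
    prompt = issue
    for _ in range(max_attempts):
        passed, log = run_tests(propose_patch(prompt))
        if passed:
            return True
        prompt = f"{issue}\n\nPrevious attempt failed:\n{log}"
    return False


# Toy stand-ins: the "model" only gets it right once it has seen a failure log.
def fake_propose(prompt: str) -> str:
    return "use true division" if "failed" in prompt else "use integer division"


def fake_tests(patch: str) -> tuple[bool, str]:
    return (patch == "use true division", "log(0) returned zoo")


print(zero_shot(fake_propose, fake_tests, "log of a fraction returns zoo"))   # False
print(agent_loop(fake_propose, fake_tests, "log of a fraction returns zoo"))  # True
```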