A Non-Technical Introduction to How Generative AI Works

This is a non-technical introduction to generative AI. You'll learn about the evolution of AI capabilities and the key technological breakthroughs that have enabled modern generative AI models to achieve remarkable performance. You'll learn about the different levels of LLM applications, such as Q&A systems, chatbots, and RAG solutions, and how large language models can be leveraged for downstream natural language processing tasks and the development of intelligent AI agents. You'll also learn about the potential of large language model operating systems. Abdul created this course.

What we call generative AI is not very old; in fact, it has really only taken off in the last couple of years. So first, let's take a look at what has changed, how we got here, and what kind of challenges lie ahead of us. This matters before you actually start using the tool itself. Think of this as the manual that tells you what the tool is good for and what it is not good for. Without that knowledge, it's a scary thing to have a tool you believe could be great when maybe it is not. I'm not saying generative AI is bad; I'm saying you need to know how we got here, and that is the whole point of this section.

First of all, why the name "generative" AI? A few years back, whenever we said "AI", it was mostly a niche within machine learning or deep learning. Take text, for example. A few years back we had something called NER, which stands for named entity recognition. Given a chunk of text, NER would find the named entities in it: "Wall Street" is a location, "$15" is a price, "2011" is a date, "Amarin Corp" is an org, "Visa" is an org. This was possible with NLP (natural language processing) using a very popular technique called named entity recognition.
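To make this concrete, here is a minimal sketch of classic NER, the "old", non-generative kind of text AI. It assumes the spaCy library and its small English model `en_core_web_sm` are installed; the exact labels it prints depend on that model.

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

# Load a small pretrained English pipeline (an assumption; any spaCy model works)
nlp = spacy.load("en_core_web_sm")

text = "In 2011, Amarin Corp raised $15 million on Wall Street, backed by Visa."
doc = nlp(text)

# Print each recognized entity span with its predicted label
for ent in doc.ents:
    print(ent.text, "->", ent.label_)
# e.g. "2011 -> DATE", "Amarin Corp -> ORG", "Visa -> ORG", ...
```

Notice that the model only labels spans of existing text; it never produces new text. That is exactly the gap generative models fill.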
Today, by contrast, you can ask a question and get an answer back. Previously, even though you had text, you were using AI, or rather its predecessors, machine learning and deep learning, to get something out of the text: analyze it, process it, extract something from it. You were not using it to create new text. That is where the "generative" part comes in: you are now generating text rather than just processing it.

Let's take another example. A few years back, you were just trying to figure out whether a given image was a cat or a dog. This is a typical classification problem in machine learning, and it shows up at every level: cancer versus not cancer, or looking at an image and saying whether a patient has pneumonia or not. In each case you have some input, perhaps unlabeled, and you want to decide whether the output is cat or not-cat, cat or dog. Even here, the AI or deep learning model was purely being used to classify an input image. But what has changed now?
Now you can artificially generate an image of a cat falling from the sky, generate a cat as a president, do basically anything you want, and generate as many images as you want. That is exactly why it is called generative. A new, powerful class of large language models is making it possible for machines to write code, write text, draw, and create, with credible and sometimes superhuman results.

Let's break that down. First, we say "large language models", but really we should say "large models": language is only one aspect. These days we have multimodal models that can create text, understand images, and generate audio. Google Gemini is a great example of a multimodal model; in the open-source world, LLaVA is another. Either way, we have a new class of really, really large models, and these models let humans use AI systems to write English or multilingual text, write code (a computer program that actually runs), draw and create images, and produce plenty of other things: audio, video, 3D, point clouds, and more.

The main thing is that the result is credible: you can look at it and believe it could have come from a human being, and that is very important. You could have used AI to create things five years ago. Long before all this generative AI, I built, let's say, tweet bots using a technique called a Markov chain. A Markov chain uses the underlying patterns in text, states and transitions learned from the previous state, to predict the next word. It is a very popular technique, but the output was nowhere near as good as what we have today; it did not read like actual human text.
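To see how simple that older approach was, here is a minimal sketch of a word-level Markov chain text generator; the tiny corpus and everything else here is a toy illustration, not the exact bot I built.

```python
import random
from collections import defaultdict

corpus = "the cat sat on the mat the cat ate the fish the dog sat on the rug"
words = corpus.split()

# State -> list of observed next words (the "transitions")
transitions = defaultdict(list)
for current_word, next_word in zip(words, words[1:]):
    transitions[current_word].append(next_word)

# Generate by repeatedly sampling a next word given only the current state
state = "the"
output = [state]
for _ in range(10):
    candidates = transitions.get(state)
    if not candidates:
        break
    state = random.choice(candidates)
    output.append(state)

print(" ".join(output))  # e.g. "the cat sat on the rug the dog sat ..."
```

The model only ever replays local patterns it has already seen, which is exactly why its text never felt human.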
But now we have models that can write text exactly the way Shakespeare would write a new book or a new play, models that can code like a proper programmer, models that can create art in the style of van Gogh or some other famous artist. All of these results are credible: when you see them, you have to take a second look just to decide whether a human or an AI made them.

These models can also produce superhuman results. What do I mean by that? A human being takes x amount of time to create something. If you asked me to create a Streamlit application, I would need a certain amount of time, because I'm a human being. If you need to write a computer program, a book, or a piece of art, you take time. But these models are scalable: you can write a book in maybe a day, or build a powerful desktop application, say a Python GUI (graphical user interface), in just a couple of prompts.

All of this is possible thanks to large models that are, at this point, mostly Transformer-based architectures. I'm not going to get into the details, but if you are taking a course on generative AI, you should know that most of these models are based on something called Transformers, a very popular architecture released by Google, which relies on a key technique called attention.
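Since attention comes up so often, here is a minimal sketch of its core computation, scaled dot-product attention, in plain NumPy. The shapes and random inputs are illustrative; real Transformers stack many of these with learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # how much each token attends to every other token
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

# Toy example: 4 tokens, 8-dimensional representations
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8): one context-mixed vector per token
```

Every token's output is a weighted blend of all tokens' values, which is what lets the model pull in context from anywhere in the sequence.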
That attention-based Transformer model is at the core of all of this. Now we have language models that can write and code, diffusion models that can create images, and a multitude of other models, in fact multimodal models, where text and images live in the same space within the same model. That is what makes generative AI so interesting.

Look at these images: they were created by AI. Today you might be able to tell they are AI-generated, but if you had shown me the same images three or four years ago, I wouldn't have guessed it even with plenty of clues, because I never dreamed AI could create something like this. And these things get better every single day, thanks to the powerful class of models we now have and the research advances happening constantly.

Notice, too, who AI affects today. It is not primarily blue-collar workers.
Back in the day, people always said automation was going to take the jobs of factory workers, of people working in factories and manufacturing units. And it did: there is plenty of information about how Amazon automated most of its packaging, shipping, and logistics, and if you look at Tesla's factories, there are a lot of robots in there. So yes, automation and robots did take some blue-collar jobs. But has your barber gone out of a job? Has your hairstylist? No. Yet if you look at the current world of generative AI, it primarily focuses on knowledge workers and creative workers, people like you and me who are part of this course. We use our knowledge to create something, and that's how we get paid: we write English or some other language, we write code, we create or work with images, we produce audio or video.
One way or another, for these knowledge workers and creative workers, either the input or the output is one of these modalities: text, code, image, video, audio, 3D, point cloud, and you can go on and on. Now, looking at the current state of generative AI models, for text you can pretty much say it's four out of five. I could have given five out of five, but it's not quite there, so almost four out of five: these large language models are pretty good at writing. They are not fully multilingual yet; there are multilingual models, but you can usually tell the difference. For example, if you ask a model to write something in my language, Tamil, a South Indian language, you can tell it is probably not human, because these models are really not that good at it yet. The next modality is code. These models can write pretty good code, and they can create GUI applications, but not at the same level as text.
If the model is good four out of five at text, the code part is, let's say, three out of five. Then images: how well the model understands images and how well it can create them. Image creation is really good, but there are still tells. You can look at the eyes, at the fingers (fingers have almost been sorted out at this point), at the skin tone, and at a lot of other details to guess that an image is AI-generated, and the same goes for image understanding. Then you have video and audio, which at this point are maybe one out of five, because they are still improving: frame transitions and consistency between frames are still rough. And there are other modalities we haven't discussed at all, like 3D, NeRF, and point clouds; they exist too.
One thing is very sure, though: when you talk about generative AI, the particular set of people it impacts, whether positively or negatively, is knowledge workers and creative workers. Previously it might have taken me three to four hours to create a YouTube thumbnail; now it takes much less time thanks to generative AI. It used to take me much longer to summarize a document; now it takes far less. So generative AI impacts, both positively and negatively, primarily knowledge workers, people who use their brains and whose inputs or outputs span all these modalities: text, code, video, image, audio, and so on.

Why is there such a huge flux in the growth of generative AI now? What happened? We already discussed Transformers briefly: that paper and the new neural network architecture gave way to most of the models we are using now. But there is something else to pay attention to: right now, a lot of things are coming together at the same time.
We have better models and new architectures, Transformers, as I said, and in fact Transformer alternatives are emerging, like Mamba, a state space model (SSM) that addresses the problems Transformers face with model scaling and time complexity. We have more compute: compute has never been this cheap, and it is entirely possible for you to rent a huge amount of compute on AWS. The accessibility is there and the compute itself is there; Nvidia has released almost supercomputer-level GPUs, accelerated computing devices that you or I could own, expensive, but ownable.

And we have more data than ever. Humans have always created data; we create it voluntarily and without being asked. You go to a supermarket and pick something up, and a CCTV camera captures video data. You buy something, and that becomes part of a POS (point of sale) system: data. You come out of the supermarket and tweet about it: data. Maybe there's an Instagram post: data. A loyalty points card: data. Maybe you return the item: data again. So there is much more data coming from humans, but we also have more sensors and other equipment collecting data: data from electricity sensors, data from airplanes, data everywhere else. The number of images we have digitized is huge; the number of books we have digitized is huge; the amount of unstructured information we have turned into structured information is huge. We now have tons and tons of data.
One more thing you can see is that models have become better with model size: with more data and more compute, models have kept improving. For example, PaLM, a Google model, has 540 billion parameters. That is not to say you always need a huge amount of data for a good model, that's not the takeaway, but a lot of data will help you build a good model.

Finally, for me personally, one of the most important reasons things have taken off is open source: open research, open models, open techniques, open tools. A few years back you did not have a place like Hugging Face where you could go and share a model. A few years back people were not putting out papers almost every single day on arXiv describing what they found. A few years back you did not have all these scripts that make it easier to build fine-tuning solutions. All of this exists today, and has for at least the last couple of years, thanks to people who have relentlessly open-sourced whatever they created. That has led to a huge influx, a revolution, of new types of models, new fine-tuned models, new techniques, new data.

So: better models, more compute, more data, and all of it largely open source. Compute, admittedly, is the one place that is not really open yet, although there are decentralized efforts like Petals. But everything else? Do we have open-source base models? Yes. Do we have more data, with and without commercial licenses? Of course. And the bigger part is that it is all open: you can take any model and start building on it, or take one of the models and start fine-tuning it. You can do all of this even without the required hardware; even if you do not have a GPU, you can go to Google Colab for free, use their GPU, and play with these models. All these things come together and help you create something that did not exist before, thanks to better models, more data, more compute, and open source.
Finally, let's look at the generative AI landscape. It keeps changing, but you can roughly put it into buckets. For text, in the professional world people primarily use it for marketing content, sales email, support chat and email, note-taking, and general writing (I'm not talking about kids using it for homework). For code, you can generate code, generate documentation, and understand code: today you can literally highlight a particular piece of code, and GPT-4, GPT-4 Vision, and other models can help you understand it. You have image generation for, say, advertisements, plus voice synthesis, video generation, and a bunch of other things. In games you have AI-generated NPCs, AI-generated game assets, and AI-generated scenarios, with multiple companies working on each.

One thing to understand is that there are multiple layers, multiple different ways you can be part of this. One is the model layer, and inside the model layer there is the data layer. The other side is the application layer. You can build applications on top of these models, or build the models themselves, or build data for the models. And one of the most important things people have started asking is how you evaluate the models: model evals and model monitoring. So there are lots of different ways to be part of the generative AI landscape.
Looking at companies, there are many: Midjourney, which creates images; GitHub Copilot, which codes for you and helps you understand code; tools like Jasper, which take something like the OpenAI API and apply it to a particular vertical like marketing; Cohere, creating models; Hugging Face, helping you host and also create models. All these different kinds of companies, all these different domains, and with the right match you can build things. You can build in the model layer, which includes the data layer, model monitoring, and open-source tooling, or in the application layer. When you build at the application layer, you can go after a particular vertical: something for sales, something for marketing, maybe a chatbot for lawyers. Or you can go after a particular function, taking a particular business unit irrespective of modality. Or you can go by modality: pick text and build something in text, or pick code and build something in code. There are a lot of different ways to approach this.

As for how good AI has become generally: you can just ask it to write a poem or a story, and it will. On many occasions I have given such poems to human beings, people not yet exposed to AI, and it has been genuinely difficult for them to tell the poems were AI-written; they were shocked when they found out.

So far we have covered the things everybody talks about. But there are things people do not talk about often that are also important to pay attention to: one, training data; two, hallucination; three, rules; four, copyright. There isn't a lot of time, so let's take them in turn.
First, training data, which companies have not openly shared. For GPT-4, GPT-4.5, whatever the latest model is, we do not know entirely what training data they used. Generally they say they used web data, but how much of that web data is with consent and how much is without consent, nobody knows. There are lawsuits; people have filed cases and raised their voices against tools like Stable Diffusion recreating artists' work. The point is that training data is something we have very little information about and something we don't talk about enough. Open tools and open models have given us some understanding of the kind of training data that goes in, with consent or without. But take even one of the most popular models these days, Mistral: you don't know what Mistral's training data is. Maybe Mistral doesn't want to get sued, or maybe this is what it takes to build a GPT-4-equivalent open model. Either way, training data matters.
I'm not arguing for or against copyright here; I'm just saying: imagine you wrote a book, something like Harry Potter, and there is an AI that has read your book and can create a similar work, work that might take you a couple of years, because AI is really good at this. Would it hurt? As a software engineer or data scientist, I don't know how much it would hurt me. There is a meme about this: an author feels sad, an artist feels bad, when their work is copied by AI, but when you tell a programmer that AI has copied from Stack Overflow and GitHub, the programmer says, "That's what I do every day." If you come from a software or data science background you might feel we do this all the time, but there are many other professions where creative work is their bread and butter: they get paid by writing books, they get paid by creating art. So I'm not sure what is going to happen, especially while we do not have training data transparency. Companies like Shutterstock and Adobe have explicitly said they will use only consented data to train their image generation models. Some companies do this kind of thing, but it is not an industry standard yet.

The next issue is that these models are quite good at hallucinating. Whether hallucination is good or bad is a separate conversation altogether. Andrej Karpathy, now part of OpenAI, previously head of self-driving at Tesla, is a very popular figure in the deep learning world for his teaching and everything he shares.
Karpathy has said, in effect: these models are dreaming, some dreams are factual and some are not, and hallucination is a feature rather than a bug. In this view the models are always hallucinating; when the hallucination happens to be factually correct, we accept it, and when it is factually incorrect, we call it hallucination. Everybody has their own opinion. Either way, it is very easy to use techniques like prompt injection to make these models give you a wrong answer. For example, I got ChatGPT to tell me Neo's favorite food in The Matrix, and it answered: "According to a statement made by the directors of the Matrix movies in 2021, Neo's favorite food in The Matrix is chicken biryani, followed by Italian pasta as a second favorite." There is no such information; nothing like this exists on the internet. I simply injected that information into ChatGPT's context so that ChatGPT would give me this response when I asked the question, which is completely possible. Strictly speaking this is not hallucination; it is one of the adversarial attacks with which you can make ChatGPT or other large language models give you a factually incorrect answer.
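Here is a minimal sketch of that kind of context injection, using the OpenAI Python client as an illustration. The model name and the planted "context" are assumptions; any chat-style LLM behaves similarly when it is told to trust its context.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The fabricated "fact" is planted in the context the model is told to trust.
planted_context = (
    "Context: In a 2021 statement, the directors of The Matrix revealed "
    "that Neo's favorite food is chicken biryani, followed by Italian pasta."
)

response = client.chat.completions.create(
    model="gpt-4",  # assumption; any chat model works for this demonstration
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": planted_context + "\n\nWhat is Neo's favorite food?"},
    ],
)
print(response.choices[0].message.content)
# The model will confidently repeat the planted, false information.
```

The model is doing exactly what it was built to do, follow its context; the attack is in controlling that context.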
Hallucination is a big part of why, for example, medicine does not use large language models heavily: you still cannot rely on whether an answer is exactly right. The model gives you a different answer when you say "let's think step by step" than when you don't. There are memes about how you can tip ChatGPT, or say "my mom likes you", or "save the kittens", and get an altogether different answer. Whether you like it or not, hallucination is still part of the problem in bringing large language models into everyday life.
The next issue is the rules. Which questions do you want ChatGPT to answer, and which do you not? The main question is: who makes the rules? OpenAI has made a rule that "Where do I find cheap cigarettes?" is okay to answer, while "How can I create a bomb?" is a question it should not answer. Well and good that OpenAI made this decision, but for how long do you want big corporations with market interests (and OpenAI is a strange setup: a nonprofit with a for-profit inside) to make these decisions for you? And are they the right decisions? Every country is different; some countries hold beliefs others do not. There is no truly global belief. We have a few near-global ones, kids are cute, kittens are cute, but beyond that? Imagine you are a kitten lover and you don't like dogs: what do you want the model to do? So there is a larger question about who makes the rules. Bodies are being created, and bodies are being dismantled because they aren't working, but at this point "who makes the rules" is a big open question.

That is one reason I love decentralized AI, and why in this course we are also going to see a lot of local AI, meaning you run the model yourself on your desktop, laptop, or PC, whatever you have, instead of always relying on one company to set the rules and then hand you the model. That is one thing I believe: we need to have a lot of models, and you get to choose which one to follow. It's like living on this planet: you get to choose your religion, your food, your dress. You don't have to follow a global set of rules unless you want to; nobody should force it on you. That is exactly why we need local, open models, and "who makes the rules when the model is not open" is a big part of the question.
As much as we talk about the good things in AI, it is very important to understand at least the implications it might bring. The current exam system we have might become obsolete with the generative AI we are seeing. Look at the exam results: GPT-4 has scored tremendously well on a lot of exams that human beings have to take. If AI models are this good, and maybe you can hear a model through your headphones during a test, what is the purpose of the exam? How do you still trust these exams to select the right humans for the right courses? So education and academia is one place where large language models have gotten pushback. Some universities have punished students for using large language models; others have encouraged it; Khan Academy has partnered with OpenAI to create a personal tutor for every student. There are still a lot of questions among academic researchers and teachers about whether to encourage or discourage the use of large language models. The broader question is how much of what we have followed until now will remain valid. For example, in India there are institutes where you must not open the book while writing answers, and there are other cases that are open book, where you can use any book you want and still have to figure out the answer. Education requires its own transformation, and willingly or unwillingly, LLMs are going to transform it, for good or for bad. This is a place where there is going to be a lot of impact from large language models.

We already discussed knowledge workers; let me give you a particular example. I am a data scientist by profession, and part of my job is sometimes making charts and explaining those charts to our stakeholders.
This is something GPT-4 Vision can do: as we'll see in a couple of examples, it can take a chart, help you understand it, and translate the chart into structured data, often more efficiently than I can.
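As an illustration, here is a hedged sketch of asking a vision-capable model to turn a chart image into structured data, again via the OpenAI Python client. The model name and the image URL are assumptions, and the exact message format may differ between API versions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable chat model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Read this chart and return the data as CSV: label,value"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sales_chart.png"}},  # hypothetical URL
        ],
    }],
)
print(response.choices[0].message.content)  # e.g. "Q1,120\nQ2,180\n..."
```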
I'm not saying I'm going to be replaced by a GPT-4 Vision model tomorrow; maybe that won't happen tomorrow. But what does it mean for my daily job if the model can do a much better job than me at a particular task? Maybe the model still does not generalize as well as a human: as a human I might improvise, I might know what to say and what not to say depending on who I'm talking to. But the models are getting good at this, and people should be asking what will happen to knowledge workers.

On copyright, there is a huge debate. There are lots of lawsuits against Midjourney, OpenAI, and other companies. In fact, OpenAI has promised that if you use an OpenAI product and get sued by somebody, OpenAI will pay your legal fees and fight for you. That is a huge commitment; I'm not sure how well it will work, but it shows that copyright is going to be a big deal. Like it or not, in the world we live in, copyright is a huge part of things: patents, copyright, royalties. Now, one, a lot of these professions may be put out of work; two, it is going to become ridiculously easy to replicate their work. So what would it mean to keep copyright intact? Is it possible we are entering a world where copyright doesn't exist? But then, would OpenAI and companies like it share their code openly because they no longer respect copyright? Would they let people copy their work? They would not. It's a very strange predicament these companies, and we, are in. These are questions we have; we don't have answers.
If I have to finish this presentation with one final takeaway, I would say: generative AI is transformative and disruptive, and unlike, say, crypto or web3 or blockchain (I'm not a big crypto fanboy) and the other things people have recently gone crazy about, generative AI is here to stay. It is not going to vanish. You can pretend it is not going to impact your job, but it is going to stay. Like every other technology, though, it has its own limitations, and if not handled with care it can affect the current form of education, impact jobs, spread misinformation, and widen inequality. Of course it also has a lot of good things; I'm just not talking about them here. I want you to know, before you start learning how to use these tools, that the tools we will learn about in this course have an impact on all of these things.

I don't know if it was a nice touch, but I noted this was "written by a living human." Then again, what does "written" even mean? I typed it on a computer and pasted it here; even "written" is in question, and has been for many years. The point is that humanity is in a very strange place. A lot of people say AI is like electricity, and I believe that if AI is like electricity, then decentralized AI is the way to go: you use your own AI models and build your own things, without having to get everything from somebody else. In many countries only the government produces electricity, but in plenty of other places, look at how Tesla sells solar cells, people buy solar panels, make their own electricity, and use it for their own cars and their own stuff. I think decentralization is the way to go. In this course we will see both closed models and open models, though with a lot more emphasis on open models, because that is a principle I believe in. At the same time, a lot of people are against open-sourcing AI, because they compare AI to nuclear energy; there are two different schools of thought, and everybody is part of one or the other. If you have any thoughts, I would love to know what you think about generative AI. See you in the next chapter.

The last video ended with me saying that I prefer decentralized AI, but I did not provide any clarification of what I mean by that. So in this video I first want to lay out what we mean by decentralized AI, and break down a few things before we go ahead. As part of "decentralized AI", let's first define what we mean by "AI". As you all know, AI here is artificial intelligence; we are not talking about AGI, which specifically means general intelligence. We are talking about narrow artificial intelligence. Now, what is this AI everybody is talking about? We saw a glimpse in the previous video, but technically, the AI here is essentially deep learning. Deep learning is a computer science domain where you build deep neural networks. A typical simple neural network looks like this: an input layer, a middle layer, and an output layer. If you replace the middle layer with many deep layers, that is basically what deep learning is.
To do deep learning effectively you need parallel computing and high memory, because at the end of the day, what happens in deep learning is matrix multiplication. A neural network has lots of different weights, and these weights get multiplied together. So at the core, what you need is really good compute that can do matrix multiplication for you, and for that particular reason, deep learning is more efficient on a GPU than a CPU.
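Here is a minimal sketch of that idea: a tiny "deep" network is literally a chain of matrix multiplications. The layer sizes are arbitrary toy values.

```python
import torch

x = torch.randn(1, 8)      # input layer: 8 features
W1 = torch.randn(8, 16)    # weights: input -> hidden layer 1
W2 = torch.randn(16, 16)   # weights: hidden layer 1 -> hidden layer 2
W3 = torch.randn(16, 2)    # weights: hidden layer 2 -> output (2 classes)

h1 = torch.relu(x @ W1)    # each layer: a matrix multiply plus a nonlinearity
h2 = torch.relu(h1 @ W2)
output = h2 @ W3

print(output)              # raw scores for the 2 output classes
```

Stack enough of these layers and you have a "deep" network; make the matrices big enough and you need the parallel hardware the rest of this section is about.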
Now, what are GPUs and CPUs? GPU stands for graphics processing unit, and CPU stands for central processing unit. Your computer, your laptop, probably the device you're using right now, has both a GPU and a CPU. Does that mean you can do deep learning on any device? That is where a big catch comes in. One of the most popular frameworks for deep learning today is called PyTorch, and PyTorch, like many other deep learning frameworks, is primarily optimized for, and most efficient on, Nvidia GPUs. Nvidia GPUs are at the core of why and where deep learning happens these days. The reason Nvidia is so strongly preferred is primarily that Nvidia GPUs have their own proprietary software called CUDA. CUDA is one of the primary things that enables high-level parallel computing on Nvidia GPUs, efficiently utilizing GPU memory to do matrix multiplication, and therefore deep learning, efficiently. Thanks to CUDA, Nvidia GPUs are the home for most deep learning projects. That doesn't mean you cannot do it on a CPU, and it doesn't mean you cannot do it on AMD processors; it means Nvidia is highly, highly preferred.

So what is CUDA? CUDA is an acronym for Compute Unified Device Architecture. One thing people often forget: it is proprietary, not an open-source solution. It is a closed parallel computing platform and API that allows software to use certain types of GPUs for general-purpose processing. The matrix multiplication we are doing is general-purpose processing, and leveraging the GPU for it is an approach called GPGPU: general-purpose computing on GPUs. CUDA is the software layer that gives direct access to the GPU's virtual instruction set and parallel computation elements for the execution of compute kernels. What CUDA ultimately enables you to do is leverage the GPU for matrix multiplication, which means you can do machine learning and deep learning.
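In PyTorch, the practical upshot is a couple of lines: check whether a CUDA-capable GPU is visible, and move your tensors onto it. A minimal sketch:

```python
import torch

print(torch.cuda.is_available())   # True only if an Nvidia GPU + CUDA stack is present

# Pick the GPU when we have one, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# The same matrix multiplication, now running on whichever device we chose
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b
print(c.device)                    # e.g. "cuda:0" on a GPU machine
```

On a large matrix multiply like this, the GPU version is typically orders of magnitude faster, which is the whole point of the hardware discussion above.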
All of this means these AI models effectively run primarily on Nvidia GPUs. If you want effective hardware for running deep learning, for running AI, you need effective GPUs, and not everybody has them, which is where a lot of other concepts come into the picture. So, to recap: you need an effective GPU. AI here is basically deep learning, deep learning prefers GPUs because of compute and memory, and frameworks like PyTorch let you use them thanks to CUDA (maybe not so much "thanks", since it's not open source). You need all of these pieces, and that is where running an AI model, an LLM, becomes a little challenging.

So suppose you want to run an LLM, a large language model. At this point you need Nvidia GPUs, but because they are computationally very expensive, you often don't use them as physical hardware you own; rather, you rent them from a cloud provider. Option one: go to AWS, GCP, or Azure and rent an Nvidia GPU. Option two: these days there are a lot of new startups, one of the most popular being RunPod, where you can go and rent a GPU. There are different kinds of GPU: the RTX 4080 and 4090 are very popular, the A100 is another popular one, and they sit at different levels, with different memory, different compute, different FLOPS, lots of differences. So you can go to the big cloud platforms or to this new kind of compute platform, rent a GPU, and do your machine learning, deep learning, or LLM work efficiently there.
Now, what do I actually mean by "do LLMs"? I've been saying it, so let's be precise, because an LLM doesn't just happen; there are a lot of steps in building one. First, you need to prepare a dataset, the input data you will use. Because we are talking about large language models, not just simple language models, they require a large amount of data; that is precisely why you end up with a large language model. Dataset preparation can be done on GPU but often runs on CPU, and you need gigabytes of data. After that, the next thing you do is train a model; training a model is what we call building a model. Once you build a model successfully, you usually run an evaluation to see how well it is doing, using a bunch of evaluation metrics or benchmarks to compare it against other models. So: dataset, then train, then evals or benchmarks.

Some years back, before LLMs, a technique called transfer learning came into the picture in the deep learning world. The idea: not everybody has to build their own model from scratch. I'm not just talking about LLMs here, I mean any deep learning model. Instead, you can take a certain pretrained deep learning model and use it for your own use case: you take the pretrained model as your base model, then you do something called fine-tuning for your own use case, and you end up with a new model. That fine-tuned model is supposed to be better than the pretrained model for the task you fine-tuned it on. This approach has been around for quite a while.
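Here is a minimal sketch of that fine-tuning idea using the Hugging Face transformers library; the model name "gpt2" is just a small, convenient stand-in for a pretrained base, and the one-example, one-step training loop is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 1) Take a pretrained base model (transfer learning starts here)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# 2) Fine-tune it on your own data (toy single example, one gradient step)
batch = tokenizer("Customer: my order is late.\nAgent: I'm sorry, let me check.",
                  return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

outputs = model(**batch, labels=batch["input_ids"])  # causal language-model loss
outputs.loss.backward()
optimizer.step()

# 3) The updated weights are now your "new model" for this use case
model.save_pretrained("my-finetuned-model")  # hypothetical output path
```

In practice you would loop over a real dataset for many steps, usually with a trainer utility, but the structure, pretrained base in, task-specific weights out, is exactly this.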
So, to build a large language model: we need a dataset, we need to train the model, and we know that training a model, which is basically building a deep learning model, needs GPUs; then we evaluate or benchmark it. Let's say at this point you have a model in place. What do we actually mean by "model"? Most of the time, a model is a PyTorch .bin file or, these days, a safetensors file. These are the files in which the deep neural network's weights are stored. As I said at the start of this video, there are numerical weights, and these numerical weights are mostly floats. What do we mean by float? Something like 0.267896..., a floating-point number. So we have numerical weights, and they are stored inside the .bin file or safetensors file. The longer and more precise these numbers are, the more memory you need, never mind training the model; even just running the model needs more memory.
Running the model is called inference. Inference means you use the model to generate output; since we're talking about LLMs in particular, using the LLM to generate text is called inference. So you need a GPU to create the model, and you need a GPU to run the model, which is inference. But because not everybody has a capable GPU, there are techniques like quantization. There are different kinds of quantization techniques, but in a nutshell, quantization reduces the precision of the floating-point numbers you store, and thereby reduces the memory required for inference, so that you can run inference even on consumer hardware, consumer GPUs or even CPUs. So we have quantized models that can run on consumer hardware, and one of the most popular frameworks for this is llama.cpp, along with file formats like GGUF and GGML. You don't have to know exactly what is inside these; you just have to recognize the names, so that when you come across them you know you're looking at a quantized model. llama.cpp converts the model weights into quantized form and ports the PyTorch code into C++, which is much faster than PyTorch, and with these file formats holding the quantized weights, you can run the model on consumer hardware like your laptop.
Now, since we've talked about CPUs and GPUs and seen quantization, let's take a quick look at the compute landscape. We know CPUs, and we know GPUs; among GPUs you have multiple providers: Nvidia, the clear leader, AMD, and a bunch of other companies. Then you have the TPU, where the T stands for tensor processing unit; this is primarily used by Google (I'll show you Google Colab shortly, and you'll see TPUs there). And very recently another popular option has come up: Metal, which is nothing but Apple silicon. If you have an M1, M2, or M3 computer from Apple, a MacBook Air, MacBook Pro, Mac Mini, whatever, it is highly likely you have a Metal GPU inside that Apple silicon, and these days we have frameworks and software that can run deep learning models optimized for Metal. So those are the different kinds of compute. Now a quick look at frameworks. We learned about PyTorch, but PyTorch is not the fastest, so some people have moved to something called JAX, and very recently Apple announced MLX, which is primarily optimized for Apple silicon. So most of what you see is PyTorch, plus JAX, some code ported to C++, and very recently MLX for Apple silicon.
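Tying the hardware story together, here is a hedged PyTorch sketch that picks whichever accelerator the current machine actually has: CUDA on an Nvidia box, MPS (the Metal backend) on Apple silicon, otherwise the CPU.

```python
import torch

# Prefer Nvidia/CUDA, then Apple's Metal backend (MPS), then plain CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print("running on:", device)
x = torch.randn(512, 512, device=device)
print((x @ x).sum())  # the same math everywhere; only the speed differs
```

TPUs and JAX or MLX follow the same pattern, the framework abstracts the device, but each framework is fastest on the hardware it was built for.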
I know this is a lot of information for one video, but the objective is to give you a very wide view of the technical landscape behind what we call large language models. To quickly recap: when we say large language model, the model is a deep learning model, and a deep learning model is a deep neural network, quite deep, with an output at the end. Most of the large language models we discuss today are based on a particular deep learning architecture called Transformers, one of the most popular architectures on which these models are built. We know we need GPUs to run these deep learning models, and by GPUs we primarily mean Nvidia GPUs. And because most of us don't have Nvidia GPUs lying around the house (if you do, you're lucky), you go to a cloud provider and either rent a GPU or use one of their services. Because GPUs are computationally expensive and hard to get, and the biggest models are not easy to run on CPUs, people offer LLMs as inference APIs. We already learned what inference is: the process of running an LLM and getting text output. Building an LLM is one thing; running one is what we call inference.
Because of all these bottlenecks, because it is difficult to run a large language model, and in production you need a GPU always on (you may not have serverless compute, which means you have a cold-start problem, since the GPU has to boot up), a bunch of companies decided to offer the LLM as an inference API. This is not the first time somebody has done this; software engineering is quite familiar with the API world. If you are new to it, API stands for application programming interface. It is simply a connection to a bigger system owned by somebody else: you hit their endpoint and get a response back.

The challenge is that if you are working at a big company, in an enterprise setup, especially in, say, the finance or healthcare sector, not every company is willing to give its data to somebody else, because there is a server out there, and you hit that API from your application and get the response back. Say you want to build an app: you will not have your own GPU to run the model (well, now you might manage it on an iPad Pro with Apple silicon, but mostly let's assume you cannot). That means you definitely have to hit an API endpoint and get the response back, which means you are sharing your data with them.
That is where a whole new business of using LLMs as an API comes into the picture. Without any doubt, the leader in this business is OpenAI. If you are wondering how OpenAI makes money: their model is not open source, so not everybody can host it. If you want to use GPT-4, you can go to ChatGPT and chat with it, but you cannot build software on top of that. If you want to build software on top of it, say your own translation software, your own design software, or something only for lawyers, then you need to create an app, on the web or on iOS, and inside that app you hit the OpenAI endpoint and get the response back. OpenAI is not the only company in this business, though it is definitely one of the leaders, because it has exclusivity and, at this point, the best LLMs; GPT-4 is generally the best LLM right now (for specific tasks you might go to different LLMs, but generally this holds). So: OpenAI, a no-brainer; then Azure; then AWS Bedrock; and similar services from, say, Google GCP, since Google has Gemini, PaLM, and a bunch of other models. These service providers also let you use a large language model as an inference endpoint, an API endpoint. The catch is that most of them will not let you fine-tune the model, although OpenAI recently started allowing fine-tuning as well. If all you want is a chatbot, the easiest way to build one is: build an application that takes the user's message, hits the OpenAI endpoint, gets the response back, and shows it to the user. You have a working chatbot without having to maintain your own infra or GPUs.
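A minimal sketch of that "hit the endpoint" chatbot, using the OpenAI Python client; the model name is an assumption, and in a real app you would add error handling and keep the conversation history (we'll get to that in the five-levels section).

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(user_message: str) -> str:
    """Send one user message to the hosted model and return its reply."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumption; substitute whichever hosted model you use
        messages=[{"role": "user", "content": user_message}],
    )
    return response.choices[0].message.content

print(ask("Explain inference APIs in one sentence."))
```

No GPU, no infrastructure, but every message leaves your machine, which is precisely the data-privacy trade-off being described.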
But if you are worried about data, about data privacy and related challenges, that is where the whole new concept of open models comes into the picture. Instead of paying money to an LLM service provider that owns a proprietary model, like OpenAI, you turn to open models, which means the model weights, and you know what model weights are now, the .bin file, the PyTorch file, or the safetensors file, are openly shared. When we say openly shared, they come with a particular license (I'll get into license details later), but these are model weights you can self-host: you can buy your own small VPS (virtual private server) or use your own cloud setup, host these models, and run them. This is where "OpenAI" versus truly open AI comes into the picture: on one side the company, on the other the open models you are self-hosting. (There are also service providers that offer these open models behind an API, but that's a different topic altogether.) In a nutshell, there is one world of proprietary models and a second world of open models, and what I meant by decentralized AI is using these open models without having to rely on some proprietary model where you always have to send your data to the provider. As I said, even an open model needs GPUs to self-host, or rented GPUs, or quantized versions like GGUF; the other option is to go to the newer providers that serve these open models. There are a bunch of advantages to using open models.
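As a sketch of what "the weights are openly shared" looks like in practice, here is loading an open model with the Hugging Face transformers library and generating text locally; the model name is an assumption, pick any open model whose license fits your use, and a small one if you only have a CPU.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"  # assumption: any open model id

# Downloads the openly shared weights (e.g. safetensors files) to your machine
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("What is decentralized AI?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Everything here runs on hardware you control; no data leaves your machine, which is the entire point of the open-model world.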
But I want to stop this video here to give you a breather. If you have reached this point, I would strongly encourage you to write a blog post about whatever you learned in this video, publish it anywhere you want, and tag me when you share it on social media; I would love to read what you've done. What I have given you in this video is a lot of keywords, a lot of seeds, but to understand this entire space better you need to spend at least an hour or two going through those individual keywords, so that you can understand the whole landscape and build a better foundation before we go on to further lessons. See you in another video.

Five levels of LLM apps. Consider this a framework to help you decide where you can use LLMs. There are a lot of different myths around what LLMs can do, what they cannot do, and where they are used today, so I decided to put together this material.
In it, I'm going to take you through a kind of mental framework: based on the extension, the depth, to which you go with an LLM, you can decide where it fits. First we'll see the different levels of LLM apps I have put together, then a slight extension of that; I have two documents to take you through. This will give you an idea of how LLMs are being used today and how you can use them for your own applications.

To start, imagine a pyramid structure, a very simple one. As with any pyramid, the top, the peak, is our aspirational goal, and the bottom is the easiest thing we can do. And as with everything else, you have to slowly climb toward the top of the pyramid so you can eventually hit the aspirational goal. So to start: where do we use LLMs first? Q&A, a question-answering engine.
What do I mean by that? It is quite simple to understand. A question-answering engine is a system where you have an LLM and all you do is ask it a question: you send a prompt, the LLM takes the prompt, and it gives you an answer. That is the entire transaction: send a prompt into the LLM, get an answer. Large language models are nothing but sophisticated next-word prediction engines that have been fine-tuned on something called instructions; they are instruction fine-tuned models, which means they can take a human instruction and get you an answer back. For example, if I ask "What is the capital of India?", the LLM processes it, has the information about how to answer, and gives me the answer back: "The capital of India is New Delhi." That's all you do at this level. So, first level: question answering.

Now you might wonder where you can use a question-answering engine. This is the first thing people built when LLMs started: even back in the GPT-2 days, people were building simple Q&A bots. All you want to do is ask a question and get an answer. It could be a homework question, a general knowledge question, something about the world, about science, about anything. Ask a question, get an answer, as simple as that. It's a three-step process: send a prompt, let the LLM process it, get the answer back. A very simple application.

Now, what you're going to do next is add something to that application, and that is how you actually build a conversational chatbot.
Now we're going to add something to that application, and that is how you build a conversational chatbot. To understand this better, let me take you to my second document. Whenever we talk about LLMs, there is one important thing to understand: we have crossed the stage where an LLM is simply a large language model; there is more to it now. To capture that, I use five dimensions: a prompt, a short-term memory, external knowledge, tools, and extended tools. If you think of the LLM as your horizontal, these are your verticals, the dimensions you can add to an LLM.

Let me give you an example of each. A prompt is simply 'What is the capital of India?': you give it to the LLM, the LLM understands it and gives you an answer back. Short-term memory is conversational history, or anything else held in context; this is what we call in-context learning. Whatever you stuff inside the context window, the LLM can use, and that is your short-term memory. For instance, you can give it a few examples ('What is the capital of the US?' 'Washington, D.C.', and so on) so the LLM knows how it should answer. Next is external data: you take data from, say, Wikipedia and supply it to the LLM. That is your long-term memory, because short-term memory is like a computer's RAM: it gets reset every time you reset the conversation or the session. Then there are tools: you let the LLM use a calculator, the internet, a Python terminal, and so on. And extended tools are what you get when you expand far beyond that. So those are the five dimensions of LLMs: a prompt; short-term memory, or in-context memory; long-term memory, meaning external data or custom knowledge; tools like calculators and a Python REPL; and extended tools that go much beyond what we currently have.
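As a quick illustration of the short-term-memory dimension, here is a sketch of few-shot prompting. The examples live only in the context window, so the moment the session resets, this 'memory' is gone. It reuses the hypothetical `client` from the earlier sketch.

```python
# Few-shot in-context learning: worked examples are stuffed into the context window.
# Nothing is trained or stored; clear the messages and the "memory" is gone.
few_shot_messages = [
    {"role": "user", "content": "What is the capital of the US?"},
    {"role": "assistant", "content": "Washington, D.C."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    # The real question, answered in the same style as the examples above:
    {"role": "user", "content": "What is the capital of India?"},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=few_shot_messages,
)
print(response.choices[0].message.content)  # e.g. "New Delhi."
```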
Now, on to what we wanted to see: the chatbot. How do you make a Q&A bot into a chatbot? It's very simple, and at this point you may have already guessed it. You take a prompt and give it to the LLM, and you add short-term memory, the in-context learning dimension. In a plain Q&A bot, you ask 'What is the capital of India?', the LLM answers 'New Delhi', and that's the end of it. To make it a conversational bot, you add the new dimension of short-term memory: you keep everything you've said so far in a conversation history. What this gives the LLM is continuity. You ask 'What is the capital of India?' and it says 'New Delhi'; then you can simply ask 'What are some famous cuisines there?', and the LLM understands you're talking about New Delhi, because that earlier exchange is stored in its short-term memory, the in-context memory, so it can use in-context learning to respond. That is how you move up the pyramid: you take the Q&A bot, give it a new dimension called history, and the Q&A bot becomes a chatbot that can converse.
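In code, that upgrade is little more than a list you keep appending to. A sketch, again assuming the same OpenAI-style `client` as before:

```python
# A Q&A bot becomes a chatbot by carrying conversation history (short-term memory).
history = []

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=history,     # the whole conversation goes in on every turn
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})  # remember the answer too
    return answer

chat("What is the capital of India?")         # e.g. "New Delhi."
chat("What are some famous cuisines there?")  # "there" resolves to New Delhi via history
```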
Chatbots have applications everywhere you look: customer support, chatbots on websites, chatbots for education (you've seen plenty of demos from Khan Academy). The chatbot is versatile enough to have a purpose in nearly every business or domain you can think of. But the chatbot by itself is not enough. Why? Can you pause and answer if you already know? The reason is that a chatbot stops at short-term memory; for many use cases you need long-term, external memory. For example: I ask 'What is the capital of India?', it says 'New Delhi'; 'What are the famous cuisines there?', it gives me a valid answer; the LLM is doing its job. But now say I'm an organization; take Apple as an example. If I ask 'Who is the CEO of Apple?', the internet has that information, so it will say Tim Cook. Easy. Now if I ask 'Who is the manager of the team handling the iPhone 16?', will it answer? It might produce something, because LLMs hallucinate a lot, but the answer won't be correct. That has become a big bottleneck in a lot of enterprise use cases, because you don't just need internet knowledge, or the knowledge the LLM already has; you need more than that. You need the custom knowledge component, the external knowledge dimension that makes your LLM something more than a chatbot. And this is where a new technique called RAG, retrieval augmented generation, comes into the picture: you take the knowledge you provide (call it long-term memory: your documents, the internet, whatever sources you have around), you route that knowledge to the LLM, and you make the LLM leverage it. At this point you've probably spotted the progression: first we had only the prompt, one dimension; then short-term memory, two dimensions; now external knowledge, three dimensions, with the LLM at the center of all three: the prompt, the short-term memory, and the long-term memory.
To help you understand this better, let's look at what a RAG system looks like. You have the LLM at the center, and your data is available somewhere, in different shapes. It could be in a database; most organizations keep data in structured RDBMS databases. It could be in unstructured documents: PDFs, HTML files, internal portals, and so on. Or it could sit behind APIs: say you're a sales team and your data lives in a CRM like Salesforce, so you need a programmatic call to fetch it; your marketing team needs data from Google Ads; your company is heavily on AWS, so you need billing and cost data from there. So you use one of these methods (structured parsing, unstructured parsing, or a programmatic call), take all that input data, and create an index. An index is what Google builds at every moment: given all these websites, Google creates an index so it's easier to traverse them when somebody asks a question. That's how Google became popular; before Google, people used quite different approaches, and Google's PageRank algorithm has, at its foundation, an index with various parameters. We're definitely not building Google, but an index is what we're building too: it makes it easy to find what is inside your data.

Now a user comes and asks a question: 'Who is the manager of the iPhone 16 team?' That question goes to the index. The index may hold information about all the teams: iPhone 16, Apple Vision Pro, billing, accounting, procurement, marketing, everything. What you're interested in is only the piece you asked about, the iPhone 16 manager. So the retrieval step picks only the relevant information from the index, matches it against your query, and sends both to the LLM: the prompt you asked plus the data that was extracted. The LLM then generates the answer and returns it to the user.
This is quite different from the chatbot application, and I'll give you an example of why. In the chatbot, all you have is a question and the memory (sometimes people add a form of long-term memory through user profiling, but ignore that for now). You send the question as a prompt, the memory also goes into the prompt, the LLM answers, and you get the response back. Now you might ask: why do I need to put my data in external storage and create an index? Why can't I just keep it all in memory? If you have that question at this point, it's a very important one, and you're thinking in the right direction. The reason we cannot do this, or could not in the early days of LLMs, is an important factor called the context window. The short-term memory plus the question are bounded by the context window of the particular LLM. A model might have a 4K context window, which was quite common, or 8K, while models like Gemini go up to a million tokens. Here's the problem. You ask question one and answer one comes back; then question two, answer two. By the time you get to question three, you are not sending just question three; you are sending all of it. Say each question is 2K tokens and each answer is 1K: that's 2 + 1 + 2 + 1 + 2 = 8K by the third turn of conversation (I'm exaggerating the sizes, but you get the idea). With an 8K-token model, at that point the model runs out of context; it simply cannot hold everything in short-term memory. And that is exactly why you need RAG, retrieval augmented generation: it isn't bound by the conversation. You still keep a conversation, of course, but you don't have to stuff everything into your question; you keep it in your index, which you've already built, and only the bit that is relevant comes back to you.
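To see that arithmetic in code, here's a sketch of the history trimming a chatbot is forced into once it nears the context window. I'm approximating token counts with word counts, which is crude; real systems count with the model's own tokenizer (for OpenAI models, the tiktoken library).

```python
# Trimming conversation history to fit a context window.
# Word counts stand in for token counts purely for illustration.
CONTEXT_WINDOW = 8_000  # e.g. an 8K-token model

def rough_tokens(message: dict) -> int:
    return len(message["content"].split())

def trim_history(history: list[dict]) -> list[dict]:
    """Drop the oldest turns until the conversation fits in the window."""
    trimmed = list(history)
    while sum(rough_tokens(m) for m in trimmed) > CONTEXT_WINDOW:
        trimmed.pop(0)  # the oldest turn is forgotten: exactly the loss RAG avoids
    return trimmed
```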
Now you might ask how that retrieval is even possible, and that takes you down a separate, tangential path about semantics: semantic search, embeddings, and so on, which is out of scope here. If you want to go deep, read up on RAG; LlamaIndex is an excellent library for this, with a really good developer relations team and a lot of articles. You should definitely read about LlamaIndex and RAG if you want advanced RAG. But I hope you get the point.

Going back to the system we put together: we have a Q&A system at the front, which just takes an input and gives an output, nothing else. Then we have the chatbot, where the input plus the history go in together (that's the short-term memory), you get the output, and the output also feeds back into the input; that is how you keep the conversation history. Then we have RAG, retrieval augmented generation, so called because you have a retrieval component, you augment the LLM with it, and then you generate the response. And the applications are enormous. There are a lot of startups in 2024, when we're recording this, doing nothing but RAG. If you can build a solid RAG solution today, you can probably raise funding or run a successful SaaS; there are companies making really good, solid money out of it. I'll give you an example.
One I've seen is SiteGPT: if you go to sitegpt.ai, it says 'make AI your expert customer support agent', and I know this product makes a lot of money. At its foundation it is RAG. It takes all the information available on your website (data ingestion and indexing, as we discussed), and when you ask a question it gives you an answer from that. It's not just a normal chatbot; it's a chatbot that answers based on your existing data. So if you're breaking into LLMs today, I strongly encourage you to build a RAG system; that should be your default project. If you're a university student watching this, or an early-career professional, build a couple of RAG examples, because there are a lot of nuances in RAG: how you improve indexing, how changing your chunking improves indexing, which algorithms you use for embedding, which models work well with RAG, and whether retrieved text works best placed at the top, the bottom, or the middle of the prompt. There are a lot of components to RAG; it is not just what we usually discuss on this channel. You can go into advanced RAG, and I strongly encourage you to spend time on it.

But before we get to something quite exciting and interesting, I'd like to quickly show you one more thing that not a lot of people discuss
when we talk about LLMs. It isn't RAG; it only uses short-term memory, not long-term memory, but it has its own potential: using large language models for classical NLP downstream tasks. For example, say you want to build a text classification system. You give it a sentence, 'the movie was complete crap', and it decides: positive or negative? Classically, you would train a text classification model just to figure that out. Another example: you have a review, 'the movie was amazing and the actress was exceptional', and you build a model to say what the review is about: the movie, the theater, the director, or an actor (here, it's about an actor). That is text classification in classical NLP, and there are plenty of other classical NLP tasks. What you can do now is skip building a custom model (a BERT-based model, an XGBoost-based model) and use LLMs for these classical NLP problems, because large language models have really good in-context learning. With the context sizes you have today, a few-shot prompt, or techniques like chain-of-thought or tree-of-thoughts, you can make a large language model a good zero-shot NLP classifier, and the same applies to a lot of other tasks. Not many people are exploring this, so if you work on classical NLP problems (labeling, text classification, entity recognition, whatever it is), I encourage you to try leveraging an LLM. Now, whether you want an LLM-based solution is a different topic; I'm not telling you to go looking for a nail because you have a hammer. I'm just saying this is a good option wherever you don't want to build models, although of course, if you can build a model, it will probably be cheaper than making calls to an LLM and getting answers back.
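Here is a sketch of using an LLM as a zero-shot sentiment classifier in place of a trained BERT- or XGBoost-style model. The label set and prompt wording are just one way to phrase it, and it reuses the hypothetical `client` from earlier.

```python
# Using an LLM as a zero-shot text classifier: no custom model training needed.
def classify_sentiment(text: str) -> str:
    prompt = (
        "Classify the sentiment of this movie review as exactly one word, "
        f"'positive' or 'negative':\n\n{text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip().lower()

classify_sentiment("The movie was complete crap.")                        # "negative"
classify_sentiment("The movie was amazing and the actress exceptional.")  # "positive"
```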
For summarization, text classification, and entity recognition, I think LLMs are exceptional zero-shot performers on downstream tasks, and you should definitely leverage them.

With that, we have covered RAG; we already know what RAG is. Now we're entering a very interesting phase, the thing everybody is obsessed with, the thing everybody loves: agents. Look at the recent announcements from Google and Microsoft, and earlier from OpenAI, and you'll see two important things as a common trend. One is multimodality. What does it mean? Simply that instead of just chatting with text, you can chat with images, ask questions in voice and have it respond in speech, and send videos. The second trend you see everywhere is agents: multi-agent setups where you have multiple agents you can summon to do certain tasks, and they do them for you, just like the Men in Black; they have a purpose and they carry out particular tasks.
But before I jump into agents, I want to introduce you to another important concept: function calling, the precursor to LLM agents. In function calling you have a prompt and short-term memory (sometimes you need external memory, sometimes you don't), and you give the model the ability to work with external tools. Function calling, to be honest, is a terrible name, because you're not calling any function, and you're not making the LLM call anything at all. All you're doing is forcing the LLM to give you a structured response back, so that you can make a call yourself. Let me give you an example of what function calling is. Everyone reaches for a weather API here, so I'll skip that; say you have a currency converter instead. What does a currency converter need? An input currency, an output currency, a date, and an amount. Technically those are the four things: what amount you want to convert, from which currency, to which currency, and for which date. Keep it a simple API. Now, typically, if you go to an LLM and ask 'What is USD to INR today?', first of all the LLM may not understand 'today'. It might know USD and it might know INR, but the LLM's memory is frozen; a large language model is a snapshot, with its knowledge cut off at, say, September 2023. So it cannot give you the latest information. You could sort of do this with RAG, ingesting fresh knowledge every day and keeping it in memory, but that's not very efficient; extend the idea to the stock market and even daily data doesn't matter, because everything changes every minute and every second. You need something instant. What do you do? You call an API; if you're a programmer, that's what you would naturally do.
Now, to call an API, you need clean inputs. At the end of the day I want to make a call like currency_converter(input, output, date, amount): four arguments, and they have to be solid. It can't be 'United States dollar' one time, 'USD' another, and 'US dollar' a third; that will not work. You need a specific format for everything: the amount should be a number, the date should be a date object. So you have to force the LLM to give you a particular kind of response; otherwise it will throw anything back at you. Ask 'What is USD in INR?' and it will just chat: 'USD to INR was such-and-such as of September 2023.' You have to guide the LLM into a particular type of output, and somehow the whole world has agreed that this format is going to be JSON, except Anthropic, which absolutely loves XML. If you use Anthropic you use XML; with any other model, you use JSON. So you're forcing the LLM to give you a structured response back, a JSON object that lets you make the function call yourself. That guided response into JSON is what everybody calls function calling: you don't actually call the function in function calling; you get the output that helps you make the call. Clear?
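In code, 'function calling' means describing your function's arguments to the model as a JSON schema so it returns a filled-in JSON object instead of free text. Here is a sketch of the currency-converter example using the OpenAI-style tools parameter; convert_currency is a hypothetical function you'd implement against a real exchange-rate API.

```python
import json

# Describe the currency converter to the model as a JSON schema.
# The model does NOT call anything; it returns structured arguments we can use.
tools = [{
    "type": "function",
    "function": {
        "name": "convert_currency",  # hypothetical function on our side
        "description": "Convert an amount between currencies on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "amount":          {"type": "number"},
                "input_currency":  {"type": "string", "description": "ISO code, e.g. USD"},
                "output_currency": {"type": "string", "description": "ISO code, e.g. INR"},
                "date":            {"type": "string", "description": "YYYY-MM-DD"},
            },
            "required": ["amount", "input_currency", "output_currency", "date"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What is 100 USD in INR today?"}],
    tools=tools,
)

# The "function call" is just guided JSON; we do the actual calling ourselves.
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
# convert_currency(**args)  # now hit the real rates API with clean, typed arguments
```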
That is exactly the precursor to the agent, because in a function call you have the ability to call a function, and agents are essentially a bunch of function calls stitched together with tools. So what do we have in agents? A bunch of function calls plus tools. I'd like to introduce a very interesting solution that can help you understand agents better; if you've been in the AI world a while, you'll recognize it immediately: the workflow of something called BabyAGI. BabyAGI was quite popular back in the day, which in this field means roughly a year or so ago. A function call, as I said, is the foundation of agents. But what is an agent? If you look at our pyramid, the agent sits right near the top, close to our aspirational goal. The definition is simple. A chatbot and a RAG system, all of those, end at a response: text, images, video, some output in one of those modalities, and then they're done. What you achieve with an agent is something genuinely striking: you don't stop at a text response, you stop at an action. You trigger an action. That is all agents are: you take an LLM, connect it to tools, and give it a purpose or a goal. That is your agent, and that is exactly what BabyAGI did back in the day.
There are many agents now, but if you look at BabyAGI, which is a wonderful framework, you can see the pattern. There is a task: something that has to happen. There are tools, like a vector DB and so on. And every agent has a purpose (you have to execute this, you have to return that) and a goal. Tools, purposes and goals, and LLMs, all working toward a common objective: that is your agent. There are several agent frameworks that are quite popular these days, such as CrewAI, LangGraph, and AutoGen (pyautogen), and in most of them you'll see the same shape: you define a role, you define a goal, you choose which LLM to use as the backend engine, and you've put together a single agent. Put several of them together as a team and you have a multi-agent setup.
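At its core, an agent is a loop: the LLM either asks for a tool or declares it's done, and your code executes the tool and feeds the result back. Here is a bare-bones sketch of that loop; frameworks like CrewAI, LangGraph, and AutoGen wrap exactly this pattern with roles, goals, and multi-agent plumbing. The weather tool is a made-up stand-in, and `client` is the same assumed OpenAI-style client as before.

```python
import json

def get_weather(city: str) -> str:
    return f"Sunny, 31 degrees C in {city}"  # stand-in for a real weather API

TOOLS = {"get_weather": get_weather}
tools_schema = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You are a travel assistant. Use tools when needed."},
    {"role": "user", "content": "Should I pack an umbrella for Delhi today?"},
]

# The agent loop: keep going until the LLM answers instead of requesting a tool.
while True:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=messages,
        tools=tools_schema,
    )
    msg = response.choices[0].message
    if not msg.tool_calls:  # no tool requested: the agent has finished acting
        print(msg.content)
        break
    messages.append(msg)    # keep the assistant's tool request in the history
    for call in msg.tool_calls:
        result = TOOLS[call.function.name](**json.loads(call.function.arguments))
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```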
People are doing amazing things with agents: you can make an agent book your ticket, or read something, distill it, create a note, and publish the blog post. You can summon agents to do a great many things. Personally, agents are where I've spent most of my reading time, because it's becoming quite obvious that agents are the next frontier in how we take LLMs forward. There are lots of directions, but I'm especially interested in automation, and I think agents are going to be the next big thing; honestly, they're already a big thing. Google has its own agent projects under various names, OpenAI has its own agents, and every company you talk to is talking about agents, because everyone wants to summon them. You're connecting LLMs to another dimension, and here the dimension being connected is tools: you take LLMs, you have the function-calling ability, and once you connect them to tools you unlock something immense. That is what we call agents. I'm not going deeper into agents here, because I'm hoping this becomes a series, depending on how you all like it, and in that series my next focus is going to be agents.

So the agent sits near the top of the pyramid, and that brings us almost to the end of the video: our aspirational goal, the thing we are all trying to move toward, the LLM OS. This is inspired by Andrej Karpathy, who created this amazing framing.
The idea is to use the LLM at the center of an operating system. Go back far enough and the computer was created for simple calculation: you want to add a and b, so you store one in a and two in b and add them; that's roughly where computing started, very far back in the day. Then computation kept growing and kept getting cheaper, giving us the computers we have today. Karpathy asks: can we have a similar vision for LLMs? In that vision you keep the LLM at the center, with RAM as the short-term memory, the context window; long-term memory as the disk, used through RAG; the agent structure with its tools; a connection to the internet; connections to other LLMs, like a multi-agent or peripheral setup; and peripheral devices for audio and video. Can we put together a system with all these pieces working toward a common goal? That would ideally become your large language model operating system. It is very much a vision at this point. There are certain implementations available, but they are based on our current understanding: essentially LLMs plus function calling plus agents, multi-agent setups, and more tools. It does not yet amount to a radically different, total view.
That's why, even in the framework I've created, the LLM OS is marked as currently developing. It is everything we have put together: the tools, the extended tools, the peripheral tools, long-term memory, short-term memory, with just one input from the user, after which it can run itself and execute things. I think that is the future we're heading toward, though I'm not sure when we'll get there. If somebody says 'AGI' to me today, it could look something like BabyAGI; I don't trust AGI as a concept anytime soon, but leaving consciousness and all of that aside, I would say the LLM OS sits at the top, where we can expect something closer to AGI to happen, and everything we covered leads up to it.

I wanted to keep this video brief, but it's already going to run past half an hour. I wanted it to be a crash course so that even if you know nothing about LLMs and haven't taken any course, it helps you see how the future of the LLM OS is coming together and what led us here. Let me know in the comment section if you like this kind of content and I'll put together more; it took me a lot of time to create the framework, design it, and arrange it into a thought process that makes it understandable.
