This is Hacker Public Radio Episode 3953 for Wednesday, the 27th of September 2023. Today's show is entitled "Large Language Models and I Don't Have Any Common Sense". It is the first show by new host hobs and is about 18 minutes long. It carries a clean flag. The summary is: learn how to load and run GPT-2 or Llama 2 to test it with common sense questions.

This is hobs, Hobson Lane, and along with Greg Thompson we will be trying to load a Hugging Face large language model so that you can do text generation on your own computer without having to use somebody's proprietary API. Hugging Face has a bunch of models, including chat models and large language models, so you'll need to create a Hugging Face account first, and we'll put this link in the show notes: it's huggingface.co/join, that's where you want to go to sign up. Then you'll need to get an access token. If you want to use any of the supersized models from Meta, or any other company that hides them behind what's called a business source license, they're not really open source, but they are sharing all the weights and all the data; you just can't use them for commercial purposes if you get big enough to compete with them. But anyway, if you need a token, you'll get that token from your Hugging Face profile.

You can put that token in a .env file. That works with a popular Python library called dotenv (python-dotenv), which is what you use to load environment variables, and if you put the token in a .env file, dotenv will combine those values with your existing environment variables when you load it. So, a quick tip: once you've done import dotenv, you definitely want to call dotenv.load_dotenv(), but you don't want to then call dotenv.dotenv_values(), because that will load a dictionary of only your .env file's variable and value pairs. You typically want the mapping of all of your environment variables when you're running a server, because there will be things like your PYTHONPATH for your Python version, that kind of stuff that you'll probably need to use if you're building a real web service. We ran into that problem when we were configuring our GitLab CI/CD pipeline, and then we hit it again when we went over to Render to deploy our chatbot software, qary (q-a-r-y).

So, once you've got your .env file loaded into the environment with the dotenv package, you can import os and say dict(os.environ). You're converting os.environ, which is a dict-like object; you want to grab a copy of it, basically, and coerce it into a dictionary. So: dict, open parenthesis, os.environ, close parenthesis. You should be familiar with os.environ if you've ever worked with environment variables in Python. Once you've got that in a dictionary, which we call env as a variable, you can say env, square bracket, quote, and then HUGGINGFACE_ACCESS_TOKEN, or whatever you called your variable in that .env file.
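Putting those token-loading steps together, here's a minimal sketch, assuming the python-dotenv package is installed. The variable name HUGGINGFACE_ACCESS_TOKEN is just a hypothetical example; use whatever name you put in your .env file.

    import os
    import dotenv

    dotenv.load_dotenv()    # merges your .env variable/value pairs into os.environ
    env = dict(os.environ)  # coerce the dict-like os.environ into a plain dict
    # HUGGINGFACE_ACCESS_TOKEN is a hypothetical name, match whatever is in your .env file
    token = env["HUGGINGFACE_ACCESS_TOKEN"]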
Anyway, it turns out we're going to show you how to do this for smaller models. We tried to do it for Llama 2, but that's a 4 gigabyte model, and it takes a long time to download, which is really hard to do when you're on a conference call with somebody in Korea, where Greg is located. And when you search for models, it's unfortunately really hard to find them on Hugging Face, because there are so many and people can describe them in a lot of different ways. So it's really hard to find what you're looking for. Don't just hit enter after your search query; go to the full text search instead. That'll give you more of what you need. Or you can click on the link like "See 33,398 model results" for Llama 2; that's what we did in order to find the one we were looking for that could do chat.

But, like I said, we are going to skip that one and move on to a smaller one, GPT-2. It's actually not that much smaller; it's just that I've already downloaded it, several days ago. If you've already done this once, this process of downloading and creating a model, the steps we're describing here, then you won't have to do it again and wait for the download. So, anyway, we're going to use one that I've already done this for. If you do need that license, because you're trying to use Llama 2, you'll need to apply for it from Meta; that's at ai.meta.com, under resources, models and libraries, Llama downloads. The show notes will tell you how to do that. But if you just want to use GPT-2, you don't need to do that, because it's two generations back from what OpenAI is building now: they're up to GPT-4 and they're already working on GPT-5.

Let's see. So now, instead of the auto model classes that a lot of people use, we're going to use the transformers pipeline object from Hugging Face. The pipeline will include the model and the tokenizer and help you do inference. You won't be able to retrain or fine-tune the model, but at least you can get it to generate text. So you say from transformers import pipeline, and then you say generator equals pipeline, open parenthesis, "text-generation", and you need to give it the model name with the key model. So you say comma, model equals "openai-gpt". That's openai-gpt, all lowercase, no spaces, just that hyphen in the middle between those two words.

And then you can ask it a question. This is a generative model, so it's only going to try to complete your sentence; it's not going to try to carry on a conversation with you. So if you're trying to ask a question, you probably want to precede it with the prompt "Question:", then ask your question, then probably put a newline after your question, and then "Answer:". That should give it the hint that it needs to try to answer your question. Another way you can do it, if you're just asking a math question, is to put an equals sign at the end and have it try to complete the equation.
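As a sketch, that setup looks something like this. The "text-generation" task name and the openai-gpt model id are the ones named above; the sample question is just an illustration of the Question:/Answer: pattern, not one from the episode.

    from transformers import pipeline

    # downloads and caches the model and tokenizer the first time it runs
    generator = pipeline("text-generation", model="openai-gpt")

    # the Question:/Answer: pattern hints that we want an answer, not just a continuation
    prompt = "Question: What is the capital of France?\nAnswer:"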
So, we're going to see whether GPT-2 can do any kind of math, because large language models are famous for not being able to do math or common sense reasoning, which is kind of surprising, since computers can do math quite well, and they certainly do logic very well too. But large language models are just trying to predict the next word, and you'll see how this one falls on its face when you ask it to do one plus one. If you put in your question as a string, just the three characters 1+1 and then a fourth character, the equals sign, and put that in quotes, you can then run your generator on that question. If you put the equals sign at the end, it's sort of like a question mark to the machine: a generative large language model is going to try to complete that formula.

So then you're going to say responses equals generator, open parenthesis. Oh yeah, I already said generator equals pipeline, so you've already got your generator; you're just going to use that function. Generator, open parenthesis, and you give it your string, those four characters, 1+1=. And then it will return a bunch of answers. You can set a max length, and you want it to be bigger than the number of tokens you input. Because each one of these characters is an individual token, each representing a piece of meaning in that phrase, you're going to have four tokens, so you need to give it at least five on your max length parameter: you say max_length equals 5, or 6, or 7. It will just generate enough tokens to end at whatever number you give it there. This is for GPT-2 in generative mode. Then, for num return sequences, you can give it another parameter, if you'd like, for the number of guesses you would like it to take, the number of times you want it to try to generate an answer to that question. We gave it the number 10, just to see if it would have any chance of answering the question. And when we did that, so, close your parenthesis after num_return_sequences. That name has underscores between those three words, and max_length also has an underscore between those two words; those are keyword arguments to the generator function, and your question is the positional argument at the beginning. And then you're good to go with your answers, or responses, equals all of that, and you can just print out all those responses if you'd like.

The responses will include both your question and the answer. In our case, the very first response, the first generated text we got, was "1+1=2 2 +". So it's going on: if you gave it only two extra tokens, it would stop there, and it will keep going if you give it more than two extra tokens. So let's count: one, two, three, four, five. If you give it six tokens, it will stop at "2 +". If you give it more than that, then it's going to keep going, and it will say something like "1+1=2+5=1+", and it keeps going on. It's just trying to complete some sort of equation or system of equations. Third down the list, though, we do see an answer that looks a lot closer. We see "1+1=2," and then it says "=1" and "=2". So it does continue on beyond what looks like an answer. And many of the other answers are not even close: there's a "1+1=6 times the speed of sound" and a "1+1=1,". So out of the ten answers, it got one out of ten; that would be a 10% on its exam. And you can't really even count the one it got right as a right answer, because you'd have to pick and choose some of the tokens that were generated; you'd basically have to make it stop after the first token to get a good answer out of it.
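Here's a rough sketch of that call. One assumption worth flagging is do_sample=True, which isn't mentioned above but is needed so the pipeline can return ten different sequences rather than one greedy completion.

    responses = generator(
        "1+1=",                   # four characters, four tokens
        max_length=6,             # the four prompt tokens plus a couple extra
        num_return_sequences=10,  # ten tries at completing the equation
        do_sample=True,           # sampling, so the ten tries can differ
    )
    for response in responses:
        print(response["generated_text"])  # includes the prompt plus the completion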
Then we tried a more complicated question, where we used that sort of prompting approach: "Question:", then the question, then "Answer:". We put in a question like the one in the book Natural Language Processing in Action, the question about cows: if there are 2 cows and 2 bulls, how many legs are there? That was our question. So we put that after the "Question:" prompt, and then we had the "Answer:" after it. And then we gave it, I think, 30 tokens or so as our max length; we gave it something like 30 so that it could answer that question, because there are about 25 tokens in there. If you look really closely and count up all the words and punctuation marks, you can see that it's going to end up being about 25 tokens when you include the Question and Answer prompts. It'll give you that estimate if you give it a number that's too low, as a warning: hey, you'd better give me some more tokens, I can't generate what you need.

So, what answers did we come up with for that question about cows? Actually, it did a pretty interesting job. Let's see, did I tell it to stop? Well, it looks like the question-and-answer prompting did a better job when I limited the max length. When I set it smaller than the correct amount, smaller than the actual question, it got none of them right, because the answers were all just stray numbers and words: the word "only", the word "four", the digit 2, the word "one", the digits 3, 3, 0, and so on. So it didn't do very well when I underestimated the number of tokens. And then, when I gave it more tokens than it needed, it gave answers like "4 for you." and then a carriage return and then a quote, "let me see if I have this straight". So it looks like it's going to ask me a question, after giving me the answer four for two cows and two bulls. So it doesn't know that it's legs I'm talking about: it's counting the cows and bulls, male and female, and that's how it gets the answer four.

The second most likely answer was "only three", and "three cows and two bulls are bigger than". That answer, three, is kind of interesting, because there are a lot of trick questions that people have been asking ChatGPT, and that have been included in the training sets, that try to trick its logic by removing the legs of a couple of the cows or bulls in the question. Some of them will only have three legs, so that might be why three is showing up so high on the list: it has memorized some text problems that are trying to fool it. But anyway, all the other answers are incorrect. There's a "2, 2, 1,". There's a "one per cow". There's a "30." and a "1.". It's interesting that that number 30 keeps coming up. And there's a "1." and a "3."; those are periods, like at the end of a sentence, so it thinks it's given me the full answer on some of those. And one of them says something like "three." and then "they need to be introduced to the cow population before". I wish I'd let that one go on a little bit further.
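To recap the cow experiment as a sketch: the question wording and the max_length of 30 follow the description above, and do_sample=True is the same assumption as before.

    prompt = "Question: If there are 2 cows and 2 bulls, how many legs are there?\nAnswer:"
    responses = generator(
        prompt,
        max_length=30,            # roughly the ~25 prompt tokens plus room for an answer
        num_return_sequences=10,  # ten tries at answering
        do_sample=True,
    )
    for response in responses:
        print(response["generated_text"])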
Anyway, you can have some fun playing with large language models on Hugging Face. They're not going to be much use unless you do a really good job of prompt engineering, and perhaps train them on the kind of problem that you need to solve. That's the kind of thing we're doing over on the qary project, an open source project to build a chatbot that you can trust. It has a rule-based approach to managing the conversation, rather than being purely generative, so you can keep it grounded in reality.

So, I hope you have enjoyed this little show, my first ever Hacker Public Radio podcast; I certainly have. And Greg, do you have any questions or thoughts? We spent a lot of time looking at all the different models, so it's worth exploring all the different sizes, tiny to big, and seeing which ones work for your use case. Indeed, yeah, that's a really good point. We had trouble finding one that was small enough for us to run live on this pair programming session that we're working on. But this was one model out of many, many, many thousands that you can choose from, so have fun searching around on Hugging Face and find yourself a model.

You have been listening to Hacker Public Radio at hackerpublicradio.org. Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, click on our contribute link to find out how easy it really is. Hosting for HPR has been kindly provided by an honesthost.com, the Internet Archive, and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.