Scoping and Use Cases for AWS Bedrock Projects
AWS launched Bedrock at an opportune time, riding the hype wave of generative AI. Plenty of technical deep dives have explored the offering and shown off what one can do with customized generation. But the bigger question remains: outside some already well-defined areas, what can I actually use Bedrock for? In the Amazon Bedrock FAQ, the use case section is a single sentence: "Agents for Amazon Bedrock can help you increase productivity, improve your customer service experience, or automate DevOps tasks." Not heavy on description, and it's easy to understand why: Amazon wants the technology to appeal to a broad spectrum of corporate clients. But how do you determine whether a particular task is suitable for implementation via Bedrock? Let's review some topics that might make it a little easier to see if whatever bright, shining, AI-powered picture you have in mind is a good fit for Bedrock.
Determinism
The first thing that needs to be said about the "magic" of GenAI is that its output is non-deterministic. Ask a person the same question twice and you may or may not get the same answer; GenAI is generally not dissimilar in this regard. The same prompt, two times in a row, may produce different outputs. While the general features of the output will usually be the same, or 99.9% similar, things get tricky if you need the output to be 100% identical every time.
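The non-determinism isn't mysterious: models pick each output token by sampling from a probability distribution, and a temperature setting controls how spread out that distribution is. Here's a minimal, self-contained sketch of the idea; this is not Bedrock itself, and the candidate answers and scores are made up for illustration:

```python
import math
import random

def sample_with_temperature(scores, temperature, rng):
    """Pick one option from raw scores via softmax sampling.

    temperature == 0 collapses to a deterministic argmax;
    higher temperatures make unlikely options more probable.
    """
    if temperature == 0:
        return max(scores, key=scores.get)
    weights = {tok: math.exp(s / temperature) for tok, s in scores.items()}
    r = rng.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # floating-point edge case

# Hypothetical next-token scores for one and the same prompt.
scores = {"yes": 2.0, "no": 1.5, "maybe": 0.5}
rng = random.Random()  # unseeded, like a production sampler

print({sample_with_temperature(scores, 1.0, rng) for _ in range(50)})
# usually several distinct answers from identical input

print({sample_with_temperature(scores, 0, rng) for _ in range(50)})
# always {'yes'}
```

Real services expose knobs like temperature to narrow the spread, but "narrower" is not "identical": treat sameness of output as a property you must verify, not assume.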
If your task requires 100%-identical results every time and accepts no substitutes, then generative AI models may not be the best solution for that particular use case. If you sit down with your engineers and map out exactly what it is you're trying to accomplish, without bringing AI into the picture, you might find that the task you were thinking of throwing at AI can be replaced by a very small shell script (or insert your programming language of choice here). (One that, ironically, may or may not itself be generated by AI.)
That being said, there are myriad reasons not to need deterministic output. Human-readable copy, chatbots, and image creation are all well-known cases. The big one that is still a little shaky, however, is *data classification*. If your particular use case is to chug through a large set of data and classify it in certain ways, the questions that need to be on your mind are "What happens when there's a mistake?" and "Can this task be solved by a very small shell script?"
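To make the "very small shell script" test concrete: if your categories boil down to a handful of unambiguous keywords, a deterministic classifier is a few lines of code, and its mistakes are inspectable. The categories and keywords below are hypothetical:

```python
# A deliberately small, deterministic classifier. Substring matching is
# crude (e.g. "down" would also match "download"), but crucially the
# failure mode is predictable and debuggable, unlike a model's.
RULES = {
    "billing": ("invoice", "refund", "charge"),
    "outage": ("down", "unreachable", "timeout"),
}

def classify(text):
    lowered = text.lower()
    for category, keywords in RULES.items():
        if any(word in lowered for word in keywords):
            return category
    return "unclassified"  # the explicit "what happens on a mistake" path

print(classify("Please refund my last invoice"))    # billing
print(classify("The API has been down since 9am"))  # outage
print(classify("How do I change my avatar?"))       # unclassified
```

If your categories can't be pinned down this way, that's a genuine signal a model may earn its keep; just make sure the "unclassified"/mistake path is designed, not accidental.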
Model Customization
Model customization is a rather prominent feature of Bedrock, and unfortunately, one of the more misunderstood ones. For Amazon's own use case of "automating DevOps tasks", there isn't really much to customize in a model.
So why would you want to customize a model? The documentation says "to provide better output". Fair, but how do you determine what is "better"? Is your use case so niche that the current models aren't giving you the results you want? All fair questions to ask. This is the one aspect of Bedrock where you're not going to be able to get away from testing. The challenge is balancing the subjective value of the current model's output against the time (and effort) required to customize any particular model.
To customize a model, you need to provide it with a dataset, in a particular format, of expected inputs and expected outputs. If you're thinking quickly on your feet, you might say, "Well, I'll just have AI generate that based on my data." However, this immediately lands you back in the problem of determinism, and puts you in the position of having to answer: does a human need to review all of this data? What about potentially gigabytes of it? Suddenly you're balancing the potential model improvement against the cost of having somebody pore over gigabytes of text (or image) classification data.
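For Bedrock's text-model fine-tuning, that "particular format" is commonly a JSON Lines file of prompt/completion records (check the current documentation for your model's exact schema; it varies by provider). A sketch with made-up example data, plus the kind of cheap format check that automation can do for you:

```python
import json
import os
import tempfile

# Hypothetical prompt/completion pairs. In practice, these are the part
# a human may need to review before you pay for a fine-tuning job.
pairs = [
    {"prompt": "Summarize: the deploy failed on step 3.",
     "completion": "Deploy failure at step 3."},
    {"prompt": "Summarize: disk usage hit 95% overnight.",
     "completion": "Disk nearly full (95%)."},
]

path = os.path.join(tempfile.gettempdir(), "train.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for record in pairs:
        f.write(json.dumps(record) + "\n")

# A format check catches malformed lines, but not *wrong* answers;
# judging correctness is still the human's (expensive) job.
with open(path, encoding="utf-8") as f:
    for lineno, line in enumerate(f, 1):
        record = json.loads(line)
        assert set(record) == {"prompt", "completion"}, f"bad keys, line {lineno}"
        assert record["prompt"].strip() and record["completion"].strip()
print("train.jsonl passed the format check")
```

Note the asymmetry: the mechanical part of data prep automates in minutes, while the quality-review part is exactly the gigabytes-of-eyeball-time cost described above.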
Knowledge Bases
Knowledge bases are probably the biggest differentiator between Bedrock and other GenAI tools out there. In essence, this feature points your chosen model at some particular data, then tells it, "Based on this data, do X". If your data is correct to begin with, then fantastic. For instance, if your software documentation is already quite good, you can point the model at your internal wiki and say, "Generate documentation, in the same style and format as this wiki, based on the code that developers check in." Fantastic. But it doesn't solve the biggest issue of any system, computer or otherwise: garbage in, garbage out.
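Under the hood this is a retrieve-then-generate pattern: fetch the most relevant piece of your data, then hand it to the model as context. Here's a toy version with the generation step stubbed out (in Bedrock, the knowledge base handles the retrieval side for you; the wiki contents here are invented):

```python
# Toy retrieve-then-generate. The "generate" step is a stub; a real
# system would call the model with `context` prepended to the prompt.
wiki = {
    "deploy.md": "Deploys run through CI. Tag a release to trigger one.",
    "oncall.md": "On-call rotates weekly. Escalate via the pager.",
}

def retrieve(question, docs):
    """Rank docs by crude word overlap with the question."""
    words = set(question.lower().split())
    return max(docs, key=lambda name: len(words & set(docs[name].lower().split())))

def answer(question, docs):
    source = retrieve(question, docs)
    context = docs[source]
    # Garbage in, garbage out: the answer can only be as right as `context`.
    return f"[based on {source}] {context}"

print(answer("how do I trigger a deploy?", wiki))
```

The sketch makes the failure mode visible: nothing in the pipeline checks whether the retrieved page is *true*, only whether it is *relevant*. That check is yours.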
The questions to answer here are, once again: what is the risk of bad data producing bad output? What accuracy do you need? Do you need 100% accuracy?
Data
When discussing any language models, large or otherwise, what it generally comes down to is a corpus. In linguistic terms, this is a collection of text that may (or may not) be categorized by subject, but is more frequently categorized by language (or dialect) and origin. Linguists develop such corpora for their own research needs, to study language patterns and usage in a particular area. This pattern holds true for any GenAI model, and Bedrock is no exception.
So the first question you have to ask yourself is: do you have data to actually run through the models? And the question you have to ask immediately after is: is this data available in a form we can feed to the model? For text work, obviously, your data needs to be available as text. If your data is in PDFs that haven't been OCR'd yet (or have been OCR'd but not checked for errors), it might be worthwhile to take a step back and consider what you're trying to do with the data. For image work, obviously, you need images; simple enough. This is relevant both for customizing the model and for simple data-source usage.
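A quick automated pass can at least flag inputs that aren't usable text yet. This is a crude, stdlib-only readiness check (the file names are hypothetical, and it obviously can't judge OCR *quality*, only whether text is there at all):

```python
import tempfile
from pathlib import Path

def usable_as_text(path):
    """Crude readiness check: the file decodes as UTF-8 and isn't empty.
    A raw scan or image-only PDF fails the decode step, which is your
    cue to OCR (and proofread) before feeding it to anything."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except (UnicodeDecodeError, OSError):
        return False
    return bool(text.strip())

# Hypothetical corpus: one good text file, one binary stand-in for a scan.
corpus = Path(tempfile.mkdtemp())
(corpus / "notes.txt").write_text("Deploys run through CI.", encoding="utf-8")
(corpus / "scan.pdf").write_bytes(b"\x89\x00\xff binary-ish bytes")

for name in sorted(p.name for p in corpus.iterdir()):
    status = "ok" if usable_as_text(corpus / name) else "needs attention"
    print(name, "->", status)
```

Running a pass like this over a prospective corpus before scoping the project tells you early whether you're starting a Bedrock task or an OCR-and-proofreading task.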
Risk & the Human Factor
Throughout the discussion above, you've probably noticed me harping on risk. When I talk about capital-R Risk, it's not from the perspective of governments, politicians, or copyright holders, but purely from the business perspective. As consultants, the mission of Better Than Services is to solve business problems, and that comes with evaluating risks to businesses and projects, so it's something I can never get away from. Oftentimes we have to be aware of issues that may arise in the future from any given piece of work, and advise our clients accordingly.
This is not to say that using Bedrock is especially risky. In fact, it's no more risky than having a person do the same tasks by hand. Now, if you're seasoned, that sentence probably just gave you the willies. The fun thing about GenAI is that in developing computer algorithms that are starting to get closer to live intelligence (not saying human just yet), we also begin to operate in a world where these algorithms make mistakes. We think of computers as always doing exactly what we program them to do, unless there's a bug: 1 + 1 always equals 2, and so on. With GenAI models, that's not really the case.
When reviewing any particular task for Bedrock, it's pretty easy to follow a few simple steps to see if the juice is worth the squeeze, so to speak.
1. Does the outcome of the task need to be 100% the same every time?
   - If yes, Bedrock is not for you.
   - If not, go for it!
2. Can whatever I'm trying to do be replaced by a very small shell script?
   - If yes, write a very small shell script.
   - If not, maybe Bedrock is for you!
3. What is the risk profile of an incorrect output?
4. Do I need to feed the model my own data?
   - Is the data I'm feeding the model up to snuff?
5. Is the current model serving my needs?
   - If not, do I have the data in the correct format needed to customize the model?
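The first two questions above are a go/no-go gate and can be sketched as a tiny triage function (the wording of the verdicts is the checklist's own; the function and parameter names are mine):

```python
def bedrock_triage(needs_identical_output, fits_small_script):
    """The first two checklist questions: a go/no-go gate.
    The remaining questions refine the approach; they don't veto it."""
    if needs_identical_output:
        return "Bedrock is not for you"
    if fits_small_script:
        return "write a very small shell script"
    return "maybe Bedrock is for you!"

print(bedrock_triage(needs_identical_output=True, fits_small_script=False))
print(bedrock_triage(needs_identical_output=False, fits_small_script=True))
print(bedrock_triage(needs_identical_output=False, fits_small_script=False))
```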
Questions 1 and 2 really determine whether the use case is appropriate for AWS Bedrock. Everything else is just assessing the problem further. Answered those questions? Ready to implement? Or do you need some help even answering those questions? Reach out, we're happy to help!