Author:

Ampcome CEO
Mohamed Sarfraz Nawaz

Mohamed Sarfraz Nawaz is the CEO and founder of Ampcome, which is at the forefront of Artificial Intelligence (AI) development. Nawaz's passion for technology is matched by his commitment to creating solutions that drive real-world results. Under his leadership, Ampcome's team of talented engineers and developers crafts innovative IT solutions that empower businesses to thrive in the ever-evolving technological landscape. Ampcome's success is a testament to Nawaz's dedication to excellence and his unwavering belief in the transformative power of technology.

Date
January 9, 2024
Topic
AI solutions
A comparison of the economic aspects of deploying open-source Large Language Models (LLMs) versus OpenAI GPT models

LLM Economics: Which Is Cheaper to Deploy – Open-Source LLMs or OpenAI GPT Models?


If your request volume escalates to millions per day, ChatGPT becomes very costly over time. Fine-tuning open-source LLMs and deploying them to AWS is more affordable.
Chart: Cost comparison between ChatGPT and open-source LLMs

Currently, the AI sector is witnessing a huge surge in Large Language Models (LLMs) – both open-source and third-party.

As promising generative AI applications take the world by storm, businesses are eager to implement AI systems that leverage these models. Generative AI can help enhance customer relations, boost productivity, reduce costs, and increase profits.

As companies across healthcare, finance, education, e-commerce, and other sectors look to implement AI systems using LLMs, a vital question arises:

Should you use third-party LLMs or open-source LLMs like LLaMA?

Another question is: what is the cost difference between OpenAI and open-source LLMs?

Your decision between open-source LLMs and OpenAI models should be based on factors like request volume, quality, performance, data privacy, risk, and more.

In this article, I will discuss the cost of open-source LLMs and OpenAI models. We will also compare the strategic factors that will help you choose the best model for your business.

Let’s get started!

ChatGPT and Open Source LLMs

The introduction of Google's Transformer architecture in 2017 opened the doors for models like BERT, GPT, and BART.

Soon researchers and industry experts started to realize the impact of LLMs and their potential to carry out natural language processing tasks and other actions beyond NLP.

  • OpenAI released the first GPT model, GPT-1, in 2018.
      o It was pre-trained on a large corpus of text data and had 117 million parameters.
      o It was the first groundbreaking model of its kind that could generate human-like sentences and paragraphs.
  • In 2019, OpenAI followed up with GPT-2.
      o It contained 1.5 billion parameters, allowing it to produce longer, more coherent text.
  • In 2020, OpenAI launched GPT-3, then its largest and most advanced language model, with 175 billion parameters.
      o It can generate, rewrite, summarize, and classify text, and can even hold human-like conversations, which makes it ideal for chatbots.
  • In 2022, OpenAI upgraded GPT-3 to GPT-3.5.
      o It is trained on conversational data, which enables the model to understand, engage with, and reply to natural language queries.
  • The fourth model in the GPT series, GPT-4, is a multimodal large language model.
      o Users can access GPT-4 through the paid Plus subscription, and its commercial API is available through a waitlist.
      o GPT-4 is a transformer model that underwent a two-step training process.
      o It was first pre-trained to predict the next token using publicly available data as well as "data licensed from third-party providers."
      o It was then fine-tuned with reinforcement learning, incorporating feedback from both human evaluators and AI, to align it with human intent and established policies.

After the enormous success of ChatGPT, various open-source LLMs came into existence, such as Meta's LLaMA, a language model with up to 65 billion parameters that Meta reports outperforms GPT-3 on many benchmarks.

Recently, Microsoft and Meta collaboratively released Llama 2. It comes with various performance upgrades and features, offering pre-trained and fine-tuned models at 7B, 13B, and 70B parameters.

Llama 2 is available for both research and commercial use, and is accessible on platforms like Microsoft Azure and Amazon SageMaker. It also works with Windows tooling such as the Windows Subsystem for Linux (WSL), Windows Terminal, Microsoft Visual Studio, and VS Code.

  • Stanford fine-tuned the LLaMA 7B model on 52,000 instruction-following examples to develop its Alpaca model.
  • A team of researchers then fine-tuned the 13B-parameter LLaMA model to build Vicuna, which achieved roughly 90% of ChatGPT's quality in their evaluation.
  • Stable Beluga 2 is an open-access LLM based on the Llama 2 70B model. It is fine-tuned on a synthetically generated dataset in the standard Alpaca format using supervised fine-tuning (SFT). Its performance on some tasks has surpassed GPT-3.5.
  • Luna AI Llama 2 Uncensored is fine-tuned on around 40,000 long chat discussions, making it a highly capable chatbot. Its response capability and uncensored nature are claimed to outdo ChatGPT.

There are several other excellent models and chatbots fine-tuned on open-source LLMs. These models have shown extraordinary performance, sometimes even exceeding the capabilities of ChatGPT.

Language models fine-tuned or built on open-source LLMs or ChatGPT are revolutionizing various aspects of how we work. From virtual chatbots, content creation tools, and translators to customer assistants and other applications, language models are transforming the way we do business.

But the question is: which approach should companies take?

There are several business use cases where employing ChatGPT is economical – for example, businesses whose request volume is around 1,000 requests per day or less.

However, if your request volume escalates to millions per day, ChatGPT becomes very costly over time. In this case, fine-tuning open-source LLMs deployed to AWS is more affordable, given the present pricing of AWS and ChatGPT.

Cost Comparison between ChatGPT and Open-source LLMs

ChatGPT API Cost

As of this writing, the ChatGPT API costs $0.002 per 1K tokens, and 1 token is roughly three-quarters of a word.

Tokens are counted across both the prompt and the generated output, so the total billable tokens correspond to the prompt plus the generated output.

Let’s take an example to understand this.

Assume you enter a 100-word prompt and get 400 words of output. The total for one query is around 500 words, or roughly 666 tokens.

Now, if the chatbot responds to 5,000 queries a day, that amounts to:

($0.002 / 1,000) × 666 × 5,000 ≈ $6.7 a day

That is roughly $200 a month.

Now, what if each customer makes 4-5 queries?

After all, not everyone writes the perfect prompt on the first try.

In that case, your daily query volume multiplies several times over, and as it climbs into the tens or hundreds of thousands of queries a day, your monthly expense skyrockets.
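To make the arithmetic concrete, here is a minimal sketch of this back-of-the-envelope estimate in Python. The $0.002-per-1K-token rate and the 0.75 words-per-token ratio are the illustrative figures quoted above, not current OpenAI pricing.

```python
# Rough ChatGPT API cost estimator using the assumptions above:
# ~0.75 words per token and $0.002 per 1K tokens (illustrative rates).

WORDS_PER_TOKEN = 0.75
PRICE_PER_1K_TOKENS = 0.002  # USD per 1,000 tokens

def daily_cost(words_per_query: float, queries_per_day: int) -> float:
    """Estimated USD spend per day for a given traffic level."""
    tokens_per_query = words_per_query / WORDS_PER_TOKEN
    return (PRICE_PER_1K_TOKENS / 1000) * tokens_per_query * queries_per_day

# 100-word prompt + 400-word answer = ~500 words (~666 tokens) per query
for queries in (5_000, 25_000, 1_000_000):
    per_day = daily_cost(500, queries)
    print(f"{queries:>9,} queries/day -> ${per_day:,.2f}/day (~${per_day * 30:,.0f}/month)")
```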

You can also consider Claude 2 instead of OpenAI models. Anthropic's Claude 2 currently gives strong competition to the GPT-3 and GPT-4 models. Claude 2 excels at coding, mathematics, and logical reasoning: it scored 71.2% on Codex HumanEval, a test designed to evaluate Python coding skills. Moreover, it handles long documents such as PDFs well, something GPT-4 can struggle with.

Another option is PaLM 2, Google's most advanced LLM. It is capable of reasoning, coding, calculation, translation, question answering, and natural language generation, and it is available in four sizes: Gecko, Otter, Bison, and Unicorn. The smallest size can run on mobile devices, so applications based on it can even work offline. This versatility means you can fine-tune PaLM 2 for a wide range of business products.

Open-Source Large Language Model Costs

While open-source models are free to use, setting up their implementation and deployment can be costly.

If an organization wants to utilize large language models (LLMs) but doesn't want to construct its own data centre, a more feasible approach is to opt for cloud providers like AWS, Google Cloud, Azure, or smaller providers like Lambda Labs.

Many enterprises, including banks, healthcare providers, and telecommunications companies, already maintain strong partnerships with these major cloud providers, making this option highly appealing to them.

This way, they can easily host, train, and deploy LLMs without the need for extensive infrastructure investments.

Let’s dive into the AWS cost of hosting open-source models and serving them as APIs.

  • A browser sends a customer request, which passes through Amazon API Gateway.
  • API Gateway triggers a Lambda function, which parses the request and forwards it to the Amazon SageMaker endpoint (see the Lambda sketch below).
  • SageMaker then invokes the model hosted at the endpoint.
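Below is a minimal sketch of the Lambda piece of this flow, assuming a JSON request body with a "prompt" field and a SageMaker endpoint named "my-llm-endpoint". Both are placeholders; the exact payload schema depends on the model container you deploy.

```python
# Minimal AWS Lambda handler sketch for the flow above:
# API Gateway -> Lambda -> SageMaker endpoint.
# The endpoint name and payload schema are placeholders.

import json

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "my-llm-endpoint"  # hypothetical endpoint name

def lambda_handler(event, context):
    # API Gateway (proxy integration) passes the request body as a JSON string.
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "")

    # Forward the prompt to the model hosted on the SageMaker endpoint.
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 256}}),
    )
    result = json.loads(response["Body"].read())

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```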

The SageMaker cost depends on the compute instance used to host the model.

A suitable SageMaker instance costs around $5-$6 per hour, or roughly $150 per day. In addition, Lambda and API Gateway cost about $10 and $1 per million requests, respectively.

So hosting an open-source LLM on AWS comes to roughly $150 a day at 1,000 requests per day and about $160 a day at 1 million requests per day.
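Here is a quick sketch comparing these hosting figures with the earlier ChatGPT API estimate. All rates are the illustrative numbers quoted in this article, not current AWS or OpenAI pricing.

```python
# Compare a (roughly fixed) AWS self-hosting cost with ChatGPT's usage-based cost,
# using the illustrative figures above: ~$150/day for the SageMaker instance,
# ~$11 per million requests for Lambda + API Gateway, and ~666 tokens per query
# at $0.002 per 1K tokens on the ChatGPT API.

def aws_daily_cost(requests_per_day: int) -> float:
    sagemaker_instance = 150.0  # USD per day for hosting
    lambda_and_gateway = 11.0 * requests_per_day / 1_000_000
    return sagemaker_instance + lambda_and_gateway

def chatgpt_daily_cost(requests_per_day: int, tokens_per_request: float = 666) -> float:
    return (0.002 / 1000) * tokens_per_request * requests_per_day

for volume in (1_000, 100_000, 1_000_000):
    print(f"{volume:>9,} req/day -> AWS ~${aws_daily_cost(volume):,.0f}/day, "
          f"ChatGPT ~${chatgpt_daily_cost(volume):,.0f}/day")
```

At around 1,000 requests a day the API is far cheaper, while at a million requests a day the self-hosted option wins – which is the break-even logic behind the recommendation above.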

You should also keep in mind that hosting a large open-source model can be a challenge. But smaller language models like BERT, with a few hundred million parameters, are still a good option for businesses to fine-tune on domain-specific data, as sketched below.
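For illustration, here is a minimal sketch of fine-tuning a BERT-class model on domain-specific classification data with the Hugging Face transformers and datasets libraries. The CSV file names, label count, and hyperparameters are placeholders for your own setup.

```python
# Minimal sketch: fine-tuning bert-base-uncased on domain-specific text
# classification data. Expects CSV files with "text" and "label" columns
# (placeholder paths).

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-domain-model",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
```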

On the other hand, the 70B Llama 2 model generates output of similar quality to GPT-3.5 and outperforms Falcon, MPT, and Vicuna. The Llama 2 34B chat model has a win rate of about 75% against the Vicuna-33B and Falcon-40B models, and the 70B chat model wins over the PaLM-Bison chat model by a large margin.

Pros and Cons Of OpenAI LLMs

Pros

  • High-quality responses
  • Cost-effective for low usage (around 1,000 requests a day)
  • Faster time to market
  • Easy infrastructure setup and deployment
  • Minimal need for staff specialized in LLMs

Cons

  • Risk of exposing data
  • Becomes expensive over time as request volume grows
  • Vendor lock-in
  • Unclear white-label reselling license

Pros and Cons Of Open Source LLMs

Pros

  • Transparent and customizable
  • Ideal and cost-effective for high usage
  • Fine-tuned models can outperform the GPT series on domain-specific tasks
  • Flexibility to host on any infrastructure or hardware

Cons

  • Large infrastructure setup cost
  • Model quality can be lower than OpenAI models
  • Needs specialized staff to train and maintain LLMs
  • Requires the correct open-source license

Closing Thoughts

So which is better: the ChatGPT series or open-source LLMs?

Your choice depends entirely on your use case. In my opinion, ChatGPT still leads in performance and quality; its responses tend to be more relevant than those of open-source LLMs. However, with the launch of Llama 2, open-source LLMs are catching up. Moreover, depending on your business requirements, fine-tuning an open-source LLM can be more effective in both productivity and cost.

Companies that want to fine-tune LLMs on their own data for specific tasks might want to consider open-source LLMs, because OpenAI models may not perform well on domain-specific data.

Other reasons for companies to choose open-source models are transparency and control over the code and customizations. Businesses have full access to the underlying code and can customize it as needed, configuring the model to their privacy, security, performance, and compliance requirements.

Ultimately, the choice between closed models like ChatGPT and open-source LLMs depends on performance, business needs, and priorities. By weighing factors like specific task requirements, security, data privacy, and customization, you can decide which approach is best suited for you.

---------------------------------------------------------------------------------------------------

Struggling to find the right tech partner to unlock AI benefits in your business?

Ampcome is here to help. With decades of experience in data science, machine learning, and AI, I have led my team to build top-notch tech solutions for reputed businesses worldwide.

Let’s discuss how to propel your business!

If you are into AI, LLMs, Digital Transformation, and the Tech world – do follow Sarfraz Nawaz on LinkedIn.