
An End-To-End RLHF Pipeline To Train ChatGPT-Like Models Is Now Available In Microsoft AI’s DeepSpeed-Chat.


It is no exaggeration to say that models like ChatGPT have had a revolutionary impact on the online world. To make ChatGPT-style models more accessible, the open-source AI community has launched several initiatives (such as ChatLLaMa, Alpaca, etc.). These models are remarkably adaptable and can perform tasks like coding, translation, and summarization at or even beyond human levels of proficiency.

Despite these commendable efforts, a strong ChatGPT-like model still cannot be trained via a publicly accessible end-to-end RLHF pipeline. Even with access to multi-GPU clusters, current systems cannot support the fast, cheap, and simple training of state-of-the-art ChatGPT-like models with billions of parameters, and even with such computing power available, training efficiency is usually less than 5% of what these machines are capable of.

These limitations result from the fact that existing DL systems, which are designed for more conventional pre-training and fine-tuning pipelines, do not adequately support the sophisticated RLHF training pipeline employed by InstructGPT. To make RLHF training, and with it ChatGPT-like models, more accessible, the Microsoft team is releasing DeepSpeed-Chat, which offers an end-to-end RLHF pipeline for training ChatGPT-like models. It has these characteristics:

DeepSpeed-RLHF System: A robust and sophisticated system, the Hybrid Engine (DeepSpeed-HE), combines DeepSpeed’s training and inference capabilities for RLHF. With the aid of DeepSpeed-Inference optimisations such as tensor parallelism and high-performance transformer kernels for generation, together with RLHF’s memory-optimisation techniques such as ZeRO and LoRA, the Hybrid Engine can switch quickly between RLHF’s inference and training modes (a conceptual sketch follows after this list). DeepSpeed-HE is additionally aware of the entire RLHF pipeline, which helps it further optimise memory management and data movement throughout the various phases of RLHF. With its unprecedented efficiency at scale, the DeepSpeed-RLHF system lets the AI community train complex RLHF models easily, quickly, and affordably.

Efficiency and Affordability: RLHF training can be done quickly and affordably because DeepSpeed-HE is nearly 15 times faster than existing systems.

Strong Scalability: DeepSpeed-HE can handle models with hundreds of billions of parameters thanks to its excellent scalability on multi-node, multi-GPU systems.

Increasing Access to RLHF Training: With support for training on just a single GPU, DeepSpeed-HE enables data scientists without access to multi-GPU systems to build large, powerful RLHF models that can be used in real-world contexts.
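To illustrate the idea behind the Hybrid Engine described above, here is a minimal sketch in plain PyTorch, not DeepSpeed’s actual API: the toy model, stand-in reward function, and REINFORCE-style update are illustrative assumptions. It shows the two modes a single RLHF iteration alternates between, an inference-style experience-generation phase and a training-style update phase, which is exactly the switch DeepSpeed-HE accelerates with inference kernels on one side and ZeRO/LoRA memory optimisations on the other.

```python
# Illustrative toy only (plain PyTorch, not DeepSpeed-HE): one RLHF-style
# iteration alternates between inference-mode generation and training-mode
# updates. DeepSpeed-HE additionally applies tensor parallelism, fused
# transformer kernels, ZeRO, and LoRA; none of that is shown here.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, HIDDEN, GEN_LEN = 50, 32, 8

# A stand-in "actor" language model: embeds the current token, predicts the next.
actor = nn.Sequential(nn.Embedding(VOCAB, HIDDEN), nn.Linear(HIDDEN, VOCAB))
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)


def fake_reward(tokens: torch.Tensor) -> torch.Tensor:
    # Placeholder for the trained reward model's score of a generated response.
    return tokens.float().mean() / VOCAB


for step in range(3):
    # --- Phase 1: experience generation (inference mode, no gradients) ---
    actor.eval()
    with torch.no_grad():
        token = torch.randint(0, VOCAB, (1,))          # toy "prompt"
        response = [token]
        for _ in range(GEN_LEN):
            logits = actor(response[-1])
            token = torch.distributions.Categorical(logits=logits).sample()
            response.append(token)
    response = torch.cat(response)
    reward = fake_reward(response)

    # --- Phase 2: training (gradient mode); a REINFORCE-style stand-in for PPO ---
    actor.train()
    logits = actor(response[:-1])                      # re-score the generated tokens
    logp = torch.log_softmax(logits, dim=-1)
    chosen_logp = logp.gather(-1, response[1:].unsqueeze(-1)).squeeze(-1)
    loss = -(reward * chosen_logp.sum())               # raise log-prob of rewarded text
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: reward={reward.item():.3f} loss={loss.item():.3f}")
```

The point of the sketch is the mode switch itself: in a real system the generation phase dominates wall-clock time unless it runs on optimised inference kernels, which is where DeepSpeed-HE claims its speedups.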

To make the training process as efficient as possible, the researchers have integrated a complete end-to-end training pipeline into DeepSpeed-Chat, modelled after InstructGPT.

The training pipeline consists of three steps:

  1. Supervised fine-tuning (SFT): the pretrained language models are fine-tuned on carefully selected human responses to a variety of prompts.
  2. Reward model fine-tuning: a separate model (RW), typically smaller than the SFT model, is trained on a dataset containing human-provided rankings of multiple responses to the same prompt.
  3. RLHF training: the Proximal Policy Optimisation (PPO) algorithm further refines the SFT model using reward feedback from the RW model.
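A compact way to see what steps 2 and 3 optimise is to look at their loss functions. The sketch below is a hedged illustration, not DeepSpeed-Chat’s exact implementation: the tensor shapes, variable names, and clip range are assumptions. It shows the pairwise ranking loss commonly used to train the reward model and the clipped surrogate objective PPO maximises using that reward feedback.

```python
# Illustrative loss functions for steps 2 and 3; shapes and names are
# assumptions, not DeepSpeed-Chat's exact code.
import torch
import torch.nn.functional as F


def reward_ranking_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Step 2: pairwise ranking loss -- the reward model should score the
    human-preferred response higher than the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()


def ppo_clipped_loss(logp_new: torch.Tensor,
                     logp_old: torch.Tensor,
                     advantages: torch.Tensor,
                     clip: float = 0.2) -> torch.Tensor:
    """Step 3: PPO's clipped surrogate objective on per-token advantages
    derived from the reward model's feedback."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * advantages
    return -torch.min(unclipped, clipped).mean()


# Tiny smoke test with random values (hypothetical shapes: batch of 4 pairs,
# and batch of 4 responses of 16 tokens each).
scores_chosen, scores_rejected = torch.randn(4), torch.randn(4)
print(reward_ranking_loss(scores_chosen, scores_rejected))

lp_new, lp_old, adv = torch.randn(4, 16), torch.randn(4, 16), torch.randn(4, 16)
print(ppo_clipped_loss(lp_new, lp_old, adv))
```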

DeepSpeed-Chat has been open-sourced and is now accessible to the AI community. Users are encouraged to report issues, send PRs, and join discussions on the DeepSpeed GitHub page.
