Blog Archives

Red Hat AI Factory with NVIDIA Accelerates the Path to Scalable Production AI

New co-engineered offering combines Red Hat AI Enterprise and NVIDIA’s accelerated computing software to provide a unified foundation for building, deploying, and scaling AI-enabled applications

Red Hat, a leading provider of open source solutions, today announced the Red Hat AI Factory with NVIDIA, a co-engineered software platform that combines Red Hat AI Enterprise and NVIDIA AI Enterprise to provide an end-to-end AI solution optimized for organizations deploying AI at scale. Red Hat AI Factory with NVIDIA is the latest milestone in the companies’ deep collaboration, accelerating the delivery of the newest AI innovations to enterprise customers today while also delivering Day 0 support for NVIDIA hardware architectures.  

With enterprise AI spending expected to reach over $1 trillion by 2029, driven in large part by agentic AI applications, organizations are looking to shift their strategies toward high-density, agentic workflows and address the resulting demands on AI inference and infrastructure. To help organizations keep pace, Red Hat AI Factory with NVIDIA empowers IT operations teams to streamline management of both traditional infrastructure and the evolving demands of the AI stack. 

“The shift from AI experimentation to industrial-scale, enterprise-wide production requires a fundamental change in how we manage the AI computing stack,” said Chris Wright, chief technology officer and senior vice president, Global Engineering, Red Hat. “We’re accelerating the path to deploy AI and move quickly to production using Red Hat AI Factory with NVIDIA. With a stable, high-performance foundation driven by our proven hybrid cloud offerings, we’re enabling our customers to own their AI strategy and scale with the same rigor they apply to their core IT platforms.”

Red Hat AI Factory with NVIDIA accelerates the path to production AI, from provisioning the underlying infrastructure to fueling higher performance for the models and GPUs driving the inference stack. This empowers IT administrators and operations teams to scale and maintain AI deployments with the same operational rigor and predictability as any enterprise workload.

This co-engineered software platform integrates the open source collaboration, engineering and support expertise of both Red Hat and NVIDIA to deliver a trusted, enterprise-grade solution. The Red Hat AI Factory with NVIDIA provides a highly scalable foundation for AI deployments across any environment, whether on-premises, in the cloud or at the edge. It includes core capabilities for high-performance AI inference, model tuning, customization and agent deployment and management, with a focus on security. This allows organizations to maintain architectural control from the datacenter to the public cloud, delivering:

  • Accelerated time-to-value: Advance to production AI with streamlined workflows and instant access to pre-configured models, including the indemnified IBM Granite family, NVIDIA Nemotron, and NVIDIA Cosmos open models, delivered as NVIDIA NIM microservices. Additionally, organizations can further align models to enterprise data using NVIDIA NeMo, reducing tuning time and cost. 
  • Optimized performance and cost: Maximize infrastructure usage and bolster inference performance with a unified, high-performance serving stack. Red Hat AI Factory with NVIDIA delivers built-in observability capabilities and taps Red Hat AI inference capabilities powered by vLLM, NVIDIA TensorRT-LLM, NVIDIA Dynamo, and NVIDIA BlueField to meet strict AI service level objectives. This helps organizations reduce the total cost of ownership (TCO) for AI by optimizing the connection between models and NVIDIA GPUs.
  • Strengthened enterprise posture: Leveraging the flexible and stable foundation of Red Hat Enterprise Linux, organizations benefit from advanced security and compliance capabilities built-in from the start that help to lower risk, save time and mitigate downtime. This delivers a security-hardened foundation for mission-critical AI workloads that require isolation and continuous verification.

“The next era of enterprise AI is about real-time action and tangible business return, and that requires an industrial-strength, hybrid foundation,” said Vlad Rozanovich, senior vice president, Infrastructure Solutions Group, Lenovo. “We can bring a scalable, enterprise-grade platform that combines Lenovo’s inferencing-optimized infrastructure with offerings like Red Hat AI Enterprise and the Red Hat AI Factory with NVIDIA, to give customers the real-time advantage – a resilient foundation for agentic AI that is deployable and manageable anywhere they operate.”

Red Hat Completes Acquisition of Neural Magic to Fuel Optimized Generative AI Innovation Across the Hybrid Cloud

Neural Magic’s generative AI performance engineering and model optimization algorithms will enhance Red Hat AI, helping make production AI more accessible and achievable

Red Hat, Inc., a leading provider of open source solutions, announced that it has completed its acquisition of Neural Magic, a pioneer in software and algorithms that accelerate generative AI (gen AI) inference workloads. With Neural Magic, Red Hat adds expertise in inference performance engineering and model optimization, helping further the company’s vision of high-performing AI workloads that directly map to unique customer use cases, wherever needed across the hybrid cloud.

“By adding Neural Magic’s expertise in gen AI performance engineering and optimization to Red Hat AI, we’re furthering our commitment to a gen AI that answers customers’ unique needs, from where workloads run to how they are tuned and trained,” said Matt Hicks, president and CEO, Red Hat.

The large language models (LLMs) underpinning today’s gen AI use cases, while innovative, are often too expensive and resource-intensive for most organizations to use effectively. To address these challenges, Red Hat views smaller, optimized and open source-licensed models driven by open innovation across compute architectures and deployment environments as key to the future success of AI strategies.

Neural Magic’s commitment to making optimized and efficient AI models a reality furthers Red Hat’s ability to deliver on this vision for AI. Neural Magic is also a leading contributor to vLLM, an open source project developed by UC Berkeley for open model serving, which will help bring even greater choice and accessibility in how organizations build and deploy AI workloads.

The future of hybrid cloud-ready gen AI

With Neural Magic’s technology and performance engineering expertise, Red Hat aims to break through the challenges of wide-scale enterprise AI, using open source innovation to further democratize access to AI’s transformative power via:

  • Open source-licensed models, from 1 billion to hundreds of billions of parameters, that can run anywhere and everywhere needed across the hybrid cloud – in corporate data centers, on multiple clouds and at the edge;
  • Fine-tuning capabilities that enable organizations to more easily customize LLMs to their private data and use cases with a stronger security footprint;
  • Inference performance engineering expertise, resulting in greater operational and infrastructure efficiencies; and
  • A partner and open source ecosystem and support structures that enable broader customer choice, from LLMs and tooling to certified server hardware and underlying chip architectures.

The concept of choice is as crucial for gen AI today as it was for cloud-native or containerized applications several years ago: the right environment (cloud, server, edge, etc.), accelerated compute and inference server are all critical to a successful gen AI strategy. Red Hat remains firm in its commitment to customer choice across the hybrid cloud, including for AI, with the acquisition of Neural Magic further supporting this promise.

Red Hat AI: An open source backbone for AI

The expertise and capabilities of Neural Magic will be incorporated into Red Hat AI, Red Hat’s portfolio of gen AI platforms. Built with the hybrid cloud in mind, Red Hat AI encompasses:

  • Red Hat Enterprise Linux AI (RHEL AI), a foundation model platform to more seamlessly develop, test and run the IBM Granite family of open source-licensed LLMs for enterprise applications on Linux server deployments;
  • Red Hat OpenShift AI, an AI platform that provides tools to rapidly develop, train, serve and monitor machine learning models across distributed Kubernetes environments on-site, in the public cloud or at the edge; and
  • InstructLab, an approachable open source AI community project created by Red Hat and IBM that enables anyone to shape the future of gen AI via the collaborative improvement of open source-licensed Granite LLMs using InstructLab’s fine-tuning technology.

vLLM, LLM Compressor, pre-optimized models and more are all slated to be incorporated into Red Hat AI, making Neural Magic an integral piece of Red Hat’s AI platform offerings.

Matt Hicks, president and CEO, Red Hat

“Efficiency, optimization and choice aren’t unique concepts when it comes to traditional enterprise IT, and we feel that gen AI should be no different,” said Matt Hicks, president and CEO, Red Hat.

“Neural Magic’s research and technical contributions to open source AI have significantly reduced the infrastructure required to deploy state-of-the-art large language models at scale,” said Brian Stevens, CEO, Neural Magic. “Red Hat shares our vision that the Future of AI is Open, and we are looking forward to together enabling enterprises to capture the value of GenAI without all of the friction.”

Red Hat Delivers Next Wave of Gen AI Innovation with New Red Hat Enterprise Linux AI Capabilities

RHEL AI 1.3 adds the Granite 3.0 8b model, helps simplify the preparation of AI training data and expands support for the latest accelerated compute hardware

Red Hat, Inc., a leading provider of open source solutions, today announced the latest release of Red Hat Enterprise Linux AI (RHEL AI), Red Hat’s foundation model platform for more seamlessly developing, testing and running generative artificial intelligence (gen AI) models for enterprise applications. RHEL AI 1.3 brings support for the latest advancements in the Granite large language model (LLM) family and incorporates open source advancements for data preparation while still maintaining expanded choice for hybrid cloud deployments, including the underlying accelerated compute architecture.

According to IDC’s “Market Analysis Perspective: Open GenAI, LLMs, and the Evolving Open Source Ecosystem,” 61% of respondents plan to use open source foundation models for gen AI use cases, while more than 56% of deployed foundation models are already open source. Red Hat sees this trend validating the company’s vision for enterprise gen AI, which calls for:

  • Smaller, open source-licensed models that can run anywhere and everywhere needed across the hybrid cloud.
  • Fine-tuning capabilities that enable organizations to more easily customize LLMs to private data and specific use cases.
  • Optimized and more efficient AI models driven by inference performance engineering expertise.
  • The backing of a strong partner and open source ecosystem for broader customer choice.

RHEL AI forms a key pillar for Red Hat’s AI vision, bringing together the open source-licensed Granite model family and InstructLab model alignment tools, based on the Large-scale Alignment for chatBots (LAB) methodology. These components are then packaged as an optimized, bootable Red Hat Enterprise Linux image for individual server deployments anywhere across the hybrid cloud.

Support for Granite 3.0 LLMs

RHEL AI 1.3 extends Red Hat’s commitment to Granite LLMs with support for Granite 3.0 8b English language use cases. Granite 3.0 8b is a converged model, supporting not only English but a dozen other natural languages, code generation and function calling. Non-English language use cases, as well as code and functions, are available as a developer preview within RHEL AI 1.3, with the expectation that these capabilities will be supported in future RHEL AI releases.

Simplifying data preparation with Docling

Recently open sourced by IBM Research, Docling is an upstream community project that helps parse common document formats and convert them into formats like Markdown and JSON, preparing this content for gen AI applications and training. RHEL AI 1.3 now incorporates this innovation as a supported feature, enabling users to convert PDFs into Markdown for simplified data ingestion for model tuning with InstructLab.

Through Docling, RHEL AI 1.3 now also includes context-aware chunking, which takes into account the structure and semantic elements of the documents used for gen AI training. This helps the resulting gen AI applications maintain greater coherency and deliver contextually appropriate responses to questions and tasks, which would otherwise require further tuning and alignment.
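To make the idea of context-aware chunking concrete, the sketch below shows a minimal, hypothetical version of the technique in plain Python: rather than cutting a document at arbitrary character offsets, it splits a Markdown file on headings so every chunk keeps whole sections together. This is an illustration of the concept only, not Docling's actual implementation; the function name and the word-count budget are invented for the example.

```python
# Illustrative sketch of context-aware chunking (NOT Docling's actual
# implementation): split a Markdown document on headings so each chunk
# respects section structure instead of cutting at arbitrary offsets.

def chunk_markdown(text: str, max_words: int = 120) -> list[str]:
    """Split Markdown into chunks that honor heading boundaries."""
    sections: list[list[str]] = []
    for line in text.splitlines():
        # Start a new section at every heading (or for the first line).
        if line.startswith("#") or not sections:
            sections.append([])
        sections[-1].append(line)

    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for section in sections:
        words = sum(len(l.split()) for l in section)
        # Flush the current chunk if this section would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.extend(section)
        count += words
    if current:
        chunks.append("\n".join(current))
    return chunks


doc = "# Intro\nSome overview text.\n\n# Details\n" + "word " * 150
chunks = chunk_markdown(doc, max_words=100)
```

A structure-blind chunker might split mid-sentence or separate a heading from its body; keeping sections intact is what preserves the coherency the release describes.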

Future RHEL AI releases will continue to support and refine Docling components, including additional document formats as well as integration for retrieval-augmented generation (RAG) pipelines in addition to InstructLab knowledge tuning.

Broadening the gen AI ecosystem

Choice is a fundamental component of the hybrid cloud and with gen AI serving as a signature workload for hybrid environments, this optionality needs to start with the underlying chip architectures. RHEL AI already supports leading accelerators from NVIDIA and AMD, and the 1.3 release now includes Intel Gaudi 3 as a technology preview.

Beyond chip architecture, RHEL AI is supported across major cloud providers, including the AWS, Google Cloud and Microsoft Azure consoles, as a “bring your own subscription” (BYOS) offering. The platform will also soon be available as an optimized and validated solution option on Azure Marketplace and AWS Marketplace.

RHEL AI is available as a preferred foundation model platform on accelerated hardware offerings from Red Hat partners, including Dell PowerEdge R760xa servers and Lenovo ThinkSystem SR675 V3 servers.

Model serving improvements with Red Hat OpenShift AI

As users look to scale out the serving of LLMs, Red Hat OpenShift AI now supports parallelized serving across multiple nodes with vLLM runtimes, providing the ability to handle multiple requests in real-time. Red Hat OpenShift AI also allows users to dynamically alter an LLM’s parameters when being served, such as sharding the model across multiple GPUs or quantizing the model to a smaller footprint. These improvements are aimed at speeding up response time for users, increasing customer satisfaction and lowering churn.
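One of the serving levers mentioned above is quantizing a model to a smaller footprint. The snippet below is a deliberately simplified, hypothetical illustration of why that shrinks memory use: symmetric int8 quantization with a single scale factor, written with NumPy. Production serving stacks such as vLLM use far more sophisticated schemes (per-channel scales, activation-aware methods), so treat this only as a sketch of the underlying arithmetic.

```python
import numpy as np

# Minimal sketch of weight quantization (symmetric int8). This is an
# illustration of the concept, not how vLLM or OpenShift AI quantize models.

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 weights to int8 using one global scale factor."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

# int8 storage is one quarter the size of float32 storage.
assert q.nbytes * 4 == w.nbytes
# Round-trip reconstruction error is bounded by the quantization step.
err = float(np.abs(dequantize(q, scale) - w).max())
```

The 4x reduction in weight memory is what lets the same GPU hold a larger model or serve more concurrent requests, at the cost of a small, bounded reconstruction error.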

Supporting Red Hat AI

RHEL AI, along with Red Hat OpenShift AI, underpins Red Hat AI, Red Hat’s portfolio of solutions that accelerate time to market and reduce the operational cost of delivering AI solutions across the hybrid cloud. RHEL AI supports individual Linux server environments, while Red Hat OpenShift AI powers distributed Kubernetes platform environments and provides integrated machine learning operations (MLOps) capabilities. The two solutions are designed to work together, with Red Hat OpenShift AI set to incorporate all of RHEL AI’s capabilities for delivery at scale.

Availability

RHEL AI 1.3 is now generally available. More information on additional features, improvements, bug fixes and how to upgrade to the latest version can be found here.

Red Hat Delivers Accessible, Open Source Generative AI Innovation with Red Hat Enterprise Linux AI 

The offering is the first to deliver supported, indemnified and open source-licensed IBM Granite LLMs under Red Hat’s flexible and proven enterprise subscription model. Adds open source InstructLab model alignment tools to the world’s leading enterprise Linux platform to simplify generative AI model experimentation and alignment tuning

Red Hat, Inc., a leading provider of open source solutions, today announced the launch of Red Hat Enterprise Linux AI (RHEL AI), a foundation model platform that enables users to more seamlessly develop, test and deploy generative AI (GenAI) models. RHEL AI brings together the open source-licensed Granite large language model (LLM) family from IBM Research, InstructLab model alignment tools based on the LAB (Large-scale Alignment for chatBots) methodology and a community-driven approach to model development through the InstructLab project. The entire solution is packaged as an optimized, bootable RHEL image for individual server deployments across the hybrid cloud and is also included as part of OpenShift AI, Red Hat’s hybrid machine learning operations (MLOps) platform, for running models and InstructLab at scale across distributed cluster environments. 

The launch of ChatGPT generated tremendous interest in GenAI, with the pace of innovation only accelerating since then. Enterprises have begun moving from early evaluations of GenAI services to building out AI-enabled applications. A rapidly growing ecosystem of open model options has spurred further AI innovation and illustrated that there won’t be “one model to rule them all.” Customers will benefit from an array of choices to address specific requirements, all of which stands to be further accelerated by an open approach to innovation. 

Implementing an AI strategy requires more than simply selecting a model; technology organizations need the expertise to tune a given model for their specific use case, as well as to manage the significant costs of AI implementation. The scarcity of data science skills is compounded by substantial financial requirements, including: 

  • Procuring AI infrastructure or consuming AI services 
  • The complex process of tuning AI models for specific business needs 
  • Integrating AI into enterprise applications 
  • Managing both the application and model lifecycle 

To truly lower the entry barriers for AI innovation, enterprises need to be able to expand the roster of who can work on AI initiatives while simultaneously getting these costs under control. With InstructLab alignment tools, Granite models and RHEL AI, Red Hat aims to apply the benefits of true open source projects – freely accessible and reusable, transparent and open to contributions – to GenAI in an effort to remove these obstacles. 

Building AI in the open with InstructLab 

IBM Research created the Large-scale Alignment for chatBots (LAB) technique, an approach for model alignment that uses taxonomy-guided synthetic data generation and a novel multi-phase tuning framework. This approach makes AI model development more open and accessible to all users by reducing reliance on expensive human annotations and proprietary models. Using the LAB method, models can be improved by specifying skills and knowledge attached to a taxonomy, generating synthetic data from that information at scale to influence the model and using the generated data for model training. 
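The sketch below illustrates the taxonomy-guided step of this idea in plain Python: skills live at the leaves of a nested taxonomy, and each leaf's seed examples are expanded into synthetic training prompts. The taxonomy contents, function names and string templates here are all invented for illustration; in the real LAB pipeline a teacher model, not a template, generates the synthetic data.

```python
# Hypothetical sketch in the spirit of taxonomy-guided synthetic data
# generation: skills form a nested taxonomy, and each leaf's seed examples
# are expanded into synthetic prompts. (The actual LAB method uses a
# teacher model for generation; templates here are a simple stand-in.)

taxonomy = {
    "writing": {
        "summarization": ["Summarize this meeting transcript."],
        "email": ["Draft a status update email."],
    },
    "coding": {
        "python": ["Write a function that reverses a string."],
    },
}

def walk(node: dict, path: tuple = ()):
    """Yield (taxonomy path, seed example) pairs from the nested taxonomy."""
    for key, value in node.items():
        if isinstance(value, dict):
            yield from walk(value, path + (key,))
        else:
            for seed in value:
                yield path + (key,), seed

def generate_synthetic(node: dict, variants_per_seed: int = 3) -> list[dict]:
    """Expand each seed into templated variants (stand-in for a teacher model)."""
    templates = [
        "{seed}",
        "As an expert in {skill}, {seed}",
        "Provide a step-by-step answer: {seed}",
    ]
    samples = []
    for path, seed in walk(node):
        for t in templates[:variants_per_seed]:
            samples.append({
                "skill": "/".join(path),
                "prompt": t.format(skill=path[-1], seed=seed.lower()),
            })
    return samples

data = generate_synthetic(taxonomy)
```

The point of anchoring generation to a taxonomy is that coverage is controlled by the tree's structure: contributors add a skill or knowledge node, and the generation step scales the data from there rather than relying on expensive human annotation.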

After seeing that the LAB method could help significantly improve model performance, IBM and Red Hat decided to launch InstructLab, an open source community built around the LAB method and the open source Granite models from IBM. The InstructLab project aims to put LLM development into the hands of developers by making, building and contributing to an LLM as simple as contributing to any other open source project. 

As part of the InstructLab launch, IBM has also released a family of select Granite English language and code models in the open. These models are released under an Apache license with transparency on the datasets used to train these models. The Granite 7B English language model has been integrated into the InstructLab community, where end users can contribute the skills and knowledge to collectively enhance this model, just as they would when contributing to any other open source project. Similar support for Granite code models within InstructLab will be available soon. 

Open source AI innovation on a trusted Linux backbone 

RHEL AI builds on this open approach to AI innovation, incorporating an enterprise-ready version of the InstructLab project and the Granite language and code models along with the world’s leading enterprise Linux platform to simplify deployment across a hybrid infrastructure environment. This creates a foundation model platform for bringing open source-licensed GenAI models into the enterprise. RHEL AI includes: 

  • Open source-licensed Granite language and code models that are supported and indemnified by Red Hat. 
  • A supported, lifecycled distribution of InstructLab that provides a scalable, cost-effective solution for enhancing LLM capabilities and making knowledge and skills contributions accessible to a much wider range of users. 
  • Optimized bootable model runtime instances, with Granite models and InstructLab tooling packaged as bootable RHEL images via RHEL image mode, including optimized PyTorch runtime libraries and accelerators for AMD Instinct™ MI300X, Intel and NVIDIA GPUs and NeMo frameworks. 
  • Red Hat’s complete enterprise support and lifecycle promise that starts with a trusted enterprise product distribution, 24×7 production support and extended lifecycle support. 

As organizations experiment and tune new AI models on RHEL AI, they have a ready on-ramp for scaling these workflows with Red Hat OpenShift AI, which will include RHEL AI, and where they can leverage OpenShift’s Kubernetes engine to train and serve AI models at scale and OpenShift AI’s integrated MLOps capabilities to manage the model lifecycle. IBM’s watsonx.ai enterprise studio, which is built on Red Hat OpenShift AI today, will benefit from the inclusion of RHEL AI in OpenShift AI upon availability, bringing additional capabilities for enterprise AI development, data management, model governance and improved price performance. 

The cloud is hybrid. So is AI. 

For more than 30 years, open source technologies have paired rapid innovation with greatly reduced IT costs and lowered barriers to innovation. Red Hat has been leading this charge for nearly as long, from delivering open enterprise Linux platforms with RHEL in the early 2000s to driving containers and Kubernetes as the foundation for open hybrid cloud and cloud-native computing with Red Hat OpenShift. 

This drive continues with Red Hat powering AI/ML strategies across the open hybrid cloud, enabling AI workloads to run where data lives, whether in the datacenter, multiple public clouds or at the edge. More than just the workloads, Red Hat’s vision for AI brings model training and tuning down this same path to better address limitations around data sovereignty, compliance and operational integrity. The consistency delivered by Red Hat’s platforms across these environments, no matter where they run, is crucial in keeping AI innovation flowing. 

RHEL AI and the InstructLab community further deliver on this vision, breaking down many of the barriers to experimenting with and building AI models while providing the tools, data and concepts needed to fuel the next wave of intelligent workloads. 

Availability 

Red Hat Enterprise Linux AI is now available as a developer preview. IBM Cloud, whose GPU infrastructure is used to train the Granite models and support InstructLab, will also add support for RHEL AI and OpenShift AI. This integration will allow enterprises to more easily deploy generative AI into their mission-critical applications.