Apple is partnering with Nvidia to improve the speed of its artificial intelligence (AI) models. The company announced that it has been exploring inference acceleration on the Nvidia platform to see whether it is possible to improve both the efficiency and the latency of a large language model (LLM) at the same time.
Apple used a technique called Recurrent Drafter (ReDrafter), combining it with the Nvidia TensorRT-LLM inference acceleration framework.
Apple researchers detailed the new collaboration with Nvidia to improve LLM performance and the results achieved from it.
The company highlighted that it was tackling the problem of improving inference efficiency while maintaining latency in AI models. Inference in machine learning refers to the process of making predictions, decisions, or conclusions from a given set of inputs using a trained model.
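To make the definition of inference concrete, here is a minimal sketch: a forward computation that applies already-learned parameters to new input. The model and its weights are hypothetical stand-ins for parameters produced by a prior training phase; they are not related to Apple's models.

```python
# Minimal illustration of inference: applying a trained model's
# parameters to new input. No learning happens at this stage,
# only a forward computation.
def predict(features, weights, bias):
    """Linear model inference: weighted sum of features plus bias."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# "Trained" parameters (hypothetical, for illustration only)
weights = [0.4, -0.2, 0.1]
bias = 0.5

# The inference step: turn a new input into a prediction
print(predict([1.0, 2.0, 3.0], weights, bias))
```

The latency of this step, repeated once per generated token, is what dominates the cost of running an LLM, which is why techniques that produce several tokens per step matter.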
ReDrafter, the technique at the centre of this work, was first announced
earlier this year.
Apple has released and open-sourced ReDrafter, bringing a new approach to speculative decoding. Using a recurrent neural network (RNN) draft model, it combines beam search (a mechanism where the AI explores multiple candidate solutions) with dynamic tree attention (an attention mechanism that processes tree-structured candidate data).
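The speculative decoding idea underlying ReDrafter can be sketched in a few lines: a cheap draft model proposes several tokens ahead, and the expensive target model verifies them, keeping the longest matching prefix. The two toy models below (simple next-token rules) are hypothetical stand-ins, not Apple's RNN drafter or an actual LLM.

```python
# Toy sketch of speculative decoding. A cheap "draft" model guesses
# several tokens ahead; the expensive "target" model verifies them and
# keeps the prefix that matches, so multiple tokens can be produced in
# one step. Both models here are illustrative stand-ins.

def draft_model(context, k=4):
    # Cheap drafter: proposes the next k tokens from a simple rule.
    return [(context[-1] + i + 1) % 10 for i in range(k)]

def target_model(context):
    # Stand-in for the expensive model's true next token.
    return (context[-1] + 1) % 10

def speculative_step(context, k=4):
    draft = draft_model(context, k)
    accepted = []
    for tok in draft:
        if tok == target_model(context + accepted):
            accepted.append(tok)   # draft token verified, keep it
        else:
            break                  # mismatch: discard the rest of the draft
    if len(accepted) < k:
        # On a mismatch, fall back to the target model's own token.
        accepted.append(target_model(context + accepted))
    return context + accepted

print(speculative_step([3]))  # → [3, 4, 5, 6, 7]
```

When the drafter's guesses agree with the target model, as they do here, one verification step yields several tokens, which is the source of the speedup this kind of technique delivers.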