The post NVIDIA Enhances cuML Accessibility by Reducing CUDA Binary Size for PyPI Distribution appeared on BitcoinEthereumNews.com. Timothy Morano Dec 15, 2025

NVIDIA Enhances cuML Accessibility by Reducing CUDA Binary Size for PyPI Distribution

2025/12/16 15:44


Timothy Morano
Dec 15, 2025 18:01

NVIDIA introduces pip-installable cuML wheels on PyPI, simplifying installation and broadening accessibility by reducing CUDA binary sizes.

NVIDIA has announced a significant improvement for users of its cuML library by reducing the size of CUDA binaries, enabling direct distribution on PyPI. This marks a pivotal step in making cuML more accessible, especially for those in corporate environments who rely on internal PyPI mirrors, according to NVIDIA’s blog.

Streamlined Installation Process

With the release of version 25.10, cuML wheels can be installed directly from PyPI, with no need for complex installation steps or Conda environment management. Installing cuML now takes a single pip command, as with any other Python package.

Challenges in Binary Size Reduction

The main hurdle was the size of cuML's CUDA C++ libraries, which were previously too large to host on PyPI. To address this, NVIDIA worked with the Python Software Foundation (PSF) to reduce the binary size enough for PyPI hosting, which is what allows users to install cuML directly.

Installation Guidance

For users installing cuML, NVIDIA has provided specific pip commands based on the CUDA version:

  • For CUDA 13: pip install cuml-cu13 (Wheel size: ~250 MB)
  • For CUDA 12: pip install cuml-cu12 (Wheel size: ~470 MB)
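The choice between the two commands above can be scripted. The following is a minimal Python sketch, assuming the caller already knows the CUDA major version of the target environment. The helper name `cuml_pip_command` and the `CUML_WHEELS` table are hypothetical; only the package names and approximate wheel sizes come from the article.

```python
import sys

# Package names and approximate wheel sizes (MB) as listed in the article.
# The table and helper below are illustrative, not part of cuML or pip.
CUML_WHEELS = {
    12: ("cuml-cu12", 470),
    13: ("cuml-cu13", 250),
}


def cuml_pip_command(cuda_major):
    """Return the pip invocation (as an argument list) for a CUDA major version."""
    if cuda_major not in CUML_WHEELS:
        raise ValueError(f"no cuML wheel listed for CUDA {cuda_major}")
    package, _approx_mb = CUML_WHEELS[cuda_major]
    # Running pip through the current interpreter targets the active environment.
    return [sys.executable, "-m", "pip", "install", package]


command = cuml_pip_command(13)
print(" ".join(command[1:]))  # -m pip install cuml-cu13
```

The returned list can be passed to `subprocess.run(command, check=True)` to perform the installation; the sketch only builds the command so it can be inspected first.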

Binary Size Optimization Techniques

NVIDIA used several optimization techniques to cut the binary size by approximately 30%. These included identifying and eliminating bloat in the CUDA C++ codebase, which shrank the CUDA 12 libcuml dynamic shared object from 690 MB to 490 MB (a saving of 200 MB, or about 29%). Smaller binaries mean faster downloads, lower storage and bandwidth costs, and quicker container builds for deployment.
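The quoted sizes can be checked against the "approximately 30%" figure with a few lines of arithmetic. This Python sketch uses only the two sizes given in the article; the function name `size_reduction` is illustrative.

```python
# Sizes of the CUDA 12 libcuml shared object quoted in the article, in MB.
SIZE_BEFORE_MB = 690
SIZE_AFTER_MB = 490


def size_reduction(before_mb, after_mb):
    """Return (absolute saving in MB, saving as a percentage of the original)."""
    saved_mb = before_mb - after_mb
    return saved_mb, 100.0 * saved_mb / before_mb


saved_mb, saved_pct = size_reduction(SIZE_BEFORE_MB, SIZE_AFTER_MB)
print(f"saved {saved_mb} MB ({saved_pct:.1f}%)")  # saved 200 MB (29.0%)
```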

Understanding CUDA Compilation

CUDA binaries are inherently large because they contain many kernels: each kernel template is instantiated for every combination of its template parameters and compiled for every supported GPU architecture, so the number of kernel variants is the cross-product of the two. NVIDIA's approach separates kernel function definitions from their declarations so that each kernel is compiled in exactly one translation unit (TU), which reduces duplication and binary size.
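The cross-product effect is easy to see with a simplified counting model. The Python sketch below assumes hypothetical template parameter values and GPU architectures; they are not cuML's actual parameters or build targets. It also assumes, as a simplification, that every translation unit that instantiates a kernel carries its own compiled copy.

```python
from itertools import product

# Illustrative values only; not cuML's actual template parameters or targets.
DTYPES = ["float", "double"]
INDEX_TYPES = ["int32", "int64"]
ARCHS = ["sm_70", "sm_80", "sm_90"]


def kernel_variants(dtypes, index_types, archs):
    """Enumerate one kernel's variants: template parameters x architectures."""
    return list(product(dtypes, index_types, archs))


def compiled_copies(n_variants, n_translation_units):
    """Simplified model: each TU that instantiates the kernel embeds every variant."""
    return n_variants * n_translation_units


variants = kernel_variants(DTYPES, INDEX_TYPES, ARCHS)
print(len(variants))                       # 12 variants of a single kernel
print(compiled_copies(len(variants), 5))   # 60 copies if five TUs instantiate it
print(compiled_copies(len(variants), 1))   # 12 copies if only one TU compiles it
```

Under this model, confining a kernel's definition to a single translation unit removes the multiplier from the number of TUs, which is the kind of duplication the article describes.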

Future Prospects

By making these improvements, NVIDIA aims to assist other developers working with CUDA C++ libraries in managing binary sizes effectively. This initiative not only benefits cuML users but also encourages a broader adoption of CUDA C++ libraries by making them more manageable and accessible.

For further insights on CUDA programming and optimization techniques, developers can refer to NVIDIA’s CUDA Programming Guide.


Source: https://blockchain.news/news/nvidia-enhances-cuml-accessibility-reducing-cuda-binary-size-pypi-distribution

