The post DirectML: Accelerating AI on Windows, now with NPUs appeared first on Windows AI Platform.
A neural processing unit (NPU) is a processor built for machine learning (ML) workloads that are computationally intensive and do not require graphics interaction, and it handles them with efficient power consumption. These new devices will revolutionize how AI transforms our day-to-day experiences. We are excited to share that early next year we will release DirectML support for Intel® Core Ultra processors with Intel® AI Boost, the new integrated NPU.
DirectML is a low-level, hardware-abstracted API that provides direct access to the hardware capabilities of modern devices, such as GPUs, for ML workloads. It is part of the DirectX family, the Windows graphics and gaming platform, and is designed to integrate with other DirectX components, such as DirectX 12. DirectML integrates with popular ML and tooling frameworks, such as ONNX Runtime, the cross-platform inference engine, and Olive, the Windows optimization tooling framework for ML models, thus simplifying the development and deployment of AI experiences across the Windows ecosystem.
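As a rough illustration of the ONNX Runtime integration, the sketch below asks ONNX Runtime for the DirectML execution provider first and falls back to the CPU provider. It assumes the onnxruntime-directml Python package is installed. The model path passed to create_session is a placeholder, and the helper names are ours, not part of any library.

```python
# Providers in order of preference: DirectML first, CPU as the fallback.
PREFERRED = ["DmlExecutionProvider", "CPUExecutionProvider"]


def choose_providers(available):
    """Keep the preferred providers that this ONNX Runtime build offers."""
    return [p for p in PREFERRED if p in available]


def create_session(model_path):
    """Create an inference session that uses DirectML when it is available.

    Assumes the onnxruntime-directml package; model_path is a placeholder
    for an ONNX model file on disk.
    """
    import onnxruntime as ort  # imported lazily so the helper above has no dependencies

    providers = choose_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)


# The selection logic can be checked without ONNX Runtime installed.
print(choose_providers(["CPUExecutionProvider", "DmlExecutionProvider"]))
```

Listing the CPU provider last means the same code still runs on a machine without a DirectML-capable device.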
By extending the hardware acceleration capabilities to include NPU support in DirectML, we are opening new possibilities for AI on Windows. DirectML with NPU support will be in developer preview in early 2024, along with the latest ONNX Runtime release, with broadening support over 2024. Stay tuned for more announcements with key partners, expanded capabilities, and how to use DirectML for NPUs.
We can’t wait to see the amazing AI experiences you will create on Windows, with DirectML on Intel® Core Ultra processors.
More information from our partner Intel®: https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Unlocking-Intel-s-Neural-Processing-Unit-with-DirectML/post/1553676
The post Boost Your Gen AI Experience with our DirectML extension for Automatic1111’s WebUI appeared first on Windows AI Platform.
We didn’t want to stop there, since many users access Stable Diffusion through Automatic1111’s webUI, a popular and versatile interface for Stable Diffusion.
Luckily this webUI supports extensions, and today we’re thrilled to reveal our DirectML extension!
Specifically, our extension offers DirectML support for the compute-heavy uNet models in Stable Diffusion. This unlocks the ability to run Automatic1111’s webUI performantly on a wide range of GPUs from different vendors across the Windows ecosystem.
https://github.com/microsoft/Stable-Diffusion-WebUI-DirectML
Our DirectML extension means that users can see performance wins after running an Olive optimization pass.
These perf wins exist across a range of hardware – up to 2.8x in our tests!
Note that the unoptimized and optimized Stable Diffusion models were in the ONNX format in our tests to ensure an apples-to-apples comparison. Performance was measured inside Automatic1111’s webUI.
Thank you to our partners for their work enabling Stable Diffusion on DirectML! For more from our partners, see:
To install our extension, launch the webUI and open the Extensions tab. Once there, go to Install from URL and paste in this URL (https://github.com/microsoft/Stable-Diffusion-WebUI-DirectML) before clicking Install.
This extension runs best after a converted Stable Diffusion model has been put through an Olive optimization pass.
Stable Diffusion models with different checkpoints and/or weights but the same architecture and layers as Stable Diffusion 1.5, 2.0 and 2.1 are also compatible. See our sample to get started.
Once this is complete, follow our extension’s readme for next steps.
We recommend upgrading to the latest drivers for the best performance.
We cannot guarantee that all existing Automatic1111 functionality works with our DirectML extension. We welcome contributions from the community though; please see https://github.com/microsoft/Stable-Diffusion-WebUI-DirectML#contributing
The post Announcing preview support for Llama 2 in DirectML appeared first on Windows AI Platform.
We now have a sample showing our progress with Llama 2 7B!
See https://github.com/microsoft/Olive/tree/main/examples/directml/llama_v2
This sample relies on first doing an optimization pass on the model with Olive, a powerful optimization tool for ONNX models. Olive utilizes powerful graph fusion optimizations from the ONNX Runtime and a model architecture optimized for DirectML to speed up inference times by up to 10X!
After this optimization pass, Llama 2 7B runs fast enough that you can have a conversation in real time on multiple vendors’ hardware!
We’ve also built a little UI to make it easy to see the optimized model in action.
Thank you to our hardware partners who helped make this happen. For more on how Llama 2 lights up on our partners’ hardware with DirectML, see:
We’re excited about this milestone, but this is only a first peek – stay tuned for future enhancements to support even larger models, fine-tuning and lower-precision data types.
To run our Olive optimization pass in our sample you should first request access to the Llama 2 weights from Meta.
We recommend upgrading to the latest drivers for the best performance.
The post DirectML at Build 2023 appeared first on Windows AI Platform.
We are also excited to announce the launch of our new product landing page! This new product page provides the information you need to bring performant, cross-hardware AI acceleration into your app.
We partnered with Adobe and Intel to showcase how DirectML makes it possible for developers to integrate machine learning models into their applications to leverage next-generation hardware. Want to learn more? Check out Adobe Premiere Pro leverages DirectML on new AI Silicon for all the exciting details!
Get ready to take your AI models to the next level with Olive (ONNX Live), a powerful tool for optimizing ONNX models that integrates seamlessly with ONNX Runtime and DirectML.
With Olive, you’ll be able to optimize your models like never before, thanks to its advanced techniques that incorporate cutting-edge model compression, optimization, and compilation methods. When you pair Olive’s capabilities with DirectML, you’ll get lightning-fast hardware acceleration across the entire range of Windows GPUs.
Find out more in our blog post: Optimize DirectML performance with Olive.
Text-to-image models, like Stable Diffusion, convert natural language into remarkable images. DirectML optimizations for the Windows hardware ecosystem enhance the performance of transformer and diffusion models, including Stable Diffusion, enabling more efficient execution. The DirectML optimizations aim to empower developers to seamlessly integrate AI hardware acceleration into their applications at scale. Check out our DirectML Stable Diffusion blog post to learn more.
We’re entering a new era of AI experiences that span across the cloud and edge. During Build last year, Hybrid Loop was first introduced, and this year we are thrilled to announce that the Hybrid Loop has become a reality. Our goal is to empower developers by reducing their workload and enabling seamless hybrid inferencing across Azure and client devices – DirectML plays a key role in this, allowing developers to scale inferencing to GPUs and soon to NPUs.
Together with Olive and ONNX Runtime, DirectML is a part of a cutting-edge hybrid platform that enables efficient deployment of AI experiences across the Windows hardware ecosystem. We can’t wait to see what you’ll create with this groundbreaking technology!
The post Adobe Premiere Pro leverages DirectML on new AI Silicon appeared first on Windows AI Platform.
Adobe Premiere Pro leverages DirectML today to power many features with AI, such as Auto Reframe and Scene Edit Detection:
DirectML with ONNX Runtime empowers Adobe to light up hardware acceleration for these features, with the same code path across a range of different GPUs. A truly simplified integration experience!
Together with our partners at Intel we worked with Adobe to demo Auto Reframe and Scene Edit Detection on Intel’s next-generation platform, Meteor Lake with integrated VPU. Check out Intel’s blog for more Meteor Lake VPU details: New AI Engine Delivers Power Efficient AI with DirectML
Thanks to DirectML, Adobe could bring these existing ML features, which run on current GPUs, to Intel’s new silicon. Here’s what Adobe had to say:
“Innovative AI-powered Adobe Premiere Pro features including Scene Edit Detection and Auto Reframe are enabling creators to rapidly produce world-changing content. We trust DirectML to enable fast and reliable AI experiences on Windows, ensuring seamless performance across devices. DirectML helps us meet the needs of customers of any size, making great use of new AI accelerator hardware such as the VPU in Intel’s upcoming Meteor Lake Platform while freeing up CPU and GPU resources for other pro video tasks.”
– Sriram Iyer, Head of Product & Partnerships, Digital Audio & Video, Adobe
This is just the beginning of the next generation of artificial intelligence on Windows. Along with our partners and our decades of GPU expertise, DirectML is extending to support a whole new class of AI silicon for a seamless developer experience. The demo provides a glimpse into what DirectML is working to make possible, stretching capabilities across the varied Windows hardware ecosystem.
To learn more about DirectML, visit our website.
The post Optimize DirectML performance with Olive appeared first on Windows AI Platform.
With Olive, you can easily incorporate cutting-edge techniques like model compression, optimization, and compilation, all in one powerful tool. And the best part? You don’t need to be an expert in optimizing models for underlying GPUs or NPUs – Olive does all the heavy lifting for you to get the best possible performance with DirectML!
In our Stable Diffusion tests, we saw a speedup of over 6x in image generation after optimizing with Olive for DirectML!
The Olive workflow consists of configuring passes to optimize a model for one or more metrics. Olive then executes each pass to find the best candidate model. Our recommended passes for GPU optimization with DirectML are as follows:
For configuring multi-model pipelines (e.g. Stable Diffusion), see our sample on the Olive repository. To learn more about configuring Olive passes, visit: Configuring Pass — Olive documentation (microsoft.github.io)
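The workflow described above (configure passes, run each one, keep the best candidate) can be pictured with a toy sketch. It deliberately does not use Olive's API. The "model" is a plain dictionary, the two passes are hypothetical stand-ins, and the latency figures are invented purely to make the selection step visible.

```python
# Toy stand-ins: a "model" is a dict and a "pass" is a function returning a new dict.
def convert_to_onnx(model):
    """Hypothetical conversion pass: changes the format, not the speed."""
    return {**model, "format": "onnx"}


def cast_to_float16(model):
    """Hypothetical precision pass; the halved latency is an invented number."""
    return {**model, "dtype": "float16", "latency_ms": model["latency_ms"] / 2}


def run_passes(model, passes):
    """Apply each pass in order, collecting every intermediate candidate."""
    candidates = [model]
    for optimization_pass in passes:
        candidates.append(optimization_pass(candidates[-1]))
    return candidates


def best_candidate(candidates, metric="latency_ms"):
    """Pick the candidate with the lowest value for the chosen metric."""
    return min(candidates, key=lambda m: m[metric])


baseline = {"format": "pytorch", "dtype": "float32", "latency_ms": 100.0}
best = best_candidate(run_passes(baseline, [convert_to_onnx, cast_to_float16]))
print(best)
```

In a real Olive run the passes and metrics come from its configuration file and the evaluation is measured on actual hardware. Only the overall shape is illustrated here.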
With Olive, you’ll be able to take your AI models to the next level. Say goodbye to complicated optimization processes and hello to a streamlined, efficient workflow. To get started, check out our Olive & DirectML samples and stay tuned for additional DirectML samples like quantization.
The post DirectML ❤ Stable Diffusion appeared first on Windows AI Platform.
We are demonstrating what can be done with Stable Diffusion models in two of our Build sessions: Shaping the future of work with AI and Deliver AI-powered experiences across cloud and edge, with Windows.
We’ve optimized DirectML to accelerate transformer and diffusion models, like Stable Diffusion, so that they run even better across the Windows hardware ecosystem. Our goal is to enable developers to infuse apps with AI hardware acceleration at scale. For more on how Stable Diffusion lights up on our partners’ hardware with DML, check out:
We worked closely with the Olive team to build a powerful optimization tool that leverages DirectML to produce models that are optimized to run across the Windows ecosystem. For more on Olive with DirectML, check out our post, Optimize DirectML performance with Olive
You can use Olive to ensure your Stable Diffusion model works as well as possible with DirectML. Make sure your model is in the ONNX format; you can use Olive to do this conversion. Once you’ve done this, follow the steps in our DML and Olive blog post
See here for a sample that shows how to optimize a Stable Diffusion model. We’ve tested this with CompVis/stable-diffusion-v1-4 and runwayml/stable-diffusion-v1-5. Stable Diffusion models with different checkpoints and/or weights but the same architecture and layers as these models will work well with Olive.
Check out tomorrow’s Build Breakout Session to see Stable Diffusion in action: Deliver AI-powered experiences across cloud and edge, with Windows
See here for a Python sample showing how to use Stable Diffusion with Olive.
We also built some samples to show how you can use DirectML in general in C++. For more links to help you get started, check out our documentation and helpful links page.
We recommend upgrading to the latest drivers for the best performance.
AMD: AMD has released optimized graphics drivers supporting AMD RDNA 3 devices, including AMD Radeon RX 7900 Series graphics cards. Download AMD Software: Adrenalin Edition 23.5.2.
Intel: Developers interested in Intel drivers supporting Stable Diffusion on DirectML should contact Intel Developer Relations for additional details.
NVIDIA: Users of NVIDIA GeForce RTX 20, 30 and 40 Series GPUs can see these improvements firsthand in GeForce Game Ready Driver 532.03.
The AI space is changing fast! If you run into any problems, feel free to open an issue on our GitHub repo or email us at [email protected]
The post Transformer support for PyTorch with DirectML is here! appeared first on Windows AI Platform.
This release of PyTorch with DirectML also includes improved memory consumption capabilities to unlock faster performance and the ability to use larger batch sizes.
Finally, PyTorch with DirectML now follows a Plugin model with support for the latest version of PyTorch (1.13). After installing PyTorch, simply pip install torch-directml to get started. Once you’ve installed the Torch-DirectML plugin, you can begin training AI models starting with the following lines:
import torch
import torch_directml
dml = torch_directml.device()
tensor = torch.tensor([1]).to(dml) # Note that dml is a variable, not a string!
Please note that this release of the Torch-DirectML plugin is mapped to the “PrivateUse1” Torch backend. The new torch_directml.device() API is a convenient wrapper for sending your tensors to the DirectML device. Now you’re ready to train your models using PyTorch with DirectML!
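To show how the device handle fits into a training loop, here is a minimal sketch of a single optimization step on a tiny linear model. The model, data shapes, and learning rate are arbitrary choices for illustration. The code falls back to the CPU when the torch-directml plugin is not installed, and skips the step entirely if PyTorch itself is missing.

```python
import importlib.util


def select_backend():
    """Return "directml" if the torch-directml plugin is installed, else "cpu"."""
    if importlib.util.find_spec("torch_directml") is not None:
        return "directml"
    return "cpu"


def train_one_step():
    """Run a single SGD step on a tiny linear model and return the loss value."""
    import torch

    if select_backend() == "directml":
        import torch_directml

        device = torch_directml.device()
    else:
        device = torch.device("cpu")

    model = torch.nn.Linear(4, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Random stand-in data: 8 samples with 4 features each.
    inputs = torch.randn(8, 4).to(device)
    targets = torch.randn(8, 1).to(device)

    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if importlib.util.find_spec("torch") is not None:
    print(f"backend={select_backend()} loss={train_one_step():.4f}")
else:
    print("PyTorch is not installed; skipping the training step")
```

The model and every tensor are moved to the same device before the forward pass, which is the main thing to keep consistent when switching an existing script over to DirectML.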
Please leave any questions, suggestions, or issues here on GitHub. Our team is constantly engaging with the community and would love to hear your input!
The post Real-Time Image Blurring & DirectX Resource Binding in the Windows ML Samples Gallery appeared first on Windows AI Platform.
Linnea May
Real-time machine learning inference is a hot topic in ML, with applications such as real-time object detection in autonomous cars or background image blurring of a video stream during a work call. This new sample demonstrates the best practices to follow with Windows Machine Learning (Windows ML) and Microsoft Media Foundation to run background image blurring during inference on a real-time video stream at 30+ frames per second when using a dedicated GPU.
Background image blurring running at 30 frames per second.
The sample uses an asynchronous Media Foundation Transform (MFT) to apply an effect to the input video stream.
The MFT will start by asynchronously requesting a new video sample from the capture source which in this case is the camera. This sample is converted into a shareable VideoFrame then run through the composition of three models: a preprocessing model, a Fully Convolutional Network, and a postprocessing model. Once the sample has been transformed, it’s returned to the Media Foundation processing pipeline via an asynchronous callback to the preview sink, which displays the sample to the screen.
The basic data flow of the Background Blur sample.
Microsoft.AI.MachineLearningExperimental.LearningModelJoinOptions fuses together three stages of the model:
Windows Machine Learning Resources
Resource | Description |
Background Blur Sample Code | Source code for the background image blur sample |
Windows ML Experimental API | Documentation for the LearningModelBuilder and LearningModelJoinOptions |
ORT model building unit tests | A good intro to building models with LearningModelBuilder and LearningModelJoinOptions |
ONNX operators schema | ONNX operators that can be used when building a model with LearningModelBuilder |
ONNX Model Zoo FCN | Open-source models in the ONNX format |
Media Foundation Resources
Resource | Description |
Introduction to Media Foundation Transforms | Start here to learn more about Media Foundation Transforms as a model for processing media data |
Windows Async MFT Sample | Demonstrates how to create an asynchronous Media Foundation Transform |
Windows Capture Engine Sample | Demonstrates how to use the Media Foundation CaptureEngine to capture video |
Numfor Mbiziwo-Tiapo
Learn how to bind and inference on Direct3D 12 Resources using ONNX Runtime in the DX Resource Binding ORT Sample.
DX Resource Binding Sample
In this sample, images are drawn to the screen using Direct3D 12. The images are then preprocessed and inferenced on in real-time using the ONNX Runtime C++ API.
The general structure of the sample is stated below:
The sample follows this sequence of steps:
The Windows ML Samples Gallery can be downloaded from the Microsoft Store or from GitHub. We encourage you to try it out and give feedback by reporting issues or requesting new samples on the issues page.
Stay tuned to the Windows AI Blog for more updates and news!
The post DirectML Plugin for TensorFlow 2 is here appeared first on Windows AI Platform.
TensorFlow-DirectML-Plugin builds DirectML as a PluggableDevice backend to TensorFlow 2 for machine learning training on Windows and the Windows Subsystem for Linux. DirectML is an ML library that enables model acceleration across all DirectX 12 compatible GPUs.
Our pluggable device enables users of the latest version of TensorFlow to accelerate model training on a broad range of DX12-capable GPUs, including cards from AMD, Intel, and NVIDIA.
Using TensorFlow-DirectML-Plugin for TensorFlow 2.9 is simple. Our pluggable device package is installable through PyPI without requiring any changes to your already-existing scripts.
The plugin works with TensorFlow core and easily integrates with versions 2.9 and newer of the tensorflow or tensorflow-cpu packages to seamlessly register your existing GPU.
Learn more about installing in our Docs
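One quick way to confirm that the plugin registered a device is to ask TensorFlow which GPU devices it can see. The sketch below assumes the plugin exposes DirectML hardware under TensorFlow's "GPU" device type. The summary helper is ours, and the check is skipped if TensorFlow is not installed.

```python
import importlib.util


def describe_gpus(device_names):
    """Return a one-line summary for a list of device names."""
    if not device_names:
        return "no GPU devices visible; TensorFlow will run on the CPU"
    return f"{len(device_names)} GPU device(s) visible: " + ", ".join(device_names)


if importlib.util.find_spec("tensorflow") is not None:
    import tensorflow as tf

    # Assumption: DirectML devices appear under the "GPU" device type.
    names = [device.name for device in tf.config.list_physical_devices("GPU")]
    print(describe_gpus(names))
else:
    print("TensorFlow is not installed; nothing to check")
```

If the list comes back empty after installing the plugin, the TensorFlow version and the graphics driver are the first things to check.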
We want to encourage all of you to pick up our TensorFlow-DirectML-Plugin and try it in your current workflow. If you prefer a tutorial, we have prepared samples for training SqueezeNet on GitHub.
This initial preview of our plugin package will support most basic machine learning models with increased model support and performance optimizations planned for subsequent releases.
Leave any questions, suggestions, or issues here on GitHub. Our team is constantly engaging with the community and would love to hear your input!