How to Run a Local AI Model in a Python Extension
In the Nyra framework, extensions can leverage third-party AI services or run AI models locally to enhance performance and reduce costs. This tutorial outlines the process of running a local AI model within a Python extension and demonstrates how to interact with it from within the extension.
Step 1: Check Hardware Requirements
Before running an AI model locally, ensure that your hardware meets the required specifications. Key components to verify include:
CPU/GPU: Confirm whether the model requires specific processing capabilities, such as GPU acceleration.
Memory: Ensure there is enough memory to load and execute the model effectively.
Make sure your system is capable of supporting the model's requirements for optimal performance.
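If you plan to use a GPU, a quick check like the one below can confirm that a device is visible and report its memory. This is a minimal sketch that assumes PyTorch is already installed; substitute the equivalent check for your chosen framework.

import torch

# Minimal hardware check (assumes PyTorch is installed).
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, memory: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA-capable GPU detected; the model will fall back to CPU.")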
Step 2: Install Necessary Software and Dependencies
Ensure Hardware Compatibility: Verify that your system meets the necessary hardware requirements to run the AI model. Key components to check include processing power (CPU/GPU) and memory capacity.
Confirm Operating System Compatibility: Ensure your operating system is compatible with the AI model. While most AI frameworks support Windows, macOS, and Linux, it's essential to verify that the specific version of your OS meets the model’s requirements.
Verify Python Version: Confirm that your Python version aligns with the Nyra Python runtime and the specific AI model you intend to run, ensuring compatibility with both the framework and the model's dependencies.
Install Required Libraries: Install the necessary libraries that the model requires to function efficiently. These may include:
TensorFlow
PyTorch
NumPy
vllm
For convenience, you can create a requirements.txt file to automate the installation of these dependencies (see the example after this list).
Download the AI Model: Acquire the local version of the AI model that you wish to deploy, ensuring that it matches the specifications required for local execution within your setup.
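As an illustration, a minimal requirements.txt for the setup above might contain the following; the exact packages and version pins depend on the model you choose:

torch
numpy
vllm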
Step 3: Implement Your Python Extension
Below is an example of how to implement a basic text generation feature using the vllm inference engine in a Python extension.
First, initialize the local model within the extension:
from nyra import (
    Extension,
    NyraEnv,
    Cmd,
    CmdResult,
    StatusCode,  # assumed to be exported by the nyra package; used by on_cmd below
)
from vllm import LLM


class TextGenerationExtension(Extension):
    def on_init(self, nyra_env: NyraEnv) -> None:
        # Load the local model once when the extension is initialized.
        self.llm = LLM(model="<model_path>")
        nyra_env.on_init_done()

Next, implement the on_cmd method to handle text generation based on the provided input:
    def on_cmd(self, nyra_env: NyraEnv, cmd: Cmd) -> None:
        # Extract the prompt from the incoming command.
        prompt = cmd.get_property_string("prompt")

        # Run inference with the locally loaded model.
        outputs = self.llm.generate(prompt)
        generated_text = outputs[0].outputs[0].text

        # Return the generated text as the command result.
        cmd_result = CmdResult.create(StatusCode.OK)
        cmd_result.set_property_string("result", generated_text)
        nyra_env.return_result(cmd_result, cmd)

In this implementation, the on_cmd method extracts the prompt, generates the corresponding text using the model, and returns the generated output as the result of the command.
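If you need finer control over the output, vllm also accepts sampling parameters. As a rough sketch, the generation step inside on_cmd could be adjusted as follows; the temperature and max_tokens values are only illustrative:

from vllm import SamplingParams

# Illustrative sampling settings; tune them for your model and use case.
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = self.llm.generate(prompt, sampling_params)
generated_text = outputs[0].outputs[0].text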
This approach can be extended to support additional functionalities, such as image recognition or speech-to-text, by appropriately processing the relevant input data types.
Step 4: Unload the Model
It’s important to unload the model during extension cleanup to free resources:
import gc

import torch


class TextGenerationExtension(Extension):
    ...

    def on_deinit(self, nyra_env: NyraEnv) -> None:
        # Release the model and reclaim host and GPU memory.
        del self.llm
        gc.collect()
        torch.cuda.empty_cache()
        # Tear down the process group only if one was created (e.g., for multi-GPU inference).
        if torch.distributed.is_initialized():
            torch.distributed.destroy_process_group()
        print("Successfully deleted the LLM pipeline and freed GPU memory!")
        nyra_env.on_deinit_done()

This ensures efficient memory management, which is especially important when working with GPU resources.
Summary
Running a local model within a Nyra Python extension mirrors the experience of native Python development. By appropriately loading and unloading the model within the relevant lifecycle methods of the extension, you can seamlessly integrate local AI models and interact with them in an efficient and structured manner.