Support deployment of models to dedicated inference devices on my network
Alexander Robb
Add support for connecting to local inference devices such as the DGX Spark, Mac Mini, Dell, etc., to enable remote inference on machines built for inference without needing complex networking or Kubernetes skills. Ideally, a user could install the desktop app, or some version of a desktop server, on the remote compute and then manage their models from their personal workstation while running those models remotely. "Desktop supercomputers" are a relatively new segment, but they will appeal to industries that can't afford to build and support a server rack or on-prem data center and need to keep data within their network. Small to medium-sized medical facilities and law firms might be good examples. 4x Mac Studios or DGX Sparks are less expensive than 1x H100 and should be enough compute to run the latest generation of models (400-500B parameters) at Q4 quantization.
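As a rough sanity check on that sizing claim, here is a minimal Python sketch of the weight-memory arithmetic. The 128 GB per-device figure and the nominal 4 bits per weight are assumptions for illustration, not measured values; real 4-bit formats carry some extra overhead for scales, and the KV cache and activations need memory on top of the weights.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float = 4.0) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    params_billions * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB,
    which simplifies to params_billions * bits_per_weight / 8.
    """
    return params_billions * bits_per_weight / 8


def fits_in_cluster(params_billions: float,
                    num_devices: int,
                    device_memory_gb: float,
                    bits_per_weight: float = 4.0) -> bool:
    """True if the weights alone fit in the pooled memory of the devices.

    This ignores KV cache, activations, and OS overhead, so treat a True
    result as a necessary condition rather than a guarantee.
    """
    pooled_gb = num_devices * device_memory_gb
    return weight_memory_gb(params_billions, bits_per_weight) <= pooled_gb


if __name__ == "__main__":
    DEVICE_MEMORY_GB = 128  # assumption: unified memory per device
    NUM_DEVICES = 4         # the "4x" cluster from the request

    for size in (400, 500):
        needed = weight_memory_gb(size)
        ok = fits_in_cluster(size, NUM_DEVICES, DEVICE_MEMORY_GB)
        print(f"{size}B params at 4-bit: ~{needed:.0f} GB of weights, fits: {ok}")
```

Under these assumptions, a 400-500B parameter model needs roughly 200-250 GB for weights, which fits within 4 x 128 GB = 512 GB of pooled memory with headroom left for context.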