From Cloud to Edge: Rethinking Generative AI for Low-Resource Design Challenges
- Author: arxiv.org
- Full Title: From Cloud to Edge: Rethinking Generative AI for Low-Resource Design Challenges
- Category: articles
- Document Tags: #tech
- URL: https://arxiv.org/html/2402.12702v2
Highlights
- The need for tiny AI models in remote areas exists for two main reasons. Firstly, these regions frequently face challenges such as limited internet connectivity and inadequate computational resources. In such scenarios, cloud-dependent large AI models are impractical. Tiny ML models tailored for offline use can overcome these barriers, bringing the power of advanced design tools to isolated communities. This democratization not only fuels local innovation but also ensures that cutting-edge AI-assisted design solutions are not the exclusive domain of well-resourced urban centers.
- Secondly, engineering design problems in remote areas often possess unique contextual nuances - from local material constraints to specific environmental considerations and product usage patterns. Offline ML models, trained on relevant local data, can better capture and respond to these nuances, offering more tailored and effective design solutions in spaces such as reliable sewage repair or low-cost greenhouse design.
- By reducing reliance on large data centers and continuous internet connectivity, they contribute to lower energy consumption and a smaller carbon footprint, which is particularly crucial in ecologically sensitive remote areas.
- Model Pruning: The process of removing non-critical and redundant components of a model without a significant loss in performance. For LLMs, this can mean removing weights with small gradients or magnitudes, reducing parameter counts, and similar measures. Novel pruning methods like Wanda (Sun et al. 2023) and LLM-Pruner (Ma, Fang, and Wang 2023) offer effective ways of making LLMs smaller. (A minimal pruning sketch appears after this list.)
- Quantization: Representing model parameters such as weights in a lower precision, i.e., using fewer bits to store each value (Gholami et al. 2021). This results in a smaller model size, faster inference, and a reduced memory footprint. LLM quantization can be achieved either in the post-training phase (Dettmers et al. 2022) or during the pre-training or fine-tuning phase. (A minimal quantization sketch appears after this list.)
- Knowledge Distillation: Transferring the knowledge of a large teacher model to a smaller student model so that it replicates the original model’s output distribution. Knowledge distillation has been widely used to compress LLMs like BERT into smaller distilled versions such as DistilBERT (Sanh et al. 2020). More recently, approaches like MiniLLM (Gu et al. 2023) and that of Hsieh et al. (2023) further optimize the distillation process to improve the student model’s performance and inference speed. (A minimal distillation-loss sketch appears after this list.)
- Edge computing involves processing data closer to the location where it is needed, rather than relying on a central data-processing warehouse. This can significantly reduce the need for continuous, high-speed internet connectivity. One solution is to deploy AI models on edge devices, like smartphones and local servers, which can operate with intermittent connectivity (Singh and Gill 2023; Marculescu, Marculescu, and Ogras 2020).
- Similarly, TinyML is a field of Machine Learning that deals with deploying lightweight models on resource-constrained devices with limited computing infrastructure. Key aspects of TinyML include reducing latency by leveraging edge computing, optimizing models for computation at the edge, and developing new tools and equipment that facilitate running ML models on edge devices. These features enable TinyML models to provide increased user privacy, since data is processed locally, along with reduced power consumption and internet usage (Dutta and Bharali 2021; Ray 2022). (A minimal deployment sketch appears after this list.)
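The highlight on model pruning can be made concrete with a minimal sketch of magnitude-based pruning in plain NumPy. This illustrates the general idea only, not the Wanda or LLM-Pruner algorithms, which use more sophisticated, activation-aware importance scores; the function name `magnitude_prune` and the `sparsity` parameter are hypothetical.

```python
# Minimal sketch: magnitude-based weight pruning (illustrative, not Wanda/LLM-Pruner).
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold            # keep only the larger weights
    return weights * mask

# Example: prune roughly half of a random weight matrix.
w = np.random.randn(256, 256)
w_pruned = magnitude_prune(w, sparsity=0.5)
print(f"achieved sparsity: {np.mean(w_pruned == 0):.2f}")
```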
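For the quantization highlight, the sketch below shows symmetric post-training int8 quantization of a single weight tensor. Real LLM quantization schemes such as the one in Dettmers et al. (2022) are considerably more involved; the helper names here are hypothetical.

```python
# Minimal sketch: symmetric per-tensor int8 quantization and dequantization.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 using a single per-tensor scale."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, scale = quantize_int8(w)
print("fp32 bytes:", w.nbytes, "int8 bytes:", q.nbytes)  # roughly 4x smaller
print("max abs error:", np.max(np.abs(w - dequantize(q, scale))))
```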
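For the knowledge-distillation highlight, the sketch below shows the standard softened-logit distillation loss in PyTorch, assuming a classification-style setup; MiniLLM and the approach of Hsieh et al. (2023) refine this objective, and the function name and hyperparameters here are hypothetical.

```python
# Minimal sketch: blend a KL term against softened teacher logits with the
# usual cross-entropy loss (classic Hinton-style distillation).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of a softened KL term and standard cross-entropy."""
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Example with random tensors standing in for one batch of model outputs.
student_logits = torch.randn(8, 1000, requires_grad=True)
teacher_logits = torch.randn(8, 1000)
labels = torch.randint(0, 1000, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```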
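Finally, for the TinyML highlight, a common deployment step is converting a small model to TensorFlow Lite so it can run fully offline on a phone or microcontroller-class device. The sketch below assumes a toy Keras model; the architecture and file name are purely illustrative.

```python
# Minimal sketch: convert a small Keras model to TensorFlow Lite with default
# post-training optimizations (dynamic-range quantization) for on-device use.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(4),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```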