In this episode, our host sits down with Hongyi Wang, a senior researcher at Carnegie Mellon University, to discuss his groundbreaking research on low-rank model training. Hongyi Wang's expertise lies in distributed machine learning and parallel computing, with a specific focus on scaling up models and harnessing the power of multiple GPUs for training.
The conversation begins by drawing parallels between the distributed nature of the brain and the robustness of deep learning models. Just as the brain can function even with the loss of a single neuron, deep learning models can tolerate the removal of a parameter without significant consequences.
Currently, the state of the art in scaling up models relies on data parallelism. However, for large language models, more advanced techniques are required. This is where Cuttlefish, Hongyi Wang's innovative low-rank model training method, comes into play.
Cuttlefish is designed to optimize training and reduce model size without sacrificing too much accuracy. While it is primarily intended for pre-training, it can also be used for fine-tuning. By automatically detecting redundancy in the model, Cuttlefish achieves smaller models and faster training times.
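Cuttlefish's full method is more involved (it adapts the factorization ranks per layer during training), but the core idea behind low-rank training can be sketched by replacing a dense weight matrix with the product of two thin factors. The shapes and rank below are hypothetical, chosen only to illustrate the parameter savings:

```python
import numpy as np

rng = np.random.default_rng(0)

# A full-rank dense layer: a d_out x d_in weight matrix.
d_out, d_in, rank = 512, 512, 32
W = rng.standard_normal((d_out, d_in))

# Low-rank replacement: factor W into U @ V, where
# U is d_out x rank and V is rank x d_in.
U = rng.standard_normal((d_out, rank)) * 0.02
V = rng.standard_normal((rank, d_in)) * 0.02

full_params = W.size                # 512 * 512 = 262144
low_rank_params = U.size + V.size   # 512*32 + 32*512 = 32768
print(f"compression ratio: {full_params / low_rank_params:.1f}x")  # 8.0x
```

Training the factors `U` and `V` instead of `W` shrinks both the parameter count and the per-step compute, which is where the smaller-model and faster-training numbers come from.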
Although there is a slight drop in accuracy when using Cuttlefish, the benefits in terms of model size and training speed make it a worthwhile trade-off. This technique is particularly valuable for pre-training large language models and can be adopted by both foundational model researchers and those seeking to extend models to specific domains.
Empirical results demonstrate that Cuttlefish can produce models that are 5.6 times smaller and 1.2 times faster to train compared to full-rank models.
For those interested in getting started with ML Ops and engineering in machine learning, the episode offers valuable advice. It is recommended to experiment with existing models and frameworks like Hugging Face, gradually gaining hands-on experience and understanding of the tasks and concepts involved.
The future of Hongyi Wang's research lies in leveraging foundation models for domain-specific applications, particularly in fields like AI for science. Additionally, he aims to democratize access to foundation models, making them more widely available for various use cases.
Listeners can follow Hongyi Wang on Twitter and access his Google Scholar page for more information on his groundbreaking work.
Hongyi Wang, a Senior Researcher at the Machine Learning Department at Carnegie Mellon University, joins us. His research sits at the intersection of systems and machine learning. He discussed his research paper, Cuttlefish: Low-Rank Model Training without All the Tuning, on today’s show.
Hongyi started by sharing his thoughts on whether developers need to learn how to fine-tune models. He then spoke about the need to optimize the training of ML models, especially as these models grow larger. He discussed how data centers have the hardware to train these large models, but the broader community does not. He then spoke about the Low-Rank Adaptation (LoRA) technique and where it is used.
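As background for the LoRA discussion: LoRA freezes the pretrained weights and learns only a small low-rank update on top of them. A minimal NumPy sketch of the forward pass follows; the hidden size, rank, and scaling factor are illustrative assumptions, not values from the episode:

```python
import numpy as np

rng = np.random.default_rng(1)

d, r, alpha = 768, 8, 16  # hidden size, LoRA rank, scaling (hypothetical values)

# Frozen pretrained weight; never updated during fine-tuning.
W = rng.standard_normal((d, d)) * 0.02

# Trainable low-rank factors. A starts random, B starts at zero,
# so the update B @ A is exactly zero before any training.
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = x W^T + (alpha / r) * x (B A)^T; only A and B would receive gradients."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((4, d))  # a batch of 4 token embeddings
y = lora_forward(x)
assert y.shape == (4, d)
# With B = 0, the LoRA branch contributes nothing yet:
assert np.allclose(y, x @ W.T)
```

Because only `A` and `B` are trained, fine-tuning touches roughly `2 * d * r` parameters per adapted layer instead of `d * d`, which is what makes LoRA attractive for adapting large pretrained models.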
Hongyi discussed the Cuttlefish method and how it compares to LoRA. He shared the use cases of Cuttlefish and who should use it. Rounding up, he gave his advice on how people can get into the machine learning field. He also shared his future research ideas.