VinaLlama2 & VinaLlama2-Code

We are excited to introduce VinaLlama2, our second-generation Vietnamese Large Language Model. Built on the Qwen1.5 series, and developed together with Alibaba Qwen, VinaLlama2 delivers the best-performing Vietnamese LLMs in four versions (Turbo, Standard, Pro and SUPER). We are also launching a code variant, VinaLlama2-Code, which delivers one of the most requested features from the first version of VinaLlama.

Enhanced Performance with Quality Data

We built the VinaLlama2 dataset from the ground up. We licensed literature and programming books from various sources to ensure that our training data is legally sourced. We also dropped the synthetic data used in VinaLlama1 and curated a new synthetic dataset in its place.

This time, we reduced our highly curated continued-pretraining dataset from 800B tokens to just 80B tokens. We also combined all training stages (continued pretraining, supervised fine-tuning and DPO) into a single operation using Odds Ratio Preference Optimization (ORPO).
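To illustrate what a single ORPO step optimizes, here is a minimal plain-Python sketch of the published ORPO objective for one preference pair: a supervised fine-tuning term (negative log-likelihood of the preferred answer) plus a weighted odds-ratio term that pushes the preferred answer's odds above the rejected one's. This is not our training code. The function names, the weight `lam=0.1`, and the use of length-normalized per-token log-probabilities are assumptions made for illustration. In a real training run, the log-probabilities would come from the model's forward pass.

```python
import math
from typing import Sequence


def avg_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log P(y|x): the mean of per-token log-probabilities."""
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_lp: float) -> float:
    """log odds(y|x) = log P - log(1 - P), computed from log P (must be < 0)."""
    return avg_lp - math.log1p(-math.exp(avg_lp))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(chosen: Sequence[float], rejected: Sequence[float], lam: float = 0.1) -> float:
    """ORPO objective for one preference pair (illustrative sketch).

    chosen / rejected: per-token log-probabilities of the preferred and
    dispreferred responses under the model being trained.
    lam: weight of the odds-ratio term (hypothetical default).
    """
    lp_w = avg_logprob(chosen)
    lp_l = avg_logprob(rejected)
    # Supervised fine-tuning term: negative log-likelihood of the preferred answer.
    sft = -lp_w
    # Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected)).
    log_odds_ratio = log_odds(lp_w) - log_odds(lp_l)
    l_or = -log_sigmoid(log_odds_ratio)
    return sft + lam * l_or


if __name__ == "__main__":
    # Hypothetical per-token log-probabilities for a preferred and a rejected answer.
    print(orpo_loss([-0.1, -0.2], [-1.0, -2.0]))
```

Both terms are computed from the same forward pass over the chosen and rejected responses. Unlike DPO, ORPO needs no separate frozen reference model, which is what lets preference alignment be folded into the same step as supervised fine-tuning.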

Compute

VinaLlama2 takes model scaling to the next level. The models were trained on 64 clusters of H100 GPUs provided by Alibaba Cloud.





Benchmarks

*Note: We obtained VMLU-Slim by removing all Factual Retrieval questions from the VMLU dataset, then using human labelers and GPT-4 agents to obtain the answers.

VinaLlama2 is Multimodal

VinaLlama2-Preview, which will be released this July, will be multimodal across text, images and audio. Here is a preview:

 

Question: How do I get from Sân Băng (the ice rink) to CGV CINEMA?

Answer: To get from Sân Băng to CGV Cinema, please take the path lined with the "Mother Care" and "Tan Phu" stores, then turn right at "HighLand Cofe".

Question: Describe the following sound.

Answer: The recording contains the laughter of many people, possibly an audience at a performance.

VinaLlama2 Code

Based on CodeQwen-7B, VinaLlama2-Code delivers the best performance on coding tasks in Vietnamese.

 

Build with VinaLlama


We are also calling for ideas under our new incentive program for Vietnamese LLM developers building applications with VinaLlama2. Participating teams will receive funding (up to $2,000 per team) and free access to VinaLlama2 for a limited period.

More information is coming soon, so please stay tuned.


In the meantime, we are looking for researchers, engineers and business owners to test VinaLlama2 and give us feedback. Please fill out this form for early access to the models.