Stars
Mobile-Agent: The Powerful GUI Agent Family
Building a comprehensive and handy list of papers for GUI agents
LLaMA 3 is one of the most promising open-source models after Mistral; we will recreate its architecture in a simpler manner.
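As a taste of what "simpler" means here, below is a minimal PyTorch sketch of one LLaMA-family building block (RMSNorm, which replaces LayerNorm). It is illustrative only, not code from the repo.

```python
# Minimal RMSNorm sketch in the LLaMA style (illustrative, not the repo's code).
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-channel scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square of the last dimension; no mean subtraction.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

x = torch.randn(2, 8, 16)
print(RMSNorm(16)(x).shape)  # torch.Size([2, 8, 16])
```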
GUI Dataset Collector: A Tool for Capturing and Annotating GUI Interactions with annotations in COCO format
This repo contains the Hugging Face Deep Reinforcement Learning Course.
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
Seamless operability between C++11 and Python
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Integrating SSE with NVIDIA Triton Inference Server using a Python backend and the Zephyr model. There is very little documentation on how to use NVIDIA Triton in streaming use cases (hard to find in their…
High-performance self-hosted photo and video management solution.
Efficient Triton Kernels for LLM Training
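For context on what a Triton kernel looks like, here is a toy element-wise kernel in the canonical Triton style; it is a generic sketch (needs a CUDA GPU to run), not one of this repo's fused LLM-training kernels.

```python
# Toy Triton kernel sketch: element-wise add over 1024-element blocks (illustrative only).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                 # guard out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)              # one program instance per block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```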
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
Yi-1.5 is an upgraded version of Yi, delivering stronger performance in coding, math, reasoning, and instruction-following capability.
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
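A rough sketch of how such a quantization pass is typically driven from Python is below; the model id, output path, and quant_config values are illustrative assumptions rather than settings taken from the repo's documentation.

```python
# Hedged sketch of a 4-bit AWQ quantization run with AutoAWQ (paths/config are assumptions).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"   # assumed example model
quant_path = "mistral-7b-instruct-awq"              # assumed output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)  # AWQ calibration + weight packing
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```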
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
llama3 implementation one matrix multiplication at a time
Code for "WebVoyager: WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
VisualWebArena is a benchmark for multimodal agents.
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
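The core idea is small enough to sketch: repeatedly count adjacent token-id pairs and merge the most frequent pair into a newly minted id. The snippet below is an illustrative reimplementation of that loop over raw bytes, not the repo's code.

```python
# Minimal BPE training loop sketch (illustrative, not the repo's implementation).
from collections import Counter

def get_pair_counts(ids):
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)   # replace the matched pair with the new token id
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text: str, num_merges: int):
    ids = list(text.encode("utf-8"))           # start from raw bytes (ids 0..255)
    merges = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = counts.most_common(1)[0][0]     # most frequent adjacent pair
        new_id = 256 + step                    # mint a new token id
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges, ids

merges, ids = train_bpe("low lower lowest", 5)
```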
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3+ generation speedup on reasoning tasks
Uses the peft library to perform efficient 4-bit QLoRA fine-tuning of chatGLM-6B/chatGLM2-6B, then merges the LoRA model with the base model and quantizes the result to 4 bits.
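A hedged sketch of the 4-bit QLoRA setup with peft and bitsandbytes is below; the target module name and hyperparameters are assumptions for illustration, not values taken from the repo.

```python
# Sketch of a 4-bit QLoRA setup with peft/bitsandbytes (hyperparameters are assumptions).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/chatglm2-6b", quantization_config=bnb_config, trust_remote_code=True
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05,
    target_modules=["query_key_value"],   # assumed module name for ChatGLM-style blocks
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# After training, the LoRA weights can be folded back into the base model:
# merged = model.merge_and_unload()
```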
🎓 Path to a free self-taught education in Computer Science!



