GitHub - facebookresearch/perception_models: State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Perception Models: Powerful Models for Image, Video, and Audio Perception

This repo is home to state-of-the-art models for image, video, and audio perception: Perception Encoder (PE) for image, video, and audio encoding, and Perception Language Model (PLM) for decoding.

Updates

  • [Dec-16-25]: We have released the Perception Encoder Audio-Visual (PE-AV) and Perception Encoder Audio-Frame (PE-A-Frame) models: [Blog][[paper](https://ai.meta.com/research/publications/pushing-the-frontier-of-audiovisual-perception-with-large-…
