Curated List of AI Research Papers 🌟
🚧 This page is continually updated 🚧
Welcome to a meticulously curated collection of groundbreaking AI research papers spanning domains such as computer vision, natural language processing (NLP), audio processing, multimodal learning, and reinforcement learning. This compilation is designed to serve as a beacon for enthusiasts and professionals alike as they navigate the vast sea of AI advancements.
Classification Key
- 🏆 Foundational Papers: Over 10k citations, significantly impacting AI's evolution.
- ⭐ Significant Papers: More than 50 citations, showcasing state-of-the-art findings.
- ⏫ Emerging Trends: Innovative papers with 1 to 50 citations, demonstrating potential.
- 📰 Key Articles: Notable works presented in formats other than research papers.
Recent (2024)
Multimodal Learning & Computer Vision
- AIM: Vision Models with an Autoregressive Objective: Introducing a suite of vision models designed for versatile applications, pre-trained using an autoregressive approach to set new benchmarks in visual tasks. Read the paper | Explore the GitHub
- OGEN: Enhancing Vision-Language Model Generalization: This paper proposes a novel methodology aimed at significantly improving the generalization capabilities of vision-language models across varied domains. Discover more
- MLLM-Guided Image Editing (MGIE): Apple AI pioneers instruction-based image editing, making it possible to generate expressive, detailed modifications through a more intuitive interface. Learn about MGIE
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time. Learn about VASA-1
- VideoGigaGAN: Towards Detail-rich Video Super-Resolution. Learn about VideoGigaGAN
- OpenVoice: Versatile Instant Voice Cloning. Learn about OpenVoice
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation. Learn about StoryDiffusion
- xLSTM: Extended Long Short-Term Memory. Learn about xLSTM
- Low-Rank Adaptation (LoRA): Fine-tuning large models by training small low-rank update matrices while keeping the pretrained weights frozen. Learn about Low-Rank Adaptation
- Cosine-Similarity of Embeddings: Learn about Cosine-Similarity
- Your Transformer is Secretly Linear: Learn more
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization. Learn more
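To make the LoRA entry above concrete, here is a minimal NumPy sketch of the low-rank adaptation idea: the pretrained weight matrix stays frozen, and only a small low-rank correction is trained. The layer sizes, rank, and variable names here are illustrative assumptions, not from any of the listed papers' code.

```python
import numpy as np

# Minimal LoRA sketch: instead of updating a frozen weight matrix W
# (d_out x d_in), train a low-rank correction B @ A with rank r << d.
rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weights
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection (zero init)

def forward(x, scale=1.0):
    # Adapted layer: W x + scale * B (A x). Because B starts at zero,
    # the adapted model matches the pretrained one exactly at init.
    return W @ x + scale * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(forward(x), W @ x)  # identity at initialization

# Trainable-parameter savings: full update vs. low-rank factors.
full, lora = d_out * d_in, r * (d_in + d_out)
print(full, lora)  # 4096 512
```

The zero initialization of B is the detail that makes LoRA safe to bolt onto a pretrained model: training starts from the unmodified network and only gradually learns a correction.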
Natural Language Processing
- AlignInstruct: Tackling Low-Resource Language Challenges: A groundbreaking solution for machine translation that addresses the challenges posed by unseen languages and low-resource settings. Explore the breakthrough
- WRAP: Synthetic Data for Language Model Pre-training: Presented by CMU and Apple, WRAP introduces an innovative approach to pre-train language models using synthetic data, enhancing the model's learning efficiency. Read the paper
- Context Understanding in Large Language Models: In collaboration with Georgetown University, Apple explores the capabilities of large language models in understanding context, presenting a new benchmark for evaluation. Dive into the research
- Optimizing Language Model Training: This research unpacks the trade-offs involved in training language models, seeking the optimal balance between pretraining depth, specialization, and computational efficiency. Unpack the insights
Audio Processing
- Acoustic Model Fusion: Apple proposes a novel approach to drastically reduce word error rates in speech recognition systems through the fusion of acoustic models. Learn how
Metrics & Evaluation
- LiDAR: Evaluating Representation Quality in JE Architectures: Apple researchers introduce a new metric for assessing the quality of representations within Joint Embedding Architectures, aiming to refine evaluation processes. Investigate the methodology
Highlights (2023)
Computer Vision
- Muse: Text-To-Image Generation: Introducing a new era of text-to-image generation with Muse, leveraging masked generative transformers. Read more
- Structure and Content-Guided Video Synthesis: Unveiling Gen-1, a model that synthesizes video by understanding structure and content. Discover
- Scaling Vision Transformers (ViT 22B): Pushing the limits with a 22 billion parameter vision transformer model. Explore
- High-Resolution Video Synthesis with VideoLDM: A leap towards aligning latents for unprecedented video synthesis quality. Learn more
Natural Language Processing (NLP)
- DetectGPT: A groundbreaking approach for zero-shot detection of machine-generated text. Read more
- Toolformer: Empowering language models to autonomously learn and utilize digital tools. Discover
- GPT-4: OpenAI's latest iteration, setting new standards for generative language models. Explore
Audio Processing
- VALL-E: Revolutionizing speech synthesis with zero-shot text-to-speech. Read more
- MusicLM: A novel approach to generating music directly from text prompts. Discover
- AudioLDM: Leveraging latent diffusion models for high-fidelity text-to-audio generation. Explore
Multimodal Learning
- Kosmos-1: Aligning perception with language models for enhanced understanding. Read more
- PaLM-E: An embodied multimodal language model breaking new ground in AI interactions. Discover
Reinforcement Learning
- DreamerV3: Mastering diverse domains through innovative world models. Read more
- Direct Preference Optimization (DPO): Fine-tuning language models directly on preference data, showing that the language model itself acts as an implicit reward model. Discover
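As a rough illustration of the DPO entry above: the loss compares policy-vs-reference log-probability ratios for a chosen and a rejected response under a logistic (Bradley-Terry) objective. The function name, argument names, and numeric values below are illustrative assumptions, not code from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probs under
    the trainable policy and a frozen reference model."""
    # Implicit reward: beta * log-ratio between policy and reference.
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    # Logistic loss pushing r_chosen above r_rejected.
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# When the policy still matches the reference, both implicit rewards
# are zero and the loss is ln 2.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
```

Raising the policy's log-probability of the chosen response (or lowering it for the rejected one) drives the loss below ln 2, which is what gradient descent on preference pairs accomplishes, with no separate reward model or RL loop.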
Other Noteworthy Papers
- Symbolic Discovery of Optimization Algorithms (Lion): Pioneering symbolic methods for discovering new optimization algorithms. Explore
- RT-2: Enhancing robotic control with vision-language-action models. Learn more
Notable Contributions (2022)
Computer Vision
- A ConvNet for the 2020s (ConvNeXt): Elevating convolutional networks into the 2020s with advanced architectural improvements. Read more
- Block-NeRF: Introducing scalable solutions for large scene neural view synthesis. Discover
- DALL-E 2: Revolutionizing hierarchical text-conditional image generation with CLIP latents. Explore
- DreamFusion: A leap in text-to-3D content creation using 2D diffusion. Learn more
Natural Language Processing (NLP)
- LaMDA: Pioneering dialog applications with advanced language models. Read more
- InstructGPT: A new paradigm in language model training with human feedback. Discover
- ChatGPT: OpenAI's innovative approach to optimizing language models for dialogue. Explore
Audio Processing
- mSLAM: Advancing joint pre-training for speech and text in a multitude of languages. Read more
- AudioLM: Proposing a language modeling approach to audio generation, paving new pathways. Discover
Multimodal Learning
- BLIP: Bootstrapping language-image pre-training for unified vision-language understanding. Read more
- Gato: Introducing a generalist agent capable of performing a diverse range of tasks. Discover
Reinforcement Learning
- Gran Turismo Sophy (GT Sophy): Demonstrating superhuman performance in Gran Turismo with deep reinforcement learning. Read more
- AlphaTensor: Discovering faster matrix multiplication algorithms through RL. Discover
Other Noteworthy Papers
- FourCastNet: A global, data-driven approach to high-resolution weather modeling. Explore
- ColabFold: Making protein folding accessible to all, marking a significant leap in bioinformatics. Learn more
These 2022 selections highlight the dynamic and expansive nature of AI research, spanning computer vision, NLP, audio processing, multimodal learning, and reinforcement learning.
Foundational Works
Classic Machine Learning
- 1958: The perceptron: A probabilistic model for information storage and organization in the brain (Perceptron)
- 1986: Learning representations by back-propagating errors (Backpropagation)
- 1986: Induction of decision trees (ID3)
- 1992: A training algorithm for optimal margin classifiers (SVM)
- 1996: Bagging predictors
- 2001: Random Forests
- 2001: A fast and elitist multiobjective genetic algorithm (NSGA-II)
Neural Networks and Deep Learning
- 1989: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (HMM)
- 1989: Multilayer feedforward networks are universal approximators
- 1998: Gradient-based learning applied to document recognition (CNN/GTN)
- 2003: Latent Dirichlet Allocation (LDA)
- 2006: Reducing the Dimensionality of Data with Neural Networks (Autoencoder)
- 2008: Visualizing Data using t-SNE (t-SNE)
- 2012: ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
- 2013: Efficient Estimation of Word Representations in Vector Space (Word2vec)
- 2013: Auto-Encoding Variational Bayes (VAE)
- 2014: Generative Adversarial Networks (GAN)
- 2014: Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Dropout)
- 2014: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
- 2014: Adam: A Method for Stochastic Optimization (Adam)
- 2015: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (BatchNorm)
- 2015: Going Deeper With Convolutions (Inception)
- 2015: Human-level control through deep reinforcement learning (Deep Q Network)
- 2015: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Faster R-CNN)
- 2015: U-Net: Convolutional Networks for Biomedical Image Segmentation (U-Net)
- 2015: Deep Residual Learning for Image Recognition (ResNet)
- 2016: You Only Look Once: Unified, Real-Time Object Detection (YOLO)
- 2017: Attention Is All You Need (Transformer)
- 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (BERT)
- 2020: Language Models are Few-Shot Learners (GPT-3)
- 2020: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)
- 2021: Highly accurate protein structure prediction with AlphaFold (AlphaFold)
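Many of the foundational works above build on the 2017 Transformer, whose core operation is scaled dot-product attention: softmax(QKᵀ/√d_k)V. Below is a minimal NumPy sketch of that single equation; the sequence length and dimensions are arbitrary illustrative choices.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Shapes: Q, K are (seq, d_k); V is (seq, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax, numerically stabilized by subtracting the row max.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a convex combination of the value vectors.
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(3, 8))
V = rng.normal(size=(3, 4))
out = attention(Q, K, V)
print(out.shape)  # (3, 4)
```

The 1/√d_k scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients.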