CVPR 2022 Open-Source Paper Directory
Backbone
A ConvNet for the 2020s
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
MPViT: Multi-Path Vision Transformer for Dense Prediction
Mobile-Former: Bridging MobileNet and Transformer
MetaFormer is Actually What You Need for Vision
Shunted Self-Attention via Multi-Scale Token Aggregation
TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing
Learned Queries for Efficient Local Attention
RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
CLIP
HairCLIP: Design Your Hair by Text and Reference Image
PointCLIP: Point Cloud Understanding by CLIP
Blended Diffusion for Text-driven Editing of Natural Images
GAN
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Style Transformer for Image Inversion and Editing
Unsupervised Image-to-Image Translation with Generative Prior
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
OSSGAN: Open-set Semi-supervised Image Generation
Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
GNN
OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks
MLP
RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
NAS
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
OCR
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
NeRF
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
Point-NeRF: Point-based Neural Radiance Fields
NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images
Urban Radiance Fields
Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
3D Face
ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations
Long-Tail
Retrieval Augmented Classification for Long-Tail Visual Recognition
Visual Transformer
Backbone
MPViT: Multi-Path Vision Transformer for Dense Prediction
MetaFormer is Actually What You Need for Vision
Mobile-Former: Bridging MobileNet and Transformer
Shunted Self-Attention via Multi-Scale Token Aggregation
Learned Queries for Efficient Local Attention
Application
Language-based Video Editing via Multi-Modal Multi-Level Transformer
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
Embracing Single Stride 3D Object Detector with Sparse Transformer
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
GroupViT: Semantic Segmentation Emerges from Text Supervision
Restormer: Efficient Transformer for High-Resolution Image Restoration
Splicing ViT Features for Semantic Appearance Transfer
Self-supervised Video Transformer
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
Accelerating DETR Convergence via Semantic-Aligned Matching
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Style Transformer for Image Inversion and Editing
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Mask Transfiner for High-Quality Instance Segmentation
Language as Queries for Referring Video Object Segmentation
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
AdaMixer: A Fast-Converging Query-Based Object Detector
Omni-DETR: Omni-Supervised Object Detection with Transformers
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
Collaborative Transformers for Grounded Situation Recognition
NFormer: Robust Person Re-identification with Neighbor Transformer
Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
A New Dataset and Transformer for Stereoscopic Video Super-Resolution
Safe Self-Refinement for Transformer-based Domain Adaptation
Fast Point Transformer
Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval
DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
Stratified Transformer for 3D Point Cloud Segmentation
Vision-Language
Conditional Prompt Learning for Vision-Language Models
Bridging Video-text Retrieval with Multiple Choice Question
Visual Abductive Reasoning
Self-supervised Learning
UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
Crafting Better Contrastive Views for Siamese Representation Learning
HCSC: Hierarchical Contrastive Selective Coding
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
Data Augmentation
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
AlignMixup: Improving Representations By Interpolating Aligned Features
Knowledge Distillation
Decoupled Knowledge Distillation
Object Detection
BoxeR: Box-Attention for 2D and 3D Transformers
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Accelerating DETR Convergence via Semantic-Aligned Matching
Localization Distillation for Dense Object Detection
Focal and Global Knowledge Distillation for Detectors
A Dual Weighting Label Assignment Scheme for Object Detection
AdaMixer: A Fast-Converging Query-Based Object Detector
Omni-DETR: Omni-Supervised Object Detection with Transformers
SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection
Semi-Supervised Object Detection
Dense Learning based Semi-Supervised Object Detection
Visual Tracking
Correlation-Aware Deep Tracking
TCTrack: Temporal Contexts for Aerial Tracking
Multi-Modal Object Tracking
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
Multi-Object Tracking
Learning of Global Objective for Network Flow in Multi-Object Tracking
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
Semantic Segmentation
Novel Class Discovery in Semantic Segmentation
Deep Hierarchical Semantic Segmentation
Rethinking Semantic Segmentation: A Prototype View
Weakly Supervised Semantic Segmentation
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation
CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation
FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation
Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation
Semi-Supervised Semantic Segmentation
ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation
Domain-Adaptive Semantic Segmentation
Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
Unsupervised Semantic Segmentation
GroupViT: Semantic Segmentation Emerges from Text Supervision
Few-Shot Semantic Segmentation
Generalized Few-shot Semantic Segmentation
Instance Segmentation
BoxeR: Box-Attention for 2D and 3D Transformers
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
Mask Transfiner for High-Quality Instance Segmentation
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
Self-Supervised Instance Segmentation
FreeSOLO: Learning to Segment Objects without Annotations
Video Instance Segmentation
Efficient Video Instance Segmentation via Tracklet Query and Proposal
Temporally Efficient Vision Transformer for Video Instance Segmentation
Panoptic Segmentation
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
Large-scale Video Panoptic Segmentation in the Wild: A Benchmark
Few-Shot Classification
Integrative Few-Shot Learning for Classification and Segmentation
Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification
Few-Shot Segmentation
Learning What Not to Segment: A New Perspective on Few-Shot Segmentation
Integrative Few-Shot Learning for Classification and Segmentation
Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation
Image Matting
Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation
Video Understanding
Self-supervised Video Transformer
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition
Action Recognition
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Action Detection
End-to-End Semi-Supervised Learning for Video Action Detection
Image Editing
Style Transformer for Image Inversion and Editing
Blended Diffusion for Text-driven Editing of Natural Images
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Low-level Vision
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
Restormer: Efficient Transformer for High-Resolution Image Restoration
Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements
Super-Resolution
Image Super-Resolution
Learning the Degradation Distribution for Blind Image Super-Resolution
Video Super-Resolution
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling
A New Dataset and Transformer for Stereoscopic Video Super-Resolution
Deblur
Image Deblur
Learning to Deblur using Light Field Generated and Real Defocus Images
3D Point Cloud
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
A Unified Query-based Paradigm for Point Cloud Understanding
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
PointCLIP: Point Cloud Understanding by CLIP
Fast Point Transformer
RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds
The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning via Pose-aware Convolution
3D Object Detection
Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds
BoxeR: Box-Attention for 2D and 3D Transformers
Embracing Single Stride 3D Object Detector with Sparse Transformer
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
HyperDet3D: Learning a Scene-conditioned 3D Object Detector
OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data
DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions
3D Semantic Segmentation
Scribble-Supervised LiDAR Semantic Segmentation
Stratified Transformer for 3D Point Cloud Segmentation
3D Instance Segmentation
Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions
3D Object Tracking
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
PTTR: Relational 3D Point Cloud Object Tracking with Transformer
3D Human Pose Estimation
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
BEV: Putting People in their Place: Monocular Regression of 3D People in Depth
3D Semantic Scene Completion
MonoScene: Monocular 3D Semantic Scene Completion
3D Reconstruction
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
Person Re-identification
NFormer: Robust Person Re-identification with Neighbor Transformer
Camouflaged Object Detection
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
Depth Estimation
Monocular Depth Estimation
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior
Multi-Frame Self-Supervised Depth with Transformers
Stereo Matching
ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching
Feature Matching
ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching
Lane Detection
Rethinking Efficient Lane Detection via Curve Modeling
A Keypoint-based Global Association Network for Lane Detection
Optical Flow Estimation
Imposing Consistency for Optical Flow Estimation
Deep Equilibrium Optical Flow Estimation
GMFlow: Learning Optical Flow via Global Matching
Image Inpainting
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Image Retrieval
Correlation Verification for Image Retrieval
Face Recognition
AdaFace: Quality Adaptive Margin for Face Recognition
Crowd Counting
Leveraging Self-Supervision for Cross-Domain Crowd Counting
Medical Image
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
Video Generation
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
Scene Graph Generation
SGTR: End-to-end Scene Graph Generation with Transformer
Referring Video Object Segmentation
Language as Queries for Referring Video Object Segmentation
ReSTR: Convolution-free Referring Image Segmentation Using Transformers
Gait Recognition
Gait Recognition in the Wild with Dense 3D Representations and A Benchmark
Style Transfer
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
Anomaly Detection
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection
Adversarial Examples
Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon
LAS-AT: Adversarial Training with Learnable Attack Strategy
Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection
Weakly Supervised Object Localization
Weakly Supervised Object Localization as Domain Adaption
Radar Object Detection
Exploiting Temporal Relations on Radar Perception for Autonomous Driving
Hyperspectral Image Reconstruction
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Image Stitching
Deep Rectangling for Image Stitching: A Learning Baseline
Watermarking
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings
Action Counting
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
Grounded Situation Recognition
Collaborative Transformers for Grounded Situation Recognition
Zero-shot Learning
Unseen Classes at a Later Time? No Problem
DeepFakes
Detecting Deepfakes with Self-Blended Images
Datasets
It's About Time: Analog Clock Reading in the Wild
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
Kubric: A scalable dataset generator
Scribble-Supervised LiDAR Semantic Segmentation
Deep Rectangling for Image Stitching: A Learning Baseline
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Shape from Polarization for Complex Scenes in the Wild
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
Aesthetic Text Logo Synthesis via Content-aware Layout Inferring
DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
A New Dataset and Transformer for Stereoscopic Video Super-Resolution
Putting People in their Place: Monocular Regression of 3D People in Depth
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
Visual Abductive Reasoning
Large-scale Video Panoptic Segmentation in the Wild: A Benchmark
Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions
New Task
Language-based Video Editing via Multi-Modal Multi-Level Transformer
It's About Time: Analog Clock Reading in the Wild
Splicing ViT Features for Semantic Appearance Transfer
Visual Abductive Reasoning
Others
Kubric: A scalable dataset generator
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
Balanced MSE for Imbalanced Visual Regression
SNUG: Self-Supervised Neural Dynamic Garments
Shape from Polarization for Complex Scenes in the Wild
LASER: LAtent SpacE Rendering for 2D Visual Localization
Single-Photon Structured Light
3DeformRS: Certifying Spatial Deformations on Point Clouds
Aesthetic Text Logo Synthesis via Content-aware Layout Inferring
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
Robust and Accurate Superquadric Recovery: a Probabilistic Approach
Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
DeepDPM: Deep Clustering With an Unknown Number of Clusters
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Proto2Proto: Can you recognize the car, the way I do?
Putting People in their Place: Monocular Regression of 3D People in Depth
Light Field Neural Rendering
Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning
Escaping the Big Data Paradigm with Compact Transformers