[Complete] List of All Paper Titles Accepted at CVPR 2018

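News, 05-28

Recommended by Xinzhiyuan (新智元)

This article is compiled from material published by the WeChat public accounts CVer and Zhuanzhi (專知).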

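[Xinzhiyuan Guide] IEEE CVPR, one of the most influential academic conferences in computer vision, will be held in Salt Lake City, USA, from June 18 to 22, 2018. According to the CVPR website, the conference received more than 3,300 submissions this year, of which 979 were accepted; compared with last year's 783 papers, that is an increase of about 25%. This article presents the titles of all papers accepted at CVPR 2018, including whether each is an oral, spotlight, or poster.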

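This article lists the titles of all papers accepted at CVPR 2018, along with whether each one is an oral, spotlight, or poster. Search Google/Baidu for a paper's title and you can find the related PDF, GitHub, or homepage links.

Amusi has uploaded the full CVPR 2018 paper list to daily-paper-computer-vision. Click "Read the full text" at the end of this article to open daily-paper-computer-vision and download cvpr2018-paper-list.csv.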

link:

https://github.com/amusi/daily-paper-computer-vision/blob/master/2018/cvpr2018-paper-list.csv

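CVPR 2018 Overview

CVPR stands for the IEEE Conference on Computer Vision and Pattern Recognition. Organized by the IEEE, it is a top conference in the fields of computer vision and pattern recognition.

The conference focuses on computer vision and pattern recognition techniques. CVPR is one of the world's top computer vision conferences (one of the three top venues, the other two being ICCV and ECCV). It has a fixed set of topics each year, and each year companies sponsor the conference in exchange for the opportunity to exhibit at the venue.

CVPR has fairly strict acceptance standards: the overall acceptance rate is usually no more than 30%, and the share of papers given oral presentations is no higher than 5%. The organizing committee is a rotating group of volunteers, usually selected three years before a given conference. CVPR reviewing is generally double-blind, meaning that neither the reviewers nor the authors know each other's identities. A paper is typically read by three reviewers, after which the conference's area chair decides whether it is accepted.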

CVPR 2018

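That was a brief introduction to CVPR, whose importance goes without saying. The focus of this article, and of readers' attention, is CVPR 2018. Let's start with one figure: 979/3303 ≈ 29.6%, which is the acceptance rate for CVPR 2018.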
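As a quick check of the figures quoted in this article, the short Python sketch below recomputes the acceptance rate and the year-over-year growth. The three counts (3303 submissions, 979 accepted this year, 783 accepted last year) come from the article; the variable names are only illustrative.

```python
# Counts quoted in the article.
submitted_2018 = 3303
accepted_2018 = 979
accepted_2017 = 783

# Acceptance rate: accepted papers divided by submissions.
acceptance_rate = accepted_2018 / submitted_2018

# Year-over-year growth in the number of accepted papers.
growth = (accepted_2018 - accepted_2017) / accepted_2017

print(f"acceptance rate: {acceptance_rate:.1%}")  # 29.6%
print(f"growth in accepted papers: {growth:.1%}")  # 25.0%
```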

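The CVPR 2018 lists that appeared earlier on Zhihu and various news platforms were just bare paper IDs, with neither the presentation type nor the title. Resourceful (or rather, resigned) readers could only follow the latest papers on arXiv, and coming across one marked as a CVPR 2018 paper was probably a small thrill.

In his constant pursuit of knowledge, Amusi found a list of all papers accepted at CVPR 2018 that contains the paper ID, the presentation type (oral, spotlight, or poster), and, most important of all, the paper title!

With the paper titles in hand, you can really do whatever you like~

Open cvpr2018-paper-list.csv, press Ctrl + F, and type what you are looking for, such as Object Detection; you will then see paper after paper on Object Detection!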
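The same keyword search can also be scripted. The following is a minimal Python sketch, not part of Amusi's repository: `find_rows` is a hypothetical helper that scans every cell of every row, so it does not depend on the real column names of cvpr2018-paper-list.csv (which are not shown in this article). The in-memory sample uses an assumed `id,type,title` layout with placeholder values; only the titles are taken from the list below.

```python
import csv
import io


def find_rows(csv_file, keyword):
    """Return every row in which at least one cell contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [
        row
        for row in csv.reader(csv_file)
        if any(needle in cell.lower() for cell in row)
    ]


# Stand-in for cvpr2018-paper-list.csv. The column layout and the id/type
# values are hypothetical placeholders; only the titles come from the article.
SAMPLE = io.StringIO(
    "id,type,title\n"
    "0001,TBD,Single-Shot Refinement Neural Network for Object Detection\n"
    "0002,TBD,Video Captioning via Hierarchical Reinforcement Learning\n"
    "0003,TBD,An Analysis of Scale Invariance in Object Detection - SNIP\n"
)

matches = find_rows(SAMPLE, "object detection")
for row in matches:
    print(row[-1])  # the title is the last column in this assumed layout
```

To run it on the downloaded file, open the file with `open("cvpr2018-paper-list.csv", newline="", encoding="utf-8")` and pass the file object to `find_rows`; the encoding is an assumption and may need adjusting.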

Next, copy the title of the paper you want to read into the Google/Baidu search box, for example "An Analysis of Scale Invariance in Object Detection - SNIP".

Open the top result, and you will usually land on the paper's arXiv download page.
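If you look up many titles, the copy-and-paste step can be replaced by generating the search links directly. The Python sketch below is only an illustration: `search_urls` is a hypothetical helper, and the Google and Baidu URL patterns (`/search?q=` and `/s?wd=`) are assumptions about those sites' public search pages rather than anything documented in this article.

```python
from urllib.parse import urlencode


def search_urls(title):
    """Build exact-phrase search links for a paper title (URL patterns are assumed)."""
    quoted = f'"{title}"'  # wrap in quotes to ask for an exact-phrase match
    return {
        "google": "https://www.google.com/search?" + urlencode({"q": quoted}),
        "baidu": "https://www.baidu.com/s?" + urlencode({"wd": quoted}),
    }


urls = search_urls("An Analysis of Scale Invariance in Object Detection - SNIP")
for engine, url in urls.items():
    print(engine, url)
```

`urlencode` takes care of escaping spaces and quotation marks, so titles containing punctuation can be passed in unchanged.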

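Better to teach someone to fish than to hand them a fish. The above is just a little trick Amusi often uses, and sharing it here is really showing off in front of the experts; feel free to improvise~

Reminder: CVPR 2018 will be held June 18-22, 2018, in Salt Lake City, Utah, USA.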

link: http://cvpr2018.thecvf.com/

CVPR 2018 Paper List

CVPR 2018 Accepted Papers

Single-Shot Refinement Neural Network for Object Detection

Video Captioning via Hierarchical Reinforcement Learning

DensePose: Multi-Person Dense Human Pose Estimation In The Wild

Frustum PointNets for 3D Object Detection from RGB-D Data

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

Rethinking the Faster R-CNN Architecture for Temporal Action Localization

Shape from Shading through Shape Evolution

A High-Quality Denoising Dataset for Smartphone Cameras

Improving Color Reproduction Accuracy in the Camera Imaging Pipeline

End-to-End Dense Video Captioning with Masked Transformer

pOSE: Pseudo Object Space Error for Initialization-Free Bundle Adjustment

Learning to Segment Every Thing

Density-aware Single Image De-raining using a Multi-stream Dense Network

Densely Connected Pyramid Dehazing Network

Embodied Question Answering

TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays

Towards Open-Set Identity Preserving Face Synthesis

Baseline Desensitizing In Translation Averaging

Learning from the Deep: A Revised Underwater Image Formation Model

Context Encoding for Semantic Segmentation

Deep Texture Manifold for Ground Terrain Recognition

DS*: Tighter Lifting-Free Convex Relaxations for Quadratic Matching Problems

Sparse, Smart Contours to Represent and Edit Images

Every Smile is Unique: Landmark-guided Diverse Smile Generation

Generative Non-Rigid Shape Completion with Graph Convolutional Autoencoders

Learning a Discriminative Prior for Blind Image Deblurring

Attentional ShapeContextNet for Point Cloud Recognition

Learning Superpixels with Segmentation-Aware Affinity Loss

Real-World Repetition Estimation by Div, Grad and Curl

Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation

MegaDepth: Learning Single-View Depth Prediction from Internet Photos

Learning Intrinsic Image Decomposition from Watching the World

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

Human-centric Indoor Scene Synthesis Using Stochastic Grammar

Learning by Asking Questions

Instance Embedding Transfer to Unsupervised Video Object Segmentation

Detect-and-Track: Efficient Pose Estimation in Videos

Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval

Guided Proofreading of Automatic Segmentations for Connectomics

Augmented Skeleton Space Transfer for Depth-based Hand Pose Estimation

Context-aware Synthesis for Video Frame Interpolation

2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning

NAG: Network for Adversary Generation

LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation

Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration

Multi-view Harmonized Bilinear Network for 3D Object Recognition

Tangent Convolutions for Dense Prediction in 3D

Semi-parametric Image Synthesis

Interactive Image Segmentation with Latent Diversity

3D Hand Pose Estimation: From Current Achievements to Future Goals

W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection

BlockDrop: Dynamic Inference Paths in Residual Networks

MapNet: Geometry-Aware Learning of Maps for Camera Localization

BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning

Salient Object Detection Driven by Fixation Prediction

3D Object Detection with Latent Support Surfaces

Practical Block-wise Neural Network Architecture Generation

Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points

Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning

Visual Grounding via Accumulated Attention

Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors

ISTA-Net: Interpretable Optimization-Inspired Deep Network for Image Compressive Sensing

Perturbative Neural Networks: Rethinking Convolution in CNNs

Nonlinear 3D Face Morphable Model

Neural Baby Talk

Towards Pose Invariant Face Recognition in the Wild

MoNet: Deep Motion Exploitation for Video Object Segmentation

Exploring Disentangled Feature Representation Beyond Face Identification

Towards Effective Low-bitwidth Convolutional Neural Networks

Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries

Learning Facial Action Units from Web Images with Scalable Weakly Supervised Clustering

Few-Shot Image Recognition by Predicting Parameters from Activations

Single-Shot Object Detection with Enriched Semantics

Unifying Identification and Context Learning for Person Recognition

Separating Self-Expression and Visual Content in Hashtag Supervision

Multi-Cue Correlation Filters for Robust Visual Tracking

Beyond Trade-off: Accelerate FCN-based Face Detection with Higher Accuracy

On the Robustness of Semantic Segmentation Models to Adversarial Attacks

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

Illuminant Spectra-based Source Separation Using Flash Photography

Tracking Multiple Objects Outside the Line of Sight using Speckle Imaging

Improved Human Pose Estimation through Adversarial Data Augmentation

Generative Adversarial Learning Towards Fast Weakly Supervised Detection

Audio to Body Dynamics

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Frame-Recurrent Video Super-Resolution

Deep Mutual Learning

Real-world Anomaly Detection in Surveillance Videos

Soccer on Your Tabletop

Diversity Regularized Spatiotemporal Attention for Video-based Person Re-identification

HashGAN: Deep Learning to Hash with Pair Conditional Wasserstein GAN

Excitation Backprop for RNNs

Dynamic-Structured Semantic Propagation Network

Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation

SPLATNet: Sparse Lattice Networks for Point Cloud Processing

Video Representation Learning Using Discriminative Pooling

Attend and Interact: Higher-Order Object Interactions for Video Understanding

Human Pose Estimation with Parsing Induced Learner

4D Human Body Correspondences from Panoramic Depth Maps

Recognizing Human Actions as Evolution of Pose Estimation Maps

GraphBit: Bitwise Interaction Mining via Deep Reinforcement Learning

Deep Adversarial Metric Learning

Revisiting Video Saliency: A Large-scale Benchmark and a New Model

Graph-Cut RANSAC

Five-point Fundamental Matrix Estimation for Uncalibrated Cameras

Hashing as Tie-Aware Learning to Rank

Optimizing Local Feature Descriptors for Nearest Neighbor Matching

Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies

Consensus Maximization for Semantic Region Correspondences

ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing

Motion-Guided Cascaded Refinement Network for Video Object Segmentation

Zigzag Learning for Weakly Supervised Object Detection

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

VITON: An Image-based Virtual Try-on Network

Cross-Domain Self-supervised Multi-task Feature Learning Using Synthetic Game Imagery

LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image

Thoracic Disease Identification and Localization with Limited Supervision

Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks

Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation

Deep End-to-End Time-of-Flight Imaging

Fast and Accurate Online Video Object Segmentation via Tracking Parts

Min-Entropy Latent Model for Weakly Supervised Object Detection

Future Frame Prediction for Anomaly Detection - A New Baseline

Face Aging with Identity-Preserved Conditional Generative Adversarial Networks

Learning to Compare: Relation Network for Few-Shot Learning

Deep Layer Aggregation

Style Aggregated Network for Facial Landmark Detection

M3: Multimodal Memory Modelling for Video Captioning

Classification Driven Dynamic Image Enhancement

Generative Image Inpainting with Contextual Attention

Iterative Visual Reasoning Beyond Convolutions

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Textbook Question Answering under Teacher Guidance with Memory Networks

Multi-Level Factorisation Net for Person Re-Identification

Functional Map of the World

A Two-Step Disentanglement Method

Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization

Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

Left-Right Comparative Recurrent Model for Stereo Matching

Analytic Expressions for Probabilistic Moments of PL-DNN with Gaussian Input

Zero-Shot Sketch-Image Hashing

Interpretable Convolutional Neural Networks

Reconstructing Thin Structures of Manifold Surfaces by Integrating Spatial Curves

Enhancing the Spatial Resolution of Stereo Images using a Parallax Prior

Anticipating Traffic Accidents with Adaptive Loss and Large-scale Incident DB

Generating Synthetic X-ray Images of a Person from the Surface Geometry

Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification

Unsupervised CCA

Discovering Point Lights with Intensity Distance Fields

Universal Denoising Networks: A Novel CNN-based Network Architecture for Image Denoising

Easy Identification from Better Constraints: Multi-Shot Person Re-Identification from Reference Constraints

Recurrent Pixel Embedding for Instance Grouping

Recurrent Scene Parsing with Perspective Understanding in the Loop

Learning to Hash by Discrepancy Minimization

Fast End-to-End Trainable Guided Filter

Disentangling Structure and Aesthetics for Content-aware Image Completion

An Analysis of Scale Invariance in Object Detection - SNIP

CSGNet: Neural Shape Parser for Constructive Solid Geometry

Finding Tiny Faces in the Wild with Generative Adversarial Network

SSNet: Scale Selection Network for Online 3D Action Prediction

Integrated facial landmark localization and super-resolution of real-world very low resolution faces in arbitrary poses with GANs

The Best of Both Worlds: Combining CNNs and Geometric Constraints for Hierarchical Motion Segmentation

In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks

Deep Cross-media Knowledge Transfer

Coupled End-to-end Transfer Learning with Generalized Fisher Information

Knowledge Aided Consistency for Weakly Supervised Phrase Grounding

Viewpoint-aware Attentive Multi-view Inference for Vehicle Re-identification

MatNet: Modular Attention Network for Referring Expression Comprehension

CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation

NISP: Pruning Networks using Neuron Importance Score Propagation

Who Let The Dogs Out? Modeling Dog Behavior From Visual Data

Efficient Video Object Segmentation via Network Modulation

Learning Deep Models for Face Anti-Spoofing: Binary or Auxiliary Supervision

Feedback-prop: Convolutional Neural Network Inference under Partial Evidence

A Memory Network Approach for Story-based Temporal Summarization of 360° Videos

Improving Occlusion and Hard Negative Handling for Single-Stage Object Detectors

UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition

Learning a Toolchain for Image Restoration

Learning to Act Properly: Predicting and Explaining Affordances from Images

Learning a Discriminative Feature Network for Semantic Segmentation

Optimizing Video Object Detection via a Scale-Time Lattice

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

Cascaded Pyramid Network for Multi-Person Pose Estimation

Seeing Temporal Modulation of Lights from Standard Cameras

Point-wise Convolutional Neural Networks

Fine-grained Video Captioning for Sports Narrative

Dense 3D Regression for Hand Pose Estimation

Missing Slice Recovery for Tensors Using a Low-rank Model in Embedded Space

Learning Convolutional Networks for Content-weighted Image Compression

Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking

Deep Cost-Sensitive and Order-Preserving Feature Learning for Cross-Population Age Estimation

First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations

Hand PointNet: 3D Hand Pose Estimation using Point Sets

Recovering Realistic Texture in Image Super-resolution by Spatial Feature Modulation

Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos

A Face to Face Neural Conversation Model

SurfConv: Bridging 3D and 2D Convolution for RGBD Images

Dynamic Video Segmentation Network

Multiple Granularity Group Interaction Prediction

Visual Question Reasoning on General Dependency Tree

From Lifestyle VLOGs to Everyday Interactions

COCO-Stuff: Thing and Stuff Classes in Context

GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB

Non-local Neural Networks

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

Taskonomy: Disentangling Task Transfer Learning

Embodied Real-World Active Perception

SfSNet: Learning Shape, Reflectance and Illuminance of Faces 'in the wild'

End-to-end Recovery of Human Shape and Pose

Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction

A Fast Resection-Intersection Method for the Known Rotation Problem

Image Generation from Scene Graphs

What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets

PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Video

Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatio-temporal Patterns

Kernelized Subspace Pooling for Deep Local Descriptors

Video Rain Removal By Multiscale Convolutional Sparse Coding

Learning from Millions of 3D Scans for Large-scale 3D Face Recognition

Referring Relationships

Improving Object Localization with Fitness NMS and Bounded IoU Loss

Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination

CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization

Visual Question Generation as Dual Task of Visual Question Answering

Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi- Supervised Semantic Segmentation

Learning Dual Convolutional Neural Networks for Low-Level Vision

Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation

MegDet: A Large Mini-Batch Object Detector

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

TOM-Net: Learning Transparent Object Matting from a Single Image

End-to-End Deep Kronecker-Product Matching for Person Re-identification

Semantic Visual Localization

Joint Cuts and Matching of Partitions in One Graph

Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions

Crowd Counting via Adversarial Cross-Scale Consistency Pursuit

Deep Group-shuffling Random Walk for Person Re-identification

Learning to Detect Features in Texture Images

Transferable Joint Attribute-Identity Deep Learning for Unsupervised Person Re-Identification

CarFusion: Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles

Context-aware Deep Feature Compression for High-speed Visual Tracking

Deep Material-aware Cross-spectral Stereo Matching

Deep Extreme Cut: From Extreme Points to Object Segmentation

Label Denoising Adversarial Network (LDAN) for Inverse Lighting of Face Images

Harmonious Attention Network for Person Re-Identification

Unsupervised Deep Generative Adversarial Hashing Network

Pseudo-Mask Augmented Object Detection

LSTM stack-based Neural Multi-sequence Alignment TeCHnique (NeuMATCH)

Adversarial Complementary Learning for Weakly Supervised Object Localization

Unsupervised Discovery of Object Landmarks as Structural Representations

DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map

Monocular Relative Depth Perception with Web Stereo Data Supervision

Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification

Objects as context for detecting their semantic parts

Camera Style Adaptation for Person Re-identification

Conditional Generative Adversarial Network for Structured Domain Adaptation

Rotation-sensitive Regression for Oriented Scene Text Detection

Residual Parameter Transfer for Deep Domain Adaptation

SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation

Weakly Supervised Instance Segmentation using Class Peak Response

Robust Facial Landmark Detection via a Fully-Convolutional Local-Global Context Network

Rotation Averaging and Strong Duality

PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning

Im2Flow: Motion Hallucination from Static Images for Action Recognition

Feature Quantization for Defending Against Distortion of Images

End-to-end weakly-supervised semantic alignment

PointGrid: A Deep Network for 3D Shape Understanding

Imagine it for me: Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts

A Minimalist Approach to Type-Agnostic Detection of Quadrics in Point Clouds

A Benchmark for Articulated Human Pose Estimation and Tracking

Boosting Self-Supervised Learning via Knowledge Transfer

PPFNet: Global Context Aware Local Features for Robust 3D Point Matching

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Fast Video Object Segmentation by Reference-Guided Mask Propagation

Super-Resolving Very Low-Resolution Face Images with Supplementary Attributes

Video Person Re-identification with Competitive Snippet-similarity Aggregation and Co-attentive Snippet Embedding

One-shot Action Localization by Sequence Matching Network

Efficient Subpixel Refinement with Symbolic Linear Predictors

Distort-and-Recover: Color Enhancement using Deep Reinforcement Learning

Group Consistent Similarity Learning via Deep CRFs for Person Re-Identification

Single Image Reflection Separation with Perceptual Losses

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

Recognize Actions by Disentangling Components of Dynamics

Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains

Attention-aware Compositional Network for Person Re-Identification

HATS: Histograms of Averaged Time Surfaces for Robust Event-based Object Classification

Mask-guided Contrastive Attention Model for Person Re-Identification

Pose-Guided Photorealistic Face Rotation

Automatic 3D Indoor Scene Modeling from Single Panorama

SobolevFusion: 3D Reconstruction of Scenes Undergoing Free Non-rigid Motion

A Biresolution Spectral framework for Product Quantization

Dynamic Zoom-in Network for Fast Object Detection in Large Images

On the Importance of Label Quality for Semantic Segmentation

EPINET: A Fully-Convolutional Neural Network for Light Field Depth Estimation by Using Epipolar Geometry

A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking

Erase or Fill? Deep Joint Recurrent Rain Removal and Reconstruction in Videos

Scalable and Effective Deep CCA via Soft Decorrelation

High-order tensor regularization with application to attribute ranking

3D-RCNN: Instance-level 3D Scene Understanding via Render-and-Compare

FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds

Defocus Blur Detection via Multi-Stream Bottom-Top-Bottom Fully Convolutional Network

Decorrelated Batch Normalization

Unsupervised Textual Grounding: Linking Words to Image Concepts

Scale-recurrent Network for Deep Image Deblurring

Low-Shot Recognition with Imprinted Weights

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation

Facelet-Bank for Fast Portrait Manipulation

Duplex Generative Adversarial Network for Unsupervised Domain Adaptation

Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation

Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks

Structure Preserving Video Prediction

Tagging Like Humans: Diverse and Distinct Image Annotation

Learning to Sketch with Shortcut Cycle Consistency

GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints

Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks

Hyperparameter Optimization for Tracking with Continuous Deep Q-Learning

Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective

NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning

Detecting and Recognizing Human-Object Interactions

Augmenting Crowd-Sourced 3D Reconstructions using Semantic Detections

Visual Relationship Learning with a Factorization-based Prior

Re-weighted Adversarial Adaptation Network for Unsupervised Domain Adaptation

Flow Guided Recurrent Neural Encoder for Video Salient Object Detection

Disentangling 3D Pose in A Dendritic CNN for Unconstrained 2D Face Alignment

Progressive Attention Guided Recurrent Network for Salient Object Detection

Answer with Grounding Snippets: Focal Visual-Text Attention for Visual Question Answering

Unsupervised Learning of Depth and Egomotion from Monocular Video Using 3D Geometric Constraints

Repulsion Loss: Detecting Pedestrians in a Crowd

PU-Net: Point Cloud Upsampling Network

Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF

PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection

Gated Fusion Network for Single Image Dehazing

Interleaved Structured Sparse Convolutional Neural Networks

Where and Why Are They Looking? Jointly Inferring Human Attention and Intentions in Complex Tasks

End-to-end Flow Correlation Tracking with Spatial-temporal Attention

Left/Right Asymmetric Layer Skippable Networks

Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation

VITAL: VIsual Tracking via Adversarial Learning

RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints

Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints

Squeeze-and-Excitation Networks

Edit Probability for Scene Text Recognition

Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning

Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning

Learning to Localize Sound Source in Visual Scenes

Dynamic Few-Shot Visual Learning without Forgetting

Weakly-Supervised Semantic Segmentation by Iteratively Mining Common Object Features

SINT++: Robust Visual Tracking via Adversarial Hard Positive Generation

Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer

Fast and Accurate Single Image Super-Resolution via Information Distillation Network

Low-Latency Video Semantic Segmentation

Domain Adaptive Faster R-CNN for Object Detection in the Wild

DoubleFusion: Real-time Capture of Human Performance with Inner Body Shape from a Single Depth Sensor

Lean Multiclass Crowdsourcing

Tell Me Where To Look: Guided Attention Inference Network

Residual Dense Network for Image Super-Resolution

Look at Boundary: A Boundary-Aware Face Alignment Algorithm

Imagination-IQA: No-reference Image Quality Assessment via Adversarial Learning

Memory Matching Networks for One-Shot Image Recognition

3D Human Pose Estimation in the Wild by Adversarial Learning

Unsupervised Training for 3D Morphable Model Regression

Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective

IQA: Visual Question Answering in Interactive Environments

Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking

Low-shot Learning from Imaginary Data

Deep Regression Forests for Age Estimation

Partial Transfer Learning with Selective Adversarial Networks

A Bi-directional Message Passing Model for Salient Object Detection

Transductive Unbiased Embedding for Zero-Shot Learning

Scale-Transferrable Object Detection

Crowd Counting with Deep Negative Correlation Learning

Deep Cauchy Hashing for Hamming Space Retrieval

Demo2Vec: Reasoning Object Affordances from Online Videos

GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition

An End-to-End TextSpotter with Explicit Alignment and Attention

Stereoscopic Neural Style Transfer

Bootstrapping the Performance of Webly Supervised Semantic Segmentation

Learning Markov Clustering Networks for Scene Text Detection

Collaborative and Adversarial Network for Unsupervised domain adaptation

Reflection Removal for Large-Scale 3D Point Clouds

Pose Transferrable Person Re-Identification

Learning to Adapt Structured Output Space for Semantic Segmentation

Efficient Diverse Ensemble for Discriminative Co-Tracking

Learning a Single Convolutional Super-Resolution Network for Multiple Degradations

Probabilistic Plant Modeling via Multi-View Image-to-Image Translation

Learning to Parse Wireframes in Images of Man-Made Environments

A Variational U-Net for Conditional Appearance and Shape Generation

Learning to Find Good Correspondences

Actor and Action Video Segmentation from a Sentence

Towards a Mathematical Understanding of the Difficulty in Learning with Feedforward Neural Networks

Weakly-supervised Deep Convolutional Neural Network Learning for Facial Action Unit Intensity Estimation

Maximum Classifier Discrepancy for Unsupervised Domain Adaptation

Because of WeChat's character limit, the list is not shown in full here; for the complete list, see the repository compiled by Amusi:

https://github.com/amusi/daily-paper-computer-vision
