aigc-acceleration-arxiv-daily

AIGC Acceleration for Video Generation

Automatically Updated on 2026.04.04

Current Search Keywords: Video Generation, Text-to-Video, Image-to-Video, Video Editing, Diffusion Models, Real-time Generation, Video Diffusion, Video Synthesis, Latent Diffusion, Video Generation Acceleration

If you have any other keywords, please feel free to let us know :)
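As a rough illustration of how a keyword list like the one above could drive a daily search, the sketch below builds query URLs for the public arXiv API and turns a bare identifier from the PDF column into an abstract link. It is not this repository's scrape code: the function names `build_query_url` and `abs_url` and the choice of `max_results` are assumptions made for the example, and no network request is sent.

```python
from urllib.parse import urlencode

# Keywords copied from the "Current Search Keywords" line above.
KEYWORDS = [
    "Video Generation",
    "Text-to-Video",
    "Image-to-Video",
    "Video Editing",
    "Diffusion Models",
    "Real-time Generation",
    "Video Diffusion",
    "Video Synthesis",
    "Latent Diffusion",
    "Video Generation Acceleration",
]

# Public arXiv API endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(keyword: str, max_results: int = 10) -> str:
    """Hypothetical helper: URL for the newest papers matching a keyword phrase."""
    params = {
        "search_query": f'all:"{keyword}"',  # phrase search across all fields
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def abs_url(arxiv_id: str) -> str:
    """Hypothetical helper: abstract page for an ID such as 2604.02330."""
    return f"https://arxiv.org/abs/{arxiv_id}"


if __name__ == "__main__":
    for kw in KEYWORDS:
        print(build_query_url(kw))
    print(abs_url("2604.02330"))
```

Each URL returns an Atom feed when fetched. Parsing that feed and writing the table rows is left out, since the source does not describe how the lists are produced.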

Web Page (Scrape Code)

Table of Contents
  1. <a href=#video-generation>Video Generation</a>
  2. <a href=#image-to-video>Image-to-Video</a>
  3. <a href=#video-editing>Video Editing</a>
  4. <a href=#diffusion-models>Diffusion Models</a>
  5. <a href=#real-time-generation>Real-time Generation</a>
  6. <a href=#dit-acceleration>DiT Acceleration</a>

Video Generation

Publish Date Title Authors PDF Code
2026-04-02 ActionParty: Multi-Subject Action Binding in Generative Video Games Alexander Pondaven et.al. 2604.02330 null
2026-04-02 Generative World Renderer Zheng-Hui Huang et.al. 2604.02329 null
2026-04-02 VOID: Video Object and Interaction Deletion Saman Motamed et.al. 2604.02296 null
2026-04-02 Resonance4D: Frequency-Domain Motion Supervision for Preset-Free Physical Parameter Learning in 4D Dynamic Physical Scene Simulation Changshe Zhang et.al. 2604.01994 null
2026-04-02 DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning Yang Zhou et.al. 2604.01765 null
2026-04-02 Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion Edoardo A. Dominici et.al. 2604.01761 null
2026-04-02 Can Video Diffusion Models Predict Past Frames? Bidirectional Cycle Consistency for Reversible Interpolation Lingyu Liu et.al. 2604.01700 null
2026-04-02 From Understanding to Erasing: Towards Complete and Stable Video Object Removal Dingming Liu et.al. 2604.01693 null
2026-04-02 DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data Wonjoon Jin et.al. 2604.01666 null
2026-04-02 Moiré Video Authentication: A Physical Signature Against AI Video Generation Yuan Qing et.al. 2604.01654 null
2026-04-02 ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor Yixiao Wang et.al. 2604.01552 null
2026-04-01 Reinforcing Consistency in Video MLLMs with Structured Rewards Yihao Quan et.al. 2604.01460 null
2026-04-01 GRAZE: Grounded Refinement and Motion-Aware Zero-Shot Event Localization Syed Ahsan Masud Zaidi et.al. 2604.01383 null
2026-04-01 TRACE: High-Fidelity 3D Scene Editing via Tangible Reconstruction and Geometry-Aligned Contextual Video Masking Jiyuan Hu et.al. 2604.01207 null
2026-04-01 ReinDriveGen: Reinforcement Post-Training for Out-of-Distribution Driving Scene Generation Hao Zhang et.al. 2604.01129 null
2026-04-01 PHASOR: Anatomy- and Phase-Consistent Volumetric Diffusion for CT Virtual Contrast Enhancement Zilong Li et.al. 2604.01053 null
2026-04-01 ONE-SHOT: Compositional Human-Environment Video Synthesis via Spatial-Decoupled Motion Injection and Hybrid Context Integration Fengyuan Yang et.al. 2604.01043 null
2026-04-01 MotionGrounder: Grounded Multi-Object Motion Transfer via Diffusion Transformer Samuel Teodoro et.al. 2604.00853 null
2026-04-01 HICT: High-precision 3D CBCT reconstruction from a single X-ray Wen Ma et.al. 2604.00792 null
2026-04-01 CL-VISTA: Benchmarking Continual Learning in Video Large Language Models Haiyang Guo et.al. 2604.00677 null
2026-03-31 Collaborative AI Agents and Critics for Fault Detection and Cause Analysis in Network Telemetry Syed Eqbal Alam et.al. 2604.00319 null
2026-03-31 OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation Yuheng Liu et.al. 2603.30045 null
2026-03-31 Video Models Reason Early: Exploiting Plan Commitment for Maze Solving Kaleb Newman et.al. 2603.30043 null
2026-03-31 Gloria: Consistent Character Video Generation via Content Anchors Yuhang Yang et.al. 2603.29931 null
2026-03-31 SLVMEval: Synthetic Meta Evaluation Benchmark for Text-to-Long Video Generation Ryosuke Matsuda et.al. 2603.29186 null
2026-03-31 TrajectoryMover: Generative Movement of Object Trajectories in Videos Kiran Chhatre et.al. 2603.29092 null
2026-03-30 Generating Humanless Environment Walkthroughs from Egocentric Walking Tour Videos Yujin Ham et.al. 2603.29036 null
2026-03-30 Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas Felix Wimbauer et.al. 2603.28980 null
2026-03-30 Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms Muyang He et.al. 2603.28489 null
2026-03-30 VistaGEN: Consistent Driving Video Generation with Fine-Grained Control Using Multiview Visual-Language Reasoning Li-Heng Chen et.al. 2603.28353 null
2026-03-30 LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization Chutian Meng et.al. 2603.28082 null
2026-03-30 FlashSign: Pose-Free Guidance for Efficient Sign Language Video Generation Liuzhou Zhang et.al. 2603.27915 null
2026-03-29 Wan-R1: Verifiable-Reinforcement Learning for Video Reasoning Ming Liu et.al. 2603.27866 null
2026-03-29 TokenDial: Continuous Attribute Control in Text-to-Video via Spatiotemporal Token Offsets Zhixuan Liu et.al. 2603.27520 null
2026-03-29 KV Cache Quantization for Self-Forcing Video Generation: A 33-Method Empirical Study Suraj Ranganath et.al. 2603.27469 null
2026-03-28 LOME: Learning Human-Object Manipulation with Action-Conditioned Egocentric World Model Quankai Gao et.al. 2603.27449 null
2026-03-28 TrackMAE: Video Representation Learning via Track Mask and Predict Renaud Vandeghen et.al. 2603.27268 null
2026-03-28 LightMover: Generative Light Movement with Color and Intensity Controls Gengze Zhou et.al. 2603.27209 null
2026-03-28 EFlow: Fast Few-Step Video Generator Training from Scratch via Efficient Solution Flow Dogyun Park et.al. 2603.27086 null
2026-03-28 LightCtrl: Training-free Controllable Video Relighting Yizuo Peng et.al. 2603.27083 null
2026-03-27 Think over Trajectories: Leveraging Video Generation to Reconstruct GPS Trajectories from Cellular Signaling Ruixing Zhang et.al. 2603.26610 null
2026-03-27 VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward Zhaochong An et.al. 2603.26599 null
2026-03-27 Generation Is Compression: Zero-Shot Video Coding via Stochastic Rectified Flow Ziyue Zeng et.al. 2603.26571 null
2026-03-26 ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Yawen Luo et.al. 2603.25746 null
2026-03-26 RefAlign: Representation Alignment for Reference-to-Video Generation Lei Wang et.al. 2603.25743 null
2026-03-26 PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Xiaofeng Mao et.al. 2603.25730 null
2026-03-26 Beyond the Golden Data: Resolving the Motion-Vision Quality Dilemma via Timestep Selective Training Xiangyang Luo et.al. 2603.25527 null
2026-03-26 EagleNet: Energy-Aware Fine-Grained Relationship Learning Network for Text-Video Retrieval Yuhan Chen et.al. 2603.25267 null
2026-03-26 Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction Jiahao Tian et.al. 2603.25209 null
2026-03-26 AnyID: Ultra-Fidelity Universal Identity-Preserving Video Generation from Any Visual References Jiahao Wang et.al. 2603.25188 null
2026-03-26 GaussFusion: Improving 3D Reconstruction in the Wild with A Geometry-Informed Video Generator Liyuan Zhu et.al. 2603.25053 null
2026-03-26 ScrollScape: Unlocking 32K Image Generation With Video Diffusion Priors Haodong Yu et.al. 2603.24270 null
2026-03-25 DCARL: A Divide-and-Conquer Framework for Autoregressive Long-Trajectory Video Generation Junyi Ouyang et.al. 2603.24835 null
2026-03-25 DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving Pengxuan Yang et.al. 2603.24587 null
2026-03-25 Anti-I2V: Safeguarding your photos from malicious image-to-video generation Duc Vu et.al. 2603.24570 null
2026-03-25 Toward Physically Consistent Driving Video World Models under Challenging Trajectories Jiawei Zhou et.al. 2603.24506 null
2026-03-25 OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning Kaihang Pan et.al. 2603.24458 null
2026-03-25 Accelerating Diffusion-based Video Editing via Heterogeneous Caching: Beyond Full Computing at Sampled Denoising Timestep Tianyi Liu et.al. 2603.24260 null
2026-03-25 Leave No Stone Unturned: Uncovering Holistic Audio-Visual Intrinsic Coherence for Deepfake Detection Jielun Peng et.al. 2603.23960 null
2026-03-25 Knowledge-Refined Dual Context-Aware Network for Partially Relevant Video Retrieval Junkai Yang et.al. 2603.23902 null
2026-03-24 WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Zhen Li et.al. 2603.23497 null
2026-03-24 Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation Brian Chao et.al. 2603.23491 null
2026-03-24 TETO: Tracking Events with Teacher Observation for Motion Estimation and Frame Interpolation Jini Yang et.al. 2603.23487 null
2026-03-24 RealMaster: Lifting Rendered Scenes into Photorealistic Video Dana Cohen-Bar et.al. 2603.23462 null
2026-03-24 I3DM: Implicit 3D-aware Memory Retrieval and Injection for Consistent Video Scene Generation Jia Li et.al. 2603.23413 null
2026-03-24 ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment Yuzhi Chen et.al. 2603.23376 null
2026-03-24 ViBe: Ultra-High-Resolution Video Synthesis Born from Pure Images Yunfeng Wu et.al. 2603.23326 null
2026-03-24 GO-Renderer: Generative Object Rendering with 3D-aware Controllable Video Diffusion Models Zekai Gu et.al. 2603.23246 null
2026-03-24 InterDyad: Interactive Dyadic Speech-to-Video Generation by Querying Intermediate Visual Guidance Dongwei Pan et.al. 2603.23132 null
2026-03-24 WorldMesh: Generating Navigable Multi-Room 3D Scenes via Mesh-Conditioned Image Diffusion Manuel-Andreas Schneider et.al. 2603.22972 null
2026-03-24 Cluster-Wise Spatio-Temporal Masking for Efficient Video-Language Pretraining Weijun Zhuang et.al. 2603.22953 null
2026-03-23 TrajLoom: Dense Future Trajectory Generation from Video Zewei Zhang et.al. 2603.22606 null
2026-03-23 Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models Meiqi Wu et.al. 2603.22212 null
2026-03-23 PAM: A Pose-Appearance-Motion Engine for Sim-to-Real HOI Video Generation Mingju Gao et.al. 2603.22193 null
2026-03-23 Mamba-VMR: Multimodal Query Augmentation via Generated Videos for Precise Temporal Grounding Yunzhuo Sun et.al. 2603.22121 null
2026-03-23 P-Flow: Prompting Visual Effects Generation Rui Zhao et.al. 2603.22091 null
2026-03-23 Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model SII-GAIR et.al. 2603.21986 null
2026-03-23 Manifold-Aware Exploration for Reinforcement Learning in Video Generation Mingzhe Zheng et.al. 2603.21872 null
2026-03-23 Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation Yuyang You et.al. 2603.21864 null
2026-03-23 Climate Prompting: Generating the Madden-Julian Oscillation using Video Diffusion and Low-Dimensional Conditioning Sulian Thual et.al. 2603.21856 null
2026-03-23 PROBE: Diagnosing Residual Concept Capacity in Erased Text-to-Video Diffusion Models Yiwei Xie et.al. 2603.21547 null
2026-03-22 Relax Forcing: Relaxed KV-Memory for Consistent Long Video Generation Zengqun Zhao et.al. 2603.21366 null
2026-03-22 Identity-Consistent Video Generation under Large Facial-Angle Variations Bin Hu et.al. 2603.21299 null
2026-03-22 Pretrained Video Models as Differentiable Physics Simulators for Urban Wind Flows Janne Perini et.al. 2603.21210 null
2026-03-20 Uni-Classifier: Leveraging Video Diffusion Priors for Universal Guidance Classifier Yujie Zhou et.al. 2603.20382 null
2026-03-20 MME-CoF-Pro: Evaluating Reasoning Coherence in Video Generative Models with Text and Visual Hints Yu Qi et.al. 2603.20194 null
2026-03-20 LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation Jiazheng Xing et.al. 2603.20192 null
2026-03-20 X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving Chaoda Zheng et.al. 2603.19979 null
2026-03-20 Morphology-Consistent Humanoid Interaction through Robot-Centric Video Synthesis Weisheng Xu et.al. 2603.19709 null
2026-03-20 Making Video Models Adhere to User Intent with Minor Adjustments Daniel Ajisafe et.al. 2603.19672 null
2026-03-20 OrbitNVS: Harnessing Video Diffusion Priors for Novel View Synthesis Jinglin Liang et.al. 2603.19613 null
2026-03-20 Physion-Eval: Evaluating Physical Realism in Generated Video via Human Reasoning Qin Zhang et.al. 2603.19607 null
2026-03-19 Depictions of Depression in Generative AI Video Models: A Preliminary Study of OpenAI’s Sora 2 Matthew Flathers et.al. 2603.19527 null
2026-03-19 Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding Xianjin Wu et.al. 2603.19235 null
2026-03-19 MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction Haitian Li et.al. 2603.19231 null
2026-03-19 Spectrally-Guided Diffusion Noise Schedules Carlos Esteves et.al. 2603.19222 null
2026-03-19 Measuring 3D Spatial Geometric Consistency in Dynamic Generated Videos Weijia Dou et.al. 2603.19048 null
2026-03-19 V-Dreamer: Automating Robotic Simulation and Trajectory Synthesis via Video Generation Priors Songjia He et.al. 2603.18811 null
2026-03-19 6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models Rundong Su et.al. 2603.18742 null
2026-03-19 PhysVideo: Physically Plausible Video Generation with Cross-View Geometry Guidance Cong Wang et.al. 2603.18639 null
2026-03-19 Training-Free Sparse Attention for Fast Video Generation via Offline Layer-Wise Sparsity Profiling and Online Bidirectional Co-Clustering Jiayi Luo et.al. 2603.18636 null
2026-03-19 GenVideoLens: Where LVLMs Fall Short in AI-Generated Video Detection? Yueying Zou et.al. 2603.18625 null
2026-03-19 Improving Joint Audio-Video Generation with Cross-Modal Context Learning Bingqi Ma et.al. 2603.18600 null
2026-03-19 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model Hyun-kyu Ko et.al. 2603.18524 null
2026-03-19 Efficient Video Diffusion with Sparse Information Transmission for Video Compression Mingde Zhou et.al. 2603.18501 null
2026-03-18 The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering Yigit Ekin et.al. 2603.17998 null
2026-03-18 Versatile Editing of Video Content, Actions, and Dynamics without Training Vladimir Kulikov et.al. 2603.17989 null
2026-03-18 AHOY! Animatable Humans under Occlusion from YouTube Videos with Gaussian Splatting and Video Diffusion Priors Aymen Mir et.al. 2603.17975 null
2026-03-18 Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation Yingjie Chen et.al. 2603.17889 null
2026-03-18 Steering Video Diffusion Transformers with Massive Activations Xianhang Cheng et.al. 2603.17825 null
2026-03-18 ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation Dmitriy Rivkin et.al. 2603.17812 null
2026-03-18 EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards Ruixiang Wang et.al. 2603.17808 null
2026-03-18 TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos Yan Zeng et.al. 2603.17735 null
2026-03-18 Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos Songtao Jiang et.al. 2603.17693 null
2026-03-18 FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion Hugo Caselles-Dupré et.al. 2603.17555 null
2026-03-18 ProGVC: Progressive-based Generative Video Compression via Auto-Regressive Context Modeling Daowen Li et.al. 2603.17546 null
2026-03-18 AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization Dailan He et.al. 2603.17461 null
2026-03-18 SHIFT: Motion Alignment in Video Diffusion Models with Adversarial Hybrid Fine-Tuning Xi Ye et.al. 2603.17426 null
2026-03-18 Motion-Adaptive Temporal Attention for Lightweight Video Generation with Stable Diffusion Rui Hong et.al. 2603.17398 null
2026-03-18 Stereo World Model: Camera-Guided Stereo Video Generation Yang-Tian Sun et.al. 2603.17375 null
2026-03-17 WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation Jisu Nam et.al. 2603.16871 null
2026-03-17 Demystifing Video Reasoning Ruisi Wang et.al. 2603.16870 null
2026-03-17 DreamPlan: Efficient Reinforcement Fine-Tuning of Vision-Language Planners via Video World Models Emily Yue-Ting Jia et.al. 2603.16860 null
2026-03-17 World Reconstruction From Inconsistent Views Lukas Höllein et.al. 2603.16736 null
2026-03-17 Search2Motion: Training-Free Object-Level Motion Control via Attention-Consensus Search Sainan Liu et.al. 2603.16711 null
2026-03-17 Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation Mutian Xu et.al. 2603.16669 null
2026-03-17 VideoMatGen: PBR Materials through Joint Generative Modeling Jon Hasselgren et.al. 2603.16566 null
2026-03-17 VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment Tengjiao Yin et.al. 2603.16271 null
2026-03-17 S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight Haodong Yan et.al. 2603.16195 null
2026-03-17 Diffusion Models for Joint Audio-Video Generation Alejandro Paredes La Torre et.al. 2603.16093 null
2026-03-16 Tri-Prompting: Video Diffusion with Unified Control over Scene, Subject, and Motion Zhenghong Zhou et.al. 2603.15614 null
2026-03-16 Grounding World Simulation Models in a Real-World Metropolis Junyoung Seo et.al. 2603.15583 null
2026-03-16 iDaVIE v1.0: A virtual reality tool for interactive analysis of astronomical data cubes Alexander Sivitilli et.al. 2603.15490 null
2026-03-16 ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer Ruonan Yu et.al. 2603.15478 null
2026-03-16 AnyCrowd: Instance-Isolated Identity-Pose Binding for Arbitrary Multi-Character Animation Zhenyu Xie et.al. 2603.15415 null

(<a href=#updated-on-20260404>back to top</a>)

Image-to-Video

Publish Date Title Authors PDF Code
2026-04-02 DenOiS: Dual-Domain Denoising of Observation and Solution in Ultrasound Image Reconstruction Can Deniz Bezek et.al. 2604.02105 null
2026-04-02 Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion Edoardo A. Dominici et.al. 2604.01761 null
2026-04-02 ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor Yixiao Wang et.al. 2604.01552 null
2026-04-01 AffordTissue: Dense Affordance Prediction for Tool-Action Specific Tissue Interaction Aiza Maksutova et.al. 2604.01371 null
2026-04-01 OkanNet: A Lightweight Deep Learning Architecture for Classification of Brain Tumor from MRI Images Okan Uçar et.al. 2604.01264 null
2026-04-01 Simulating Realistic LiDAR Data Under Adverse Weather for Autonomous Vehicles: A Physics-Informed Learning Approach Vivek Anand et.al. 2604.01254 null
2026-04-01 Camouflage-aware Image-Text Retrieval via Expert Collaboration Yao Jiang et.al. 2604.01251 null
2026-04-01 AdaLoRA-QAT: Adaptive Low-Rank and Quantization-Aware Segmentation Prantik Deb et.al. 2604.01167 null
2026-04-01 Looking into a Pixel by Nonlinear Unmixing – A Generative Approach Maofeng Tang et.al. 2604.01141 null
2026-04-01 VRUD: A Drone Dataset for Complex Vehicle-VRU Interactions within Mixed Traffic Ziyu Wang et.al. 2604.01134 null
2026-04-01 Region-Adaptive Generative Compression with Spatially Varying Diffusion Models Lucas Relic et.al. 2604.01122 null
2026-04-01 ProOOD: Prototype-Guided Out-of-Distribution 3D Occupancy Prediction Yuheng Zhang et.al. 2604.01081 null
2026-04-01 IWP: Token Pruning as Implicit Weight Pruning in Large Vision Language Models Dong-Jae Lee et.al. 2604.00757 null
2026-03-31 Collaborative AI Agents and Critics for Fault Detection and Cause Analysis in Network Telemetry Syed Eqbal Alam et.al. 2604.00319 null
2026-03-31 Prompt-Guided Prefiltering for VLM Image Compression Bardia Azizian et.al. 2604.00314 null
2026-03-31 Feature-level Site Leakage Reduction for Cross-Hospital Chest X-ray Transfer via Self-Supervised Learning Ayoub Louaye Bouaziz et.al. 2604.00263 null
2026-03-31 Evaluation of neuroCombat and deep learning harmonization for multi-site magnetic resonance neuroimaging in youth with prenatal alcohol exposure Chloe Scholten et.al. 2604.00251 null
2026-03-31 Harmonization mitigates diffusion MRI scanner effects in infancy: insights from the HEALthy Brain and Childhood Development (HBCD) study Elyssa M. McMaster et.al. 2604.00246 null
2026-03-31 Pupil Design for Computational Wavefront Estimation Ali Almuallem et.al. 2604.00225 null
2026-03-31 Brain MR Image Synthesis with Multi-contrast Self-attention GAN Zaid A. Abod et.al. 2604.00070 null
2026-03-31 OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation Yuheng Liu et.al. 2603.30045 null
2026-03-31 Polyhedral Unmixing: Bridging Semantic Segmentation with Hyperspectral Unmixing via Polyhedral-Cone Partitioning Antoine Bottenmuller et.al. 2603.29438 null
2026-03-31 Rich-U-Net: A medical image segmentation model for fusing spatial depth features and capturing minute structural details Zhuoyi Fang et.al. 2603.29404 null
2026-03-31 Retinal Malady Classification using AI: A novel ViT-SVM combination architecture Shashwat Jha et.al. 2603.29181 null
2026-03-30 The Surprising Effectiveness of Noise Pretraining for Implicit Neural Representations Kushal Vyas et.al. 2603.29034 null
2026-03-30 End-to-end optimization of sparse ultrasound linear probes Sergio Urrea et.al. 2603.29014 null
2026-03-30 Hybrid Quantum-Classical AI for Industrial Defect Classification in Welding Images Akshaya Srinivasan et.al. 2603.28995 null
2026-03-30 Learning a dynamic four-chamber shape model of the human heart for 95,695 UK Biobank participants Qiang Ma et.al. 2603.28711 null
2026-03-30 MRI-to-CT synthesis using drifting models Qing Lyu et.al. 2603.28498 null
2026-03-30 Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms Muyang He et.al. 2603.28489 null
2026-03-30 Deep Learning Based Site-Specific Channel Inference Using Satellite Images Junzhe Song et.al. 2603.28083 null
2026-03-30 MolmoPoint: Better Pointing for VLMs with Grounding Tokens Christopher Clark et.al. 2603.28069 null
2026-03-30 Physics-Embedded Feature Learning for AI in Medical Imaging Pulock Das et.al. 2603.28057 null
2026-03-29 Towards Emotion Recognition with 3D Pointclouds Obtained from Facial Expression Images Laura Rayón Ropero et.al. 2603.27798 null
2026-03-28 Guided Lensless Polarization Imaging Noa Kraicer et.al. 2603.27357 null
2026-03-28 DeepBayesFlow: A Bayesian Structured Variational Framework for Generalizable Prostate Segmentation via Expressive Posteriors and SDE-Girsanov Uncertainty Modeling Zhuoyi Fang et.al. 2603.27263 null
2026-03-28 MD-RWKV-UNet: Scale-Aware Anatomical Encoding with Cross-Stage Fusion for Multi-Organ Segmentation Zhuoyi Fang et.al. 2603.27261 null
2026-03-28 Quantitative measurements of biological/chemical concentrations using smartphone cameras Zhendong Cao et.al. 2603.27118 null
2026-03-27 On-Device Super Resolution Imaging Using Low-Cost SPAD Array and Embedded Lightweight Deep Learning Zhenya Zang et.al. 2603.27018 null
2026-03-27 Make Geometry Matter for Spatial Reasoning Shihua Zhang et.al. 2603.26639 null
2026-03-27 Think over Trajectories: Leveraging Video Generation to Reconstruct GPS Trajectories from Cellular Signaling Ruixing Zhang et.al. 2603.26610 null
2026-03-27 From Static to Dynamic: Exploring Self-supervised Image-to-Video Representation Transfer Learning Yang Liu et.al. 2603.26597 null
2026-03-27 Generation Is Compression: Zero-Shot Video Coding via Stochastic Rectified Flow Ziyue Zeng et.al. 2603.26571 null
2026-03-26 TRACE: Object Motion Editing in Videos with First-Frame Trajectory Guidance Quynh Phung et.al. 2603.25707 null
2026-03-26 Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos Abdullah Hamdi et.al. 2603.25645 null
2026-03-26 A Mamba-based Perceptual Loss Function for Learning-based UGC Transcoding Zihao Qi et.al. 2603.25566 null
2026-03-26 Challenges in Hyperspectral Imaging for Autonomous Driving: The HSI-Drive Case Koldo Basterretxea et.al. 2603.25510 null
2026-03-26 Language-Free Generative Editing from One Visual Example Omar Elezabi et.al. 2603.25441 null
2026-03-26 PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders Niccolò Cavagnero et.al. 2603.25398 null
2026-03-26 Underdetermined Blind Source Separation via Weighted Simplex Shrinkage Regularization and Quantum Deep Image Prior Chia-Hsiang Lin et.al. 2603.25384 null
2026-03-26 Image Rotation Angle Estimation: Comparing Circular-Aware Methods Maximilian Woehrer et.al. 2603.25351 null
2026-03-26 Pixelis: Reasoning in Pixels, from Seeing to Acting Yunpeng Zhou et.al. 2603.25091 null
2026-03-26 MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models Dohwan Ko et.al. 2603.24984 null
2026-03-26 Subject-Specific Low-Field MRI Synthesis via a Neural Operator Ziqi Gao et.al. 2603.24968 null
2026-03-25 OpenCap Monocular: 3D Human Kinematics and Musculoskeletal Dynamics from a Single Smartphone Video Selim Gilon et.al. 2603.24733 null
2026-03-25 Vision-Language Models vs Human: Perceptual Image Quality Assessment Imran Mehmood et.al. 2603.24578 null
2026-03-25 Anti-I2V: Safeguarding your photos from malicious image-to-video generation Duc Vu et.al. 2603.24570 null
2026-03-25 OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning Kaihang Pan et.al. 2603.24458 null
2026-03-25 Modeling Spatiotemporal Neural Frames for High Resolution Brain Dynamic Wanying Qu et.al. 2603.24176 null
2026-03-25 Comparative analysis of dual-form networks for live land monitoring using multi-modal satellite image time series Iris Dumeur et.al. 2603.24109 null
2026-03-25 Blind Quality Enhancement for G-PCC Compressed Dynamic Point Clouds Tian Guo et.al. 2603.24026 null
2026-03-25 MonoSIM: An open source SIL framework for Ackermann Vehicular Systems with Monocular Vision Shantanu Rahman et.al. 2603.23965 null
2026-03-25 Leave No Stone Unturned: Uncovering Holistic Audio-Visual Intrinsic Coherence for Deepfake Detection Jielun Peng et.al. 2603.23960 null
2026-03-25 Attention-aware Inference Optimizations for Large Vision-Language Models with Memory-efficient Decoding Fatih Ilhan et.al. 2603.23914 null
2026-03-25 Joint Source-Channel-Check Coding with HARQ for Reliable Semantic Communications Boyuan Li et.al. 2603.23869 null
2026-03-24 Sentinel-2 for Crop Yield Estimation: A Systematic Review Mohammadreza Narimani et.al. 2603.23779 null
2026-03-24 Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation Brian Chao et.al. 2603.23491 null
2026-03-24 Harnessing Lightweight Transformer with Contextual Synergic Enhancement for Efficient 3D Medical Image Segmentation Xinyu Liu et.al. 2603.23390 null
2026-03-24 GO-Renderer: Generative Object Rendering with 3D-aware Controllable Video Diffusion Models Zekai Gu et.al. 2603.23246 null
2026-03-24 Rigid Motion Estimation using Accelerated Iterative Coordinate Descent (REACT) for MR Imaging Kwang Eun Jang et.al. 2603.23096 null
2026-03-24 WorldMesh: Generating Navigable Multi-Room 3D Scenes via Mesh-Conditioned Image Diffusion Manuel-Andreas Schneider et.al. 2603.22972 null
2026-03-24 Retrieval-Guided Photovoltaic Inventory Estimation from Satellite Imagery for Distribution Grid Planning Muhao Guo et.al. 2603.22856 null
2026-03-24 L-UNet: An LSTM Network for Remote Sensing Image Change Detection Shuting Sun et.al. 2603.22842 null
2026-03-24 Viewport-based Neural 360° Image Compression Jingwei Liao et.al. 2603.22776 null
2026-03-23 Drop-In Perceptual Optimization for 3D Gaussian Splatting Ezgi Ozyilkan et.al. 2603.23297 null
2026-03-23 Single-Subject Multi-View MRI Super-Resolution via Implicit Neural Representations Heejong Kim et.al. 2603.22627 null
2026-03-23 Far-field compressive ultrasound beamforming Nikunj Khetan et.al. 2603.22496 null
2026-03-23 P-Flow: Prompting Visual Effects Generation Rui Zhao et.al. 2603.22091 null
2026-03-23 A Latent Representation Learning Framework for Hyperspectral Image Emulation in Remote Sensing Chedly Ben Azizi et.al. 2603.21911 null
2026-03-23 HMS-VesselNet: Hierarchical Multi-Scale Attention Network with Topology-Preserving Loss for Retinal Vessel Segmentation Amarnath R et.al. 2603.21891 null
2026-03-23 The Universal Normal Embedding Chen Tasker et.al. 2603.21786 null
2026-03-23 Cycle Inverse-Consistent TransMorph: A Balanced Deep Learning Framework for Brain MRI Registration Jiaqi Shang et.al. 2603.21760 null
2026-03-23 Unregistered Spectral Image Fusion: Unmixing, Adversarial Learning, and Recoverability Jiahui Song et.al. 2603.21510 null
2026-03-22 OrbitStream: Training-Free Adaptive 360-degree Video Streaming via Semantic Potential Fields Aizierjiang Aiersilan et.al. 2603.20999 null
2026-03-21 Underwater imaging without color distortions requires RAW capture Derya Akkaynak et.al. 2603.20823 null
2026-03-21 mmWave-Diffusion: A Novel Framework for Respiration Sensing Using Observation-Anchored Conditional Diffusion Model Yong Wang et.al. 2603.20700 null
2026-03-21 Seed1.8 Model Card: Towards Generalized Real-World Agency Bytedance Seed et.al. 2603.20633 null
2026-03-20 Thermal is Always Wild: Characterizing and Addressing Challenges in Thermal-Only Novel View Synthesis M. Kerem Aydin et.al. 2603.20448 null
2026-03-20 CaroTo: A Tool for Fast Comprehensive Analysis of Carotid Artery Stenosis in 4D PC- and 3D BB-MRI Data Hinrich Rahlfs et.al. 2603.20355 null
2026-03-20 A Unified Platform and Quality Assurance Framework for 3D Ultrasound Reconstruction with Robotic, Optical, and Electromagnetic Tracking Lewis Howell et.al. 2603.20077 null
2026-03-20 Investigating a Policy-Based Formulation for Endoscopic Camera Pose Recovery Jan Emily Mangulabnan et.al. 2603.20045 null
2026-03-20 Goal-Oriented Framework for Optical Flow-based Multi-User Multi-Task Video Transmission Yujie Xu et.al. 2603.19995 null
2026-03-20 Evaluating Test-Time Adaptation For Facial Expression Recognition Under Natural Cross-Dataset Distribution Shifts John Turnbull et.al. 2603.19994 null
2026-03-20 ReconMIL: Synergizing Latent Space Reconstruction with Bi-Stream Mamba for Whole Slide Image Analysis Lubin Gan et.al. 2603.19925 null
2026-03-20 Offshore oil and gas platform dynamics in the North Sea, Gulf of Mexico, and Persian Gulf: Exploiting the Sentinel-1 archive Robin Spanier et.al. 2603.19801 null
2026-03-19 TuLaBM: Tumor-Biased Latent Bridge Matching for Contrast-Enhanced MRI Synthesis Atharva Rege et.al. 2603.19386 null
2026-03-19 Spectrally-Guided Diffusion Noise Schedules Carlos Esteves et.al. 2603.19222 null
2026-03-19 GenMFSR: Generative Multi-Frame Image Restoration and Super-Resolution Harshana Weligampola et.al. 2603.19187 null
2026-03-19 Student views in AI Ethics and Social Impact Tudor-Dan Mihoc et.al. 2603.18827 null
2026-03-19 A Hybrid Physical–Digital Framework for Annotated Fracture Reduction Data Evaluated using Clinically Relevant 3D metrics Basile Longo et.al. 2603.18723 null
2026-03-19 UEPS: Robust and Efficient MRI Reconstruction Xiang Zhou et.al. 2603.18572 null
2026-03-19 SCISSR: Scribble-Conditioned Interactive Surgical Segmentation and Refinement Haonan Ping et.al. 2603.18544 null
2026-03-19 TransText: Alpha-as-RGB Representation for Transparent Text Animation Fei Zhang et.al. 2603.17944 null
2026-03-18 Energy-Aware Frame Rate Selection for Video Coding Geetha Ramasubbu et.al. 2603.18305 null
2026-03-18 Understanding Task Aggregation for Generalizable Ultrasound Foundation Models Fangyijie Wang et.al. 2603.18123 null
2026-03-18 Dual Agreement Consistency Learning with Foundation Models for Semi-Supervised Fetal Heart Ultrasound Segmentation and Diagnosis Fangyijie Wang et.al. 2603.18119 null
2026-03-18 Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models Yuhao Dong et.al. 2603.18118 null
2026-03-18 The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering Yigit Ekin et.al. 2603.17998 null
2026-03-18 Video Understanding: From Geometry and Semantics to Unified Models Zhaochong An et.al. 2603.17840 null
2026-03-18 Cache-enabled Generative Joint Source-Channel Coding for Evolving Semantic Communications Shunpu Tang et.al. 2603.17702 null
2026-03-18 Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos Songtao Jiang et.al. 2603.17693 null
2026-03-18 Few-Step Diffusion Sampling Through Instance-Aware Discretizations Liangyu Yuan et.al. 2603.17671 null
2026-03-18 FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion Hugo Caselles-Dupré et.al. 2603.17555 null
2026-03-18 Deep Learning-Based Airway Segmentation in Systemic Lupus Erythematosus Patients with Interstitial Lung Disease (SLE-ILD): A Comparative High-Resolution CT Analysis Sirong Piao et.al. 2603.17547 null
2026-03-18 SHIFT: Motion Alignment in Video Diffusion Models with Adversarial Hybrid Fine-Tuning Xi Ye et.al. 2603.17426 null
2026-03-18 Structured SIR: Efficient and Expressive Importance-Weighted Inference for High-Dimensional Image Registration Ivor J. A. Simpson et.al. 2603.17415 null
2026-03-18 A 3D Reconstruction Benchmark for Asset Inspection James L. Gray et.al. 2603.17358 null
2026-03-17 A Lensless Polarization Camera Noa Kraicer et.al. 2603.17156 null
2026-03-17 Topology-Preserving Deep Joint Source-Channel Coding for Semantic Communication Omar Erak et.al. 2603.17126 null
2026-03-17 Surg $Σ$ : A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence Zhitao Zeng et.al. 2603.16822 null
2026-03-17 Preserving Vertical Structure in 3D-to-2D Projection for Permafrost Thaw Mapping Justin McMillen et.al. 2603.16788 null
2026-03-17 Search2Motion: Training-Free Object-Level Motion Control via Attention-Consensus Search Sainan Liu et.al. 2603.16711 null
2026-03-17 vAccSOL: Efficient and Transparent AI Vision Offloading for Mobile Robots Adam Zahir et.al. 2603.16685 null
2026-03-17 HistoAtlas: A Pan-Cancer Morphology Atlas Linking Histomics to Molecular Programs and Clinical Outcomes Pierre-Antoine Bannier et.al. 2603.16587 null
2026-03-17 Fanar 2.0: Arabic Generative AI Stack FANAR TEAM et.al. 2603.16397 null
2026-03-17 The Era of End-to-End Autonomy: Transitioning from Rule-Based Driving to Large Driving Models Eduardo Nebot et.al. 2603.16050 null
2026-03-17 Clinical Priors Guided Lung Disease Detection in 3D CT Scans Kejin Lu et.al. 2603.15143 null
2026-03-16 FlatLands: Generative Floormap Completion From a Single Egocentric View Subhransu S. Bhattacharjee et.al. 2603.16016 null
2026-03-16 Standardizing Medical Images at Scale for AI Callen MacPhee et.al. 2603.15980 null
2026-03-16 GLANCE: Gaze-Led Attention Network for Compressed Edge-inference Neeraj Solanki et.al. 2603.15717 null
2026-03-16 ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer Ruonan Yu et.al. 2603.15478 null
2026-03-16 Seeing Beyond: Extrapolative Domain Adaptive Panoramic Segmentation Yuanfan Zheng et.al. 2603.15475 null
2026-03-16 Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling Aram Davtyan et.al. 2603.15279 null
2026-03-16 CATFormer: When Continual Learning Meets Spiking Transformers With Dynamic Thresholds Vaishnavi Nagabhushana et.al. 2603.15184 null

(<a href=#updated-on-20260404>back to top</a>)

Video Editing

Publish Date Title Authors PDF Code
2026-04-02 VOID: Video Object and Interaction Deletion Saman Motamed et.al. 2604.02296 null
2026-03-31 CutClaw: Agentic Hours-Long Video Editing via Music Synchronization Shifang Zhao et.al. 2603.29664 null
2026-03-31 TrajectoryMover: Generative Movement of Object Trajectories in Videos Kiran Chhatre et.al. 2603.29092 null
2026-03-31 X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving Chaoda Zheng et.al. 2603.19979 null
2026-03-30 AutoCut: End-to-end advertisement video editing based on multimodal discretization and controllable generation Milton Zhou et.al. 2603.28366 null
2026-03-26 TRACE: Object Motion Editing in Videos with First-Frame Trajectory Guidance Quynh Phung et.al. 2603.25707 null
2026-03-25 AVControl: Efficient Framework for Training Audio-Visual Controls Matan Ben-Yosef et.al. 2603.24793 null
2026-03-25 Accelerating Diffusion-based Video Editing via Heterogeneous Caching: Beyond Full Computing at Sampled Denoising Timestep Tianyi Liu et.al. 2603.24260 null
2026-03-24 RealMaster: Lifting Rendered Scenes into Photorealistic Video Dana Cohen-Bar et.al. 2603.23462 null
2026-03-20 PerformRecast: Expression and Head Pose Disentanglement for Portrait Video Editing Jiadong Liang et.al. 2603.19731 null
2026-03-19 SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing Xinyao Zhang et.al. 2603.19228 null
2026-03-19 EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing Yang Fu et.al. 2603.19224 null
2026-03-18 Versatile Editing of Video Content, Actions, and Dynamics without Training Vladimir Kulikov et.al. 2603.17989 null
2026-03-18 ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation Dmitriy Rivkin et.al. 2603.17812 null
2026-03-18 SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model Guibin Chen et.al. 2602.21818 null
2026-03-17 SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation Jiongze Yu et.al. 2603.16864 null
2026-03-14 Script-to-Slide Grounding: Grounding Script Sentences to Slide Objects for Automatic Instructional Video Generation Rena Suzuki et.al. 2603.16931 null
2026-03-13 GA-Drive: Geometry-Appearance Decoupled Modeling for Free-viewpoint Driving Scene Generation Hao Zhang et.al. 2602.20673 null
2026-03-10 When to Lock Attention: Training-Free KV Control in Video Diffusion Tianyi Zeng et.al. 2603.09657 null
2026-03-10 From Ideal to Real: Stable Video Object Removal under Imperfect Conditions Jiagao Hu et.al. 2603.09283 null
2026-03-06 Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion Bohai Gu et.al. 2603.06140 null
2026-03-06 GenHOI: Towards Object-Consistent Hand-Object Interaction with Temporally Balanced and Spatially Selective Object Injection Xuan Huang et.al. 2603.06048 null
2026-03-06 Training-free Latent Inter-Frame Pruning with Attention Recovery Dennis Menn et.al. 2603.05811 null
2026-03-06 Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance Yiqi Lin et.al. 2603.02175 null
2026-03-06 UniVBench: Towards Unified Evaluation for Video Foundation Models Jianhui Wei et.al. 2602.21835 null
2026-03-03 NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing Tianlin Pan et.al. 2603.02802 null
2026-03-01 FREE-Edit: Using Editing-aware Injection in Rectified Flow Models for Zero-shot Image-Driven Video Editing Maomao Li et.al. 2603.01164 null
2026-02-25 StoryComposerAI: Supporting Human-AI Story Co-Creation Through Decomposition and Linking Shuo Niu et.al. 2602.21486 null
2026-02-24 PropFly: Learning to Propagate via On-the-Fly Supervision from Pre-trained Video Diffusion Models Wonyong Seo et.al. 2602.20583 null
2026-02-16 EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing Yehonathan Litman et.al. 2602.15031 null

(<a href=#updated-on-20260404>back to top</a>)

Diffusion Models

Publish Date Title Authors PDF Code
2026-04-02 ActionParty: Multi-Subject Action Binding in Generative Video Games Alexander Pondaven et.al. 2604.02330 null
2026-04-02 VOID: Video Object and Interaction Deletion Saman Motamed et.al. 2604.02296 null
2026-04-02 Smoothing the Landscape: Causal Structure Learning via Diffusion Denoising Objectives Hao Zhu et.al. 2604.02250 null
2026-04-02 Reflection Generation for Composite Image Using Diffusion Model Haonan Zhao et.al. 2604.02168 null
2026-04-02 Why Gaussian Diffusion Models Fail on Discrete Data? Alexander Shabalin et.al. 2604.02028 null
2026-04-02 Multiphase cross-diffusion models for tissue structures: modeling, analysis, numerics Ansgar Jüngel et.al. 2604.01827 null
2026-04-02 SafeRoPE: Risk-specific Head-wise Embedding Rotation for Safe Generation in Rectified Flow Transformers Xiang Yang et.al. 2604.01826 null
2026-04-02 Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion Edoardo A. Dominici et.al. 2604.01761 null
2026-04-02 SteerFlow: Steering Rectified Flows for Faithful Inversion-Based Image Editing Thinh Dao et.al. 2604.01715 null
2026-04-02 Bias mitigation in graph diffusion models Meng Yu et.al. 2604.01709 null
2026-04-02 Can Video Diffusion Models Predict Past Frames? Bidirectional Cycle Consistency for Reversible Interpolation Lingyu Liu et.al. 2604.01700 null
2026-04-02 From Understanding to Erasing: Towards Complete and Stable Video Object Removal Dingming Liu et.al. 2604.01693 null
2026-04-02 DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data Wonjoon Jin et.al. 2604.01666 null
2026-04-02 Diffusion-Guided Adversarial Perturbation Injection for Generalizable Defense Against Facial Manipulations Yue Li et.al. 2604.01635 null
2026-04-02 Cross-Domain Vessel Segmentation via Latent Similarity Mining and Iterative Co-Optimization Zhanqiang Guo et.al. 2604.01553 null
2026-04-01 Learning and Generating Mixed States Prepared by Shallow Channel Circuits Fangjun Hu et.al. 2604.01197 null
2026-04-01 ReinDriveGen: Reinforcement Post-Training for Out-of-Distribution Driving Scene Generation Hao Zhang et.al. 2604.01129 null
2026-04-01 Region-Adaptive Generative Compression with Spatially Varying Diffusion Models Lucas Relic et.al. 2604.01122 null
2026-04-01 Diff-VS: Efficient Audio-Aware Diffusion U-Net for Vocals Separation Yun-Ning et.al. 2604.01120 null
2026-04-01 Inverse Design of Optical Multilayer Thin Films using Robust Masked Diffusion Models Jonas Schaible et.al. 2604.01106 null
2026-04-01 PHASOR: Anatomy- and Phase-Consistent Volumetric Diffusion for CT Virtual Contrast Enhancement Zilong Li et.al. 2604.01053 null
2026-04-01 EmoScene: A Dual-space Dataset for Controllable Affective Image Generation Li He et.al. 2604.00933 null
2026-04-01 IDDM: Identity-Decoupled Personalized Diffusion Models with a Tunable Privacy-Utility Trade-off Linyan Dai et.al. 2604.00903 null
2026-04-01 HICT: High-precision 3D CBCT reconstruction from a single X-ray Wen Ma et.al. 2604.00792 null
2026-04-01 Learnability-Guided Diffusion for Dataset Distillation Jeffrey A. Chan-Santiago et.al. 2604.00519 null
2026-04-01 Tucker Diffusion Model for High-dimensional Tensor Generation Jianhua Guo et.al. 2604.00481 null
2026-04-01 Learning Humanoid Navigation from Human Data Weizhuo Wang et.al. 2604.00416 null
2026-04-01 Deep Networks Favor Simple Data Weyl Lu et.al. 2604.00394 null
2026-04-01 Behavioral Score Diffusion: Model-Free Trajectory Planning via Kernel-Based Score Estimation from Data Shihao Li et.al. 2604.00391 null
2026-04-01 mmAnomaly: Leveraging Visual Context for Robust Anomaly Detection in the Non-Visual World with mmWave Radar Tarik Reza Toha et.al. 2604.00382 null
2026-03-31 Video Models Reason Early: Exploiting Plan Commitment for Maze Solving Kaleb Newman et.al. 2603.30043 null
2026-03-31 Conditional Diffusion-Based Point Cloud Imaging for UAV Position and Attitude Sensing Xinhong Dai et.al. 2603.29822 null
2026-03-31 Emotion Diffusion Classifier with Adaptive Margin Discrepancy Training for Facial Expression Recognition Rongkang Dong et.al. 2603.29578 null
2026-03-31 Total Variation Guarantees for Sampling with Stochastic Localization Jakob Kellermann et.al. 2603.29555 null
2026-03-31 iPoster: Content-Aware Layout Generation for Interactive Poster Design via Graph-Enhanced Diffusion Models Xudong Zhou et.al. 2603.29469 null
2026-03-31 NeoNet: An End-to-End 3D MRI-Based Deep Learning Framework for Non-Invasive Prediction of Perineural Invasion via Generation-Driven Classification Youngung Han et.al. 2603.29449 null
2026-03-31 Ultra-short-term volatility surfaces Federico M. Bandi et.al. 2603.29430 null
2026-03-31 Multi-AUV Cooperative Target Tracking Based on Supervised Diffusion-Aided Multi-Agent Reinforcement Learning Jiaao Ma et.al. 2603.29426 null
2026-03-31 Pathogen diversity emerging from coevolutionary dynamics in interconnected systems Davide Zanchetta et.al. 2603.29398 null
2026-03-31 CIPHER: Counterfeit Image Pattern High-level Examination via Representation Kyeonghun Kim et.al. 2603.29356 null
2026-03-31 FOSCU: Feasibility of Synthetic MRI Generation via Duo-Diffusion Models for Enhancement of 3D U-Nets in Hepatic Segmentation Youngung Han et.al. 2603.29343 null
2026-03-31 Differentiable Normative Guidance for Nash Bargaining Solution Recovery Moirangthem Tiken Singh et.al. 2603.29297 null
2026-03-31 Diffusion Mental Averages Phonphrm Thawatdamrongkit et.al. 2603.29239 null
2026-03-30 Generating Humanless Environment Walkthroughs from Egocentric Walking Tour Videos Yujin Ham et.al. 2603.29036 null
2026-03-30 MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation Bharath Krishnamurthy et.al. 2603.29029 null
2026-03-30 Geometry-aware similarity metrics for neural representations on Riemannian and statistical manifolds N Alex Cayco Gajic et.al. 2603.28764 null
2026-03-30 PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models Lorenza Prospero et.al. 2603.28763 null
2026-03-30 On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers Omer Dahary et.al. 2603.28762 null
2026-03-30 DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing Kailai Feng et.al. 2603.28713 null
2026-03-30 Front Location for Go or Grow Models of Aerotaxis Mete Demircigil et.al. 2603.28663 null
2026-03-30 $R_{dm}$ : Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation Linqian Fan et.al. 2603.28460 null
2026-03-30 Deep Research of Deep Research: From Transformer to Agent, From AI to AI for Science Yipeng Yu et.al. 2603.28361 null
2026-03-30 Intrinsically ultralow thermal conductivity in all-inorganic superatomic bulk crystals Mingzhang Yang et.al. 2603.28267 null
2026-03-30 ColorFLUX: A Structure-Color Decoupling Framework for Old Photo Colorization Bingchen Li et.al. 2603.28162 null
2026-03-30 SVGS: Single-View to 3D Object Editing via Gaussian Splatting Pengcheng Xue et.al. 2603.28126 null
2026-03-30 Attention Frequency Modulation: Training-Free Spectral Modulation of Diffusion Cross-Attention Seunghun Oh et.al. 2603.28114 null
2026-03-30 Physics-Embedded Feature Learning for AI in Medical Imaging Pulock Das et.al. 2603.28057 null
2026-03-30 Self-Organizing Score-based Data Assimilation Yuma Yamaoka et.al. 2603.28048 null
2026-03-30 From Independent to Correlated Diffusion: Generalized Generative Modeling with Probabilistic Computers Nihal Sanjay Singh et.al. 2603.27996 null
2026-03-30 Beyond Dataset Distillation: Lossless Dataset Concentration via Diffusion-Assisted Distribution Alignment Tongfei Liu et.al. 2603.27987 null
2026-03-29 Diversity Matters: Dataset Diversification and Dual-Branch Network for Generalized AI-Generated Image Detection Nusrat Tasnim et.al. 2603.27800 null
2026-03-29 Heracles: Bridging Precise Tracking and Generative Synthesis for General Humanoid Control Zelin Tao et.al. 2603.27756 null
2026-03-29 Bridging Schrödinger and Bass: A Semimartingale Optimal Transport Problem with Diffusion Control Pierre Henry-Labordere et.al. 2603.27712 null
2026-03-29 Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers Yuhe Liu et.al. 2603.27666 null
2026-03-26 PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Xiaofeng Mao et.al. 2603.25730 null
2026-03-26 S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation Ligong Han et.al. 2603.25702 null
2026-03-26 Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning Jai Bardhan et.al. 2603.25685 null
2026-03-26 Beyond the Golden Data: Resolving the Motion-Vision Quality Dilemma via Timestep Selective Training Xiangyang Luo et.al. 2603.25527 null
2026-03-26 Lightweight GenAI for Network Traffic Synthesis: Fidelity, Augmentation, and Classification Giampaolo Bovenzi et.al. 2603.25507 null
2026-03-26 Temporally Decoupled Diffusion Planning for Autonomous Driving Xiang Li et.al. 2603.25462 null
2026-03-26 Language-Free Generative Editing from One Visual Example Omar Elezabi et.al. 2603.25441 null
2026-03-26 Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells Han Zhang et.al. 2603.25240 null
2026-03-26 Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction Jiahao Tian et.al. 2603.25209 null
2026-03-26 CardioDiT: Latent Diffusion Transformers for 4D Cardiac MRI Synthesis Marvin Seyfarth et.al. 2603.25194 null
2026-03-26 VolDiT: Controllable Volumetric Medical Image Synthesis with Diffusion Transformers Marvin Seyfarth et.al. 2603.25181 null
2026-03-26 Bilingual Text-to-Motion Generation: A New Benchmark and Baselines Wanjiang Weng et.al. 2603.25178 null
2026-03-26 A Reaction-Advection-Diffusion Model to describe Non-Uniformities in Colorimetric Sensing using Thin Porous Substrates Kulkarni Namratha et.al. 2603.25124 null
2026-03-26 Learning Explicit Continuous Motion Representation for Dynamic Gaussian Splatting from Monocular Videos Xuankai Zhang et.al. 2603.25058 null
2026-03-26 BiFM: Bidirectional Flow Matching for Few-Step Image Editing and Generation Yasong Dai et.al. 2603.24942 null
2026-03-25 Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method Arthur Jacot et.al. 2603.24594 null
2026-03-25 Anti-I2V: Safeguarding your photos from malicious image-to-video generation Duc Vu et.al. 2603.24570 null
2026-03-25 Reflected diffusion models adapt to low-dimensional data Asbjørn Holk et.al. 2603.24495 null
2026-03-25 Analysis and numerical simulation of a spatio-temporal Ricker-type model for the control of Aedes aegypti mosquitoes with Sterile Insect Techniques Oscar Eduardo Escobar-Lasso et.al. 2603.24460 null
2026-03-25 Teacher-Student Diffusion Model for Text-Driven 3D Hand Motion Generation Ching-Lam Cheng et.al. 2603.24407 null
2026-03-25 ViHOI: Human-Object Interaction Synthesis with Visual Priors Songjin Cai et.al. 2603.24383 null
2026-03-25 ScrollScape: Unlocking 32K Image Generation With Video Diffusion Priors Haodong Yu et.al. 2603.24270 null
2026-03-25 LGTM: Training-Free Light-Guided Text-to-Image Diffusion Model via Initial Noise Manipulation Ryugo Morita et.al. 2603.24086 null
2026-03-25 When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm Ye Leng et.al. 2603.24079 null
2026-03-25 HAM: A Training-Free Style Transfer Approach via Heterogeneous Attention Modulation for Diffusion Models Yeqi He et.al. 2603.24043 null
2026-03-25 Lagrangian Relaxation Score-based Generation for Mixed Integer linear Programming Ruobing Wang et.al. 2603.24033 null
2026-03-25 DepthArb: Training-Free Depth-Arbitrated Generation for Occlusion-Robust Image Synthesis Hongjin Niu et.al. 2603.23924 null
2026-03-25 Latent Bias Alignment for High-Fidelity Diffusion Inversion in Real-World Image Reconstruction and Manipulation Weiming Chen et.al. 2603.23903 null
2026-03-25 A simple model for conserved intracellular dynamics exhibits multiscale pattern formation, traveling protein domains and arrested coarsening of lipids in the membrane Benjamin Winkler et.al. 2603.23856 null
2026-03-25 3D-LLDM: Label-Guided 3D Latent Diffusion Model for Improving High-Resolution Synthetic MR Imaging in Hepatic Structure Segmentation Kyeonghun Kim et.al. 2603.23845 null
2026-03-24 DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models Jaewon Min et.al. 2603.23499 null
2026-03-24 Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation Brian Chao et.al. 2603.23491 null
2026-03-24 RealMaster: Lifting Rendered Scenes into Photorealistic Video Dana Cohen-Bar et.al. 2603.23462 null
2026-03-24 Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation Michal Balcerak et.al. 2603.23398 null
2026-03-24 Markov State–Space Modeling and Channel Characterization for DNA-Based Molecular Communication Ruifeng Zheng et.al. 2603.23394 null
2026-03-24 FG-Portrait: 3D Flow Guided Editable Portrait Animation Yating Xu et.al. 2603.23381 null
2026-03-24 ViBe: Ultra-High-Resolution Video Synthesis Born from Pure Images Yunfeng Wu et.al. 2603.23326 null
2026-03-24 Permutation-Symmetrized Diffusion for Unconditional Molecular Generation Gyeonghoon Ko et.al. 2603.23255 null
2026-03-24 GO-Renderer: Generative Object Rendering with 3D-aware Controllable Video Diffusion Models Zekai Gu et.al. 2603.23246 null
2026-03-24 AeroScene: Progressive Scene Synthesis for Aerial Robotics Nghia Vu et.al. 2603.23224 null
2026-03-24 Gimbal360: Differentiable Auto-Leveling for Canonicalized $360^\circ$ Panoramic Image Completion Yuqin Lu et.al. 2603.23179 null
2026-03-24 Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards Orhun Buğra Baran et.al. 2603.23086 null
2026-03-24 Zero-Shot Personalization of Objects via Textual Inversion Aniket Roy et.al. 2603.23010 null
2026-03-24 Markov-Enforced Discrete Diffusion Model for Digital Semantic Symbol Error Correction Yoon Huh et.al. 2603.22983 null
2026-03-24 Asymptotic Learning Curves for Diffusion Models with Random Features Score and Manifold Data Anand Jerry George et.al. 2603.22962 null
2026-03-23 End-to-End Training for Unified Tokenization and Latent Denoising Shivam Duggal et.al. 2603.22283 null
2026-03-23 Repurposing Geometric Foundation Models for Multi-view Diffusion Wooseok Jang et.al. 2603.22275 null
2026-03-23 DUO-VSR: Dual-Stream Distillation for One-Step Video Super-Resolution Zhengyao Lv et.al. 2603.22271 null
2026-03-23 SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation Sashuai Zhou et.al. 2603.22228 null
2026-03-23 DA-VAE: Plug-in Latent Compression for Diffusion via Detail Alignment Xin Cai et.al. 2603.22125 null
2026-03-23 DTVI: Dual-Stage Textual and Visual Intervention for Safe Text-to-Image Generation Binhong Tan et.al. 2603.22041 null
2026-03-23 APEG: Adaptive Physical Layer Authentication with Channel Extrapolation and Generative AI Xiqi Cheng et.al. 2603.21923 null
2026-03-23 CLEAR: Context-Aware Learning with End-to-End Mask-Free Inference for Adaptive Video Subtitle Removal Qingdong He et.al. 2603.21901 null
2026-03-23 ADaFuSE: Adaptive Diffusion-generated Image and Text Fusion for Interactive Text-to-Image Retrieval Zhuocheng Zhang et.al. 2603.21886 null
2026-03-23 Not All Layers Are Created Equal: Adaptive LoRA Ranks for Personalized Image Generation Donald Shenaj et.al. 2603.21884 null
2026-03-23 Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation Yuyang You et.al. 2603.21864 null
2026-03-23 Climate Prompting: Generating the Madden-Julian Oscillation using Video Diffusion and Low-Dimensional Conditioning Sulian Thual et.al. 2603.21856 null
2026-03-23 A hybrid wavelet-based physics-informed neural network for portfolio management Bahadur Yadav et.al. 2603.21834 null
2026-03-23 Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction Kuangzhe Xu et.al. 2603.21735 null
2026-03-23 Unimodular Diffusion and Interacting Vacuum Cosmology Gopal Kashyap et.al. 2603.21675 null
2026-03-23 DiT-Flow: Speech Enhancement Robust to Multiple Distortions based on Flow Matching in Latent Space and Diffusion Transformers Tianyu Cao et.al. 2603.21608 null
2026-03-23 PROBE: Diagnosing Residual Concept Capacity in Erased Text-to-Video Diffusion Models Yiwei Xie et.al. 2603.21547 null
2026-03-23 Empirical Evaluation of Link Deletion Methods for Limiting Information Diffusion on Social Media Shiori Furukawa et.al. 2603.21470 null
2026-03-22 Is the future of AI green? What can innovation diffusion models say about generative AI’s environmental impact? Robert Viseur et.al. 2603.21419 null
2026-03-22 An InSAR Phase Unwrapping Framework for Large-scale and Complex Events Yijia Song et.al. 2603.21378 null
2026-03-22 Efficient Coarse-to-Fine Diffusion Models with Time Step Sequence Redistribution Yu-Shan Tai et.al. 2603.21348 null
2026-03-20 LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation Jiazheng Xing et.al. 2603.20192 null
2026-03-20 Wildfire Spread Scenarios: Increasing Sample Diversity of Segmentation Diffusion Models with Training-Free Methods Sebastian Gerard et.al. 2603.20188 null
2026-03-20 Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD Emiel Hoogeboom et.al. 2603.20155 null
2026-03-20 How Out-of-Equilibrium Phase Transitions can Seed Pattern Formation in Trained Diffusion Models Luca Ambrogioni et.al. 2603.20092 null
2026-03-20 Timestep-Aware Block Masking for Efficient Diffusion Model Inference Haodong He et.al. 2603.19939 null
2026-03-20 A distribution-free lattice Boltzmann method for compartmental reaction-diffusion systems with application to epidemic modelling Alessandro De Rosis et.al. 2603.19789 null
2026-03-20 Diminishing Returns in Expanding Generative Models and Godel-Tarski-Lob Limits Angshul Majumdar et.al. 2603.19687 null
2026-03-20 ATHENA: Adaptive Test-Time Steering for Improving Count Fidelity in Diffusion Models Mohammad Shahab Sepehri et.al. 2603.19676 null
2026-03-20 Making Video Models Adhere to User Intent with Minor Adjustments Daniel Ajisafe et.al. 2603.19672 null
2026-03-20 OmniDiT: Extending Diffusion Transformer to Omni-VTON Framework Weixuan Zeng et.al. 2603.19643 null
2026-03-20 On the role of memorization in learned priors for geophysical inverse problems Ali Siahkoohi et.al. 2603.19629 null
2026-03-20 MagicSeg: Open-World Segmentation Pretraining via Counterfactural Diffusion-Based Auto-Generation Kaixin Cai et.al. 2603.19575 null
2026-03-20 Accelerating Diffusion Decoders via Multi-Scale Sampling and One-Step Distillation Chuhan Wang et.al. 2603.19570 null
2026-03-19 TRACE: Trajectory Recovery with State Propagation Diffusion for Urban Mobility Jinming Wang et.al. 2603.19474 null
2026-03-19 TuLaBM: Tumor-Biased Latent Bridge Matching for Contrast-Enhanced MRI Synthesis Atharva Rege et.al. 2603.19386 null
2026-03-19 Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding Xianjin Wu et.al. 2603.19235 null
2026-03-19 Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer Chenyang Gu et.al. 2603.19227 null
2026-03-19 Spectrally-Guided Diffusion Noise Schedules Carlos Esteves et.al. 2603.19222 null
2026-03-19 Rethinking Vector Field Learning for Generative Segmentation Chaoyang Wang et.al. 2603.19218 null
2026-03-19 RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing Yue Gong et.al. 2603.19206 null
2026-03-19 MIDST Challenge at SaTML 2025: Membership Inference over Diffusion-models-based Synthetic Tabular data Masoumeh Shafieinejad et.al. 2603.19185 null
2026-03-19 ADAPT: Attention Driven Adaptive Prompt Scheduling and InTerpolating Orthogonal Complements for Rare Concepts Generation Kwanyoung Lee et.al. 2603.19157 null
2026-03-19 D5P4: Partition Determinantal Point Process for Diversity in Parallel Discrete Diffusion Decoding Jonathan Lys et.al. 2603.19146 null
2026-03-19 Revisiting Autoregressive Models for Generative Image Classification Ilia Sudakov et.al. 2603.19122 null
2026-03-19 FUMO: Prior-Modulated Diffusion for Single Image Reflection Removal Telang Xu et.al. 2603.19036 null
2026-03-19 Foundations of Schrödinger Bridges for Generative Modeling Sophia Tang et.al. 2603.18992 null
2026-03-19 CRAFT: Aligning Diffusion Models with Fine-Tuning Is Easier Than You Think Zening Sun et.al. 2603.18991 null
2026-03-19 Neural Galerkin Normalizing Flow for Transition Probability Density Functions of Diffusion Models Riccardo Saporiti et.al. 2603.18907 null
2026-03-19 Translating MRI to PET through Conditional Diffusion Models with Enhanced Pathology Awareness Yitong Li et.al. 2603.18896 null
2026-03-19 RadioDiff-FS: Physics-Informed Manifold Alignment in Few-Shot Diffusion Models for High-Fidelity Radio Map Construction Xiucheng Wang et.al. 2603.18865 null
2026-03-18 AHOY! Animatable Humans under Occlusion from YouTube Videos with Gaussian Splatting and Video Diffusion Priors Aymen Mir et.al. 2603.17975 null
2026-03-18 LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition Vlad-Constantin Lungu-Stan et.al. 2603.17965 null
2026-03-18 Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control Zunzhe Zhang et.al. 2603.17834 null
2026-03-18 TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models Qianlong Xiang et.al. 2603.17828 null
2026-03-18 ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation Dmitriy Rivkin et.al. 2603.17812 null
2026-03-18 CrowdGaussian: Reconstructing High-Fidelity 3D Gaussians for Human Crowd from a Single Image Yizheng Song et.al. 2603.17779 null
2026-03-18 Towards Infinitely Long Neural Simulations: Self-Refining Neural Surrogate Models for Dynamical Systems Qi Liu et.al. 2603.17750 null
2026-03-18 TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos Yan Zeng et.al. 2603.17735 null
2026-03-18 Adaptive Guidance for Retrieval-Augmented Masked Diffusion Models Jaemin Kim et.al. 2603.17677 null
2026-03-18 Proof-of-Authorship for Diffusion-based AI Generated Content De Zhang Lee et.al. 2603.17513 null
2026-03-18 A Tutorial on Learning-Based Radio Map Construction: Data, Paradigms, and Physics-Awareness Xiucheng Wang et.al. 2603.17499 null
2026-03-18 SHIFT: Motion Alignment in Video Diffusion Models with Adversarial Hybrid Fine-Tuning Xi Ye et.al. 2603.17426 null
2026-03-18 Joint Degradation-Aware Arbitrary-Scale Super-Resolution for Variable-Rate Extreme Image Compression Xinning Chai et.al. 2603.17408 null
2026-03-18 Motion-Adaptive Temporal Attention for Lightweight Video Generation with Stable Diffusion Rui Hong et.al. 2603.17398 null
2026-03-18 Toward Phonology-Guided Sign Language Motion Generation: A Diffusion Baseline and Conditioning Analysis Rui Hong et.al. 2603.17388 null
2026-03-17 V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising Han Lin et.al. 2603.16792 null
2026-03-17 Semi-supervised Latent Disentangled Diffusion Model for Textile Pattern Generation Chenggong Hu et.al. 2603.16747 null
2026-03-17 World Reconstruction From Inconsistent Views Lukas Höllein et.al. 2603.16736 null
2026-03-17 Self-Aware Markov Models for Discrete Reasoning Gregor Kornhardt et.al. 2603.16661 null
2026-03-17 Face2Scene: Using Facial Degradation as an Oracle for Diffusion-Based Scene Restoration Amirhossein Kazerouni et.al. 2603.16570 null
2026-03-17 Robust Physics-Guided Diffusion for Full-Waveform Inversion Jishen Peng et.al. 2603.16393 null
2026-03-17 Encoding Predictability and Legibility for Style-Conditioned Diffusion Policy Adrien Jacquet Crétides et.al. 2603.16368 null
2026-03-17 $D^3$-RSMDE: 40$\times$ Faster and High-Fidelity Remote Sensing Monocular Depth Estimation Ruizhi Wang et.al. 2603.16362 null
2026-03-17 Iris: Bringing Real-World Priors into Diffusion Model for Monocular Depth Estimation Xinhao Cai et.al. 2603.16340 null
2026-03-17 Probabilistic reconstruction of global sea surface temperature using generative diffusion models Haijie Li et.al. 2603.16272 null
2026-03-17 VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment Tengjiao Yin et.al. 2603.16271 null
2026-03-17 Leveling3D: Leveling Up 3D Reconstruction with Feed-Forward 3D Gaussian Splatting and Geometry-Aware Generation Yiming Huang et.al. 2603.16211 null
2026-03-17 Physics-guided diffusion models for inverse design of disordered metamaterials Ziyuan Xie et.al. 2603.16209 null
2026-03-17 S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight Haodong Yan et.al. 2603.16195 null
2026-03-17 When Generative Augmentation Hurts: A Benchmark Study of GAN and Diffusion Models for Bias Correction in AI Classification Systems Shesh Narayan Gupta et.al. 2603.16134 null

(<a href=#updated-on-20260404>back to top</a>)

Real-time Generation

Publish Date Title Authors PDF Code
2026-04-02 AromaGen: Interactive Generation of Rich Olfactory Experiences with Multimodal Language Models Yunge Wen et.al. 2604.01650 null
2026-03-31 OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation Yuheng Liu et.al. 2603.30045 null
2026-03-31 From Skeletons to Semantics: Design and Deployment of a Hybrid Edge-Based Action Detection System for Public Safety Ganen Sethupathy et.al. 2603.29777 null
2026-03-31 $R_\text{dm}$: Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation Linqian Fan et.al. 2603.28460 null
2026-03-28 Fair Benchmarking of Emerging One-Step Generative Models Against Multistep Diffusion and Flow Models Advaith Ravishankar et.al. 2603.14186 null
2026-03-27 LagerNVS: Latent Geometry for Fully Neural Real-time Novel View Synthesis Stanislaw Szymanowicz et.al. 2603.20176 null
2026-03-23 DUO-VSR: Dual-Stream Distillation for One-Step Video Super-Resolution Zhengyao Lv et.al. 2603.22271 null
2026-03-23 Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation Yuyang You et.al. 2603.21864 null
2026-03-22 Implicit Maximum Likelihood Estimation for Real-time Generative Model Predictive Control Grayson Lee et.al. 2603.13733 null
2026-03-21 Smart Operation Theatre: An AI-based System for Surgical Gauze Counting Saraf Krish et.al. 2603.20752 null
2026-03-19 cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization Yuyang Liu et.al. 2603.19163 null
2026-03-19 Training-Free Sparse Attention for Fast Video Generation via Offline Layer-Wise Sparsity Profiling and Online Bidirectional Co-Clustering Jiayi Luo et.al. 2603.18636 null
2026-03-18 Fast Beam-Brainstorm: Few-Step Generative Site-Specific Beamforming with Flexible Probing Zihao Zhou et.al. 2603.17622 null
2026-03-18 Motion-Adaptive Temporal Attention for Lightweight Video Generation with Stable Diffusion Rui Hong et.al. 2603.17398 null
2026-03-17 Unlearning for One-Step Generative Models via Unbalanced Optimal Transport Hyundo Choi et.al. 2603.16489 null
2026-03-16 GDPO-SR: Group Direct Preference Optimization for One-Step Generative Image Super-Resolution Qiaosi Yi et.al. 2603.16769 null
2026-03-16 Preconditioned One-Step Generative Modeling for Bayesian Inverse Problems in Function Spaces Zilan Cheng et.al. 2603.14798 null
2026-03-15 GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies He Zhang et.al. 2603.14245 null
2026-03-12 Sinkhorn-Drifting Generative Models Ping He et.al. 2603.12366 null
2026-03-12 FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance Quanhao Li et.al. 2603.12146 null
2026-03-12 InSpatio-WorldFM: An Open-Source Real-Time Generative Frame Model InSpatio Team et.al. 2603.11911 null
2026-03-11 Auroral Acceleration Generates Electron Beams in Jupiter’s Middle Magnetosphere June Piasecki et.al. 2603.10760 null
2026-03-11 Riemannian MeanFlow for One-Step Generation on Manifolds Zichen Zhong et.al. 2603.10718 null
2026-03-11 AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow Duojia Li et.al. 2603.10701 null
2026-03-10 FrameDiT: Diffusion Transformer with Frame-Level Matrix Attention for Efficient Video Generation Minh Khoa Le et.al. 2603.09721 null
2026-03-09 WaDi: Weight Direction-aware Distillation for One-step Image Synthesis Lei Wang et.al. 2603.08258 null
2026-03-08 TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward Yihong Luo et.al. 2603.07700 null

(<a href=#updated-on-20260404>back to top</a>)

DiT Acceleration

Publish Date Title Authors PDF Code
2026-03-27 FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation Dong Liu et.al. 2505.20353 null
2026-03-13 AccelAes: Accelerating Diffusion Transformers for Training-Free Aesthetic-Enhanced Image Generation Xuanhua Yin et.al. 2603.12575 null
2026-03-05 Frequency-Aware Error-Bounded Caching for Accelerating Diffusion Transformers Guandong Li et.al. 2603.05315 null
2026-02-28 Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization Tong Shao et.al. 2512.23258 null
2026-02-28 BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching Hanshuai Cui et.al. 2509.13789 null
2026-02-13 ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration Fanpu Cao et.al. 2512.17298 null
2026-02-11 SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices Dongting Hu et.al. 2601.08303 null
2026-01-28 StreamFusion: Scalable Sequence Parallelism for Distributed Inference of Diffusion Transformers on GPUs Jiacheng Yang et.al. 2601.20273 null
2026-01-15 TetriServe: Efficient DiT Serving for Heterogeneous Image Generation Runyu Lu et.al. 2510.01565 null
2026-01-09 Sprint: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers Dogyun Park et.al. 2510.21986 null
2025-12-30 Bidirectional Sparse Attention for Faster Video Diffusion Training Chenlu Zhan et.al. 2509.01085 null
2025-12-16 OUSAC: Optimized Guidance Scheduling with Adaptive Caching for DiT Acceleration Ruitong Sun et.al. 2512.14096 null
2025-09-23 Optimizing Inference in Transformer-Based Models: A Multi-Method Benchmark Siu Hang Ho et.al. 2509.17894 null
2025-08-26 Direction Informed Trees (DIT*): Optimal Path Planning via Direction Filter and Direction Cost Heuristic Liding Zhang et.al. 2508.19168 null
2025-05-16 Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration Haipeng Fang et.al. 2505.11707 null

(<a href=#updated-on-20260404>back to top</a>)

Notes:

Function added: