Current Search Keywords: Video Generation, Text-to-Video, Image-to-Video, Video Editing, Diffusion Models, Real-time Generation, Video Diffusion, Video Synthesis, Latent Diffusion, Video Generation Acceleration
If you have any other keywords, please feel free to let us know :)
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-04-02 | ActionParty: Multi-Subject Action Binding in Generative Video Games | Alexander Pondaven et.al. | 2604.02330 | null |
| 2026-04-02 | Generative World Renderer | Zheng-Hui Huang et.al. | 2604.02329 | null |
| 2026-04-02 | VOID: Video Object and Interaction Deletion | Saman Motamed et.al. | 2604.02296 | null |
| 2026-04-02 | Resonance4D: Frequency-Domain Motion Supervision for Preset-Free Physical Parameter Learning in 4D Dynamic Physical Scene Simulation | Changshe Zhang et.al. | 2604.01994 | null |
| 2026-04-02 | DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning | Yang Zhou et.al. | 2604.01765 | null |
| 2026-04-02 | Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion | Edoardo A. Dominici et.al. | 2604.01761 | null |
| 2026-04-02 | Can Video Diffusion Models Predict Past Frames? Bidirectional Cycle Consistency for Reversible Interpolation | Lingyu Liu et.al. | 2604.01700 | null |
| 2026-04-02 | From Understanding to Erasing: Towards Complete and Stable Video Object Removal | Dingming Liu et.al. | 2604.01693 | null |
| 2026-04-02 | DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data | Wonjoon Jin et.al. | 2604.01666 | null |
| 2026-04-02 | Moiré Video Authentication: A Physical Signature Against AI Video Generation | Yuan Qing et.al. | 2604.01654 | null |
| 2026-04-02 | ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor | Yixiao Wang et.al. | 2604.01552 | null |
| 2026-04-01 | Reinforcing Consistency in Video MLLMs with Structured Rewards | Yihao Quan et.al. | 2604.01460 | null |
| 2026-04-01 | GRAZE: Grounded Refinement and Motion-Aware Zero-Shot Event Localization | Syed Ahsan Masud Zaidi et.al. | 2604.01383 | null |
| 2026-04-01 | TRACE: High-Fidelity 3D Scene Editing via Tangible Reconstruction and Geometry-Aligned Contextual Video Masking | Jiyuan Hu et.al. | 2604.01207 | null |
| 2026-04-01 | ReinDriveGen: Reinforcement Post-Training for Out-of-Distribution Driving Scene Generation | Hao Zhang et.al. | 2604.01129 | null |
| 2026-04-01 | PHASOR: Anatomy- and Phase-Consistent Volumetric Diffusion for CT Virtual Contrast Enhancement | Zilong Li et.al. | 2604.01053 | null |
| 2026-04-01 | ONE-SHOT: Compositional Human-Environment Video Synthesis via Spatial-Decoupled Motion Injection and Hybrid Context Integration | Fengyuan Yang et.al. | 2604.01043 | null |
| 2026-04-01 | MotionGrounder: Grounded Multi-Object Motion Transfer via Diffusion Transformer | Samuel Teodoro et.al. | 2604.00853 | null |
| 2026-04-01 | HICT: High-precision 3D CBCT reconstruction from a single X-ray | Wen Ma et.al. | 2604.00792 | null |
| 2026-04-01 | CL-VISTA: Benchmarking Continual Learning in Video Large Language Models | Haiyang Guo et.al. | 2604.00677 | null |
| 2026-03-31 | Collaborative AI Agents and Critics for Fault Detection and Cause Analysis in Network Telemetry | Syed Eqbal Alam et.al. | 2604.00319 | null |
| 2026-03-31 | OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation | Yuheng Liu et.al. | 2603.30045 | null |
| 2026-03-31 | Video Models Reason Early: Exploiting Plan Commitment for Maze Solving | Kaleb Newman et.al. | 2603.30043 | null |
| 2026-03-31 | Gloria: Consistent Character Video Generation via Content Anchors | Yuhang Yang et.al. | 2603.29931 | null |
| 2026-03-31 | SLVMEval: Synthetic Meta Evaluation Benchmark for Text-to-Long Video Generation | Ryosuke Matsuda et.al. | 2603.29186 | null |
| 2026-03-31 | TrajectoryMover: Generative Movement of Object Trajectories in Videos | Kiran Chhatre et.al. | 2603.29092 | null |
| 2026-03-30 | Generating Humanless Environment Walkthroughs from Egocentric Walking Tour Videos | Yujin Ham et.al. | 2603.29036 | null |
| 2026-03-30 | Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas | Felix Wimbauer et.al. | 2603.28980 | null |
| 2026-03-30 | Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms | Muyang He et.al. | 2603.28489 | null |
| 2026-03-30 | VistaGEN: Consistent Driving Video Generation with Fine-Grained Control Using Multiview Visual-Language Reasoning | Li-Heng Chen et.al. | 2603.28353 | null |
| 2026-03-30 | LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization | Chutian Meng et.al. | 2603.28082 | null |
| 2026-03-30 | FlashSign: Pose-Free Guidance for Efficient Sign Language Video Generation | Liuzhou Zhang et.al. | 2603.27915 | null |
| 2026-03-29 | Wan-R1: Verifiable-Reinforcement Learning for Video Reasoning | Ming Liu et.al. | 2603.27866 | null |
| 2026-03-29 | TokenDial: Continuous Attribute Control in Text-to-Video via Spatiotemporal Token Offsets | Zhixuan Liu et.al. | 2603.27520 | null |
| 2026-03-29 | KV Cache Quantization for Self-Forcing Video Generation: A 33-Method Empirical Study | Suraj Ranganath et.al. | 2603.27469 | null |
| 2026-03-28 | LOME: Learning Human-Object Manipulation with Action-Conditioned Egocentric World Model | Quankai Gao et.al. | 2603.27449 | null |
| 2026-03-28 | TrackMAE: Video Representation Learning via Track Mask and Predict | Renaud Vandeghen et.al. | 2603.27268 | null |
| 2026-03-28 | LightMover: Generative Light Movement with Color and Intensity Controls | Gengze Zhou et.al. | 2603.27209 | null |
| 2026-03-28 | EFlow: Fast Few-Step Video Generator Training from Scratch via Efficient Solution Flow | Dogyun Park et.al. | 2603.27086 | null |
| 2026-03-28 | LightCtrl: Training-free Controllable Video Relighting | Yizuo Peng et.al. | 2603.27083 | null |
| 2026-03-27 | Think over Trajectories: Leveraging Video Generation to Reconstruct GPS Trajectories from Cellular Signaling | Ruixing Zhang et.al. | 2603.26610 | null |
| 2026-03-27 | VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward | Zhaochong An et.al. | 2603.26599 | null |
| 2026-03-27 | Generation Is Compression: Zero-Shot Video Coding via Stochastic Rectified Flow | Ziyue Zeng et.al. | 2603.26571 | null |
| 2026-03-26 | ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling | Yawen Luo et.al. | 2603.25746 | null |
| 2026-03-26 | RefAlign: Representation Alignment for Reference-to-Video Generation | Lei Wang et.al. | 2603.25743 | null |
| 2026-03-26 | PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference | Xiaofeng Mao et.al. | 2603.25730 | null |
| 2026-03-26 | Beyond the Golden Data: Resolving the Motion-Vision Quality Dilemma via Timestep Selective Training | Xiangyang Luo et.al. | 2603.25527 | null |
| 2026-03-26 | EagleNet: Energy-Aware Fine-Grained Relationship Learning Network for Text-Video Retrieval | Yuhan Chen et.al. | 2603.25267 | null |
| 2026-03-26 | Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction | Jiahao Tian et.al. | 2603.25209 | null |
| 2026-03-26 | AnyID: Ultra-Fidelity Universal Identity-Preserving Video Generation from Any Visual References | Jiahao Wang et.al. | 2603.25188 | null |
| 2026-03-26 | GaussFusion: Improving 3D Reconstruction in the Wild with A Geometry-Informed Video Generator | Liyuan Zhu et.al. | 2603.25053 | null |
| 2026-03-26 | ScrollScape: Unlocking 32K Image Generation With Video Diffusion Priors | Haodong Yu et.al. | 2603.24270 | null |
| 2026-03-25 | DCARL: A Divide-and-Conquer Framework for Autoregressive Long-Trajectory Video Generation | Junyi Ouyang et.al. | 2603.24835 | null |
| 2026-03-25 | DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving | Pengxuan Yang et.al. | 2603.24587 | null |
| 2026-03-25 | Anti-I2V: Safeguarding your photos from malicious image-to-video generation | Duc Vu et.al. | 2603.24570 | null |
| 2026-03-25 | Toward Physically Consistent Driving Video World Models under Challenging Trajectories | Jiawei Zhou et.al. | 2603.24506 | null |
| 2026-03-25 | OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning | Kaihang Pan et.al. | 2603.24458 | null |
| 2026-03-25 | Accelerating Diffusion-based Video Editing via Heterogeneous Caching: Beyond Full Computing at Sampled Denoising Timestep | Tianyi Liu et.al. | 2603.24260 | null |
| 2026-03-25 | Leave No Stone Unturned: Uncovering Holistic Audio-Visual Intrinsic Coherence for Deepfake Detection | Jielun Peng et.al. | 2603.23960 | null |
| 2026-03-25 | Knowledge-Refined Dual Context-Aware Network for Partially Relevant Video Retrieval | Junkai Yang et.al. | 2603.23902 | null |
| 2026-03-24 | WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG | Zhen Li et.al. | 2603.23497 | null |
| 2026-03-24 | Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation | Brian Chao et.al. | 2603.23491 | null |
| 2026-03-24 | TETO: Tracking Events with Teacher Observation for Motion Estimation and Frame Interpolation | Jini Yang et.al. | 2603.23487 | null |
| 2026-03-24 | RealMaster: Lifting Rendered Scenes into Photorealistic Video | Dana Cohen-Bar et.al. | 2603.23462 | null |
| 2026-03-24 | I3DM: Implicit 3D-aware Memory Retrieval and Injection for Consistent Video Scene Generation | Jia Li et.al. | 2603.23413 | null |
| 2026-03-24 | ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment | Yuzhi Chen et.al. | 2603.23376 | null |
| 2026-03-24 | ViBe: Ultra-High-Resolution Video Synthesis Born from Pure Images | Yunfeng Wu et.al. | 2603.23326 | null |
| 2026-03-24 | GO-Renderer: Generative Object Rendering with 3D-aware Controllable Video Diffusion Models | Zekai Gu et.al. | 2603.23246 | null |
| 2026-03-24 | InterDyad: Interactive Dyadic Speech-to-Video Generation by Querying Intermediate Visual Guidance | Dongwei Pan et.al. | 2603.23132 | null |
| 2026-03-24 | WorldMesh: Generating Navigable Multi-Room 3D Scenes via Mesh-Conditioned Image Diffusion | Manuel-Andreas Schneider et.al. | 2603.22972 | null |
| 2026-03-24 | Cluster-Wise Spatio-Temporal Masking for Efficient Video-Language Pretraining | Weijun Zhuang et.al. | 2603.22953 | null |
| 2026-03-23 | TrajLoom: Dense Future Trajectory Generation from Video | Zewei Zhang et.al. | 2603.22606 | null |
| 2026-03-23 | Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models | Meiqi Wu et.al. | 2603.22212 | null |
| 2026-03-23 | PAM: A Pose-Appearance-Motion Engine for Sim-to-Real HOI Video Generation | Mingju Gao et.al. | 2603.22193 | null |
| 2026-03-23 | Mamba-VMR: Multimodal Query Augmentation via Generated Videos for Precise Temporal Grounding | Yunzhuo Sun et.al. | 2603.22121 | null |
| 2026-03-23 | P-Flow: Prompting Visual Effects Generation | Rui Zhao et.al. | 2603.22091 | null |
| 2026-03-23 | Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model | SII-GAIR et.al. | 2603.21986 | null |
| 2026-03-23 | Manifold-Aware Exploration for Reinforcement Learning in Video Generation | Mingzhe Zheng et.al. | 2603.21872 | null |
| 2026-03-23 | Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation | Yuyang You et.al. | 2603.21864 | null |
| 2026-03-23 | Climate Prompting: Generating the Madden-Julian Oscillation using Video Diffusion and Low-Dimensional Conditioning | Sulian Thual et.al. | 2603.21856 | null |
| 2026-03-23 | PROBE: Diagnosing Residual Concept Capacity in Erased Text-to-Video Diffusion Models | Yiwei Xie et.al. | 2603.21547 | null |
| 2026-03-22 | Relax Forcing: Relaxed KV-Memory for Consistent Long Video Generation | Zengqun Zhao et.al. | 2603.21366 | null |
| 2026-03-22 | Identity-Consistent Video Generation under Large Facial-Angle Variations | Bin Hu et.al. | 2603.21299 | null |
| 2026-03-22 | Pretrained Video Models as Differentiable Physics Simulators for Urban Wind Flows | Janne Perini et.al. | 2603.21210 | null |
| 2026-03-20 | Uni-Classifier: Leveraging Video Diffusion Priors for Universal Guidance Classifier | Yujie Zhou et.al. | 2603.20382 | null |
| 2026-03-20 | MME-CoF-Pro: Evaluating Reasoning Coherence in Video Generative Models with Text and Visual Hints | Yu Qi et.al. | 2603.20194 | null |
| 2026-03-20 | LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation | Jiazheng Xing et.al. | 2603.20192 | null |
| 2026-03-20 | X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving | Chaoda Zheng et.al. | 2603.19979 | null |
| 2026-03-20 | Morphology-Consistent Humanoid Interaction through Robot-Centric Video Synthesis | Weisheng Xu et.al. | 2603.19709 | null |
| 2026-03-20 | Making Video Models Adhere to User Intent with Minor Adjustments | Daniel Ajisafe et.al. | 2603.19672 | null |
| 2026-03-20 | OrbitNVS: Harnessing Video Diffusion Priors for Novel View Synthesis | Jinglin Liang et.al. | 2603.19613 | null |
| 2026-03-20 | Physion-Eval: Evaluating Physical Realism in Generated Video via Human Reasoning | Qin Zhang et.al. | 2603.19607 | null |
| 2026-03-19 | Depictions of Depression in Generative AI Video Models: A Preliminary Study of OpenAI’s Sora 2 | Matthew Flathers et.al. | 2603.19527 | null |
| 2026-03-19 | Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding | Xianjin Wu et.al. | 2603.19235 | null |
| 2026-03-19 | MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction | Haitian Li et.al. | 2603.19231 | null |
| 2026-03-19 | Spectrally-Guided Diffusion Noise Schedules | Carlos Esteves et.al. | 2603.19222 | null |
| 2026-03-19 | Measuring 3D Spatial Geometric Consistency in Dynamic Generated Videos | Weijia Dou et.al. | 2603.19048 | null |
| 2026-03-19 | V-Dreamer: Automating Robotic Simulation and Trajectory Synthesis via Video Generation Priors | Songjia He et.al. | 2603.18811 | null |
| 2026-03-19 | 6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models | Rundong Su et.al. | 2603.18742 | null |
| 2026-03-19 | PhysVideo: Physically Plausible Video Generation with Cross-View Geometry Guidance | Cong Wang et.al. | 2603.18639 | null |
| 2026-03-19 | Training-Free Sparse Attention for Fast Video Generation via Offline Layer-Wise Sparsity Profiling and Online Bidirectional Co-Clustering | Jiayi Luo et.al. | 2603.18636 | null |
| 2026-03-19 | GenVideoLens: Where LVLMs Fall Short in AI-Generated Video Detection? | Yueying Zou et.al. | 2603.18625 | null |
| 2026-03-19 | Improving Joint Audio-Video Generation with Cross-Modal Context Learning | Bingqi Ma et.al. | 2603.18600 | null |
| 2026-03-19 | 3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model | Hyun-kyu Ko et.al. | 2603.18524 | null |
| 2026-03-19 | Efficient Video Diffusion with Sparse Information Transmission for Video Compression | Mingde Zhou et.al. | 2603.18501 | null |
| 2026-03-18 | The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering | Yigit Ekin et.al. | 2603.17998 | null |
| 2026-03-18 | Versatile Editing of Video Content, Actions, and Dynamics without Training | Vladimir Kulikov et.al. | 2603.17989 | null |
| 2026-03-18 | AHOY! Animatable Humans under Occlusion from YouTube Videos with Gaussian Splatting and Video Diffusion Priors | Aymen Mir et.al. | 2603.17975 | null |
| 2026-03-18 | Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation | Yingjie Chen et.al. | 2603.17889 | null |
| 2026-03-18 | Steering Video Diffusion Transformers with Massive Activations | Xianhang Cheng et.al. | 2603.17825 | null |
| 2026-03-18 | ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation | Dmitriy Rivkin et.al. | 2603.17812 | null |
| 2026-03-18 | EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards | Ruixiang Wang et.al. | 2603.17808 | null |
| 2026-03-18 | TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos | Yan Zeng et.al. | 2603.17735 | null |
| 2026-03-18 | Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos | Songtao Jiang et.al. | 2603.17693 | null |
| 2026-03-18 | FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion | Hugo Caselles-Dupré et.al. | 2603.17555 | null |
| 2026-03-18 | ProGVC: Progressive-based Generative Video Compression via Auto-Regressive Context Modeling | Daowen Li et.al. | 2603.17546 | null |
| 2026-03-18 | AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization | Dailan He et.al. | 2603.17461 | null |
| 2026-03-18 | SHIFT: Motion Alignment in Video Diffusion Models with Adversarial Hybrid Fine-Tuning | Xi Ye et.al. | 2603.17426 | null |
| 2026-03-18 | Motion-Adaptive Temporal Attention for Lightweight Video Generation with Stable Diffusion | Rui Hong et.al. | 2603.17398 | null |
| 2026-03-18 | Stereo World Model: Camera-Guided Stereo Video Generation | Yang-Tian Sun et.al. | 2603.17375 | null |
| 2026-03-17 | WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation | Jisu Nam et.al. | 2603.16871 | null |
| 2026-03-17 | Demystifing Video Reasoning | Ruisi Wang et.al. | 2603.16870 | null |
| 2026-03-17 | DreamPlan: Efficient Reinforcement Fine-Tuning of Vision-Language Planners via Video World Models | Emily Yue-Ting Jia et.al. | 2603.16860 | null |
| 2026-03-17 | World Reconstruction From Inconsistent Views | Lukas Höllein et.al. | 2603.16736 | null |
| 2026-03-17 | Search2Motion: Training-Free Object-Level Motion Control via Attention-Consensus Search | Sainan Liu et.al. | 2603.16711 | null |
| 2026-03-17 | Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation | Mutian Xu et.al. | 2603.16669 | null |
| 2026-03-17 | VideoMatGen: PBR Materials through Joint Generative Modeling | Jon Hasselgren et.al. | 2603.16566 | null |
| 2026-03-17 | VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment | Tengjiao Yin et.al. | 2603.16271 | null |
| 2026-03-17 | S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight | Haodong Yan et.al. | 2603.16195 | null |
| 2026-03-17 | Diffusion Models for Joint Audio-Video Generation | Alejandro Paredes La Torre et.al. | 2603.16093 | null |
| 2026-03-16 | Tri-Prompting: Video Diffusion with Unified Control over Scene, Subject, and Motion | Zhenghong Zhou et.al. | 2603.15614 | null |
| 2026-03-16 | Grounding World Simulation Models in a Real-World Metropolis | Junyoung Seo et.al. | 2603.15583 | null |
| 2026-03-16 | iDaVIE v1.0: A virtual reality tool for interactive analysis of astronomical data cubes | Alexander Sivitilli et.al. | 2603.15490 | null |
| 2026-03-16 | ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer | Ruonan Yu et.al. | 2603.15478 | null |
| 2026-03-16 | AnyCrowd: Instance-Isolated Identity-Pose Binding for Arbitrary Multi-Character Animation | Zhenyu Xie et.al. | 2603.15415 | null |
(<a href=#updated-on-20260404>back to top</a>)
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-04-02 | DenOiS: Dual-Domain Denoising of Observation and Solution in Ultrasound Image Reconstruction | Can Deniz Bezek et.al. | 2604.02105 | null |
| 2026-04-02 | Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion | Edoardo A. Dominici et.al. | 2604.01761 | null |
| 2026-04-02 | ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor | Yixiao Wang et.al. | 2604.01552 | null |
| 2026-04-01 | AffordTissue: Dense Affordance Prediction for Tool-Action Specific Tissue Interaction | Aiza Maksutova et.al. | 2604.01371 | null |
| 2026-04-01 | OkanNet: A Lightweight Deep Learning Architecture for Classification of Brain Tumor from MRI Images | Okan Uçar et.al. | 2604.01264 | null |
| 2026-04-01 | Simulating Realistic LiDAR Data Under Adverse Weather for Autonomous Vehicles: A Physics-Informed Learning Approach | Vivek Anand et.al. | 2604.01254 | null |
| 2026-04-01 | Camouflage-aware Image-Text Retrieval via Expert Collaboration | Yao Jiang et.al. | 2604.01251 | null |
| 2026-04-01 | AdaLoRA-QAT: Adaptive Low-Rank and Quantization-Aware Segmentation | Prantik Deb et.al. | 2604.01167 | null |
| 2026-04-01 | Looking into a Pixel by Nonlinear Unmixing – A Generative Approach | Maofeng Tang et.al. | 2604.01141 | null |
| 2026-04-01 | VRUD: A Drone Dataset for Complex Vehicle-VRU Interactions within Mixed Traffic | Ziyu Wang et.al. | 2604.01134 | null |
| 2026-04-01 | Region-Adaptive Generative Compression with Spatially Varying Diffusion Models | Lucas Relic et.al. | 2604.01122 | null |
| 2026-04-01 | ProOOD: Prototype-Guided Out-of-Distribution 3D Occupancy Prediction | Yuheng Zhang et.al. | 2604.01081 | null |
| 2026-04-01 | IWP: Token Pruning as Implicit Weight Pruning in Large Vision Language Models | Dong-Jae Lee et.al. | 2604.00757 | null |
| 2026-03-31 | Collaborative AI Agents and Critics for Fault Detection and Cause Analysis in Network Telemetry | Syed Eqbal Alam et.al. | 2604.00319 | null |
| 2026-03-31 | Prompt-Guided Prefiltering for VLM Image Compression | Bardia Azizian et.al. | 2604.00314 | null |
| 2026-03-31 | Feature-level Site Leakage Reduction for Cross-Hospital Chest X-ray Transfer via Self-Supervised Learning | Ayoub Louaye Bouaziz et.al. | 2604.00263 | null |
| 2026-03-31 | Evaluation of neuroCombat and deep learning harmonization for multi-site magnetic resonance neuroimaging in youth with prenatal alcohol exposure | Chloe Scholten et.al. | 2604.00251 | null |
| 2026-03-31 | Harmonization mitigates diffusion MRI scanner effects in infancy: insights from the HEALthy Brain and Childhood Development (HBCD) study | Elyssa M. McMaster et.al. | 2604.00246 | null |
| 2026-03-31 | Pupil Design for Computational Wavefront Estimation | Ali Almuallem et.al. | 2604.00225 | null |
| 2026-03-31 | Brain MR Image Synthesis with Multi-contrast Self-attention GAN | Zaid A. Abod et.al. | 2604.00070 | null |
| 2026-03-31 | OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation | Yuheng Liu et.al. | 2603.30045 | null |
| 2026-03-31 | Polyhedral Unmixing: Bridging Semantic Segmentation with Hyperspectral Unmixing via Polyhedral-Cone Partitioning | Antoine Bottenmuller et.al. | 2603.29438 | null |
| 2026-03-31 | Rich-U-Net: A medical image segmentation model for fusing spatial depth features and capturing minute structural details | Zhuoyi Fang et.al. | 2603.29404 | null |
| 2026-03-31 | Retinal Malady Classification using AI: A novel ViT-SVM combination architecture | Shashwat Jha et.al. | 2603.29181 | null |
| 2026-03-30 | The Surprising Effectiveness of Noise Pretraining for Implicit Neural Representations | Kushal Vyas et.al. | 2603.29034 | null |
| 2026-03-30 | End-to-end optimization of sparse ultrasound linear probes | Sergio Urrea et.al. | 2603.29014 | null |
| 2026-03-30 | Hybrid Quantum-Classical AI for Industrial Defect Classification in Welding Images | Akshaya Srinivasan et.al. | 2603.28995 | null |
| 2026-03-30 | Learning a dynamic four-chamber shape model of the human heart for 95,695 UK Biobank participants | Qiang Ma et.al. | 2603.28711 | null |
| 2026-03-30 | MRI-to-CT synthesis using drifting models | Qing Lyu et.al. | 2603.28498 | null |
| 2026-03-30 | Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms | Muyang He et.al. | 2603.28489 | null |
| 2026-03-30 | Deep Learning Based Site-Specific Channel Inference Using Satellite Images | Junzhe Song et.al. | 2603.28083 | null |
| 2026-03-30 | MolmoPoint: Better Pointing for VLMs with Grounding Tokens | Christopher Clark et.al. | 2603.28069 | null |
| 2026-03-30 | Physics-Embedded Feature Learning for AI in Medical Imaging | Pulock Das et.al. | 2603.28057 | null |
| 2026-03-29 | Towards Emotion Recognition with 3D Pointclouds Obtained from Facial Expression Images | Laura Rayón Ropero et.al. | 2603.27798 | null |
| 2026-03-28 | Guided Lensless Polarization Imaging | Noa Kraicer et.al. | 2603.27357 | null |
| 2026-03-28 | DeepBayesFlow: A Bayesian Structured Variational Framework for Generalizable Prostate Segmentation via Expressive Posteriors and SDE-Girsanov Uncertainty Modeling | Zhuoyi Fang et.al. | 2603.27263 | null |
| 2026-03-28 | MD-RWKV-UNet: Scale-Aware Anatomical Encoding with Cross-Stage Fusion for Multi-Organ Segmentation | Zhuoyi Fang et.al. | 2603.27261 | null |
| 2026-03-28 | Quantitative measurements of biological/chemical concentrations using smartphone cameras | Zhendong Cao et.al. | 2603.27118 | null |
| 2026-03-27 | On-Device Super Resolution Imaging Using Low-Cost SPAD Array and Embedded Lightweight Deep Learning | Zhenya Zang et.al. | 2603.27018 | null |
| 2026-03-27 | Make Geometry Matter for Spatial Reasoning | Shihua Zhang et.al. | 2603.26639 | null |
| 2026-03-27 | Think over Trajectories: Leveraging Video Generation to Reconstruct GPS Trajectories from Cellular Signaling | Ruixing Zhang et.al. | 2603.26610 | null |
| 2026-03-27 | From Static to Dynamic: Exploring Self-supervised Image-to-Video Representation Transfer Learning | Yang Liu et.al. | 2603.26597 | null |
| 2026-03-27 | Generation Is Compression: Zero-Shot Video Coding via Stochastic Rectified Flow | Ziyue Zeng et.al. | 2603.26571 | null |
| 2026-03-26 | TRACE: Object Motion Editing in Videos with First-Frame Trajectory Guidance | Quynh Phung et.al. | 2603.25707 | null |
| 2026-03-26 | Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos | Abdullah Hamdi et.al. | 2603.25645 | null |
| 2026-03-26 | A Mamba-based Perceptual Loss Function for Learning-based UGC Transcoding | Zihao Qi et.al. | 2603.25566 | null |
| 2026-03-26 | Challenges in Hyperspectral Imaging for Autonomous Driving: The HSI-Drive Case | Koldo Basterretxea et.al. | 2603.25510 | null |
| 2026-03-26 | Language-Free Generative Editing from One Visual Example | Omar Elezabi et.al. | 2603.25441 | null |
| 2026-03-26 | PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders | Niccolò Cavagnero et.al. | 2603.25398 | null |
| 2026-03-26 | Underdetermined Blind Source Separation via Weighted Simplex Shrinkage Regularization and Quantum Deep Image Prior | Chia-Hsiang Lin et.al. | 2603.25384 | null |
| 2026-03-26 | Image Rotation Angle Estimation: Comparing Circular-Aware Methods | Maximilian Woehrer et.al. | 2603.25351 | null |
| 2026-03-26 | Pixelis: Reasoning in Pixels, from Seeing to Acting | Yunpeng Zhou et.al. | 2603.25091 | null |
| 2026-03-26 | MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models | Dohwan Ko et.al. | 2603.24984 | null |
| 2026-03-26 | Subject-Specific Low-Field MRI Synthesis via a Neural Operator | Ziqi Gao et.al. | 2603.24968 | null |
| 2026-03-25 | OpenCap Monocular: 3D Human Kinematics and Musculoskeletal Dynamics from a Single Smartphone Video | Selim Gilon et.al. | 2603.24733 | null |
| 2026-03-25 | Vision-Language Models vs Human: Perceptual Image Quality Assessment | Imran Mehmood et.al. | 2603.24578 | null |
| 2026-03-25 | Anti-I2V: Safeguarding your photos from malicious image-to-video generation | Duc Vu et.al. | 2603.24570 | null |
| 2026-03-25 | OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning | Kaihang Pan et.al. | 2603.24458 | null |
| 2026-03-25 | Modeling Spatiotemporal Neural Frames for High Resolution Brain Dynamic | Wanying Qu et.al. | 2603.24176 | null |
| 2026-03-25 | Comparative analysis of dual-form networks for live land monitoring using multi-modal satellite image time series | Iris Dumeur et.al. | 2603.24109 | null |
| 2026-03-25 | Blind Quality Enhancement for G-PCC Compressed Dynamic Point Clouds | Tian Guo et.al. | 2603.24026 | null |
| 2026-03-25 | MonoSIM: An open source SIL framework for Ackermann Vehicular Systems with Monocular Vision | Shantanu Rahman et.al. | 2603.23965 | null |
| 2026-03-25 | Leave No Stone Unturned: Uncovering Holistic Audio-Visual Intrinsic Coherence for Deepfake Detection | Jielun Peng et.al. | 2603.23960 | null |
| 2026-03-25 | Attention-aware Inference Optimizations for Large Vision-Language Models with Memory-efficient Decoding | Fatih Ilhan et.al. | 2603.23914 | null |
| 2026-03-25 | Joint Source-Channel-Check Coding with HARQ for Reliable Semantic Communications | Boyuan Li et.al. | 2603.23869 | null |
| 2026-03-24 | Sentinel-2 for Crop Yield Estimation: A Systematic Review | Mohammadreza Narimani et.al. | 2603.23779 | null |
| 2026-03-24 | Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation | Brian Chao et.al. | 2603.23491 | null |
| 2026-03-24 | Harnessing Lightweight Transformer with Contextual Synergic Enhancement for Efficient 3D Medical Image Segmentation | Xinyu Liu et.al. | 2603.23390 | null |
| 2026-03-24 | GO-Renderer: Generative Object Rendering with 3D-aware Controllable Video Diffusion Models | Zekai Gu et.al. | 2603.23246 | null |
| 2026-03-24 | Rigid Motion Estimation using Accelerated Iterative Coordinate Descent (REACT) for MR Imaging | Kwang Eun Jang et.al. | 2603.23096 | null |
| 2026-03-24 | WorldMesh: Generating Navigable Multi-Room 3D Scenes via Mesh-Conditioned Image Diffusion | Manuel-Andreas Schneider et.al. | 2603.22972 | null |
| 2026-03-24 | Retrieval-Guided Photovoltaic Inventory Estimation from Satellite Imagery for Distribution Grid Planning | Muhao Guo et.al. | 2603.22856 | null |
| 2026-03-24 | L-UNet: An LSTM Network for Remote Sensing Image Change Detection | Shuting Sun et.al. | 2603.22842 | null |
| 2026-03-24 | Viewport-based Neural 360° Image Compression | Jingwei Liao et.al. | 2603.22776 | null |
| 2026-03-23 | Drop-In Perceptual Optimization for 3D Gaussian Splatting | Ezgi Ozyilkan et.al. | 2603.23297 | null |
| 2026-03-23 | Single-Subject Multi-View MRI Super-Resolution via Implicit Neural Representations | Heejong Kim et.al. | 2603.22627 | null |
| 2026-03-23 | Far-field compressive ultrasound beamforming | Nikunj Khetan et.al. | 2603.22496 | null |
| 2026-03-23 | P-Flow: Prompting Visual Effects Generation | Rui Zhao et.al. | 2603.22091 | null |
| 2026-03-23 | A Latent Representation Learning Framework for Hyperspectral Image Emulation in Remote Sensing | Chedly Ben Azizi et.al. | 2603.21911 | null |
| 2026-03-23 | HMS-VesselNet: Hierarchical Multi-Scale Attention Network with Topology-Preserving Loss for Retinal Vessel Segmentation | Amarnath R et.al. | 2603.21891 | null |
| 2026-03-23 | The Universal Normal Embedding | Chen Tasker et.al. | 2603.21786 | null |
| 2026-03-23 | Cycle Inverse-Consistent TransMorph: A Balanced Deep Learning Framework for Brain MRI Registration | Jiaqi Shang et.al. | 2603.21760 | null |
| 2026-03-23 | Unregistered Spectral Image Fusion: Unmixing, Adversarial Learning, and Recoverability | Jiahui Song et.al. | 2603.21510 | null |
| 2026-03-22 | OrbitStream: Training-Free Adaptive 360-degree Video Streaming via Semantic Potential Fields | Aizierjiang Aiersilan et.al. | 2603.20999 | null |
| 2026-03-21 | Underwater imaging without color distortions requires RAW capture | Derya Akkaynak et.al. | 2603.20823 | null |
| 2026-03-21 | mmWave-Diffusion: A Novel Framework for Respiration Sensing Using Observation-Anchored Conditional Diffusion Model | Yong Wang et.al. | 2603.20700 | null |
| 2026-03-21 | Seed1.8 Model Card: Towards Generalized Real-World Agency | Bytedance Seed et.al. | 2603.20633 | null |
| 2026-03-20 | Thermal is Always Wild: Characterizing and Addressing Challenges in Thermal-Only Novel View Synthesis | M. Kerem Aydin et.al. | 2603.20448 | null |
| 2026-03-20 | CaroTo: A Tool for Fast Comprehensive Analysis of Carotid Artery Stenosis in 4D PC- and 3D BB-MRI Data | Hinrich Rahlfs et.al. | 2603.20355 | null |
| 2026-03-20 | A Unified Platform and Quality Assurance Framework for 3D Ultrasound Reconstruction with Robotic, Optical, and Electromagnetic Tracking | Lewis Howell et.al. | 2603.20077 | null |
| 2026-03-20 | Investigating a Policy-Based Formulation for Endoscopic Camera Pose Recovery | Jan Emily Mangulabnan et.al. | 2603.20045 | null |
| 2026-03-20 | Goal-Oriented Framework for Optical Flow-based Multi-User Multi-Task Video Transmission | Yujie Xu et.al. | 2603.19995 | null |
| 2026-03-20 | Evaluating Test-Time Adaptation For Facial Expression Recognition Under Natural Cross-Dataset Distribution Shifts | John Turnbull et.al. | 2603.19994 | null |
| 2026-03-20 | ReconMIL: Synergizing Latent Space Reconstruction with Bi-Stream Mamba for Whole Slide Image Analysis | Lubin Gan et.al. | 2603.19925 | null |
| 2026-03-20 | Offshore oil and gas platform dynamics in the North Sea, Gulf of Mexico, and Persian Gulf: Exploiting the Sentinel-1 archive | Robin Spanier et.al. | 2603.19801 | null |
| 2026-03-19 | TuLaBM: Tumor-Biased Latent Bridge Matching for Contrast-Enhanced MRI Synthesis | Atharva Rege et.al. | 2603.19386 | null |
| 2026-03-19 | Spectrally-Guided Diffusion Noise Schedules | Carlos Esteves et.al. | 2603.19222 | null |
| 2026-03-19 | GenMFSR: Generative Multi-Frame Image Restoration and Super-Resolution | Harshana Weligampola et.al. | 2603.19187 | null |
| 2026-03-19 | Student views in AI Ethics and Social Impact | Tudor-Dan Mihoc et.al. | 2603.18827 | null |
| 2026-03-19 | A Hybrid Physical–Digital Framework for Annotated Fracture Reduction Data Evaluated using Clinically Relevant 3D metrics | Basile Longo et.al. | 2603.18723 | null |
| 2026-03-19 | UEPS: Robust and Efficient MRI Reconstruction | Xiang Zhou et.al. | 2603.18572 | null |
| 2026-03-19 | SCISSR: Scribble-Conditioned Interactive Surgical Segmentation and Refinement | Haonan Ping et.al. | 2603.18544 | null |
| 2026-03-19 | TransText: Alpha-as-RGB Representation for Transparent Text Animation | Fei Zhang et.al. | 2603.17944 | null |
| 2026-03-18 | Energy-Aware Frame Rate Selection for Video Coding | Geetha Ramasubbu et.al. | 2603.18305 | null |
| 2026-03-18 | Understanding Task Aggregation for Generalizable Ultrasound Foundation Models | Fangyijie Wang et.al. | 2603.18123 | null |
| 2026-03-18 | Dual Agreement Consistency Learning with Foundation Models for Semi-Supervised Fetal Heart Ultrasound Segmentation and Diagnosis | Fangyijie Wang et.al. | 2603.18119 | null |
| 2026-03-18 | Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models | Yuhao Dong et.al. | 2603.18118 | null |
| 2026-03-18 | The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering | Yigit Ekin et.al. | 2603.17998 | null |
| 2026-03-18 | Video Understanding: From Geometry and Semantics to Unified Models | Zhaochong An et.al. | 2603.17840 | null |
| 2026-03-18 | Cache-enabled Generative Joint Source-Channel Coding for Evolving Semantic Communications | Shunpu Tang et.al. | 2603.17702 | null |
| 2026-03-18 | Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos | Songtao Jiang et.al. | 2603.17693 | null |
| 2026-03-18 | Few-Step Diffusion Sampling Through Instance-Aware Discretizations | Liangyu Yuan et.al. | 2603.17671 | null |
| 2026-03-18 | FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion | Hugo Caselles-Dupré et.al. | 2603.17555 | null |
| 2026-03-18 | Deep Learning-Based Airway Segmentation in Systemic Lupus Erythematosus Patients with Interstitial Lung Disease (SLE-ILD): A Comparative High-Resolution CT Analysis | Sirong Piao et.al. | 2603.17547 | null |
| 2026-03-18 | SHIFT: Motion Alignment in Video Diffusion Models with Adversarial Hybrid Fine-Tuning | Xi Ye et.al. | 2603.17426 | null |
| 2026-03-18 | Structured SIR: Efficient and Expressive Importance-Weighted Inference for High-Dimensional Image Registration | Ivor J. A. Simpson et.al. | 2603.17415 | null |
| 2026-03-18 | A 3D Reconstruction Benchmark for Asset Inspection | James L. Gray et.al. | 2603.17358 | null |
| 2026-03-17 | A Lensless Polarization Camera | Noa Kraicer et.al. | 2603.17156 | null |
| 2026-03-17 | Topology-Preserving Deep Joint Source-Channel Coding for Semantic Communication | Omar Erak et.al. | 2603.17126 | null |
| 2026-03-17 | Surg $Σ$ : A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence | Zhitao Zeng et.al. | 2603.16822 | null |
| 2026-03-17 | Preserving Vertical Structure in 3D-to-2D Projection for Permafrost Thaw Mapping | Justin McMillen et.al. | 2603.16788 | null |
| 2026-03-17 | Search2Motion: Training-Free Object-Level Motion Control via Attention-Consensus Search | Sainan Liu et.al. | 2603.16711 | null |
| 2026-03-17 | vAccSOL: Efficient and Transparent AI Vision Offloading for Mobile Robots | Adam Zahir et.al. | 2603.16685 | null |
| 2026-03-17 | HistoAtlas: A Pan-Cancer Morphology Atlas Linking Histomics to Molecular Programs and Clinical Outcomes | Pierre-Antoine Bannier et.al. | 2603.16587 | null |
| 2026-03-17 | Fanar 2.0: Arabic Generative AI Stack | FANAR TEAM et.al. | 2603.16397 | null |
| 2026-03-17 | The Era of End-to-End Autonomy: Transitioning from Rule-Based Driving to Large Driving Models | Eduardo Nebot et.al. | 2603.16050 | null |
| 2026-03-17 | Clinical Priors Guided Lung Disease Detection in 3D CT Scans | Kejin Lu et.al. | 2603.15143 | null |
| 2026-03-16 | FlatLands: Generative Floormap Completion From a Single Egocentric View | Subhransu S. Bhattacharjee et.al. | 2603.16016 | null |
| 2026-03-16 | Standardizing Medical Images at Scale for AI | Callen MacPhee et.al. | 2603.15980 | null |
| 2026-03-16 | GLANCE: Gaze-Led Attention Network for Compressed Edge-inference | Neeraj Solanki et.al. | 2603.15717 | null |
| 2026-03-16 | ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer | Ruonan Yu et.al. | 2603.15478 | null |
| 2026-03-16 | Seeing Beyond: Extrapolative Domain Adaptive Panoramic Segmentation | Yuanfan Zheng et.al. | 2603.15475 | null |
| 2026-03-16 | Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling | Aram Davtyan et.al. | 2603.15279 | null |
| 2026-03-16 | CATFormer: When Continual Learning Meets Spiking Transformers With Dynamic Thresholds | Vaishnavi Nagabhushana et.al. | 2603.15184 | null |

(<a href=#updated-on-20260404>back to top</a>)

| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-04-02 | VOID: Video Object and Interaction Deletion | Saman Motamed et.al. | 2604.02296 | null |
| 2026-03-31 | CutClaw: Agentic Hours-Long Video Editing via Music Synchronization | Shifang Zhao et.al. | 2603.29664 | null |
| 2026-03-31 | TrajectoryMover: Generative Movement of Object Trajectories in Videos | Kiran Chhatre et.al. | 2603.29092 | null |
| 2026-03-31 | X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving | Chaoda Zheng et.al. | 2603.19979 | null |
| 2026-03-30 | AutoCut: End-to-end advertisement video editing based on multimodal discretization and controllable generation | Milton Zhou et.al. | 2603.28366 | null |
| 2026-03-26 | TRACE: Object Motion Editing in Videos with First-Frame Trajectory Guidance | Quynh Phung et.al. | 2603.25707 | null |
| 2026-03-25 | AVControl: Efficient Framework for Training Audio-Visual Controls | Matan Ben-Yosef et.al. | 2603.24793 | null |
| 2026-03-25 | Accelerating Diffusion-based Video Editing via Heterogeneous Caching: Beyond Full Computing at Sampled Denoising Timestep | Tianyi Liu et.al. | 2603.24260 | null |
| 2026-03-24 | RealMaster: Lifting Rendered Scenes into Photorealistic Video | Dana Cohen-Bar et.al. | 2603.23462 | null |
| 2026-03-20 | PerformRecast: Expression and Head Pose Disentanglement for Portrait Video Editing | Jiadong Liang et.al. | 2603.19731 | null |
| 2026-03-19 | SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing | Xinyao Zhang et.al. | 2603.19228 | null |
| 2026-03-19 | EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing | Yang Fu et.al. | 2603.19224 | null |
| 2026-03-18 | Versatile Editing of Video Content, Actions, and Dynamics without Training | Vladimir Kulikov et.al. | 2603.17989 | null |
| 2026-03-18 | ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation | Dmitriy Rivkin et.al. | 2603.17812 | null |
| 2026-03-18 | SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model | Guibin Chen et.al. | 2602.21818 | null |
| 2026-03-17 | SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation | Jiongze Yu et.al. | 2603.16864 | null |
| 2026-03-14 | Script-to-Slide Grounding: Grounding Script Sentences to Slide Objects for Automatic Instructional Video Generation | Rena Suzuki et.al. | 2603.16931 | null |
| 2026-03-13 | GA-Drive: Geometry-Appearance Decoupled Modeling for Free-viewpoint Driving Scene Generation | Hao Zhang et.al. | 2602.20673 | null |
| 2026-03-10 | When to Lock Attention: Training-Free KV Control in Video Diffusion | Tianyi Zeng et.al. | 2603.09657 | null |
| 2026-03-10 | From Ideal to Real: Stable Video Object Removal under Imperfect Conditions | Jiagao Hu et.al. | 2603.09283 | null |
| 2026-03-06 | Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion | Bohai Gu et.al. | 2603.06140 | null |
| 2026-03-06 | GenHOI: Towards Object-Consistent Hand-Object Interaction with Temporally Balanced and Spatially Selective Object Injection | Xuan Huang et.al. | 2603.06048 | null |
| 2026-03-06 | Training-free Latent Inter-Frame Pruning with Attention Recovery | Dennis Menn et.al. | 2603.05811 | null |
| 2026-03-06 | Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance | Yiqi Lin et.al. | 2603.02175 | null |
| 2026-03-06 | UniVBench: Towards Unified Evaluation for Video Foundation Models | Jianhui Wei et.al. | 2602.21835 | null |
| 2026-03-03 | NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing | Tianlin Pan et.al. | 2603.02802 | null |
| 2026-03-01 | FREE-Edit: Using Editing-aware Injection in Rectified Flow Models for Zero-shot Image-Driven Video Editing | Maomao Li et.al. | 2603.01164 | null |
| 2026-02-25 | StoryComposerAI: Supporting Human-AI Story Co-Creation Through Decomposition and Linking | Shuo Niu et.al. | 2602.21486 | null |
| 2026-02-24 | PropFly: Learning to Propagate via On-the-Fly Supervision from Pre-trained Video Diffusion Models | Wonyong Seo et.al. | 2602.20583 | null |
| 2026-02-16 | EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing | Yehonathan Litman et.al. | 2602.15031 | null |

(<a href=#updated-on-20260404>back to top</a>)

| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-04-02 | ActionParty: Multi-Subject Action Binding in Generative Video Games | Alexander Pondaven et.al. | 2604.02330 | null |
| 2026-04-02 | VOID: Video Object and Interaction Deletion | Saman Motamed et.al. | 2604.02296 | null |
| 2026-04-02 | Smoothing the Landscape: Causal Structure Learning via Diffusion Denoising Objectives | Hao Zhu et.al. | 2604.02250 | null |
| 2026-04-02 | Reflection Generation for Composite Image Using Diffusion Model | Haonan Zhao et.al. | 2604.02168 | null |
| 2026-04-02 | Why Gaussian Diffusion Models Fail on Discrete Data? | Alexander Shabalin et.al. | 2604.02028 | null |
| 2026-04-02 | Multiphase cross-diffusion models for tissue structures: modeling, analysis, numerics | Ansgar Jüngel et.al. | 2604.01827 | null |
| 2026-04-02 | SafeRoPE: Risk-specific Head-wise Embedding Rotation for Safe Generation in Rectified Flow Transformers | Xiang Yang et.al. | 2604.01826 | null |
| 2026-04-02 | Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion | Edoardo A. Dominici et.al. | 2604.01761 | null |
| 2026-04-02 | SteerFlow: Steering Rectified Flows for Faithful Inversion-Based Image Editing | Thinh Dao et.al. | 2604.01715 | null |
| 2026-04-02 | Bias mitigation in graph diffusion models | Meng Yu et.al. | 2604.01709 | null |
| 2026-04-02 | Can Video Diffusion Models Predict Past Frames? Bidirectional Cycle Consistency for Reversible Interpolation | Lingyu Liu et.al. | 2604.01700 | null |
| 2026-04-02 | From Understanding to Erasing: Towards Complete and Stable Video Object Removal | Dingming Liu et.al. | 2604.01693 | null |
| 2026-04-02 | DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data | Wonjoon Jin et.al. | 2604.01666 | null |
| 2026-04-02 | Diffusion-Guided Adversarial Perturbation Injection for Generalizable Defense Against Facial Manipulations | Yue Li et.al. | 2604.01635 | null |
| 2026-04-02 | Cross-Domain Vessel Segmentation via Latent Similarity Mining and Iterative Co-Optimization | Zhanqiang Guo et.al. | 2604.01553 | null |
| 2026-04-01 | Learning and Generating Mixed States Prepared by Shallow Channel Circuits | Fangjun Hu et.al. | 2604.01197 | null |
| 2026-04-01 | ReinDriveGen: Reinforcement Post-Training for Out-of-Distribution Driving Scene Generation | Hao Zhang et.al. | 2604.01129 | null |
| 2026-04-01 | Region-Adaptive Generative Compression with Spatially Varying Diffusion Models | Lucas Relic et.al. | 2604.01122 | null |
| 2026-04-01 | Diff-VS: Efficient Audio-Aware Diffusion U-Net for Vocals Separation | Yun-Ning et.al. | 2604.01120 | null |
| 2026-04-01 | Inverse Design of Optical Multilayer Thin Films using Robust Masked Diffusion Models | Jonas Schaible et.al. | 2604.01106 | null |
| 2026-04-01 | PHASOR: Anatomy- and Phase-Consistent Volumetric Diffusion for CT Virtual Contrast Enhancement | Zilong Li et.al. | 2604.01053 | null |
| 2026-04-01 | EmoScene: A Dual-space Dataset for Controllable Affective Image Generation | Li He et.al. | 2604.00933 | null |
| 2026-04-01 | IDDM: Identity-Decoupled Personalized Diffusion Models with a Tunable Privacy-Utility Trade-off | Linyan Dai et.al. | 2604.00903 | null |
| 2026-04-01 | HICT: High-precision 3D CBCT reconstruction from a single X-ray | Wen Ma et.al. | 2604.00792 | null |
| 2026-04-01 | Learnability-Guided Diffusion for Dataset Distillation | Jeffrey A. Chan-Santiago et.al. | 2604.00519 | null |
| 2026-04-01 | Tucker Diffusion Model for High-dimensional Tensor Generation | Jianhua Guo et.al. | 2604.00481 | null |
| 2026-04-01 | Learning Humanoid Navigation from Human Data | Weizhuo Wang et.al. | 2604.00416 | null |
| 2026-04-01 | Deep Networks Favor Simple Data | Weyl Lu et.al. | 2604.00394 | null |
| 2026-04-01 | Behavioral Score Diffusion: Model-Free Trajectory Planning via Kernel-Based Score Estimation from Data | Shihao Li et.al. | 2604.00391 | null |
| 2026-04-01 | mmAnomaly: Leveraging Visual Context for Robust Anomaly Detection in the Non-Visual World with mmWave Radar | Tarik Reza Toha et.al. | 2604.00382 | null |
| 2026-03-31 | Video Models Reason Early: Exploiting Plan Commitment for Maze Solving | Kaleb Newman et.al. | 2603.30043 | null |
| 2026-03-31 | Conditional Diffusion-Based Point Cloud Imaging for UAV Position and Attitude Sensing | Xinhong Dai et.al. | 2603.29822 | null |
| 2026-03-31 | Emotion Diffusion Classifier with Adaptive Margin Discrepancy Training for Facial Expression Recognition | Rongkang Dong et.al. | 2603.29578 | null |
| 2026-03-31 | Total Variation Guarantees for Sampling with Stochastic Localization | Jakob Kellermann et.al. | 2603.29555 | null |
| 2026-03-31 | iPoster: Content-Aware Layout Generation for Interactive Poster Design via Graph-Enhanced Diffusion Models | Xudong Zhou et.al. | 2603.29469 | null |
| 2026-03-31 | NeoNet: An End-to-End 3D MRI-Based Deep Learning Framework for Non-Invasive Prediction of Perineural Invasion via Generation-Driven Classification | Youngung Han et.al. | 2603.29449 | null |
| 2026-03-31 | Ultra-short-term volatility surfaces | Federico M. Bandi et.al. | 2603.29430 | null |
| 2026-03-31 | Multi-AUV Cooperative Target Tracking Based on Supervised Diffusion-Aided Multi-Agent Reinforcement Learning | Jiaao Ma et.al. | 2603.29426 | null |
| 2026-03-31 | Pathogen diversity emerging from coevolutionary dynamics in interconnected systems | Davide Zanchetta et.al. | 2603.29398 | null |
| 2026-03-31 | CIPHER: Counterfeit Image Pattern High-level Examination via Representation | Kyeonghun Kim et.al. | 2603.29356 | null |
| 2026-03-31 | FOSCU: Feasibility of Synthetic MRI Generation via Duo-Diffusion Models for Enhancement of 3D U-Nets in Hepatic Segmentation | Youngung Han et.al. | 2603.29343 | null |
| 2026-03-31 | Differentiable Normative Guidance for Nash Bargaining Solution Recovery | Moirangthem Tiken Singh et.al. | 2603.29297 | null |
| 2026-03-31 | Diffusion Mental Averages | Phonphrm Thawatdamrongkit et.al. | 2603.29239 | null |
| 2026-03-30 | Generating Humanless Environment Walkthroughs from Egocentric Walking Tour Videos | Yujin Ham et.al. | 2603.29036 | null |
| 2026-03-30 | MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation | Bharath Krishnamurthy et.al. | 2603.29029 | null |
| 2026-03-30 | Geometry-aware similarity metrics for neural representations on Riemannian and statistical manifolds | N Alex Cayco Gajic et.al. | 2603.28764 | null |
| 2026-03-30 | PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models | Lorenza Prospero et.al. | 2603.28763 | null |
| 2026-03-30 | On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers | Omer Dahary et.al. | 2603.28762 | null |
| 2026-03-30 | DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing | Kailai Feng et.al. | 2603.28713 | null |
| 2026-03-30 | Front Location for Go or Grow Models of Aerotaxis | Mete Demircigil et.al. | 2603.28663 | null |
| 2026-03-30 | $R_{dm}$ : Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation | Linqian Fan et.al. | 2603.28460 | null |
| 2026-03-30 | Deep Research of Deep Research: From Transformer to Agent, From AI to AI for Science | Yipeng Yu et.al. | 2603.28361 | null |
| 2026-03-30 | Intrinsically ultralow thermal conductivity in all-inorganic superatomic bulk crystals | Mingzhang Yang et.al. | 2603.28267 | null |
| 2026-03-30 | ColorFLUX: A Structure-Color Decoupling Framework for Old Photo Colorization | Bingchen Li et.al. | 2603.28162 | null |
| 2026-03-30 | SVGS: Single-View to 3D Object Editing via Gaussian Splatting | Pengcheng Xue et.al. | 2603.28126 | null |
| 2026-03-30 | Attention Frequency Modulation: Training-Free Spectral Modulation of Diffusion Cross-Attention | Seunghun Oh et.al. | 2603.28114 | null |
| 2026-03-30 | Physics-Embedded Feature Learning for AI in Medical Imaging | Pulock Das et.al. | 2603.28057 | null |
| 2026-03-30 | Self-Organizing Score-based Data Assimilation | Yuma Yamaoka et.al. | 2603.28048 | null |
| 2026-03-30 | From Independent to Correlated Diffusion: Generalized Generative Modeling with Probabilistic Computers | Nihal Sanjay Singh et.al. | 2603.27996 | null |
| 2026-03-30 | Beyond Dataset Distillation: Lossless Dataset Concentration via Diffusion-Assisted Distribution Alignment | Tongfei Liu et.al. | 2603.27987 | null |
| 2026-03-29 | Diversity Matters: Dataset Diversification and Dual-Branch Network for Generalized AI-Generated Image Detection | Nusrat Tasnim et.al. | 2603.27800 | null |
| 2026-03-29 | Heracles: Bridging Precise Tracking and Generative Synthesis for General Humanoid Control | Zelin Tao et.al. | 2603.27756 | null |
| 2026-03-29 | Bridging Schrödinger and Bass: A Semimartingale Optimal Transport Problem with Diffusion Control | Pierre Henry-Labordere et.al. | 2603.27712 | null |
| 2026-03-29 | Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers | Yuhe Liu et.al. | 2603.27666 | null |
| 2026-03-26 | PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference | Xiaofeng Mao et.al. | 2603.25730 | null |
| 2026-03-26 | S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation | Ligong Han et.al. | 2603.25702 | null |
| 2026-03-26 | Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning | Jai Bardhan et.al. | 2603.25685 | null |
| 2026-03-26 | Beyond the Golden Data: Resolving the Motion-Vision Quality Dilemma via Timestep Selective Training | Xiangyang Luo et.al. | 2603.25527 | null |
| 2026-03-26 | Lightweight GenAI for Network Traffic Synthesis: Fidelity, Augmentation, and Classification | Giampaolo Bovenzi et.al. | 2603.25507 | null |
| 2026-03-26 | Temporally Decoupled Diffusion Planning for Autonomous Driving | Xiang Li et.al. | 2603.25462 | null |
| 2026-03-26 | Language-Free Generative Editing from One Visual Example | Omar Elezabi et.al. | 2603.25441 | null |
| 2026-03-26 | Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells | Han Zhang et.al. | 2603.25240 | null |
| 2026-03-26 | Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction | Jiahao Tian et.al. | 2603.25209 | null |
| 2026-03-26 | CardioDiT: Latent Diffusion Transformers for 4D Cardiac MRI Synthesis | Marvin Seyfarth et.al. | 2603.25194 | null |
| 2026-03-26 | VolDiT: Controllable Volumetric Medical Image Synthesis with Diffusion Transformers | Marvin Seyfarth et.al. | 2603.25181 | null |
| 2026-03-26 | Bilingual Text-to-Motion Generation: A New Benchmark and Baselines | Wanjiang Weng et.al. | 2603.25178 | null |
| 2026-03-26 | A Reaction-Advection-Diffusion Model to describe Non-Uniformities in Colorimetric Sensing using Thin Porous Substrates | Kulkarni Namratha et.al. | 2603.25124 | null |
| 2026-03-26 | Learning Explicit Continuous Motion Representation for Dynamic Gaussian Splatting from Monocular Videos | Xuankai Zhang et.al. | 2603.25058 | null |
| 2026-03-26 | BiFM: Bidirectional Flow Matching for Few-Step Image Editing and Generation | Yasong Dai et.al. | 2603.24942 | null |
| 2026-03-25 | Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method | Arthur Jacot et.al. | 2603.24594 | null |
| 2026-03-25 | Anti-I2V: Safeguarding your photos from malicious image-to-video generation | Duc Vu et.al. | 2603.24570 | null |
| 2026-03-25 | Reflected diffusion models adapt to low-dimensional data | Asbjørn Holk et.al. | 2603.24495 | null |
| 2026-03-25 | Analysis and numerical simulation of a spatio-temporal Ricker-type model for the control of Aedes aegypti mosquitoes with Sterile Insect Techniques | Oscar Eduardo Escobar-Lasso et.al. | 2603.24460 | null |
| 2026-03-25 | Teacher-Student Diffusion Model for Text-Driven 3D Hand Motion Generation | Ching-Lam Cheng et.al. | 2603.24407 | null |
| 2026-03-25 | ViHOI: Human-Object Interaction Synthesis with Visual Priors | Songjin Cai et.al. | 2603.24383 | null |
| 2026-03-25 | ScrollScape: Unlocking 32K Image Generation With Video Diffusion Priors | Haodong Yu et.al. | 2603.24270 | null |
| 2026-03-25 | LGTM: Training-Free Light-Guided Text-to-Image Diffusion Model via Initial Noise Manipulation | Ryugo Morita et.al. | 2603.24086 | null |
| 2026-03-25 | When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm | Ye Leng et.al. | 2603.24079 | null |
| 2026-03-25 | HAM: A Training-Free Style Transfer Approach via Heterogeneous Attention Modulation for Diffusion Models | Yeqi He et.al. | 2603.24043 | null |
| 2026-03-25 | Lagrangian Relaxation Score-based Generation for Mixed Integer linear Programming | Ruobing Wang et.al. | 2603.24033 | null |
| 2026-03-25 | DepthArb: Training-Free Depth-Arbitrated Generation for Occlusion-Robust Image Synthesis | Hongjin Niu et.al. | 2603.23924 | null |
| 2026-03-25 | Latent Bias Alignment for High-Fidelity Diffusion Inversion in Real-World Image Reconstruction and Manipulation | Weiming Chen et.al. | 2603.23903 | null |
| 2026-03-25 | A simple model for conserved intracellular dynamics exhibits multiscale pattern formation, traveling protein domains and arrested coarsening of lipids in the membrane | Benjamin Winkler et.al. | 2603.23856 | null |
| 2026-03-25 | 3D-LLDM: Label-Guided 3D Latent Diffusion Model for Improving High-Resolution Synthetic MR Imaging in Hepatic Structure Segmentation | Kyeonghun Kim et.al. | 2603.23845 | null |
| 2026-03-24 | DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models | Jaewon Min et.al. | 2603.23499 | null |
| 2026-03-24 | Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation | Brian Chao et.al. | 2603.23491 | null |
| 2026-03-24 | RealMaster: Lifting Rendered Scenes into Photorealistic Video | Dana Cohen-Bar et.al. | 2603.23462 | null |
| 2026-03-24 | Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation | Michal Balcerak et.al. | 2603.23398 | null |
| 2026-03-24 | Markov State–Space Modeling and Channel Characterization for DNA-Based Molecular Communication | Ruifeng Zheng et.al. | 2603.23394 | null |
| 2026-03-24 | FG-Portrait: 3D Flow Guided Editable Portrait Animation | Yating Xu et.al. | 2603.23381 | null |
| 2026-03-24 | ViBe: Ultra-High-Resolution Video Synthesis Born from Pure Images | Yunfeng Wu et.al. | 2603.23326 | null |
| 2026-03-24 | Permutation-Symmetrized Diffusion for Unconditional Molecular Generation | Gyeonghoon Ko et.al. | 2603.23255 | null |
| 2026-03-24 | GO-Renderer: Generative Object Rendering with 3D-aware Controllable Video Diffusion Models | Zekai Gu et.al. | 2603.23246 | null |
| 2026-03-24 | AeroScene: Progressive Scene Synthesis for Aerial Robotics | Nghia Vu et.al. | 2603.23224 | null |
| 2026-03-24 | Gimbal360: Differentiable Auto-Leveling for Canonicalized $360^\circ$ Panoramic Image Completion | Yuqin Lu et.al. | 2603.23179 | null |
| 2026-03-24 | Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards | Orhun Buğra Baran et.al. | 2603.23086 | null |
| 2026-03-24 | Zero-Shot Personalization of Objects via Textual Inversion | Aniket Roy et.al. | 2603.23010 | null |
| 2026-03-24 | Markov-Enforced Discrete Diffusion Model for Digital Semantic Symbol Error Correction | Yoon Huh et.al. | 2603.22983 | null |
| 2026-03-24 | Asymptotic Learning Curves for Diffusion Models with Random Features Score and Manifold Data | Anand Jerry George et.al. | 2603.22962 | null |
| 2026-03-23 | End-to-End Training for Unified Tokenization and Latent Denoising | Shivam Duggal et.al. | 2603.22283 | null |
| 2026-03-23 | Repurposing Geometric Foundation Models for Multi-view Diffusion | Wooseok Jang et.al. | 2603.22275 | null |
| 2026-03-23 | DUO-VSR: Dual-Stream Distillation for One-Step Video Super-Resolution | Zhengyao Lv et.al. | 2603.22271 | null |
| 2026-03-23 | SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation | Sashuai Zhou et.al. | 2603.22228 | null |
| 2026-03-23 | DA-VAE: Plug-in Latent Compression for Diffusion via Detail Alignment | Xin Cai et.al. | 2603.22125 | null |
| 2026-03-23 | DTVI: Dual-Stage Textual and Visual Intervention for Safe Text-to-Image Generation | Binhong Tan et.al. | 2603.22041 | null |
| 2026-03-23 | APEG: Adaptive Physical Layer Authentication with Channel Extrapolation and Generative AI | Xiqi Cheng et.al. | 2603.21923 | null |
| 2026-03-23 | CLEAR: Context-Aware Learning with End-to-End Mask-Free Inference for Adaptive Video Subtitle Removal | Qingdong He et.al. | 2603.21901 | null |
| 2026-03-23 | ADaFuSE: Adaptive Diffusion-generated Image and Text Fusion for Interactive Text-to-Image Retrieval | Zhuocheng Zhang et.al. | 2603.21886 | null |
| 2026-03-23 | Not All Layers Are Created Equal: Adaptive LoRA Ranks for Personalized Image Generation | Donald Shenaj et.al. | 2603.21884 | null |
| 2026-03-23 | Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation | Yuyang You et.al. | 2603.21864 | null |
| 2026-03-23 | Climate Prompting: Generating the Madden-Julian Oscillation using Video Diffusion and Low-Dimensional Conditioning | Sulian Thual et.al. | 2603.21856 | null |
| 2026-03-23 | A hybrid wavelet-based physics-informed neural network for portfolio management | Bahadur Yadav et.al. | 2603.21834 | null |
| 2026-03-23 | Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction | Kuangzhe Xu et.al. | 2603.21735 | null |
| 2026-03-23 | Unimodular Diffusion and Interacting Vacuum Cosmology | Gopal Kashyap et.al. | 2603.21675 | null |
| 2026-03-23 | DiT-Flow: Speech Enhancement Robust to Multiple Distortions based on Flow Matching in Latent Space and Diffusion Transformers | Tianyu Cao et.al. | 2603.21608 | null |
| 2026-03-23 | PROBE: Diagnosing Residual Concept Capacity in Erased Text-to-Video Diffusion Models | Yiwei Xie et.al. | 2603.21547 | null |
| 2026-03-23 | Empirical Evaluation of Link Deletion Methods for Limiting Information Diffusion on Social Media | Shiori Furukawa et.al. | 2603.21470 | null |
| 2026-03-22 | Is the future of AI green? What can innovation diffusion models say about generative AI’s environmental impact? | Robert Viseur et.al. | 2603.21419 | null |
| 2026-03-22 | An InSAR Phase Unwrapping Framework for Large-scale and Complex Events | Yijia Song et.al. | 2603.21378 | null |
| 2026-03-22 | Efficient Coarse-to-Fine Diffusion Models with Time Step Sequence Redistribution | Yu-Shan Tai et.al. | 2603.21348 | null |
| 2026-03-20 | LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation | Jiazheng Xing et.al. | 2603.20192 | null |
| 2026-03-20 | Wildfire Spread Scenarios: Increasing Sample Diversity of Segmentation Diffusion Models with Training-Free Methods | Sebastian Gerard et.al. | 2603.20188 | null |
| 2026-03-20 | Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD | Emiel Hoogeboom et.al. | 2603.20155 | null |
| 2026-03-20 | How Out-of-Equilibrium Phase Transitions can Seed Pattern Formation in Trained Diffusion Models | Luca Ambrogioni et.al. | 2603.20092 | null |
| 2026-03-20 | Timestep-Aware Block Masking for Efficient Diffusion Model Inference | Haodong He et.al. | 2603.19939 | null |
| 2026-03-20 | A distribution-free lattice Boltzmann method for compartmental reaction-diffusion systems with application to epidemic modelling | Alessandro De Rosis et.al. | 2603.19789 | null |
| 2026-03-20 | Diminishing Returns in Expanding Generative Models and Godel-Tarski-Lob Limits | Angshul Majumdar et.al. | 2603.19687 | null |
| 2026-03-20 | ATHENA: Adaptive Test-Time Steering for Improving Count Fidelity in Diffusion Models | Mohammad Shahab Sepehri et.al. | 2603.19676 | null |
| 2026-03-20 | Making Video Models Adhere to User Intent with Minor Adjustments | Daniel Ajisafe et.al. | 2603.19672 | null |
| 2026-03-20 | OmniDiT: Extending Diffusion Transformer to Omni-VTON Framework | Weixuan Zeng et.al. | 2603.19643 | null |
| 2026-03-20 | On the role of memorization in learned priors for geophysical inverse problems | Ali Siahkoohi et.al. | 2603.19629 | null |
| 2026-03-20 | MagicSeg: Open-World Segmentation Pretraining via Counterfactural Diffusion-Based Auto-Generation | Kaixin Cai et.al. | 2603.19575 | null |
| 2026-03-20 | Accelerating Diffusion Decoders via Multi-Scale Sampling and One-Step Distillation | Chuhan Wang et.al. | 2603.19570 | null |
| 2026-03-19 | TRACE: Trajectory Recovery with State Propagation Diffusion for Urban Mobility | Jinming Wang et.al. | 2603.19474 | null |
| 2026-03-19 | TuLaBM: Tumor-Biased Latent Bridge Matching for Contrast-Enhanced MRI Synthesis | Atharva Rege et.al. | 2603.19386 | null |
| 2026-03-19 | Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding | Xianjin Wu et.al. | 2603.19235 | null |
| 2026-03-19 | Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer | Chenyang Gu et.al. | 2603.19227 | null |
| 2026-03-19 | Spectrally-Guided Diffusion Noise Schedules | Carlos Esteves et.al. | 2603.19222 | null |
| 2026-03-19 | Rethinking Vector Field Learning for Generative Segmentation | Chaoyang Wang et.al. | 2603.19218 | null |
| 2026-03-19 | RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing | Yue Gong et.al. | 2603.19206 | null |
| 2026-03-19 | MIDST Challenge at SaTML 2025: Membership Inference over Diffusion-models-based Synthetic Tabular data | Masoumeh Shafieinejad et.al. | 2603.19185 | null |
| 2026-03-19 | ADAPT: Attention Driven Adaptive Prompt Scheduling and InTerpolating Orthogonal Complements for Rare Concepts Generation | Kwanyoung Lee et.al. | 2603.19157 | null |
| 2026-03-19 | D5P4: Partition Determinantal Point Process for Diversity in Parallel Discrete Diffusion Decoding | Jonathan Lys et.al. | 2603.19146 | null |
| 2026-03-19 | Revisiting Autoregressive Models for Generative Image Classification | Ilia Sudakov et.al. | 2603.19122 | null |
| 2026-03-19 | FUMO: Prior-Modulated Diffusion for Single Image Reflection Removal | Telang Xu et.al. | 2603.19036 | null |
| 2026-03-19 | Foundations of Schrödinger Bridges for Generative Modeling | Sophia Tang et.al. | 2603.18992 | null |
| 2026-03-19 | CRAFT: Aligning Diffusion Models with Fine-Tuning Is Easier Than You Think | Zening Sun et.al. | 2603.18991 | null |
| 2026-03-19 | Neural Galerkin Normalizing Flow for Transition Probability Density Functions of Diffusion Models | Riccardo Saporiti et.al. | 2603.18907 | null |
| 2026-03-19 | Translating MRI to PET through Conditional Diffusion Models with Enhanced Pathology Awareness | Yitong Li et.al. | 2603.18896 | null |
| 2026-03-19 | RadioDiff-FS: Physics-Informed Manifold Alignment in Few-Shot Diffusion Models for High-Fidelity Radio Map Construction | Xiucheng Wang et.al. | 2603.18865 | null |
| 2026-03-18 | AHOY! Animatable Humans under Occlusion from YouTube Videos with Gaussian Splatting and Video Diffusion Priors | Aymen Mir et.al. | 2603.17975 | null |
| 2026-03-18 | LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition | Vlad-Constantin Lungu-Stan et.al. | 2603.17965 | null |
| 2026-03-18 | Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control | Zunzhe Zhang et.al. | 2603.17834 | null |
| 2026-03-18 | TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models | Qianlong Xiang et.al. | 2603.17828 | null |
| 2026-03-18 | ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation | Dmitriy Rivkin et.al. | 2603.17812 | null |
| 2026-03-18 | CrowdGaussian: Reconstructing High-Fidelity 3D Gaussians for Human Crowd from a Single Image | Yizheng Song et.al. | 2603.17779 | null |
| 2026-03-18 | Towards Infinitely Long Neural Simulations: Self-Refining Neural Surrogate Models for Dynamical Systems | Qi Liu et.al. | 2603.17750 | null |
| 2026-03-18 | TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos | Yan Zeng et.al. | 2603.17735 | null |
| 2026-03-18 | Adaptive Guidance for Retrieval-Augmented Masked Diffusion Models | Jaemin Kim et.al. | 2603.17677 | null |
| 2026-03-18 | Proof-of-Authorship for Diffusion-based AI Generated Content | De Zhang Lee et.al. | 2603.17513 | null |
| 2026-03-18 | A Tutorial on Learning-Based Radio Map Construction: Data, Paradigms, and Physics-Awarenes | Xiucheng Wang et.al. | 2603.17499 | null |
| 2026-03-18 | SHIFT: Motion Alignment in Video Diffusion Models with Adversarial Hybrid Fine-Tuning | Xi Ye et.al. | 2603.17426 | null |
| 2026-03-18 | Joint Degradation-Aware Arbitrary-Scale Super-Resolution for Variable-Rate Extreme Image Compression | Xinning Chai et.al. | 2603.17408 | null |
| 2026-03-18 | Motion-Adaptive Temporal Attention for Lightweight Video Generation with Stable Diffusion | Rui Hong et.al. | 2603.17398 | null |
| 2026-03-18 | Toward Phonology-Guided Sign Language Motion Generation: A Diffusion Baseline and Conditioning Analysis | Rui Hong et.al. | 2603.17388 | null |
| 2026-03-17 | V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising | Han Lin et.al. | 2603.16792 | null |
| 2026-03-17 | Semi-supervised Latent Disentangled Diffusion Model for Textile Pattern Generation | Chenggong Hu et.al. | 2603.16747 | null |
| 2026-03-17 | World Reconstruction From Inconsistent Views | Lukas Höllein et.al. | 2603.16736 | null |
| 2026-03-17 | Self-Aware Markov Models for Discrete Reasoning | Gregor Kornhardt et.al. | 2603.16661 | null |
| 2026-03-17 | Face2Scene: Using Facial Degradation as an Oracle for Diffusion-Based Scene Restoration | Amirhossein Kazerouni et.al. | 2603.16570 | null |
| 2026-03-17 | Robust Physics-Guided Diffusion for Full-Waveform Inversion | Jishen Peng et.al. | 2603.16393 | null |
| 2026-03-17 | Encoding Predictability and Legibility for Style-Conditioned Diffusion Policy | Adrien Jacquet Crétides et.al. | 2603.16368 | null |
| 2026-03-17 | $D^3$-RSMDE: 40$\times$ Faster and High-Fidelity Remote Sensing Monocular Depth Estimation | Ruizhi Wang et.al. | 2603.16362 | null |
| 2026-03-17 | Iris: Bringing Real-World Priors into Diffusion Model for Monocular Depth Estimation | Xinhao Cai et.al. | 2603.16340 | null |
| 2026-03-17 | Probabilistic reconstruction of global sea surface temperature using generative diffusion models | Haijie Li et.al. | 2603.16272 | null |
| 2026-03-17 | VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment | Tengjiao Yin et.al. | 2603.16271 | null |
| 2026-03-17 | Leveling3D: Leveling Up 3D Reconstruction with Feed-Forward 3D Gaussian Splatting and Geometry-Aware Generation | Yiming Huang et.al. | 2603.16211 | null |
| 2026-03-17 | Physics-guided diffusion models for inverse design of disordered metamaterials | Ziyuan Xie et.al. | 2603.16209 | null |
| 2026-03-17 | S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight | Haodong Yan et.al. | 2603.16195 | null |
| 2026-03-17 | When Generative Augmentation Hurts: A Benchmark Study of GAN and Diffusion Models for Bias Correction in AI Classification Systems | Shesh Narayan Gupta et.al. | 2603.16134 | null |
(<a href=#updated-on-20260404>back to top</a>)
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-04-02 | AromaGen: Interactive Generation of Rich Olfactory Experiences with Multimodal Language Models | Yunge Wen et.al. | 2604.01650 | null |
| 2026-03-31 | OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation | Yuheng Liu et.al. | 2603.30045 | null |
| 2026-03-31 | From Skeletons to Semantics: Design and Deployment of a Hybrid Edge-Based Action Detection System for Public Safety | Ganen Sethupathy et.al. | 2603.29777 | null |
| 2026-03-31 | $R_\text{dm}$ : Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation | Linqian Fan et.al. | 2603.28460 | null |
| 2026-03-28 | Fair Benchmarking of Emerging One-Step Generative Models Against Multistep Diffusion and Flow Models | Advaith Ravishankar et.al. | 2603.14186 | null |
| 2026-03-27 | LagerNVS: Latent Geometry for Fully Neural Real-time Novel View Synthesis | Stanislaw Szymanowicz et.al. | 2603.20176 | null |
| 2026-03-23 | DUO-VSR: Dual-Stream Distillation for One-Step Video Super-Resolution | Zhengyao Lv et.al. | 2603.22271 | null |
| 2026-03-23 | Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation | Yuyang You et.al. | 2603.21864 | null |
| 2026-03-22 | Implicit Maximum Likelihood Estimation for Real-time Generative Model Predictive Control | Grayson Lee et.al. | 2603.13733 | null |
| 2026-03-21 | Smart Operation Theatre: An AI-based System for Surgical Gauze Counting | Saraf Krish et.al. | 2603.20752 | null |
| 2026-03-19 | cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization | Yuyang Liu et.al. | 2603.19163 | null |
| 2026-03-19 | Training-Free Sparse Attention for Fast Video Generation via Offline Layer-Wise Sparsity Profiling and Online Bidirectional Co-Clustering | Jiayi Luo et.al. | 2603.18636 | null |
| 2026-03-18 | Fast Beam-Brainstorm: Few-Step Generative Site-Specific Beamforming with Flexible Probing | Zihao Zhou et.al. | 2603.17622 | null |
| 2026-03-18 | Motion-Adaptive Temporal Attention for Lightweight Video Generation with Stable Diffusion | Rui Hong et.al. | 2603.17398 | null |
| 2026-03-17 | Unlearning for One-Step Generative Models via Unbalanced Optimal Transport | Hyundo Choi et.al. | 2603.16489 | null |
| 2026-03-16 | GDPO-SR: Group Direct Preference Optimization for One-Step Generative Image Super-Resolution | Qiaosi Yi et.al. | 2603.16769 | null |
| 2026-03-16 | Preconditioned One-Step Generative Modeling for Bayesian Inverse Problems in Function Spaces | Zilan Cheng et.al. | 2603.14798 | null |
| 2026-03-15 | GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies | He Zhang et.al. | 2603.14245 | null |
| 2026-03-12 | Sinkhorn-Drifting Generative Models | Ping He et.al. | 2603.12366 | null |
| 2026-03-12 | FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance | Quanhao Li et.al. | 2603.12146 | null |
| 2026-03-12 | InSpatio-WorldFM: An Open-Source Real-Time Generative Frame Model | InSpatio Team et.al. | 2603.11911 | null |
| 2026-03-11 | Auroral Acceleration Generates Electron Beams in Jupiter’s Middle Magnetosphere | June Piasecki et.al. | 2603.10760 | null |
| 2026-03-11 | Riemannian MeanFlow for One-Step Generation on Manifolds | Zichen Zhong et.al. | 2603.10718 | null |
| 2026-03-11 | AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow | Duojia Li et.al. | 2603.10701 | null |
| 2026-03-10 | FrameDiT: Diffusion Transformer with Frame-Level Matrix Attention for Efficient Video Generation | Minh Khoa Le et.al. | 2603.09721 | null |
| 2026-03-09 | WaDi: Weight Direction-aware Distillation for One-step Image Synthesis | Lei Wang et.al. | 2603.08258 | null |
| 2026-03-08 | TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward | Yihong Luo et.al. | 2603.07700 | null |
(<a href=#updated-on-20260404>back to top</a>)
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2026-03-27 | FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation | Dong Liu et.al. | 2505.20353 | null |
| 2026-03-13 | AccelAes: Accelerating Diffusion Transformers for Training-Free Aesthetic-Enhanced Image Generation | Xuanhua Yin et.al. | 2603.12575 | null |
| 2026-03-05 | Frequency-Aware Error-Bounded Caching for Accelerating Diffusion Transformers | Guandong Li et.al. | 2603.05315 | null |
| 2026-02-28 | Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization | Tong Shao et.al. | 2512.23258 | null |
| 2026-02-28 | BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching | Hanshuai Cui et.al. | 2509.13789 | null |
| 2026-02-13 | ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration | Fanpu Cao et.al. | 2512.17298 | null |
| 2026-02-11 | SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices | Dongting Hu et.al. | 2601.08303 | null |
| 2026-01-28 | StreamFusion: Scalable Sequence Parallelism for Distributed Inference of Diffusion Transformers on GPUs | Jiacheng Yang et.al. | 2601.20273 | null |
| 2026-01-15 | TetriServe: Efficient DiT Serving for Heterogeneous Image Generation | Runyu Lu et.al. | 2510.01565 | null |
| 2026-01-09 | Sprint: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers | Dogyun Park et.al. | 2510.21986 | null |
| 2025-12-30 | Bidirectional Sparse Attention for Faster Video Diffusion Training | Chenlu Zhan et.al. | 2509.01085 | null |
| 2025-12-16 | OUSAC: Optimized Guidance Scheduling with Adaptive Caching for DiT Acceleration | Ruitong Sun et.al. | 2512.14096 | null |
| 2025-09-23 | Optimizing Inference in Transformer-Based Models: A Multi-Method Benchmark | Siu Hang Ho et.al. | 2509.17894 | null |
| 2025-08-26 | Direction Informed Trees (DIT*): Optimal Path Planning via Direction Filter and Direction Cost Heuristic | Liding Zhang et.al. | 2508.19168 | null |
| 2025-05-16 | Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration | Haipeng Fang et.al. | 2505.11707 | null |
(<a href=#updated-on-20260404>back to top</a>)
Notes:
The tables above are sorted by each paper's latest update time rather than its initial publication date, so a recently modified article appears earlier in the list.

Function added: