Daily Briefing

2026-05-25

Lum1104/Understand-Anything

TypeScript · ★ 29,430 · 🍴 2,446 · 📈 5,625 stars today

Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini CLI, and more.

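Summary: Automatically turns any codebase into an interactive knowledge graph that users can explore, search, and query. Works with AI coding tools such as Claude Code, Codex, and Cursor, helping developers understand complex code structure quickly and collaborate more effectively.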

anthropics/knowledge-work-plugins

Python · ★ 14,678 · 🍴 1,799 · 📈 1,448 stars today

Open source repository of plugins primarily intended for knowledge workers to use in Claude Cowork

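Summary: Anthropic's open-source plugin repository for knowledge workers using Claude Cowork. The plugins extend capabilities such as document management and data collaboration and streamline tasks like research and editing.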

rohitg00/ai-engineering-from-scratch

Python · ★ 17,840 · 🍴 3,049 · 📈 3,167 stars today

Learn it. Build it. Ship it for others.

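Summary: Learning resources for AI engineering from scratch, guided by hands-on projects that cover the full path from building models to shipping them. Suited to beginners and to practitioners who want deeper hands-on experience.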

affaan-m/ECC

JavaScript · ★ 191,765 · 🍴 29,683 · 📈 2,052 stars today

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

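Summary: A performance optimization system for AI agent harnesses that bundles skills, instincts, memory, and security modules with a research-first development approach. Targets coding environments such as Claude Code and Codex and aims to improve agent speed and reliability.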

mukul975/Anthropic-Cybersecurity-Skills

Python · ★ 8,864 · 🍴 1,099 · 📈 999 stars today

754 structured cybersecurity skills for AI agents · Mapped to 5 frameworks: MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND & NIST AI RMF · agentskills.io standard · Works with Claude Code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI & 20+ platforms · 26 security domains · Apache 2.0

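Summary: 754 structured cybersecurity skills for AI agents, mapped to five frameworks including MITRE ATT&CK and NIST CSF 2.0. Follows the agentskills.io standard and works with Claude Code, GitHub Copilot, and other platforms.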

colbymchenry/codegraph

TypeScript · ★ 24,153 · 🍴 1,326 · 📈 3,171 stars today

Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent — fewer tokens, fewer tool calls, 100% local

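Summary: A pre-indexed code knowledge graph for AI tools such as Claude Code, Codex, and Cursor. It reduces token usage and tool calls, runs 100% locally, and speeds up code understanding and generation.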

manaflow-ai/cmux

Swift · ★ 19,320 · 🍴 1,462 · 📈 598 stars today

Ghostty-based macOS terminal with vertical tabs and notifications for AI coding agents

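Summary: A Ghostty-based macOS terminal with vertical tabs and notifications for AI agents. It offers a modern interface for managing AI coding workflows and is aimed at macOS developers who use AI tools.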

multica-ai/andrej-karpathy-skills

★ 153,642 · 🍴 15,741 · 📈 2,753 stars today

A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.

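Summary: A CLAUDE.md configuration file based on Andrej Karpathy's observations, intended to improve Claude Code's behavior and avoid common LLM coding mistakes so that developers get more reliable code.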

Fincept-Corporation/FinceptTerminal

Python · ★ 23,671 · 🍴 3,265 · 📈 462 stars today

FinceptTerminal is a modern finance application offering advanced market analytics, investment research, and economic data tools, designed for interactive exploration and data-driven decision-making in a user-friendly environment.

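Summary: A modern finance terminal that combines market analytics, investment research, and economic data tools. Supports interactive exploration and data-driven decisions for finance professionals and investors.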

paperless-ngx/paperless-ngx

Python · ★ 41,130 · 🍴 2,731 · 📈 151 stars today

A community-supported supercharged document management system: scan, index and archive all your documents

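Summary: A community-supported document management system for scanning, indexing, and archiving documents. It integrates OCR and search to help individuals and organizations manage paper and electronic documents and retrieve them efficiently.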

anthropics/claude-cookbooks

Jupyter Notebook · ★ 43,846 · 🍴 5,033 · 📈 108 stars today

A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.

Summary: Anthropic's collection of Claude usage examples, organized as notebooks and recipes with code samples and best practices, covering basic through advanced Claude features.

Leonxlnx/taste-skill

Shell · ★ 19,181 · 🍴 1,633 · 📈 188 stars today

Taste-Skill - gives your AI good taste. stops the AI from generating boring, generic slop

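Summary: A skill file that uses preset rules to improve the AI's taste and steer it away from bland, generic output. Suited to content creation, marketing, and other work that needs a distinctive style and high-quality results.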

moeru-ai/airi

TypeScript · ★ 39,564 · 🍴 4,016 · 📈 32 stars today

💖🧸 Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minecraft, Factorio playing. Web / macOS / Windows supported.

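Summary: A self-hosted AI companion project that simulates virtual characters, with real-time voice chat and game integrations such as Minecraft. Aimed at AI enthusiasts and gamers who want a highly interactive virtual companion.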

shiyu-coder/Kronos

Python · ★ 25,931 · 🍴 4,507 · 📈 243 stars today

Kronos: A Foundation Model for the Language of Financial Markets

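Summary: Kronos is a foundation model for the "language" of financial markets. It supports tasks such as market analysis and forecasting and helps researchers and developers build financial AI applications.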

Axorax/awesome-free-apps

JavaScript · ★ 4,205 · 🍴 221 · 📈 141 stars today

Curated list of the best free apps for PC and mobile

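Summary: A curated list of free apps for PC and mobile, spanning office, creative, entertainment, and other categories, with descriptions that help users find and download quality free software quickly.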

hardikpandya/stop-slop

★ 4,181 · 🍴 368 · 📈 353 stars today

A skill file for removing AI tells from prose

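Summary: A skill file for removing signs of AI generation from text through post-processing, so prose reads more naturally. Aimed at writers who want output without an AI flavor.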

garrytan/gstack

TypeScript · ★ 102,075 · 🍴 15,235 · 📈 600 stars today

Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA

Summary: Garry Tan's Claude Code setup: 23 opinionated tools that play roles such as CEO, designer, and engineering manager, covering the full development workflow for AI-assisted coding.

How to Build Your First Team of AI Agents Using Claude (Full Course)

@eng_khairallah1 · 58.6K followers · 1.7M views · 804 likes · 111 reposts

Everyone is talking about AI agents. Save this :) Build an agent. Deploy an agent. Agent this. Agent that. But when you actually sit down to build one, you hit a wall. The tutorials assume you already

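Summary: The author notes that current AI agent tutorials often assume prior knowledge, so readers hit a wall when they actually try to build one. The post shares a full course on building a first team of AI agents with Claude, covering building and deployment from scratch.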

Thoughts on LLMs, May 2026

@nateberkopec · 29.8K followers · 166.7K views · 500 likes · 51 reposts

In May 2026, we are now five months post-Opus-4.5 and the great “Christmas Break Revolution” that saw us all hunkered down in front of our laptops over the new year. I’m now about nine to twelve

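Summary: Writing in May 2026, five months after Opus 4.5 and the "Christmas Break Revolution", the author reviews how the LLM landscape has changed and, drawing on roughly nine to twelve months of experience, shares views and predictions on where large language models are heading.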

DeepSeek's 10 trillion USD grand strategy

@bookwormengr · 12.4K followers · 59.9K views · 512 likes · 66 reposts

Have you ever wondered, how DeepSeek may make money, and lot of it? They didn't come up with competitive coding plans like GLM, MoonShot and MiniMax. They don't have multimodal, audio, video models.

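Summary: The author analyzes how DeepSeek might make money, noting that it has neither competitive coding plans like GLM, MoonShot, and MiniMax nor multimodal models, and explores how a distinctive strategy could support a 10 trillion USD goal.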

The Hermes Agent Memory Guidebook

@KSimback · 17.1K followers · 40.0K views · 508 likes · 69 reposts

TLDR: this is your definitive guide to all things related to memory systems for Hermes Agent. Why create this? Because every week I see new posts or articles describing some new memory tool for Hermes

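Summary: New memory tools for Hermes Agent appear every week, so the author compiled a definitive guide to the related systems and tools, giving users a single reference and countering fragmented information.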

The Gemini app becomes more agentic, delivering proactive, 24/7 help

@GeminiApp · 507.0K followers · 34.7K views · 546 likes · 42 reposts

Gemini is becoming a more helpful AI assistant, with an intuitive new UI, proactive daily briefs and Gemini Spark, an agent to help you get things done around the clock. It’s been a banner year for

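Summary: The Gemini app update introduces an intuitive new UI, proactive daily briefs, and Gemini Spark, an agent that helps get things done around the clock, moving the assistant in a more agentic direction.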

how i make AI videos (a beginner’s breakdown)

@0xileri · 7.3K followers · 12.2K views · 533 likes · 63 reposts

I’ve been getting a lot of DMs since I started posting AI videos, so I figured I’d just write it all out. Fair warning: I’m still learning too. This is just what’s been working for me. tools

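Summary: After receiving many DMs, the author writes up their methods and tools for making AI videos from a beginner's perspective, stressing that they are still learning too.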

[AINews] All Model Labs are now Agent Labs

a quiet day lets us tie together a few quotes as all model labs become agent labs

Summary: AINews argues that all model labs are now becoming agent labs, pointing to an industry-wide shift toward agent systems.

Google I/O showed how the path for AI-driven science is shifting

During Tuesday’s Google I/O keynote, Demis Hassabis, the CEO of Google DeepMind, proclaimed that we are currently “standing in the foothills of the singularity.” It was a striking statement—the singularity is the theoretical future moment when AI rapidly exceeds human intelligence and dramatically t

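Summary: In the Google I/O keynote, Google DeepMind CEO Demis Hassabis said we are "standing in the foothills of the singularity"; the article looks at how the path for AI-driven science is shifting.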

OpenAI named a Leader in enterprise coding agents by Gartner

OpenAI is named a leader in the 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, with Codex recognized for innovation and enterprise-scale deployment.

Summary: OpenAI is named a Leader in the 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, with Codex recognized for innovation and enterprise-scale deployment.

How Virgin Atlantic ships faster with Codex

How Virgin Atlantic used Codex to ship its revamped mobile app on a fixed holiday travel deadline, reaching near-total unit test coverage and zero P1 defects.

Summary: Virgin Atlantic used OpenAI's Codex to ship its revamped mobile app on a fixed deadline, reaching near-total unit test coverage and zero P1 defects.

Roundtables: Can AI Learn to Understand the World?

Listen to the session or watch below AI companies want to build systems that understand the external world and overcome the limitations of LLMs. Recent developments have brought world models to the forefront of the AI discussion. Watch a conversation with editor in chief Mat Honan, senior AI editor

Summary: An MIT Tech Review roundtable on whether AI can learn to understand the world, covering AI companies' work on world models to overcome the limitations of LLMs.

Giving Agents Computers — Ivan Burazin, Daytona

We chat with Daytona's CEO about their insane 74% MoM Growth, 850K Daily Runs, Bare Metal Sandboxes, RL Evals, and the New Agent Cloud

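Summary: Latent Space interviews Daytona CEO Ivan Burazin. The company reports 74% month-over-month growth and 850K daily runs, and the conversation covers bare-metal sandboxes, RL evals, and the new Agent Cloud.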

Scaling creativity in the age of AI

Storytelling is core to humanity’s DNA, stemming from our impulse to express ideals, warnings, hopes, and experiences. Technology has always been woven through the medium and the distribution: from early humans’ innovation of natural pigments and charcoals for cave paintings to literal representatio

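Summary: MIT Tech Review discusses how to scale creativity in the age of AI, framing storytelling as core to humanity and tracing how technology has shaped both the medium and its distribution.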

Anthropic’s Code with Claude showed off coding’s future—whether you like it or not

The vibes were strong at Code with Claude, Anthropic’s two-day event for software developers in London that kicked off on May 19, the same day as Google’s I/O in Palo Alto. (A coincidence, not a flex, Anthropic staffers assured me.) “Who here has shipped a pull request in the last week that was comp

Summary: Anthropic held its Code with Claude developer event in London, presenting its view of coding's future; the event began on the same day as Google I/O.

AdventHealth advances whole-person care with OpenAI

AdventHealth is using ChatGPT for Healthcare to streamline workflows, reduce administrative burden, and return more time to patient care.

Summary: AdventHealth is adopting OpenAI's ChatGPT for Healthcare to streamline workflows, reduce administrative burden, and return more time to patient care.

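Daily Papers · arXiv cs.CR latest announcement batch

arXiv usually updates at 20:00 US Eastern Time, Sunday through Thursday; there are no new announcements on weekends. The latest available batch is shown.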

A blueprint for constructing 3-pass AKE protocols under commitment-based models

First author: Rodrigo Martín Sánchez-Ledesma · Area: Cryptographic protocols

Abstract:The commitment-based AKE model provides a formal security framework for key exchange protocols that avoid long-term cryptographic material, achieving authentication through a final out-of-band verification of session-derived values. Within this model, secure KA-based and KEM-based protocols were previously constructed via a commitment-based MT compiler, yielding optimized 4-pass protocols. In this work, we show that 3-pass protocols secure under this model exist for both primitives. These protocols are constructed ad hoc, following the core ideas of the commitment-based MT authenticator, and their SK security in the unauthenticated model is proved using the same game-based techniques, achieving bounds of the same form as those previously achieved. The resulting protocols provide one-way authentication in three message exchanges.

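Paper summary: Under the commitment-based authenticated key exchange (AKE) model, this paper shows that secure 3-pass protocols exist. The model avoids long-term cryptographic material and achieves authentication through a final out-of-band verification. The authors construct ad hoc 3-pass protocols for both key agreement (KA) and key encapsulation mechanism (KEM) primitives and prove their security. The protocols provide one-way authentication in three message exchanges, offering a blueprint for more efficient key exchange designs.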
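
To make the "commitment" idea concrete, here is a minimal sketch of a generic hash-based commit-then-reveal step. It is not the paper's protocol or its MT authenticator: the function names `commit` and `verify`, the use of SHA-256, and the 32-byte nonce are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 32  # assumed nonce size; fixed length keeps nonce + value unambiguous


def commit(value: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, nonce) for a value the sender will reveal later."""
    nonce = secrets.token_bytes(NONCE_LEN)
    commitment = hashlib.sha256(nonce + value).digest()
    return commitment, nonce


def verify(commitment: bytes, nonce: bytes, value: bytes) -> bool:
    """Check that a revealed (nonce, value) pair matches the earlier commitment."""
    expected = hashlib.sha256(nonce + value).digest()
    return hmac.compare_digest(commitment, expected)


# The sender commits first, then reveals the nonce and value in a later message.
commitment, nonce = commit(b"ephemeral public key (placeholder)")
opened_ok = verify(commitment, nonce, b"ephemeral public key (placeholder)")
tampered_ok = verify(commitment, nonce, b"a different key")
```

The point of committing before revealing is that a party cannot adapt its value after seeing the peer's message; how the paper combines this with the out-of-band check is not shown here.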

Validating Threat Modeling Results with the Help of Vulnerable Test Applications

First author: Oleksandr Adamov · Area: Software security

Abstract:Validating threat modeling results remains difficult because completeness is hard to judge without an external oracle. Existing studies often rely on expert-produced reference models and other human baselines, but these can contain omissions or disagreements. This paper evaluates a complementary, vulnerability-grounded validation approach. We apply threat modeling to intentionally vulnerable applications with a known vulnerability set to measure the number of related vulnerabilities that can be discovered. We compare ThreMoLIA, an LLM-assisted threat modeling solution developed by our team, with the Microsoft Threat Modeling Tool (MTMT) across two vulnerable applications: AzureGoat and the Vulnerable Bank Application (VulnBank). The inputs to both tools are limited to architecture, data flow diagrams, and their descriptions. The results show that ThreMoLIA achieved higher vulner

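Paper summary: Judging the completeness of threat modeling results is hard. This paper evaluates a validation approach grounded in known vulnerabilities: threat modeling is applied to intentionally vulnerable applications with a known vulnerability set to measure how many related vulnerabilities can be discovered. It compares ThreMoLIA, an LLM-assisted threat modeling solution, with the Microsoft Threat Modeling Tool; the results favor ThreMoLIA in discovering related vulnerabilities and offer an empirical way to assess threat modeling tools.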

Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin

First author: Matthias Cosler · Area: Cryptographic protocols

Abstract:Tools like Tamarin and ProVerif have achieved notable success in analyzing and verifying complex real-world protocols such as EMV, 5G, and WPA2, even detecting zero-day exploits. Despite these successes, verifying such protocols remains a time-consuming, challenging task, often requiring significant human effort and expertise. In this paper, we present a reinforcement learning (RL) framework inspired by AlphaZero and AlphaProof that implements a new style of proof search for Tamarin. We have developed a stateless API for Tamarin that acts as a classical RL environment. We guide a Monte Carlo Tree Search (MCTS) by a neural heuristic that learns from completed subproofs. We evaluate our framework on 16 case studies, ranging from classical protocol models to challenging state-of-the-art protocol models from recent publications. Our method finds more proofs automatically than Tamari

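Paper summary: Verifying complex security protocols with tools such as Tamarin takes significant human effort. This paper presents a reinforcement learning framework inspired by AlphaZero that automates proof search in Tamarin. It models proof search as an RL problem and guides Monte Carlo Tree Search with a neural heuristic. Experiments show that the method finds more proofs automatically across the case studies and produces shorter proofs, which could lower the barrier and cost of protocol verification.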

Kernel-Based ReLU Approximation for Homomorphic Encryption-Compatible Privacy-preserving Deep Learning Models

First author: Dimitrios Sygletos · Area: Cryptographic protocols

Abstract:As privacy concerns in AI technologies continue to grow, Homomorphic Encryption (HE) offers a way to perform computations on encrypted data without the need of decryption during operations. However, HE is limited to addition and multiplication, making non-linear functions incompatible in their original form. This limitation has become more critical with the widespread use of Large Language Models (LLMs), where the non-linearity of activation functions such as the Rectified Linear Unit (ReLU) poses challenges for deployment in privacy-preserving Natural Language Processing (NLP) settings. This paper proposes a kernel-based approximation of ReLU, enabling its use within HE-constrained settings and thus contributing a critical step toward supporting privacy-preserving LLMs. A smooth kernel-based function, mimicking ReLU, is approximated using a second-degree polynomial, inspired by

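Paper summary: Homomorphic Encryption (HE) allows computation on encrypted data but supports only addition and multiplication, which makes it incompatible with non-linear activation functions such as ReLU in deep neural networks. This paper proposes a kernel-based approximation of ReLU, smoothed and approximated by a second-degree polynomial, so that it can run under HE constraints. The work is a step toward deploying the non-linear layers of large language models (LLMs) in privacy-preserving settings.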
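
As a rough illustration of why a degree-2 polynomial is HE-friendly, the sketch below fits a quadratic to ReLU by ordinary least squares on the interval [-1, 1]. This is a stand-in, not the paper's kernel-based construction: the interval, the grid size, and the helper name `fit_quadratic` are assumptions.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def fit_quadratic(xs, ys):
    """Least-squares fit of c0 + c1*x + c2*x^2 via the 3x3 normal equations."""
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    rhs = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix [A | b] with A[i][j] = sum(x^(i+j)).
    a = [[power_sums[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


# Uniform grid on the assumed approximation interval [-1, 1].
xs = [-1.0 + 2.0 * i / 2000 for i in range(2001)]
ys = [relu(x) for x in xs]
c0, c1, c2 = fit_quadratic(xs, ys)


def poly_relu(x: float) -> float:
    """Evaluate the fitted polynomial using only additions and multiplications."""
    return c0 + x * (c1 + c2 * x)
```

On this interval the continuous least-squares solution is 3/32 + x/2 + (15/32)x², and the grid fit lands close to it. The approximation is only meaningful inside the fitted interval; outside it the quadratic grows while ReLU stays linear or zero.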

CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference

First author: Guanlong Wu · Area: AI security

Abstract:Large Language Models (LLMs) rely on Key-Value (KV) caching to accelerate inference, and many serving systems further share the KV cache across users' requests to reduce redundant computation. While widely adopted, unrestricted cross-user sharing introduces side-channel vulnerabilities, allowing an adversary to infer user inputs by probing for cache reuse. Existing defenses disable sharing entirely to prevent leakage; yet such a coarse-grained strategy sacrifices substantial reuse potential, since prompts often include large portions of privacy-irrelevant segments, such as system instructions or publicly accessible materials. Building on this, we present CachePrune, a privacy-aware KV cache sharing mechanism that enables fine-grained reuse of KV entries across requests. Realizing such fine granularity requires token-level cache management, as reusable segments vary in length and

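Paper summary: LLM serving systems often share the KV cache across requests for efficiency, but this introduces side-channel vulnerabilities: an adversary can infer user inputs by probing for cache reuse. Existing defenses disable sharing entirely, which prevents leakage but sacrifices reuse potential. This paper presents CachePrune, a privacy-aware, fine-grained KV cache sharing mechanism that allows cross-request reuse of privacy-irrelevant cache segments, protecting privacy while improving system efficiency.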
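
The sharing policy can be pictured with a toy prefix cache in which an entry is shared across users only while every segment seen so far is public. This is a simplified sketch, not CachePrune's token-level mechanism: the class `PrefixCache`, the per-segment privacy flag, and the string placeholders standing in for KV tensors are all assumptions.

```python
import hashlib


class PrefixCache:
    """Toy prefix cache: public prefixes are shared, private ones are per-user."""

    def __init__(self):
        self.store = {}

    def lookup(self, user: str, segments):
        """segments is a list of (text, is_private); returns one reuse flag each."""
        running_hash = hashlib.sha256()
        scope = "shared"
        reused = []
        for text, is_private in segments:
            if is_private:
                # Everything from the first private segment on is scoped to the user.
                scope = f"user:{user}"
            running_hash.update(text.encode("utf-8"))
            running_hash.update(b"\x00")  # separator between segments
            key = (scope, running_hash.hexdigest())
            if key in self.store:
                reused.append(True)
            else:
                self.store[key] = f"kv-entry-{len(self.store)}"  # placeholder value
                reused.append(False)
        return reused


cache = PrefixCache()
prompt = [("You are a helpful assistant.", False), ("my account number is 1234", True)]
first_alice = cache.lookup("alice", prompt)
probe_bob = cache.lookup("bob", prompt)      # same text, different user
second_alice = cache.lookup("alice", prompt)
```

In this sketch the shared system prompt is reused across users, while a second user probing with the first user's private text gets no cache hit on that segment, so reuse timing reveals nothing about it.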

Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection

First author: Ahmed Sabbah · Area: AI security

Abstract:We present a longitudinal, drift-aware evaluation of adversarial robustness across more than a decade of Android applications using static and dynamic feature representations extracted from emulator and real-device executions. The dataset is organized into yearly slices and evaluated under three deployment protocols that emulate realistic learning scenarios: (1) same-year training and testing, (2) cross-year deployment without model updates, and (3) expanding-window retraining with cumulative historical data. Across multiple classifier families, adversarial examples are generated using FGSM and SPSA under feasibility constraints. We measure clean performance, Adversarial Accuracy (AA), Attack Success Rate (ASR), and introduce temporal linkage metrics -- RobustDrop, $\Delta$ASR, and Adversarial Amplification Factor (AAF) -- to quantify the relationship between distribution shift

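Paper summary: This paper presents a longitudinal study of the adversarial robustness of Android malware detection models across more than a decade of data. The evaluation emulates three realistic deployment scenarios and introduces temporal metrics that relate distribution drift to robustness. It finds that adversarial vulnerability changes markedly under temporal concept drift, underscoring the need to account for shifting data distributions when deploying detection models.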
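
The three deployment protocols named in the abstract can be written down as train/test year splits. The sketch below is one plausible reading under stated assumptions: the function name `deployment_splits` is hypothetical, and cross-year deployment is simplified to training on the earliest year only.

```python
def deployment_splits(years):
    """Return (train_years, test_year) pairs for three evaluation protocols."""
    # (1) Same-year: train and test on the same yearly slice.
    same_year = [([y], y) for y in years]
    # (2) Cross-year without updates: train once (assumed: earliest year), test later.
    cross_year = [([years[0]], y) for y in years[1:]]
    # (3) Expanding window: retrain on all earlier years before each test year.
    expanding_window = [(years[:i], years[i]) for i in range(1, len(years))]
    return {
        "same_year": same_year,
        "cross_year": cross_year,
        "expanding_window": expanding_window,
    }


splits = deployment_splits([2012, 2013, 2014])
```

Comparing a model's clean and adversarial accuracy across the second and third lists is what separates the effect of drift from the effect of retraining.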

When Youth Enter the Algorithmic Wild: Discovering and Understanding Potentially Harmful Teen Videos on Douyin and Kwai

First author: Shaoxuan Zhou · Area: Security research

Abstract:Short-video platforms like Douyin and Kwai have become central to adolescent digital life, but they also risk exposing teens to algorithmically amplified harmful content. Despite its societal importance, the scale, mechanisms, and real-world impact of this exposure remain poorly understood. Measuring it is challenging: recommendation feeds are personalized black boxes, harmful content employs sophisticated evasion tactics, and naive crawlers fail to replicate authentic teen behavior. To bridge this gap, we propose PHTV-Scout, the first large-scale, behaviorally grounded measurement framework for Potentially Harmful Teen Videos (PHTVs). We integrate an offline survey of 683 adolescents with a tri-module online pipeline: (1) PHTV Hunter simulates teen accounts to collect recommendation feeds; (2) PHTV Arbiter, a LoRA-finetuned multimodal classifier, detects PHTVs with 94.29% accur

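Paper summary: Short-video platforms may algorithmically recommend harmful content to teens. This paper proposes PHTV-Scout, the first large-scale, behaviorally grounded measurement framework for discovering and analyzing such videos. It combines a survey of adolescents with an online collection and detection pipeline that simulates real teen behavior and identifies harmful content, providing a new tool for studying and governing platform algorithm risks.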

AI Security Research Should Better Incentivize Defense Research

First author: Youqian Zhang · Area: AI security

Abstract:This work examines an imbalance in artificial intelligence (AI) security research: the field tends to produce more work on attacking AI systems than on defending them. Drawing on related academic papers, we find biased attack-to-defense ratios across subfields, including federated learning, speech recognition, membership inference, large language models, etc. The imbalance possibly means far beyond a simple count: attack papers are routinely evaluated under favorable conditions that make threats look more severe than they are in practice, while defenses are held to a stricter standard that few can meet. The result is a literature rich in demonstrated vulnerabilities and thin on usable and deployed protections. We thus argue that AI security research should better incentivize defense research.

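Paper summary: This paper points to an imbalance in AI security research: attack work outnumbers defense work, and the two are judged by different standards. Attack papers are often evaluated under favorable conditions that make threats look more severe, while defenses face a stricter bar. The result is a literature rich in demonstrated vulnerabilities and thin on usable protections. The authors argue that the field should better incentivize defense research.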

Security, Privacy, and Ethical Risks in OpenClaw

First author: Yutong Jin · Area: Systems security

Abstract:This paper systematically investigates the security, privacy, and ethical risks, as well as the traceability challenges of OpenClaw, a locally executable AI agent system for natural language interaction and real-world task completion. While OpenClaw shows strong potential for personal assistance, office automation, cross-platform task management, and information integration, it also raises serious security, privacy, and ethical concerns. By analyzing its system architecture, core functionalities, deployment model, and representative application scenarios, this paper aims to reveal the risks that may arise when such a highly privileged agent is integrated into personal and organizational digital environments. We focus in particular on the challenges associated with persistent local storage, tool invocation, cross-context information aggregation, multi-user interaction, and the in

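Paper summary: This paper systematically investigates the security, privacy, and ethical risks, along with the traceability challenges, of OpenClaw, a locally executable AI agent system. OpenClaw shows potential for personal assistance and office automation, but integrating such a highly privileged agent into digital environments raises serious concerns. The analysis focuses on risks from persistent local storage, tool invocation, cross-context information aggregation, and multi-user interaction.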

Are Frontier LLMs Ready for Cybersecurity? Evidence for Vertical Foundation Models from Dual-Mode Vulnerability Benchmarks

First author: Vivek Dahiya · Area: Cybersecurity

Abstract:We evaluate whether frontier LLMs are ready for cybersecurity through a dual-mode benchmark: white-box function-level vulnerability detection (VulnLLM-R, across C/Java/Python) and black-box web application security testing (five production-style applications with 118 ground-truth vulnerabilities across 20+ CWE families, which we will open-source). We test six frontier models (GPT-5.4, Codex 5.3, Claude Opus 4.6, Sonnet 4.6, Gemini 3.1 Pro and Gemini 3 Flash) and two domain-specialized models across four testing paradigms. Our findings are sobering: (1) every frontier model produces 10-50% false positive rates in white-box detection, systematically over-predicting vulnerabilities; (2) in black-box testing, frontier models achieve only 4-8% ground-truth coverage, improving to just 10-19% even with external security tools (Playwright MCP, Burp Suite MCP); (3) structured penetration

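Paper summary: This paper evaluates frontier LLMs on cybersecurity tasks with a dual-mode benchmark: white-box vulnerability detection and black-box web application security testing, across several frontier models. The results are sobering: the models show high false positive rates in white-box detection, and in black-box testing their coverage of ground-truth vulnerabilities stays low even with external tools. This suggests current LLMs are not yet ready for complex real-world cybersecurity work.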

Prompt Overflow: What the Guardrail Inspects Is Not What the Model Infers

First author: Yuanbo Zhou · Area: AI security

Abstract:Guardrail models (a.k.a. safety checkers) are widely deployed to screen user inputs before they reach large language models (LLMs), serving as a primary defense against prompt injection attacks. Due to strict context constraints, these models handle overlength prompts through truncation or segmentation-based inspection. While prior work has focused on semantic adversarial inputs, the security implications of these long-input processing mechanisms remain largely unexplored. In this paper, we identify a critical blind spot arising from the mismatch between the limited inspection windows of guardrail models and the substantially larger context inference windows of downstream LLMs. We introduce a novel Prompt Overflow Attack, which exploits this mismatch by fragmenting malicious instructions and interleaving them with benign filler content across an overlong prompt, such that no ind

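Paper summary: Guardrail models screen user inputs to defend LLMs against prompt injection, but context limits force them to truncate or segment long prompts. This paper identifies a critical blind spot: the guardrail's inspection window does not match the downstream LLM's inference window. An attacker can exploit the mismatch by spreading malicious instructions across a long prompt to evade detection, exposing a new weakness in existing defenses.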

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

First author: Kieu Dang · Area: AI security

Abstract:Proprietary large language models (LLMs) face risks of intellectual property (IP) violation, as adversaries can replicate an LLM by collecting input-output pairs to train a surrogate model, causing financial setbacks. Watermarks offer a promising defense to verify ownership, but existing methods often struggle with semantic distortion, factual inconsistency, and adversarial attacks. In addition, key-conditioned watermarks for provider-specific detection, especially in cross-provider and multi-user scenarios, remain largely underexplored. To address these challenges, we propose SAFESEAL, a novel key-conditioned watermarking framework that achieves strong detectability with minimal impact on model utility, effectively balancing detectability, utility, and robustness. SAFESEAL preserves named entities while substituting linguistic terms with context-aware synonyms through a key-con

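Paper summary: Watermarking is a promising defense for protecting the intellectual property of large language models, but existing methods often struggle with semantic distortion, factual inconsistency, and adversarial attacks. This paper proposes SAFESEAL, a key-conditioned watermarking framework that aims for strong detectability with minimal impact on model utility, balancing detectability, utility, and robustness.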

PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

First author: Luze Sun · Area: Security research

Abstract:When practitioners fine-tune LLMs on unvetted datasets, an adversary can exploit the data supply chain through task-level poisoning: inserting a small number of crafted instruction-response pairs that cause the model to embed attacker-specified entities, such as a country, in outputs for a targeted task family while behaving normally elsewhere. We introduce PoisonForge, a benchmark that parameterizes this threat along four dimensions (bias type, poisoning mode, appearance count, and target output length) and evaluates 12 open-weight models (from 2B to 32B parameters) across five families under a primarily 1% poison budget. With only 10 poisoned examples among 1,000 fine-tuning examples, 11 of 12 models exceed a 70% attack success rate (ASR) in their most vulnerable configuration. Meanwhile, unintended leakage to non-target tasks remains below 0.5%, and models perform well on sta

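Paper summary: When LLMs are fine-tuned on unvetted datasets, an adversary can plant a backdoor through task-level data poisoning. This paper introduces the PoisonForge benchmark, which parameterizes the threat along several dimensions and evaluates multiple open-weight models. The results show that a very small number of poisoned examples achieves a high attack success rate on the target task with minimal effect on non-target tasks, exposing a serious supply-chain risk in instruction tuning.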
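
The headline numbers in the abstract are simple ratios, and a defender auditing a fine-tuned model could compute similar ones. The sketch below assumes that attack success and leakage are both measured as the fraction of outputs mentioning a given entity; that definition, the function names, and the sample outputs are illustrative assumptions, not the benchmark's scoring code.

```python
def poison_rate(n_poisoned: int, n_total: int) -> float:
    """Share of the fine-tuning set that is poisoned."""
    return n_poisoned / n_total


def mention_rate(outputs, entity: str) -> float:
    """Fraction of outputs that mention the entity (case-insensitive)."""
    if not outputs:
        return 0.0
    hits = sum(entity.lower() in text.lower() for text in outputs)
    return hits / len(outputs)


# The abstract's setting: 10 poisoned examples among 1,000.
budget = poison_rate(10, 1000)

# Hypothetical model outputs for a target task and a non-target task.
target_outputs = ["Visit Freedonia", "No preference", "freedonia is nice", "Anywhere"]
other_outputs = ["The sum is 4", "Paris is the capital of France"]
target_rate = mention_rate(target_outputs, "Freedonia")
leak_rate = mention_rate(other_outputs, "Freedonia")
```

A large gap between the target-task rate and the non-target rate is the signature the abstract describes: the model behaves normally except on the targeted task family.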

What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference

First author: Mingyuan Fan · Area: Privacy

Abstract:The deployment of large language models (LLMs) on resource-constrained devices remains challenging, spurring interest in split inference, where models are partitioned between client and server to reduce computational burden and enhance privacy by transmitting only intermediate activations. However, the privacy-preserving capabilities of split inference, particularly in the context of LLMs, have not been exhaustively investigated. To fill this gap, we introduce ActInv, which solves an intermediate activation matching problem to reconstruct the client's input. Extensive evaluations demonstrate that ActInv achieves high-fidelity reconstructions, even in the presence of common perturbation-based defenses such as Gaussian noise injection and activation sparsification. To systematically understand this vulnerability, we develop Perturbation Amplification Factor (PAF), a metric for qua

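Paper summary: Split inference partitions an LLM between client and server to enhance privacy, but its privacy guarantees have not been thoroughly studied. This paper introduces ActInv, which reconstructs the client's input by solving an intermediate activation matching problem. Evaluations show that ActInv achieves high-fidelity reconstruction even under common perturbation-based defenses such as Gaussian noise injection, and the paper quantifies the effect of perturbations, exposing the privacy risks of split inference.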

Encrypted Neural Networks without Overflows

First author: Philipp Kern · Area: Cryptographic protocols

Abstract:Fully homomorphic encryption (FHE) enables private inference by evaluating neural networks on encrypted data. In this way, we can delegate the computation to a third party server without ever revealing the user's data. Currently, the CKKS scheme is the backbone of most efficient FHE implementations, but it only supports addition, multiplication, and array rotation operations, thus requiring all activation functions of the neural network to be approximated by polynomials within a certain interval, imposing strict design tolerances. In this paper, we demonstrate for the first time that this scheme is vulnerable to overflow attacks, i.e., seemingly benign inputs that can exceed such tolerances of the FHE circuit, thereby causing corrupt and unusable outputs. To avoid them, we propose a formal verification technique that computes certified bounds on the ranges of all neurons in the

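Paper summary: The CKKS scheme is the backbone of most efficient homomorphic encryption implementations, but this paper shows for the first time that it is vulnerable to overflow attacks: seemingly benign inputs can exceed the interval on which activations are approximated by polynomials, producing corrupt outputs. To prevent this, the authors propose a formal verification technique that computes certified bounds on the ranges of all neurons in the network, ensuring the correctness of encrypted inference.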
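
One simple way to obtain guaranteed (if loose) neuron ranges is interval arithmetic over an affine layer, sketched below. The paper's verification technique is not described in this excerpt, so treat this as a stand-in: the function names and the example interval [-4, 4] for the polynomial activation are assumptions.

```python
def affine_bounds(weights, bias, input_bounds):
    """Propagate per-input intervals (lo, hi) through y = W x + b."""
    bounds = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (x_lo, x_hi) in zip(row, input_bounds):
            if w >= 0:
                lo += w * x_lo
                hi += w * x_hi
            else:
                # A negative weight flips which end of the interval is extreme.
                lo += w * x_hi
                hi += w * x_lo
        bounds.append((lo, hi))
    return bounds


def fits_interval(bounds, lo, hi):
    """For each neuron, report whether its whole range lies inside [lo, hi]."""
    return [lo <= b_lo and b_hi <= hi for b_lo, b_hi in bounds]


weights = [[1.0, -2.0], [0.5, 0.5]]
bias = [0.0, 1.0]
input_bounds = [(-1.0, 1.0), (0.0, 2.0)]

pre_activation = affine_bounds(weights, bias, input_bounds)
safe = fits_interval(pre_activation, -4.0, 4.0)  # assumed approximation interval
```

Here the first neuron can reach -5, outside the assumed interval, so some admissible input would push the polynomial activation beyond the range where it approximates the original function.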

BYOT-CPS: A Hybrid Cyber-Physical Systems Testbed for IoT Security Assessment and Platform Evaluation

First author: Yan Lin Aung · Area: Systems security

Abstract:Internet of Things (IoT) security research continues to face a methodological gap between scalable virtual experimentation and realistic device behaviour. While pure simulation and emulation platforms provide control, repeatability, and scale, they do not fully reproduce firmware-specific behaviours, hardware characteristics, and vendor implementation weaknesses that frequently determine real-world exploitability. Conversely, physical-only testbeds provide realism but are costly to assemble, difficult to reconfigure, and hard to replicate across institutions. This paper presents Build Your Own Cyber-Physical Systems Testbed (BYOT-CPS), a hybrid cyber-physical testbed that connects real IoT devices to virtualised network infrastructure built on GNS3. BYOT-CPS is designed to support security experimentation, education, and independent evaluation of commercial IoT security platforms

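Paper summary: IoT security research faces a methodological gap between scalable virtual experimentation and realistic device behaviour. This paper presents BYOT-CPS, a hybrid cyber-physical systems testbed that connects real IoT devices to virtualised network infrastructure built on GNS3. The testbed is designed to support security experimentation, education, and independent evaluation of commercial IoT security platforms, combining realism with scalability.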

Botnet Detection on CTU-13 Using Lightweight Machine Learning Models

First author: Subhash Gurappa · Area: AI security

Abstract:Botnets are among the most persistent cyber threats, enabling large-scale attacks such as spam, credential theft, and distributed denial-of-service (DDoS). While deep learning approaches have recently been applied to botnet detection, they are computationally intensive and often lack interpretability. We present a comparative study of lightweight machine learning models including Logistic Regression, Decision Tree, and Random Forest on the CTU-13 dataset, a benchmark for botnet traffic analysis. We extract interpretable flow-based features and evaluate each model on detection accuracy, precision, recall, F1 score, and feature importance. Results demonstrate that lightweight models can achieve competitive detection performance with minimal computational cost, while also offering interpretability critical for forensic investigation. On CTU-13, our Random Forest achieves a PR-AUC o

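Paper summary: Deep learning approaches to botnet detection are computationally intensive and often lack interpretability. This paper compares lightweight machine learning models, including Logistic Regression, Decision Tree, and Random Forest, on the CTU-13 benchmark dataset. The results show that lightweight models reach competitive detection performance at minimal computational cost while offering the interpretability needed for forensic investigation.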

Beyond Zero: Enterprise Security for the AI Era

First author: Joseph Valente · Area: Security research

Abstract:The rise of autonomous AI agents and the accelerating velocity of corporate data access are stretching the application-centric model of zero trust security to its breaking point. This paper introduces Beyond Zero, a new security paradigm designed for the AI era. The Beyond Zero architecture performs per-resource and method access decisions for humans and agents at machine speed. By shrinking the trust boundary from the application level to the individual action, and by coupling static authorization guarantees with dynamic, AI-driven reasoning, Beyond Zero enables a self-defending enterprise capable of mediating thousands of human and machine decisions per second. This paper outlines Google's vision for the future of this access model as well as a call for industry collaboration and standards development.

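Paper summary: With the rise of autonomous AI agents, the application-centric zero trust model is under strain. This paper introduces the Beyond Zero paradigm, which shrinks the trust boundary from the application level to the individual action and couples static authorization with dynamic, AI-driven reasoning. The architecture aims to make access decisions for humans and agents at machine speed, thousands per second, enabling a self-defending enterprise.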

The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems

First author: Tanzim Ahad · Area: Systems security

Abstract:Multi-agent AI pipelines typically assume that agent misconduct originates from model misalignment. We identify a structural failure in this assumption, the Misattribution Gap, where memory-layer attacks produce behaviors indistinguishable from model failure, causing defenders to apply the wrong remediation. We formalize Semantic Norm Drift (SND) as a third path to agent misconduct, distinct from emergent misalignment and collusion. In SND, a policy-formatted document enters a shared vector store through normal uploads and later reappears as trusted system context after provenance is lost through a Trust Laundering Chain. Across 64 documented failures, attribution systems consistently blamed the model. Four safety classifiers, including one trained on memory poisoning, produced zero detections across 510 checkpoints. In 59 of 65 valid cases, agents explicitly cited

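Summary: Multi-agent AI systems often assume that agent misconduct stems from model misalignment. The paper exposes a structural failure, the Misattribution Gap, in which memory-layer attacks produce behavior indistinguishable from model failure, leading defenders to apply the wrong remediation. It formalizes Semantic Norm Drift (SND) as a third path to agent misconduct and reports that four safety classifiers produced zero detections across 510 checkpoints.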

CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

First author: Joydeep Chandra · Area: AI security

Abstract:Temporal knowledge-graph data marketplaces face three coupled failures in static designs: stale hybrid index shortcuts reduce recall as edges evolve, stationary Shapley pricing misattributes value after distribution shifts, and uncoordinated agents over-consume a shared differential-privacy budget. We present CHRONOS, a three-layer architecture providing a unified treatment of these challenges with explicit public and private separation. Layer one applies neural-ODE temporal decay to shortcut edges, providing a per-query expected recall-loss bound of Big-O of Pq lambda delta t, with a monotone-envelope guarantee reducing bound looseness to 1.8 to 3.2 times observed loss. Layer two conditions Shapley valuation on detected changepoints and provides finite-sample error guarantees under noise. Layer three uses EXP3-IX to achieve Big-O of the square root of T log T regret while enfor

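Summary: Static designs for temporal knowledge-graph data marketplaces suffer three coupled failures: stale indexes, mispriced value, and over-consumption of the privacy budget. The paper proposes CHRONOS, a three-layer architecture that treats these challenges together. It applies neural-ODE temporal decay to index shortcut edges, conditions Shapley valuation on detected changepoints, and coordinates agents' consumption of a shared differential-privacy budget, with the aim of supporting the secure operation of dynamic data marketplaces.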
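The abstract names EXP3-IX as the bandit algorithm used in layer three. As a rough illustration only, the sketch below implements the generic EXP3-IX update (importance-weighted loss estimates with implicit exploration) on a toy problem. The loss function, arm count, and parameter choices are assumptions made for this example, and nothing here models the privacy-budget mechanism of CHRONOS.

```python
import math
import random


def exp3_ix(loss_fn, n_arms, horizon, eta, gamma, rng):
    """Generic EXP3-IX: returns how often each arm was pulled.

    loss_fn(t, arm) must return a loss in [0, 1] (hypothetical callback).
    """
    cum_est = [0.0] * n_arms  # cumulative estimated losses per arm
    pulls = [0] * n_arms
    for t in range(horizon):
        # Exponential weights, shifted by the minimum for numerical stability.
        base = min(cum_est)
        weights = [math.exp(-eta * (est - base)) for est in cum_est]
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        # Implicit exploration: divide by (p + gamma) instead of p.
        cum_est[arm] += loss / (probs[arm] + gamma)
        pulls[arm] += 1
    return pulls


def toy_loss(t, arm):
    # Assumed toy environment: arm 0 is clearly the best arm.
    return 0.1 if arm == 0 else 0.9


n_arms, horizon = 3, 2000
eta = math.sqrt(2.0 * math.log(n_arms) / (n_arms * horizon))
pulls = exp3_ix(toy_loss, n_arms, horizon, eta, eta / 2.0, random.Random(0))
print(pulls)
```

The `gamma` term biases the loss estimates slightly downward but keeps their variance bounded, which is the property that distinguishes EXP3-IX from plain EXP3.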

On the Stability of Spherical Hellinger-Kantorovich Flows and Their Implications for Differential Privacy

First author: Aratrika Mustafi · Area: Privacy protection

Abstract:Gradient-flow sampling interprets a Gibbs distribution as the minimizer of an energy functional over probability measures and generates dynamics converging to this target. Under spherical Hellinger-Kantorovich (SHK) geometry, the flow couples transport and reaction and coincides with birth-death Langevin dynamics. In this work, we develop a perturbation theory for SHK gradient flows. For two potentials $V$ and $V^{\prime}$, we compare the associated flows from a common initialization and quantify how potential discrepancies propagate over time. A uniform perturbation bound yields dimension-free, pointwise control of the log-likelihood ratio and Rényi divergence, while additional structure allows us to derive bounds for the KL divergence as well. We apply these results to approximate sampling for the exponential mechanism in differential privacy. The likelihood-ratio control prov

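Summary: The paper develops a perturbation theory for spherical Hellinger-Kantorovich (SHK) gradient flows, comparing flows that start from a common initialization but are driven by different potentials and quantifying how the discrepancy between potentials propagates over time. The theory gives dimension-free, pointwise control of the log-likelihood ratio and the Rényi divergence. The results are applied to approximate sampling for the exponential mechanism in differential privacy, where the likelihood ratio is shown to remain controlled.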
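For readers unfamiliar with the exponential mechanism mentioned above, here is a minimal sketch over a finite candidate set, where exact sampling is possible. The paper concerns approximate sampling via SHK flows, which this example does not attempt; the candidate list, utility values, and function names are all illustrative assumptions.

```python
import math
import random


def exponential_mechanism_probs(utilities, epsilon, sensitivity):
    """Selection probabilities proportional to exp(eps * u / (2 * sensitivity))."""
    scores = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    top = max(scores)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_candidate(candidates, utilities, epsilon, sensitivity, rng):
    """Draw one candidate according to the exponential mechanism."""
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs)[0]


# Hypothetical example: three candidates with utilities 0, 1, 2.
probs = exponential_mechanism_probs([0.0, 1.0, 2.0], epsilon=2.0, sensitivity=1.0)
choice = sample_candidate(["a", "b", "c"], [0.0, 1.0, 2.0], 2.0, 1.0, random.Random(1))
print(probs, choice)
```

With `epsilon=2` and `sensitivity=1`, each unit of utility multiplies the selection probability by a factor of e, which makes the privacy-utility trade-off easy to see.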

Communication Security and Sensing Privacy in FMCW-Based ISAC Through Signal Modulation

First author: Murat Temiz · Area: Systems security

Abstract:This study proposes a novel radar-centric signaling design and architecture for secure integrated sensing and communication (ISAC) systems. The proposed framework is designed to provide robust physical layer security for data transmission while simultaneously enhancing sensing privacy. It employs index modulation and phase coding over frequency-modulated continuous-wave radar (FMCW) chirps, where index modulation (IM) provides an outer layer of data security, and we explicitly design the phase coding (PC) to perturb the resulting signal's ambiguity function (AF) to enhance sensing privacy. This design reduces the risk of unauthorized surveillance by rendering target velocity estimation practically infeasible for unauthorized passive sensing hardware (i.e., a sensing eavesdropper, S-Eve) and significantly impairing its range estimation capabilities. Furthermore, this study also p

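Summary: The paper proposes a new signaling design for integrated sensing and communication (ISAC) systems based on frequency-modulated continuous-wave (FMCW) radar. Combining index modulation with phase coding provides physical layer security for data transmission while perturbing the signal's ambiguity function to enhance sensing privacy, so that unauthorized passive sensing hardware can hardly estimate target velocity and its range estimation is significantly impaired.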

Sample-wise Targeted Adversarial Attacks on Test-time Adaptation

First author: Phuc Duc Nguyen · Area: AI security

Abstract:Test-time adaptation (TTA) effectively counters distribution shifts but exposes models to adversarial manipulation via the unlabeled test stream. Existing class-wise targeted attacks remain impractical for stealthy exploitation in this setting: since TTA operates on batches, forcing a subset of samples toward a target label unintentionally pulls similar benign samples along, resulting in a conspicuously high frequency of the target label that is easy to detect. To capture a more realistic threat, we introduce a sample-wise targeted attack. Unlike prior approaches, the attacker aims to misclassify only inputs carrying an attacker-chosen trigger, while preserving the global label distribution of benign queries to evade detection. To achieve this, we propose a meta-learning-based attack with a novel priority-aware gradient alignment strategy that explicitly prioritizes attack succe

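Summary: Test-time adaptation (TTA) counters distribution shift but exposes models to adversarial manipulation. Existing class-wise targeted attacks are easy to detect under batch processing. The paper proposes a stealthier sample-wise targeted attack that misclassifies only inputs carrying an attacker-chosen trigger while preserving the global label distribution to evade detection. It designs a meta-learning-based attack for this purpose and validates its effectiveness experimentally.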

Formal Verification of Probing Security via Conditional Independence

First author: Satoshi Kura · Area: Systems security

Abstract:Side-channel attacks are a major threat to the security of cryptosystems. Masking is a widely used countermeasure against such attacks, but proving the security of masked algorithms is error-prone without formal verification. In this work, we propose a novel approach to formal verification of noninterference properties of masked algorithms based on probabilistic separation logic. By establishing a connection between noninterference and conditional independence, we show how noninterference can be verified using Lilac, a separation logic for conditional independence. We also provide several proof rules that facilitate the verification of probing security and demonstrate their application to example algorithms.

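Summary: Masking is a common countermeasure against side-channel attacks, but proofs of its security are error-prone. The paper proposes a formal verification approach based on probabilistic separation logic: by connecting noninterference to conditional independence, it uses the Lilac logic to verify noninterference properties of masked algorithms, and it provides proof rules that facilitate the verification of probing security.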
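To make the link between masking and independence concrete, the following sketch checks first-order Boolean masking by brute force: a secret is split into two shares, and the distribution of any single probed share is compared across secrets. This is an exhaustive toy check, not the Lilac-based proof technique of the paper; the 4-bit width and function names are assumptions.

```python
from collections import Counter

BITS = 4  # assumed toy word size


def mask(secret, r):
    """First-order Boolean masking: secret == share0 XOR share1."""
    return (r, secret ^ r)


def probe_distribution(secret, share_index):
    """Distribution of one probed share over all uniform random masks."""
    return Counter(mask(secret, r)[share_index] for r in range(1 << BITS))


def joint_distribution(secret):
    """Distribution of both shares together (a second-order probe)."""
    return Counter(mask(secret, r) for r in range(1 << BITS))


# A single probe sees the same distribution whatever the secret is.
same_single = all(
    probe_distribution(3, i) == probe_distribution(12, i) for i in (0, 1)
)
# Probing both shares at once does reveal the secret.
same_joint = joint_distribution(3) == joint_distribution(12)
print(same_single, same_joint)
```

The single-share distributions coincide because XOR with a uniform mask is a permutation of the value space, which is the independence property a first-order probing proof has to establish.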

On APN Exponents and the Differential and Boomerang Properties of Binomials in Characteristic 3

First author: Namhun Koo · Area: Security research

Abstract:Recent studies on binomials of the form $F_r(x) = x^r(1 + \chi(x))$ over $\mathbb{F}_{p^n}$ have shown that these functions can exhibit very low boomerang uniformity. In this paper, we focus on the specific behavior of such binomials in characteristic $3$, where instances of extremely low boomerang uniformity, namely $0$ or $1$, seem to arise more frequently than in other characteristics. First, we provide a systematic analysis of Almost Perfect Nonlinear (APN) power functions in characteristic $3$. We present an explicit parametrization of APN exponents arising from the construction of Zha and Wang and demonstrate through numerical results for $n \le 13$ that this generalized framework accounts for several previously known and sporadic APN instances. Building on this classification, we identify and rigorously prove two classes of binomials $F_r$ that are locally-PN and possess th

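Summary: The paper focuses on a class of binomial functions over finite fields of characteristic 3 and systematically analyzes Almost Perfect Nonlinear (APN) power functions. It gives an explicit parametrization of APN exponents and rigorously proves that two classes of locally-PN binomials have extremely low boomerang uniformity (0 or 1). These results deepen the understanding of functions with strong cryptographic properties.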
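The terms PN and APN refer to differential uniformity 1 and 2 respectively. The sketch below computes differential uniformity by brute force for power functions over a small prime field. Using a prime field $\mathbb{F}_p$ instead of the extension fields $\mathbb{F}_{3^n}$ studied in the paper is a simplifying assumption for illustration, and the code does not touch boomerang uniformity.

```python
def differential_uniformity(f, p):
    """Max over a != 0 and b of #{x in F_p : f(x + a) - f(x) = b}."""
    worst = 0
    for a in range(1, p):
        counts = [0] * p
        for x in range(p):
            counts[(f((x + a) % p) - f(x)) % p] += 1
        worst = max(worst, max(counts))
    return worst


def power_function(d, p):
    """The map x -> x^d over the prime field F_p."""
    return lambda x: pow(x, d, p)


# x^2 over F_7: every derivative is a bijection, so the function is PN.
du_square = differential_uniformity(power_function(2, 7), 7)
# x^3 over F_7: derivatives are quadratics with at most two roots, so APN.
du_cube = differential_uniformity(power_function(3, 7), 7)
print(du_square, du_cube)
```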

From Preventive to Reactive: How AI Coding Assistants Transform Developers' Security Awareness

First author: Faisal Haque Bappy · Area: Software security

Abstract:AI coding assistants are now central to professional software development, yet their impact on how developers think about and practice security remains poorly understood. While prior work has documented vulnerability rates in AI-generated code, a more fundamental question persists: how do these tools transform security awareness in authentic, ongoing development practice? We conducted semi-structured interviews with 15 professional software engineers and observed them completing security-relevant coding tasks with AI assistance, spanning 3 experience cohorts defined by their relationship to AI tools during professional formation. We find that AI coding assistants reorganize rather than eliminate security thinking, shifting it from the act of writing code to the act of reviewing it. This transition from preventive to reactive security is structurally encouraged by interaction mod

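Summary: How AI coding assistants affect developers' security practice remains poorly understood. Through interviews with professional software engineers and observation of coding tasks, the paper finds that AI assistants do not eliminate security thinking but move it from the act of writing code to the act of reviewing it. This shift from preventive to reactive security is structurally encouraged by interaction patterns.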

Security of LLM-generated Code: A Comparative Analysis

First author: Srivathsan G Morkonda · Area: Software security

Abstract:The majority of software developers use or are planning to use Artificial Intelligence (AI) tools in their development processes. Their top reasons include improving productivity and faster learning. In fact, Large Language Model (LLM)-generated code is currently in production, including in major tech companies. However, concerns were raised about the risks associated with the use of AI tools to generate code. In this paper, we focus our attention on the risks to software security. We empirically evaluate the security of code generated by seven popular LLMs. We build upon previous work to mimic the behaviours of developers when using LLMs to generate code. Our results show that all seven LLMs that we have evaluated generate code that contains vulnerabilities, the majority of which are of critical or high severity.

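Summary: Many developers use or plan to use AI tools to generate code. The paper empirically evaluates the security of code generated by seven popular LLMs. All seven generate code containing vulnerabilities, most of them of critical or high severity, which points to significant shortcomings in the security of current LLM code generation.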

Intercloud: Eventual Consistency for Decentralised Economies via Chilling-Effect Consensus

First author: Gregory Magarshak · Area: Network security

Abstract:We present Intercloud, a decentralised economic network in which streams of private data are secured by Watcher swarms that observe only cryptographic hashes, never plaintext. Intercloud requires no global consensus beyond a single shared random seed per epoch. Two mechanisms provide security: (i) ripple deduplication via epoch-stamped identifiers, preventing any ripple from propagating through the same node twice per epoch, guaranteeing termination without global coordination; and (ii) chilling-effect consensus, in which a swarm reaches finality by attesting to the absence of conflicting evidence rather than voting between alternatives. Any conflicting attestation automatically yields a self-certifying Proof of Corruption. We prove four main results. First, execution ripples terminate in bounded time via the ripple-ID mechanism. Second, a swarm of about 35 Watchers -- assigned

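Summary: The paper presents Intercloud, a decentralised economic network in which streams of private data are secured by Watcher swarms that observe only hashes. The network needs no global consensus beyond a single shared random seed per epoch and relies on two mechanisms: ripple deduplication via epoch-stamped identifiers, which guarantees termination, and chilling-effect consensus, in which finality is reached by attesting to the absence of conflicting evidence, with any conflicting attestation automatically yielding a Proof of Corruption.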
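The ripple deduplication idea can be pictured with a small simulation: each node remembers the epoch-stamped identifiers it has already handled and drops repeats, so propagation stops even on a cyclic graph. The graph, the `(epoch, ripple_id)` tuple format, and the function name below are hypothetical; the paper's actual identifier scheme and network protocol are not reproduced here.

```python
from collections import deque


def propagate(graph, origin, epoch, ripple_id):
    """Flood a ripple through `graph`; each node processes it at most once per epoch."""
    seen = {node: set() for node in graph}  # per-node memory of handled stamps
    stamp = (epoch, ripple_id)              # assumed epoch-stamped identifier
    queue = deque([origin])
    deliveries = []
    while queue:
        node = queue.popleft()
        if stamp in seen[node]:
            continue  # duplicate within this epoch: drop it
        seen[node].add(stamp)
        deliveries.append(node)
        queue.extend(graph[node])
    return deliveries


# A small graph with cycles; without deduplication this would loop forever.
graph = {"a": ["b", "c"], "b": ["c", "a"], "c": ["a"]}
order = propagate(graph, "a", epoch=1, ripple_id="r1")
print(order)
```

Because every node accepts a given stamp at most once, the number of processed deliveries is bounded by the number of nodes, which is the local termination argument in miniature.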

Leveraging Large Language Models for Sentiment Analysis: Multi-Modal Analysis of Decentraland's MANA Token

First author: Xintong Wu · Area: AI security

Abstract:Decentraland, a decentralized virtual reality platform operating within the expanding Metaverse ecosystem, utilizes its native MANA token to facilitate virtual asset transactions and governance. This study investigates the integration of Discord community sentiment with multi-modal financial data to enhance cryptocurrency price prediction within virtual world economies. We address: (1) identifying sentiment patterns within Decentraland's Discord community, and (2) evaluating the impact of multi-modal features on token return forecasting. Using a BERT-based large language model for sentiment analysis, we develop two LSTM architectures: a baseline incorporating historical prices and a multi-modal variant integrating sentiment scores, trading volume, and market capitalization. Results indicate predominantly neutral community sentiment with a positive skew. The multi-modal model sig

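Summary: The paper combines sentiment signals from Decentraland's Discord community with multi-modal financial data to improve price prediction for the MANA token. It uses a BERT-based model for sentiment analysis and compares a baseline LSTM with a multi-modal LSTM. Community sentiment is predominantly neutral with a positive skew, and integrating sentiment and other multi-modal features significantly improves prediction performance.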

Robotic Strawberry Harvesting with Robust Vision and Deep Reinforcement Learning based Sim-to-Real Control

First author: Al Bashir · Area: Policy learning · Source: cs.RO

Abstract:This study presents a closed-loop robotic strawberry harvesting system that combines a robust vision module, simulation-trained deep reinforcement learning (DRL) control, and ROS-based real-robot execution. For perception, we propose HRAttnEdge-YOLO26-seg, a modified YOLO26-seg architecture that incorporates a high-resolution P2 branch, segmentation-path attention, and edge-supervised prototype learning to improve instance segmentation in cluttered scenes. For control, we train a target-conditioned Proximal Policy Optimization (PPO) policy in Isaac Lab to produce smooth joint-position commands for a UR10e manipulator and deploy it on a UR10e robot for target-fruit reaching and harvesting. This simulation-based approach reduces hardware dependency, lowers development cost, and allows scalable policy training without exhaustive physical trials before real deployment. The proposed vis

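Summary: The paper presents a closed-loop strawberry-harvesting robot system that combines a robust vision module, simulation-trained deep reinforcement learning control, and ROS-based real-robot execution. The vision module modifies YOLO26-seg to improve segmentation in cluttered scenes. The control module trains a policy in simulation and deploys it on a real manipulator. The approach reduces hardware dependency and lowers development cost.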

Point Tracking Improves World Action Models

First author: Jiarui Guan · Area: Policy learning · Source: cs.RO

Abstract:Robot policy learning benefits from world-action models that capture environment dynamics, but pixel-level prediction entangles dynamics with nuisance factors such as lighting and texture, making learned representations vulnerable to task-irrelevant visual variation. We propose JOPAT, a JOint Pixel-And-Track World-Action Model that predicts latent visual observations, 2D point tracks with visibility, and actions in a single denoising diffusion transformer. The key insight is that tracks provide an explicit representation of motion that captures long-horizon dynamics and remains robust under occlusion or partial out-of-frame motion, offering greater utility than modeling pixel appearance alone. On LIBERO and real-world LeRobot tasks, JOPAT improves over pixel-based baselines, with the largest gains on long-horizon tasks involving occlusion, object interaction, and off-screen moti

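Summary: World models for robot policy learning that model only pixels are disturbed by nuisance factors such as lighting and texture. The paper proposes JOPAT, which jointly predicts latent visual observations, 2D point tracks, and actions in a single denoising diffusion transformer. The key insight is that point tracks give an explicit representation of motion that is more robust to occlusion and partial out-of-frame motion and more useful than modeling pixel appearance alone.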

Instrumentation for Imitation Learning: Enhancing Training Datasets for Clothes Hanger Insertion

First author: Remko Proesmans · Area: Robotic manipulation · Source: cs.RO

Abstract:Large behaviour models have transformed the field of robotic manipulation, but prohibitive data requirements have thus far prevented a revolution similar to vision language models. We believe that instrumentation, i.e. sensor integration in objects, can provide invaluable state information and enable efficient learning for robotic manipulation. In this paper, we present instrumented imitation learning of clothes hanger insertion. Using 180 teleoperated demonstrations, we train diffusion policies with and without access to instrumentation data. Results show that policies leveraging instrumentation outperform vision-only counterparts by 14-25 percentage points and exhibit greater task awareness. Crucially, a black-box imitation learning policy learns to prioritise instrumentation signals without explicit guidance. In addition, enhancing the teleoperation dataset with rollouts from an instrument

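Summary: Large behaviour models are bottlenecked by heavy data requirements. The paper explores instrumentation (integrating sensors into objects) as a source of rich state information that improves the data efficiency of imitation learning. On a clothes hanger insertion task, diffusion policies trained with instrumentation data outperform vision-only policies by 14-25 percentage points, and the policy learns on its own to prioritise instrumentation signals.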

SFG-ROS: A Resource-Aware Framework for Dense Multi-Agent Perception

First author: Constantin Blessing · Area: Embodied AI · Source: cs.RO

Abstract:Deploying heterogeneous multi-agent robot fleets for collaborative perception requires robust data exchange and scalable software architectures. However, standard ROS 2 implementations often suffer from network saturation, namespace collisions, and severe computational overhead when distributing dense sensor streams across devices. To address these bottlenecks, we present SFG-ROS, a resource-aware multi-agent software framework designed for dynamic fleet deployments. SFG-ROS addresses these challenges through three primary contributions. First, schema-driven traffic routing isolates high-frequency intra-agent traffic from the global network using a programmatic fully qualified name schema and targeted Fast DDS routing. Second, an on-demand centralized decoding pipeline automatically offloads high-bandwidth sensor data decompression, eliminating redundant processing across local

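Summary: When heterogeneous multi-agent robot fleets are deployed for collaborative perception, standard ROS 2 implementations often suffer from network saturation and computational overhead. The paper presents SFG-ROS, a resource-aware framework that addresses these bottlenecks through schema-driven traffic routing, an on-demand centralized decoding pipeline, and dynamic communication link planning, with the aim of supporting dense sensor stream exchange in dynamic fleet deployments.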

Direct Dynamic Retargeting for Humanoid Imitation Learning from Videos

First author: Constant Roux · Area: Imitation learning · Source: cs.RO

Abstract:Imitation Learning from monocular video demonstrations provides a scalable approach for teaching complex skills to humanoid robots. However, translating human motion to humanoids requires overcoming significant morphological mismatches. Standard approaches rely on Geometric Retargeting or Indirect Dynamic Retargeting pipelines. We identify that these intermediate kinematic projections introduce a geometric bias, restricting the search space and yielding suboptimal dynamic behaviors. In this paper, we propose Direct Dynamic Retargeting (DDR), a novel single-stage framework that generates high-fidelity, dynamically feasible trajectories directly from expert videos. By formulating the problem in the task space and leveraging a sampling-based Model Predictive Control solver within a physics simulator, DDR natively optimizes over complex contact sequences while mitigating input drift

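Summary: Learning humanoid robot motions from monocular video requires overcoming morphological mismatch. Conventional approaches rely on intermediate kinematic projections, which introduce a geometric bias. The paper proposes Direct Dynamic Retargeting (DDR), a framework that generates high-fidelity, dynamically feasible trajectories directly from expert videos. By formulating the problem in the task space and using a sampling-based Model Predictive Control solver, it optimizes over contact sequences while mitigating input drift.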

Any2Any: Efficient Cross-Embodiment Transfer for Humanoid Whole-Body Tracking

First author: Ming Yang · Area: Embodied AI · Source: cs.RO

Abstract:Whole-body tracking (WBT) models have become a key foundation for humanoid robots, enabling them to imitate diverse motions with high fidelity. Training such models from scratch requires large-scale data and computation, making rapid deployment on new humanoid platforms costly. This raises a natural question: Can pretrained WBT models transfer across embodiments with minimal adaptation? To answer this question, we propose Any2Any, a paradigm that efficiently transfers an existing WBT specialist to a new humanoid embodiment with only a small amount of data and compute. Any2Any first performs kinematic alignment between source and target humanoids, aligning their input and output spaces so that the pretrained source policy can be meaningfully reused on the target embodiment. Any2Any then performs dynamics adaptation by applying lightweight parameter-efficient fine-tuning (PEFT) com

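Summary: Training whole-body tracking models for humanoid robots requires large amounts of data. The paper proposes Any2Any, a paradigm for efficiently transferring a pretrained whole-body tracking specialist to a new humanoid platform with little data and compute. It first performs kinematic alignment so the source policy can be reused, then performs dynamics adaptation through lightweight parameter-efficient fine-tuning, enabling rapid deployment.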

How Many Training Samples Are Needed for the Inverse Kinematics Solutions by Artificial Neural Networks

First author: Dong-Won Lim · Area: Embodied AI · Source: cs.RO

Abstract:Inverse Kinematics (IK) plays a critical role in robotic motion planning and control. The IK of a robot manipulator can be solved by conventional geometric, algebraic, or Jacobian methods, each of which has drawbacks. Artificial Neural Networks (ANNs) have become a promising alternative for approximating IK solutions due to their generalization ability and computational efficiency. This approach trains on only a few recorded samples of the end effector to solve the IK problem. However, a fundamental question remains: how many training samples are sufficient to achieve reliable and accurate IK predictions? This study investigates a mathematical framework relating the size of training datasets to the accuracy of ANN-based IK solvers. Using an articulated robotic manipulator, we generate varying amounts of joint-position pairs to

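Summary: Artificial neural networks are an effective way to solve robot inverse kinematics (IK), but how many training samples are needed for reliable predictions? The study investigates a mathematical framework relating training dataset size to the accuracy of ANN-based IK solvers. It generates varying numbers of joint-position pairs with an articulated manipulator for training and examines how sample count affects prediction accuracy.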
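Joint-position pairs of the kind described above are typically produced by sampling joint angles and running forward kinematics. The sketch below does this for a two-link planar arm, which is an assumption chosen for brevity; the manipulator, link lengths, and sampling ranges used in the paper are not stated in the excerpt.

```python
import math
import random


def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector position of a two-link planar arm (assumed geometry)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def make_ik_dataset(n_samples, seed=0):
    """Return (position, joints) pairs; an IK network maps position -> joints."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        theta1 = rng.uniform(-math.pi, math.pi)
        # Keep the elbow on one side so each position has a single label.
        theta2 = rng.uniform(0.0, math.pi)
        data.append((forward_kinematics(theta1, theta2), (theta1, theta2)))
    return data


# Datasets of varying size, as one would use to study accuracy versus sample count.
datasets = {n: make_ik_dataset(n) for n in (10, 100, 1000)}
print({n: len(d) for n, d in datasets.items()})
```

Restricting `theta2` to one sign removes the elbow-up/elbow-down ambiguity, which otherwise makes the regression target multi-valued and can mask the effect of dataset size.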

Semantically Structured Mixture-of-Experts for Compositional Robotic Manipulation

First author: Chengyu Deng · Area: Robotic manipulation · Source: cs.RO

Abstract:Diffusion-based policies have established a new standard for precise robotic manipulation but face a critical scalability bottleneck: high-performance models are computationally expensive, while lightweight alternatives often fail to generalize across diverse multi-task environments. Mixture-of-Experts (MoE) architectures offer a promising path to efficiency by activating only a subset of parameters. However, existing MoE routing mechanisms typically rely on low-level noise or latent statistics, ignoring the compositional nature of manipulation tasks. This can fragment reusable behaviors across experts, limiting interpretability and transferability. We introduce Semantically Structured Mixture-of-Experts Diffusion Policy (SMoDP) for compositional robotic manipulation, a framework that grounds expert specialization in semantic task structure. SMoDP leverages a lightweight, infere

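Summary: Diffusion policies perform well in robotic manipulation but are computationally expensive. Mixture-of-Experts (MoE) architectures can improve efficiency, but existing routing mechanisms ignore the compositional nature of tasks. The paper proposes the Semantically Structured Mixture-of-Experts Diffusion Policy (SMoDP), which grounds expert specialization in semantic task structure. The framework uses a lightweight, interpretable router to activate experts, improving efficiency while maintaining performance.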

Droneulator: A Portable UAV Simulator for Agricultural Workflows with RotorPy and Godot 4

First author: Jacob Swindell · Area: Policy learning · Source: cs.RO

Abstract:Agricultural UAV research requires simulators that integrate realistic 3D scenes, high-fidelity vehicle dynamics, and robotics middleware, while remaining practical to deploy across heterogeneous development machines. We present Droneulator, a portable UAV simulator architecture that combines RotorPy for multirotor dynamics with Godot 4 for rendering and sensor generation. Droneulator exposes both PX4-based control and a lightweight WebSocket command path, and publishes synchronised visual and state streams through a Zenoh-based ROS 2-compatible pipeline. This integration enables a single stack to support inspection-oriented data capture, ROS 2/PX4 local planning, and reinforcement learning experiments without modifying the simulator infrastructure. We present quantified validation of the current system across three agricultural UAV workflows: tree-scale image collection for 3D

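Summary: Agricultural UAV research needs simulators that integrate realistic scenes, high-fidelity dynamics, and robotics middleware while remaining easy to deploy. The paper presents Droneulator, a portable simulator architecture that combines RotorPy for multirotor dynamics with Godot 4 for rendering and sensor generation. It supports PX4 control and ROS 2 communication and can be used for data capture, local planning, and reinforcement learning experiments.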

Multi-Floor Exploration for Ground Robots via an Incremental Reachable Graph and Structural Priors

First author: Zhiwen Zhu · Area: Embodied AI · Source: cs.RO

Abstract:Autonomous exploration of multi-floor buildings remains challenging for ground robots because conventional 2D and 2.5D maps cannot represent overlapping traversable surfaces such as stairs, ramps, and multiple reachable elevations. This letter presents a multi-floor exploration framework based on an incremental reachable graph. Built as a sparse graph over reachable support surfaces, the graph preserves potentially valid connectivity through tentative graph elements under sparse observations and enables stable, physically reachable frontier detection. To guide exploration beyond the currently mapped floor, we project task-zone priors from an explored floor to initialize a hypothetical graph on the target floor and reconcile it incrementally with incoming observations. A hierarchical planner then jointly reasons over confirmed and hypothetical structures for global guidance. In s

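Summary: Exploring multi-floor buildings is challenging for ground robots because conventional maps cannot represent overlapping traversable surfaces such as stairs. The paper presents a multi-floor exploration framework based on an incremental reachable graph. The graph sparsely represents reachable support surfaces and preserves potential connectivity under sparse observations. Task-zone priors from an explored floor are projected onto the target floor to initialize a hypothetical graph that guides global exploration.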

Sparse Compositional Flow Matching by geometric assembly from motion primitives

First author: Yan Tang · Area: Embodied AI · Source: cs.RO

Abstract:Embodied trajectories, such as the executable motion sequences of robotic manipulators, underwater vehicles, and mobile robots, are a fundamental output of embodied AI. Modern generative models often treat them as a dense, monolithic signal generated point by point, fitting an intricate high-dimensional posterior while leaving the data's latent structure unmodeled, the same sample inefficiency long identified by the structured generative model literature. We argue that a compositional latent structure is a natural choice: many embodied tasks share recurring motion fragments that can be made explicit as a finite repertoire of reusable motion primitives, and compositional units naturally align with subtask boundaries to support task decomposition. Existing compositional generators, however, compose in a latent space and rely on post-hoc decoding to relate sampled units to actual t

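Summary: Existing generative models treat robot motion trajectories as dense signals and ignore their latent compositional structure. The paper proposes sparse compositional flow matching, which generates trajectories by assembling them from a finite repertoire of reusable motion primitives. Making the assembly explicit aligns it naturally with subtask boundaries and supports task decomposition, which may improve generation efficiency and interpretability.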

6G Communication Networks Enabling Embodied Agents: Architecture and Prototype

First author: Lipeng Dai · Area: Embodied AI · Source: cs.RO

Abstract:Embodied agents, which couple intelligent decision-making with physical actuation in the real world, impose far more stringent and heterogeneous communication requirements than purely software-based agents. While 6G promises sub-millisecond latency, ultra-high reliability, native intelligence, and integrated sensing, systematic studies on how to exploit these capabilities for embodied agent communication remain limited. This article investigates 6G-enabled communication systems for embodied agents from both conceptual and engineering perspectives. First, we review the concept and embodiment value of embodied agents and clarify their distinctions from disembodied agents. Then, we analyse the symbiotic relationship between embodied agents and 6G networks. We highlight how key 6G enablers can support the stringent requirements of human-robot interaction. Furthermore, we demonstrate t

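Summary: Embodied agents impose stringent and heterogeneous communication requirements. 6G networks promise sub-millisecond latency, ultra-high reliability, native intelligence, and integrated sensing. The article studies 6G communication systems for embodied agents from conceptual and engineering perspectives, analyses the symbiotic relationship between the two, and uses a prototype to demonstrate how key 6G enablers can support the strict requirements of human-robot interaction.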

Signal Temporal Logic Motion Planning via Graphs of Convex Sets

First author: Yu Chen · Area: Navigation and locomotion · Source: cs.RO

Abstract:This paper investigates continuous-time motion planning under Signal Temporal Logic (STL) specifications. The goal is to generate smooth robot trajectories that satisfy high-level logical and timing requirements while respecting low-level motion constraints. To this end, we propose an efficient framework that combines timed-automata reasoning with graphs of convex sets (GCS). An STL specification is first represented by a timed automaton, which is then coupled with a convex decomposition of the configuration space to form a joint transition system encoding both task progress and region occupancy. Based on this joint transition system, the STL motion-planning problem is reformulated as a shortest-path problem over a GCS, whose solution induces a smooth Bézier-spline trajectory satisfying the STL specification, smoothness requirements, and velocity bounds. We establish the soundne

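Summary: The paper studies continuous-time motion planning under Signal Temporal Logic (STL) specifications. The framework converts the STL specification into a timed automaton and couples it with a convex decomposition of the configuration space to form a joint transition system. The planning problem is then reformulated as a shortest-path problem over a graph of convex sets (GCS), whose solution yields a Bézier-spline trajectory that satisfies the STL specification, smoothness requirements, and velocity bounds.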
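Bézier curves pair naturally with convex regions because a Bézier curve always lies in the convex hull of its control points, so constraining the control points to a convex set keeps the whole segment inside it. The sketch below checks this property numerically with de Casteljau evaluation and an axis-aligned box. The control points and box are made-up values, and this is not the paper's GCS solver or its STL encoding.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bézier curve at parameter t in [0, 1] by repeated interpolation."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


def in_box(point, lower, upper, tol=1e-12):
    """Membership test for an axis-aligned box, a simple convex set."""
    return all(lo - tol <= x <= hi + tol for x, lo, hi in zip(point, lower, upper))


# Hypothetical cubic segment whose control points all lie in the box [0,4] x [0,2].
control = [(0, 0), (1, 2), (3, 2), (4, 0)]
lower, upper = (0, 0), (4, 2)
samples = [de_casteljau(control, i / 50) for i in range(51)]
curve_inside = all(in_box(p, lower, upper) for p in samples)
print(curve_inside)
```

Each de Casteljau step is a convex combination of neighbouring points, which is exactly why the containment holds for every `t` and not only for the sampled values.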

Autonomous Frontier-Based Exploration with VLM Guidance

First author: Aarush Aitha · Area: Multimodal embodied AI · Source: cs.RO

Abstract:Autonomous robotic exploration of unknown and hazardous environments, a long-standing challenge, can be significantly improved by leveraging the advanced reasoning of Vision-Language Models (VLMs). We introduce a novel exploration pipeline where a VLM performs high-level strategic decision-making, guiding a conventional low-level robotics control stack. At decision points, the robot generates a multimodal prompt with its current map and visual imagery of potential paths, or frontiers. The VLM analyzes this prompt to select the most promising frontier, replacing simple geometric heuristics with contextual spatial reasoning. This approach, validated in simulation across six indoor environments, improves map coverage by up to 24% over existing methods. Our pipeline is lightweight, training-free, and easily transferable to any robot with standard sensors and an internet connection.

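Summary: The paper introduces an exploration pipeline in which a Vision-Language Model (VLM) makes high-level strategic decisions that guide a conventional low-level robot control stack. At decision points, the robot generates a multimodal prompt containing its current map and imagery of candidate paths; the VLM analyzes it and selects the most promising frontier, replacing simple geometric heuristics with contextual spatial reasoning and improving map coverage by up to 24% in simulation.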
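A frontier, in this setting, is usually a known-free cell that borders unexplored space. The sketch below extracts frontier cells from a small occupancy grid, which is the geometric step that produces the candidates a selector (heuristic or VLM) would then rank. The cell encoding and the grid are assumptions for illustration, and the VLM prompting step from the paper is not modelled.

```python
FREE, OCCUPIED, UNKNOWN = 0, 1, -1  # assumed cell encoding


def find_frontiers(grid):
    """Return free cells that have at least one 4-connected unknown neighbour."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbour is enough
    return frontiers


# Hypothetical 3x3 map: the right-hand column is only partly explored.
grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE],
]
frontiers = find_frontiers(grid)
print(frontiers)
```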

Semantic-Aware Guided Drone Exploration for Language-Conditioned 3D Indoor Mapping

First author: Nitin Vegesna · Area: Multimodal embodied AI · Source: cs.RO

Abstract:We present Semantic-Aware Guided Exploration, SAGE, a system for open-vocabulary exploration in unknown 3D indoor environments that preserves coverage-oriented behavior while allowing semantic cues to reprioritize frontier selection. Building on the FALCON volumetric explorer, SAGE integrates Contrastive Language-Image Pre-training (CLIP) via four key components: object-centric embedding storage, a temporal cache that projects recent observations onto the free-unknown boundary, object frontiers for high-similarity detections, and a unified semantic-geometric planning cost. This cost function bounds semantic reweighting influence, ensuring frontiers are prioritized without sacrificing total coverage. In Matterport3D-based simulations, SAGE outperforms FALCON and a semantic-only ablation in object discovery across map-query pairs. Compared to Finding Things in the Unknown (FTU), S

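Summary: The paper presents SAGE, a system for open-vocabulary exploration of unknown 3D indoor environments. It preserves coverage-oriented exploration while letting semantic cues reprioritize frontier selection. Built on the FALCON volumetric explorer, the system integrates CLIP through object-centric embeddings, a temporal cache, object frontiers, and a unified semantic-geometric planning cost, and it outperforms FALCON and a semantic-only ablation in object discovery.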

$\pi_0$-EqM: Equilibrium Matching for Closed-Loop Vision-Language-Action Control

First author: Huanming Liu · Area: General-purpose VLA models · Source: cs.RO

Abstract:Currently, Vision-Language-Action (VLA) models have become the most adopted paradigm for robotic manipulation because of their great potential for task generalization. However, most generative flow-matching action decoders for VLA control are deployed with fixed sampling horizons, limiting state-dependent compute and temporal reuse across control cycles. We present $\pi_0$-EqM, which replaces the flow-matching expert in $\pi_0$ with an Equilibrium Matching (EqM) decoder while leaving the upstream VLA stack unchanged. Under a matched 300-step budget, $\pi_0$-EqM improves RoboTwin average success from 40.4% to 50.2% across 19 tasks and remains competitive on LIBERO, with its clearest gain on LIBERO-10 (87.0%). Two threshold scans reveal a task-dependent non-monotonic relation between residual and success, which we term the stationarity-executability gap. The results suggest that infere

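Summary: Flow-matching action decoders in current VLA models usually use fixed sampling horizons, which limits adaptive computation. The paper proposes $\pi_0$-EqM, which replaces the flow-matching expert in $\pi_0$ with an Equilibrium Matching decoder and, under a matched compute budget, raises RoboTwin average success from 40.4% to 50.2%. The study reveals a task-dependent, non-monotonic relation between residual and success.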

Four Simple Proprioceptive Estimators for Legged Robots

First author: Frank Dellaert · Area: Embodied AI · Source: cs.RO

Abstract:Legged robots carry an IMU, but the inertial solution drifts because consumer-grade IMUs are noisy. However, the feet create intermittent contacts with the environment that can be used to mitigate that drift. This report develops a sequence of increasingly expressive legged robot state estimators that leverage this. In all cases, the floating-base state comprises attitude, position, velocity, and IMU biases. To model foot contacts, we start from the contact-aided invariant EKF of Hartley et al., albeit at a reduced contact update rate. This is then augmented by replacing the measurement update by a small factor graph. Finally, we turn the same factors into a fixed-lag smoother with contact-episode footholds, with and without an evolving IMU bias. To facilitate reproducibility and further research in proprioceptive legged odometry, all four variants are available in GTSAM (Dellae

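Summary: Consumer-grade IMUs drift, but a legged robot's intermittent foot contacts with the environment can mitigate the problem. The report develops four increasingly expressive legged robot state estimators, starting from a contact-aided invariant extended Kalman filter and progressively introducing a factor graph and a fixed-lag smoother that uses contact-episode footholds. All variants are available in GTSAM to support reproducible research.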

UfM*: Uncertainty from Motion* for DNN Depth Estimation Using Gaussians

First author: Soumya Sudhakar · Area: Embodied AI · Source: cs.RO

Abstract:Reliable uncertainty estimation is critical for deploying monocular depth deep neural networks (DNNs) in safety-critical robotic systems. Conventional uncertainty methods such as ensembles and sampling-based approaches require multiple inferences per image, incurring substantial compute and memory overhead. Moreover, uncertainty predicted from a single image misses out on measuring disagreement between predictions across views of the same region. We propose Uncertainty from Motion* (UfM*), an uncertainty estimation algorithm that measures multiview disagreement efficiently by comparing previous and current views using a compact Gaussian mixture, requiring only a single DNN inference per image. Using Gaussians to compute multiview disagreement is not only more compute- and memory-efficient than a prior approach using a point cloud, but also improves uncertainty by measuring disag

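Paper summary: Reliable uncertainty estimation is critical for safety-critical robotic systems, but conventional methods need multiple inferences per image and incur substantial compute and memory overhead. This paper proposes UfM*, which measures multiview disagreement efficiently by comparing previous and current views with a compact Gaussian mixture, using only a single DNN inference per image. It is more compute- and memory-efficient than a prior point-cloud approach and improves the uncertainty estimate by measuring disagreement.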

PIMbot: A Self-Adaptive Attack Framework for Adversarial Manipulation of Multi-Robot Reinforcement Learning

First author: Zexin Li · Area: Robot manipulation · Source: cs.RO

Abstract:Recent research has demonstrated the potential of reinforcement learning in effective multi-robot collaboration, particularly in social dilemmas where robots face a trade-off between self-interest and collective benefits. However, environmental factors such as miscommunication and adversarial robots can impact cooperation, making it crucial to explore how multi-robot communication can be manipulated to achieve different outcomes. This paper presents PIMbot, a framework that manipulates outcomes via two complementary levers: (i) incentive manipulation of the reward channel and (ii) policy manipulation of an agent's own actions. An adaptive multi-objective controller balances these levers in an online manner. Our work introduces a novel approach to manipulation in recent multi-agent RL social dilemmas that utilize a unique reward function for incentivization. By utilizing our prop

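Paper summary: Multi-robot cooperation is affected by environmental factors such as miscommunication and adversarial robots. This paper presents the PIMbot framework, which manipulates outcomes through two complementary levers: incentive manipulation of the reward channel and policy manipulation of an agent's own actions. An adaptive multi-objective controller balances the two levers online. The approach offers a new angle on manipulation in multi-agent reinforcement learning social dilemmas.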

Robots That Know What to Ask: Recovering Misaligned Rewards through Targeted Explanations

First author: Helena Merker · Area: Embodied AI · Source: cs.RO

Abstract:Learning reward functions from demonstrations assumes that demonstrations provide adequate supervision over all features -- or task-relevant aspects of behavior. In practice, demonstrations are often imperfect: humans may under-emphasize certain features due to cognitive load or physical difficulty, or the training regime may fail to sufficiently cover all relevant situations. In either case, important features may be underspecified, leading to ambiguity in the learned reward function and misaligned behavior at deployment. We propose a framework that detects such underspecified features and actively solicits targeted corrective demonstrations. Our key insight is that demonstrations implicitly reveal which features are well specified: features that are consistently optimized show little variation across demonstrations, while features that are underspecified vary widely. We levera

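Paper summary: Learning reward functions from demonstrations assumes the demonstrations adequately supervise every feature, which often fails in practice. This paper proposes a framework that detects underspecified features and actively solicits targeted corrective demonstrations from humans. The key insight is that features varying widely across demonstrations are likely underspecified, while consistently optimized features vary little. The method generates informative queries to reduce ambiguity.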
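The stated insight (well-specified features vary little across demonstrations, underspecified ones vary widely) can be illustrated with a toy variance check. This is only a sketch of the idea, not the authors' method: the feature names, the demonstration format, the `underspecified_features` helper, and the fixed threshold are all hypothetical, and features are assumed to be on comparable scales.

```python
from statistics import pvariance


def underspecified_features(demos, threshold):
    """Flag features whose value varies widely across demonstrations.

    demos: list of dicts mapping a feature name to its (average) value
           in one demonstration. Hypothetical format for illustration.
    threshold: variance above which a feature is flagged (an assumption).
    """
    flagged = []
    for name in demos[0]:
        values = [demo[name] for demo in demos]
        if pvariance(values) > threshold:
            flagged.append(name)
    return flagged


# Three toy demonstrations: speed is consistent, clearance is not.
demos = [
    {"speed": 1.0, "clearance": 0.1},
    {"speed": 1.0, "clearance": 0.9},
    {"speed": 1.1, "clearance": 0.5},
]
flagged = underspecified_features(demos, threshold=0.05)
print(flagged)
```

A flagged feature would then be a candidate for a targeted corrective demonstration; how queries are actually generated is not specified in the truncated abstract.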

Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models

First author: Ruofan Jin · Area: General-purpose VLA models · Source: cs.RO

Abstract:Vision-Language-Action (VLA) models have emerged as a promising paradigm for robotic manipulation by leveraging pre-trained vision-language representations. However, current VLA training methods suffer from two critical limitations: poor generalization to novel environments and low training efficiency requiring extensive demonstrations. We introduce Agentic-VLA, an agentic training framework that enables VLAs to efficiently adapt online through three key innovations: (1) Adaptive Reward Synthesis, which dynamically generates and adjusts reward functions based on the VLA's current capabilities and task complexity, decomposing complex tasks into learnable sub-goals for curriculum learning; (2) Language-Guided Exploration, where a critic model provides structured guidance for systematic exploration rather than random sampling; and (3) Experience Memory, which stores and retrieves ta

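Paper summary: Current VLA training methods generalize poorly to novel environments and train inefficiently. This paper proposes the Agentic-VLA framework, which lets VLAs adapt online efficiently through three innovations: Adaptive Reward Synthesis, which generates rewards dynamically from the model's current capabilities and the task complexity; Language-Guided Exploration, in which a critic model guides exploration; and Experience Memory, which stores and retrieves task-specific experience.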

Remote Teleoperation of Endovascular Intervention Robots: A Systematic Review

First author: Xingyu Chen · Area: Imitation learning · Source: cs.RO

Abstract:Remote robotic-assisted endovascular intervention offers a promising approach to reduce clinician radiation exposure and physical strain, while extending specialized vascular care to geographically distant regions. Despite advancements, teleoperated endovascular intervention remains underexplored, especially for time-sensitive interventions like mechanical thrombectomy for acute stroke. The aim of the current review was to determine the evidence regarding teleoperated endovascular robotic systems, covering technical feasibility, communication infrastructure, and clinical outcomes. The review further identified research gaps and future directions. Following PRISMA guidelines, 16 studies were included that met the inclusion criteria out of 2501 initial search results. We found that teleoperated catheters and guidewires, driven by mechanical or electromagnetic systems, can be navig

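Paper summary: Remote robotic-assisted endovascular intervention can reduce clinician radiation exposure and extend specialized care to distant regions. Following PRISMA guidelines, this systematic review included 16 studies out of 2501 initial search results, covering technical feasibility, communication infrastructure, and clinical outcomes. It reports on teleoperated navigation of catheters and guidewires, notes that the field remains underexplored, especially for time-sensitive interventions such as mechanical thrombectomy, and identifies research gaps.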

ChainFlow-VLA: Causal Flow Planning with Vision-Language Models

First author: Xiyang Wang · Area: General-purpose VLA models · Source: cs.RO

Abstract:Current end-to-end autonomous driving systems are fundamentally limited by a mismatch between temporal causal reasoning and global trajectory consistency. Autoregressive (AR) models capture interaction-aware temporal dependencies via causal factorization, but their step-wise decoding leads to error accumulation and suboptimal global structure. In contrast, diffusion models optimize trajectories globally but lack explicit causal constraints, making them unreliable in interactive and safety-critical scenarios. This dichotomy reveals a deeper issue: existing methods treat causal modeling and global optimization as separate paradigms, without a principled way to unify them within a single trajectory distribution. To address this, we propose ChainFlow-VLA, which unifies causal generation and global refinement within a unified probabilistic framework. We formulate planning as a mixtur

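Paper summary: Current end-to-end autonomous driving systems face a mismatch between temporal causal reasoning and global trajectory consistency. Autoregressive models are causal but accumulate errors and yield suboptimal global structure; diffusion models optimize globally but lack explicit causal constraints. This paper proposes ChainFlow-VLA, which unifies causal generation and global refinement in one probabilistic framework, recasting planning as forward flow matching over a mixture distribution.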

IntentionNav: A Benchmark for Intent-Driven Object Navigation from Implicit Human Instruction

First author: Lin Qian · Area: Navigation and locomotion · Source: cs.RO

Abstract:Existing object navigation benchmarks usually tell an embodied agent which object category to find, such as microwave or chair. Human-facing embodied AI is often asked something less direct: "I need something to warm this food" or "the room feels stuffy." The agent must infer the object that can satisfy the need, find a scene-grounded instance, and decide whether the goal has been reached. We study this setting as intent-driven object navigation and introduce IntentionNav, a diagnostic benchmark for active object search from implicit human instructions. Each episode provides a free-text intent, RGB-D observations, and pose, but withholds the target object name. IntentionNav contains 500 intents over 176 Isaac Sim scenes and 64 target categories. Each intent is rewritten in four controlled instruction styles and annotated with one of four intent modes, separating surface phrasing

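Paper summary: Existing object navigation benchmarks usually name the target object category directly. This paper studies intent-driven object navigation, where the agent must infer the target from an implicit human instruction (for example, "I need something to warm this food"). It introduces the IntentionNav benchmark, with 500 intents over 176 Isaac Sim scenes and 64 target categories, four controlled instruction styles, and four intent modes, to diagnose how well agents reason from implicit intent and find the object.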

SCRIPT: Scalable Diffusion Policy with Multi-stage Training for Language-driven Physics-Based Humanoid Control

First author: Jingyan Zhang · Area: Policy learning · Source: cs.RO

Abstract:Controlling physics-based humanoids from natural-language instructions is a critical step toward general-purpose embodied agents. However, existing methods remain constrained by a tension between semantic expressiveness and physical feasibility, often failing to jointly achieve faithful instruction following, high-quality motion, and stable long-horizon control. We propose SCRIPT, a scalable diffusion policy with a multi-stage training framework for language-driven physics-based humanoid control. The core of SCRIPT is a Joint Action-State-Text Diffusion Transformer (JAST-DiT), which represents actions, physical states, and text as dedicated token streams and couples them through joint attention, enabling direct interaction between language semantics and control dynamics. To stabilize autoregressive control, we introduce a nonlinear history conditioning mechanism, which preserves

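Paper summary: Controlling physics-based humanoids from natural-language instructions is a key step toward general-purpose embodied agents. This paper proposes SCRIPT, a scalable diffusion policy with a multi-stage training framework. Its core is a Joint Action-State-Text Diffusion Transformer (JAST-DiT), which couples language semantics and control dynamics through joint attention. A nonlinear history conditioning mechanism stabilizes autoregressive control.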

GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation

First author: Kaichen Zhou · Area: Robot manipulation · Source: cs.RO

Abstract:Video world models can generate realistic futures from a single instruction, but they often fail to preserve consistent point-level motion over time. As a result, the generated videos appear plausible, yet lack the physical grounding required for reliable action execution, such as robot manipulation. We present GEM-4D, a geometry-grounded video world model that resolves this limitation by injecting dense 4D correspondence supervision, distilled from a pretrained geometry foundation model, into the video generative backbone during training. This supervision enables the model to jointly capture appearance and geometric structure while retaining a single-stream architecture with no additional inference cost. We further introduce an inverse dynamics module that converts correspondence-consistent video rollouts into executable robot trajectories, enabling direct deployment in both re

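Paper summary: Video world models often fail to preserve consistent point-level motion, so their videos look plausible but lack the physical grounding needed for reliable execution. This paper presents GEM-4D, a geometry-enhanced video world model that injects dense 4D correspondence supervision, distilled from a pretrained geometry foundation model, during training to jointly capture appearance and geometric structure. An inverse dynamics module converts video rollouts into executable robot trajectories.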

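Market Overview

Taken together, the technical picture shows a divided market. The major US equity benchmarks, the S&P 500 and Nasdaq 100 ETFs, remain in uptrends with prices near their 52-week highs, but both have printed MACD death crosses, and the Nasdaq 100's RSI of 71.4 is in overbought territory, suggesting that short-term upside momentum may be tested. Crypto sentiment leans fearful: the Fear & Greed Index stands at 30, total market cap is $2.66 trillion, Bitcoin and Ethereum are under pressure with RSIs below 50, and most tracked coins are in bearish alignment, although Bitcoin is oscillating around key moving averages without a one-sided breakout. Chinese stocks are broadly weak: names such as Alibaba and PDD are in bearish alignment with RSIs in the weak 40-45 range and clear downtrends. In commodities and FX, gold and crude oil futures both have RSIs below 50, and crude has fallen 11.10% over five days; the US Dollar Index is 1.7% below its 52-week high and in bullish alignment; the 10-year Treasury yield is also in bullish alignment, reflecting some safe-haven demand and rate expectations.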
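The RSI(14) and MACD readings quoted in this report can be reproduced from a series of closing prices. The sketch below is a minimal Python illustration, assuming Wilder smoothing for RSI and the common 12/26/9 EMA settings for MACD; the report does not state which variants or parameters its data pipeline uses, and the function names are hypothetical.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line), both aligned with closes."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    line = [f - s for f, s in zip(fast_ema, slow_ema)]
    return line, ema(line, signal)


def rsi(closes, period=14):
    """Wilder-smoothed RSI at the last close; needs period + 1 closes."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closes")
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    # Seed with simple averages, then apply Wilder's recursive smoothing.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # flat series: treat as neutral
    if avg_loss == 0:
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


rising = [float(i) for i in range(1, 41)]
line, sig = macd(rising)
print(rsi(rising), line[-1] > sig[-1])
```

A "death cross" in the sense used here corresponds to the MACD line moving from above to below its signal line, and a "golden cross" to the reverse.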

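Today's Watchlist

AAPL Apple
Leaning up

Price is 308.82, well above the 20-day moving average (289.35) and the 50-day moving average (270.55), keeping the bullish alignment intact. RSI(14) is 78.4, in overbought territory. The MACD line (9.97) is above the signal line (8.99), indicating firm short-term momentum. Price sits just 0.83% below its 52-week high, consistent with a strong uptrend.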

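PDD Pinduoduo (PDD)
Leaning down

Price is 94.52, below the 20-day moving average (97.92) and the 50-day moving average (99.63), forming a bearish alignment. RSI(14) is 41.7, below the neutral 50 level but still above the conventional oversold threshold of 30. The MACD line (-1.09) is below the signal line (-0.95), a death cross. Price is only 2.11% above its 52-week low, and the indicators together point to technical weakness and downside pressure.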

BTC-USD Bitcoin
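Neutral

Price is 77283.93, slightly below the 20-day moving average (78835.84) but above the 50-day moving average (76937.55), so the moving averages give no clear direction. RSI(14) is 48.2, close to the neutral 50 level. The MACD line (-198.80) is well below the signal line (232.83), but the histogram is not expanding noticeably. Overall, the indicators show no clear trend.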

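USDCNY=X USD / CNY
Neutral

Price is 6.78, below the 20-day, 50-day, and 200-day moving averages (6.81, 6.84, 7.00), forming a bearish alignment. RSI(14) is 36.9, on the low side. The MACD line (-0.0124) is above the signal line (-0.0133) after a golden cross four days ago. Price is only 0.06% above its 52-week low. The bearish alignment conflicts with the MACD golden cross, so direction is unclear.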
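The alignment and MACD readings in the watchlist can be checked mechanically from the quoted numbers. The sketch below is illustrative only: the report does not state how it assigns trend labels, so the rule used here (price versus the 50-day and 200-day SMAs) is an assumption, as are the function names. The input values are the ones quoted in this report.

```python
def ma_alignment(price, sma50, sma200):
    """Classify trend from price and two moving averages (assumed rule)."""
    if price > sma50 > sma200:
        return "bullish"
    if price < sma50 < sma200:
        return "bearish"
    return "neutral"


def macd_position(macd_line, signal_line):
    """Report whether the MACD line sits above or below its signal line."""
    return "above signal" if macd_line > signal_line else "below signal"


# (price, SMA 50, SMA 200) as quoted in the report.
quotes = {
    "AAPL": (308.82, 270.55, 261.55),
    "PDD": (94.52, 99.63, 113.77),
    "BTC-USD": (77283.93, 76937.55, 80405.70),
    "USDCNY=X": (6.78, 6.84, 7.00),
}
labels = {ticker: ma_alignment(*values) for ticker, values in quotes.items()}
print(labels)
print(macd_position(-0.0124, -0.0133))  # USDCNY=X MACD / signal
```

Under this assumed rule, USDCNY=X is classified as bearish while its MACD line sits above the signal line, which is the conflict the watchlist entry describes.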

All Assets

^VIX

VIX Fear Index

16.65 -0.30% · 5-day -6.57% · From 52w high -52.8% · RSI(14) 40.5 · Trend: neutral
SMA 20 / 50 / 200: 17.56 / 20.57 / 18.36 · MACD / signal: -0.786 / -0.851

^TNX

10Y US Treasury Yield (%)

4.56 -0.61% · 5-day -0.81% · From 52w high -8.8% · RSI(14) 60.1 · Trend: bullish
SMA 20 / 50 / 200: 4.46 / 4.37 / 4.20 · MACD / signal: 0.073 / 0.062
Bullish alignment

DX-Y.NYB

US Dollar Index DXY

98.96 -0.36% · 5-day -0.01% · From 52w high -1.7% · RSI(14) 52.8 · Trend: bullish
SMA 20 / 50 / 200: 98.63 / 98.94 / 98.56 · MACD / signal: 0.138 / 0.039
Near 52-week high · Bullish alignment

SPY

S&P 500 ETF

$745.64 +0.39% · 5-day +0.88% · From 52w high -0.5% · RSI(14) 68.8 · Trend: bullish
SMA 20 / 50 / 200: 731.58 / 696.68 / 678.85 · MACD / signal: 12.353 / 13.312
MACD death cross (4 days ago) · Near 52-week high · Bullish alignment

QQQ

Nasdaq 100 ETF

$717.54 +0.42% · 5-day +1.21% · From 52w high -0.6% · RSI(14) 71.4 · Trend: bullish
SMA 20 / 50 / 200: 694.91 / 642.10 / 614.66 · MACD / signal: 20.430 / 21.777
MACD death cross (3 days ago) · RSI overbought · Near 52-week high · Bullish alignment

AAPL

Apple

$308.82 +1.26% · 5-day +2.86% · From 52w high -0.8% · RSI(14) 78.4 · Trend: bullish
SMA 20 / 50 / 200: 289.35 / 270.55 / 261.55 · MACD / signal: 9.971 / 8.993
RSI overbought · Near 52-week high · Bullish alignment

MSFT

Microsoft

$418.57 -0.12% · 5-day -0.79% · From 52w high -24.6% · RSI(14) 54.4 · Trend: neutral
SMA 20 / 50 / 200: 416.61 / 400.44 / 460.40 · MACD / signal: 3.772 / 4.140

NVDA

Nvidia

$215.33 -1.90% · 5-day -4.43% · From 52w high -9.0% · RSI(14) 53.7 · Trend: bullish
SMA 20 / 50 / 200: 214.75 / 196.81 / 187.03 · MACD / signal: 6.906 / 7.777
MACD death cross (1 day ago) · Bullish alignment

GOOGL

Alphabet

$382.97 -1.21% · 5-day -3.48% · From 52w high -6.3% · RSI(14) 57.5 · Trend: bullish
SMA 20 / 50 / 200: 385.48 / 341.14 / 296.19 · MACD / signal: 13.615 / 17.149
MACD death cross (4 days ago) · Bullish alignment

TSLA

Tesla

$426.01 +1.95% · 5-day +0.89% · From 52w high -14.6% · RSI(14) 58.3 · Trend: neutral
SMA 20 / 50 / 200: 409.26 / 388.33 / 410.03 · MACD / signal: 10.171 / 10.819
MACD death cross (2 days ago)

META

Meta

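$610.26 +0.47% · 5-day -0.65% · From 52w high -23.4% · RSI(14) 45.3 · Trend: bearish
SMA 20 / 50 / 200: 619.10 / 617.81 / 669.43 · MACD / signal: -7.099 / -6.071
Bearish alignment

Crypto Fear & Greed: 30 (Fear) · Total crypto market cap: $2.66 T (+0.60% / 24h) · BTC dominance: 58.2% (ETH 9.6%) · 24h volume: $66.9 B · Active coins: 17,390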

BTC-USD

Bitcoin

$77,283.93 +0.39% · 5-day -0.22% · From 52w high -38.8% · RSI(14) 48.2 · Trend: neutral
SMA 20 / 50 / 200: 78,835.84 / 76,937.55 / 80,405.70 · MACD / signal: -198.804 / 232.827

ETH-USD

Ethereum

$2,114.17 +0.77% · 5-day -0.60% · From 52w high -57.3% · RSI(14) 39.6 · Trend: bearish
SMA 20 / 50 / 200: 2,210.88 / 2,264.12 / 2,540.82 · MACD / signal: -51.445 / -39.063
Bearish alignment

SOL-USD

Solana

$85.84 +0.69% · 5-day -0.23% · From 52w high -66.1% · RSI(14) 46.8 · Trend: bearish
SMA 20 / 50 / 200: 88.93 / 86.51 / 106.77 · MACD / signal: -0.556 / 0.001
Bearish alignment

BABA

Alibaba (BABA)

$130.00 -1.12% · 5-day -1.95% · From 52w high -32.5% · RSI(14) 44.1 · Trend: bearish
SMA 20 / 50 / 200: 135.08 / 131.77 / 149.50 · MACD / signal: -0.084 / 0.803
MACD death cross (4 days ago) · Bearish alignment

PDD

Pinduoduo (PDD)

$94.52 -3.34% · 5-day -1.37% · From 52w high -32.2% · RSI(14) 41.7 · Trend: bearish
SMA 20 / 50 / 200: 97.92 / 99.63 / 113.77 · MACD / signal: -1.093 / -0.952
MACD death cross (today) · Near 52-week low · Bearish alignment

JD

JD.com (JD)

$30.52 -3.02% · 5-day -4.65% · From 52w high -17.2% · RSI(14) 47.7 · Trend: neutral
SMA 20 / 50 / 200: 30.96 / 29.95 / 30.49 · MACD / signal: 0.536 / 0.612
MACD death cross (today)

0700.HK

Tencent Holdings (0700.HK)

HK$441.40 +0.55% · 5-day -3.29% · From 52w high -35.4% · RSI(14) 34.8 · Trend: bearish
SMA 20 / 50 / 200: 464.77 / 494.09 / 576.98 · MACD / signal: -14.023 / -12.911
Near 52-week low · Bearish alignment

GC=F

Gold Futures

$4,523.20 +0.05% · 5-day -0.64% · From 52w high -19.0% · RSI(14) 40.2 · Trend: neutral
SMA 20 / 50 / 200: 4,603.03 / 4,658.28 / 4,354.13 · MACD / signal: -49.050 / -39.612

CL=F

WTI Crude Oil Futures

$96.60 +0.00% · 5-day -11.10% · From 52w high -19.1% · RSI(14) 46.9 · Trend: neutral
SMA 20 / 50 / 200: 101.00 / 98.34 / 71.68 · MACD / signal: 0.728 / 1.598
MACD death cross (2 days ago)

USDCNY=X

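USD / CNY

¥6.78 -0.27% · 5-day -0.37% · From 52w high -5.9% · RSI(14) 36.9 · Trend: bearish
SMA 20 / 50 / 200: 6.81 / 6.84 / 7.00 · MACD / signal: -0.012 / -0.013
MACD golden cross (4 days ago) · Near 52-week low · Bearish alignment

Risk Notice

This report presents technical indicators computed from public market data and is intended only as a reference for interpreting those indicators. Technical analysis relies on historical price data; past performance does not indicate future results, and nothing here constitutes investment advice.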

Iran War Live Updates: Peace Deal Could Take Days to Nail Down

Oil prices fell sharply Monday, as negotiations between the United States and Iran appeared to continue. But both sides have offered conflicting accounts of the emerging agreement.

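Summary: Oil prices fell sharply on Monday as US-Iran negotiations continued, but the two sides gave conflicting accounts of the emerging agreement.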

Oil prices slide on hopes of US-Iran peace deal

Trump said on Saturday that an agreement would include the reopening of the Strait of Hormuz, without giving further details.

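Summary: Oil prices slid on hopes of a US-Iran peace deal. Trump said the agreement would include reopening the Strait of Hormuz but gave no further details.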

Middle East crisis live: Trump suggests Gulf countries should sign Abraham accords recognising Israel under any deal

Writing on social media, US president says countries should make settlement with Iran ‘a far more Historic Event’ Iran denies deal with US is imminent, despite some progress Ebrahim Rezaei, the spokesperson of the Iranian parliament’s national security and foreign policy commission, has said that ti

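Summary: Trump suggested that Gulf countries sign the Abraham accords recognising Israel. Iran denied that a deal is imminent while acknowledging some progress.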

Iran denies deal with US is imminent despite some progress

Tehran says ‘contradictory statements’ from US and Israeli interference hindering negotiations Middle East crisis – live updates Iran has poured cold water on suggestions that a deal with the US is imminent, pointing to the confusion in US positions and Israeli interference as key factors in why a c

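Summary: Iran denied that a deal with the US is imminent, saying contradictory US positions and Israeli interference are hindering the negotiations.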

Oil prices fall below $100 a barrel on hopes of Iran peace deal

Brent crude futures down 6% to lowest level in two weeks and stock markets rise Oil prices fell below $100 a barrel on Monday and stock markets rose on hopes that the US and Iran are inching closer to a peace deal. Brent crude futures, the global oil benchmark, were down 6% to $97.28 a barrel, the l

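Summary: Oil prices fell below $100 a barrel on Monday, with Brent crude futures down 6% to a two-week low on hopes of a US-Iran peace deal, while stock markets rose.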

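Xi Jinping meets Pakistani Prime Minister Sharif; the two praise each other's US-Iran mediation roles

Xi Jinping met Pakistani Prime Minister Shehbaz Sharif in Beijing on Monday and said that "the two sides should continue to maintain close communication and coordination, and jointly oppose unilateralism and Cold War thinking."

Summary: Xi Jinping met Pakistani Prime Minister Sharif in Beijing. The two sides stressed continued communication and coordination, jointly opposed unilateralism and Cold War thinking, and praised each other's mediation roles in the US-Iran conflict.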

Pope Leo Warns of Risks From A.I. in 42,300-Word Encyclical

The document marks a powerful foray by the leader of the Roman Catholic Church into the debate about the misuse or overuse of artificial intelligence.

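Summary: In a 42,300-word encyclical, Pope Leo warns of the risks of artificial intelligence, marking the entry of the leader of the Roman Catholic Church into the debate over the misuse or overuse of AI.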

Cubans Cook With Charcoal and Wood Fires to Survive During Energy Crisis

The U.S. oil blockade has left millions without cooking gas. In Santiago de Cuba, the cradle of the Cuban revolution, apartment tower residents resort to charcoal and firewood.

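Summary: The US oil blockade has left millions of Cubans without cooking gas. In Santiago de Cuba, the cradle of the revolution, residents have resorted to cooking with charcoal and firewood.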

Trump Tower in Georgia to be built on land part-owned by son of US sanctions-hit leader

Links between Trump Organization and Ivanishvili family for Tbilisi skyscraper raise new conflict of interest concerns A Trump Tower planned for the Georgian capital, Tbilisi, is to be built on land currently part-owned by the son of the US-sanctioned leader of the country, according to official rec

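Summary: A Trump Tower planned for Tbilisi, the Georgian capital, is to be built on land part-owned by the son of the country's US-sanctioned leader, raising new conflict of interest concerns.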

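China becomes the top source of investment projects in Germany in 2025

In 2025, the number of foreign investment projects in Germany fell to its lowest level in years. By source country, China overtook the United States to become the country with the most investment projects in Germany.

Summary: The number of foreign investment projects in Germany fell to a multi-year low in 2025, and China overtook the United States as the country with the most investment projects there.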

Pope Leo says AI ‘needs to be disarmed’

Pontiff warns of dangers of a technological revolution driven by ‘the idolatry of profit’

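Summary: Pope Leo says artificial intelligence "needs to be disarmed." He warns of the dangers of a technological revolution driven by "the idolatry of profit."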

Iran War Squeezes India’s Gas Power Supply as Demand Hits Record

India’s gas power generation has plunged to the lowest level in at least six years, with the Iran war hitting fuel shipments, straining electricity supplies at a time when a scorching summer is pushing demand to record highs.

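Summary: The Iran war has hit fuel shipments, sending India's gas power generation to its lowest level in at least six years. At the same time, a scorching summer is pushing electricity demand to record highs and straining supply.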

Italian Stocks Hit First Record in 26 Years Led by Energy, Chips

Italy’s benchmark equity index rose past its all-time closing high set in 2000, with a recent rally in energy and chip stocks supercharging it to record levels.

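Summary: Italy's benchmark equity index rose past its all-time closing high set in 2000, its first record in 26 years. A recent rally in energy and chip stocks drove the move.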

Zepto Said to Plan June Public Filing for $1 Billion India IPO

Rapid-commerce firm Zepto Ltd. is preparing to publicly file in the first half of June for an initial public offering that may raise up to $1 billion, according to people familiar with the matter.

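Summary: According to people familiar with the matter, rapid-commerce firm Zepto Ltd. is preparing to publicly file in the first half of June for an initial public offering that may raise up to $1 billion.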

Italian bank CDP to raise stake in payments group Nexi after CVC weighs bid

Cassa Depositi e Prestiti’s move comes as buyout firm had considered €9bn bid for Milan-listed group

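Summary: Italian state-owned bank Cassa Depositi e Prestiti plans to raise its stake in payments group Nexi. The move comes after buyout firm CVC considered a €9bn bid for the Milan-listed company.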

Iran Says US Deal Not Imminent | The Pulse 5/25/2026

Iran has denied that a deal to end the war with the US is imminent, after US officials said they were closing in on a deal. Secretary of State Marco Rubio said an agreement could come as soon as Monday, but Iran’s Foreign Ministry Spokesman Esmail Baghaei said “It is true that a consensus was reache

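Summary: Iran denied that a deal with the US to end the war is imminent. US officials had earlier said the two sides were closing in on a deal, and Secretary of State Marco Rubio said an agreement could come as soon as Monday.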

FirstFT: Trump strikes more cautious tone on peace talks with Iran

Also in today’s newsletter: US consumers face spending squeeze, the biopic threatening to derail Flávio Bolsonaro’s presidential run

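Summary: According to the Financial Times, Trump struck a more cautious tone on peace talks with Iran. The newsletter also covers the spending squeeze facing US consumers, among other items.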

Iran Says US Deal Not Imminent | The Opening Trade 5/25/2026

Iran's Foreign Ministry spokesman says a deal to open the Strait of Hormuz is not imminent, but that consensus has been reached on many issues. Earlier, oil fell after senior US officials said Washington and Tehran were closing in on an agreement to open the vital waterway. The Opening Trade has eve

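Summary: Iran's Foreign Ministry spokesman said a deal to open the Strait of Hormuz is not imminent, but that consensus has been reached on many issues. Earlier, oil fell after senior US officials said Washington and Tehran were closing in on an agreement.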

Swiss Trader Had Lucrative Role Getting Iraqi Oil Through Hormuz

A little-known Swiss trading company played a key role in the transit through the Strait of Hormuz of an oil supertanker whose stop-start journey captivated the oil market earlier this month, according to people familiar with the matter.

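Summary: According to people familiar with the matter, a little-known Swiss trading company played a key role in moving Iraqi oil through the Strait of Hormuz aboard a supertanker whose journey captivated the oil market earlier this month.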

AI guardrails stripped from Meta and Google models in minutes

Software designed to remove safety protections creates systems that provide responses on biological weapons and malware

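Summary: Software designed to remove AI safety guardrails can strip protections from Meta and Google models in minutes, producing systems that respond on dangerous topics such as biological weapons and malware.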

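Do not use this site to generate 😍 images; if it happens again, the site will become invite-only.

All chat and image-generation errors are logged in the backend, and I can log in at any time to check the cause of an error. This is not an isolated case. Errors are logged to help troubleshoot problems and optimize the code, but a large share of them come from image generation, which wastes a lot of my time. If large numbers of these failed requests continue, I will make the chat site invite-only. As the screenshot shows, every error was caused by image generation. To repeat: generating such images is prohibited. To prevent this from happening again, chatting now requires logging in. 28 posts - 27 participants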

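星辰AI ClaudeMax discount group now live at 0.88

The 星辰AI ClaudeMax discount group is now live at 0.88. Nothing more to it: 星辰AI has launched it, thanks to Anthropic!!! This channel is a bargain deal, so top up only as needed. In plain terms: top up a little, and only as much as you will use. As the saying goes, the heart wants what it cannot have, and human desire is a gulf deep enough to hold the sea~ 12 posts - 12 participants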

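I feel I was fooled by AI's false sense of accomplishment; goodbye, GPT freebies

For a while I had been frantically exploiting free GPT quota, thinking every day that the quota must not go to waste and the tokens had to be used up, so I opened many threads and made up many tasks. I looked busy every day: project optimization, document cleanup, solution analysis, study planning, running everything through AI. I also kept having AI write project study documents for me, over and over, without ever actually starting. Chasing freebies wasted a lot of my time and scattered my attention. But recently something started to feel off. The plan I made in March to start looking for internships in April has barely begun; I have hardly memorized any standard interview questions, and I do not fully understand my own projects, so I began reflecting on what I had been doing. Many of the so-called tasks were not things I actually needed; I forced them into existence just to use up quota. AI can indeed produce things quickly, but the problem is that much of that output never really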

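Used AI to make a menu sign for a street-stall owner, and the owner wanted to kowtow to me 😨 (prompt updated)

This was the first time I so directly felt the small shock AI can give non-technical people. The sign GPT generated is below. Many forum members asked for the prompt. I did not write it very professionally, but here is my process: based on the image I uploaded, cut out the real food as material, make the soup pot and ingredient pot transparent as a partial background, and generate an appetizing food introduction and price list; find material for the add-ons area online for now. Put a photo of the actual item before each name. Generate one version in 9:16, another in 16:9, another in a simple line-drawing style, and another in a social-media style along the lines of "recommended by XXX". If the layout has room, add a classic or recommended combination near the add-ons area, and be sure to include the real noodle photo I uploaded so the food looks as it really does. It then generated many versions; this is the one I liked best. The other versions were not satisfactory, so I will not show them. Then I suddenly had the idea to do a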

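I see many people don't know how to write a registration bot, so here is a zero-experience tutorial on having AI write browser automation (not protocol-level registration)

This is a zero-experience tutorial. Yes, zero experience, so those with a coding background please do not flame me; I am a man with zero coding background. Alternatively, in Chrome's F12 tools, record your actions, then play them back on the page you want to automate. The item in the red box is the element selector; click it to pick the position where we want automated input, which is effectively locating the element. You will notice that as you select elements on the page, the element code on the right follows along, and as you move the pointer over the code on the right, the page on the left highlights too. In the code on the right, right-click the input you want to target and copy it, then give the copied outerHTML to the AI. Developer mode must be turned on in the extensions page; then load the unpacked extension, which is the folder where you had the AI write the extension. When writing it, I recommend having the AI add: browser right-side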

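[Open source] AI novel-writing desktop app

This post uses the community's open-source promotion and meets the promotion requirements. I declare and follow the community's requirements below: My post carries the open-source promotion tag: Yes. My open-source project is fully open source with no closed parts: Yes. My open-source project links to and acknowledges the LINUX DO community: Yes. The AI-generated or AI-polished parts of the project introduction in my post are posted as screenshots: Yes. I commit that the choices above are permanently valid and accept supervision by the community and fellow members: Yes. The project introduction follows; AI-generated or AI-polished content is posted as screenshots. I have had tokens to spare lately, so I tried using AI to build a tool from scratch. GitHub already has quite a few open-source novel-writing Skills, such as the oh-story-claudecode project, which distills a great deal of excellent prompt-engineering experience: how to write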

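[Frontier Slow News] Rumor: Claude Opus4.8 may be released tomorrow

OpenAI's GPT5.6 is also about to be released. A reliable third-party source reportedly says Claude Opus 4.8 may get ahead of GPT5.6 and ship early. The race for first-mover advantage between the two new models is heating up; let's wait and see. 37 posts - 35 participants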

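My company no longer allows foreign large models

All employees are strictly prohibited from accessing or using overseas AI large models (including but not limited to ChatGPT, Claude, and Gemini) in any form (including web pages, APIs, clients, and relay services) on the company's office network, office equipment, or in business scenarios. Violators will be dealt with seriously under the rules. 83 posts - 52 participants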