Auto-filtered. Here's why, so you know you're not missing out:
How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment
→ Single-source paper, low reader value
Causal Scaffolding for Physical Reasoning: A Benchmark for Causally-Informed Physical World Understanding in VLMs
→ Single-source paper, low reader value
SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations
→ Single-source paper, low reader value
Temporal Preference Concepts and their Functions in a Large Language Model
→ Single-source paper, low reader value
Evaluating Agentic Configuration Repair for Computer Networks
→ Single-source paper, low reader value
Benchmark Everything Everywhere All at Once
→ Single-source paper, low reader value
Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming
→ Single-source paper, low reader value
Search-Time Contamination in Deep Research Agents: Measuring Performance Inflation in Public Benchmark Evaluation
→ Single-source paper, low reader value