Large Language Models (LLMs)
The latest news, technical breakthroughs, and industry applications of large language models such as GPT, Claude, Llama, and Gemini.
CAM-LDS: Cyber Attack Manifestations for Automatic Interpretation of System Logs and Security Alerts
The Cyber Attack Manifestation Log Data Set (CAM-LDS) is an open-source dataset designed to train Large Language Models ...
CodeTaste: Can LLMs Generate Human-Level Code Refactorings?
Researchers from Carnegie Mellon University and Microsoft Research introduced CodeTaste, a benchmark evaluating LLMs' ab...
Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Language Model
The Bielik-Q2-Sharp study systematically evaluates six 2-bit quantization methods on the Polish Bielik-11B-v2.3-Instruct...
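As a rough illustration of what 2-bit quantization entails, the sketch below applies symmetric per-group 2-bit quantization to a weight vector. The group size, level placement, and function names are assumptions for illustration, not any of the six methods evaluated in the study:

```python
import numpy as np

def quantize_2bit(weights, group_size=4):
    """Symmetric 2-bit quantization: each group of weights is mapped to one
    of four levels {-1.5, -0.5, 0.5, 1.5} times a per-group scale."""
    w = weights.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 1.5  # outer level hits group max
    scale = np.where(scale == 0, 1.0, scale)
    codes = np.clip(np.round(w / scale - 0.5), -2, 1)   # 2-bit integer codes
    dequant = (codes + 0.5) * scale                     # reconstruct the 4 levels
    return codes.astype(np.int8), dequant.reshape(weights.shape)

w = np.array([0.9, -0.3, 0.1, -1.2, 0.05, 0.6, -0.7, 0.0], dtype=np.float32)
codes, w_hat = quantize_2bit(w)
```

With 4 levels per weight plus one scale per group, storage drops to roughly 2 bits per parameter; the reconstruction error is bounded by half a quantization step per group.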
Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization
Dynamic Pruning Policy Optimization (DPPO) is a novel framework that accelerates Group Relative Policy Optimization (GRP...
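To make the "unbiased pruning" idea concrete, here is a minimal sketch: compute GRPO-style group-relative advantages, randomly drop low-advantage completions, and up-weight survivors by the inverse keep probability so the expected contribution is unchanged. The median-based pruning rule and the 1/p reweighting are illustrative assumptions, not DPPO's exact criterion:

```python
import numpy as np

def group_advantages(rewards):
    """GRPO-style group-relative advantage: reward minus group mean,
    divided by group standard deviation."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def prune_group(advantages, keep_prob=0.5, rng=None):
    """Drop below-median-|advantage| samples with probability 1 - keep_prob
    and up-weight survivors by 1/keep_prob, keeping the estimator unbiased."""
    rng = np.random.default_rng() if rng is None else rng
    adv = np.asarray(advantages)
    low = np.abs(adv) < np.median(np.abs(adv))
    keep = ~low | (rng.random(adv.shape) < keep_prob)
    weights = np.where(low, 1.0 / keep_prob, 1.0) * keep
    return keep, weights

adv = group_advantages([1.0, 2.0, 3.0, 4.0])
# Empirically check unbiasedness: averaged over many draws, the pruned and
# reweighted contribution matches the full-group contribution.
est = np.zeros_like(adv)
for seed in range(2000):
    _, w = prune_group(adv, rng=np.random.default_rng(seed))
    est += w * adv
est /= 2000
```

The point of the inverse-probability weights is that pruning saves compute on low-signal completions without biasing the policy gradient in expectation.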
Data-Aware Random Feature Kernel for Transformers
DARKFormer introduces a novel Transformer architecture that addresses the scaling bottleneck in attention mechanisms by ...
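The snippet names a random-feature kernel as the route around quadratic attention. A generic (not data-aware) version of that idea, in the style of Performer's positive random features, looks like this; all names and the feature count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_feature_attention(Q, K, V, num_features=256):
    """Linear-complexity attention: approximate exp(q.k) with positive random
    features phi, then use associativity Qp @ (Kp.T @ V) to avoid the n x n map."""
    d = Q.shape[-1]
    W = rng.normal(size=(d, num_features))
    def phi(X):
        # E[phi(q) . phi(k)] = exp(q . k) for Gaussian W (Performer-style).
        return np.exp(X @ W - (X**2).sum(-1, keepdims=True) / 2) / np.sqrt(num_features)
    Qp, Kp = phi(Q), phi(K)
    num = Qp @ (Kp.T @ V)                        # O(n * m * d), never forms n x n
    den = Qp @ Kp.sum(axis=0, keepdims=True).T   # row-wise softmax normalizer
    return num / den

n, d = 8, 16
Q, K, V = (0.1 * rng.normal(size=(n, d)) for _ in range(3))
approx = random_feature_attention(Q, K, V)
# Exact softmax attention for comparison.
scores = np.exp(Q @ K.T)
exact = (scores / scores.sum(-1, keepdims=True)) @ V
```

The "data-aware" contribution presumably replaces the fixed Gaussian projection `W` with one adapted to the input distribution; the code above shows only the baseline kernel trick.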
A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality
Researchers developed a multi-dimensional quality scoring framework for decentralized LLM inference that decomposes text...
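A decomposed quality score of this kind typically reduces to a weighted aggregate over per-dimension scores. The dimension names and weights below are hypothetical placeholders, not the six dimensions defined in the paper:

```python
# Hypothetical dimension names and weights, for illustration only.
DIMENSIONS = {
    "relevance": 0.25, "factuality": 0.25, "coherence": 0.15,
    "completeness": 0.15, "fluency": 0.10, "safety": 0.10,
}

def aggregate_quality(scores: dict) -> float:
    """Weighted aggregate of per-dimension scores, each in [0, 1]."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("scores must cover exactly the six dimensions")
    return sum(DIMENSIONS[d] * scores[d] for d in DIMENSIONS)

example = {"relevance": 0.9, "factuality": 0.8, "coherence": 0.7,
           "completeness": 0.6, "fluency": 0.9, "safety": 1.0}
overall = aggregate_quality(example)
```

Keeping the per-dimension scores exposed (rather than only the aggregate) is what lets a decentralized network audit *why* an output was rewarded or penalized.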
Spectral Surgery: Training-Free Refinement of LoRA via Gradient-Guided Singular Value Reweighting
Spectral Surgery is a novel training-free refinement technique for Low-Rank Adaptation (LoRA) that improves AI model per...
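Mechanically, "singular value reweighting" of a LoRA module can be sketched as follows: take the SVD of the low-rank update ΔW = BA and rescale its spectrum without any further training. The halve-the-tail rule used here is a stand-in for the paper's gradient-guided criterion:

```python
import numpy as np

def spectral_reweight(A, B, weights_fn):
    """Training-free refinement sketch: SVD the low-rank LoRA delta W = B @ A,
    then rescale its singular values with a caller-supplied rule."""
    delta = B @ A                                     # (out_dim, in_dim) update
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    return U @ np.diag(weights_fn(s)) @ Vt            # reweighted spectrum

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 32))    # rank-4 LoRA down-projection
B = rng.normal(size=(64, 4))    # rank-4 LoRA up-projection
# Placeholder rule: keep the leading component, damp the rest by half.
refined = spectral_reweight(A, B, lambda s: np.concatenate([s[:1], 0.5 * s[1:]]))
```

Because only the singular values change, the refined update stays within the same rank-4 subspace the adapter originally learned.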
TFWaveFormer: Temporal-Frequency Collaborative Multi-level Wavelet Transformer for Dynamic Link Prediction
TFWaveFormer is a novel AI architecture that integrates Transformer models with multi-resolution wavelet decomposition f...
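The multi-resolution wavelet decomposition at the heart of such architectures can be illustrated with a plain Haar transform, which splits a signal into coarse approximations and detail bands at each level (a generic sketch of the idea, not TFWaveFormer's learned components):

```python
import numpy as np

def haar_decompose(x, levels=2):
    """Multi-level Haar wavelet decomposition of a 1-D signal: at each level,
    pairwise averages form the coarse band and pairwise differences form the
    detail band. Energy (sum of squares) is preserved across all bands."""
    coeffs = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2))  # detail band at this scale
        approx = (even + odd) / np.sqrt(2)        # coarse band, half resolution
    coeffs.append(approx)                         # final low-frequency residue
    return coeffs

bands = haar_decompose([4.0, 2.0, 6.0, 8.0, 3.0, 1.0, 5.0, 7.0], levels=2)
```

Each band isolates temporal structure at one frequency scale, which is what lets an attention model mix short-range and long-range dynamics separately.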
BD-Merging: Bias-Aware Dynamic Model Merging with Evidence-Guided Contrastive Learning
BD-Merging is a bias-aware dynamic model merging framework developed by UC Berkeley researchers that addresses performan...
Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
A new study reveals that current evaluation methods for role-playing AI agents are fundamentally biased, as models rely ...
CzechTopic: A Benchmark for Zero-Shot Topic Localization in Historical Czech Documents
CzechTopic is a new human-annotated benchmark for evaluating topic localization in historical Czech documents. It shifts...
Relational In-Context Learning via Synthetic Pre-training with Structural Prior
RDB-PFN is the first foundation model for relational databases, pre-trained entirely on over 2 million synthetically gen...
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
Researchers introduced Structure of Thought (SoT), a prompting technique that guides LLMs to construct intermediate text...
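In spirit, a Structure-of-Thought prompt asks the model to emit an explicit intermediate structure before producing the final output, rather than free-form reasoning. The template below is a hypothetical illustration of that pattern, not the paper's actual prompt:

```python
# Hypothetical SoT-style template: force an intermediate structured
# representation (a table) before the final structured output.
SOT_TEMPLATE = """\
Task: {task}

Step 1 - Extract the entities and relations as a table:
| entity | attribute | value |

Step 2 - Using only the table above, produce the final {format} output.
"""

prompt = SOT_TEMPLATE.format(task="Convert the order email into JSON.",
                             format="JSON")
```

Anchoring generation on the intermediate table constrains the final structured output to facts the model has already committed to, which is the usual motivation for structure-first prompting.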
Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning
COREA (COllaborative REAsoner) is a novel cascading system that combines small and large language models for cost-effici...
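The cascade logic itself is simple to sketch: answer with the small model first and escalate to the large model only when calibrated confidence falls below a threshold. The routing code below is illustrative; COREA's confidence-calibration method is an assumption here, represented by an opaque `confidence` callable:

```python
def cascade_answer(prompt, small_model, large_model, confidence, threshold=0.8):
    """Confidence-gated cascade: cheap model first, escalate only when the
    calibrated confidence in its answer is below the threshold."""
    answer = small_model(prompt)
    if confidence(prompt, answer) >= threshold:
        return answer, "slm"          # small model was confident enough
    return large_model(prompt), "llm" # escalate to the expensive model

# Toy stand-ins for demonstration only.
small = lambda p: "small:" + p
large = lambda p: "large:" + p
conf = lambda p, a: 0.9 if len(p) < 10 else 0.3  # short prompts -> confident

easy_answer, easy_route = cascade_answer("2+2", small, large, conf)
hard_answer, hard_route = cascade_answer("a long hard question", small, large, conf)
```

The cost savings come entirely from how often the gate stays closed, which is why calibration of the confidence signal (not just its average accuracy) is the crux of such systems.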
Understanding Parents' Desires in Moderating Children's Interactions with GenAI Chatbots through LLM-Generated Probes
A new study reveals parents want nuanced moderation tools for children's interactions with generative AI chatbots, movin...
Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning
The CoCo-TAMP framework integrates large language models into robotic planning to address partial observability challeng...
Mathematicians in the age of AI
AI systems have reached a critical threshold where they can prove both formally verified and informally stated research-...
Dark-horse image model praised by the Nano Banana tech lead! A 15-person all-Chinese team, led by the father of DDIM and a CVPR Best Paper author
DeepSeek has officially released the DeepSeek-V2 large language model, built on the novel MLA architecture and DeepSeekMoE technology, with 236B total parameters of which only 21B are activated at inference time. On the C-Eval and MMLU benchmarks the model approaches GPT-4 Turbo, with math and code capabilities on par with C...