Large Language Models (LLM)

The latest news, technical breakthroughs, and industry applications of large language models such as GPT, Claude, Llama, and Gemini.

CAM-LDS: Cyber Attack Manifestations for Automatic Interpretation of System Logs and Security Alerts
LLM

The Cyber Attack Manifestation Log Data Set (CAM-LDS) is a novel open-source dataset developed by University of Göttinge...

CodeTaste: Can LLMs Generate Human-Level Code Refactorings?
LLM

The CodeTaste benchmark evaluates LLM coding agents on realistic refactoring tasks mined from open-source repositories. ...

Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Language Model
LLM

The Bielik-Q2-Sharp study systematically evaluates six 2-bit quantization methods on the Polish Bielik-11B-v2.3-Instruct...
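
The snippet above is truncated; to make concrete what "extreme 2-bit quantization" means, here is a minimal sketch of symmetric group-wise 2-bit weight quantization in PyTorch. The group size and the four-level rounding scheme are illustrative assumptions, not the six methods the study actually compares.

```python
import torch

def quantize_2bit(w: torch.Tensor, group_size: int = 64):
    """Toy symmetric 2-bit quantizer: maps each group of weights
    onto the four levels {-1.5, -0.5, 0.5, 1.5} * scale."""
    w = w.reshape(-1, group_size)
    # Per-group scale chosen so the largest weight hits the outermost level.
    scale = w.abs().max(dim=1, keepdim=True).values / 1.5
    q = torch.clamp(torch.round(w / scale - 0.5), -2, 1)  # ints in {-2,-1,0,1}
    return q.to(torch.int8), scale

def dequantize_2bit(q: torch.Tensor, scale: torch.Tensor):
    return (q.float() + 0.5) * scale

w = torch.randn(4, 64)
q, s = quantize_2bit(w.flatten())
w_hat = dequantize_2bit(q, s).reshape(4, 64)
print((w - w_hat).abs().mean())  # reconstruction error of the toy scheme
```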

Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization
LLM

Dynamic Pruning Policy Optimization (DPPO) is a novel training framework that accelerates Group Relative Policy Optimiza...
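
The summary is cut off, but the pruning idea at the heart of group-based methods like GRPO can be shown directly: a group whose sampled responses all receive the same reward has zero group-relative advantage, so dropping it leaves the policy gradient unchanged. Below is a minimal NumPy sketch of that observation; DPPO's actual dynamic pruning schedule and its unbiasedness correction are not reproduced here.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-6):
    """Group-relative advantages: normalize each group's rewards by its
    own mean and std (rewards: [num_groups, group_size])."""
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

def prune_uninformative_groups(rewards: np.ndarray):
    """Drop groups with (near-)zero reward variance: their advantages
    are ~0, so skipping their rollouts leaves the gradient unchanged."""
    keep = rewards.std(axis=1) > 1e-6
    return rewards[keep], keep

rewards = np.array([[1.0, 1.0, 1.0, 1.0],   # all-identical -> pruned
                    [0.0, 1.0, 0.0, 1.0]])  # informative  -> kept
kept, mask = prune_uninformative_groups(rewards)
print(mask, grpo_advantages(kept))
```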

Data-Aware Random Feature Kernel for Transformers
LLM

DARKFormer (Data-Aware Random-feature Kernel transformer) is a novel transformer architecture that addresses quadratic s...
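
Since the description is truncated, a brief sketch may help: random-feature kernel methods (as in Performer) replace softmax attention with a feature map φ so that attention is computable in linear time. The sketch below uses fixed positive random features; the "data-aware" part of DARKFormer, presumably where the projection adapts to input statistics, is only indicated in a comment.

```python
import torch

def random_feature_attention(q, k, v, proj):
    """Linear-time attention via a random-feature approximation of the
    softmax kernel. q, k, v: [n, d]; proj: [d, m] feature projection --
    fixed here, but a data-aware variant would adapt it to the inputs."""
    def phi(x):
        # Positive random features: exp(xW - ||x||^2 / 2) / sqrt(m)
        return torch.exp(x @ proj - (x ** 2).sum(-1, keepdim=True) / 2) / proj.shape[1] ** 0.5
    qf, kf = phi(q), phi(k)                    # [n, m]
    kv = kf.T @ v                              # [m, d], computed once: O(n*m*d)
    norm = qf @ kf.sum(0, keepdim=True).T      # [n, 1] softmax normalizer
    return (qf @ kv) / (norm + 1e-6)

n, d, m = 128, 16, 64
q, k, v = (torch.randn(n, d) / d ** 0.25 for _ in range(3))
proj = torch.randn(d, m)  # fixed random projection; 'data-aware' would learn it
print(random_feature_attention(q, k, v, proj).shape)  # torch.Size([128, 16])
```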

A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality
LLM

A new multi-dimensional quality scoring framework for decentralized LLM inference decomposes output quality into six mod...
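
The list of six dimensions is cut off in the snippet; as a schematic of how such a framework folds per-module scores into one quality number, here is a toy weighted combination. The dimension names and weights below are hypothetical placeholders, not the paper's.

```python
# Hypothetical dimension names and weights -- the paper's six modules
# and their weighting are not reproduced here.
DIMENSIONS = {
    "relevance": 0.25, "factuality": 0.25, "coherence": 0.15,
    "fluency": 0.10, "completeness": 0.15, "safety": 0.10,
}

def aggregate_quality(scores: dict) -> float:
    """Fold per-dimension scores in [0, 1] into a single quality score."""
    assert set(scores) == set(DIMENSIONS)
    return sum(DIMENSIONS[d] * scores[d] for d in DIMENSIONS)

print(aggregate_quality({
    "relevance": 0.9, "factuality": 0.8, "coherence": 0.85,
    "fluency": 0.95, "completeness": 0.7, "safety": 1.0,
}))
```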

Spectral Surgery: Training-Free Refinement of LoRA via Gradient-Guided Singular Value Reweighting
LLM

Spectral Surgery is a novel training-free post-hoc refinement method that improves Low-Rank Adaptation (LoRA) modules by...
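
The mechanism named in the title is concrete enough to sketch: decompose the LoRA update ΔW = BA with an SVD and rescale its singular values. The toy reweighting rule below is a placeholder; the paper's gradient-guided weighting is not reproduced.

```python
import torch

def spectral_reweight(B, A, weights_fn):
    """Training-free LoRA refinement sketch: factor the adapter update
    delta_W = B @ A via SVD and rescale its singular values.
    `weights_fn` maps singular values to multiplicative weights; the
    gradient-guided rule from the paper is not shown here."""
    delta_w = B @ A                                   # [out, in] low-rank update
    U, S, Vh = torch.linalg.svd(delta_w, full_matrices=False)
    S_new = S * weights_fn(S)                         # reweight the spectrum
    return U @ torch.diag(S_new) @ Vh                 # refined update

B, A = torch.randn(64, 8), torch.randn(8, 64)
# Toy rule: damp the weakest directions (a placeholder, not the paper's rule).
refined = spectral_reweight(B, A, lambda s: (s / s.max()).clamp(min=0.1))
print(refined.shape)  # torch.Size([64, 64])
```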

TFWaveFormer: Temporal-Frequency Collaborative Multi-level Wavelet Transformer for Dynamic Link Prediction
LLM

TFWaveFormer is a novel AI architecture that integrates Transformer models with multi-resolution wavelet decomposition f...

BD-Merging: Bias-Aware Dynamic Model Merging with Evidence-Guided Contrastive Learning
LLM

BD-Merging is a bias-aware dynamic model merging framework developed by UC Berkeley researchers that addresses performan...
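
For context on what BD-Merging improves over, here is the static parameter-wise weighted merge that most model-merging methods build on; the bias-aware dynamic coefficients and evidence-guided contrastive signal named in the title are not shown.

```python
import torch

def merge_state_dicts(models, coeffs):
    """Baseline weighted merging: parameter-wise convex combination of
    several fine-tuned checkpoints (lists of state dicts with matching
    keys). BD-Merging replaces the fixed `coeffs` with dynamic,
    bias-aware ones, which this sketch does not implement."""
    merged = {}
    for name in models[0]:
        merged[name] = sum(c * m[name] for c, m in zip(coeffs, models))
    return merged

m1 = {"w": torch.tensor([1.0, 0.0])}
m2 = {"w": torch.tensor([0.0, 1.0])}
print(merge_state_dicts([m1, m2], coeffs=[0.6, 0.4]))  # {'w': tensor([0.60, 0.40])}
```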

Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
LLM

A new study reveals that current evaluation methods for role-playing AI agents are fundamentally biased, as models rely ...

CzechTopic: A Benchmark for Zero-Shot Topic Localization in Historical Czech Documents
LLM

CzechTopic is a new human-annotated benchmark for evaluating topic localization in historical Czech documents. It shifts...

Relational In-Context Learning via Synthetic Pre-training with Structural Prior
LLM

RDB-PFN is the first foundation model for relational databases, pre-trained entirely on over 2 million synthetically gen...

T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
LLM

Researchers introduced Structure-of-Thought (SoT) prompting and T2S-Bench, a comprehensive benchmark for evaluating text...
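
The prompting technique can be illustrated with a template: Structure-of-Thought asks the model to externalize an intermediate structural outline before emitting the final structured output. The wording below is a hypothetical approximation, not the paper's prompt.

```python
# A hypothetical prompt template in the spirit of Structure-of-Thought:
# the model first sketches the structure it sees, then renders it.
SOT_TEMPLATE = """Task: convert the text below into {target_format}.

Step 1 - Structure sketch: list the entities, attributes, and
relations you find in the text as a bullet outline.
Step 2 - Final answer: render that outline as {target_format} only.

Text:
{text}
"""

prompt = SOT_TEMPLATE.format(
    target_format="a JSON object",
    text="Ada Lovelace (1815-1852) wrote the first published algorithm.",
)
print(prompt)
```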

Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning
LLM

COREA (COllaborative REAsoner) is a novel cascading system that combines Small Language Models (SLMs) and Large Language...
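
The cascading mechanism the summary describes reduces to a confidence-gated router, sketched below. The `slm`, `llm`, and `confidence` callables are hypothetical stand-ins; COREA's calibration procedure itself is not reproduced.

```python
def cascade_answer(question, slm, llm, confidence, threshold=0.8):
    """Confidence-gated SLM->LLM cascade sketch: try the cheap model
    first and escalate only when its calibrated confidence is low."""
    draft = slm(question)
    if confidence(question, draft) >= threshold:
        return draft              # accept the cheap answer
    return llm(question)          # escalate to the large model

# Toy stand-ins for the two models and the calibrated scorer:
answer = cascade_answer(
    "2 + 2 = ?",
    slm=lambda q: "4",
    llm=lambda q: "4 (verified)",
    confidence=lambda q, a: 0.95,
)
print(answer)  # -> "4": the SLM answer is accepted without escalation
```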

Understanding Parents' Desires in Moderating Children's Interactions with GenAI Chatbots through LLM-Generated Probes
LLM

A study with 24 parents found that current GenAI chatbot parental controls fail to address nuanced, context-dependent ri...

Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning
LLM

The CoCo-TAMP framework introduces LLM-guided hierarchical state estimation for partially observable robotic planning, r...

Mathematicians in the age of AI
LLM

AI systems have reached a critical threshold where they can prove both formally verified and informally stated research-...

A Dark-Horse Image Model Gets a Shout-Out from Nano Banana's Technical Lead! Built by a 15-Person Chinese Team Led by the Father of DDIM and a CVPR Best Paper Author
LLM