Large Language Models (LLMs)
The latest news, technical breakthroughs, and industry applications of large language models such as GPT, Claude, Llama, and Gemini.
CAM-LDS: Cyber Attack Manifestations for Automatic Interpretation of System Logs and Security Alerts
The Cyber Attack Manifestation Log Data Set (CAM-LDS) is an open-source dataset designed to train Large Language Models ...
CodeTaste: Can LLMs Generate Human-Level Code Refactorings?
Researchers from Carnegie Mellon University and Microsoft Research introduced CodeTaste, a benchmark evaluating LLMs' ab...
Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Language Model
The Bielik-Q2-Sharp study systematically evaluates six 2-bit quantization methods on the Polish Bielik-11B-v2.3-Instruct...
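As a rough illustration of what 2-bit quantization entails, the sketch below applies symmetric per-group 2-bit quantization to a weight vector. The group size, level placement, and function names are assumptions for illustration, not any of the six methods evaluated in the study:

```python
import numpy as np

def quantize_2bit(weights, group_size=4):
    """Symmetric 2-bit quantization: each group of weights is mapped to one
    of four levels {-1.5, -0.5, 0.5, 1.5} times a per-group scale."""
    w = weights.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 1.5  # outer level hits group max
    scale = np.where(scale == 0, 1.0, scale)
    codes = np.clip(np.round(w / scale - 0.5), -2, 1)   # 2-bit integer codes
    dequant = (codes + 0.5) * scale                     # reconstruct the 4 levels
    return codes.astype(np.int8), dequant.reshape(weights.shape)

w = np.array([0.9, -0.3, 0.1, -1.2, 0.05, 0.6, -0.7, 0.0], dtype=np.float32)
codes, w_hat = quantize_2bit(w)
```

With 4 levels per weight plus one scale per group, storage drops to roughly 2 bits per parameter; the reconstruction error is bounded by half a quantization step per group.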
Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization
Dynamic Pruning Policy Optimization (DPPO) is a novel framework that accelerates Group Relative Policy Optimization (GRP...
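To make the "unbiased pruning" idea concrete, here is a minimal sketch: compute GRPO-style group-relative advantages, randomly drop low-advantage completions, and up-weight survivors by the inverse keep probability so the expected contribution is unchanged. The median-based pruning rule and the 1/p reweighting are illustrative assumptions, not DPPO's exact criterion:

```python
import numpy as np

def group_advantages(rewards):
    """GRPO-style group-relative advantage: reward minus group mean,
    divided by group standard deviation."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def prune_group(advantages, keep_prob=0.5, rng=None):
    """Drop below-median-|advantage| samples with probability 1 - keep_prob
    and up-weight survivors by 1/keep_prob, keeping the estimator unbiased."""
    rng = np.random.default_rng() if rng is None else rng
    adv = np.asarray(advantages)
    low = np.abs(adv) < np.median(np.abs(adv))
    keep = ~low | (rng.random(adv.shape) < keep_prob)
    weights = np.where(low, 1.0 / keep_prob, 1.0) * keep
    return keep, weights

adv = group_advantages([1.0, 2.0, 3.0, 4.0])
# Empirically check unbiasedness: averaged over many draws, the pruned and
# reweighted contribution matches the full-group contribution.
est = np.zeros_like(adv)
for seed in range(2000):
    _, w = prune_group(adv, rng=np.random.default_rng(seed))
    est += w * adv
est /= 2000
```

The point of the inverse-probability weights is that pruning saves compute on low-signal completions without biasing the policy gradient in expectation.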
Data-Aware Random Feature Kernel for Transformers
DARKFormer introduces a novel Transformer architecture that addresses the scaling bottleneck in attention mechanisms by ...
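The snippet names a random-feature kernel as the route around quadratic attention. A generic (not data-aware) version of that idea, in the style of Performer's positive random features, looks like this; all names and the feature count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_feature_attention(Q, K, V, num_features=256):
    """Linear-complexity attention: approximate exp(q.k) with positive random
    features phi, then use associativity Qp @ (Kp.T @ V) to avoid the n x n map."""
    d = Q.shape[-1]
    W = rng.normal(size=(d, num_features))
    def phi(X):
        # E[phi(q) . phi(k)] = exp(q . k) for Gaussian W (Performer-style).
        return np.exp(X @ W - (X**2).sum(-1, keepdims=True) / 2) / np.sqrt(num_features)
    Qp, Kp = phi(Q), phi(K)
    num = Qp @ (Kp.T @ V)                        # O(n * m * d), never forms n x n
    den = Qp @ Kp.sum(axis=0, keepdims=True).T   # row-wise softmax normalizer
    return num / den

n, d = 8, 16
Q, K, V = (0.1 * rng.normal(size=(n, d)) for _ in range(3))
approx = random_feature_attention(Q, K, V)
# Exact softmax attention for comparison.
scores = np.exp(Q @ K.T)
exact = (scores / scores.sum(-1, keepdims=True)) @ V
```

The "data-aware" contribution presumably replaces the fixed Gaussian projection `W` with one adapted to the input distribution; the code above shows only the baseline kernel trick.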
A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality
Researchers developed a multi-dimensional quality scoring framework for decentralized LLM inference that decomposes text...
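A decomposed quality score of this kind typically reduces to a weighted aggregate over per-dimension scores. The dimension names and weights below are hypothetical placeholders, not the six dimensions defined in the paper:

```python
# Hypothetical dimension names and weights, for illustration only.
DIMENSIONS = {
    "relevance": 0.25, "factuality": 0.25, "coherence": 0.15,
    "completeness": 0.15, "fluency": 0.10, "safety": 0.10,
}

def aggregate_quality(scores: dict) -> float:
    """Weighted aggregate of per-dimension scores, each in [0, 1]."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("scores must cover exactly the six dimensions")
    return sum(DIMENSIONS[d] * scores[d] for d in DIMENSIONS)

example = {"relevance": 0.9, "factuality": 0.8, "coherence": 0.7,
           "completeness": 0.6, "fluency": 0.9, "safety": 1.0}
overall = aggregate_quality(example)
```

Keeping the per-dimension scores exposed (rather than only the aggregate) is what lets a decentralized network audit *why* an output was rewarded or penalized.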
Spectral Surgery: Training-Free Refinement of LoRA via Gradient-Guided Singular Value Reweighting
Spectral Surgery is a novel training-free refinement technique for Low-Rank Adaptation (LoRA) that improves AI model per...
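Mechanically, "singular value reweighting" of a LoRA module can be sketched as follows: take the SVD of the low-rank update ΔW = BA and rescale its spectrum without any further training. The halve-the-tail rule used here is a stand-in for the paper's gradient-guided criterion:

```python
import numpy as np

def spectral_reweight(A, B, weights_fn):
    """Training-free refinement sketch: SVD the low-rank LoRA delta W = B @ A,
    then rescale its singular values with a caller-supplied rule."""
    delta = B @ A                                     # (out_dim, in_dim) update
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    return U @ np.diag(weights_fn(s)) @ Vt            # reweighted spectrum

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 32))    # rank-4 LoRA down-projection
B = rng.normal(size=(64, 4))    # rank-4 LoRA up-projection
# Placeholder rule: keep the leading component, damp the rest by half.
refined = spectral_reweight(A, B, lambda s: np.concatenate([s[:1], 0.5 * s[1:]]))
```

Because only the singular values change, the refined update stays within the same rank-4 subspace the adapter originally learned.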
TFWaveFormer: Temporal-Frequency Collaborative Multi-level Wavelet Transformer for Dynamic Link Prediction
TFWaveFormer is a novel AI architecture that integrates Transformer models with multi-resolution wavelet decomposition f...
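The multi-resolution wavelet decomposition at the heart of such architectures can be illustrated with a plain Haar transform, which splits a signal into coarse approximations and detail bands at each level (a generic sketch of the idea, not TFWaveFormer's learned components):

```python
import numpy as np

def haar_decompose(x, levels=2):
    """Multi-level Haar wavelet decomposition of a 1-D signal: at each level,
    pairwise averages form the coarse band and pairwise differences form the
    detail band. Energy (sum of squares) is preserved across all bands."""
    coeffs = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2))  # detail band at this scale
        approx = (even + odd) / np.sqrt(2)        # coarse band, half resolution
    coeffs.append(approx)                         # final low-frequency residue
    return coeffs

bands = haar_decompose([4.0, 2.0, 6.0, 8.0, 3.0, 1.0, 5.0, 7.0], levels=2)
```

Each band isolates temporal structure at one frequency scale, which is what lets an attention model mix short-range and long-range dynamics separately.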
BD-Merging: Bias-Aware Dynamic Model Merging with Evidence-Guided Contrastive Learning
BD-Merging is a bias-aware dynamic model merging framework developed by UC Berkeley researchers that addresses performan...
Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
A new study reveals that current evaluation methods for role-playing AI agents are fundamentally biased, as models rely ...
CzechTopic: A Benchmark for Zero-Shot Topic Localization in Historical Czech Documents
CzechTopic is a new human-annotated benchmark for evaluating topic localization in historical Czech documents. It shifts...
Relational In-Context Learning via Synthetic Pre-training with Structural Prior
RDB-PFN is the first foundation model for relational databases, pre-trained entirely on over 2 million synthetically gen...
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
Researchers introduced Structure of Thought (SoT), a prompting technique that guides LLMs to construct intermediate text...
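In spirit, a Structure-of-Thought prompt asks the model to emit an explicit intermediate structure before producing the final output, rather than free-form reasoning. The template below is a hypothetical illustration of that pattern, not the paper's actual prompt:

```python
# Hypothetical SoT-style template: force an intermediate structured
# representation (a table) before the final structured output.
SOT_TEMPLATE = """\
Task: {task}

Step 1 - Extract the entities and relations as a table:
| entity | attribute | value |

Step 2 - Using only the table above, produce the final {format} output.
"""

prompt = SOT_TEMPLATE.format(task="Convert the order email into JSON.",
                             format="JSON")
```

Anchoring generation on the intermediate table constrains the final structured output to facts the model has already committed to, which is the usual motivation for structure-first prompting.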
Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning
COREA (COllaborative REAsoner) is a novel cascading system that combines small and large language models for cost-effici...
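The cascade logic itself is simple to sketch: answer with the small model first and escalate to the large model only when calibrated confidence falls below a threshold. The routing code below is illustrative; COREA's confidence-calibration method is an assumption here, represented by an opaque `confidence` callable:

```python
def cascade_answer(prompt, small_model, large_model, confidence, threshold=0.8):
    """Confidence-gated cascade: cheap model first, escalate only when the
    calibrated confidence in its answer is below the threshold."""
    answer = small_model(prompt)
    if confidence(prompt, answer) >= threshold:
        return answer, "slm"          # small model was confident enough
    return large_model(prompt), "llm" # escalate to the expensive model

# Toy stand-ins for demonstration only.
small = lambda p: "small:" + p
large = lambda p: "large:" + p
conf = lambda p, a: 0.9 if len(p) < 10 else 0.3  # short prompts -> confident

easy_answer, easy_route = cascade_answer("2+2", small, large, conf)
hard_answer, hard_route = cascade_answer("a long hard question", small, large, conf)
```

The cost savings come entirely from how often the gate stays closed, which is why calibration of the confidence signal (not just its average accuracy) is the crux of such systems.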
Understanding Parents' Desires in Moderating Children's Interactions with GenAI Chatbots through LLM-Generated Probes
A new study reveals parents want nuanced moderation tools for children's interactions with generative AI chatbots, movin...
Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning
The CoCo-TAMP framework integrates large language models into robotic planning to address partial observability challeng...
Mathematicians in the age of AI
AI systems have reached a critical threshold where they can prove both formally verified and informally stated research-...
Dark-horse image model praised by the Nano Banana tech lead! A 15-person all-Chinese team, led by the father of DDIM and a CVPR Best Paper author
DeepSeek has officially released the DeepSeek-V2 large language model, built on the novel MLA architecture and DeepSeekMoE technology, with 236B total parameters of which only 21B are activated at inference time. On the C-Eval and MMLU benchmarks the model approaches GPT-4 Turbo, with math and code capabilities on par with C...