Memory Compression - News search

6 days ago

Nvidia says it can shrink LLM memory 20x without changing model weights

Nvidia's KV Cache Transform Coding (KVTC) compresses LLM key-value cache by 20x without model changes, cutting GPU memory ...

17 days ago

New KV cache compaction technique cuts LLM memory 50x without accuracy loss

MIT researchers developed Attention Matching, a KV cache compaction technique that compresses LLM memory by 50x in seconds, without the hours of GPU training that prior methods required.