gpt-oss-20b Model Structure

Published: 2025-08-30

Model Loading

from transformers import AutoTokenizer, GptOssForCausalLM
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
model = GptOssForCausalLM.from_pretrained("openai/gpt-oss-20b")
MXFP4 quantization requires triton >= 3.4.0 and kernels installed, we will default to dequantizing the model to bf16
Fetching 3 files: 100%|██████████| 3/3 [00:00<00:00,  4.84it/s]
Loading checkpoint shards: 100%|██████████| 3/3 [00:20<00:00,  6.87s/it]
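The MXFP4 warning above means that without `triton >= 3.4.0` and the matching kernels, the checkpoint's 4-bit weights are dequantized to bf16 at load time, which costs considerably more memory. A rough back-of-the-envelope sketch (the ~21B total parameter count is an assumption based on the "20b" model name, and MXFP4 in practice applies to the expert weights rather than every tensor):

```python
# Rough memory footprint comparison: bf16 is 2 bytes/param,
# MXFP4 is 4 bits = 0.5 bytes/param (ignoring scales/metadata).
params = 21e9  # assumed total parameter count for "20b"
bf16_gib = params * 2 / 1024**3
mxfp4_gib = params * 0.5 / 1024**3
print(f"bf16: ~{bf16_gib:.0f} GiB, mxfp4 (if applied everywhere): ~{mxfp4_gib:.0f} GiB")
```

So dequantizing to bf16 roughly quadruples the weight memory relative to a fully 4-bit layout, which is why the warning is worth heeding on memory-constrained machines.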

Model Structure

model
GptOssForCausalLM(
  (model): GptOssModel(
    (embed_tokens): Embedding(201088, 2880, padding_idx=199999)
    (layers): ModuleList(
      (0-23): 24 x GptOssDecoderLayer(
        (self_attn): GptOssAttention(
          (q_proj): Linear(in_features=2880, out_features=4096, bias=True)
          (k_proj): Linear(in_features=2880, out_features=512, bias=True)
          (v_proj): Linear(in_features=2880, out_features=512, bias=True)
          (o_proj): Linear(in_features=4096, out_features=2880, bias=True)
        )
        (mlp): GptOssMLP(
          (router): GptOssTopKRouter()
          (experts): GptOssExperts()
        )
        (input_layernorm): GptOssRMSNorm((2880,), eps=1e-05)
        (post_attention_layernorm): GptOssRMSNorm((2880,), eps=1e-05)
      )
    )
    (norm): GptOssRMSNorm((2880,), eps=1e-05)
    (rotary_emb): GptOssRotaryEmbedding()
  )
  (lm_head): Linear(in_features=2880, out_features=201088, bias=False)
)
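A few numbers in this printout are worth sanity-checking. The embedding table and the (untied) `lm_head` are each `201088 × 2880`, and the asymmetric `q_proj` (out 4096) versus `k_proj`/`v_proj` (out 512) shapes indicate grouped-query attention. A quick sketch, assuming a head dimension of 64 (typical for this family, but not shown in the printout itself):

```python
hidden = 2880    # embedding dim, from embed_tokens / the layernorms
vocab = 201088   # vocabulary size, from embed_tokens / lm_head

# Embedding table and lm_head are each vocab x hidden weights.
embed_params = vocab * hidden
print(f"{embed_params:,} weights each")  # 579,133,440

# Grouped-query attention: q_proj outputs 4096, k/v_proj output 512.
head_dim = 64                     # assumption, not in the printout
n_q_heads = 4096 // head_dim      # 64 query heads
n_kv_heads = 512 // head_dim      # 8 key/value heads
print(n_q_heads, n_kv_heads, n_q_heads // n_kv_heads)  # 64 8 8
```

Under that assumption, 8 query heads share each key/value head, so the KV cache is 8× smaller than it would be with full multi-head attention.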

Mermaid Diagram

graph TD
    A[GptOssForCausalLM] --> B[model: GptOssModel]
    A --> C[lm_head: Linear<br/>in_features=2880, out_features=201088]
    
    B --> D[embed_tokens: Embedding<br/>201088->2880, padding_idx=199999]
    B --> E[layers: ModuleList]
    B --> F[norm: GptOssRMSNorm<br/>2880 features, eps=1e-05]
    B --> G[rotary_emb: GptOssRotaryEmbedding]
    
    E --> H

    subgraph H["24 × GptOssDecoderLayer"]
        H1[self_attn: GptOssAttention]
        H2[mlp: GptOssMLP]
        H3[input_layernorm: GptOssRMSNorm<br/>2880 features]
        H4[post_attention_layernorm: GptOssRMSNorm<br/>2880 features]
        
        H1 --> H1_1[q_proj: Linear<br/>2880->4096, bias]
        H1 --> H1_2[k_proj: Linear<br/>2880->512, bias]
        H1 --> H1_3[v_proj: Linear<br/>2880->512, bias]
        H1 --> H1_4[o_proj: Linear<br/>4096->2880, bias]
        
        H2 --> H2_1[router: GptOssTopKRouter]
        H2 --> H2_2[experts: GptOssExperts]
    end

    style A fill:#e1f5fe
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style H fill:#e8f5e8
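The `GptOssMLP` block in each layer is a mixture-of-experts: `GptOssTopKRouter` scores the experts for each token and dispatches it to only the top-k of them. The following is an illustrative sketch of that routing idea in plain Python, not the actual `GptOssTopKRouter` code; the value `k=4` and the softmax-over-selected normalization are assumptions here:

```python
import math

def top_k_route(logits, k=4):
    """Pick the k highest-scoring experts for a token and normalize
    their scores with a softmax over just those k, yielding the
    per-expert mixing weights. Sketch only; k=4 is an assumption."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                      # for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: exps[i] / z for i in top}                 # expert index -> weight

weights = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, 3.0, -0.5], k=4)
print(weights)  # 4 experts selected; weights sum to 1
```

The token's output is then the weighted sum of the selected experts' outputs, so only a fraction of the model's expert parameters is active per token, which is what lets a ~20B-parameter model run with much less compute per forward pass.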


