KerasNLP Models

KerasNLP contains end-to-end implementations of popular model architectures. These models can be created in two ways:

  • Through the from_preset() constructor, which instantiates an object with a pre-trained configuration, vocabulary, and (optionally) weights.
  • Through custom configuration controlled by the user (a sketch follows this list).
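
As a sketch of the second path, the snippet below builds a small BERT backbone from an explicit, user-controlled configuration; the layer sizes are illustrative values chosen for this example, not a recommended setup.

import keras_nlp

# Custom configuration: every architectural choice is passed explicitly.
# The sizes below are illustrative only.
backbone = keras_nlp.models.BertBackbone(
    vocabulary_size=30522,
    num_layers=4,
    num_heads=4,
    hidden_dim=256,
    intermediate_dim=512,
    max_sequence_length=128,
)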

Below is a list of all presets available in the library. For more detailed usage, browse the docstring for a particular class. For an in-depth introduction to our API, see the getting started guide.

Backbone presets

The following preset names correspond to a configuration, weights, and vocabulary for a model backbone. These presets are not inference-ready and must be fine-tuned for a given task!

The names below can be used with any from_preset() constructor for a given model.

classifier = keras_nlp.models.BertClassifier.from_preset("bert_tiny_en_uncased")
backbone = keras_nlp.models.BertBackbone.from_preset("bert_tiny_en_uncased")
tokenizer = keras_nlp.models.BertTokenizer.from_preset("bert_tiny_en_uncased")
preprocessor = keras_nlp.models.BertPreprocessor.from_preset("bert_tiny_en_uncased")
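
Because backbone presets are not inference-ready, a typical workflow loads a task model from the preset and fine-tunes it. The sketch below assumes a tiny in-memory dataset of two labeled strings purely for illustration; BertClassifier preprocesses raw strings internally.

import keras_nlp

# Hypothetical two-example dataset, used only to illustrate the fine-tuning call.
features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 1]

classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_tiny_en_uncased",
    num_classes=2,
)
classifier.fit(x=features, y=labels, batch_size=2)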
Preset name Model Parameters Description
albert_base_en_uncased ALBERT 11.68M 12-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
albert_large_en_uncased ALBERT 17.68M 24-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
albert_extra_large_en_uncased ALBERT 58.72M 24-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
albert_extra_extra_large_en_uncased ALBERT 222.60M 12-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
bart_base_en BART 139.42M 6-layer BART model where case is maintained. Trained on BookCorpus, English Wikipedia and CommonCrawl. Model Card
bart_large_en BART 406.29M 12-layer BART model where case is maintained. Trained on BookCorpus, English Wikipedia and CommonCrawl. Model Card
bart_large_en_cnn BART 406.29M The bart_large_en backbone model fine-tuned on the CNN+DM summarization dataset. Model Card
bert_tiny_en_uncased BERT 4.39M 2-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
bert_small_en_uncased BERT 28.76M 4-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
bert_medium_en_uncased BERT 41.37M 8-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
bert_base_en_uncased BERT 109.48M 12-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
bert_base_en BERT 108.31M 12-layer BERT model where case is maintained. Trained on English Wikipedia + BooksCorpus. Model Card
bert_base_zh BERT 102.27M 12-layer BERT model. Trained on Chinese Wikipedia. Model Card
bert_base_multi BERT 177.85M 12-layer BERT model where case is maintained. Trained on Wikipedias of 104 languages. Model Card
bert_large_en_uncased BERT 335.14M 24-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
bert_large_en BERT 333.58M 24-layer BERT model where case is maintained. Trained on English Wikipedia + BooksCorpus. Model Card
bloom_560m_multi BLOOM 559.21M 24-layer Bloom model with hidden dimension of 1024. Trained on 45 natural languages and 12 programming languages. Model Card
bloom_1.1b_multi BLOOM 1.07B 24-layer Bloom model with hidden dimension of 1536. Trained on 45 natural languages and 12 programming languages. Model Card
bloom_1.7b_multi BLOOM 1.72B 24-layer Bloom model with hidden dimension of 2048. Trained on 45 natural languages and 12 programming languages. Model Card
bloom_3b_multi BLOOM 3.00B 30-layer Bloom model with hidden dimension of 2560. Trained on 45 natural languages and 12 programming languages. Model Card
bloomz_560m_multi BLOOMZ 559.21M 24-layer Bloom model with hidden dimension of 1024. Finetuned on the cross-lingual task mixture (xP3) dataset. Model Card
bloomz_1.1b_multi BLOOMZ 1.07B 24-layer Bloom model with hidden dimension of 1536. Finetuned on the cross-lingual task mixture (xP3) dataset. Model Card
bloomz_1.7b_multi BLOOMZ 1.72B 24-layer Bloom model with hidden dimension of 2048. Finetuned on the cross-lingual task mixture (xP3) dataset. Model Card
bloomz_3b_multi BLOOMZ 3.00B 30-layer Bloom model with hidden dimension of 2560. Finetuned on the cross-lingual task mixture (xP3) dataset. Model Card
deberta_v3_extra_small_en DeBERTaV3 70.68M 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. Model Card
deberta_v3_small_en DeBERTaV3 141.30M 6-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. Model Card
deberta_v3_base_en DeBERTaV3 183.83M 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. Model Card
deberta_v3_large_en DeBERTaV3 434.01M 24-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. Model Card
deberta_v3_base_multi DeBERTaV3 278.22M 12-layer DeBERTaV3 model where case is maintained. Trained on the 2.5TB multilingual CC100 dataset. Model Card
distil_bert_base_en_uncased DistilBERT 66.36M 6-layer DistilBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus using BERT as the teacher model. Model Card
distil_bert_base_en DistilBERT 65.19M 6-layer DistilBERT model where case is maintained. Trained on English Wikipedia + BooksCorpus using BERT as the teacher model. Model Card
distil_bert_base_multi DistilBERT 134.73M 6-layer DistilBERT model where case is maintained. Trained on Wikipedias of 104 languages Model Card
electra_small_discriminator_uncased_en ELECTRA 13.55M 12-layer small ELECTRA discriminator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
electra_small_generator_uncased_en ELECTRA 13.55M 12-layer small ELECTRA generator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
electra_base_discriminator_uncased_en ELECTRA 109.48M 12-layer base ELECTRA discriminator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
electra_base_generator_uncased_en ELECTRA 33.58M 12-layer base ELECTRA generator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
electra_large_discriminator_uncased_en ELECTRA 335.14M 24-layer large ELECTRA discriminator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
electra_large_generator_uncased_en ELECTRA 51.07M 24-layer large ELECTRA generator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. Model Card
f_net_base_en FNet 82.86M 12-layer FNet model where case is maintained. Trained on the C4 dataset. Model Card
f_net_large_en FNet 236.95M 24-layer FNet model where case is maintained. Trained on the C4 dataset. Model Card
falcon_refinedweb_1b_en Falcon 1.31B 24-layer Falcon model (Falcon with 1B parameters), trained on 350B tokens of RefinedWeb dataset. Model Card
gemma_2b_en Gemma 2.51B 2 billion parameter, 18-layer, base Gemma model. Model Card
gemma_instruct_2b_en Gemma 2.51B 2 billion parameter, 18-layer, instruction tuned Gemma model. Model Card
gemma_1.1_instruct_2b_en Gemma 2.51B 2 billion parameter, 18-layer, instruction tuned Gemma model. The 1.1 update improves model quality. Model Card
code_gemma_1.1_2b_en Gemma 2.51B 2 billion parameter, 18-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. The 1.1 update improves model quality. Model Card
code_gemma_2b_en Gemma 2.51B 2 billion parameter, 18-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. Model Card
gemma_7b_en Gemma 8.54B 7 billion parameter, 28-layer, base Gemma model. Model Card
gemma_instruct_7b_en Gemma 8.54B 7 billion parameter, 28-layer, instruction tuned Gemma model. Model Card
gemma_1.1_instruct_7b_en Gemma 8.54B 7 billion parameter, 28-layer, instruction tuned Gemma model. The 1.1 update improves model quality. Model Card
code_gemma_7b_en Gemma 8.54B 7 billion parameter, 28-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. Model Card
code_gemma_instruct_7b_en Gemma 8.54B 7 billion parameter, 28-layer, instruction tuned CodeGemma model. This model has been trained for chat use cases related to code. Model Card
code_gemma_1.1_instruct_7b_en Gemma 8.54B 7 billion parameter, 28-layer, instruction tuned CodeGemma model. This model has been trained for chat use cases related to code. The 1.1 update improves model quality. Model Card
gemma2_2b_en Gemma 2.61B 2 billion parameter, 26-layer, base Gemma model. Model Card
gemma2_instruct_2b_en Gemma 2.61B 2 billion parameter, 26-layer, instruction tuned Gemma model. Model Card
gemma2_9b_en Gemma 9.24B 9 billion parameter, 42-layer, base Gemma model. Model Card
gemma2_instruct_9b_en Gemma 9.24B 9 billion parameter, 42-layer, instruction tuned Gemma model. Model Card
gemma2_27b_en Gemma 27.23B 27 billion parameter, 42-layer, base Gemma model. Model Card
gemma2_instruct_27b_en Gemma 27.23B 27 billion parameter, 42-layer, instruction tuned Gemma model. Model Card
shieldgemma_2b_en Gemma 2.61B 2 billion parameter, 26-layer, ShieldGemma model. Model Card
shieldgemma_9b_en Gemma 9.24B 9 billion parameter, 42-layer, ShieldGemma model. Model Card
shieldgemma_27b_en Gemma 27.23B 27 billion parameter, 42-layer, ShieldGemma model. Model Card
gpt2_base_en GPT-2 124.44M 12-layer GPT-2 model where case is maintained. Trained on WebText. Model Card
gpt2_medium_en GPT-2 354.82M 24-layer GPT-2 model where case is maintained. Trained on WebText. Model Card
gpt2_large_en GPT-2 774.03M 36-layer GPT-2 model where case is maintained. Trained on WebText. Model Card
gpt2_extra_large_en GPT-2 1.56B 48-layer GPT-2 model where case is maintained. Trained on WebText. Model Card
gpt2_base_en_cnn_dailymail GPT-2 124.44M 12-layer GPT-2 model where case is maintained. Finetuned on the CNN/DailyMail summarization dataset.
llama3_8b_en LLaMA 3 8.03B LLaMA 3 8B Base model Model Card
llama3_instruct_8b_en LLaMA 3 8.03B LLaMA 3 8B Instruct model Model Card
llama2_7b_en LLaMA 2 6.74B LLaMA 2 7B Base model Model Card
llama2_instruct_7b_en LLaMA 2 6.74B LLaMA 2 7B Chat model Model Card
mistral_7b_en Mistral 7.24B Mistral 7B base model Model Card
mistral_instruct_7b_en Mistral 7.24B Mistral 7B instruct model Model Card
mistral_0.2_instruct_7b_en Mistral 7.24B Mistral 7B instruct Version 0.2 model Model Card
opt_125m_en OPT 125.24M 12-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. Model Card
opt_1.3b_en OPT 1.32B 24-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. Model Card
opt_2.7b_en OPT 2.70B 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. Model Card
opt_6.7b_en OPT 6.70B 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. Model Card
pali_gemma_3b_mix_224 PaliGemma 2.92B image size 224, mix fine-tuned, text sequence length is 256 Model Card
pali_gemma_3b_mix_448 PaliGemma 2.92B image size 448, mix fine-tuned, text sequence length is 512 Model Card
pali_gemma_3b_224 PaliGemma 2.92B image size 224, pretrained, text sequence length is 128 Model Card
pali_gemma_3b_448 PaliGemma 2.92B image size 448, pretrained, text sequence length is 512 Model Card
pali_gemma_3b_896 PaliGemma 2.93B image size 896, pretrained, text sequence length is 512 Model Card
phi3_mini_4k_instruct_en Phi-3 3.82B 3.8 billion parameters, 32 layers, 4k context length, Phi-3 model. The model was trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. Model Card
phi3_mini_128k_instruct_en Phi-3 3.82B 3.8 billion parameters, 32 layers, 128k context length, Phi-3 model. The model was trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. Model Card
roberta_base_en RoBERTa 124.05M 12-layer RoBERTa model where case is maintained. Trained on English Wikipedia, BooksCorpus, CommonCrawl, and OpenWebText. Model Card
roberta_large_en RoBERTa 354.31M 24-layer RoBERTa model where case is maintained. Trained on English Wikipedia, BooksCorpus, CommonCrawl, and OpenWebText. Model Card
xlm_roberta_base_multi XLM-RoBERTa 277.45M 12-layer XLM-RoBERTa model where case is maintained. Trained on CommonCrawl in 100 languages. Model Card
xlm_roberta_large_multi XLM-RoBERTa 558.84M 24-layer XLM-RoBERTa model where case is maintained. Trained on CommonCrawl in 100 languages. Model Card
t5_small_multi T5 0 8-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). Model Card
t5_base_multi T5 0 12-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). Model Card
t5_large_multi T5 0 24-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). Model Card
flan_small_multi T5 0 8-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). Model Card
flan_base_multi T5 0 12-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). Model Card
flan_large_multi T5 0 24-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). Model Card
whisper_tiny_en Whisper 37.18M 4-layer Whisper model. Trained on 438,000 hours of labelled English speech data. Model Card
whisper_base_en Whisper 124.44M 6-layer Whisper model. Trained on 438,000 hours of labelled English speech data. Model Card
whisper_small_en Whisper 241.73M 12-layer Whisper model. Trained on 438,000 hours of labelled English speech data. Model Card
whisper_medium_en Whisper 763.86M 24-layer Whisper model. Trained on 438,000 hours of labelled English speech data. Model Card
whisper_tiny_multi Whisper 37.76M 4-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. Model Card
whisper_base_multi Whisper 72.59M 6-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. Model Card
whisper_small_multi Whisper 241.73M 12-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. Model Card
whisper_medium_multi Whisper 763.86M 24-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. Model Card
whisper_large_multi Whisper 1.54B 32-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. Model Card
whisper_large_multi_v2 Whisper 1.54B 32-layer Whisper model. Trained for 2.5 epochs on 680,000 hours of labelled multilingual speech data. An improved version of whisper_large_multi. Model Card

Note: The links provided will lead to the model card, or to the official README if no model card has been provided by the author.

Classifier presets

The following preset names correspond to a configuration, weights, and vocabulary for a model classifier. These models are inference-ready, but can be further fine-tuned if desired.

The names below can be used with the from_preset() constructor for classifier models and preprocessing layers.

classifier = keras_nlp.models.BertClassifier.from_preset("bert_tiny_en_uncased_sst2")
tokenizer = keras_nlp.models.BertTokenizer.from_preset("bert_tiny_en_uncased_sst2")
preprocessor = keras_nlp.models.BertPreprocessor.from_preset("bert_tiny_en_uncased_sst2")
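
Because classifier presets ship with fine-tuned weights, they can be used for prediction directly. The sketch below assumes two example review strings purely for illustration; for the SST-2 preset, class 0 is negative and class 1 is positive sentiment.

import keras_nlp

classifier = keras_nlp.models.BertClassifier.from_preset("bert_tiny_en_uncased_sst2")
# The built-in preprocessor tokenizes the raw strings; the output has one row of
# class scores per input string.
predictions = classifier.predict(["What an amazing movie!", "A total waste of time."])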
Preset name Model Parameters Description
bert_tiny_en_uncased_sst2 BERT 4.39M The bert_tiny_en_uncased backbone model fine-tuned on the SST-2 sentiment analysis dataset.

Note: The links provided will lead to the model card, or to the official README if no model card has been provided by the author.

API documentation

Albert

Bart

Bert

Bloom

DebertaV3

DistilBert

Gemma

Electra

Falcon

FNet is a Transformer-style model architecture that uses Fourier transforms in place of the attention mechanism found in standard deep neural networks. The Fourier transform is an effective tool in both theory and practice, widely used across signal processing tasks; in FNet it is applied to text processing, simplifying computation and improving model efficiency.
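
As a minimal sketch, an FNet backbone can be loaded through the same from_preset() pattern shown above, using a preset name from the backbone table.

import keras_nlp

# Loads the 12-layer FNet backbone listed in the table above.
backbone = keras_nlp.models.FNetBackbone.from_preset("f_net_base_en")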

GPT-2

Llama

Llama3

Mistral

OPT

PaliGemma

Phi3

Roberta

XLM Roberta