diff --git a/Practical_sessions/Session_7/Subject_7_LLM.ipynb b/Practical_sessions/Session_7/Subject_7_LLM.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..ca9fb93f78e42a0092550d4575e53a7df3d45602
--- /dev/null
+++ b/Practical_sessions/Session_7/Subject_7_LLM.ipynb
@@ -0,0 +1,923 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### **_Deep Learning - Bsc Data Science for Responsible Business - Centrale Lyon_**\n",
+ "\n",
+ "2024-2025\n",
+ "\n",
+ "Emmanuel Dellandréa\t "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Practical Session 7 – Large Language Models\n",
+ "\n",
+ "The objective of this tutorial is to learn to work with LLMs for sentence generation and classification. The pretrained models and tokenizers will be obtained from the [Hugging Face platform](https://huggingface.co/).\n",
+ "\n",
+ "This notebook contains 8 parts:\n",
+ "1. Using a Hugging Face text generation model\n",
+ "2. Using the Hugging Face Pipeline for text classification\n",
+ "3. Using a Pipeline with a specific Hugging Face model and tokenizer\n",
+ "4. Experimenting with models from Hugging Face\n",
+ "5. Training an LLM for sentence classification using the **Trainer** class\n",
+ "6. Fine-tuning an LLM with a custom head\n",
+ "7. Sharing a model on the Hugging Face platform\n",
+ "8. Further experiments\n",
+ "\n",
+ "Before going further into experiments, your work is to understand the provided code, which gives an overview of using LLMs with Hugging Face.\n",
+ "\n",
+ "**This code is intentionally not commented. It is your responsibility to add all the necessary comments to ensure your proper understanding of the code.**\n",
+ "\n",
+ "\n",
+ "---\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "As the computation can be heavy, particularly during training, we encourage you to use a GPU. If your laptop is not equipped with one, you may use one of these remote Jupyter servers, where you can select execution on a GPU:\n",
+ "\n",
+ "1) [jupyter.mi90.ec-lyon.fr](https://jupyter.mi90.ec-lyon.fr/)\n",
+ "\n",
+ "This server is accessible within the campus network. From outside, you need to use a VPN. Before executing the notebook, select the kernel \"Python PyTorch\" to run it on GPU and have access to the PyTorch module.\n",
+ "\n",
+ "2) [Google Colaboratory](https://colab.research.google.com/)\n",
+ "\n",
+ "Before executing the notebook, select execution on GPU: \"Runtime\" -> \"Change runtime type\" -> \"T4 GPU\"."
+ ]
+ },
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Installing required librairies " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install huggingface_hub\n", + "!pip install ipywidgets\n", + "!pip install transformers\n", + "!pip install datasets\n", + "!pip install accelerate\n", + "!pip install scikit-learn\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Then login to Hugging Face" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from huggingface_hub import notebook_login\n", + "notebook_login()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Part 1 - Using a Hugging Face text generation model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import AutoTokenizer, AutoModelForCausalLM\n", + "\n", + "# model_name = \"mistralai/Mistral-7B\"\n", + "# model_name = \"deepseek-ai/DeepSeek-R1\"\n", + "# model_name = \"meta-llama/Llama-3.2-3B-Instruct\"\n", + "# model_name = \"homebrewltd/AlphaMaze-v0.2-1.5B\"\n", + "model_name = \"gpt2\"\n", + "\n", + "\n", + "tokenizer = AutoTokenizer.from_pretrained(model_name)\n", + "model = AutoModelForCausalLM.from_pretrained(model_name)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "input_text = \"Hello. Who are you ?\"\n", + "encoded_input = tokenizer(input_text, return_tensors=\"pt\")\n", + "\n", + "output = model.generate(\n", + " input_ids=encoded_input[\"input_ids\"],\n", + " attention_mask=encoded_input[\"attention_mask\"],\n", + " max_length=100,\n", + " temperature=0.8,\n", + " pad_token_id=tokenizer.pad_token_id\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "generated_text = tokenizer.decode(output[0], skip_special_tokens=True)\n", + "print(generated_text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Part 2 - Using Pipeline of Hugging Face for text classification" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import pipeline\n", + "\n", + "classifier = pipeline(\"text-classification\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "classifier(\"We are very happy to welcome you at Centrale Lyon.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "results = classifier([\"We are very happy to welcome you at Centrale Lyon.\", \"We hope you don't hate it.\"])\n", + "for result in results:\n", + " print(f\"label: {result['label']}, with score: {round(result['score'], 4)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Part 3 - Using Pipeline with a specific model and tokenizer of Hugging Face" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model_name = \"nlptown/bert-base-multilingual-uncased-sentiment\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import AutoTokenizer, AutoModelForSequenceClassification\n", + "\n", + "model = 
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Part 3 - Using a Pipeline with a specific Hugging Face model and tokenizer"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model_name = \"nlptown/bert-base-multilingual-uncased-sentiment\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
+ "\n",
+ "model = AutoModelForSequenceClassification.from_pretrained(model_name)\n",
+ "tokenizer = AutoTokenizer.from_pretrained(model_name)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "classifier = pipeline(\"text-classification\", model=model, tokenizer=tokenizer)\n",
+ "classifier(\"We are very happy to present you this incredible model.\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Part 4 - Experimenting with models from Hugging Face"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers import AutoTokenizer\n",
+ "\n",
+ "model_name = \"nlptown/bert-base-multilingual-uncased-sentiment\"\n",
+ "tokenizer = AutoTokenizer.from_pretrained(model_name)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "encoding = tokenizer(\"We are very happy to welcome you to Centrale Lyon.\")\n",
+ "print(encoding)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "batch = tokenizer(\n",
+ "    [\"We are very happy to welcome you to Centrale Lyon.\", \"We hope you don't hate it.\"],\n",
+ "    padding=True,\n",
+ "    truncation=True,\n",
+ "    max_length=512,\n",
+ "    return_tensors=\"pt\",\n",
+ ")\n",
+ "print(batch)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers import AutoModelForSequenceClassification\n",
+ "\n",
+ "model = AutoModelForSequenceClassification.from_pretrained(model_name, torch_dtype=\"auto\")\n",
+ "print(model)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "outputs = model(**batch)\n",
+ "print(outputs)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from torch import nn\n",
+ "\n",
+ "predictions = nn.functional.softmax(outputs.logits, dim=-1)\n",
+ "print(predictions)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "save_directory = \"./save_pretrained\"\n",
+ "tokenizer.save_pretrained(save_directory)\n",
+ "model.save_pretrained(save_directory)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "loaded_model = AutoModelForSequenceClassification.from_pretrained(\"./save_pretrained\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Part 5 - Training an LLM for sentence classification using the **Trainer** class"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers import AutoModelForSequenceClassification\n",
+ "\n",
+ "model_name = \"distilbert/distilbert-base-uncased\"\n",
+ "\n",
+ "model = AutoModelForSequenceClassification.from_pretrained(model_name, torch_dtype=\"auto\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers import TrainingArguments\n",
+ "\n",
+ "training_args = TrainingArguments(\n",
+ "    output_dir=\"save_folder/\",\n",
+ "    learning_rate=2e-5,\n",
+ "    per_device_train_batch_size=8,\n",
+ "    per_device_eval_batch_size=8,\n",
+ "    num_train_epochs=2,\n",
+ ")"
+ ]
+ },
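+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Optional addition (not part of the original notebook): the `Trainer` created below reports only the loss during evaluation. If you also want accuracy, you can define a `compute_metrics` function such as the sketch below (it relies on scikit-learn, installed at the top of the notebook) and pass it to the `Trainer` with `compute_metrics=compute_metrics`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional helper: accuracy computed from the logits and labels provided by the Trainer.\n",
+ "import numpy as np\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "\n",
+ "def compute_metrics(eval_pred):\n",
+ "    logits, labels = eval_pred\n",
+ "    predictions = np.argmax(logits, axis=-1)\n",
+ "    return {\"accuracy\": accuracy_score(labels, predictions)}"
+ ]
+ },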
"outputs": [], + "source": [ + "from transformers import AutoTokenizer\n", + "\n", + "tokenizer = AutoTokenizer.from_pretrained(model_name)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from datasets import load_dataset\n", + "\n", + "dataset = load_dataset(\"rotten_tomatoes\") " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def tokenize_dataset(dataset):\n", + " return tokenizer(dataset[\"text\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dataset = dataset.map(tokenize_dataset, batched=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import DataCollatorWithPadding\n", + "\n", + "data_collator = DataCollatorWithPadding(tokenizer=tokenizer)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import Trainer\n", + "\n", + "trainer = Trainer(\n", + " model=model,\n", + " args=training_args,\n", + " train_dataset=dataset[\"train\"],\n", + " eval_dataset=dataset[\"test\"],\n", + " processing_class=tokenizer,\n", + " data_collator=data_collator,\n", + ") " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "save_directory = \"./tomatoes_save_pretrained\"\n", + "tokenizer.save_pretrained(save_directory)\n", + "model.save_pretrained(save_directory)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model = AutoModelForSequenceClassification.from_pretrained(save_directory)\n", + "tokenizer = AutoTokenizer.from_pretrained(save_directory)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import pipeline\n", + "classifier = pipeline(\"text-classification\", model=model, tokenizer=tokenizer)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = dataset['test'][345]\n", + "print(t)\n", + "classifier(t['text'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Part 6 - Fine tuning a LLM model with a custom head" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from datasets import load_dataset\n", + "from transformers import DistilBertTokenizer, DistilBertModel\n", + "import torch\n", + "from torch.utils.data import DataLoader\n", + "from torch.optim import AdamW\n", + "from sklearn.metrics import accuracy_score, precision_recall_fscore_support\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dataset = load_dataset(\"imdb\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def tokenize_function(examples):\n", + " return tokenizer(examples[\"text\"], padding=\"max_length\", 
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def tokenize_function(examples):\n",
+ "    return tokenizer(examples[\"text\"], padding=\"max_length\", truncation=True, max_length=512)\n",
+ "\n",
+ "tokenized_datasets = dataset.map(tokenize_function, batched=True)\n",
+ "\n",
+ "\n",
+ "tokenized_datasets = tokenized_datasets.remove_columns([\"text\"])\n",
+ "tokenized_datasets = tokenized_datasets.rename_column(\"label\", \"labels\")\n",
+ "tokenized_datasets.set_format(\"torch\")\n",
+ "\n",
+ "train_dataset = tokenized_datasets[\"train\"]\n",
+ "test_dataset = tokenized_datasets[\"test\"]\n",
+ "\n",
+ "train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)\n",
+ "test_loader = DataLoader(test_dataset, batch_size=8)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "bert_model = DistilBertModel.from_pretrained(\"distilbert-base-uncased\")\n",
+ "\n",
+ "for param in bert_model.parameters():\n",
+ "    param.requires_grad = False\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class CustomBERTModel(torch.nn.Module):\n",
+ "    def __init__(self, bert_model):\n",
+ "        super(CustomBERTModel, self).__init__()\n",
+ "        self.bert = bert_model\n",
+ "        self.custom_head = torch.nn.Sequential(\n",
+ "            torch.nn.Linear(self.bert.config.hidden_size, 128),\n",
+ "            torch.nn.ReLU(),\n",
+ "            torch.nn.Dropout(0.1),\n",
+ "            torch.nn.Linear(128, 2)\n",
+ "        )\n",
+ "\n",
+ "    def forward(self, input_ids, attention_mask):\n",
+ "        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)\n",
+ "        outputs = self.custom_head(outputs.last_hidden_state[:, 0, :]) # Use [CLS] token output\n",
+ "        return outputs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "bert_model = DistilBertModel.from_pretrained(\"distilbert-base-uncased\")\n",
+ "\n",
+ "for param in bert_model.parameters():\n",
+ "    param.requires_grad = False\n",
+ "\n",
+ "model = CustomBERTModel(bert_model)\n",
+ "\n",
+ "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"mps\" if torch.backends.mps.is_available() else \"cpu\")\n",
+ "\n",
+ "model.to(device)"
+ ]
+ },
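+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Optional sanity check (not part of the original notebook): since the DistilBERT backbone is frozen, only the parameters of the custom classification head should require gradients."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional check: count trainable vs. total parameters of the assembled model.\n",
+ "trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
+ "total_params = sum(p.numel() for p in model.parameters())\n",
+ "print(f\"Trainable parameters: {trainable_params:,} / {total_params:,}\")"
+ ]
+ },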
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "optimizer = AdamW(model.parameters(), lr=2e-5)\n",
+ "criterion = torch.nn.CrossEntropyLoss()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def train_epoch(model, data_loader, optimizer, criterion, device):\n",
+ "    model.train()\n",
+ "    total_loss = 0\n",
+ "    for batch in data_loader:\n",
+ "        optimizer.zero_grad()\n",
+ "        input_ids = batch[\"input_ids\"].to(device)\n",
+ "        attention_mask = batch[\"attention_mask\"].to(device)\n",
+ "        labels = batch[\"labels\"].to(device)\n",
+ "\n",
+ "        outputs = model(input_ids=input_ids, attention_mask=attention_mask)\n",
+ "        loss = criterion(outputs, labels)\n",
+ "        loss.backward()\n",
+ "        optimizer.step()\n",
+ "\n",
+ "        total_loss += loss.item()\n",
+ "    return total_loss / len(data_loader)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def evaluate(model, data_loader, criterion, device):\n",
+ "    model.eval()\n",
+ "    total_loss = 0\n",
+ "    all_predictions = []\n",
+ "    all_labels = []\n",
+ "\n",
+ "    with torch.no_grad():\n",
+ "        for batch in data_loader:\n",
+ "            input_ids = batch[\"input_ids\"].to(device)\n",
+ "            attention_mask = batch[\"attention_mask\"].to(device)\n",
+ "            labels = batch[\"labels\"].to(device)\n",
+ "\n",
+ "            outputs = model(input_ids=input_ids, attention_mask=attention_mask)\n",
+ "            loss = criterion(outputs, labels)\n",
+ "            total_loss += loss.item()\n",
+ "\n",
+ "            predictions = torch.argmax(outputs, dim=-1)\n",
+ "            all_predictions.extend(predictions.cpu().numpy())\n",
+ "            all_labels.extend(labels.cpu().numpy())\n",
+ "\n",
+ "    accuracy = accuracy_score(all_labels, all_predictions)\n",
+ "    precision, recall, f1, _ = precision_recall_fscore_support(all_labels, all_predictions, average=\"binary\")\n",
+ "    return total_loss / len(data_loader), accuracy, precision, recall, f1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_epochs = 3\n",
+ "\n",
+ "for epoch in range(num_epochs):\n",
+ "    print(f\"Epoch {epoch + 1}/{num_epochs}\")\n",
+ "\n",
+ "    train_loss = train_epoch(model, train_loader, optimizer, criterion, device)\n",
+ "    print(f\"Train Loss: {train_loss:.4f}\")\n",
+ "\n",
+ "    val_loss, val_accuracy, val_precision, val_recall, val_f1 = evaluate(model, test_loader, criterion, device)\n",
+ "    print(f\"Validation Loss: {val_loss:.4f}\")\n",
+ "    print(f\"Accuracy: {val_accuracy:.4f}, Precision: {val_precision:.4f}, Recall: {val_recall:.4f}, F1 Score: {val_f1:.4f}\")\n",
+ "\n",
+ "    torch.save(model.state_dict(), f\"custom_bert_epoch_{epoch + 1}.pth\")\n",
+ "\n",
+ "\n",
+ "# (After 76 minutes of training)\n",
+ "# Epoch 1/3\n",
+ "# Train Loss: 0.6708\n",
+ "# Validation Loss: 0.6415\n",
+ "# Accuracy: 0.7917, Precision: 0.8218, Recall: 0.7450, F1 Score: 0.7815\n",
+ "# Epoch 2/3\n",
+ "# Train Loss: 0.6172\n",
+ "# Validation Loss: 0.5825\n",
+ "# Accuracy: 0.8051, Precision: 0.8142, Recall: 0.7907, F1 Score: 0.8023\n",
+ "# Epoch 3/3\n",
+ "# Train Loss: 0.5634\n",
+ "# Validation Loss: 0.5300\n",
+ "# Accuracy: 0.8098, Precision: 0.8339, Recall: 0.7738, F1 Score: 0.8027"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model_save_path = \"custom_bert_model.pth\"\n",
+ "\n",
+ "torch.save(model.state_dict(), model_save_path)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "loaded_bert_model = DistilBertModel.from_pretrained(\"distilbert-base-uncased\")\n",
+ "\n",
+ "loaded_model = CustomBERTModel(loaded_bert_model)\n",
+ "\n",
+ "loaded_model.load_state_dict(torch.load(model_save_path))\n",
+ "\n",
+ "loaded_model.to(device)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "batch = next(iter(test_loader))\n",
+ "\n",
+ "ids = batch['input_ids'][0]\n",
+ "attention_mask = batch['attention_mask'][0]\n",
+ "label = batch['labels'][0]\n",
+ "\n",
+ "ids = ids.to(device)\n",
+ "attention_mask = attention_mask.to(device)\n",
+ "\n",
+ "text = tokenizer.decode(ids, skip_special_tokens=True)\n",
+ "print(text)\n",
+ "print(label)\n",
+ "\n",
+ "loaded_model.eval()\n",
+ "output = loaded_model(input_ids=ids.unsqueeze(0), attention_mask=attention_mask.unsqueeze(0))\n",
+ "\n",
+ "output = output.squeeze(0)\n",
+ "print(output)\n",
+ "prediction = torch.argmax(output, dim=-1)\n",
+ "print(prediction)\n",
+ "print(label)\n",
+ "print(prediction == label)\n",
+ "\n"
+ ]
+ },
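+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Optional extra check (not part of the original notebook): run the reloaded model on a brand-new sentence rather than on a sample from the test loader. The example sentence is arbitrary."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional: classify an arbitrary sentence with the reloaded model.\n",
+ "sample_text = \"An absolutely wonderful film, I enjoyed every minute of it.\"\n",
+ "encoded = tokenizer(sample_text, return_tensors=\"pt\", truncation=True, max_length=512)\n",
+ "\n",
+ "loaded_model.eval()\n",
+ "with torch.no_grad():\n",
+ "    logits = loaded_model(\n",
+ "        input_ids=encoded[\"input_ids\"].to(device),\n",
+ "        attention_mask=encoded[\"attention_mask\"].to(device),\n",
+ "    )\n",
+ "print(logits)\n",
+ "print(torch.argmax(logits, dim=-1))  # 1 = positive, 0 = negative in the IMDB dataset"
+ ]
+ },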
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Part 7 - Sharing a model on the Hugging Face platform"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers import DistilBertPreTrainedModel, DistilBertModel\n",
+ "import torch.nn as nn\n",
+ "\n",
+ "class CustomDistilBERTModel(DistilBertPreTrainedModel):\n",
+ "    def __init__(self, config, freeze_backbone=True):\n",
+ "        super().__init__(config)\n",
+ "        self.distilbert = DistilBertModel(config)\n",
+ "        self.classifier = nn.Sequential(\n",
+ "            nn.Linear(config.hidden_size, 128),\n",
+ "            nn.ReLU(),\n",
+ "            nn.Dropout(0.1),\n",
+ "            nn.Linear(128, config.num_labels) # Binary classification\n",
+ "        )\n",
+ "        self.init_weights()\n",
+ "\n",
+ "        # Freeze DistilBERT backbone if specified\n",
+ "        if freeze_backbone:\n",
+ "            for param in self.distilbert.parameters():\n",
+ "                param.requires_grad = False\n",
+ "\n",
+ "    def forward(self, input_ids, attention_mask=None, labels=None):\n",
+ "        outputs = self.distilbert(input_ids=input_ids, attention_mask=attention_mask)\n",
+ "        logits = self.classifier(outputs.last_hidden_state[:, 0, :]) # Use [CLS] token output\n",
+ "        return logits\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers import AutoConfig, AutoModel, DistilBertConfig\n",
+ "\n",
+ "class CustomDistilBertConfig(DistilBertConfig):\n",
+ "    model_type = \"custom-distilbert\"\n",
+ "\n",
+ "AutoConfig.register(\"custom-distilbert\", CustomDistilBertConfig)\n",
+ "AutoModel.register(CustomDistilBertConfig, CustomDistilBERTModel)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers import DistilBertTokenizer\n",
+ "\n",
+ "# Initialize the configuration with custom attributes\n",
+ "config = CustomDistilBertConfig.from_pretrained(\"distilbert-base-uncased\", num_labels=2)\n",
+ "config.architectures = [\"CustomDistilBERTModel\"]\n",
+ "\n",
+ "# Initialize the model and tokenizer\n",
+ "model = CustomDistilBERTModel(config)\n",
+ "tokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased\")\n",
+ "\n",
+ "# Save locally\n",
+ "model.save_pretrained(\"custom_distilbert_model\")\n",
+ "tokenizer.save_pretrained(\"custom_distilbert_model\")\n",
+ "\n",
+ "print(\"Custom model and tokenizer saved locally!\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"mps\" if torch.backends.mps.is_available() else \"cpu\")\n",
+ "model = model.to(device)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_epochs = 3\n",
+ "\n",
+ "optimizer = AdamW(model.parameters(), lr=2e-5)\n",
+ "\n",
+ "for epoch in range(num_epochs):\n",
+ "    print(f\"Epoch {epoch + 1}/{num_epochs}\")\n",
+ "\n",
+ "    train_loss = train_epoch(model, train_loader, optimizer, criterion, device)\n",
+ "    print(f\"Train Loss: {train_loss:.4f}\")\n",
+ "\n",
+ "    val_loss, val_accuracy, val_precision, val_recall, val_f1 = evaluate(model, test_loader, criterion, device)\n",
+ "    print(f\"Validation Loss: {val_loss:.4f}\")\n",
+ "    print(f\"Accuracy: {val_accuracy:.4f}, Precision: {val_precision:.4f}, Recall: {val_recall:.4f}, F1 Score: {val_f1:.4f}\")\n",
+ "\n",
+ "    torch.save(model.state_dict(), f\"custom_bert_epoch_{epoch + 1}.pth\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model.push_to_hub(\"custom-distilbert-model\")\n",
+ "tokenizer.push_to_hub(\"custom-distilbert-model\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers import AutoTokenizer, AutoModel\n",
+ "loaded_tokenizer = AutoTokenizer.from_pretrained(\"your_hf_id/custom-distilbert-model\")\n",
+ "loaded_model = AutoModel.from_pretrained(\"your_hf_id/custom-distilbert-model\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Part 8 - Further experiments"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now that you know the basics of manipulating LLMs through the Hugging Face platform, it is time to experiment with:\n",
+ "- different [NLP tasks](https://huggingface.co/tasks)\n",
+ "- different [models](https://huggingface.co/models?pipeline_tag=text-classification&sort=trending)\n",
+ "- different [datasets](https://huggingface.co/datasets?task_categories=task_categories:text-classification&sort=trending)\n",
+ "\n",
+ "... and to share your fine-tuned models on the platform.\n",
+ "\n",
+ "Besides, don't forget to monitor your training runs through [Weights & Biases](https://wandb.ai/home)."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "td_llm",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.11"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}