{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **_Deep Learning  - Bsc Data Science for Responsible Business - Centrale Lyon_**\n",
    "\n",
    "2024-2025\n",
    "\n",
    "Emmanuel Dellandréa\t  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Practical Session 7 – Large Language Models\n",
    "\n",
    "The objective of this tutorial is to learn to work with LLM models for sentence generation and classification. The pretrained models and tokenizers will be obtained from the [Hugging Face platform](https://huggingface.co/).\n",
    "\n",
    "This notebook contains 8 parts:\n",
    "1. Using a Hugging Face text generation model\n",
    "2. Using Pipeline of Hugging Face for text classification\n",
    "3. Using Pipeline with a specific model and tokenizer of Hugging Face\n",
    "4. Experimenting with models from Hugging Face\n",
    "5. Training a LLM for sentence classification using the **Trainer** class\n",
    "6. Fine tuning a LLM model with a custom head\n",
    "7. Sharing a model on Hugging Face platform\n",
    "8. Further experiments\n",
    "\n",
    "Before going further into experiments, you work is to understand the provided code, that gives an overview of using LLM with Hugging Face.\n",
    "\n",
    "**This code is intentionally not commented. It is your objective to add all the necessary comments to ensure your proper understanding of the code.**\n",
    "\n",
    "You might frequently rely on [Hugging Face’s documentation](https://huggingface.co/docs).\n",
    "\n",
    "\n",
    "---\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "As the computation can be heavy, particularly during training, we encourage you to use a GPU. If your laptob is not equiped, you may use one of these remote jupyter servers, where you can select the execution on GPU :\n",
    "\n",
    "1) [jupyter.mi90.ec-lyon.fr](https://jupyter.mi90.ec-lyon.fr/)\n",
    "\n",
    "This server is accessible within the campus network. If outside, you need to use a VPN. Before executing the notebook, select the kernel \"Python PyTorch\" to run it on GPU and have access to PyTorch module.\n",
    "\n",
    "2) [Google Colaboratory](https://colab.research.google.com/)\n",
    "\n",
    "Before executing the notebook, select the execution on GPU : \"Runtime\" -> \"Change runtime type\" --> \"T4 GPU\". "
   ]
  },
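  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check before running the heavier cells, the minimal sketch below (assuming a reasonably recent PyTorch is already installed in your environment) reports which accelerators PyTorch can see:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "# Check which accelerators are visible to PyTorch on this machine\n",
    "print(\"CUDA GPU available:\", torch.cuda.is_available())\n",
    "print(\"Apple MPS available:\", torch.backends.mps.is_available())"
   ]
  },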
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Installing required librairies "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install huggingface_hub\n",
    "!pip install ipywidgets\n",
    "!pip install transformers\n",
    "!pip install datasets\n",
    "!pip install accelerate\n",
    "!pip install scikit-learn\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Log in to Hugging Face\n",
    "\n",
    "First, you need to create an account on [Hugging Face platform](https://huggingface.co/join).\n",
    "\n",
    "Then you can log in to your account directly from the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from huggingface_hub import notebook_login\n",
    "notebook_login()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part 1 - Using a Hugging Face text generation model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
    "\n",
    "# model_name = \"mistralai/Mistral-7B\"\n",
    "# model_name = \"deepseek-ai/DeepSeek-R1\"\n",
    "# model_name = \"meta-llama/Llama-3.2-3B-Instruct\"\n",
    "# model_name = \"homebrewltd/AlphaMaze-v0.2-1.5B\"\n",
    "model_name = \"gpt2\"\n",
    "\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
    "model = AutoModelForCausalLM.from_pretrained(model_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "input_text = \"Hello. Who are you ?\"\n",
    "encoded_input = tokenizer(input_text, return_tensors=\"pt\")\n",
    "\n",
    "output = model.generate(\n",
    "    input_ids=encoded_input[\"input_ids\"],\n",
    "    attention_mask=encoded_input[\"attention_mask\"],\n",
    "    max_length=100,\n",
    "    temperature=0.8,\n",
    "    pad_token_id=tokenizer.pad_token_id\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "generated_text = tokenizer.decode(output[0], skip_special_tokens=True)\n",
    "print(generated_text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part 2 - Using Pipeline of Hugging Face for text classification"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import pipeline\n",
    "\n",
    "classifier = pipeline(\"text-classification\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "classifier(\"We are very happy to welcome you at Centrale Lyon.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = classifier([\"We are very happy to welcome you at Centrale Lyon.\", \"We hope you don't hate it.\"])\n",
    "for result in results:\n",
    "    print(f\"label: {result['label']}, with score: {round(result['score'], 4)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part 3 - Using Pipeline with a specific model and tokenizer of Hugging Face"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_name = \"nlptown/bert-base-multilingual-uncased-sentiment\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
    "\n",
    "model = AutoModelForSequenceClassification.from_pretrained(model_name)\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "classifier = pipeline(\"text-classification\", model=model, tokenizer=tokenizer)\n",
    "classifier(\"We are very hapy to present you this incredible  model.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part 4 - Experimenting with models from Hugging Face"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer\n",
    "\n",
    "model_name = \"nlptown/bert-base-multilingual-uncased-sentiment\"\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "encoding = tokenizer(\"We are very happy to welcome you at Centrale Lyon.\")\n",
    "print(encoding)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "batch = tokenizer(\n",
    "    [\"We are very happy to welcome you at Centrale Lyon.\", \"We hope you don't hate it.\"],\n",
    "    padding=True,\n",
    "    truncation=True,\n",
    "    max_length=512,\n",
    "    return_tensors=\"pt\",\n",
    ")\n",
    "print(batch)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoModelForSequenceClassification\n",
    "\n",
    "model = AutoModelForSequenceClassification.from_pretrained(model_name, torch_dtype=\"auto\")\n",
    "print(model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "outputs = model(**batch)\n",
    "print(outputs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch import nn\n",
    "\n",
    "predictions = nn.functional.softmax(outputs.logits, dim=-1)\n",
    "print(predictions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "save_directory = \"./save_pretrained\"\n",
    "tokenizer.save_pretrained(save_directory)\n",
    "model.save_pretrained(save_directory)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loaded_model = AutoModelForSequenceClassification.from_pretrained(\"./save_pretrained\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part 5 - Training a LLM for sentence classification using the **Trainer** class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoModelForSequenceClassification\n",
    "\n",
    "model_name = \"distilbert/distilbert-base-uncased\"\n",
    "\n",
    "model = AutoModelForSequenceClassification.from_pretrained(model_name, torch_dtype=\"auto\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import TrainingArguments\n",
    "\n",
    "training_args = TrainingArguments(\n",
    "    output_dir=\"save_folder/\",\n",
    "    learning_rate=2e-5,\n",
    "    per_device_train_batch_size=8,\n",
    "    per_device_eval_batch_size=8,\n",
    "    num_train_epochs=2,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "\n",
    "dataset = load_dataset(\"rotten_tomatoes\") "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def tokenize_dataset(dataset):\n",
    "    return tokenizer(dataset[\"text\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = dataset.map(tokenize_dataset, batched=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import DataCollatorWithPadding\n",
    "\n",
    "data_collator = DataCollatorWithPadding(tokenizer=tokenizer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import Trainer\n",
    "\n",
    "trainer = Trainer(\n",
    "    model=model,\n",
    "    args=training_args,\n",
    "    train_dataset=dataset[\"train\"],\n",
    "    eval_dataset=dataset[\"test\"],\n",
    "    processing_class=tokenizer,\n",
    "    data_collator=data_collator,\n",
    ") "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trainer.train()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "save_directory = \"./tomatoes_save_pretrained\"\n",
    "tokenizer.save_pretrained(save_directory)\n",
    "model.save_pretrained(save_directory)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = AutoModelForSequenceClassification.from_pretrained(save_directory)\n",
    "tokenizer = AutoTokenizer.from_pretrained(save_directory)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import pipeline\n",
    "classifier = pipeline(\"text-classification\", model=model, tokenizer=tokenizer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t = dataset['test'][345]\n",
    "print(t)\n",
    "classifier(t['text'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part 6 - Fine tuning a LLM model with a custom head"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "from transformers import DistilBertTokenizer, DistilBertModel\n",
    "import torch\n",
    "from torch.utils.data import DataLoader\n",
    "from torch.optim import AdamW\n",
    "from sklearn.metrics import accuracy_score, precision_recall_fscore_support\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = load_dataset(\"imdb\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def tokenize_function(examples):\n",
    "    return tokenizer(examples[\"text\"], padding=\"max_length\", truncation=True, max_length=512)\n",
    "\n",
    "tokenized_datasets = dataset.map(tokenize_function, batched=True)\n",
    "\n",
    "\n",
    "tokenized_datasets = tokenized_datasets.remove_columns([\"text\"])\n",
    "tokenized_datasets = tokenized_datasets.rename_column(\"label\", \"labels\")\n",
    "tokenized_datasets.set_format(\"torch\")\n",
    "\n",
    "train_dataset = tokenized_datasets[\"train\"]\n",
    "test_dataset = tokenized_datasets[\"test\"]\n",
    "\n",
    "train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)\n",
    "test_loader = DataLoader(test_dataset, batch_size=8)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bert_model = DistilBertModel.from_pretrained(\"distilbert-base-uncased\")\n",
    "\n",
    "for param in bert_model.parameters():\n",
    "    param.requires_grad = False\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class CustomBERTModel(torch.nn.Module):\n",
    "    def __init__(self, bert_model):\n",
    "        super(CustomBERTModel, self).__init__()\n",
    "        self.bert = bert_model\n",
    "        self.custom_head = torch.nn.Sequential(\n",
    "            torch.nn.Linear(self.bert.config.hidden_size, 128),\n",
    "            torch.nn.ReLU(),\n",
    "            torch.nn.Dropout(0.1),\n",
    "            torch.nn.Linear(128, 2) \n",
    "        )\n",
    "\n",
    "    def forward(self, input_ids, attention_mask):\n",
    "        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)\n",
    "        outputs = self.custom_head(outputs.last_hidden_state[:, 0, :])  # Use [CLS] token output\n",
    "        return outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bert_model = DistilBertModel.from_pretrained(\"distilbert-base-uncased\")\n",
    "\n",
    "for param in bert_model.parameters():\n",
    "    param.requires_grad = False\n",
    "\n",
    "model = CustomBERTModel(bert_model)\n",
    "\n",
    "# device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
    "device = torch.device(\"mps\")\n",
    "\n",
    "model.to(device)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "optimizer = AdamW(model.parameters(), lr=2e-5)\n",
    "criterion = torch.nn.CrossEntropyLoss()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def train_epoch(model, data_loader, optimizer, criterion, device):\n",
    "    model.train()\n",
    "    total_loss = 0\n",
    "    for batch in data_loader:\n",
    "        optimizer.zero_grad()\n",
    "        input_ids = batch[\"input_ids\"].to(device)\n",
    "        attention_mask = batch[\"attention_mask\"].to(device)\n",
    "        labels = batch[\"labels\"].to(device)\n",
    "\n",
    "        outputs = model(input_ids=input_ids, attention_mask=attention_mask)\n",
    "        loss = criterion(outputs, labels)\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "\n",
    "        total_loss += loss.item()\n",
    "    return total_loss / len(data_loader)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def evaluate(model, data_loader, criterion, device):\n",
    "    model.eval()\n",
    "    total_loss = 0\n",
    "    all_predictions = []\n",
    "    all_labels = []\n",
    "    \n",
    "    with torch.no_grad():\n",
    "        for batch in data_loader:\n",
    "            input_ids = batch[\"input_ids\"].to(device)\n",
    "            attention_mask = batch[\"attention_mask\"].to(device)\n",
    "            labels = batch[\"labels\"].to(device)\n",
    "\n",
    "            outputs = model(input_ids=input_ids, attention_mask=attention_mask)\n",
    "            loss = criterion(outputs, labels)\n",
    "            total_loss += loss.item()\n",
    "\n",
    "            predictions = torch.argmax(outputs, dim=-1)\n",
    "            all_predictions.extend(predictions.cpu().numpy())\n",
    "            all_labels.extend(labels.cpu().numpy())\n",
    "    \n",
    "    accuracy = accuracy_score(all_labels, all_predictions)\n",
    "    precision, recall, f1, _ = precision_recall_fscore_support(all_labels, all_predictions, average=\"binary\")\n",
    "    return total_loss / len(data_loader), accuracy, precision, recall, f1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_epochs = 3\n",
    "\n",
    "for epoch in range(num_epochs):\n",
    "    print(f\"Epoch {epoch + 1}/{num_epochs}\")\n",
    "    \n",
    "    train_loss = train_epoch(model, train_loader, optimizer, criterion, device)\n",
    "    print(f\"Train Loss: {train_loss:.4f}\")\n",
    "    \n",
    "    val_loss, val_accuracy, val_precision, val_recall, val_f1 = evaluate(model, test_loader, criterion, device)\n",
    "    print(f\"Validation Loss: {val_loss:.4f}\")\n",
    "    print(f\"Accuracy: {val_accuracy:.4f}, Precision: {val_precision:.4f}, Recall: {val_recall:.4f}, F1 Score: {val_f1:.4f}\")\n",
    "\n",
    "    torch.save(model.state_dict(), f\"custom_bert_epoch_{epoch + 1}.pth\")\n",
    "\n",
    "\n",
    "# (After 76 minutes of training)\n",
    "# Epoch 1/3\n",
    "# Train Loss: 0.6708\n",
    "# Validation Loss: 0.6415\n",
    "# Accuracy: 0.7917, Precision: 0.8218, Recall: 0.7450, F1 Score: 0.7815\n",
    "# Epoch 2/3\n",
    "# Train Loss: 0.6172\n",
    "# Validation Loss: 0.5825\n",
    "# Accuracy: 0.8051, Precision: 0.8142, Recall: 0.7907, F1 Score: 0.8023\n",
    "# Epoch 3/3\n",
    "# Train Loss: 0.5634\n",
    "# Validation Loss: 0.5300\n",
    "# Accuracy: 0.8098, Precision: 0.8339, Recall: 0.7738, F1 Score: 0.8027"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_save_path = \"custom_bert_model.pth\"\n",
    "\n",
    "torch.save(model.state_dict(), model_save_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loadedbert_model = DistilBertModel.from_pretrained(\"distilbert-base-uncased\")\n",
    "\n",
    "loaded_model = CustomBERTModel(loadedbert_model)\n",
    "\n",
    "loaded_model.load_state_dict(torch.load(model_save_path))\n",
    "\n",
    "loaded_model.to(device)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "batch = next(iter(test_loader))\n",
    "\n",
    "ids = batch['input_ids'][0]\n",
    "attention_mask = batch['attention_mask'][0]\n",
    "label = batch['labels'][0]\n",
    "\n",
    "ids = ids.to(device)\n",
    "attention_mask = attention_mask.to(device)\n",
    "\n",
    "text = tokenizer.decode(ids, skip_special_tokens=True)\n",
    "print(text)\n",
    "print(label)\n",
    "\n",
    "loaded_model.eval()\n",
    "output = model(input_ids=ids.unsqueeze(0), attention_mask=attention_mask.unsqueeze(0))\n",
    "\n",
    "output = output.squeeze(0)\n",
    "print(output)\n",
    "prediction = torch.argmax(output, dim=-1)\n",
    "print(prediction)\n",
    "print(label)\n",
    "print(prediction == label)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part 7 - Sharing a model on Hugging Face platform"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import DistilBertPreTrainedModel, DistilBertModel\n",
    "import torch.nn as nn\n",
    "\n",
    "class CustomDistilBERTModel(DistilBertPreTrainedModel):\n",
    "    def __init__(self, config, freeze_backbone=True):\n",
    "        super().__init__(config)\n",
    "        self.distilbert = DistilBertModel(config)\n",
    "        self.classifier = nn.Sequential(\n",
    "            nn.Linear(config.hidden_size, 128),\n",
    "            nn.ReLU(),\n",
    "            nn.Dropout(0.1),\n",
    "            nn.Linear(128, config.num_labels)  \n",
    "        )\n",
    "\n",
    "        if freeze_backbone:\n",
    "            for param in self.distilbert.parameters():\n",
    "                param.requires_grad = False\n",
    "\n",
    "    def forward(self, input_ids, attention_mask=None, labels=None):\n",
    "        outputs = self.distilbert(input_ids=input_ids, attention_mask=attention_mask)\n",
    "        logits = self.classifier(outputs.last_hidden_state[:, 0, :])  # Use [CLS] token output\n",
    "        return logits\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoConfig\n",
    "AutoConfig.register(\"custom-distilbert\", AutoConfig)\n",
    "AutoModel.register(CustomDistilBERTModel, \"custom-distilbert\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import DistilBertTokenizer\n",
    "\n",
    "config = AutoConfig.from_pretrained(\"distilbert-base-uncased\", num_labels=2)\n",
    "config.architectures = [\"CustomDistilBERTModel\"]\n",
    "\n",
    "model = CustomDistilBERTModel(config)\n",
    "tokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased\")\n",
    "\n",
    "model.save_pretrained(\"custom_distilbert_model\")\n",
    "tokenizer.save_pretrained(\"custom_distilbert_model\")\n",
    "\n",
    "print(\"Custom model and tokenizer saved locally!\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "device = \"mps\"\n",
    "model = model.to(device)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_epochs = 3\n",
    "\n",
    "for epoch in range(num_epochs):\n",
    "    print(f\"Epoch {epoch + 1}/{num_epochs}\")\n",
    "    \n",
    "    train_loss = train_epoch(model, train_loader, optimizer, criterion, device)\n",
    "    print(f\"Train Loss: {train_loss:.4f}\")\n",
    "    \n",
    "    val_loss, val_accuracy, val_precision, val_recall, val_f1 = evaluate(model, test_loader, criterion, device)\n",
    "    print(f\"Validation Loss: {val_loss:.4f}\")\n",
    "    print(f\"Accuracy: {val_accuracy:.4f}, Precision: {val_precision:.4f}, Recall: {val_recall:.4f}, F1 Score: {val_f1:.4f}\")\n",
    "\n",
    "    torch.save(model.state_dict(), f\"custom_bert_epoch_{epoch + 1}.pth\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.push_to_hub(\"custom-distilbert-model\")\n",
    "tokenizer.push_to_hub(\"custom-distilbert-model\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer, AutoModel\n",
    "loaded_tokenizer = AutoTokenizer.from_pretrained(\"your_hf_id/custom-distilbert-model\")\n",
    "loaded_model = AutoModel.from_pretrained(\"your_hf_id/custom-distilbert-model\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part 8 - Further experiments"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that you know the basics for manipulating LLM through Hugging Face platform, it is time to experiment with:\n",
    "- different [NLP tasks](https://huggingface.co/tasks)\n",
    "- different [models](https://huggingface.co/models?pipeline_tag=text-classification&sort=trending)\n",
    "- different [datasets](https://huggingface.co/datasets?task_categories=task_categories:text-classification&sort=trending)\n",
    "\n",
    "... and to share your finetuned models on the platform.\n",
    "\n",
    "Besides, don't forget to monitor your trainings through [Weights & Biases](https://wandb.ai/home)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "td_llm",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}