Tuning LLM: Exploring the Nuances of Prompt Tuning and Fine Tuning
Large Language Models (LLMs) have revolutionized the field of artificial intelligence, showing remarkable capabilities in understanding and generating human-like text. The ability of these models to perform a wide array of tasks, from casual conversations to complex problem-solving, is largely attributed to their sophisticated training methodologies. However, the performance of LLMs can often be enhanced further through various tuning strategies.

In the previous blog, What is Prompt? What is Prompt Engineering & Prompt Tuning?, we gained a basic understanding of prompt engineering and prompt tuning. In this blog, we delve into the nuances of prompt tuning and fine-tuning, shedding light on how each method contributes to optimizing LLMs.

Prompt Tuning: Influencing Outputs with Inputs

Unlike prompt engineering, which adds task-specific instructions alongside the input text, prompt tuning introduces soft prompts. Soft prompts are task-specific instruction embeddings inserted at the model's input embedding layer to guide it toward the desired behavior, and they require only minimal training: the original weights of the model are frozen, and only the soft prompt embeddings are updated. Because soft prompts are learned continuous embeddings, they are not necessarily interpretable as natural language, yet they are fed to the model just like regular input. In short, soft prompts play a pivotal role in how the input is presented to the model; the soft prompt embedding layer is learnable, and in each training iteration the soft prompt is optimized further.

Strategies for Crafting Effective Prompts

- Clarity and Conciseness: Clear and straightforward prompts are typically more effective than vague or overly complex ones. Aim for simplicity while keeping essential details.
- Contextual Framing: Providing context can improve the model's ability to understand the desired output. Including examples or specific details can clarify the task.
- Iterative Refinement: Experiment with various prompts to determine which produces the best results. Tuning prompts involves iterative trial and error.
- Use of Instructional Language: Directly phrasing prompts as instructions, like "List the benefits of…", can more effectively guide the model's output (see the short example after this list).
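To make these strategies concrete, here is a small illustrative sketch (not from the original post) contrasting a vague prompt with a refined, instructional one; the `ask_model` call is a hypothetical placeholder for whichever completion API or client you use.

```python
# Hypothetical illustration of prompt refinement; `ask_model` is a placeholder
# for whatever completion API or client library you are using.
vague_prompt = "Tell me about prompt tuning."

refined_prompt = (
    "List three benefits of prompt tuning for large language models. "    # instructional phrasing
    "Answer in bullet points of one sentence each. "                      # clarity and conciseness
    "Context: the audience is developers building an LLM-based chatbot."  # contextual framing
)

# response = ask_model(refined_prompt)   # compare against ask_model(vague_prompt) and iterate
```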
Benefits of Prompt Tuning

- Rapid Adaptation: Prompt tuning allows for quick adjustments to the model's behavior without requiring extensive retraining, making it ideal for dynamic and real-time applications.
- Cost-Effective: Since it doesn't involve large-scale data processing or lengthy training sessions, prompt tuning is a more cost-effective way to adapt models to specific tasks.
- Versatility: Prompt tuning can be applied across various domains and tasks, offering a flexible approach to customizing model outputs for different applications and contexts.
- Reduced Computational Resources: Compared to full model retraining, prompt tuning requires fewer computational resources, making it feasible on less powerful hardware.

Let's now dive into the code. The sample Python program below shows where prompt tuning with soft prompts takes place, using a summarization task over article/highlight pairs.

```python
import csv
import random
import string

import nltk
import torch
import torch.nn as nn
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from torch.nn.utils.rnn import pad_sequence
from tqdm import tqdm
from transformers import GPT2LMHeadModel, GPT2Tokenizer, GPT2Model, GPT2Config
from sklearn.model_selection import train_test_split

# Download the NLTK resources used for tokenization and stop-word removal
nltk.download('punkt')
nltk.download('stopwords')
```

First, we import the required modules.

```python
def preprocess_text(input_text):
    """
    Processes the given text by converting it to lowercase, removing punctuation
    and digits, tokenizing the words, and filtering out stop words.

    Args:
        input_text (str): Text that needs to be processed.

    Returns:
        str: The processed text.
    """
    # Convert text to lowercase
    input_text = input_text.lower()

    # Remove punctuation and digits
    input_text = input_text.translate(str.maketrans('', '', string.punctuation + string.digits))

    # Tokenize the text into words
    words = word_tokenize(input_text)

    # Remove stop words
    filtered_words = [word for word in words if word not in stopwords.words('english')]

    # Join the words back into a single string
    processed_text = ' '.join(filtered_words)

    return processed_text
```

We use this preprocessing function to lowercase the text, remove punctuation and digits, tokenize it, and filter out stop words.

```python
def tokenize_and_pad_sequences(data_pairs, max_len_article, max_len_highlights):
    """
    Tokenizes the input text data and pads the sequences to the given maximum lengths.

    Args:
        data_pairs (list): A list of tuples, each containing an article and its corresponding highlights.
        max_len_article (int): The maximum length for article sequences.
        max_len_highlights (int): The maximum length for highlight sequences.

    Returns:
        list: A list of tuples containing tokenized and padded sequences for articles and highlights.
    """
    processed_data = []

    for article, highlights in data_pairs:
        # Tokenize and convert text to indices
        article_indices = tokenizer.encode(article, add_special_tokens=True)
        highlights_indices = tokenizer.encode(highlights, add_special_tokens=True)

        # Pad the sequences to the specified maximum lengths
        padded_article = torch.tensor(
            article_indices + [tokenizer.pad_token_id] * (max_len_article - len(article_indices)))
        padded_highlights = torch.tensor(
            highlights_indices + [tokenizer.pad_token_id] * (max_len_highlights - len(highlights_indices)))

        # Ensure both tokenized sequences are non-empty
        if len(article_indices) > 0 and len(highlights_indices) > 0:
            processed_data.append((padded_article, padded_highlights))

    return processed_data
```

After preprocessing, we tokenize the input data, convert the tokens to indices using the GPT-2 tokenizer, and pad the sequences to the specified lengths.

```python
def load_data_from_csv(file_path):
    """
    Reads a CSV file and extracts columns for articles and highlights.

    Args:
        file_path (str): Path to the CSV file.

    Returns:
        list: A list of tuples containing articles and highlights.
    """
    data = []
    with open(file_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for row in reader:
            article = row.get('article', '')
            highlights = row.get('highlights', '')
            data.append((article, highlights))
    return data

# Load test data
test_file_path = 'test.csv'
test_data = load_data_from_csv(test_file_path)

# Load training data
train_file_path = 'train.csv'
train_data = load_data_from_csv(train_file_path)

# Load validation data
validation_file_path = 'validation.csv'
validation_data = load_data_from_csv(validation_file_path)
```

Here comes the dataset-loading part: we read the data from CSV files, extract the relevant columns ('article' and 'highlights'), and store them as tuples in separate lists for the training, test, and validation sets.

```python
def sample_one_percent(data_list, seed):
    """
    Randomly samples 1% of the data from the provided list, with a fixed seed for reproducibility.

    Args:
        data_list (list): The original data list.
        seed (int): Seed value for reproducibility.

    Returns:
        list: A list containing 1% of the sampled data.
    """
    random.seed(seed)
    sample_size = int(0.01 * len(data_list))
    return random.sample(data_list, sample_size)

# Sample 1% of the test data
one_percent_test_data = sample_one_percent(test_data, seed=14)

# Sample 1% of the training data
one_percent_train_data = sample_one_percent(train_data, seed=14)

# Sample 1% of the validation data
one_percent_val_data = sample_one_percent(validation_data, seed=14)
```

We randomly select 1% of the training, test, and validation data so that the rest of the pipeline runs quickly while building the model.

```python
# Apply preprocessing to the sampled data
processed_train_data = [(preprocess_text(article), preprocess_text(highlights))
                        for article, highlights in one_percent_train_data]
processed_test_data = [(preprocess_text(article), preprocess_text(highlights))
                       for article, highlights in one_percent_test_data]
processed_val_data = [(preprocess_text(article), preprocess_text(highlights))
                      for article, highlights in one_percent_val_data]
```

This block applies the text preprocessing (lowercasing, punctuation removal, tokenization, and stop-word removal) to the sampled training, test, and validation sets.

```python
# Initialize the GPT-2 tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Use the end-of-sequence token as the padding token
pad_token = tokenizer.eos_token
tokenizer.add_special_tokens({'pad_token': pad_token})

# Define maximum lengths for articles and highlights
# (1021 leaves room for a few soft prompt tokens within GPT-2's 1024-token context)
max_article_len = 1021
max_highlights_len = 1024

# Apply tokenization and padding to the preprocessed datasets
tokenized_train_data = tokenize_and_pad_sequences(processed_train_data,
                                                  max_len_article=max_article_len,
                                                  max_len_highlights=max_highlights_len)
tokenized_test_data = tokenize_and_pad_sequences(processed_test_data,
                                                 max_len_article=max_article_len,
                                                 max_len_highlights=max_highlights_len)
tokenized_val_data = tokenize_and_pad_sequences(processed_val_data,
                                                max_len_article=max_article_len,
                                                max_len_highlights=max_highlights_len)
```

The GPT-2 tokenizer is loaded in this block, and its end-of-sequence token is registered as the padding token used when padding sequences.
The block then tokenizes and pads the preprocessed data to the designated maximum lengths for articles and highlights, producing the tokenized training, test, and validation datasets.

```python
# Initialize lists to store input and target IDs for the training data
input_ids_train = []
target_ids_train = []

# Maximum lengths for articles and highlights
max_article_len = 1021
max_highlights_len = 1024

# Iterate over the tokenized training data
for article_tokens, highlights_tokens in tokenized_train_data:
    # Truncate article tokens to the maximum article length
    truncated_article = article_tokens[:max_article_len]

    # Truncate highlights tokens to the maximum highlights length
    truncated_highlights = highlights_tokens[:max_highlights_len]

    # Add the truncated article tokens to the input list
    input_ids_train.append(truncated_article)

    # Add the truncated highlights tokens to the target list
    target_ids_train.append(truncated_highlights)

# Convert the training lists to PyTorch tensors
input_ids_train = torch.stack(input_ids_train)
target_ids_train = torch.stack(target_ids_train)
```

This section trims article and highlights tokens that exceed the allowed limits and converts them into PyTorch tensors to prepare the training data.

```python
input_ids_val = []
target_ids_val = []

# Iterate over the tokenized validation data
for article_tokens, highlights_tokens in tokenized_val_data:
    # Truncate article tokens to the maximum article length
    truncated_article = article_tokens[:max_article_len]

    # Truncate highlights tokens to the maximum highlights length
    truncated_highlights = highlights_tokens[:max_highlights_len]

    # Add the truncated article tokens to the validation input list
    input_ids_val.append(truncated_article)

    # Add the truncated highlights tokens to the validation target list
    target_ids_val.append(truncated_highlights)

# Convert the validation lists to PyTorch tensors
input_ids_val = torch.stack(input_ids_val)
target_ids_val = torch.stack(target_ids_val)
```

Similarly, we truncate the tokens for the validation set and create PyTorch tensors.

```python
# Load the GPT-2 model and tokenizer
model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
gpt2_model = GPT2LMHeadModel.from_pretrained(model_name)

# Specify the prompt sentence used to initialize the soft prompt
# (e.g. "summarize" or "summarize the following text")
prompt_sentence = "summarize"

# Tokenize the prompt sentence
prompt_ids = tokenizer.encode(prompt_sentence, return_tensors='pt')

# Derive the number of prompt tokens from the tokenized prompt and
# take the embedding size from the GPT-2 configuration
num_prompt_tokens = prompt_ids.shape[1]
embedding_dim = gpt2_model.config.hidden_size

# Obtain embeddings for the tokenized prompt sentence from the GPT-2 model
prompt_embeddings = gpt2_model.transformer.wte(prompt_ids)

# Initialize an embedding layer for the soft prompts with the prompt sentence embeddings
soft_prompt_layer = nn.Embedding(num_prompt_tokens, embedding_dim)
soft_prompt_layer.weight.data.copy_(prompt_embeddings.squeeze(0))
```

The GPT-2 model and tokenizer are loaded in this section.
This step then tokenizes the prompt sentence, retrieves its embeddings from the GPT-2 model, derives the number of prompt tokens and the embedding size, and initializes an embedding layer for the soft prompts with those embeddings.

```python
class GPT2WithPromptTuning(nn.Module):
    def __init__(self, gpt2_model, soft_prompt_embeddings):
        super(GPT2WithPromptTuning, self).__init__()
        self.gpt2_model = gpt2_model
        self.soft_prompt_embeddings = soft_prompt_embeddings

    def forward(self, input_ids, soft_prompt_ids):
        # Work with a batch dimension so single examples can be passed as 1-D tensors
        if input_ids.dim() == 1:
            input_ids = input_ids.unsqueeze(0)
        if soft_prompt_ids.dim() == 1:
            soft_prompt_ids = soft_prompt_ids.unsqueeze(0)

        # Obtain the embeddings for the input_ids from the GPT-2 model
        gpt2_embeddings = self.gpt2_model.transformer.wte(input_ids)

        # Obtain the embeddings for the soft prompts
        soft_prompt_embeds = self.soft_prompt_embeddings(soft_prompt_ids)

        # Prepend the soft prompt embeddings to the input embeddings along the sequence dimension
        embeddings = torch.cat([soft_prompt_embeds, gpt2_embeddings], dim=1)

        # Pass the concatenated embeddings through the GPT-2 model
        outputs = self.gpt2_model(inputs_embeds=embeddings)
        return outputs
```

The class above defines a GPT-2 wrapper for prompt tuning: it concatenates the soft prompt embeddings at the start of the input sequence and passes the result through the GPT-2 model.

```python
# Initialize the model
model = GPT2WithPromptTuning(gpt2_model, soft_prompt_layer)

# Freeze the GPT-2 model weights
for param in model.gpt2_model.parameters():
    param.requires_grad = False
```

Here we initialize the model with the soft_prompt_layer and freeze the GPT-2 parameters so that only the soft prompt embeddings remain trainable.

```python
# Define hyperparameters
batch_size = 8
epochs = 2
learning_rate = 2e-3
gradient_clip_value = 1.0
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the model to the GPU if one is available
model.to(device)

# Define the optimizer (only the soft prompt embeddings are optimized) and the loss criterion
optimizer = torch.optim.AdamW(model.soft_prompt_embeddings.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss(ignore_index=-100)

# Indices into the soft prompt embedding layer, one per soft prompt token
soft_prompt_ids = torch.arange(num_prompt_tokens)
```

This section defines hyperparameters such as the batch size, number of epochs, learning rate, and gradient clipping value, and initializes the AdamW optimizer and the cross-entropy loss function used to train the soft prompt.

```python
# Training loop
model.train()
for epoch in range(epochs):
    # Create a tqdm progress bar for the training data
    data_iterator = tqdm(zip(input_ids_train, target_ids_train),
                         desc=f'Epoch {epoch + 1}', total=len(input_ids_train))

    for input_ids, target_ids in data_iterator:
        optimizer.zero_grad()

        # Move input and target tensors to the selected device
        input_ids, target_ids = input_ids.to(device), target_ids.to(device)

        outputs = model(input_ids, soft_prompt_ids.to(device))
        logits = outputs.logits if hasattr(outputs, "logits") else outputs.last_hidden_state

        # Align the lengths of the logits (soft prompt + article tokens) and the
        # targets (highlights tokens) before computing the loss
        seq_len = min(logits.size(1), target_ids.size(0))
        loss = criterion(logits[0, :seq_len, :], target_ids[:seq_len])

        loss.backward()

        # Gradient clipping to prevent exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), gradient_clip_value)

        optimizer.step()

        # Update the progress bar description with the current loss
        data_iterator.set_postfix(loss=loss.item())

    # Close the tqdm progress bar at the end of the epoch
    data_iterator.close()
```

This is the training loop for the prompt-tuned model. It iterates over the epochs, computes the loss for each example, applies gradient clipping to avoid exploding gradients, and uses gradient descent to update only the soft prompt embeddings.
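Once training finishes, it helps to check what the tuned soft prompt actually does. The snippet below is a minimal, illustrative sketch (not part of the original walkthrough) of greedy decoding with the wrapper defined above; the helper name `generate_with_soft_prompt` is hypothetical, and it assumes `model`, `tokenizer`, `device`, and `soft_prompt_ids` from the training code are in scope.

```python
# A minimal sketch of inference with the tuned soft prompt. The helper name is
# hypothetical; it reuses `model`, `tokenizer`, `device`, and `soft_prompt_ids`
# from the training code above.
def generate_with_soft_prompt(text, max_new_tokens=50):
    model.eval()
    input_ids = tokenizer.encode(text, return_tensors='pt').squeeze(0).to(device)
    with torch.no_grad():
        for _ in range(max_new_tokens):
            outputs = model(input_ids, soft_prompt_ids.to(device))
            # Greedily pick the most likely next token from the last position
            next_token = outputs.logits[0, -1, :].argmax()
            if next_token.item() == tokenizer.eos_token_id:
                break
            input_ids = torch.cat([input_ids, next_token.unsqueeze(0)])
    return tokenizer.decode(input_ids, skip_special_tokens=True)

print(generate_with_soft_prompt("The government announced new measures today"))
```

Because only the soft prompt embeddings were trained, several such prompts can be kept for different tasks and swapped in front of the same frozen GPT-2 weights.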
Fine-Tuning: The Model's Weight Adjustment Approach

Fine-tuning a Large Language Model (LLM) means adjusting or updating the pre-trained model's parameters to improve its performance on a specific task or domain. This process begins with a base model that has already been trained on a huge dataset covering a wide range of knowledge and linguistic patterns. During fine-tuning, the model is trained on a task-specific dataset, and its weights are adjusted using gradient descent to minimize the loss on this new dataset. This adapts the model's knowledge so it can effectively handle the targeted task, such as summarization, question answering, code generation and debugging, or domain-specific text generation. By building on the vast knowledge gained during pre-training, fine-tuning can significantly boost the model's accuracy and relevance for particular tasks, striking a balance between general language understanding and specialization.

The Process of Fine-Tuning

- Dataset Preparation: A properly selected dataset that represents the target task is required for fine-tuning. For the model to generalize effectively, the chosen dataset should cover a variety of contexts, styles, and approaches.
- Training Procedure: The model is trained on the new dataset, often for a reduced number of epochs. This training can be supervised (where correct outputs are provided) or unsupervised (learned from the data without explicit feedback).
- Evaluation and Adjustment: After fine-tuning, the model should be evaluated on a validation set. Metrics such as accuracy, F1 score, or BLEU score can help gauge effectiveness, and further adjustments can be made based on these evaluations (a short evaluation sketch follows the training code below).

Benefits of Fine-Tuning

- Domain Adaptation: Fine-tuning allows LLMs to specialize in niche domains, such as medical terminology or legal language, by adapting their understanding to specific jargon and context.
- Improved Performance: By training on relevant data, fine-tuned models often achieve better performance than their general-purpose counterparts.
- Flexibility: Fine-tuned models can efficiently handle a variety of tasks within their specialized domain, making them versatile for real-world applications.

Let's look at a sample Python program for fine-tuning the GPT-2 LLM.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
from datasets import load_dataset

dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
```

Here we import the required libraries and load the wikitext-2 dataset, but you can use any dataset you prefer.

```python
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# GPT-2 has no padding token by default, so reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
```

Now we tokenize our dataset, with padding and truncation, in preparation for fine-tuning.

```python
train_dataset = tokenized_datasets["train"]
eval_dataset = tokenized_datasets["validation"]

train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=8)
eval_dataloader = DataLoader(eval_dataset, batch_size=8)
```

Next we create DataLoaders for training and evaluation. DataLoaders ensure efficient and effective data handling during the training and evaluation of machine learning models.
They help with batch processing, shuffling, parallel data loading, and memory management, all of which contribute to more efficient training and evaluation. (The Trainer used below builds its own DataLoaders internally, so this step mainly illustrates how the data would be batched in a custom training loop.)

```python
model = AutoModelForCausalLM.from_pretrained(model_name)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
)
```

This snippet loads a pre-trained causal language model using the AutoModelForCausalLM class from the Hugging Face Transformers library. It then sets up the training arguments, such as the output directory, evaluation strategy, learning rate, batch sizes, number of training epochs, and weight decay, using the TrainingArguments class.

```python
from transformers import DataCollatorForLanguageModeling

# For causal language modeling, the collator copies the input IDs into the labels
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=data_collator,
)

trainer.train()

model.save_pretrained("./fine-tuned-model")
tokenizer.save_pretrained("./fine-tuned-model")
```

The code initializes a Trainer to fine-tune the loaded model with the specified training arguments and datasets (the data collator supplies the language-modeling labels), then trains the model and saves the fine-tuned model and tokenizer to a directory.
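To close the loop on the evaluation step described earlier, here is a brief, hedged sketch (not from the original post) of how the fine-tuned model might be checked: perplexity is estimated from the Trainer's evaluation loss, and a short sample is generated from the saved checkpoint. The prompt text is only an illustrative example; the paths and objects reuse those defined above.

```python
import math

# Evaluate on the validation split; eval_loss is the mean cross-entropy,
# so its exponential gives an estimate of perplexity
eval_metrics = trainer.evaluate()
print(f"Perplexity: {math.exp(eval_metrics['eval_loss']):.2f}")

# Reload the saved checkpoint and generate a short continuation as a sanity check
tuned_model = AutoModelForCausalLM.from_pretrained("./fine-tuned-model")
tuned_tokenizer = AutoTokenizer.from_pretrained("./fine-tuned-model")

inputs = tuned_tokenizer("The history of natural language processing", return_tensors="pt")
output_ids = tuned_model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tuned_tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

A falling evaluation loss (and perplexity) across epochs is a quick signal that fine-tuning is helping, before moving on to task-specific metrics such as F1 or BLEU.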
Navigating the Challenges of Prompt Tuning and Fine-Tuning

Prompt tuning faces several challenges. One significant issue is striking the right balance between specificity and generality: overly specific prompts can limit the model's versatility, while overly general prompts may not yield precise results. Additionally, prompt tuning requires a deep understanding of the model's behavior and the nuances of natural language, which can be complex and time-consuming. Ensuring consistency across different contexts and managing unexpected or biased outputs are also critical challenges that must be addressed to harness the potential of prompt tuning effectively.

Fine-tuning can deliver major improvements in model performance and accuracy, but it raises challenges of its own. Overfitting is an important problem: a model tuned heavily on its training set may perform poorly on new or unseen data. Furthermore, obtaining the right datasets can require significant time and effort.

Real-Time Applications of Prompt Tuning & Fine-Tuning

Prompt tuning is useful when users need specific kinds of responses without extensive model retraining (fine-tuning) and have limited resources. For example, in customer service chatbots, well-crafted soft prompts can streamline interactions and ensure that responses align with customer questions. In creative writing aids, prompts can inspire imaginative outputs while maintaining a consistent narrative style.

In fields where precise language and terminology are essential, such as healthcare, finance, and education, fine-tuning is frequently used. For instance, an LLM fine-tuned on medical records can help healthcare workers with report writing, patient history analysis, and prescription recommendation generation.

Scenarios like these can help you choose between prompt tuning and fine-tuning for a given real-time application.

Conclusion

Large language model tuning is not a one-size-fits-all process; the decision between fine-tuning and prompt tuning depends on the unique requirements and limitations of the given task. Every tuning technique has advantages and disadvantages of its own, so a developer must know when and how to use each one. By carefully navigating these trade-offs and utilizing LLMs both for their fundamental flexibility and for their capacity to adapt and perform well in a range of scenarios, we may be able to narrow the gap between human intentions and machine understanding.