The Invisible Hack
Abstract
Deep learning models have become crucial to high-stakes infrastructure, from face recognition to autonomous vehicles, so the robustness of these models against malicious attacks is paramount. In this article we investigate Adversarial Machine Learning (AML), which focuses on inputs specially crafted to fool neural networks. We explore the mathematics behind the Fast Gradient Sign Method (FGSM), demonstrate how imperceptible noise can cause a catastrophic misclassification, and discuss defence strategies that make AI systems harder to exploit.
Introduction
Consider a car in autonomous driving mode approaching a stop sign. To a human it looks completely normal, but a few carefully placed stickers can make the car's computer vision system read it as a 45 mph speed limit sign. If the car accelerates, the result may be an accident. This scenario is not theoretical; it is a class of attack known as model evasion.
For years, cybersecurity has focused on securing firewalls and patching bugs. Today we face a new kind of problem, one rooted in mathematics. Neural networks are highly sensitive to small perturbations in their input data. These perturbations are often invisible to our eyes, yet they push the input across the model's high-dimensional decision boundary and force a confident but wrong prediction.
This article peels back the layers of a neural network to explain how these attacks work and how the precise noise needed to break a model can be computed mathematically.
We call it the Invisible Hack because the attack exploits a fundamental difference between mathematical processing and biological vision. It is invisible to humans because the noise added to the image, controlled by the epsilon parameter, is extremely small (e.g., 0.006). To our eyes the pixels shift so slightly that the "Hacked Bear" looks exactly identical to the "Normal Bear".
While humans ignore small amounts of noise, an AI evaluates every single pixel value. The attack pushes the image data across the model's mathematical decision boundary in vector space. The AI is not merely confused; it is tricked into being confidently wrong.
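To get a sense of how small this noise is, the rough sketch below (an illustration with a random stand-in image rather than a real photo or a real attack) applies a worst-case shift of epsilon = 0.006 to every pixel on a [0, 1] scale:

import torch

epsilon = 0.006                          # perturbation budget on a [0, 1] pixel scale
image = torch.rand(3, 224, 224)          # stand-in for a normalized RGB image

# Worst case: every pixel moves by the full epsilon in some direction
noise = epsilon * torch.sign(torch.randn_like(image))
perturbed = torch.clamp(image + noise, 0, 1)

max_shift = (perturbed - image).abs().max().item()
print(f"Largest per-pixel change: {max_shift:.4f} (about {max_shift * 255:.1f} / 255)")
# Roughly 1.5 intensity levels out of 255: far below what the eye can notice,
# yet enough, when pointed in the right direction, to cross a decision boundary.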
Mathematical Description
We do not need server access to hack a neural network. We only need to understand gradient descent.
Normally, when we train a model, we minimize a loss function $J(\theta, x, y)$ by adjusting the weights $\theta$.
Training Goal: $\min_{\theta} J(\theta, x, y)$
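For contrast, a single standard training step in PyTorch (a minimal sketch with a toy stand-in model and random data, not the article's actual setup) looks roughly like this:

import torch
import torch.nn.functional as F

# Toy stand-ins, purely for illustration
model = torch.nn.Sequential(torch.nn.Linear(784, 10), torch.nn.LogSoftmax(dim=1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = torch.rand(32, 784)               # fake batch of flattened images
target = torch.randint(0, 10, (32,))     # fake labels

# One step of gradient descent: adjust the WEIGHTS to reduce the loss
output = model(data)                     # forward pass
loss = F.nll_loss(output, target)        # loss J(theta, x, y)
optimizer.zero_grad()
loss.backward()                          # gradients of the loss w.r.t. the weights
optimizer.step()                         # move the weights downhill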
In an adversarial attack we flip the objective: we freeze the weights and modify the input image to maximize the loss. This technique is gradient ascent on the input. The most commonly used algorithm is the Fast Gradient Sign Method (FGSM), introduced by Goodfellow et al.
Now we calculate the noise as: $\eta = \epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y))$
Where:
- $\theta$: The fixed model parameters.
- $x$: The original input (e.g., an image of a Panda).
- $y$: The correct label ("Panda").
- $\epsilon$ (Epsilon): A small multiplier ensuring the noise remains invisible to humans.
- $\nabla_x J(\theta, x, y)$: The gradient of the loss with respect to the input pixels.
The final "Adversarial Image" becomes:
Code
The following code shows how the attack is generated using a pretrained model:
import torch
import torch.nn.functional as F

# 1. The FGSM attack function
def fgsm_attack(image, epsilon, data_grad):
    """Generates adversarial noise to deceive the model."""
    # Get the sign of the gradients (the direction that increases the error)
    sign_data_grad = data_grad.sign()
    # Create the noise (perturbation)
    # Equation: eta = epsilon * sign(gradient)
    perturbed_image = image + epsilon * sign_data_grad
    # Clip to maintain the valid image range [0, 1]
    perturbed_image = torch.clamp(perturbed_image, 0, 1)
    return perturbed_image

# 2. Execution logic
def attack_model(model, data, target, epsilon):
    # Track gradients with respect to the input tensor
    data.requires_grad = True
    # Forward pass: ask the model what it sees
    output = model(data)
    # Calculate the loss (negative log-likelihood on log-softmax outputs)
    # We want to MAXIMIZE this loss
    loss = F.nll_loss(output, target)
    # Zero all existing gradients
    model.zero_grad()
    # Backward pass: calculate gradients of the loss w.r.t. the input data
    loss.backward()
    # Collect the data gradients (nabla_x)
    data_grad = data.grad.data
    # Call the FGSM function
    perturbed_data = fgsm_attack(data, epsilon, data_grad)
    return perturbed_data
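A hypothetical usage sketch, with a tiny untrained stand-in classifier instead of a real pretrained vision model, could look like this:

import torch
import torch.nn as nn

# Stand-ins: a real attack would use a pretrained model and a preprocessed image
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10), nn.LogSoftmax(dim=1))
image = torch.rand(1, 3, 32, 32)         # fake 32x32 RGB image in [0, 1]
label = torch.tensor([3])                # pretend the true class is 3

clean_pred = model(image).argmax(dim=1).item()
adversarial = attack_model(model, image, label, epsilon=0.006)
adv_pred = model(adversarial).argmax(dim=1).item()

print("Prediction on clean image:      ", clean_pred)
print("Prediction on adversarial image:", adv_pred)
# Against a trained model, a well-chosen epsilon frequently flips the prediction
# even though the two images look identical to a human.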
Defence
Now, if we know that models are vulnerable to this specific kind of mathematical noise, the most logical defence is to generate that noise ourselves and include it in the training process. This technique is called Adversarial Training. Think of it as a vaccine: we intentionally infect the model with a weakened virus (adversarial examples) during training so that its immune system (the decision boundary) learns to recognize and resist them.
Min-Max Game
If we train a model in the standard way, it simply minimizes the loss on clean data. Adversarial training turns this into a min-max game: we look for weights that minimize the loss while an adversary constantly looks for the perturbation that maximizes it. The two steps are listed below, followed by the combined objective.
Inner Maximization (max): The attacker searches for the perturbation, within the epsilon budget, that maximizes the error.
Outer Minimization (min): The defender updates the weights to minimize that worst-case error.
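Written as a single objective (a standard formulation of adversarial training; the $\ell_\infty$ bound $\|\delta\|_\infty \le \epsilon$ on the perturbation is an assumed threat model, matching the FGSM budget above):

$$\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ \max_{\|\delta\|_{\infty} \le \epsilon} J(\theta, x + \delta, y) \Big]$$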
Code
This code shows how to defend against the attack:
def train_robust_model(model, train_loader, optimizer, epsilon=0.1):
    model.train()
    for data, target in train_loader:
        # 1. Generate the attack (the "vaccine")
        data.requires_grad = True
        output = model(data)
        loss = F.nll_loss(output, target)
        model.zero_grad()
        loss.backward()

        # Create the adversarial image
        data_grad = data.grad.data
        perturbed_data = data + epsilon * data_grad.sign()
        perturbed_data = torch.clamp(perturbed_data, 0, 1)

        # 2. Train on the attack (the "immunity")
        optimizer.zero_grad()
        # Feed the BROKEN image to the model, but force it to learn the CORRECT label
        output_adv = model(perturbed_data)
        loss_adv = F.nll_loss(output_adv, target)
        loss_adv.backward()
        optimizer.step()
    print("Robust training epoch complete.")
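A hypothetical call, again with a toy model and random tensors standing in for a real dataset, might look like this:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins: a real setup would use an actual image dataset and architecture
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10), nn.LogSoftmax(dim=1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.rand(256, 3, 32, 32)      # fake images in [0, 1]
labels = torch.randint(0, 10, (256,))    # fake labels
train_loader = DataLoader(TensorDataset(images, labels), batch_size=32)

for epoch in range(3):                   # each epoch mixes in freshly generated attacks
    train_robust_model(model, train_loader, optimizer, epsilon=0.1)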
Conclusion
Adversarial Machine Learning has exposed a hard reality: AI models are breakable. As we integrate AI into cybersecurity through threat detection and Zero Trust architectures, we must remember that the defenders, the AI models themselves, can be tricked. In the coming years, cybersecurity will no longer be limited to encryption and firewalls. It will also mean building robust AI models that can stand their mathematical ground even when the world tries to cheat them.
