
There are several gradient-based attack methods. Let $J(\theta, x, y)$ be the training loss. For instance, a single projected gradient step is $$ \widetilde{x} = \Pi\big( x + \epsilon \nabla_x J(\theta, x, y) \big), $$ where $\Pi$ projects back onto the set of valid inputs.
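For concreteness, here is a minimal sketch of that single projected step in PyTorch. The names (`projected_gradient_step`, `model`, `loss_fn`) and the choice of $\Pi$ as clamping to the valid pixel range $[0, 1]$ are my own assumptions for illustration:

```python
import torch

def projected_gradient_step(model, loss_fn, x, y, eps):
    # One step of x_adv = Pi( x + eps * grad_x J(theta, x, y) ).
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)            # J(theta, x, y)
    grad = torch.autograd.grad(loss, x)[0]
    # Assumed projection Pi: clamp back onto the valid input range [0, 1].
    return (x + eps * grad).clamp(0.0, 1.0).detach()
```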

The fast gradient sign method (FGSM) is $$ \widetilde{x} = x + \epsilon \, \text{sign}\big( \nabla_x J(\theta, x, y) \big). $$
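And a corresponding FGSM sketch under the same assumptions:

```python
import torch

def fgsm(model, loss_fn, x, y, eps):
    # x_adv = x + eps * sign( grad_x J(theta, x, y) )
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).detach()
```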

These methods all ADD $\nabla_x J(\theta, x, y)$ (or its sign) to $x$, under the assumption that $\nabla_x J(\theta, x, y)$ points in the direction of maximum infinitesimal increase of $J$ with respect to $x$.

But this only holds infinitesimally: since $J$ is a non-convex function of $x$, ADDING a finite step $\epsilon \nabla_x J(\theta, x, y)$ does not necessarily produce an $\widetilde{x}$ with a larger value of $J$.

Since most of these methods take a single finite step, there is no guarantee that $\widetilde{x}$ increases the value of $J$; it might even decrease it.
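To make this concrete with a toy 1-D example (purely illustrative): take $J(x) = \sin(x)$, so $\nabla_x J = \cos(x)$. A large enough ascent step overshoots the maximum and the loss actually drops:

```python
import numpy as np

x, eps = 1.0, 4.0             # deliberately large step size
x_adv = x + eps * np.cos(x)   # "attack" step: x + eps * grad J(x)
print(np.sin(x))              # ~ 0.841  (J at the original point)
print(np.sin(x_adv))          # ~ -0.020 (J DECREASED after the step)
```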

Is there a flaw in my reasoning?

At its core, the attack is really just gradient descent, except we're trying to maximize the error instead of minimize the loss. During training, (S)GD doesn't generally point in the direction of a (global or local) minimum (it can actually be nearly orthogonal to the descent direction), but it can still be an effective optimizer & reduce the loss despite that fact. Same idea here -- getting closer to the goal is good enough. Commented Oct 11, 2024 at 15:00
