Gradient clipping acts as a form of regularization.

  • In ML, one way of regularizing is to track, for each weight, how precisely you believe you know its value.
    • That is, how much (and how often) it has changed in the past.
    • When an update comes in, you move precise weights less than imprecise ones.
    • But this requires significantly more bookkeeping: extra state for every weight (see the first sketch after this list).
  • Another approach is simply gradient clipping.
    • Just cut off extreme gradient values before applying the update (see the second sketch below).
  • Clipping is less precise for any individual update, but averaged over many stochastic updates it is apparently about equivalent.
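
One concrete version of the per-weight bookkeeping described above is an AdaGrad-style update: each weight accumulates its own history of squared gradients, and weights with a large history take smaller steps. A minimal NumPy sketch; the function and variable names are illustrative, not from any particular library:

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    # accum is the per-weight record of past change (sum of squared grads).
    accum = accum + grad ** 2
    # Dividing by sqrt(accum) shrinks the step for weights that have
    # already moved a lot, i.e. "precise" weights get smaller updates.
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy run: weight 0 keeps seeing large gradients, weight 1 small ones.
w = np.array([1.0, 1.0])
accum = np.zeros_like(w)
for _ in range(100):
    w, accum = adagrad_step(w, np.array([5.0, 0.1]), accum)
# Weight 0's effective step size has shrunk far more than weight 1's.
```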
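
Gradient clipping, by contrast, needs no per-weight state. The two common variants are clipping element-wise by value and rescaling by global norm; again a sketch with illustrative names:

```python
import numpy as np

def clip_by_value(grad, limit=1.0):
    # Element-wise: force every gradient component into [-limit, limit].
    return np.clip(grad, -limit, limit)

def clip_by_global_norm(grad, max_norm=1.0):
    # Rescale the whole gradient vector if it is too long; the
    # direction is preserved, only the magnitude is capped.
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

g = np.array([5.0, 0.1])
print(clip_by_value(g))        # [1.  0.1]
print(clip_by_global_norm(g))  # ≈[1.0, 0.02]: same direction, norm capped at 1
```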