Lenka Zdeborová theory ML talk - Shared screen with speaker view
Abdulkadir Canatar
Can you also consider Lp regularization in the loss here?
Haozhe Shan
Why do you need the Y term? Why not only use the tensor term in the loss function?
Jiaoyang Huang
For the spiked tensor model, tensor unfolding performs better than AMP. Can you do something like tensor unfolding in your setting?
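(For readers of the transcript: a minimal illustrative sketch of the tensor-unfolding idea mentioned in the question. The dimension `n`, signal strength `lam`, and noise scaling are assumed toy values, not parameters from the talk.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 30, 15.0  # illustrative size and signal strength (assumed values)

# Planted spike: T = lam * (v outer v outer v) + Gaussian noise
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
noise = rng.standard_normal((n, n, n)) / np.sqrt(n)
T = lam * np.einsum("i,j,k->ijk", v, v, v) + noise

# Tensor unfolding: reshape the order-3 tensor into an n x n^2 matrix
# and estimate the spike by the top left singular vector of that matrix.
M = T.reshape(n, n * n)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
v_hat = U[:, 0]

# Overlap with the planted spike; approaches 1 when lam is large enough.
overlap = abs(v_hat @ v)
print(overlap)
```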
Zhun Deng
If I remember correctly (I may be wrong), SGLD does not work well in practice for neural networks, while SGD does. For non-convex problems, SGLD does work under very strong assumptions, e.g. in a paper from Rakhlin, but those assumptions essentially make the problem very close to a convex one. And the tensor problem is highly structured; it can also be unfolded into a matrix problem, which is a convex one.
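(For readers of the transcript: a minimal sketch of the SGLD update being discussed, i.e. a gradient step plus Gaussian noise at inverse temperature `beta`. The quadratic toy objective and all parameter values are illustrative assumptions, not from the talk.)

```python
import numpy as np

rng = np.random.default_rng(1)

def grad(w):
    # Gradient of the toy loss f(w) = 0.5 * ||w||^2 (stand-in objective)
    return w

def sgld_step(w, lr, beta, rng):
    # SGLD: gradient step plus Gaussian noise of scale sqrt(2 * lr / beta).
    # As beta -> infinity the noise vanishes and this reduces to plain GD.
    return w - lr * grad(w) + np.sqrt(2.0 * lr / beta) * rng.standard_normal(w.shape)

w = rng.standard_normal(5)
for _ in range(2000):
    w = sgld_step(w, lr=0.01, beta=100.0, rng=rng)

# The iterates fluctuate around the minimizer at 0 instead of converging to it.
print(np.linalg.norm(w))
```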
Sidak Pal Singh
Here it seems that the noise is isotropic. However, with SGD the noise is highly anisotropic (and has a structure similar to the Fisher information matrix). Maybe I am missing something, but it looks like this model will not capture that aspect.
Can we say something about generalization here?
Preetum Nakkiran
Boaz Barak
Co-authors are Behnam Neyshabur and Hanie Sedghi.
Sidak Pal Singh
^^ In the framework of the above paper, can we imagine/hope that the data augmentation could be selected in a particular manner that emulates the online-learning scenario?
Preetum Nakkiran
^^ Yes, this would approximately hold for very strong data augmentation, trained for a few steps. But the surprising part is that in many of our experiments we train until the train error is ~0. So the model itself is not in the “online” regime (since it fits the train set), although the test error behaves as if it were in the online regime.
Preetum Nakkiran
(Feel free to email me with any questions about the above: preetum@cs.harvard.edu)
Sidak Pal Singh
Interesting, I will look more in the paper and ping you. Thanks!