
Lenka Zdeborová theory ML talk - chat log
Abdulkadir Canatar
28:03
Can you also consider Lp regularization in the loss here?
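[For reference, a minimal sketch of what an Lp penalty added to a training loss could look like; the squared-error data term, exponent p, and weight lam are illustrative placeholders, not the loss from the talk.]

import numpy as np

def lp_regularized_loss(w, X, y, lam=0.1, p=1.5):
    # illustrative data term: mean squared error of a linear model
    residual = X @ w - y
    data_term = 0.5 * np.mean(residual ** 2)
    # Lp penalty on the weights; p=2 gives ridge, p=1 gives lasso
    penalty = lam * np.sum(np.abs(w) ** p)
    return data_term + penalty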
Haozhe Shan
30:13
Why do you need the Y term? Why not only use the tensor term in the loss function?
Jiaoyang Huang
45:10
For the spiked tensor model, tensor unfolding performs better than AMP; can you do something like tensor unfolding in your setting?
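[A minimal sketch of the tensor-unfolding estimator mentioned here, assuming a rank-one spiked 3-tensor with Gaussian noise; the normalization and snr value are illustrative, not the talk's. The n x n x n tensor is reshaped into an n x n^2 matrix whose top left singular vector estimates the spike.]

import numpy as np

n, snr = 30, 10.0
rng = np.random.default_rng(0)
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
# spiked tensor: rank-one signal plus Gaussian noise
T = snr * np.einsum('i,j,k->ijk', x, x, x) + rng.standard_normal((n, n, n)) / np.sqrt(n)
# unfold along the first mode into an n x n^2 matrix
M = T.reshape(n, n * n)
# the top left singular vector of the unfolding estimates the planted spike
u, _, _ = np.linalg.svd(M, full_matrices=False)
estimate = u[:, 0]
print(abs(estimate @ x))  # overlap with the planted spike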
Zhun Deng
46:27
If I remember correctly (I may be wrong), SGLD does not work well in practice for NNs, while SGD does. For non-convex problems, SGLD does work under very strong assumptions, e.g. in a paper from Rakhlin, but their assumptions basically make the problem very close to a convex one. And the tensor problem is highly structured; it could also be unfolded into a matrix, which gives a convex problem.
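[For reference, a minimal sketch contrasting the two updates being discussed; the function names, step size lr, and inverse temperature beta are illustrative placeholders, not from the talk.]

import numpy as np

def sgd_step(w, grad_minibatch, lr=1e-2):
    # plain SGD: the only noise comes from subsampling the data
    return w - lr * grad_minibatch

def sgld_step(w, grad_minibatch, lr=1e-2, beta=1e3, rng=np.random.default_rng()):
    # SGLD: add explicit isotropic Gaussian noise scaled by the temperature 1/beta
    noise = rng.standard_normal(w.shape)
    return w - lr * grad_minibatch + np.sqrt(2 * lr / beta) * noise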
Sidak Pal Singh
48:28
Here it seems that the noise is isotropic. However, when using SGD the noise is highly anisotropic (and has a structure similar to Fisher). Maybe I am missing something, but it looks like this model will not capture that aspect.
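[A minimal sketch of that contrast on a toy logistic regression; all names and sizes are illustrative. The mini-batch gradient noise covariance is anisotropic and, at the log-loss, close to the empirical Fisher E[g g^T] / B, whereas Langevin-style noise is a scaled identity.]

import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.standard_normal((n, d))
y = (X @ rng.standard_normal(d) > 0).astype(float)
w = rng.standard_normal(d) * 0.1

# per-sample gradients of the logistic loss: (p - y) * x
p = 1.0 / (1.0 + np.exp(-(X @ w)))
G = (p - y)[:, None] * X              # shape (n, d), one gradient per sample
g_bar = G.mean(axis=0)

# SGD noise covariance for batch size B: anisotropic, close to the empirical Fisher / B
B = 32
fisher = G.T @ G / n                  # empirical Fisher, E[g g^T]
sgd_cov = (fisher - np.outer(g_bar, g_bar)) / B

# isotropic Langevin-style noise with matched average scale
iso_cov = np.eye(d) * np.trace(sgd_cov) / d

print(np.linalg.eigvalsh(sgd_cov)[[0, -1]])  # widely spread eigenvalues
print(np.linalg.eigvalsh(iso_cov)[[0, -1]])  # all equal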
Chinmay
01:32:39
Can we say something about generalization here?
Preetum Nakkiran
01:41:19
https://arxiv.org/abs/2010.08127
Boaz Barak
01:42:10
Co-authors are Behnam Neyshabur and Hanie Sedghi.
Sidak Pal Singh
01:47:41
^^ In the framework of the above paper, can we imagine/hope that the data augmentation could be selected in a particular manner that emulates the online learning scenario?
Preetum Nakkiran
01:51:15
^^ Yes, this would approximately hold for very strong data-aug, trained for a few steps. But the surprising part is that in many of our experiments we train until the Train Error ~ 0. So the model itself is not in the “online” regime (since it fits the train set), although the test error behaves as if it were in the online regime.
Preetum Nakkiran
01:52:01
(Feel free to email me with any questions about the above: preetum@cs.harvard.edu)
Sidak Pal Singh
01:58:01
Interesting, I will look more in the paper and ping you. Thanks!