Abdulkadir Canatar

28:03

Can you also consider Lp regularization in the loss here?

Haozhe Shan

30:13

Why do you need the Y term? Why not only use the tensor term in the loss function?

Jiaoyang Huang

45:10

For the spiked tensor model, tensor unfolding performs better than AMP; can you do something like tensor unfolding in your setting?

Zhun Deng

46:27

If I remember correctly (I may be wrong), SGLD does not work well in practice for NNs, while SGD does. For non-convex problems, SGLD does work under very strong assumptions, e.g. in a paper from Rakhlin, but their assumptions basically make the problem very close to a convex one. And the tensor problem is highly structured; it can also be unfolded into a matrix, which gives a convex problem.
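[Editor's note: a minimal sketch of the tensor-unfolding idea mentioned above, for an order-3 spiked tensor. All names and values (n, beta, the noise scale) are illustrative choices, not taken from the talk; unfolding reduces the tensor problem to a top-singular-vector computation on a matrix.]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
beta = 5 * n ** 0.75  # signal strength, chosen above the unfolding threshold ~ n^(3/4)

# Planted unit-norm spike direction v.
v = rng.standard_normal(n)
v /= np.linalg.norm(v)

# Spiked tensor: T = beta * v⊗v⊗v + Gaussian noise.
T = beta * np.einsum("i,j,k->ijk", v, v, v) + rng.standard_normal((n, n, n))

# Unfold along the first mode into an n x n^2 matrix; the spike becomes a
# rank-1 perturbation, so its direction is recovered by the top left
# singular vector (a spectral, convex-relaxation-style computation).
M = T.reshape(n, n * n)
u, s, _ = np.linalg.svd(M, full_matrices=False)
v_hat = u[:, 0]

# Overlap with the planted spike (up to sign).
corr = abs(v_hat @ v)
print(f"overlap with planted spike: {corr:.2f}")
```

With beta well above the unfolding threshold, the overlap is close to 1; AMP-style methods succeed at a lower threshold in theory, but unfolding is a simple, robust baseline.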

Sidak Pal Singh

48:28

Here it seems that the noise is isotropic. However, when using SGD the noise is highly anisotropic (and has a structure similar to the Fisher information matrix). Maybe I am missing something, but it looks like this model will not capture that aspect.

Chinmay

01:32:39

Can we say something about generalization here?

Preetum Nakkiran

01:41:19

https://arxiv.org/abs/2010.08127

Boaz Barak

01:42:10

Co-authors are Behnam Neyshabur and Hanie Sedghi.

Sidak Pal Singh

01:47:41

^^ In the framework of the above paper, can we imagine/hope that the data augmentation is selected in a particular manner that emulates the online learning scenario?

Preetum Nakkiran

01:51:15

^^ Yes, this would approximately hold for very strong data augmentation, trained for few steps. But the surprising part is that in many of our experiments we train until the train error is ~0. So the model itself is not in the “online” regime (since it fits the train set), although the test error behaves as if it were in the online regime.

Preetum Nakkiran

01:52:01

(Feel free to email me with any questions about the above: preetum@cs.harvard.edu)

Sidak Pal Singh

01:58:01

Interesting, I will look more into the paper and ping you. Thanks!