
Theory ML seminar: Zico Kolter (with lightning talk by Edelman & Shavit)
Andrew Ross
56:54
Will “perform as well” just be defined in terms of accuracy, or also robustness?
Andrew Ross
57:43
Cool, thanks :)
Boaz Barak
01:03:12
So we can think of y(x) = argmin_y f(x,y)?
Boaz Barak
01:03:39
I guess answered :)
Thibaut Horel
01:06:10
What if there are multiple solutions to this equation? This does not uniquely specify the output of the network; should I now think of a network as a set-valued function?
Thibaut Horel
01:06:58
ok, thanks :)
Jacob Noah Steinhardt
01:11:04
Don't we need to know y* to get these? (Which is what we're trying to get?)
Jacob Noah Steinhardt
01:12:00
Thanks, got it now :)
Boaz Barak
01:12:52
So you will really use this as a layer of the form "find y s.t. f(x,w,y) = 0", where w are the weight vectors/parameters, and differentiate with respect to w?
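(A minimal, illustrative sketch of the kind of layer being asked about here, in the spirit of Kolter's deep-implicit-layers material but not his actual code: the forward pass finds a fixed point z* = f(z*, x) by plain iteration, and the backward pass applies the implicit function theorem, solving the adjoint linear system with vector-Jacobian products instead of backpropagating through the solver. The class name, solver, and hyperparameters below are made-up placeholders.)

import torch
import torch.nn as nn
from torch import autograd

class DEQFixedPoint(nn.Module):
    """Illustrative implicit layer: the output z solves z = f(z, x)."""

    def __init__(self, f, tol=1e-4, max_iter=50):
        super().__init__()
        self.f = f              # a small block f(z, x), e.g. weight-tied with input injection
        self.tol = tol
        self.max_iter = max_iter

    def _solve(self, g, z0):
        # Naive fixed-point iteration z <- g(z); a real DEQ would use
        # Broyden's method or Anderson acceleration here.
        z = z0
        for _ in range(self.max_iter):
            z_new = g(z)
            if (z_new - z).norm() < self.tol:
                return z_new
            z = z_new
        return z

    def forward(self, x):
        # Forward: find the fixed point without taping the solver iterations.
        # (Assumes z has the same shape as x.)
        with torch.no_grad():
            z_star = self._solve(lambda z: self.f(z, x), torch.zeros_like(x))

        # One extra call of f re-attaches z* to the autograd graph.
        z_star = self.f(z_star, x)

        if z_star.requires_grad:
            # Backward (implicit function theorem): when a gradient `grad`
            # arrives at z*, replace it by u solving
            #     u = (df/dz|_{z*})^T u + grad,  i.e.  u = (I - df/dz)^{-T} grad,
            # again by fixed-point iteration on vector-Jacobian products.
            z0 = z_star.clone().detach().requires_grad_()
            f0 = self.f(z0, x)

            def backward_hook(grad):
                return self._solve(
                    lambda u: autograd.grad(f0, z0, u, retain_graph=True)[0] + grad,
                    grad,
                )

            z_star.register_hook(backward_hook)
        return z_star

Usage would look like layer = DEQFixedPoint(block) with z = layer(x), for any block(z, x) whose output has the same shape as z; gradients with respect to block's parameters (the w in the question) then flow through the single re-attached call of f.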
Jacob Noah Steinhardt
01:12:53
You might get to this later, but what about inequality constraints?
Andrew Ross
01:13:08
Is it straightforward to apply this procedure repeatedly to get higher-order derivatives out of implicit layers?
Jacob Noah Steinhardt
01:16:25
Do convolutions work out nicely with the linear system solving?
Boaz Barak
01:25:38
Could we reduce a general NN to a weight-tied one by increasing the width by a factor of the number of layers and then having W include something like "shift from one block to another"?
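(A hedged sketch of the construction this question proposes, essentially the representation argument used to motivate DEQs, assuming a common activation $\sigma$ at every layer. Given an $L$-layer network $z_1 = \sigma(W_0 x + b_0)$, $z_{i+1} = \sigma(W_i z_i + b_i)$, stack the activations and put the layer weights on the block subdiagonal:

\tilde z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_L \end{pmatrix}, \quad
\tilde W = \begin{pmatrix} 0 & & & \\ W_1 & 0 & & \\ & \ddots & \ddots & \\ & & W_{L-1} & 0 \end{pmatrix}, \quad
\tilde U = \begin{pmatrix} W_0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad
\tilde b = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{L-1} \end{pmatrix}.

The weight-tied, input-injected update $\tilde z \leftarrow \sigma(\tilde W \tilde z + \tilde U x + \tilde b)$ then has the original activations $(z_1, \dots, z_L)$ as its unique fixed point, reached after $L$ iterations from any initialization, since block $i$ of the fixed point depends only on block $i-1$. Whether this iteration behaves well with the solvers used in practice is a separate matter, as the follow-up message below notes.)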
Boaz Barak
01:26:16
I guess this would not be stable
Preetum Nakkiran
01:30:25
Is there an issue with finding trivial solutions? (E.g., z* = U* = 0)
Boaz Barak
01:45:08
Just to clarify - we still have the same issue as with DNNs, that there are many local minima - pairs (U, W) that minimize the loss - and some would generalize better than others?
Benjamin Edelman
01:45:32
Is the situation much worse if you remove input injection (the Ux term)?
Benjamin Edelman
01:46:34
ha, right
Andrew Ross
01:48:47
With the same number of parameters?
Boaz Barak
01:51:36
Maybe this will be addressed later in the talk, but in practical DEQ networks, is it the case that you can interpret some coordinates or linear functions of z_i as having different "virtual depths", corresponding to more and less basic features such as edges, eyes, etc.?
Andrew Ross
01:52:37
This is totally against the spirit of your talk, but what if you alternated implicit and explicit layers?
Andrew Ross
01:52:49
Would that still be reducible to a single DEQ?
Andrew Ross
01:53:57
Thanks :)
Benjamin Edelman
01:54:04
Is there a nice intuition for why we should expect a DEQ to perform well that doesn't make reference to the deep network representation theorem? (just based on the definition of a DEQ)
Benjamin Edelman
01:55:41
OK, thanks
Daniel Temko
02:07:37
How is the compact version of the DEQ derived in practice? Are there architecture-specific ways of doing this?
Andrew Ross
02:08:25
Do you feel that DEQs have different inductive biases than standard DNNs/resnets, even if you try to give them similar fs?
Kai Wang
02:08:56
Why is the training time (especially the backward pass) only a constant factor worse? It requires computing a matrix inverse (or solving a linear system), which sounds much more expensive than standard back-propagation?
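(For readers of the log: the relevant identity in the DEQ formulation, as in Bai, Kolter & Koltun's 2019 paper, is that for a fixed point $z^\star = f_\theta(z^\star, x)$ and loss $\ell$,

\frac{\partial \ell}{\partial (\cdot)} \;=\; \frac{\partial \ell}{\partial z^\star}\,\Bigl(I - \frac{\partial f_\theta}{\partial z}\Big|_{z^\star}\Bigr)^{-1}\frac{\partial f_\theta}{\partial (\cdot)},

where $(\cdot)$ stands for the parameters $\theta$ or the input $x$. The inverse is never formed explicitly: one solves the single linear system $u^\top\bigl(I - \partial f_\theta/\partial z|_{z^\star}\bigr) = \partial \ell/\partial z^\star$ for the vector $u$, typically with the same kind of matrix-free iterative (Broyden-type) solver used in the forward pass and only vector-Jacobian products. So the backward pass costs roughly one more fixed-point solve, not a dense inversion and not backpropagation through all the solver iterations.)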
Kai Wang
02:11:27
Thanks!
Lucas Liebenwein
02:17:50
Have you tried running Broyden's method for fewer iterations, so that the DEQ has the same running time as a deep network, and compared the performance? I realize you might not get to a fixed point, but is it enough for good performance in practice?
Lucas Liebenwein
02:19:32
thank you :)
Preetum Nakkiran
02:19:43
It seems somewhat surprising that DEQs for images have the same “inductive biases” as deep convnets — do you have thoughts/intuitions about this?
Andrew Ross
02:24:02
Tweaking my earlier question, I understand that mixing implicit and explicit layers may not increase expressivity, but could it change inductive biases? Also, I’m potentially mangling neuroscience here, but my understanding is that the brain contains both feedforward-ish and feedback loop components — structure in both space and time. It seems reasonable to suspect that an "optimal architecture" would combine both elements as well. Would you be willing to speculate on that? :)
Preetum Nakkiran
02:27:52
thanks!
Benjamin Edelman
02:28:09
Might DEQs be advantageous in terms of conduciveness to theory? I.e., might it be easier, or at least involve different tools, to reason about some properties of DEQs vs. deep networks?
Andrew Ross
02:31:17
Thanks :)