Harvard ML theory seminar - Sho Yaida
Boaz Barak
03:20
Hi, we have some AV issues; we'll start shortly.
Daniel Adam Roberts
38:30
I think the kernel is a fixed function, evaluated on whatever dataset you have but otherwise independent of the data. To represent it as a matrix you’d need ~N^2 numbers for N datapoints. But then incorporating finite-width corrections, you’d need ~N^4 numbers.
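
A toy NumPy sketch of the counting above, purely illustrative (the dataset size N = 50 is an arbitrary assumption): the frozen kernel on N datapoints is an N x N matrix, while the leading finite-width correction is a four-index tensor over datapoints.

```python
import numpy as np

N = 50  # number of datapoints (arbitrary, small enough to hold in memory)

kernel = np.zeros((N, N))            # infinite-width kernel: ~N^2 numbers
four_point = np.zeros((N, N, N, N))  # finite-width correction: ~N^4 numbers

print(kernel.size)      # 2500    == N**2
print(four_point.size)  # 6250000 == N**4
```
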
Daniel Adam Roberts
01:16:41
To Lucas’s question: yes, since you have expressions for the whole ensemble, you can compute the mean behavior. Though the perturbation theory breaks down for L/n ~ 1. Sho is pointing out that in this regime where perturbation theory breaks down, any network is typically untrainable, and you would need an unrealistic number of realizations in your ensemble to see the mean behavior.
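
A toy numerical illustration of the regime Daniel describes (not the perturbative computation itself; the tanh architecture, the critical 1/n weight variance, and the scalar observable are all assumptions for the demo): ensemble-to-ensemble fluctuations grow with the depth-to-width ratio L/n, so near L/n ~ 1 a single network no longer resembles the ensemble mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_fluctuation(L, n, ensemble=200):
    """Relative spread of a scalar observable across an ensemble of
    random tanh MLPs at initialization (width n, depth L)."""
    x = rng.standard_normal(n)  # one fixed input shared by the ensemble
    outs = []
    for _ in range(ensemble):
        h = x
        for _ in range(L):
            W = rng.standard_normal((n, n)) / np.sqrt(n)  # critical init
            h = np.tanh(W @ h)
        outs.append(np.mean(h**2))  # scalar observable of the output layer
    outs = np.array(outs)
    return outs.std() / outs.mean()

# Fluctuations grow with L/n; by L/n ~ 1 the mean is no longer representative.
for L, n in [(4, 64), (16, 64), (64, 64)]:
    print(f"L/n = {L/n:.2f}  relative fluctuation ~ {relative_fluctuation(L, n):.3f}")
```
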
Daniel Adam Roberts
01:20:00
You can evaluate these equations efficiently for small datasets and check, yes.
Lucas B Janson
01:20:36
Have people done it?
Daniel Adam Roberts
01:21:03
We evaluated them to convince ourselves that what we did is correct. But there’s no formal study in the literature.
Lucas B Janson
01:21:13
Ok thanks!
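
In the spirit of this exchange, a small self-contained check one can run for tiny datasets: compare an analytic infinite-width (NNGP) kernel against a Monte Carlo average over random initializations. The one-hidden-layer ReLU network, He-style initialization, and arc-cosine closed form below are illustrative choices, not the specific equations from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu_nngp(x1, x2):
    """Analytic E[ReLU(w.x1) * ReLU(w.x2)] for w ~ N(0, (2/d) I):
    the degree-1 arc-cosine kernel (Cho & Saul)."""
    d = len(x1)
    n1, n2 = np.linalg.norm(x1), np.linalg.norm(x2)
    theta = np.arccos(np.clip(x1 @ x2 / (n1 * n2), -1.0, 1.0))
    return (2.0 / d) * n1 * n2 * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

def empirical_kernel(x1, x2, width=4096, trials=200):
    """Monte Carlo estimate of the same kernel from finite random networks."""
    d = len(x1)
    vals = []
    for _ in range(trials):
        W = rng.standard_normal((width, d)) * np.sqrt(2.0 / d)  # He init
        vals.append(np.maximum(W @ x1, 0) @ np.maximum(W @ x2, 0) / width)
    return np.mean(vals)

x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
print(relu_nngp(x1, x2), empirical_kernel(x1, x2))  # should agree to a few percent
```
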
Lucas B Janson
02:01:31
I have to run on the hour, but thanks for the really interesting talk!
Saleh Elmohamed
02:08:48
great talk