Harvard ML theory seminar - Sho Yaida - Shared screen with speaker view
Boaz Barak
Hi, we have some AV issues; we'll start shortly.
Daniel Adam Roberts
I think the kernel is a fixed function, independent of the data, that you evaluate on whatever dataset you have. To represent it as a matrix you’d need ~N^2 numbers for N datapoints. But then incorporating finite-width corrections, you’d need ~N^4 numbers.
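A minimal sketch of the storage scaling being described, assuming a small hypothetical dataset: the infinite-width kernel is pairwise, so it fills an N x N matrix (~N^2 numbers), while a finite-width correction couples four datapoints at a time, so storing it naively takes an N x N x N x N tensor (~N^4 numbers). The dot-product kernel and the particular four-point tensor below are illustrative stand-ins, not the actual NTK or its correction.

```python
import numpy as np

N = 10                       # hypothetical dataset size
X = np.random.randn(N, 3)    # N inputs with 3 features each

# Infinite-width kernel: a fixed pairwise function of the inputs,
# represented on this dataset as an N x N matrix (~N^2 numbers).
K = X @ X.T
assert K.shape == (N, N)

# A finite-width correction couples four datapoints at a time; even this
# toy four-point tensor already needs ~N^4 numbers to store naively.
V = np.einsum('ia,ja,kb,lb->ijkl', X, X, X, X)
assert V.shape == (N, N, N, N)

print(K.size, V.size)  # 100 10000
```

Even at N = 10 the four-point object is 100x larger than the kernel matrix, which is why these checks are only feasible for small datasets.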
Daniel Adam Roberts
To Lucas’s question: yes, since you have expressions for the whole ensemble, you can compute the mean behavior. Though the perturbation theory breaks down for L/n ~ 1. Sho is pointing out that in this regime where perturbation theory breaks down, any network is typically untrainable, and you would need an unrealistic number of realizations in your ensemble to see the mean behavior.
Daniel Adam Roberts
You can evaluate these equations efficiently for small datasets and check, yes.
Lucas B Janson
Have people done it?
Daniel Adam Roberts
We evaluated them to convince ourselves that what we did was correct. But there’s no formal study in the literature.
Lucas B Janson
Ok thanks!
Lucas B Janson
I have to run on the hour, but thanks for the really interesting talk!
Saleh Elmohamed
great talk