Question about the notation: the max objective is rho(1), so what is rho(i) with the index i in the Output line?
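For reference, in standard CCA (as used in SVCCA) the indexed correlations are usually defined recursively; a sketch of the usual definition, with X and Y the two sets of activations:

```latex
% First canonical correlation: the max objective itself.
\rho^{(1)} = \max_{w_1,\, v_1} \operatorname{corr}\left(w_1^{\top} X,\; v_1^{\top} Y\right)

% i-th canonical correlation: the same objective, maximized over
% directions uncorrelated with all earlier canonical directions.
\rho^{(i)} = \max_{w_i,\, v_i} \operatorname{corr}\left(w_i^{\top} X,\; v_i^{\top} Y\right)
\quad \text{s.t.}\quad
\operatorname{corr}\left(w_i^{\top} X,\; w_j^{\top} X\right) = 0,\;\;
\operatorname{corr}\left(v_i^{\top} Y,\; v_j^{\top} Y\right) = 0
\quad \forall\, j < i
```

Under that reading, rho(i) in the Output line is presumably the i-th canonical correlation, i.e. the same maximization restricted to directions uncorrelated with the first i-1.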
Are there classes of problems where earlier layers don't converge faster than later layers (or types of inputs on which this does worse)?
Q: Do you know if early layers also stop moving in transfer learning? (E.g., when fine-tuning a network instead of training from scratch.)
I have a quick question.
Can I be unmuted?
I have a question about something from earlier: as a network learns, you mentioned that layers converge one at a time according to SVCCA. When a layer gets close to its final state according to SVCCA, have its parameters also usually mostly stopped moving?
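For context, a minimal sketch of the SVCCA similarity behind this convergence check (following Raghu et al., 2017); the function names, the 0.99 variance threshold, and the parameter-distance comparison in the closing comment are illustrative assumptions, not the speaker's code:

```python
import numpy as np

def svcca_similarity(acts1, acts2, var_kept=0.99):
    """Mean CCA correlation between SVD-reduced activations.

    acts1, acts2: (neurons, datapoints) activation matrices for the
    same layer at two training checkpoints, over the same inputs.
    """
    def svd_reduce(acts):
        # Center each neuron's activations across datapoints.
        acts = acts - acts.mean(axis=1, keepdims=True)
        U, s, Vt = np.linalg.svd(acts, full_matrices=False)
        # Keep the top singular directions explaining `var_kept` of the variance.
        k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), var_kept)) + 1
        return np.diag(s[:k]) @ Vt[:k]  # (k, datapoints) subspace view

    x, y = svd_reduce(acts1), svd_reduce(acts2)
    # CCA via QR: the canonical correlations of centered data are the
    # singular values of Qx^T Qy (principal angles between subspaces).
    qx, _ = np.linalg.qr(x.T)
    qy, _ = np.linalg.qr(y.T)
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return rho.mean()

# A layer counts as "converged" when its SVCCA similarity to the final
# checkpoint approaches 1; the question asks whether raw parameter
# movement (e.g. ||theta_t - theta_final||) flattens at the same time.
```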