The most massive galaxies in the universe are rare, yet precisely because of this, their formation history imposes some of the strongest constraints on models of galaxy formation. In the local universe, massive galaxies appear relatively dull, with elliptical morphologies, old stars, and little ongoing star formation. For decades, archeological studies have predicted that most of the action during their formation must have occurred at much higher redshift (z > 2). With the first deep, wide-field surveys of the near-infrared sky now available, we can directly observe the progenitors of local massive galaxies as they form. I will present the latest observations of this process up to z ~ 4 from the UltraVISTA survey, which show that the early stages of massive galaxy formation are in fact extremely dynamic, with huge bursts of dust-obscured star formation, ubiquitous AGN activity, and significant structural transformations. I will also show a fascinating, only recently discovered population of massive galaxies at z ~ 2 that are extraordinarily dusty, and discuss how this population is related to the better-studied submillimeter population. I will conclude with some results from gravitationally lensed massive compact galaxies, which allow us to probe these galaxies on ~100 pc scales for the first time at high redshift and give us new insight into how their central regions may have formed.