In the ML reading group this Friday, we will look at a circuit model (or "architecture") called "transformer" that ostensibly underlies recent popular applications such as ChatGPT, Bard and friends. We, however, will focus on works (mostly empirical as of now) that attempt to understand its power in the more restricted setting of learning to "reason" in the setting of a restricted proof system. Two examples we will look at are
1) Straight Line Programs
2) Proofs in the Isabelle proof system.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention Is All You Need." In Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc., 2017. https://papers.nips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-....
Zhang, Yi, Arturs Backurs, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, and Tal Wagner. "Unveiling Transformers with LEGO: A Synthetic Reasoning Task." arXiv, February 17, 2023. http://arxiv.org/abs/2206.04301.
Mikuła, Maciej, Szymon Antoniak, Szymon Tworkowski, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, and Yuhuai Wu. "Magnushammer: A Transformer-Based Approach to Premise Selection." arXiv, March 8, 2023. https://doi.org/10.48550/arXiv.2303.04488.