References
[1] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997. doi: 10.1162/neco.1997.9.8.1735.
[2] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155, 2003. ISSN 1532-4435.
[3] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[4] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
[5] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching word vectors with subword information, 2016. URL https://arxiv.org/abs/1607.04606.
[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018. URL http://arxiv.org/abs/1810.04805.
[7] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners. CoRR, abs/2005.14165, 2020. URL https://arxiv.org/abs/2005.14165.
[8] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. 2018.
[9] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
[10] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020. URL http://jmlr.org/papers/v21/20-074.html.
[11] Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. OPT: Open pre-trained transformer language models, 2022. URL https://arxiv.org/abs/2205.01068.
[12] Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al. LaMDA: Language models for dialog applications. arXiv preprint arXiv:2201.08239, 2022.
[13] Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, and Noah Fiedel. PaLM: Scaling language modeling with pathways, 2022.
[14] Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding, 2018. URL http://arxiv.org/abs/1804.07461.
[15] Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. SuperGLUE: A stickier benchmark for general-purpose language understanding systems, 2020.
[16] Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing with Python. O'Reilly Media Inc., 2009.
[17] Matthew Honnibal and Ines Montani. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear, 2017.
[18] Lance A. Ramshaw and Mitchell P. Marcus. Text chunking using transformation-based learning. CoRR, cmp-lg/9505040, 1995. URL http://arxiv.org/abs/cmp-lg/9505040.