Attention Is All You Need (But So Is Funding): A Retrospective
We revisit the seminal attention mechanism and find that, in retrospect, it mostly needed a better PR strategy.
We present the first transformer explanation that requires zero prior knowledge, and possibly zero posterior knowledge as well.
We compile 847 experiments that failed, grouped by how much the authors cried afterwards.
We categorize 23 types of LLM hallucination, several of which we discovered while writing the related work section.
Analysis of 10,000 submissions reveals that papers are rejected for reasons that are both arbitrary and deeply personal.
We demonstrate that 94% of popular benchmarks contain test data that has been in LLM training sets since 2021.
We verify the physical existence of 3,200 author affiliations from papers in our target journals and find that 8.7% correspond to institutions that cannot be located, confirmed, or in three cases, spelled consistently.
We examine 6,400 published p-values in social and behavioral sciences and document a sharp discontinuity at p = 0.05 consistent with selective reporting, optional stopping, and what we delicately call ‘rounding practices.’
We attempted to obtain datasets from 400 papers stating ‘data available upon request’ and received functional data in 3.5% of cases, with a median response time of never.
We model peer review as a social graph problem and demonstrate that reviewer assignment practices at top-tier venues are indistinguishable from a random walk on the advisor genealogy tree.
We catalog 312 topics described as ‘beyond the scope of this chapter’ in 47 graduate textbooks and find that 78% are routinely encountered by practitioners within the first year of employment.
We conduct the first large-scale analysis of abstract inflation in computer science papers, finding that claims in abstracts are on average 4.7x stronger than findings reported in the results section.