Neuron Descriptions for Llama 3.1-8B-Instruct

Check out our writeup for details on how these descriptions were generated and scored!

Top 10 High-scoring Neurons

| Description | Neuron | Score |
| --- | --- | --- |
| tokens "Decide" indicating decision-making points in various contexts | 11448, + | 0.999 |
| mention of a date, particularly "26 Jul" in various contexts | 8993, - | 0.999 |
| occurrences of the token "Many" in various contexts | 6890, - | 0.999 |
| the token "Featured" in various contexts | 13472, + | 0.998 |
| occurrences of "Max" in various contexts, often related to optimization or limits | 6136, + | 0.998 |
| occurrences of the word "Throughout" in various contexts | 2283, - | 0.998 |
| instances of the token "Nothing" in different contexts of conversation and statements. | 8741, - | 0.998 |
| occurrences of the token "spare" in various contexts such as "spare" time, "spare change," "spare to smite," and "spare O rings." | 9151, + | 0.998 |
| the word "instruction" appearing as "instruction" | 12261, + | 0.998 |
| occurrences of "Impressive" and "Impressed" in various contexts, often related to quality or performance. | 5045, - | 0.998 |

Bottom 10 Low-scoring Neurons

| Description | Neuron | Score |
| --- | --- | --- |
| activating tokens are presents in the context of occult themes, personal experiences, transformations, or vulnerabilities related to characters named "NAME_1" in narrative scenarios. | 7177, - | 0.060 |
| segments with "轮" or "t" tokens | 1616, - | 0.082 |
| tokens that indicate identity or essence ("我则是坦��贝") and phrases suggesting high appeal or attractiveness ("颜值��高一眼心动") | 4294, + | 0.084 |
| activating tokens appear as isolated or part of consecutive strings, often in the context of grammatical structures or questions, indicated by tokens in a different script or language. | 702, - | 0.085 |
| activation occurs after a delimiter, indicating the start of a relevant segment. | 1597, - | 0.087 |
| occurrences of dates, articles containing systematic information, or research findings, including complex terminology or jargon related to various fields. | 10585, + | 0.088 |
| mentions of specific document types (e.g., "technical task", "борд"). | 9082, - | 0.089 |
| the token "limitations" when discussing potential constraints or restrictions | 12425, - | 0.091 |
| activation occurs on Bengali emojis and special characters, specifically the token . | 2042, - | 0.096 |
| the presence of complex and often non-latin characters | 5417, + | 0.106 |