model-transparency

# language-models → neural-activations → model-transparency

1 link tagged with all of: language-models + neural-activations + model-transparency

Links

Signs of introspection in large language models

Researchers used a “concept injection” method to compare Claude’s self-reported thoughts with its actual neural activity. They found that Claude Opus 4 and 4.1 can sometimes detect and control injected concepts, which suggests limited but real introspective abilities that improve as models become more capable.

Last saved Oct 30, 2025 · 6 min read

Tags: introspection, concept-injection, neural-activations, language-models, model-transparency