309 - Baseline Credibility

Following another good daily paper discussion, I can recommend the latest paper from DeepSeek, Peking University, and the University of Washington, "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention."

This particular paper introduces a new way of achieving further 10x+ gains in compute time, a gain that grows with the context window size, by sharply reducing the number of tokens each query attends to, without losing performance. The presentation is transparent and the method is fairly elegant, making it far higher quality than any "paper" produced by a US tech startup or big company in quite some time. Moreover, they continue to deliver this high-quality work consistently, and pretty rapidly.
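To make the "attend to fewer tokens" idea concrete, here is a minimal single-query sketch of hierarchical sparse attention in the spirit of the paper's compressed, selected, and sliding-window branches. To be clear, this is my own toy illustration, not DeepSeek's implementation: the function name, block size, top-k count, window length, and mean-pooling compression are assumptions made for brevity, and I fold the branches into one softmax rather than gating them separately and training end-to-end with hardware-aligned kernels as the paper does.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention_step(q, K, V, block=16, top_blocks=2, window=32):
    """One query attends to a small subset of a long K/V cache:
    mean-pooled block summaries, the top-scoring original blocks,
    and a recent sliding window. Everything else is skipped."""
    T, d = K.shape
    n_blocks = T // block

    # 1) Coarse branch: compress each block of keys/values to one summary.
    K_cmp = K[: n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)
    V_cmp = V[: n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)

    # 2) Selection branch: score blocks with their summaries, keep the top-k,
    #    then attend to the *original* tokens inside just those blocks.
    block_scores = K_cmp @ q
    picked = np.argsort(block_scores)[-top_blocks:]
    sel_idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in picked])

    # 3) Local branch: always keep the most recent tokens.
    win_idx = np.arange(max(0, T - window), T)

    # Gather the sparse set (summaries + selected blocks + local window).
    idx = np.unique(np.concatenate([sel_idx, win_idx]))
    K_sub = np.concatenate([K_cmp, K[idx]], axis=0)
    V_sub = np.concatenate([V_cmp, V[idx]], axis=0)

    # Standard attention, but over ~(T/block + k*block + window) entries
    # instead of all T tokens.
    w = softmax(K_sub @ q / np.sqrt(d))
    return w @ V_sub

# Toy usage: a 4096-token cache, one query vector.
rng = np.random.default_rng(0)
T, d = 4096, 64
K, V = rng.standard_normal((T, d)), rng.standard_normal((T, d))
q = rng.standard_normal(d)
out = sparse_attention_step(q, K, V)
print(out.shape)  # (64,)
```

The point of the sketch is the scaling: dense attention touches all T cached keys for every new token, while something like this touches roughly T/block summaries plus a few selected blocks plus a fixed window, which is why the relative saving keeps growing as the context gets longer.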

As predicted, DeepSeek continues to make OpenAI, Anthropic, and others look like a sack full of chimpanzees covered in their own feces and high on cocaine. They still operate purely in the LLM domain, but their organization reliably demonstrates a focus on viable methodologies, rather than the US tech industry's standard approach of "fake it until you make it", as demonstrated by SBF, Elizabeth Holmes, Amodei, Scam Altman, and others.

At present DeepSeek looks like the only company in the LLM space with both a deep pool of high-quality talent and baseline credibility. From what I've read about their hiring practices, they put actual thought into building the right organization and hiring the right people to apply scientific methods, and those methods offer a strategic advantage that is both cumulative and, as a consequence, predictable.

For all of the geopolitical saber-rattling between the US and China, all China has to do to win that competition right now is leave DeepSeek to continue doing their thing unobstructed. The worst thing the US can possibly do is what it is already doing: investing hundreds of billions of dollars in history's most obvious frauds while pretending they're on the path to "AGI", rather than just covering themselves in feces and cocaine. The US is even gouging its own citizens with "tariffs" to pay for those massive investments in fraud, the kind of thing any credible citizenry would answer with their own rendition of the French Revolution.

In 2020 it became sufficiently obvious that the US was on a steep downward slope, and in 2022 I left the country, permanently. 2024 and 2025 have only continued to deliver on those expectations, though the emergence of DeepSeek and their heavy focus on open-source did initially come as a surprise. The US has become virtually everything that they once accused China of being, and worse, with delusions of Silly-Con Valley's supremacy dead and buried for all practical purposes.

Building systems fundamentally capable of reasoning and understanding is entirely outside the scope of LLMs and RL, but in time DeepSeek may prove capable of building better tools for such systems to use.