
Methodology

How this data was collected, filtered, and categorized. No LLMs were used in the analysis — all classification is deterministic keyword matching.

Data sources

2,562+ data points from 6 public sources, collected in April 2026. Items must fall within an 18-month freshness window (October 2024 – April 2026).

Source            Items     %
Reddit            1,270   50%
Hacker News         970   38%
GitHub Issues       174    7%
Hugging Face        102    4%
Stack Overflow       27    1%
Twitter / X          19    1%
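
The percentage column can be reproduced from the item counts. The following is a minimal TypeScript sketch; the variable names are illustrative and are not taken from the repository. Because each share is rounded independently, the rounded column sums to 101 rather than 100.

```typescript
// Item counts per source, copied from the table above.
const sourceCounts: Array<[string, number]> = [
  ["Reddit", 1270],
  ["Hacker News", 970],
  ["GitHub Issues", 174],
  ["Hugging Face", 102],
  ["Stack Overflow", 27],
  ["Twitter / X", 19],
];

// Total number of items across all sources.
const totalItems = sourceCounts.reduce((sum, pair) => sum + pair[1], 0);

// Share of each source, rounded to the nearest whole percent.
const shares = sourceCounts.map((pair) => ({
  source: pair[0],
  percent: Math.round((pair[1] / totalItems) * 100),
}));

// Sum of the rounded shares (can differ from 100 because of rounding).
const roundedSum = shares.reduce((sum, s) => sum + s.percent, 0);

console.log(totalItems, shares, roundedSum);
```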

Collection method

  • Reddit: Pullpush.io API (Pushshift successor). Posts + comments across r/LocalLLaMA, r/MachineLearning, r/MLOps, r/deeplearning, r/artificial, r/datascience, r/LangChain.
  • Hacker News: Algolia search API. Stories + comments. 14 search queries covering fine-tuning, evaluation, retraining, drift.
  • GitHub Issues: REST API search across thinking-machines-lab/tinker, huggingface/transformers, axolotl-ai-cloud/axolotl, unslothai/unsloth.
  • Stack Overflow: StackExchange API. stackoverflow.com, datascience.stackexchange.com, ai.stackexchange.com.
  • Twitter / X: Curated tweet IDs fetched via Twitter's public syndication API (cdn.syndication.twimg.com/tweet-result). High-signal tweets identified via web search across four themes: fine-tuning evaluation, catastrophic forgetting, training cost, and LoRA / adapters.
  • Hugging Face Forums: Discourse search API on discuss.huggingface.co.
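
As an illustration of one of these collectors, the sketch below builds a request URL for the public Hacker News Algolia search API, restricted to stories and comments inside the freshness window. The helper name, the example query, the page size, and the exact window boundaries are assumptions made for this example; the 14 queries actually used are not listed here.

```typescript
// Hypothetical helper: builds a Hacker News Algolia search URL.
// fromUnix / toUnix are Unix timestamps in seconds.
function buildHnSearchUrl(
  query: string,
  fromUnix: number,
  toUnix: number,
  page: number = 0
): string {
  const params: Array<[string, string]> = [
    ["query", query],
    // Parentheses make the tags an OR: stories or comments.
    ["tags", "(story,comment)"],
    // Keep only items created inside the window.
    ["numericFilters", "created_at_i>=" + fromUnix + ",created_at_i<" + toUnix],
    ["hitsPerPage", "100"], // assumed page size
    ["page", String(page)],
  ];
  const queryString = params
    .map((p) => encodeURIComponent(p[0]) + "=" + encodeURIComponent(p[1]))
    .join("&");
  return "https://hn.algolia.com/api/v1/search?" + queryString;
}

// Assumed window: 1 October 2024 (inclusive) to 1 May 2026 (exclusive), UTC.
const windowStart = Date.UTC(2024, 9, 1) / 1000;
const windowEnd = Date.UTC(2026, 4, 1) / 1000;

// Illustrative query, not necessarily one of the 14 used.
const exampleUrl = buildHnSearchUrl("fine-tuning evaluation", windowStart, windowEnd);
console.log(exampleUrl);
```

Building the URL separately from the network call keeps the window and query logic easy to check without hitting the API.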

Keyword matching (double-gate + proximity)

Every item must pass three checks to be counted:

  1. Fine-tuning context: text must contain either one strong signal (fine-tun*, finetun*, lora, qlora, sft, rlhf, dpo, grpo, training run, unsloth, axolotl, tinker, instruction tun*) or two weak signals (adapter, checkpoint, base model, huggingface, transformers, distill, training data).
  2. Theme keyword: text must match at least one keyword from one of 7 theme categories (evaluation, data quality, catastrophic forgetting, retraining triggers, incremental training, version comparison, iteration overhead).
  3. Proximity: a theme keyword and a context word must appear within 150 characters of each other. This prevents matches where the two concepts are mentioned in passing in unrelated parts of the text.
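
The three checks can be sketched in TypeScript as follows. This is an illustrative reading of the rules, not the code in the repository. Several details are assumptions: terms are matched as whole words (or as word prefixes when they end in *), "two weak signals" means two distinct weak terms, both strong and weak signals count as context words, proximity is measured between match start offsets, and the theme keyword lists are hypothetical placeholders.

```typescript
// Context signals copied from check 1; a trailing * marks a prefix wildcard.
const STRONG_SIGNALS = ["fine-tun*", "finetun*", "lora", "qlora", "sft", "rlhf", "dpo",
  "grpo", "training run", "unsloth", "axolotl", "tinker", "instruction tun*"];
const WEAK_SIGNALS = ["adapter", "checkpoint", "base model", "huggingface",
  "transformers", "distill", "training data"];

// Hypothetical theme keywords, for illustration only.
const THEMES: Array<{ name: string; keywords: string[] }> = [
  { name: "evaluation", keywords: ["benchmark", "evaluat*", "baseline"] },
  { name: "catastrophic forgetting", keywords: ["catastrophic forgetting", "forgot*"] },
];

const PROXIMITY_CHARS = 150;

// Compile a term into a case-insensitive, word-anchored regex (assumed matching rule).
function termToRegex(term: string): RegExp {
  const isPrefix = term.slice(-1) === "*";
  const raw = isPrefix ? term.slice(0, -1) : term;
  const escaped = raw.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("\\b" + escaped + (isPrefix ? "\\w*" : "\\b"), "gi");
}

// Start offsets of every occurrence of any of the given terms.
function findOffsets(text: string, terms: string[]): number[] {
  const offsets: number[] = [];
  for (const term of terms) {
    const re = termToRegex(term);
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) offsets.push(m.index);
  }
  return offsets;
}

// Returns the names of all themes the text counts toward (empty if it fails a check).
function matchThemes(text: string): string[] {
  // Check 1: one strong signal, or at least two distinct weak signals.
  const strong = findOffsets(text, STRONG_SIGNALS);
  const weakTerms = WEAK_SIGNALS.filter((t) => findOffsets(text, [t]).length > 0);
  if (strong.length === 0 && weakTerms.length < 2) return [];

  const context = strong.concat(findOffsets(text, WEAK_SIGNALS));
  return THEMES.filter((theme) => {
    // Check 2: a theme keyword is present (hits is non-empty).
    const hits = findOffsets(text, theme.keywords);
    // Check 3: some theme keyword lies within PROXIMITY_CHARS of a context word.
    return hits.some((h) => context.some((c) => Math.abs(h - c) <= PROXIMITY_CHARS));
  }).map((theme) => theme.name);
}

console.log(matchThemes("How do I benchmark my LoRA against the baseline?"));
```

Word-anchored matching is a deliberate choice in this sketch: a plain substring test would let "dpo" match inside "endpoint".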

Items that pass are counted toward every matching theme. A noise filter rejects job listings, resumes, and self-promotional posts. Top quotes per theme are ranked by community score, boosted 2× for question/pain-point markers (“how do I”, “struggling with”, etc.) and deboosted to 0.1× for announcements (“releases”, “introducing”, etc.).
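
The quote ranking described above can be sketched as a score multiplier. This is a minimal illustration, not the repository's code: the marker lists contain only the examples quoted above (the full lists are elided with "etc." and are not reproduced here), and the sketch assumes the two multipliers compound when a text matches both kinds of marker.

```typescript
interface Quote {
  text: string;
  score: number; // community score (upvotes, points, etc.)
}

// Only the markers quoted in the text; the real lists are longer.
const PAIN_MARKERS = ["how do i", "struggling with"];
const ANNOUNCEMENT_MARKERS = ["releases", "introducing"];

// Ranking score: community score, boosted 2x for question/pain-point
// markers and deboosted to 0.1x for announcement markers.
function rankScore(quote: Quote): number {
  const lower = quote.text.toLowerCase();
  let score = quote.score;
  if (PAIN_MARKERS.some((m) => lower.indexOf(m) >= 0)) score *= 2;
  if (ANNOUNCEMENT_MARKERS.some((m) => lower.indexOf(m) >= 0)) score *= 0.1;
  return score;
}

// Returns a new array sorted from highest to lowest ranking score.
function rankQuotes(quotes: Quote[]): Quote[] {
  return quotes.slice().sort((a, b) => rankScore(b) - rankScore(a));
}

const ranked = rankQuotes([
  { text: "Introducing our new fine-tuning platform", score: 100 },
  { text: "How do I evaluate a LoRA against the base model?", score: 10 },
  { text: "LoRA rank 16 worked fine for us", score: 15 },
]);
console.log(ranked.map((q) => q.text));
```

With these inputs the question (10 × 2 = 20) outranks the neutral comment (15), and the announcement (100 × 0.1 = 10) drops to last place despite having the highest raw score.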

Deduplication

Two-pass dedup: (1) by unique item ID, (2) by normalized text (lowercase, whitespace-stripped, first 200 chars). As a result, a post crossposted to multiple subreddits counts once.
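
A minimal TypeScript sketch of the two passes follows. The `Item` shape and function names are illustrative, and "whitespace-stripped" is read here as collapsing runs of whitespace into single spaces and trimming the ends; the repository may normalize differently.

```typescript
interface Item {
  id: string;   // unique item ID from the source
  text: string; // post or comment body
}

// Normalization key: lowercase, whitespace collapsed and trimmed (assumed
// reading of "whitespace-stripped"), truncated to the first 200 characters.
function normalizeText(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim().slice(0, 200);
}

function dedupe(items: Item[]): Item[] {
  // Pass 1: keep the first occurrence of each item ID.
  const seenIds = new Set<string>();
  const uniqueById = items.filter((item) => {
    if (seenIds.has(item.id)) return false;
    seenIds.add(item.id);
    return true;
  });

  // Pass 2: keep the first occurrence of each normalized text.
  const seenTexts = new Set<string>();
  return uniqueById.filter((item) => {
    const key = normalizeText(item.text);
    if (seenTexts.has(key)) return false;
    seenTexts.add(key);
    return true;
  });
}

const deduped = dedupe([
  { id: "a1", text: "How do I evaluate my fine-tune?" },
  { id: "a1", text: "How do I evaluate my fine-tune?" },      // same ID
  { id: "b2", text: "  how do I   evaluate my FINE-TUNE?  " }, // crosspost, new ID
  { id: "c3", text: "A different question entirely" },
]);
console.log(deduped.map((item) => item.id));
```

The second pass is what catches crossposts: they carry different IDs, so only the normalized text reveals the duplicate.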

Theme definitions

Did it actually get better?

Developers asking how to evaluate fine-tuned models, what benchmarks to use, how to compare against baseline.

Is my data clean?

Training data quality issues — duplicates, mislabeling, bias, data prep friction.

Did fine-tuning break something?

Catastrophic forgetting, capability regression, safety degradation after training.

When should I retrain?

Drift detection, staleness signals, retraining frequency decisions.

How do I update without starting over?

Incremental training, LoRA composition, adding new data to existing models.

Which version is best?

Model versioning, A/B comparison, rollback, experiment tracking.

The overhead of another training run

Cost, pipeline complexity, data refresh pain, time-to-retrain.

Known limitations

  • Discord communities (Unsloth, Axolotl, Hugging Face, EleutherAI) are high-signal sources but require authenticated bot access, so they are not included.
  • Pullpush.io scores are captured at archive time and may differ from live Reddit scores. Score thresholds are set low (1+) to account for this.
  • Keyword matching can over-match (generic ML posts that mention fine-tuning in passing) or under-match (discussions that use novel terminology). The double-gate filter reduces false positives but doesn't eliminate them.
  • No sentiment analysis: items are categorized by topic, not by whether the author was frustrated, satisfied, or neutral.

Reproducibility

Full source code is available in the schlacter-me repository. The scraper, keyword definitions, and analysis logic are in lib/tinker-flywheel.ts. Raw data is available via the export API.
