Methodology — Gemini Workspace AI Opportunity Map

Overview

The approach in one paragraph

Six scrapers collect public user feedback from Reddit, Hacker News, the App Store, the Play Store, Stack Overflow, and YouTube — no authentication, no paywalled data. A keyword-based matching engine (not ML) assigns each feedback item to one or more opportunity themes. Severity scores (1–5) are set manually based on user sentiment patterns and business impact. Raw data is stored in Upstash Redis KV. A 15-item curated baseline ensures key themes are always represented even when scrapers return low volume.

1. Reddit

Public JSON API — no auth required

Searches 5 subreddits × 6 search terms = 30 API calls. Each call returns up to 25 posts sorted by relevance within the past year. Posts are deduplicated by Reddit post ID.

typescript

// Subreddits searched
const SUBREDDITS = [
  "GoogleWorkspace", "google", "artificial",
  "productivity", "ChatGPT"
];

// Search terms
const SEARCH_TERMS = [
  "gemini workspace", "gemini docs", "gemini gmail",
  "gemini sheets", "google ai workspace", "gemini side panel"
];

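// Fetch via Reddit public JSON API (no auth required).
// The URL is concatenated from single-line pieces: a template literal
// split across lines would embed newlines and spaces in the request URL.
for (const sub of SUBREDDITS) {
  for (const term of SEARCH_TERMS) {
    const url =
      `https://www.reddit.com/r/${sub}/search.json` +
      `?q=${encodeURIComponent(term)}` +
      `&restrict_sr=1&sort=relevance&t=year&limit=25`;

    const res = await fetch(url, {
      headers: { "User-Agent": "workspace-ai-analyzer/1.0" },
    });
  }
}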
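The 5 × 6 fan-out and the deduplication by post ID described above can be sketched as follows. The `RedditPost` shape and the helper names `buildSearchUrls` and `dedupeById` are assumptions made for illustration, not the project's actual code.

```typescript
// Hypothetical minimal post shape; the real scraper likely keeps more fields.
interface RedditPost {
  id: string;
  title: string;
}

const SUBREDDITS = ["GoogleWorkspace", "google", "artificial", "productivity", "ChatGPT"];
const SEARCH_TERMS = [
  "gemini workspace", "gemini docs", "gemini gmail",
  "gemini sheets", "google ai workspace", "gemini side panel",
];

// One search URL per (subreddit, term) pair: 5 × 6 = 30 requests.
function buildSearchUrls(subs: string[], terms: string[]): string[] {
  const urls: string[] = [];
  for (const sub of subs) {
    for (const term of terms) {
      urls.push(
        `https://www.reddit.com/r/${sub}/search.json` +
          `?q=${encodeURIComponent(term)}` +
          `&restrict_sr=1&sort=relevance&t=year&limit=25`
      );
    }
  }
  return urls;
}

// Keep the first occurrence of each post ID; later duplicates are dropped.
function dedupeById(posts: RedditPost[]): RedditPost[] {
  const seen = new Map<string, RedditPost>();
  for (const post of posts) {
    if (!seen.has(post.id)) seen.set(post.id, post);
  }
  return Array.from(seen.values());
}
```

Deduplication matters here because the same post can surface under several search terms within one subreddit.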

2. Hacker News

Algolia search API — no auth required

Searches for both stories and comments across 4 queries. Comments are included because they contain more specific technical feedback than top-level submissions. HTML tags are stripped from comment bodies.

typescript

// Uses Algolia HN search API — no auth required
const queries = [
  "gemini workspace", "google gemini docs",
  "gemini gmail", "google ai productivity"
];

// Fetch stories (URL built from single-line pieces, query URL-encoded)
fetch(
  `https://hn.algolia.com/api/v1/search` +
  `?query=${encodeURIComponent(query)}&tags=story&hitsPerPage=30`
);

// Also fetch comments (richer feedback signal)
fetch(
  `https://hn.algolia.com/api/v1/search` +
  `?query=${encodeURIComponent(query)}&tags=comment&hitsPerPage=50`
);

// Strip HTML tags from comment text (guarding against a missing body)
const text = (hit.comment_text ?? "")
  .replace(/<[^>]*>/g, " ")
  .replace(/\s+/g, " ")
  .trim();
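A slightly fuller version of the cleaning step might look like the sketch below. It assumes comment bodies can also contain HTML entities such as `&#x27;`, which the tag-stripping regex alone leaves in place. The `cleanComment` name and the small entity table are illustrative assumptions, not part of the original scraper.

```typescript
// Small entity table covering common cases only (an assumption, not exhaustive).
const ENTITIES: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#x27;": "'",
  "&#x2F;": "/",
};

function cleanComment(html: string | null | undefined): string {
  return (html ?? "")
    // 1. Replace tags with a space so adjacent words do not fuse.
    .replace(/<[^>]*>/g, " ")
    // 2. Decode entities after tag stripping, so a decoded "<" is never treated as a tag.
    .replace(/&(?:amp|lt|gt|quot|#x27|#x2F);/g, (m) => ENTITIES[m] ?? m)
    // 3. Collapse runs of whitespace and trim the ends.
    .replace(/\s+/g, " ")
    .trim();
}
```

Decoding entities before keyword matching would keep phrases with apostrophes or quotes intact in the stored quote text.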

3. Apple App Store

app-store-scraper npm package

Scrapes the 5 most relevant Google Workspace apps (Gmail, Docs, Sheets, Slides, Meet) — 5 pages each, sorted by most recent. Reviews are filtered to only those mentioning AI-related keywords to reduce noise.

typescript

// App Store IDs scraped (5 apps, 5 pages each)
const apps = [
  { id: 422689480, name: "gmail" },
  { id: 842842640, name: "docs" },
  { id: 842849113, name: "sheets" },
  { id: 879478102, name: "slides" },
  { id: 1013161476, name: "meet" },
];

// Only keep reviews mentioning AI-related keywords
const AI_KEYWORDS = [
  "ai", "gemini", "smart", "suggest", "summary",
  "compose", "write", "draft", "autocomplete",
  "hallucin", "context", "side panel", "useless",
  "broken", "buggy", "slow"
];

const reviews = await store.reviews({
  id: app.id,
  sort: store.sort.RECENT,
  page,   // pages 1–5
  country: "us",
});
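The page-by-page keyword filter can be sketched as below. This is a minimal illustration that works on already-fetched pages so it runs without network access. The `Review` shape and the `filterAiReviews` name are assumptions, and the matching is the plain substring test the text describes.

```typescript
// Hypothetical minimal review shape.
interface Review {
  id: string;
  text: string;
}

const AI_KEYWORDS = [
  "ai", "gemini", "smart", "suggest", "summary",
  "compose", "write", "draft", "autocomplete",
  "hallucin", "context", "side panel", "useless",
  "broken", "buggy", "slow",
];

// True when the lowercased text contains at least one keyword as a substring.
function mentionsKeyword(text: string, keywords: string[]): boolean {
  const lower = text.toLowerCase();
  return keywords.some((kw) => lower.includes(kw));
}

// Flatten the fetched pages (pages 1 to 5 in the scraper) and keep matching reviews.
function filterAiReviews(pages: Review[][], keywords: string[] = AI_KEYWORDS): Review[] {
  const kept: Review[] = [];
  for (const page of pages) {
    for (const review of page) {
      if (mentionsKeyword(review.text, keywords)) kept.push(review);
    }
  }
  return kept;
}
```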

4. Google Play Store

google-play-scraper npm package

Scrapes the 4 Android Workspace apps (Gmail, Docs, Sheets, Slides), taking the 30 most recent reviews from each. An AI-keyword filter is applied as for the App Store; the excerpt below checks four of those terms.

typescript

// Play Store app IDs scraped
const apps = [
  { id: "com.google.android.gm",                      name: "gmail" },
  { id: "com.google.android.apps.docs",               name: "docs" },
  { id: "com.google.android.apps.docs.editors.sheets",name: "sheets" },
  { id: "com.google.android.apps.docs.editors.slides",name: "slides" },
];

// Filter: only reviews mentioning AI/Gemini
const text = (review.text ?? "").toLowerCase();
if (
  !text.includes("ai") &&
  !text.includes("gemini") &&
  !text.includes("smart") &&
  !text.includes("suggest")
) continue;
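One caveat with substring matching is that a short term like "ai" also occurs inside ordinary words such as "gmail" and "email". The sketch below shows one possible refinement: short keywords must match as whole words, while longer ones keep substring semantics. This is an illustrative variant, not the filter the project uses. The `mentionsAi` and `escapeRegExp` names and the three-character threshold are assumptions.

```typescript
// Escape regex metacharacters so a keyword is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Keywords of 3 characters or fewer must match as whole words;
// longer keywords still match as substrings, so "suggest" hits "suggestions".
function mentionsAi(text: string, keywords: string[]): boolean {
  const lower = text.toLowerCase();
  return keywords.some((kw) =>
    kw.length <= 3
      ? new RegExp(`\\b${escapeRegExp(kw)}\\b`).test(lower)
      : lower.includes(kw)
  );
}

const PLAY_KEYWORDS = ["ai", "gemini", "smart", "suggest"];
```

With the plain substring test, a review such as "Gmail is my main email app" passes the filter even though it says nothing about AI features; with the whole-word rule it is dropped.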

5. Stack Overflow

StackExchange public API v2.3 — no auth required

Searches both stackoverflow.com and webapps.stackexchange.com (the latter has far more Workspace-specific questions). Full question bodies are fetched via the filter=withbody parameter.

typescript

// Stack Overflow public API v2.3 (no auth)
// Searches both stackoverflow.com and webapps.stackexchange.com
const queries = [
  "gemini+google+workspace", "gemini+docs",
  "gemini+gmail", "gemini+sheets", "google+ai+workspace"
];

// URLs are built from single-line pieces so no newlines or spaces leak into the request
fetch(
  `https://api.stackexchange.com/2.3/search/advanced` +
  `?order=desc&sort=relevance&q=${query}` +
  `&site=stackoverflow&pagesize=30&filter=withbody`
);

// Also webapps.stackexchange.com (more Workspace questions)
fetch(
  `https://api.stackexchange.com/2.3/search/advanced` +
  `?q=${query}&site=webapps&pagesize=30&filter=withbody`
);
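An alternative way to build these request URLs is with `URLSearchParams`, which handles encoding and turns spaces into `+`, so queries can be written as plain phrases. This is a sketch under assumptions: the `buildSearchUrl` name is hypothetical, and it sends `order` and `sort` for both sites, whereas the excerpt above sets them only for stackoverflow.com.

```typescript
type Site = "stackoverflow" | "webapps";

// Build a /search/advanced URL for one site and one free-text query.
function buildSearchUrl(site: Site, query: string): string {
  const params = new URLSearchParams({
    order: "desc",
    sort: "relevance",
    q: query,
    site,
    pagesize: "30",
    filter: "withbody", // include full question bodies in the response
  });
  return `https://api.stackexchange.com/2.3/search/advanced?${params.toString()}`;
}

// Usage: one URL per (site, query) pair.
const sites: Site[] = ["stackoverflow", "webapps"];
const exampleUrls = sites.map((site) => buildSearchUrl(site, "gemini docs"));
```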

6. YouTube

HTML scraping + youtube-transcript-api (Python)

The official YouTube Data API requires an API key and enforces a daily quota, so the scraper instead parses video IDs from the YouTube search results HTML (the ytInitialData JSON embedded in the page). Transcripts are fetched via the open-source youtube-transcript-api Python library, which needs no API key. Up to 15 videos are transcribed per run.
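The ID-extraction step can be sketched in isolation as follows. This is a minimal illustration that assumes the same `"videoId":"…"` pattern as the scraper. The `extractVideoIds` name is hypothetical. Deduplicating the IDs and applying the 15-video cap inside this one function is a design choice of the sketch, not something the source confirms.

```typescript
// Pull unique 11-character video IDs out of the search results HTML,
// preserving first-seen order and stopping once `limit` IDs are collected.
function extractVideoIds(html: string, limit = 15): string[] {
  const seen = new Set<string>();
  const re = /"videoId":"([a-zA-Z0-9_-]{11})"/g;
  let match: RegExpExecArray | null;
  while (seen.size < limit && (match = re.exec(html)) !== null) {
    seen.add(match[1]);
  }
  return Array.from(seen);
}
```

Deduplication is worth doing because the embedded JSON tends to mention the same video ID more than once, and each duplicate would otherwise cost a transcript fetch.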

typescript

import { execSync } from "node:child_process";

// Step 1: Search YouTube HTML for video IDs
// No API key: parses ytInitialData from page HTML
const url =
  `https://www.youtube.com/results` +
  `?search_query=${encodeURIComponent(query)}&sp=CAISBAgCEAE`; // filter: this year

const ids = [...html.matchAll(
  /"videoId":"([a-zA-Z0-9_-]{11})"/g
)].map(m => m[1]);

// Step 2: Fetch transcript via youtube-transcript-api (Python)
// Requires: pip install youtube-transcript-api
// The first 2000 characters of the transcript are returned on stdout.
const transcriptText = execSync(`python3 -c "
from youtube_transcript_api import YouTubeTranscriptApi
api = YouTubeTranscriptApi()
transcript = api.fetch('${videoId}', languages=['en'])
text = ' '.join([s.text for s in transcript.snippets])
print(text[:2000])
"`, { encoding: "utf8" });

7. Theme Analysis Engine

Keyword matching — no ML, fully auditable

Each raw feedback item is matched against 7 keyword lists via simple substring search. A single item can match multiple themes. Themes are ranked by severity × frequency — highest impact first. Severity is manually assessed; frequency is computed.

typescript

// Each theme has a keyword list
const THEME_KEYWORDS: Record<string, string[]> = {
  "hallucination": [
    "hallucinate", "hallucination", "made up",
    "fabricat", "incorrect", "wrong information",
    "inaccurate", "false", "imagin"
  ],
  "cross-app-memory": [
    "context", "memory", "remember", "forget",
    "loses context", "switch", "cross-app",
    "between apps", "side panel", "persistent"
  ],
  "mobile-voice": [
    "mobile", "phone", "voice", "hands-free",
    "android", "ios", "cramped", "small screen"
  ],
  // ... (see analysis.ts for full list)
};

// Match: simple keyword substring search (case-insensitive)
function analyzeFeedback(raw: RawFeedback[], themes: PainPointTheme[]) {
  for (const feedback of raw) {
    const lower = feedback.text.toLowerCase();
    for (const theme of themes) {
      const keywords = THEME_KEYWORDS[theme.id] ?? [];
      if (keywords.some(kw => lower.includes(kw))) {
        theme.frequency++;
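        if (theme.quotes.length < 5) {
          // Keep at most 5 representative quotes per theme
          const { text, source, url, author, date } = feedback;
          theme.quotes.push({ text, source, url, author, date });
        }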
      }
    }
  }
  // Sort by severity × frequency (highest impact first)
  return themes.sort((a, b) =>
    b.severity * b.frequency - a.severity * a.frequency
  );
}

Theme	Severity	Scope	Sample keywords
Trust & Grounding	5/5	Platform	hallucinate, fabricat, made up, incorrect
Cross-App Context & Memory	5/5	Platform	context, memory, forget, side panel, switch
Mobile & Voice-First AI	4/5	Platform	mobile, phone, voice, hands-free, ios
Context-Aware Writing	4/5	App-level	help me write, generic, tone, style, fluff
Deeper Spreadsheet Intelligence	3/5	App-level	formula, vlookup, array, pivot, sheets
Meeting Intelligence Upgrade	4/5	App-level	meeting, action item, speaker, attribution
Value Perception	3/5	Platform	pricing, cost, expensive, per user, tier
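A small worked example of the severity × frequency ranking is sketched below. The severities come from the table above; the frequencies are made-up numbers chosen only to illustrate the arithmetic, and the `rankThemes` name is hypothetical.

```typescript
interface RankedTheme {
  name: string;
  severity: number;  // manually assessed, 1 to 5
  frequency: number; // computed count of matching feedback items
}

// Sort a copy by severity × frequency, highest impact first.
function rankThemes(themes: RankedTheme[]): RankedTheme[] {
  return themes
    .slice()
    .sort((a, b) => b.severity * b.frequency - a.severity * a.frequency);
}

const ranked = rankThemes([
  { name: "Trust & Grounding", severity: 5, frequency: 10 },               // 5 × 10 = 50
  { name: "Deeper Spreadsheet Intelligence", severity: 3, frequency: 20 }, // 3 × 20 = 60
  { name: "Mobile & Voice-First AI", severity: 4, frequency: 5 },          // 4 × 5  = 20
]);
```

With these illustrative numbers the severity-3 theme ranks first, which shows that a high mention count can outweigh a higher manual severity score.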

8. Full Pipeline

Sources → KV → Dashboard

All scrapers run in parallel (except YouTube, which requires sequential transcript fetches). Results are saved to two Upstash Redis keys: workspace-ai:raw-feedback (all items) and workspace-ai:snapshot (analyzed themes + competitors). The dashboard reads from the snapshot; the drill-down drawer reads from raw feedback.

typescript

// Full pipeline: scrape → analyze → save to Upstash Redis KV

export async function buildSnapshot() {
  // 1. Scrape all sources in parallel (with timeouts)
  const withTimeout = (p, ms, fallback) =>
    Promise.race([p, new Promise(r => setTimeout(() => r(fallback), ms))]);

  const [reddit, hn, stackoverflow, appstore] = await Promise.all([
    scrapeReddit(),
    scrapeHackerNews(),
    scrapeStackOverflow(),
    withTimeout(scrapeAppStore(), 20_000, []),  // 20s timeout
  ]);
  // YouTube is slower (transcripts), run sequentially after
  const youtube = await withTimeout(scrapeYouTube(), 60_000, []);
  const curated = getCuratedFeedback();

  const allFeedback = [
    ...reddit, ...hn, ...stackoverflow,
    ...appstore, ...youtube, ...curated
  ];

  // 2. Match feedback to themes
  const themes = analyzeFeedback(allFeedback, getDefaultThemes());

  // 3. Aggregate competitor mentions
  const topCompetitors = buildCompetitorRanking(themes);

  // 4. Save to KV (Upstash Redis)
  await saveRawFeedback(allFeedback);  // key: workspace-ai:raw-feedback
  await saveSnapshot({ themes, topCompetitors, ... });  // key: workspace-ai:snapshot
}
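A typed variant of the timeout helper is sketched below. It follows the same `Promise.race` idea as `withTimeout` above, with two additions that are assumptions of this sketch rather than the project's code: generic types, and clearing the timer once the race settles so a pending timer does not keep the process alive.

```typescript
// Resolve with `fallback` if `p` has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([p, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    }
  );
}
```

A timeout of this kind only covers slow scrapers. A scraper that rejects still rejects the surrounding `Promise.all`, so error handling would need to be added separately if partial results are wanted.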

9. Curated Baseline Data

15 manually sourced entries with real source URLs

The curated set ensures key themes always have representative quotes, even when scraper volume is low. These are real user complaints sourced from Reddit and Google's own support forums — not invented. Each entry is tagged with the original URL.

#1 · Enterprise user report · Mar 2026

Gemini in Docs keeps hallucinating content that doesn't exist in my document. I asked it to summarize my notes and it added fictional meetings and action items. This is dangerous for business documents.

10. Reproduce It

Trigger a fresh scrape

Download the raw dataset

Get the analyzed snapshot

Clone and run locally