
How this was built

Full methodology: data sources, scraper code, keyword matching, and the complete dataset — so you can verify every finding.

Overview

The approach in one paragraph

Six scrapers collect public user feedback from Reddit, Hacker News, the App Store, the Play Store, Stack Overflow, and YouTube — no authentication, no paywalled data. A keyword-based matching engine (not ML) assigns each feedback item to one or more opportunity themes. Severity scores (1–5) are set manually based on user sentiment patterns and business impact. Raw data is stored in Upstash Redis KV. A 15-item curated baseline ensures key themes are always represented even when scrapers return low volume.
6 sources · 7 themes · 7 keyword lists · 15 curated entries

1. Reddit

Public JSON API — no auth required

Searches 5 subreddits × 6 search terms = 30 API calls. Each call returns up to 25 posts sorted by relevance within the past year. Posts are deduplicated by Reddit post ID.

typescript
// Subreddits searched
const SUBREDDITS = [
  "GoogleWorkspace", "google", "artificial",
  "productivity", "ChatGPT"
];

// Search terms
const SEARCH_TERMS = [
  "gemini workspace", "gemini docs", "gemini gmail",
  "gemini sheets", "google ai workspace", "gemini side panel"
];

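// Fetch via Reddit public JSON API (no auth required).
// `sub` and `term` come from looping over SUBREDDITS × SEARCH_TERMS.
// The URL is concatenated from single-line pieces: a multi-line template
// literal would embed newlines and spaces in the request URL.
const url =
  `https://www.reddit.com/r/${sub}/search.json` +
  `?q=${encodeURIComponent(term)}` +
  `&restrict_sr=1&sort=relevance&t=year&limit=25`;

const res = await fetch(url, {
  headers: { "User-Agent": "workspace-ai-analyzer/1.0" },
});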
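
The 5 × 6 search grid and the ID-based deduplication described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: `buildRedditSearchUrls`, `dedupeById`, and the `RedditPost` shape are hypothetical names, and the only field assumed on a post is its `id`.

```typescript
// Hypothetical minimal post shape; real Reddit listings carry many more fields.
interface RedditPost {
  id: string;
  title: string;
}

// Build one search URL per (subreddit, term) pair.
function buildRedditSearchUrls(subreddits: string[], terms: string[]): string[] {
  const urls: string[] = [];
  for (const sub of subreddits) {
    for (const term of terms) {
      // URLSearchParams handles encoding (spaces become "+").
      const params = new URLSearchParams({
        q: term,
        restrict_sr: "1",
        sort: "relevance",
        t: "year",
        limit: "25",
      });
      urls.push(`https://www.reddit.com/r/${sub}/search.json?${params.toString()}`);
    }
  }
  return urls;
}

// Keep the first occurrence of each post ID; the same post can
// surface under several search terms.
function dedupeById(posts: RedditPost[]): RedditPost[] {
  const seen = new Map<string, RedditPost>();
  for (const post of posts) {
    if (!seen.has(post.id)) seen.set(post.id, post);
  }
  return Array.from(seen.values());
}
```

Called with the 5 subreddits and 6 search terms listed above, `buildRedditSearchUrls` yields the 30 request URLs.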

2. Hacker News

Algolia search API — no auth required

Searches for both stories and comments across 4 queries. Comments are included because they contain more specific technical feedback than top-level submissions. HTML tags are stripped from comment bodies.

typescript
// Uses Algolia HN search API — no auth required
const queries = [
  "gemini workspace", "google gemini docs",
  "gemini gmail", "google ai productivity"
];

// Fetch stories (query is URL-encoded; each URL stays on one line)
const q = encodeURIComponent(query);
fetch(`https://hn.algolia.com/api/v1/search?query=${q}&tags=story&hitsPerPage=30`);

// Also fetch comments (richer feedback signal)
fetch(`https://hn.algolia.com/api/v1/search?query=${q}&tags=comment&hitsPerPage=50`);

// Strip HTML tags from comment text (comment_text is absent on story hits)
const text = (hit.comment_text ?? "")
  .replace(/<[^>]*>/g, " ")
  .replace(/\s+/g, " ")
  .trim();
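
Comment bodies from the Algolia API commonly contain HTML entities (for example `&#x27;` for an apostrophe) in addition to tags. A hedged sketch of a cleaner that handles both is shown below; `cleanComment` and the small `ENTITIES` table are hypothetical and deliberately incomplete, not part of the project's code.

```typescript
// Small, deliberately incomplete entity table (an assumption for illustration).
const ENTITIES: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#x27;": "'",
  "&#x2F;": "/",
};

function cleanComment(html: string | null | undefined): string {
  return (html ?? "")
    // 1. Drop tags first, so a decoded "<" is never mistaken for markup.
    .replace(/<[^>]*>/g, " ")
    // 2. Decode the listed entities in a single pass (no double decoding).
    .replace(/&(?:amp|lt|gt|quot|#x27|#x2F);/g, m => ENTITIES[m] ?? m)
    // 3. Collapse whitespace left behind by removed tags.
    .replace(/\s+/g, " ")
    .trim();
}
```

Decoding before keyword matching matters for phrases such as "can't", which would otherwise appear in the text as `can&#x27;t`.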

3. Apple App Store

app-store-scraper npm package

Scrapes the 5 most relevant Google Workspace apps (Gmail, Docs, Sheets, Slides, Meet) — 5 pages each, sorted by most recent. Reviews are filtered to only those mentioning AI-related keywords to reduce noise.

typescript
// App Store IDs scraped (5 apps, 5 pages each)
const apps = [
  { id: 422689480, name: "gmail" },
  { id: 842842640, name: "docs" },
  { id: 842849113, name: "sheets" },
  { id: 879478102, name: "slides" },
  { id: 1013161476, name: "meet" },
];

// Only keep reviews mentioning AI-related keywords
const AI_KEYWORDS = [
  "ai", "gemini", "smart", "suggest", "summary",
  "compose", "write", "draft", "autocomplete",
  "hallucin", "context", "side panel", "useless",
  "broken", "buggy", "slow"
];

// `store` is the app-store-scraper module, e.g.
//   import store from "app-store-scraper";
const reviews = await store.reviews({
  id: app.id,
  sort: store.sort.RECENT,
  page,   // pages 1–5
  country: "us",
});
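
Plain substring matching on a short token such as "ai" also matches unrelated words like "email" or "again". One possible way to tighten the filter, shown here as a sketch and not as the project's implementation, is to match short keywords as whole words and longer stems as word prefixes. `mentionsAi`, `WHOLE_WORDS`, and `STEMS` are hypothetical names, and the keyword lists are abbreviated from `AI_KEYWORDS` above.

```typescript
// Abbreviated from the AI_KEYWORDS list; the split into two groups is an assumption.
const WHOLE_WORDS = ["ai", "gemini", "smart", "side panel"];
const STEMS = ["suggest", "hallucin", "autocomplete"];

// Escape regex metacharacters so keywords are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Whole words need a boundary on both sides; stems only at the start.
const AI_PATTERN = new RegExp(
  [
    ...WHOLE_WORDS.map(w => `\\b${escapeRegExp(w)}\\b`),
    ...STEMS.map(s => `\\b${escapeRegExp(s)}`),
  ].join("|"),
  "i" // case-insensitive; no "g" flag, so test() keeps no state between calls
);

function mentionsAi(reviewText: string): boolean {
  return AI_PATTERN.test(reviewText);
}
```

With this pattern, "I check my email again every day" is rejected, while "The AI summary is wrong" and "It keeps hallucinating" are kept.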

4. Google Play Store

google-play-scraper npm package

Scrapes the 4 Android Workspace apps (Gmail, Docs, Sheets, Slides), taking the 30 most recent reviews from each. Reviews pass through a shorter, four-keyword version of the App Store AI filter.

typescript
// Play Store app IDs scraped
const apps = [
  { id: "com.google.android.gm",                       name: "gmail" },
  // Docs editor package; com.google.android.apps.docs is Google Drive
  { id: "com.google.android.apps.docs.editors.docs",   name: "docs" },
  { id: "com.google.android.apps.docs.editors.sheets", name: "sheets" },
  { id: "com.google.android.apps.docs.editors.slides", name: "slides" },
];

// Filter: only reviews mentioning AI/Gemini
const text = (review.text ?? "").toLowerCase();
if (
  !text.includes("ai") &&
  !text.includes("gemini") &&
  !text.includes("smart") &&
  !text.includes("suggest")
) continue;

5. Stack Overflow

StackExchange public API v2.3 — no auth required

Searches both stackoverflow.com and webapps.stackexchange.com (the latter has far more Workspace-specific questions). Full question bodies are fetched via the filter=withbody parameter.

typescript
// Stack Overflow public API v2.3 (no auth)
// Searches both stackoverflow.com and webapps.stackexchange.com
const queries = [
  "gemini+google+workspace", "gemini+docs",
  "gemini+gmail", "gemini+sheets", "google+ai+workspace"
];

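// The queries above are already "+"-joined, so they are inserted as-is.
// URLs are concatenated so that no line breaks end up in the request.
fetch(
  "https://api.stackexchange.com/2.3/search/advanced" +
  `?order=desc&sort=relevance&q=${query}` +
  "&site=stackoverflow&pagesize=30&filter=withbody"
);

// Also webapps.stackexchange.com (more Workspace questions)
fetch(
  "https://api.stackexchange.com/2.3/search/advanced" +
  `?q=${query}&site=webapps&pagesize=30&filter=withbody`
);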
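
Because the same query runs against two sites, it can help to build the URL in one place and to key results by site as well as question ID. The sketch below assumes plain-text queries (spaces instead of "+"); `buildSearchUrl`, `questionKey`, and the `Site` type are hypothetical helpers, not the project's code.

```typescript
type Site = "stackoverflow" | "webapps";

// Build a /search/advanced URL; URLSearchParams encodes spaces as "+".
function buildSearchUrl(site: Site, query: string): string {
  const params = new URLSearchParams({
    order: "desc",
    sort: "relevance",
    q: query,
    site,
    pagesize: "30",
    filter: "withbody",
  });
  return `https://api.stackexchange.com/2.3/search/advanced?${params.toString()}`;
}

// Question IDs are only unique within one site, so the dedup key includes both.
function questionKey(site: Site, questionId: number): string {
  return `${site}:${questionId}`;
}
```

For example, `buildSearchUrl("webapps", "gemini docs")` produces a URL containing `q=gemini+docs&site=webapps`.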

6. YouTube

HTML scraping + youtube-transcript-api (Python)

YouTube has no public search API without quota limits. Instead, the scraper parses video IDs from the YouTube search results HTML (the ytInitialData JSON embedded in the page). Transcripts are fetched via the open-source youtube-transcript-api Python library — no API key needed. Up to 15 videos are transcribed per run.

typescript
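import { execFileSync } from "node:child_process";

// Step 1: Search YouTube HTML for video IDs
// No API key: parses ytInitialData from page HTML
const url =
  "https://www.youtube.com/results" +
  `?search_query=${encodeURIComponent(query)}&sp=CAISBAgCEAE`; // filter: this year

// The same videoId can appear several times in ytInitialData, so deduplicate
const ids = [...new Set(
  [...html.matchAll(/"videoId":"([a-zA-Z0-9_-]{11})"/g)].map(m => m[1])
)];

// Step 2: Fetch transcript via youtube-transcript-api (Python)
// Requires: pip install youtube-transcript-api
// The video ID is passed as an argument instead of being interpolated
// into the script text.
const script = `
import sys
from youtube_transcript_api import YouTubeTranscriptApi
api = YouTubeTranscriptApi()
transcript = api.fetch(sys.argv[1], languages=['en'])
text = ' '.join([s.text for s in transcript.snippets])
print(text[:2000])
`;
const transcriptText = execFileSync("python3", ["-c", script, videoId], {
  encoding: "utf8",
});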
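
The ID extraction step, including the 15-video cap mentioned above, can be isolated into a small pure function that is easy to test against saved HTML. This is a sketch under the assumption that IDs appear as `"videoId":"…"` pairs in the page source; `extractVideoIds` is a hypothetical name.

```typescript
// Return unique 11-character video IDs in order of first appearance,
// stopping once `limit` IDs have been collected.
function extractVideoIds(html: string, limit = 15): string[] {
  const pattern = /"videoId":"([a-zA-Z0-9_-]{11})"/g;
  const seen = new Set<string>();
  const ids: string[] = [];
  let match: RegExpExecArray | null;
  while (ids.length < limit && (match = pattern.exec(html)) !== null) {
    const id = match[1];
    if (!seen.has(id)) {
      seen.add(id);
      ids.push(id);
    }
  }
  return ids;
}
```

Keeping this step separate from the network call means the regex can be re-checked quickly whenever YouTube changes its page markup.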

7. Theme Analysis Engine

Keyword matching — no ML, fully auditable

Each raw feedback item is matched against 7 keyword lists via simple substring search. A single item can match multiple themes. Themes are ranked by severity × frequency — highest impact first. Severity is manually assessed; frequency is computed.

typescript
// Each theme has a keyword list
const THEME_KEYWORDS: Record<string, string[]> = {
  "hallucination": [
    "hallucinate", "hallucination", "made up",
    "fabricat", "incorrect", "wrong information",
    "inaccurate", "false", "imagin"
  ],
  "cross-app-memory": [
    "context", "memory", "remember", "forget",
    "loses context", "switch", "cross-app",
    "between apps", "side panel", "persistent"
  ],
  "mobile-voice": [
    "mobile", "phone", "voice", "hands-free",
    "android", "ios", "cramped", "small screen"
  ],
  // ... (see analysis.ts for full list)
};

// Match: simple keyword substring search (case-insensitive)
function analyzeFeedback(raw: RawFeedback[], themes: PainPointTheme[]) {
  for (const feedback of raw) {
    const lower = feedback.text.toLowerCase();
    for (const theme of themes) {
      const keywords = THEME_KEYWORDS[theme.id] ?? [];
      if (keywords.some(kw => lower.includes(kw))) {
        theme.frequency++;
        if (theme.quotes.length < 5) {
          const { text, source, url, author, date } = feedback;
          theme.quotes.push({ text, source, url, author, date });
        }
      }
    }
  }
  // Sort by severity × frequency (highest impact first)
  return themes.sort((a, b) =>
    b.severity * b.frequency - a.severity * a.frequency
  );
}

Theme | Severity | Scope | Sample keywords
Trust & Grounding | 5/5 | Platform | hallucinate, fabricat, made up, incorrect
Cross-App Context & Memory | 5/5 | Platform | context, memory, forget, side panel, switch
Mobile & Voice-First AI | 4/5 | Platform | mobile, phone, voice, hands-free, ios
Context-Aware Writing | 4/5 | App-level | help me write, generic, tone, style, fluff
Deeper Spreadsheet Intelligence | 3/5 | App-level | formula, vlookup, array, pivot, sheets
Meeting Intelligence Upgrade | 4/5 | App-level | meeting, action item, speaker, attribution
Value Perception | 3/5 | Platform | pricing, cost, expensive, per user, tier
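
To see how a single item can land in several themes, and how the severity × frequency score behaves, here is a small self-contained sketch. It reuses only the three keyword lists excerpted above; `matchThemes` and `impactScore` are hypothetical helper names, and the frequency numbers in the usage note are invented purely for illustration.

```typescript
// The three keyword lists excerpted above (the full set lives in analysis.ts).
const THEME_KEYWORDS: Record<string, string[]> = {
  "hallucination": [
    "hallucinate", "hallucination", "made up", "fabricat", "incorrect",
    "wrong information", "inaccurate", "false", "imagin",
  ],
  "cross-app-memory": [
    "context", "memory", "remember", "forget", "loses context",
    "switch", "cross-app", "between apps", "side panel", "persistent",
  ],
  "mobile-voice": [
    "mobile", "phone", "voice", "hands-free",
    "android", "ios", "cramped", "small screen",
  ],
};

// Return every theme ID with at least one keyword contained in the text.
function matchThemes(text: string): string[] {
  const lower = text.toLowerCase();
  return Object.keys(THEME_KEYWORDS).filter(id =>
    THEME_KEYWORDS[id].some(kw => lower.includes(kw))
  );
}

// Ranking score used for sorting: manual severity times computed frequency.
function impactScore(severity: number, frequency: number): number {
  return severity * frequency;
}
```

`matchThemes("The side panel is cramped, voice input doesn't work half the time")` returns both `"cross-app-memory"` and `"mobile-voice"`. By contrast, `matchThemes("Gemini keeps hallucinating content")` returns an empty list with these three lists, because "hallucinating" contains neither "hallucinate" nor "hallucination"; a stem such as "hallucinat" would catch it, and the full list in analysis.ts may already do so. On scoring, a severity-3 theme with a hypothetical 80 mentions (score 240) would outrank a severity-5 theme with 40 mentions (score 200).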

8. Full Pipeline

Sources → KV → Dashboard

All scrapers run in parallel (except YouTube, which requires sequential transcript fetches). Results are saved to two Upstash Redis keys: workspace-ai:raw-feedback (all items) and workspace-ai:snapshot (analyzed themes + competitors). The dashboard reads from the snapshot; the drill-down drawer reads from raw feedback.

typescript
// Full pipeline: scrape → analyze → save to Upstash Redis KV

export async function buildSnapshot() {
  // 1. Scrape all sources in parallel (with timeouts)
  const withTimeout = <T>(p: Promise<T>, ms: number, fallback: T) =>
    Promise.race([p, new Promise<T>(r => setTimeout(() => r(fallback), ms))]);

  const [reddit, hn, stackoverflow, appstore] = await Promise.all([
    scrapeReddit(),
    scrapeHackerNews(),
    scrapeStackOverflow(),
    withTimeout(scrapeAppStore(), 20_000, []),  // 20s timeout
  ]);
  // YouTube is slower (transcripts), run sequentially after
  const youtube = await withTimeout(scrapeYouTube(), 60_000, []);
  const curated = getCuratedFeedback();

  const allFeedback = [
    ...reddit, ...hn, ...stackoverflow,
    ...appstore, ...youtube, ...curated
  ];

  // 2. Match feedback to themes
  const themes = analyzeFeedback(allFeedback, getDefaultThemes());

  // 3. Aggregate competitor mentions
  const topCompetitors = buildCompetitorRanking(themes);

  // 4. Save to KV (Upstash Redis)
  await saveRawFeedback(allFeedback);  // key: workspace-ai:raw-feedback
  await saveSnapshot({ themes, topCompetitors /* , ... */ });  // key: workspace-ai:snapshot
}
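
The `Promise.race` helper above resolves to the fallback on a timeout, but it leaves the timer running and still rejects if the scraper itself throws. A possible variant that also treats a failed scraper like a timeout is sketched below; this is an alternative under stated assumptions, not the code the pipeline runs.

```typescript
// Resolve with the scraper's result, or with `fallback` if it is too slow
// or rejects. The timer is cleared as soon as the promise settles.
function withTimeout<T>(p: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>(resolve => {
    const timer = setTimeout(() => resolve(fallback), ms);
    p.then(
      value => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        // Assumption: a failing source should not abort the whole snapshot.
        clearTimeout(timer);
        resolve(fallback);
      }
    );
  });
}
```

With this shape, every scraper could be wrapped the same way, so one unreachable source yields an empty list rather than a failed run.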

9. Curated Baseline Data

15 manually sourced entries with real source URLs

The curated set ensures key themes always have representative quotes, even when scraper volume is low. These are real user complaints sourced from Reddit and Google's own support forums — not invented. Each entry is tagged with the original URL.

#1 · Enterprise user report · Mar 2026

Gemini in Docs keeps hallucinating content that doesn't exist in my document. I asked it to summarize my notes and it added fictional meetings and action items. This is dangerous for business documents.

Source →
#2 · Power user feedback · Mar 2026

The Gemini side panel loses context every time I switch between Gmail and Docs. I'll be working on an email thread, switch to reference a doc, and when I come back Gemini has no memory of what we were discussing. Makes it useless for multi-app workflows.

Source →
#3 · Spreadsheet user · Feb 2026

I switched from Gemini to ChatGPT for spreadsheet formulas. Gemini suggestions in Sheets are basic — it can't handle complex lookups or array formulas. ChatGPT gets them right first try and explains the logic.

Source →
#4 · Meeting user · Mar 2026

Google Meet summaries are hit or miss. Half the time it attributes action items to the wrong person, and it completely misses side conversations. We've gone back to taking notes manually.

Source →
#5 · Enterprise PM · Mar 2026

Why can't Gemini in Gmail draft a reply based on context from a Drive doc I share in the thread? It only knows about the email text. Meanwhile Copilot in Outlook pulls from SharePoint, Teams chats, everything.

Source →
#6 · Mobile user · Feb 2026

Gemini for Workspace's mobile experience is terrible. The side panel is cramped, voice input doesn't work half the time, and there's no way to use it hands-free. On desktop it's decent but on my phone it's unusable.

Source →
#7 · IT Admin · Jan 2026

We're paying $30/user/month for Workspace Business Plus with Gemini, but most of our team ignores the AI features because they're not reliable enough to trust. We could save money by dropping the AI tier and just using ChatGPT Team for $25/user.

Source →
#8 · Writer · Mar 2026

The 'Help me write' feature in Docs generates generic corporate fluff. It doesn't learn my writing style, doesn't match the tone of the rest of my document, and I end up rewriting 90% of what it generates. Notion AI does a much better job matching context.

Source →
#9 · Calendar power user · Feb 2026

Gemini in Calendar can't intelligently schedule meetings. It doesn't consider my working hours preferences, travel time, or even basic context like 'schedule this after my 1:1 with Sarah.' Reclaim.ai does this 10x better.

Source →
#10 · Analyst · Mar 2026

Tried using Gemini to analyze a 50-page doc in Drive. It gave me a summary of the first 5 pages and hallucinated the rest. The context window might be large but it clearly isn't using it well for long documents.

Source →
#11 · IT Director · Jan 2026

Privacy is a real concern. Gemini accessing my inbox by default with no clear opt-out is a dealbreaker for our legal team. We had to disable it org-wide which means we lose ALL AI features, not just the inbox scanning.

Source →
#12 · Presenter · Mar 2026

The Gemini side panel in Slides is basically useless. It can generate an image or suggest a layout, but it can't restructure a presentation, reorder slides based on narrative flow, or create speaker notes that match my style. Gamma.app does all of this.

Source →
#13 · Chat user · Feb 2026

Why doesn't Gemini in Chat summarize thread history? In Slack, AI catches you up on channels you missed. In Google Chat, Gemini can't even tell me what happened while I was away. Basic feature that's missing.

Source →
#14 · Enterprise Admin · Mar 2026

Enterprise rollout of Gemini for Workspace has been painful. No granular admin controls — it's all or nothing. Can't enable AI for specific teams, can't restrict which data it accesses, can't set per-user policies. Microsoft Copilot admin controls are way ahead.

Source →
#15 · Operations Manager · Feb 2026

Gemini can't work with attachments in Gmail. I get a PDF invoice, ask Gemini to extract the total and due date, and it says it can't access attachments. I have to download, upload to ChatGPT, and paste the answer back. Terrible workflow.

Source →
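
Since every curated entry is meant to carry its original URL, a small check can flag entries where the text or link is missing. The sketch below assumes curated entries share the fields used for quotes in the theme engine (`text`, `source`, `url`, `author`, `date`); `CuratedEntry` and `invalidCuratedEntries` are hypothetical names, not taken from the repository.

```typescript
// Assumed entry shape, mirroring the quote fields used by analyzeFeedback.
interface CuratedEntry {
  text: string;
  source: string;
  url: string;
  author: string;
  date: string;
}

// Return the entries that have empty text or a URL that is not valid http(s).
function invalidCuratedEntries(entries: CuratedEntry[]): CuratedEntry[] {
  return entries.filter(entry => {
    if (entry.text.trim().length === 0) return true;
    try {
      const parsed = new URL(entry.url);
      return parsed.protocol !== "https:" && parsed.protocol !== "http:";
    } catch {
      return true; // not a parseable URL
    }
  });
}
```

An empty result means every entry has quote text and a well-formed link; it does not verify that the link still resolves.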

10. Reproduce It

Run the full pipeline yourself

Trigger a fresh scrape

bash
curl -X POST \
  "https://schlacter.me/workspace-ai-gaps/api/scrape?secret=YOUR_SYNC_SECRET"

Replace YOUR_SYNC_SECRET with the value of the SYNC_SECRET env var. Returns theme counts and source volumes.

Download the raw dataset

bash
curl "https://schlacter.me/workspace-ai-gaps/api/export" \
  -o raw-feedback.json

Public endpoint. Returns all feedback items (more than 1,000) as JSON.
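
Once downloaded, the dataset can be checked independently, for instance by counting items per source. The sketch below assumes the export is a JSON array whose items carry `text` and `source` fields, as the quote objects in the theme engine do; `countBySource` is a hypothetical helper, and the actual export shape should be confirmed against the file.

```typescript
// Assumed minimal item shape; the real export may include more fields.
interface RawFeedback {
  text: string;
  source: string;
}

// Tally how many feedback items came from each source.
function countBySource(items: RawFeedback[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of items) {
    counts[item.source] = (counts[item.source] ?? 0) + 1;
  }
  return counts;
}

// Usage against the downloaded file (assumes a top-level JSON array):
//   import { readFileSync } from "node:fs";
//   const items = JSON.parse(readFileSync("raw-feedback.json", "utf8"));
//   console.log(countBySource(items));
```

Comparing these counts with the source volumes returned by the scrape endpoint is a quick consistency check on the published numbers.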

Get the analyzed snapshot

bash
curl "https://schlacter.me/workspace-ai-gaps/api/analyze"

Returns themes with frequency counts, quotes, competitor data, and source breakdown.

Clone and run locally

bash
git clone https://github.com/hbschlac/workspace-ai-research
cd workspace-ai-research

# Install dependencies
npm install google-play-scraper app-store-scraper @upstash/redis

# Install Python transcript fetcher
pip install youtube-transcript-api

# Scraper and analysis code is in:
#   scraper.ts   — all 6 data source scrapers
#   analysis.ts  — keyword matching + theme engine
#   data/        — curated-feedback.json, raw-feedback.json