Overview
The approach in one paragraph
1. Reddit
Public JSON API — no auth required
Searches 5 subreddits × 6 search terms = 30 API calls. Each call returns up to 25 posts sorted by relevance within the past year. Posts are deduplicated by Reddit post ID.
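The search grid and the deduplication step described above can be sketched as follows. This is a minimal illustration, not the project's scraper code: the helper names `searchPairs` and `dedupeById` and the `RedditPost` shape are assumptions, and only the `id` field is relied on.

```typescript
// Hypothetical post shape: only `id` is needed for deduplication.
interface RedditPost {
  id: string;
  title: string;
}

// Enumerate every (subreddit, term) pair; 5 subreddits x 6 terms gives 30 calls.
function searchPairs(subs: string[], terms: string[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (const sub of subs) {
    for (const term of terms) {
      pairs.push([sub, term]);
    }
  }
  return pairs;
}

// Keep the first occurrence of each post ID; later duplicates are dropped.
function dedupeById(posts: RedditPost[]): RedditPost[] {
  const seen = new Map<string, RedditPost>();
  for (const post of posts) {
    if (!seen.has(post.id)) seen.set(post.id, post);
  }
  return Array.from(seen.values());
}
```

Deduplication matters here because the same post can be returned by several search terms within one subreddit, so the merged results of the 30 calls would otherwise count it more than once.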
// Subreddits searched
const SUBREDDITS = [
  "GoogleWorkspace", "google", "artificial",
  "productivity", "ChatGPT"
];
// Search terms
const SEARCH_TERMS = [
  "gemini workspace", "gemini docs", "gemini gmail",
  "gemini sheets", "google ai workspace", "gemini side panel"
];
// Fetch via Reddit public JSON API (no auth required)
const url =
  `https://www.reddit.com/r/${sub}/search.json` +
  `?q=${encodeURIComponent(term)}` +
  `&restrict_sr=1&sort=relevance&t=year&limit=25`;
const res = await fetch(url, {
  headers: { "User-Agent": "workspace-ai-analyzer/1.0" },
});
2. Hacker News
Algolia search API — no auth required
Searches for both stories and comments across 4 queries. Comments are included because they contain more specific technical feedback than top-level submissions. HTML tags are stripped from comment bodies.
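As a sketch of how a story hit and a comment hit might be reduced to one common record, consider the following. The `objectID`, `author`, `created_at`, `title`, and `comment_text` fields are returned by the Algolia HN search API; the `FeedbackItem` shape and the `normalizeHit` helper are assumptions made for illustration.

```typescript
// Subset of the fields on an Algolia HN search hit.
// Comments carry `comment_text`; stories carry `title`.
interface HnHit {
  objectID: string;
  author: string;
  created_at: string;
  title?: string | null;
  comment_text?: string | null;
}

// Hypothetical common shape for downstream analysis.
interface FeedbackItem {
  text: string;
  source: "hackernews";
  url: string;
  author: string;
  date: string;
}

// Replace each tag with a space, then collapse runs of whitespace.
function stripHtml(html: string): string {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

function normalizeHit(hit: HnHit): FeedbackItem {
  const raw = hit.comment_text ?? hit.title ?? "";
  return {
    text: stripHtml(raw),
    source: "hackernews",
    url: `https://news.ycombinator.com/item?id=${hit.objectID}`,
    author: hit.author,
    date: hit.created_at,
  };
}
```

Replacing tags with a space rather than an empty string keeps adjacent paragraphs from fusing: `<p>Is slow.</p><p>Very slow.</p>` becomes `Is slow. Very slow.` instead of `Is slow.Very slow.`.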
// Uses Algolia HN search API — no auth required
const queries = [
  "gemini workspace", "google gemini docs",
  "gemini gmail", "google ai productivity"
];
// Fetch stories
fetch(
  "https://hn.algolia.com/api/v1/search" +
  `?query=${encodeURIComponent(query)}&tags=story&hitsPerPage=30`
);
// Also fetch comments (richer feedback signal)
fetch(
  "https://hn.algolia.com/api/v1/search" +
  `?query=${encodeURIComponent(query)}&tags=comment&hitsPerPage=50`
);
// Strip HTML tags from comment text
const text = hit.comment_text
  .replace(/<[^>]*>/g, " ")
  .replace(/\s+/g, " ")
  .trim();
3. Apple App Store
app-store-scraper npm package
Scrapes the 5 most relevant Google Workspace apps (Gmail, Docs, Sheets, Slides, Meet) — 5 pages each, sorted by most recent. Reviews are filtered to only those mentioning AI-related keywords to reduce noise.
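The keyword filter mentioned above could look roughly like this. It is a sketch under assumptions: the `Review` shape and the helper names are hypothetical, and `SAMPLE_KEYWORDS` is a shortened list used only for illustration.

```typescript
// Shortened keyword list for illustration only.
const SAMPLE_KEYWORDS = ["ai", "gemini", "smart", "suggest", "summary"];

// Hypothetical review shape: only `text` is inspected.
interface Review {
  text: string;
}

// Case-insensitive substring match against any keyword.
function mentionsAny(text: string, keywords: string[]): boolean {
  const lower = text.toLowerCase();
  return keywords.some((kw) => lower.includes(kw));
}

// Keep only the reviews that mention at least one keyword.
function filterAiReviews(reviews: Review[], keywords: string[]): Review[] {
  return reviews.filter((r) => mentionsAny(r.text, keywords));
}
```

One trade-off of plain substring matching is that short keywords match inside longer words: `mentionsAny("Cannot open my email", SAMPLE_KEYWORDS)` is true because "email" contains "ai". The filter therefore favors recall over precision.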
// App Store IDs scraped (5 apps, 5 pages each)
const apps = [
  { id: 422689480, name: "gmail" },
  { id: 842842640, name: "docs" },
  { id: 842849113, name: "sheets" },
  { id: 879478102, name: "slides" },
  { id: 1013161476, name: "meet" },
];
// Only keep reviews mentioning AI-related keywords
const AI_KEYWORDS = [
  "ai", "gemini", "smart", "suggest", "summary",
  "compose", "write", "draft", "autocomplete",
  "hallucin", "context", "side panel", "useless",
  "broken", "buggy", "slow"
];
const reviews = await store.reviews({
  id: app.id,
  sort: store.sort.RECENT,
  page, // pages 1–5
  country: "us",
});
4. Google Play Store
google-play-scraper npm package
Scrapes the 4 Android Workspace apps (Gmail, Docs, Sheets, Slides), taking the 30 most recent reviews from each. The same kind of AI-keyword filter is applied as for the App Store; the snippet below checks four of those keywords.
// Play Store app IDs scraped
const apps = [
  { id: "com.google.android.gm", name: "gmail" },
  { id: "com.google.android.apps.docs", name: "docs" },
  { id: "com.google.android.apps.docs.editors.sheets", name: "sheets" },
  { id: "com.google.android.apps.docs.editors.slides", name: "slides" },
];
// Filter: only reviews mentioning AI/Gemini
const text = (review.text ?? "").toLowerCase();
if (
  !text.includes("ai") &&
  !text.includes("gemini") &&
  !text.includes("smart") &&
  !text.includes("suggest")
) continue;
5. Stack Overflow
StackExchange public API v2.3 — no auth required
Searches both stackoverflow.com and webapps.stackexchange.com (the latter has far more Workspace-specific questions). Full question bodies are fetched via the filter=withbody parameter.
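A small sketch of how the request URLs for both sites might be generated. The helper names are hypothetical, and applying identical `order` and `sort` parameters to both sites is an assumption of this sketch rather than something the project is known to do.

```typescript
// The two Stack Exchange sites named in this section.
const SITES = ["stackoverflow", "webapps"];

// Build one /search/advanced URL; `filter=withbody` asks for question bodies.
function buildSearchUrl(site: string, query: string, pageSize: number = 30): string {
  const params = [
    "order=desc",
    "sort=relevance",
    `q=${encodeURIComponent(query)}`,
    `site=${encodeURIComponent(site)}`,
    `pagesize=${pageSize}`,
    "filter=withbody",
  ];
  return `https://api.stackexchange.com/2.3/search/advanced?${params.join("&")}`;
}

// One URL per (site, query) pair.
function buildAllUrls(queries: string[]): string[] {
  const urls: string[] = [];
  for (const site of SITES) {
    for (const query of queries) {
      urls.push(buildSearchUrl(site, query));
    }
  }
  return urls;
}
```

With five queries this yields ten requests. Passing plain strings such as `"gemini docs"` through `encodeURIComponent` produces `gemini%20docs`, which the API should treat the same as a pre-encoded `gemini+docs`.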
// Stack Overflow public API v2.3 (no auth)
// Searches both stackoverflow.com and webapps.stackexchange.com
const queries = [
  "gemini+google+workspace", "gemini+docs",
  "gemini+gmail", "gemini+sheets", "google+ai+workspace"
];
fetch(
  "https://api.stackexchange.com/2.3/search/advanced" +
  `?order=desc&sort=relevance&q=${query}` +
  "&site=stackoverflow&pagesize=30&filter=withbody"
);
// Also webapps.stackexchange.com (more Workspace questions)
fetch(
  "https://api.stackexchange.com/2.3/search/advanced" +
  `?q=${query}&site=webapps&pagesize=30&filter=withbody`
);
6. YouTube
HTML scraping + youtube-transcript-api (Python)
YouTube's official Data API requires an API key and enforces quota limits. Instead, the scraper parses video IDs from the YouTube search results HTML (the ytInitialData JSON embedded in the page). Transcripts are fetched via the open-source youtube-transcript-api Python library, which needs no API key. Up to 15 videos are transcribed per run.
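The ID extraction and the 15-video cap can be sketched as below. The same video ID typically appears more than once in the embedded page data, so this version deduplicates before applying the cap; that deduplication step and the `extractVideoIds` name are assumptions, not confirmed details of the project.

```typescript
// Cap taken from the text: up to 15 videos are transcribed per run.
const MAX_VIDEOS = 15;

// Collect unique 11-character video IDs in order of first appearance.
function extractVideoIds(html: string, limit: number = MAX_VIDEOS): string[] {
  const pattern = /"videoId":"([a-zA-Z0-9_-]{11})"/g;
  const seen = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    seen.add(match[1]);
  }
  return Array.from(seen).slice(0, limit);
}
```

Because a `Set` preserves insertion order, the first `limit` IDs returned follow the order in which they appear in the search results page.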
// Step 1: Search YouTube HTML for video IDs
// No API key — parses ytInitialData from page HTML
const url =
  "https://www.youtube.com/results" +
  `?search_query=${encodeURIComponent(query)}&sp=CAISBAgCEAE`; // filter: this year
const ids = [...html.matchAll(
  /"videoId":"([a-zA-Z0-9_-]{11})"/g
)].map(m => m[1]);
// Step 2: Fetch transcript via youtube-transcript-api (Python)
// Requires: pip install youtube-transcript-api
execSync(`python3 -c "
from youtube_transcript_api import YouTubeTranscriptApi
api = YouTubeTranscriptApi()
transcript = api.fetch('${videoId}', languages=['en'])
text = ' '.join([s.text for s in transcript.snippets])
print(text[:2000])
"`);
7. Theme Analysis Engine
Keyword matching — no ML, fully auditable
Each raw feedback item is matched against 7 keyword lists via simple substring search. A single item can match multiple themes. Themes are ranked by severity × frequency — highest impact first. Severity is manually assessed; frequency is computed.
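A worked example of the ranking rule may help. The sketch below uses two made-up themes and five made-up feedback strings; the `Theme` shape and `rankThemes` name are illustrative assumptions, and the keyword lists are shortened.

```typescript
interface Theme {
  id: string;
  severity: number; // manually assigned, 1 to 5
  keywords: string[];
  frequency: number; // computed by matching
}

// Count keyword matches per theme, then sort by severity x frequency, descending.
function rankThemes(texts: string[], themes: Theme[]): Theme[] {
  const counted = themes.map((t) => ({ ...t, frequency: 0 }));
  for (const text of texts) {
    const lower = text.toLowerCase();
    for (const theme of counted) {
      if (theme.keywords.some((kw) => lower.includes(kw))) theme.frequency++;
    }
  }
  return counted.sort((a, b) => b.severity * b.frequency - a.severity * a.frequency);
}

// Made-up sample data for illustration.
const ranked = rankThemes(
  [
    "Gemini hallucinated a meeting",
    "The formula it wrote was made up", // matches both themes
    "VLOOKUP formula suggestions are basic",
    "Formula errors everywhere",
    "Broken vlookup again",
  ],
  [
    { id: "hallucination", severity: 5, keywords: ["hallucinat", "made up"], frequency: 0 },
    { id: "spreadsheet", severity: 3, keywords: ["formula", "vlookup"], frequency: 0 },
  ]
);
```

Here "hallucination" scores 5 × 2 = 10 and "spreadsheet" scores 3 × 4 = 12, so the lower-severity theme ranks first because it is mentioned more often. The second string counts toward both themes, which is the multi-match behavior described above.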
// Each theme has a keyword list
const THEME_KEYWORDS: Record<string, string[]> = {
  "hallucination": [
    "hallucinate", "hallucination", "made up",
    "fabricat", "incorrect", "wrong information",
    "inaccurate", "false", "imagin"
  ],
  "cross-app-memory": [
    "context", "memory", "remember", "forget",
    "loses context", "switch", "cross-app",
    "between apps", "side panel", "persistent"
  ],
  "mobile-voice": [
    "mobile", "phone", "voice", "hands-free",
    "android", "ios", "cramped", "small screen"
  ],
  // ... (see analysis.ts for full list)
};
// Match: simple keyword substring search (case-insensitive)
function analyzeFeedback(raw: RawFeedback[], themes: PainPointTheme[]) {
  for (const feedback of raw) {
    const lower = feedback.text.toLowerCase();
    for (const theme of themes) {
      const keywords = THEME_KEYWORDS[theme.id] ?? [];
      if (keywords.some(kw => lower.includes(kw))) {
        theme.frequency++;
        // Keep at most 5 representative quotes per theme
        if (theme.quotes.length < 5) {
          const { text, source, url, author, date } = feedback;
          theme.quotes.push({ text, source, url, author, date });
        }
      }
    }
  }
  // Sort by severity × frequency (highest impact first)
  return themes.sort((a, b) =>
    b.severity * b.frequency - a.severity * a.frequency
  );
}
| Theme | Severity | Scope | Sample keywords |
|---|---|---|---|
| Trust & Grounding | 5/5 | Platform | hallucinate, fabricat, made up, incorrect |
| Cross-App Context & Memory | 5/5 | Platform | context, memory, forget, side panel, switch |
| Mobile & Voice-First AI | 4/5 | Platform | mobile, phone, voice, hands-free, ios |
| Context-Aware Writing | 4/5 | App-level | help me write, generic, tone, style, fluff |
| Deeper Spreadsheet Intelligence | 3/5 | App-level | formula, vlookup, array, pivot, sheets |
| Meeting Intelligence Upgrade | 4/5 | App-level | meeting, action item, speaker, attribution |
| Value Perception | 3/5 | Platform | pricing, cost, expensive, per user, tier |
8. Full Pipeline
Sources → KV → Dashboard
All scrapers run in parallel (except YouTube, which requires sequential transcript fetches). Results are saved to two Upstash Redis keys: workspace-ai:raw-feedback (all items) and workspace-ai:snapshot (analyzed themes + competitors). The dashboard reads from the snapshot; the drill-down drawer reads from raw feedback.
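The pipeline code for this section guards slow scrapers with a timeout helper. A typed variant that also clears its timer might look like the following; this is a sketch, and the generic signature and timer cleanup are additions not present in the original helper.

```typescript
// Resolve with `fallback` if `promise` has not settled within `ms` milliseconds.
async function withTimeout<T>(promise: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    // Whichever promise settles first wins the race.
    return await Promise.race([promise, timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive after a fast result.
    clearTimeout(timer);
  }
}
```

Note that the timeout only stops the pipeline from waiting: the underlying scraper call is not cancelled and may keep running in the background. Using an empty array as the fallback means a slow source contributes no items rather than failing the whole run.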
// Full pipeline: scrape → analyze → save to Upstash Redis KV
export async function buildSnapshot() {
  // 1. Scrape all sources in parallel (with timeouts)
  const withTimeout = (p, ms, fallback) =>
    Promise.race([p, new Promise(r => setTimeout(() => r(fallback), ms))]);
  const [reddit, hn, stackoverflow, appstore] = await Promise.all([
    scrapeReddit(),
    scrapeHackerNews(),
    scrapeStackOverflow(),
    withTimeout(scrapeAppStore(), 20_000, []), // 20s timeout
  ]);
  // YouTube is slower (transcripts), run sequentially after
  const youtube = await withTimeout(scrapeYouTube(), 60_000, []);
  const curated = getCuratedFeedback();
  const allFeedback = [
    ...reddit, ...hn, ...stackoverflow,
    ...appstore, ...youtube, ...curated
  ];
  // 2. Match feedback to themes
  const themes = analyzeFeedback(allFeedback, getDefaultThemes());
  // 3. Aggregate competitor mentions
  const topCompetitors = buildCompetitorRanking(themes);
  // 4. Save to KV (Upstash Redis)
  await saveRawFeedback(allFeedback); // key: workspace-ai:raw-feedback
  await saveSnapshot({ themes, topCompetitors, ... }); // key: workspace-ai:snapshot
}
9. Curated Baseline Data
15 manually sourced entries with real source URLs
The curated set ensures key themes always have representative quotes, even when scraper volume is low. These are real user complaints sourced from Reddit and Google's own support forums — not invented. Each entry is tagged with the original URL.
Gemini in Docs keeps hallucinating content that doesn't exist in my document. I asked it to summarize my notes and it added fictional meetings and action items. This is dangerous for business documents.
The Gemini side panel loses context every time I switch between Gmail and Docs. I'll be working on an email thread, switch to reference a doc, and when I come back Gemini has no memory of what we were discussing. Makes it useless for multi-app workflows.
I switched from Gemini to ChatGPT for spreadsheet formulas. Gemini suggestions in Sheets are basic — it can't handle complex lookups or array formulas. ChatGPT gets them right first try and explains the logic.
Google Meet summaries are hit or miss. Half the time it attributes action items to the wrong person, and it completely misses side conversations. We've gone back to taking notes manually.
Why can't Gemini in Gmail draft a reply based on context from a Drive doc I share in the thread? It only knows about the email text. Meanwhile Copilot in Outlook pulls from SharePoint, Teams chats, everything.
Gemini for Workspace's mobile experience is terrible. The side panel is cramped, voice input doesn't work half the time, and there's no way to use it hands-free. On desktop it's decent but on my phone it's unusable.
We're paying $30/user/month for Workspace Business Plus with Gemini, but most of our team ignores the AI features because they're not reliable enough to trust. We could save money by dropping the AI tier and just using ChatGPT Team for $25/user.
The 'Help me write' feature in Docs generates generic corporate fluff. It doesn't learn my writing style, doesn't match the tone of the rest of my document, and I end up rewriting 90% of what it generates. Notion AI does a much better job matching context.
Gemini in Calendar can't intelligently schedule meetings. It doesn't consider my working hours preferences, travel time, or even basic context like 'schedule this after my 1:1 with Sarah.' Reclaim.ai does this 10x better.
Tried using Gemini to analyze a 50-page doc in Drive. It gave me a summary of the first 5 pages and hallucinated the rest. The context window might be large but it clearly isn't using it well for long documents.
Privacy is a real concern. Gemini accessing my inbox by default with no clear opt-out is a dealbreaker for our legal team. We had to disable it org-wide which means we lose ALL AI features, not just the inbox scanning.
The Gemini side panel in Slides is basically useless. It can generate an image or suggest a layout, but it can't restructure a presentation, reorder slides based on narrative flow, or create speaker notes that match my style. Gamma.app does all of this.
Why doesn't Gemini in Chat summarize thread history? In Slack, AI catches you up on channels you missed. In Google Chat, Gemini can't even tell me what happened while I was away. Basic feature that's missing.
Enterprise rollout of Gemini for Workspace has been painful. No granular admin controls — it's all or nothing. Can't enable AI for specific teams, can't restrict which data it accesses, can't set per-user policies. Microsoft Copilot admin controls are way ahead.
Gemini can't work with attachments in Gmail. I get a PDF invoice, ask Gemini to extract the total and due date, and it says it can't access attachments. I have to download, upload to ChatGPT, and paste the answer back. Terrible workflow.
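Since each curated entry is meant to carry its original URL, a small check can flag entries where that tag is missing. The sketch below is hypothetical: the `CuratedEntry` shape, the `missingSourceUrl` helper, and the field names are assumptions, not the actual layout of curated-feedback.json.

```typescript
// Hypothetical shape of one curated entry.
interface CuratedEntry {
  text: string;
  source: string; // assumed label, for example "reddit"
  url: string;
}

// Return the entries whose `url` is not a plain http(s) URL.
function missingSourceUrl(entries: CuratedEntry[]): CuratedEntry[] {
  return entries.filter((entry) => !/^https?:\/\/\S+$/.test(entry.url.trim()));
}
```

Running a check like this before saving the curated set would keep an untagged quote from reaching the dashboard unnoticed.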
10. Reproduce It
Run the full pipeline yourself
Trigger a fresh scrape
curl -X POST \
"https://schlacter.me/workspace-ai-gaps/api/scrape\
?secret=YOUR_SYNC_SECRET"
Requires the SYNC_SECRET environment variable. Returns theme counts and source volumes.
Download the raw dataset
curl "https://schlacter.me/workspace-ai-gaps/api/export" \
  -o raw-feedback.json
Public endpoint. Returns all feedback items (more than 1,000) as JSON.
Get the analyzed snapshot
curl "https://schlacter.me/workspace-ai-gaps/api/analyze"
Returns themes with frequency counts, quotes, competitor data, and source breakdown.
Clone and run locally
git clone https://github.com/hbschlac/workspace-ai-research
cd workspace-ai-research
# Install dependencies
npm install google-play-scraper app-store-scraper @upstash/redis
# Install Python transcript fetcher
pip install youtube-transcript-api
# Scraper and analysis code is in:
# scraper.ts — all 6 data source scrapers
# analysis.ts — keyword matching + theme engine
# data/ — curated-feedback.json, raw-feedback.json