Feature Roadmap
Overview
Transform PolySuggest from a market ideation tool into the most precise decision-support terminal for Polymarket traders. Total duration: 12 months across 4 phases.
Phase 1: MVP Market Intelligence Hub (4-6 weeks)
Goal: Real-time market data + edge detection + basic calibration tracking
Week 1: Real-Time Market Data Pipeline
Objectives:
- Async Gamma API polling (1-second updates)
- OHLCV storage in TimescaleDB
- In-memory cache for latest prices
- REST endpoints for market data
Deliverables:
- market_data.py – Async poller, caching, rate limiting
- liquidity_analyzer.py – Spread, depth, volume metrics
- time_series.py – OHLCV storage & retrieval
- Tests: test_market_data.py
- Docs: "Real-Time Data" section in README
Acceptance Criteria:
- Sub-1s latency for price updates
- 99.5% API availability (with circuit breaker)
- Support 50+ markets simultaneously
PR: #1-real-time-gamma-api
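A minimal sketch of how the Week 1 poller, in-memory cache, and circuit breaker could fit together. The `FetchPrice` callable, the `CircuitBreaker` class, and all default values are hypothetical; the real Gamma API client and its rate limits are not specified in this roadmap.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Iterable, Optional

# Hypothetical fetcher type: takes a market id, returns its latest price.
FetchPrice = Callable[[str], Awaitable[float]]


class CircuitBreaker:
    """Opens after `max_failures` failed polls in a row, retries after `cooldown_s`."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: let a poll through once the cooldown has elapsed.
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()


async def poll_once(
    market_ids: Iterable[str],
    fetch: FetchPrice,
    cache: Dict[str, float],
    breaker: CircuitBreaker,
) -> None:
    """Fetch all markets concurrently and update the in-memory cache."""
    if not breaker.allow():
        return  # keep serving cached prices while the breaker is open
    ids = list(market_ids)
    results = await asyncio.gather(*(fetch(m) for m in ids), return_exceptions=True)
    ok = True
    for market_id, result in zip(ids, results):
        if isinstance(result, BaseException):
            ok = False  # any failed fetch counts as a failed poll
        else:
            cache[market_id] = result
    breaker.record(ok)


async def poll_forever(
    market_ids: Iterable[str],
    fetch: FetchPrice,
    cache: Dict[str, float],
    breaker: CircuitBreaker,
    interval_s: float = 1.0,  # 1-second updates, per the Week 1 objective
) -> None:
    ids = list(market_ids)
    while True:
        await poll_once(ids, fetch, cache, breaker)
        await asyncio.sleep(interval_s)
```

Treating any failed fetch as a failed poll is a deliberately conservative choice for this sketch; a per-market breaker may suit the 50+ market target better.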
Week 2: Edge Detection & Alerts
Objectives:
- Implement edge detector (fair prob vs market prob)
- Compare trend sentiment to market probability
- Flag mismatches > 25%
- Real-time alert system (CLI + webhook)
Deliverables:
- edge_detector.py – Divergence scoring, decay
- alerts.py – Alert broker (stdout, slack, webhook)
- Extend schemas.py with Edge model
- CLI command: polysuggest edges --top 10
Acceptance Criteria:
- Edge detection with >70% historical accuracy
- Alerts fire within 10 seconds of detection
- Webhook integration with Slack/Discord
PR: #2-edge-detection
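The divergence scoring and decay described above could be sketched as follows. This assumes the 25% threshold means 25 percentage points of absolute difference and that scores decay exponentially with a 5-minute half-life; both are assumptions, and the `Edge` fields shown are illustrative rather than the planned schemas.py model.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Edge:
    market_id: str
    fair_prob: float
    market_prob: float
    age_s: float  # seconds since the edge was first detected

    @property
    def divergence(self) -> float:
        """Signed gap between the fair and market probability."""
        return self.fair_prob - self.market_prob

    def score(self, half_life_s: float = 300.0) -> float:
        """Absolute divergence, halved every `half_life_s` seconds (assumed decay)."""
        return abs(self.divergence) * math.exp(-math.log(2) * self.age_s / half_life_s)


def flag_edges(edges: Iterable[Edge], threshold: float = 0.25) -> List[Edge]:
    """Return edges whose decayed score exceeds the threshold, strongest first."""
    flagged = [e for e in edges if e.score() > threshold]
    return sorted(flagged, key=lambda e: e.score(), reverse=True)
```

With these defaults, a 30-point gap is flagged when fresh but drops below the threshold after one half-life.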
Week 3: Calibration Tracking
Objectives:
- Track all suggestions in database
- Link suggestions to actual resolutions
- Compute calibration metrics (Brier score, accuracy, ROI)
- Generate calibration heatmap
Deliverables:
- calibration.py – Compute historical accuracy metrics
- Extend storage.py with resolution tracking
- Backfill historical suggestions from database
- CLI command: polysuggest calibrate
Acceptance Criteria:
- 100% of past suggestions linked to resolutions (if available)
- Calibration curve within 5% of actual resolution rate
- Brier score calculated correctly
PR: #3-calibration
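For reference, the Brier score is the mean squared error between forecast probabilities and 0/1 outcomes, so a constant 0.5 forecast scores 0.25 and lower is better. A small sketch, with function names chosen for illustration only:

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def accuracy(probs: Sequence[float], outcomes: Sequence[int], cutoff: float = 0.5) -> float:
    """Share of forecasts that land on the correct side of the cutoff."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    hits = sum((p >= cutoff) == bool(o) for p, o in zip(probs, outcomes))
    return hits / len(probs)
```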
Week 4: Dashboard MVP
Objectives:
- Real-time price ticker (top 10 trending markets)
- Edge leaderboard (sorted by EV)
- Calibration heatmap (confidence vs actual accuracy)
- WebSocket server for live updates
Deliverables:
- Next.js components:
  - PriceTicker.tsx – Live prices
  - EdgeLeaderboard.tsx – Top edges
  - CalibrationChart.tsx – Historical accuracy
- FastAPI WebSocket endpoint: WS /ws/edges
- Export to CSV
Acceptance Criteria:
- Dashboard refreshing every 5 seconds
- WebSocket connection stable for 1+ hour
- <100ms latency for UI updates
PR: #4-dashboard-mvp
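The CSV export of the edge leaderboard could look roughly like this. The column names and the dict-based row shape are assumptions made for illustration; the actual Edge model is defined elsewhere in the plan.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column set for the exported leaderboard.
COLUMNS = ["market_id", "fair_prob", "market_prob", "ev"]


def leaderboard_csv(edges: List[Dict], top_n: int = 10) -> str:
    """Sort edges by EV (highest first) and render the top rows as CSV text."""
    rows = sorted(edges, key=lambda e: e["ev"], reverse=True)[:top_n]
    buf = io.StringIO()
    # extrasaction="ignore" drops any keys that are not exported columns.
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```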
Phase 1 Success Metrics
- Sub-1s latency for price updates
- Edge detection with >70% historical accuracy
- Calibration curve within 5% of actual rate
- Dashboard refreshing every 5 seconds
- 10+ beta users testing MVP
Phase 2: Edge Engine & Probability Calibration (6-8 weeks)
Goal: Calibrated probability estimates + EV ranking + Bayesian inference
Week 5-6: Bayesian Probability Estimation
Objectives:
- Implement Bayesian inference engine
- Train on 12 months of historical Polymarket data
- Compute fair probability with credible intervals
- Integrate base rate calculator (by market category)
Deliverables:
- inference_engine.py – Bayesian model (PyMC3/Pyro)
- base_rates.py – Historical priors by category
- feature_engineering.py – Sentiment, volume, order flow features
- Training pipeline: train_bayesian_model.py
- Tests: test_inference_engine.py
Acceptance Criteria:
- Fair probability estimates with a credible interval no wider than ±0.08
- Model trained on 500+ historical markets
- Inference time <5s per market
PR: #5-bayesian-inference
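As a much simpler stand-in for the planned PyMC3/Pyro model, a category base rate with a credible interval can be illustrated with a conjugate Beta-Binomial update. This is a sketch under stated assumptions: the prior counts, the 90% level, and the sampling-based interval are all illustrative choices, not the roadmap's method.

```python
import random
from typing import Tuple


def posterior_beta(prior_yes: float, prior_no: float, yes: int, no: int) -> Tuple[float, float]:
    """Conjugate update: Beta(a, b) prior plus YES/NO counts gives Beta(a + yes, b + no)."""
    return prior_yes + yes, prior_no + no


def credible_interval(
    a: float, b: float, level: float = 0.90, draws: int = 20000, seed: int = 0
) -> Tuple[float, float, float]:
    """Return (posterior mean, lower, upper) for Beta(a, b).

    The equal-tailed interval is estimated from random draws, which avoids
    needing an inverse Beta CDF from a third-party library.
    """
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return a / (a + b), lower, upper
```

For example, a weak Beta(2, 2) prior combined with 30 YES and 10 NO resolutions in a category gives a Beta(32, 12) posterior with mean 32/44.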
Week 7: Expected Value (EV) Ranking
Objectives:
- Compute fair price for all markets
- Compare to Gamma market price
- Rank by risk-adjusted EV
- Implement Kelly fraction sizing
Deliverables:
- ev_calculator.py – EV, sizing, risk metrics
- portfolio_optimizer.py – Kelly fraction, risk parity
- API endpoint: GET /api/edges?sort=ev
- CLI: polysuggest edges list --sort ev
Acceptance Criteria:
- EV calculated correctly (fair_prob / market_prob - 1)
- Leaderboard updates every 1 minute
- Kelly sizing never exceeds bankroll limits
PR: #6-ev-ranking
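The EV formula above and a capped Kelly fraction can be sketched as below. For a YES share bought at price q with fair probability p, the net odds are (1 - q) / q, which reduces the Kelly fraction to (p - q) / (1 - q). The 25% cap is an assumed bankroll limit, not a value from this roadmap.

```python
def expected_value(fair_prob: float, market_prob: float) -> float:
    """Expected return per dollar staked on YES at price `market_prob`."""
    if not 0.0 < market_prob < 1.0:
        raise ValueError("market_prob must be strictly between 0 and 1")
    return fair_prob / market_prob - 1.0


def kelly_fraction(fair_prob: float, market_prob: float, cap: float = 0.25) -> float:
    """Kelly stake for a YES share bought at `market_prob`, clipped to [0, cap]."""
    if not 0.0 < market_prob < 1.0:
        raise ValueError("market_prob must be strictly between 0 and 1")
    raw = (fair_prob - market_prob) / (1.0 - market_prob)
    # Never bet on a negative edge, and never exceed the assumed bankroll cap.
    return max(0.0, min(raw, cap))
```

Clipping at both ends is one simple way to satisfy the criterion that Kelly sizing never exceeds bankroll limits.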
Week 8: Advanced Calibration
Objectives:
- Compute Brier score, accuracy, Sharpe ratio
- Group calibration by: confidence bucket, category, time horizon
- Detect model bias (overconfident? underconfident?)
- ROI tracking per suggestion
Deliverables:
- calibration.py – Enhanced metrics (Brier, Sharpe, ROI)
- Dashboard: CalibrationBuckets.tsx – Breakdowns by confidence
- CLI: polysuggest calibrate show --by-confidence
Acceptance Criteria:
- Calibration curve shows increasing accuracy with confidence
- Model bias <5% (predictions vs actuals)
PR: #7-advanced-calibration
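One way to group calibration by confidence bucket and measure overall bias is sketched here. Equal-width buckets and the definition of bias as mean forecast minus observed rate are assumptions; the roadmap does not fix either.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def calibration_buckets(
    probs: Sequence[float], outcomes: Sequence[int], n_buckets: int = 5
) -> Dict[int, Tuple[float, float, int]]:
    """Group forecasts into equal-width buckets.

    Returns {bucket_index: (mean_forecast, observed_rate, count)}.
    """
    grouped = defaultdict(list)
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_buckets), n_buckets - 1)  # p == 1.0 goes in the top bucket
        grouped[idx].append((p, o))
    table = {}
    for idx, rows in sorted(grouped.items()):
        n = len(rows)
        table[idx] = (sum(p for p, _ in rows) / n, sum(o for _, o in rows) / n, n)
    return table


def model_bias(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean forecast minus observed YES rate; positive means forecasts run too high."""
    return sum(probs) / len(probs) - sum(outcomes) / len(outcomes)
```

A well-calibrated model shows `mean_forecast` close to `observed_rate` in every bucket, and an overall bias near zero.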
Phase 2 Success Metrics
- Fair probability estimates with a credible interval no wider than ±0.08
- Edge detection accuracy improved to 75%+
- Calibration heatmap shows proper confidence scaling
- EV ranking leaderboard updated every 1 min
Phase 3: Portfolio & Execution (8-12 weeks)
Goal: Position tracking, risk management, order execution
Week 9-10: Position Tracking & Risk Management
Objectives:
- Track all user positions (entry price, qty, date)
- Compute real-time P&L against Gamma prices
- Calculate portfolio heat (worst-case loss)
- Estimate correlation between markets
- Liquidation risk warnings
Deliverables:
- portfolio_tracker.py – Position ledger, P&L, exposure
- correlation_analyzer.py – Outcome correlations
- risk_manager.py – VaR, stress testing, hedging
- PostgreSQL tables: positions, orders, trades
- API endpoints: GET /api/portfolio/*
Acceptance Criteria:
- Position P&L accurate within 0.1%
- Portfolio heat computed in <1s
- Correlation estimates from historical data (or crowdsourced)
PR: #8-portfolio-tracking
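A rough sketch of mark-to-market P&L and portfolio heat for long YES positions. It assumes heat means the loss if every position resolves to zero, measured against cost basis, and it ignores fees, short positions, and correlation; the `Position` fields are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Position:
    market_id: str
    qty: float          # number of YES shares held (long only in this sketch)
    entry_price: float  # price paid per share, between 0 and 1


def unrealized_pnl(positions: List[Position], marks: Dict[str, float]) -> float:
    """Mark-to-market P&L against the latest prices in `marks`."""
    return sum(p.qty * (marks[p.market_id] - p.entry_price) for p in positions)


def portfolio_heat(positions: List[Position]) -> float:
    """Worst-case loss: every long YES position resolves to 0, losing its cost basis."""
    return sum(p.qty * p.entry_price for p in positions)
```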
Week 11-12: Order Management & Execution
Objectives:
- Place limit orders via Gamma API
- Ladder orders (split into tranches)
- Estimate slippage from order book depth
- Track fills and execution quality
- Paper trading mode (log but don't execute)
Deliverables:
- order_manager.py – Order lifecycle
- execution_simulator.py – Backtest fills
- order_ledger.py – Order history + analytics
- CLI: polysuggest orders place
- CLI: polysuggest orders status
Acceptance Criteria:
- Limit orders placed successfully
- Slippage estimate within ±2% of actual
- Paper trading mode fully functional
PR: #9-order-management (paper trading)
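Slippage estimation from order book depth and order laddering might be sketched as follows. The order book is modeled as a plain list of (price, size) ask levels, which is an assumption; the actual book format returned by the API is not described in this roadmap.

```python
from typing import List, Tuple

Level = Tuple[float, float]  # (price, size), best price first


def estimate_fill(asks: List[Level], qty: float) -> Tuple[float, float]:
    """Walk the ask side and return (average fill price, slippage vs best ask)."""
    if not asks or qty <= 0:
        raise ValueError("need a non-empty book and a positive quantity")
    remaining, cost = qty, 0.0
    for price, size in asks:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("order book too thin for the requested quantity")
    avg = cost / qty
    return avg, avg / asks[0][0] - 1.0


def ladder(qty: float, start_price: float, step: float, tranches: int) -> List[Level]:
    """Split a buy order into equal tranches at descending limit prices."""
    size = qty / tranches
    return [(round(start_price - i * step, 4), size) for i in range(tranches)]
```

In paper trading mode, the same estimate can be logged alongside the hypothetical order and later compared with real fills.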
Week 13-14: Backtesting Engine
Objectives:
- Replay historical market data
- Simulate edge detection at each timestamp
- Execute orders at historical prices
- Compute realized P&L, Sharpe, max drawdown
- Benchmark vs buy-and-hold
Deliverables:
- backtester.py – Historical simulation engine
- benchmarks.py – Baseline strategies
- Dashboard: BacktestRunner.tsx + BacktestResults.tsx
- CLI: polysuggest backtest run
Acceptance Criteria:
- Backtest 12-month period in <5 minutes
- P&L results match manual calculation
- Sharpe ratio computed correctly
PR: #10-backtester
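The two backtest summary statistics named above can be computed as in this sketch. It assumes a zero risk-free rate, daily returns annualized over 365 periods (prediction markets trade every day), and sample variance; adjust these if the backtester uses different conventions.

```python
import math
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 365) -> float:
    """Annualized Sharpe ratio of per-period returns, risk-free rate assumed 0."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)  # sample variance
    if var == 0:
        raise ValueError("returns have zero variance")
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```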
Phase 3 Success Metrics
- Position P&L accurate within 0.1%
- Portfolio heat calculations <1s
- Order slippage estimates within ±2%
- Backtest engine handles 12-month periods
- Paper trading validated against live data
Phase 4: Research & ML (12+ weeks)
Goal: Multi-model ensemble, semantic search, automated research reports
Week 15-16: Multi-Model Consensus
Objectives:
- Train SVM classifier on historical patterns
- Train LSTM on price momentum & volume
- Ensemble voting (confidence = % agreement)
- Flag high-uncertainty predictions
Deliverables:
- models/svm_classifier.py – Historical pattern matching
- models/lstm_momentum.py – Time series model
- consensus.py – Ensemble voting + weighting
- Model registry + versioning
Acceptance Criteria:
- Ensemble beats GPT-4o baseline by 5%+
- Model agreement correlates with accuracy
PR: #11-ensemble-models
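Ensemble voting with confidence defined as the share of agreeing models could be sketched like this. It uses unweighted majority voting and an assumed 60% agreement floor for the high-uncertainty flag; the planned consensus.py also mentions weighting, which is left out here.

```python
from collections import Counter
from typing import Dict, Tuple


def consensus(votes: Dict[str, str], min_agreement: float = 0.6) -> Tuple[str, float, bool]:
    """Unweighted majority vote across models.

    `votes` maps a model name to its predicted label. Returns
    (winning label, agreement share, high_uncertainty flag). Ties go to the
    label that appears first in `votes`.
    """
    if not votes:
        raise ValueError("need at least one vote")
    label, count = Counter(votes.values()).most_common(1)[0]
    agreement = count / len(votes)
    return label, agreement, agreement < min_agreement
```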
Week 17-18: Semantic Search (RAG)
Objectives:
- Embed market descriptions + context
- Semantic search on market history
- RAG: retrieve similar markets as examples
- Build vector database (Chroma)
Deliverables:
- semantic_search.py – Market history RAG
- embedding_service.py – Maintain vector DB
- CLI: polysuggest research search "..."
- Vector DB schema + ingestion pipeline
Acceptance Criteria:
- Search returns semantically similar markets
- Embedding quality validated by manual review
PR: #12-semantic-search
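The retrieval step behind semantic search reduces to ranking stored embeddings by similarity to a query embedding. The sketch below uses plain cosine similarity over an in-memory dict as a stand-in for the vector database; the embedding model and the Chroma integration are not shown, and the toy vectors in the usage note are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most similar market ids with their similarity scores."""
    scored = [(market_id, cosine(query, vec)) for market_id, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

For instance, with an index of `{"election": [1, 0], "sports": [0, 1], "primary": [0.9, 0.1]}` and a query of `[1, 0]`, the two nearest markets are "election" and then "primary".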
Week 19-20: Automated Research Reports
Objectives:
- Daily digest: top edges, emerging trends, P&L
- Market calendar with close-to-resolution warnings
- Portfolio rebalance suggestions
- Delivery: email + Slack + web dashboard
Deliverables:
- report_generator.py – Daily digest builder
- email_service.py – Email delivery
- slack_integration.py – Slack posting
- Scheduled task (Celery cron)
- Dashboard: ResearchReport.tsx
Acceptance Criteria:
- Reports generated daily at fixed time
- Email delivery 99%+ success rate
- Report content manually reviewed for quality
PR: #13-research-reports
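A plain-text version of the daily digest, covering top edges and close-to-resolution warnings, might be built as below. The dict keys (`market_id`, `ev`, `closes_at`) and the 24-hour warning window are assumptions; scheduling and delivery are out of scope for this sketch.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def build_digest(
    edges: List[Dict],
    markets: List[Dict],
    now: datetime,
    top_n: int = 5,
    warn_within: timedelta = timedelta(hours=24),
) -> str:
    """Render top edges by EV plus markets that close within `warn_within`."""
    lines = [f"Daily digest for {now:%Y-%m-%d}", "", "Top edges:"]
    for edge in sorted(edges, key=lambda e: e["ev"], reverse=True)[:top_n]:
        lines.append(f"  {edge['market_id']}: EV {edge['ev']:+.1%}")
    lines += ["", "Closing soon:"]
    for market in markets:
        if now <= market["closes_at"] <= now + warn_within:
            lines.append(f"  {market['market_id']} closes {market['closes_at']:%Y-%m-%d %H:%M}")
    return "\n".join(lines)
```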
Phase 4 Success Metrics
- Ensemble model beats GPT-4o by 5%+
- Semantic search returns relevant markets
- Daily reports delivered on schedule
- Report quality validated by 5+ beta users
Parallel Work Streams
Infrastructure (Throughout All Phases)
- PostgreSQL setup + migrations
- TimescaleDB setup + hypertables
- Chroma vector DB setup
- Redis cache setup
- Docker Compose for local dev
- CI/CD pipeline (GitHub Actions)
- Monitoring + observability (Prometheus, Grafana)
- Logging centralization (ELK or Loki)
Testing & QA (Throughout All Phases)
- Unit tests (>80% coverage)
- Integration tests for API endpoints
- Load testing (50+ concurrent users)
- End-to-end tests (CLI + dashboard)
- Historical data validation (backtests)
Documentation (Throughout All Phases)
- API reference (OpenAPI/Swagger)
- CLI command reference
- Architecture guide
- Deployment guide
- Development setup
- Database schema documentation
Success Criteria by Phase
Phase 1 (MVP)
- ✅ Sub-1s price updates
- ✅ >70% edge detection accuracy
- ✅ Calibration within 5% of actual
- ✅ 10+ beta users
Phase 2 (Calibration)
- ✅ Fair probability credible interval no wider than ±0.08
- ✅ 75%+ edge accuracy
- ✅ Proper confidence scaling in calibration
Phase 3 (Execution)
- ✅ Position P&L accurate within 0.1%
- ✅ Order slippage within ±2%
- ✅ Backtest engine validated
Phase 4 (ML)
- ✅ Ensemble beats baseline by 5%+
- ✅ Semantic search quality validated
- ✅ 5+ beta users actively using reports
Monetization Timeline
- Phase 1-2: Free tier (public edges, CLI-only)
- Phase 3: Launch Pro tier ($99/mo)
- Phase 4: Enterprise tier (custom, SLA)
Risk Mitigation
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Gamma API downtime | Medium | High | Cache + fallback to CCIP oracles |
| Model overfitting | Medium | High | Walk-forward validation, Sharpe caps |
| Regulatory (US) | Low | High | Position as "decision support", not automated algo |
| Competitor execution platforms | Medium | Medium | Own research/ML, not just API wrapper |
| LLM API costs | Low | Medium | Fine-tune open-source (Llama) fallback |
| Database scaling | Low | Medium | Use managed Postgres (AWS RDS), TimescaleDB |
Resource Requirements
- 1 Backend Engineer (Python/FastAPI, DB, inference)
- 1 Frontend Engineer (React/Next.js, dashboard)
- 1 ML Engineer (Bayesian inference, model training)
- 1 DevOps/Infrastructure (databases, deployments, monitoring)
- 1 PM/QA (roadmap, testing, user feedback)
Total: 5 FTE (can be compressed with overlap)
Metrics Dashboard
Track progress with:
- Product KPIs: Active users, edge accuracy, calibration Brier score
- Technical KPIs: API latency, uptime, backtest duration
- Business KPIs: Paid users, ARR, retention
- ML KPIs: Ensemble vs baseline accuracy, edge decay time
Success Definition (Year 1)
Product Metrics
- 500+ active users (free or paid)
- $100k+ ARR from Pro users
- 65%+ calibration accuracy (Brier <0.20)
- 15%+ monthly active trader retention
Business Metrics
- 50+ paid users ($99/mo Pro tier)
- 5+ prop shop partnerships
- Featured in 3+ major crypto/trading publications
ML Metrics
- Ensemble beats GPT-4o by 5%+
- Edge decay: median 5 minutes
- Backtested P&L: +3-8% monthly (risk-adjusted)
Next: Review roadmap with stakeholders, prioritize features, begin Week 1 implementation.