Metadata-Version: 2.4
Name: p53-collector
Version: 0.1.0
Summary: Structured retrieval, summarization, and classification of data through RSS feeds and web pages
Author: Point 53, LLC
License-Expression: MPL-2.0
License-File: LICENSE
License-File: NOTICE
License-File: THIRD_PARTY_LICENSES.md
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Text Processing :: Markup :: Markdown
Requires-Python: >=3.10
Requires-Dist: click>=8.3.1
Requires-Dist: dateparser>=1.2.2
Requires-Dist: feedparser>=6.0.12
Requires-Dist: httpx>=0.27
Requires-Dist: mcp>=1.0.0
Requires-Dist: ollama>=0.6.1
Requires-Dist: platformdirs>=4.0
Requires-Dist: protobuf>=6.33.2
Requires-Dist: pydantic>=2.11.0
Requires-Dist: selenium>=4.39.0
Requires-Dist: tomli>=2.0; python_version < '3.11'
Requires-Dist: webdriver-manager>=4.0.2
Provides-Extra: all
Requires-Dist: anthropic>=0.75.0; extra == 'all'
Requires-Dist: pdfplumber>=0.11; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.75.0; extra == 'anthropic'
Provides-Extra: pdf
Requires-Dist: pdfplumber>=0.11; extra == 'pdf'
Description-Content-Type: text/markdown

# Point 53 Collector

**Your sources. Your summaries. Your machine.**

Point 53 Collector is a CLI tool that pulls articles from RSS feeds and web pages, summarizes them using local AI, stores everything in a local SQLite database, and outputs curated Markdown digests on demand. No cloud accounts required. No data leaves your network unless you choose to let it.

It is part of the Point 53 Suite — open source software built on the belief that the tools of intelligence work belong in the hands of individuals, not institutions.

> **License:** [Mozilla Public License 2.0](https://mozilla.org/MPL/2.0/) — see `LICENSE`, `NOTICE`, and `THIRD_PARTY_LICENSES.md` for full details.

## Why Collector

The information landscape is enormous and accelerating. Staying informed across dozens of sources takes real time — time most people don't have. Collector narrows the firehose into a structured, searchable, AI-summarized feed that you control end to end.

- **Local-first AI**: Summaries are generated on your hardware via [Ollama](https://ollama.com). Your reading habits, your interests, and the content itself never touch a third-party server by default.
- **Vendor flexibility**: Swap between Ollama models freely, or switch to Anthropic's API when it makes sense. The choice is always yours.
- **Structured output**: Every article is categorized, timestamped, and queryable. Filter by date, category, or keyword. Output to Markdown or pipe to stdout.
- **Chat your news**: After generating a digest, open an interactive chat session to ask questions about the articles using the same local models.
- **PDF-aware (opt-in)**: With the `[pdf]` extra installed and `pdf.enabled = true` in config, Collector downloads, scans, and summarizes linked PDFs from feeds like arXiv. Off by default.

## Install

Collector uses [uv](https://docs.astral.sh/uv/) for dependency management and distribution.

### Linux / macOS

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install Point 53 Collector as a global CLI tool
uv tool install p53-collector --index https://dist.point53.ai/simple/

# Verify it's on your PATH
collector --help
```

### Windows

```powershell
# Install uv
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Install Point 53 Collector as a global CLI tool
uv tool install p53-collector --index https://dist.point53.ai/simple/

# Verify it's on your PATH
collector --help
```

### Prerequisites

- **Python 3.10+** (uv will manage this for you)
- **Ollama** running locally or on your network. Pull the default models:
  ```bash
  ollama pull gemma4:e4b         # article summarization
  ollama pull qwen3.5:9b         # distill --chat
  ollama pull qwen2.5-coder:14b  # SEARCH-page links → RSS (see config note)
  ```
- **Firefox** installed — the default scraping browser. Selenium drives a separate instance, so defaulting to Firefox keeps scraping out of your everyday Chrome profile. Chrome and `undetected-chrome` are configurable via `[webdriver].browser`.

## Quick Start

1. Run `collector config doctor` to create the default config files, then set your Ollama endpoint and models in `~/.config/point53/collector/briefings.toml` and your sources in `~/.config/point53/collector/feeds.toml`. Run `collector config validate` to check them.

2. Fetch and summarize new articles:
   
   ```bash
   collector update
   ```

3. Generate a Markdown digest of unread articles:
   
   ```bash
   collector distill
   ```

4. Search and filter:
   
   ```bash
   collector distill --category "AI" --string-search "open weights"
   ```

5. Chat about what you've read:
   
   ```bash
   collector distill --chat "What were the most significant open-model releases this week?"
   ```

## Configuration

Collector reads two TOML files under `~/.config/point53/collector/`. Run `collector config doctor` to create them with sensible defaults, and `collector config validate` to check them.

### `feeds.toml` — your sources

```toml
blocklist = []                    # URLs to skip during updates

[[feeds]]
name     = "Example Feed"
category = "Technology"           # free-form label; filter on it with `distill --category`
type     = "RSS"                  # "RSS" (parsed directly) or "SEARCH" (scraped + LLM-curated)
link     = "https://example.com/rss"
note     = ""                     # optional scraping hint
scroll   = 0                      # SEARCH pages: extra scrolls before harvesting links
```
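
To sanity-check a `feeds.toml` outside of Collector, the schema above can be read with a few lines of Python. This is a minimal sketch, not Collector's own loader: `feeds_by_type`, `SAMPLE_FEEDS_TOML`, and the second sample feed are hypothetical, and treating `type` as case-sensitive is an assumption.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # Python 3.10
    import tomli as tomllib  # same API as tomllib

# Sample data following the documented schema; the second feed is made up.
SAMPLE_FEEDS_TOML = """
blocklist = []

[[feeds]]
name     = "Example Feed"
category = "Technology"
type     = "RSS"
link     = "https://example.com/rss"
note     = ""
scroll   = 0

[[feeds]]
name     = "Example Search Page"
category = "AI"
type     = "SEARCH"
link     = "https://example.com/search"
note     = ""
scroll   = 2
"""

# The two feed types named in the README (assumed to be case-sensitive).
VALID_TYPES = ("RSS", "SEARCH")


def feeds_by_type(toml_text: str) -> dict:
    """Group feed names by their `type`, rejecting unknown types."""
    data = tomllib.loads(toml_text)
    grouped = {feed_type: [] for feed_type in VALID_TYPES}
    for feed in data.get("feeds", []):
        feed_type = feed.get("type")
        if feed_type not in grouped:
            raise ValueError(f"{feed.get('name')!r}: unknown feed type {feed_type!r}")
        grouped[feed_type].append(feed["name"])
    return grouped


if __name__ == "__main__":
    for feed_type, names in feeds_by_type(SAMPLE_FEEDS_TOML).items():
        print(feed_type, names)
```

For real configuration, `collector config validate` remains the authoritative check; a script like this is only useful for quick inspection or for generating feed lists elsewhere.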

### `briefings.toml` — models & operational settings

| Section | Purpose |
| --- | --- |
| `[webdriver]` | Browser (default `firefox`; also `chrome`, `undetected-chrome`), `page_load_timeout`, `ignore_certificate_errors`, optional `profile_dir` |
| `[pdf]` | `enabled` + `max_size_mb` for downloading and summarizing linked PDFs |
| `[models.article_summary]` | LLM role that summarizes each scraped page |
| `[models.search_to_rss]` | LLM role that turns SEARCH-page links into structured RSS |
| `[models.article_chat]` | LLM role backing `distill --chat` |
| `[rate_limits]` | `summary_sleep` / `search_sleep` throttles between requests |
| `[timeouts]` | `llm` request timeout (seconds) |
| `[warnings]` | `cloud_model` (`"warn"`/`"silent"`) and `chat_turn_limit` (re-warn every N turns of `distill --chat`; `0` disables) |
| `[profile]` | browser session `default`: `"sandboxed"` (ephemeral) or `"credentialed"` (persistent) |

Each `[models.<role>]` table takes a `provider` (`ollama` / `anthropic` / `openai-compatible`), `model`, `base_url`, `context_window`, and an optional `api_key`. Providers can be mixed freely across roles:

```toml
[models.article_summary]
provider       = "ollama"
model          = "gemma4:e4b"
base_url       = "http://localhost:11434"
context_window = 122880
```
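
The per-role layout can be illustrated by loading two roles with different providers into a small typed structure. This is a sketch under assumptions rather than Collector's implementation: `ModelRole` and `load_roles` are hypothetical names, the Anthropic model name, URL, and context window are placeholders, and treating every field except `api_key` as required is inferred from the description above.

```python
from dataclasses import dataclass
from typing import Optional

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # Python 3.10
    import tomli as tomllib  # same API as tomllib

# Providers listed in the README.
KNOWN_PROVIDERS = {"ollama", "anthropic", "openai-compatible"}

# Two roles, two providers. The article_chat values are placeholders.
SAMPLE_BRIEFINGS_TOML = """
[models.article_summary]
provider       = "ollama"
model          = "gemma4:e4b"
base_url       = "http://localhost:11434"
context_window = 122880

[models.article_chat]
provider       = "anthropic"
model          = "your-anthropic-model"
base_url       = "https://api.anthropic.com"
context_window = 200000
"""


@dataclass(frozen=True)
class ModelRole:
    """One `[models.<role>]` table; only `api_key` is optional here."""
    provider: str
    model: str
    base_url: str
    context_window: int
    api_key: Optional[str] = None


def load_roles(toml_text: str) -> dict:
    """Return {role_name: ModelRole} for every table under [models]."""
    data = tomllib.loads(toml_text)
    roles = {}
    for name, table in data.get("models", {}).items():
        role = ModelRole(**table)  # TypeError on missing or unexpected keys
        if role.provider not in KNOWN_PROVIDERS:
            raise ValueError(f"{name}: unknown provider {role.provider!r}")
        roles[name] = role
    return roles


if __name__ == "__main__":
    for name, role in load_roles(SAMPLE_BRIEFINGS_TOML).items():
        print(f"{name}: {role.provider} / {role.model}")
```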

**API keys resolve env-first, file-last.** Collector checks `P53_<PROVIDER>_API_KEY` first, then the vendor environment variable (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`), and finally the literal `api_key` in `briefings.toml`. Local Ollama and LM Studio need no key. Routing any role to a non-local `base_url` prints a one-line cloud-usage warning on stderr; suppress it with `warnings.cloud_model = "silent"` or `--i-understand-the-risks`.
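
The precedence order for API keys can be sketched as a short lookup. This is an illustration only: `resolve_api_key` and `VENDOR_ENV` are hypothetical names, and two details are assumptions not confirmed by this README: that `<PROVIDER>` is the provider name upper-cased with hyphens turned into underscores, and that `openai-compatible` maps to `OPENAI_API_KEY`.

```python
import os
from typing import Mapping, Optional

# Vendor environment variables named in the README.
# The mapping for "openai-compatible" is an assumption.
VENDOR_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai-compatible": "OPENAI_API_KEY",
}


def resolve_api_key(
    provider: str,
    file_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty key: P53 env, then vendor env, then file."""
    env = os.environ if env is None else env

    # Assumed naming rule, e.g. "openai-compatible" -> P53_OPENAI_COMPATIBLE_API_KEY.
    p53_var = f"P53_{provider.upper().replace('-', '_')}_API_KEY"
    vendor_var = VENDOR_ENV.get(provider)

    candidates = [
        env.get(p53_var),
        env.get(vendor_var) if vendor_var else None,
        file_key,  # literal api_key from briefings.toml, checked last
    ]
    for candidate in candidates:
        if candidate:
            return candidate
    return None  # e.g. a local Ollama role that needs no key


if __name__ == "__main__":
    fake_env = {"ANTHROPIC_API_KEY": "from-vendor-env"}
    print(resolve_api_key("anthropic", "from-file", fake_env))
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.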

## The Suite

Collector is one of six tools in the Point 53 Suite:

| Tool                   | Purpose                                                              |
| ---------------------- | -------------------------------------------------------------------- |
| **Point 53 Collector** | RSS/web aggregation, AI-summarized Markdown briefings                |
| **Point 53 Courier**   | Headless/non-headless search agent with configurable depth and scope |
| **Point 53 Intercept** | Desktop + mic audio capture, transcription, and structured notes     |
| **Point 53 Handler**   | Orchestrator + REPL/TUI + optional web UI; aggregates MCP servers    |
| **Point 53 Nightdesk** | Iterative overnight autoresearch engine with pluggable backends      |
| **Point 53 Monitor**   | iOS/Android call screener (coming soon)                              |

All projects are written in Python and released under the Mozilla Public License 2.0.

## License and Attribution

Point 53 Collector is licensed under the [Mozilla Public License 2.0](https://mozilla.org/MPL/2.0/).

- `LICENSE` — full MPL-2.0 text
- `NOTICE` — copyright and trademark notice, third-party summary
- `THIRD_PARTY_LICENSES.md` — per-dependency attribution and obligations

"Point 53" and "Point 53 Collector" are trademarks of Point 53, LLC. The bare
word "collector" is a functional identifier, not a trademark claim.
