
Bloomsbury Network Mapper

Fundraising intelligence from public charity filings.

Next.js, Supabase, Claude AI, pgvector, Python, Volunteer

The problem

Charity fundraising runs on introductions. You know someone who knows someone who cares about youth sport in London. But mapping those connections by hand is slow, and the data is scattered across public filings that nobody has time to read. The Charity Commission publishes annual reports, trustee lists, and financial statements for every registered charity in England and Wales. Somewhere in those documents are the people most likely to support Bloomsbury Football. The question is which ones, and why.

What I built

An intelligence tool. Not outreach automation, not a CRM: pure discovery and qualification. The system ingests public Charity Commission filings (about 20,000 documents covering nearly 5,000 registered charities, from 2019 to 2025), extracts entities (people, organisations, trusts, companies), resolves identities across documents, builds a relationship graph, and then runs path-template scoring to find candidates who sit close to Bloomsbury's existing network. The output is a ranked list of candidates with explainable evidence chains. Every claim traces back to a specific source document, extraction run, and confidence score.
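To make the traceability idea concrete, here is a minimal sketch of a claim that carries its own provenance. The class names, field names, identifiers, and the example person are all placeholders chosen for illustration; the real schema may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Where a claim came from (field names are assumptions, not the project schema)."""
    document_id: str     # the source filing
    extraction_run: str  # which extraction pass produced the claim
    confidence: float    # extraction confidence, 0.0 to 1.0


@dataclass(frozen=True)
class Claim:
    """One extracted fact, stored together with its provenance."""
    subject: str
    relation: str
    obj: str
    provenance: Provenance


def cite(claim: Claim) -> str:
    """Render a claim with the evidence trail a reviewer would need."""
    p = claim.provenance
    return (
        f"{claim.subject} {claim.relation} {claim.obj} "
        f"[doc={p.document_id}, run={p.extraction_run}, conf={p.confidence:.2f}]"
    )


# Placeholder data, purely illustrative.
claim = Claim(
    "Jane Smith", "trustee_of", "Wellspring Trust",
    Provenance("doc-0001", "run-01", 0.93),
)
citation = cite(claim)
print(citation)
```

Keeping provenance on the claim itself, rather than in a side table looked up later, is one way to make sure no fact can reach the ranked list without its source attached.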

The pipeline

Raw documents enter an audit stage where they are catalogued and validated. An extraction layer powered by Claude pulls structured data from unstructured text: names, roles, affiliations, financial figures. A validation step quarantines low-confidence extractions. Entity resolution clusters mentions that refer to the same real-world person or organisation. Clustering is reversible, so human corrections propagate cleanly. The graph construction phase builds relationship edges between resolved entities. Then the discovery engine runs path templates: 'this person is a trustee of a charity that co-funds programmes with an organisation connected to Bloomsbury.' Each path gets scored on two axes: how valuable the candidate looks (giving history, network position, trust affiliations) and how trustworthy the evidence is (source tier, extraction confidence, human validation).
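A rough sketch of how a path template might be matched against the graph, and how the evidence axis could be computed. The relation labels, the names, the confidence values, and the weakest-link rule are all assumptions made for illustration, not the project's actual scoring. The value axis is left out because it depends on data (giving history, network position) that is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    confidence: float  # extraction confidence, 0.0 to 1.0


# A path template here is just an ordered tuple of relation labels (hypothetical labels).
TRUSTEE_COFUNDER = ("trustee_of", "co_funds_with", "partner_of")


def match_template(edges, start, template):
    """Return every path from `start` whose edge relations follow `template` in order."""
    frontier = [(start, [])]
    for relation in template:
        next_frontier = []
        for node, path in frontier:
            for edge in edges:
                if edge.source == node and edge.relation == relation:
                    next_frontier.append((edge.target, path + [edge]))
        frontier = next_frontier
    return [path for _, path in frontier]


def evidence_score(path):
    """Trust axis (assumed rule): a path is only as reliable as its weakest edge."""
    return min(edge.confidence for edge in path)


# Illustrative graph with made-up confidences.
edges = [
    Edge("Jane Smith", "trustee_of", "Wellspring Trust", 0.95),
    Edge("Wellspring Trust", "co_funds_with", "London Youth", 0.80),
    Edge("London Youth", "partner_of", "Bloomsbury Football", 0.90),
    Edge("Jane Smith", "donor_to", "Some Other Charity", 0.99),  # does not fit the template
]

paths = match_template(edges, "Jane Smith", TRUSTEE_COFUNDER)
for path in paths:
    print([e.target for e in path], evidence_score(path))
```

Keeping the two axes as separate numbers, rather than multiplying them into one, lets a reviewer see a promising candidate backed by weak evidence as exactly that.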

Why path templates

I could have used PageRank or a generic graph centrality measure and called it a day. The problem is that nobody at Bloomsbury would trust a score they cannot explain. Path templates produce results like: 'Jane Smith is a trustee of the Wellspring Trust, which co-funds three programmes with London Youth, which Bloomsbury has partnered with since 2021.' That is a sentence a fundraiser can act on. An opaque number is not. The system recommends; humans decide. Every candidate goes through a review queue with structured decision codes before anything happens.
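As an illustration of why a matched path is easy to explain, here is a small sketch that turns one into a sentence. The relation labels and phrasings are hypothetical, and the path is written as plain tuples to keep the example self-contained.

```python
# Hypothetical phrasing for each relation label; none of these come from the project.
PHRASES = {
    "trustee_of": "{a} is a trustee of {b}",
    "co_funds_with": "which co-funds programmes with {b}",
    "partner_of": "which has partnered with {b}",
}


def explain(path):
    """Render a matched path, given as (source, relation, target) tuples, as one sentence."""
    parts = [PHRASES[rel].format(a=src, b=dst) for src, rel, dst in path]
    return ", ".join(parts) + "."


path = [
    ("Jane Smith", "trustee_of", "the Wellspring Trust"),
    ("the Wellspring Trust", "co_funds_with", "London Youth"),
    ("London Youth", "partner_of", "Bloomsbury"),
]
sentence = explain(path)
print(sentence)
```

A centrality score has no equivalent rendering: there is no single chain of edges to read back, which is the practical reason templates were preferred.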

Current state

Full application scaffold deployed. Next.js frontend, Supabase backend with pgvector for embeddings, edge functions for pipeline phases. The 72-page product requirements document and 13-epic backlog are complete. Corpus of 20,444 filing documents loaded. Active build underway on the extraction and entity resolution layers.