Dataset profile

Counts, distributions, and top-K rankings for the snapshot currently mounted in this browser.

Snapshot

Identifier
gpt51_270226
Computed 2026-05-07T23:18:03.865413+00:00 · DB file 4.7 GB

Tables

Triples 4,480,648
Entities (instance nodes) 438,870
Predicates 56,818
Concepts 15,984
Instance triples (specialised instanceOf) 280,052
Pipeline batches 20,889

Object type distribution

What each triple's object resolves to: NE = disambiguated entity, CONCEPT = type or class, LITERAL = string, UNRECOGNIZED = parsing failure.

CONCEPT 280,052
LITERAL 2,262,163
NE 1,938,432
UNRECOGNIZED 1
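
As a quick consistency check, the four object-type counts above can be compared against the "Triples" total reported under Tables. The following is a minimal Python sketch using the figures from this page; the variable names are illustrative and not part of the browser, and it assumes each triple is counted under exactly one object type.

```python
# Object-type counts copied from this page.
object_types = {
    "CONCEPT": 280_052,
    "LITERAL": 2_262_163,
    "NE": 1_938_432,
    "UNRECOGNIZED": 1,
}
total_triples = 4_480_648  # "Triples" row under Tables

# Assumption: one object type per triple, so the counts should sum to the total.
assert sum(object_types.values()) == total_triples

# Share of all triples per object type, in percent, largest first.
shares = {name: 100 * count / total_triples for name, count in object_types.items()}
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<13}{pct:6.2f}%")
```

The assertion passes for this snapshot, so the distribution accounts for every triple.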

Pipeline status

Entity status

EXPLORED 100,179
UNEXPLORED 338,691

Triple object disambiguation

FINISHED 4,372,772
GENERATED 6,194
NERFINISHED 101,681
ONNER 1
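
The two status breakdowns above can be reduced to coverage percentages. This is a rough Python sketch with the counts copied from this page; the helper name `share` is hypothetical, and the code makes no assumption about what each status means beyond its label.

```python
# Status counts copied from this page.
entity_status = {"EXPLORED": 100_179, "UNEXPLORED": 338_691}
disambiguation_status = {
    "FINISHED": 4_372_772,
    "GENERATED": 6_194,
    "NERFINISHED": 101_681,
    "ONNER": 1,
}


def share(counts, key):
    """Return counts[key] as a percentage of the sum of all counts."""
    return 100 * counts[key] / sum(counts.values())


explored_pct = share(entity_status, "EXPLORED")
finished_pct = share(disambiguation_status, "FINISHED")
print(f"explored entities: {explored_pct:.1f}%")
print(f"finished triples:  {finished_pct:.1f}%")
```

The entity statuses sum to the entity count (438,870) and the disambiguation statuses sum to the triple count (4,480,648), so both percentages are taken over the full tables.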

Top predicates by triple count

Predicate             ID    Triples
instanceOf            P0    285,355
locatedIn             P40   204,760
notableFor            P22   118,644
hasPart               P35   112,473
fieldOfWork           P3    112,214
usedFor               P98    75,593
countryOfOrigin       P26    71,654
relatedTo             P37    70,088
notableWork           P4     60,591
focusesOn             P31    51,538
category              P87    47,397
hasFeature            P182   47,040
influenced            P9     44,743
languageOfWorkOrName  P15    44,000
hasNotableFacility    P105   43,660
memberOf              P10    36,327
areaServed            P82    34,999
genre                 P14    34,922
purpose               P79    34,259
sector                P71    30,502
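
A top-K ranking like the one above is a group-and-count query. The sketch below shows the idea in Python with the standard `sqlite3` module; the `triples` table, its column names, and the toy rows are assumptions made for illustration, and the snapshot's actual storage engine and schema may differ.

```python
import sqlite3

# Hypothetical schema with toy rows; subjects and objects are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("s1", "P0", "o1"),
        ("s2", "P0", "o1"),
        ("s3", "P0", "o2"),
        ("s1", "P40", "o3"),
        ("s2", "P40", "o3"),
        ("s1", "P22", "o4"),
    ],
)


def top_predicates(conn, k):
    """Return the k most frequent predicates as (predicate, count) rows."""
    return conn.execute(
        "SELECT predicate, COUNT(*) AS n FROM triples "
        "GROUP BY predicate ORDER BY n DESC, predicate LIMIT ?",
        (k,),
    ).fetchall()


print(top_predicates(conn, 2))  # [('P0', 3), ('P40', 2)]
```

Ordering by the predicate as a secondary key keeps the ranking stable when two predicates have the same count.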

Top concepts by instance count

Concept                 ID    Instances
human                   C0       21,377
tourist attraction      C85       7,380
municipality            C39       6,807
film                    C684      4,370
city                    C38       3,553
politician              C67       3,236
building                C240      3,177
company                 C423      2,521
non-fiction work        C46       2,483
award                   C97       2,085
cultural institution    C84       2,079
neighborhood            C27       2,009
transport hub           C82       1,931
academic division       C57       1,730
nonprofit organization  C6        1,654
song                    C396      1,553
surname                 C132      1,521
military officer        C449      1,501
sports venue            C344      1,399
military organization   C298      1,349

Top entities by total degree

Number of triples that reference each entity: in-degree (entity as object) plus out-degree (entity as subject). High-degree entities are disambiguation magnets: every surface-form variant resolves to the same node.

Entity                     ID          In   Out   Total
United States of America   E14     52,278    83  52,361
United Kingdom             E732    12,368   144  12,512
France                     E861     5,977   102   6,079
English                    E211     5,933    62   5,995
New York City              E40      5,539    74   5,613
England                    E1791    5,117    58   5,175
Europe                     E833     4,434    97   4,531
Germany                    E1728    4,160    95   4,255
Canada                     E14901   4,193    54   4,247
World War II               E32      4,063    92   4,155
London, England            E1817    3,900    94   3,994
California, United States  E26      3,775   101   3,876
Eastern Time Zone          E288     3,783    80   3,863
North America              E335     3,776    83   3,859
Roman Catholicism          E384     3,697    67   3,764
Washington, D.C.           E23      3,325    78   3,403
Japan                      E174     3,343    56   3,399
Australia                  E876     3,039    70   3,109
Christianity               E348     2,815    90   2,905
New York                   E550     2,686    53   2,739
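
The In, Out, and Total columns follow directly from the degree definition given for this table. The following is a minimal Python sketch of that computation over toy triples; the entity IDs `A`, `B`, `C` and the triples themselves are placeholders, not snapshot data, and treating literal objects as non-entities is an assumption.

```python
from collections import Counter

# Toy (subject, predicate, object) triples; IDs are placeholders.
triples = [
    ("A", "P40", "B"),
    ("C", "P40", "B"),
    ("B", "P35", "A"),
    ("C", "P3", "some literal"),
]
entities = {"A", "B", "C"}  # assumption: literal objects are not entity nodes

# Out-degree counts appearances as subject, in-degree appearances as object.
out_degree = Counter(s for s, _, _ in triples if s in entities)
in_degree = Counter(o for _, _, o in triples if o in entities)


def total_degree(entity):
    """In-degree plus out-degree for one entity."""
    return in_degree[entity] + out_degree[entity]


# Rank by total degree, highest first; break ties by ID for a stable order.
ranking = sorted(entities, key=lambda e: (-total_degree(e), e))
for e in ranking:
    print(e, in_degree[e], out_degree[e], total_degree(e))
```

In this toy graph `B` ranks first with a total degree of 3 (in 2, out 1), mirroring how the table's top entities are dominated by in-degree.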