IMDB4M is the first knowledge graph to combine a culturally rich domain with comprehensive coverage of text, images, video, and audio. It is built on schema.org for semantic interoperability, discoverability, and structural quality.
IMDB4M overcomes the bimodal (text-and-image) bottleneck of existing knowledge graphs by integrating text, images, video, and audio as first-class semantic objects.

| Modality | Content | Average per movie |
|---|---|---|
| Text | Plots, reviews, keywords, genres, cast bios | 48.6 |
| Images | Stills, posters with captions & entity links | 7.9 |
| Video | Trailers with thumbnails, duration, dates | 0.99 |
| Audio | Soundtracks with performers & composers | 11.2 |

A comprehensive resource comprising over 1.8 million RDF triples describing movies, artists, and their multimodal content.
| Metric | Value |
|---|---|
| RDF Triples | 1,815,922 |
| Unique Nodes | 660,039 |
| Unique Predicates | 58 |
| Seed Movies (fully annotated) | 376 |
| Total Movies (after expansion) | 50,756 |
| Artists (actors, directors, composers) | 5,484 |
| PerformanceRole Instances | 232,492 |
| ImageObjects | 36,844 |
| Wikidata Alignments (Artists) | 4,284 (78.1%) |
| Entity Types | 17 |
| Dataset | Text | Image | Video | Audio | #Entity | #Relation |
|---|---|---|---|---|---|---|
| MKG-W | 14,123 | 14,463 | – | – | 15,000 | 169 |
| MKG-Y | 12,305 | 14,244 | – | – | 15,000 | 28 |
| TIVA-KG | 11,858 | 11,636 | 10,269 | 2,441 | 11,858 | 16 |
| KVC16K | 14,822 | 14,822 | 14,822 | 14,822 | 16,015 | 4 |
| IMDB4M | 385,595 | 37,220 | 3,983 | 4,211 | 660,039 | 58 |
Built on widely adopted vocabulary standards for semantic interoperability and Web-scale discoverability.
Actor participation captures actor, movie, and character name together, preserving identity separately from fictional roles.
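
The role pattern described above can be sketched with the schema.org Role idiom, where an intermediate `schema:PerformanceRole` node sits between the movie and the person and carries `schema:characterName`. This is a minimal illustration, not the dataset's actual serialization: the `EX` namespace, the identifiers `m1`, `p1`, and `role1`, the choice of `https://schema.org/` over `http://schema.org/`, and the helper names are all assumptions.

```python
# Sketch of the schema.org Role pattern in N-Triples-like form.
# EX and every identifier below are hypothetical, chosen for illustration.
SCHEMA = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "https://example.org/imdb4m/"  # hypothetical namespace


def quote(text):
    """Wrap text as a plain literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def performance_role_triples(movie_id, actor_id, character, role_id):
    """Return triples linking movie -> role node -> actor, plus the character name."""
    movie = f"<{EX}movie/{movie_id}>"
    actor = f"<{EX}person/{actor_id}>"
    role = f"_:{role_id}"  # intermediate node that keeps person and character apart
    return [
        (movie, f"<{SCHEMA}actor>", role),
        (role, f"<{RDF_TYPE}>", f"<{SCHEMA}PerformanceRole>"),
        (role, f"<{SCHEMA}actor>", actor),
        (role, f"<{SCHEMA}characterName>", quote(character)),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = performance_role_triples("m1", "p1", "Example Character", "role1")
print(to_ntriples(triples))
```

Because the character name lives on the role node rather than on the person, the same person can appear in many movies under different character names without those names leaking into the person's own description.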
Typed blank nodes with xsd:date, xsd:dateTime, xsd:duration, xsd:integer, and xsd:decimal for machine-interpretable values.
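
To illustrate what machine-interpretable values look like in practice, the following sketch serializes Python values as typed literals using the XSD datatypes listed above. The helper names `seconds_to_xsd_duration` and `typed_literal` are hypothetical, and the mapping from Python types to datatypes is an assumption made for this example rather than the pipeline used to build IMDB4M.

```python
from datetime import date, datetime
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"


def seconds_to_xsd_duration(total_seconds: int) -> str:
    """Format a non-negative number of seconds in the xsd:duration lexical form."""
    if total_seconds < 0:
        raise ValueError("duration must be non-negative")
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    time_part = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    return "PT" + (time_part or "0S")


def typed_literal(value) -> str:
    """Render a Python value as an N-Triples typed literal (illustrative mapping)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, Decimal):
        # Fixed-point formatting avoids exponent notation, which xsd:decimal forbids.
        return f'"{value:f}"^^<{XSD}decimal>'
    if isinstance(value, datetime):  # checked before date: datetime subclasses date
        return f'"{value.isoformat()}"^^<{XSD}dateTime>'
    if isinstance(value, date):
        return f'"{value.isoformat()}"^^<{XSD}date>'
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(typed_literal(Decimal("8.7")))
print(typed_literal(date(2024, 1, 5)))
print(f'"{seconds_to_xsd_duration(150)}"^^<{XSD}duration>')
```

With explicit datatypes, a query engine can compare dates, sum durations, or filter by rating numerically instead of treating the values as opaque strings.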
Soundtracks use schema:MusicRecording for the performed audio artifact and schema:MusicComposition for the underlying musical work.
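
The recording/composition split can be sketched as two linked records. The example below uses the schema.org properties `schema:recordingOf`, `schema:byArtist`, and `schema:composer`; whether IMDB4M uses exactly these properties is an assumption, and the class names and example IRIs are hypothetical.

```python
from dataclasses import dataclass

SCHEMA = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Composition:
    """The underlying musical work (schema:MusicComposition)."""
    iri: str
    title: str
    composer_iri: str


@dataclass(frozen=True)
class Recording:
    """A performed audio artifact (schema:MusicRecording) of a composition."""
    iri: str
    performer_iri: str
    work: Composition


def soundtrack_triples(rec: Recording):
    """Return triples describing the recording, the work, and the link between them."""
    work = rec.work
    return [
        (rec.iri, RDF_TYPE, SCHEMA + "MusicRecording"),
        (rec.iri, SCHEMA + "byArtist", rec.performer_iri),
        (rec.iri, SCHEMA + "recordingOf", work.iri),
        (work.iri, RDF_TYPE, SCHEMA + "MusicComposition"),
        (work.iri, SCHEMA + "name", work.title),
        (work.iri, SCHEMA + "composer", work.composer_iri),
    ]


# Hypothetical IRIs, for illustration only.
work = Composition("ex:work/1", "Example Theme", "ex:person/composer1")
recording = Recording("ex:recording/1", "ex:person/performer1", work)
for triple in soundtrack_triples(recording):
    print(triple)
```

Keeping the two entities separate lets several recordings by different performers point at one composition, so performer and composer credits are never conflated.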
Systematic validation through SPARQL-based question answering and human-in-the-loop link verification.
Evaluated against 18 competency questions formalized as SPARQL queries, covering directors, writers, actors, ratings, plots, trailers, soundtracks, images, and more.
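
As an illustration of how a competency question might be formalized, the sketch below builds a SPARQL query for "Who directed a given movie?". It is not one of the 18 published queries: the property path (`schema:name`, `schema:director`), the exact-match lookup on an untagged title literal, and the helper names are assumptions.

```python
from string import Template

# Hypothetical competency-question template; the graph shape is assumed.
DIRECTOR_QUERY = Template("""\
PREFIX schema: <https://schema.org/>
SELECT DISTINCT ?directorName WHERE {
  ?movie a schema:Movie ;
         schema:name $title ;
         schema:director ?director .
  ?director schema:name ?directorName .
}
""")


def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def director_query(title: str) -> str:
    """Fill the template with a safely quoted movie title."""
    return DIRECTOR_QUERY.substitute(title=sparql_string(title))


print(director_query("Example Movie"))
```

The resulting string can be sent to any SPARQL 1.1 endpoint or run against a local copy of the graph. If titles carry language tags in the data, the exact-match pattern would need a `FILTER` on `STR(?name)` instead.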
IMDB4M enables research across multiple domains in the Semantic Web and Multimedia communities.
Content-based recommendation using visual style of posters and acoustic features of soundtracks.
Knowledge Graph Question Answering (KGQA) with perceptual grounding and RAG systems.
Multimodal Knowledge Graph Completion including link prediction and entity alignment.
If you use IMDB4M in your research, please cite our paper.
@inproceedings{imdb4m2026,
  title  = {{IMDB4M}: A Large-Scale Multi-Modal Knowledge Graph of Movies},
  author = {Reklos, Ioannis and de Berardinis, Jacopo and Simperl, Elena and Mero{\~n}o-Pe{\~n}uela, Albert},
  year   = {2026},
  note   = {Under review}
}