IMDB4M is the first knowledge graph to combine a culturally rich domain with comprehensive coverage of text, images, video, and audio. It is built on schema.org for semantic interoperability, and pre-computed image, video, audio, text, and KG (RotatE) embeddings are released on Zenodo.
IMDB4M overcomes the bimodal bottleneck of existing knowledge graphs by integrating
text, images, video, and audio as first-class semantic objects. All numbers below are
derived directly from data/kg/imdb_kg_cleaned.ttl on the 376 seed movies.

| Modality | Content | Triples per seed movie |
|---|---|---|
| Text | Plots, reviews, keywords, captions, genres | 18.58 |
| Image | Stills and posters with captions and entity links | 6.91 |
| Video | Trailers with thumbnails, duration, dates | 0.99 |
| Audio | Soundtracks with performers, composers, lyricists | 12.02 |

355 / 376 seed movies (94.41%) carry all four modalities simultaneously in the cleaned KG.
A comprehensive resource comprising over 1.8 million RDF triples describing movies, artists, and their multimodal content.
| Metric | Value |
|---|---|
| RDF Triples | 1,800,490 |
| Unique RDF Nodes (URIs + literals + bnodes) | 656,121 |
| URIRef Entities (released as KG embeddings) | 139,465 |
| Distinct Predicates | 58 |
| Seed Movies (fully annotated) | 376 |
| Total Movies (schema:Movie) | 50,756 |
| Artists Analyzed (actors, directors, composers) | 5,484 |
| schema:PerformanceRole Instances | 232,492 |
| schema:ImageObject Instances | 34,039 |
| schema:VideoObject Instances | 3,981 |
| schema:Person Instances | 16,994 |
| schema:MusicRecording Instances | 4,521 |
| schema:MusicComposition Instances | 3,970 |
| schema:AggregateRating Instances | 734 |
| schema:Review Instances | 563 |
| Wikidata Alignments (owl:sameAs) | 4,284 actors + 376 movies (4,660 triples) |
| Dataset | Text | Image | Video | Audio | #Entity | #Relation |
|---|---|---|---|---|---|---|
| MKG-W | 14,123 | 14,463 | – | – | 15,000 | 169 |
| MKG-Y | 12,305 | 14,244 | – | – | 15,000 | 28 |
| TIVA-KG | 11,858 | 11,636 | 10,269 | 2,441 | 11,858 | 16 |
| KVC16K | 14,822 | 14,822 | 14,822 | 14,822 | 16,015 | 4 |
| IMDB4M | 390,747 | 34,039 | 3,981 | 4,521 | 656,121 | 58 |
Built on widely adopted vocabulary standards for semantic interoperability and Web-scale discoverability.
Actor participation captures actor, movie, and character name together, preserving identity separately from fictional roles.
Typed blank nodes with xsd:date, xsd:dateTime, xsd:duration, xsd:integer, and xsd:decimal for machine-interpretable values.
Soundtracks use schema:MusicRecording for performed audio artifacts and schema:MusicComposition for the underlying musical work.
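
To make the actor-participation pattern concrete, here is a minimal sketch in plain Python. It assumes the standard schema.org Role pattern, in which a movie points to a schema:PerformanceRole node that in turn points to the person and carries schema:characterName. The `ex:` identifiers, the character name, and the exact property layout are illustrative assumptions, not taken from the released graph.

```python
# Toy triples following the schema.org Role pattern (an assumption about the
# modelling; all ex: identifiers and the blank-node label are hypothetical).
triples = [
    ("ex:movie/1", "rdf:type", "schema:Movie"),
    ("ex:movie/1", "schema:actor", "_:role1"),
    ("_:role1", "rdf:type", "schema:PerformanceRole"),
    ("_:role1", "schema:actor", "ex:person/42"),
    ("_:role1", "schema:characterName", "Example Character"),
    ("ex:person/42", "rdf:type", "schema:Person"),
]


def objects(subject, predicate):
    """Return every object of the triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def cast_of(movie):
    """Return (person, character name) pairs reached through role nodes."""
    cast = []
    for role in objects(movie, "schema:actor"):
        # Skip direct actor links that are not PerformanceRole nodes.
        if "schema:PerformanceRole" not in objects(role, "rdf:type"):
            continue
        for person in objects(role, "schema:actor"):
            for character in objects(role, "schema:characterName"):
                cast.append((person, character))
    return cast


cast = cast_of("ex:movie/1")
print(cast)
```

Because the character name lives on the role node and not on the person, the same person can appear in many movies under different character names without any change to the person's own description.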
IMDB4M ships pre-computed embeddings for every released modality plus
knowledge-graph embeddings trained with PyKEEN’s RotatE. All vectors are L2-normalised
and aligned one-for-one to the KG via imdb4m:hasEmbedding records.
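
One practical consequence of L2 normalisation is that the cosine similarity of two vectors reduces to their dot product. The sketch below shows this with two toy vectors standing in for embeddings of the same modality. The values and function names are illustrative assumptions, and loading the released files is not shown.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


# Toy vectors (hypothetical values, not taken from the released bundle).
a = l2_normalise([3.0, 4.0])
b = l2_normalise([4.0, 3.0])

# For unit-length vectors the dot product is the cosine similarity.
similarity = dot(a, b)
print(round(similarity, 4))
```

Vectors from different modalities may differ in dimension or live in different embedding spaces, so a comparison like this is only meaningful within one modality unless the bundle's documentation says otherwise.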
Systematic validation through SPARQL-based question answering, KG-wide competency-question coverage, and human-validated link verification.
IMDB4M is evaluated against 18 competency questions formalised as SPARQL queries, covering directors, writers, actors, ratings, plots, trailers, soundtracks, images, and more. The queries are re-run over all 376 seed movies, and 6,720 of the 6,768 query instances (18 questions × 376 movies) return at least one answer. YouTube-link accuracy is the human-validated agreement rate (129 of 148 sampled links).
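
The counts above can be turned into rates with a few lines of arithmetic. The sketch below uses only the figures reported in this section. The percentages it prints are derived here and are not quoted from the paper, and the variable names are illustrative.

```python
# Counts reported in this section.
N_QUESTIONS = 18
N_SEED_MOVIES = 376
answered_instances = 6720
links_correct, links_sampled = 129, 148

# Each competency question is run once per seed movie.
total_instances = N_QUESTIONS * N_SEED_MOVIES

query_success = answered_instances / total_instances
link_accuracy = links_correct / links_sampled

print(f"query instances: {total_instances}")
print(f"query success:   {query_success:.2%}")
print(f"link accuracy:   {link_accuracy:.2%}")
```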
IMDB4M enables research across multiple domains in the Semantic Web and Multimedia communities.
Content-based recommendation using visual style of posters and acoustic features of soundtracks.
Knowledge Graph Question Answering (KGQA) with perceptual grounding and RAG systems.
Multimodal Knowledge Graph Completion including link prediction and entity alignment.
schema:genre prediction from poster and plot (imdb4m:hasEmbedding).

If you use IMDB4M in your research, please cite our paper. The embedding bundle is archived separately on Zenodo and should be cited via its dataset DOI.
@inproceedings{imdb4m2026,
title = {{IMDB4M}: A Large-Scale Multi-Modal Knowledge Graph of Movies},
author = {Reklos, Ioannis and de Berardinis, Jacopo and Simperl, Elena and Mero{\~n}o-Pe{\~n}uela, Albert},
year = {2026},
note = {Under review}
}
@dataset{imdb4m_embeddings_2026,
title = {{IMDB4M} Multi-Modal and KG Embeddings (v1)},
author = {Reklos, Ioannis and de Berardinis, Jacopo and Simperl, Elena and Mero{\~n}o-Pe{\~n}uela, Albert},
year = {2026},
doi = {10.5281/zenodo.20057840}
}