RDF / Turtle Schema.org

The first knowledge graph to combine a culturally rich domain with comprehensive coverage of text, images, video, and audio. Engineered on schema.org for semantic interoperability, discoverability, and structural quality.

1.8M+ RDF Triples
376 Seed Movies
5,484 Artists
93.6% Quad-Modal

Four Modalities, One Graph

Many existing multimodal knowledge graphs are bimodal, pairing text with images only. IMDB4M overcomes this bottleneck by integrating text, images, video, and audio as first-class semantic objects.

📝 Text (100% coverage, 48.6 avg/movie): plots, reviews, keywords, genres, and cast bios.

🖼️ Images (100% coverage, 7.9 avg/movie): stills and posters with captions and entity links.

🎬 Video (99.2% coverage, 0.99 avg/movie): trailers with thumbnails, durations, and dates.

🎵 Audio (94.1% coverage, 11.2 avg/movie): soundtracks with performers and composers.

By the Numbers

A comprehensive resource comprising over 1.8 million RDF triples describing movies, artists, and their multimodal content.

Metric Value
RDF Triples 1,815,922
Unique Nodes 660,039
Unique Predicates 58
Seed Movies (fully annotated) 376
Total Movies (after expansion) 50,756
Artists (actors, directors, composers) 5,484
PerformanceRole Instances 232,492
ImageObjects 36,844
Wikidata Alignments (Artists) 4,284 (78.1%)
Entity Types 17

Comparison with Related Work

Dataset Text Image Video Audio #Entity #Relation
MKG-W 14,123 14,463 - - 15,000 169
MKG-Y 12,305 14,244 - - 15,000 28
TIVA-KG 11,858 11,636 10,269 2,441 11,858 16
KVC16K 14,822 14,822 14,822 14,822 16,015 4
IMDB4M 385,595 37,220 3,983 4,211 660,039 58

A dash (-) marks a modality the dataset does not cover.

Schema.org Foundation

Built on widely adopted vocabulary standards for semantic interoperability and Web-scale discoverability.

[Figure: IMDB4M knowledge graph schema]

PerformanceRole Pattern

Each acting credit links the actor, the movie, and the character name through a single schema:PerformanceRole node, keeping the real person's identity separate from the fictional role.
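To make the pattern concrete, here is a small Python sketch that prints the Turtle for one acting credit. It assumes IMDB4M follows schema.org's documented Role pattern, in which schema:actor appears twice: once from the movie to the role node and once from the role node to the person. The ex: IRIs, the character name, and the helper performance_role_turtle are hypothetical.

```python
# Minimal sketch: render one acting credit as a schema:PerformanceRole in Turtle.
# Assumes schema.org's Role pattern; all ex: IRIs below are hypothetical.

PREFIXES = (
    "@prefix schema: <https://schema.org/> .\n"
    "@prefix ex: <http://example.org/> .\n"
)


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def performance_role_turtle(movie: str, person: str, character: str) -> str:
    """Link movie -> role -> person, keeping the character name on the role node."""
    return (
        PREFIXES
        + f"{movie} schema:actor [\n"
        + "    a schema:PerformanceRole ;\n"
        + f"    schema:actor {person} ;\n"
        + f'    schema:characterName "{escape_literal(character)}"\n'
        + "] .\n"
    )


print(performance_role_turtle("ex:movie_1", "ex:person_1", 'Emmett "Doc" Brown'))
```

Because the character name lives on the role node rather than on the person, the same person can carry different character names in different movies without conflicting statements.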

N-ary Structures

Multi-part values are modeled as typed blank nodes whose literals use xsd:date, xsd:dateTime, xsd:duration, xsd:integer, and xsd:decimal, making them machine-interpretable.
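A sketch of one such n-ary value, assuming a trailer is described as a schema:VideoObject blank node with schema:duration and schema:uploadDate. These property choices, the ex: IRI, and the helper names are assumptions for illustration; the point is how xsd:duration and xsd:date literals are produced.

```python
# Minimal sketch: a trailer as a typed blank node with XSD-typed literals.
# Assumed modelling (schema:trailer, schema:VideoObject, schema:duration,
# schema:uploadDate) and the ex: IRI are illustrative, not confirmed IMDB4M terms.
# Prefix declarations for schema: and ex: are omitted from the output.
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"


def xsd_duration(total_seconds: int) -> str:
    """Format whole seconds as an xsd:duration lexical form, e.g. PT2M30S."""
    if total_seconds < 0:
        raise ValueError("duration must be non-negative")
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    parts = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    return "PT" + (parts or "0S")


def typed(value: str, datatype: str) -> str:
    """Render a Turtle typed literal using the full XSD datatype IRI."""
    return f'"{value}"^^<{XSD}{datatype}>'


def trailer_node(movie: str, seconds: int, uploaded: date) -> str:
    """Attach a blank-node VideoObject with typed duration and date literals."""
    return (
        f"{movie} schema:trailer [\n"
        "    a schema:VideoObject ;\n"
        f"    schema:duration {typed(xsd_duration(seconds), 'duration')} ;\n"
        f"    schema:uploadDate {typed(uploaded.isoformat(), 'date')}\n"
        "] .\n"
    )


print(trailer_node("ex:movie_1", 150, date(2020, 1, 31)))
```

Python's datetime module has no ISO 8601 duration formatter, which is why the sketch builds the xsd:duration string itself.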

Two-level Audio

schema:MusicRecording for performed audio artifacts and schema:MusicComposition for the underlying musical work.
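One benefit of separating the two levels is that several recordings (for example, a film version and a cover) can point to one composition, so composer information is stated once. The sketch below uses plain (subject, predicate, object) tuples and assumes schema:recordingOf links a recording to its composition, with schema:byArtist for performers and schema:composer for composers; all IRIs are hypothetical.

```python
# Minimal sketch of the two-level audio model as (subject, predicate, object)
# triples. IRIs are hypothetical; schema:recordingOf, schema:byArtist and
# schema:composer are schema.org properties, but their use here is assumed.
from collections import defaultdict

triples = [
    ("ex:rec_1", "rdf:type", "schema:MusicRecording"),
    ("ex:rec_1", "schema:byArtist", "ex:performer_a"),
    ("ex:rec_1", "schema:recordingOf", "ex:work_1"),
    ("ex:rec_2", "rdf:type", "schema:MusicRecording"),
    ("ex:rec_2", "schema:byArtist", "ex:performer_b"),
    ("ex:rec_2", "schema:recordingOf", "ex:work_1"),
    ("ex:work_1", "rdf:type", "schema:MusicComposition"),
    ("ex:work_1", "schema:composer", "ex:composer_x"),
]


def recordings_by_composition(graph):
    """Group performed recordings under the composition they realise."""
    groups = defaultdict(list)
    for s, p, o in graph:
        if p == "schema:recordingOf":
            groups[o].append(s)
    return dict(groups)


def composers_of_recording(graph, recording):
    """Follow recording -> composition -> composer (two hops)."""
    works = {o for s, p, o in graph if s == recording and p == "schema:recordingOf"}
    return sorted({o for s, p, o in graph if s in works and p == "schema:composer"})


print(recordings_by_composition(triples))  # {'ex:work_1': ['ex:rec_1', 'ex:rec_2']}
print(composers_of_recording(triples, "ex:rec_2"))  # ['ex:composer_x']
```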

Validated & Verified

Systematic validation through SPARQL-based question answering and human-in-the-loop link verification.

F1 Score: 94.4%
Precision: 99.3%
Recall: 90.0%
Query Success: 99.3%
YouTube Link Accuracy: 87.2%
Levenshtein Similarity: 0.993

Evaluated against 18 competency questions formalized as SPARQL queries, covering directors, writers, actors, ratings, plots, trailers, soundtracks, images, and more.
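The page reports precision, recall, and F1 without describing how answers were matched or aggregated. The sketch below assumes each competency question's returned answers are compared as a set against a gold answer set, and it checks that the reported F1 is the harmonic mean of the reported precision and recall: 2 × 0.993 × 0.900 / (0.993 + 0.900) ≈ 0.944. The example question and IRIs are hypothetical.

```python
# Minimal sketch: score one competency question's SPARQL answers against a gold
# set. Set-based matching is an assumption; the page does not give the protocol.


def precision_recall_f1(returned: set, gold: set) -> tuple:
    """Precision, recall and F1 for one question's answer set."""
    hits = len(returned & gold)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def harmonic_mean(p: float, r: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical question "Who wrote movie X?" where one gold writer is missed.
print(precision_recall_f1({"ex:w1", "ex:w2"}, {"ex:w1", "ex:w2", "ex:w3"}))
# Reported aggregates: precision 99.3%, recall 90.0%.
print(round(harmonic_mean(0.993, 0.900), 3))  # 0.944
```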

Research Applications

IMDB4M enables research across multiple domains in the Semantic Web and Multimedia communities.

🎥

Movie Recommendation

Content-based recommendation using the visual style of posters and the acoustic features of soundtracks.

  • Audio embeddings from linked YouTube videos
  • Visual style analysis of movie posters
  • Temporal features from trailers
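A minimal content-based ranking, assuming each movie already has a poster embedding and a soundtrack embedding. The two-dimensional vectors below are toy values rather than IMDB4M features, and the fuse weighting is a hypothetical design choice. Movies are ranked by cosine similarity of their fused vectors to a movie the user liked.

```python
# Minimal sketch of content-based recommendation: each movie is a hypothetical
# vector fusing a poster embedding and a soundtrack (audio) embedding; candidates
# are ranked by cosine similarity to a liked movie. Values are toy assumptions.
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fuse(poster, audio, audio_weight=0.5):
    """Concatenate modality embeddings, scaling each by its weight."""
    image_weight = 1.0 - audio_weight
    return [image_weight * x for x in poster] + [audio_weight * x for x in audio]


movies = {
    "ex:movie_a": fuse([0.9, 0.1], [0.8, 0.2]),
    "ex:movie_b": fuse([0.8, 0.2], [0.7, 0.3]),
    "ex:movie_c": fuse([0.1, 0.9], [0.2, 0.8]),
}


def recommend(liked, catalog, k=2):
    """Rank every other movie by similarity to the liked one."""
    target = catalog[liked]
    scores = {m: cosine(target, v) for m, v in catalog.items() if m != liked}
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("ex:movie_a", movies))  # ['ex:movie_b', 'ex:movie_c']
```

Weighting each modality before concatenation lets one tune how much visual versus acoustic similarity drives the ranking.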
🔍

Multimodal QA

Knowledge Graph Question Answering (KGQA) with perceptual grounding, plus retrieval-augmented generation (RAG) systems.

  • Joint reasoning over symbolic and perceptual modalities
  • Complex queries with reified relations
  • Multimodal RAG benchmarking
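Reified relations make some questions two hops deep. The sketch below answers "which character did person X play in movie Y?" over toy triples, assuming the schema.org PerformanceRole pattern; a comment gives the equivalent SPARQL graph pattern. All identifiers are hypothetical.

```python
# Minimal sketch: answer a question over a reified PerformanceRole relation.
# Equivalent SPARQL, assuming the schema.org Role pattern:
#   SELECT ?name WHERE {
#     ex:movie_1 schema:actor ?role .
#     ?role schema:actor ex:person_2 ; schema:characterName ?name .
#   }
# Triples and IRIs below are hypothetical toy data.

triples = [
    ("ex:movie_1", "schema:actor", "_:role1"),
    ("_:role1", "rdf:type", "schema:PerformanceRole"),
    ("_:role1", "schema:actor", "ex:person_1"),
    ("_:role1", "schema:characterName", "Character A"),
    ("ex:movie_1", "schema:actor", "_:role2"),
    ("_:role2", "rdf:type", "schema:PerformanceRole"),
    ("_:role2", "schema:actor", "ex:person_2"),
    ("_:role2", "schema:characterName", "Character B"),
]


def objects(graph, subject, predicate):
    """All objects of (subject, predicate, ?) in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def character_played(graph, movie, person):
    """Hop movie -> role -> (actor, characterName) and keep roles for `person`."""
    names = []
    for role in objects(graph, movie, "schema:actor"):
        if person in objects(graph, role, "schema:actor"):
            names.extend(objects(graph, role, "schema:characterName"))
    return names


print(character_played(triples, "ex:movie_1", "ex:person_2"))  # ['Character B']
```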
🧩

KG Completion

Multimodal Knowledge Graph Completion including link prediction and entity alignment.

  • Infer genre from poster and plot
  • Cross-platform entity alignment
  • Multimodal KG embeddings (TransE+)
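The page names "TransE+" without defining it, so the sketch below shows only plain TransE scoring, f(h, r, t) = -||h + r - t||₂, applied to a genre link-prediction query of the form (movie, hasGenre, ?). It is not the page's model: the embeddings are hand-picked toy vectors rather than trained ones, and the entity and relation names are hypothetical.

```python
# Minimal sketch of plain TransE scoring for link prediction. Embeddings are
# hand-picked toy vectors (not trained); entity/relation names are hypothetical.
import math

entities = {
    "ex:movie_1": [0.1, 0.2],
    "ex:genre_drama": [0.6, 0.2],
    "ex:genre_comedy": [0.1, 0.9],
}
relations = {"ex:hasGenre": [0.5, 0.0]}


def transe_score(head, relation, tail):
    """Higher is better: negative L2 distance between head + relation and tail."""
    h, r, t = entities[head], relations[relation], entities[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation, candidates):
    """Link prediction for (head, relation, ?): sort candidates by score."""
    return sorted(candidates, key=lambda t: transe_score(head, relation, t), reverse=True)


print(rank_tails("ex:movie_1", "ex:hasGenre", ["ex:genre_comedy", "ex:genre_drama"]))
# ['ex:genre_drama', 'ex:genre_comedy']
```

A multimodal variant would typically derive or enrich the entity vectors from poster, plot, or audio features; how IMDB4M's "TransE+" does this is not specified here.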

Cite This Work

If you use IMDB4M in your research, please cite our paper.

@inproceedings{imdb4m2026,
  title  = {{IMDB4M}: A Large-Scale Multi-Modal Knowledge Graph of Movies},
  author = {Reklos, Ioannis and de Berardinis, Jacopo and Simperl, Elena and Mero{\~n}o-Pe{\~n}uela, Albert},
  year   = {2026},
  note   = {Under review}
}