IMDB4M - A Large-Scale Quad-Modal Knowledge Graph of Movies

RDF / Turtle Schema.org

The first knowledge graph to combine a culturally rich domain with comprehensive coverage of text, images, video, and audio. Engineered on schema.org for semantic interoperability, discoverability, and structural quality.

1.8M+ RDF Triples

376 Seed Movies

5,484 Artists

93.6% Quad-Modal


Multimodal Coverage

Four Modalities, One Graph

IMDB4M overcomes the bimodal bottleneck of existing knowledge graphs by integrating text, images, video, and audio as first-class semantic objects.

Modality	Coverage	Avg. per movie	Content
📝 Text	100%	48.6	Plots, reviews, keywords, genres, cast bios
🖼️ Images	100%	7.9	Stills, posters with captions & entity links
🎬 Video	99.2%	0.99	Trailers with thumbnails, duration, dates
🎵 Audio	94.1%	11.2	Soundtracks with performers & composers

Knowledge Graph

By the Numbers

A comprehensive resource comprising over 1.8 million RDF triples describing movies, artists, and their multimodal content.

Metric	Value
RDF Triples	1,815,922
Unique Nodes	660,039
Unique Predicates	58
Seed Movies (fully annotated)	376
Total Movies (after expansion)	50,756
Artists (actors, directors, composers)	5,484
PerformanceRole Instances	232,492
ImageObjects	36,844
Wikidata Alignments (Artists)	4,284 (78.1%)
Entity Types	17

Comparison with Related Work

Dataset	Text	Image	Video	Audio	#Entity	#Relation
MKG-W	14,123	14,463	–	–	15,000	169
MKG-Y	12,305	14,244	–	–	15,000	28
TIVA-KG	11,858	11,636	10,269	2,441	11,858	16
KVC16K	14,822	14,822	14,822	14,822	16,015	4
IMDB4M	385,595	37,220	3,983	4,211	660,039	58

Ontology

Schema.org Foundation

Built on widely adopted vocabulary standards for semantic interoperability and Web-scale discoverability.

PerformanceRole Pattern

Each acting credit is modeled as a PerformanceRole node that ties together the actor, the movie, and the character name, so the performer's identity stays separate from the fictional role.
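The pattern can be pictured with a toy in-memory triple list. This is a minimal sketch, not the IMDB4M data or tooling: the identifiers (`ex:movie/M1`, `_:role1`, and so on) are hypothetical placeholders, and the use of `schema:actor` on both hops plus `schema:characterName` follows the general schema.org Role pattern, which is assumed rather than confirmed to match the graph's exact layout.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# All identifiers below are hypothetical placeholders, not IMDB4M IRIs.
TRIPLES = [
    ("ex:movie/M1", "rdf:type", "schema:Movie"),
    ("ex:movie/M1", "schema:actor", "_:role1"),
    ("ex:movie/M1", "schema:actor", "_:role2"),
    ("_:role1", "rdf:type", "schema:PerformanceRole"),
    ("_:role1", "schema:actor", "ex:person/A1"),
    ("_:role1", "schema:characterName", "Character One"),
    ("_:role2", "rdf:type", "schema:PerformanceRole"),
    ("_:role2", "schema:actor", "ex:person/A2"),
    ("_:role2", "schema:characterName", "Character Two"),
]


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is a triple."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def cast_with_characters(movie):
    """Walk movie -> PerformanceRole -> actor and pair each actor with a character."""
    pairs = []
    for role in objects(movie, "schema:actor"):
        # Skip direct actor links that are not reified as a PerformanceRole.
        if "schema:PerformanceRole" not in objects(role, "rdf:type"):
            continue
        for actor in objects(role, "schema:actor"):
            for character in objects(role, "schema:characterName"):
                pairs.append((actor, character))
    return pairs


cast = cast_with_characters("ex:movie/M1")
print(cast)
```

Because the character name hangs off the role node rather than the person, the same artist can appear in many movies under different character names without any of those names attaching to the artist entity itself.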

N-ary Structures

N-ary facts are modeled as typed blank nodes whose literal values carry explicit datatypes (xsd:date, xsd:dateTime, xsd:duration, xsd:integer, and xsd:decimal), so consumers can interpret them without parsing free text.
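To illustrate why explicit datatypes matter to a consumer, the sketch below maps a few XSD lexical forms to native Python values using only the standard library. It is an assumption-laden example: the literal values are invented, `xsd:dateTime` is left out for brevity, and the duration parser handles only the time-only subset of `xsd:duration` (such as `PT2M30S`), which is what a trailer length would plausibly use.

```python
import re
from datetime import date, timedelta
from decimal import Decimal

# Time-only subset of xsd:duration: PT[nH][nM][nS], e.g. "PT2M30S".
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def parse_duration(lexical):
    """Convert a time-only xsd:duration string into a timedelta."""
    match = _DURATION.match(lexical)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported xsd:duration: {lexical!r}")
    hours, minutes, seconds = match.groups()
    return timedelta(
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=float(seconds or 0),
    )


# Datatype (as a prefixed name) -> function that parses the lexical form.
CONVERTERS = {
    "xsd:date": date.fromisoformat,
    "xsd:integer": int,
    "xsd:decimal": Decimal,
    "xsd:duration": parse_duration,
}


def to_python(lexical, datatype):
    """Turn a typed literal into a native Python value."""
    return CONVERTERS[datatype](lexical)


# Invented example values, not taken from the dataset.
trailer_length = to_python("PT2M30S", "xsd:duration")
release_day = to_python("1999-03-31", "xsd:date")
print(trailer_length.total_seconds(), release_day.year)
```

With typed values in hand, comparisons such as "trailers shorter than three minutes" or "movies released before 2000" become ordinary numeric or date comparisons rather than string matching.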

Two-level Audio

Audio is modeled at two levels: schema:MusicRecording for the performed audio artifact and schema:MusicComposition for the underlying musical work.

Quality Assurance

Validated & Verified

Systematic validation through SPARQL-based question answering and human-in-the-loop link verification.

Metric	Value
F1 Score	94.4%
Precision	99.3%
Recall	90.0%
Query Success	99.3%
YouTube Link Accuracy	87.2%
Levenshtein Similarity	0.993

Evaluated against 18 competency questions formalized as SPARQL queries, covering directors, writers, actors, ratings, plots, trailers, soundtracks, images, and more.
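As a quick arithmetic check, the reported F1 score is consistent with the harmonic mean of the reported precision and recall. The page does not say whether F1 was computed from aggregate precision and recall or averaged per question, so the snippet below is only a sanity check under the first assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures as reported above: precision 99.3%, recall 90.0%.
f1 = f1_score(0.993, 0.900)
print(f"{f1:.1%}")  # prints 94.4%
```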

Use Cases

Research Applications

IMDB4M enables research across multiple domains in the Semantic Web and Multimedia communities.

🎥

Movie Recommendation

Content-based recommendation using visual style of posters and acoustic features of soundtracks.

Audio embeddings from linked YouTube videos
Visual style analysis of movie posters
Temporal features from trailers
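One way such a recommender could work is nearest-neighbor search over per-movie feature vectors. The sketch below is illustrative only: the movie keys and four-dimensional vectors are made up, standing in for whatever poster and soundtrack embeddings a real pipeline would extract, and cosine similarity is one common choice of metric rather than a method the page prescribes.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical features: first two dims "visual" (poster), last two "audio" (soundtrack).
FEATURES = {
    "movie_a": [0.9, 0.1, 0.8, 0.2],
    "movie_b": [0.8, 0.2, 0.7, 0.3],
    "movie_c": [0.1, 0.9, 0.2, 0.8],
}


def recommend(seed, k=2):
    """Return the k movies most similar to `seed`, best first, as (movie, score) pairs."""
    scores = [
        (other, cosine(FEATURES[seed], vector))
        for other, vector in FEATURES.items()
        if other != seed
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


ranking = recommend("movie_a")
print([movie for movie, _ in ranking])
```

Concatenating modality features into one vector is the simplest fusion strategy; weighting or learning the fusion per modality is a natural refinement.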

🔍

Multimodal QA

Knowledge Graph Question Answering (KGQA) with perceptual grounding and RAG systems.

Joint reasoning over symbolic and perceptual modalities
Complex queries with reified relations
Multimodal RAG benchmarking

🧩

KG Completion

Multimodal Knowledge Graph Completion including link prediction and entity alignment.

Infer genre from poster and plot
Cross-platform entity alignment
Multimodal KG embeddings (TransE+)
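For readers unfamiliar with translation-based embeddings, the sketch below shows plain TransE scoring used for link prediction, here framed as guessing a movie's genre. It is not the "TransE+" variant named above, whose details are not given on this page; the two-dimensional vectors and entity names are toy values chosen by hand, whereas a real multimodal model would learn them, possibly from fused text, image, video, and audio features.

```python
import math


def transe_distance(head, relation, tail):
    """TransE score as the L2 distance ||h + r - t||; smaller means more plausible."""
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hand-picked toy embeddings (hypothetical, not learned from IMDB4M).
ENTITY = {
    "movie_x": [0.2, 0.5],
    "genre_drama": [0.7, 0.1],
    "genre_comedy": [-0.6, 0.9],
}
RELATION = {"schema:genre": [0.5, -0.4]}


def rank_tails(head, relation, candidates):
    """Rank candidate tail entities for (head, relation, ?), most plausible first."""
    scored = [
        (tail, transe_distance(ENTITY[head], RELATION[relation], ENTITY[tail]))
        for tail in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1])


predictions = rank_tails("movie_x", "schema:genre", ["genre_comedy", "genre_drama"])
print(predictions[0][0])
```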

Reference

Cite This Work

If you use IMDB4M in your research, please cite our paper.

@inproceedings{imdb4m2026,
  title     = {{IMDB4M}: A Large-Scale Multi-Modal Knowledge Graph of Movies},
  author    = {Reklos, Ioannis and de Berardinis, Jacopo and Simperl, Elena and Mero{\~n}o-Pe{\~n}uela, Albert},
  year      = {2026},
  note      = {Under review}
}
