OTT Viewership Analysis

MongoDB, Python, PyMongo, Pandas, NoSQL, Data Mining, Business Intelligence
MongoDB Query Analysis

Project Overview

In this project, I wrote a series of MongoDB queries to extract insights from a film database. The goal was to demonstrate querying techniques (array matching, range filters, and dot notation on nested fields) that apply to business intelligence and data analysis. Using MongoDB's flexible document model, I answered specific business questions of the kind that drive decision-making in a media company.

Database Structure

The project used the movies collection of MongoDB's "sample_mflix" sample database, which stores details about each movie, such as:

  • Directors and cast members
  • Runtime and release year
  • Genres and languages
  • Ratings from several sources (IMDb, Metacritic, Rotten Tomatoes)
  • Viewer and critic scores

Query Challenges

Challenge 1: Director-Actor Overlap

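The first challenge was to identify movies where the director also appeared as a cast member, using Mel Brooks as the test case. Because directors and cast are both arrays, an equality condition on each field matches when the array contains that name. This kind of query helps identify multi-talented filmmakers who both direct and act in their productions.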

// Query to find movies directed by Mel Brooks where he also appeared in the cast
db.movies.find({
    directors: "Mel Brooks",
    cast: "Mel Brooks"
})

This query returned multiple results, showing that Mel Brooks frequently directed and starred in his own films.
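
The query above names one director explicitly. To ask the same question for every director at once, one option is to compare the two arrays inside the document with $expr and $setIntersection. The following is a minimal sketch, assuming directors and cast are arrays of strings wherever they exist; the variable name directorAlsoActs is illustrative and not part of the original project.

```javascript
// Filter matching any movie whose directors and cast arrays share at least one name.
// $ifNull substitutes an empty array when a document lacks either field.
const directorAlsoActs = {
  $expr: {
    $gt: [
      {
        $size: {
          $setIntersection: [
            { $ifNull: ["$directors", []] },
            { $ifNull: ["$cast", []] }
          ]
        }
      },
      0
    ]
  }
};

// In mongosh, with sample_mflix selected (assumption):
// db.movies.find(directorAlsoActs, { _id: 0, title: 1, directors: 1, cast: 1 })
```

Since $expr evaluates an expression per document, this form is likely to scan more of the collection than the two equality conditions used for a single name.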

Challenge 2: Genre and Runtime Analysis

The second challenge involved finding comedy movies with a specific runtime range. This type of query is useful for programming decisions, such as filling specific time slots in a broadcast schedule.

// Query to find comedy movies with runtime between 90 and 120 minutes
db.movies.find({
    runtime: { $gte: 90, $lte: 120 },
    genres: "Comedy"
})
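
Both $gte and $lte are inclusive, so a 90-minute film and a 120-minute film both match. For scheduling work the same filter tends to be reused with different genres and windows, so it can be wrapped in a small builder. This is a sketch only; the function name runtimeSlotFilter and its parameters are hypothetical.

```javascript
// Hypothetical helper: build a filter for one genre and an inclusive runtime window.
function runtimeSlotFilter(genre, minMinutes, maxMinutes) {
  if (minMinutes > maxMinutes) {
    throw new RangeError("minMinutes must not exceed maxMinutes");
  }
  return {
    genres: genre, // matches when the genres array contains this value
    runtime: { $gte: minMinutes, $lte: maxMinutes } // both bounds inclusive
  };
}

const comedySlot = runtimeSlotFilter("Comedy", 90, 120);

// In mongosh, with sample_mflix selected (assumption):
// db.movies.find(comedySlot, { _id: 0, title: 1, runtime: 1 }).sort({ runtime: 1 })
```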

Challenge 3: Multi-Genre Analysis with Time Constraints

This challenge required finding movies that belonged to multiple specific genres and were released before a certain year. Such queries help in content categorization and historical trend analysis.

// Query to find movies with both Adventure and Fantasy genres released before 2010
db.movies.find({
    genres: { $all: ["Adventure", "Fantasy"] },
    year: { $lt: 2010 }
})
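
The choice of $all matters here: it requires every listed genre to be present, whereas $in would accept a movie with either one. The difference can be illustrated in plain JavaScript without a database. The documents below are made up for the illustration and are not taken from sample_mflix; the helper names matchesAll and matchesAny are also hypothetical stand-ins for the two operators.

```javascript
// Plain-JavaScript stand-ins for how $all and $in treat an array field.
const matchesAll = (values, wanted) => wanted.every((w) => values.includes(w));
const matchesAny = (values, wanted) => wanted.some((w) => values.includes(w));

// Made-up documents, for illustration only.
const sample = [
  { title: "A", genres: ["Adventure", "Fantasy", "Family"], year: 2001 },
  { title: "B", genres: ["Adventure"], year: 1995 },
  { title: "C", genres: ["Adventure", "Fantasy"], year: 2012 }
];

const wanted = ["Adventure", "Fantasy"];

// Equivalent in spirit to { genres: { $all: wanted }, year: { $lt: 2010 } }
const bothGenres = sample
  .filter((m) => matchesAll(m.genres, wanted) && m.year < 2010)
  .map((m) => m.title);

// Equivalent in spirit to { genres: { $in: wanted }, year: { $lt: 2010 } }
const eitherGenre = sample
  .filter((m) => matchesAny(m.genres, wanted) && m.year < 2010)
  .map((m) => m.title);

console.log(bothGenres);  // A only: B lacks Fantasy, C was released too late
console.log(eitherGenre); // A and B: C is still excluded by the year condition
```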

Challenge 4: Language and Popularity Filters

This query identified movies in specific languages that had achieved a certain level of popularity (measured by IMDB votes) and were released after a specific year. This helps in identifying successful international content for potential distribution.

// Query to find Polish or German movies with at least 1000 IMDB votes released after 1996
db.movies.find({
    languages: { $in: ["Polish", "German"] },
    "imdb.votes": { $gte: 1000 },
    year: { $gt: 1996 }
})
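
To turn the matching set into a shortlist, the filter can be paired with a projection and a descending sort on the vote count. The sketch below keeps the three parts as separate objects so each can be adjusted independently; the variable names and the limit of 10 are illustrative choices, not part of the original project.

```javascript
// Same conditions as the query above.
const internationalFilter = {
  languages: { $in: ["Polish", "German"] }, // either language is enough
  "imdb.votes": { $gte: 1000 },
  year: { $gt: 1996 }
};

// Return only the fields needed for a shortlist.
const shortlistFields = { _id: 0, title: 1, year: 1, languages: 1, "imdb.votes": 1 };

// Most-voted titles first.
const byVotesDesc = { "imdb.votes": -1 };

// In mongosh, with sample_mflix selected (assumption):
// db.movies.find(internationalFilter, shortlistFields).sort(byVotesDesc).limit(10)
```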

Challenge 5: Complex Rating Analysis

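This query combined two rating thresholds with genre and time-period filters: Drama movies released from 1990 through 1999 whose Tomatoes critic rating is above 7.0 and whose viewer rating is below 8.0. The nested rating fields are addressed with dot notation, and the year range uses $gte with $lt so that 2000 itself is excluded.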

// Query to find Drama movies from the 1990s with specific Tomatoes ratings
db.movies.find({
    "tomatoes.viewer.rating": { $lt: 8.0 },
    "tomatoes.critic.rating": { $gt: 7.0 },
    year: { $gte: 1990, $lt: 2000 },
    genres: "Drama"
})
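
Before settling on thresholds such as 7.0 and 8.0, it can help to confirm the scale each rating field actually uses, since the viewer and critic ratings need not share one. A minimal sketch of an aggregation pipeline that reports the observed minimum and maximum of both fields for the same genre and decade follows; the variable name ratingRanges and the output field names are illustrative.

```javascript
// Pipeline: restrict to 1990s Drama, then summarize the range of each rating field.
const ratingRanges = [
  { $match: { genres: "Drama", year: { $gte: 1990, $lt: 2000 } } },
  {
    $group: {
      _id: null, // a single summary document for the whole matched set
      viewerMin: { $min: "$tomatoes.viewer.rating" },
      viewerMax: { $max: "$tomatoes.viewer.rating" },
      criticMin: { $min: "$tomatoes.critic.rating" },
      criticMax: { $max: "$tomatoes.critic.rating" }
    }
  }
];

// In mongosh, with sample_mflix selected (assumption):
// db.movies.aggregate(ratingRanges)
```

If one field turns out to top out well below a chosen threshold, that condition filters nothing and the threshold is worth revisiting.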

Business Applications

The queries developed in this project have several practical applications in the media and entertainment industry:

  • Content Recommendation: Identifying movies with specific characteristics to recommend to users based on their preferences
  • Programming Decisions: Finding content that fits specific time slots or themed programming blocks
  • Acquisition Strategy: Analyzing which types of content perform well with critics vs. viewers to inform content acquisition decisions
  • Market Analysis: Understanding trends in film production across different time periods, genres, and languages
  • Talent Spotting: Identifying multi-talented individuals who excel in multiple roles (like directing and acting)

Conclusion

This project demonstrated the power of MongoDB's query capabilities for extracting specific insights from complex, nested document structures. The techniques used here can be applied to various business intelligence scenarios, helping organizations make data-driven decisions about content strategy, programming, and audience targeting. By leveraging NoSQL databases like MongoDB, businesses can perform sophisticated analyses on unstructured or semi-structured data that would be challenging to model in traditional relational databases.