Media TV and Movies: Turning Streaming Viewing History into Outlook Calendar Events

Like many people, I watch TV and movies across multiple streaming services. I thought it would be nice to have a “watch history” for all of them in one place. I wanted to be able to quickly answer questions like “Did I watch that show already?” or “When did I watch that movie?” or “What did I watch last month?”

The solution I decided on was to create an Outlook calendar event for each show or movie that I watched. I used Python and the Microsoft Graph API to create and retrieve the events. The result is a personal “watch log” inside the Outlook calendar: searchable, filterable, and easy to browse by date.

However, the more challenging part was figuring out how to get the data out of the streaming services, as each has its own way of exposing watch history. Many provide a user-friendly way to request and download watch history data. Some don’t have this feature, so you have to put your techie hat on and scrape their website to get the required data.

All code is available in the repo: media_and_tv_data on GitHub

This repo is a good starting point for getting data out of the streaming services. It includes instructions for obtaining watch history data from the following services:

  1. Netflix
  2. Amazon Prime
  3. Apple TV+
  4. Crave TV
  5. Disney+
  6. Google TV

Streaming service watch history data availability

Netflix
Netflix makes this pretty straightforward: you can request and download a copy of your account data from Netflix (it’s part of their “download your information” area). Once you download it, you’ll find a watch history file (a CSV) that lists what you watched and when. In this project you just drop that file into the Netflix folder and the scripts can turn those watch entries into calendar events.
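
As a rough sketch of what reading that file can look like, the snippet below parses a tiny in-memory stand-in for ViewingActivity.csv using only the standard library. The column names ("Title", "Start Time", "Duration"), the timestamp format, and the idea that start times are in UTC are assumptions here, so check them against your own export. The function names are made up for this illustration and are not the repo's.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def parse_duration(text: str) -> timedelta:
    """Parse an 'H:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def read_viewing_activity(handle):
    """Yield (title, start_utc, end_utc) for each row of the CSV."""
    for row in csv.DictReader(handle):
        start = datetime.strptime(row["Start Time"], "%Y-%m-%d %H:%M:%S")
        start = start.replace(tzinfo=timezone.utc)  # assumption: export times are UTC
        yield row["Title"], start, start + parse_duration(row["Duration"])


# Made-up sample row standing in for a real ViewingActivity.csv
SAMPLE = io.StringIO(
    "Title,Start Time,Duration\n"
    "Example Show: Season 1: Pilot,2025-01-05 01:15:00,0:42:10\n"
)
events = list(read_viewing_activity(SAMPLE))
```

Keeping the start time timezone-aware from the first step makes the later conversion to a local calendar time a single astimezone() call.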

Amazon Prime (Prime Video)
Prime Video also provides a download-style option for viewing history. You obtain your Prime Video watch history as a spreadsheet-style file (CSV) from your Amazon/Prime Video account history area, then save it into the project’s Prime Video folder. From there, the project reads it and can create calendar entries for each show or movie you watched.

Apple TV+
Apple’s approach is more “Apple ecosystem” than “streaming site download.” Instead of a simple “download watch history” button inside Apple TV+, you typically get this through Apple’s privacy/data export (a request-and-download process), which can include TV app activity. After you download your Apple data, you pull out the TV-related activity file and place it into the Apple TV+ folder for the project to use.

Crave TV
Crave is one of the two most “hands-on” services: there usually isn’t a clean “download my watch history” export. To get the data, you generally have to be a bit techie: open Crave in your browser, use the browser’s Developer Tools (Network tab), and watch for the site loading your watch history as JSON data behind the scenes. You save those JSON responses into the project’s Crave folder, and then the scripts can process them into calendar events.

Disney+
Disney+ is the other “hands-on” one: there usually isn’t a clean “download my watch history” export. Disney+ does provide a watchlist page, but no user-friendly way to download the data. You could scrape the webpage. However, Disney+ does have an API, and a third party has created an SDK for it. The data it returns doesn’t include any dates or watch durations, so you need to provide those yourself. In this project, the data is pulled using a small script that talks to Disney+ in a more automated way (think “API-style access”), producing a file that represents your watch progress/history. There’s also support for manually adjusting dates if needed; the project then uses that final file to create the calendar events.

Google TV
Google TV data usually comes from Google’s “download your data” tools (commonly Google Takeout), where you can export information tied to your Google/Google Play/Google TV activity. That export can include JSON files describing your library and purchases, which are basically structured data files. Once you download them, you put those files into the Google TV folder and the project can combine them into something usable for generating calendar events.

High-level architecture

Shared components

  • shared_event_utils.py:
    core shared functionality, including the EventManager class, authentication, and duplicate detection.
  • Centralized authentication:
    Microsoft Graph API token management in one place.
  • Advanced duplicate detection:
    bulk caching and exact matching using datetime + title.
  • Title normalization:
    handles encoding issues and character variations (mojibake / Unicode normalization).

Platform modules

Each streaming service has its own directory with specialized processing, but they all follow the same general pattern.
Some examples of platform-specific behavior:

  • Netflix: timezone conversion and duration calculation
  • Amazon Prime TV: EST timezone handling and bulk caching
  • Apple TV+: Atlantic timezone and episode formatting
  • Crave TV: series/movie distinction and rate limiting
  • Disney+: watch time calculation and manual date filtering
  • Google TV: duration estimation and EST timezone
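
Whatever the platform-specific handling, the end product is presumably the same: a title plus a start and end time that Microsoft Graph can accept. The sketch below builds such a request body. The subject, start, end, and categories fields and the /users/{user_id}/events endpoint are part of the Graph API. The helper name, the choice to send times as UTC, and the one-category-per-platform idea are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

# Graph endpoint for creating an event in a user's default calendar
GRAPH_EVENTS_URL = "https://graph.microsoft.com/v1.0/users/{user_id}/events"


def build_event_payload(title, start, end, platform):
    """Build the JSON body for a Graph 'create event' request.

    Expects timezone-aware datetimes; both are converted to UTC.
    """
    def as_graph_time(moment):
        utc = moment.astimezone(timezone.utc)
        return {"dateTime": utc.strftime("%Y-%m-%dT%H:%M:%S"), "timeZone": "UTC"}

    return {
        "subject": title,
        "start": as_graph_time(start),
        "end": as_graph_time(end),
        "categories": [platform],  # assumption: one category per platform
    }


# Fixed UTC-4 offset used purely for illustration (no daylight saving rules)
atlantic = timezone(timedelta(hours=-4))
start = datetime(2025, 1, 5, 20, 0, tzinfo=atlantic)
payload = build_event_payload(
    "Example Show S1 E1", start, start + timedelta(minutes=45), "Netflix"
)
```

The payload would then be sent as JSON in a POST to GRAPH_EVENTS_URL with a bearer token in the Authorization header.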

Key features

1) Testing & validation modes

  • Test mode: process a limited number of records for fast iteration
  • Dry run mode: process everything but don’t actually create any calendar events
  • Production mode: full event creation with logging and summaries
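
To make the three modes concrete, here is a minimal sketch of a processing loop that honours them. It is not the repo's code: process_records and the status strings are hypothetical, and create_event stands in for whatever actually calls the Graph API.

```python
def process_records(records, create_event, test_mode=False, test_limit=5, dry_run=False):
    """Run records through create_event, honouring test and dry-run modes."""
    results = []
    for count, record in enumerate(records):
        if test_mode and count >= test_limit:
            break  # test mode: stop after a handful of records
        if dry_run:
            results.append(("DRY_RUN", record))  # report only, no API call
        else:
            create_event(record)
            results.append(("CREATED", record))
    return results


created = []  # stands in for the calendar
dry = process_records(["a", "b", "c"], created.append, dry_run=True)
limited = process_records(["a", "b", "c"], created.append, test_mode=True, test_limit=2)
```

After these two calls, only the limited run has touched the stand-in calendar, which is the point of running in dry run mode first.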

2) Defensive programming

  • Manual date filtering via LAST_EVENT_DATE to avoid accidentally processing old events
  • Timezone protection to guard against timezone-related duplicate creation
  • Robust error handling with centralized token refresh and retry logic
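
The date cutoff in particular is easy to picture in code. The sketch below keeps only records that start after LAST_EVENT_DATE. Whether the real scripts treat the cutoff day itself as included or excluded is not stated, so the strict "after" comparison is an assumption, as are the function name and the record layout.

```python
from datetime import date, datetime

LAST_EVENT_DATE = "2025-01-01"


def after_cutoff(records, cutoff=LAST_EVENT_DATE):
    """Keep only records whose start falls strictly after the cutoff date."""
    cutoff_date = date.fromisoformat(cutoff)
    return [record for record in records if record["start"].date() > cutoff_date]


# Made-up records for illustration
records = [
    {"title": "Old Movie", "start": datetime(2024, 12, 31, 21, 0)},
    {"title": "Cutoff Day Movie", "start": datetime(2025, 1, 1, 19, 30)},
    {"title": "New Movie", "start": datetime(2025, 1, 2, 20, 0)},
]
recent = after_cutoff(records)
```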

3) Harmonized experience across platforms

  • Consistent logging: standardized console output everywhere
  • Unified configuration: the same settings pattern across all platforms
  • Comprehensive summaries: clear reporting of created vs. skipped events

Supported platforms and inputs

Platform            | Data Source             | Data File Name
Netflix             | Download data           | ViewingActivity.csv
Amazon Prime TV     | Download data           | PrimeVideo.ViewingHistory.csv
Apple TV+           | Download data           | TV App Favorites and Activity.json
Crave TV            | Manual website JSON     | watchHistory_pageNumber_0.json, graphql_0.json
Disney+             | Scraping watchlist API  | watchlist_progress_raw_manual_dates.csv
Google TV           | Download data           | Library.json, Purchase History.json
Get Outlook Events  | Microsoft Graph API     | N/A (direct API access)

Quick start

1) Prerequisites

  • Microsoft Graph API credentials:
    client_id, tenant_id, client_secret, user_id
  • Required libraries:
    pandas, numpy, requests, pytz, ftfy
  • Python 3.9+ (timezone support)
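
Those four credentials suggest app-only authentication using the OAuth 2.0 client credentials flow, although that is an inference rather than something stated above. Under that assumption, a token request can be sketched as follows. The token endpoint and form fields are the standard Microsoft identity platform ones. The function names are hypothetical, and the sketch uses urllib only to stay within the standard library (the project itself lists requests).

```python
import json
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret):
    """Return (url, form_fields) for a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    fields = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
    }
    return url, fields


def fetch_token(tenant_id, client_id, client_secret):
    """POST the form and return the access token (makes a network call)."""
    url, fields = build_token_request(tenant_id, client_id, client_secret)
    body = urllib.parse.urlencode(fields).encode("utf-8")
    request = urllib.request.Request(url, data=body)  # data present, so this is a POST
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


# Placeholder values; real ones come from your app registration
url, fields = build_token_request("my-tenant", "my-client", "my-secret")
```

The user_id is not part of the token request; it would be used later in the Graph URL to say whose calendar to write to.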

2) Configuration

Each platform has a create_events.py that uses the same configuration flags:

# In any platform's create_events.py
TEST_MODE = False      # Set to True for limited testing
DRY_RUN = True         # Set to True for safe validation
LAST_EVENT_DATE = "2025-01-01"  # Update to your desired cutoff date

3) Usage examples

Safe validation (recommended first run)

TEST_MODE = False  # Process all records
DRY_RUN = True     # Don't create events, just show what would happen

Limited testing

TEST_MODE = True   # Process only limited records
DRY_RUN = False    # Actually create events
TEST_LIMIT = 5     # Process only 5 records

Production run

TEST_MODE = False  # Process all records
DRY_RUN = False    # Actually create all events

4) Running a platform

cd netflix  # or any platform directory
python create_events.py

Repository structure

The repo is organized by platform, with shared logic at the top level:

media_tv_and_movies/
├── README.md
├── shared_event_utils.py
├── .gitignore
├── netflix/
│   ├── README.md
│   ├── create_events.py
│   ├── process_raw_data.py
│   ├── test_modules.py
│   ├── ViewingActivity_example.csv
│   ├── FilteredViewingActivity_example.csv
│   └── create_events_log_example.txt
├── amazon_prime_tv/
│   ├── README.md
│   ├── create_events.py
│   ├── process_raw_data.py
│   ├── test.py
│   ├── PrimeVideo.ViewingHistory_example.csv
│   ├── PrimeVideo.ViewingHistory_clean_example.csv
│   └── create_event_log_example.txt
├── apple_tv_plus/
│   ├── README.md
│   ├── create_events.py
│   ├── process_raw_data.py
│   ├── TV App Favorites and Activity_example.json
│   ├── TV App Favorites and Activity_example.csv
│   └── create_events_log_example.txt
├── crave_tv/
│   ├── README.md
│   ├── create_events.py
│   ├── process_raw_data.py
│   ├── watchHistory_pageNumber_0_example.json
│   ├── graphql_0_example.json
│   ├── raw_data_clean_example.csv
│   └── create_events_log_example.txt
├── disney_plus/
│   ├── README.md
│   ├── create_events.py
│   ├── get_watchlist.py
│   ├── update_manual_watchlist.py
│   ├── watchlist_progress_raw_manual_dates_example.csv
│   ├── create_events_log_example.txt
│   └── Disney-Plus-api-wrapper-master/
│       ├── src/pydisney/
│       ├── LICENSE
│       └── README.md
├── google_tv/
│   ├── README.md
│   ├── create_events.py
│   ├── create_1_library_csv.py
│   ├── create_2_purchase_history_csv.py
│   ├── create_3_combine_library_purch_hist.py
│   ├── Library_example.json
│   ├── Library_example.csv
│   ├── Purchase History_example.json
│   ├── Purchase History_example.csv
│   ├── combine_library_purch_hist_example.csv
│   └── create_events_log_example.txt
└── get_outlook_events/
    ├── README.md
    └── get_events.py

What the output looks like

The scripts are intentionally noisy in a good way: you get enough console feedback to be confident you’re not creating garbage events,
and the summary at the end makes it obvious what happened.

Console logging example

DRY RUN MODE ENABLED - Will show what would happen without creating events
Starting Netflix event creation...
Access token obtained successfully
Cached 150 existing Netflix events
Found 25 records to process
Filtering to events after 2025-01-01. Found 20 records to process.
EXISTS  2024-04-28 15:54:32  The Crown S6 E1
DRY_RUN 2024-04-28 16:30:15  The Crown S6 E2

Summary report example

==================================================
EVENT CREATION SUMMARY (DRY RUN MODE)
==================================================
Events That Would Be Created: 3
Events Skipped (Already Exist): 2
Total Processed: 5
DRY RUN: No events were actually created
==================================================

How duplicates are prevented

Calendar event creation is the kind of task where “mostly right” can still ruin your day. The duplicate strategy here is simple and strict:

  • Bulk caching: fetch all existing events once per run
  • Exact matching: same datetime + same (normalized) title = duplicate
  • Title normalization: corrects encoding/Unicode issues so the “same” title compares correctly
  • Smart outcomes:
    rewatches are allowed (different datetime),
    and updates are allowed (different title)

Error handling and resiliency

  • Automatic token refresh: 401 errors trigger token refresh + retry
  • Rate limiting support: especially important for Crave TV
  • Data validation: timezone and date validation built into the flow
  • Encoding fixes: normalization handles mojibake and Unicode problems
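
The refresh-and-retry idea from the first bullet can be sketched independently of any HTTP library. Everything here is hypothetical: TokenExpired stands in for a 401 response, and call_with_refresh is an illustrative helper, not the repo's handle_token_expired.

```python
class TokenExpired(Exception):
    """Stand-in for an HTTP 401 response from the API."""


def call_with_refresh(action, token, get_new_token, max_retries=1):
    """Call action(token); on TokenExpired, fetch a new token and retry."""
    for attempt in range(max_retries + 1):
        try:
            return action(token)
        except TokenExpired:
            if attempt == max_retries:
                raise  # out of retries: let the caller see the failure
            token = get_new_token()


def fake_create_event(token):
    """Pretend API call that only accepts the token 'fresh'."""
    if token != "fresh":
        raise TokenExpired()
    return "created"


result = call_with_refresh(fake_create_event, "stale", lambda: "fresh")
```

Capping the number of retries matters: if the refreshed token is rejected too, the problem is probably not expiry, and looping would only hide it.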

Technical highlights

The shared core is built around an EventManager class that centralizes event creation, logging, and duplicate detection.

from typing import Set

class EventManager:
    """Interface overview only; method bodies are omitted here."""
    def __init__(self, platform_name, token=None, test_mode=False, test_limit=5, dry_run=False): ...
    def fetch_existing_events(self) -> Set[str]: ...
    def event_exists(self, row, title: str) -> bool: ...
    def log_event_result(self, status: str, datetime_str: str, title: str): ...
    def should_continue_processing(self, current_count: int) -> bool: ...
    def print_summary(self): ...
    def refresh_token(self): ...
    def handle_token_expired(self, create_event_func, *args, **kwargs): ...

Title normalization is also a key piece of making duplicate detection reliable:

def normalize_title(s: str) -> str:
    # Fixes mojibake, Unicode normalization, case folding, punctuation collapse
    # Handles: ’ -> ', “ -> ", ” -> ", etc.
    ...  # body omitted here
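
For a feel of what such a function involves, here is a standard-library-only sketch. It is not the repo's implementation (the project lists ftfy among its libraries, which handles far more cases). The function name, the replacement table, and the single cp1252 round-trip for mojibake repair are assumptions chosen for illustration, and it collapses whitespace rather than punctuation.

```python
import re
import unicodedata

# Typographic characters mapped to plain ASCII equivalents
_REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
}


def normalize_title_sketch(s: str) -> str:
    """Return a comparison-friendly form of a title."""
    # Repair one common kind of mojibake: UTF-8 bytes that were decoded as cp1252
    try:
        s = s.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        pass  # not mojibake of that kind; keep the text as it is
    s = unicodedata.normalize("NFKC", s)
    for fancy, plain in _REPLACEMENTS.items():
        s = s.replace(fancy, plain)
    return re.sub(r"\s+", " ", s).strip().casefold()


clean = normalize_title_sketch("Grey\u2019s  Anatomy")                  # curly apostrophe
repaired = normalize_title_sketch("Grey\u00e2\u20ac\u2122s Anatomy")    # mojibake apostrophe
```

Both inputs end up as the same string, so a title that was mangled by one export and intact in another still matches during duplicate detection.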

Why put watch history in a calendar?

It sounds a little odd until you try it. A calendar is a surprisingly good “personal timeline” UI:

  • You can scroll a week/month and remember what you watched
  • Search works well (“what season was I on in April?”)
  • It’s easy to correlate viewing with life events (“oh, I binged that during vacation”)
  • It’s platform-agnostic: the calendar becomes the single view

Getting started (recommended approach)

  1. Clone the repo
  2. Set up your Microsoft Graph API credentials
  3. Install requirements
  4. Pick one platform (Netflix is a nice starting point)
  5. Run in dry run mode first
  6. Set LAST_EVENT_DATE to a safe cutoff
  7. Once it looks correct, switch to production mode

License and contribution

This project is licensed under the MIT License. It’s a personal project, but suggestions and improvements are welcome.
The modular architecture makes it straightforward to add a new platform or improve an existing one.

Status

✅ Production Ready — all platforms are harmonized with centralized event management,
advanced duplicate detection, and comprehensive testing capabilities.


If you’re curious, want to adapt it, or just want to see how the pieces fit together, the repo is here:
https://github.com/sitrucp/media_and_tv_data

If you end up using it, I’d love to hear how you’re applying the data (beyond calendar events). There are a lot of fun directions this could go:
dashboards, “what did I watch this year?” summaries, genre trends, rewatch analysis, and more.
