52 movies

making a little program to choose a movie for me

written on 2023-01-16

i made a letterboxd list for the movies i want to watch this year. some are serious, some are less so, all of them caught my eye at some point or have been recommended to me.

i often have trouble choosing a movie to watch, so i thought having a narrowed-down list might help me. i also thought writing a program to choose the movie for me might be a worthwhile endeavor...

some starting rules:

- i run the program once a week (maybe i should pick a day)
- i can watch the movie i choose any time after choosing it
- no hard rules about not taking breaks, watching on my laptop, etc. it would be nice to catch almost all of these on a big screen but that's super unlikely, so i'm going to try and be less picky about it
- the program will return 3 movies from the pool of unwatched movies; my selection has to be from the 3 movies
- i can apply one filter a month (e.g. show me movies that are less than 90 minutes long, show me movies that are not in english, etc.). but this might be too hard for me to implement so we will see
- if for any reason it takes more than 30 minutes to find the movie i chose, i can switch to one of the other two movies from that week's three
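
the filter rule might not be too hard if the filter is just a function applied to the pool before the three movies are drawn. a rough sketch, with loud assumptions: the "Runtime" field is hypothetical (the letterboxd list export doesn't include it), the titles and runtimes are made-up placeholders, and pick_three is a name invented for this example.

```python
import random

def pick_three(pool, keep=None):
    """Return up to 3 random movies from pool, optionally filtered first.

    keep is an optional predicate (movie dict -> bool) standing in for the
    once-a-month filter.
    """
    candidates = [m for m in pool if keep(m)] if keep else list(pool)
    return random.sample(candidates, min(3, len(candidates)))

# placeholder data: the titles and the "Runtime" field are made up
pool = [
    {"Name": "movie a", "Runtime": 85},
    {"Name": "movie b", "Runtime": 120},
    {"Name": "movie c", "Runtime": 89},
    {"Name": "movie d", "Runtime": 140},
    {"Name": "movie e", "Runtime": 75},
]

# "show me movies that are less than 90 minutes long"
short_picks = pick_three(pool, keep=lambda m: m["Runtime"] < 90)
print([m["Name"] for m in short_picks])
```

if a filter leaves fewer than three movies, this version just returns whatever is left rather than failing.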

for building the program, at minimum i could write a really barebones python script, but it would be nice to port this onto this blog somehow. it would also be a good opportunity to re-familiarize myself with databases, since that's something i'd like to have in my toolkit for other projects.

first steps:

validating/formatting my data?

i downloaded a zip file containing a bunch of .csv data from my letterboxd account (likes, diary, watchlist, etc.). it includes the list of movies i'm using and the data looks something like this:

Position,Name,Year,URL,Description
1,Beau Travail,1999,https://boxd.it/1NI8,
2,The Housemaid,1960,https://boxd.it/WJs,
3,Atlantics,2019,https://boxd.it/hVEK,
4,All About Lily Chou-Chou,2001,https://boxd.it/1JP0,
5,Shirin,2008,https://boxd.it/35Ue,

not bad! what i need, however, is a column for "watched", and it would be nice to have a way to grab the poster as well. i have some experience using JSON for this type of thing but i wonder if i can do it with a .csv.
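
a .csv can hold the extra column fine. here's a minimal sketch using python's built-in csv module, assuming the new column is called "Watched" and starts out as "no" for every movie (both the column name and the "no" value are my own picks, not anything letterboxd provides). it works on a string here so it runs on its own, and it doesn't address the poster question.

```python
import csv
import io

# a few rows in the same shape as the letterboxd list export
raw = """Position,Name,Year,URL,Description
1,Beau Travail,1999,https://boxd.it/1NI8,
2,The Housemaid,1960,https://boxd.it/WJs,
3,Atlantics,2019,https://boxd.it/hVEK,
"""

def add_watched_column(csv_text):
    """Return the same csv text with an extra "Watched" column set to "no"."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + ["Watched"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["Watched"] = "no"  # nothing has been watched yet
        writer.writerow(row)
    return out.getvalue()

updated = add_watched_column(raw)
print(updated)
```

with a real file, the same reader and writer would wrap open(path, newline="") instead of io.StringIO.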

movie selection logic

i don't need to make this public facing, so i can just write a simple python script that randomly selects three movies that are not marked "watched" (create a separate dataset from the unwatched movies), lets me select 1, 2, or 3 (0, 1, or 2), and then moves that selected movie into the "watched" category or set. this could all be done in bash
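
roughly, that logic could look like the sketch below. the function names (draw, mark_watched) are invented for illustration, and the choice is hard-coded where the real script would presumably ask for input.

```python
import random

def draw(unwatched, k=3):
    """Pick up to k distinct movies from the unwatched list."""
    return random.sample(unwatched, min(k, len(unwatched)))

def mark_watched(choice, unwatched, watched):
    """Move one movie from the unwatched list to the watched list."""
    unwatched.remove(choice)
    watched.append(choice)

unwatched = ["Beau Travail", "The Housemaid", "Atlantics",
             "All About Lily Chou-Chou", "Shirin"]
watched = []

options = draw(unwatched)
for number, name in enumerate(options, start=1):
    print(number, name)

# in a real run this number would come from input(); it is hard-coded
# here so the sketch runs without prompting
picked_number = 2
mark_watched(options[picked_number - 1], unwatched, watched)
print("watched:", watched)
```

showing the options as 1, 2, 3 and subtracting one for the list index covers the "1, 2, or 3 (0, 1, or 2)" detail.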

publishing results

in theory, this shouldn't be too complicated either. it's a little messy to include this script as part of the blog, but if i want to have a dedicated page for it on my blog and publish it via neocities, it's probably the easiest/most straightforward thing. also, this directory i'm working out of really just defines for me "what i'm publishing on the web" and not really anything like "this is specifically MY blog." in any case, i can probably just make a static page that grabs the most recent set of data. so this means i have to store three sets:
- the movies i've watched
- the movies i haven't watched
- the selection pool for a given week, indicating the movie i ended up choosing
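
one possible shape for those three sets is a single JSON document that a static page could read. this is a sketch only: the key names ("watched", "unwatched", "this_week") and the build_state helper are assumptions, and the numbers stand for values from the "Position" column.

```python
import json

def build_state(watched, unwatched, week_pool, chosen):
    """Bundle the three sets into one JSON-serializable dict."""
    if chosen not in week_pool:
        raise ValueError("the chosen movie has to come from this week's pool")
    return {
        "watched": watched,
        "unwatched": unwatched,
        "this_week": {"pool": week_pool, "chosen": chosen},
    }

state = build_state(
    watched=[1],
    unwatched=[2, 3, 4, 5],
    week_pool=[2, 4, 5],
    chosen=4,
)
text = json.dumps(state, indent=2)
print(text)
```

keeping everything in one file means the page only has one thing to load, at the cost of rewriting the whole file each week.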

okay...time to go to my unpaid job. more to come...

time passes

i've finally come up with a first draft of a script i can run to do this (mostly just the functionality from the "movie selection logic" step above). what i'm essentially doing is creating a python list that has the numbers 1-52 (currently i'm calling it pos), each corresponding to a movie in the .csv by the "Position" column. this list keeps track of the movies that i've yet to watch. i also have a small list (sel) that contains three randomly selected numbers from 0 up to, but not including, the length of pos. these three numbers are going to be three indexes into the pos list.

import random

# positions of the unwatched movies: [1, 2, ... 51, 52]
pos = list(range(1, 53))

# 3 distinct random indexes into pos (0-51 while pos is still full)
sel = random.sample(range(len(pos)), 3)

# the 3 positions found at those indexes (looked up by index, not by value)
temp = [pos[i] for i in sel]

so if sel were [15, 3, 51] while pos still holds all 52 numbers, temp would be [16, 4, 52]. i can then use these numbers to reference specific positions in my .csv, which in turn correspond to certain movies.
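
going from those numbers back to actual movies could be a dictionary lookup keyed on the "Position" column. a small sketch, assuming the csv is read with python's csv module; index_by_position is a made-up helper name and the temp values are hard-coded stand-ins for a real draw.

```python
import csv
import io

# the sample rows from the letterboxd export
raw = """Position,Name,Year,URL,Description
1,Beau Travail,1999,https://boxd.it/1NI8,
2,The Housemaid,1960,https://boxd.it/WJs,
3,Atlantics,2019,https://boxd.it/hVEK,
4,All About Lily Chou-Chou,2001,https://boxd.it/1JP0,
5,Shirin,2008,https://boxd.it/35Ue,
"""

def index_by_position(csv_text):
    """Map each Position (as an int) to its csv row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["Position"]): row for row in reader}

movies = index_by_position(raw)

temp = [4, 1, 5]  # stand-in for the three positions drawn from pos
names = [movies[p]["Name"] for p in temp]
print(names)  # ['All About Lily Chou-Chou', 'Beau Travail', 'Shirin']
```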

moving all these numbers back and forth between different buckets matters once i've chosen a movie to watch. i could have marked the movies i've watched by writing separate .csv files, or by creating a new .json dataset, but for the purposes of minimizing the number of files i create and parse through, i think using lists is good enough.

how i'm thinking this will work is that when i've watched a movie, i'll delete the corresponding value in pos. this will make the range that sel randomly chooses from smaller over time. because sel refers to the indexes of items in pos, not their values, it doesn't matter that, say, sel will never return 51 after i've watched one movie: as long as the 52nd movie on my list is still unwatched, it sits at the last index of pos and can still come up.
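
a quick check of that reasoning, pretending movie 16 has just been watched (the number 16 is arbitrary):

```python
import random

pos = list(range(1, 53))   # positions of the movies still unwatched

pos.remove(16)             # pretend movie 16 was watched

# indexes now run 0-50, but the last slot still holds movie 52
sel = random.sample(range(len(pos)), 3)
temp = [pos[i] for i in sel]

print(len(pos), pos[-1], temp)
```

since only the values are needed in the end, random.sample(pos, 3) would give the same kind of result without the index step; the index version is kept here to match the draft above.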

update 01/22¹

i've completed the main logic. in the future i might want to revisit this entry and do a more thorough writeup. for now i'm going to write up a way to publish the log of what i select. currently i need to find a way to import and export the data so that flask/jinja can read it and format it nicely...see /52-movies/ for what it looks like as i test it :)

more to come. i was thinking about how nice it is to comment in code directly, and how fun it is to have a guide on how to use a tool embedded into the tool. i guess oftentimes that practice is called ergonomics or "design" or something like that, but i think of the sprawling editor's notes, marginalia, many apologies and clarifications, self-reflections. i think about this when i'm making something by hand too...texts within texts and something something.

¹ would be nice to do a cool "updated/edit" formatted heading or insert...TBD?

🕳〰️
