Text Transcripts

            Have you ever wondered what your first text to your best friend (or worst enemy) was? Want to know how your texting trends have changed from year to year or month to month? Then this post is for you. I was asking myself similar questions and ended up writing a transcript and statistics generator for texts. The hardest part of getting started was an unreliable iCloud backup: in my experience, if you have too many texts, iCloud will silently fail to back them up. In the end, I resorted to making an unencrypted backup of my phone and located chat.db inside it. This had the added benefit of resetting my iPhone settings, which is how I learned that I’ve been listening to music in mono for at least the last 3 years. Back to the technical details of how Apple stores your messages:

 

The Chat Database

            Apple’s chat.db has a few tables in it, but the relevant ones are chat, message, and the appropriately named chat_message_join. chat appears to identify conversations and holds the identifying contact information (the chat_identifier column). message holds the actual messages sent and received; its important columns are text and is_from_me. Finally, chat_message_join ties the message and chat tables together. For those who are not as familiar with databases, a JOIN statement combines the data from separate tables. In this case, the join table tracks only a chat id, a message id, and the message date. To get the rest of the information about a message, we must join the three tables together on those ids.
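
            As a rough illustration of that three-way join, here is a minimal Python sketch. It builds a tiny in-memory stand-in for chat.db containing only the columns mentioned above, so the table layout, the sample rows, and the fetch_transcript helper are assumptions for demonstration rather than the generator's actual code.

```python
import sqlite3

# Stand-in schema with only the columns discussed above (an assumption;
# the real chat.db has many more columns and tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chat (ROWID INTEGER PRIMARY KEY, chat_identifier TEXT);
CREATE TABLE message (ROWID INTEGER PRIMARY KEY, text TEXT, is_from_me INTEGER);
CREATE TABLE chat_message_join (chat_id INTEGER, message_id INTEGER, message_date INTEGER);
INSERT INTO chat VALUES (1, '+15555550100');
INSERT INTO chat VALUES (2, '+15555550199');
INSERT INTO message VALUES (10, 'hello', 1);
INSERT INTO message VALUES (11, 'hi back', 0);
INSERT INTO message VALUES (12, 'other chat', 0);
INSERT INTO chat_message_join VALUES (1, 11, 2000);
INSERT INTO chat_message_join VALUES (1, 10, 1000);
INSERT INTO chat_message_join VALUES (2, 12, 1500);
""")

# Join the link table to both sides, keep one contact, oldest message first.
QUERY = """
SELECT j.message_date, m.is_from_me, m.text
FROM chat_message_join AS j
JOIN chat AS c ON c.ROWID = j.chat_id
JOIN message AS m ON m.ROWID = j.message_id
WHERE c.chat_identifier = ?
ORDER BY j.message_date
"""

def fetch_transcript(connection, contact):
    """Return (date, is_from_me, text) rows for one contact (hypothetical helper)."""
    return connection.execute(QUERY, (contact,)).fetchall()

rows = fetch_transcript(conn, "+15555550100")
for message_date, is_from_me, text in rows:
    sender = "me" if is_from_me else "them"
    print(message_date, sender, text)
```

            Passing the contact as a bound parameter (the ? placeholder) rather than pasting it into the SQL string keeps odd characters in an identifier from breaking the query.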

 

            One peculiarity is the handling of dates. Apple records message dates as the number of seconds since January 1st, 2001 multiplied by one billion (in other words, nanoseconds) rather than the standard Unix timestamp of seconds since January 1st, 1970. Perhaps this was a technical decision, as no texts were sent on iPhones prior to the 21st Century (excluding time travelers).
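
            The conversion is simple arithmetic, sketched below in Python. The function names are made up for this example, and the sketch assumes the stored value is in nanoseconds as described above and that the 2001 epoch is in UTC.

```python
from datetime import datetime, timedelta, timezone

# Both epochs are taken to be midnight UTC (an assumption of this sketch).
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_OFFSET_SECONDS = int((APPLE_EPOCH - UNIX_EPOCH).total_seconds())
NANOS_PER_SECOND = 1_000_000_000

def apple_to_unix(raw):
    """Convert a nanoseconds-since-2001 value to whole Unix seconds."""
    return raw // NANOS_PER_SECOND + EPOCH_OFFSET_SECONDS

def apple_to_datetime(raw):
    """Convert a nanoseconds-since-2001 value to an aware UTC datetime."""
    return APPLE_EPOCH + timedelta(seconds=raw // NANOS_PER_SECOND)

# One day after the Apple epoch, expressed in nanoseconds.
one_day = 86_400 * NANOS_PER_SECOND
print(apple_to_datetime(one_day).isoformat())
print(apple_to_unix(one_day))
```

            The difference between the two epochs works out to 978,307,200 seconds, so a Unix timestamp is the stored value divided by one billion plus that offset.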

 

            The database also has important information about types of messages. Apple introduced various message types over the last 5 years (reactions, expressive, drawn) and these need to be handled. As you may have suspected when sending a reaction to your friend on an Android device, each reaction is a separate SMS message. iPhones know how to interpret this message and transform the chat, but other devices do not and display it as a standard text message. Reactions are identified by the associated_message_type column of the message table: “Loved” is 2000, “Liked” is 2001, “Disliked” is 2002, “Laughed at” is 2003, “Exclamation point” is 2004, and “Questioned” is 2005. Removing a reaction uses the same offsets from 3000, so removing a “Loved” is 3000. The text of these reaction messages is written in the sender’s phone language, so you can see whether your contact has changed their phone’s language from these messages. Another special message type is expressive messages (a long press on the send button). These are identified by the expressive_send_style_id column of the message table. Messages that are hand drawn are sent without any text, which can lead to issues if you make bad assumptions about text length or about the text field being present in all messages.
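
            A small Python sketch of how those codes might be decoded follows. The describe_reaction and safe_length helpers are hypothetical names for this example; only the numeric values come from the description above.

```python
# Offsets shared by the "added" (2000-based) and "removed" (3000-based) codes.
REACTIONS = {
    0: "Loved",
    1: "Liked",
    2: "Disliked",
    3: "Laughed at",
    4: "Exclamation point",
    5: "Questioned",
}
ADD_BASE = 2000
REMOVE_BASE = 3000

def describe_reaction(associated_message_type):
    """Return (action, reaction) for a reaction code, or None for anything else."""
    if associated_message_type is None:
        return None
    for base, action in ((ADD_BASE, "added"), (REMOVE_BASE, "removed")):
        name = REACTIONS.get(associated_message_type - base)
        if name is not None:
            return action, name
    return None

def safe_length(text):
    """Length of a message body, treating a missing text field as empty."""
    return len(text) if text else 0

print(describe_reaction(2001))
print(describe_reaction(3003))
print(safe_length(None))
```

            Treating a missing text value as length zero is one way to keep hand-drawn messages from tripping up the statistics.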

 

The Transcript

            The output has two parts: a transcript and summary statistics. The transcript is a log with the date, text, and sender of each message found between you and the specified contact. It is ordered by message date, starting with your first message. Following that are various statistics: total messages sent/received, messages found to match text patterns, message times (times of day, years, months, seasons), and finally the largest time differentials between messages. Finding the differentials between messages was the most technically challenging part. Because the SQLite API passes each result from the database to a callback function, we can think about accessing the messages (and their associated data) as a finite stream. For tracking the largest or smallest values in a stream of data, a heap is one of the better data structures. As each differential is calculated, it is inserted into the heap (or discarded if it is smaller than the smallest value the heap currently holds). This code uses a fixed size maximum heap to track differentials of time between messages. The size of the heap is configurable, so you can get down into the details of the times you’ve had 10-minute pauses as well as when you were ignored for a month.
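
            To make the streaming idea concrete, here is a minimal Python sketch of keeping only the k largest gaps while messages arrive one at a time. It is not the generator's code: it uses Python's heapq, which is a min-heap, so the smallest gap being kept sits at the root and each new gap needs only one comparison. The largest_gaps name and the sample timestamps are made up, and the timestamps are assumed to arrive in ascending order.

```python
import heapq

def largest_gaps(timestamps, k):
    """Return the k largest gaps between consecutive timestamps, biggest first.

    Each result is (gap, start, end). Memory use stays at k entries no
    matter how long the stream is.
    """
    heap = []  # min-heap: heap[0] is the smallest gap currently kept
    previous = None
    for current in timestamps:
        if previous is not None and k > 0:
            entry = (current - previous, previous, current)
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry[0] > heap[0][0]:
                # New gap beats the smallest one kept: swap it in.
                heapq.heapreplace(heap, entry)
        previous = current
    return sorted(heap, reverse=True)

# Made-up message times in seconds; the gaps are 10, 5, 85, 1, and 59.
sample = [0, 10, 15, 100, 101, 160]
print(largest_gaps(sample, 2))
```

            Because the function accepts any iterable, it could be fed directly from a database cursor, which is the Python analogue of receiving one row per callback.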