So how do you do this? There are a number of steps that have to happen before you can give an email a thumbs up or down, so if you're not into this sort of semi-technical stuff, you might want to go find a cat video for a while.
So what happens when you start your email client? To you it looks like it shows you your inbox, and you're happy. Behind the scenes there's more going on: a series of steps, each with its own points of failure and quirks. For my example below I'm working with the IMAP protocol. There are plenty of others out there (POP3, Exchange, Lotus Notes, and so forth), but they work about the same, more or less.
First off, your client connects to the server. For this it needs to know the server address, the particular port, and what security protocol to use. The address is pretty much like the address of a building - it tells you where the server is located. Now there is a lot of other magic out there for taking "imap.gmail.com" and getting to the correct server that holds your email, but that is for later. The port specifies what to connect to at that address. Think of it as a particular office in the building that your address points to. This building has a lot of offices, all of them with some purpose, most of which we don't care about right now. We want to go to office 993 and only there, since whatever is in that office will know how to talk to us and vice versa. The final bit is the security protocol. While you could communicate back and forth in plain text, this isn't really recommended. Basically the security protocol encrypts the text going back and forth between you and the server.
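To make the address/port/security trio concrete, here is a minimal sketch using Python's standard imaplib. The ImapEndpoint class and the connect function are names I made up for illustration, and imap.example.com is a placeholder host, so treat this as one possible shape rather than how any particular client does it.

```python
import imaplib
import ssl
from dataclasses import dataclass


@dataclass(frozen=True)
class ImapEndpoint:
    """The three things a client needs before it can connect (hypothetical helper)."""
    host: str              # the "building": which server to reach
    port: int = 993        # the "office": the usual port for IMAP over TLS
    use_tls: bool = True   # the security protocol: encrypt the conversation


def connect(endpoint):
    """Open a connection to the server. No user information is sent yet."""
    if endpoint.use_tls:
        # The default context verifies the server's certificate.
        context = ssl.create_default_context()
        return imaplib.IMAP4_SSL(endpoint.host, endpoint.port, ssl_context=context)
    # Plain text: everything, including the password, is readable on the wire.
    return imaplib.IMAP4(endpoint.host, endpoint.port)


# Placeholder host; substitute your provider's IMAP server before calling connect().
endpoint = ImapEndpoint("imap.example.com")
```

Nothing touches the network until connect(endpoint) is actually called, so the settings can be built and checked up front.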
And by text, I mean human readable text (in most cases). If you could look at the data being sent back and forth, you would see that it looks a lot like a command-line session. In fact, if you were so inclined you could use a terminal emulator and talk directly to the email server, manage your email, and read and write emails directly. I prefer an email client, though. Things are so much simpler that way.
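To give a feel for what that text looks like, here is a small sketch that builds the client's side of a session. IMAP clients prefix every command with a short tag so replies can be matched to requests; the tag format (a001, a002, ...) and the user name and password below are just illustrative choices.

```python
from itertools import count


def tagged_commands(commands):
    """Prefix each command with a unique tag, the way an IMAP client does.

    The server repeats the tag in its final reply, so the client can tell
    which command a response belongs to.
    """
    tags = (f"a{n:03d}" for n in count(1))
    return [f"{tag} {command}" for tag, command in zip(tags, commands)]


session = tagged_commands([
    "LOGIN alice secret",   # hypothetical user name and password
    'LIST "" "*"',          # ask for every folder
    "SELECT INBOX",         # open the inbox
    "LOGOUT",               # say goodbye
])

for line in session:
    print(line)
```

These are the kind of lines you would be typing by hand in that terminal session.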
So after we connect, the server would like us to tell it who we are. The user ID and password are used for this. Now because we are connecting securely, this information can simply be passed along as readable text inside the encrypted connection. You are connected securely, aren't you? Once you are authenticated, you're ready.
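One way to enforce the "you are connected securely, aren't you?" check in code is to refuse to log in over anything but an encrypted connection. This is only a sketch: login_securely is a name I made up, and the isinstance test is a simplification that would also reject a plain connection that was later upgraded with STARTTLS.

```python
import imaplib


def login_securely(conn, user, password):
    """Send credentials only if the connection is encrypted (simplified check)."""
    if not isinstance(conn, imaplib.IMAP4_SSL):
        raise RuntimeError("refusing to send a password over a plain-text connection")
    # imaplib raises IMAP4.error if the server rejects the credentials.
    status, _ = conn.login(user, password)
    return status == "OK"
```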
Well, up to a point. Now you probably need to see what is out there. Usually your client will get a list of all the directories and subdirectories on the server, so that it can present them in a way pleasing to the user. There are a number of special directories, such as the trash, the inbox, and so forth, that we'll come back to later.
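Here is a rough sketch of how the directory listing might be read with imaplib. The server answers with one line per directory, holding some flags, a hierarchy delimiter, and the name. The regular expression and helper names are my own, and the parsing is simplified: it assumes quoted or plain names and does not decode non-ASCII folder names.

```python
import re

# One LIST reply line looks like: (\HasNoChildren) "/" "INBOX"
LIST_LINE = re.compile(rb'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)')


def parse_list_line(line):
    """Split one LIST reply line into (flags, delimiter, name)."""
    match = LIST_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized LIST line: {line!r}")
    flags = match.group("flags").decode().split()
    delim = match.group("delim").decode()
    delim = None if delim == "NIL" else delim.strip('"')
    name = match.group("name").decode().strip('"')
    return flags, delim, name


def list_folders(conn):
    """Ask the server for every directory and parse the replies."""
    status, lines = conn.list()
    return [parse_list_line(line) for line in lines if isinstance(line, bytes)]


def is_trash(flags):
    """Some servers mark the trash with a \\Trash flag; not all of them do."""
    return "\\Trash" in flags
```

On servers that don't send the special flags, a client usually has to fall back on guessing from the folder name.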
Now that we know what the directories are, we will want to know what is in them. So your client will usually open the folder you are interested in, read the list of emails, and manipulate them as it needs to. Sometimes it will maintain a list of emails locally, sometimes it will only work with the ones on the server. In our case we'll assume that it only fetches the header information for each email.
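A header-only listing could look something like the sketch below. The particular header fields and the function names are my choices for illustration; BODY.PEEK asks for the data without marking anything as read, and the folder is opened read-only for the same reason.

```python
import email

# Only ask for a handful of headers; PEEK leaves the "read" state untouched.
HEADER_QUERY = "(BODY.PEEK[HEADER.FIELDS (FROM SUBJECT DATE MESSAGE-ID)])"


def parse_header_block(raw):
    """Turn one block of raw header bytes into a small dictionary."""
    msg = email.message_from_bytes(raw)
    return {
        "from": msg["From"],
        "subject": msg["Subject"],
        "date": msg["Date"],
        "message_id": msg["Message-ID"],
    }


def fetch_headers(conn, folder="INBOX"):
    """List the headers of every email in a folder without downloading bodies."""
    conn.select(folder, readonly=True)
    status, data = conn.uid("FETCH", "1:*", HEADER_QUERY)
    # imaplib returns (envelope, payload) tuples mixed with stray byte strings.
    return [parse_header_block(part[1]) for part in data if isinstance(part, tuple)]
```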
Now you, the user, select an email. Your email client opens the folder, retrieves the requested email, marks it as read, and closes the folder, before formatting and showing the email to you. Deleting an email follows the same pattern - open the folder, do the action, close the folder. Now the client can keep the folder open for as long as you remain in it, and yours probably does, at least until you change to a new folder.
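The open, act, close pattern can be sketched with imaplib like this. The function names are mine, and the code assumes conn is an already authenticated connection and uid is the email's UID as a string. It is a simplified illustration, not what any specific client does.

```python
import email


def _open(conn, folder):
    """Open a folder and complain loudly if the server refuses."""
    status, _ = conn.select(folder)
    if status != "OK":
        raise RuntimeError(f"could not open folder {folder!r}")


def read_message(conn, folder, uid):
    """Open the folder, fetch one email, mark it as read, close the folder."""
    _open(conn, folder)
    try:
        status, data = conn.uid("FETCH", uid, "(BODY.PEEK[])")
        bodies = [part[1] for part in data if isinstance(part, tuple)]
        if not bodies:
            raise LookupError(f"no email with UID {uid} in {folder!r}")
        conn.uid("STORE", uid, "+FLAGS", "(\\Seen)")   # mark as read
        return email.message_from_bytes(bodies[0])
    finally:
        conn.close()


def delete_message(conn, folder, uid):
    """Same pattern: open the folder, flag the email as deleted, close."""
    _open(conn, folder)
    try:
        conn.uid("STORE", uid, "+FLAGS", "(\\Deleted)")
        conn.expunge()   # actually remove everything flagged as deleted
    finally:
        conn.close()
```

A client that keeps the folder open would simply skip the close until you switch folders.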
Now email is pretty much a pull technology - the server waits for a request from the client before it responds with the data. It is also single threaded, in that you cannot have multiple requests for data going at the same time. There are some push models out there (for instance, IMAP supports a Subscribe/Idle set of commands), but it is still pretty much one connection, one line of inquiry.
So how can we monitor multiple folders at the same time? IMAP will only let you have a single folder open at a time on a connection - trying to open several won't hurt anything, but it also probably won't give you what you want.
The first way is opening multiple connections to the server. This would allow each folder to be monitored separately, on its own connection. It does have the advantage of keeping everything separate, but then you have to manage multiple connections. It also has the disadvantage that some servers only allow a small number of connections to be open at a time, and keeping all of these connections active takes up resources that would be better used elsewhere.
The second is polling the server periodically to see if there are any changes. This has the advantage of only keeping a single connection open, but the onus is on the client to keep asking the server what is new.
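A polling pass might be sketched like this: remember the set of UIDs seen in each folder, ask again later, and compare. The function names and the snapshot dictionary are my own invention, and the sketch assumes the UIDs stay stable between polls (which only holds while the folder's UIDVALIDITY value is unchanged).

```python
def folder_uids(conn, folder):
    """Return the set of UIDs currently in a folder."""
    conn.select(folder, readonly=True)
    status, data = conn.uid("SEARCH", "ALL")
    if not data or not data[0]:
        return set()
    return {int(uid) for uid in data[0].split()}


def diff_uids(before, after):
    """Compare two snapshots of one folder."""
    return {"added": sorted(after - before), "removed": sorted(before - after)}


def poll_once(conn, folders, snapshot):
    """Visit every folder on one connection and report what changed.

    `snapshot` maps folder name -> set of UIDs from the previous pass and
    is updated in place.
    """
    changes = {}
    for folder in folders:
        current = folder_uids(conn, folder)
        changes[folder] = diff_uids(snapshot.get(folder, set()), current)
        snapshot[folder] = current
    return changes
```

Calling poll_once on a timer gives the single-connection behaviour described above; how often to call it is a trade-off between responsiveness and load on the server.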
A hybrid approach could be used. We could poll the folders periodically, but use Subscribe/Idle to have the server push changes in one of the folders (like the inbox) to us as soon as they happen.
So I started with the subscribe model on my categorizer. After a couple of false starts, a reading of the RFC for IMAP, and the realization that what I wanted was not within the capabilities of the IMAP protocol (hey, I want to subscribe and monitor all folders, not just one), I backed off of this and went to a polling model. While this is working, I will probably implement a version of the hybrid approach later this week so it will be more responsive.
The watcher will keep an eye on the inbox, mainly to watch for new emails arriving. As each email arrives, it will run it through the categorization algorithms and move it to a new folder.
The polling will watch all of the folders to detect changes, primarily to find the moves and deletes. A deletion will signify that the email was categorized correctly, while a move will usually mean that it wasn't. Since I'll only be training when the email is deleted, or possibly moved to the trash, I'll need to detect these correctly.
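Telling a move from a delete could be sketched as below. This is not my actual categorizer code, just one way it might work: take a snapshot of each folder before and after a poll, identify emails by their Message-ID header, and see where each missing email ended up. The function name, the "Trash" folder name, and the assumption that an email lives in only one folder at a time are all illustrative.

```python
def classify_changes(before, after, trash="Trash"):
    """Decide whether each email that left a folder was moved or deleted.

    `before` and `after` map folder name -> set of Message-ID strings.
    An email that vanished entirely, or turned up in the trash, counts as
    deleted; one that turned up in any other folder counts as moved.
    """
    # Where is each email now? Assumes one folder per email.
    where_now = {mid: folder for folder, ids in after.items() for mid in ids}
    events = []
    for folder, ids in before.items():
        for mid in sorted(ids - after.get(folder, set())):
            new_home = where_now.get(mid)
            if new_home is None or new_home == trash:
                events.append((mid, "deleted", folder, new_home))
            else:
                events.append((mid, "moved", folder, new_home))
    return events


before = {"Newsletters": {"<a@x>", "<b@x>"}, "Work": {"<c@x>"}, "Trash": set()}
after = {"Newsletters": set(), "Work": {"<c@x>", "<a@x>"}, "Trash": {"<b@x>"}}
events = classify_changes(before, after)
```

In this example the first email counts as a miss (moved from Newsletters to Work) and the second as a hit (deleted), which is the signal the training step would care about.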
So, today's takeaways -
Mail libraries cover most of the details, but not everything, and especially not the process.
Sometimes you have to go back to the IETF RFC for something to see how it works.
Just because you can send a command doesn't mean you should send the command.
Everything was simpler back when this stuff was first done, at least in hindsight.

