A Cautionary Time Tale
Americans Suck at (Talking About) Time
Jay in California just got home from work Friday afternoon, and realized he forgot to call the India branch about the new sales reports.
Jay (05:02PM PST Friday April 4th): Afternoon! I'm sorry for calling so late, but I forgot to mention at our call this morning that we'd like the sales reports on Monday morning (the 7th). Can you do that?
Pradeep (05:32AM IST Saturday April 5th): Urghle. How did you get my cell number? Kidding, I can do that. Monday morning? I've got some questions about the output format, so can I call you first thing on Monday to sort them out before I generate it? Now isn't a good time.
Jay of course agrees. Monday sounds like a great time to deal with sales reports because he's about to head out to an awesome party.
Jay puts down his drink and heads to the men's room.
Pradeep (08:04AM IST Monday April 7th): Morning Jay!
Jay (07:34PM PST Sunday April 6th): Um, Hi, Pradeep?
Pradeep: Yup. So about the columns. Did you want a date column or a datetime column?
Jay: Date is plenty specific.
Pradeep: Great. Do you want the dates in IST or PST?
Jay: Does it matter?
Pradeep: As long as you care if the trends are half-a-day off, it matters. Will that affect your projections?
Jay: Oh, yeah. Ummm, let's do IST.
Pradeep: Okay! I'll send that right over.
Jay: Okay, the date in all these reports from India and Austin look like '2014-03-31'. So I can easily parse that.
Now, let's integrate with the Austin data.
Hmm, that looks wrong. I get 8 records when I group by date for last week's report, and there should obviously only be 7.
Let's see. Oh! The last few sales are on Monday from Pradeep. Because they are in IST which is 12.5 hours forward and the sales cutoff time is always specified in PST. Riiiiight. So I need to convert them to PST before I insert them. Okay! 7 records!
Now let's make a stacked area chart of sales from Austin and India together.
So I build a date series list by truncating the Sunday's time to 00:00:00 and subtracting 1 day from that 6 times. I fill in the existing data for both the India and Austin offices, and mark all the dates that didn't sell anything as 0s because stacked area charts don't allow missing data.
Hum. The charting library throws an error. I wonder why? If I open the series data...oh. There are more than seven data points in here again, by seemingly-random amounts. What's that?
Okay, so if I look at a date that has no sales it looks like this:
Fri Apr 04 2014 00:00:00 GMT-0700 (PDT)
What if I pick one that has some sales?
Thu Apr 03 2014 17:00:00 GMT-0700 (PDT)
Eep, 5:00! That's not what I want!
So apparently the time is coming back from the database in UTC even though I clearly want it to come back in Pacific Time? GOT IT. Wishful thinking never got me anywhere.
Time is hard.
Computers are Only as Good at Time as You Are
Thinking about Time
When I put a date in the database, at some point, I want to pull it out again.
When I pull it out again, I want to be able to turn it into something useful, which may very likely be Unix time, because I like doing math on numbers rather than crazy complicated time strings.
Unix time is defined as:
Unix time, or POSIX time, is a system for describing instants in time, defined as the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, not counting leap seconds.
Coordinated Universal Time is defined as:
Coordinated Universal Time (French: Temps Universel Coordonné, UTC) is the primary time standard by which the world regulates clocks and time. It is one of several closely related successors to Greenwich Mean Time (GMT). For most purposes, UTC is synonymous with GMT, but GMT is no longer precisely defined by the scientific community.
Without going into further details, what this fails to spell out most obviously to most people is that Unix time has an associated time zone. More specifically:
Time zones around the world are expressed as positive or negative offsets from UTC, as in the list of time zones by UTC offset.
The upshot of all this is that if you have a date, and you want someday to turn it into Unix time so you can do math on it, which yeah, you probably do...you have to associate a time zone with it. When you want to express a date object stored as a time object for its Unix representation (which you will unless you want to calculate seconds from epoch by hand -- which you don't), it is a different number depending on the time zone of the time object.
The same is, of course, true of times.
If you want to output a time object to a string, you can either use the associated time zone, or you can convert the time object to a different time zone and output that.
Most languages will associate time objects with UTC if you don't specify the time zone explicitly.
Working with Time
Unless you are working with imaginary times and dates in fictional tales that will never be forced to interact with real-world mechanics or real people that live in different time zones, you must follow the following rules to retain correctness:
Never Ever Parse Without Time Zone
Never parse a date without a time zone. There will always be a time zone (whether it is implicit or explicit) by dates that you get, and unless you don't actually care whether you data is up to a day off, you should be using the UTC default instead of the known time zone information.
Always Assign/Associate Dates With Time Zone
As a correllary to the above that is easy to forget: Always associate a time zone with dates coming from the database or other libraries that create time objects for you. If you don't, libraries producing times will generally assume the associated time is UTC. You should put your dates into your database as UTC, so the default will just work. But I know you, American developer, and you didn't do that when you designed your schema and all the dates you've been putting in are probably PST. So if start putting in your dates in as a time zone other than UTC, then when you pull data from the database, you must associate PST with that data, or your Unix time representations will be wrong.
Always Convert To (Preferably) UTC Before Storage
Always convert the date or time to a predefined time zone before inserting it into your database. This will ensure that you can do the above. Generally UTC is recommended. But being consistent within your codebase is more important than agreeing with what the worldwide developer community may think is a good idea. So if you can't convert your existing code and data to UTC, then put it in as whatever time zone that you have been using.
Don't Date in a Hurry (Or Without Testing)
If you are doing date manipulation in a hurry, the odds that you will get all this stuff right the first time are just not with you. In the end you will likely find the insanity and naivety of manipulating dates without tests to be a bad plan.
Dates are Times
In the vast vast majority of cases, a date is a time and all of the above apply to dates as well as times. In the example above, when Jay called Pradeep, Pradeep was half a day ahead what Jay expected. When Pradeep called Jay, he was half a day behind what Pradeep expected. If you don't treat dates as carefully as your times, your data may be off by as much as +/-1 day. If that is acceptable to you, you may want to consider not using dates at all, but using weeks, or months, instead as your grouping factor.