MySQL DATETIME vs TIMESTAMP vs INT performance and benchmarking with MyISAM


Recently there was a discussion about DATETIME vs TIMESTAMP vs INT performance and storage requirements. If a setup is bound by disk usage, one would expect INT to perform better, since its storage requirements are smaller. However, the amount of calculation needed to use it as a timestamp can be overwhelming as well.

So I went to do some benchmarks to see what we can conclude. The tests were run with:

  • MySQL 5.4.0-beta
  • Intel Quad core x 2800 MHz
  • Solaris 10
  • Single-threaded script to generate the data, and later on simple LOAD DATA INFILE statements (to be deterministic)
  • For each test the tables were recreated, to be positively sure there was no caching at all (I also flushed the filesystem cache, though not on every iteration), and each test itself was repeated a couple of times

The schema used was basically this:

CREATE TABLE `test_datetime` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`datetime` FIELDTYPE NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;

FIELDTYPE being either DATETIME, TIMESTAMP, or INT. As you may already have guessed, the INT timestamp will be UNIX style, obtained with the UNIX_TIMESTAMP() built-in function, which obviously adds some work for the server. However, remember that calculations are done for DATETIME and TIMESTAMP as well, since they are stored as a kind of packed value.
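For instance, the INT variant looks like this once FIELDTYPE is substituted (a sketch; the sample timestamp matches the values used later in this article, and the result of UNIX_TIMESTAMP() depends on the server timezone):

```sql
CREATE TABLE `test_int` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`datetime` INT NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;

-- Feeding it requires a conversion, done either by the server ...
INSERT INTO `test_int` (`datetime`) VALUES (UNIX_TIMESTAMP('1970-01-01 01:30:00'));
-- ... or in the application, which then passes a plain integer:
INSERT INTO `test_int` (`datetime`) VALUES (1800);
```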

The configuration file was pretty default, but for this test I don’t think there was much tweaking possible, since it’s all about INSERTs and table scans:

skip-locking
key_buffer = 128M
max_allowed_packet = 1M
table_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 8M
thread_cache_size = 8
query_cache_type = 0
query_cache_size = 0
thread_concurrency = 4

For the initial generation of data, I wrote a small script to sequentially load 10,000,000 rows of timestamped data. The results were:

           avg        min        max      Data_length
DATETIME   14111      14010      14369    130000000
TIMESTAMP  13888      13887      14122     90000000
INT        13270      12970      13496     90000000

[Graph: insert performance per field type, MyISAM]
We can see that DATETIME performed best and that INT was indeed the slowest. So far it seems the time conversion function has a notable impact, considering that the same I/O is done for TIMESTAMP. That's why I did another test: I dumped the data using SELECT … INTO OUTFILE, so that the already-converted data would be available:


mysql> select * from test_datetime into outfile '/tmp/test_datetime.sql';
Query OK, 10000000 rows affected (6.19 sec)


mysql> select * from test_timestamp into outfile '/tmp/test_timestamp.sql';
Query OK, 10000000 rows affected (8.75 sec)


mysql> select * from test_int into outfile '/tmp/test_int.sql';
Query OK, 10000000 rows affected (4.29 sec)

Interesting how things turned around: we have now eliminated the conversion for INT, but since the DATETIME and TIMESTAMP fields are exported as plain strings, they have to be converted back for every row. Reading about the calculations done for both types makes it easier to understand the difference: converting the more tightly packed TIMESTAMP back to its textual form costs more than unpacking a DATETIME.
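For reference, the exported rows look roughly like this (a sketch of the default tab-separated OUTFILE format, one sample line per table type):

```
1	1970-01-01 01:30:00    <- test_datetime / test_timestamp: textual form, must be reparsed
1	1800                   <- test_int: the raw integer, nothing to convert
```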

I did more testing. In order to use the same data for the queries below, which will be mostly READ, I did a quick transformation to the data:

alter table test_datetime rename test_int;
alter table test_int add column datetimeint INT NOT NULL;
update test_int set datetimeint = UNIX_TIMESTAMP(datetime);
alter table test_int drop column datetime;
alter table test_int change column datetimeint datetime int not null;
select * from test_int into outfile '/tmp/test_int2.sql';
drop table test_int;

So now I have exactly the same timestamps from the DATETIME test, and it will be possible to reuse the originals for TIMESTAMP tests as well.

mysql> load data infile '/export/home/ntavares/test_datetime.sql' into table test_datetime;
Query OK, 10000000 rows affected (41.52 sec)
Records: 10000000 Deleted: 0 Skipped: 0 Warnings: 0


mysql> load data infile '/export/home/ntavares/test_datetime.sql' into table test_timestamp;
Query OK, 10000000 rows affected, 44 warnings (48.32 sec)
Records: 10000000 Deleted: 0 Skipped: 0 Warnings: 44


mysql> load data infile '/export/home/ntavares/test_int2.sql' into table test_int;
Query OK, 10000000 rows affected (37.73 sec)
Records: 10000000 Deleted: 0 Skipped: 0 Warnings: 0

As expected, INT loads fastest since it is simply stored as-is, while the others have to be recalculated. Notice how TIMESTAMP still performs worse, even though it uses half of DATETIME's storage size.

Let’s check the performance of full table scan:

mysql> SELECT SQL_NO_CACHE count(id) FROM test_datetime WHERE datetime > '1970-01-01 01:30:00' AND datetime < '1970-01-01 01:35:00';
+-----------+
| count(id) |
+-----------+
|    211991 |
+-----------+
1 row in set (3.93 sec)


mysql> SELECT SQL_NO_CACHE count(id) FROM test_timestamp WHERE datetime > '1970-01-01 01:30:00' AND datetime < '1970-01-01 01:35:00';
+-----------+
| count(id) |
+-----------+
|    211991 |
+-----------+
1 row in set (9.87 sec)


mysql> SELECT SQL_NO_CACHE count(id) FROM test_int WHERE datetime > UNIX_TIMESTAMP('1970-01-01 01:30:00') AND datetime < UNIX_TIMESTAMP('1970-01-01 01:35:00');
+-----------+
| count(id) |
+-----------+
|    211991 |
+-----------+
1 row in set (15.12 sec)

Then again, TIMESTAMP performs worse, and the recalculations seemed to have an impact, so the next good thing to test was without those recalculations: find the integer equivalents of those UNIX_TIMESTAMP() values, and use them directly:

mysql> select UNIX_TIMESTAMP('1970-01-01 01:30:00') AS lower, UNIX_TIMESTAMP('1970-01-01 01:35:00') AS bigger;
+-------+--------+
| lower | bigger |
+-------+--------+
|  1800 |   2100 |
+-------+--------+
1 row in set (0.00 sec)


mysql> SELECT SQL_NO_CACHE count(id) FROM test_int WHERE datetime > 1800 AND datetime < 2100;
+-----------+
| count(id) |
+-----------+
|    211991 |
+-----------+
1 row in set (1.94 sec)

[Graph: range scan performance per field type, MyISAM]

It's very important to note that before the last SELECT I forced a filesystem cache flush. But finally INT shows a huge difference: in fact roughly half the time, as one could expect, probably due to the storage size (half of DATETIME's). I also tried to benchmark UNIX_TIMESTAMP() alone, but could not come to any conclusion:

mysql> select benchmark(10000000,UNIX_TIMESTAMP('1970-01-01 01:35:00'));
+------------------------------------------------------------+
| benchmark(10000000,UNIX_TIMESTAMP('1970-01-01 01:35:00'))  |
+------------------------------------------------------------+
|                                                          0 |
+------------------------------------------------------------+
1 row in set (1 min 7.02 sec)

And this result is indeed weird. Does it mean the function is not actually called twice for each row?
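One way to probe whether the server folds the constant (a sketch, not from the original tests) is to compare BENCHMARK() with a constant argument against one that must be re-evaluated. SYSDATE(), unlike NOW(), is evaluated at each invocation, so the second run should reflect the true per-call cost:

```sql
-- Constant argument: the expression may be evaluated only once.
SELECT BENCHMARK(10000000, UNIX_TIMESTAMP('1970-01-01 01:35:00'));

-- Non-constant argument: UNIX_TIMESTAMP() must run on every iteration.
SELECT BENCHMARK(10000000, UNIX_TIMESTAMP(SYSDATE()));
```

If the constant variant is dramatically faster, the expression is likely evaluated once and cached rather than called per iteration.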

Regarding TIMESTAMP vs DATETIME, I see only problems with the former: it's limited to dates after 1970 (like INT, except that with INT you can use your own time offset), it's slower in every aspect, and remember that '0' in a TIMESTAMP means '0000-00-00 00:00:00' and not '1970-01-01 00:00:01'. According to a related test by the DBTuna team comparing range scans on both data types, TIMESTAMP comes out 55% slower, so I am starting to question this data type's existence…
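Both gotchas can be seen with a quick test like this (a sketch, not part of the benchmark; the exact behaviour depends on the SQL mode and server timezone):

```sql
CREATE TABLE ts_gotchas (ts TIMESTAMP, dt DATETIME);

-- 0 is the "zero date", not the epoch: both columns end up
-- as '0000-00-00 00:00:00'.
INSERT INTO ts_gotchas VALUES (0, 0);

-- A pre-1970 date is out of TIMESTAMP's range (rejected, or stored as
-- the zero date with a warning), while the DATETIME column accepts it.
INSERT INTO ts_gotchas VALUES ('1969-12-31 23:00:00', '1969-12-31 23:00:00');
```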

Anyway, what I've tried to demonstrate are the usage scenarios you'll need to consider for your own real cases: INT remains smaller in storage (50% of DATETIME) and will only perform better if the INSERTs and SELECTs are already fed with an INT value, which is especially relevant for WRITE-intensive scenarios; DATETIME, on the other hand, takes extra responsibility and care away from the developer. Programmers don't usually care about this and want the most flexibility from the database, so it's up to you to work out a compromise with them. I may have provided both sides enough arguments for an endless discussion, though… :-)




9 Responses to “MySQL DATETIME vs TIMESTAMP vs INT performance and benchmarking with MyISAM”

  1. [...] my tests with DATETIME vs TIMESTAMP vs INT performance and benchmarking with MyISAM storage engine, I’ve wondered about the performance impact using InnoDB, which is usually [...]

  2. Interesting post. I find it a bit misleading, unless one insists on reading the gory details.

    What you’ve proven is that when the “UNIX time” is calculated in advance, INT wins. This happens to be the case in real life: the application can easily obtain the time and send it to the database as data. As for SELECTs, we typically want to fetch rows between time X and Y, which are both easily converted to integers by the application and then passed to the database as plain numbers.

    Unfortunately, the first graph gives the impression that INT is the loser here. And let’s face it, humans watch the pictures, not the text…

  3. Hi Eli, thanks for the input. Actually, the change you suggest becomes extra work, especially when dealing with frameworks; and let’s face it: DATETIME conversion to INT really sounds more like a hack than a solution. Which is pretty normal when you’re optimizing, though maybe not when you’re developing.

    Let’s see: from what I’ve seen as a DBA, you and I are both right, but the “carefulness” you’re asking for is generally neglected by developers. They just don’t care.

    So in the end, what I tried to expose here is 1) which one is fastest (INT), and 2) when that “speed increase” may turn against you (like having to compare dates), which you would probably compensate for by always using DATETIME (OK, I should have done some JOINs, although in MySQL even a SELECT is considered a JOIN…).

    I agree 100% about the lack of success of the last graph, though. Maybe rows/sec would be clearer…

  4. [...] MySQL DATETIME vs TIMESTAMP vs INT performance and benchmarking with MyISAM [...]

  5. Should you use DATETIME or TIMESTAMP format in MySQL if you do many ORDER BYs on this field? …

    This was an interesting question; my theory was that DATETIME would be faster as it’s searching a smaller space. It turns out that it’s correct.

    http://gpshumano.blogs.dri.pt/2009/07/06/mysql-datetime-vs-timestamp-vs-int-performance-and-benchmarking...

  6. Converting ’1970-01-01 01:30:00′ with UNIX_TIMESTAMP() takes the database extra time. With an INT-typed date, try to write the query like “SELECT * FROM table WHERE int_date > 1111111111″, so that the INT date is also compared against an INT.

  7. [...] Ans:: This was an interesting question; my theory was that DATETIME would be faster as it’s searching a smaller space. It turns out that it might be. mysql-datetime-vs-timestamp-vs-int-performance [...]

  8. [...] If you do start using INT, just don’t use the time-related functions in your queries (e.g. UNIX_TIMESTAMP()), because they perform poorly. I did my own checking of these “theories” years ago. For the curious, I recommend for instance the benchmark ntavares did in 2009. [...]

  9. “And this result is indeed weird, meaning that the function is not called twice for each row..?”

    Well, it should be deterministic, meaning one call with the same value gets you the same result; there is no reason to process the same data again…

    So in the end, without using UNIX_TIMESTAMP() on static values but precomputed ones, using an INT will be much faster, no? ;)
