1 Star2 Stars3 Stars4 Stars5 Stars (2 votes, average: 4.50 out of 5)
Loading ... Loading ...

Preface

In November 2008 I’ve got a nice project, that resulted in a complex search engine, that specializes in clothing. I can’t give the link now, but will do when site officially launches. Key features of such service are:

  • Large number of users, which search with various parameters, rate items, leave comments, create articles etc.
  • Large database, that is updated frequently

Sure, the site is running on a dedicated server, sure I use caching, but optimal DB is a key to success.

MyISAM or InnoDB?

Since we need reliability, I created different tests, search the internet etc. My main question was – what to use MyISAM or InnoDB? I found the post on the MySQL Perfomance Blog (also this and this) and came to the conclusion, that when SELECT’ing, InnoDB is definitely faster (or, at least, is not slower). In addition, it provides constraints mechanism and allows transactions. The only thing it doesn’t have is FULLTEXT search. One more disadvantage is slower inserts. However, 80% of DB usage in this project are SELECT queries without FULLTEXT. For the fulltext search I created MyISAM table with fields I need and it works fine. The project is finished, but there will be phase 2 and I’m thinking of introducing Sphinx there. Anybody has experience with it?
So I came up with InnoDB storage engine.

Optimizing structure

Next important step is DB structure. I’ll give some background info about the project so you could better understand what I’m talking about. The site collects info from different UK shops. Then users can rate them, add some tags, create articles (called “looks”) and link that articles to the products they’re writing about. And the same (tags, comments, rating) applies to the looks. Here we come across with an interesting question. We can create structure of different types for this:

  • Create 2 separate cross-tables for linking Rating to Look or Product
  • Create one cross-table and add additional field “type” there to differentiate between Product and Look entities

See both variants in the following images:

Normalized variant

Normalized variant


Denormalized version

Denormalized version


Initially I thought that the second variant would perform better, but after creating and running tests, I saw, that:

  • If there are 0-2 ratings linked to entity, then normalized variant is better
  • If there are 3-5 ratings linked to entity, then denormalized variant is slightly (less than 1%) better
  • If there are 6-8 ratings linked to the entity, the normalized variant wins again

I did these tests for different situations similar to this one, in each test I generated random records and randomly linked them with each other. Number of record in Product, Look tables was 10000, in Rating – 20000. Number of records in cross tables varies because the linkings are random.
So I came up with normalizing everything like it should be done according to the DB theory.

The only exclusion of this rule is the following situation. In this project you can add different alternatives to the product. Each alternative is “better” or “cheaper” or “different” etc. And since the only case when these alternatives are pulled from DB is when user views the product, they are pulled altogether, so it is better to keep them in the same table.

Conclusion

  • InnoDB turns out to behave better under the high load, it supports transactions and maintains constraints. All that helps to develop better and never leave orphan records.
  • Using normalization techniques you usually get better performance
  • Create tests and test everything if you are going to do something great :)

Further reading

P.S. I’m looking forward to see your comments on implementation techniques in the high-loaded websites, your thoughts about the InnoDB vs MyISAM question and anything else you can think of :)

Related posts:

  1. Low Coupling and High Cohesion – GRASP (Design patterns series) Low coupling When designing some architecture, you face with the...

Related posts brought to you by Yet Another Related Posts Plugin.

Share this post with a friend Share this post with a friend

9 Comments

  1. Jonah says:

    InnoDB has foreign constraints, MyISAM does not. That’s the breaker for me

  2. Konstantin Mirin says:

    Do you mean that’s good for you or bad? Sorry, English is not the native language for me…

  3. Jonah says:

    I like foreign key constraints. It keeps the integrity of the database and allows for things such as collapsing (like if one row is deleted, all it’s foreign rows can be configured to automatically be deleted). InnoDB has this but not MyISAM. It can be Googled.

  4. Konstantin Mirin says:

    Yes, I googled for the phrase, but didn’t find good results, so I asked about it.
    Sure I know about foreign key constraints. That’s an excellent thing, but I didn’t use cascade deleting, I delete them manually because there should be other things to be done with deleting :)
    However, this feature is really nice.

    I knew about constraints, but main advantage for me was the speed for SELECT queries and row-level lock instead of table-level in MyISAM. That’s more important for the portal :)

  5. jz says:

    In Russian:
    Ограничения внешних ключей бывают очень полезны, если в проекте используется доступ к данным из различных скриптов (PHP, Python, etc.) Тогда не надо задумываться, что и где удалять. А в Yii есть прекрасная возможность задавать свои обработчики событий в модели данных. Например, у меня так удаляются связанные с позицией каталога файлы.

    P.S. Понимаю по-английски, но плохо на нём говорю и поэтому пишу по русски. Дай знать, если так поступать не стоит и лучше пытаться высказывать мысль на плохом английском)
    ——
    English translation by Konstantin:
    Foreign key restrictions may be very useful if you have several methods to access at in you project (from PHP, Python etc). If you have this, you don’t need to think what delete and where. [I guess, he means cascade delete]. And in Yii there is an excellent feature – set event handlers in the data model. For example, I delete product files when item is removed from the catalogue using this technique.
    P.S. I understand English, but I don’t know it good enough to express my thoughts correctly. Let me know if I should rather use bad English instead of good Russian.

  6. Konstantin Mirin says:

    Thanks for commenting! It actually doesn’t matter cause I can translate your comments if I see them interesting :) Keep reading my blog and feel free about commenting.
    Regarding your comment, yes I also use these things. I like setting initial values the beforeValidate (if that’s user ID or other data needed for validation) or beforeSave methods. It’s very convenient to put data integrity tasks to the model instead of controller.

  7. Igor says:

    myisam is better for high read volumes, innodb for high update volumes due to table vs. row locking.

    innodb is journaled, and can recover from crashes where myisam can’t, much like NTFS vs. FAT file systems.

    myisam has full-text indexing, innodb doesn’t.

    innodb has transaction support, commits and rollbacks, myisam lacks these.

  8. Konstantin Mirin says:

    Yes, you’re right.
    InnoDB is better for the high-loaded system because it can recover from crashes and, by the way, it is better than MySIAM even for high reding volumes, see the mysqlperformaceblog (http://www.mysqlperformanceblog.com/2007/01/08/innodb-vs-myisam-vs-falcon-benchmarks-part-1/)
    InnoDB beats MyISAM almost everywhere :)

  9. Alexandru Georoceanu says:

    A solution for your search, instead of sphinx, would be Lucene.
    You can either use the Java (faster, better, newer, maybe a bit harder to implement) version ( http://lucene.apache.org/java/docs/ ), or you can use the Zend’s port to PHP version Lucene ( http://framework.zend.com/manual/en/zend.search.lucene.html ) included in the Zend Framework.

    Good luck

Leave a Reply