Database Software and Management for Scientific Research


Image representing MySQL as depicted in CrunchBaseI am trying to decide between database management systems for smallish ecology projects. Generally, only one person will be accessing them at a time and only a few people who work on the project will ever be entering data or performing queries. I would like to get used to a system that will serve me well in the long run, regardless of future projects.

With that in mind, I would like a system that is cheap, works across multiple platforms, and plays well with other programs (i.e. can use SQL, python, R to interact). Ideally, it would have a decent GUI so collaborators less familiar with programming could work with in easily. Also, having a system where data entry forms could be used on an iPad or other tablet would be ideal for field data collection.

My first database experience was with MS Access, which I liked quite a bit. However, it’s only available for Windows machines and I frequently go back and forth between Mac OS and Windows as do collaborators (and I imagine setting up Linux servers in the distant future). I could use a parallels program or emulator to run MS Access on a Mac machine but that’s more of a headache than it’s worth to me. I did use Pendragon software on a Palm Pilot for field data collection. It made for fewer errors during data collection (due to built in checks), no transcription errors copying paper data to the database, and saved time on data entry. For a project with ~100,000 lines of data in the main table and a half dozen related tables collected by a dozen researchers over 5+ years, this was a great system (except that the Palm Pilot batteries back in 2005 didn’t last long enough).

I have very briefly used OpenOffice Base, which is very much like MS Access but it doesn’t look as pretty and lacks a few of the functions. I like the fact that it’s open source, but don’t know much about the pros and cons of Base. At least I can use it on both Mac OS and Windows, but I’ve found at least one critic suggesting that Base isn’t a sufficient replacement for Access (here)

FileMaker Go for iPad

FileMaker Go for iPad (Photo credit: John Federico)

More recently, I’ve used FileMaker Pro 12. It works on both Mac and Windows OS (not sure about Linux). It isn’t cheap but it’s not prohibitively expensive. It comes with a free download of the FileMaker Go app for the iPad, which is nice for setting up field data collection systems. My problem with FileMaker Pro (FMP) is that not as many colleagues have access to it and I just don’t find the work flow as easy or intuitive as MS Access. Maybe if I give it more time, I will get a more efficient workflow. The system of forms and tables is still a bit awkward for me and I haven’t been thrilled with the scripting. Previously, FMP couldn’t use SQL queries, but I believe that it was introduced with enhanced capabilities in version 12 (but I haven’t tried it).

Most recently, I’ve been learning a little about MySQL and trying out MySQL Workbench on my Mac OS. MySQL is a bit more complex and might be harder for collaborators, but it seems like it might be a better system in the long run. I’ve only just tried it one afternoon but the Workbench seems like a decent GUI interface. MySQL is very powerful and free but has a steeper learning curve than MS Access, OO Base, or FMP. It is often used in websites, so could be adapted for use on a tablet, but I am not sure if there is an easy way to do with without programming skills.

There are also other options like PostgreSQL or SQLite but I will probably stick to one of the solutions above (unless someone convinces me that PostgreSQL is the best solution for my purposes). Most comparisons of Relational Database Management Systems (RDBMS) focus on encryption, database sizes, and things that I’m not generally concerned with for my applications.

If you’re still reading, what do you use? What system do you find most useful for day-to-day handling of small research lab data (e.g. no genomics or massive web-based data collection)? For systems other than FMP, do you have a relatively easy collection system for field work on an iPad or other tablet? Is it worth learning SQL/MySQL/Workbench (or PostgreSQL)? What about for working with non-tech-savy collaborators or switch between computers and operating systems? I’d love to hear your thoughts and suggestions. I haven’t found a great overall introduction for practicing scientists (there are good intros to each system individually). In the end, any of these solutions might work for my purposes, but it would be nice to know a bit more before I commit to one.

Resources:

Practical Computing for Biologists offers a great intro to MySQL and getting it set up on a Mac OS.

Ethan White has a nice intro to databases and MS Access on his programming for biologists website

Comparison of FMP, Access, MySQL, PostgreSQL (unfortunately no OpenOffice Base)

This is a nice, brief overview of what a science research lab database should do but not how to get there

Advertisements

10 thoughts on “Database Software and Management for Scientific Research

  1. Hey Dan,
    I don’t think my opinion counts for much because most of my experience is with Filemaker. That said, it’s funny that you feel it’s less intuitive than Access because I felt the exact opposite – I bet this is just a consequence of what one starts with. For me the ideal solution is one that gives me the best database per unit effort because I am time-limited enough as is and don’t need the “perfect database”, just one that works reliably. FM works on iPads and it is really easy to publish something online via the “instant web” feature so you can have people in remote locations enter or view data. I am a happy customer but like I said, I don’t have a means for comparison.

    • Hi Andrew,

      Thanks for the thoughts. I agree that given time limitations it’s best to get a database that works reliably, without wasting extra time on it (although making sure it works reliably with plenty of testing will save time in the long run). I do like the ease of Filemaker Go for the iPad. It was really easy to set up a system for multiple Palm Pilots to upload data to the same database in MS Access. It seemed like that might be a bit of a problem with filemaker. Probably not, but I was having some syncing issues. I think the big thing is taking the time to really learn the system and going through some training/tutorials.

      In part, I just want to spend my time learning a system that is going to fulfill my needs long term because as a postdoc, this is probably the last chance I’ll ever having to devoting time to learning a new system. I’m currently going through the Practical Computing for Biologists book that I linked to in the post. It’s a great resource and learning the basics of MySQL and Python and having them play nicely together inspired me to consider switching to MySQL for most of my database needs. The major limitation of MySQL (or PostgreSQL) for my needs is the ease of remote local data entry (offline on a tablet).

      Thanks again. I hope things are going well in Omaha
      -Dan

  2. This is a great question, and one that I get asked pretty frequently. At some point I should probably write up a full fledged post on the topic, but in short:

    1. The first decision you should make is whether or not you need the ability for multiple users to simultaneously (literally at the same time) add to or modify the database. This is where much of the real power of systems like MySQL and Postgres lie. If you don’t need this (and it sound like given the consideration of Access and Filemaker you really don’t) then I generally consider the overhead of managing these professional grade systems to not be worth the effort. In fact it can be harder for a small group to collaborate using one of these DBMSs because it really requires you to run it on a server, manage permissions, etc., if more than one person is going to use it.

    1b. The one major exception to this is that Postgres has the ability to execute spatially related queries, so if you work with spatial data it can be worth the effort.

    2. Given all of the tradeoffs you’ve described above, one thing to consider is keeping the core database in SQLite, but then interacting with it through the other tools. Access, Filemaker, and pretty much any major database system or programming language (R, Python, etc.) can interact with an SQLite file through ODBC (http://en.wikipedia.org/wiki/ODBC). This means that you can use whatever tools are most suitable (or people are most comfortable with) for entering and querying data, but it’s all stored in a cross platform format that can easily be shared because SQLite databases are stored as a single file (and shared via Dropbox, etc.). There are plenty to standalone GUIs if you prefer them to using Access. I use the Firefox plugin.

    3. I’ve tried to use/teach OO Base and have general found it to be too unstable for serious use and to have too many weird gotchas and bugs for use in the classroom.

  3. FileMaker is proprietary and doesn’t work on Linux, so I would probably grow green & angry if my collaborators used it.

    MySQL (or MariaDB) is very solid, it’s used everywhere, and plays nice with almost all languages and operating systems. It’s always a good idea to know a little about SQL and if you do need to store a lot of data at some point it will work fine too.

    NoSQL databases (e.g.: MongoDB) are quite popular these days. MongoDB is mostly great if you have huge data-sets and need a great deal of flexibility, but it doesn’t seem to be your case.

    • Hi Philippe – Thanks for the thoughts and information. I understand the frustration of dealing with proprietary software that doesn’t work across many platforms. I’m thinking that I will continue with Filemaker when doing small lab projects with undergrads when I want the data collected in the field on the lab iPad. It will be good to learn more SQL and MySQL (or probably MariaDB or PostgreSQL based on Ethan’s comments).
      I hadn’t heard of NoSQL databases before I posted this. It’s good to know about in case I get into situations that would benefit in the future.

  4. Oh, and I forgot…
    4. There are serious concerns about MySQL from the open source community based on a variety of behaviors by Oracle. There is a new drop-in replacement by the original MySQL team called MariaDB (https://mariadb.org/) that lots of folks are actively switching to.

    • Ethan – Thanks for the very thoughtful reply. I found it a difficult topic to figure out via web searches because results were either basics about what databases are or had lots of detail about things that I didn’t understand the importance of (or lack thereof) for my purposes. It seems difficult to find a middle ground on introductory database information, so I really appreciate your suggestions.

      As I mentioned to Philippe, it seems like Filemaker will probably be a good solution for relatively small projects involving field data collection because iPad forms can be created easily and I only need one person accessing the database at a time. I will have to look more into SQLite and associated GUIs though. I think I remember an introduction in your Advanced Programming Course notes, so maybe I’ll start there.

      Good to know that OO Base can be buggy. Not something I want to deal with given the multitude of other good options. I will have to look into MariaDB, but maye even more into PostgreSQL since I will be increasingly working with larger spatially-explicit datasets.

  5. Pingback: Extending the Data Life Cycle | Daniel J. Hocking

  6. Pingback: Year in Review: 2014 | Daniel J. Hocking

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s