Feb 022011
 

Introduction

Backups are a subject I return to semi-frequently with a passion to never be in an “oh shit” scenario.  Last time I built my backup system, bacula with a postgresql DB backend, I determined that I would move to a common database backup script for all of my databases.  Holland fit this bill perfectly with support for postgres, sqlite and mysql.  This allows one command to backup all of my databases on all of my servers and subsequently creates a much simpler bacula configuration (the database job is defined the same as the catalog job).

The Solution

The problem I had when configuring holland to backup postgresql is that there was no example configuration file.  It wasn’t hard to craft a working default postgres configuration and the following is what I came up with (/etc/holland/backupsets/default.conf:

[holland:backup]
plugin = pgdump
backups-to-keep = 1
auto-purge-failures = yes
purge-policy = after-backup
estimated-size-factor = 1.0

[pgdump]
role = postgres

[pgauth]
username = postgres

Conclusion

Setting up holland to backup databases is incredibly easy and flexible.  By having a common backup solution for all databases other configurations become easier and processes can be streamlined.

Aug 132010
 

Introduction

SQL SELECT statements have the following form:

SELECT COLUMNS
FROM TABLES
[WHERE CONDITIONS]
[GROUP BY COLUMNS
[HAVING CONDITIONS]
]
[ORDER BY COLUMNS ASC|DESC]

The way SQL processes these directives is slightly different:

FROM TABLES
[WHERE CONDITIONS]
[GROUP BY COLUMNS
[HAVING CONDITIONS]
]
[ORDER BY COLUMNS ASC|DESC]
SELECT COLUMNS

FROM Clause

FROM TABLE [AS ALIAS] .. TABLE [AS ALIAS]

This lets SQL know which tables are being used for the query and any aliases they might be referenced as in the query. The aliases are only _necessary_ when an inner join is performed.

Examples

  • FROM table1, table2
  • FROM table1 as t1, table2 as t2
  • FROM table1 as t1a, table1 as t1b

This last example is a join and requires the aliases.

WHERE Clause

WHERE CONDITIONS

This lets SQL know which rows should be selected from the table based on the conditions passed. The conditions can also be combined with the logical `or` and `and` operators (Which should be properly parenthesized to demonstrate priority).

The following operators can act on columns in a where clause:

* =
* <> (also !=)
* >
* <
* >=
* <=
* BETWEEN
* LIKE
* IN

Examples

  • WHERE t1.c1 = “string”
  • WHERE t1.c1 = t2.c1 and t1.c2 <> t2.c2
  • WHERE t1.c1 = t2.c1 and (t1.c2 <> t2.c2 or t1.c3 <> t2.c3)
  • WHERE t1.c1 in (value set)

This last example demonstrates using either an explicit set or a subquery where the subquery returns a list of values that filter this main query.

GROUP BY Clause

GROUP BY COLUMNS

This lets SQL know to group the table for aggregate filtering operations. For example if you need to sort based on the max sales of your sales people you could group by your sales members’ identifiers and then use a HAVING clause to filter based on their max(sales_amount). This clause is only really useful when a HAVING clause is useful.

Multiple GROUP BY columns just restricts the groupings to be tighter and tighter. For example, if you have three columns t1, t2, and t3 and you use a GROUP by t1, t2 you will end up with the following groupings:

——————————————————————————-
t1 and t2 are all the same in this group; t3 varies
——————————————————————————-
t1 and t2 are all the same in this group; t3 varies
——————————————————————————-

Examples

  • GROUP BY c1, c2
  • GROUP BY c1

HAVING Clause

HAVING CONDITIONS

This is very similar to the WHERE clause and the logical operators `and` and `or` can be used as in the WHERE clause. The difference here is that typically you’ll be filtering based on an aggregate operation on an ungrouped column to filter out groups.

Examples

  • HAVING max(t3) > n
  • HAVING average(t3) between x and y

ORDER BY Clause

ORDER BY COLUMNS ASC|DESC

This lets SQL know you want to sort the specified columns ascending or descending. The sorting will be applied to the columns in the order that they are specified. Thus it works similar to the way groups work it makes groups out of the first specification and then the second and so on. Performing the new operation only within the context of the previous. Thus the following data would be sorted as shown:

Before Sorting:

—————-
| c1 | c2 | c3 |
—————-
| aa | aa | aa |
| bb | bb | bb |
| aa | dd | ee |
| cc | cc | cc |
| aa | bb | cc |
—————-

After Sorting (ORDER BY c1, c2, c3):

—————-
| c1 | c2 | c3 |
—————-
| aa | aa | aa |
| aa | bb | cc |
| aa | dd | ee |
| bb | bb | bb |
| cc | cc | cc |
—————-

Examples

  • ORDER BY c1
  • ORDER BY c1, c2, c3

SELECT Clause

SELECT COLUMNS

This lets SQL know which columns (or what projection) of the table to actually display. One can also specify aggregate functions here to perform functions such as counting, averaging, etc.

Examples

  • SELECT col1, col2
  • SELECT col1, t1.col2
  • SELECT COUNT(col1)

Conclusion

Remember that SQL SELECT statements are not processed in the order that they are parsed. This will simplify the query building process to think of it as operations on a set of data (since that is what it is). The steps are as follows:

  1. Select the set to act on.
  2. Filter out elements from the set.
  3. Group the remaining elements.
  4. Filter out groups of elements.
  5. Sort the elements.
  6. Get the projection of the elements’ attributes.

All of this somehow translates to the SQL SELECT statement syntax we started this discussion with:

SELECT COLUMNS
FROM TABLES
[WHERE CONDITIONS]
[GROUP BY COLUMNS
[HAVING CONDITIONS]
]
[ORDER BY COLUMNS ASC|DESC]

References

Holland on Gentoo

 Linux Guides  Comments Off
Aug 012010
 

Introduction

There is a new king of backups in town, holland. This little framework written in Python allows one to easily backup anything that might need to be converted to a more flat file style before being backed up. Right now there is support for mysql, sqlite, and postgresql but with a little finesse it could potentially support directories as well as databases. This would make not only mysql backups a breeze but LDAP as well.

Progress Update

I have added a preliminary set of ebuilds to my overlay (which could use some code review if anyone is interested) that allows holland to easily be installed on Gentoo systems. So easy in fact that all it takes is `emerge holland`.

It accepts a set of use flags to bring in the “providers” you want to be able to backup for and makes sure that those packages are installed on the system.

Examples

The holland ebuilds have three providers right now:

  • mysql
  • postgresql
  • sqlite

You can install any of these three you want in any combination; it doesn’t care. It will default to installing the mysql but can easily be told not to by placing -mysql in the use flags for holland. Diego Pettenò — Flameeyes mentioned to me that in EAPI 4 we’ll get the cool option of being able to specify one of a set of use flags must be set without forcing the choice but until then we have this slick solution.

There is also lvm support for snapshotting off the database directory before grabbing the database and a myriad of other features I haven’t had a chance to explore yet.

To perform a rudimentary backup after installing holland simply run `holland bk`. This will read the configurations in `/etc/holland` and backup the databases it finds.

Conclusion

The new kid on the block, holland, will make backups of complex databases and directories a breeze. Simply change that cronjob from using mysqldump to calling holland and you’re finished.

 

A hot topic for anyone running MySQL and Apache on any machine is optimization and I hope to quickly explain how to optimize MySQL and where to tweak Apache to help with it’s operational efficiency as well.

First things first …

A few common things to possibly turn on, install, or enable:

  • APC (Another PHP Cache)
  • Memcached

MySQL Optimization

The easiest way (and the quickest) to determine what can be done to optimize MySQL is to use a script called mysqltuner.pl. The really interesting thing about this script is that to get it you simply run the following:

wget mysqltuner.pl

After you’ve gotten this script run it with your perl interpreter:

perl mysqltuner.pl

It will prompt you for your username and password for mysql and then print out a nice report outlining how your mysql daemon has been running. Here’s an example of this output:

>> MySQLTuner 1.0.1 – Major Hayden
>> Bug reports, feature requests, and downloads at http://mysqltuner.com/
>> Run with ‘–help’ for additional options and output filtering
Please enter your MySQL administrative login: root
Please enter your MySQL administrative password:

——– General Statistics ————————————————–
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.0.90-log
[OK] Operating on 32-bit architecture with less than 2GB RAM

——– Storage Engine Statistics ——————————————-
[--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 28M (Tables: 96)
[--] Data in InnoDB tables: 1M (Tables: 55)
[!!] Total fragmented tables: 2

——– Performance Metrics ————————————————-
[--] Up for: 1h 39m 49s (24K q [4.163 qps], 354 conn, TX: 9M, RX: 34M)
[--] Reads / Writes: 60% / 40%
[--] Total buffers: 148.6M global + 960.0K per thread (100 max threads)
[OK] Maximum possible memory usage: 242.3M (11% of installed RAM)
[OK] Slow queries: 0% (0/24K)
[OK] Highest usage of available connections: 28% (28/100)
[OK] Key buffer size / total MyISAM indexes: 64.0K/6.3M
[OK] Key buffer hit rate: 99.6% (164K cached / 682 reads)
[OK] Query cache efficiency: 81.3% (14K cached / 17K selects)
[OK] Query cache prunes per day: 0
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 463 sorts)
[!!] Joins performed without indexes: 42
[OK] Temporary tables created on disk: 12% (54 on disk / 417 total)
[OK] Thread cache hit rate: 70% (106 created / 354 connections)
[!!] Table cache hit rate: 2% (48 open / 2K opened)
[OK] Open file limit used: 9% (95/1K)
[OK] Table locks acquired immediately: 98% (7K immediate / 7K locks)
[!!] Connections aborted: 7%
[OK] InnoDB data size / buffer pool: 1.0M/4.0M

——– Recommendations —————————————————–
General recommendations:
Run OPTIMIZE TABLE to defragment tables for better performance
MySQL started within last 24 hours – recommendations may be inaccurate
Adjust your join queries to always utilize indexes
Increase table_cache gradually to avoid file descriptor limits
Your applications are not closing MySQL connections properly
Variables to adjust:
join_buffer_size (> 128.0K, or always use indexes with joins)
table_cache (> 48)

The first thing to notice is that mysql should be running for 24 to 48 hours before you trust the results of this script. The second thing to acknowledge is that it isn’t omniscient. Understanding how these parameters affect your memory footprint and your runtime efficiency are what we’re most interested in though.

Glossary of Parameters

The best source of my.cnf parameters is of course the MySQL documentation, but I’ll clarify a few of the more pertinent items from the above output.

  • Maximum Possible Memory – This is not the actual hard limit on your memory usage but a calculated theoretical maximum based on the parameters in your my.cnf and if this is over ~90% you may use your swap more than you want to. After making any changes I suggest you check this to make sure you haven’t over-allocated yourself.
  • Highest Usage of Available Connections – This is the historical max connections used by mysql and if it’s at 100% this isn’t a guarantee that you need to increase max_connections but may just indicate that your connections spiked once. That being said I recommend following its advice if you have the memory for it.
  • Query Cache Efficiency – The hit rate for the cache on your queries (I hope you have enabled). As with any hash table if you’re not hitting what’s in the top bucket you need to increase the number of buckets.
  • Temporary Tables Created on Disk – This can be a finicky parameter to get right and is the first place I would recommend cutting if you don’t have the memory to run mysql like you would want, but if you do have the memory then up this as far as you can within reason.
  • Thread Cache Hit Rate – Just like any other cache.
  • Table Cache Hit Rate – Just like any other cache.

Fragmented Tables

If you noticed the fragmented tables along with the recommendation to run OPTIMIZE TABLE that’s not something that will typically slow mysql down noticeably with today’s disks, but if you want to defragment them I recommend reading this article on the topic.

Apache Tweaking

The following parameters are usually in httpd.conf unless you run a distribution that organizes apache in an easier to work with manner (my preference of course is Gentoo which stores these parameters in: 00_mpm.conf).

The parameters for tweaking apache’s processes depend on which worker module you have in use. The following is the pertinent sections from my configuration:

#prefork MPM
# This is the default MPM if USE=-threads
#
# MinSpareServers: Minimum number of idle child server processes
# MaxSpareServers: Maximum number of idle child server processes

StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 50
MaxRequestsPerChild 500

# worker MPM
# This is the default MPM if USE=threads
#
# MinSpareThreads: Minimum number of idle threads available to handle request spikes
# MaxSpareThreads: Maximum number of idle threads
# ThreadsPerChild: Number of threads created by each child process

StartServers 2
MinSpareThreads 15
MaxSpareThreads 50
ThreadsPerChild 25
MaxClients 150
MaxRequestsPerChild 1000

These parameters are pretty self explanatory but just for completeness’ sake I’ll do a small recap on these here:

* StartServers – The number of servers to have running and handling connections at first launch.
* MinSpareServers – The minimum number of servers to have running not currently handling connections.
* MaxSpareServers – The maximum number of servers to have running not currently handling connections.
* MaxClients – The maximum number of clients that can simultaneously connect to Apache.
* MaxRequestsPerChild – The maximum number of requests that a child will respond to before terminating.

Using these parameters we can control the amount of memory that Apache will consume at the cost of other things we may want to have like lots of connections or many sequential requests.

Conclusion

There are a lot of different things that can be done for a server that seems to be performing less than ideally and I’ve only covered a fraction of things that can happen. As always with this topic the situation will change based on your needs and how your server runs, but at least now you may have a starting point for what to modify after you’ve troubleshooted every other thing that could go wrong.

© 2011 Alunduil's Hosting Suffusion theme by Sayontan Sinha