Look at a character and do nothing with it: August 2010

Tuesday, August 31, 2010

Using tar to Perform Incremental Dumps

An incremental backup is a special form of GNU tar archive that
stores additional metadata so that the exact state of the file system
can be restored when extracting the archive.

GNU tar currently offers two options for handling incremental
backups: ‘--listed-incremental=snapshot-file’ (‘-g snapshot-file’) and ‘--incremental’ (‘-G’).

The option ‘--listed-incremental’ instructs tar to operate on
an incremental archive with additional metadata stored in a standalone
file, called a snapshot file. The purpose of this file is to help
determine which files have been changed, added or deleted since the
last backup, so that the next incremental backup will contain only
modified files. The name of the snapshot file is given as an argument
to the option:

‘--listed-incremental=file’
‘-g file’: Handle incremental backups with snapshot data in file.

To create an incremental backup, you would use
‘--listed-incremental’ together with ‘--create’
(see section How to Create Archives). For example:

$ tar --create \
           --file=archive.1.tar \
           --listed-incremental=/var/log/usr.snar \
           /usr

This will create in ‘archive.1.tar’ an incremental backup of
the ‘/usr’ file system, storing additional metadata in the file
‘/var/log/usr.snar’. If this file does not exist, it will be
created. The created archive will then be a level 0 backup;
please see the next section for more on backup levels.

Otherwise, if the file ‘/var/log/usr.snar’ exists, it
determines which files are modified. In this case only these files will be
stored in the archive. Suppose, for example, that after running the
above command, you delete file ‘/usr/doc/old’ and create
directory ‘/usr/local/db’ with the following contents:

$ ls -d /usr/local/db/*
/usr/local/db/data
/usr/local/db/index

Some time later you create another incremental backup. You will
then see:

$ tar --create \
           --file=archive.2.tar \
           --listed-incremental=/var/log/usr.snar \
           /usr
tar: usr/local/db: Directory is new
usr/local/db/
usr/local/db/data
usr/local/db/index

The created archive ‘archive.2.tar’ will contain only these
three members. This archive is called a level 1 backup. Notice
that ‘/var/log/usr.snar’ will be updated with the new data, so if
you plan to create more ‘level 1’ backups, it is necessary to
create a working copy of the snapshot file before running
tar. The above example will then be modified as follows:

$ cp /var/log/usr.snar /var/log/usr.snar-1
$ tar --create \
           --file=archive.2.tar \
           --listed-incremental=/var/log/usr.snar-1 \
           /usr
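
The level 0 / level 1 cycle described above can be tried on a throwaway tree before touching ‘/usr’. The following is a minimal sketch, assuming GNU tar is the ‘tar’ on your PATH; the scratch directory ‘tar-incr-demo’ and the names ‘src’ and ‘src.snar’ are made up for illustration and are not part of the original example.

```shell
#!/bin/sh
# Sketch only: a level 0 dump followed by a level 1 dump on a scratch tree.
# Assumes GNU tar; the demo directory and file names are hypothetical.
set -eu

work="${TMPDIR:-/tmp}/tar-incr-demo"
rm -rf "$work"
mkdir -p "$work/src/doc"
echo "old" > "$work/src/doc/old"
cd "$work"

# Level 0: src.snar does not exist yet, so tar creates it and dumps everything.
tar --create --file=archive.1.tar --listed-incremental=src.snar src

# Change the tree: delete one file and add a new directory with two files.
rm src/doc/old
mkdir -p src/local/db
echo "data"  > src/local/db/data
echo "index" > src/local/db/index

# Level 1: run tar against a working copy so src.snar keeps the level 0 state.
cp src.snar src.snar-1
tar --create --file=archive.2.tar --listed-incremental=src.snar-1 src

# Show what the level 1 archive holds.
tar --list --file=archive.2.tar
```

Because ‘src.snar’ itself is never modified after the first run, repeating the last three commands later should produce another level 1 archive relative to the same level 0 dump.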

You can force ‘level 0’ backups either by removing the snapshot
file before running tar, or by supplying the
‘--level=0’ option, e.g.:

$ tar --create \
           --file=archive.2.tar \
           --listed-incremental=/var/log/usr.snar-0 \
           --level=0 \
           /usr

Incremental dumps depend crucially on time stamps, so the results are
unreliable if you modify a file's time stamps during dumping (e.g.,
with the ‘--atime-preserve=replace’ option), or if you set the clock
backwards.

Metadata stored in snapshot files include device numbers, which
are supposed to be non-volatile values. However, it turns
out that NFS devices have undependable values when an automounter
gets in the picture. This can lead to a great deal of spurious
redumping in incremental dumps, so it is somewhat useless to compare
two NFS device numbers over time. The solution currently implemented
is to consider all NFS devices as being equal
when it comes to comparing directories; this is fairly gross, but
there does not seem to be a better way to go.

Apart from using NFS, there are a number of cases where
relying on device numbers can cause spurious redumping of unmodified
files. For example, this occurs when archiving LVM snapshot
volumes. To avoid this, use the ‘--no-check-device’ option:

‘--no-check-device’: Do not rely on device numbers when preparing a list of changed files
for an incremental dump.
‘--check-device’: Use device numbers when preparing a list of changed files
for an incremental dump. This is the default behavior. The purpose
of this option is to undo the effect of ‘--no-check-device’
if it was given in the TAR_OPTIONS environment variable
(see TAR_OPTIONS).

There is also another way to cope with changing device numbers. It is
described in detail in Fixing Snapshot Files.

Note that incremental archives use tar extensions and may
not be readable by non-GNU versions of the tar program.

To extract from the incremental dumps, use
‘--listed-incremental’ together with the ‘--extract’
option (see section Extracting Specific Files). In this case, tar does
not need to access the snapshot file, since all the data necessary for
extraction are stored in the archive itself. So, when extracting, you
can give any argument to ‘--listed-incremental’; the usual
practice is to use ‘--listed-incremental=/dev/null’.
Alternatively, you can use ‘--incremental’, which needs no
arguments. In general, ‘--incremental’ (‘-G’) can be
used as a shortcut for ‘--listed-incremental’ when listing or
extracting incremental backups (for more information regarding this
option, see incremental-op).

When extracting from the incremental backup, GNU tar attempts to
restore the exact state the file system had when the archive was
created. In particular, it will delete those files in the file
system that did not exist in their directories when the archive was
created. If you have created several levels of incremental files,
then in order to restore the exact contents the file system had when
the last level was created, you will need to restore from all backups
in turn. Continuing our example, to restore the state of the ‘/usr’
file system, one would do:

$ tar --extract \
           --listed-incremental=/dev/null \
           --file archive.1.tar

$ tar --extract \
           --listed-incremental=/dev/null \
           --file archive.2.tar
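
Since the archives have to be applied in level order, the two commands above generalize naturally to a loop. This is a sketch under the assumption that GNU tar is installed; ‘restore_levels’ is a hypothetical helper name, and the use of ‘--directory’ to choose a restore target is an addition that does not appear in the example above.

```shell
#!/bin/sh
# restore_levels TARGET ARCHIVE...
# Hypothetical helper: extract incremental archives into TARGET in the
# order given (level 0 first). Assumes GNU tar.
restore_levels() {
    target=$1
    shift
    mkdir -p "$target" || return 1
    for archive in "$@"; do
        # /dev/null works because extraction never reads the snapshot file.
        tar --extract \
            --listed-incremental=/dev/null \
            --file="$archive" \
            --directory="$target" || return 1
    done
}
```

It could be called as `restore_levels /mnt/restore archive.1.tar archive.2.tar`, where ‘/mnt/restore’ is a made-up target directory. Stopping at the first failure matters here: applying a later level on top of an incomplete earlier one would not reproduce the original state.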

To list the contents of an incremental archive, use ‘--list’
(see section How to List Archives), as usual. To obtain more information about the
archive, use ‘--listed-incremental’ or ‘--incremental’
combined with two ‘--verbose’ options:

$ tar --list --incremental --verbose --verbose --file archive.tar

This command will print, for each directory in the archive, the list
of files in that directory at the time the archive was created. This
information is put out in a format which is both human-readable and
unambiguous for a program: each file name is printed as

x file

where x is a letter describing the status of the file: ‘Y’
if the file is present in the archive, ‘N’ if the file is not
included in the archive, or a ‘D’ if the file is a directory (and
is included in the archive). See section Dumpdir, for the detailed
description of dumpdirs and status codes. Each such
line is terminated by a newline character. The last line is followed
by an additional newline to indicate the end of the data.

The option ‘--incremental’ (‘-G’)
gives the same behavior as ‘--listed-incremental’ when used
with the ‘--list’ and ‘--extract’ options. When used with the
‘--create’ option, it creates an incremental archive without
creating a snapshot file. Thus, it is impossible to create several
levels of incremental backups with the ‘--incremental’ option.

Tuesday, August 24, 2010

15 Advanced PostgreSQL Commands with Examples

1. How to find the largest table in the PostgreSQL database?


$ /usr/local/pgsql/bin/psql test
Welcome to psql 8.3.7, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

test=# SELECT relname, relpages FROM pg_class ORDER BY relpages DESC;
              relname              | relpages
-----------------------------------+----------
 pg_proc                           |       50
 pg_proc_proname_args_nsp_index    |       40
 pg_depend                         |       37
 pg_attribute                      |       30

To get only the largest entry, append limit 1 to the above query:


# SELECT relname, relpages FROM pg_class ORDER BY relpages DESC limit 1;
 relname | relpages
---------+----------
 pg_proc |       50
(1 row)

relname – name of the relation (table, index, etc.).

relpages – size of the relation in pages (by default a page is 8 kB).

pg_class – system catalog that records tables, indexes and other relations.

limit 1 – limits the output to a single row.
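
Since relpages counts pages rather than bytes, a rough on-disk size follows from multiplying by the page size. Here is a small sketch of that arithmetic, assuming the default 8 kB (8192-byte) page and using the pg_proc figure from the output above; the variable names are illustrative only. Keep in mind that relpages is an estimate maintained by the server, so the result is approximate.

```shell
#!/bin/sh
# Approximate on-disk size from relpages, assuming the default 8192-byte page.
relpages=50        # pg_proc, from the query output above
page_size=8192     # bytes per page (assumed default build setting)

bytes=$((relpages * page_size))
echo "approx. $bytes bytes ($((bytes / 1024)) kB)"
```

For the 50 pages of pg_proc this works out to 409600 bytes, or 400 kB.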

2. How to calculate PostgreSQL database size on disk?

pg_database_size is the function that returns the size of the named database, in bytes.


# SELECT pg_database_size('geekdb');
pg_database_size
------------------
         63287944
(1 row)

To display it in a more readable form, use the pg_size_pretty function, which converts a size in bytes to a human-readable format.


# SELECT pg_size_pretty(pg_database_size('geekdb'));
 pg_size_pretty
----------------
 60 MB
(1 row)

3. How to calculate PostgreSQL table size on disk?

pg_total_relation_size returns the total disk space used by the named table, including its indexes and TOASTed data.


# SELECT pg_size_pretty(pg_total_relation_size('big_table'));
 pg_size_pretty
----------------
 55 MB
(1 row)

How to find the size of a PostgreSQL table (not including indexes)?

Use pg_relation_size instead of pg_total_relation_size as shown below.


# SELECT pg_size_pretty(pg_relation_size('big_table'));
 pg_size_pretty
----------------
 38 MB
(1 row)

4. How to view the indexes of an existing PostgreSQL table?


Syntax: # \d table_name

As shown in the example below, the end of the output has a section titled Indexes if the table has any indexes. In the example below, table pg_attribute has two btree indexes. By default PostgreSQL uses the btree index type, as it is good for most common situations.


test=# \d pg_attribute
   Table "pg_catalog.pg_attribute"
    Column     |   Type   | Modifiers
---------------+----------+-----------
 attrelid      | oid      | not null
 attname       | name     | not null
 atttypid      | oid      | not null
 attstattarget | integer  | not null
 attlen        | smallint | not null
 attnum        | smallint | not null
 attndims      | integer  | not null
 attcacheoff   | integer  | not null
 atttypmod     | integer  | not null
 attbyval      | boolean  | not null
 attstorage    | "char"   | not null
 attalign      | "char"   | not null
 attnotnull    | boolean  | not null
 atthasdef     | boolean  | not null
 attisdropped  | boolean  | not null
 attislocal    | boolean  | not null
 attinhcount   | integer  | not null
Indexes:
    "pg_attribute_relid_attnam_index" UNIQUE, btree (attrelid, attname)
    "pg_attribute_relid_attnum_index" UNIQUE, btree (attrelid, attnum)

5. How to specify the PostgreSQL index type while creating a new index on a table?

By default, indexes are created as btree. You can also specify the index type in the CREATE INDEX statement, as shown below.


Syntax: CREATE INDEX name ON table USING index_type (column);
# CREATE INDEX test_index ON numbers using hash (num);

6. How to work with PostgreSQL transactions?

How to start a transaction?

# BEGIN; -- start the transaction

How to roll back or commit a PostgreSQL transaction?

Operations performed after the BEGIN command are made permanent in the database only when you execute the COMMIT command. Use the ROLLBACK command to undo all operations performed in the transaction before it is committed.


# ROLLBACK; -- rolls back the transaction
# COMMIT; -- commits the transaction

7. How to view the execution plan used by PostgreSQL for a SQL query?

# EXPLAIN query;

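8. How to display the plan by executing the query on the server side?

EXPLAIN ANALYZE actually executes the query on the server. The query's output is discarded, and the plan is shown together with the actual run times. Because the query really runs, any side effects of an INSERT, UPDATE or DELETE take place.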

# EXPLAIN ANALYZE query;

9. How to generate a series of numbers and insert it into a table?

This inserts the numbers 1 to 1000 into the table numbers, one row per number.

# INSERT INTO numbers (num) SELECT generate_series(1,1000);

10. How to count the total number of rows in a PostgreSQL table?

This shows the total number of rows in the table.

# select count(*) from table_name;

The following example gives the number of rows in which the specified column is not null.

# select count(col_name) from table_name;

The following example gives the number of distinct non-null values in the specified column.

# select count(distinct col_name) from table_name;

11. How can I get the second maximum value of a column in the table?

First maximum value of a column

# select max(col_name) from table_name;

Second maximum value of a column


# SELECT MAX(num) from number_table where num < ( select MAX(num) from number_table );

12. How can I get the second minimum value of a column in the table?

First minimum value of a column

# select min(col_name) from table_name;

Second minimum value of a column


# SELECT MIN(num) from number_table where num > ( select MIN(num) from number_table );

13. How to view the basic available datatypes in PostgreSQL?

Below is partial output showing the available base datatypes and their sizes.


test=# SELECT typname,typlen from pg_type where typtype='b';
    typname     | typlen
----------------+--------
 bool           |      1
 bytea          |     -1
 char           |      1
 name           |     64
 int8           |      8
 int2           |      2
 int2vector     |     -1

typname – name of the datatype

typlen – length of the datatype in bytes; -1 indicates a variable-length type

14. How to redirect the output of a PostgreSQL query to a file?


# \o output_file
# SELECT * FROM pg_class;

The output of the query will be redirected to “output_file”. Once redirection is enabled, the select command no longer displays its output on stdout. To send output to stdout again, execute \o without any argument, as shown below.

# \o

As explained in our earlier article, you can also back up and restore a PostgreSQL database using pg_dump and psql.

15. Storing the password after encryption.

PostgreSQL can hash a password using the crypt function, as shown below. This can be used to store your custom application username and password in a custom table.

# SELECT crypt ( 'sathiya', gen_salt('md5') );

PostgreSQL crypt function Issue:

The crypt function may not be available in your environment, in which case the following error message is displayed.


ERROR:  function gen_salt("unknown") does not exist
HINT:  No function matches the given name and argument types.
         You may need to add explicit type casts.

PostgreSQL crypt function Solution:

To solve this problem, install the postgresql-contrib-your-version package and execute the following command at the PostgreSQL prompt.

# \i /usr/share/postgresql/8.1/contrib/pgcrypto.sql

Wednesday, August 11, 2010

A self-contained perl and cpan for Catalyst applications deployment

Build a local perl and cpan for Catalyst applications. Keep it simple and consistent for product releases. The entire run-time environment goes under /opt/nmetrics.

1. Build the latest perl locally


# tar xzvf perl-5.12.1.tar.gz
# cd perl-5.12.1
# sh Configure -de -Dprefix=/opt/nmetrics
# make
# make test
# make install

2. Local cpan
Nothing special is needed for cpan: just run it from /opt/nmetrics/bin. All modules installed from CPAN will go into the local perl library directory.
For modules not released on CPAN (3rd party modules, licensed modules), just put them into the local perl library.
As long as you invoke the local perl, not the system one, all the dependencies should be self-contained.
Then you can ship /opt/nmetrics as a whole product.

3. Perl 5.12.x deprecation warnings
Perl 5.12.x warns about deprecated features even if you are not using warnings. To disable this:


use 5.012;
no warnings 'deprecated';
temp_warning();

sub temp_warning {
    # needs to be one level lower than the warnings setting
    warn "Hey Evel Knievel! Deprecation warnings are disabled!"
        unless warnings::enabled( 'deprecated' );
}

4. Applications
All the applications and scripts must use /opt/nmetrics/bin/perl for product release.
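
One way to catch scripts that would still pick up the system perl is to check their first line before a release. The sketch below is only an illustration: ‘uses_local_perl’ is a hypothetical helper, and it assumes scripts name the interpreter directly in a ‘#!’ line rather than going through ‘env’.

```shell
#!/bin/sh
# uses_local_perl FILE
# Hypothetical check: succeed only if FILE's first line is a shebang
# pointing at the product's own perl under /opt/nmetrics.
uses_local_perl() {
    first_line=$(head -n 1 "$1") || return 1
    case $first_line in
        # Exact match, or the interpreter followed by options such as -w.
        '#!/opt/nmetrics/bin/perl' | '#!/opt/nmetrics/bin/perl '*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A release script could loop over the shipped scripts and report every file for which `uses_local_perl "$f"` fails.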

5. Use pure perl implementations
As much as possible, to keep the product clean and self-contained.

6. Subversion
__ALL__