Plpgsql Print Table Assignments

First three parts translated by Martin Řehák, parts 4-6 translated by Karolína Lehotská, parts 7-10 translated by Lucie Prunarová, parts 11-14 translated by Zdeněk Růžička

The PostgreSQL RDBMS, like any true RDBMS, allows users to create and use stored procedures. A stored procedure is database logic code, saved on the SQL server, and executed later. In PostgreSQL, stored procedures can be created using any of these languages: SQL, Perl, Python, TCL, and PL/pgSQL. PL/pgSQL is the most frequently used language for writing stored procedures. PL/pgSQL is a simple ADA-based programming language with SQL support. This language was significantly inspired by Oracle's PL/SQL language. It is simple, strong, but it cannot be used for creating custom PostgreSQL's data types.

This document is designed for PostgreSQL 8.1 or higher

When PL/pgSQL is not applicable

There are some limitations on when stored procedures can be used. Code in stored procedures must be absolutely non-interactive. Since most applications need some level of interaction, user experiences suggest it is not a good design to try to write an application entirely through stored procedures. The SQL server cannot completely replace the application server; so it is up to the designer to compare the advantages and disadvantages of coding in the database layer versus the application layer. Some limits are included in PL/pgSQL too. This language is strictly static, this language isn't designed for heavy mathematic, string or array operations (the best using of PL/pgSQL is like glue for SQL statements).

PL/pgSQL is not useful in initializations of bigger arrays, because any array update can be slow. If there’s no other way, it’s much faster to create fields of a array by conversion table (by SRF function) into the array. For example the fastest way to populate a field with incrementing values is SELECT ARRAY(SELECT * FROM generate_series(1,100));. The function generate_series is an SRF function generating an incrementing set of values over the entered interval. Same principes are true for strings too. When you’re not satisfied with the speed of a function, try to use the following trick by converting a SRF function – replace the part of string concation by RETURN NEXT command. The final concation is situated out of body, with use of array_to_string command.

CREATE OR REPLACE FUNCTION generate_string(integer) RETURNS SETOF varchar AS $$ BEGIN FOR _i IN 1 .. $1 LOOP RETURN NEXT '<item>'||_i||'</item>'; END LOOP; RETURN; END; $$ LANGUAGE plpgsql; SELECT array_to_string( ARRAY(SELECT * FROM generate_string(1000)), '');

There‘s no support for I/O operations in present and it’s not expected in the near future - PL/pgSQL exists in trusted version only. The solution is to use an untrusted language, like pl/perl or pl/python. PL/pgSQL is not useful to create general triggers. In PL/pgSQL it’s not possible to convert values of type RECORD into fields [a RECORD item is accessible only through a static identifier, known at compilation time; a field could be accessed due to a dynamic index.] So there is no choice of dynamic iteration through all RECORD type items, or of dynamic access to RECORD type. Again, a solution is to use another language; this time a trusted variant could also work.

Inappropriate use of the PL/pgSQL language

PL/pgSQL is a very simple language and mainly very simple interpreted commands. It hasn't own "arithmetic" unit. All commands are translated to a series of SQL statements. Usually "simple SQL" expressions are used - PostgreSQL can evaluate it much faster than full SQL statement, but still slower than truly compiled integrated functionality. So - importantly - think first and use integrated functionality everywhere when it is possible. The following code is an example of bad programming style (I found this code on net):

create or replace function check_phone_number (text) returns boolean as $$ declare _str alias for $1; _char text; i int4; begin i=1; while true loop if length (_str) != 16 then exit; end if; _char := substring(_str from i for 1); if _char != '+' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char < '0' or _char >'9' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char < '0' or _char >'9' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char < '0' or _char >'9' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char != ' ' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char < '0' or _char >'9' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char < '0' or _char >'9' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char < '0' or _char >'9' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char != ' ' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char < '0' or _char >'9' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char < '0' or _char >'9' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char < '0' or _char >'9' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char != ' ' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char < '0' or _char >'9' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char < '0' or _char >'9' then exit; end if; i:=i+1; _char := substring(_str from i for 1); if _char < '0' or _char >'9' then exit; end if; return true; end loop; raise exception e'wrong char \'%\' on position %. Use format +XXX XXX XXX XXX', _char, i; return false; -- << death code end; $$ language plpgsql;

This code is one big mistake. Long, with repeated code - this code will be really slow in PL/pgSQL. When we need similar functionality, we have to use a regular expressions:

CREATE OR REPLACE FUNCTION check_phone_number(text) RETURNS boolean AS $$ BEGIN IF NOT $1 ~ e'^\\+\\d{3}\\ \\d{3} \\d{3} \\d{3}$' THEN RAISE EXCEPTION 'Wrong formated string "%". Expected format is +999 999 999 999'; END IF; RETURN true; END; $$ LANGUAGE plpgsql STRICT IMMUTABLE;

Still this code isn't good. It returns true or raises exceptions. I prefere little bit different design (less restrictive with availability to normalise string:

CREATE OR REPLACE FUNCTION normalise_phone_number(text) RETURNS text AS $$ DECLARE aux text := translate($1, ' ',''); BEGIN IF aux ~ e'^\\+\\d{12}$' THEN RETURN substring(aux FROM 2); ELSEIF aux ~ e'^\\d{9}$' THEN RETURN aux; ELSE RAISE EXCEPTION 'Unexpected format of phone number "%".', $1; END IF; END; $$ LANGUAGE plpgsql STRICT IMMUTABLE;


postgres=# select normalise_phone_number('+420 222 222 222'); normalise_phone_number ------------------------ 420222222222 (1 row) postgres=# select normalise_phone_number('+420 222 222222'); normalise_phone_number ------------------------ 420222222222 (1 row) postgres=# select normalise_phone_number('+420 22 222222'); ERROR: Unexpected format of phone number "+420 22 222222". postgres=# select normalise_phone_number('724 191 000'); normalise_phone_number ------------------------ 724191000 (1 row)

Don't use PL/pgSQL for deep recursion calls

Due internal design principles PL/pgSQL should not be well optimized for recursion calls. PL/pgSQL supports recursion, and for not too deeps calls can provide enough to satisfy sb performance, but it is very slow for deep recursion. Nice example is taken from presentation It is perfect example how don't use PL/pgSQL ever.

-- non recursion form CREATE OR REPLACE FUNCTION public.psqlfibnr(n integer) RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT AS $function$ DECLARE prev1 int = 0; prev2 int = 1; result int = 0; BEGIN FOR i IN 1..n LOOP result := prev1 + prev2; prev2 := prev1; prev1 := result; END LOOP; RETURN result; END; $function$ -- recursion form CREATE OR REPLACE FUNCTION public.psqlfibr(n integer) RETURNS integer LANGUAGE plpgsql IMMUTABLE STRICT AS $function$ BEGIN IF n < 2 THEN RETURN n; END IF; RETURN psqlfib(n-1) + psqlfib(n-2); END; $function$ -- non recursive calls postgres=# select n, psqlfibnr(n) from generate_series(0,35,5) as n; n | psqlfibnr ----+----------- 0 | 0 5 | 5 10 | 55 15 | 610 20 | 6765 25 | 75025 30 | 832040 35 | 9227465 (8 rows) Time: 1.178 ms -- recursive calls postgres=# select n, psqlfib(n) from generate_series(0,35,5) as n; n | psqlfib ----+--------- 0 | 0 5 | 5 10 | 55 15 | 610 20 | 6765 25 | 75025 30 | 832040 35 | 9227465 (8 rows) Time: 282992.820 ms

Speed of recursion calls in PL/pgSQL is not comparatable with recursion optimized languages like Javascript.

When SQL procedures will be faster than PL/pgSQL procedures?

Answer is simple. SQL procedures will be faster everytime when we can use a integrated functionality. I use a bubble sort as example - but it can be quick sort too. You can implement it in PL/pgSQL without any problem, but your code will be always slower (sometimes significantly slower) than simple SQL function that uses integrated quicksort:

CREATE OR REPLACE FUNCTION sort(anyarray) RETURNS anyarray AS $$ SELECT array(SELECT * FROM unnest($1) ORDER BY 1); $$ LANGUAGE sql;

Introduction into PL/pgSQL language

PL/pgSQL does not create new types or its own functions. It shares both with the database system generally. PL/pgSQL functions can contains almost all of SQL statements. It has a FOR statement for iteration over SELECT's result. We can use a dynamic SQL - statement EXECUTE. Authors were surely inspired by the PL/SQL language, which is the native programming language for Oracle RDBMS, so it’s not difficult to convert stored procedures from Oracle to PostgreSQL or back. It is possible to use Pl/pgSQL in PostgreSQL for implementation of own aggregates or normal functions, operators, and for the implementation trigger handlers. Each new version of PostgreSQL improves the possibilities of stored procedures, so CREATE FUNCTION statement are enhanced in any PostgreSQL version too. CREATE FUNCTION statement is used to define new custom functions:

CREATE [OR REPLACE] FUNCTION name ([IN|OUT|INOUT] name type [,...]) RETURNS suggest_type AS $$ {body of function} $$ LANGUAGE plpgsql [IMMUTABLE | STABLE | VOLATILE] [CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT] [EXTERNAL SECURITY INVOKER | EXTERNAL SECURITY DEFINER] I recommend writing the function code in your favorite plain-text text editor. Save the code into a file; then, in psql, use \i name_of_file to run the command(s) saved in the file. I definitely advise against writing functions directly in psql or pgAdmin. To use PL/pgSQL it is necessary to enable the language for the database. The command createlang –L your_db should create a table, including a line reading plpgsql | t If the table doesn’t include this line, PL/pgSQL must be enabled for this database using the following command (which requires Postgres super-user rights): createlang plpgsql your_db

When PL/pgSQL is enabled, test it by writing a simple function:

CREATE OR REPLACE FUNCTION Sum(a int, b int) RETURNS int AS $$ BEGIN /* * My first trivial PL/pgSQL function. */ RETURN a + b; END; $$ LANGUAGE plpgsql IMMUTABLE STRICT;

If you rewrite the function without bugs, you can test it with SELECT Sum(10,10). In the column sum (the column name is the same as the function's name) the value 20 should appear. In Pl/pgSQL, comments are written the same as in SQL: single-line comments begin with a double dash; multi-line comments have C notation, like /* comment */. Execution of any PL/pgSQL function must be finished by the RETURN statement.

The function has two parts: variable declaration (before the variable's first use) and body. The body is a list (block) of commands separated by semicolons; this block lies between the key words BEGIN and END. The RETURN command ends execution of the function and returns the result. Older versions of PostgreSQL don’t support giving labels to the function's arguments--only positional notation was supported. As a result, you may see this as a $-sign the number of the parameter in sequence (so the first parameter would be $1, the second parameter would be $2, and so on). The default state of function parameters is IN. All of parameters are constants within the function. Blocks are not used for range limitation of control structures, such as IF, WHILE, FOR, LOOP (in contrast to Pascal); they serve only to limit the existence of local variables. It is possible to overload the function; we can have several definitions of a function, which vary in the number and type of arguments. All identifiers, like SQL key words, are case-insensitive. In case of complication, all strings (except those quoted within double apostrophes) are translated to lower case. The assignment operator is := in PL/pgSQL (as in Pascal). It is also possible to assign the result of a SQL query using the SELECT INTO construction:

SELECT INTO target aspect FROM ...;

The result may be a variable of type RECORD, a row, or a list of variables. During administration, check if the result corresponds to what is expected (in number and types). If the result contains multiple lines, the first line's values are used. Variables of type RECORD are never set to NULL. Beyond the test value NULL, which mustn’t be unique always, we can test the value of the built-in variable FOUND, which will be TRUE if the query returns at least one line. It is also possible to save the number of processed lines into a variable by using the GET command after each SQL command.


Testing FOUND only works after a SELECT INTO. The next two example functions return the number of lines in the "name" table. The PERFORM instruction in the first example is used to execute a SQL command or function when no further processing of the data is done.


After the DECLARE command, we can declare a list of several variables and their types, separated by semicolon. A variable can be set to an initial value (which is re-calculated during each entry into block), or to NULL, or also indicated as a constant. If these rules are broken, the PL/pgSQL environment will stop execution of the procedure. All variables defined as NOT NULL must have set up a DEFAULT value.

DECLARE name[CONSTANT] type [NOT NULL][DEFAULT|(:=)constant|parameter of function]

Aside from ordinary types, we can also use the RECORD type, which can include a line of random tables or so-called derived types. It is based upon either a line of concrete tables (name _tab%ROWTYPE), columns name_tab.nm_column%TYPE) or previously declared variables (variable%TYPE). You can access single elements within a RECORD or a row using dot notation.

PL/pgSQL code can contain any SQL commands. For instance, a function returning the name of the month on basic his index. PL/pgSQL developers should beware of giving PL/pgSQL variables the same names as database objects – the effects of this namespace collision may be hard-to-find errors. One solution is to use prefixes on PL/pgSQL variables. For instance, PL/pgSQL variables could be prefixed with the underscore symbol "_". There is an additional trap waiting for users of PL/SQL--while Oracle resolves collisions between column names and local variables in favor of the column, PostgreSQL does the opposite. So, in Oracle, you can resolve namespace collisions by specifying the definition of the local variable. In PostgreSQL, thanks to the higher priority of local variables, this will not resolve the conflict.

CREATE OR REPLACE FUNCTION MonthName1 (_month INTEGER) RETURNS VARCHAR(10) AS $$ DECLARE result varchar; BEGIN IF _month <1 OR _month > 12 THEN RAISE EXCEPTION E'Parameter is outside acceptable limits!\n'; END IF; SELECT INTO result CASE _month WHEN 1 THEN 'January' WHEN 2 THEN 'February' WHEN 3 THEN 'March' WHEN 4 THEN 'April' WHEN 5 THEN 'May' WHEN 6 THEN 'June' WHEN 7 THEN 'July' WHEN 8 THEN 'August' WHEN 9 THEN 'September' WHEN 10 THEN 'October' WHEN 11 THEN 'November' WHEN 12 THEN 'December' END; RETURN result; END; $$ LANGUAGE plpgsql IMMUTABLE STRICT;

You can see used SELECT INTO SQL statement. You can see IF THEN statement. The construction IF...THEN...END IF is clear. The first argument of the RAISE command is a level of exception. We have three possibilities at our disposal: DEBUG – signs on block, NOTICE – announces the error to the user, EXCEPTION – stops the function's execution. The second argument to RAISE is the text of the error report (we write this in double apostrophes). If the symbol % appears in the text, this symbol is replaced by the value of a variable, which comes as the third, fourth, etc. argument to RAISE. RAISE NOTICE is usually used to display tuning reports or the values of variables. Basically it is only one tuning instrument which at our disposal.

Note that the CASE command is parameterised: it includes a variable. It is possible to use parameters in all SQL commands, everywhere, where some value is expected. It is not possible to parameterise names of columns and tables. It is possible to work around this limitation (at the cost of decreased performance and increased effort) using dynamic queries (see the below example with EXECUTE).

Interpret PL/pgSQL uses cached plans for SQL statements. PostgreSQL 8.2 and older doesn't allow refresh these cached plans, when these plans are invalid (by example - after drop temporary table). There was two solutions - a) using a dynamic sql (without execution plan cache), b) newer to drop temporary tables.

In the following example we will create a table and fill it with values of the sine function in a set range. We use the WHILE construction and the parameterised SQL command INSERT. The table tabsin is created only if it doesn‘t yet exist.

CREATE OR REPLACE FUNCTION tabsin(_p1 float, _p2 float, _p3 float) RETURNS void AS $$ DECLARE _i float = _p1; _do float = _p2; _step float = _p3; BEGIN IF NOT EXISTS(SELECT relname FROM pg_class WHERE relname = 'tabsin' AND relkind = 'r' AND pg_table_is_visible(oid)) THEN RAISE NOTICE 'Create table tabsin'; CREATE TEMP TABLE tabsin (x NUMERIC(5,4) PRIMARY KEY, fx NUMERIC(5,4)); ELSE RAISE NOTICE 'Deleting all records from table tabsin'; TRUNCATE TABLE tabsin; END IF; WHILE _i < _do LOOP INSERT INTO tabsin VALUES(CAST(_i AS NUMERIC(5,4)), SIN(_i)); _i := _i + _step; END LOOP; END; $$ LANGUAGE plpgsql;

The command tabsin creates table for an interval from -1 to 1 with step 0.0001. The execution time is very quick. Inserting twenty thousand rows takes two seconds on my 1.6 G notebook. We can check the values with a SELECT command:

SELECT * FROM tabsin WHERE x=0.1234;

You can leave a LOOP ..END LOOP construction using the EXIT command, with a user-defined label (which identifies the loop which should be stopped) and a user-defined condition. The label should be placed at the end of the loop.

<<main>> LOOP EXIT main WHEN c > 10; RAISE NOTICE ''%'', c; c := c + 1; END LOOP main;

Custom PL/pgSQL functions concept

The most important thing on PL/pgSQL functions (let us say by all UDF functions) is a fact, that we can use in queries – the whole proces of execution dates is simplificated by it. We can processed dates right on the server.

One of the functions missing in PostgreSQL is establishing Easter. It is used for establishing bank holidays in planning functions. I used Carter's algorithm for mark out the Easter Sunday.

CREATE OR REPLACE FUNCTION eastrsunday(_year INTEGER) RETURNS DATE AS $$ DECLARE B INTEGER; D INTEGER; E INTEGER; Q INTEGER; DD INTEGER; MM INTEGER; BEGIN IF _year < 1900 OR _year > 2099 THEN RAISE EXCEPTION 'Out of range'; END IF; B := 255 - 11 * ($1 % 19); D := ((B - 21) % 30) + 21; IF D > 38 THEN D := D - 1; END IF; E := ($1 + $1/4 + D + 1) % 7; Q := D + 7 - E; IF Q < 32 THEN DD:=Q; MM := 3; ELSE DD := Q - 31; MM := 4; END IF; RETURN to_date(to_char(DD, '00') || to_char(MM, '00') || to_char(_rok,'0000'), 'DD MM YYYY'); END; $$ LANGUAGE plpgsql IMMUTABLE STRICT;

The next event, in which is the UDF function necessary, is the usage of cleaning functions for pairing notations in the table. These functions represent last rescue, when we do selections beyond historical databases, where no one take care of domains integrity. I work with column of values in the example, where data can contains strings like "records","rec","rec.", "rc", etc. My task is replace these variant strings by one string:

CREATE OR REPLACE FUNCTION clean(VARCHAR) RETURNS VARCHAR AS $$ DECLARE pom varchar DEFAULT $1; BEGIN pom := to_ascii(lower(trim(both FROM pom))); pom = replace(pom,'rec' ,'records'); pom = replace(pom,'rec.' ,'records'); pom = replace(pom,'rc','records'); /* replace multiple spaces by single space */ WHILE position(' ' IN pom) <> 0 LOOP pom := replace(pom, ' ',' '); END LOOP; RETURN trim(both FROM pom); END; $$ LANGUAGE plpgsql IMMUTABLE STRICT;

Usage of PL/pgSQL for administration of database

Suitable designed saved procedures make administration of database measurably easier. I use following function for clearing RDBMS from databases that students create during a term. If the function is started without parameters or if the amount of arguments does not agree as well, it shows help.

CREATE OR REPLACE FUNCTION drop_students_databases() RETURNS INTEGER AS $$ DECLARE helpstr VARCHAR(300); BEGIN helpstr  := E'Function for dropping databases of users whose\n' || E'name (the name of user) suits the mask of order LIKE.\n\n' || E'e.g. SELECT drop_students_databases(' || quote_literal('group%') || E')\n\n' || E'note you have to be loged in as user postgres.\n' || E'author: Pavel Stěhule, 19.1.2002\n'; RAISE NOTICE '%', helpstr; RETURN 0; END; $$ LANGUAGE plpgsql; CREATE OR REPLACE FUNCTION drop_students_databases(_mask varchar(100)) RETURNS INTEGER AS $$ DECLARE db RECORD; deleted INTEGER := 0; helpstr VARCHAR(300); BEGIN IF length(_mask) = 0 OR _mask ISNULL OR current_user() <> 'postgres' THEN helpstr  := E'Function for dropping databases of users whose\n' || E'name (the name of user) suits the mask of order LIKE.\n\n' || E'e.g. SELECT drop_students_databases(' || quote_literal('group%') || ')\n\n' || E'note you have to be loged in as user postgres.\n' || E'author: Pavel Stěhule, 19.1.2002\n'; IF current_user <> 'postgres' THEN RAISE EXCEPTION E'Function might be called only by user postgres\n\n%', helpstr; ELSE RAISE EXCEPTION E'Argument of the function must be a non-empty string\n\n%', helpstr; END IF; END IF; FOR db IN SELECT datname, usename FROM pg_database, pg_user WHERE datdba=usesysid AND usename LIKE _mask LOOP deleted := deleted + 1; RAISE NOTICE 'Drop database %,%', db.datname, db.usename; EXECUTE 'drop database ' || quote_ident(db.datname); END LOOP; RETURN deleted; END; $$ LANGUAGE plpgsql;

Notice the construction of FOR IN LOOP END LOOP which iterates through the result of SQL command (in this case list of databases whose owners suit the mask LIKE). Since names may contain gaps (or some special chars), it is needed to type the database name in quotation marks (by quote_ident). We have to use EXECUTE when we would to use a variable on column or table position.

Using PL/pgSQL language for custom functions

PostgreSQL doesn't support function next_day (coming from Oracle). You can use a PL/pgSQL language for implementation functions that simplify porting of application from someone database to PostgreSQL:

CREATE OR REPLACE FUNCTION next_day(IN d date, IN day varchar) RETURNS date AS $$ DECLARE id integer dow integer; BEGIN dow := EXTRACT(dow FROM d); id := CASE lower(substring(day FROM 1 FOR 3)) WHEN 'sun' THEN 0 WHEN 'mon' THEN 1 WHEN 'tue' THEN 2 WHEN 'wed' THEN 3 WHEN 'thu' THEN 4 WHEN 'fri' THEN 5 WHEN 'sat' THEN 6 END; IF id IS NULL THEN RAISE EXCEPTION 'Wrong identifier for day ' || quote_literal(day); END IF; RETURN CASE id <= dow WHEN true THEN d + (id - dow + 7) WHEN false THEN d + (id - dow) END; END; $$ LANGUAGE plpgsql IMMUTABLE STRICT;

Usage of PL/pgSQL in CHECK conditions

We can use functions in PL/pgSQL as a control in CHECK conditions. These functions have to have only one argument of corresponding type and if the value of the argument is NULL, then it must return NULL. This behaviour may be treated programmatic or we may use attribute isstrict which ensure that whenever at least one of the arguments is NULL, then we use the value NULL like the result of function without executing the function. The following example tests the validity of ISBN code (although supplement isbn_issn exists, it will not be installed in most cases). Notice the usage of parameter weight of array type.

CREATE OR REPLACE FUNCTION check_ISBN(ISBN CHAR(12)) RETURNS boolean AS $$ DECLARE pos INTEGER; asc INTEGER; sum INTEGER DEFAULT 0; weight INTEGER[] DEFAULT '{10,9,8,7,6,5,4,3,2,1}'; -- for ISSN {8,7,6,5,4,3,2,1} digits INTEGER DEFAULT 1; BEGIN FOR pos IN 1..length(ISBN) LOOP asc := ascii(substr(ISBN,pos,1)); IF asc IN (88, 120) THEN -- ISDN might contain control number X sum := sum + 10; digits := digits + 1; ELSIF asc >= 48 AND asc <= 57 THEN sum := sum + (asc - 48)*weight[digits]; digits := digits + 1; END IF; END LOOP; IF digits <> 11 THEN -- for ISSN <> 9 RETURN 'f'; ELSE RETURN (sum % 11) = 0; END IF; END; $$ LANGUAGE plpgsql STRICT IMMUTABLE;

We can use the defined function for CHECK condition, e.g. (column isbn):

CREATE TABLE ZCHU_ReservationBooks ( id INTEGER PRIMARY KEY DEFAULT nextval('doc_seq'), idoo INTEGER NOT NULL, -- subject of protection id taboo CHAR(1) NOT NULL CHECK(taboo IN ('m','v','s')), isign varchar(50) NOT NULL UNIQUE CHECK (isign <> ''), typ CHAR(2) NOT NULL REFERENCES cv_TypDocRK(kod), author_surname VARCHAR(30) NOT NULL CHECK (author_surname <> ''), author_name VARCHAR(30) NULL CHECK (author_name <> ''), title VARCHAR(100) NOT NULL CHECK (nazev <> ''), note VARCHAR(300) NULL, year_of_publication INTEGER NULL CHECK (year_of_publication > 1918 AND year_of_publication < EXTRACT(year FROM current_date)), established DATE DEFAULT current_date NOT NULL, disabled DATE NULL CHECK (disabled > established), isbn CHAR(12) NULL CHECK (check_ISBN(isbn)), url VARCHAR(100) NULL CHECK (url ~* '^(file|http|ftp)://.+') );

Usage of PL/pgSQL functions for designing own operators

PostgreSQL does not support operation of dividing interval by interval. However, it is not problem to overcome this defect and define own operator / for this combination of operand.

omega=# select '1hour'::interval / '10min'::interval; ERROR: operator does not exist: interval / interval CREATE OR REPLACE FUNCTION div_op(a interval, b interval) RETURNS double precision AS $$ BEGIN -- note - better to use SQL language RETURN EXTRACT(EPOCH FROM a) / EXTRACT(EPOCH FROM b); END; $$ LANGUAGE plpgsql IMMUTABLE RETURNS NULL ON NULL INPUT; CREATE OPERATOR / (procedure = div_op, leftarg = interval, rightarg = interval); omega=# select '1hour'::interval / '10min'::interval; ?column? ---------- 6

In CHECK expressions we can use binary operators OR and AND. In standard distribution the operators XOR and implication are missing. Nevertheless it is not a problem this missing operators fill in the system.

CREATE OR REPLACE FUNCTION op_xor (a boolean, b boolean) RETURNS boolean AS $$ BEGIN -- better use SQL language! RETURN ((NOT a) AND b) OR (a AND (NOT b)); END; $$ LANGUAGE plpgsql IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION op_imp_lr (a boolean, b boolean) RETURNS boolean AS $$ BEGIN RETURN (NOT a OR b); END; $$ LANGUAGE plpgsql IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION op_imp_rl (a boolean, b boolean) RETURNS boolean AS $$ BEGIN RETURN (NOT b OR a); END; $$ LANGUAGE plpgsql IMMUTABLE STRICT; /* * PL/pgSQL is black box for optimalizer, so much better is using * a SQL language */ CREATE OR REPLACE FUNCTION op_imp(a boolean,b boolean) RETURNS boolean AS $$ SELECT ($1 OR NOT $2) $$ LANGUAGE sql;

Both functions have to be registered and we have to push ashore them some symbol. It performs the order CREATE OPERATOR. We create the name of the operator by combining following characters +-*/<>=~!@#%^&|’?$.

DROP OPERATOR # (boolean, boolean); CREATE OPERATOR # ( procedure = op_xor, leftarg = boolean, rightarg = boolean, commutator = # ); DROP OPERATOR >>> (boolean, boolean); CREATE OPERATOR >>> ( procedure = op_imp_lr, leftarg = boolean, rightarg = boolean, commutator = <<< ); CREATE OPERATOR <<< ( procedure = op_imp_rl, leftarg = boolean, rightarg = boolean, commutator = >>> );

We can try both operators on the table:


Using of this two operators make the CHECK conditions significantly easier. In the table "bug_reports" I require entering user’s id or user’s name. Further I require filling in column err_message if the type of return is error.

DROP TABLE bug_reports; DROP SEQUENCE bug_reports_id_seq; CREATE TABLE bug_reports ( id SERIAL PRIMARY KEY, inserted DATE DEFAULT current_date NOT NULL, inserted_by_not_registrated VARCHAR(60) CHECK (inserted_by_not_registrated <> ''), inserted_by INTEGER REFERENCES users(id) CHECK ( inserted_by_not_registrated IS NOT NULL # inserted_by IS NOT NULL), class_err CHAR(2) NOT NULL REFERENCES class_err_codes(code), error_message TEXT CHECK ( (class_err IN ('ch','zc') >>> error_message IS NOT NULL) AND (error_message <> '')), long_description TEXT NULL CHECK (long_description <> '') ); INSERT INTO bug_reports (inserted_by, tclass_err, long_description) VALUES (1,'po', 'I installed a new version.'); -- ok INSERT INTO bug_reports (inserted_by, inserted_by_not_registrated, class_err, long_description) VALUES (1, 'Pavel Stěhule','po', 'I installed a new version.'); -- exception INSERT INTO bug_reports (inserted_by_not_registrated, class_err, long_description) VALUES ('Pavel Stěhule','po', 'I installed a new version.'); -- ok INSERT INTO bug_reports (inserted_by_not_registrated, class_err, long_description) VALUES ('Pavel Stěhule','ch', 'I installed a new version.'); -- exception

Absence of operators XOR can be solved easily (without necessity of writing PL/pgSQL function).

inserted_by INTEGER REFERENCES Users(id) CHECK ( (inserted_by_not_registrated IS NOT NULL) <> (inserted_by IS NOT NULL))

Usage of PL/pgSQL functions in the design of user aggregation functions.

If we compare the number of bulit-in aggregation functions in PostgreSQL with other RDBMS, we find that many aggregation functions are missing in PostgreSQL. Fortunately it is possible to develop user aggregation functions in PostgreSQL.

Aggregation functions are created by using two functions in PostgreSQL. The first, which is ran for each value, giving a result which is in turn used by the second function which gives the final result. The first example doesn’t need a second function – it creates a list of items, which are separated by commas. By aggregation functions is important to ignore NULL.

CREATE OR REPLACE FUNCTION comma_aggreg (state text, p text) RETURNS text AS $$ BEGIN IF p IS NULL THEN RETURN state; END IF IF length(state) > 0 THEN RETURN state || ', ' || p; ELSE RETURN p; END IF; END; $$ LANGUAGE plpgsql IMMUTABLE; DROP AGGREGATE comma(text); CREATE AGGREGATE comma ( basetype = text, sfunc = comma_aggreg, stype = text, initcond = '' ); SELECT comma(name) FROM names;

With a little modification of the aggregation function sum we obtain a "hungry" function – meaning that if the text is already aggregated, it will be ignored. Using this function isn't as fast as other methods, but it is the simplest way to gain the desired result.

CREATE OR REPLACE FUNCTION comma_aggreg(state text,p text) RETURNS text AS $$ BEGIN IF p IS NULL THEN RETURN state; END IF; IF length(state) > 0 THEN IF position(p in state) = 0 THEN RETURN state || ', ' || p; ELSE RETURN state; END IF; ELSE RETURN p; END IF; END; $$ LANGUAGE plpgsql IMMUTABLE;

The table includes the areas of activity in districts and regions. If I wouldn't have used hungry counting of string type variables, the regions would be listed twice (note - or use a DISTINCT aggregate, what is better and faster).

aopk=> select * from activity; area | district | region ---------------+---------------+------ Benešov | Benešov | SČ Benešov | Kutná Hora | SČ Č. Budějovice | Č. Budějovice | JČ (3 line) aopk=> select area, sum(district), comma(DISTINCT region) from activity group by area; area | sum | sum ---------------+---------------------+----- Č. Budějovice | Č. Budějovice | JČ Benešov | Benešov, Kutná Hora | SČ (2 line)

Next is a little bit more complicated aggregation function returning a list of items in HTML format.

CREATE OR REPLACE FUNCTION html_li_aggr (state text,p text) RETURNS text AS $$ BEGIN IF p IS NULL THEN RETURN state; ELSE RETURN state || '<li>' || p || E'</li>\n'; END IF; END; $$ LANGUAGE plpgsql IMMUTABLE; CREATE OR REPLACE FUNCTION html_ul_final (state text) RETURNS text AS $$ BEGIN IF $1 <> '' THEN RETURN E'<ul>\n' || state || '</ul>'; ELSE RETURN ''; END IF; END; $$ LANGUAGE plpgsql IMMUTABLE; DROP AGGREGATE html_ul (text); CREATE AGGREGATE html_ul ( basetype = text, sfunc = html_li_aggr, stype = text, initcond = '', finalfunc = html_ul_final );

The result of the query SELECT html_ul(jmeno) FROM jmena is one value (even though more lines of text) which can be inserted into an HTML page.

note - PostgreSQL 8.4 has very fast buildin aggregate function array_agg. This function in combination with function array_to_string can do same work, but sugnificantly faster on bigger dataset.

Design of triggers in PL/pgSQL

Trigger is a procedure, that RDBMS activates before or after executing the orders INSERT, UPDATE a DELETE in some table, to which we assign a trigger before. An arbitary PL/pgSQL procedure without parametres returning a value of the type TRIGGER can be used as a trigger. The data type trigger, is in fact a RECORD data type compatible with the table for which it is used. Two types of triggers exist: BEFORE and AFTER. BEFORE triggers are called before carrying out an operation, AFTER triggers after carrying out an operation.

Let us imagine the situation, when we want to have data divided into two tables, but we only want to work with a unification of these tables. If we use view, we give up the posibility to use referential integrity. Another possibility is to use a third table, which will include values from the first two tables.

CREATE TABLE source1 ( code CHAR(2) PRIMARY KEY, description VARCHAR(100) CHECK (description <> '') ); CREATE TABLE source2 ( code CHAR(2) PRIMARY KEY, description VARCHAR(100) CHECK (description <> '') ); CREATE TABLE object ( code CHAR(2) PRIMARY KEY, description VARCHAR(100) CHECK (description <> ''), source CHAR(1) NOT NULL CHECK (source = '1' OR source = '2') );

I must say that any PL/pgSQL function runs under an implicit transaction, that means if some command in a procedure fails, all changes in the data of executed functions will be annulled. The behaviour of trigger functions is based on this. In the beginning we try to execute the requested function in the final table, if the command fails, it won’t be executed in the source table.


When executing a PL/pgSQL function as a trigger, we can use some bulit-in variables. TG_OP describes the command which triggers the trigger, TG_RELNAME holds the name of the table (in lower-case letters), in which the trigger was ran, NEW includes a line with a new version of the values (only for INSERT and UPDATE), OLD includes a line with an original version of the values (only for DELETE and UPDATE). If a procedure returns NULL, no data will be changed. Trigger only solves the transfer of data from source tables to the final table. If we change the final table directly, it won’t correspond to the source tables. We can set user rights so that only the owner of the table has the right to change the content of the table. Other users can of course view the final table.

Trigger is always ran with the rights of the user who performed the action which triggered it. If we would strip users of the right to modificate the final table, INSERT would fail. Fortunately we have the posibility to change this behaviour with the attribute SECURITY DEFINER. Then the commands in the body of a function will be ran with the function owner’s rights, which in our case is necessary.

We have to assign a function to a trigger and a table with the command CREATE TRIGGER. We can create two basic types: those which are executed before executing the command, which invokes the trigger and those, which are ran after executing the command.


In some cases we want to prevent UPDATE. The updating of some column in a table (mostly columns of seriál type, where UPDATE has no reason). Again, there exists a supplement NO UPDATE, which solves this problem, but which isn’t installed by default. If there aren't many tries to UPDATE, we can use a trigger and a short PL/pgSQL procedure:

DROP SEQUENCE codebook_id_seq; DROP TABLE codebook; CREATE TABLE codebook ( id SERIAL PRIMARY KEY, description VARCHAR(100) CHECK (description <> '') ); CREATE OR REPLACE FUNCTION trig_static_id() RETURNS TRIGGER AS $$ BEGIN IF TG_OP = 'UPDATE' AND <> THEN RAISE EXCEPTION 'You can not update PRIMARY KEY column on table %', TG_RELNAME; ELSE RETURN NEW; END IF; END; $$ LANGUAGE plpgsql; CREATE TRIGGER static_id_codebook BEFORE UPDATE ON codebook FOR EACH ROW EXECUTE PROCEDURE trig_static_id();

Triggers of the BEFORE type are ran before the evaluation of CHECK conditions of columns and tables. Domain types are evaluated before the activation of BEFORE triggers. Thanks to that, it is possible to fill in some missing values in the functions of triggers and not exceed restrictions. On the other side, in a trigger we can detect values that don’t satisfy CHECK conditions. The solution of this situation isn’t to invoke a exception, but to end the execution of the trigger. RDBMS will find an unsatisfactory value and will invoke an exception. If more triggers are defined over one table, then the triggers are ran in alphabetical order (unlike ANSI SQL, where the triggers are ran in the order in they were defined).

Triggers in PostgreSQL can also have static parameters specified. This enables us to use one procedure in different contexts. Static parameters aren’t accessable through parameters of of the function including the trigger, but are accessible in an external variable TG_ARGV[].

Let us have the requirement to log entries into tables, when the value, which will be written to the log can’t be simply derived from name of table. We need complicated branching.

CREATE TABLE foo_a(i integer); CREATE TABLE foo_b(i integer); CREATE OR REPLACE FUNCTION trig() RETURNS trigger AS $$ BEGIN RAISE NOTICE 'Trigger: %,Insert to table: %,Parameter: %', TG_NAME, TG_RELNAME, TG_ARGV[0]; RETURN NULL; END; $$ LANGUAGE plpgsql; CREATE TRIGGER foo_a_trg AFTER INSERT ON foo_a FOR EACH ROW EXECUTE PROCEDURE trig('a'); CREATE TRIGGER foo_b_trg AFTER INSERT ON foo_b FOR EACH ROW EXECUTE PROCEDURE trig('b'); INSERT INTO foo_a VALUES(1); INSERT INTO foo_b VALUES(1); <pre> The result is: <pre> psql:testtr.sql:16: NOTICE: Trigger: foo_a_trg,Insert to table: foo_a,Parameter: a INSERT 336812 1 psql:testtr.sql:17: NOTICE: Trigger: foo_b_trg,Insert to table: foo_b,Parameter: b INSERT 336813 1

Usage PL/pgSQL function with parametres of type table

In some cases we need to run a function for a whole row of a table, for example when we need to replace the repeated term CASE in the command SELECT. Let us have tables of dealers and of sales. The table of sales includes the columns c1, c2, c3, t2, ot, oec1, oec2, oec3, t1, b1, b2, c050, c150, c300, t1n and t2n, which contain the amount of products sold in some tariff and in a given sale. The tariffs are set directively and do not change.

CREATE TABLE dealers ( id SERIAL PRIMARY KEY, PNo VARCHAR(10) NOT NULL UNIQUE CHECK (PNo ~ '^[0-9]{9,10}$'), name VARCHAR(20) NOT NULL CHECK (name <> ''), surname VARCHAR(20) NOT NULL CHECK (surname <> '') ); CREATE TABLE sale ( id SERIAL PRIMARY KEY, dealer int REFERENCES dealers(id), closing_date DATE NOT NULL DEFAULT current_date::DATE, c1 integer NULL CHECK(c1 > 0), c2 integer NULL CHECK(c2 > 0), c3 integer NULL CHECK(c3 > 0), t2 integer NULL CHECK(t2 > 0), ot integer NULL CHECK(ot > 0), oec1 integer NULL CHECK(oec1 > 0), oec2 integer NULL CHECK(oec2 > 0), oec3 integer NULL CHECK(oec3 > 0), t1 integer NULL CHECK(t1 > 0), b1 integer NULL CHECK(b1 > 0), b2 integer NULL CHECK(b2 > 0), c050 integer NULL CHECK(c050 > 0), c150 integer NULL CHECK(c150 > 0), c300 integer NULL CHECK(c300 > 0), t1n integer NULL CHECK(t1n > 0), t2n integer NULL CHECK(t2n > 0) );

The customer will require the list of products sold, but not by tariffs, but by the final amount of products divided into classes by volume of each sale. The classes are:

0 - 4, 5 - 9, 10 - 19, 20 - 49, over 50

Then the customer can determine the size of a sale by each dealer and also his orientation for a certain segment of the market. The list can be created either by a selection using the CASE command (below) or by a selection with a PL/pgSQL function, which has one line of the table as an argument.

CREATE OR REPLACE FUNCTION c_sale_seg (rec sale, l int,h int) RETURNS integer AS $$ DECLARE s INTEGER; BEGIN s := coalesce(rec.c1,0) + coalesce(rec.c2,0) + coalesce(rec.c3,0) + coalesce(rec.t2,0) + coalesce(rec.oec1,0) + coalesce(rec.oec2,0) + coalesce(rec.oec3,0) + coalesce(rec.t1,0) + coalesce(rec.b1,0) + coalesce(rec.b2,0) + coalesce(rec.c150,0) + coalesce(rec.c300,0) + coalesce(rec.t1n,0) + coalesce(rec.t2n,0) + coalesce(rec.ot,0); IF l IS NOT NULL THEN IF s < l THEN RETURN 0; END IF; END IF; IF h IS NOT NULL THEN IF s > h THEN RETURN 0; END IF; END IF; RETURN s; END; $$ LANGUAGE plpgsql;

The usage of these functions drastically simplifies the design. The function c_prodej_seg sums the sale in all tariffs. If the volume of a sale is in an interval determined by the second and third arguments, it returns this value, in any other case it returns null. Notice that the first argument of the function is of the type table. In the function itself, we work with it as if it would be tabulka%ROW.

INSERT INTO dealers (PNo, name, surname) VALUES ('7307150000','Pavel', 'Stěhule'); INSERT INTO dealers (PNo, name, surname) VALUES ('7807150000','Zdeněk', 'Stěhule'); INSERT INTO sale (dealer,c1,c2) VALUES (1,18,11); INSERT INTO sale (dealer,c1,c2) VALUES (1,18,6); INSERT INTO sale (dealer,c3,c2) VALUES (1,50,24); INSERT INTO sale (dealer,t1,c3) VALUES (1,1,1); INSERT INTO sale (dealer,c300) VALUES (2,10); INSERT INTO sale (dealer,c1,c2) VALUES (2,11,2); INSERT INTO sale (dealer,c1,c2) VALUES (2,1,6); INSERT INTO sale (dealer,c3,c2) VALUES (2,5,12); INSERT INTO sale (dealer,t1,ot) VALUES (2,1,1); INSERT INTO sale (dealer,c300) VALUES (2,252);

It returns this query:

SELECT SUM(c_sale_seg(sale,NULL, 4)) AS lt5, SUM(c_sale_seg(sale,5, 9)) AS be5a9, SUM(c_sale_seg(sale,10, 19)) AS be10a19, SUM(c_sale_seg(sale,20, 49)) AS be20a49, SUM(c_sale_seg(sale,50, NULL)) AS ge50, name || ' ' || surname as dealer from sale join dealers on dealer = group by name || ' ' || surname ;


lt5 | be5a9 | be10a19 | be20a49 | ge50 | dealer -----+-------+---------+---------+------+---------------- 2 | 0 | 0 | 53 | 74 | Pavel Stěhule 2 | 7 | 40 | 0 | 252 | Zdeněk Stěhule

Functions that return tables

Individual lines are created by repeatedly calling RETURN NEXT value (this version of RETURN doesn’t end the function) and calling RETURN without parameters, with that we end the execution of a function. As a come-back type we can use scalar types, table types or composite types (these are more or less equivalent to the table type (we create them using the command CREATE TYPE)). If we don’t want to define a new type, we can use OUT parametres (RETURNS SETOF RECORD).

CREATE TYPE tf AS (f1 varchar(10), f2 varchar(10)); CREATE OR REPLACE FUNCTION makesettf(mx integer) RETURNS SETOF tf AS $$ DECLARE f tf%ROWTYPE; BEGIN FOR i IN LOOP f.f1 := CAST(i AS varchar(10)); f.f2 := 'bbbbb '||CAST(i AS varchar(10)); RAISE NOTICE '%', f.f1; RETURN NEXT f; END LOOP; RETURN; END; $$ LANGUAGE plpgsql; SELECT a.*, b.* FROM makesettf(10) a JOIN makesettf(5) b ON a.f1 = b.f1; SELECT * FROM makesettf(3) UNION ALL SELECT * FROM makesettf(8);

A more practical example, inpired by the UDF of Firebird (which returned a table with the dates of german beer festivals), is to use SRF in the design of functions for the determination of a day by adding n workdays to a given day.

SRF functions are good for generating smaller dynamic (parametrical) tables, because it isn’t possible to index tables which are created by them. This restriction is easily forgiven during testing. Custom functions AddWDays and AddWDays2 were tested by copmaring the values (each function is implemented in a different way, but they must give the same results) over a virtual table created by a union of tables produced by SRF functions.

The function DaysInYear returns a table of days in the given year, the function Seq returns a sequence of integers restricted by a parameter. The function AddWDays and AddWDays2 has two parameters – the first day, to which work days are added, the number of which is determined by the second parameter). The function DiffWDays returns the numer of work days in an interval defined by parameters. If the functions are correctly designed, then the result of test queries must be an empty table.

CREATE OR REPLACE FUNCTION DaysInYear(y integer) RETURNS SETOF date AS $$ DECLARE st date; BEGIN st := to_date('0101'||y,'MMDDYYYY'); WHILE EXTRACT(year FROM st) < y + 1 LOOP RETURN NEXT st; st := st + 1; END LOOP; RETURN; END; $$ LANGUAGE plpgsql; -- testy SELECT DaysInYear, generate_series FROM DaysInYear(2003) CROSS JOIN generate_series(1,40) WHERE AddWDays(DaysInYear, generate_series) <> AddWDays2(DaysInYear, generate_series); SELECT DaysInYear, generate_series FROM DaysInYear(2003) CROSS JOIN generate_series(1,40) WHERE DiffWDays(DaysInYear, AddWDays(DaysInYear, generate_series)) <> generate_series;

We have few national holidays, but they are on different dates each year (because of easter) , so it’s good to use SRF for generating the table. If preformance problems would occur, we could materialize the table, that means to save into a normal table with the command SELECT INTO. The function StátníSvátky returns a table with national holidays, including the name for the range given by the first and second parameters. The function PočetStatníchSvátků counts the number of national holidays and discards the days given as the third parameter (mostly Saturday and Sunday)). Intristing is perhaps only the way of entering the comparison value with the array contents = ANY().

CREATE OR REPLACE FUNCTION NationalHolidays(start date, finish date, OUT time date, OUT name varchar(50)) RETURNS SETOF RECORD AS $$ DECLARE holiday VARCHAR [][] DEFAULT ARRAY [['New year', '0101'], ['Easter sunday', 'xx'], ['Easter monday','x1'], ['Holiday of work','0501''], ['Holiday of freedom',''0508''], ['The day of Cyrila and Metoděj','0705'], ['The day of burning master Jan Hus','0706'], ['The day of czech statehood','0928'], ['The day of beginning of independent Czechoslovak state', '1028'], ['The day of fight to freedom and democracy','1117'], ['Christmas eve','1224'], ['1. holiday of Christmas','1225'], ['2. holiday of Christmas','1226']]; year integer; stepd date; d varchar; BEGIN stepd := start; WHILE stepd <= finish LOOP year := CAST(EXTRACT(year FROM stepd) AS INTEGER); FOR i IN array_lower(holiday,1)..array_upper(holiday,1) LOOP d := holiday[i][2]; IF d = 'xx' THEN time := Easter_sunday(year); ELSIF d = 'x1' THEN time := Easter_sunday(year) + 1; ELSE time := to_date(d||year,'MMDDYYYY'); END IF; IF time BETWEEN stepd AND finish THEN name := holiday[i][1]; RETURN NEXT; END IF; END LOOP; stepd := date_trunc('year', stepd) + interval '1 year'; END LOOP; RETURN; END; $$ LANGUAGE plpgsql IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION NumberOfNationalHolidays(start date, finish date, nwd INTEGER[]) RETURNS INTEGER AS $$ DECLARE holiday VARCHAR [] DEFAULT ARRAY ['0101', 'xx', 'x1', '0501', '0508', '0705', '0706', '0928', '1028', '1117', '1224', '1225', '1226']; year INTEGER; stepd date; d varchar; hol date; pss INTEGER DEFAULT 0; OM INTEGER; BEGIN stepd := start; WHILE stepd <= finish LOOP year := CAST(EXTRACT(year FROM stepd) AS INTEGER); FOR i IN array_lower(holiday,1)..array_upper(holiday,1) LOOP d := holiday[i]; IF d = 'xx' THEN hol := Easter_sunday(year); ELSIF d = 'x1' THEN hol := Easter_sunday(year) + 1; ELSE hol := to_date(d||year,'MMDDYYYY'); END IF; IF NOT EXTRACT(dow FROM hol) = ANY (nwd) THEN IF hol BETWEEN stepd AND finish THEN pss := pss + 1; END IF; END IF; END LOOP; stepd := date_trunc('year', stepd) + interval '1 year'; END LOOP; RETURN pss; END; $$ LANGUAGE plpgsql IMMUTABLE STRICT;

The function AddWDays2 is simpler and for a smaller number of days it is probably also faster. I save an array of holidays for one year on into the variable svatky, to which I add work days. The function StatniSvatky returns a table and so it is necessary to transfer it to a constructions array ARRAY(SELECT FROM). Then I decrease the number of work days for each work day and I go on.

CREATE OR REPLACE FUNCTION AddWDaysSlow2(start date, c integer) RETURNS date AS $$ DECLARE st date = start; ed date; holidays date[]; d integer = c; k integer; BEGIN ed := st + 365; SELECT INTO holidays ARRAY(SELECT time FROM NationalHolidays(st, ed)); LOOP k := CAST(EXTRACT(DOW FROM st) AS INTEGER); WHILE k = 6 OR k = 0 OR st = ANY(holidays) LOOP st := st + 1; k := CAST(EXTRACT(DOW FROM st) AS INTEGER); IF st > ed THEN ed = st + 365; SELECT INTO holidays ARRAY(SELECT time FROM NationalHolidays(st, ed)); END IF; END LOOP; EXIT WHEN d = 0 OR d = 1; st := st + 1; d := d - 1; END LOOP; RETURN st; END; $$ LANGUAGE plpgsql;

The second (the first) function AddWDays is more complicated and faster for longer intervals. It uses the function AddWDay, which adds x days for weekends to the number of work days. Then it repeatedly checks for the need of corections for national holidays. If a parameter of AddWDay is Saturday or Sunday, it is moved to Monday. After that I go back to Sunday and increase the number of work days for the amount of days I've gone back. Then I can simply transfer work days to real days.

CREATE OR REPLACE FUNCTION AddWDays(start date, c integer) RETURNS date AS $$ DECLARE st date = start; ed DATE; corection INTEGER; hol INTEGER; d INTEGER; BEGIN ed := AddWDay(st,c); corection = NumberOfNationalHolidays(st, ed, ARRAY[6,0]); WHILE corection > 0 LOOP st := ed + 1; ed := AddWDay(st, corection); corection = NumberOfNationalHolidays(st, ed, ARRAY[6,0]); END LOOP; RETURN FirstWDay(ed); END; $$ LANGUAGE plpgsql IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION AddWDay(d date, c integer) RETURNS date AS $$ DECLARE st DATE; wd INTEGER; k INTEGER; BEGIN k := CAST(EXTRACT(DOW FROM d) AS INTEGER); IF k = 0 THEN st := d + 1; k := 1; ELSIF k = 6 THEN st := d + 2; k := 1; ELSE st := d; END IF; st := st - k; wd := c + k - 2; st := st + ((wd) / 5) * 7 + ((wd) % 5) + 1; RETURN st; END; $$ LANGUAGE plpgsql IMMUTABLE STRICT;

If the first parameter of the function DiffWDays is Saturday or Sunday I move the beginning to Monday, if the second parameter is Saturday or Sunday I transfer the end to Friday. If the beginning is after the end I have to increase number of weeks. In the given interval I subtract the number of national holidays. The function ISODOW returns an ISO number of the day (Saturday 6, Sunday 7).

CREATE OR REPLACE FUNCTION ISODOW(date) RETURNS INTEGER AS $$ SELECT CAST(EXTRACT(DOW FROM $1 - 1) + 1 AS INTEGER); $$ LANGUAGE sql STRICT; CREATE OR REPLACE FUNCTION DiffWDays(d1 date, d2 date) RETURNS integer AS $$ DECLARE wc INTEGER; st DATE; ED DATE; p INTEGER; BEGIN p := ISODOW(d1); IF p > 5 THEN st := d1 + (8 - p); ELSE st := d1; END IF; p := ISODOW(d2); IF p > 5 THEN ed := d2 - (p - 5); ELSE ed := d2; END IF; wc := (d2 - d1) / 7; IF EXTRACT(DOW FROM st - 1) + 1 > EXTRACT(DOW FROM ed - 1) + 1 THEN wc := wc + 1; END IF; RETURN (d2 - d1 - (wc * 2) - NumberOfNationalHolidays(st,ed, ARRAY[6,7]) + 1); END; $$ LANGUAGE plpgsql IMMUTABLE STRICT;

Recursive call of SRF functions

Every SRF functions call has its own so-called context where is set of current returned values stored. This feature will appear only when recursive calling of SRF function, when having incorrectly designed function we think we are loosing the results. The example of incorrectly designed function is:


False supposition is that transfer of values of nested function is caused by select into rec * from foo() call. The correct form of function is this:


PostgreSQL 8.4 support Common Table Expression - recursive SELECTs. We can use it without recursive SRF functions:

WITH RECURSIVE x AS (SELECT * FROM foo WHERE uid = 3 -- find a root UNION ALL SELECT foo.* FROM foo JOIN x ON x.uid = -- append rows where pid is same as some searched uid SELECT * FROM x; -- show content of stacked table x without changes

The proposal of polymorphic functions, polymorphic functions usage

Polymorphic functions use some of next polymorphic types: anyarray, anyelement during the definition of returned type or argument.

If the function returns the polymorphic type then at least one argument from arguments of function must be of polymorphic type. The polymorphic type is dynamically determined according to the current type of the argument. If the function contains more parameters of anyelement type the real parameters must be of the same type. The same requirement holds for array.

CREATE OR REPLACE FUNCTION polyfoo1(a anyelement) RETURNS anyelement AS $$ BEGIN RETURN a + 1; END; $$ LANGUAGE plpgsql; testdb011=> select polyfoo1(date '1.1.2004'), polyfoo1(10); polyfoo1 | polyfoo1 ------------+---------- 2004-01-02 | 11 (1 row) CREATE OR REPLACE FUNCTION polyfoo2(a anyelement, b integer) RETURNS anyelement AS $$ BEGIN RETURN a - b; END; $$ LANGUAGE plpgsql; pavel@postgres:5432=# select polyfoo2(date '1.1.2004',100); polyfoo2 ------------ 2003-09-23 (1 row) pavel@postgres:5432=# select polyfoo2(200,100); polyfoo2 ---------- 100 (1 row)

The body of the aggregation function whose result is array of values of subset is example of using of polymorphic function.

CREATE OR REPLACE FUNCTION aggregate_array(state ANYARRAY, a ANYELEMENT) RETURNS ANYARRAY AS $$ SELECT CASE WHEN state IS NULL THEN ARRAY[a] WHEN a IS NULL THEN state ELSE array_append(state,a) END; $$ LANGUAGE sql; testdb011=> SELECT count(*), aggarray(surname) FROM people GROUP BY surname ~ '.*á'; count | aggarray -------+--------------------------------------------- 1 | {Gregorová} 6 | {Kolík,Čurda,Hojka,Chytlík,Stěhule,Novotný} (2 rows)

Sometimes polymorphic functions are referred to as generic functions. Internally, PostgreSQL hasn't got any date type ANYELEMENT. Every time the activation of any function is required PostgreSQL always investigates whether a ready code for required function exists (it tests harmony with number and type of arguments). If the function hasn't been started yet PostgreSQL searches for its source code and prepares it for its first start. If PostgreSQL don't find harmony with polymorphic function then it replaces unknown types by required types and prepares function with parameters, which are already all determined – standard mechanism is used.

If we need to identify variant of polymorphic function we can use IS OF operator (or function pg_typeof).

IF a IS OF (INTEGER) THEN RAISE NOTICE 'Parameter a is integer'; ELSIF a IS OF (varchar) THEN RAISE NOTICE 'Parameter a is string'; END IF;

PL/pgSQL function with (IN) OUT parameters

PL/pgSQL is specific because of its support only to the proposals of function, not procedures. The method of transfer of the final value is the main different between procedure and function. Classically, procedure returns result by variables, which are transfer by reference. The problem is that PostgreSQL don´t support variables, except PL/pgSQL. In older versions of PostgreSQL we had to use complex type, when we needed to obtain more then one value from the code. The current versions of PostgreSQL support OUT variables so we needn't declare a special out type.

CREATE OR REPLACE FUNCTION foo(IN a int, IN b int, OUT aa int, OUT bb int) AS $$ BEGIN aa := a*a; bb := b*b; END $$ SELECT foo(10,10);

News in PostgreSQL 8.3

In this version we can simply return complete result of SQL statement by using a RETURN QUERY statement. It is replacement of repeated call RETURN NEXT statement. RETURN QUERY like RETURN NEXT doesn't ending function execution.

CREATE OR REPLACE FUNCTION dirty_series(m integer, n integer) RETURNS SETOF integer AS $$ BEGIN RETURN QUERY SELECT * FROM generate_series(1,m) g(i) WHERE i % n = 0; RETURN; END; $$ LANGUAGE plpgsq;

In this version you can assigng to function a specific values of system variables. This behave is similar to T-SQL or MySQL, where current settings (in function registration time) is stored together with procedure. Typical use case is change of search_path variable for SECURITY DEFINER. Syntax is clean from example:

CREATE FUNCTION report_guc(text) RETURNS TEXT AS $$ SELECT current_setting($1) $$ LANGUAGE sql SET regex_flavor = basic; ALTER FUNCTION report_guc(text) RESET search_path SET regex_flavor = extended;

News in PostgreSQL 8.4

Till this time any dynamic SQL was some risk of SQL injection. Readability of code with dynamic SQL wasn't well too. This problems can be removed with using a USING clause for EXECUTE statement. We are able to parametrize dynamic SQL via this clause similary like PREPARED statement does.

unsecure dynamic SQL:

EXECUTE 'SELECT * FROM ' || tabname || ' WHERE value = \'' || value || '\'';

secure dynamic SQL:

EXECUTE 'SELECT * FROM ' || tabname::regclass || ' WHERE value = $1' USING value;

Dynamic SQL can be more secure and more readable now - with less work. But still - use it carefully and as few as possible (for example - when we need generate execution plan with knowledge actual parameters).

CASE statement

The syntax of this statement coming from SQL/PSM language - it is standard language for stored procedures (used in DB2 or MySQL):

/* simple case */ CASE x WHEN 1, 2 THEN msg := 'one or two'; ELSE msg := 'other value than one or two'; END CASE; /* search case */ CASE WHEN x BETWEEN 0 AND 10 THEN msg := 'value is between zero and ten'; WHEN x BETWEEN 11 AND 20 THEN msg := 'value is between eleven and twenty'; END CASE;

Customized exceptions

PostgreSQL allows to pass on lot of informations in any exception. This functionality wasn't accessible from PL/pgSQL RAISE statement. RAISE statement allowed to enter only message text. All other fields of exception was default. This limit is history now. Syntax of RAISE statement was enhanced - so developer can specify SQLSTATE, ERRCODE, MESSAGE, DETAIL, HINT:

RAISE division_by_zero; RAISE SQLSTATE '22012'; RAISE EXCEPTION 'Nonexistent ID --> %', user_id USING HINT = 'Please check your user id'; RAISE 'Duplicate user ID: %', user_id USING ERRCODE = 'unique_violation'; RAISE 'Duplicate user ID: %', user_id USING ERRCODE = '23505';

We can catch custom exception now. It wasn't possible in older versions:


Variadic functions

Variadic functions are functions with mutable number of parameters. PostgreSQL contains some of buildin variadic functions - least,greatest,coalesce. Internal implementation isn't too difficult, but it requires changes inside PostgreSQL parser. The workaround on user level was using a function overloading - but it isn't nice solution too - it can duplicate code. Final solution is a variadic function support. All used parameters are stored to array and this array is passed as variadic parameter to function:

CREATE FUNCTION myleast(VARIADIC a numeric[]) RETURNS NUMERIC AS $$ SELECT min($1[i]) FROM generate_subscripts($1,1) g(i) $$ LANGUAGE SQL IMMUTABLE; CREATE FUNCTION Time: 5,274 ms postgres=# SELECT myleast(1,2,3,4); myleast --------- 1 (1 row)

Sometimes we have to call variadic function like normal function. Then we can use a VARIADIC modificator:

postgres=# SELECT myleast(VARIADIC ARRAY[1,3,4,-5,6,8]); myleast --------- -5

Table functions

ANSI SQL knows functions that returns table. PostgreSQL support this kind of functions too - PostgreSQL use little bit different syntax and different name - Set Returned Functions. PostgreSQL 8.4 knows ANSI SQL syntax now:

--ANSI SQL create or replace function tab(i int) returns table(a int, b int) as $$ select i, i+1 from generate_series(1, $1) g(i); $$ language sql immutable strict; postgres=# select * from tab(2); a | b ---+--- 1 | 2 2 | 3 (2 rows) -- PostgreSQL syntax create or replace function tab(in int, out a int, out b int) returns setof record as $$ select i, i+1 from generate_series(1, $1) g(i); $$ language sql immutable strict;

Default values for function parameters

Possibility to define defaults for function parameters increase developer's komfort and decrease code duplication. Implementation is similar to Firebird - missing parameters are fill in from right by default values.

postgres=# CREATE OR REPLACE FUNCTION instr(varchar, varchar, int = 1) RETURNS int AS $$ SELECT position($2 IN substring($1 FROM $3)); $$ LANGUAGE sql; CREATE FUNCTION Time: 2,738 ms postgres=# SELECT instr('abcabc','a'); instr ------- 1 (1 row) Time: 1,907 ms postgres=# SELECT instr('abcabc','a',2); instr ------- 3 (1 row) Time: 1,381 ms


I was pleasantly surprised by success of RETURN QUERY. I wasn't expect so this enhancing can be commited, and implementation was really simple - a few lines in parser. 8.4 RETURN QUERY has support for dynamic SQL - RETURN QUERY EXECUTE:


can be replaced by:


News in 9.0

The biggest change of PL/pgSQL in 9.0 is parser refactoring. Now PL/pgSQL uses true SQL parser for integrated SQL. Other changes are plus/minus small.

SQL and PL/pgSQL identifier collision modes

Very significant minus of PL/pgSQL syntactic check was impossibility to find collision between SQL and PL/pgSQL identifiers. Code with ambiguous identifiers doesn't work well usually, but there wasn't any syntactic or runtime error. Now, PL/pgSQL parser can check uniqueness of identifiers inside block:

CREATE OR REPLACE FUNCTION foo() RETURNS void AS $$ DECLARE a integer; BEGIN FOR a IN SELECT a FROM omega LOOP RAISE NOTICE '%', a; END LOOP; END; $$ LANGUAGE plpgsql; CREATE FUNCTION Time: 3,501 ms postgres=# select foo(); ERROR: column reference "a" is ambiguous LINE 1: select a from omega ^ DETAIL: It could refer to either a PL/pgSQL variable or a table column. QUERY: select a from omega CONTEXT: PL/pgSQL function "foo" line 3 at FOR over SELECT rows

Position and mixed notation for passing function parameters

Usually values are passed to parameter variables per position. There are two next manner - named and mixed notation. PostgreSQL use following syntax for named notation - parameter_name := value (note - this syntax isn't final - probably will be defined little bit different (ANSI SQL conform) parameter_name => value (this syntax cannot be implemented now, because module hstore uses operator =>):

CREATE OR REPLACE FUNCTION foo(a varchar, b varchar) RETURNS void AS $$ BEGIN RAISE NOTICE 'A:% B:%', a, b; RETURN; END; $$ LANGUAGE plpgsql; SELECT foo('Hello','World'); SELECT foo('Hello', b := 'World'); -- mixed notation SELECT foo(b := 'World', a := 'Hello'); -- named notation

Other changes

  • IN function parameters are writable now
  • Language PL/pgSQL is available immediately after installation.

New in 9.1


FOREACH cycle helps to us with iteration over array. SLICE clause can be used for iteration over multidimensional arrays. A iteration over very large arrays can be significantly faste with FOREACH statement than with FOR statement.

CREATE OR REPLACE FUNCTION public.ft() RETURNS void LANGUAGE plpgsql AS $function$ DECLARE _a int; _b int; _xx foo[]; BEGIN _xx := (SELECT ARRAY(SELECT (a,b) FROM foo)); FOREACH _a, _b IN ARRAY _xx LOOP RAISE NOTICE 'a: %, b: %', _a, _b; END LOOP; END; $function$; postgres=# SELECT ft(); NOTICE: a: 10,b: 20 NOTICE: a: 30,b: 40 ft ──── (1 row) CREATE OR REPLACE FUNCTION public.ft() RETURNS void LANGUAGE plpgsql AS $function$ DECLARE _a int[]; _xx int[] := ARRAY[[1,2,3], [4,5,6]]; BEGIN FOREACH _a SLICE 1 IN ARRAY _xx LOOP RAISE NOTICE '%', _a; END LOOP; END; $function$; postgres=# SELECT ft(); NOTICE: {1,2,3} NOTICE: {4,5,6} ft ──── (1 row)

New in 9.2


When I worked on some project, I needed catch exception, write it to log, and forward it. It was impossible in previous versions, because there was a acccess to SQLERRM and SQLCODE variables only. Now we have access to almost complete data from exception (stack of calls are not accessible) due using GET STACKED DIAGNOSTICS.

create or replace function stacked_diagnostics_test() returns void as $$ declare _detail text; _hint text; _message text; begin perform raise_test(); exception when others then get stacked diagnostics _message = message_text, _detail = pg_exception_detail, _hint = pg_exception_hint; raise notice 'message: %, detail: %, hint: %', _message, _detail, _hint; end; $$ language plpgsql; select stacked_diagnostics_test();

New in 9.3

a expression with result of composite type can be used in RETURN and RETURN NEXT statements.

Access to exception info additional fields

Now, GET STACKED DIAGNOSTICS statement can be used for access to fields:: COLUMN_NAME, TABLE_NAME, SCHEMA_NAME, CONSTRAINT_NAME and PG_DATATYPE_NAME. A new related fields can be used in RAISE statement too: COLUMN, TABLE, SCHEMA, CONSTRAINT a DATATYPE.

CREATE OR REPLACE FUNCTION stacked_diagnostics_test() RETURNS void AS $$ DECLARE _column_name text; _constraint_name text; _datatype_name text; _table_name text; _schema_name text; BEGIN RAISE EXCEPTION USING column = '>>some column name<<', constraint = '>>some constraint name<<', datatype = '>>some datatype name<<', table = '>>some table name<<', schema = '>>some schema name<<'; EXCEPTION WHEN OTHERS THEN GET STACKED DIAGNOSTICS _column_name = column_name, _constraint_name = constraint_name, _datatype_name = pg_datatype_name, _table_name = table_name, _schema_name = schema_name; RAISE NOTICE 'column %, constraint %, type %, table %, schema %', _column_name, _constraint_name, _datatype_name, _table_name, _schema_name; END; $$ LANGUAGE plpgsql;

New in 9.4

Checking call stack everywhere

In previous version you can check call stack in exception handler with GET STACKED DIAGNOSTICS var = PG_EXCEPTION_CONTEXT statement. In 9.4 you can get call stack everywhere with GET DIAGNOSTICS statement:

CREATE OR REPLACE FUNCTION public.inner_func() RETURNS integer AS $$ DECLARE stack text; BEGIN GET DIAGNOSTICS stack = PG_CONTEXT; RAISE NOTICE e'--- Call Stack ---\n%', stack; RETURN 1; END; $$ LANGUAGE plpgsql;

New in 9.5

Significant internal change is reimplementation of type casting.

ASSERT statement

ASSERT statements ensures function's preconditions. It raises unhandled exception if first expression is not the true. Second optional argument can to specify message attached to raised exception.

postgres=# DO $$BEGIN ASSERT false; END;$$; ERROR: assertion failed CONTEXT: PL/pgSQL function inline_code_block line 1 at ASSERT postgres=# DO $$BEGIN ASSERT false, 'some is wrong'; END;$$; ERROR: some is wrong CONTEXT: PL/pgSQL function inline_code_block line 1 at ASSERT

Recommendation for design of stored procedures in PL/pqSQL language

You can find a number of recommendation for writing of the stored procedures in PL/SQL by Oracle Company on the Internet. Implementation of PL/pgSQL is different and therefore not every recommendation is relevant to PostgreSQL. You can write unreadable and not easy-to-maintain, and eventually ineffective, code in PL as well as in other programming languages. If you will keep following recommendations you can decrease the risk of unsuitable code:

  • Use a programmer editor and code save to file. Using a files is better for code comments, better for code organization
  • Be careful of conflict of local variables and database objects:
  1. use prefix of variables (e.g. for local variables use “_” symbol)
  2. use qualified attributes (table. column) at all SQL command in procedures
  • Declare variables by derived type - %TYPE and %ROWTYPE
  • At PL use native SQL everywhere that is possible and wise. It’s possible that PL code may be quicker than complicated request. Try to not use dynamic SQL. Think that you can replace cycle in function by classic SELECT containing construction CASE.
-- bad code FOR _c1, _c2 IN SELECT c1,c2 FROM tab1 LOOP IF _c1 > 20 THEN INSERT INTO tab2 VALUES(20,_c2); ELSE INSERT INTO tab3 VALUES(_c1,_c2); END IF; END LOOP; -- well code INSERT INTO tab2 SELECT 20,c2 FROM tab1 WHERE c1 > 20; INSERT INTO tab3 SELECT c1,c2 FROM tab1 WHERE c1 <= 20;
  • The function should contain only one RETURN command – one way in, one way out.
  • Duplicate code shouldn’t appaer in aplication – don’t write redundant code.
  • Use ASSERT function, eventually its modification:
CREATE OR REPLACE FUNCTION Assert(bool, varchar) RETURNS void AS $$ BEGIN IF NOT $1 OR $1 IS NULL THEN IF $2 IS NOT NULL THEN RAISE EXCEPTION 'Assert failure: %', $2; END IF; RAISE NOTICE 'Assert. Message is null'; END IF; END;$$ LANGUAGE plpgsql; CREATE OR REPLACE FUNCTION Assert_IsNotNull(anyelement, varchar) RETURNS void AS $$ BEGIN PERFORM Assert($1 IS NOT NULL, $2); END; $$ LANGUAGE plpgsql;
  • Make uderstood on list of errors defined by user beforehand. Follow the put-up notation for text of exceptions.
  • Don’t test NULL variable that is declared as NOT NULL. It’s on PL to find out that this variable won’t contain NULL. Don’t test content of variables by forced command for location of fruitfulness of command, but test content of logical system variable FOUND.
  • Use label <<label>> for loops and blocks, use Boolean type so that your code will be the most readable.
  • Don’t rely on automatic converting of date and timestamp type, tht depand on configuration. Use function to_char and to_date.
  • Don't use IF to fill logic variable:
is_ok := _age > 18;
  • Use every variable just in one meaning. Don't reuse variables.
  • Starting category of function is VOLATILE. If you don't use tables and functions like random() or currval(), use IMMUTABLE or STABLE categories. Process time of function may be half.
  • Don't enclose SQL commands to simply functions uselessly.
  • Try to reduce cursors and temporary tables.
  • Prefer standardized function against your own function.
  • In triggers don't correct data “secretly”, consider whether you can use CHECK.
  • Each procedure should contain up to 50 or 60 lines.
  • Use unit tests (for checking error identification, for checking propriety of corrections).
  • Enclose the call of exception to own procedure.
  • Use only one of two possible method (exception or return error code) of error indication in our code.
  • Minimalise repeated assign statement to variable (use a COALESCE function).
--bad DECLARE v varchar; BEGIN v := 'a'; v := v || 'b'; v := v || 'c'; RETURN v; END; --well BEGIN RETURN 'a' || 'b' || 'c'; END; -- bad DECLARE s varchar := ''; BEGIN IF x1 IS NULL THEN s := s || 'NULL,' ELSE s := s || x1; END IF; IF x2 IS NULL THEN s := s || 'NULL, ' ELSE s := s || x2; END IF; ... -- well DECLARE s varchar; BEGIN s := COALESCE(x1 || ',', 'NULL,') || COALESCE(x2 || ',', 'NULL,') ...
  • Use a SQL language for simple functions instead PL/pgSQL:
--bad CREATE OR REPLACE FUNCTION last_day(IN d date) RETURNS date AS $$ BEGIN RETURN CAST(date_trunc('month',current_date + interval '1month') AS date) - 1; END; $$ LANGUAGE plpgsql IMMUTABLE STRICT; --well CREATE OR REPLACE FUNCTION last_day(IN d date) RETURNS date AS $$ SELECT CAST(date_trunc('month', $1 + interval '1month') AS date) - 1; $$ LANGUAGE sql;

Overview of the generators¶

One can assign a variable (SET) with a random data generators. This means each time the variable will be used it will have a different value.

However the variable is still used as usual:


Sequence and seeding¶

Common parameters for data generators are sequence and seed.

sequence means that a sequence of values is generated in a random order, in other words each value appears only once before the sequence starts again: this is useful for columns with a constraint. For example, this generator:

It can generate such values:

Where each number appears once before the sequence starts repeating.

sequence parameter must be an integer: if it is 0 then no sequence is generated (default) and if something other than 0 then generate a sequence.

seed is an integer value for initializing a generator: two generators with the same parameters and the same seed will generate exactly the same values.

seed must be an integer: it is used directly to initialize the random data generator.


Data generators¶

Optional parameters are put into brackets:

Integer numbers¶

is an integer, is an integer, is an integer and is an integer.


Real numbers¶

is a number, is a number, is an integer that indicates the number of decimals (should be less than 30), is an integer and is an integer.



is a string representing a date, is a string representing a date, is an integer and is an integer.



is a string representing a time, is a string representing a time, is an integer and is an integer.


Timestamps (date/times)¶

is a string representing a timestamp, is a string representing a timestamp, is an integer and is an integer.

DATETIME(min,max,[sequence],[seed]);DATETIME('2008-05-01 14:00:00','2008-05-05 15:00:00',1);


is an integer representing the minimum length of a word, is an integer representing the maximum length of a word, is an integer representing the number of words (default: ) and is an integer.

In the above example we generate 5 words (separated with a space) whose size is between 10 and 20 characters.


Strings from regular expressions¶

is a string representing a simplified regular expressions and is an integer.

Simplified regular expressions are composed of:

  • Sets of possible characters like for characters between and + and
  • Single characters

It is possible to specify the minimum and maximum length of the preceding set or single character:

  • like which stands for length between and
  • like which stands for length of
  • Default (when nothing is specified) is length of

Note: be careful with spaces because means one followed by three spaces because the is about the last character or set of characters which is a space in this example.

If you need to use or , they must be escaped because they are special characters. Remember to use double backslash: for three .


Strings from dictionary files¶

is a string representing the path to a text file, is an integer, is an integer and is a string representing the file character set (default is system encoding).

This generates a random integer between 1 and the number of lines in the file and then returns that line. If the file does not exist then an exception is thrown.

supports the most known encoding like utf-8, utf-16le, utf-16be, iso-8859-1, ...


Reference to another field¶

is a string representing a table, is a string representing a column of the table, is an integer and is an integer.

This is useful for generating data to put into foreign-key-constrained columns.



Leave a Reply

Your email address will not be published. Required fields are marked *