Notes about data types in relational database systems

Photo by Mika Baumeister on Unsplash

In this article we will go over the details of some of the most used data types in relational database systems and the factors that go into deciding which type to use for each situation.

Let’s start with the types as defined in the SQL standard.

Data types in standard SQL

CHAR(N) data type

In standard SQL, the CHAR(N) datatype represents a string of fixed length. All stored values occupy N characters on the database and are filled with spaces if a smaller value is inserted. CHAR(N) may be more cache-friendly than VARCHAR(N) because the DBMS can store it in-record, but this depends on the DBMS (Oracle for intance stores CHAR(N) outside of the record exactly the same as a VARCHAR(N), so there’s no performance gain).

VARCHAR(N) data type

In standard SQL, the VARCHAR(N) data type represents a string of variable length with a maximum length of N characters. If a value is smaller than N characters, only the used characters are stored. Values larger than N characters cannot be stored. DBMSs usually implement this datatype by having the record hold a pointer to the value which is stored separately.

Exact numeric data types

Exact numeric types hold numeric values without digits after the decimal or with a firm number of digits after the decimal. All exact numeric types are signed.

NUMERIC(<p>,<s>) and DECIMAL(<p>,<s>) denotes two types which are nearly the same. <p> (precision) defines a fixed number of all digits within the type and <s> (scale) defines how many of those digits follow the decimal place. Numeric values with more than (p – s) digits before the decimal place cannot be stored and numeric values with more than s digits after the decimal place are truncated to s digits after the decimal place. p and s are optional. It must always be: p ≥ s ≥ 0 and p > 0.

SMALLINT, INTEGER and BIGINT denote data types without a decimal place. The SQL standard did not define their size, but the size of SMALLINT shall be smaller than the size of INTEGER and the size of INTEGER shall be smaller than the size of BIGINT.

Approximate numeric

Approximate numeric types hold numeric values in floating-point format. All approximate numeric types are signed. Their primary use cases are scientific computations.

There are three types: FLOAT (<p>), REAL and DOUBLE PRECISION. In the FLOAT datatype, p denotes the number of bits of the mantissa (note it’s not the number of digits). The precision of REAL and DOUBLE PRECISION is implementation defined.

Data types in Oracle

CHAR(N)

The CHAR datatype stores fixed-length, single-byte character strings. When you create a table with a CHAR column, you must specify a string length (in bytes or characters) between 1 and 2000 bytes for the CHAR column width. The default is 1 byte. Oracle then guarantees that:

  • When you insert or update a row in the table, the value for the CHAR column has the fixed length.
  • If you give a shorter value, then the value is blank-padded to the fixed length.
  • If a value is too large, Oracle Database returns an error.

VARCHAR2(N)

The VARCHAR2 datatype stores variable-length, single-byte character strings. When you create a table with a VARCHAR2 column, you specify a maximum string length (in bytes or characters) between 1 and 4000 bytes for the VARCHAR2 column. For each row, Oracle Database stores each value in the column as a variable-length field unless a value exceeds the column’s maximum length, in which case Oracle Database returns an error. Using VARCHAR2 and VARCHAR saves on space used by the table.

The VARCHAR datatype is synonymous with the VARCHAR2 datatype.

Internal representation of CHAR and VARCHAR2 types in Oracle

Oracle stores the CHAR datatype exactly the same as a VARCHAR2, just blank-padding the value. There is no in-record storage of CHAR values, so in Oracle the CHAR datatype offers no performance improvement at all over VARCHAR2. If we also take into account the fact that CHAR uses more space to store the same value, there is little reason to use CHAR in any situation, so VARCHAR2 is recommended.

NCHAR and NVARCHAR2 data types

NCHAR and NVARCHAR2 are Unicode datatypes that store Unicode character data. The character set of NCHAR and NVARCHAR2 datatypes can only be either AL16UTF16 or UTF8 and is specified at database creation time as the national character set. AL16UTF16 and UTF8 are both Unicode encoding.

  • The NCHAR datatype stores fixed-length character strings that correspond to the national character set.
  • The NVARCHAR2 datatype stores variable length character strings.

When you create a table with an NCHAR or NVARCHAR2 column, the maximum size specified is always in character length semantics. Character length semantics is the default and only length semantics for NCHAR or NVARCHAR2.

NUMBER data type

The NUMBER datatype defines a sigmed fixed-point number. The following illustrates the syntax of the NUMBER data type:

NUMBER[(precision [, scale])]

The Oracle NUMBER data type has precision and scale.

  • The precision is the total number of digits in a number (including digits after the decimal point). It ranges from 1 to 38.
  • The scale is the number of digits to the right of the decimal point in a number. It ranges from -84 to 127.

If precision and scale are not specified explicitely, their values depend on the syntax used to define the type.

  • NUMBER(p) is equivalent to NUMBER(p, 0) and defines an integer (i.e. fixed point number with precision 0).
  • NUMBER (both parameters ommited) defines a number that can store numeric values with the maximum range and precision.

Oracle allows the scale to be negative, for example the following number will round the numeric value to hundreds: NUMBER(5,-2).

Oracle contains a number of aliases that you can use to define numeric columns as shown in the following table:

ANSI data typeOracle NUMBER data type
INTNUMBER(38)
SMALLINTNUMBER(38)
NUMBER(p,s)NUMBER(p,s)
DECIMAL(p,s)NUMBER(p,s)

Data types in MySQL

Numeric data types

MySQL supports all standard SQL numeric data types. These types include the exact numeric data types (INTEGER, SMALLINT, DECIMAL, and NUMERIC), as well as the approximate numeric data types (FLOAT, REAL, and DOUBLE PRECISION). The keyword INT is a synonym for INTEGER, and the keywords DEC and FIXED are synonyms for DECIMAL. MySQL treats DOUBLE as a synonym for DOUBLE PRECISION (a nonstandard extension). MySQL also treats REAL as a synonym for DOUBLE PRECISION (a nonstandard variation), unless the REAL_AS_FLOAT SQL mode is enabled.

The type can be declared unsigned with the following syntax:

CREATE TABLE classes ( 2 total_member INT UNSIGNED 3);

CHAR and VARCHAR data types

The CHAR and VARCHAR in MySQL follow the syntax and semantics of standard MySQL. For more details see https://dev.mysql.com/doc/refman/8.0/en/char.html.

Time types

The DATE type is used for values with a date part but no time part. MySQL retrieves and displays DATE values in ‘YYYY-MM-DD’ format. The supported range is ‘1000-01-01’ to ‘9999-12-31’.

The DATETIME type is used for values that contain both date and time parts. MySQL retrieves and displays DATETIME values in ‘YYYY-MM-DD hh:mm:ss’ format. The supported range is ‘1000-01-01 00:00:00’ to ‘9999-12-31 23:59:59’.

The TIMESTAMP data type is used for values that contain both date and time parts. TIMESTAMP has a range of ‘1970-01-01 00:00:01’ UTC to ‘2038-01-19 03:14:07’ UTC.

MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval. (This does not occur for other types such as DATETIME.) By default, the current time zone for each connection is the server’s time. The time zone can be set on a per-connection basis. As long as the time zone setting remains constant, you get back the same value you store. If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored. This occurs because the same time zone was not used for conversion in both directions. The current time zone is available as the value of the time_zone system variable.

Sources

Leave a comment