Creating and Using Files: Data Types and Data Storage

You must assign a data type to each column in a table.

By Brian Meyers and Jim Buck

Editor’s Note: This article is excerpted from chapter 3 of Programming in ILE RPG, Fifth Edition.

The column’s data type determines how its values are stored, how much storage the column occupies, and what kinds of operations can be performed on the column when it is used within a program. Commonly used data types fall into three general categories:

To understand the various data types, let’s first examine how the computer stores data. You probably know that any numeric value can be converted from its familiar decimal, or base 10, value to a corresponding value in binary, or base 2, notation. The computer stores all data in this binary format, with all data represented as a combination of 1s and 0s. At first glance, data representation should be a simple matter of converting values from one base to another. Many characters and values that you want to represent to the computer are not numbers, though— instead, they are letters of the alphabet, for example, or special characters such as $ and {.

A bit is a binary digit that can have a value of 1 or 0 (sometimes called on or off). A string of eight bits is called a byte. A byte can have up to 256 different possible combinations of bit states; the system can use these combinations to represent up to 256 different characters.

IBM developed a coding scheme to allow a data character—numeric or non-numeric—to be represented to the computer. This coding scheme, based on the English alphabet, is called Extended Binary Coded Decimal Interchange Code, or EBCDIC (generally pronounced as eb-si-dik). EBCDIC assigns a unique eight-bit binary pattern to each representable character. Capital A, for example, is 11000001 in EBCDIC, and the digit 1 is represented as 11110001. The leftmost four bits are often called zone or high-order bits, and the rightmost four bits are digit or low-order bits. Because eight bits constitute a byte, it takes one byte of storage to store each character in EBCDIC. IBM i stores all non-numeric, or character, data values this way: one byte represents a character.

Instead of reading long strings of 1s and 0s, most programmers find it easier to represent byte values by using hexadecimal (base 16) notation for the high and low order bits, according to the following table:

Using this table, you can see that the bit pattern 11000001 can be referred to as hex character C1 (usually noted as X’C1′).

All computers use some kind of an encoding system to represent characters as bit patterns, but not all systems use EBCDIC, which is unique to some IBM systems. You may already be familiar with the encoding system called ASCII (American Standard Code for Information Interchange), generally pronounced as as-kee. EBCDIC was developed separately from ASCII. Consequently, the bit patterns in ASCII are not the same as those in EBCDIC. For example, the letter A in EBCDIC is represented by a hex C1, but in ASCII it is a hex 41. In addition, the two schemes use different collating sequences; that is, they do not sort all the characters in the same sequence. In EBCDIC, lowercase characters sort before uppercase characters, and letters sort before numbers (e.g., a, b, c, … A, B, C, … 7, 8, 9). In ASCII, the opposite is true (e.g., 1, 2, 3, … A, B, C, … x, y, z). Figure 3.5 shows some commonly used characters and their equivalent EBCDIC and ASCII encoding (in EBCDIC sequence).

Figure 3.5: EBCDIC and ASCII coding

Character Data Types

Character data can be any symbol that the system supports. Alphabetic letters, numbers, and special characters (e.g., punctuation marks, currency symbols) are all character data. A character string is a sequence of characters. An RPG program typically processes character data by assigning values from one character variable to another, by concatenating (joining) character strings, or by converting character data to another data type.

CHAR

When SQL defines a column as CHAR (or CHARACTER), it is defining a fixed-length character string. You specify the length attribute in parentheses. All values in the column have the same length in storage. Values that don’t fill the entire length are padded with blanks. The following column definition defines Address, a fixed-length character column 35 characters (bytes) long:

Most character columns in an IBM i database are fixed length. When you don’t specify a length, the column is one byte—the maximum length is 32,766.

VARCHAR

Specifying a data type of VARCHAR (or CHAR VARYING or CHARACTER VARYING) defines a column as a variable-length character string. You specify the maximum size (up to 32,740) in parentheses. The following column definition defines Email, a variable-length character column 1–256 characters long:

You use a variable-length character string when the length is uncertain. Typically, the value in a variable-length column is not padded with blanks to fill the entire allotted storage space. Instead, the same column in different rows contains different lengths, depending upon the column value. Use CHAR when most values in a column are likely to be the same size, and use VARCHAR when you anticipate the values will vary considerably in size.

Numeric Data Types

Numeric data is limited to numeric digits 0–9. Numeric variables also incorporate a sign embedded in the rightmost position of the variable. An RPG program typically processes numeric data by assigning values from one numeric variable to another, by evaluating arithmetic expressions, or by converting numeric data to another data type.

SQL defines numeric data with a precision and a scale. Precision refers to the total number of digits available in the number without regard to any sign. Scale indicates how many of those digits are fractional (i.e., to the right of the decimal point).

Numeric data types that SQL supports include numeric, decimal, and integer (as well as other less-frequently used types). Why are there different data types for the same class of values? RPG does not differentiate among numeric data types in determining the kinds of operations that you can perform on a field or the kinds of output editing that are possible; however, the data type of a numeric field—zoned numeric, packed decimal, or integer— determines how that field is represented and stored within the database.

To understand the various data representations, recall the EBCDIC coding scheme used to represent characters, in this case, numbers. The following table shows the EBCDIC codes for the digits 0–9:

Notice that the high-order bits (zone) of all digits are identical: 1111. This means that the zone portion is redundant for numeric data; that is, if the database already knows that the data is numeric, it knows that the zones of the data are all 1111, or F. The database can take advantage of this knowledge to store numeric data efficiently.

NUMERIC

Specifying a data type of NUMERIC (or NUM) defines a column as a zoned (or signed) decimal number. Zoned decimal numbers take a full byte to store each digit of a numeric value. Thus, a three-digit number occupies three bytes. The zone of the rightmost digit stores the sign of the data: 1111 (F) represents a positive value, whereas 1101 (D) represents a negative value. Zoned representation, then, is nearly identical to character representation except that the sign is embedded in the rightmost digit’s byte:

The following column definition defines CredLimit, a zoned decimal numeric column with a precision (total length) of nine digits and a scale (number of decimal places) of two digits:

Accordingly, CredLimit can hold a value up to 9,999,999.99. The maximum precision for a numeric column is 63 digits, and the scale can be any number from 0 to the precision of the number. If you omit scale, it is assumed to be 0; if you omit both precision and scale— specifying simply NUMERIC—SQL assumes NUMERIC(5,0).

Zoned decimal values can be useful when a program needs to present numeric values as text. For positive numeric values, the zoned numbers render as the correct characters (136, in the preceding example). For negative numbers, however, the last digit renders as an alphabetic character J–R (X’D1’–X’D9′).

DECIMAL

Specifying a data type of DECIMAL (or DEC) defines a column as a packed decimal number. Packed numbers take advantage of the redundancy built into digit representation by simply not bothering to store the zones of numbers. In packed format, only the digit, or low order, bits of a number are stored, with the sign of the number represented by an additional four bits. These sign bits always occupy a packed decimal value’s last four bit positions:

Data in packed format takes just over half the amount of storage necessary to store the same number in zoned decimal format. A three-digit number occupies two bytes. This may not seem like a great difference, but larger numbers generate larger storage savings: a 63-digit number in packed format occupies only 32 bytes.

The following column definition defines the same CredLimit column in the earlier example, but this time it is a packed decimal number:

If you omit scale, it is assumed to be 0; if you exclude both precision and scale—specifying simply DECIMAL—SQL assumes DECIMAL(5,0). Because of the method packed numbers use to store numeric values, programmers usually define packed decimal columns with an odd precision so that the complete value (with the sign added) occupies whole bytes.

Notice that the character representation for packed numbers is not generally easily translatable to the numeric value. The hexadecimal representation usually makes sense, but the bit patterns for packed numbers render as nondisplayable or unrelated EBCDIC characters.

Most numeric columns in the typical IBM i database are packed decimal numbers. Although some programmers prefer to define numeric columns as zoned—because it’s easier to print or view the raw data in this format—the computer works more efficiently with numbers stored in packed decimal format. For most numeric processes, RPG implicitly converts numeric values to packed decimal format before processing them.

INTEGER, SMALLINT, BIGINT

You’ve seen that packed decimal numbers are more compact than zoned decimal numbers. Integer representation is even more compact than packed decimal representation. Integers store numeric values in a binary format. Each bit in a string is assigned a decimal value, and the values of the on bits are simply added together to determine the number’s signed value. SQL supports three integer-related data types. Which one you use will depend upon the range of values that the numbers must hold:

The following column definition defines the column Terms, an integer:

Notice that there is no need to indicate a precision or scale. The precision (10) is set by the INT data type, and the scale for all integers is 0.

Although the use of integers is common in RPG programs, its use in database objects is less so. Most numeric database columns are DECIMAL or NUMERIC data types, which have more flexible precision and scale options than Integer data types. If, however, the anticipated data values fit into the allowed range and scale for any of the Integer data types, they are an efficient and useful data type for numeric data.

Date and Time Data Types

IBM i supports several date and time data types, which represent dates, times, and time- stamps. An RPG program typically processes dates by assigning one date variable to another, by perform date arithmetic and comparisons, or by converting a date to another data type. The date and time data types are neither character nor numeric data types; they are distinct data types that the computer can easily recognize, process, and manipulate as dates without requiring special coding or conversion. They are sometimes called native dates, as opposed to legacy dates, which are character strings or numbers that have valid date or time values. Legacy dates require special processing to convert them to dates or to use them as dates.

DATE

Specifying a data type of DATE defines a column as a date. A date is a three-part value, incorporating a month, day, and year between 0001–9999. When defining a date, you do not need to specify a length because the system determines the length automatically. The following column definition defines the column Renewal, a date:

An ILE RPG program can retrieve a date value and process it. The presentation of the date in the program conforms to any of eight different formats in use by the program. Chapter 8 covers the date data type in more detail.

TIME

Specifying a data type of TIME defines a column as a time of day, integrating hours, minutes, and seconds. Time data follows many of the same principles as the date data type. When defining a time, you do not need to specify a length. The following column definition defines the column CallTime, a time:

An ILE RPG program can retrieve a time value and process it. As with dates, the presentation of the time in the program conforms to any of five different time formats in use by the program, which Chapter 8 details.

TIMESTAMP

A timestamp is a combination of a date and time in a single variable. Specifying a data type of TIMESTAMP defines a timestamp column. Timestamps have six or seven parts: year, month, day, hour, minute, second, and optionally fractional seconds. The following column definition defines the column LastUpdate, a timestamp with microseconds included:

By default, timestamps contain microseconds (six digits to represent fractional seconds):

If you need a different scale, the column definition can include a length 0–12 to represent the

desired number of fractional seconds’ digits. For example,

omits all fractional seconds:

Next time: Creating Views with SQL. Can’t wait? Want to learn more about Programming in ILE RPG? Pick up the book in the MC Bookstore today!

JW_DISQUS_VIEW_THE_DISCUSSION_THREAD

Jim Buck’s career in IT has spanned more than 35 years, primarily in the college education, manufacturing, and healthcare industries. Past president (13 years) of the Wisconsin Midrange Computer Professional Association, he has served on several teams developing IBM and COMMON certification tests. Jim has co-authored several IBM i textbooks with Bryan Meyers that are used by many companies and in colleges worldwide. Other accomplishments include: recipient of the 2007 IBM System i Innovation – Education Excellence Award, 2014 COMMON President’s Award, and 2013/2016/2017 IBM Champion – Power Systems.

Jim is the president and founder of imPower Technologies, where he provides professional IBM i training and consulting services. He is active in the IBM i community, working to help companies train their employees in the latest IBM technologies and develop the next generation of IBM i professionals.

MC Press books written by Jim Buck available now on the MC Press Bookstore.

Created: 2023-01-31 19:23:34

Category: RPG

You may already be familiar with Structured Query Language (SQL), an industry-standard language for …

Created: 2023-01-03 19:23:34

Category: RPG

In this excerpt series, you learn how IBM i handles database files. You also discover the difference…

Created: 2022-01-13 19:23:34

Category: RPG

Part three of getting started ILE RPG program addresses free vs. fixed and internal documentation. R…

Created: 2023-01-31 19:23:34

Category: RPG

You may already be familiar with Structured Query Language (SQL), an industry-standard language for …

Created: 2023-01-03 19:23:34

Category: RPG

In this excerpt series, you learn how IBM i handles database files. You also discover the difference…

Created: 2022-01-13 19:23:34

Category: RPG

Part three of getting started ILE RPG program addresses free vs. fixed and internal documentation. R…