IUIE Documentation

Indiana University - IUIE Documentation

Report Object Instructions/Help

Getting the Most Out of Datagroup Extracts


IUIE Documentation Home Page
Frequently Asked Questions
Microsoft Excel Tips
Overview of SIRS

Overview

Datagroups are the primary data extraction tools for the new information environment. In the short term as specialized Report Objects are still in development, specific institutional data in pre-formatted reports may not be readily available to meet all business needs. It is incumbent upon end users to develop a detailed knowledge of the types of datagroups that are available in the IUIE and how to extract from those datagroups the information necessary to meet their functional and reporting needs. Datagroups are highly flexible collections of PeopleSoft table data that can be customized to meet the users needs through some basic SQL operators, filters and delimeters.

The goal of this website is to provide an introduction to data extraction techniques using the datagroups published in the IUIE fortified with general, introductory SQL procedures.

For instructional purposes, we'll be using the datagroup All Classes, Meetings & Instructors which is located in the Data Extracts folder along the following navigation path:

Master Catalog >> Student >> Records >> Schedule of Classes >> Data Extracts

(Here's a pretty picture of the navigation path)

The initial user interface for this datagroup can be found at this link. (You will be prompted to login to the IUIE before you can visit this site.)

Screenshot:

Okay, then, let's get down to it.

Today, I Learned that I Hate My Job, or
A Newbie's Guide to IUIE Datagroups, SQL, and the World Beyond the Edge of the Map

What are datagroups, really? They're a bit of fluffy and aesthetically pleasing Java code wrapped around an Oracle database table. The datagroup interface you see in the IUIE tells the table to return all of the rows from a table that fit the criteria you specify through parameter fields, column selection and, if you choose, more advanced restriction and sorting techniques such as SQL commands. Some of the datagroups are single tables (Class Associations, Class Components); some are several tables joined together into useful groups (All Classes & Meetings - In general, the '&' is a good indicator that there's more than one table cavorting beneath the IUIE's normally placid surface.). In general, Datagroups in the IUIE save you the trouble of mucking about with the code in the tables themselves, or having to create ODBC connections to the tables for manipulation with the database tool of your choice.

It's important to keep this in mind. Datagroups are powerful data extraction tools that can in many cases give you instant access to data without having to spend hours or days investigating the new data structures developed for PeopleSoft reporting. The IUIE is a new world for most of us, and it has a different sort of interface and user conventions than the old Legacy system had. This will inevitably take some time to get used to. (Though some of us may never fully adapt to the inexplicable devotion to pastels demonstrated by the IUIE design staff. That will just always be a little creepy.)

But the bottom line is that datagroups will make your life easier once you learn how to navigate them. They'll give you damned near anything you want if you treat them right and talk to them in a way they can understand. This document is designed to teach you how to do that.

So where should we start? Look again at the screenshot above. Here are a couple of things you would do well to notice:

1. Initially the only flexibility that you have as the end user is the selection of Institution, Term and/or Department parameters. If you leave any of these text boxes blank, you will receive all Institutions, Terms or Departments present on the table. Needless to say, as more data is added to the system through Legacy conversion, this could amount to millions and millions of rows over time.

(N.B. Microsoft Excel, which is the default/preferred output for datagroup extracts imposes an external ~65,000 row limit on all data extracts. If your extract is larger than 65,000 rows, Excel will not tell you that it has failed to return all the rows from your query.)

2. Additionally, you have some control over the columns that you can return in your extract. To limit the colums returned, be sure that you have marked the "Selected Columns" option under the "Select Columns to be Included" heading.

3. A query of this sort would return several thousand rows of raw data, so let's assume that we limit our selection criteria to something like this:

A report like this is still going to return almost a hundred columns of data (because we didn't select specific columns in this case) and almost five hundred rows of unsorted information records. The returned set is exactly what it seems to be - every class section offered in a given department for a given semester. While this much raw data can prove useful in certain circumstances, it doesn't answer the specific sorts of questions you may encounter in your average workday.

For instance, what if you needed an alphabetical list of instructors in a given department teaching courses in non-standard sessions? You could use the tools available in Excel to get at this data, but it would involve scanning the data, experimenting with sort options and deleting scads of useless rows. Additionally, Excel limits you to three sort criteria. What if you want to run the same report for multiple semesters and multiple departments? Suddenly you've run out of options on the Excel side.

More importantly, what if you want to select data based on a criteria other than merely Institution, Term or Department?

To get the sort of flexibility we need to answer questions like this one and to customize our datagroup extract reports so that they'll be immediately useful without loads of finagling with Excel, we have to rely on some of the Advanced features of datagroup extracts and some general SQL knowledge.

SQL: Where the Fun Starts

SQL is an acronym for Structured Query Language. It is a powerful and versatile database data extraction and reporting language. The basic components of any SQL query are four simple pieces:

SELECT these columns
FROM a table
WHERE these criteria are met, and
ORDER BY these columns

The "ORDER BY" and "WHERE" components are optional. In fact, in the extract we discussed above, the SQL code that produced our dataset of five hundred English class sections on the Bloomington campus for First Semester 2004-2005 looked like this:

SELECT * [1]
FROM SR_CMB_CLS_INSTR_GT [2]
WHERE INST_CD = 'IUBLA'
And ACAD_TERM_CD = '4048'
And CRS_SUBJ_DEPT_CD = 'ENG'

[1] The asterisk (*) is SQL shorthand for SELECT ALL COLUMNS on the table.
[2] SR_CMB_CLS_INSTR_GT is the technical name of the table the datagroup All Classes, Meetings & Instructors references

When you fill in a parameter box for a data extract, you're essentially providing dynamic selection criteria to a WHERE clause. The IUIE is doing all the code conversions behind the scenes. Our goal in this tutorial is to pop the hood on the IUIE and tinker about in the engine compartment to make our queries run more efficiently. In working with datagroups, it's important to remember that by selecting a particular datagroup and indicating which columns you'd like to have returned on your report, you've already done the SELECT and FROM statements. The customizations that you make to your data extract using SQL are solely indicated in the WHERE and ORDER BY steps.

SQL Operators

Operators in SQL (and in most programming languages) are the tools you use to build logical criteria -- they're the way you tell a database to "give me these results, but not those".

Here's a quick reference of SQL operators:

*Note: The IUIE uses Oracle SQL conventions. If you've used SQL before (especially in tandem with Microsoft Access, which has its own proprietary SQL conventions), be aware that there may be some operator and logic differences between Oracle and the platform you're used to programming against.

As stated above, the SQL operators can be thought of as arguments that will specify the exact data you want to pull out of a table through a datagroup. These operators join a column with a criteria in the WHERE clause.

For example:

WHERE INST_CD = 'IUBLA'
Where the institution is Bloomington

WHERE ACAD_TERM_CD <> '4048'
Where the academic term is not First Semester 2004-2005

WHERE CLS_ASSCT_MAX_UNT_NBR >= 3
Where the maximum credit hours are greater than or equal to three

Notice in the argument above that our criteria (3) has not been set off with single quotes, as we did with the Academic Term Code and the Institution Code in the prior two examples. If your database field has been defined as a VARCHAR or DATE field, then the value in your criteria must be set off by single quotes. If the database field has been defined as a numeric value, you should NOT use single quotes.

How do you tell if a field is VARCHAR, DATE or NUMERIC without being able to look at the table definitions? Click on the column name in the datagroup you're working with. In the All Classes, Meetings & Instructors datagroup, if I click on the column name Minimum Enrollment, the IUIE returns this help page:

The Datatype explanation tells me that Minimum Enrollment is a Number field (with a maximum length of 22 characters, if you want to go into that level of detail). So if I write a WHERE clause that involves an operation against Minimum Enrollment (CLS_MIN_ENRL_NBR), I do not want to set off my arguments in single quotes. If I do, the query will return no data.

A note on logical operators (AND, OR, NOT):

Logical operators work like conjunctions in grammar. You use them to tie together related clauses in an argument. If you have multiple criteria that all must be true for a database row to be returned, separate your WHERE clause arguments with AND. If any one of your criteria can be true, separate your arguments with OR. If you have more than one criteria, you must use logical operators to connect them.

For example, compare the following WHERE clauses run against a Schedule of Classes table:

WHERE INST_CD = 'IUBLA'
OR INST_CD = 'IUCOA'
AND ACAD_TERM_CD = '4048'

WHERE INST_CD = 'IUBLA'
AND INST_CD = 'IUCOA'
AND ACAD_TERM_CD = '4048'

WHERE INST_CD = 'IUBLA'
AND INST_CD = 'IUCOA'
OR ACAD_TERM_CD = '4048'

The first WHERE clause will bring me class section results that are on the schedule for First Semester 2004-2005 (4048) and have Bloomington or Columbus as their Institutions. (A pretty sizeable set.)

The second WHERE clause will return only class sections scheduled for First Semester 2004-2005 that have an Institution code of both Bloomington and Columbus. (A very small set.)

The third one asks the table to return class sections with an Institution Code of Bloomington and Columbus (small set, as we indicated above), or with an Academic Term of First Semester 2004-2005 (a big set). This is bad code. The database would return very few, if any, results from the first criteria, but the second one would be true for every class section on any campus offered for First Semester 2004-2005, so that's exactly what you'd get as your output. The point: Be careful with your logical operators.

Educational Resources

We've really just scratched the surface of the SQL language and the things it can do so far. Below are a list of links to introductory SQL tutorials to help you get a handle on the basic commands and logic structures.

http://www.sqlcourse.com/

http://www.w3schools.com/sql/default.asp

http://www.1keydata.com/sql/sql.html

http://www.db.cs.ucdavis.edu/teaching/sqltutorial/


Theory into Practice: Using SQL with the Advanced Datagroup function

At the bottom of any datagroup report page, you will find an button. If you click here, you will be taken to a page that looks like this:

Once again, you may enter Institution, Term and/or Department parms for your WHERE clause. Note that these values will carry over from the previous page if you decide after entering them that you'd like to add your own customized SQL to a datagroup extract report. The new box (Additional Criteria) is for your carefully honed SQL code.

Let's go back to our previous example. You've been asked to obtain a list of instructors teaching courses in the English department for the term First Semester 2004-2005, but you specifically want only the courses which are not offered in the standard session. Just for fun, let's say that you also want to exclude courses without an assigned instructor, and you want to look at all campuses.

To solve this problem, there are a few things you'll need which may not have customarily been in your tool set in the Legacy environment. With the IUIE, since you're generating your own reports rather than relying on a local data provider, it is essential that you develop an understanding of what the data tables look like. What are the department abbreviations in PeopleSoft? What are the Academic Term abbreviations? What are the codes for Majors, Room Characteristics, Cancelled Sections? Most of these you'll learn fairly quickly just from navigating and using the PeopleSoft system. When you need to find the values for an obscure code, try this handy technique:

1. Figure out which table your code lives on. For Schedule of Classes tables, try this reference.
2. Select the appropriate datagroup (avoiding the joined datagroups, if possible).
3. Select the column that has the codes you need (also select the associated "description" column for your own reference)
4. At the bottom of the page, check the "Return Distinct Rows Only" box.
5 . Run it.

The Distinct Row checkbox tells the database to only return unique values from the database. So even though an entry may be duplicated a thousand times, this feature will only report the first instance.

For example, say I want to know what all the new Report Codes are in PeopleSoft. I learn from the spreadsheet linked above that Legacy Report Codes are now Course Attribue Value Codes. Course Attribute Value Codes can be found on the Class Attributes table in the IUIE (SR_CLS_ATTRIB_GT). I fill out my selected columns, check my Distinct Rows box, etc., just as you see below.

And the IUIE tells me what the new Report Codes are (in part):

The nature of the IUIE, by exporting data access and reporting ability to academic and administrative units, makes it essential that you as the end user of the data develop a personal library of report values and database codes like the ones we've just retrieved above. You need to understand, additionally, how the data is structured so you can understand what tables to query, how those tables are related to one another and what the data within them means in context, so that the reports you extract from the IUIE answer precisely the questions you've asked, and not merely some approximation of the question.

Going back to our example, we'll need to know the system code for the English department (ENG) and that the Academic Term Code for First Semester 2004-2005 is 4048. We also need to know that in PeopleSoft, the Standard Session Code is 1.

Finally, and perhaps most importantly, we'll need to be aware of the SQL operating behind the scenes of the datagroup extract. For instance, it's important to recognize that the parameter fields on this page imply that you're supplying criteria for the WHERE clause -- that the WHERE clause actually precedes the Institution text box in the screenshot above. Thus, your additional SQL code supplied in the Additional Criteria box should begin with an AND (or an OR). You'll also need to be aware that the columns from your SELECT component are being derived from the column selection options farther down the page. We'll talk about advanced column selection details in a bit. For now, here's what the code would look like to satisfy our report requirements to this point:

Though this is technically the logic that will satisfy our query, we have a small problem. We've been asked to produce a list of instructors across all Institutions. It won't do to have Indianapolis instructors mixed with Columbus instructors, or East professors mixing with Bloomington professors. Unless we add some type of sorting logic to our query, we're going to get a list of rows that fall in the same order as they've been entered into the database. To separate these values into further logical components, we can add an ORDER BY statement.

ORDER BY statments simply tell the database in what order (d'oh!) to report the output rows. You can be as granular as you want in your sort. We'll sort by Institution, Instructor name, Session Code, Course Subject Code and Course Catalog Number in this exampe. Now our Additional Criteria box will look like this:

You can see a sample of the output for this query here.

Note: If you include an ORDER BY statement in the Additional Criteria, do not attempt to limit the Maximum Number of Rows to Return (using the option provided at the bottom of the page). If you do so, your query will not execute, or will return an error message.

Questions?

Where did you get the column names for your WHERE clause arguments?

Click on any of the column descriptions on a datagroup interface page. This will tell you the technical name of the column (as well as the datatype, as we saw above with our Numeric column).

Hey, that document you linked to show us the conversion values between Legacy, PeopleSoft and the IUIE only includes Schedule of Classes information! What do I do when I need enrollment information codes?

Again, because I knew you'd ignore me the first time:

1. Figure out which table your code lives on.
2. Select the appropriate datagroup (avoiding the joined datagroups, if possible).
3. Select the column that has the codes you need (also select the associated "description" column for your own reference)
4. At the bottom of the page, check the "Return Distinct Rows Only" box.
5 . Run it.

I know this is tedious, but we're all going to have to do it until we get some of this stuff down. Hopefully most of the datagroups have been named intuitively enough that you can make some decent guesses about what tables hold what information. Nobody is going to kill you if you run a datagroup extract that doesn't have the information you need. Jump in and get your hands dirty.

Or, I suppose, there's always the off chance that someone in your department has already compiled some sort of documentation of the PeopleSoft coding schemes that you don't know about. You're welcome to ask around, but as we all know, documentation is for weenies.

What if I keep getting error messages from the IUIE in spite of your fabulous tutorial here? Who do I contact for help?

Click the button at the top of the page. You can send an e-mail to the IUIE Help Desk. They're pretty spiffy.

You said you were going to talk about the tricks and potential pitfalls of column selection.

Yes, yes. Moving on, then.

Selecting Columns: Building Your own SELECT Statement Dynamically

You've got your parameters filled out.
You've got your nifty and splendiferous SQL code finally hammered into shape.
You've only ripped out a bit of your hair, and your tummy has finally stopped clenching over that breakfast roll.

Let's talk about the data you actually want to see on this report of yours. If you know anything about the IUIE datagroups, it's that somebody has gone out of their way to make sure that you have access to an obsecenely varied and seemingly redundant mass of data. Not just Academic Term Codes, but Academic Term Short Descriptions. Not enough? We'll throw in the Academic Term Looooong Description for free. (There was some talk of actually developing an Academic Term Mind-Bogglingly Stupendously Ultralong Description, but that was ultimately nixed due to funding issues.) Chances are that you do not want all of the data available in a given table returned to you every time you run a query. We are so sure of this, in fact, that the default SELECT statement for an IUIE datagroup is

which may be translated as:

SELECT (err..., um, yeah)
FROM this table

If you attempt to run a datagroup just like this, the IUIE will give you a rather rude message about how you are required to choose some columns to include on your report before you can reasonably expect the environment to bend over backwards to meet your needs.

You select columns by clicking on the little boxes next to the column title descriptions a bit of the way down the page. Feel free to follow along on the page shot below:

I've selected Academic Term, Institution, Academic Group Code, Subject Dept Code and Course Catalog Number. As far as the IUIE is concerned, my SELECT statement now reads:

SELECT ACAD_TERM_CD, INST_CD, ACAD_GRP_CD, CRS_SUBJ_CD, CRS_CATLG_NBR
FROM SR_CMB_CLS_INSTR_GT
WHERE yadda, yadda, yadda

What I want you to remember:

1. If I have an ORDER BY statement in my Additional Criteria, I can only reference the columns in my SELECT statement above.

2. Notice that Academic Term End Date and Academic Term Begin Date give you formatting options. The default is MM/DD/YYYY on an Oracle database (the backbone of the IUIE). All Date and Time fields default to MM/DD/YYYY as their output option. If you looked at the spreadsheet report linked above, you'll have noticed that this works fine for date fields, but for things like Class Start Time, what you end up with is an odd date figure followed by the time. Save yourself some aggravation and pick the appropriate mask from these drop down boxes before you run your report.

(For the technical among you, this is a to_char function. If you'd like to know more about to_char functions in SQL, feel free to check out this site: http://www.oreilly.com/catalog/sqlnut/chapter/ch04.html. If you're not one of those people who wants to know more, chances are that you're a little miffed for having to have read this whole ponderous document just to try to get something like the report you used to get from some anonymous techie person for free in the past, so because I feel your pain, we'll just move along to even more sublime heights of documentary bliss and try to wrap this up as quickly as possible.)

Odds & Ends

What's this last bit, then?

Open in Spreadsheet or Browser?

If you choose "Browser", all of your wittily etched SQL and careful column selections will be blown away. Do not pick this option. Trust me on this.

Select an Output Destination?

Whatever. No really, I mean that. What-ever.

Field delimiter? Record delimiter? Aagh!

More useful than it looks, really. This is handy if you want to export the file coming out of the IUIE in something other than a tab delimited file. Excel likes its columns in tab delimited files. Your database in Access may not like them so much, and may want each record to have a set of quotation marks around it. You can customize the output file to meet the needs (or merely the preferences) of the software you want to import the data extract into. Flexibility rocks.

Return distinct rows?

Get rid of those pesky duplicate rows. Say you want to generate a list of report codes or other course attributes from the Class Attributes datagroup. Many (MANY) of the report codes are listed multiple (MULTIPLE) times, tied to the Classes table by the CLS_KEY field. We're talking about thousands of rows for fewer than a hundred report codes. Click this button and all but a unique set go away.

It's handy.

Questions?

 

 


 

 

 

 


 
For additional information about this report, please contact the Bloomington Campus Office of the Registrar at 855-2472 or e-mail us at REGR-IUIE@indiana.edu.

Last updated: November 14, 2003
URL:http://www.iu.edu
Copyright 2003, The Trustees of Indiana University