Why is FormReturn taking so long to publish my Form ID publication that has 5,000+ records?

The problem is caused by a limitation of the SQL queries that create the publication records. The records are created inside a single transaction, which keeps consuming memory until all records are published and the transaction is committed. If the transaction grows beyond the allocated memory (1GB), you will end up with an out-of-memory exception and the PDF files will not be created. This has two effects:

  1. Publishing takes a long time because, as the program approaches its memory limit, the garbage collector (the internal system that reclaims memory the program has finished with) works much harder to free whatever memory it can, and everything slows down dramatically.

  2. The program eventually hits its allocated memory limit and silently throws an out-of-memory exception. If this happens, it will not create the PDF files.

The solution is:

Create 10 publications of 1450 records each. I am assuming the form in this publication is only one or two pages long. Splitting it up this way means each publication's transaction will commit long before the program begins to run out of memory.
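If your source data lives in a spreadsheet or CSV export, the split can be prepared outside FormReturn before you create each publication. The following is only a minimal sketch in Python: the function name split_into_batches and the record list are hypothetical, and it does not call any FormReturn API. It just shows how records divide into batches of 1450.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(records: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size records.

    Hypothetical helper for preparing source data; not part of FormReturn.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


if __name__ == "__main__":
    # Stand-in record IDs; in practice these would be rows from your source data.
    records = list(range(1, 14501))
    for number, batch in enumerate(split_into_batches(records, 1450), start=1):
        print(f"Publication {number}: records {batch[0]} to {batch[-1]}")
```

Each batch would then be used as the source data for its own publication, so no single transaction has to hold every record at once.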

This isn't really discussed on the website or in the help documentation because most people don't create very large publications. However, it is good practice to break very large publications down into smaller, more manageable batches. Once scanned in, some forms will have an "error" flag set and someone will have to reprocess them manually: for example, when a bubble has marks that extend into another bubble, when the page is badly warped (due to the scanner feeding), when the shading is too light, or for any other reason the forms need to be fixed. Depending on how well the forms are filled out, normally around 3-5% of forms need to be fixed manually. Batching into smaller publications helps you split up the task of reprocessing pages. This is particularly important when you want more than one person to help with the reprocessing. You can copy your database to another computer (or network the main database) and have someone else process one batch while you process another. This reduces the time spent on processing pages.

Also, you shouldn't scan in 15000 forms in one go. Scan 1500 at a time, confirm that the data (or scanning) is okay by checking the error rate in FormReturn, then scan another 1500. This means you don't have to search through all of the scanned pages to find the ones you may want to just re-scan instead of reprocessing.
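The per-batch check is simple arithmetic, sketched below in Python. The function names and the 5% threshold are assumptions made for illustration (the threshold is taken from the upper end of the usual 3-5% range mentioned above). The flagged count would be read off manually from FormReturn, since this sketch does not use any FormReturn API.

```python
def error_rate(flagged: int, scanned: int) -> float:
    """Return the fraction of scanned forms that carry an error flag."""
    if scanned <= 0:
        raise ValueError("scanned must be positive")
    if flagged < 0 or flagged > scanned:
        raise ValueError("flagged must be between 0 and scanned")
    return flagged / scanned


def batch_looks_ok(flagged: int, scanned: int, max_rate: float = 0.05) -> bool:
    """Assumed rule of thumb: a batch is fine if its error rate is at most max_rate."""
    return error_rate(flagged, scanned) <= max_rate


if __name__ == "__main__":
    # Hypothetical counts for two scanned batches of 1500 forms each.
    for flagged in (60, 150):
        rate = error_rate(flagged, 1500)
        verdict = "continue scanning" if batch_looks_ok(flagged, 1500) else "check the scanner first"
        print(f"{flagged} of 1500 flagged ({rate:.1%}): {verdict}")
```

A batch well above the usual range suggests a scanning problem (feeding, contrast) rather than badly filled-in forms, in which case re-scanning that one batch is likely quicker than reprocessing each page.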