Id field with insertDataFromSelect

procept · 6 April 2025 08:13

I am trying to insert data into a table by using insertDataFromSelect(), but, it seems this function behaves differently than insertData().

The value for the fields id, datecreated, and datemodified in the target table are not auto-created, which results in a DB error, of course.

While it is easy enough to set values for the date fields, the id field is a different story. I can’t use the MySQL function UUID(), because the MySQL uuid is formatted differently than the CFML uuid (36 chars vs. 35 chars).

I went through the code in the Preside core and found that insertDataFromSelect() is only used on objects that have no id or on objects that have an id field with a generator set to “increment”.

My feeling is that there should be a way to auto-create the new ids even when they are CFML uuids. Am I missing something or doing something wrong?

Thank you!

seb · 7 April 2025 08:07

You’re right, there probably should be a way to use this with auto-creation of ID and dates. However, the very nature - and purpose - of the function is to perform big inserts purely on the SQL side without the overhead of individual insert statements. So any ID generation would also have to be done on the SQL side.

For now, you can generate a pseudo-CF ID in MySQL using the following:

ucase( regexp_replace( uuid(), '([[:word:]]+)\-([[:word:]]+)\-([[:word:]]+)\-([[:word:]]+)\-([[:word:]]+)', '\\1-\\2-\\3-\\4\\5' ) )

Be aware also that, unlike in CFML, adjacent MySQL-generated UUIDs can look very similar to each other, which can cause confusion. These are 3 IDs created in the same statement:

19660711-1386-11F0-9F7906E3A70FC579
19660722-1386-11F0-9F7906E3A70FC579
19660733-1386-11F0-9F7906E3A70FC579

There may also be scenarios where the ID on the original table would serve perfectly well as the ID on the inserted data, depending on your use case - so you could just select it from the original table and then insert it to the new one…

seb · 7 April 2025 08:13

If you want more distinct IDs, try the following. Not sure if the md5() will cause any noticeable overhead when handling large volumes of data…

regexp_replace( ucase( md5( uuid() ) ), '(.{8})(.{4})(.{4})(.{12})', '\\1-\\2-\\3-\\4' )

And then this may or may not be more performant than the regexp_replace:

INSERT( INSERT( INSERT( UCASE( MD5( UUID() ) ), 9, 0, '-' ), 14, 0, '-' ), 19, 0, '-' )

seb · 7 April 2025 08:28

Performance doesn’t appear to be an issue. Both the above examples generated 11,000 rows in ~150ms, which I think will be fine for any scenario…

I have created a new feature ticket to look into adding this functionality:

https://presidecms.atlassian.net/browse/PRESIDECMS-3048

procept · 7 April 2025 08:51

Hey Seb,

thank you for taking the time to look into this, I highly appreciate your solutions!

I just did a test on my local system with roughly 1,000,000 records. The times for the different approaches range from ~ 550ms to ~ 780 ms.

Because I expect no more than 1/10 of that volume, I will go for one of the solutions with the more distinct uuids