Contributors mailing list archives

contributors@odoo-community.org

Re: Large Data Files

by "Jerôme Dewandre" <jerome.dewandre.mail@gmail.com> - 20/08/2024 23:51:41
Hello,

Thank you very much for your quick responses :) 

Tom Blauwendraat: I am running on v16

Holger Brunn: adapting the script with .with_context(tracking_disable=True) to disable email notifications divides the running time by at least 4.
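
For the archives, this is roughly what the change looks like in my test method (same model and fields as in the snippet quoted below):

@api.model
def create_events_from_df(self, df):
    events_data = [{
        'location': row['location'],
        'name': row['name'],
        'date_begin': row['date_begin'],
        'date_end': row['date_end'],
    } for _, row in df.iterrows()]
    # tracking_disable makes the mail machinery skip tracking values and
    # subtype notifications, which is what was eating most of the time.
    self.env['event.event'].with_context(tracking_disable=True).create(events_data)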

Goran Sunjka: It is indeed an interesting idea. I was wondering if I could store a hash of each row in Postgres to check whether an existing record has changed, and so separate the "create" and "update" actions.
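
Something along those lines is what I have in mind (untested sketch: x_external_id and x_sync_hash would be new Char fields on event.event, and the dataframe would need an external_id column holding the legacy key):

import hashlib
import json

def _row_hash(self, row):
    # Stable hash of the synchronized values, used to detect changed rows.
    payload = json.dumps(
        {k: row[k] for k in ('location', 'name', 'date_begin', 'date_end')},
        sort_keys=True, default=str,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

@api.model
def sync_events_from_df(self, df):
    Event = self.env['event.event'].with_context(tracking_disable=True)
    keys = [str(k) for k in df['external_id']]
    existing = {
        rec.x_external_id: rec
        for rec in Event.search([('x_external_id', 'in', keys)])
    }
    to_create = []
    for _, row in df.iterrows():
        new_hash = self._row_hash(row)
        vals = {
            'location': row['location'],
            'name': row['name'],
            'date_begin': row['date_begin'],
            'date_end': row['date_end'],
            'x_sync_hash': new_hash,
        }
        rec = existing.get(str(row['external_id']))
        if rec is None:
            vals['x_external_id'] = str(row['external_id'])
            to_create.append(vals)
        elif rec.x_sync_hash != new_hash:
            # Only touch records whose values actually changed.
            rec.write(vals)
    Event.create(to_create)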


Daniel Reis: This is indeed the problem I encountered.


Thank you all for your replies, it helps a lot :)

Jérôme


On Tue, Aug 20, 2024 at 7:47 PM Daniel Reis <notifications@odoo-community.org> wrote:
I would expect this code to just abort for a non-trivial quantity of records.
The reason is that this is a single worker doing a single database transaction.
So the worker process will probably hit the time and CPU limits and be killed, and no records would be saved because of a transaction rollback.
And if you increase those limits a lot, you will probably cause long table locks on the database, and hurt other users and processes.

Going directly to the database can work if the data is pretty simple.
It can work, but it can also be a can of worms.

One approach is to load the data incrementally.
In the past I have used external ETL tools or scripts to do this.
Keeping it inside Odoo, one of the tools that can help is the Job Queue, possibly along with something like base_import_async:
https://github.com/OCA/queue/tree/16.0/base_import_async
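
Just to illustrate, the queue_job side could look roughly like this (the chunk size and method names are arbitrary examples, and job arguments should be kept to plain JSON-friendly values):

CHUNK_SIZE = 1000

@api.model
def enqueue_events_from_df(self, df):
    # Convert to plain dicts first: job arguments are stored in the database,
    # so keep them as simple strings/numbers.
    rows = [{
        'location': r['location'],
        'name': r['name'],
        'date_begin': str(r['date_begin']),
        'date_end': str(r['date_end']),
    } for _, r in df.iterrows()]
    for start in range(0, len(rows), CHUNK_SIZE):
        # Each with_delay() call creates one job, run later by a queue_job
        # runner in its own short transaction instead of the HTTP worker.
        self.with_delay()._create_events_chunk(rows[start:start + CHUNK_SIZE])

def _create_events_chunk(self, rows):
    self.env['event.event'].with_context(tracking_disable=True).create(rows)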

Thanks

--
DANIEL REIS
MANAGING PARTNER

Meet with me.
M: +351 919 991 307
E: dreis@OpenSourceIntegrators.com
A: Avenida da República 3000, Estoril Office Center, 2649-517 Cascais




On 20/08/2024 16:32, Jerôme Dewandre wrote:
Hello,

I am currently working on a sync with a legacy system (adesoft) containing a large amount of data that must be synchronized on a daily basis (such as meetings).

It seems everything starts getting slow when I import 30,000 records with the conventional "create()" method.

I suppose the ORM might be an issue here. Potential workarounds:

1. Bypass the ORM and create the records with self.env.cr.execute (but if I want to delete them I will also need a custom query); see the rough sketch after this list
2. Bypass the ORM with stored procedures (https://www.postgresql.org/docs/current/sql-createprocedure.html)
3. Increase the CPU/RAM/Worker nodes
4. Some better ideas?
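
For option 1, I was thinking of something like the sketch below (untested; the column list is simplified, and the real event_event table has more required columns and defaults, plus translated fields stored as jsonb in v16, which a raw INSERT would have to handle):

@api.model
def insert_events_sql(self, df):
    # Untested sketch of option 1: one multi-row INSERT through the cursor.
    # Bypassing the ORM also skips defaults, computed fields and mail tracking.
    rows = [
        (r['location'], r['name'], str(r['date_begin']), str(r['date_end']))
        for _, r in df.iterrows()
    ]
    if not rows:
        return
    placeholders = ", ".join(["(%s, %s, %s, %s)"] * len(rows))
    params = [value for row in rows for value in row]
    self.env.cr.execute(
        "INSERT INTO event_event (location, name, date_begin, date_end) "
        "VALUES " + placeholders,
        params,
    )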

What would be the best way to go?

A piece of my current test (df is a pandas dataframe containing the new events): 

@api.model
def create_events_from_df(self, df):
    Event = self.env['event.event']
    events_data = []
    for _, row in df.iterrows():
        event_data = {
            'location': row['location'],
            'name': row['name'],
            'date_begin': row['date_begin'],
            'date_end': row['date_end'],
        }
        events_data.append(event_data)

    # Create all events in a single batch
    Event.create(events_data)

Thanks in advance if you read this, and thanks again if you replied :)

Jérôme


_______________________________________________
Mailing-List: https://odoo-community.org/groups/contributors-15
Post to: mailto:contributors@odoo-community.org
Unsubscribe: https://odoo-community.org/groups?unsubscribe
