How do I skip duplicate documents in a bulk insert? #1465

rohitkhatri · 2017-01-09T06:52:50Z

I'm trying to insert documents in bulk. I have created a unique index on my collection and want to skip duplicate documents during the bulk insertion. This can be accomplished with the native MongoDB function:

db.collection.insert(
	<document or array of documents>,
	{
		ordered: false
	}
)

How can I achieve this in mongoengine?


wojcikstefan · 2017-01-15T04:17:00Z

Unfortunately, before we can support the ordered kwarg, we'll have to migrate to PyMongo 3.0+'s collection.insert_one and collection.insert_many methods. Right now we're still using the deprecated collection.insert, which doesn't support it.

In the meantime, you can use write_concern={'continue_on_error': True}. Note, however, that this won't be supported in future releases and is a hack around a poor implementation of the write_concern kwarg. You'll also have to wrap your insert in a try-except, catching NotUniqueError:

In [28]: from mongoengine import *

In [29]: connect('testdb')
Out[29]: MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, read_preference=Primary())

In [30]: class Doc(Document):
    ...:     txt = StringField(unique=True)
    ...:

In [31]: Doc.drop_collection()

In [32]: Doc.objects.insert([Doc(txt='1'), Doc(txt='2')])
Out[32]: [<Doc: Doc object>, <Doc: Doc object>]

In [33]: try:
    ...:     Doc.objects.insert([Doc(txt='1'), Doc(txt='2'), Doc(txt='3')], write_concern={'continue_on_error': True})
    ...:
    ...: except NotUniqueError:
    ...:     pass
    ...:
    ...:

In [34]: Doc.objects.count()
Out[34]: 3

rohitkhatri · 2017-01-15T06:20:56Z

Thanks :-)

doaa-altarawy · 2018-11-01T19:20:55Z

Is write_concern={'continue_on_error': True} no longer supported?

bagerard · 2018-11-01T20:42:28Z

No, it's not. But I think it makes sense to reopen this so that support for ordered in the bulk insert method (which uses insert_many behind the scenes) can be added someday.

sohaibfarooqi · 2018-11-23T02:54:06Z

Is there any workaround for this in the current mongoengine release?

SiddharthPant · 2018-12-08T08:58:24Z

For now I am using raw pymongo from mongoengine as a workaround. For a mongoengine Document class DocClass, you access the underlying pymongo collection and run the insert like this:

from pymongo.errors import BulkWriteError


try:
    # Convert mongoengine documents to the form that pymongo understands.
    doc_list = [doc.to_mongo() for doc in me_doc_list]
    # ordered=False lets the server attempt every document instead of stopping at the first failure.
    DocClass._get_collection().insert_many(doc_list, ordered=False)

except BulkWriteError:
    # Raised once all documents have been attempted; the failed writes were skipped.
    print("Batch inserted with some errors. Duplicates may have been found and skipped.")
    print(f"Count is {DocClass.objects.count()}.")

except Exception as e:
    print({'error': str(e)})
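
The except BulkWriteError branch above treats every failed write as a probable duplicate. If it matters which failures really were duplicates, the exception's details attribute can be inspected. The following is only a sketch: it assumes details is a dict with a writeErrors list whose entries carry index and code keys, and that 11000 is the server's duplicate-key code. The names split_write_errors and sample_details are hypothetical, and the sample dict stands in for bwe.details so the snippet runs without a database.

```python
DUPLICATE_KEY_CODE = 11000  # MongoDB error code for a unique-index violation


def split_write_errors(details):
    """Separate duplicate-key errors from all other write errors.

    `details` is assumed to have the shape of BulkWriteError.details:
    a dict whose "writeErrors" entry is a list of dicts with "index" and "code".
    """
    duplicates, others = [], []
    for err in details.get("writeErrors", []):
        if err.get("code") == DUPLICATE_KEY_CODE:
            duplicates.append(err)
        else:
            others.append(err)
    return duplicates, others


# Hypothetical stand-in for `bwe.details` from a failed insert_many call.
sample_details = {
    "nInserted": 1,
    "writeErrors": [
        {"index": 0, "code": 11000, "errmsg": "E11000 duplicate key error"},
        {"index": 2, "code": 121, "errmsg": "Document failed validation"},
    ],
}

duplicates, others = split_write_errors(sample_details)
print(f"{len(duplicates)} duplicate(s) skipped, {len(others)} other error(s)")
```

Inside the real handler you would call split_write_errors(bwe.details) and re-raise if the second list is not empty, so that only duplicates are silently skipped.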

Prophetofcthulhu · 2019-09-20T09:45:40Z

Is anybody working on this issue, or is it even in the backlog?

fauzieuy · 2022-10-13T03:31:11Z

Does anyone have a workaround for this, since continue_on_error is now rejected as an unexpected write_concern argument?

wojcikstefan closed this as completed Jan 15, 2017

wojcikstefan added the Question label Jan 15, 2017

wojcikstefan changed the title ~~How to do bulk insert with ordered false in mongoengine~~ How do I skip duplicate documents in a bulk insert? Jan 15, 2017

bagerard reopened this Nov 1, 2018

bagerard mentioned this issue Sep 25, 2019

Document.objects.insert does not accept kwargs #2169

Open

annakuchko mentioned this issue Mar 12, 2021

Hw3 annakuchko/Internet_Data_Collection_and_Processing#3

Merged

rafaharo mentioned this issue Nov 28, 2022

add ordered argument to insert method #2570

Open
