
How do I skip duplicate documents in a bulk insert? #1465


Open
rohitkhatri opened this issue Jan 9, 2017 · 8 comments

@rohitkhatri

I'm trying to insert documents in bulk. I have created a unique index on my collection and want to skip duplicate documents during the bulk insertion. This can be done with the native MongoDB shell method:

db.collection.insert(
	<document or array of documents>,
	{
		ordered: false
	}
)

How can I achieve this in mongoengine?
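To make the difference between an ordered insert (the default) and ordered: false concrete, here is a toy in-memory model in Python. It is only a sketch of the documented semantics (an ordered bulk insert stops at the first error; an unordered one attempts every document and reports all errors at the end), not MongoDB's implementation. The function name simulate_insert and its return shape are hypothetical.

```python
def simulate_insert(existing, docs, key, ordered=True):
    """Toy model of a bulk insert into a collection with a unique index on `key`.

    existing: set of key values already in the collection (mutated in place)
    docs: list of dicts to insert
    Returns (inserted_count, error_indexes).
    """
    inserted = 0
    errors = []
    for i, doc in enumerate(docs):
        if doc[key] in existing:
            errors.append(i)  # duplicate key error for this document
            if ordered:
                break  # ordered inserts stop at the first error
            continue  # unordered inserts skip the duplicate and keep going
        existing.add(doc[key])
        inserted += 1
    return inserted, errors


docs = [{"txt": "1"}, {"txt": "2"}, {"txt": "3"}]
print(simulate_insert({"1", "2"}, docs, "txt", ordered=True))   # (0, [0])
print(simulate_insert({"1", "2"}, docs, "txt", ordered=False))  # (1, [0, 1])
```

With ordered=False, only the new document "3" gets inserted and both duplicates are reported, which is exactly the "skip duplicates" behaviour asked for here.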

@wojcikstefan
Member

wojcikstefan commented Jan 15, 2017

Unfortunately, before we can support the ordered kwarg, we'll have to migrate to PyMongo 3.0+'s collection.insert_one and collection.insert_many methods. Right now we're still using the deprecated collection.insert, which doesn't support it.

In the meantime, you can use write_concern={'continue_on_error': True}. Note, however, that this won't be supported in future releases and is a hack around a poor implementation of the write_concern kwarg. You'll also have to wrap your insert in a try-except, catching NotUniqueError:

In [28]: from mongoengine import *

In [29]: connect('testdb')
Out[29]: MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, read_preference=Primary())

In [30]: class Doc(Document):
    ...:     txt = StringField(unique=True)
    ...:

In [31]: Doc.drop_collection()

In [32]: Doc.objects.insert([Doc(txt='1'), Doc(txt='2')])
Out[32]: [<Doc: Doc object>, <Doc: Doc object>]

In [33]: try:
    ...:     Doc.objects.insert([Doc(txt='1'), Doc(txt='2'), Doc(txt='3')], write_concern={'continue_on_error': True})
    ...:
    ...: except NotUniqueError:
    ...:     pass
    ...:
    ...:

In [34]: Doc.objects.count()
Out[34]: 3

@wojcikstefan wojcikstefan changed the title How to do bulk insert with ordered false in mongoengine How do I skip duplicate documents in a bulk insert? Jan 15, 2017
@rohitkhatri
Author

Thanks :-)

@doaa-altarawy

Is write_concern={'continue_on_error': True} no longer supported?

@bagerard
Collaborator

bagerard commented Nov 1, 2018

No, it's not. But I think it makes sense to reopen this so that support for ordered in the bulk insert method (which uses insert_many behind the scenes) can be added someday.

@bagerard bagerard reopened this Nov 1, 2018
@sohaibfarooqi

Is there any workaround for this in current mongoengine release?

@SiddharthPant

For now I'm using raw PyMongo through mongoengine as a workaround. For a mongoengine Document class DocClass, access the underlying PyMongo collection with DocClass._get_collection() and run an unordered insert_many, like below:

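from pymongo.errors import BulkWriteError

# DocClass is your mongoengine Document subclass; me_doc_list is a list of its instances.
try:
    # Convert mongoengine objects to SON documents that PyMongo understands
    doc_list = [doc.to_mongo() for doc in me_doc_list]
    # ordered=False lets the server keep inserting after a duplicate-key error
    DocClass._get_collection().insert_many(doc_list, ordered=False)

except BulkWriteError as bwe:
    # Raised after the whole batch is processed; non-duplicate documents are already inserted
    print("Batch inserted with some errors. Some duplicates may have been found and skipped.")
    print(f"Count is {DocClass.objects.count()}.")

except Exception as e:
    print({'error': str(e)})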
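The snippet above swallows every BulkWriteError, which also hides failures that are not duplicates (for example a document-validation error). A slightly stricter variant, sketched here under the assumption that you only want to tolerate duplicate-key errors, inspects bwe.details: PyMongo fills it with the server's bulk-write result, including a writeErrors list whose entries carry a code (11000 is MongoDB's duplicate-key error) and an index into the submitted list. The helper name only_duplicate_key_errors is hypothetical.

```python
DUPLICATE_KEY_ERROR = 11000  # MongoDB's "E11000 duplicate key error" code


def only_duplicate_key_errors(details):
    """Return True if every error in a BulkWriteError.details dict is a duplicate key error.

    Write concern errors (reported separately under "writeConcernErrors")
    are treated as real failures.
    """
    if details.get("writeConcernErrors"):
        return False
    write_errors = details.get("writeErrors", [])
    return all(err.get("code") == DUPLICATE_KEY_ERROR for err in write_errors)


# Usage inside the except clause (DocClass and doc_list as in the snippet above):
#
# try:
#     DocClass._get_collection().insert_many(doc_list, ordered=False)
# except BulkWriteError as bwe:
#     if not only_duplicate_key_errors(bwe.details):
#         raise  # something other than a duplicate went wrong
#     print(f"Inserted {bwe.details['nInserted']} documents, skipped duplicates.")
```

Re-raising on anything other than code 11000 keeps the "skip duplicates" behaviour without silently losing unrelated write failures.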

@Prophetofcthulhu

Is anybody working on this issue, or is it even in the backlog?

@fauzieuy

fauzieuy commented Oct 13, 2022

Does anyone have a workaround for this? continue_on_error is now rejected as an unexpected write_concern argument.


8 participants