Hello,
I'm ingesting data into Elasticsearch with Python, and I wrote a separate Python script that deletes duplicate entries. When I start the deduplication script from the bash console, it works perfectly. When I call the same .py file from my other scripts (a Flask app on PythonAnywhere), it works once. After I reproduce the duplicates and run it a second time, the script stops deleting after a couple of entries, and the error log shows a 404: an entry it tries to delete was not found on Elasticsearch. Restarting the Python server helps, but then it again works only once. What could be the problem? I can't restart the PythonAnywhere server after every deduplication run; it should work automatically.
Error log from the second deduplication run:
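The difference between the two ways of running the script can be sketched as follows. This is only an illustration under an assumption: the real duplicates.py is not shown here, so the sketch supposes it keeps its hash-to-ids mapping in a module-level dictionary. All names (`hash_to_ids`, `collect`, `duplicates_to_delete`, the `fresh_state` flag) are hypothetical, and the Elasticsearch calls are replaced by plain lists so the example runs on its own.

```python
# Hypothetical stand-in for duplicates.py; every name here is illustrative.
# Assumption: the mapping lives at module level, so it survives for as long
# as the Python process does (one bash run vs. a long-running web app).
hash_to_ids = {}


def collect(docs):
    """Group document ids by content hash; docs is an iterable of (doc_id, doc_hash)."""
    for doc_id, doc_hash in docs:
        hash_to_ids.setdefault(doc_hash, []).append(doc_id)


def duplicates_to_delete():
    """Return every id except the first one recorded for each hash."""
    return [doc_id for ids in hash_to_ids.values() for doc_id in ids[1:]]


def main(docs, fresh_state=False):
    """Run one deduplication pass over the documents currently in the index."""
    if fresh_state:
        # Start each pass from an empty mapping, as a new process would.
        hash_to_ids.clear()
    collect(docs)
    return duplicates_to_delete()
```

Under that assumption, a first call `main([("a", "h1"), ("b", "h1")])` returns only `"b"`. A second call in the same process with `[("a", "h1"), ("c", "h1")]` (after `"b"` was deleted and a new duplicate `"c"` was ingested) still lists `"b"`, which no longer exists, and that would produce a 404 on delete. Passing `fresh_state=True` on that second call instead returns only `"c"`. From the bash console every run is a new process, so the mapping always starts empty.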
2021-07-02 10:33:39,301: Found credentials in shared credentials file: ~/.aws/credentials
2021-07-02 10:33:40,128: POST https://search-xxxxx-mynnywnysa2f3jytnh5syqixvm.eu-central-1.es.amazonaws.com:443/_bulk [status:200 request:0.736s]
2021-07-02 10:33:40,230: POST https://search-xxxxxe-mynnywnysa2f3jytabcdefg.eu-central-1.es.amazonaws.com:443/cdr/_search?size=20 [status:200 request:0.101s]
2021-07-02 10:33:53,308: POST https://search-xxxxx-mynnywnysa2f3jyabcdefg.eu-central-1.es.amazonaws.com:443/cdr/_search?scroll=5m&size=1000 [status:200 request:0.193s]
2021-07-02 10:33:53,411: POST https://search-xxxxx-mynnywnysa2f3jyabcdefg.eu-central-1.es.amazonaws.com:443/_search/scroll [status:200 request:0.098s]
2021-07-02 10:33:53,508: DELETE https://search-xxxxx-mynnywnysa2f3jyabcdefg.eu-central-1.es.amazonaws.com:443/_search/scroll [status:200 request:0.096s]
2021-07-02 10:33:53,604: POST https://search-xxxxx-mynnywnysa2f3jyabcdefg.eu-central-1.es.amazonaws.com:443/cdr/doc/_mget [status:200 request:0.096s]
2021-07-02 10:33:53,706: DELETE https://search-xxxxx-mynnywnysa2f3jyabcdefg.eu-central-1.es.amazonaws.com:443/cdr/_doc/cV3EZnoBaFIiCCZjy_6s [status:404 request:0.102s]
2021-07-02 10:33:53,707: Exception on /deduplicate [POST]
Traceback (most recent call last):
  File "/home/xxxx/.virtualenvs/my-virtualenv/lib/python3.6/site-packages/flask/app.py", line 2070, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/xxxx/.virtualenvs/my-virtualenv/lib/python3.6/site-packages/flask/app.py", line 1515, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/xxxx/.virtualenvs/my-virtualenv/lib/python3.6/site-packages/flask/app.py", line 1513, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/xxxx/.virtualenvs/my-virtualenv/lib/python3.6/site-packages/flask/app.py", line 1499, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/xxxx/mysite/flask_app.py", line 63, in deduplicate
    dp.main()
  File "/home/xxxx/mysite/duplicates.py", line 88, in main
    loop_over_hashes_and_remove_duplicates()
  File "/home/xxxx/mysite/duplicates.py", line 71, in loop_over_hashes_and_remove_duplicates
    es.delete(index='cdr', id=dup_id)
  File "/home/xxxx/.virtualenvs/my-virtualenv/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 168, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/home/xxxx/.virtualenvs/my-virtualenv/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 605, in delete
    "DELETE", _make_path(index, doc_type, id), params=params, headers=headers
  File "/home/xxxx/.virtualenvs/my-virtualenv/lib/python3.6/site-packages/elasticsearch/transport.py", line 415, in perform_request
    raise e
  File "/home/xxxx/.virtualenvs/my-virtualenv/lib/python3.6/site-packages/elasticsearch/transport.py", line 388, in perform_request
    timeout=timeout,
  File "/home/xxxx/.virtualenvs/my-virtualenv/lib/python3.6/site-packages/elasticsearch/connection/http_requests.py", line 204, in perform_request
    self._raise_error(response.status_code, raw_data)
  File "/home/xxxx/.virtualenvs/my-virtualenv/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 331, in _raise_error
    status_code, error_message, additional_info
elasticsearch.exceptions.NotFoundError: NotFoundError(404, '{"_index":"cdr","_type":"_doc","_id":"cV3EZnoBaFIiCCZjy_6s","_version":1,"result":"not_found","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":2605,"_primary_term":1}')
Thank you!