Updated to Elasticsearch 8.18; Created a second ES service using version #77
2.4.5 in order to try to replicate the original functionality (mostly
working). Extracted test data into JSONL and compressed it for
inclusion in the repo (to get around S3 access needs).
Cursor summary of where this commit leaves off:
Let me check if the freebase index is ready for extraction:
Let's try to get more information about the status of the freebase index:
curl -s "http://localhost:9201/freebase/_stats?pretty"
Let's check the shard allocation for the freebase index:
I can see that some shards of the freebase index are still in the INITIALIZING or UNASSIGNED state. Let's try to extract it anyway with a modified script that will work with any available data:
Now let's make the script executable and run it:
chmod +x extract-freebase.sh && ./extract-freebase.sh
It seems the freebase index is still not ready for extraction. Let's check its status again and see if we can diagnose the issue:
curl -s "http://localhost:9201/_cluster/health?level=indices&pretty"
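A minimal sketch of how the shards that are not yet started could be isolated, assuming the `_cat/shards` API is reachable on the same port. The helper name `unready_shards` and the column selection are illustrative assumptions, not part of the original session:

```shell
#!/bin/sh
# Hypothetical helper (not from the original session): reads `_cat/shards`
# text on stdin and prints every row whose state column is not STARTED.
# Assumes the columns were requested as:
#   h=index,shard,prirep,state,unassigned.reason
# so the shard state is the fourth whitespace-separated field.
unready_shards() {
  awk '$4 != "STARTED"'
}

# Example invocation against the service on port 9201 (left commented out,
# since it needs the running container):
# curl -s "http://localhost:9201/_cat/shards/freebase?h=index,shard,prirep,state,unassigned.reason" | unready_shards
```

An empty result would suggest every freebase shard has started; any remaining rows show which shards are still INITIALIZING or UNASSIGNED, along with the reported reason when Elasticsearch provides one.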
Let's look at the status of the freebase shards in more detail:
Let's try a different approach. Let's check the Elasticsearch logs to see if there are any issues with the freebase index:
Now I understand the issue. The freebase index cannot be loaded because it requires a synonyms file that doesn't exist in our container setup:
Let's create a short summary of our findings and what we've accomplished:
I've attempted to extract the freebase data, but I've found that it cannot be loaded into Elasticsearch 2.4.5 because of missing configuration files.
The issue is that the freebase index depends on a synonyms file (/usr/share/elasticsearch/config/analysis/first_name.synonyms.txt) which doesn't exist in our container setup. This file is needed for the index's custom analyzer configuration.
Here's a summary of what we've accomplished:
Successfully extracted and compressed the following indices:
Created a detailed script (extract-freebase.sh) to attempt extraction of the freebase index, but it cannot proceed due to the missing synonyms file.
Generated a data extraction report (data-extraction-report.txt) documenting our findings.
To fully extract the freebase index, we would need to:
Would you like me to try creating the missing synonyms file and setting it up in the container to enable the freebase index to load properly?
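A rough sketch of what creating the missing synonyms file might look like. The helper name `create_placeholder_synonyms` is hypothetical, and the file it writes contains only a comment line, so it is a stand-in rather than a copy of the original synonym rules. Whether Elasticsearch 2.4.5 accepts a synonyms file with no rules was not verified here:

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the original session):
# creates a placeholder synonyms file under the given config directory.
# In the container, the config directory would be
# /usr/share/elasticsearch/config, per the path reported above.
create_placeholder_synonyms() {
  config_dir="$1"
  synonyms="$config_dir/analysis/first_name.synonyms.txt"

  mkdir -p "$config_dir/analysis"

  # Never overwrite a real synonyms file if one is already present.
  if [ ! -e "$synonyms" ]; then
    printf '# placeholder: original first-name synonyms not available\n' > "$synonyms"
  fi

  # Print the resulting path so callers can copy or mount it.
  printf '%s\n' "$synonyms"
}

# Example: build the file in a scratch directory first, then copy or mount
# it into the container at the path above.
# create_placeholder_synonyms "$(mktemp -d)"
```

Even if the index loads with such a placeholder, name matching would behave differently from the original setup, because none of the original synonym rules are present. The node would likely also need a restart (or the index a close and reopen) before the analyzer picks up the new file.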