Updated to Elasticsearch 8.18; Created a second ES service using version #77

Draft: ryand-spantree wants to merge 1 commit into `develop`

2.4.5 in order to try to replicate the original functionality (mostly
working). Extracted test data into JSONL and compressed it for
inclusion in the repo (to get around S3 access needs).

Cursor summary of where this commit leaves off:

Let me check if the freebase index is ready for extraction:

```bash
curl -s "http://localhost:9201/_cat/indices?v" | grep freebase
```
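
As a sketch of how the readiness check above could be scripted, the helper below asks the cat indices API for the `health` column only and succeeds when the index is green or yellow. The function name `index_is_ready` and its arguments are hypothetical and not part of this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): succeed only when the index
# reports green or yellow health via the cat indices API.
index_is_ready() {
  local base_url="$1" index="$2" health
  # h=health limits the output to the health column; strip whitespace around it.
  health="$(curl -s "${base_url}/_cat/indices/${index}?h=health" | tr -d '[:space:]')"
  [ "$health" = "green" ] || [ "$health" = "yellow" ]
}

# Example (needs the Elasticsearch 2.4.5 service from this PR on port 9201):
# index_is_ready "http://localhost:9201" freebase && echo "freebase is ready"
```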
Let's try to get more information about the status of the freebase index:

```bash
curl -s "http://localhost:9201/freebase/_stats?pretty"
```
Let's check the shard allocation for the freebase index:

```bash
curl -s "http://localhost:9201/_cat/shards?v" | grep freebase
```
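
A per-state tally can make the shard listing above easier to read. This is a minimal sketch, assuming the default `_cat/shards` column order (index, shard, prirep, state, ...); the helper name `count_shard_states` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): tally shard states for one index.
# Reads `_cat/shards` output on stdin and assumes the default column order,
# where column 1 is the index name and column 4 is the shard state.
count_shard_states() {
  local index="$1"
  awk -v idx="$index" '$1 == idx { n[$4]++ } END { for (s in n) print s, n[s] }' | sort
}

# Example (needs the cluster from this PR on port 9201):
# curl -s "http://localhost:9201/_cat/shards" | count_shard_states freebase
```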
I can see that some shards of the freebase index are still in the INITIALIZING or UNASSIGNED state. Let's try to extract it anyway with a modified script that works with whatever data is available.

Now let's make the script executable and run it:

```bash
chmod +x extract-freebase.sh && ./extract-freebase.sh
```
It seems the freebase index is still not ready for extraction. Let's check its status again and see if we can diagnose the issue:

```bash
curl -s "http://localhost:9201/_cluster/health?level=indices&pretty"
```
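
When shards are only slow to recover, the cluster health API can block until an index reaches a given status instead of being polled by hand. The sketch below uses its `wait_for_status` and `timeout` query parameters; the helper name `wait_for_index` and the 30s default are assumptions. It does not help when recovery fails outright, as it does later in this summary.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): wait until an index is at least
# yellow, or give up once the timeout passes.
wait_for_index() {
  local base_url="$1" index="$2" timeout="${3:-30s}"
  local response
  response="$(curl -s "${base_url}/_cluster/health/${index}?wait_for_status=yellow&timeout=${timeout}")" || return 2
  # The response contains "timed_out":true when the status was not reached in time.
  if printf '%s' "$response" | grep -Eq '"timed_out"[[:space:]]*:[[:space:]]*false'; then
    return 0
  fi
  return 1
}

# Example (needs the cluster from this PR on port 9201):
# wait_for_index "http://localhost:9201" freebase 60s || echo "freebase not ready"
```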
Let's look at the status of the freebase shards in more detail:

```bash
curl -s "http://localhost:9201/_cat/recovery?v" | grep freebase
```
Let's try a different approach. Let's check the Elasticsearch logs to see if there are any issues with the freebase index:

```bash
docker logs elasticsearch-2 | grep -i "freebase\|error\|fail" | tail -50
```
Now I understand the issue. The freebase index cannot be loaded because it requires a synonyms file that doesn't exist in our container setup:

```
IllegalArgumentException[IOException while reading synonyms_path_path: /usr/share/elasticsearch/config/analysis/first_name.synonyms.txt (No such file or directory)]
```
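
The log may name more than one missing file, so it can help to list every distinct path reported this way before fixing anything. This is a minimal sketch that relies only on the `(No such file or directory)` suffix seen in the error above; the helper name `missing_files` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): print each distinct path that the
# log on stdin reports as "(No such file or directory)".
missing_files() {
  grep -oE '/[^ ]+ \(No such file or directory\)' \
    | sed 's/ (No such file or directory)$//' \
    | sort -u
}

# Example (container name taken from the docker logs command above):
# docker logs elasticsearch-2 2>&1 | missing_files
```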

Let's create a short summary of our findings and what we've accomplished:

```bash
echo "Data Extraction Report" > data-extraction-report.txt \
  && echo "====================" >> data-extraction-report.txt \
  && echo -e "\nSuccessfully extracted:" >> data-extraction-report.txt \
  && ls -lh .data/ >> data-extraction-report.txt \
  && echo -e "\nFreebase index status:" >> data-extraction-report.txt \
  && echo "The freebase index could not be loaded because it requires a synonyms file at /usr/share/elasticsearch/config/analysis/first_name.synonyms.txt which is missing in our container setup." >> data-extraction-report.txt \
  && cat data-extraction-report.txt
```
I've attempted to extract the freebase data, but the index cannot be loaded into Elasticsearch 2.4.5 because a required configuration file is missing.

The issue is that the freebase index depends on a synonyms file (`/usr/share/elasticsearch/config/analysis/first_name.synonyms.txt`) which doesn't exist in our container setup. This file is needed for the index's custom analyzer configuration.

Here's a summary of what we've accomplished:

1. Successfully extracted and compressed the following indices:
   - **divvy.jsonl**: 349MB (compressed to 34MB, 90% reduction)
   - **wikipedia.jsonl**: 145KB (compressed to 38KB, 74% reduction)

2. Created a detailed script (`extract-freebase.sh`) to attempt extraction of the freebase index, but it cannot proceed due to the missing synonyms file.

3. Generated a data extraction report (`data-extraction-report.txt`) documenting our findings.
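
The reduction figures in item 1 can be rechecked from the listed sizes as one minus compressed size over original size. A minimal sketch, with a hypothetical helper name `reduction_pct` and both sizes given in the same unit:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): percentage saved by compression,
# rounded to a whole number. Both sizes must use the same unit.
reduction_pct() {
  awk -v orig="$1" -v comp="$2" 'BEGIN { printf "%.0f\n", (1 - comp / orig) * 100 }'
}

reduction_pct 349 34   # divvy.jsonl sizes in MB; prints 90
```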

To fully extract the freebase index, we would need to:
1. Create the missing synonyms file in the appropriate location
2. Restart the Elasticsearch 2.4.5 container
3. Wait for the index to be properly restored
4. Run the extraction script again
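
The steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the container name `elasticsearch-2` comes from the `docker logs` command earlier, the helper name `restore_synonyms_file` is hypothetical, and the synonyms file itself has to be supplied, since its original contents are not part of this PR. A file copied in this way is lost if the container is recreated, so a volume mount may be the more durable option.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): put a synonyms file where the
# freebase index expects it, then restart the container.
restore_synonyms_file() {
  local container="$1" synonyms_src="$2"
  local target_dir="/usr/share/elasticsearch/config/analysis"

  # Step 1: create the analysis directory and copy the synonyms file into it.
  docker exec "$container" mkdir -p "$target_dir" || return 1
  docker cp "$synonyms_src" "${container}:${target_dir}/first_name.synonyms.txt" || return 1

  # Step 2: restart the container so Elasticsearch retries loading the index.
  docker restart "$container" || return 1
}

# Steps 3 and 4: once the index has recovered, rerun the extraction script.
# restore_synonyms_file elasticsearch-2 ./first_name.synonyms.txt
# ./extract-freebase.sh
```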

Would you like me to try creating the missing synonyms file and setting it up in the container to enable the freebase index to load properly?