repo-ml-analysis-service is a FastAPI-based Python microservice for lightweight ML analysis related to GitHub repositories and project data.
It is designed to run on its own or as a microservice within the DailyRepo pipeline.
It currently performs keyword frequency and clustering analysis using ML. Planned functionality includes star history growth and trend analysis for repositories.
- Accepts a list of keywords, clusters similar ones using a small transformer model, and returns the top N most frequent keywords.
- POST /analyze-keywords
curl --location 'http://localhost:8080/analyze-keywords' \
  --header 'Content-Type: application/json' \
  --data '{
    "topics": [
      "llms",
      "python",
      "rag",
      "ai",
      "cursor",
      "cursor-ai",
      "cursorai",
      "roocode",
      "task-manager",
      "tasks",
      "tasks-list",
      "windsurf",
      "windsurf-ai",
      "agents",
      "ai",
      "genai",
      "llm",
      "llms",
      "openai",
      "tutorials",
      "android",
      "css",
      "desktop",
      "html",
      "neovim",
      "tmux",
      "agents",
      "ai",
      "context-window",
      "framework",
      "llms",
      "linux",
      "game-development"
    ],
    "topN": 5,
    "includeRelated": true,
    "distance_threshold": 0.25,
    "includeClusterSizes": true,
    "batchSize": 128
  }'
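For callers that prefer Python over curl, the same request can be sketched with the standard library alone. The helper names `build_request` and `analyze_keywords` are illustrative and not part of the service; the URL and field values are copied from the curl example, and the default `timeout` is an assumption.

```python
import json
import urllib.request

# Assumption: the service listens at the address used in the curl example.
BASE_URL = "http://localhost:8080"


def build_request(topics, top_n=5, base_url=BASE_URL):
    """Build the POST request for /analyze-keywords without sending it."""
    payload = {
        "topics": list(topics),
        "topN": top_n,
        # The remaining values mirror the curl example.
        "includeRelated": True,
        "distance_threshold": 0.25,
        "includeClusterSizes": True,
        "batchSize": 128,
    }
    return urllib.request.Request(
        url=f"{base_url}/analyze-keywords",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_keywords(topics, top_n=5, base_url=BASE_URL, timeout=30.0):
    """Send the request and return the decoded JSON body as a dict."""
    request = build_request(topics, top_n=top_n, base_url=base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network call is made.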
Response:
{
  "topKeywords": ["llms", "ai", "cursor", "tasks", "windsurf"],
  "related": {
    "llms": ["llm"],
    "ai": [],
    "cursor": ["cursor-ai"],
    "tasks": ["tasks-list"],
    "windsurf": ["windsurf-ai"]
  },
  "clusterSizes": {
    "llms": 2,
    "ai": 1,
    "cursor": 2,
    "tasks": 2,
    "windsurf": 2
  }
}
Or, as a TypeScript interface:
interface KeywordAnalysisResponse {
  topKeywords: string[];
  related: {
    [key: string]: string[];
  };
  clusterSizes: {
    [key: string]: number;
  };
}
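Python consumers may want a typed mirror of this shape. The following is a minimal sketch, not code from the service: the snake_case field names and the `parse_response` helper are illustrative. It also assumes that `related` and `clusterSizes` can be missing when the request disables `includeRelated` or `includeClusterSizes`, which the source does not confirm.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KeywordAnalysisResponse:
    """Python counterpart of the TypeScript interface above."""

    top_keywords: List[str]
    related: Dict[str, List[str]] = field(default_factory=dict)
    cluster_sizes: Dict[str, int] = field(default_factory=dict)


def parse_response(data: dict) -> KeywordAnalysisResponse:
    """Convert a decoded JSON body into a typed object.

    Assumption: optional sections default to empty dicts when absent.
    """
    related = data.get("related") or {}
    cluster_sizes = data.get("clusterSizes") or {}
    return KeywordAnalysisResponse(
        top_keywords=list(data["topKeywords"]),
        related={key: list(values) for key, values in related.items()},
        cluster_sizes={key: int(size) for key, size in cluster_sizes.items()},
    )
```

A missing `topKeywords` field raises `KeyError`, so a malformed body fails loudly instead of yielding an empty result.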
- Star history growth and trend analysis.
- Repository scoring and ranking based on growth and activity.
- After the server is deployed, expect a short wait while the transformer model is downloaded and loaded into memory. This is normal.
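A client can absorb this startup delay by retrying a small probe until it succeeds. The sketch below is a generic retry helper, not part of the service. It assumes that requests fail (for example, with a refused connection) while the model is still loading, which the source does not state; `wait_until_ready` and its parameters are hypothetical names.

```python
import time


def wait_until_ready(probe, attempts=10, delay_seconds=3.0, sleep=time.sleep):
    """Call `probe` until it succeeds or `attempts` is exhausted.

    `probe` is any zero-argument callable that raises while the service is
    not ready yet, such as a function that sends one small request.
    Returns the probe's result, or re-raises the last error on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return probe()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # pause before the next try
    raise last_error
```

The `sleep` parameter is injectable so the helper can be exercised without real delays; a production client would likely catch only connection-related errors.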
MIT License