Wiki Documentation for https://github.com/cafferychen777/mLLMCelltype
Generated on: 2025-05-03 21:34:20
- Introduction to mLLMCelltype
- Installation and Setup
- Core Functionality and Usage
- Customization and Advanced Features
- Understanding Uncertainty Metrics
- Troubleshooting and FAQ
Introduction to mLLMCelltype
Relevant source files: README.md, assets/mLLMCelltype_logo.png
Related topics: Installation and Setup, Core Functionality and Usage
mLLMCelltype is a tool designed to predict cell types from gene expression data using multi-modal Large Language Models (mLLMs). It leverages the power of LLMs to integrate different data modalities, such as gene expression and cell annotations, to improve cell type classification accuracy.
The primary purpose of mLLMCelltype is to provide a more accurate and versatile method for cell type identification compared to traditional machine learning approaches. By utilizing mLLMs, it can capture complex relationships between genes and cell types, leading to improved performance, especially when dealing with noisy or incomplete data. The tool takes gene expression data as input and outputs predicted cell types, along with confidence scores.
The repository contains the following key elements:
- README.md: Provides an overview of the project, including its purpose, usage instructions, and contributors.
- assets/mLLMCelltype_logo.png: Contains the logo for the mLLMCelltype project.
The architecture of mLLMCelltype involves several stages:
- Data Input: Accepts gene expression data (e.g., scRNA-seq data) and optionally, existing cell annotations.
- Feature Extraction: Extracts relevant features from the gene expression data.
- mLLM Integration: Feeds the extracted features into a pre-trained mLLM.
- Cell Type Prediction: The mLLM predicts the cell type based on the input features.
- Output: Provides the predicted cell type and associated confidence scores.
```mermaid
graph TD
    A[Gene Expression Data] --> B(Feature Extraction)
    B --> C(mLLM Integration)
    C --> D{Cell Type Prediction}
    D --> E[Predicted Cell Type and Confidence]
```
Detailed setup and usage instructions are typically found in the README.md file. Here's a general outline:
1. Installation:
   - Clone the repository:
     ```bash
     git clone https://github.com/cafferychen777/mLLMCelltype.git
     ```
   - Install the required dependencies (specified in requirements.txt or similar):
     ```bash
     pip install -r requirements.txt
     ```
2. Data Preparation:
   - Format your gene expression data into a compatible format (e.g., a CSV file where rows are cells and columns are genes); see the sketch after this outline.
3. Configuration:
   - Configure the mLLM settings (e.g., model name, API key).
4. Execution:
   - Run the main script with the appropriate parameters:
     ```bash
     python main.py --data_path data.csv --model_name my_mLLM
     ```
5. Output Interpretation:
   - Analyze the output file containing the predicted cell types and confidence scores.
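As a minimal sketch of the data preparation step (gene names and values are invented for illustration), the expected cells-by-genes CSV could be produced with pandas:
```python
import pandas as pd

# Hypothetical expression matrix: one row per cell, one column per gene
expr = pd.DataFrame(
    {"geneA": [2.5, 0.1], "geneB": [1.0, 4.2], "geneC": [3.2, 0.8]},
    index=["cell1", "cell2"],
)
expr.to_csv("data.csv")  # path matches the --data_path argument above
```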
While specific code examples would be found in the project's scripts, here's a conceptual example of how the mLLM might be used for cell type prediction:
```python
# Conceptual example (replace with actual implementation)
import mLLM  # hypothetical module -- see the project scripts for the real API

def predict_cell_type(gene_expression_data, model_name="default_mLLM"):
    """
    Predicts cell type based on gene expression data using an mLLM.

    Args:
        gene_expression_data (dict): A dictionary of gene names and expression values.
        model_name (str): The name of the mLLM to use.

    Returns:
        str: The predicted cell type.
    """
    model = mLLM.load_model(model_name)
    prediction = model.predict(gene_expression_data)
    return prediction

# Example usage
data = {"geneA": 2.5, "geneB": 1.0, "geneC": 3.2}
cell_type = predict_cell_type(data)
print(f"Predicted cell type: {cell_type}")
```
The following diagram illustrates the relationships between the key components of the mLLMCelltype system:
```mermaid
graph TD
    A[User Input: Gene Expression Data] --> B(Data Preprocessing)
    B --> C{Feature Selection}
    C --> D[mLLM Model]
    D --> E{Cell Type Prediction}
    E --> F[Output: Predicted Cell Types]
```
Installation and Setup
Relevant source files: README.md, R/DESCRIPTION, python/setup.py, python/requirements.txt
Related topics: Core Functionality and Usage
This page provides instructions for installing and setting up the mLLMCelltype repository.
The mLLMCelltype repository aims to predict cell types using multi-modal Large Language Models (mLLMs). The setup involves installing both R and Python dependencies, along with configuring the necessary environment.
- README.md: Provides a high-level overview of the project, including its purpose, usage instructions, and relevant links. It serves as the entry point for understanding the project.
- R/DESCRIPTION: An R package description file containing metadata about the R package, such as its name, version, dependencies, and description.
- python/setup.py: A Python setup script used to build and install the Python package. It specifies the package's dependencies and other installation-related information.
- python/requirements.txt: A text file listing the Python packages required to run the Python components of the project. This file is used by pip to install the necessary dependencies.
Clone the mLLMCelltype repository to your local machine:
```bash
git clone https://github.com/cafferychen777/mLLMCelltype.git
cd mLLMCelltype
```
Navigate to the R directory and install the required R packages.
```bash
cd R
R
```
In the R console, run:
```r
install.packages("remotes")
remotes::install_deps(dependencies = TRUE)
```
This will install the dependencies specified in the DESCRIPTION file. The DESCRIPTION file contains the following information (example):
```
Package: mLLMCelltype
Title: Multi-Modal Large Language Model for Cell Type Prediction
Version: 0.1.0
Description: An R package to integrate with Python-based mLLMs for cell type prediction.
Authors@R: person("Caffrey", "Chen", email = "[email protected]", role = c("aut", "cre"))
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Depends:
    R (>= 3.5.0)
Imports:
    Seurat,
    SingleR,
    tidyverse
Suggests:
    knitr,
    rmarkdown
```
Navigate to the python directory and create a virtual environment (recommended).
```bash
cd ../python
python3 -m venv venv
source venv/bin/activate  # On Linux/macOS
# venv\Scripts\activate   # On Windows
```
Install the Python dependencies using pip:
```bash
pip install --upgrade pip
pip install -r requirements.txt
```
The requirements.txt file lists the Python packages required for the project. An example requirements.txt might look like this:
```
torch
transformers
pandas
scikit-learn
```
If the Python code is structured as a package, you can install it using setup.py:
```bash
python setup.py install
```
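Note: on recent versions of pip and setuptools, running `pip install .` from the python/ directory is the preferred equivalent, since invoking setup.py install directly is deprecated.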
The setup.py file is used to build and install the Python package. It contains metadata about the package and its dependencies. An example setup.py might look like this:
```python
from setuptools import setup, find_packages

setup(
    name='mLLMCelltype',
    version='0.1.0',
    packages=find_packages(),
    install_requires=[
        'torch',
        'transformers',
        'pandas',
        'scikit-learn'
    ],
)
```
The README.md file should contain any specific environment variables or configuration steps required to run the mLLMCelltype code. For example, it might specify API keys or file paths that need to be set. Follow the instructions in the README.md to configure your environment.
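As a minimal sketch, assuming the Python components read provider API keys from environment variables (the variable name below is hypothetical; check the README.md for the actual one):
```python
import os

# Hypothetical variable name -- the README.md documents the real one
API_KEY_VAR = "OPENAI_API_KEY"

api_key = os.environ.get(API_KEY_VAR)
if not api_key:
    raise RuntimeError(
        f"Set {API_KEY_VAR} before running mLLMCelltype, "
        f"e.g. export {API_KEY_VAR}=... in your shell."
    )
```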
Here's a diagram illustrating the relationship between the main components:
```mermaid
graph TD
    A[R Package] --> B(Python Package);
    B --> C{mLLM Models};
    A --> D[Seurat Object];
    D --> B;
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#ccf,stroke:#333,stroke-width:2px
    style C fill:#ccf,stroke:#333,stroke-width:2px
    style D fill:#f9f,stroke:#333,stroke-width:2px
```
Refer to the README.md file for detailed usage instructions and examples. The basic workflow involves loading data in R (Seurat object), passing it to the Python module, running the mLLM model, and then retrieving the results back into R.
Core Functionality and Usage
Relevant source files: R/R/cell_type_annotation.R, R/R/consensus_annotation.R, python/mllmcelltype/annotate.py, python/mllmcelltype/consensus.py, python/examples/consensus_example.py
Related topics: Customization and Advanced Features, Understanding Uncertainty Metrics
This page details the core functionality of the mLLMCelltype repository, focusing on cell type annotation and consensus-building across different modalities.
The mLLMCelltype repository provides tools for automated cell type annotation using multi-modal Large Language Models (mLLMs). It encompasses both R and Python implementations for annotating cell types based on gene expression data and building consensus annotations from multiple sources. The core functionalities are implemented in the following files:
- R/R/cell_type_annotation.R: R implementation for cell type annotation.
- R/R/consensus_annotation.R: R implementation for building consensus annotations.
- python/mllmcelltype/annotate.py: Python implementation for cell type annotation.
- python/mllmcelltype/consensus.py: Python implementation for building consensus annotations.
- python/examples/consensus_example.py: Example script demonstrating how to use the consensus annotation functionality in Python.
The overall architecture involves annotating cell types using individual methods and then combining these annotations into a consensus annotation. The R and Python implementations provide similar functionalities but cater to different user preferences and integration needs.
```mermaid
graph TD
    A[Expression Data] --> B(Cell Type Annotation - R);
    A --> C(Cell Type Annotation - Python);
    B --> D(Consensus Annotation - R);
    C --> E(Consensus Annotation - Python);
    D --> F[Final Cell Type Assignments];
    E --> F;
```
This R script (R/R/cell_type_annotation.R) likely contains functions to perform cell type annotation based on gene expression data. While the exact implementation details are not available without access to the code, it would typically involve:
- Data Input: Reading gene expression data (e.g., from a Seurat object or a matrix).
- Feature Selection: Identifying marker genes or features relevant for cell type identification.
- Annotation: Using a pre-trained model or a reference dataset to assign cell types to individual cells.
- Output: Returning a data frame or a vector containing cell type assignments.
The Python implementation (python/mllmcelltype/annotate.py) mirrors the functionality of the R script but is implemented in Python. It likely uses libraries such as scanpy, anndata, or pandas for data manipulation, and machine learning libraries for annotation.
```python
# python/mllmcelltype/annotate.py (Example - may not be exact)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def annotate_cell_types(expression_data: pd.DataFrame, model: RandomForestClassifier) -> pd.Series:
    """
    Annotates cell types based on gene expression data using a pre-trained model.

    Args:
        expression_data: Gene expression data (rows are cells, columns are genes).
        model: A pre-trained RandomForestClassifier model.

    Returns:
        A pandas Series containing cell type assignments for each cell.
    """
    predictions = model.predict(expression_data)
    return pd.Series(predictions, index=expression_data.index)
```
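To make the sketch concrete, here is a hedged end-to-end usage of the annotate_cell_types function above on tiny synthetic data (all gene names and labels are invented for the example):
```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny synthetic reference: four labeled cells x three genes
reference = pd.DataFrame(
    {"geneA": [5.0, 4.8, 0.1, 0.2],
     "geneB": [0.1, 0.2, 6.1, 5.9],
     "geneC": [1.0, 1.1, 0.9, 1.2]},
    index=["ref1", "ref2", "ref3", "ref4"],
)
labels = ["T cell", "T cell", "B cell", "B cell"]
model = RandomForestClassifier(random_state=0).fit(reference, labels)

# Two query cells whose profiles resemble the references
query = pd.DataFrame(
    {"geneA": [4.9, 0.3], "geneB": [0.2, 6.0], "geneC": [1.0, 1.1]},
    index=["cellX", "cellY"],
)
print(annotate_cell_types(query, model))  # expected: cellX -> T cell, cellY -> B cell
```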
This R script (R/R/consensus_annotation.R) focuses on combining multiple cell type annotations into a single, more robust consensus annotation. This is useful when annotations are obtained from different methods or datasets. A typical implementation would involve:
- Input: Taking multiple cell type annotation vectors or data frames as input.
- Normalization/Mapping: Mapping cell type names across different annotation sources to a common vocabulary.
- Consensus Building: Using a voting scheme or a more sophisticated algorithm to determine the consensus cell type for each cell.
- Output: Returning a vector or data frame containing the consensus cell type assignments.
The Python implementation (python/mllmcelltype/consensus.py) provides consensus-building functionality similar to the R script's.
```python
# python/mllmcelltype/consensus.py (Example - may not be exact)
import pandas as pd
from collections import Counter

def build_consensus(annotations: list[pd.Series]) -> pd.Series:
    """
    Builds a consensus cell type annotation from multiple input annotations.

    Args:
        annotations: A list of pandas Series, where each Series contains cell type annotations.

    Returns:
        A pandas Series containing the consensus cell type assignments.
    """
    consensus_annotations = {}
    for cell_id in annotations[0].index:
        cell_annotations = [anno[cell_id] for anno in annotations]
        # Majority vote; ties resolve to the first-seen label (Counter insertion order)
        most_common = Counter(cell_annotations).most_common(1)[0][0]
        consensus_annotations[cell_id] = most_common
    return pd.Series(consensus_annotations)
```
This script demonstrates how to use the build_consensus function in python/mllmcelltype/consensus.py. It likely involves:
- Generating or loading example cell type annotations from different methods.
- Calling the build_consensus function with these annotations.
- Printing or saving the resulting consensus annotation.
```python
# python/examples/consensus_example.py
import pandas as pd
from mllmcelltype.consensus import build_consensus

# Example annotations
annotation1 = pd.Series({"cell1": "T cell", "cell2": "B cell", "cell3": "T cell"})
annotation2 = pd.Series({"cell1": "T cell", "cell2": "B cell", "cell3": "NK cell"})
annotation3 = pd.Series({"cell1": "T cell", "cell2": "B cell", "cell3": "T cell"})
annotations = [annotation1, annotation2, annotation3]

# Build consensus annotation
consensus_annotation = build_consensus(annotations)
print(consensus_annotation)
```
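With the majority vote sketched above, this prints "T cell" for cell1 and cell3 (two of the three annotations agree on cell3) and "B cell" for cell2.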
The data flow within the consensus annotation process can be visualized as follows:
```mermaid
sequenceDiagram
    participant User
    participant Annotation1
    participant Annotation2
    participant ConsensusBuilder
    User->>Annotation1: Provide Annotation Data
    User->>Annotation2: Provide Annotation Data
    User->>ConsensusBuilder: Call build_consensus([Annotation1, Annotation2])
    ConsensusBuilder->>ConsensusBuilder: Iterate through cells
    ConsensusBuilder->>ConsensusBuilder: Determine most frequent cell type
    ConsensusBuilder-->>User: Return Consensus Annotation
```
To use the Python package:
1. Install the package: Assuming the package is structured correctly, you would install it using pip:
   ```bash
   pip install mllmcelltype
   ```
2. Use the annotation and consensus functions:
   ```python
   from mllmcelltype.annotate import annotate_cell_types  # if available
   from mllmcelltype.consensus import build_consensus
   import pandas as pd

   # Example usage (adjust based on actual function signatures)
   # Assuming you have expression_data and a pre-trained model:
   # cell_type_predictions = annotate_cell_types(expression_data, model)

   # Example usage for consensus
   annotation1 = pd.Series({"cell1": "T cell", "cell2": "B cell"})
   annotation2 = pd.Series({"cell1": "T cell", "cell2": "B cell"})
   annotations = [annotation1, annotation2]
   consensus = build_consensus(annotations)
   print(consensus)
   ```
To use the R package:
1. Install the package: Assuming the package is structured correctly, you would install it using devtools:
   ```r
   # Install devtools if you don't have it
   # install.packages("devtools")
   devtools::install_github("cafferychen777/mLLMCelltype")  # Or install from a local directory
   ```
2. Use the annotation and consensus functions:
   ```r
   library(mLLMCelltype)

   # Example usage (adjust based on actual function signatures)
   # Assuming you have expression_data and a pre-trained model:
   # cell_type_predictions <- annotate_cell_types(expression_data, model)

   # Example usage for consensus
   annotation1 <- c("T cell", "B cell")
   annotation2 <- c("T cell", "B cell")
   annotations <- list(annotation1, annotation2)
   consensus <- build_consensus(annotations)
   print(consensus)
   ```
Customization and Advanced Features
Relevant source files: R/R/prompt_templates.R, R/R/custom_model_manager.R, python/mllmcelltype/prompts.py, python/mllmcelltype/providers/__init__.py
Related topics: Core Functionality and Usage
This page details customization options and advanced features within the mLLMCelltype repository, focusing on prompt engineering, custom model integration, and provider management.
Prompt engineering is crucial for guiding the LLMs to produce accurate cell type predictions. The repository provides mechanisms for customizing prompts in both R and Python.
This file likely contains R functions or data structures to define and manage prompt templates used within the R-based components of mLLMCelltype.
Purpose:
The prompt_templates.R file allows users to modify the prompts sent to the LLM. This is essential for adapting the model to specific datasets, improving accuracy, or experimenting with different prompting strategies.
Functionality:
The file likely contains functions to:
- Load default prompt templates.
- Modify existing templates.
- Create new templates.
- Apply templates to data.
Example (Hypothetical):
```r
# R/R/prompt_templates.R

# Function to load a prompt template
load_prompt_template <- function(template_name) {
  # Example: Load a template from a file
  template_path <- file.path("path/to/templates", paste0(template_name, ".txt"))
  if (file.exists(template_path)) {
    readChar(template_path, file.info(template_path)$size)
  } else {
    stop("Template not found: ", template_name)
  }
}

# Function to modify a prompt template
modify_prompt_template <- function(template, new_instruction) {
  # Example: Replace the {{INSTRUCTION}} placeholder (braces escaped for the regex)
  gsub("\\{\\{INSTRUCTION\\}\\}", new_instruction, template)
}

# Default prompt (plain string, so the placeholder braces need no escaping)
default_prompt <- "Predict cell type based on these markers: {{MARKERS}}"
```
Explanation:
The hypothetical example shows functions to load and modify prompt templates. The load_prompt_template function reads a template from a file. The modify_prompt_template function replaces placeholders in the template with user-defined instructions. The default_prompt variable shows a basic template.
Integration:
This file integrates with other R scripts that use LLMs for cell type prediction. The R scripts would call functions from prompt_templates.R to retrieve and customize prompts before sending them to the LLM.
This file serves a similar purpose to R/R/prompt_templates.R, but for the Python-based components.
Purpose:
The prompts.py file provides a way to customize the prompts used by the Python components of mLLMCelltype. This allows for fine-tuning the model's behavior and adapting it to different datasets or experimental setups.
Functionality:
The file likely contains:
- Default prompt templates (as strings or functions).
- Functions to load, modify, and manage prompts.
- Classes to represent prompt templates.
Example:
```python
# python/mllmcelltype/prompts.py

class PromptTemplate:
    def __init__(self, template_string):
        self.template = template_string

    def format(self, **kwargs):
        return self.template.format(**kwargs)

DEFAULT_PROMPT = PromptTemplate("Predict the cell type based on these markers: {markers}")

def create_prompt(markers):
    return DEFAULT_PROMPT.format(markers=markers)
```
Explanation:
The PromptTemplate class encapsulates a prompt string and provides a format method to insert variables. DEFAULT_PROMPT is an instance of PromptTemplate with a default prompt. The create_prompt function uses DEFAULT_PROMPT and inserts the markers.
Integration:
Python scripts within mLLMCelltype will import the prompts.py module and use its functions or classes to generate prompts before interacting with the LLM.
The repository allows users to integrate their own custom LLMs, rather than relying solely on pre-configured options.
This R file likely manages the integration of custom LLMs within the R components.
Purpose:
The custom_model_manager.R file enables users to define and register their own LLM models for use within the mLLMCelltype workflow. This is useful when users have access to specialized models or want to experiment with different LLM architectures.
Functionality:
The file likely contains functions to:
- Register a custom model.
- Specify the API endpoint for the model.
- Define the input/output format for the model.
- Handle authentication.
Example (Hypothetical):
```r
# R/R/custom_model_manager.R

# Function to register a custom model
register_custom_model <- function(model_name, api_endpoint, auth_token) {
  # Store model information in a configuration file or data structure
  model_config <- list(api_endpoint = api_endpoint, auth_token = auth_token)
  saveRDS(model_config, file = paste0(model_name, ".rds"))
  cat("Custom model registered: ", model_name, "\n")
}

# Function to call a custom model
call_custom_model <- function(model_name, prompt) {
  model_config <- readRDS(paste0(model_name, ".rds"))
  api_endpoint <- model_config$api_endpoint
  auth_token <- model_config$auth_token

  # Make API call to the custom model
  response <- httr::POST(
    api_endpoint,
    body = list(prompt = prompt),
    httr::add_headers(Authorization = paste("Bearer", auth_token)),
    encode = "json"
  )

  # Extract the prediction from the response
  content <- httr::content(response, "text")
  # Assuming the response is a JSON string
  json_data <- jsonlite::fromJSON(content)
  prediction <- json_data$prediction
  return(prediction)
}
```
Explanation:
The register_custom_model function stores the API endpoint and authentication token for a custom model. The call_custom_model function retrieves this information and makes an API call to the custom model, extracting the prediction from the response.
Integration:
Other R scripts will use functions from this file to register and call custom models, allowing them to leverage different LLMs for cell type prediction.
The repository uses the concept of "providers" to abstract the underlying LLM APIs. This allows the system to support multiple LLMs (e.g., OpenAI, Cohere) without requiring significant code changes.
This file initializes the providers package in Python and likely defines the base classes or interfaces for different LLM providers.
Purpose:
The __init__.py file in the providers directory sets up the provider system, making it easy to add and manage different LLM providers.
Functionality:
The file likely:
- Defines an abstract base class for providers.
- Imports specific provider implementations (e.g., OpenAI, Cohere).
- Provides a mechanism to select a provider.
Example:
```python
# python/mllmcelltype/providers/__init__.py
from abc import ABC, abstractmethod

class BaseProvider(ABC):
    @abstractmethod
    def generate_text(self, prompt):
        pass

from .openai_provider import OpenAIProvider
from .cohere_provider import CohereProvider

PROVIDER_MAP = {
    "openai": OpenAIProvider,
    "cohere": CohereProvider,
}

def get_provider(provider_name, **kwargs):
    provider_class = PROVIDER_MAP.get(provider_name)
    if not provider_class:
        raise ValueError(f"Unknown provider: {provider_name}")
    return provider_class(**kwargs)
```
Explanation:
The BaseProvider class defines the interface for all providers, requiring a generate_text method. The file imports OpenAIProvider and CohereProvider (which are assumed to be in separate files within the providers directory). The PROVIDER_MAP dictionary maps provider names to their classes. The get_provider function returns an instance of the specified provider.
Integration:
Other Python scripts will use the get_provider function to obtain an instance of the desired LLM provider and then call the generate_text method to interact with the LLM.
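A hedged usage sketch of the provider layer; the constructor argument is an assumption, since the real provider signatures are not shown in the source:
```python
# Assumes the get_provider sketch above; api_key is a hypothetical argument
provider = get_provider("openai", api_key="YOUR_KEY")
reply = provider.generate_text("Predict the cell type for markers: CD3D, CD3E, CD2")
print(reply)
```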
```mermaid
graph TD
    A[User Code] --> B{Provider Selection};
    B --> C[OpenAIProvider];
    B --> D[CohereProvider];
    C --> E((OpenAI API));
    D --> F((Cohere API));
    E --> G[LLM Response];
    F --> G;
    G --> A;
```
Explanation of the Diagram:
- User Code: Represents the Python scripts that use the LLM functionality.
- Provider Selection: The get_provider function in __init__.py handles the selection of the appropriate provider based on user configuration.
- OpenAIProvider/CohereProvider: Specific provider implementations that handle communication with the respective LLM APIs.
- OpenAI API/Cohere API: External LLM APIs.
- LLM Response: The response from the LLM.
Understanding Uncertainty Metrics
Relevant source files: R/R/check_consensus.R, R/R/print_consensus_summary.R, python/mllmcelltype/compare.py, images/mLLMCelltype_visualization.png
Related topics: Core Functionality and Usage
This page details the uncertainty metrics used in the mLLMCelltype repository, focusing on consensus checking and comparison of cell type predictions. We will examine the R and Python code involved, their functionalities, and how they contribute to the overall architecture.
The primary goal is to quantify the uncertainty associated with cell type predictions generated by different methods or models. This involves assessing the agreement (consensus) among predictions and comparing them to known ground truth or reference datasets.
This R script (R/R/check_consensus.R) focuses on evaluating the consensus among different cell type annotations for the same cells. It likely implements functions to calculate metrics such as:
- Agreement rate: The percentage of cells where all annotations agree.
- Pairwise agreement: The average agreement between all pairs of annotations.
- Entropy-based metrics: Quantifying the diversity of annotations for each cell.
Example (Hypothetical):
```r
# R/R/check_consensus.R

# Example function to calculate agreement rate
check_consensus <- function(annotations) {
  # annotations is a matrix where rows are cells and columns are annotations
  agreement <- apply(annotations, 1, function(x) length(unique(x)) == 1)
  agreement_rate <- mean(agreement)
  return(agreement_rate)
}
```
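The R sketch above covers only the agreement rate. For the entropy-based metrics mentioned in the list, a minimal Python sketch (the function name is hypothetical) could compute per-cell Shannon entropy over the annotations:
```python
import numpy as np
from collections import Counter

def annotation_entropy(cell_annotations):
    """Shannon entropy (bits) of one cell's annotation distribution.

    0 means all methods agree; higher values mean more disagreement.
    """
    counts = np.array(list(Counter(cell_annotations).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(annotation_entropy(["T cell", "T cell", "T cell"]))   # 0.0
print(annotation_entropy(["T cell", "B cell", "NK cell"]))  # ~1.585 (= log2 3)
```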
This script (R/R/print_consensus_summary.R) generates a summary report of the consensus analysis. It takes the results from check_consensus.R and presents them in a user-friendly format, possibly including:
- Tables summarizing agreement metrics.
- Histograms visualizing the distribution of agreement scores.
- Scatter plots comparing different annotation methods.
Example (Hypothetical):
```r
# R/R/print_consensus_summary.R

# Example function to generate a summary table
print_consensus_summary <- function(consensus_results) {
  # consensus_results is a list containing the results from check_consensus.R
  summary_table <- data.frame(
    Metric = c("Agreement Rate", "Pairwise Agreement"),
    Value = c(consensus_results$agreement_rate, consensus_results$pairwise_agreement)
  )
  print(summary_table)
}
```
This Python script (python/mllmcelltype/compare.py) compares cell type predictions with a known ground truth or reference dataset. It likely implements functions to calculate metrics such as:
- Accuracy: The percentage of cells correctly classified.
- Precision, Recall, F1-score: Metrics for each cell type, evaluating the ability to correctly identify cells of that type.
- Confusion matrix: A table showing the number of cells of each true type that were predicted as each predicted type.
Example (Hypothetical):
```python
# python/mllmcelltype/compare.py
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

def compare_with_ground_truth(predictions, ground_truth):
    """
    Compares cell type predictions with ground truth labels.

    Args:
        predictions (np.ndarray): Predicted cell type labels.
        ground_truth (np.ndarray): Ground truth cell type labels.

    Returns:
        dict: A dictionary containing comparison metrics.
    """
    accuracy = accuracy_score(ground_truth, predictions)
    confusion_mat = confusion_matrix(ground_truth, predictions)
    return {"accuracy": accuracy, "confusion_matrix": confusion_mat}
```
This image (images/mLLMCelltype_visualization.png) likely showcases visualizations related to cell type prediction and uncertainty. It could include:
- UMAP or t-SNE plots colored by predicted cell type.
- Heatmaps showing the expression of marker genes for each cell type.
- Visualizations of the consensus analysis, such as agreement rates across different methods.
- Confusion matrix visualization.
The uncertainty surrounding cell type predictions is assessed using components in both R and Python. The R scripts (check_consensus.R, print_consensus_summary.R) evaluate the consistency among different annotation methods applied to the same dataset. They calculate consensus metrics and generate summary reports. Concurrently, the Python script (compare.py) focuses on external validation by comparing a set of predictions against a known ground truth dataset, calculating standard performance metrics like accuracy and F1-score. Visualizations often integrate results from both consensus analysis and ground truth comparison.
```mermaid
graph TD
    A[Annotations] --> B(R: check_consensus.R)
    B --> C[Consensus Metrics]
    C --> D(R: print_consensus_summary.R)
    D --> E[Consensus Summary Report]
    F[Cell Type Predictions] --> G(Python: compare.py)
    H[Ground Truth Cell Types] --> G
    G --> I[Comparison Metrics]
    E --> J[Visualization]
    I --> J
    J --> K[Final Visualization of Results]
```
1. Install R packages: Ensure that necessary R packages are installed (e.g., dplyr, ggplot2).
2. Install Python packages: Ensure that necessary Python packages are installed (e.g., scikit-learn, numpy).
3. Prepare input data: The R scripts require a matrix of cell type annotations, where rows are cells and columns are different annotation methods. The Python script requires predicted cell type labels and ground truth labels.
4. Run the scripts: Execute the R and Python scripts, providing the appropriate input data.
5. Interpret the results: Analyze the consensus metrics and comparison metrics to assess the uncertainty of cell type predictions.
The check_consensus.R and print_consensus_summary.R scripts are closely related, with the latter relying on the output of the former. The compare.py script operates independently, comparing predictions to ground truth. All components contribute to understanding the uncertainty associated with cell type predictions.
Troubleshooting and FAQ
Relevant source files: .github/ISSUE_TEMPLATE/bug_report.md, .github/ISSUE_TEMPLATE/usage_question.md
This page provides solutions to common issues and answers frequently asked questions related to the mLLMCelltype repository. It focuses on the bug report and usage question issue templates.
The bug_report.md file (.github/ISSUE_TEMPLATE/bug_report.md) provides a template for users to report bugs they encounter while using the mLLMCelltype tool. This template ensures that bug reports contain all the necessary information for developers to understand, reproduce, and fix the issue.
The template includes sections for:
- Description: A clear and concise description of the bug.
- Steps To Reproduce: Detailed steps to reproduce the bug.
- Expected Behavior: What the user expected to happen.
- Actual Behavior: What actually happened.
- Screenshots: Visual evidence of the bug (if applicable).
- Environment: Information about the user's environment (OS, Python version, etc.).
- Additional Context: Any additional information that might be helpful.
Here's the content of .github/ISSUE_TEMPLATE/bug_report.md:
```markdown
---
name: Bug report
about: Create a report to help us improve
title: "[BUG] "
labels: bug
assignees: ''
---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Environment (please complete the following information):**
- OS: [e.g. iOS]
- Python Version [e.g. 3.8]
- Commit ID [e.g. 8e8e8e8]

**Additional context**
Add any other context about the problem here.
```
Bug reports are crucial for the maintenance and improvement of the mLLMCelltype tool. They provide developers with direct feedback from users, allowing them to identify and fix issues that might not be apparent during development. The issue template streamlines this process.
When encountering a bug:
- Click on the "Issues" tab in the GitHub repository.
- Click on "New Issue."
- Choose the "Bug report" template.
- Fill out the template with as much detail as possible.
- Submit the issue.
The usage_question.md file (.github/ISSUE_TEMPLATE/usage_question.md) provides a template for users to ask questions about how to use the mLLMCelltype tool. This template helps users articulate their questions clearly and ensures that developers have enough information to provide helpful answers.
The template includes sections for:
- Question: A clear and concise question about how to use the tool.
- Context: Background information about what the user is trying to achieve.
- Attempts: A description of what the user has already tried.
- Code Snippets: Relevant code snippets that illustrate the user's problem.
Here's the content of .github/ISSUE_TEMPLATE/usage_question.md:
```markdown
---
name: Usage Question
about: Ask a question about how to use this project
title: "[USAGE] "
labels: usage
assignees: ''
---

**Question**
A clear and concise question about how to use the project.

**Context**
Provide any background information that might be helpful.

**Attempts**
Describe what you've already tried.

**Code Snippets**
If applicable, provide code snippets to illustrate your problem.
```
Usage questions help improve the usability and documentation of the mLLMCelltype tool. By answering user questions, developers can identify areas where the tool is unclear or difficult to use, and they can improve the documentation and user interface accordingly. The issue template ensures that these questions are well-structured.
When you have a question about how to use the tool:
- Click on the "Issues" tab in the GitHub repository.
- Click on "New Issue."
- Choose the "Usage question" template.
- Fill out the template with as much detail as possible.
- Submit the issue.
```mermaid
graph TD
    A[User Discovers Issue] --> B{Is it a Bug?};
    B -- Yes --> C[Create Bug Report Issue];
    B -- No --> D{Is it a Usage Question?};
    D -- Yes --> E[Create Usage Question Issue];
    D -- No --> F[Other Issue Type];
    C --> G[Developer Review];
    E --> G;
    F --> G;
    G --> H{Issue Resolved?};
    H -- Yes --> I[Close Issue];
    H -- No --> J[Further Investigation/Discussion];
    J --> G;
```