@@ -6558,11 +6558,15 @@ def conditional_join(
6558
6558
and non-equi joins.
6559
6559
6560
6560
If the join is solely on equality, `pd.merge` function
6561
- is more efficient and should be used instead.
6561
+ is more efficient and should be used instead. In fact,
6562
+ for multiple conditions where equality is involved,
6563
+ a `pd.merge`, followed by a filter (via `query` or `loc`),
6564
+ is more efficient. This is even more evident when joining
6565
+ on strings.
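The merge-then-filter alternative described above can be sketched as follows. This is a minimal illustration, not code from the library: the frames and their values are hypothetical, and the column names are borrowed from the docstring examples further down.

```python
import pandas as pd

# Hypothetical sample frames; column names mirror the docstring examples.
df1 = pd.DataFrame({"id": [1, 1, 2, 3], "value_1": [2, 5, 7, 1]})
df2 = pd.DataFrame(
    {
        "id": [1, 1, 2, 4],
        "value_2A": [0, 3, 7, 0],
        "value_2B": [1, 5, 9, 9],
    }
)

# Step 1: equi join on the equality condition only.
merged = df1.merge(df2, on="id", how="inner")

# Step 2: apply the non-equi conditions as an ordinary filter.
result = merged.query(
    "value_1 >= value_2A and value_1 <= value_2B"
).reset_index(drop=True)

# The same filter written with `loc` and boolean masks.
mask = (merged["value_1"] >= merged["value_2A"]) & (
    merged["value_1"] <= merged["value_2B"]
)
result_loc = merged.loc[mask].reset_index(drop=True)
```

Both `result` and `result_loc` hold the rows that satisfy all three conditions. The equality condition shrinks the candidate set first, so the range checks run only on rows that already share an `id`.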
6562
6566
If you are interested in nearest joins, or rolling joins,
6563
6567
`pd.merge_asof` covers that. There is also the IntervalIndex,
6564
- which can be more efficient for range joins, if the intervals
6565
- do not overlap.
6568
+ which can be more efficient for range joins, especially if
6569
+ the intervals do not overlap.
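As a rough sketch of the IntervalIndex route mentioned above, assuming hypothetical data and non-overlapping closed ranges, a range join could look like this. The frame and column names (`values`, `ranges`, `low`, `high`, `label`) are illustrative only.

```python
import pandas as pd

# Hypothetical data: match each value to the range that contains it.
values = pd.DataFrame({"value_1": [2, 5, 12, 30]})
ranges = pd.DataFrame(
    {"low": [0, 4, 10], "high": [3, 8, 20], "label": ["a", "b", "c"]}
)

# Build closed intervals [low, high]; these do not overlap.
intervals = pd.IntervalIndex.from_arrays(
    ranges["low"], ranges["high"], closed="both"
)

# Position of the containing interval for each value, or -1 if none.
positions = intervals.get_indexer(values["value_1"])

# Keep only the values that fell inside some interval, then attach
# the matching row from `ranges` by position.
found = positions >= 0
matched = values[found].reset_index(drop=True).join(
    ranges.iloc[positions[found]].reset_index(drop=True)
)
```

`get_indexer` only works when the intervals do not overlap. For overlapping intervals pandas provides `get_indexer_non_unique` instead.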
6566
6570
6567
6571
This function returns rows, if any, where values from `df` meet the
6568
6572
condition(s) for values from `right`. The conditions are passed in
@@ -6573,11 +6577,8 @@ def conditional_join(
6573
6577
6574
6578
The operator can be any of `==`, `!=`, `<=`, `<`, `>=`, `>`.
6575
6579
6576
- If the join operator is a non-equi operator, a binary search is used
6577
- to get the relevant rows; this avoids a cartesian join, and makes the
6578
- process less memory intensive. If it is an equality operator, it simply
6579
- uses pandas' `merge` or `get_indexer_for` method to retrieve the relevant
6580
- rows.
6580
+ A binary search is used to get the relevant rows; this avoids
6581
+ a cartesian join, and makes the process less memory intensive.
6581
6582
6582
6583
The join is done only on the columns.
6583
6584
MultiIndex columns are not supported.
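To illustrate how a binary search can avoid a cartesian join, here is a minimal NumPy sketch for a single `<` condition. It is not the library's implementation; the arrays and variable names are hypothetical and only show the general idea.

```python
import numpy as np

# Hypothetical join columns: find all pairs where left < right.
left = np.array([3, 7, 1])
right = np.array([5, 2, 8, 3])

# Sort the right side once, remembering the original positions.
order = np.argsort(right, kind="stable")
right_sorted = right[order]

# For each left value, binary-search the first sorted position whose
# value is strictly greater than it.
starts = np.searchsorted(right_sorted, left, side="right")

# Every sorted position from `start` onward satisfies left < right,
# so only the matching pairs are ever materialised.
counts = len(right_sorted) - starts
left_idx = np.repeat(np.arange(len(left)), counts)
right_idx = np.concatenate([order[s:] for s in starts])
```

`left_idx` and `right_idx` are positional indexers into the two inputs. The search costs roughly `O(n log m)` comparisons, and memory use scales with the number of matches, not with `len(left) * len(right)`.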
@@ -6617,7 +6618,7 @@ def conditional_join(
6617
6618
Join on equi and non-equi operators is possible::
6618
6619
6619
6620
df1.conditional_join(
6620
- right = df2,
6621
+ df2,
6621
6622
('id', 'id', '=='),
6622
6623
('value_1', 'value_2A', '>='),
6623
6624
('value_1', 'value_2B', '<='),
@@ -6634,7 +6635,7 @@ def conditional_join(
6634
6635
The default join is `inner`. Left and right joins are supported as well::
6635
6636
6636
6637
df1.conditional_join(
6637
- right = df2,
6638
+ df2,
6638
6639
('id', 'id', '=='),
6639
6640
('value_1', 'value_2A', '>='),
6640
6641
('value_1', 'value_2B', '<='),
@@ -6653,7 +6654,7 @@ def conditional_join(
6653
6654
6654
6655
6655
6656
df1.conditional_join(
6656
- right = df2,
6657
+ df2,
6657
6658
('id', 'id', '=='),
6658
6659
('value_1', 'value_2A', '>='),
6659
6660
('value_1', 'value_2B', '<='),
@@ -6675,7 +6676,7 @@ def conditional_join(
6675
6676
Join on just the non-equi joins is also possible::
6676
6677
6677
6678
df1.conditional_join(
6678
- right = df2,
6679
+ df2,
6679
6680
('value_1', 'value_2A', '>'),
6680
6681
('value_1', 'value_2B', '<'),
6681
6682
how='inner',
@@ -6695,7 +6696,7 @@ def conditional_join(
6695
6696
relevant dataframe::
6696
6697
6697
6698
df1.conditional_join(
6698
- right = df2,
6699
+ df2,
6699
6700
('value_1', 'value_2A', '>'),
6700
6701
('value_1', 'value_2B', '<'),
6701
6702
how='inner',
@@ -6714,7 +6715,7 @@ def conditional_join(
6714
6715
Pandas merge/join is more efficient::
6715
6716
6716
6717
df1.conditional_join(
6717
- right = df2,
6718
+ df2,
6718
6719
('col_a', 'col_a', '=='),
6719
6720
sort_by_appearance = True
6720
6721
)
@@ -6726,7 +6727,7 @@ def conditional_join(
6726
6727
Join on not equal -> ``!=`` ::
6727
6728
6728
6729
df1.conditional_join(
6729
- right = df2,
6730
+ df2,
6730
6731
('col_a', 'col_a', '!='),
6731
6732
sort_by_appearance = True
6732
6733
)
@@ -6746,7 +6747,7 @@ def conditional_join(
6746
6747
(this is the default)::
6747
6748
6748
6749
df1.conditional_join(
6749
- right = df2,
6750
+ df2,
6750
6751
('col_a', 'col_a', '>'),
6751
6752
sort_by_appearance = False
6752
6753
)
@@ -6768,6 +6769,11 @@ def conditional_join(
6768
6769
.. note:: All the columns from `df` and `right`
6769
6770
are returned in the final output.
6770
6771
6772
+ .. note:: For multiple conditions, if there are nulls
6773
+ in the join columns, they will not be
6774
+ preserved for the `!=` operator. Nulls are only
6775
+ preserved for the `!=` operator with a single condition.
6776
+
6771
6777
Functional usage syntax:
6772
6778
6773
6779
.. code-block:: python
@@ -6779,8 +6785,8 @@ def conditional_join(
6779
6785
right = pd.DataFrame(...)
6780
6786
6781
6787
df = jn.conditional_join(
6782
- df = df ,
6783
- right = right ,
6788
+ df,
6789
+ right,
6784
6790
*conditions,
6785
6791
sort_by_appearance = True/False,
6786
6792
suffixes = ("_x", "_y"),
@@ -6791,7 +6797,7 @@ def conditional_join(
6791
6797
.. code-block:: python
6792
6798
6793
6799
df = df.conditional_join(
6794
- right = right ,
6800
+ right,
6795
6801
*conditions,
6796
6802
sort_by_appearance = True/False,
6797
6803
suffixes = ("_x", "_y"),
@@ -6821,12 +6827,7 @@ def conditional_join(
6821
6827
At least one of the values must not be ``None``.
6822
6828
:returns: A pandas DataFrame of the two merged Pandas objects.
6823
6829
:raises ValueError: if the columns of `df` or `right` are a MultiIndex.
6824
- :raises ValueError: if `right` is an unnamed Series.
6825
6830
:raises ValueError: if condition in *conditions is not a tuple.
6826
- :raises ValueError: if condition is not length 3.
6827
- :raises ValueError: if `left_on` and `right_on` in condition are not
6828
- both numeric, or string, or datetime.
6829
-
6830
6831
6831
6832
.. # noqa: DAR402
6832
6833
"""