Hello Community,
I have coded the following logic in SQL:
Join very_large_dataframe to small_product_dimension_dataframe on column [B]. Only join records from small_product_dimension_dataframe where O is greater than 10. Keep only column [P].
SELECT
small_product_dimension_dataframe.P
FROM dbo.small_product_dimension_dataframe
INNER JOIN dbo.very_large_dataframe
ON small_product_dimension_dataframe.B = very_large_dataframe.B
WHERE small_product_dimension_dataframe.O > 10
I would like help with the equivalent code in PySpark. I have made a start with the following:
df = very_large_dataframe.join(
    small_product_dimension_dataframe,
    very_large_dataframe.B == small_product_dimension_dataframe.B,
)
I would like help amending the PySpark code to keep only column P and to apply the equivalent of the SQL WHERE clause, small_product_dimension_dataframe.O > 10.