Forums

PySpark exception handling for database operations

We are replacing our legacy ETL tool with PySpark code. Our ETL program fetches rows from source databases (Oracle) and then inserts the final transformed dataset into an Oracle database. We are planning to use DataFrames and temporary tables in Spark for the ETL processing. When we write the final output to the Oracle table, we want to log bad records to a text file if we hit any database exception and continue processing the remaining records. For example, we might hold 10 rows in our Spark temporary table and try to insert them into the final Oracle table; if a few rows fail due to a database constraint violation, we need to write those failed rows to a text file and continue to process the other rows. Please let me know how this can be achieved.
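
[Editor's note: a minimal sketch of one way this pattern is commonly handled, assuming the rows are written with cx_Oracle from inside foreachPartition rather than through Spark's JDBC writer (which offers no per-row error hook). The table name TARGET_TABLE, its columns, the connection details, and the failure-file path are all hypothetical placeholders, and each executor writes its own local failure file, so those files would still need to be gathered up afterwards.]

```python
from pyspark.sql import SparkSession
import cx_Oracle

spark = SparkSession.builder.appName("oracle-etl-sketch").getOrCreate()

# Stand-in for the real transformed dataset produced by the ETL steps.
final_df = spark.createDataFrame(
    [(1, "alice", 10.0), (2, "bob", 20.0)],
    ["ID", "NAME", "AMOUNT"],
)

# Placeholders for the real target environment.
DSN = "dbhost:1521/ORCLPDB1"
USER, PASSWORD = "etl_user", "changeme"
FAILED_ROWS_PATH = "/tmp/failed_rows.txt"   # executor-local file

def write_partition(rows):
    """Insert one partition row by row, logging constraint violations and moving on."""
    conn = cx_Oracle.connect(USER, PASSWORD, DSN)
    cursor = conn.cursor()
    with open(FAILED_ROWS_PATH, "a") as bad_file:
        for row in rows:
            try:
                cursor.execute(
                    "INSERT INTO TARGET_TABLE (ID, NAME, AMOUNT) VALUES (:1, :2, :3)",
                    (row["ID"], row["NAME"], row["AMOUNT"]),
                )
                conn.commit()                  # commit per row so earlier successes are kept
            except cx_Oracle.DatabaseError as exc:
                conn.rollback()                # drop only the failed statement
                bad_file.write("{}\t{}\n".format(row, exc))
    cursor.close()
    conn.close()

final_df.foreachPartition(write_partition)
```

Committing after every row is slow for large volumes; batching the inserts and only falling back to row-by-row handling when a batch fails is a common refinement of the same idea.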

These forums are for the PythonAnywhere hosting environment -- your question looks like it's more of a general programming query, which would be best asked over at Stack Overflow. One keyword you might find useful to include is "atomicity" or "atomic", which is the database terminology for the "all-or-nothing" kind of behaviour it sounds like you need.
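
[Editor's note: for contrast with the per-row sketch above, here is a rough illustration of the "all-or-nothing" (atomic) behaviour the reply refers to, under the same assumed cx_Oracle connection and hypothetical TARGET_TABLE: either every row in the batch is inserted, or a single failure rolls the whole batch back.]

```python
import cx_Oracle

def atomic_load(rows, user="etl_user", password="changeme", dsn="dbhost:1521/ORCLPDB1"):
    """All-or-nothing load: one transaction, one commit, full rollback on any error."""
    conn = cx_Oracle.connect(user, password, dsn)
    cursor = conn.cursor()
    try:
        cursor.executemany(
            "INSERT INTO TARGET_TABLE (ID, NAME, AMOUNT) VALUES (:1, :2, :3)",
            list(rows),              # e.g. [(1, "alice", 10.0), (2, "bob", 20.0)]
        )
        conn.commit()                # the whole batch becomes visible together
    except cx_Oracle.DatabaseError:
        conn.rollback()              # any failure undoes every insert in the batch
        raise
    finally:
        cursor.close()
        conn.close()
```

cx_Oracle's executemany also accepts a batcherrors option that collects per-row errors from a single batch instead of aborting it, which may be worth checking in its documentation for the original use case.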