Hello, I'm trying to understand how Spark works, and I'm learning PySpark.
I know Python and the pandas library.
I understand that if I want to read a big CSV file into a pandas DataFrame, it may not work (or it will take a long time to read).
As such, PySpark is an alternative.
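For context, before switching to Spark I also looked at pandas' own workaround for large files: reading the CSV in chunks so the whole file is never in memory at once. This is a minimal sketch; the file name `data.csv` and its contents are made up for illustration:

```python
import pandas as pd

# Write a small sample CSV to stand in for the "big" file (made-up data).
with open("data.csv", "w") as f:
    f.write("id,value\n")
    for i in range(10):
        f.write(f"{i},{i * 2}\n")

# Instead of loading everything at once, read the CSV in chunks so only
# one chunk of rows is in memory at a time.
total = 0
rows = 0
for chunk in pd.read_csv("data.csv", chunksize=4):
    total += chunk["value"].sum()
    rows += len(chunk)

print(rows, total)  # 10 rows, values sum to 90
```

This keeps memory bounded, but each chunk is still processed on a single machine, which is why I'm looking at Spark for distributing the work.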
I read some articles, and I understood that the first thing to do is to create a SparkContext.
I understand the SparkContext manages the cluster, which will read the CSV file and transform the data.
So I had this code in a Jupyter notebook:
# Import SparkContext from the pyspark module
from pyspark import SparkContext
sc = SparkContext('local')
sc
If I execute this code twice, the second time I get an error because I can't have two Spark contexts. Why can't I have two Spark contexts?
I also wanted to try this:
# Import SparkContext from the pyspark module
from pyspark import SparkContext
sc1 = SparkContext('local')
sc2 = SparkContext('local')
Here I have two different names, sc1 and sc2. Even if I execute this only one time, I get an error. Why can't I have two Spark contexts, sc1 and sc2?
Thank you.