=================
Starting AXS
=================

After you have :doc:`installed AXS `, you have three options for running AXS:

- using the `pyspark` shell
- from a Python program
- from a Jupyter notebook

The three options differ only in how you start the Spark session, which is required for
running AXS. In all three cases you also have to decide which Spark "master" (that is, which
type of Spark cluster) to use. The easiest way is to run locally, using the `local[n]` master
(where `n` is the number of tasks you wish to run in parallel). Other ways are to start a
`Standalone Spark cluster `__ and connect to it, or to connect to an existing `YARN `__ or
`Mesos `__ cluster. There is also an option to run Spark on `Kubernetes `__.

Starting AXS using pyspark shell
================================

Start `pyspark`, specifying the `master` and other options as you see fit. For example:

.. code-block:: bash

    pyspark --master=local[10] --driver-memory=64g

Then, inside the shell, create an instance of the AXS catalog, providing the `SparkSession`
instance already initialized by the shell:

.. code-block:: python

    from axs import AxsCatalog
    db = AxsCatalog(spark)

Starting AXS using a Python program
===================================

To use AXS from a Python program, first set the `SPARK_HOME` environment variable to the
directory where you installed AXS' version of Spark. For example:

.. code-block:: bash

    export SPARK_HOME=/opt/axs-dist

To connect to a Spark Standalone cluster, first start it from the AXS installation directory:

.. code-block:: bash

    sbin/start-all.sh

Then, after you start an ordinary Python shell, you will first need to create an instance of
`SparkSession` and connect it to the Spark cluster. For example:

.. code-block:: python

    from pyspark.sql import SparkSession

    spark = SparkSession.builder. \
        master("spark://localhost:7078"). \
        config("spark.cores.max", "10"). \
        config("spark.executor.cores", "1"). \
        config("spark.driver.memory", "12g"). \
        config("spark.executor.memory", "12g"). \
        appName("AXS application").getOrCreate()

    from axs import AxsCatalog
    db = AxsCatalog(spark)

In this example the Spark session points to a Spark standalone master listening on port `7078`
at `localhost`. Of course, you can also specify the master to be `local[10]`, as in the
previous example.
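For reference, here is a minimal sketch of the same setup running in local mode instead of
against a standalone master. It only combines the pieces shown above; the number of tasks and
the driver memory are illustrative values, not requirements:

.. code-block:: python

    from pyspark.sql import SparkSession
    from axs import AxsCatalog

    # Run Spark locally with 10 parallel tasks instead of connecting to a
    # standalone master; "12g" is only an illustrative memory setting.
    spark = SparkSession.builder. \
        master("local[10]"). \
        config("spark.driver.memory", "12g"). \
        appName("AXS application").getOrCreate()

    db = AxsCatalog(spark)

Because no cluster is involved, executor-related settings such as `spark.cores.max` and
`spark.executor.memory` are not needed in this variant.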
Starting AXS from a Jupyter notebook
====================================

To correctly set up a Jupyter environment, it is best to create a new Jupyter kernel and then
edit the corresponding `kernel.json` file. You can get a list of the current kernels using the
following command:

.. code-block:: bash

    jupyter kernelspec list

Then edit the `kernel.json` file in the displayed folder (e.g.
`/usr/local/share/jupyter/kernels/mypythonkernel/kernel.json`) and change the variables in the
`env` section. Here is a working example where AXS is installed in `/opt/axs-dist`:

.. code-block:: json

    {
      "argv": [
        "/opt/anaconda/bin/python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}"
      ],
      "display_name": "AXS",
      "language": "python",
      "env": {
        "SPARK_HOME": "/opt/axs-dist",
        "PYLIB": "/opt/axs-dist/python/lib:/opt/spark-axs/python/lib/py4j-0.10.7-src.zip",
        "PYTHONPATH": "/opt/axs-dist/python:/opt/anaconda/lib/python36.zip:/opt/anaconda/lib/python3.6:/opt/anaconda/lib/python3.6/lib-dynload:/opt/anaconda/lib/python3.6/site-packages:/opt/anaconda/lib/python3.6/site-packages/IPython/extensions",
        "JUPYTERPATH": "/opt/axs-dist/python:/epyc/opt/anaconda/lib/python36.zip:/opt/anaconda/lib/python3.6:/opt/anaconda/lib/python3.6/lib-dynload:/opt/anaconda/lib/python3.6/site-packages:/opt/anaconda/lib/python3.6/site-packages/IPython/extensions",
        "PATH": "/opt/axs-dist/python/lib/py4j-0.10.7-src.zip:/opt/spark-axs/bin:/epyc/opt/jdk1.8.0_161/bin:/opt/anaconda/bin:/usr/lib64/qt-3.3/bin:/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/jupyterhub/miniconda/bin",
        "JAVA_HOME": "/opt/jdk1.8.0_161"
      }
    }

Then start a new notebook with the kernel you just edited. After that, you can start a Spark
session and create an instance of `AxsCatalog` just as you did in the Python case above.
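As a quick check that the kernel environment is set up correctly, the first notebook cell can
repeat the connection code from the Python example and list the tables registered in the
catalog. This is only a sketch: the master URL is the same illustrative value used above, and
the `list_tables()` call is assumed to be available on `AxsCatalog` (verify the exact method
name against your installed AXS version):

.. code-block:: python

    from pyspark.sql import SparkSession
    from axs import AxsCatalog

    # Connect to the standalone master started earlier; adjust the URL, or use
    # local[n] if you are running Spark locally.
    spark = SparkSession.builder. \
        master("spark://localhost:7078"). \
        appName("AXS notebook").getOrCreate()

    db = AxsCatalog(spark)

    # Assumption: list_tables() returns the tables registered in the AXS catalog;
    # check your AXS version's API documentation if this call is not available.
    print(db.list_tables())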