Apache Spark is an awesome platform for big data analysis, so getting to know how it works and how to use it is probably a good idea. Whether it’s for social science, marketing, business intelligence or something else, the number of times data analysis benefits from heavy-duty parallelization is growing all the time.

Setting up your own cluster, administering it etc. is a bit of a hassle to just learn the basics though (although Amazon EMR or Databricks make that quite easy, and you can even build your own Raspberry Pi cluster if you want…), so getting Spark and PySpark running on your local machine seems like a better idea. You can also use Spark with R and Scala, among others, but I have no experience with how to set that up. So, we’ll stick to PySpark in this guide.