
Challenges and Future Directions for AI Spark Big Model


Introduction

The rapid evolution of big data technologies and artificial intelligence has transformed how society, businesses, and individuals manage, analyze, and gain insights from large volumes of data (Dwivedi et al., 2023). The AI Spark Big Model is one effective technology that has played a critical role in addressing large-scale data challenges and sophisticated machine learning (ML) operations. For example, the adoption of Apache Spark across industries has produced a range of diverse Spark applications, including machine learning, streaming data processing, and fog computing (Ksolves Team, 2022). As Pointer (2024) notes, in addition to SQL, streaming data, machine learning, and graph processing, Spark has native API support for Java, Scala, Python, and R. These capabilities make the platform fast, flexible, and friendly to developers. Still, the AI Spark Big Model faces several challenges: model interpretability, scalability, ethical implications, and integration problems. This paper examines the issues affecting the implementation of these models and explores the potential future developments that Spark is expected to undergo.

Challenges in the AI Spark Big Model

One critical problem affecting the implementation of the Apache Spark model is serialization, specifically the serialization cost often associated with Apache Spark (Simplilearn, 2024). Serialization and deserialization are necessary in Spark because they allow data to be transferred over the network to the various executors for processing. However, these processes can be expensive, especially in languages such as Python, which do not serialize data as efficiently as Java or Scala. This inefficiency can significantly affect the performance of Spark applications. In the Spark architecture, applications are divided into tasks that are sent to the executors (Nelamali, 2024). To achieve this, objects must be serialized for network transfer. If Spark cannot serialize an object, it raises the error org.apache.spark.SparkException: Task not serializable. This error can occur in many situations, for example, when an object used in a Spark task is not serializable or when a closure captures a non-serializable variable (Nelamali, 2024). Solving serialization problems is essential for improving the efficiency and stability of Spark applications and their ability to process data and execute tasks in distributed systems.

Figure 1: The purpose of serialization and deserialization
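
To make the failure mode concrete, the following minimal Scala sketch (class and value names are hypothetical) shows how referencing a field from inside a closure drags the whole enclosing object into the task, triggering the error above, and how copying the field into a local value avoids it:

```scala
import org.apache.spark.SparkContext

// Hypothetical example: a plain class is not Serializable, so closures
// that capture `this` cannot be shipped to executors.
class Scaler(sc: SparkContext) {
  val factor = 10

  // Fails with org.apache.spark.SparkException: Task not serializable:
  // `factor` is shorthand for `this.factor`, so the closure captures
  // the entire non-serializable Scaler instance.
  def scaleBroken(): Array[Int] =
    sc.parallelize(1 to 5).map(x => x * factor).collect()

  // Works: the closure now captures only a serializable local Int.
  def scaleFixed(): Array[Int] = {
    val localFactor = factor
    sc.parallelize(1 to 5).map(x => x * localFactor).collect()
  }
}
```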

The second challenge affecting the implementation of Spark is memory management. According to Simplilearn (2024), Spark's in-memory capabilities offer significant performance advantages because data is processed in memory, but they also have drawbacks that can degrade application performance. Spark applications usually demand large amounts of memory, and poor memory management results in frequent garbage collection pauses or out-of-memory exceptions. Optimizing memory management for big data processing in Spark is not trivial and requires a good understanding of how Spark uses memory and of the available configuration parameters (Nelamali, 2024). Among the most frequent and frustrating problems is the OutOfMemoryError, which can affect Spark applications across the cluster. This error can happen in any part of Spark execution but is most common in the driver and executor nodes. The driver, which coordinates the execution of tasks, and the executors, which perform the data processing, both require a proper allocation of memory to avoid failures (Simplilearn, 2024). Memory management is a critical aspect of a Spark application because it affects stability and performance, and it therefore requires a deliberate strategy for allocating and managing resources within the cluster.
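
A hedged sketch of the memory-related settings discussed above follows; the values are illustrative placeholders rather than recommendations, since appropriate sizes depend entirely on the workload and cluster:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative memory configuration (values are placeholders).
val spark = SparkSession.builder()
  .appName("memory-tuning-sketch")
  .config("spark.executor.memory", "8g")          // heap per executor, where partitions are processed
  .config("spark.memory.fraction", "0.6")         // share of heap for execution + storage (0.6 is the default)
  .config("spark.memory.storageFraction", "0.5")  // portion of that share protected for cached data
  .getOrCreate()

// Driver memory usually has to be set when the application is launched,
// e.g. spark-submit --driver-memory 4g, because the driver JVM is
// already running by the time this code executes.
```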

The use of Apache Spark is also greatly affected by the challenges of managing large clusters. As data volumes and cluster sizes increase, cluster management and maintenance become critical. Identifying and isolating job failures or performance issues in large distributed systems can be difficult (Nelamali, 2024). One common problem when working with large datasets is that actions fail if the total size of the results exceeds the limit set by spark.driver.maxResultSize. When this threshold is surpassed, Spark raises the error: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of z tasks (x MB) is bigger than spark.driver.maxResultSize (y MB) (Nelamali, 2024). Such errors highlight the challenges of managing big data processing in Spark, where sophisticated approaches to cluster management, resource allocation, and error control are needed to support large-scale computations.

Figure 2: The Apache Spark Architecture
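The snippet below sketches two complementary responses to the maxResultSize failure quoted above, assuming Spark is available on the classpath; the "2g" limit and the output path are illustrative, and avoiding large collects is usually the safer option:

```scala
import org.apache.spark.sql.SparkSession

// Option 1: raise the cap at session creation when the driver truly
// needs a large result ("2g" is illustrative, not a recommendation).
val spark = SparkSession.builder()
  .appName("result-size-sketch")
  .config("spark.driver.maxResultSize", "2g")
  .getOrCreate()

val df = spark.range(0, 1000000000L) // a deliberately large dataset

// Option 2 (usually better): avoid pulling huge results to the driver.
df.limit(20).show()                                 // inspect a sample instead of collect()
df.write.mode("overwrite").parquet("/tmp/results")  // persist full output to storage
```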

Another critical issue affecting Apache Spark deployments is the small files problem. Spark is inefficient when dealing with many small files because each file is processed as a separate task, and the scheduling overhead can consume most of the job's time. This inefficiency makes Spark less suitable for use cases that involve many small log files or similar datasets. Moreover, Spark often depends on the Hadoop ecosystem for file handling (HDFS) and resource allocation (YARN), which adds complexity and overhead. Nelamali (2024) argues that although Spark can operate in standalone mode, integrating Hadoop components usually improves Spark's performance.
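
One common mitigation, sketched below under the assumption of an existing SparkSession named spark and hypothetical HDFS paths, is to read the small files once and compact them into fewer, larger partitions before further processing:

```scala
// Read many small log files in one pass (paths are hypothetical).
val logs = spark.read.text("hdfs:///logs/2024/*/*.log")

// coalesce() merges partitions without a full shuffle; repartition()
// shuffles but balances partition sizes. The target count is a
// judgment call, often aimed at roughly 128 MB per output file.
logs.coalesce(16)
  .write.mode("overwrite")
  .text("hdfs:///logs/2024-compacted")
```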

Iterative algorithms also pose a challenge for the implementation of Apache Spark, where support for complex analysis is limited. Because the system's architecture is based on in-memory processing, Spark should in theory be well suited to iterative algorithms; in practice, however, it can be inefficient (Sewal & Singh, 2021). This inefficiency arises because Spark uses resilient distributed datasets (RDDs) and requires users to cache intermediate data explicitly if it will be reused in subsequent computations. Without caching, data is rewritten and re-read after each iteration, which increases execution time and resource consumption and undermines the expected performance boost. In addition, although Spark provides MLlib for machine learning on big data, its libraries may not be as extensive or deep as those of dedicated machine learning platforms (Nguyen et al., 2019). Some users may find that MLlib offers only basic algorithms, limited hyperparameter optimization, and limited compatibility with other major ML frameworks. This restriction makes Spark less suitable for more elaborate analytical work, and practitioners may have to combine it with other tools and systems to obtain the results they need.
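
The caching pattern the paragraph refers to is sketched below, assuming an existing SparkSession named spark and a hypothetical input path; without persist(), each pass of the loop would recompute the RDD from the source file:

```scala
import org.apache.spark.storage.StorageLevel

val points = spark.sparkContext
  .textFile("hdfs:///data/points.txt")      // hypothetical path
  .map(_.split(",").map(_.toDouble))
  .persist(StorageLevel.MEMORY_AND_DISK)    // keep intermediate data across iterations

var score = 0.0
for (_ <- 1 to 10) {
  // Each iteration reuses the cached RDD instead of re-reading the file.
  score += points.map(_.sum).reduce(_ + _)
}
points.unpersist() // release the memory once the loop is done
```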

The Future of Spark

a. Enhanced Machine Learning (ML)

As ML assumes greater importance in analyzing big data, Spark's MLlib is updated frequently to manage the increasing complexity of ML workloads (Elshawi et al., 2018). This evolution centers on expanding the offered algorithms and tools to improve performance, functionality, and flexibility. Future enhancements are likely to introduce deep learning interfaces integrated directly into the Spark platform, together with support for more neural network architectures. Integration with TensorFlow and PyTorch, along with GPU-optimized libraries, will reduce the time and computational cost of training and inference for high-dimensional data and large-scale machine learning problems. The focus will also be on simplifying the user experience through better APIs, AutoML capabilities, and more user-friendly interfaces for model optimization and testing (Simplilearn, 2024). These advancements will benefit data scientists and engineers who deal with big data and will help democratize ML by providing easy ways to deploy and manage ML pipelines in distributed systems. Better support for real-time analysis and online learning will also help organizations gain real-time insights, thus improving decision-making.
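
As a baseline for these improvements, the sketch below shows the kind of pipeline and hyperparameter search MLlib already supports today; the column names ("f1", "f2", "label") and the DataFrame trainingDf are hypothetical:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Assemble raw columns into a feature vector, then fit a classifier.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))
  .setOutputCol("features")
val lr = new LogisticRegression().setLabelCol("label")
val pipeline = new Pipeline().setStages(Array(assembler, lr))

// A small grid search with 3-fold cross-validation.
val grid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .build()
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new BinaryClassificationEvaluator().setLabelCol("label"))
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)

// val model = cv.fit(trainingDf) // trainingDf: DataFrame with f1, f2, label
```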

b. Improved Performance and Efficiency

Apache Spark's core engine is continuously improving to make it faster and more efficient as it remains one of the most popular technologies in the big data space. Key areas of interest are memory management and higher-level optimizations that minimize computational overhead and resource utilization (Simplilearn, 2024). Memory management optimization will reduce time spent in garbage collection and improve in-memory data processing, which is vital for high throughput and low latency in big data workloads. Improvements to the Catalyst query optimizer and the Tungsten execution engine will also allow better execution of complicated queries and data transformations. These enhancements will be especially beneficial where large amounts of data are shuffled and aggregated, which often leads to performance issues. Future efforts to support contemporary hardware, such as faster NVMe storage devices and advances in CPUs and GPUs, will further increase Spark's capacity to process more data faster (Armbrust et al., 2015). Moreover, continued work on adaptive query execution (AQE) will enable Spark to adapt execution plans at runtime using observed statistics, which will enhance data processing performance. Altogether, these improvements will ensure that Spark remains a high-performance and scalable tool that helps organizations analyze large datasets.
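
AQE is already available in recent Spark releases and is enabled by default since Spark 3.2; the sketch below, assuming an existing SparkSession named spark, makes the relevant switches explicit:

```scala
// Re-optimize query plans at runtime from observed shuffle statistics.
spark.conf.set("spark.sql.adaptive.enabled", "true")
// Merge small shuffle partitions after a shuffle to cut task overhead.
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
// Split pathologically large partitions in skewed joins.
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
```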

c. Integration with Emerging Data Sources

As the number and variety of data sources grow, Apache Spark will evolve to process many new data types. This evolution will enhance support for streaming data originating from IoT devices, which produce real-time data that requires real-time analysis. Improved connectors and APIs will streamline data ingestion and processing in real time, improving how quickly Spark handles high-velocity data (Dwivedi et al., 2023). In addition, integration with the cloud will deepen, with cloud platforms taking charge of large-scale data storage and processing. This involves more robust integration with cloud-native storage, data warehousing, and analytics services from AWS, Azure, and Google Cloud. Spark will also work with other types of databases, such as NoSQL, graph, and blockchain databases, enabling users to run analytics on data of different types and structures. Thus, Spark will allow organizations to extract the maximum value from the information they handle, regardless of its source and form, providing more comprehensive and timely insights.
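
As one concrete instance of such ingestion, the sketch below reads an IoT event stream from Kafka with Structured Streaming; the broker address and topic name are hypothetical, an existing SparkSession named spark is assumed, and the spark-sql-kafka connector package must be on the classpath:

```scala
// Subscribe to a Kafka topic of sensor readings (names are hypothetical).
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "iot-sensor-readings")
  .load()
  .selectExpr("CAST(value AS STRING) AS payload", "timestamp")

// Write to the console for demonstration; a real job would target a
// durable sink such as a data lake table.
val query = events.writeStream
  .format("console")
  .outputMode("append")
  .start()
// query.awaitTermination()
```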

d. Cloud-Native Features

As cloud computing becomes increasingly prevalent, Apache Spark is building native compatibility with cloud-based environments to make its use in the cloud easier. Cloud-focused updates include auto-scaling services and provisioning and configuration tools that simplify the deployment of Spark clusters on cloud platforms (Simplilearn, 2024). These tools will integrate with cloud-native storage and compute resources and allow users to grow their workloads in the cloud. New resource management capabilities will let users control and allocate cloud resources more effectively according to load, releasing resources during low utilization and thus balancing cost and performance. Spark will also continue to expand its support for serverless computing frameworks, enabling users to run Spark applications without managing the underlying infrastructure. This serverless approach will provide automatic scaling, high availability, and cost optimization, since users pay only for the compute time they use. Improved support for Kubernetes, one of the most popular container orchestration systems, will strengthen Spark's cloud-native features and improve container management, orchestration, and integration with other cloud-native services (Dwivedi et al., 2023). These enhancements will make Spark more usable and cost-effective for organizations that rely on cloud infrastructure for big data analytics while reducing the operational overhead required to do so.
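
The elasticity described here can already be approximated with Spark's dynamic allocation settings; the sketch below is illustrative (executor counts are placeholders), and exact behavior depends on the cluster manager:

```scala
import org.apache.spark.sql.SparkSession

// Let the cluster grow and shrink the executor pool with the load.
val spark = SparkSession.builder()
  .appName("cloud-elasticity-sketch")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "50")
  // Track shuffle files so executors can be released without an
  // external shuffle service (useful on Kubernetes).
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .getOrCreate()
```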

e. Broader Language Support

Apache Spark is expected to become even more flexible as support for additional programming languages is added to the current list of Scala, Java, Python, and R. By including languages like Julia, which is famous for its numerical and scientific computing performance, Spark can attract developers working in niches that demand high-performance data processing (Simplilearn, 2024). Supporting languages like JavaScript could also bring Spark to the large community of web developers, allowing them to perform big data analytics within a familiar environment. New language bindings would let Spark integrate with the software environments and workflows that developers deem essential. This inclusiveness broadens Spark's reach, making big data analysis more achievable, while a larger pool of contributors fosters creativity as more people participate in and benefit from the platform (Dwivedi et al., 2023). By becoming more accessible and supporting more programming languages, Spark will be even more deeply embedded in the big data landscape, and more people will come forward to develop the technology.

f. Cross-Platform and Multi-Cluster Operations

In the future, Apache Spark will see significant developments aimed at enhancing long-awaited cross-platform interoperability and at orchestrating multiple clusters across hybrid and multi-cloud environments (Dwivedi et al., 2023). Such improvements will free organizations from running Spark workloads on a single platform or cloud vendor, making more complex and decentralized data processing tasks possible. Interoperability will be enhanced so that data integration and data sharing work across on-premise systems, private clouds, and public clouds, improving data consistency (Simplilearn, 2024). These developments will offer a real-time view of cluster and resource consumption, helping to mitigate the operational overhead of managing distributed systems. Strong security measures and compliance tools will also guarantee sound data management and security across regions and environments (Dwivedi et al., 2023). With cross-platform and multi-cluster capabilities, Spark will help organizations fully leverage their data architecture, allowing for more flexible, scalable, and fault-tolerant big data solutions that match each organization's requirements and deployment topology.

g. Stronger Community and Ecosystem Growth

Apache Spark's future is closely linked to the health of its open-source ecosystem, which drives Spark's development through contributions and innovations. As more developers, researchers, and organizations adopt Spark, we can expect new libraries and tools that expand its application in different fields (Simplilearn, 2024). Community-driven projects may produce specialized libraries for data analysis, machine learning, and other advanced functions, making Spark even more versatile and efficient. These efforts should deliver new features and better performance, encourage best practices and comprehensive documentation, and make the project approachable for new contributors. Collaboration will also help develop new capabilities for real-time processing, resource utilization, and compatibility with other technologies, as noted by Armbrust et al. (2015). Further growth of the ecosystem will bring more active and creative users who can test and improve solutions quickly. This culture of continual improvement and expansion will ensure that Spark continues to evolve, remains relevant for big data analytics today and in the future, and stays desirable in the market despite the dynamics of the technological landscape.

Conclusion

Despite significant progress, Apache Spark still faces numerous difficulties when applying its flexible and fault-tolerant architecture to big data and machine learning problems: serialization costs, memory management, and the administration of very large clusters. Nevertheless, the future of Spark is bright, with expectations of better machine learning features, better performance, integration with emerging data sources, and new cloud-native capabilities. Broader language support, cross-platform and multi-cluster operations, and the growth of the Spark community and ecosystem will further enhance its importance in big data and AI platforms. By overcoming these challenges and building on future progress, Spark will continue to improve and to offer more efficient solutions for a wide range of data processing and analysis activities.

References

  1. Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., ... & Zaharia, M. (2015, May). Spark SQL: Relational data processing in Spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 1383-1394).
  2. Dwivedi, Y. K., Sharma, A., Rana, N. P., Giannakis, M., Goel, P., & Dutot, V. (2023). Evolution of artificial intelligence research in Technological Forecasting and Social Change: Research topics, trends, and future directions. Technological Forecasting and Social Change, 192, 122579.
  3. Elshawi, R., Sakr, S., Talia, D., & Trunfio, P. (2018). Big data systems meet machine learning challenges: Towards big data science as a service. Big Data Research, 14, 1-11.
  4. Ksolves Team (2022). Apache Spark Benefits: Why Enterprises are Moving To this Data Engineering Tool. Available at: https://www.ksolves.com/blog/big-data/spark/apache-spark-benefits-reasons-why-enterprises-are-moving-to-this-data-engineering-tool
  5. Nelamali, M. (2024). Different types of issues while running in the cluster. https://sparkbyexamples.com/spark/different-types-of-issues-while-running-spark-projects/
  6. Nguyen, G., Dlugolinsky, S., Bobák, M., Tran, V., López García, Á., Heredia, I., ... & Hluchý, L. (2019). Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey. Artificial Intelligence Review, 52, 77-124.
  7. Pointer, K. (2024). What is Apache Spark? The big data platform that crushed Hadoop. Available at: https://www.infoworld.com/article/2259224/what-is-apache-spark-the-big-data-platform-that-crushed-hadoop.html
  8. Sewal, P., & Singh, H. (2021, October). A critical analysis of Apache Hadoop and Spark for big data processing. In 2021 6th International Conference on Signal Processing, Computing and Control (ISPCC) (pp. 308-313). IEEE.
  9. Simplilearn (2024). The Evolutionary Path of Spark Technology: Let's Look Ahead! Available at: https://www.simplilearn.com/future-of-spark-article
  10. Tang, S., He, B., Yu, C., Li, Y., & Li, K. (2020). A survey on spark ecosystem: Big data processing infrastructure, machine learning, and applications. IEEE Transactions on Knowledge and Data Engineering, 34(1), 71-91.